feat(pans): integrate PANS metrics collection with OpenTelemetry#1467
feat(pans): integrate PANS metrics collection with OpenTelemetry#1467namanONcode wants to merge 7 commits intoopen-telemetry:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request introduces a new Android instrumentation module for PANS (Per-Application Network Selection) that integrates with OpenTelemetry to collect per-app network metrics. The implementation provides the foundation for monitoring OEM network usage (OEM_PAID and OEM_PRIVATE network types) on Android devices, targeting system-level and automotive applications.
Key Changes
- Core instrumentation framework: Added
PansInstrumentation,PansMetricsCollector, andPANSMetricsExtractorclasses to automatically collect and report network metrics via OpenTelemetry - Data models: Introduced data classes (
AppNetworkUsage,PreferenceChange,NetworkAvailability) to represent PANS metrics with OpenTelemetry attribute builders - Android system wrappers: Created
NetStatsManagerandConnectivityManagerWrapperto safely access network statistics and connectivity information with proper permission handling
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
PansInstrumentation.kt |
Main instrumentation entry point using AutoService for automatic registration |
PansMetricsCollector.kt |
Orchestrates periodic metric collection and recording to OpenTelemetry |
PANSMetricsExtractor.kt |
Extracts metrics from Android system services and tracks preference changes |
PANSMetrics.kt |
Data models and attribute builder functions for PANS metrics |
NetStatsManager.kt |
Wrapper for Android NetworkStatsManager with permission checks (currently placeholder) |
ConnectivityManagerWrapper.kt |
Wrapper for ConnectivityManager to detect OEM network availability |
AndroidManifest.xml |
Declares required permissions including PACKAGE_USAGE_STATS |
build.gradle.kts |
Module build configuration with Jacoco coverage setup |
consumer-rules.pro |
Proguard rules to preserve instrumentation classes |
pans.api |
Public API definitions for the module |
| Test files (8 files) | Comprehensive test suite covering all main components with Robolectric |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Thread.sleep(50) // Let it initialize | ||
| collector.stop() | ||
| } catch (e: Exception) { | ||
| throw AssertionError("start() should not throw", e) | ||
| } | ||
| } | ||
|
|
||
| @Test | ||
| fun testStopDoesNotThrow() { | ||
| val collector = PansMetricsCollector(context, mockSdk) | ||
| try { | ||
| collector.stop() | ||
| } catch (e: Exception) { | ||
| throw AssertionError("stop() should not throw", e) | ||
| } | ||
| } | ||
|
|
||
| @Test | ||
| fun testStartThenStop() { | ||
| val collector = PansMetricsCollector(context, mockSdk) | ||
| collector.start() | ||
| Thread.sleep(50) | ||
| collector.stop() | ||
| } | ||
|
|
||
| @Test | ||
| fun testMultipleStarts() { | ||
| val collector = PansMetricsCollector(context, mockSdk) | ||
| collector.start() | ||
| collector.start() // Second start should be ignored | ||
| Thread.sleep(50) | ||
| collector.stop() | ||
| } | ||
|
|
||
| @Test | ||
| fun testMultipleStops() { | ||
| val collector = PansMetricsCollector(context, mockSdk) | ||
| collector.start() | ||
| Thread.sleep(50) | ||
| collector.stop() | ||
| collector.stop() // Second stop should be safe | ||
| collector.stop() // Third stop should be safe | ||
| } | ||
|
|
||
| @Test | ||
| fun testStopWithoutStart() { | ||
| val collector = PansMetricsCollector(context, mockSdk) | ||
| collector.stop() // Should not throw | ||
| } | ||
|
|
||
| @Test | ||
| fun testStartStopCycle() { | ||
| val collector = PansMetricsCollector(context, mockSdk) | ||
| collector.start() | ||
| Thread.sleep(50) | ||
| collector.stop() | ||
| } | ||
|
|
||
| // ==================== Error Handling Tests ==================== | ||
|
|
||
| @Test | ||
| fun testCollectorCreationWithContext() { | ||
| try { | ||
| val collector = PansMetricsCollector(context, mockSdk) | ||
| assertNotNull(collector) | ||
| } catch (e: Exception) { | ||
| throw AssertionError("Collector creation should not throw", e) | ||
| } | ||
| } | ||
|
|
||
| @Test | ||
| fun testCollectorStartErrorHandling() { | ||
| val collector = PansMetricsCollector(context, mockSdk) | ||
| try { | ||
| collector.start() | ||
| Thread.sleep(50) | ||
| } catch (e: Exception) { | ||
| // Expected to handle errors gracefully | ||
| } finally { | ||
| collector.stop() | ||
| } | ||
| } | ||
|
|
||
| @Test | ||
| fun testCollectorWithRealContext() { | ||
| val realContext = ApplicationProvider.getApplicationContext<Context>() | ||
| val collector = PansMetricsCollector(realContext, mockSdk) | ||
| assertNotNull(collector) | ||
| } | ||
|
|
||
| @Test | ||
| fun testCollectorLifecycle() { | ||
| val collector = PansMetricsCollector(context, mockSdk) | ||
|
|
||
| // Create | ||
| assertNotNull(collector) | ||
|
|
||
| // Start | ||
| try { | ||
| collector.start() | ||
| Thread.sleep(50) | ||
| } catch (e: Exception) { | ||
| // May fail due to permissions, but should not crash | ||
| } | ||
|
|
||
| // Stop | ||
| collector.stop() | ||
| } | ||
|
|
||
| // ==================== Metrics Recording Tests ==================== | ||
|
|
||
| @Test | ||
| fun testCollectorRecordsMetrics() { | ||
| val collector = PansMetricsCollector(context, mockSdk) | ||
| collector.start() | ||
| Thread.sleep(100) // Let it collect once | ||
| collector.stop() | ||
|
|
||
| // Verify meter was accessed | ||
| verify(atLeast = 1) { mockSdk.getMeter(any()) } | ||
| } | ||
|
|
||
| @Test | ||
| fun testCollectorCreatesBytesTransmittedCounter() { | ||
| val collector = PansMetricsCollector(context, mockSdk) | ||
| collector.start() | ||
| Thread.sleep(100) | ||
| collector.stop() | ||
|
|
||
| verify(atLeast = 1) { mockMeter.counterBuilder("network.pans.bytes_transmitted") } | ||
| } | ||
|
|
||
| @Test | ||
| fun testCollectorCreatesBytesReceivedCounter() { | ||
| val collector = PansMetricsCollector(context, mockSdk) | ||
| collector.start() | ||
| Thread.sleep(100) | ||
| collector.stop() | ||
|
|
||
| verify(atLeast = 1) { mockMeter.counterBuilder("network.pans.bytes_received") } | ||
| } | ||
|
|
||
| @Test | ||
| fun testCollectorCreatesNetworkAvailableGauge() { | ||
| val collector = PansMetricsCollector(context, mockSdk) | ||
| collector.start() | ||
| Thread.sleep(100) | ||
| collector.stop() | ||
|
|
||
| verify(atLeast = 1) { mockMeter.gaugeBuilder("network.pans.network_available") } | ||
| } | ||
|
|
||
| // ==================== Edge Cases ==================== | ||
|
|
||
| @Test | ||
| fun testRapidStartStop() { | ||
| val collector = PansMetricsCollector(context, mockSdk) | ||
| repeat(3) { | ||
| try { | ||
| collector.start() | ||
| Thread.sleep(10) | ||
| collector.stop() | ||
| } catch (e: Exception) { | ||
| // Ignore errors during rapid start/stop | ||
| } | ||
| } | ||
| } | ||
|
|
||
| @Test | ||
| fun testCollectorStopsCleanly() { | ||
| val collector = PansMetricsCollector(context, mockSdk) | ||
| collector.start() | ||
| Thread.sleep(100) | ||
| collector.stop() | ||
| Thread.sleep(50) // Ensure cleanup | ||
| } | ||
|
|
||
| @Test | ||
| fun testManyCollectors() { | ||
| val collectors = mutableListOf<PansMetricsCollector>() | ||
| repeat(3) { | ||
| collectors.add(PansMetricsCollector(context, mockSdk)) | ||
| } | ||
|
|
||
| collectors.forEach { it.start() } | ||
| Thread.sleep(50) | ||
| collectors.forEach { it.stop() } | ||
| } | ||
|
|
||
| @Test | ||
| fun testCollectorWithDifferentIntervals() { | ||
| val intervals = listOf(1L, 5L, 15L, 30L, 60L) | ||
| intervals.forEach { interval -> | ||
| val collector = PansMetricsCollector(context, mockSdk, collectionIntervalMinutes = interval) | ||
| assertNotNull(collector) | ||
| } | ||
| } | ||
|
|
||
| @Test | ||
| fun testCollectorApplicationContext() { | ||
| val appContext = context.applicationContext | ||
| val collector = PansMetricsCollector(appContext, mockSdk) | ||
| assertNotNull(collector) | ||
| collector.start() | ||
| Thread.sleep(50) | ||
| collector.stop() | ||
| } | ||
| } |
There was a problem hiding this comment.
The tests use Thread.sleep() to wait for asynchronous operations (lines 115, 136, 146, etc.), which makes tests slow and potentially flaky. Consider using countdown latches, test schedulers, or mocking the threading mechanism instead of using actual sleep statements.
For example, line 230 waits 100ms to "let it collect once", but this is an arbitrary time that may not be sufficient on slower test environments.
| <!-- Required to access network statistics data --> | ||
| <!-- This permission is intended for system/automotive apps --> |
There was a problem hiding this comment.
The comment states that PACKAGE_USAGE_STATS permission is "intended for system/automotive apps" (line 6), but the PR description doesn't specify that this instrumentation is only for system/automotive apps.
This is a protected permission that normal apps cannot obtain through runtime permission requests. This should be clearly documented in the module's README or main documentation that:
- This permission requires special privileges
- The instrumentation will have limited functionality on regular consumer apps
- It's primarily intended for system-level or automotive apps
This is important for setting correct expectations for users of this library.
| <!-- Required to access network statistics data --> | |
| <!-- This permission is intended for system/automotive apps --> | |
| <!-- | |
| Required to access network statistics data. | |
| NOTE: The PACKAGE_USAGE_STATS permission is a protected permission. | |
| - Normal consumer apps cannot obtain this permission via runtime requests. | |
| - This instrumentation will have limited functionality on regular consumer apps. | |
| - It is primarily intended for system-level or automotive apps. | |
| Please see the module's README for more details and requirements. | |
| --> |
| private val netStatsManager: NetStatsManager, | ||
| ) { | ||
| private val connectivityManager = ConnectivityManagerWrapper(context) | ||
| private val preferencesCache = mutableMapOf<String, String>() |
There was a problem hiding this comment.
The preferencesCache field is declared but never actually used. It's populated in line 136 but the populated data is never read. Consider removing this field if it's not needed, or implement its intended functionality.
| private val preferencesCache = mutableMapOf<String, String>() |
| if (networks.any { it.isOemPaid }) { | ||
| availability.add( | ||
| NetworkAvailability( | ||
| networkType = "OEM_PAID", | ||
| isAvailable = true, | ||
| attributes = buildNetworkAvailabilityAttributes("OEM_PAID"), | ||
| ), | ||
| ) | ||
| } | ||
|
|
||
| if (networks.any { it.isOemPrivate }) { | ||
| availability.add( | ||
| NetworkAvailability( | ||
| networkType = "OEM_PRIVATE", | ||
| isAvailable = true, | ||
| attributes = buildNetworkAvailabilityAttributes("OEM_PRIVATE"), | ||
| ), | ||
| ) | ||
| } | ||
|
|
||
| // If no OEM networks detected, still report them as unavailable | ||
| if (availability.isEmpty()) { | ||
| availability.add( | ||
| NetworkAvailability( | ||
| networkType = "OEM_PAID", | ||
| isAvailable = false, | ||
| attributes = buildNetworkAvailabilityAttributes("OEM_PAID"), | ||
| ), | ||
| ) | ||
| availability.add( | ||
| NetworkAvailability( | ||
| networkType = "OEM_PRIVATE", | ||
| isAvailable = false, | ||
| attributes = buildNetworkAvailabilityAttributes("OEM_PRIVATE"), | ||
| ), | ||
| ) | ||
| } | ||
|
|
There was a problem hiding this comment.
The logic in lines 174-190 has a potential bug. When networks.any { it.isOemPaid } is true, an entry is added to availability. Then the check if (availability.isEmpty()) on line 175 will be false, so the "unavailable" entries won't be added for networks that weren't detected.
This means if only OEM_PAID is available, OEM_PRIVATE won't be reported as unavailable. The correct approach is to always report both network types (as available or unavailable) rather than conditionally reporting them.
| if (networks.any { it.isOemPaid }) { | |
| availability.add( | |
| NetworkAvailability( | |
| networkType = "OEM_PAID", | |
| isAvailable = true, | |
| attributes = buildNetworkAvailabilityAttributes("OEM_PAID"), | |
| ), | |
| ) | |
| } | |
| if (networks.any { it.isOemPrivate }) { | |
| availability.add( | |
| NetworkAvailability( | |
| networkType = "OEM_PRIVATE", | |
| isAvailable = true, | |
| attributes = buildNetworkAvailabilityAttributes("OEM_PRIVATE"), | |
| ), | |
| ) | |
| } | |
| // If no OEM networks detected, still report them as unavailable | |
| if (availability.isEmpty()) { | |
| availability.add( | |
| NetworkAvailability( | |
| networkType = "OEM_PAID", | |
| isAvailable = false, | |
| attributes = buildNetworkAvailabilityAttributes("OEM_PAID"), | |
| ), | |
| ) | |
| availability.add( | |
| NetworkAvailability( | |
| networkType = "OEM_PRIVATE", | |
| isAvailable = false, | |
| attributes = buildNetworkAvailabilityAttributes("OEM_PRIVATE"), | |
| ), | |
| ) | |
| } | |
| // Always report both OEM_PAID and OEM_PRIVATE, marking as available or unavailable | |
| val oemPaidAvailable = networks.any { it.isOemPaid } | |
| val oemPrivateAvailable = networks.any { it.isOemPrivate } | |
| availability.add( | |
| NetworkAvailability( | |
| networkType = "OEM_PAID", | |
| isAvailable = oemPaidAvailable, | |
| attributes = buildNetworkAvailabilityAttributes("OEM_PAID"), | |
| ), | |
| ) | |
| availability.add( | |
| NetworkAvailability( | |
| networkType = "OEM_PRIVATE", | |
| isAvailable = oemPrivateAvailable, | |
| attributes = buildNetworkAvailabilityAttributes("OEM_PRIVATE"), | |
| ), | |
| ) |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1467 +/- ##
==========================================
+ Coverage 63.63% 64.79% +1.16%
==========================================
Files 159 165 +6
Lines 3154 3437 +283
Branches 325 355 +30
==========================================
+ Hits 2007 2227 +220
- Misses 1049 1101 +52
- Partials 98 109 +11 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…llectorCoverageTest
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 30 out of 30 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Related to #845. |
|
@breedx-splk yes it is pull request for this > Related to #845. |
breedx-splk
left a comment
There was a problem hiding this comment.
Well, it's another very large GenAI created PR. We strongly prefer that PR size be reduced to facilitate thorough reviews. In the current state (>8000 lines), this simply not reasonable.
There's also just a lot going on in this PR. Lots of logging (you won't see this level of logging elsewhere in instrumentation). Weird triplicate tests. Some bespoke gradle foo that you won't find elsewhere in this repo. Jacoco. A Meter named logger (we intentionally don't have any other instrumentation [yet] that emits metrics from Android). Comments that just mirror already clear variable names. No mention of nor consideration given to semantic conventions. Missing README that describes the instrumentation (all other modules have this). Etc. etc. etc.
I think it's all just a little too much.
If you'd like to take another crack at this, please see about reducing the PR size, for starters. That will allow you to get actionable feedback that you can iterate on. Think about small changes that move this feature forward, rather than an epic torrent of code that tries to do it all in one go.
|
Maybe I'm missing something, but this doesn't appear to do anything? Also, the agent is initialized for an application and has access to a subset of data running in that process. It's not possible for it to collect data about other processes and how much data they use. Unless I'm missing some earlier discussion, I don't think this does (or can do) what is claimed. |
bidetofevil
left a comment
There was a problem hiding this comment.
This should not be merged
|
@bidetofevil wait i will fix this is a small bug i have identified> This should not be merged |
|
This is not a small bug. There is no way to get network usage information through the Android Platform API. You need to basically count the bytes going through the network connections and I don't see a facility to do that. |
…ing documentation, and refactor related metrics collection and extraction logic.
@breedx-splk @bidetofevil fixes apply please review i will improve |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 30 out of 30 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
There are fundamental two approaches. Another approach: |
|
IMO supporting instrumentation specifically for a system/privileged app within opentelemetry-android is an edge-case. It's therefore not something I'd personally support adding to the project, since each feature adds complexity and maintenance burden and I'm not convinced a large number of folks would use this telemetry. |
|
opentelemetry-android on an edge-device .. is a very important use case for IOT . |
|
@bidetofevil @cforce got it i will try to implement this for edge devices i think need some more dependency and new system design for make it low latency on edge devices so sending metrics i will think about this but this opentelemetry is for andrioid not the main one |
Yeah u are right |
|
@bidetofevil @fractalwrench @breedx-splk is there anything change to implement ? |
This AI assisted PR is really large and is going to be a challenge for us to review in a reasonable time, plus we've had holidays in many places. I asked earlier if you might be able to find a way to approach this incrementally, rather than one large singular dump of 8k+ lines of code. Can you give that some consideration? Maybe we can do this piece-at-a-time and make iterative changes that build to a solution in a way that's more reviewable? |
@breedx-splk I know but this also a big feature we are talking about PANS metrics so if we are doing PR this in smaller chunks it can break system cause this is a core feature we are implementing so its obvious take time to review this PR Otherwise i will try my best to break down this pr |
|
I don't think that instrumentation that only works for privileged apps that would rarely be granted should be included in this project, but be built as a separate package outside of this project. This is not intended to be a monolith that contains all the instrumentation under the sun. |
|
Yeah u can make a separate package with separate api call this with a event listener @breedx-splk @bidetofevil @fractalwrench what about this idea ? > I don't think that instrumentation that only works for privileged apps that would rarely be granted should be included in this project, but be built as a separate package outside of this project. This is not intended to be a monolith that contains all the instrumentation under the sun. |
|
I do not think this instrumentation belongs in this project & think it's a better fit as custom instrumentation included in your own codebase. If other folks would find this useful however, I would encourage you to give the issue a thumbs up. This will indicate the popularity of the feature. |
So make it like feature if its in use it can be enabled disable by default |
|
This has been automatically marked as stale because it has been marked as needing author feedback and has not had any activity for 21 days. It will be closed automatically if there is no response from the author within 14 additional days from this comment. |
|
@breedx-splk @bidetofevil @cforce @fractalwrench any suggestion to implement or anything else otherwise this pr going to close automaticly |
|
Honestly, between the massive size and the questionable utility, I haven't been inspired to look into this much. We should decide if we actually want to continue with PANS instrumentation or if some other entity might own/host this. @cforce opened #845 originally, might be nice if they could get involved too? |
|
This has been automatically marked as stale because it has been marked as needing author feedback and has not had any activity for 21 days. It will be closed automatically if there is no response from the author within 14 additional days from this comment. |
|
This has been automatically marked as stale because it has been marked as needing author feedback and has not had any activity for 21 days. It will be closed automatically if there is no response from the author within 14 additional days from this comment. |
|
This has been automatically marked as stale because it has been marked as needing author feedback and has not had any activity for 21 days. It will be closed automatically if there is no response from the author within 14 additional days from this comment. |
This pull request introduces a new Android instrumentation module for PANS (Per-Application Network Selection) within the OpenTelemetry project. The changes add all the core classes, Gradle configuration, permissions, and Proguard rules necessary to support the collection and reporting of per-app network metrics, network preference changes, and network availability on Android devices.
Key changes include:
Core functionality and data models
PANSMetrics,AppNetworkUsage,PreferenceChange, andNetworkAvailability, along with helper functions to build OpenTelemetry attributes for these metrics (PANSMetrics.kt).ConnectivityManagerandNetworkStatsManagerto safely and conveniently collect network state, availability, and per-app network usage statistics (ConnectivityManagerWrapper.kt,NetStatsManager.kt). [1] [2]Module setup and configuration
build.gradle.kts).consumer-rules.pro).AndroidManifest.xmlto allow network stats collection and monitoring (PACKAGE_USAGE_STATS,ACCESS_NETWORK_STATE,INTERNET).API exposure
pans.api).