Fix race condition in WebSocketCoreImpl#6948
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a single, centralized close(...) helper to gracefully tear down the current WebSocket session and reset internal connection state so future connection attempts can proceed.
Changes:
- Introduced
suspend fun close(code, reason)that cancels any pending connection attempt and clears the current connection reference. - Updated the “no more subscriptions” path to use the new
close(...)helper instead of callingwebSocket.close(...)directly.
|
Hi, is there anything I need to do? |
We need a test that replicate the issue (failing if we remove your logic and pass when adding it). I didn't look at the test in the comment but if you think it does what we ask submit it and put the PR back in ready for review. |
|
Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍 |
…sure all resources are properly cleaned up
|
Yes, and it fixes my problem. |
| private fun close(code: Int, reason: String?) = wsScope.launch { | ||
| connectedMutex.withLock { | ||
| val holder = connectionHolder.getAndSet(null) | ||
| holder?.webSocket?.close(code, reason) | ||
| holder?.urlObserverJob?.cancel() |
There was a problem hiding this comment.
@TimoPtr Should I do this? I've never used CoroutineStart.UNDISPATCHED.
There was a problem hiding this comment.
the window is very very small, IO is not limited in terms of number of concurrent execution like Default is, so it should be exectuted right away. And I also never used CoroutineStart.UNDISPATCHED
jpelgrom
left a comment
There was a problem hiding this comment.
I don't see any issues with it either. Thanks for taking the time to work on this, we wouldn't have found this with Xiaomi's specific behavior being the reason.
Summary
Fix #6947. Upon close inspection there is a race condition in
WebSocketCoreImpl.android/common/src/main/kotlin/io/homeassistant/companion/android/common/data/websocket/impl/WebSocketCoreImpl.kt
Lines 764 to 769 in ba2371c
Here the WS connection is closed (no more message can be sent). However, it triggers
onClosingandonClosedasynchronously, which means that when it returns,WebSocketCoreImplstill considers the socket connected.android/common/src/main/kotlin/io/homeassistant/companion/android/common/data/websocket/impl/WebSocketCoreImpl.kt
Lines 305 to 328 in ba2371c
If a new subscription is made after
closeand beforeonClosing,connect()sees thatconnectionHolderstill contains something and return without making a new connection. Therefore, the new messages will be sent to the closing/closed connection, causing exceptions.Xiaomi's SystemUI implementation creates and cancels subscriptions in
HaControlsProviderServicein quick succession, therefore triggering this.Checklist
I have no idea how to create a unit test to cover such a race condition, but at least I no longer see exceptions (consistently) after these changes.
Any other notes
N/A