Update poll for Mac POLLHUP behavior#128
Conversation
When `connect`ing on Linux, a rejection typically returns “POLLIN|POLLERR” from `poll` on Linux. Mac differs in this situation returning one of two values from poll (depending on Mac version): “POLLHUP|POLLERR” “POLLOUT|POLLHUP” Mac also sets the socket’s error to ECONNREFUSED in this case. We can support this behavior by getting the SO_ERROR off the socket in the POLLHUP case, which has the added benefit of more explicity error messages generally. Signed-off-by: Dusty Daemon @dusty_daemon
|
I've also made a modification in if (ret < 0) {
/* On some systems (e.g. macOS) getsockopt(SO_ERROR) can
* spuriously return EBADF on a still-valid socket after a
* failed connect. Only propagate EBADF if the fd itself
* is genuinely invalid; otherwise use ECONNABORTED so the
* caller sees a network-level error, not a descriptor one. */
if (errno == EBADF && fcntl(fd, F_GETFL) >= 0)
errno = ECONNABORTED;
return -1;
}This solved for another error I was getting on macOS. |
|
Interesting write up of historical pitfalls of non-blocking Apparently this is a problem many have run into. |
Which test did this fix? |
|
Looking through Mac's kernel, EBADF can be returned by sodefunct which is called when Mac needs to "forcibly invalidate and tear down an active network socket." This defunct status happens to our fds as mac decides our processes can be semi-purged, for instance if it thinks the process is "in the background" without user interaction and resources get tight. It looks like other server programs that target mac specify to the kernel that we're important and not to do this. It probably still will but do it less. It can also happen if the network interface switches over (ie from ethernet to wifi, or changing wifi) but that is probably not relevant here. The kernel code is a little opaque to my eyes, but there are two instances where the SO_ERROR on the socket itself is set to EBADF in Some of our tests are particularly aggressive and I wouldn't be much surprised at all that they put the kernel into this purging 'defunct' mode, and this might help explain many tests being flaky on mac. So there are probably two larger things to tackle here:
For 2 there are a few more "c style approaches": /* Tell the kernel we are not a 'background' process */
setpriority(PRIO_DARWIN_PROCESS, 0, 0);/* Tell kernel we're super important in the POSIX style */
setpriority(PRIO_PROCESS, pid, -20)There are some Mac specific ways to do it let activityToken = ProcessInfo.processInfo.beginActivity(
options: [.userInitiated, .idleSystemSleepDisabled, .suddenTerminationDisabled],
reason: "Running a critical networking priority service"
)Lots to consider 🤔 |
That sounds OK though, if the kernel returns this EBADF we just pass it along.
From my experience the test fails even of it's the only test running. |
|
When
connecting on Linux, a rejection typically returns “POLLIN|POLLERR” frompollon Linux.Mac differs in this situation returning one of two values from poll (depending on Mac version): “POLLHUP|POLLERR”
“POLLOUT|POLLHUP”
Mac also sets the socket’s error to ECONNREFUSED in this case.
We can support this behavior by getting the SO_ERROR off the socket in the POLLHUP case, which has the added benefit of more explicity error messages generally.
Signed-off-by: Dusty Daemon @dusty_daemon