fix: supervise the cowboy listener instead of starting it as a side effect #389

Open
Taure wants to merge 1 commit into novaframework:master from Taure:fix/supervise-cowboy-listener
Taure commented Jun 21, 2026

Collaborator

Problem

nova_sup:init/1 starts cowboy by calling cowboy:start_clear/3 (or start_tls/3) as a side effect and discards the result:

setup_cowboy(Configuration),   %% return value ignored
{ok, {SupFlags, Children}}.

setup_cowboy/1 logs {error, Reason} and returns. So when the listener can't bind (e.g. another node already holds the port, eaddrinuse), nova logs "Cowboy could not start" but application:start(nova) still returns ok. The node comes up "healthy" with no working listener, and requests silently hit whatever already owns the port.

This is a nasty footgun in development (a stale make start/shell node holding 8080) and in production (port conflict on deploy): ensure_all_started(my_app) succeeds, health checks may pass, but no route resolves. The symptom looks exactly like a routing bug (manual routing_tree:lookup returns {ok, ...} in the new node, but the request 404s because it landed on the other node).

Fix

Build the listener with ranch:child_spec/5 and add it to nova's supervision tree instead of starting it as a side effect:

  • A failed bind now fails nova_sup's startup and surfaces through application:start/1 (loud, not swallowed).
  • The listener is restart-managed like any other child.

The transport/protocol option transforms (connection_type, dynamic buffer) mirror cowboy:start_clear/start_tls for cowboy 2.15 (the pinned version) so socket behaviour is identical to before.
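For readers unfamiliar with embedded Ranch listeners, the approach described above might look roughly like the sketch below. It is not the code in this PR: the module name `listener_sup_sketch`, the listener ref `sketch_listener`, the empty route table and the supervisor flags are all hypothetical, and the dynamic buffer transform is left out. It assumes cowboy and ranch are already started (e.g. via `application:ensure_all_started(cowboy)`).

```erlang
-module(listener_sup_sketch).
-behaviour(supervisor).

-export([start_link/1, init/1]).

%% Hypothetical entry point: Port is the TCP port the listener should bind.
start_link(Port) ->
    supervisor:start_link(?MODULE, Port).

init(Port) ->
    SupFlags = #{strategy => one_for_one, intensity => 1, period => 5},
    %% The listener is an ordinary child, so a failed bind makes the
    %% supervisor's own start fail instead of only being logged.
    {ok, {SupFlags, [listener_spec(Port)]}}.

listener_spec(Port) ->
    %% Empty route table, for illustration only.
    Dispatch = cowboy_router:compile([{'_', []}]),
    %% cowboy:start_clear/3 sets connection_type in both option maps;
    %% the same is done by hand here (the default, supervisor, is assumed).
    TransOpts = #{socket_opts => [{port, Port}],
                  connection_type => supervisor},
    ProtoOpts = #{env => #{dispatch => Dispatch},
                  connection_type => supervisor},
    ranch:child_spec(sketch_listener, ranch_tcp, TransOpts,
                     cowboy_clear, ProtoOpts).
```

With a busy port, `listener_sup_sketch:start_link/1` should return an `{error, {shutdown, {failed_to_start_child, ...}}}` tuple of the same shape as the one under "Before / after". Because `start_link` links the caller to the supervisor, a caller that does not trap exits also receives the exit signal.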

Before / after

%% port already held by another node
%% before:
application:ensure_all_started(my_app).
{ok, [...]}                         %% listener silently dead

%% after:
application:ensure_all_started(my_app).
{error, {nova, {{shutdown,
   {failed_to_start_child, {ranch_embedded_sup, nova_listener},
    ...{listen_error, nova_listener, eaddrinuse}}}, {nova_app, start, [normal, []]}}}}

Tests

test/nova_sup_tests.erl:

  • child_spec_is_supervised_test / tls_child_spec_uses_ssl_transport_test — the listener is a supervisor:child_spec() with type => supervisor and connection_type => supervisor.
  • listener_lifecycle_test_ — a free port yields a live listener; a busy port fails the start instead of being swallowed.
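A busy-port check along these lines could be sketched as follows. This is an illustration, not the contents of test/nova_sup_tests.erl: the module name `busy_port_sketch_tests` and the listener ref `busy_port_sketch` are hypothetical, the test module doubles as a throwaway supervisor callback, and it assumes cowboy and ranch are available on the code path.

```erlang
-module(busy_port_sketch_tests).
-behaviour(supervisor).

-include_lib("eunit/include/eunit.hrl").

-export([init/1]).

%% Throwaway supervisor whose only child is an embedded Ranch listener.
init(Port) ->
    Spec = ranch:child_spec(
             busy_port_sketch, ranch_tcp,
             #{socket_opts => [{port, Port}], connection_type => supervisor},
             cowboy_clear,
             #{env => #{dispatch => cowboy_router:compile([])},
               connection_type => supervisor}),
    {ok, {#{strategy => one_for_one}, [Spec]}}.

busy_port_fails_start_test() ->
    {ok, _} = application:ensure_all_started(cowboy),
    %% Hold an OS-assigned port so the listener's bind has to fail.
    {ok, Holder} = gen_tcp:listen(0, []),
    {ok, Port} = inet:port(Holder),
    %% start_link links us to the supervisor; trap exits so the failure
    %% comes back as a return value instead of killing the test process.
    OldTrap = process_flag(trap_exit, true),
    try
        ?assertMatch({error, _}, supervisor:start_link(?MODULE, Port))
    after
        process_flag(trap_exit, OldTrap),
        gen_tcp:close(Holder)
    end.
```

The assertion only checks for an `{error, _}` result; matching on the exact reason would tie the test to Ranch's internal error shape.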

Full suite green: eunit 351/0, xref clean, dialyzer clean. Also dogfooded through a downstream Nova app: the listener-bind failure now correctly aborts that app's startup.

…ffect

nova_sup started cowboy via cowboy:start_clear/start_tls inside init/1
and discarded the result. A failed bind (e.g. eaddrinuse) was only
logged, so application:start(nova) still returned ok with no working
listener and requests silently hit whatever already held the port.

Build the listener with ranch:child_spec/5 and put it in the
supervision tree. A bind failure now fails init/1 and surfaces through
application:start/1, and the listener is restart-managed like any other
child. The option transforms mirror cowboy:start_clear/start_tls (2.15)
so buffer/connection_type behaviour is identical.

Adds nova_sup_tests covering the child-spec shape and that a busy port
fails the start instead of being swallowed.
@Taure Taure requested a review from burbas June 21, 2026 16:02