manual state change should not use fork-execute model on scheduler #65677

Draft
mobuchowski wants to merge 3 commits into apache:main from mobuchowski:fix-manual-state-change-not-use-fork
mobuchowski (Contributor) commented Apr 22, 2026

When the Airflow scheduler processes externally changed task states (orphaned Celery task instances adopted after a scheduler restart, or UI/API state changes routed through process_executor_events → handle_failure), the OpenLineage listener calls os.fork() in _fork_execute to emit the FAIL/COMPLETE event out-of-band.

The forked child inherits the scheduler's SSL-wrapped Postgres connection pool. On Airflow 3+, the child does not call configure_orm(disable_connection_pool=True): that call is guarded by if not AIRFLOW_V_3_0_PLUS:, a guard added in #47580 to avoid crashing on the worker's airflow-db-not-allowed:/// sentinel URL. As a result, the child issues DB queries over the same TLS socket as the parent. This can desynchronize the OpenSSL sequence counter and crash the scheduler's very next session.flush() with psycopg2.OperationalError: SSL error: decryption failed or bad record mac. We observed this failure in our environment.
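To illustrate the underlying hazard, here is a minimal sketch that uses a plain socket pair as a stand-in for the database connection. It is not Airflow or psycopg2 code, and the function name peer_view_after_fork is hypothetical. It only shows that a forked child writes into the very same byte stream as its parent. On a TLS connection, those extra bytes would be records the parent's TLS state knows nothing about.

```python
import os
import socket


def peer_view_after_fork() -> bytes:
    """Return everything the remote peer receives when a parent and its
    forked child both write to the same inherited socket."""
    # Stand-in for a pooled DB connection (assumption: plain socket, no TLS).
    client, server = socket.socketpair()
    client.sendall(b"parent-1;")

    pid = os.fork()
    if pid == 0:
        # Child: the inherited descriptor refers to the same underlying socket,
        # so these bytes land in the parent's stream.
        try:
            client.sendall(b"child;")
        finally:
            os._exit(0)

    os.waitpid(pid, 0)  # wait so the ordering below is deterministic
    client.sendall(b"parent-2;")
    client.close()

    # Drain the peer side until EOF.
    chunks = []
    while True:
        data = server.recv(1024)
        if not data:
            break
        chunks.append(data)
    server.close()
    return b"".join(chunks)
```

The peer sees the child's bytes in the middle of the parent's stream. With an encrypted connection, each process would also advance its own copy of the session state, which is consistent with the bad record mac error described above.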

This PR routes the scheduler-side "manual state change" emission through the existing ProcessPoolExecutor that DAG-run listeners already use (workers are initialized once via _executor_initializer, never share connections with the scheduler, and don't fork per event).
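The pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: _init_worker, _emit, and emit_all are hypothetical names, and the start method is pinned to "fork" only so the sketch behaves the same across Python versions. The point is that the worker process is created and initialized once, then reused for every event.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor

# Worker-private state; populated only inside the pool's worker process.
_worker_state: dict = {}


def _init_worker() -> None:
    # Runs once per worker process. A real initializer would set up
    # worker-private resources here instead of reusing the parent's.
    _worker_state["pid"] = os.getpid()
    _worker_state["init_calls"] = _worker_state.get("init_calls", 0) + 1


def _emit(event: str) -> tuple:
    # Hypothetical stand-in for building and sending a lineage event.
    return event, _worker_state["pid"], _worker_state["init_calls"]


def emit_all(events: list) -> list:
    # Assumption: start method pinned for determinism, not taken from the PR.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(
        max_workers=1, mp_context=ctx, initializer=_init_worker
    ) as pool:
        futures = [pool.submit(_emit, event) for event in events]
        return [future.result() for future in futures]
```

Calling emit_all(["FAIL", "COMPLETE", "FAIL"]) should return three tuples that all report the same worker PID, different from the caller's, with the initializer having run exactly once.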

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: Claude Opus 4.7 following the guidelines

Signed-off-by: Maciej Obuchowski <maciej.obuchowski@datadoghq.com>
mobuchowski force-pushed the fix-manual-state-change-not-use-fork branch from 107b18f to e0aa3e6 on April 22, 2026 16:18
mobuchowski and others added 2 commits April 22, 2026 19:22
- Assigning isoformat() string to datetime-typed variable confused mypy
  now that the assignment is in the outer method scope (previously inside
  a nested closure where inference behaved differently).
- ti.operator is str | None; guard against None before .lower().

Signed-off-by: Maciej Obuchowski <maciej.obuchowski@datadoghq.com>