What would you like to happen?
Hey team, we’re hitting a logging gap with native code (pybind / C++) in the Python SDK harness. `boot.go` launches a bunch of Python child processes and pipes their stdout/stderr to `BufferedLogger` -> `FnLogging`. Because that path never goes through the Python Fn logging stack, we lose the worker attribution you get from normal SDK logs, particularly `portability_worker_id`.
Concretely, our Python logs come with the metadata `jsonPayload.portability_worker_id=sdk-0-0_sibling_5`, but our pybind'd C++ logs have `jsonPayload.portability_worker_id=sdk-0-0` (from `boot.go`), so we can't correlate them.
I wonder if this is easily fixable in `boot.go`? In `beam/sdks/python/container/boot.go` (line 321 at `ae536b6`), instead of:

```go
bufLogger := tools.NewBufferedLogger(logger)
```

we would do something like:
```go
workerCtx := grpcx.WriteWorkerID(context.Background(), workerId)
bufLogger := tools.NewBufferedLoggerWithFlushInterval(workerCtx, logger, 100*time.Millisecond)
```
I have no idea whether, on the Dataflow Unified Harness, the Google logging sink will actually pick up the `workerId`, which is why I'm asking.
EDIT 2026-04-17
I got this working by creating a separate logger per worker:
```go
workerCtx := grpcx.WriteWorkerID(context.Background(), workerId)
// Separate Fn logging client per worker so the first Log on this Logger
// opens the stream with the correct worker_id metadata (a shared Logger
// would reuse the first stream only).
workerLogger := &tools.Logger{Endpoint: *loggingEndpoint}
bufLogger := tools.NewBufferedLoggerWithFlushInterval(workerCtx, workerLogger, 100*time.Millisecond)
...
workerLogger.Printf(workerCtx, "Executing Python (worker %v): python %v", workerId, strings.Join(args, " "))
cmd := StartCommandEnv(map[string]string{"WORKER_ID": workerId}, os.Stdin, bufLogger, bufLogger, "python", args...)
```
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components