Optimizations for BP5 engine of ADIOS2: Use span-based API and avoid flushing before closing an Iteration #3200
I've reverted the last commit (erasing instances of flush()) to check if that was what made the tests fail. EDIT: Yep, looks like those flushes are (currently) needed. Erasing them is what should help BP5 avoid those copies, but there might be further restructuring needed on WarpX's side?
Also relevant for memory usage in BP5: specifying the right BufferChunkSize, see ComputationalRadiationPhysics/picongpu#4127. If you know that WarpX will run on systems that have virtual memory, it's worth using 2GB as a default. In the upcoming openPMD-api release,
PR #6123 adds Span for fields. We can afterwards rebase this one and add particles :)
So we can aggregate data (mainly for BTD) for a later flush. Introduced an optional but related entry in the input file: `<diag>.buffer_flush_limit_btd`; by default it is 5. New implementation of #3200, but so far only for fields/meshes. Particles will be a follow-up PR.

---------

Co-authored-by: Junmin Gu <junmin@login10.frontier.olcf.ornl.gov>
Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Junmin Gu <junmin@login08.frontier.olcf.ornl.gov>
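To make the idea of a flush limit concrete, here is a minimal sketch of "aggregate several buffers, then flush once". It is not the WarpX implementation: the class `FlushLimiter` and its methods are hypothetical names invented for illustration, and the only detail taken from the message above is the default limit of 5.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, NOT WarpX code: counts buffers that were stored
// but not yet flushed, and tells the caller when a flush is due.
class FlushLimiter {
public:
    explicit FlushLimiter(std::size_t limit) : m_limit(limit) {
        assert(limit > 0); // a limit of zero would never aggregate anything
    }

    // Register one stored buffer.
    // Returns true when the caller should flush now.
    bool store() {
        ++m_pending;
        if (m_pending >= m_limit) {
            m_pending = 0; // the caller flushes, so nothing stays pending
            ++m_flushes;
            return true;
        }
        return false;
    }

    std::size_t pending() const { return m_pending; }
    std::size_t flushes() const { return m_flushes; }

private:
    std::size_t m_limit;
    std::size_t m_pending = 0;
    std::size_t m_flushes = 0;
};
```

With a limit of 5, twelve calls to `store()` request a flush on the 5th and 10th call and leave two buffers pending, which a final flush at close time would have to pick up.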
Unlike previous engines in ADIOS2, BP5 can avoid copying data to internal buffers if one uses an optimized workflow. The following sequence of operations should be avoided:
This PR uses two optimizations to avoid that workflow:
1. Use the Span-based API. This optimization benefits all ADIOS2 engines that support the Span API (e.g. BP4); openPMD-api automatically switches to a fallback otherwise.
2. Avoid `Series::flush()` between `RecordComponent::storeChunk()` and `Iteration::endStep()`.

Notes:
- A `Series::flush()` might be added somewhere in these routines at a later point, this way destroying the optimization again. Suggestion: Add something like `Series::promiseNoFlushesUntilEndstep()` to the openPMD-api that throws an error if a flush does indeed happen.
- `unique_ptr` overload to `RecordComponent::storeChunk()`: This way, the backend knows that it is the unique owner of the data and can autonomously decide to delay showing the data to ADIOS2 until directly before EndStep. This is semantically only applicable to the pinned memory instance here in WarpX; the other places are static data, not unique data.
- Question to Axel: The following shared_ptr does correctly handle destroying the pinned memory again, right? There is no custom destructor given, so I'm wondering.
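The suggested `promiseNoFlushesUntilEndstep()` guard could behave roughly as sketched below. This is a standalone mock under stated assumptions, not openPMD-api code: `MockSeries` and all of its methods are hypothetical stand-ins, and the real `Series` class has no such method as far as this PR describes.

```cpp
#include <stdexcept>

// Mock stand-in for a series object; NOT the openPMD-api Series class.
class MockSeries {
public:
    // Hypothetical guard: after this call, flush() is an error
    // until endStep() has been called.
    void promiseNoFlushesUntilEndstep() { m_flushForbidden = true; }

    // Stands in for storeChunk(): only records that a chunk is pending.
    void storeChunk() { ++m_pendingChunks; }

    // Explicit flush: throws while the promise is active, so an
    // accidentally reintroduced flush is caught immediately.
    void flush() {
        if (m_flushForbidden) {
            throw std::logic_error(
                "flush() called before endStep() despite the promise");
        }
        m_pendingChunks = 0;
    }

    // Ends the step: hands over all pending chunks and lifts the promise.
    void endStep() {
        m_pendingChunks = 0;
        m_flushForbidden = false;
    }

    int pendingChunks() const { return m_pendingChunks; }

private:
    bool m_flushForbidden = false;
    int m_pendingChunks = 0;
};
```

In this mock, a `flush()` issued between `promiseNoFlushesUntilEndstep()` and `endStep()` throws `std::logic_error` and leaves the pending chunks untouched, so a later change that adds a flush fails loudly instead of silently bringing back the copy.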
Close #3133