Add coarse comms metrics #999
base: main
Changes from 10 commits
58b0713
87bfd04
cf7aa82
292eb8e
f6429cc
ece5d52
cfd5316
fb0b3f0
57df63c
2506d58
c356994
20a2ae2
ea4037d
b293b95
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,6 +4,8 @@ | |
| */ | ||
| #pragma once | ||
|
|
||
| #include <concepts> | ||
| #include <cstdint> | ||
| #include <cstdlib> | ||
| #include <memory> | ||
| #include <mutex> | ||
|
|
@@ -17,6 +19,7 @@ | |
| #include <rapidsmpf/error.hpp> | ||
| #include <rapidsmpf/memory/buffer.hpp> | ||
| #include <rapidsmpf/progress_thread.hpp> | ||
| #include <rapidsmpf/statistics.hpp> | ||
|
|
||
| /** | ||
| * @namespace rapidsmpf | ||
|
|
@@ -640,6 +643,41 @@ constexpr bool COMM_HAVE_MPI = true; | |
| constexpr bool COMM_HAVE_MPI = false; | ||
| #endif | ||
|
|
||
| /** | ||
| * @brief Records a send statistic and then sends a message to a specific rank. | ||
| * | ||
| * Equivalent to calling `statistics.record_send(...)` followed by | ||
| * `comm.send(msg, rank, tag)`. The memory type and byte count are inferred from | ||
| * `T`: `std::vector<uint8_t>` is treated as `MemoryType::HOST` with size given by | ||
| * `.size()`, while `Buffer` uses `Buffer::mem_type()` and `Buffer::size`. | ||
| * | ||
| * @tparam T Message type: `std::vector<std::uint8_t>` or `Buffer`. | ||
| * @param comm The communicator to send through. | ||
| * @param msg Unique pointer to the message to send. | ||
| * @param rank The destination rank. | ||
| * @param tag Message tag for identification. | ||
| * @param statistics Statistics instance to record the send into. | ||
| * @param stat_suffix Optional suffix appended to the stat key name. | ||
| * @return A unique pointer to a `Future` representing the asynchronous operation. | ||
| */ | ||
| template <typename T> | ||
| requires std::same_as<T, std::vector<std::uint8_t>> || std::same_as<T, Buffer> | ||
| [[nodiscard]] inline std::unique_ptr<Communicator::Future> send_with_stats( | ||
| Communicator& comm, | ||
| std::unique_ptr<T> msg, | ||
| Rank rank, | ||
| Tag tag, | ||
| Statistics& statistics, | ||
| std::string_view stat_suffix = "" | ||
|
|
||
| ) { | ||
| if constexpr (std::same_as<T, std::vector<std::uint8_t>>) { | ||
| statistics.record_send(MemoryType::HOST, msg->size(), stat_suffix); | ||
| } else { | ||
| statistics.record_send(msg->mem_type(), msg->size, stat_suffix); | ||
| } | ||
| return comm.send(std::move(msg), rank, tag); | ||
| } | ||
|
|
||
|
Member
I'm not quite sure I like this for a few reasons:
IMO, we would either:
I think 1 is a better choice and probably what I'll need to do for #996, so we should probably go in that direction regardless, but perhaps in a separate PR?
||
| /** | ||
| * @brief Overloads the stream insertion operator for the Communicator class. | ||
| * | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -23,9 +23,11 @@ AllReduce::AllReduce( | |
| std::unique_ptr<Buffer> output, | ||
| OpID op_id, | ||
| ReduceOperator reduce_operator, | ||
| std::shared_ptr<Statistics> statistics, | ||
| std::function<void(void)> finished_callback | ||
| ) | ||
| : comm_{std::move(comm)}, | ||
| statistics_{std::move(statistics)}, | ||
| reduce_operator_{std::move(reduce_operator)}, | ||
| in_buffer_{std::move(input)}, | ||
| out_buffer_{std::move(output)}, | ||
|
|
@@ -59,7 +61,7 @@ AllReduce::AllReduce( | |
| out_buffer_->rebind_stream(in_buffer_->stream()); | ||
| // Note: after this copy, we must check out_buffer's write event before receiving into | ||
| // in_buffer. See StartPreRemainder in the event loop. | ||
| buffer_copy(Statistics::disabled(), *out_buffer_, *in_buffer_, in_buffer_->size); | ||
| buffer_copy(statistics_, *out_buffer_, *in_buffer_, in_buffer_->size); | ||
|
|
||
| auto const rank = comm_->rank(); | ||
| if (rank < 2 * non_pow2_remainder_) { | ||
|
|
@@ -156,9 +158,9 @@ ProgressThread::ProgressState AllReduce::event_loop() { | |
| // If we don't have a power of two number of ranks, then we have `power_of_two + | ||
| // remainder` ranks. We first take `2 * remainder` ranks, and the even ranks send | ||
| // their contribution to their odd pair. The even ranks then jump to receive a final | ||
| // contribution, while the rest form a power of two and exchange via the above loop. | ||
| // Once that is complete, the paired odd ranks send the final answer to their even | ||
| // counterpart. | ||
| // contribution, while the rest (ranks >= 2 * remainder or odd) form a power of two | ||
| // and exchange via the above loop. Once that is complete, the paired odd ranks send | ||
| // the final answer to their even counterpart. | ||
|
Comment on lines 160 to +163
Contributor
Why?
Contributor
Author
Because this wasn't apparent when I first read the comment. I thought "the rest" meant just the odd ranks among the initial 2 * remainder ranks.
||
| // | ||
| // So even ranks in the remainder do: | ||
| // PreRemainder -> PostRemainder -> Done | ||
|
|
@@ -172,7 +174,9 @@ ProgressThread::ProgressState AllReduce::event_loop() { | |
| if (!out_buffer_->is_latest_write_done()) { | ||
| break; | ||
| } | ||
| send_future_ = comm_->send(std::move(out_buffer_), rank + 1, tag); | ||
| send_future_ = send_with_stats( | ||
| *comm_, std::move(out_buffer_), rank + 1, tag, *statistics_ | ||
| ); | ||
| } else { | ||
| // The constructor copies in_buffer_ to out_buffer_ on in_buffer's | ||
| // stream. The copy must be complete before we can receive into | ||
|
|
@@ -185,6 +189,7 @@ ProgressThread::ProgressState AllReduce::event_loop() { | |
| { | ||
| break; | ||
| } | ||
| statistics_->record_recv(in_buffer_->mem_type(), in_buffer_->size); | ||
| recv_future_ = comm_->recv(rank - 1, tag, std::move(in_buffer_)); | ||
| } | ||
| phase_.store(Phase::CompletePreRemainder, std::memory_order_release); | ||
|
|
@@ -236,8 +241,13 @@ ProgressThread::ProgressState AllReduce::event_loop() { | |
| { | ||
| break; | ||
| } | ||
| statistics_->record_recv(in_buffer_->mem_type(), in_buffer_->size); | ||
| recv_future_ = comm_->recv(stage_partner_, tag, std::move(in_buffer_)); | ||
| send_future_ = comm_->send(std::move(out_buffer_), stage_partner_, tag); | ||
|
|
||
| send_future_ = send_with_stats( | ||
| *comm_, std::move(out_buffer_), stage_partner_, tag, *statistics_ | ||
| ); | ||
|
|
||
| phase_.store(Phase::CompleteButterfly, std::memory_order_release); | ||
| break; | ||
| } | ||
|
|
@@ -268,12 +278,15 @@ ProgressThread::ProgressState AllReduce::event_loop() { | |
| if (!out_buffer_->is_latest_write_done()) { | ||
| break; | ||
| } | ||
| statistics_->record_recv(out_buffer_->mem_type(), out_buffer_->size); | ||
| recv_future_ = comm_->recv(rank + 1, tag, std::move(out_buffer_)); | ||
| } else { | ||
| if (!out_buffer_->is_latest_write_done()) { | ||
| break; | ||
| } | ||
| send_future_ = comm_->send(std::move(out_buffer_), rank - 1, tag); | ||
| send_future_ = send_with_stats( | ||
| *comm_, std::move(out_buffer_), rank - 1, tag, *statistics_ | ||
| ); | ||
| } | ||
| phase_.store(Phase::CompletePostRemainder, std::memory_order_release); | ||
| break; | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,6 +15,7 @@ | |
| #include <rapidsmpf/coll/sparse_alltoall.hpp> | ||
| #include <rapidsmpf/error.hpp> | ||
| #include <rapidsmpf/memory/buffer_resource.hpp> | ||
| #include <rapidsmpf/statistics.hpp> | ||
|
|
||
| namespace rapidsmpf::coll { | ||
|
|
||
|
|
@@ -24,10 +25,12 @@ SparseAlltoall::SparseAlltoall( | |
| BufferResource* br, | ||
| std::vector<Rank> srcs, | ||
| std::vector<Rank> dsts, | ||
| std::shared_ptr<Statistics> statistics, | ||
| std::function<void()>&& finished_callback | ||
| ) | ||
| : comm_{std::move(comm)}, | ||
| br_{br}, | ||
| statistics_{std::move(statistics)}, | ||
| srcs_{std::move(srcs)}, | ||
| dsts_{std::move(dsts)}, | ||
| op_id_{op_id}, | ||
|
|
@@ -159,10 +162,14 @@ void SparseAlltoall::send_ready_messages() { | |
| Tag const payload_tag{op_id_, 1}; | ||
| for (auto& chunk : outgoing_.extract_ready()) { | ||
| auto const dst = chunk->destination(); | ||
| fire_and_forget_.push_back(comm_->send(chunk->serialize(), dst, metadata_tag)); | ||
| auto metadata = chunk->serialize(); | ||
| fire_and_forget_.push_back( | ||
| send_with_stats(*comm_, std::move(metadata), dst, metadata_tag, *statistics_) | ||
| ); | ||
| if (chunk->data_size() > 0) { | ||
| auto buf = chunk->release_data_buffer(); | ||
| fire_and_forget_.push_back( | ||
| comm_->send(chunk->release_data_buffer(), dst, payload_tag) | ||
| send_with_stats(*comm_, std::move(buf), dst, payload_tag, *statistics_) | ||
|
Member
Why did you decide to create temporary variables (
||
| ); | ||
| } | ||
| } | ||
|
|
@@ -178,6 +185,10 @@ void SparseAlltoall::receive_metadata_messages() { | |
| } | ||
| state.received_count++; | ||
| auto chunk = detail::Chunk::deserialize(*msg, br_); | ||
| // During deserialization, we have allocated the recv buffer. So we record the | ||
| // stats here. | ||
| statistics_->record_recv(MemoryType::HOST, msg->size()); | ||
| statistics_->record_recv(chunk->memory_type(), chunk->data_size()); | ||
|
Member
Thinking more about this, your explanation about when to post the receive from https://github.com/rapidsai/rapidsmpf/pull/999/files#r3148360252 was:
This is not actually true, or perhaps not phrased accurately. I think what you meant to say is that they are recorded at the point of "posting the metadata receive", which is what is happening here. However, is there a specific reason we need to do that? I think this is a bit flaky for future modifications, if we eventually make changes to the actual data
||
| RAPIDSMPF_EXPECTS( | ||
| chunk->origin() == src, | ||
| "SparseAlltoall received metadata with unexpected origin" | ||
|
|
||
|
No.