[charts-pro] Add render-only series sampling (line, bar, scatter) #22671
JCQuintas wants to merge 30 commits into
Add the per-series `sampling` prop (line, bar, scatter) and the `ChartSampler` public types. Introduce `selectorChartSeriesRendered`, consumed only by the rendering context hooks, which applies a registered sampler while leaving extremums, axis domain, tooltip, highlight and interaction on the full data. The plot hooks consume the resulting `sampledIndices`.
Implement the LTTB and M4 line samplers, the scatter pixel-bucket sampler, and the bar bucket-aggregate sampler, all driven by a quantized zoom level so the sampled set stays stable while panning. Register them through the `useChartProSampling` plugin on the line, bar and scatter Pro charts.
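As a reference point for the LTTB sampler mentioned above, here is a minimal TypeScript sketch of the textbook Largest-Triangle-Three-Buckets algorithm. It is not this PR's implementation; the `lttb` name and the `getX`/`getY` accessors are assumptions for illustration.

```typescript
/**
 * Textbook Largest-Triangle-Three-Buckets (illustrative, not the PR's code).
 * Returns ascending indices to keep; the first and last points are always kept.
 * `target` is clamped to at least 3 (first point, one bucket, last point).
 */
function lttb(
  length: number,
  target: number,
  getX: (i: number) => number,
  getY: (i: number) => number,
): number[] {
  const keep = Math.max(3, Math.floor(target));
  if (length <= keep) {
    return Array.from({ length }, (_, i) => i);
  }

  const sampled: number[] = [0];
  // Interior points 1..length-2 are split into keep - 2 buckets of fractional size.
  const bucketSize = (length - 2) / (keep - 2);
  let a = 0; // previously selected index

  for (let b = 0; b < keep - 2; b += 1) {
    const start = Math.floor(b * bucketSize) + 1;
    const end = Math.floor((b + 1) * bucketSize) + 1; // exclusive

    // Average of the next bucket; for the last bucket this is just the final point.
    const nextEnd = Math.min(Math.floor((b + 2) * bucketSize) + 1, length);
    let avgX = 0;
    let avgY = 0;
    for (let j = end; j < nextEnd; j += 1) {
      avgX += getX(j);
      avgY += getY(j);
    }
    avgX /= nextEnd - end;
    avgY /= nextEnd - end;

    // Keep the point forming the largest triangle with the previous pick and that average.
    const ax = getX(a);
    const ay = getY(a);
    let chosen = start;
    let maxArea = -1;
    for (let j = start; j < end; j += 1) {
      const area = Math.abs((ax - avgX) * (getY(j) - ay) - (ax - getX(j)) * (avgY - ay));
      if (area > maxArea) {
        maxArea = area;
        chosen = j;
      }
    }
    sampled.push(chosen);
    a = chosen;
  }

  sampled.push(length - 1);
  return sampled;
}
```

Because each bucket contributes the point that deviates most from its neighbours, isolated spikes tend to survive even at aggressive reduction ratios.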
Unit tests for the LTTB/M4/bucket algorithms and the zoom-level behaviour, plus render tests asserting line and bar series downsample (including horizontal, reversed and negative data). Visual regression fixtures for line, scatter and bar sampling.
Document per-series sampling for line, scatter and bar, with demos for each method, stacked series, and a custom sampling function. Register the page in the navigation and the charts feature grid.
Performance: Total duration: 2,341.87 ms (+191.62 ms, +8.9%) | Renders: 67 (+0) | Paint: 3,449.00 ms (+335.71 ms, +10.8%)
…and 1 more (+20 within noise). Check out the code infra dashboard for more information about this PR.
The sampled-indices selector took the live x/y scales as memoized inputs, so it recomputed on every pan and zoom frame even though the kept set only changes when the quantized zoom level changes. It also used the live scale to detect when only a few points remained visible and, in that case, returned null to render the full series, which dumped the entire (potentially huge) array to the DOM exactly when zoomed in. Drive sampling entirely by the quantized zoom level: drop the live scales from the selector inputs and from the sampler context, and remove the visible-fraction bail-out. The target count still grows with the zoom level, so detail returns on zoom in, but the series is always sampled to that count and is never rendered in full until the target reaches its length. This matches the documented stable-while-panning behaviour.
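The reasoning above (a quantized level keeps the kept set stable while panning) can be illustrated with a small sketch. It assumes, hypothetically, that the level is `floor(log2(fullSpan / visibleSpan))` and that the target doubles per level, similar in shape to the `renderBudget` expression in this PR's diff; the function names are illustrative, not the PR's.

```typescript
/**
 * Hypothetical quantization: level k means the visible span is at most 1/2^k
 * but more than 1/2^(k+1) of the full span. Only the window width matters,
 * so panning at a constant width never changes the level.
 */
function quantizedZoomLevel(fullSpan: number, visibleSpan: number): number {
  if (visibleSpan <= 0 || visibleSpan >= fullSpan) return 0;
  return Math.floor(Math.log2(fullSpan / visibleSpan));
}

/** Target point count: doubles with each zoom level, capped at the series length. */
function targetCount(baseTarget: number, zoomLevel: number, length: number): number {
  return Math.min(length, baseTarget * 2 ** zoomLevel);
}
```

For instance, panning a window of width 20 across a span of 100 keeps the level at 2, so the sampled indices can be reused; narrowing the window to 10 moves to level 3 and doubles the target.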
`pixelBucket` was exported and tested but never used by any sampler; the scatter sampler builds its own data-space grid instead. `estimateVisibleFraction` became dead code once sampling stopped depending on the live scale. Remove both, along with their exports and tests.
`m4` was described as pixel-identical to drawing every point, but its buckets are sized by index rather than by x-pixel, so that strict guarantee does not hold; reword it as a close, shape-preserving approximation. Document that sampled bars are laid out on a uniform grid rather than at their true category positions, so a bar no longer lines up with its axis tick, while tooltips and highlighting stay accurate.
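The distinction above can be made concrete with a sketch of an index-bucketed M4: the range is split into buckets of equal index count (not equal x-pixel width), and each bucket contributes its first, last, minimum and maximum point. Since bucket boundaries follow indices, the output approximates, but is not guaranteed to equal, a per-pixel-column M4. The function name and signature are assumptions, not the PR's code.

```typescript
/**
 * Hypothetical index-bucketed M4: per bucket, keep the first, last, min-y and max-y indices.
 * Buckets hold (roughly) equal index counts, so this only approximates pixel-column M4.
 */
function m4ByIndex(length: number, buckets: number, getY: (i: number) => number): number[] {
  if (length === 0 || buckets <= 0) return [];
  const bucketCount = Math.min(buckets, length);
  const kept: number[] = [];
  for (let b = 0; b < bucketCount; b += 1) {
    const start = Math.floor((b * length) / bucketCount);
    const end = Math.floor(((b + 1) * length) / bucketCount); // exclusive, never equal to start
    let minIdx = start;
    let maxIdx = start;
    for (let i = start + 1; i < end; i += 1) {
      if (getY(i) < getY(minIdx)) minIdx = i;
      if (getY(i) > getY(maxIdx)) maxIdx = i;
    }
    // Emit the (up to) four distinct indices in ascending order so the line is drawn left to right.
    const picks = Array.from(new Set([start, minIdx, maxIdx, end - 1])).sort((p, q) => p - q);
    kept.push(...picks);
  }
  return kept;
}
```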
Sampled bars are repositioned onto a uniform slot grid that fills the band axis range, but the axis highlight kept drawing the band rectangle at the value's original band position, so it no longer lined up with the bar under the pointer. Extract the slot geometry shared by the bar plot into a helper and add a selector that exposes each band axis's sampled indices. The x/y axis highlights now map the pointer to the slot drawn under it and highlight that slot, so the band rectangle follows the displayed sampled bar.
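The pointer-to-slot mapping described above can be sketched as follows, assuming (hypothetically) that the `count` kept bars split the band axis range `[rangeStart, rangeEnd]` into equal slots. The pointer's pixel position picks a slot, and that slot's position in `sampledIndices` gives the data index to highlight. All names here are illustrative, not the PR's helper or selector.

```typescript
interface SlotGeometry {
  start: number; // pixel where the slot begins
  width: number; // slot width in pixels; negative when the range is reversed
}

/** Uniform slot `slot` out of `count` slots across [rangeStart, rangeEnd]. */
function slotGeometry(slot: number, count: number, rangeStart: number, rangeEnd: number): SlotGeometry {
  const width = (rangeEnd - rangeStart) / count;
  return { start: rangeStart + slot * width, width };
}

/** Maps a pointer pixel to the data index of the sampled bar drawn under it, or null. */
function dataIndexUnderPointer(
  pointer: number,
  sampledIndices: readonly number[],
  rangeStart: number,
  rangeEnd: number,
): number | null {
  const count = sampledIndices.length;
  if (count === 0 || rangeEnd === rangeStart) return null;
  const lo = Math.min(rangeStart, rangeEnd);
  const hi = Math.max(rangeStart, rangeEnd);
  if (pointer < lo || pointer > hi) return null;
  // The ratio stays in [0, 1] for reversed axes too, since both differences flip sign.
  const ratio = (pointer - rangeStart) / (rangeEnd - rangeStart);
  const slot = Math.min(count - 1, Math.floor(ratio * count));
  return sampledIndices[slot];
}
```

The highlight would then draw `slotGeometry(slot, …)` rather than the band at the value's original category position.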
Just two pieces of feedback about the docs before I forget:
Should we move everything there? 🤔 I created this before that page existed.
Pretty much; I tested again. Line with 1M points and sampling is smooth, scatter is OK, bar is choppy (I think because I added a check so the jump in bar size from one zoom level to another wouldn't be too abrupt; it was smooth before, so I'll look into it 🤔). Then if we switch to the full set, everything goes to hell 😆
alexfauquette
left a comment
I'm wondering if we should simplify the API a bit.
For now it's super flexible: users can pass any subsampling they want.
I think you could restrict the subsampling to powers of 2 only.
One main advantage would be the possibility to compute and save all the subsamples at once.
You generate an array of N = data.length integers, and
- entries from 0 to N/2 are the first subsampling's indexes
- entries from N/2 to N/2 + N/4 are the second subsampling's indexes
- ...
- up to when N/2^k corresponds to the subsample of a full view of the series
This would allow doing all the processing once, at the beginning.
If it simplifies things, you can also remove the scatter plot.
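The packed layout suggested in this comment can be sketched as below. This is a hypothetical illustration, not code from the PR: `buildSamplePyramid`, `getLevel` and the default plain-decimation reducer are assumptions, and a shape-aware reducer could be swapped in for real data.

```typescript
// Hypothetical sketch of the packed power-of-two layout described above.
// Level k holds about N / 2^k indices; all levels are stored back to back
// in one Int32Array of fewer than N entries.

/** Halves a list of data indices; must return floor(indices.length / 2) entries. */
type PairReducer = (indices: readonly number[]) => number[];

/** Assumed default: plain decimation, keeping the first index of every pair. */
const keepFirstOfPair: PairReducer = (indices) => {
  const kept: number[] = [];
  for (let p = 0; p + 1 < indices.length; p += 2) kept.push(indices[p]);
  return kept;
};

interface SamplePyramid {
  indices: Int32Array; // level 1, then level 2, ...
  offsets: number[]; // level k occupies [offsets[k - 1], offsets[k])
}

function buildSamplePyramid(
  length: number,
  minLevelSize: number,
  reduce: PairReducer = keepFirstOfPair,
): SamplePyramid {
  const levels: number[][] = [];
  let current = Array.from({ length }, (_, i) => i);
  // Keep halving while the next level still meets the full-view budget.
  while (current.length >= 2 && Math.floor(current.length / 2) >= minLevelSize) {
    current = reduce(current);
    levels.push(current);
  }

  const total = levels.reduce((sum, level) => sum + level.length, 0);
  const indices = new Int32Array(total);
  const offsets = [0];
  let cursor = 0;
  for (const level of levels) {
    indices.set(level, cursor);
    cursor += level.length;
    offsets.push(cursor);
  }
  return { indices, offsets };
}

/** Zero-copy view of level k (1-based), matching the numbering in the comment. */
function getLevel(pyramid: SamplePyramid, level: number): Int32Array {
  return pyramid.indices.subarray(pyramid.offsets[level - 1], pyramid.offsets[level]);
}
```

Picking a level at render time then becomes a lookup with no per-frame sampling work; one LTTB pass per halving is one possible replacement for the decimation default.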
// The zoom slider preview always renders the full data, so no sampling is applied.
{},
That's weird. The preview should subsample, because its goal is to give an overview, not a detailed representation.
Keep the following in mind:

- **Rendering uses the sampled data only.** The drawn geometry behaves as if the dropped values never existed: bars, in particular, are laid out across the full width as if only the kept categories existed, so they become fewer and wider rather than thin with gaps.
- **Everything else uses the full data.** Tooltips, the axis domain, and item interaction always read the complete dataset. As a result, hovering a point that was dropped from a sampled scatter series does not highlight it.
The explanation is a bit tricky to understand.
// Built-in `'bucket'`: keep one point per data-space grid cell. Runs in data space (not pixels)
// so the kept set stays stable across pans.
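For context on the snippet above, a data-space grid bucket sampler can be sketched as follows, assuming a fixed cell size in data units (the PR's own cell sizing is not reproduced here). Cells are anchored at the data origin rather than at the viewport, so panning does not move cell boundaries. `gridBucketSample` and its parameters are hypothetical.

```typescript
/** Hypothetical sketch: keeps the first point that falls in each data-space grid cell. */
function gridBucketSample(
  length: number,
  getX: (i: number) => number,
  getY: (i: number) => number,
  cellWidth: number,
  cellHeight: number,
): number[] {
  const seen = new Set<string>();
  const kept: number[] = [];
  for (let i = 0; i < length; i += 1) {
    // Cells are anchored at data coordinate 0, so their boundaries do not depend on the viewport.
    const key = `${Math.floor(getX(i) / cellWidth)}:${Math.floor(getY(i) / cellHeight)}`;
    if (!seen.has(key)) {
      seen.add(key);
      kept.push(i);
    }
  }
  return kept;
}
```

With this scheme, how many points survive depends entirely on how the data is spread: 100 points sharing one cell collapse to a single point, while evenly spaced points that each occupy their own cell are all kept.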
The issue with such a sampling is that:
- It destroys the notion of density. You keep the global shape, but if one bucket has 100 points and another just 1, in the end they look the same.
- It does not guarantee any data reduction factor. A series with evenly spaced points is the worst edge case, compared to a dense area with a few outliers that create large min/max x/y.
The main downsampling approaches I've seen for scatter charts are in fact fifty shades of heatmap. When there are too many data points, you don't show the points, you show their density.
AG Charts has a downsampling that is more informative than this one, but it still feels useless. Your screen just ends up with lots of points and you have no idea what to look at.
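The density alternative described in this comment amounts to a 2D histogram: instead of keeping representative points, count how many points fall into each pixel-sized cell and render the counts as heatmap intensities. The sketch below is a hypothetical illustration, not part of the PR; the names and the row-major `Uint32Array` layout are assumptions.

```typescript
interface DensityGrid {
  columns: number;
  rows: number;
  counts: Uint32Array; // row-major: counts[row * columns + column]
}

/** Bins pixel-space points into a grid of `cellSize` squares covering [0, width] x [0, height]. */
function densityGrid(
  length: number,
  getPx: (i: number) => number,
  getPy: (i: number) => number,
  width: number,
  height: number,
  cellSize: number,
): DensityGrid {
  const columns = Math.max(1, Math.ceil(width / cellSize));
  const rows = Math.max(1, Math.ceil(height / cellSize));
  const counts = new Uint32Array(columns * rows);
  for (let i = 0; i < length; i += 1) {
    const px = getPx(i);
    const py = getPy(i);
    // Skip points outside the plot area (this also rejects NaN coordinates).
    if (!(px >= 0 && px <= width && py >= 0 && py <= height)) continue;
    const column = Math.min(columns - 1, Math.floor(px / cellSize));
    const row = Math.min(rows - 1, Math.floor(py / cellSize));
    counts[row * columns + column] += 1;
  }
  return { columns, rows, counts };
}
```

The number of cells is bounded by the plot size rather than by the data, so the rendered element count is fixed no matter how the points are distributed.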
export type UseChartProSamplingDefaultizedParameters = UseChartProSamplingParameters;

export interface UseChartProSamplingState {
  sampling: {
    /**
     * Computes the render-only sampled indices for every series that sets a `sampling` method.
     */
    computeSampledIndices: ChartSampledIndicesComputer;
  };
}
From a DX point of view, it could be nice to add an extra parameter to pick the algorithm at the plugin level according to the series type.
interface UseChartProSamplingParameters<SeriesType> {
  sampling?: SeriesType extends keyof SamplingAlgo ? SamplingAlgo[SeriesType] : never;
}
It's imperfect because, when mixing a line and a bar, some algorithms (lttb and m4) might not be available for both, but it would enable
<LineChart
  sampling='m4'
  /* ... */
/>
which is easier than having to pass the `sampling` property to each series.
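The chart-level idea in this comment can be sketched with a distributive conditional type. The `SamplingAlgo` map below is an assumption built from the method names in this PR (`'lttb'` and `'m4'` for line, `'bucket'` for bar and scatter); `ChartLevelSamplingParameters` and `resolveSampling` are hypothetical names showing a series-level value overriding the chart-level default.

```typescript
// Assumed mapping from series type to its built-in sampling methods.
interface SamplingAlgo {
  line: 'lttb' | 'm4';
  bar: 'bucket';
  scatter: 'bucket';
}

// Distributes over unions: a line + bar chart accepts 'lttb' | 'm4' | 'bucket',
// which is the imperfection noted above ('lttb' is offered for the bars too).
type ChartSampling<SeriesType extends string> = SeriesType extends keyof SamplingAlgo
  ? SamplingAlgo[SeriesType]
  : never;

interface ChartLevelSamplingParameters<SeriesType extends string> {
  sampling?: ChartSampling<SeriesType>;
}

/** Series-level `sampling` wins; otherwise fall back to the chart-level default. */
function resolveSampling<SeriesType extends keyof SamplingAlgo>(
  seriesSampling: SamplingAlgo[SeriesType] | undefined,
  chartSampling: SamplingAlgo[SeriesType] | undefined,
): SamplingAlgo[SeriesType] | undefined {
  return seriesSampling ?? chartSampling;
}

// Compile-time checks of the accepted values.
const lineChart: ChartLevelSamplingParameters<'line'> = { sampling: 'm4' };
const mixedChart: ChartLevelSamplingParameters<'line' | 'bar'> = { sampling: 'bucket' };
```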
const renderBudget = (pixelSpan: number, length: number) =>
  Math.min(length, Math.max(2, Math.floor(pixelSpan / PIXELS_PER_POINT)) * 2 ** zoomLevel);

// `null` when the axis is not zoomed, so its whole extent is visible.
const visibleWindow = (axisId: AxisId): { start: number; end: number } | null => {
You already have a `sampledIndices`, so I'm not sure I understand what this function is doing. Why not just return the sampled indexes?
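One way to read this question: if `visibleWindow` only narrows which of the already-sampled indices get drawn, it reduces to a range lookup on the sorted `sampledIndices`. The sketch below works under that assumption; `lowerBound` and `sampledInWindow` are hypothetical helpers, not functions from the PR.

```typescript
/** First position in the ascending array `arr` whose value is >= `value`. */
function lowerBound(arr: ArrayLike<number>, value: number): number {
  let lo = 0;
  let hi = arr.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (arr[mid] < value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

/** Sampled indices inside the inclusive data-index window [start, end]; all of them when not zoomed. */
function sampledInWindow(
  sampledIndices: ArrayLike<number>,
  window: { start: number; end: number } | null,
): number[] {
  const all = Array.from(sampledIndices);
  if (window === null) return all;
  const from = lowerBound(sampledIndices, window.start);
  const to = lowerBound(sampledIndices, window.end + 1);
  return all.slice(from, to);
}
```

For a line series one would typically also keep the neighbouring index on each side of the window, so that segments reach the edge of the plot.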
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Summary
Adds render-only sampling (downsampling) to line, bar, and scatter charts (Pro). Sampling reduces the number of points actually rendered on large datasets, while keeping the full dataset for everything else: axis extremums, the axis domain, tooltips, highlighting, and item interaction all read the complete data.
This PR is the core implementation. Support for additional chart types (range bar, candlestick, heatmap) is split into a follow-up PR for ease of review.
API
Set the `sampling` prop on a series to a built-in method or a custom function:
- Line: `'lttb'` (Largest-Triangle-Three-Buckets) or `'m4'` (pixel-column min/max/first/last).
- Scatter: `'bucket'` (one point per marker-sized grid cell).
- Bar: `'bucket'` (one representative per pixel-width bucket).
- Custom: a `DataSampler` function receiving `{ length, target, zoomLevel, getValue, getPosition }` and returning the indices to render.

The `sampling` types (`BarSampling`, `LineSampling`, `ScatterSampling`, `DataSampler`, `DataSamplerParams`) live in the Pro package; the community series types expose empty `*SeriesExtension` interfaces that Pro augments.
How it works
The `useChartProSampling` plugin computes render-only sampled indices: a sidecar keyed by series id (`selectorChartSampledIndices`). Only the plot hooks read it; everything else keeps reading the processed (full) series, so the full-data guarantee is structural.
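A custom sampler, per the API description above, receives `{ length, target, zoomLevel, getValue, getPosition }` and returns the indices to render. The sketch below declares a local stand-in for `DataSamplerParams` because the exact Pro types are not shown here; the field types (`getValue` returning a number or `null`, `getPosition` returning a number) are assumptions, and `strideWithSpikes` is a hypothetical example. It keeps an evenly strided subset plus every value above a threshold, so spikes always survive (at the cost of sometimes exceeding `target`).

```typescript
// Local stand-in for the Pro `DataSamplerParams`; field types are assumed.
interface SamplerParams {
  length: number;
  target: number;
  zoomLevel: number;
  getValue: (index: number) => number | null;
  getPosition: (index: number) => number;
}

type Sampler = (params: SamplerParams) => number[];

/** Hypothetical custom sampler: an even stride plus every value above `threshold`. */
const strideWithSpikes =
  (threshold: number): Sampler =>
  ({ length, target, getValue }) => {
    if (length <= target) return Array.from({ length }, (_, i) => i);
    const stride = Math.ceil(length / Math.max(1, target));
    const kept: number[] = [];
    for (let i = 0; i < length; i += 1) {
      const value = getValue(i);
      // Keep the regular grid point, or any spike above the threshold.
      if (i % stride === 0 || (value !== null && value > threshold)) kept.push(i);
    }
    return kept; // ascending, so lines are drawn left to right
  };
```

Such a function would be passed as a series' `sampling` value in place of a built-in method name.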