feat: add max_rows parameter to .app() (#194) #234
Draft
cpsievert wants to merge 8 commits into
…194)
Add maybe_truncate() helper and max_rows parameter (default=1000) to .app() methods for Shiny, Streamlit, Dash, and Gradio. Truncates displayed data with a user-facing info message when the limit is exceeded. This does not affect the number of rows the LLM can query.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Large datasets can overwhelm the data table display. The new max_rows parameter (default 1000) truncates displayed rows while leaving the full dataset available for LLM queries. A card footer shows row/column counts and indicates when truncation is active. Also fixes ruff S101 lint error in Python maybe_truncate().
…dling
For lazy sources (Polars LazyFrame, Ibis Table, R tbl_sql), truncation is now applied before collection, so the backend transfers only max_rows rows. This avoids loading the full dataset into memory just to display the first 1000 rows.
Python: uses the narwhals lazy path (head + collect) for Polars LazyFrame, and ibis count() + head() for Ibis Tables. Callers now pass raw data directly to maybe_truncate instead of pre-collecting via as_narwhals.
R: detects tbl_sql and uses dplyr::tally() (COUNT query) + head() (LIMIT query) before collect(). Removes the manual collect() from app_obj.
Tests added for the Polars LazyFrame (Python) and tbl_sql (R) paths.
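The count-then-limit pattern described in this commit can be sketched with nothing but the standard library. This is not the PR's code: it swaps in an in-memory SQLite database for the real backends, and the function name truncate_before_collect, the table name, and the sample data are all hypothetical. It only illustrates why a COUNT query plus a LIMIT query keeps the transfer bounded by max_rows.

```python
import sqlite3


def truncate_before_collect(conn, table, max_rows):
    """Return (rows, total_rows) while fetching at most max_rows rows.

    Hypothetical helper for illustration; `table` is assumed to be a
    trusted identifier, since it is interpolated into the SQL text.
    """
    # COUNT query: the database computes the total, no rows are transferred.
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    # LIMIT query: at most max_rows rows ever leave the database.
    rows = conn.execute(
        f'SELECT * FROM "{table}" LIMIT ?', (max_rows,)
    ).fetchall()
    return rows, total


# Build a 5000-row table to stand in for a large remote source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, i * 0.5) for i in range(5000)],
)

rows, total = truncate_before_collect(conn, "events", max_rows=1000)
print(len(rows), total)  # prints "1000 5000"
```

The total is still available for a "showing N of M rows" style message, even though only the first 1000 rows were materialized.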
7afcc46 to 1ad488f
The previous version had three separate code paths (ibis, lazy narwhals, eager). This collapses them: use as_narwhals(lazy=True) when max_rows is set (which preserves laziness for Polars LazyFrames and is harmless for eager frames), and as_narwhals() when it's None. For R, tbl_sql gets tally() + head() before collect(); everything else uses nrow() + head(). Net reduction of ~150 lines.
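The idea of one code path that keeps lazy sources lazy and is harmless for eager ones can be illustrated without narwhals. The sketch below is a stand-in, not the PR's implementation: it uses itertools.islice over plain Python iterables, and lazy_rows and the stats counter are hypothetical names added only to show how many rows are actually pulled.

```python
from itertools import islice

# Counts how many rows the lazy source has produced (illustration only).
stats = {"pulled": 0}


def lazy_rows(n):
    """Hypothetical lazy source: yields rows one at a time on demand."""
    for i in range(n):
        stats["pulled"] += 1
        yield {"id": i}


def maybe_truncate(data, max_rows=1000):
    """Single code path for lazy and eager inputs (simplified stand-in)."""
    if max_rows is None:
        # No limit requested: materialize everything.
        return list(data)
    # islice stops after max_rows items, so a lazy source is never
    # asked for more; for an eager list it acts like head().
    return list(islice(data, max_rows))


lazy_result = maybe_truncate(lazy_rows(5000), max_rows=1000)
eager_result = maybe_truncate([{"id": i} for i in range(3)], max_rows=1000)
full_result = maybe_truncate(lazy_rows(10), max_rows=None)

print(len(lazy_result), len(eager_result), len(full_result))  # prints "1000 3 10"
print(stats["pulled"])  # prints "1010" (1000 from the first call, 10 from the third)
```

Unlike the approach in the commit, this sketch does not report the total row count; it only shows that the same call works for both kinds of input.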
Summary
Large datasets can overwhelm the data table in .app(). The new max_rows parameter (default 1000) truncates displayed rows while leaving the full dataset available via df() for charts, summaries, and other downstream views.

For lazy sources (Polars LazyFrame, R's tbl_sql), truncation is applied before collection, so the backend transfers only max_rows rows instead of loading the full dataset into memory. For eager/in-memory sources, head() is applied at display time.

Closes #194.

Python:
- max_rows parameter on .app() for all four frameworks (Shiny, Streamlit, Dash, Gradio)
- maybe_truncate() uses as_narwhals(df, lazy=True) to preserve lazy semantics, so one code path handles all source types

R:
- max_rows parameter on $app(), $app_obj(), and querychat_app()
- maybe_truncate() detects tbl_sql and uses dplyr::tally() (COUNT query) + head() (LIMIT) before collect()

Test plan
- querychat_app(mtcars) shows "Data has 32 rows and 11 columns."
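As a rough illustration of the footer text checked in the test plan, the sketch below builds the row/column message in Python. Only the "Data has 32 rows and 11 columns." wording comes from this PR; the function name footer_text and the truncation suffix are assumptions made for the example.

```python
def footer_text(total_rows, n_cols, max_rows=1000):
    """Build the card-footer message (hypothetical helper)."""
    text = f"Data has {total_rows} rows and {n_cols} columns."
    if max_rows is not None and total_rows > max_rows:
        # Assumed wording: the PR does not state the exact truncation notice.
        text += f" Showing the first {max_rows} rows."
    return text


mtcars_footer = footer_text(32, 11)
print(mtcars_footer)  # prints "Data has 32 rows and 11 columns."
```

With 32 rows and the default limit of 1000, no truncation notice is added, which matches the mtcars case in the test plan.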