Enh: Contract string representation variants#120

Merged: mxsm merged 6 commits into main from codex/pr-007-repr-contraction on Jun 20, 2026

@mxsm mxsm commented Jun 20, 2026

Implements the representation contraction stage.

Scope:

  • Contracts internal string representation to Inline / Static / Shared / Owned.
  • Removes byte-oriented variants from CheetahString core.
  • Preserves checked byte construction by converting valid bytes into string-owned storage.
  • Confirms all-features layout returns to 32B on the verified x86_64 target.
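For readers unfamiliar with the shape of the change, here is a minimal sketch of a four-variant, string-only representation with checked byte construction. It is not the actual CheetahString source: the `Repr` name, the 23-byte inline capacity, and the method bodies are assumptions for illustration only.

```rust
use std::str::Utf8Error;
use std::sync::Arc;

/// Hypothetical stand-in for the contracted representation. The real
/// CheetahString internals may use different fields and inline capacity.
#[derive(Debug, Clone)]
enum Repr {
    /// Short strings stored in place (23 bytes is an assumed capacity).
    Inline { len: u8, data: [u8; 23] },
    Static(&'static str),
    Shared(Arc<str>),
    Owned(String),
}

impl Repr {
    /// Store short strings inline and longer ones in owned storage.
    fn new(s: &str) -> Self {
        if s.len() <= 23 {
            let mut data = [0u8; 23];
            data[..s.len()].copy_from_slice(s.as_bytes());
            Repr::Inline { len: s.len() as u8, data }
        } else {
            Repr::Owned(s.to_owned())
        }
    }

    /// Checked byte construction: validate UTF-8 once, then keep the
    /// buffer as string-owned storage (no byte-oriented variant needed).
    fn try_from_vec(bytes: Vec<u8>) -> Result<Self, Utf8Error> {
        String::from_utf8(bytes)
            .map(Repr::Owned)
            .map_err(|e| e.utf8_error())
    }

    /// Every variant can hand out a `&str`.
    fn as_str(&self) -> &str {
        match self {
            Repr::Inline { len, data } => std::str::from_utf8(&data[..*len as usize])
                .expect("inline bytes are copied from a valid &str in `new`"),
            Repr::Static(s) => *s,
            Repr::Shared(s) => &**s,
            Repr::Owned(s) => s.as_str(),
        }
    }
}
```

With only string-backed variants, the UTF-8 check lives at the construction boundary (`try_from_vec` here) and the accessors never need to branch on byte semantics.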

Verification completed locally:

  • cargo test --all-features
  • cargo test --no-default-features
  • cargo test layout_snapshot --all-features -- --nocapture
  • final full verification matrix on integration branch
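As a rough illustration of what a layout snapshot check can record, the sketch below renders a type's size and alignment as a one-line JSON record. The `layout_json` helper and its output format are hypothetical and are not taken from tests/layout_snapshot.rs.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper: render a one-line JSON layout record for `T`.
/// In a real snapshot test, `T` would be the string type under test.
fn layout_json<T>(name: &str) -> String {
    format!(
        "{{\"type\":\"{}\",\"size\":{},\"align\":{}}}",
        name,
        size_of::<T>(),
        align_of::<T>()
    )
}
```

A test could print this record (visible with `-- --nocapture`) and assert on `size_of` directly, which is one way to catch an accidental layout regression on a given target.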

Closes #112.

Copilot AI review requested due to automatic review settings June 20, 2026 02:23

coderabbitai Bot commented Jun 20, 2026

Warning

Review limit reached

@mxsm, we couldn't start this review because you've reached your PR review rate limit.

@mxsm mxsm force-pushed the codex/pr-007-repr-contraction branch from dc54b57 to 3eaa760 Compare June 20, 2026 02:29

Copilot AI left a comment

Pull request overview

Implements the “representation contraction” stage for CheetahString, simplifying the core storage model to string-only variants (Inline / Static / Shared / Owned) while preserving fallible UTF-8 byte-based construction and updating search to use memchr/memmem by default. This aligns with Issue #112’s goal of removing byte-semantics variants from the core representation and making UTF-8 invariants easier to uphold.

Changes:

  • Contracted CheetahString internals to string-only variants and shifted checked byte inputs to validated string-owned storage.
  • Replaced substring search paths with memchr/memmem (plus a reusable CheetahFinder) and updated docs accordingly.
  • Added CI/layout snapshot + new tests/benches/scripts to validate behavior, layout artifacts, and performance-sensitive paths.
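The reusable-finder idea mentioned above can be sketched as follows. This is a standard-library stand-in, not the PR's `CheetahFinder` and not `memmem`: the `NaiveFinder` name and its naive window scan are assumptions used only to show the "build once, search many haystacks" pattern.

```rust
/// Hypothetical stand-in for a reusable finder. It owns the needle so any
/// per-needle setup happens once; a naive window scan replaces `memmem`.
struct NaiveFinder {
    needle: Vec<u8>,
}

impl NaiveFinder {
    fn new(needle: &str) -> Self {
        Self { needle: needle.as_bytes().to_vec() }
    }

    /// Byte offset of the first match in `haystack`, or `None`.
    fn find(&self, haystack: &str) -> Option<usize> {
        let h = haystack.as_bytes();
        let n = self.needle.as_slice();
        if n.is_empty() {
            // Same convention as `str::find("")`.
            return Some(0);
        }
        // `windows` yields nothing when the needle is longer than the haystack.
        h.windows(n.len()).position(|w| w == n)
    }
}
```

Searching on bytes is safe for UTF-8 input because a valid needle can only match at character boundaries, so the returned offset agrees with `str::find`.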

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| Cargo.toml | Bumps version to 1.1.0; makes bytes optional; adds memchr; updates feature wiring. |
| README.md | Updates performance/search messaging and version examples for 1.1.0. |
| src/lib.rs | Documents memchr-based search; adds search module and re-exports CheetahFinder. |
| src/cheetah_string.rs | Contracts core representation; replaces byte variants with validated constructors; updates mutation/add/search behavior. |
| src/search.rs | Adds centralized memchr/memmem byte-search helpers and reusable CheetahFinder. |
| src/serde.rs | Simplifies serialization via as_str; deserialization validates UTF-8 for byte inputs. |
| src/simd.rs | Removes SIMD substring search usage from main paths; leaves helpers present but now unused. |
| tests/basic.rs | Updates conversion tests to new fallible byte constructors and revised buffer reuse expectations. |
| tests/search.rs | Adds targeted tests for find/rfind/contains semantics and CheetahFinder. |
| tests/mutation.rs | Adds mutation/builder tests asserting capacity reuse and in-place behavior. |
| tests/layout_snapshot.rs | Adds a layout snapshot test that emits JSON artifacts and basic layout assertions. |
| benches/comprehensive.rs | Updates byte-construction benches to use try_from_vec. |
| benches/layout.rs | Adds a layout artifact emitter for benchmarks. |
| benches/mutation.rs | Adds mutation-focused benchmarks for push/add/reserve. |
| benches/pattern.rs | Adds substring search benchmarks including pathological cases and finder reuse. |
| bench-results/README.md | Documents benchmark artifact layout and capture workflow. |
| scripts/bench-all.sh | Adds a convenience script to run layout test + benches and collect outputs. |
| scripts/bench-all.ps1 | Adds a PowerShell equivalent for collecting benchmark outputs. |
| .github/workflows/ci.yaml | Expands CI matrix, adds no_std checks, runs layout snapshot, uploads layout artifacts. |
| .github/workflows/release.yml | Adds a release workflow with version verification, checks, publish, and GitHub release creation. |

Comments suppressed due to low confidence (1)

src/serde.rs:78

  • visit_bytes/visit_byte_buf are implemented (and now do UTF-8 validation), but deserialize_str will generally never dispatch to the byte visitors. As a result, formats that encode strings as bytes (or that provide bytes for this field) will still fail to deserialize even when the bytes are valid UTF-8. If the intent is to accept either string or bytes, use deserialize_any (or deserialize_bytes) so the byte visitor paths are reachable.
            CheetahString::try_from_vec(v).map_err(Error::custom)
        }
    }
    deserializer.deserialize_str(CheetahStringVisitor)
}

Comment thread scripts/bench-all.sh
Comment on lines +1 to +2
#!/usr/bin/env sh
set -eu
Comment thread src/simd.rs
Comment on lines 90 to 93
/// Find the first occurrence of needle in haystack using SIMD when available
#[allow(dead_code)]
#[inline]
pub(crate) fn find_bytes(haystack: &[u8], needle: &[u8]) -> Option<usize> {
Comment thread src/lib.rs
Comment on lines 3 to +7
//! No more relying solely on the standard library's String! CheetahString is a versatile string type that can store static strings, dynamic strings, and byte arrays.
//! It is usable in both `std` and `no_std` environments. Additionally, CheetahString supports serde for serialization and deserialization.
//! CheetahString also supports the `bytes` feature, allowing conversion to the `bytes::Bytes` type.
//! It minimizes allocations across small, shared, and builder-oriented string workloads.
//! Substring search uses `memchr`/`memmem` by default.
Comment thread scripts/bench-all.ps1
Comment on lines +6 to +8
cargo test layout_snapshot --all-features -- --nocapture |
Tee-Object -FilePath (Join-Path $ResultDir "layout-test.txt")

@mxsm mxsm merged commit 848414b into main Jun 20, 2026
7 checks passed
@mxsm mxsm deleted the codex/pr-007-repr-contraction branch June 20, 2026 08:40

Development

Successfully merging this pull request may close these issues.

Enh: Contract core representation to Inline, Static, Shared, and Owned

2 participants