Feat: Add CheetahBytes byte semantics #121

Merged
mxsm merged 7 commits into main from codex/pr-008-cheetah-bytes on Jun 20, 2026

mxsm (Owner) commented Jun 20, 2026

Implements the CheetahBytes stage.

Scope:

  • Adds CheetahBytes behind the bytes feature.
  • Keeps byte semantics separate from CheetahString.
  • Requires checked or explicit unsafe conversion from bytes to string.
  • Implements serde bytes semantics for CheetahBytes only.
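The split between a byte container and a UTF-8 string, with only checked or explicitly unsafe conversion between them, can be sketched with two std-only newtypes. `ByteBuf` and `Utf8Str` are hypothetical stand-ins for CheetahBytes and CheetahString, and the method names are illustrative; this shows the pattern, not the crate's actual code or signatures.

```rust
use std::convert::TryFrom;
use std::str::Utf8Error;

/// Hypothetical byte container: accepts any bytes, including invalid UTF-8.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ByteBuf(Vec<u8>);

/// Hypothetical string type: always holds valid UTF-8.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Utf8Str(String);

impl From<Vec<u8>> for ByteBuf {
    fn from(v: Vec<u8>) -> Self {
        ByteBuf(v) // no validation: bytes stay bytes
    }
}

impl ByteBuf {
    /// Checked conversion: validates UTF-8 and reuses the allocation on success.
    /// On failure the original bytes are handed back unchanged.
    pub fn try_into_string(self) -> Result<Utf8Str, ByteBuf> {
        String::from_utf8(self.0)
            .map(Utf8Str)
            .map_err(|e| ByteBuf(e.into_bytes()))
    }

    /// Explicit unsafe conversion that skips validation.
    ///
    /// # Safety
    /// The caller must guarantee that the bytes are valid UTF-8.
    pub unsafe fn into_string_unchecked(self) -> Utf8Str {
        Utf8Str(unsafe { String::from_utf8_unchecked(self.0) })
    }
}

/// `TryFrom` instead of `From`: there is no implicit, infallible bytes-to-string path.
impl TryFrom<ByteBuf> for Utf8Str {
    type Error = Utf8Error;

    fn try_from(b: ByteBuf) -> Result<Self, Self::Error> {
        String::from_utf8(b.0).map(Utf8Str).map_err(|e| e.utf8_error())
    }
}
```

Returning the original `ByteBuf` on failure lets a caller keep invalid bytes without copying them; the `TryFrom` impl surfaces the `Utf8Error` instead, which is one reasonable choice of error type among several.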

Verification completed locally:

  • cargo test --features bytes
  • cargo test --features "bytes serde"
  • cargo test --no-default-features --features "bytes serde"
  • final full verification matrix on the integration branch

Closes #113.

Copilot AI review requested due to automatic review settings June 20, 2026 02:23
coderabbitai (bot) commented Jun 20, 2026

Warning

Review limit reached

@mxsm, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 52 minutes and 57 seconds.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 396ce1c8-87ca-4c4a-a7fc-a9c8556b0b63

📥 Commits

Reviewing files that changed from the base of the PR and between 21aa0bf and 623dd3d.

📒 Files selected for processing (22)
  • .github/workflows/ci.yaml
  • .github/workflows/release.yml
  • Cargo.toml
  • README.md
  • bench-results/README.md
  • benches/comprehensive.rs
  • benches/layout.rs
  • benches/mutation.rs
  • benches/pattern.rs
  • scripts/bench-all.ps1
  • scripts/bench-all.sh
  • src/bytes.rs
  • src/cheetah_string.rs
  • src/lib.rs
  • src/search.rs
  • src/serde.rs
  • src/simd.rs
  • tests/basic.rs
  • tests/bytes.rs
  • tests/layout_snapshot.rs
  • tests/mutation.rs
  • tests/search.rs

mxsm force-pushed the codex/pr-008-cheetah-bytes branch from 9e13e7f to 623dd3d on June 20, 2026 02:29

Copilot AI left a comment


Pull request overview

This PR introduces a dedicated CheetahBytes type (behind the bytes feature) to separate byte-oriented semantics from CheetahString, while also updating substring search to use memchr/memmem and tightening UTF-8 construction paths for CheetahString.

Changes:

  • Add CheetahBytes with explicit checked (TryFrom/try_into_string) and unsafe (into_string_unchecked) conversion into CheetahString, plus byte-oriented serde semantics.
  • Remove implicit byte-to-string conversions for CheetahString (replace From with TryFrom and add explicit unsafe constructors), and update serde for CheetahString to serialize as a string.
  • Introduce memchr-based substring search utilities (find/rfind) and add new tests/benches plus CI artifact capture for layout snapshots.
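The search semantics that the new tests name (empty needle, byte indices, UTF-8 alignment) can be illustrated with a naive std-only search. This is a stand-in, not the PR's memchr-based code: the function names echo the per-file summary but are used here hypothetically, and a real reusable searcher such as `memchr::memmem::Finder` does its setup once per needle, whereas the hypothetical `Finder` below only stores the needle.

```rust
/// Byte offset of the first occurrence of `needle` in `haystack`.
/// An empty needle matches at offset 0, mirroring `str::find("")`.
pub fn find_bytes(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0); // `windows(0)` would panic, so handle this case first
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Byte offset of the last occurrence of `needle` in `haystack`.
/// An empty needle matches at `haystack.len()`, mirroring `str::rfind("")`.
pub fn rfind_bytes(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(haystack.len());
    }
    haystack.windows(needle.len()).rposition(|w| w == needle)
}

/// Hypothetical reusable searcher: built once, applied to many haystacks.
pub struct Finder<'n> {
    needle: &'n [u8],
}

impl<'n> Finder<'n> {
    pub fn new(needle: &'n [u8]) -> Self {
        Finder { needle }
    }

    pub fn find(&self, haystack: &[u8]) -> Option<usize> {
        find_bytes(haystack, self.needle)
    }
}
```

Because UTF-8 is self-synchronizing, a non-empty valid needle can only match a valid haystack at a character boundary, so the byte offsets returned for `&str` inputs are safe to slice with.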

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.

Summary per file:

  • Cargo.toml: Makes bytes optional, adds memchr, bumps crate version, adds benches/dev-dep.
  • src/lib.rs: Wires in search module, conditionally adds/exports CheetahBytes, exports CheetahFinder.
  • src/cheetah_string.rs: Removes implicit byte conversions, adds checked/unsafe UTF-8 constructors, updates search + builder/mutation paths.
  • src/bytes.rs: New CheetahBytes type with conversions and (feature-gated) serde bytes behavior.
  • src/search.rs: New memchr/memmem-based find_bytes/rfind_bytes and reusable CheetahFinder.
  • src/serde.rs: Moves CheetahString serde to string-only serialization; updates UTF-8 validation in visitor paths.
  • src/simd.rs: Marks now-unused SIMD search helpers as dead_code to avoid warnings.
  • tests/basic.rs: Updates tests to use checked byte conversions and adjusts ARC/Vec buffer reuse expectations.
  • tests/bytes.rs: Adds CheetahBytes tests for invalid UTF-8 acceptance, checked/unchecked conversions, and serde bytes semantics.
  • tests/search.rs: Adds search semantics tests (empty needle, byte indices, unicode alignment, reusable finder).
  • tests/mutation.rs: Adds mutation/buffer-reuse regression tests for push_str, add, reserve.
  • tests/layout_snapshot.rs: Adds a layout snapshot test that writes JSON artifacts for CI/benchmarking.
  • benches/comprehensive.rs: Updates benches to use checked UTF-8 construction.
  • benches/layout.rs: Adds a bench target that emits layout JSON artifacts.
  • benches/mutation.rs: Adds mutation-focused criterion benchmarks.
  • benches/pattern.rs: Adds substring-search-focused criterion benchmarks and finder comparisons.
  • scripts/bench-all.sh: Adds a convenience runner for tests/benches and capturing outputs.
  • scripts/bench-all.ps1: Windows equivalent for capturing test/bench outputs.
  • bench-results/README.md: Documents intended artifact layout for benchmark outputs.
  • README.md: Updates docs for new version, search behavior, and CheetahBytes feature semantics.
  • .github/workflows/ci.yaml: Expands CI matrix and uploads layout snapshot artifacts.
  • .github/workflows/release.yml: Adds a release workflow that validates versions, runs checks/tests, and publishes/releases.
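A buffer-reuse regression test of the kind listed for tests/mutation.rs usually records the heap pointer before an operation and compares it afterwards. The sketch below uses std `String` as a stand-in, since CheetahString's internal representation isn't shown here; the helper names are hypothetical and the crate's actual tests may check reuse differently.

```rust
/// Reserve room for `suffix`, then check that `push_str` keeps the same heap buffer.
fn push_str_reuses_buffer(mut s: String, suffix: &str) -> bool {
    s.reserve(suffix.len()); // any reallocation happens here, before the pointer is recorded
    let before = s.as_ptr();
    s.push_str(suffix); // capacity is already sufficient
    s.as_ptr() == before
}

/// `String + &str` consumes the left operand and appends into its buffer.
fn add_reuses_buffer(suffix: &str) -> bool {
    let mut s = String::with_capacity(64);
    s.push_str("cheetah");
    assert!(s.len() + suffix.len() <= s.capacity(), "suffix must fit for this check");
    let before = s.as_ptr();
    let joined = s + suffix;
    joined.as_ptr() == before
}

/// `reserve` does nothing when the existing capacity already covers the request.
fn reserve_within_capacity_is_noop(extra: usize) -> bool {
    let mut s = String::with_capacity(32);
    s.push_str("abc");
    assert!(s.len() + extra <= s.capacity(), "request must fit for this check");
    let before = (s.as_ptr(), s.capacity());
    s.reserve(extra);
    (s.as_ptr(), s.capacity()) == before
}
```

Each helper asserts its capacity precondition first, so a failure points at the setup rather than silently reporting a reallocation the test never ruled out.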
Comments suppressed due to low confidence (1)

src/serde.rs:78

  • CheetahStringVisitor implements visit_bytes/visit_byte_buf, but cheetah_string() calls deserializer.deserialize_str(...), which will never dispatch to those byte-oriented visitor methods. If the intent is to accept byte buffers (while validating UTF-8), switch to deserialize_any (or deserialize_bytes/deserialize_byte_buf with appropriate visitor methods); otherwise remove the unused byte visitor methods to avoid misleading behavior.
        fn visit_bytes<E>(self, v: &[u8]) -> Result<Self::Value, E>
        where
            E: Error,
        {
            str::from_utf8(v)
                .map(CheetahString::from_slice)
                .map_err(Error::custom)
        }

        fn visit_borrowed_bytes<E>(self, v: &'a [u8]) -> Result<Self::Value, E>
        where
            E: Error,
        {
            str::from_utf8(v)
                .map(CheetahString::from_slice)
                .map_err(Error::custom)
        }

        fn visit_byte_buf<E>(self, v: Vec<u8>) -> Result<Self::Value, E>
        where
            E: Error,
        {
            CheetahString::try_from_vec(v).map_err(Error::custom)
        }
    }
    deserializer.deserialize_str(CheetahStringVisitor)
}
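Whichever `deserialize_*` entry point ends up dispatching to them, the byte paths in this visitor differ in ownership: borrowed bytes must be validated and then copied, while an owned `Vec<u8>` can be validated and kept without copying. A std-only sketch with `String` standing in for CheetahString; `string_from_bytes` and `string_from_byte_buf` are hypothetical names, and errors are turned into their `Display` text much as `Error::custom` would receive them.

```rust
/// Borrowed input: validate in place, then copy into an owned String.
fn string_from_bytes(v: &[u8]) -> Result<String, String> {
    std::str::from_utf8(v)
        .map(|s| s.to_owned())
        .map_err(|e| e.to_string())
}

/// Owned input: validate and keep the same heap buffer on success.
fn string_from_byte_buf(v: Vec<u8>) -> Result<String, String> {
    String::from_utf8(v).map_err(|e| e.to_string())
}
```

`String::from_utf8` takes the vector by value and does not copy it, which is why the owned path can hand back the original allocation.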


Comment thread scripts/bench-all.sh
Comment on lines +1 to +2
#!/usr/bin/env sh
set -eu
Comment thread scripts/bench-all.ps1
Comment on lines +6 to +8
cargo test layout_snapshot --all-features -- --nocapture |
Tee-Object -FilePath (Join-Path $ResultDir "layout-test.txt")
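The command above runs a test named `layout_snapshot` with `--nocapture`, so anything the test prints with `println!` reaches stdout and is captured by `Tee-Object`. The PR's actual test and its JSON format aren't shown here; the sketch below is a hypothetical std-only version that builds a layout snapshot string for a few standard-library types, with `layout_entry` and `layout_snapshot` as illustrative names.

```rust
use std::mem::{align_of, size_of};

/// Format one type's layout as a JSON object.
/// Hand-rolled to avoid a serde dependency; `name` must not contain quotes.
fn layout_entry<T>(name: &str) -> String {
    format!(
        "{{\"type\":\"{}\",\"size\":{},\"align\":{}}}",
        name,
        size_of::<T>(),
        align_of::<T>()
    )
}

/// Build a JSON array covering a few illustrative standard-library types.
fn layout_snapshot() -> String {
    let entries = [
        layout_entry::<String>("String"),
        layout_entry::<Vec<u8>>("Vec<u8>"),
        layout_entry::<Box<str>>("Box<str>"),
        layout_entry::<std::sync::Arc<str>>("Arc<str>"),
    ];
    format!("[{}]", entries.join(","))
}
```

A test would typically `println!("{}", layout_snapshot())`; sizes of these types are not guaranteed across targets or compiler versions, which is why a snapshot is recorded rather than asserted.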

Comment thread Cargo.toml
Comment on lines 1 to 4
 [package]
 name = "cheetah-string"
-version = "1.0.1"
+version = "1.1.0"
 authors = ["mxsm <mxsm@apache.org>"]
mxsm merged commit 0485185 into main on Jun 20, 2026
7 checks passed
mxsm deleted the codex/pr-008-cheetah-bytes branch on June 20, 2026 08:40


Development

Successfully merging this pull request may close these issues:

  • Feat: Introduce CheetahBytes for byte semantics

2 participants