fix(regex): stop false-rejecting valid char classes & bounded quantifiers (semver/joi/winston) #5277

Merged: proggeramlug merged 2 commits into main from fix/regex-charclass-quantifier on Jun 17, 2026.

@proggeramlug (Contributor)

Problem

Three independent false-rejections in perry's JS→Rust regex translation (crates/perry-runtime/src/regex*) made valid JS regex literals throw SyntaxError: Invalid regular expression: …: invalid pattern. They surfaced in the npm-corpus native-compile sweep as the top remaining blast-radius pattern (3 packages: semver, joi, winston). This is NOT a categorical engine gap (perry has both regex and fancy-regex); these are translation and size-cap bugs.

Roots & fixes

  1. Compiled-size cap → semver. The regex crate caps a compiled program at 10 MiB (CompiledTooBig), and fancy-regex delegates to the same backend. semver's ReDoS-hardened safeRe rewrites (\d{1,256}, […]{0,250}, …) exceed it; JS has no such limit. New build_std_regex / build_fancy_regex helpers raise the budget to 64 MiB for both engines, in lockstep, and replace Regex::new / fancy_regex::Regex::new as drop-ins at all compile and validate sites.
  2. Class hyphen adjacent to a shorthand → joi. Inside a character class, a - next to a \d/\w/\s shorthand (or a \p{…}/\P{…} property) is a literal hyphen in JS, because a shorthand can't bound a range. The Rust crate instead reads it as a range and errors with ClassRangeLiteral. Translation now escapes such hyphens to \-, with guards so that a literal-backslash sequence such as \\d, or any other escaped backslash, is not mistaken for a shorthand. joi's URI/dataURI validators rely on this.
  3. Trivial char classes / ANSI patterns → winston (via @colors/colors). Covered by the above; added a regression test for [0m] and @colors's escapeStringRegexp output.
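The missing step in root 1 is why semver's patterns outgrow a 10 MiB budget in the first place. Roughly speaking, the regex crate's Thompson-style compiler expands a counted repetition x{m,n} into m required copies of x followed by n - m optional copies. Compiled size therefore scales with n times the size of x. A Unicode-aware \d, which is the crate's default, is itself a sizable class. The sketch below performs that expansion on the pattern text to make the growth visible. unroll_bounded is a hypothetical name, not perry code, and the crate expands an internal NFA rather than a string.

```rust
/// Hypothetical illustration: expand `x{min,max}` into explicit copies of `x`,
/// the way a Thompson-style compiler unrolls a counted repetition.
fn unroll_bounded(atom: &str, min: usize, max: usize) -> String {
    assert!(min <= max, "min must not exceed max");
    // Wrap the atom so a trailing `?` applies to all of it, e.g. (?:[a-z]).
    let group = format!("(?:{atom})");
    let mut out = String::with_capacity(group.len() * max + (max - min));
    // `min` required copies...
    for _ in 0..min {
        out.push_str(&group);
    }
    // ...then `max - min` optional copies.
    for _ in min..max {
        out.push_str(&group);
        out.push('?');
    }
    out
}

fn main() {
    // a{1,3} becomes one required copy plus two optional ones.
    println!("{}", unroll_bounded("a", 1, 3));
    // \d{1,256} becomes 256 copies of the digit class.
    println!("{}", unroll_bounded(r"\d", 1, 256).len());
}
```

For \d{1,256}, that is 256 copies of the digit class in one fragment. semver composes its larger patterns from several such fragments, so the total can plausibly pass the default cap.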

Verification

  • cargo build --release -p perry -p perry-runtime -p perry-stdlib — clean.
  • cargo test --release -p perry-runtime regex: 24 passed, 0 failed, including 3 new regression tests (trivial_char_class_compiles_and_matches, bounded_quantifier_in_class_not_rejected, class_hyphen_adjacent_to_shorthand_is_literal).
  • semver (end-to-end): the regex wall is cleared. semver now advances to a downstream non-regex error (Cannot read properties of undefined (reading 'COMPARATOR')), which shows the regex no longer throws.
  • winston (@colors/colors): the exact new RegExp(escapeStringRegexp("\u001b[0m"), 'g') scenario compiles and matches, byte-identical to node --experimental-strip-types. The full winston run is gated behind the still-open bracket-CJS fix, #5276 (fix(cjs): recognize bracket/computed-string-literal export forms, #5275), so the regex portion is verified in isolation.
  • joi: the hyphen-shorthand translation is proven by unit tests. The full joi run is gated behind the still-open String.trim fix, #5272 (fix(codegen): gate String-method dispatch on string receiver, for a user method like internals.trim(v,s), #5271).
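The hyphen rule from root 2, which the joi bullet refers to, can be sketched as a standalone pre-pass over the JS pattern. The names escape_js_class_hyphens, translate_class_body, read_atom and Atom are hypothetical; this is not perry's actual code.

The sketch makes three assumptions:

  • It follows non-Unicode (Annex B) JS semantics, where [\w-.] is legal.
  • It reads escapes atom by atom, so an escaped backslash such as \\d is never taken for a shorthand.
  • It also escapes standalone literal hyphens. This goes slightly beyond the PR's description, but it is harmless because \- is always literal in the regex crate.

JS's empty class [] and a literal [ inside a class need separate handling and are not covered here.

```rust
/// A class atom: its source text and whether it is a shorthand or property class.
struct Atom {
    text: String,
    shorthand: bool,
}

/// Read one class atom starting at `i`; returns the atom and the index after it.
fn read_atom(chars: &[char], i: usize) -> (Atom, usize) {
    if chars[i] != '\\' {
        return (Atom { text: chars[i].to_string(), shorthand: false }, i + 1);
    }
    match chars.get(i + 1).copied() {
        // \d \D \w \W \s \S: in JS these can't bound a range.
        Some(c @ ('d' | 'D' | 'w' | 'W' | 's' | 'S')) => {
            (Atom { text: format!("\\{c}"), shorthand: true }, i + 2)
        }
        // \p{...} / \P{...}: consume through the closing brace.
        Some('p' | 'P') if chars.get(i + 2) == Some(&'{') => {
            let mut j = i + 3;
            while j < chars.len() && chars[j] != '}' {
                j += 1;
            }
            let end = (j + 1).min(chars.len());
            (Atom { text: chars[i..end].iter().collect(), shorthand: true }, end)
        }
        // Any other escape, including an escaped backslash `\\`, is one plain atom.
        Some(c) => (Atom { text: format!("\\{c}"), shorthand: false }, i + 2),
        // Trailing lone backslash: pass it through unchanged.
        None => (Atom { text: "\\".to_string(), shorthand: false }, i + 1),
    }
}

/// Translate a class body starting at `i` (just after `[` or `[^`);
/// returns the index just past the closing `]`.
fn translate_class_body(chars: &[char], mut i: usize, out: &mut String) -> usize {
    while i < chars.len() && chars[i] != ']' {
        let (a, next) = read_atom(chars, i);
        i = next;
        // `A-B` is a range candidate only if `-` is followed by another atom.
        let is_range =
            chars.get(i) == Some(&'-') && matches!(chars.get(i + 1), Some(&c) if c != ']');
        if is_range {
            let (b, next) = read_atom(chars, i + 1);
            i = next;
            out.push_str(&a.text);
            // JS treats shorthand-bounded `A-B` as A, '-', B; Rust would raise
            // ClassRangeLiteral, so the hyphen is escaped. Otherwise keep the range.
            out.push_str(if a.shorthand || b.shorthand { "\\-" } else { "-" });
            out.push_str(&b.text);
        } else if a.text == "-" {
            // A standalone literal hyphen; `\-` is always literal in the Rust crate.
            out.push_str("\\-");
        } else {
            out.push_str(&a.text);
        }
    }
    if i < chars.len() {
        out.push(']');
        i += 1;
    }
    i
}

/// Rewrite a JS pattern so class hyphens JS reads as literals stay literal in Rust.
fn escape_js_class_hyphens(pattern: &str) -> String {
    let chars: Vec<char> = pattern.chars().collect();
    let mut out = String::with_capacity(pattern.len() + 8);
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            // Outside a class, copy escapes verbatim so `\[` never opens a class.
            '\\' => {
                out.push('\\');
                if let Some(&c) = chars.get(i + 1) {
                    out.push(c);
                }
                i += 2;
            }
            '[' => {
                out.push('[');
                i += 1;
                if chars.get(i) == Some(&'^') {
                    out.push('^');
                    i += 1;
                }
                i = translate_class_body(&chars, i, &mut out);
            }
            c => {
                out.push(c);
                i += 1;
            }
        }
    }
    out
}
```

For example, [\w-.] becomes [\w\-.]. [\\d-x] is left alone, because there the d is a literal following an escaped backslash, not a shorthand.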

Notes

Ralph Küpper added 2 commits June 17, 2026 05:09
…iers (semver/joi/winston)

Three independent false-rejections in JS->Rust regex translation made
valid patterns throw "Invalid regular expression: ...: invalid pattern":

1. Compiled-size cap (semver): the regex crate caps a compiled program at
   10 MiB and rejects larger ones as CompiledTooBig. semver's ReDoS-hardened
   safeRe rewrites (\d{1,256}, [...]{0,250}, ...) exceed that. JS has no such
   limit, so raise the budget to 64 MiB for both the regex crate and the
   fancy-regex delegate (build_std_regex / build_fancy_regex helpers).

2. Class hyphen adjacent to a shorthand (joi): inside a class, a '-' next to
   a \d/\w/\s shorthand or \p{...} property is a LITERAL hyphen in JS (a
   shorthand can't bound a range), but the regex crate reads it as a range and
   errors with ClassRangeLiteral. Escape such hyphens to \- during translation.

3. Trivial char classes / ANSI patterns (winston via @colors/colors): covered
   by the above plus a regression test for [0m] and escapeStringRegexp output.

Adds unit regression tests for all three. No version/CHANGELOG/Cargo.lock edits
(maintainer folds version + changelog at merge).
proggeramlug force-pushed the fix/regex-charclass-quantifier branch from b2ffb57 to c6543c0 on June 17, 2026 03:10
proggeramlug merged commit 679d696 into main on Jun 17, 2026
13 of 14 checks passed
proggeramlug deleted the fix/regex-charclass-quantifier branch on June 17, 2026 03:52