Add an experimental opt-in libfyaml backend#805
Open
hsbt wants to merge 16 commits into
Open
Conversation
Built only with --enable-libfyaml; without the flag the default libyaml backend is unchanged. The parser and emitter are reimplemented against libfyaml's event API in separate translation units guarded by PSYCH_USE_LIBFYAML, and the backend is not supported on Windows. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Scalar type resolution happens in ScalarScanner, not the C backend, so swapping to libfyaml alone still resolved yes/no/on/off to booleans. Key the boolean set on Psych::BACKEND so the libyaml default keeps the YAML 1.1 set while the experimental libfyaml backend follows 1.2. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds an experimental, opt-in libfyaml backend to Psych to enable YAML 1.2–compliant parsing/emitting when built with --enable-libfyaml, while keeping the existing libyaml backend as the default. The change introduces new libfyaml-backed C implementations for the parser/emitter, exposes backend identification/version info, updates boolean/scalar scanning for YAML 1.2 semantics, and adjusts the test suite + CI to be backend-aware.
Changes:
- Add
PSYCH_USE_LIBFYAMLbuild mode (--enable-libfyaml) with libfyaml-backed parser/emitter translation units andPsych::BACKEND/Psych.libfyaml_version. - Update scalar scanning and tests to reflect YAML 1.2 core schema behavior (notably
yes/no/on/offas strings). - Add a dedicated GitHub Actions workflow to compile and run the suite with the libfyaml backend on Linux/macOS.
Reviewed changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
ext/psych/extconf.rb |
Adds --enable-libfyaml build option, pkg-config detection, and Windows rejection. |
ext/psych/psych.h |
Switches header include between yaml.h and libfyaml.h based on PSYCH_USE_LIBFYAML. |
ext/psych/psych.c |
Exposes Psych::BACKEND; adds Psych.libfyaml_version; adapts version reporting under libfyaml. |
ext/psych/psych_parser.c |
Compiles out the libyaml parser when building with libfyaml. |
ext/psych/psych_parser_fy.c |
New libfyaml-backed parser implementation using libfyaml’s event API. |
ext/psych/psych_emitter.c |
Compiles out the libyaml emitter when building with libfyaml. |
ext/psych/psych_emitter_fy.c |
New libfyaml-backed emitter implementation mapping Psych events to libfyaml emission. |
lib/psych/scalar_scanner.rb |
Makes boolean scanning backend-aware (YAML 1.1 vs 1.2 boolean sets). |
test/psych/helper.rb |
Adds libfyaml? helper for backend-aware tests. |
test/psych/test_boolean.rb |
Updates boolean tests to reflect YAML 1.2 behavior and adds Norway-problem coverage. |
test/psych/test_psych.rb |
Adds backend/version assertions and skips unsupported/format-different cases under libfyaml. |
test/psych/test_string.rb |
Skips YAML 1.1 boolean quoting assertions under libfyaml; adds libfyaml-specific expectations. |
test/psych/visitors/test_to_ruby.rb |
Skips YAML 1.1 yes/no/on/off boolean visitor tests under libfyaml. |
test/psych/test_yaml.rb |
Skips a YAML-spec fixture that depends on YAML 1.1 boolean typing under libfyaml. |
test/psych/test_yaml_special_cases.rb |
Skips YAML 1.1 off boolean special-case under libfyaml. |
test/psych/test_tree_builder.rb |
Skips location assertions that differ under libfyaml. |
test/psych/test_symbol.rb |
Skips a known libfyaml emitter round-trip limitation for indicator-named symbols. |
test/psych/test_set.rb |
Skips strict dump-format assertion under libfyaml. |
test/psych/test_parser.rb |
Skips mark position assertions that differ under libfyaml. |
test/psych/test_omap.rb |
Skips tag-format assertion under libfyaml. |
test/psych/test_json_tree.rb |
Skips JSON flow formatting assertions under libfyaml. |
test/psych/test_exception.rb |
Adjusts exception context expectation (libfyaml diagnostics differ). |
test/psych/test_encoding.rb |
Skips encoding-related expectations where libfyaml errors/tags differ. |
test/psych/test_data.rb |
Skips strict dump-format assertion under libfyaml. |
test/psych/test_coder.rb |
Skips formatting/tag-synthesis expectations under libfyaml. |
test/psych/json/test_stream.rb |
Skips JSON flow formatting assertions under libfyaml. |
README.md |
Documents the experimental libfyaml backend, build flag, behavior differences, and backend detection. |
.github/workflows/libfyaml.yml |
Adds CI job that builds with --enable-libfyaml, verifies backend, and runs tests on Linux/macOS. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Create a fresh parser per parse instead of reusing one via
fy_parser_reset(), which left default tag handles unset and rejected
bare ("---"-less) tag-led documents. Recover the real message and
position by switching the parser's own diagnostic object to collect
mode; creating a replacement diag crashes libfyaml 0.9.6. Drop the
spurious empty tag directive libfyaml reports.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop the tag when plain_implicit or quoted_implicit is set, matching how libyaml omits a tag that the value resolves to on reload; otherwise nil emitted as "!<tag:yaml.org,2002:null>" instead of an empty scalar. Honor the plain hint when choosing the scalar style, and restore the Check_Type guards on anchor and tag so non-string arguments raise TypeError. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
libfyaml only consumes UTF-8, so a UTF-16 IO fed through the chunked reader reached it as raw bytes and was rejected as invalid UTF-8. When the IO's external encoding is UTF-16LE/BE, slurp the whole stream and transcode it first; a 2-byte unit could otherwise straddle a read boundary. Other non-UTF-8 encodings stay raw and libfyaml rejects them, matching psych's UTF-8/UTF-16-only IO contract (Shift_JIS still raises). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The xcalloc'd tag buffers leaked when StringValue, the tuple-length check, or the emit raised mid-way. Wrap the work in rb_ensure so the buffers are always freed, and keep the exported directive strings in a Ruby array so the GC cannot reclaim them while their C pointers are in use. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a libfyaml? helper and guard the tests that intentionally diverge on the experimental YAML 1.2 backend: yes/no/on/off are strings rather than booleans, flow collections and block scalars are formatted differently, tags and marks are reported differently, and non-ASCII tags/aliases are rejected. test_boolean asserts the 1.2 string result directly; the formatting and mark cases are skipped. The default libyaml build is unaffected (every guard keys off Psych::BACKEND). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Mark the reconstructed SyntaxError message UTF-8 instead of US-ASCII, so a diagnostic that embeds a multibyte snippet of the input does not raise Encoding::CompatibilityError when concatenated with UTF-8. Add RB_GC_GUARD for the anchor and tag strings in the emitter (matching the existing guard on the scalar value) so their C pointers cannot dangle if a GC runs inside fy_emit_event_create. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Cover what the libfyaml backend does distinctly, each placed with the concern it belongs to: the Psych::BACKEND and Psych.libfyaml_version checks in test_psych.rb, the YAML 1.2 "Norway problem" boolean case in test_boolean.rb, and the "1.1 booleans are not quoted" emission case in test_string.rb. The 1.2 assertions are skipped on the default libyaml backend so the same suite passes under both. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Build psych with --enable-libfyaml and run the suite on Linux and macOS (libfyaml is not supported on Windows). A verification step asserts that Psych::BACKEND is actually libfyaml so the job cannot silently fall back to libyaml. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Explain how to enable it (--enable-libfyaml), that it is unsupported on Windows, the YAML 1.2 boolean behavior it brings (the "Norway problem" fix), and how to check the active backend, with an experimental caveat. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The verify command was a plain YAML scalar, so the " #{Psych::BACKEND}"
started a YAML comment and truncated the shell command, leaving an
unterminated quote. Use a block scalar so the interpolation survives.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reject canonical output with NotImplementedError instead of silently ignoring the request, and honor the implicit flag in start_sequence and start_mapping so an implicit tag is not printed as a redundant verbose tag. Relax the unquoted-boolean dump test to allow an optional document end marker. Also correct the version docstrings and a stale comment about how the parser collects diagnostics. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The libfyaml-dev package on Ubuntu is 0.8, which segfaults psych's emitter, while 0.9.6 (used on macOS via Homebrew) passes the whole suite. Build the same 0.9.6 release from source in the Linux CI job, and reject libfyaml older than 0.9 in extconf so users get a clear error instead of a runtime crash. Also drop an unused variable in set_canonical. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
An adversarial review confirmed two backend-visible differences that were not written down. Scalars emitted with the default style can be formatted differently from libyaml, and Psych::SyntaxError#context is always nil because libfyaml keeps the whole diagnostic in #problem. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Record the rough parse/emit benchmark so users pick the backend on purpose: libfyaml is for YAML 1.2 semantics, and libyaml stays the choice when emit throughput matters since libfyaml dumps about 1.7x to 1.9x slower. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
68098bf to
d185ff2
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Psych is tied to libyaml, which implements YAML 1.1 and rejects some valid YAML 1.2 documents. This adds an experimental alternative backend built on libfyaml, a fully YAML 1.2 compliant parser and emitter, compiled only when psych is built with
--enable-libfyaml. Without the flag nothing changes and the default libyaml backend is used, and the flag is rejected on Windows where libfyaml is unsupported.Because libfyaml follows YAML 1.2, the YAML 1.1 booleans yes, no, on, and off load as plain strings rather than true or false, which resolves the "Norway problem" where the country code no became false. This is keyed on
Psych::BACKEND, so the libyaml default keeps its current behavior.The parser and emitter are reimplemented against libfyaml's event API in translation units guarded by
PSYCH_USE_LIBFYAML. Errors and positions are recovered from libfyaml's diagnostics, UTF-16 IO is transcoded to UTF-8, and scalar emission is matched to libyaml where it matters. The test suite is made backend-aware and gains positive coverage for the libfyaml behavior, and a new CI workflow builds with the flag on Linux and macOS and asserts the active backend before running the suite.The backend is experimental. Its output is valid YAML but is formatted differently from libyaml in places and a few emitter edge cases are not yet matched, so libyaml remains the supported default.