
Add fuzzer for json_decode module #38

Open
AdamKorcz wants to merge 1 commit into python:main from AdamKorcz:add-json-decode-fuzzer


AdamKorcz (Contributor) commented Apr 10, 2026

Fuzzes the CPython _json C module (Modules/_json.c) through JSONDecoder.decode() and JSONDecoder.raw_decode(), choosing between the two per input via FuzzedDataProvider. Input bytes are decoded as latin-1, so every byte value maps to a distinct code point and the full 0–255 byte space reaches the parser. By contrast, json.py decodes input as UTF-8 with errors="replace", which collapses every invalid sequence to U+FFFD and sharply shrinks the effective input space. This fuzzer also exercises raw_decode(), which json.py never calls, including its reporting of the end position when trailing data follows the JSON value. Finally, it drops the dumps/loads roundtrip so that effort goes into hardening the decoder rather than re-encoding objects that were already decoded successfully.

AdamKorcz requested a review from a team as a code owner on April 10, 2026 19:55
AdamKorcz marked this pull request as draft on April 10, 2026 20:34
AdamKorcz force-pushed the add-json-decode-fuzzer branch from 88fdd3d to dc037dc on April 11, 2026 21:00
AdamKorcz force-pushed the add-json-decode-fuzzer branch from dc037dc to 2ddd07f on April 22, 2026 20:32
AdamKorcz marked this pull request as ready for review on April 22, 2026 20:33
AdamKorcz force-pushed the add-json-decode-fuzzer branch from 2ddd07f to 2cf3a3a on April 22, 2026 21:02