
Bug: BodyPartReader.filename and read() leak bytearray instead of str/bytes, violating API contract and breaking JSON serialization #12404

@lxc902

Description

Describe the bug

The multipart reader surfaces bytearray values in places where the
documentation
(https://docs.aiohttp.org/en/stable/multipart_reference.html#aiohttp.BodyPartReader.filename)
and the type hints promise str or bytes. In aiohttp 3.13.x this leads to
TypeError: Object of type bytearray is not JSON serializable whenever
downstream code pipes the filename (or a decoded text field named
filename) through a JSON-based store or API response.

Per the docs:

BodyPartReader.filename — A str with the file name specified in the Content-Disposition header, or None if not specified.

And:

BodyPartReader.read(*, decode=False) — Reads body part data. … Returns: bytes.

However, in 3.13.x both entry points can surface bytearray:

  1. BodyPartReader.read(decode=True) returns the internal accumulation buffer unchanged when no
    Content-Transfer-Encoding/Content-Encoding header is present. That buffer is a bytearray, not bytes, so the annotated return
    type is violated.
  2. Reading a form field named filename via await part.read(decode=True) then chaining .strip() keeps it as bytearray, which
    silently flows into user dictionaries.
  3. When such a value is fed into json.dump, the encoder partially writes the opening of the object ({"filename": ) to the
    output stream before raising TypeError, which leaves applications with truncated, invalid JSON sidecar files on disk — a
    second-order corruption that is hard to diagnose after the fact.

This is not a None-or-missing case and it is not a charset/encoding edge case — the type itself is wrong.
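The partial-write behavior in point 3 can be demonstrated with the standard library alone. This sketch uses an in-memory buffer in place of a real sidecar file:

```python
import io
import json

buf = io.StringIO()
try:
    # json.dump streams chunks as it encodes, so the opening of the
    # object is already written before the encoder reaches the
    # unserializable bytearray value and raises TypeError.
    json.dump({"filename": bytearray(b"P1_submission.diff")}, buf)
except TypeError as e:
    print(e)  # Object of type bytearray is not JSON serializable

# The truncated 13-byte prefix is left behind in the output:
print(repr(buf.getvalue()))  # '{"filename": '
```

With a real file object instead of StringIO, that prefix is exactly what remains on disk after the exception.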

To Reproduce


Environment

  • aiohttp 3.13.4 (observed); CPython 3.12.13 on Ubuntu 24.04 / x86_64
  • Default server configuration (aiohttp web.Application, default middleware)
  • No custom multipart parser, no subclassing of BodyPartReader
  • Issue was traced through standard request.multipart() → reader.next() → part.read(...)

Steps to Reproduce

Minimal self-contained server:

repro_server.py

import json
from aiohttp import web

async def handle(request):
    reader = await request.multipart()
    while True:
        part = await reader.next()
        if part is None:
            break
        if part.name == "filename":
            data = await part.read(decode=True)
            # Documented return type: bytes. Observed: bytearray.
            print("type =", type(data).__name__, "value =", data)
            try:
                json.dumps({"filename": data})
            except TypeError as e:
                return web.json_response(
                    {"error": f"TypeError: {e}"}, status=500)
            return web.json_response({"ok": True})
    return web.json_response({"error": "no filename part"}, status=400)

app = web.Application()
app.router.add_post("/", handle)
web.run_app(app, port=8080)

Client:

curl -X POST http://localhost:8080/ \
  -F 'filename=P1_submission.diff'

Observed stdout:

type = bytearray value = bytearray(b'P1_submission.diff')

Observed HTTP response:

500
{"error": "TypeError: Object of type bytearray is not JSON serializable"}

Expected behavior


Actual vs Expected

| | Expected (per docs & type hints) | Observed (3.13.4) |
|---|---|---|
| BodyPartReader.filename | Optional[str] | str in most cases, but values originating from upstream parsing paths can surface as bytearray when fed through certain header-parsing branches |
| BodyPartReader.read(decode=True) | bytes | bytearray whenever no Content-Transfer-Encoding / Content-Encoding header is present (i.e. the common case) |
| Downstream json.dump on the value | succeeds | raises TypeError, after already emitting a partial object to the output stream |

Even if a decode step legitimately cannot produce a clean str (e.g. a
pathological filename), the API should preserve its declared return type
— for example by applying errors="replace" when converting to str,
or by explicitly returning bytes via bytes(buffer) rather than
leaking the internal bytearray. Users should not have to defensively
wrap str(...) / bytes(...) around every multipart read call just to
get the type the documentation promises.
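For reference, the defensive wrapper users currently have to write looks roughly like this (coerce_multipart is a hypothetical helper name, not an aiohttp API):

```python
import json


def coerce_multipart(value):
    """Force a multipart read() result back to its documented type.

    bytearray -> bytes, matching the annotated return type of
    BodyPartReader.read(); everything else passes through unchanged.
    """
    if isinstance(value, bytearray):
        return bytes(value)
    return value


# A bytearray leaked by read(decode=True) becomes JSON-safe again:
safe = coerce_multipart(bytearray(b"P1_submission.diff"))
assert isinstance(safe, bytes)
json.dumps({"filename": safe.decode("utf-8", errors="replace")})
```

Needing this after every read call is exactly the boilerplate the type contract is supposed to make unnecessary.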

Logs/tracebacks

Triggering the same bug against a real production handler (our own
  handle_upload_file), the exact traceback was:

  File "backend/server.py", line 1705, in handle_upload_file
      json.dump(meta, f)
    File ".../json/__init__.py", line 179, in dump
      for chunk in iterable:
    File ".../json/encoder.py", line 432, in _iterencode
      yield from _iterencode_dict(o, _current_indent_level)
    File ".../json/encoder.py", line 406, in _iterencode_dict
      yield from chunks
    File ".../json/encoder.py", line 439, in _iterencode
      o = _default(o)
    File ".../json/encoder.py", line 180, in default
      raise TypeError(f'Object of type {o.__class__.__name__} '
  TypeError: Object of type bytearray is not JSON serializable

  Over 225 identical occurrences were observed in a single production log
  file, which translated to an upstream 39 % HTTP 500 rate on the affected
  endpoint (observed via access-log status code tally).

Python Version

$ python --version
  Python 3.12.13

aiohttp Version

$ python -m pip show aiohttp
  Name: aiohttp
  Version: 3.13.4

multidict Version

$ python -m pip show multidict
  Name: multidict
  Version: 6.7.1

propcache Version

$ python -m pip show propcache
  Name: propcache
  Version: 0.4.1

yarl Version

$ python -m pip show yarl
  Name: yarl
  Version: 1.23.0

OS

OS: Ubuntu 24.04.2 LTS (Noble Numbat), x86_64, kernel 5.15.0-79-generic

Runtime: CPython 3.12.13 from the Anaconda distribution (statically linked installer, bin/python3)

Related component

Server

Additional context


Impact / Final Note

This is a type-contract regression that affects every user who pipes
multipart field data into any serializer with strict type requirements —
most prominently json.dump, but the same class of failure occurs in:

  • msgpack.packb (TypeError: can not serialize 'bytearray' object)
  • SQLAlchemy / ORM column binding for String columns
  • HTTP libraries that type-check body arguments (httpx, urllib3)
  • Pydantic / dataclass validation decorated with str fields

Because json.dump partially writes before raising, the failure mode is
particularly damaging: sidecar files / persistent metadata can end up
truncated mid-key on disk (we observed 47 orphaned files containing
exactly the 13-byte prefix {"filename": ), and the corruption does not
surface as a parse error until some future consumer reads them. Fixing
the type contract upstream in aiohttp is strictly safer than asking
every downstream user to add a defensive _coerce_str(...) helper after
every multipart read.

Suggested fix directions (any one resolves the observed symptom):

  1. In BodyPartReader.read, return bytes(data) unconditionally rather
    than handing back the internal bytearray buffer.
  2. In BodyPartReader.decode, wrap the passthrough branch in bytes().
  3. Tighten the return annotation to bytes and add a CI test that asserts
    isinstance(result, bytes) on decode=True results.
  4. For filename, ensure the return is always Optional[str] even when
    header parsing falls back to a raw buffer representation — preferably
    with errors="replace" so that exotic filenames degrade gracefully
    instead of breaking the type contract.
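As a rough illustration of directions (1) and (3), the passthrough branch would copy the buffer before returning, and a regression test would pin the declared type. The names below are illustrative only and do not mirror aiohttp's actual internals:

```python
def read_passthrough(buffer: bytearray) -> bytes:
    # Fix direction (1): copy the internal accumulation buffer into
    # an immutable bytes object instead of leaking the bytearray.
    return bytes(buffer)


# Fix direction (3): a regression test asserting the declared type.
result = read_passthrough(bytearray(b"P1_submission.diff"))
assert isinstance(result, bytes) and not isinstance(result, bytearray)
assert result == b"P1_submission.diff"
```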

Happy to submit a PR for (1) + (3) if that direction is agreeable.

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
