
Bug: BodyPartReader.filename and read() leak bytearray instead of str/bytes, violating API contract and breaking JSON serialization #12404

@lxc902

Description

Describe the bug

The multipart reader surfaces bytearray values in places where the
documentation
(https://docs.aiohttp.org/en/stable/multipart_reference.html#aiohttp.BodyPartReader.filename)
and the type hints promise str or bytes. In aiohttp 3.13.x this leads to
TypeError: Object of type bytearray is not JSON serializable whenever
downstream code pipes the filename (or a decoded text field named
filename) through a JSON-based store or API response.

Per the docs:

BodyPartReader.filename — A str with the file name specified in the Content-Disposition header, or None if not specified.

And:

BodyPartReader.read(*, decode=False) — Reads body part data. … Returns: bytes.

However, in 3.13.x both entry points can surface bytearray:

  1. BodyPartReader.read(decode=True) returns the internal accumulation buffer unchanged when no
    Content-Transfer-Encoding/Content-Encoding header is present. That buffer is a bytearray, not bytes, so the annotated return
    type is violated.
  2. Reading a form field named filename via await part.read(decode=True) then chaining .strip() keeps it as bytearray, which
    silently flows into user dictionaries.
  3. When such a value is fed into json.dump, the encoder partially writes the opening of the object ({"filename": ) to the
    output stream before raising TypeError, which leaves applications with truncated, invalid JSON sidecar files on disk — a
    second-order corruption that is hard to diagnose after the fact.

This is not a None-or-missing case and it is not a charset/encoding edge case — the type itself is wrong.
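The partial-write behavior in point 3 can be demonstrated with the standard library alone. This sketch uses an in-memory buffer in place of a real sidecar file:

```python
import io
import json

buf = io.StringIO()
try:
    # json.dump streams chunks as it encodes, so the opening of the
    # object is already written before the encoder reaches the
    # unserializable bytearray value and raises TypeError.
    json.dump({"filename": bytearray(b"P1_submission.diff")}, buf)
except TypeError as e:
    print(e)  # Object of type bytearray is not JSON serializable

# The truncated 13-byte prefix is left behind in the output:
print(repr(buf.getvalue()))  # '{"filename": '
```

With a real file object instead of StringIO, that prefix is exactly what remains on disk after the exception.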

To Reproduce


Environment

  • aiohttp 3.13.4 (observed); CPython 3.12.13 on Ubuntu 24.04 / x86_64
  • Default server configuration (aiohttp web.Application, default middleware)
  • No custom multipart parser, no subclassing of BodyPartReader
  • Issue was traced through standard request.multipart() → reader.next() → part.read(...)

Steps to Reproduce

Minimal self-contained server:

repro_server.py

import json
from aiohttp import web

async def handle(request):
    reader = await request.multipart()
    while True:
        part = await reader.next()
        if part is None:
            break
        if part.name == "filename":
            data = await part.read(decode=True)
            # Documented return type: bytes. Observed: bytearray.
            print("type =", type(data).__name__, "value =", data)
            try:
                json.dumps({"filename": data})
            except TypeError as e:
                return web.json_response(
                    {"error": f"TypeError: {e}"}, status=500)
            return web.json_response({"ok": True})
    return web.json_response({"error": "no filename part"}, status=400)

app = web.Application()
app.router.add_post("/", handle)
web.run_app(app, port=8080)

Client:

curl -X POST http://localhost:8080/ \
  -F 'filename=P1_submission.diff'

Observed stdout:

type = bytearray value = bytearray(b'P1_submission.diff')

Observed HTTP response:

500
{"error": "TypeError: Object of type bytearray is not JSON serializable"}

Expected behavior


Actual vs Expected

| | Expected (per docs & type hints) | Observed (3.13.4) |
|---|---|---|
| BodyPartReader.filename | Optional[str] | str in most cases, but values originating from upstream parsing paths can surface as bytearray when fed through certain header-parsing branches |
| BodyPartReader.read(decode=True) | bytes | bytearray whenever no Content-Transfer-Encoding / Content-Encoding header is present (i.e. the common case) |
| Downstream json.dump on the value | succeeds | raises TypeError, after already emitting a partial object to the output stream |

Even if a decode step legitimately cannot produce a clean str (e.g. a
pathological filename), the API should preserve its declared return type
— for example by applying errors="replace" when converting to str,
or by explicitly returning bytes via bytes(buffer) rather than
leaking the internal bytearray. Users should not have to defensively
wrap str(...) / bytes(...) around every multipart read call just to
get the type the documentation promises.
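For reference, the defensive wrapper users currently have to write looks roughly like this (coerce_multipart is a hypothetical helper name, not an aiohttp API):

```python
import json


def coerce_multipart(value):
    """Force a multipart read() result back to its documented type.

    bytearray -> bytes, matching the annotated return type of
    BodyPartReader.read(); everything else passes through unchanged.
    """
    if isinstance(value, bytearray):
        return bytes(value)
    return value


# A bytearray leaked by read(decode=True) becomes JSON-safe again:
safe = coerce_multipart(bytearray(b"P1_submission.diff"))
assert isinstance(safe, bytes)
json.dumps({"filename": safe.decode("utf-8", errors="replace")})
```

Needing this after every read call is exactly the boilerplate the type contract is supposed to make unnecessary.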

Logs/tracebacks

Triggering the same bug against a real production handler (our own
  handle_upload_file), the exact traceback was:

  File "backend/server.py", line 1705, in handle_upload_file
      json.dump(meta, f)
    File ".../json/__init__.py", line 179, in dump
      for chunk in iterable:
    File ".../json/encoder.py", line 432, in _iterencode
      yield from _iterencode_dict(o, _current_indent_level)
    File ".../json/encoder.py", line 406, in _iterencode_dict
      yield from chunks
    File ".../json/encoder.py", line 439, in _iterencode
      o = _default(o)
    File ".../json/encoder.py", line 180, in default
      raise TypeError(f'Object of type {o.__class__.__name__} '
  TypeError: Object of type bytearray is not JSON serializable

  Over 225 identical occurrences were observed in a single production log
  file, which translated to an upstream 39 % HTTP 500 rate on the affected
  endpoint (observed via access-log status code tally).

Python Version

$ python --version
  Python 3.12.13

aiohttp Version

$ python -m pip show aiohttp
  Name: aiohttp
  Version: 3.13.4

multidict Version

$ python -m pip show multidict
  Name: multidict
  Version: 6.7.1

propcache Version

$ python -m pip show propcache
  Name: propcache
  Version: 0.4.1

yarl Version

$ python -m pip show yarl
  Name: yarl
  Version: 1.23.0

OS

OS: Ubuntu 24.04.2 LTS (Noble Numbat), x86_64, kernel 5.15.0-79-generic

Runtime: CPython 3.12.13 from the Anaconda distribution (statically linked installer, bin/python3)

Related component

Server

Additional context


Impact / Final Note

This is a type-contract regression that affects every user who pipes
multipart field data into any serializer with strict type requirements —
most prominently json.dump, but the same class of failure occurs in:

  • msgpack.packb (TypeError: can not serialize 'bytearray' object)
  • SQLAlchemy / ORM column binding for String columns
  • HTTP libraries that type-check body arguments (httpx, urllib3)
  • Pydantic / dataclass validation decorated with str fields

Because json.dump partially writes before raising, the failure mode is
particularly damaging: sidecar files / persistent metadata can end up
truncated mid-key on disk (we observed 47 orphaned files containing
exactly the 13-byte prefix {"filename": ), and the corruption does not
surface as a parse error until some future consumer reads them. Fixing
the type contract upstream in aiohttp is strictly safer than asking
every downstream user to add a defensive _coerce_str(...) helper after
every multipart read.

Suggested fix directions (any one resolves the observed symptom):

  1. In BodyPartReader.read, return bytes(data) unconditionally rather
    than handing back the internal bytearray buffer.
  2. In BodyPartReader.decode, wrap the passthrough branch in bytes().
  3. Tighten the return annotation to bytes and add a CI test that asserts
    isinstance(result, bytes) on decode=True results.
  4. For filename, ensure the return is always Optional[str] even when
    header parsing falls back to a raw buffer representation — preferably
    with errors="replace" so that exotic filenames degrade gracefully
    instead of breaking the type contract.
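As a rough illustration of directions (1) and (3), the passthrough branch would copy the buffer before returning, and a regression test would pin the declared type. The names below are illustrative only and do not mirror aiohttp's actual internals:

```python
def read_passthrough(buffer: bytearray) -> bytes:
    # Fix direction (1): copy the internal accumulation buffer into
    # an immutable bytes object instead of leaking the bytearray.
    return bytes(buffer)


# Fix direction (3): a regression test asserting the declared type.
result = read_passthrough(bytearray(b"P1_submission.diff"))
assert isinstance(result, bytes) and not isinstance(result, bytearray)
assert result == b"P1_submission.diff"
```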

Happy to submit a PR for (1) + (3) if that direction is agreeable.

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
