Changes from 30 commits
40 commits
b833b83
Implement BytesIO.peek()
marcelm Jan 22, 2022
50a2cfb
📜🤖 Added by blurb_it.
blurb-it[bot] Apr 10, 2022
eaa7672
Document BytesIO.peek()
marcelm Nov 9, 2022
00457ae
Implement with the help of read_bytes()
marcelm Nov 9, 2022
882579d
Add to What’s New
marcelm Jul 8, 2023
c1eed72
versionadded: 3.12 -> 3.13
marcelm Jul 8, 2023
afc200c
Remove unused variable
marcelm Jul 8, 2023
79ab9a4
Test tell() after peek()
marcelm Sep 22, 2023
b493914
Update docs, factor out peek_bytes, semantics
marcelm Sep 22, 2023
2a1c85c
Update Misc/NEWS.d/next/Library/2022-04-10-20-10-59.bpo-46375.8j1ogZ.rst
marcelm Sep 28, 2023
d398717
Update Modules/_io/bytesio.c
marcelm Sep 28, 2023
26d1e81
Update Modules/_io/bytesio.c
marcelm Sep 28, 2023
9a19ff9
Use SemBr
marcelm Sep 28, 2023
9300ade
Update Doc/whatsnew/3.13.rst
marcelm Sep 28, 2023
d214089
Apply suggestions from code review
marcelm Sep 28, 2023
d6691b8
Use a context manager around memio in test_peek
marcelm Sep 28, 2023
3e51adb
Add more tests for tell() after peek()
marcelm Sep 28, 2023
3661b65
Document why size < 0 can happen
marcelm Sep 28, 2023
cd40d77
Update Modules/_io/bytesio.c
marcelm Sep 29, 2023
04372bd
Do not update pos if peek_bytes failed
marcelm Sep 29, 2023
6b9ae8c
Size can be negative after truncate or seek
marcelm Sep 29, 2023
f7406f6
Test with size<0 and size>len(buf)
marcelm Sep 29, 2023
d9528e2
Test peek() after write()
marcelm Sep 29, 2023
bc8134b
Document BufferedReader.peek and BytesIO.peek similarly
marcelm Sep 29, 2023
b6ffca8
Comment
marcelm Sep 29, 2023
5fe5645
Make it more explicit that size is ignored
marcelm Sep 29, 2023
4126a64
Return an empty bytes object for size=0
marcelm Oct 23, 2023
1ea40c2
Simplify
marcelm Oct 23, 2023
77e04d6
Test peek(3) and peek(5)
marcelm Oct 23, 2023
4d2f2dd
Run clinic.py
marcelm Apr 10, 2026
c16bebf
Apply suggestions from code review
marcelm Apr 13, 2026
08bd7da
Do not return an empty bytes object for size=0
marcelm Apr 11, 2026
6174fca
Decorate BytesIO.peek with `@critical_section`
marcelm Apr 15, 2026
b8b8cf4
Test peek returns EOF after seeking to EOF
marcelm Apr 21, 2026
7ac914e
Free-threading test for peek
marcelm Apr 21, 2026
abbd8f0
Fix free-threading test
marcelm Apr 27, 2026
07d9e4d
Default to size=0; cap to DEFAULT_BUFFER_SIZE
marcelm Apr 29, 2026
3d57f45
Revert changes to BufferedReader.peek documentation
marcelm Apr 30, 2026
023ad25
Test peek after seek or truncate
marcelm May 1, 2026
203749b
Merge branch 'main' into fix-issue-46375
emmatyping May 1, 2026
18 changes: 16 additions & 2 deletions Doc/library/io.rst
@@ -739,6 +739,15 @@ than raw I/O does.

Return :class:`bytes` containing the entire contents of the buffer.

.. method:: peek(size=1, /)
Contributor

Why not default to size=0 like BufferedReader?

Contributor

I read through the comments. In general, anything that returns bytes rather than a bytes-like memoryview requires allocating and copying potentially many bytes. If the copy is really a concern, I'd lean toward returning a memoryview rather than bytes, which always requires a copy.

Author

memoryview came up in this comment and the three following it: #30808 (comment)

I don’t know what the conclusion is given your comment. Should a memoryview be returned instead? Most important to me is compatibility with what BufferedReader.peek() returns.

I am not too concerned about the extra memory for a bytes object as long as the default is size=1.

Contributor

👍 to not using memoryview / definitely a good rationale in that thread.

On 3.14.0 BufferedReader (intentionally limiting buffer size) gives:

Python 3.14.0 (v3.14.0:ebf955df7a8, Oct  7 2025, 08:20:14) [Clang 16.0.0 (clang-1600.0.26.6)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> br = open('README.rst', 'rb', buffering=30)
>>> def show_start_end(buf):
...     print(f'{len(buf)=}|{buf=}')
... 
>>> show_start_end(br.peek())
len(buf)=30|buf=b'This is Python version 3.15.0 '
>>> show_start_end(br.peek(0))
len(buf)=30|buf=b'This is Python version 3.15.0 '
>>> show_start_end(br.peek(1))
len(buf)=30|buf=b'This is Python version 3.15.0 '
>>> _ = br.read(len(br.peek()))
>>> show_start_end(br.peek())
len(buf)=30|buf=b'alpha 7\n======================'

That doesn't match the behavior or the default here. This concerns me in particular because people will likely build and test pieces of an I/O stack (e.g. file parsing) against the .peek behavior of BytesIO and then get something different when they read actual files or data.

Looking for more alternatives, the documentation page says (https://docs.python.org/3/library/io.html#io.BufferedReader.peek):

Return bytes from the stream without advancing the position. The number of bytes returned may be less or more than requested. If the underlying raw stream is non-blocking and the operation would block, returns empty bytes.

To me that leaves an intentional gap where we can always return less data than the total amount available. Peek gives no guarantees. That makes me wonder: Could we default to 0 and just always slice [:DEFAULT_BUFFER_SIZE]?

That would mean you always get DEFAULT_BUFFER_SIZE bytes, which matches the default BufferedReader behavior, unless less data is available. If called in a loop, yes, that's a repeated copy of DEFAULT_BUFFER_SIZE bytes (BufferedIO also does that, since returning bytes requires a copy), but it's a lot less than "the whole buffer".
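A minimal sketch of the "always slice to DEFAULT_BUFFER_SIZE" idea, written as a standalone helper rather than as the method this PR adds. The name `peek_capped` and its `cap` parameter are hypothetical and exist only for illustration; the actual implementation lives in `Modules/_io/bytesio.c`.

```python
import io


def peek_capped(stream: io.BytesIO, cap: int = io.DEFAULT_BUFFER_SIZE) -> bytes:
    """Return up to `cap` bytes from the current position without advancing it.

    Hypothetical helper for illustration only; not the code in this PR.
    """
    pos = stream.tell()
    # getbuffer() exposes the contents without copying them;
    # only the slice handed to bytes() is actually copied.
    with stream.getbuffer() as view:
        return bytes(view[pos:pos + cap])


big = io.BytesIO(b"x" * (io.DEFAULT_BUFFER_SIZE + 10))
small = io.BytesIO(b"abc")
print(len(peek_capped(big)), io.DEFAULT_BUFFER_SIZE)  # the result is capped
print(peek_capped(small), small.tell())               # short data is returned whole
```

With this shape, the cost of a call is bounded by the cap no matter how large the underlying buffer is, which is the property the proposal is after.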

Author

I’m fine with changing the default size, but let me describe what my thought was.

My concern with choosing a default size larger than 1 was that people might use BytesIO to test parsing code that relies on peek always returning a certain number of bytes, and that the same code could then fail outside of tests because BufferedReader.peek only guarantees a single byte (if available). This does happen when you’re almost at the end of the buffer. Copying from an earlier comment:

$ python3 -c 'f=open("README.rst", "rb");f.read(8191);print(f.peek())'
b'r'

My thought was that by defaulting to size=1, people would be forced to at least think about this edge case. It felt like the more "conservative" choice.

That said, maybe peek is mostly used at the start of a file, and in that case, even BufferedReader.peek can be relied on to return not just a single byte.
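The edge case described above can be reproduced without a real file. This is a sketch that assumes a deliberately tiny `buffer_size` (the constant `BUFFER_SIZE = 8` is an arbitrary choice for illustration) so that the end of the internal buffer is easy to reach. How many bytes `peek` returns at that point is an implementation detail, so the code only relies on getting at least one.

```python
import io

# A small buffer makes the edge case easy to reach; real files typically
# use io.DEFAULT_BUFFER_SIZE instead.
BUFFER_SIZE = 8
data = b"0123456789abcdef"

reader = io.BufferedReader(io.BytesIO(data), buffer_size=BUFFER_SIZE)
reader.read(BUFFER_SIZE - 1)       # stop one byte short of the buffer's end
position_before = reader.tell()
peeked = reader.peek(4)            # the size argument is only a hint

# Robust code inspects what it was given and never assumes a length.
print(f"asked for 4, got {len(peeked)} byte(s): {peeked!r}")
print(reader.tell() == position_before)  # peek does not move the position
```

Parsing code tested only against `BytesIO` would not hit this situation unless the test deliberately asks for a short peek.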

Contributor

I definitely sympathize with wanting the size to vary over time, but I'm not sure there is a simple way to do that in a single change. For BufferedReader there end up being two cases here:

  1. Non-blocking fd: peek() returns as much data as is currently buffered (up to DEFAULT_BUFFER_SIZE). If there is no buffered data, it attempts a read() to get some. If that returns no data or would block, peek() returns an empty buffer.
  2. Blocking fd: similar to non-blocking, with the caveat that on many kinds of files on many systems, but not all, no data being available means "end of file".

In both cases a peek that returns a single byte can happen as part of regular operation, as can shorter or longer results.

I'm most used to peek being used in parsing algorithms, where looking ahead is often a way to disambiguate. Code that needs a fixed readahead usually has to build some other mechanism so that the read buffer is "guaranteed filled" to a minimum size, which CPython's .peek() on buffered readers doesn't provide at the moment.
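One possible shape for such a "guaranteed filled" mechanism is a thin wrapper that keeps its own pending bytes. This is only a sketch under assumptions: the class name `LookaheadReader` and the method `peek_exactly` are hypothetical, nothing like it is part of this PR, and it targets blocking streams (a non-blocking stream that returns `None` simply ends the fill loop early).

```python
import io


class LookaheadReader:
    """Hypothetical wrapper giving a fixed-size lookahead over a binary stream."""

    def __init__(self, stream):
        self._stream = stream
        self._pending = b""   # bytes read from the stream but not yet consumed

    def peek_exactly(self, n: int) -> bytes:
        """Return n bytes without consuming them; fewer only at end of stream."""
        while len(self._pending) < n:
            chunk = self._stream.read(n - len(self._pending))
            if not chunk:     # b"" at EOF, or None if the stream would block
                break
            self._pending += chunk
        return self._pending[:n]

    def read(self, n: int) -> bytes:
        """Consume and return up to n bytes, serving pending bytes first."""
        data = self.peek_exactly(n)
        self._pending = self._pending[len(data):]
        return data


# The wrapped reader's own buffer size no longer limits the lookahead.
raw = io.BufferedReader(io.BytesIO(b"hello world"), buffer_size=4)
reader = LookaheadReader(raw)
print(reader.peek_exactly(5))
print(reader.read(6))
print(reader.peek_exactly(100))
```

The trade-off is an extra copy and a second buffer layer, which is why having the guarantee inside the buffered reader itself would be simpler for lexers and parsers.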

One of my thoughts was a potential feature extension on top of this BytesIO work: being able to specify a "buffer size" window that is read through, which would give the "peek return length varies" behavior. Ideally, in time, that could be paired with changing BufferedReader.peek to try to guarantee a specific length of available data, which would make lexers and parsers simpler to write.

I'm not sure about the risk of changing argument defaults in CPython and would need to ask core developers with relevant experience. In general CPython is pretty hesitant to make incompatible changes (https://peps.python.org/pep-0387/#making-incompatible-changes), and I'm not sure where exactly changing a default argument from 1 to 0 would fit in that framework. That makes me hesitant to start with a value distinct from BufferedReader's, because it may be difficult to change later.

I do think your "definitely not the whole file" answer for BytesIO.peek is good design here. The best approach I have so far is to always limit to a fixed length (which matches BufferedReader in some ways: you get the whole file if it fits). I'm open to other approaches as well; I prefer to keep the design internal to the BytesIO implementation rather than in function signatures or default arguments, as that is easier to change.

Contributor

I reached out to some core developers to learn about the risks from both the "duck typing" and the backwards-incompatibility angles. I will hopefully have clear guidance shortly and find out from people with a lot more experience whether size=1 is actually risky.

Contributor

Summarizing from the discussion: adding a cap is within the documented behavior and unlikely to break anything. If we wanted to document the cap and guarantee the behavior, we would likely want to add a new argument (e.g. max_size). Changing size= to be generally meaningful could definitely be nice.

So let's match the default of size=0 and cap at DEFAULT_BUFFER_SIZE for now; that's the path I think is viable before feature freeze next Tuesday. We can expand in the future if needed, and adjusting or refining in 3.16 is very doable.

Author

Ok, I changed the default to size=0 and added a cap of DEFAULT_BUFFER_SIZE, which is always applied even if an explicit size argument was provided. Note that DEFAULT_BUFFER_SIZE has recently been increased from 8K to 128K.


Return bytes from the current position onwards without advancing the position.
At least one byte of data is returned if not at EOF.
Return an empty :class:`bytes` object at EOF.
If the size argument is negative or larger than the number of available bytes,
a copy of the buffer from the current position until the end is returned.

.. versionadded:: 3.15

.. method:: read1(size=-1, /)

@@ -772,8 +781,13 @@ than raw I/O does.

.. method:: peek(size=0, /)

Return bytes from the stream without advancing the position. The number of
bytes returned may be less or more than requested. If the underlying raw
Return bytes from the current position onwards without advancing the position.
At least one byte of data is returned if not at EOF.
Return an empty :class:`bytes` object at EOF.
At most one single read on the underlying raw stream is done to satisfy the call.
The *size* argument is ignored.
The number of read bytes depends on the buffer size and the current position in the internal buffer.
If the underlying raw
Author (marcelm, Apr 29, 2026)

@cmaloney Can you confirm whether this change of the io.BufferedReader.peek documentation is fine?

Contributor

I actually think it's inaccurate (it doesn't guarantee a single read since the PEP 475 EINTR retry was added). To me it is out of scope for this change (adding BytesIO.peek). Possibly worth fixing in a separate GitHub issue and discussion.

stream is non-blocking and the operation would block, returns empty bytes.

.. method:: read(size=-1, /)
8 changes: 8 additions & 0 deletions Doc/whatsnew/3.15.rst
@@ -840,6 +840,14 @@ inspect
for :func:`~inspect.getdoc`.
(Contributed by Serhiy Storchaka in :gh:`132686`.)


io
--

* Add :meth:`io.BytesIO.peek`.
(Contributed by Marcel Martin in :gh:`90533`.)


json
----

7 changes: 7 additions & 0 deletions Lib/_pyio.py
@@ -996,6 +996,13 @@ def tell(self):
            raise ValueError("tell on closed file")
        return self._pos

    def peek(self, size=1):
        if self.closed:
            raise ValueError("peek on closed file")
        if size < 0:
            return self._buffer[self._pos:]
        return self._buffer[self._pos:self._pos + size]

    def truncate(self, pos=None):
        if self.closed:
            raise ValueError("truncate on closed file")
42 changes: 42 additions & 0 deletions Lib/test/test_io/test_memoryio.py
@@ -566,6 +566,48 @@ def test_issue141311(self):
buf = bytearray(2)
self.assertEqual(0, memio.readinto(buf))

    def test_peek(self):
        buf = self.buftype("1234567890")
        with self.ioclass(buf) as memio:
            self.assertEqual(memio.tell(), 0)
            self.assertEqual(memio.peek(1), buf[:1])
            self.assertEqual(memio.peek(1), buf[:1])
            self.assertEqual(memio.peek(), buf[:1])
            self.assertEqual(memio.peek(3), buf[:3])
            self.assertEqual(memio.peek(5), buf[:5])
            self.assertEqual(memio.peek(0), b"")
            self.assertEqual(memio.peek(len(buf) + 100), buf)
            self.assertEqual(memio.peek(-1), buf)
            self.assertEqual(memio.tell(), 0)
            memio.read(1)
            self.assertEqual(memio.tell(), 1)
            self.assertEqual(memio.peek(1), buf[1:2])
            self.assertEqual(memio.peek(), buf[1:2])
            self.assertEqual(memio.peek(3), buf[1:4])
            self.assertEqual(memio.peek(5), buf[1:6])
            self.assertEqual(memio.peek(0), b"")
            self.assertEqual(memio.peek(len(buf) + 100), buf[1:])
            self.assertEqual(memio.peek(-1), buf[1:])
            self.assertEqual(memio.tell(), 1)
            memio.read()
            self.assertEqual(memio.tell(), len(buf))
            self.assertEqual(memio.peek(1), self.EOF)
            self.assertEqual(memio.peek(3), self.EOF)
            self.assertEqual(memio.peek(5), self.EOF)
            self.assertEqual(memio.peek(0), b"")
            self.assertEqual(memio.tell(), len(buf))
            # Peeking works after writing
            abc = self.buftype("abc")
            memio.write(abc)
            self.assertEqual(memio.peek(), self.EOF)
            memio.seek(len(buf))
            self.assertEqual(memio.peek(), abc[:1])
            self.assertEqual(memio.peek(-1), abc)
            self.assertEqual(memio.peek(len(abc) + 100), abc)
            self.assertEqual(memio.tell(), len(buf))

        self.assertRaises(ValueError, memio.peek)

    def test_unicode(self):
        memio = self.ioclass()

@@ -0,0 +1 @@
Add :meth:`io.BytesIO.peek`.
48 changes: 45 additions & 3 deletions Modules/_io/bytesio.c
@@ -420,8 +420,9 @@ _io_BytesIO_tell_impl(bytesio *self)
return PyLong_FromSsize_t(self->pos);
}

// Read without advancing position
static PyObject *
read_bytes_lock_held(bytesio *self, Py_ssize_t size)
peek_bytes_lock_held(bytesio *self, Py_ssize_t size)
{
_Py_CRITICAL_SECTION_ASSERT_OBJECT_LOCKED(self);

@@ -432,7 +433,6 @@ read_bytes_lock_held(bytesio *self, Py_ssize_t size)
if (size > 1 &&
self->pos == 0 && size == PyBytes_GET_SIZE(self->buf) &&
FT_ATOMIC_LOAD_SSIZE_RELAXED(self->exports) == 0) {
self->pos += size;
return Py_NewRef(self->buf);
}

@@ -444,10 +444,19 @@ read_bytes_lock_held(bytesio *self, Py_ssize_t size)
}

output = PyBytes_AS_STRING(self->buf) + self->pos;
self->pos += size;
return PyBytes_FromStringAndSize(output, size);
}

static PyObject *
read_bytes_lock_held(bytesio *self, Py_ssize_t size)
{
PyObject *bytes = peek_bytes_lock_held(self, size);
if (bytes != NULL) {
self->pos += size;
}
return bytes;
}

/*[clinic input]
@critical_section
_io.BytesIO.read
@@ -499,6 +508,38 @@ _io_BytesIO_read1_impl(bytesio *self, Py_ssize_t size)
return _io_BytesIO_read_impl(self, size);
}


/*[clinic input]
_io.BytesIO.peek
size: Py_ssize_t = 1
/

Return bytes from the stream without advancing the position.

If the size argument is negative, read until EOF is reached.
Return an empty bytes object at EOF.
[clinic start generated code]*/

static PyObject *
_io_BytesIO_peek_impl(bytesio *self, Py_ssize_t size)
/*[clinic end generated code: output=fa4d8ce28b35db9b input=1510f0fcf77c0048]*/
{
CHECK_CLOSED(self);

/* adjust invalid sizes */
Py_ssize_t n = self->string_size - self->pos;
if (size < 0 || size > n) {
size = n;
/* n can be negative after truncate() or seek() */
if (size < 0) {
size = 0;
}
}
return peek_bytes_lock_held(self, size);
}
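
For readers less familiar with the C code, the size adjustment in `_io_BytesIO_peek_impl` as shown in this revision of the diff can be restated in Python. This is an illustrative sketch only: `clamp_peek_size` is a hypothetical name, and later commits in the PR (default `size=0`, cap at `DEFAULT_BUFFER_SIZE`) change the behavior beyond what is shown here.

```python
def clamp_peek_size(size: int, string_size: int, pos: int) -> int:
    """Mirror the size adjustment from the hunk above (illustrative only)."""
    available = string_size - pos  # negative if pos is past the end of the data
    if size < 0 or size > available:
        # Negative or oversized requests fall back to "everything available",
        # and a position past the end yields zero bytes instead of an error.
        size = max(available, 0)
    return size


for args in [(-1, 10, 0), (100, 10, 1), (3, 10, 0), (3, 10, 12)]:
    print(args, "->", clamp_peek_size(*args))
```

The last case corresponds to the comment in the C code: after `seek()` past the end, or after `truncate()` below the current position, `string_size - pos` is negative and has to be clamped to zero.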



/*[clinic input]
@critical_section
_io.BytesIO.readline
@@ -1135,6 +1176,7 @@ static struct PyMethodDef bytesio_methods[] = {
_IO_BYTESIO_READLINE_METHODDEF
_IO_BYTESIO_READLINES_METHODDEF
_IO_BYTESIO_READ_METHODDEF
_IO_BYTESIO_PEEK_METHODDEF
_IO_BYTESIO_GETBUFFER_METHODDEF
_IO_BYTESIO_GETVALUE_METHODDEF
_IO_BYTESIO_SEEK_METHODDEF
48 changes: 47 additions & 1 deletion Modules/_io/clinic/bytesio.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
