Issue 36111: Non-zero offsets are no longer acceptable with SEEK_END/SEEK_CUR implementation of seek in python3 when in text mode, breaking py 2.x behavior/POSIX

Created on 2019-02-25 22:12 by ngie, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (6)

msg336564

Author: Enji Cooper (ngie) *

Date: 2019-02-25 22:12

I tried using os.SEEK_END in a technical interview, but unfortunately, that didn't work with Python 3.x:

    pinklady:cpython ngie$ python3
    Python 3.7.2 (default, Feb 12 2019, 08:15:36)
    [Clang 10.0.0 (clang-1000.11.45.5)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> fp = open("configure"); fp.seek(-100, os.SEEK_END)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    io.UnsupportedOperation: can't do nonzero end-relative seeks

It does, however, work with 2.x, consistent with the POSIX specification, as shown below:

    pinklady:cpython ngie$ python
    Python 2.7.15 (default, Oct 2 2018, 11:47:18)
    [GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.2)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> fp = open("configure"); fp.seek(-100, os.SEEK_END)
    >>> fp.tell()
    501076
    >>> os.stat("configure").st_size
    501176

msg336565

Author: Steven D'Aprano (steven.daprano) * (Python committer)

Date: 2019-02-25 22:33

I believe you will find that this is because you opened the file in text mode, which means Unicode, not bytes. If you open it in binary mode, the POSIX spec applies:

    py> fp = open("sample", "rb"); fp.seek(-100, os.SEEK_END)
    350

Supported values for seeking in text (Unicode) files are documented here:

https://docs.python.org/3/library/io.html#io.TextIOBase.seek

I don't believe this is a bug, or possible to be changed. Do you still think otherwise? If not, we should close this ticket.
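The rules on the linked page can be checked directly. The sketch below is illustrative only. It assumes a throwaway UTF-8 sample file created for the demo, and the helper name try_text_seek is hypothetical. It opens the file in text mode and tries a few seeks:

- A zero offset with SEEK_END is accepted and returns the end position.
- Nonzero offsets with SEEK_END or SEEK_CUR raise io.UnsupportedOperation, as in the traceback in the first message.

```python
import io
import os
import tempfile


def try_text_seek(path, offset, whence):
    """Hypothetical helper: try a text-mode seek and report what happened."""
    with open(path, "r", encoding="utf-8") as fp:
        try:
            return fp.seek(offset, whence)
        except io.UnsupportedOperation as exc:
            return f"UnsupportedOperation: {exc}"


# Throwaway sample file: 10 lines of ASCII, written as raw bytes (110 bytes).
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"0123456789\n" * 10)
    sample = tmp.name

end_zero = try_text_seek(sample, 0, os.SEEK_END)        # allowed: returns 110
end_nonzero = try_text_seek(sample, -100, os.SEEK_END)  # rejected in text mode
cur_nonzero = try_text_seek(sample, 5, os.SEEK_CUR)     # rejected in text mode
os.remove(sample)

print(end_zero)
print(end_nonzero)
print(cur_nonzero)
```

Per the same documentation, the supported way to return to an earlier position in a text file is SEEK_SET with a value previously returned by tell().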

msg336567

Author: Enji Cooper (ngie) *

Date: 2019-02-25 22:42

?!

Being blunt: why should opening a file in binary vs text mode matter? POSIX doesn't make this distinction.

Per the pydoc (https://docs.python.org/2/library/functions.html#open):

The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading.

If this is one of the only differentiators between binary and text mode, why should certain types of seeking be made impossible?

Having to stat the file, then set the cursor to the file size minus the offset, breaks the 'seek(..)' interface, and having to use 'rb', then convert from bytes to Unicode, overly complicates things :(.
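For comparison, the "use 'rb', then convert from bytes" workaround can be kept fairly small. This is an illustrative sketch, not an API from the thread or the standard library; the helper name read_tail_text and its parameters are hypothetical. It works as follows:

- It seeks relative to the end in binary mode.
- It clamps the offset so it never seeks before byte 0.
- It decodes with errors="replace", because a byte-based cut can land in the middle of a multi-byte UTF-8 character.

Unlike text mode, no newline translation is applied.

```python
import os
import tempfile


def read_tail_text(path, nbytes, encoding="utf-8"):
    """Hypothetical helper: decode roughly the last `nbytes` bytes of a file.

    Nonzero SEEK_END offsets are allowed in binary mode. The cut may split a
    multi-byte character, so undecodable bytes become U+FFFD instead of raising.
    """
    with open(path, "rb") as fp:
        size = fp.seek(0, os.SEEK_END)
        fp.seek(-min(nbytes, size), os.SEEK_END)  # clamp: never seek before byte 0
        return fp.read().decode(encoding, errors="replace")


# Throwaway sample written as raw UTF-8 bytes (14 bytes: 'é' and 'ö' take 2 each).
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write("héllo wörld\n".encode("utf-8"))
    sample = tmp.name

tail6 = read_tail_text(sample, 6)     # 'örld\n'
tail5 = read_tail_text(sample, 5)     # cut inside 'ö' -> '\ufffdrld\n'
whole = read_tail_text(sample, 1000)  # clamped to the whole file
os.remove(sample)

print(repr(tail6), repr(tail5), repr(whole))
```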

msg336588

Author: Enji Cooper (ngie) *

Date: 2019-02-26 00:29

Opening and seeking using SEEK_END worked in text mode with Python 2.7. I'm not terribly sure why 3.x should depart from this behavior:

    >>> fp = open("configure", "rt"); fp.seek(-100, os.SEEK_END)
    >>> fp.tell()
    501076
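This sketch shows one way to get close to the 2.7 session above in Python 3. It assumes the goal is to keep reading text from a point near the end, with universal-newline translation and line iteration.

- Seek in binary mode, where nonzero end-relative offsets are allowed.
- Wrap the binary file in io.TextIOWrapper, which decodes from the binary file's current position.

The helper name open_text_near_end is hypothetical. As with any byte-based seek, the starting point may fall inside a multi-byte character, hence errors="replace".

```python
import io
import os
import tempfile


def open_text_near_end(path, nbytes, encoding="utf-8"):
    """Hypothetical helper: return a text stream positioned about `nbytes`
    bytes before end of file (clamped to the start of the file)."""
    raw = open(path, "rb")
    try:
        size = raw.seek(0, os.SEEK_END)
        raw.seek(-min(nbytes, size), os.SEEK_END)  # nonzero end-relative seek is fine in binary mode
    except BaseException:
        raw.close()
        raise
    # TextIOWrapper decodes from the binary file's current position and, with the
    # default newline=None, translates '\r\n' and '\r' to '\n'.
    return io.TextIOWrapper(raw, encoding=encoding, errors="replace")


# Throwaway sample with Windows-style line endings (32 bytes in total).
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"line one\r\nline two\r\nline three\r\n")
    sample = tmp.name

with open_text_near_end(sample, 22) as fp:  # 22 bytes before EOF is the start of "line two"
    tail_lines = fp.readlines()
os.remove(sample)

print(tail_lines)
```

Closing the wrapper also closes the underlying binary file, so a single with block is enough.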

msg336606

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2019-02-26 05:26

This has no relation to POSIX, since POSIX says nothing about Unicode files. "Text mode" in POSIX means binary files with converted newlines. This mode is not supported in Python 3.
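A minimal illustration of the point about Unicode files follows; it is not how the io module is implemented, and the sample string is arbitrary. In a variable-width encoding such as UTF-8, a character offset does not map to a fixed byte offset. So "100 characters before the end" cannot be turned into a byte position without decoding. This is also why the documentation describes the value returned by TextIOBase.tell() as an opaque number rather than a byte or character count.

```python
# UTF-8 is variable-width: 'ï' and 'é' each take two bytes.
text = "naïve café\n"
data = text.encode("utf-8")
print(len(text), len(data))  # 11 characters, 13 bytes

# Stepping back 5 characters and stepping back 5 bytes land in different places.
last5_chars = text[-5:]
last5_bytes = data[-5:]
print(repr(last5_chars), len(last5_chars.encode("utf-8")))  # 'café\n' occupies 6 bytes
print(repr(last5_bytes.decode("utf-8")))                     # 'afé\n' is only 4 characters
```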

msg336617

Author: Inada Naoki (methane) * (Python committer)

Date: 2019-02-26 06:32

If you want byte IO, you can use "rb" mode. You can seek on it.
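Here is a short sketch of the binary-mode route suggested above, using a throwaway file with made-up contents. A buffered binary file accepts nonzero offsets with both SEEK_END and SEEK_CUR, and seek() returns the new absolute byte position.

```python
import os
import tempfile

# Throwaway sample: 30 bytes of repeating digits.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"0123456789" * 3)
    sample = tmp.name

with open(sample, "rb") as fp:
    pos_end = fp.seek(-10, os.SEEK_END)  # 10 bytes before EOF -> position 20
    chunk = fp.read(4)                   # b'0123', position is now 24
    pos_cur = fp.seek(-2, os.SEEK_CUR)   # back 2 bytes from 24 -> position 22
    after = fp.read(3)                   # b'234'
os.remove(sample)

print(pos_end, chunk, pos_cur, after)
```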

History

| Date | User | Action | Args |
| --- | --- | --- | --- |
| 2022-04-11 14:59:11 | admin | set | github: 80292 |
| 2019-02-26 06:32:35 | methane | set | nosy: + methane; messages: + |
| 2019-02-26 06:31:08 | methane | set | status: open -> closed; resolution: not a bug; stage: resolved |
| 2019-02-26 05:26:43 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + |
| 2019-02-26 03:43:53 | ngie | set | title: Non-zero `offset`s are no longer acceptable with implementation of `seek` in some cases with python3 when in text mode; should be per POSIX -> Non-zero `offset`s are no longer acceptable with SEEK_END/SEEK_CUR implementation of `seek` in python3 when in text mode, breaking py 2.x behavior/POSIX |
| 2019-02-26 03:43:02 | ngie | set | title: Negative `offset` values are no longer acceptable with implementation of `seek` with python3 when in text mode; should be per POSIX -> Non-zero `offset`s are no longer acceptable with implementation of `seek` in some cases with python3 when in text mode; should be per POSIX |
| 2019-02-26 00:30:14 | ngie | set | versions: + Python 3.4, Python 3.5, Python 3.6, Python 3.7, Python 3.8 |
| 2019-02-26 00:29:37 | ngie | set | messages: + |
| 2019-02-25 22:44:15 | ngie | set | title: Negative `offset` values are no longer acceptable with implementation of `seek` with python3; should be per POSIX -> Negative `offset` values are no longer acceptable with implementation of `seek` with python3 when in text mode; should be per POSIX |
| 2019-02-25 22:42:10 | ngie | set | messages: + |
| 2019-02-25 22:33:23 | steven.daprano | set | nosy: + steven.daprano; messages: + |
| 2019-02-25 22:12:02 | ngie | create | |