Issue 32978: Issues with reading large float values in AIFC files (original) (raw)

Created on 2018-03-01 16:50 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL	Status	Linked	Edit
PR 5952	open	serhiy.storchaka,2018-03-01 16:53

Messages (10)
msg313098 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2018-03-01 16:50
The frequency rate is saved as a 80-bit floating point value in AIFC files. There are issues with reading large values. 1. Values with maximal exponent are read as aifc._HUGE_VAL which is less then sys.float_info.max. Thus greater values can be read as lesser values. >>> aifc._read_float(io.BytesIO(b'\x7f\xff\xff\xff\xff\xff\xff\xff\xf8\x00')) 1.79769313486231e+308 >>> aifc._read_float(io.BytesIO(b'\x43\xfe\xff\xff\xff\xff\xff\xff\xf8\x00')) 1.7976931348623157e+308 >>> aifc._read_float(io.BytesIO(b'\x43\xfe\xff\xff\xff\xff\xff\xff\xff\xff')) inf 2. If exponent is not maximal, but large enough, this cause an OverflowError. It would be better consistently return the maximal value or inf. >>> aifc._read_float(io.BytesIO(b'\x44\xfe\xff\xff\xff\xff\xff\xff\xf8\x00')) Traceback (most recent call last): File "", line 1, in File "/home/serhiy/py/cpython3.7/Lib/aifc.py", line 198, in _read_float f = (himant * 0x100000000 + lomant) * pow(2.0, expon - 63) OverflowError: (34, 'Numerical result out of range') OverflowError when read a broken aifc file can be unexpected. The proposed PR tries to make reading floats more consistent. I'm not sure it is correct.
msg313102 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2018-03-01 18:12
I'm finding it hard to imagine why you'd ever have a sample rate greater than 1e308 in an audio file. (In fact, it's hard to imagine needing a sample rate greater than 1e6.) Raising an OverflowError for a value that's too large to fit in an IEEE 754 binary64 float seems entirely reasonable.
msg313107 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2018-03-01 19:01
I'm fine with an OverflowError, but it looks inconsistent to me that for some huge values an OverflowError is not raised, but aifc._HUGE_VAL is returned instead. And I think that using math.ldexp() can be more preferable that pow(2.0, ...), especially for exponent around sys.float_info.max_exp and sys.float_info.min_exp.
msg313136 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2018-03-02 15:49
> And I think that using math.ldexp() can be more preferable that pow(2.0, ...) Yes, absolutely agreed. I'll take a look at the PR.
msg313141 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2018-03-02 17:14
Thank you for your review Mark. If left the OverflowError propagated I would catch it at the caller place (_read_float() is used only once) and reraise as aifc.Error. OverflowError is not expected exception. It never is raised for valid AIFC files, and the probability of raising it rather of aifc.Error for a random binary file is very small. I suppose that most users of this module don't catch it. See also . But on other side, if there are files with an exponent of 0x7fff in wild, they are currently opened without errors. Raising an exception can be a regression.
msg313143 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2018-03-02 17:21
> If left the OverflowError propagated I would catch it at the caller place (_read_float() is used only once) and reraise as aifc.Error. Ah, if there's a dedicated exception type already then yes, agreed that raising aifc.Error (with a suitable message) makes much more sense than raising an OverflowError or a ValueError. > if there are files with an exponent of 0x7fff in wild I hope there aren't, and I'd consider any such file to be broken/corrupted. That is, unless there's a common practice of using NaNs or infinities for the frame rate. I suppose it's conceivable that people us this as a placeholder meaning "I don't know the sample rate". No idea whether that's true in practice: I'm not a frequent user of this format.
msg313144 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2018-03-02 17:27
What if return 80-bit infinities and NaN as 64-bit infinities and NaN and raise an error in the case of finite numbers overflow?
msg313145 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2018-03-02 17:28
> What if return 80-bit infinities and NaN as 64-bit infinities and NaN and raise an error in the case of finite numbers overflow? That's certainly reasonable from a pure floating-point-conversion perspective: it's exactly what I'd expect a general purpose extended-precision to double-precision converter to do.
msg313147 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2018-03-02 17:31
But I don't know the exact format for infinities and NaNs.
msg313149 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2018-03-02 17:45
> But I don't know the exact format for infinities and NaNs. From the Intel software developer manuals: For infinities: Sign bit: 0 or 1 (positive or negative infinity) Exponent field (15 bits): all ones Significand (64 bits): first bit 1, all remaining bits 0. NaN: Sign bit: 0 or 1 Exponent field (15 bits): all ones Significand (64 bits): first bit 1, at least one of the remaining bits 1.

History
Date	User	Action	Args
2022-04-11 14:58:58	admin	set	github: 77159
2020-01-12 02:01:05	cheryl.sabella	set	versions: + Python 3.9, - Python 2.7, Python 3.6, Python 3.7
2018-03-02 17:45:16	mark.dickinson	set	messages: +
2018-03-02 17:31:32	serhiy.storchaka	set	messages: +
2018-03-02 17:28:41	mark.dickinson	set	messages: +
2018-03-02 17:27:31	serhiy.storchaka	set	messages: +
2018-03-02 17:21:53	mark.dickinson	set	messages: +
2018-03-02 17:14:29	serhiy.storchaka	set	messages: +
2018-03-02 15:49:12	mark.dickinson	set	messages: +
2018-03-01 19:01:43	serhiy.storchaka	set	messages: +
2018-03-01 18:12:46	mark.dickinson	set	messages: +
2018-03-01 16:53:30	serhiy.storchaka	set	keywords: + patchstage: patch reviewpull_requests: + <pull%5Frequest5718>
2018-03-01 16:50:35	serhiy.storchaka	create