BUG: Timedelta input string with only symbols and no digits failed to raise an error · Issue #39710 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
pd.Timedelta with non-digit string input, such as pd.Timedelta('-'), successfully processes and returns Timedelta('0 days 00:00:00'). From discussion with @jreback in #39497 it was concluded that this is unwanted and that it should raise an error instead.
I found that this appears for string input with '+', '-', ',' and ' ' in various combinations, see below:
In [1]: pd.Timedelta('+')
Out[1]: Timedelta('0 days 00:00:00')

In [2]: pd.Timedelta('-')
Out[2]: Timedelta('0 days 00:00:00')

In [3]: pd.Timedelta(' ')
Out[3]: Timedelta('0 days 00:00:00')

In [4]: pd.Timedelta(',')
Out[4]: Timedelta('0 days 00:00:00')

In [5]: pd.Timedelta('++')
Out[5]: Timedelta('0 days 00:00:00')

In [6]: pd.Timedelta('+-')
Out[6]: Timedelta('0 days 00:00:00')
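To cover the "various combinations" more systematically, the inputs could be enumerated rather than typed by hand. The following is only a sketch: the helper name symbol_only_inputs and the max_len parameter are made up for illustration and are not part of pandas.

```python
from itertools import product

# The four characters reported above as being silently accepted.
SYMBOLS = ("+", "-", ",", " ")


def symbol_only_inputs(max_len=2):
    """Return every string of length 1..max_len built only from SYMBOLS.

    Hypothetical helper for generating test inputs; not a pandas function.
    """
    inputs = []
    for length in range(1, max_len + 1):
        for combo in product(SYMBOLS, repeat=length):
            inputs.append("".join(combo))
    return inputs


cases = symbol_only_inputs(max_len=2)
print(len(cases))  # 4 single characters + 16 pairs = 20
```

Each generated string could then be passed to pd.Timedelta in a parametrized test that expects a ValueError once the behaviour is fixed.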
Problem description
Invalid input should raise an error instead of successfully resolving to Timedelta('0 days 00:00:00').
Expected Output
Raise an error. I would propose to generalize the ValueError("unit abbreviation w/o a number") on L497 of timedeltas.pyx to ValueError("characters w/o a number"), since this error also shows up when there are no unit abbreviations, for example:
In [2]: pd.Timedelta('foo')
ValueError                                Traceback (most recent call last)
----> 1 pd.Timedelta('foo')

~/Documents/developer/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.Timedelta.__new__()
   1192                 value = parse_iso_format_string(value)
   1193             else:
-> 1194                 value = parse_timedelta_string(value)
   1195             value = np.timedelta64(value)
   1196         elif PyDelta_Check(value):

~/Documents/developer/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.parse_timedelta_string()
    429                 result += timedelta_as_neg(r, neg)
    430             else:
--> 431                 raise ValueError("unit abbreviation w/o a number")
    432
    433     # treat as nanoseconds

ValueError: unit abbreviation w/o a number
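The proposed behaviour can be pictured with a small pure-Python model. This is only a sketch of the idea, not the Cython code in timedeltas.pyx: the function name require_number is hypothetical, and it deliberately ignores special inputs (such as missing-value strings) that the real parser may handle before reaching this point.

```python
ASCII_DIGITS = "0123456789"


def require_number(ts):
    """Raise the proposed error if the string contains no digit at all.

    Simplified, hypothetical model of the suggested check. It only looks
    for at least one ASCII digit and does not parse units or signs.
    """
    if not any(ch in ASCII_DIGITS for ch in ts):
        raise ValueError("characters w/o a number")
    return ts


# Symbol-only strings and unit-less words fail the same way.
for candidate in ["+", "+-", " ", "foo"]:
    try:
        require_number(candidate)
    except ValueError as exc:
        print(repr(candidate), "->", exc)
```

Under this model, 'foo' and '+' are rejected with the same generalized message, which is the point of the proposal.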
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 492c5e0
python : 3.8.6.final.0
python-bits : 64
OS : Darwin
OS-release : 20.2.0
Version : Darwin Kernel Version 20.2.0: Wed Dec 2 20:39:59 PST 2020; root:xnu-7195.60.75~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 1.2.0.dev0+1259.g492c5e00d1
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 49.6.0.post20201009
Cython : 0.29.21
pytest : 6.1.1
hypothesis : 5.37.3
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.0
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.4
fastparquet : 0.4.1
gcsfs : 0.7.1
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 1.0.1
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.2
sqlalchemy : 1.3.20
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2