pd.read_fwf removes leading and trailing whitespace

Code Sample, a copy-pastable example if possible

from io import StringIO
import pandas as pd

data = u""" a bbb
 ccdd """

df = pd.read_fwf(StringIO(data), widths=[3, 3], header=None)

The output is

Expected Output

Problem description

Leading and trailing whitespace is apparently stripped from every field, but I want to keep it. Passing dtype options or converters does not solve the problem. Is this expected behaviour?

I do not think this is intended, because the same example implemented with pd.read_csv() preserves the whitespace:

from io import StringIO
import pandas as pd

data = u""" a ,bbb
 cc,dd """

df = pd.read_csv(StringIO(data), header=None)

For consistency, the behaviour of the two readers should be identical.

The problem is also mentioned on Stack Overflow (https://stackoverflow.com/questions/41558138/pandas-read-fwf-removing-leading-and-trailing-whitespace).
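As a possible stopgap, the fields can be sliced by hand so that nothing gets stripped. The sketch below is only an illustration, not part of pandas: `split_fixed_width` is a hypothetical helper, and the two-line layout of `data` is assumed from the widths `[3, 3]` used above.

```python
from io import StringIO


def split_fixed_width(line, widths):
    """Slice one line into consecutive fields of the given widths.

    Hypothetical helper: plain slicing, so padding inside a field is kept.
    """
    fields = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width])
        start += width
    return fields


# Assumed layout: two rows of six characters each.
data = u""" a bbb
 ccdd """

# Split the text into lines, then slice each line into two 3-character fields.
rows = [split_fixed_width(line, [3, 3])
        for line in StringIO(data).read().splitlines()]
print(rows)
```

The resulting list of rows can then be passed to `pd.DataFrame(rows)`, which should leave the string values untouched.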

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
