ENH: Data formatting with unicode length by sinhrks · Pull Request #11102 · pandas-dev/pandas

OK, this PR should now work in all the cases I'm aware of. I'd appreciate it if anyone with concerns could provide further test cases.

@shoyer Yes, East Asian users would prefer this to default to True. But it is almost 2 times slower in the case below.

import numpy as np
import pandas as pd

chars = list(u'あいうえおかきくけこさしすせそたちつてとなにぬねの')

def rand_jp(x):
    return ''.join(np.random.choice(chars) for _ in range(x))

df = pd.DataFrame(np.empty((100, 100)))
df = df.applymap(lambda x: rand_jp(10))

%timeit unicode(df)
# 10 loops, best of 3: 177 ms per loop

# Enable Unicode handling
pd.options.display.unicode.east_asian_width = True

%timeit unicode(df)
# 1 loops, best of 3: 381 ms per loop

The effect is almost the same for all-ASCII data (same conditions, except that the characters are all ASCII).

@kawochen I may not understand correctly, but reducing the East Asian Width categories will not affect performance, because a dict lookup is O(1).
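To illustrate the point about dict lookup, here is a minimal sketch of a width calculation. It is not the code in this PR; the names `_EAW_WIDTH` and `display_width` are hypothetical, and the mapping assumes Wide and Fullwidth take two columns while everything else takes one.

```python
import unicodedata

# Hypothetical mapping (an assumption for this sketch, not the PR's code):
# East Asian Width category -> number of display columns.
_EAW_WIDTH = {'W': 2, 'F': 2, 'Na': 1, 'N': 1, 'H': 1, 'A': 1}


def display_width(text):
    """Return the number of columns `text` is assumed to occupy.

    Each character costs one category lookup plus one dict lookup,
    so the number of categories in the dict does not change the cost.
    """
    return sum(_EAW_WIDTH.get(unicodedata.east_asian_width(ch), 1)
               for ch in text)


print(len(u'あいう'), display_width(u'あいう'))  # 3 6
print(len(u'abc'), display_width(u'abc'))        # 3 3
```

The per-character cost comes from calling `unicodedata.east_asian_width` and summing, so the dict could hold two categories or six without a measurable difference.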

I think ambiguous characters need special handling (?).

Do you have any idea which characters are affected and what the special handling logic should be? My concern is that these characters cannot be aligned properly even if we try.
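For reference, a small sketch of how the affected characters could be found with the standard library. The helper names `eaw_categories` and `ambiguous_chars` are hypothetical and only for illustration. Characters in the Ambiguous category (`'A'`) may be rendered as one or two columns depending on the terminal and font, which is why I doubt they can always be aligned.

```python
import unicodedata


def eaw_categories(chars):
    """Map each character to its East Asian Width category."""
    return {ch: unicodedata.east_asian_width(ch) for ch in chars}


def ambiguous_chars(text):
    """Return the characters of `text` whose category is Ambiguous ('A')."""
    return [ch for ch in text if unicodedata.east_asian_width(ch) == 'A']


# ASCII letter, hiragana, fullwidth Latin, halfwidth katakana,
# Greek alpha, inverted exclamation mark.
samples = [u'a', u'あ', u'Ａ', u'ｱ', u'α', u'¡']
for ch, category in eaw_categories(samples).items():
    print(repr(ch), category)

print(ambiguous_chars(u'aα¡あ'))
```

Greek letters and some Latin-1 punctuation fall into this category, so they can appear in otherwise non-East-Asian data too.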

CC: @ayapi