PERF: construct DataFrame with string array and dtype=str by topper-123 · Pull Request #36432 · pandas-dev/pandas
Avoid inefficient call to arr.astype() when dtype is str, and use ensure_string_array instead.
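For context, here is a minimal sketch of what arr.astype(str) does to an object array at the NumPy level. It uses only public NumPy calls and does not reproduce pandas internals. The PR does not say why the astype() path is slow, so any link between this behaviour and the slowdown is an assumption. The names arr, fixed and back are illustrative only.

```python
import numpy as np

# Object array whose elements are already Python str objects.
arr = np.array(["1", "22", "333"], dtype=object)

# astype(str) produces a fixed-width NumPy unicode array (dtype kind "U"),
# copying every element into a new buffer sized for the longest string.
fixed = arr.astype(str)
print(fixed.dtype)

# Storing the values as Python str objects again needs a second conversion.
back = fixed.astype(object)
print(back.dtype)
```

A routine that validates or converts the elements in place while keeping object dtype, which is the role the description gives to ensure_string_array, would avoid that round trip.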
Performance example:
x = np.array([str(u) for u in range(1_000_000)], dtype=object).reshape(500_000, 2)
%timeit pd.DataFrame(x, dtype=str)
391 ms ± 17.7 ms per loop  # master
11.9 ms ± 131 µs per loop  # after this PR
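The %timeit line above needs IPython. As a rough sketch, the same measurement can be run as a plain script with the standard-library timeit module. The array is smaller than in the PR so the script finishes quickly, and N_ROWS, build and best_ms are names chosen for this illustration. Timings depend on the machine and the installed pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: a smaller array than the PR's 1_000_000 elements, for a quick run.
N_ROWS = 50_000
x = np.array([str(u) for u in range(N_ROWS * 2)], dtype=object).reshape(N_ROWS, 2)


def build():
    """Construct the DataFrame exactly as in the PR's example."""
    return pd.DataFrame(x, dtype=str)


# Best of five single runs, converted from seconds to milliseconds.
best_ms = min(timeit.repeat(build, number=1, repeat=5)) * 1e3
print(f"pandas {pd.__version__}: {best_ms:.1f} ms per call (best of 5)")

# Sanity check: the values come out as Python strings.
df = build()
print(df.shape, type(df.iloc[0, 0]))
```

Running the script once on master and once on the PR branch gives a before/after comparison in the same spirit as the figures above.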