API: read_stata with index_col=None should return RangeIndex by topper-123 · Pull Request #49745 · pandas-dev/pandas

If read_stata was called with index_col=None, the constructed DataFrame was given an index built from np.arange, i.e. (before pandas 2.0) an Int64Index.

np.arange returns dtype np.int_, which is like np.intp except that it is always 32-bit on Windows. That makes it awkward to use in tests once indexes can hold all numpy numeric dtypes (as after #49560), so I'm looking into how arange is used in #49560. One place I found it is read_stata, and there it's better to use a range, so that read_stata(index_col=None) gives a RangeIndex instead of an Index[int_].
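To illustrate the difference, here is a minimal sketch that does not call read_stata at all; it only shows how the DataFrame constructor treats the two kinds of index input. The variable names and sample data are made up for illustration, and the exact dtype of the arange-based index depends on the platform and the pandas version.

```python
import numpy as np
import pandas as pd

n = 3
data = {"a": [1.0, 2.0, 3.0]}  # illustrative data, not from a real .dta file

# Index built from np.arange: a materialized integer array whose dtype
# follows np.int_ (platform dependent), so it is not a RangeIndex.
df_arange = pd.DataFrame(data, index=np.arange(n))

# Index built from a Python range: pandas keeps it as a RangeIndex,
# which has the same labels on every platform.
df_range = pd.DataFrame(data, index=range(n))

is_range_from_arange = isinstance(df_arange.index, pd.RangeIndex)
is_range_from_range = isinstance(df_range.index, pd.RangeIndex)
same_labels = list(df_arange.index) == list(df_range.index)

print(type(df_arange.index).__name__, df_arange.index.dtype)
print(type(df_range.index).__name__, df_range.index.dtype)
```

The labels are identical in both cases; only the index type (and, for the arange case, possibly the dtype) differs, which is what makes the range-based version easier to assert against in tests.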

This is a slight change in API, so I've separated it out into its own PR here, so that #49560, which is a large PR, can stay as focused as possible.