BUG: Index dtype may not be applied properly by sinhrks · Pull Request #11017 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation12 Commits1 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Fixed 2 problems:
- Specified
dtype
is not applied to other iterables.
import numpy as np
import pandas as pd
pd.Index([1, 2, 3], dtype=int)
# Index([1, 2, 3], dtype='object')
- Specifying
category
to ndarray-like results inTypeError
pd.Index(np.array([1, 2, 3]), dtype='category')
# TypeError: data type "category" not understood
Related to indexing on series/frames, not to indexes themselves
Unexpected or buggy dtype conversions
and removed Indexing
Related to indexing on series/frames, not to indexes themselves
labels
pls run perf check on this -
@sinhrks can you see if this affects perf?
if is_categorical_dtype(data) or is_categorical_dtype(dtype): |
---|
return CategoricalIndex(data, copy=copy, name=name, **kwargs) |
from pandas.tseries.index import DatetimeIndex |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave all of the imports where they were, no reason to actually import them until they are used (see below)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually on 2nd thought this is fine
Following is a asv result. Will consider a better path.
before after ratio
[789e07d9] [ce89f9d1]
33.94μs 44.57μs 1.31 ctors.index_from_series_ctor.time_index_from_series_ctor
121.86μs 173.21μs 1.42 frame_methods.frame_get_dtype_counts.time_frame_get_dtype_co
16.45μs 27.07μs 1.65 index_object.index_float64_construct.time_index_float64_construct
this is prob just from the imports (e.g. a Float64Index) doesn't care about Datetimeindex, so the import adds the extra time (or the check for the import anyhow)
@jreback Thanks, suggested changes improve perf a little. I assume other slowness is caused by categorical
condition moved to the top.
All benchmarks:
before after ratio
[4d4a2e33] [c3eaeb07]
33.85μs 41.24μs 1.22 ctors.index_from_series_ctor.time_index_from_series_ctor
129.48μs 160.86μs 1.24 frame_methods.frame_get_dtype_counts.time_frame_get_dtype_counts
16.36μs 22.88μs 1.40 index_object.index_float64_construct.time_index_float64_construct
master (this is index_object.index_float64_construct.time_index_float64_construct
) benchmark
In [1]: arr = np.arange(1000000.0)
In [2]: %timeit Index(arr)
The slowest run took 12.23 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 15.7 µs per loop
this branch
In [2]: %timeit Index(arr)
The slowest run took 7.49 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 19.8 µs per loop
fix jreback@29c0325
In [2]: %timeit Index(arr)
The slowest run took 11.74 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 13.8 µs per loop
will merge in a bit, thanks @sinhrks
merged via ead3ca8 (my change in another commit)
thanks!
I don't believe we had an issue assosicated, correct?
Though this was based on gitter chat, #5196 refers to the same issue. Closed.
2 participants