BUG: Index dtype may not be applied properly by sinhrks · Pull Request #11017 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation12 Commits1 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

sinhrks

Fixed 2 problems:

import numpy as np
import pandas as pd

pd.Index([1, 2, 3], dtype=int)
# Index([1, 2, 3], dtype='object')
pd.Index(np.array([1, 2, 3]), dtype='category')
# TypeError: data type "category" not understood

@sinhrks sinhrks added Bug Indexing

Related to indexing on series/frames, not to indexes themselves

Dtype Conversions

Unexpected or buggy dtype conversions

and removed Indexing

Related to indexing on series/frames, not to indexes themselves

labels

Sep 7, 2015

@jreback

pls run perf check on this -

@jreback

@sinhrks can you see if this affects perf?

jreback

if is_categorical_dtype(data) or is_categorical_dtype(dtype):
return CategoricalIndex(data, copy=copy, name=name, **kwargs)
from pandas.tseries.index import DatetimeIndex

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave all of the imports where they were, no reason to actually import them until they are used (see below)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually on 2nd thought this is fine

@sinhrks

Following is a asv result. Will consider a better path.

    before     after       ratio
  [789e07d9] [ce89f9d1]
    33.94μs    44.57μs      1.31  ctors.index_from_series_ctor.time_index_from_series_ctor
   121.86μs   173.21μs      1.42  frame_methods.frame_get_dtype_counts.time_frame_get_dtype_co
    16.45μs    27.07μs      1.65  index_object.index_float64_construct.time_index_float64_construct

@jreback

this is prob just from the imports (e.g. a Float64Index) doesn't care about Datetimeindex, so the import adds the extra time (or the check for the import anyhow)

@sinhrks

@sinhrks

@jreback Thanks, suggested changes improve perf a little. I assume other slowness is caused by categorical condition moved to the top.

All benchmarks:

    before     after       ratio
  [4d4a2e33] [c3eaeb07]
    33.85μs    41.24μs      1.22  ctors.index_from_series_ctor.time_index_from_series_ctor
   129.48μs   160.86μs      1.24  frame_methods.frame_get_dtype_counts.time_frame_get_dtype_counts
    16.36μs    22.88μs      1.40  index_object.index_float64_construct.time_index_float64_construct

@jreback

master (this is index_object.index_float64_construct.time_index_float64_construct) benchmark

In [1]: arr = np.arange(1000000.0)
In [2]: %timeit Index(arr)
The slowest run took 12.23 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 15.7 µs per loop

this branch

In [2]: %timeit Index(arr)
The slowest run took 7.49 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 19.8 µs per loop

fix jreback@29c0325

In [2]: %timeit Index(arr)
The slowest run took 11.74 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 13.8 µs per loop

will merge in a bit, thanks @sinhrks

@sinhrks

@jreback

merged via ead3ca8 (my change in another commit)

thanks!

I don't believe we had an issue assosicated, correct?

@sinhrks

Though this was based on gitter chat, #5196 refers to the same issue. Closed.

@jreback

2 participants

@sinhrks @jreback