API: expose pandas.errors by jreback · Pull Request #15541 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation54 Commits12 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

closes #14800

this doesn't actually deprecate the existing (as we have discussed in #15181) but baby steps.

I also didn't add the 'hated'! SettingWithCopy* ones; these will be removed in pandas 2.0 anyhow, so no point in adding them.

@jorisvandenbossche xref #15181 (comment)

I think this actually works.

In [1]: from pandas.api.exceptions import UnsortedIndexError

In [2]: try:
   ...:     raise UnsortedIndexError()
   ...: except UnsortedIndexError:
   ...:     print("caught")
   ...:     
caught

@jreback second one should be except exceptions.UnsortedIndexError, ;) Missing the exceptions.

I think this actually works.

Yes, but that is because you didn't do any deprecations of the original ones. Referring from two locations to the same exception is no problem, but it was deprecating the old one (eg by using a decorator) that caused the problems discussed before.

I am in favor of putting this just in pandas.exceptions

@jorisvandenbossche

I am in favor of putting this just in pandas.exceptions

why would you create another namespace ? when we already have a perfectly good one.

can you comment on your objection?

Well, that comes back to my initial objections to the api namespace :-) (so I should have pressed that more earlier ..)
I am in favour of removing things of the top-level namespace (functions / sub-namespaces) that people shouldn't use (as we are doing), but in favour of adding things users may use.

In this specific case, what I also would do is creating an actual module (let it be pandas.api.exceptions or pandas.exceptions, that's even a bit different discussion) with the code in there, and let our other modules (in the io code, indexing code, etc) import the exceptions from there (instead the other way around as it is now). That gathers all our custom exceptions in one place (but on the other hand moves them away from where they are used, so that's not necessarily a net win), but also gives a consistent api to users. If they encounter such an exception, it is of the same type (I mean with same class path name) as the one they can use to catch it. And you minimize the chance to 'misuse' a private import.

And to be clear, big +1 to make those exceptions more officially public!

@jorisvandenbossche sure, but these are really 2 separate things.

an API for exceptions to the user
move / change the internal usage

I don't think the 2nd part is relevant here. I mean we can always change how they are imported into the public namespace. The point here is to have a separate public API layer (just as we did with pandas.api.types. The actual exceptions being defined elsewhere is fine.

I agree with Joris that it would be better to define the exceptions in `pandas.exceptions` or whereever the public API location is (personally I like the shorter `pandas.errors`). Unlike __repr__ I don't think Python provides any builtin way to indicate the canonical location of errors. I like making a immediately visible location the API location -- it makes it much harder to unintentionally use the wrong path.

…

@shoyer @jorisvandenbossche what is the reason to deviate from the established pandas.api namespace?

find with exceptions -> errors. actually was maybe going to split out warnings separately.

This is the point of the top-level api namespace, IOW.

api.types, api.errors, api.warnings, api.* whatever (imaging .algos, .lib, .includes, .src) etc.

this is going to eventually proliferate. We cannot have pandas.* namespace be anything but non-public.

We have 1 and only 1 public namespace (or at least moving in that direction).

If the default for namespaces like pandas.* going forward is public, what's the advantage of nesting them like pandas.api.*?

Most libraries don't use special designated api submodules, e.g., consider all the scipy submodules like scipy.stats, scipy.ndimage, scipy.linalg, etc.

no, pandas.* namespace is NOT public (or at least it shouldn't be). My statement is that is de-facto currently.

I am thinking about statsmodels.api

The point is that pandas NEEDS a specifically public api, that is not just everything the kitchen sink.

just because scipy/numpy does something does not make it the 'correct' way to do things. sure its a nice reference, but pandas has 2 sets of users, API and 'regular' ones. These are different and need to be treated differently.

The problem is that for years people building things on top of pandas have just reached in and used anything. That cannot continue to happen. How better to provide implementation flexibility than to have a specific public API? I thought we had settled this discussion a while back.

Ultimately, I don't think this matters too much, but given a choice between:

pandas.* indicates private namespace, pandas.api.* includes a public namespace, and
pandas.* includes public namespace, pandas._internals (and so forth) indicates private namespace

I think the later (option 2) is more user friendly and consistent with the top level namespace pandas already being public facing. In contrast, note that with statsmodels, users only use statsmodels.api, not the top level namespace statsmodels.

On the other hand, it's true that option 2 is a little harder to transition to (for us), so maybe it's not worth the trouble.

I don't really understand your point about "API" vs "regular" users. I don't think it's a terribly advanced use case to catch appropriate exception types. I do agree that we want to reduce noise in the top level namespace.

I don't really understand your point about "API" vs "regular" users. I don't think it's a terribly advanced use case to catch appropriate exception types. I do agree that we want to reduce noise in the top level namespace.

I would this is obvious.

'regular' users are the ones who pretty much use pd.* as there namespace. In theory they may use pandas.api.exceptions (e.g. specifically catching a parser exception).
'API' users are using (formerly) pandas internals to build other things on top of pandas. This is the point of an API.

Ultimately these are part of the same namespace that is publicly exposed. The point is that it is completely silly that people can just use any of the namespace. Sure you can, but pandas has gotten so big, that you simply cannot live that that when you need to move things around to change implementation details.

pandas.* includes public namespace, pandas._internals (and so forth) indicates private namespace

how exactly is this more friendly? I would argue that this is much less friendly, simply because you don't know where to find things that you might need. This is a huge surface area, both for dev and for usage. Limiting this scope is hugely important.

this now happens in our doc builds.

/home/joris/miniconda3/envs/dev/lib/python3.5/site-packages/feather/api.py:43: FutureWarning: pandas.lib.infer_dtype is deprecated. Please use pandas._libs.lib.infer_dtype instead.
  inferred_type = pd.lib.infer_dtype(col)

certainly could do a PR upstream to fix this (e.g. with conditional imports). But might be nice
to have
from pandas.api.lib import infer_dtype

as a starting location for some library code. This is a useful enough routine that its nice to expose to the outside world.

I've been running into these internal API / external API issues lately (e.g. needing to use DatetimeTZDtype in an external library), so it might be worth documenting the pandas < 0.19 way to get some of these APIs

We indeed need to document this better.

To come back on the discussion above, I am also more in favor of "option 2" from #15541 (comment). It's indeed maybe a bit harder to get to that point, but IMO it would be much more clear to be able to say: everything accessible from the pandas.* namespace (directly or in a submodule) is public (rather than eg have to say pandas.types is private, but pandas.api.types you may use). But that is also a more general discussion not only related to exceptions, cfr #13634

@shoyer
updated

removed PandasError completely (barely used)
added doc-string to infer_dtype

@jreback looks great to me -- thanks!

One question on the docstring for infer_dtype: what do mixed and `mixed-integer mean? It would be nice to give some guidance on these. Here's my guess:

I think mixed means "not anything else" (i.e., dtype=object from numpy), not necessarily mixed types (e.g., you get "mixed" for an list of all tuples). So perhaps it would be better called "unknown-or-mixed" or something like that. Maybe this isn't worth changing, but it's definitely worth documenting.
I have no idea what 'mixed-integer' means exactly or how to trigger it. I thought this might mean a mixture of integer types (e.g., int32 vs int64) but those just turn up 'integer'.

@shoyer these have long-time meaning in pandas (could be changed, but this is what they are).

mixed* are just non-strings (unicode/bytes) that are actual mixed python objects.

In [9]: infer_dtype(['a', 1])
Out[9]: 'mixed-integer'

In [10]: infer_dtype(['a', 1.5])
Out[10]: 'mixed'

In [11]: infer_dtype([1.5, 1])
Out[11]: 'mixed-integer-float'

In [12]: infer_dtype(['a', pd.Timestamp('20130101')])
Out[12]: 'mixed'

In [14]: infer_dtype([(1, 2, 3)])
Out[14]: 'mixed'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment, looks good to me

-------
string describing the common type of the input data.
Results can include:
- floating

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small rst thing: no indentation needed, and white line before the list

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, though I don't think we actually have a doc-generation section for pandas.api ATM. (i'll make an issue).

@@ -2,6 +2,7 @@

import warnings
warnings.warn("The pandas.lib module is deprecated and will be "

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to specially catch infer_dtype here and give a more specific message since it has moved and is public? Noticed I'm using it some real code, I think based on your SO answer here - http://stackoverflow.com/a/20670901/3657742.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [1]: pd.lib.infer_dtype('foo')
/Users/jreback/miniconda3/envs/pandas/bin/ipython:1: FutureWarning: pandas.lib.infer_dtype is deprecated. Please use pandas._libs.lib.infer_dtype instead.
  #!/Users/jreback/miniconda3/envs/pandas/bin/python
Out[1]: 'string'

hmm, let me see.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

In [1]: pd.lib.infer_dtype('foo')
/Users/jreback/miniconda3/envs/pandas/bin/ipython:1: FutureWarning: pandas.lib is deprecated and will be removed in a future version. You can access infer_dtype in pandas.api.lib.infer_dtype
  #!/Users/jreback/miniconda3/envs/pandas/bin/python
Out[1]: 'string'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in latest push

add infer_dtype to pandas.api.lib

move all exception / warning impls (and references) from pandas.core.common and pandas.io.common

closes pandas-dev#14800

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

jreback added a commit that referenced this pull request

Apr 24, 2017

pcluo pushed a commit to pcluo/pandas that referenced this pull request

May 22, 2017