Add example usage to DataFrame.filter by cswarth · Pull Request #12399 · pandas-dev/pandas

cswarth

Updates doc comments for DataFrame.filter and adds usage examples.
Fixed errors identified by flake8 and correctly rebased my branch before issuing the PR.

DataFrame.filter(items=None, like=None, regex=None, axis=None)

Subset rows or columns of dataframe according to labels in the index.

Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index. This method is a thin veneer on top of DataFrame.select.

Parameters:
    items : list-like
        List of info axis to restrict to (must not all be present)
    like : string
        Keep info axis where “arg in col == True”
    regex : string (regular expression)
        Keep info axis with re.search(regex, col) == True
    axis : int or None
        The axis to filter on.

Returns: same type as input object with filtered info axis

Notes

The items, like, and regex parameters should be mutually exclusive, but this is not checked.

axis defaults to the info axis that is used when indexing with [].

Examples

df
        one  two  three
mouse     1    2      3
rabbit    4    5      6

select columns by name

df.filter(items=['one', 'three'])
        one  three
mouse     1      3
rabbit    4      6

select columns by regular expression

df.filter(regex='e$', axis=1)
        one  three
mouse     1      3
rabbit    4      6

select rows containing 'bbi'

df.filter(like='bbi', axis=0)
        one  two  three
rabbit    4    5      6
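
The examples above can be reproduced with the minimal sketch below. The PR shows only the printed frame, so the constructor call is an assumption about how `df` was built. The last line also illustrates the note that axis defaults to the info axis, which for a DataFrame is the columns.

```python
import pandas as pd

# Assumed construction of the example frame; only its printed form is in the PR.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["mouse", "rabbit"],
    columns=["one", "two", "three"],
)

# Select columns by name.
by_items = df.filter(items=["one", "three"])

# Select columns whose label ends in "e".
by_regex = df.filter(regex="e$", axis=1)

# Select rows whose index label contains "bbi".
by_like = df.filter(like="bbi", axis=0)

# With no axis given, a DataFrame is filtered on its columns.
by_regex_default = df.filter(regex="e$")
```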

jreback

Note that this routine does not filter a dataframe on its
contents. The filter is applied to the labels of the index.
This method is a thin veneer on top of :ref:`DateFrame Select
<DataFrame.select>`

this last sentence is not necessary.

@cswarth

Removed the "thin veneer" comment and added a See Also referring to pandas.DataFrame.select.
The point is to demonstrate that this routine does not filter on the contents of the dataframe, but on the index. Hopefully this will help those coming from R who might be expecting DataFrame.filter to act like dplyr::filter.
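
A short sketch of that distinction, assuming a frame shaped like the one in the docstring examples (the constructor call and the threshold are illustrative, not taken from the PR). `filter` only looks at labels, whereas selecting rows by value, which is roughly what dplyr::filter does, is done with a boolean mask.

```python
import pandas as pd

# Illustrative frame matching the docstring examples.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["mouse", "rabbit"],
    columns=["one", "two", "three"],
)

# Label-based: keep rows whose index label contains "bbi".
by_label = df.filter(like="bbi", axis=0)

# Content-based: keep rows where the value in column "two" is below 5.
by_value = df[df["two"] < 5]
```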

@jreback

@cswarth

Sorry for being a newbie, but update in what way? Do you want me to rebase my doc/df_filter development branch onto upstream/master and push that to GitHub? It looks like that will push 201 commits, so I want to make sure that is the right thing to do.

@jreback

you need to rebase on master first

@cswarth

I updated my branch, but 24hrs later the CI builds are still not complete. Is that unusual or expected?

Also, the CI build for Python 3.5 failed, but I don't see how it has anything to do with the changes I made.
https://travis-ci.org/pydata/pandas/jobs/124031500

@jreback

Travis was having some issues; it's OK now.

The 3.5 build on numpy master is currently failing, but that is OK.

@jreback

can you rebase / update according to comments

@codecov-io

Current coverage is 83.76%

Merging #12399 into master will decrease coverage by 0.47%

@@            master   #12399    diff @@

Files            138      135      -3
Lines          50721    49640   -1081
Methods            0        0
Branches           0        0

Powered by Codecov. Last updated by 2061e9e...5e492cd

@cswarth

As requested, added a check for mutually exclusive arguments in DataFrame.filter, along with unit tests for it.

jorisvandenbossche

"""
import re
args = locals().copy()
nkw = sum(map(lambda x: operator.getitem(args,x)!=None, ['items', 'like', 'regex']))

This seems a bit complicated for what we want to achieve (the locals is certainly not needed, I think).
Using a similar construct: sum(map(lambda x: x is not None, [items, like, regex])) (but a list comprehension is maybe easier to read: sum([x is not None for x in [items, like, regex]]))

thanks, changing to,

nkw = sum([operator.getitem(args, x) is not None for x in ['items', 'like', 'regex']])
if nkw > 1:
    raise TypeError("filter(): keyword arguments are mutually exclusive")

oops, I didn't understand...now using,

        nkw = sum([x is not None for x in [items, like, regex]])
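
Pulled out of the method, the check discussed in this thread can be sketched as a standalone function. The name `check_filter_kwargs` is hypothetical and exists only for illustration; in the PR the logic sits inside `filter` itself.

```python
def check_filter_kwargs(items=None, like=None, regex=None):
    """Hypothetical helper: count the selectors given, reject more than one."""
    # Booleans sum as 0/1, so this counts the arguments that were supplied.
    nkw = sum(x is not None for x in [items, like, regex])
    if nkw > 1:
        raise TypeError(
            "Keyword arguments `items`, `like`, or `regex` "
            "are mutually exclusive"
        )
    return nkw
```

Comparing with `is not None` rather than `!= None` also avoids an elementwise comparison if an array-like is passed as `items`.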

@cswarth

Enforcing mutually exclusive arguments in frame.filter() is an incompatible change that is going to break some code, somewhere. Should this be called out in the release notes?

I would like to make a case that this API should be deprecated at this time. It adds little value over reindex and select at the expense of an API that is different than just about anything else.
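
To illustrate the comparison, here is a minimal sketch of how two of the `filter` calls from the docstring examples might be written without it. The frame construction is an assumption, and `select` is left out because it may not be available in newer pandas versions; `reindex` and a boolean mask with `.loc` are used instead.

```python
import pandas as pd

# Illustrative frame matching the docstring examples.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["mouse", "rabbit"],
    columns=["one", "two", "three"],
)

# items=...: the same columns can be requested with reindex.
via_filter = df.filter(items=["one", "three"])
via_reindex = df.reindex(columns=["one", "three"])

# like=... on the rows: a boolean mask over the index labels.
like_filter = df.filter(like="bbi", axis=0)
like_loc = df.loc[df.index.str.contains("bbi", regex=False)]
```

One difference to keep in mind: `reindex` adds an all-NaN column for a requested label that does not exist, while `filter(items=...)` leaves such labels out.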

jreback

if nkw > 1:
    raise TypeError('Keyword arguments `items`, `like`, or `regex` '
                    'are mutually exclusive')
if axis is None:
    axis = self._info_axis_name
axis_name = self._get_axis_name(axis)

e.g. checked here

@jreback

@cswarth yes pls add a whatsnew note about the mutually exclusive args.

@jreback

already slated to deprecate in 0.19.0: #12401

Labels

Docs Indexing

Related to indexing on series/frames, not to indexes themselves