DOC: correct docstring examples (#3439) by ProsperousHeart · Pull Request #16432 · pandas-dev/pandas

0 0.548814 0.544883 0.437587 0.383442
1 0.715189 0.423655 0.891773 0.791725
2 0.602763 0.645894 0.963663 0.528895

I think you have to remove this line (even though it seems like it should be there, based on the output).

aw man ... WTF?^^ lol

Thanks for working on this!

@@ -1129,8 +1134,7 @@ def get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False,
1 0 1 0
2 0 0 1

->>> df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
-'C': [1, 2, 3]})
+>>> df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})

This will give a linting error (we check for PEP8)

You can solve this by splitting the statement and starting the continuation line with ... (then it should pass the doctests):

>>> df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
...                    'C': [1, 2, 3]})

Specifically, the lines should be less than 80 characters wide.
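To illustrate the suggestion above, here is a minimal sketch using only the standard library's doctest module. The function name `build_record` and the plain `dict` are stand-ins invented for this illustration (they are not part of pandas); the point is only that a statement split over two lines needs the `...` prefix on the second line to stay under 80 characters and still run as one example.

```python
import doctest


def build_record():
    """Return nothing; the docstring carries the example.

    >>> record = dict(A=['a', 'b', 'a'], B=['b', 'a', 'c'],
    ...               C=[1, 2, 3])
    >>> sorted(record)
    ['A', 'B', 'C']
    """


# Parse the docstring into a DocTest and run it directly.
parser = doctest.DocTestParser()
test = parser.get_doctest(build_record.__doc__, {}, "build_record", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results)  # a (failed, attempted) pair
```

Without the `...` prefix, doctest would treat the second line as expected output rather than as part of the statement, and the first line would fail to compile on its own.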

aw man .... I did that to several ...any suggested way to fix it?

like in mass? lol

@@ -940,6 +941,7 @@ def wide_to_long(df, stubnames, i, j, sep="", suffix='\d+'):
8 3 3 2.1 2.9
>>> l = pd.wide_to_long(df, stubnames='ht', i=['famid', 'birth'], j='age')
>>> l
... # doctest: +NORMALIZE_WHITESPACE

can you give some explanation why this is needed in this case ?

@jorisvandenbossche I think the issue was when there's a df.index.name the output has whitespace for the rest of that line (which we don't want to include in the source)

Aha, ...
Is that actually something we might want to solve in pandas, in the repr? (it's not a bug, but I also don't think it is a feature someone relies upon? So could change that (if it is easy) to not have to do this here) But that is certainly for another PR

Or we could set this flag globally if this occurs a lot? (if that is possible)
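For context, a small sketch of what the flag does, using only the standard library. The example string and the name `whitespace_demo` are made up for illustration; the printed line simply ends in spaces, similar to the trailing whitespace described above. The same `optionflags` argument is also accepted by `doctest.testmod`, which is one way the flag could be applied to every example in a module (whether the pandas doc build can do that is not something this thread settles).

```python
import doctest

# The real output ends in six spaces; the expected output does not.
EXAMPLE = '''
>>> print("age" + " " * 6)
age
'''

parser = doctest.DocTestParser()


def run(optionflags):
    """Run EXAMPLE with the given doctest option flags."""
    test = parser.get_doctest(EXAMPLE, {}, "whitespace_demo", None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    # Discard the failure report so only the counts are kept.
    return runner.run(test, out=lambda text: None)


strict = run(0)
relaxed = run(doctest.NORMALIZE_WHITESPACE)
print(strict.failed, relaxed.failed)
```

With no flags the trailing spaces make the comparison fail; with `NORMALIZE_WHITESPACE` every run of whitespace is treated as equivalent, so the example passes.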

That's what @TomAugspurger was thinking, but I don't think he knows either?

@@ -689,7 +689,7 @@ def _convert_level_number(level_num, columns):
new_labels = [np.arange(N).repeat(levsize)]
new_names = [this.index.name] # something better?

new_levels.append(level_vals)
new_levels.append(frame.columns.levels[level_num])

This is an actual change in the code. Is this to fix a bug? If so, could you do this as a separate PR (and add a test for it)?

I didn't make that change so I don't know what that is

Since it is in your commit, you somehow made that change :-)
But if it was not the intent, you can just revert it (change it back to how it was based on the diff you see here)

I didn't - I swear I didn't. I don't even know what that means. I'll change it back, but I promise it wasn't me.


>>> s.unstack(level=0)
one two
a 1 2
b 3 4
a 1.0 3.0

Ah, this was actually an error in the example!

Oh no! So ...... how do I fix it? Cause when I try to change it, I get an error

No, no, now it is correct! So your change is perfect. I just noticed that by running the doctests, we actually corrected some errors in the docs, which is its purpose, so that is good :-)

two a 3
b 4
one a 1.0
b 2.0

another option would be to change the series construction, use np.arange(1, 5) instead of np.arange(1.0, 5.0).

I think using integers makes the example slightly simpler
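A minimal sketch of the integer variant, assuming the index is a two-level MultiIndex over ('one', 'two') and ('a', 'b'); how the docstring actually builds its index is not shown in this thread, so `from_product` here is only a convenient guess.

```python
import numpy as np
import pandas as pd

# Assumed index layout: (one, a), (one, b), (two, a), (two, b).
index = pd.MultiIndex.from_product([["one", "two"], ["a", "b"]])
s = pd.Series(np.arange(1, 5), index=index)

# Move the outer level ("one"/"two") into the columns.
by_outer = s.unstack(level=0)
print(by_outer)
```

With integer input the unstacked frame shows values such as 1 and 3 in row "a" instead of 1.0 and 3.0, which keeps the expected output shorter.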

Agreed - working on this now.

done - will be committing soon

jorisvandenbossche changed the title ~~1st update for issue 3439~~ DOC: correct docstring examples (#3439)

May 22, 2017

X id
0 0 0
1 1 1
2 2 2
2 1 2

>>> pd.wide_to_long(df, ['A(quarterly)', 'B(quarterly)'],
i='id', j='year', sep='-')
X A(quarterly) B(quarterly)

Is it possible that the X column below is not correct? (it has no 1's)

X is random ... which didn't make sense to me.

    >>> df = pd.DataFrame({'A(quarterly)-2010': np.random.rand(3),
    ...                    'A(quarterly)-2011': np.random.rand(3),
    ...                    'B(quarterly)-2010': np.random.rand(3),
    ...                    'B(quarterly)-2011': np.random.rand(3),
    ...                    'X' : np.random.randint(3, size=3)})
    >>> df['id'] = df.index
    >>> df
    ... # doctest: +NORMALIZE_WHITESPACE
       A(quarterly)-2010  A(quarterly)-2011  B(quarterly)-2010  B(quarterly)-2011  \
    0           0.548814           0.544883           0.437587           0.383442
    1           0.715189           0.423655           0.891773           0.791725
    2           0.602763           0.645894           0.963663           0.528895
       X  id
    0  0   0
    1  1   1
    2  1   2

You can either change X to something not random, or leave it as is(with the random seed, it is also consistent), but the output below should just match the input (which is now not the case I think)
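As a rough sketch of why seeding keeps the random columns consistent: the snippet below assumes the docstring seeds NumPy with 0 (the thread only says "the random seed"; 0 is an assumption that happens to reproduce the 0.548814 value shown in the output above). The helper name `make_columns` is hypothetical.

```python
import numpy as np


def make_columns(seed):
    """Draw the example columns in a fixed order after seeding."""
    np.random.seed(seed)
    a_2010 = np.random.rand(3)
    a_2011 = np.random.rand(3)
    b_2010 = np.random.rand(3)
    b_2011 = np.random.rand(3)
    x = np.random.randint(3, size=3)
    return a_2010, a_2011, b_2010, b_2011, x


first = make_columns(0)
second = make_columns(0)

# Same seed, same draw order: every column repeats exactly, X included.
print(all(np.array_equal(p, q) for p, q in zip(first, second)))
```

Because X is drawn from the same seeded generator, its values are fixed too, so the X column printed after `wide_to_long` has to match the X column printed for `df`.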

1 0.634401 0.611024 0.361789 0.630976
2 0.849432 0.722443 0.228263 0.092105
\
... # doctest: +NORMALIZE_WHITESPACE

I think it was too long then :)

I meant

>>> df # doctest: +NORMALIZE_WHITESPACE

which would not be too long in this case

ok - I'll add in next commit

b 2
two a 3
b 4
dtype: int32

This one will depend on the platform you are using, so to be more robust, it is probably best to pass the dtype to np.arange. I would use 'int64', as this is the default in pandas (in numpy the default is platform dependent).
Or as an alternative use range(1, 5), then pandas Series will always convert this to int64
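Both options can be sketched as follows; the variable names are only illustrative, and the example does not use the index from the docstring.

```python
import numpy as np
import pandas as pd

# Option 1: pin the dtype so it does not follow the platform default.
explicit = pd.Series(np.arange(1, 5, dtype="int64"))

# Option 2: let pandas convert a plain Python range.
from_range = pd.Series(range(1, 5))

print(explicit.dtype, from_range.dtype)
```

Either way the `dtype:` line in the expected doctest output no longer depends on the machine that runs the example.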

As per our discussion, left at int32 - np.arange and range() did not seem to make a difference, so I kept np.arange

What does it mean when the continuous-integration/travis-ci/py failed?

In this case it is due to some linting errors. Some style issues, based on PEP8 and flake8, are checked in the third build on Travis, which is the one that is failing. If you go to Travis, click on the failing build, and scroll down, you can see why.
The output from travis:

$ ci/lint.sh

inside ci/lint.sh
Linting  *.py
pandas/core/reshape/reshape.py:992:80: E501 line too long (84 > 79 characters)
pandas/core/reshape/reshape.py:993:80: E501 line too long (81 > 79 characters)
pandas/core/reshape/reshape.py:994:80: E501 line too long (81 > 79 characters)
pandas/core/reshape/reshape.py:995:80: E501 line too long (81 > 79 characters)
pandas/core/reshape/reshape.py:1000:1: W293 blank line contains whitespace
pandas/core/reshape/reshape.py:1001:80: E501 line too long (88 > 79 characters)
pandas/core/reshape/reshape.py:1015:80: E501 line too long (117 > 79 characters)
pandas/core/reshape/reshape.py:1133:71: W291 trailing whitespace
Linting *.py DONE

I don't see any way (I may not have searched correctly) to check PEP8 from Notepad++
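If no editor integration is at hand, the three kinds of complaint in the Travis output can be approximated with a few lines of plain Python. This is only a rough sketch, not what ci/lint.sh or flake8 actually runs; the function name `find_style_problems` is hypothetical, and it covers just E501, W291 and W293.

```python
MAX_LINE_LENGTH = 79  # limit reported in the E501 messages above


def find_style_problems(source):
    """Return (line_number, code) pairs for three simple style checks."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append((number, "E501"))  # line too long
        if line != line.rstrip():
            # A whitespace-only line is W293; otherwise it is W291.
            code = "W293" if not line.strip() else "W291"
            problems.append((number, code))
    return problems


sample = "x = 1\ny = 2  \n   \nz = '" + "a" * 80 + "'\n"
print(find_style_problems(sample))
```

Running it over the text of a file gives the line numbers to fix before pushing.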

... "one", "two", "two", "two", "one"], dtype=object)
>>> c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny",
... "shiny", "dull", "shiny", "shiny", "shiny"],
... dtype=object)

>>> crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

Can you make this pd.crosstab instead of crosstab ?

done - will be in next commit

->>> pd.cut(np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]), 3,
-labels=["good","medium","bad"])
+>>> result, bins = pd.cut(np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]),
+... 3, labels=["good","medium","bad"])

Here, you still would need

>>> result
... [output]
>>> bins
... [output]
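A sketch of what that could look like. One assumption to flag: `pd.cut` only returns the bin edges as a second value when `retbins=True` is passed, and the diff above does not show that argument, so it is added here to make the two-value unpacking work.

```python
import numpy as np
import pandas as pd

values = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1])

# retbins=True makes pd.cut return (categorical, bin_edges).
result, bins = pd.cut(values, 3, labels=["good", "medium", "bad"],
                      retbins=True)

print(result)  # the label assigned to each value
print(bins)    # the four edges of the three equal-width bins
```

In the docstring, each of the two names would then be echoed on its own `>>>` line, followed by the output it prints.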

I'm not sure what that would look like :(

Once confirmed, pivot completed.

@ProsperousHeart Thanks a lot for working on this!
Merging this now (follow-up PRs to do further improvements are always welcome!)

The appveyor failure is the matplotlib one, so ignoring that one.

There is one failing doctest, but that seems to be a Python version issue (I also get different results when testing it locally on different Python versions). Will do another PR to check if changing the doc build to py 3.6 solves things.

Kiv pushed a commit to Kiv/pandas that referenced this pull request

Jun 11, 2017

stangirala pushed a commit to stangirala/pandas that referenced this pull request

Jun 11, 2017