Vectorize pandas.Series.apply by aman-thakral · Pull Request #355 · pandas-dev/pandas
When profiling my code, I noticed that numpy.vectorize was a bit faster than pandas.Series.apply (about 25% less time on the same workload).
Execution times:
numpy.vectorize: 3.3 seconds
pandas.Series.apply: 4.4 seconds
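For anyone who wants to try a comparison like this, here is a rough sketch of how it could be timed. The function `scale`, the Series size, and the repeat count are all made up for illustration; the workload behind the numbers above is not shown in this thread, and timings will vary a lot across pandas and NumPy versions.

```python
import timeit

import numpy as np
import pandas as pd


def scale(x):
    # Hypothetical element-wise function; stands in for the real workload.
    return x * 2.0 + 1.0


s = pd.Series(np.arange(10_000, dtype=float))

# np.vectorize wraps the Python function so it can be called on a whole array.
vec = np.vectorize(scale)

# Time both approaches on the same data (5 repetitions each).
t_vectorize = timeit.timeit(lambda: vec(s.values), number=5)
t_apply = timeit.timeit(lambda: s.apply(scale), number=5)

# Both approaches should produce the same values.
via_vectorize = pd.Series(vec(s.values), index=s.index)
via_apply = s.apply(scale)

print(f"numpy.vectorize:     {t_vectorize:.4f} s")
print(f"pandas.Series.apply: {t_apply:.4f} s")
```

Note that np.vectorize is still a Python-level loop under the hood, so any difference here comes from per-element overhead rather than true vectorization.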
Hey, be careful to add features in branches and not the master branch :) I will take a look at the apply patch; it looks like a no-brainer. You may want to reset your fork to match wesm/master exactly, since you've got a couple of merge commits.
Definitely a good idea for me to reset my fork. Also, I guess the same reasoning could apply to Series.map, unless you have some use-case distinction between map and apply in mind (besides map also accepting a dict). It would be good to document this distinction as well :)
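To make the map/apply distinction mentioned here concrete, a small sketch (the sample data is invented for illustration): with a plain function the two give the same result, while map additionally accepts a dict as a lookup table.

```python
import pandas as pd

s = pd.Series(["a", "b", "c"])

# With a function, map and apply both call it once per element.
mapped_by_func = s.map(str.upper)
applied = s.apply(str.upper)

# map also accepts a dict; keys missing from the dict become NaN.
mapped_by_dict = s.map({"a": 1, "b": 2})

print(mapped_by_func.tolist())
print(mapped_by_dict.tolist())
```

The "c" entry has no key in the dict, so it comes back as NaN and the result is upcast to float.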
np.vectorize has some string handling problems (changing Series.map and apply to use vectorize caused unit test failures). I'll patch this separately very quickly (maybe by writing a Cython function) and make sure the speed is comparable.
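The exact test failures are not shown in this thread, so the following is only a guess at the kind of issue involved. One documented quirk of np.vectorize is that, unless `otypes` is given, it infers the output dtype by calling the function on the first element. For string results that means a fixed-width NumPy string array comes back instead of the object dtype pandas normally uses for strings. The sample data and the use of `str.upper` are assumptions for illustration.

```python
import numpy as np

values = np.array(["a", "bb", "ccc"], dtype=object)

# Without otypes, the output dtype is inferred from the first result,
# so string outputs land in a fixed-width NumPy unicode array.
inferred = np.vectorize(str.upper)(values)

# Passing otypes=[object] keeps the results as Python objects instead.
as_object = np.vectorize(str.upper, otypes=[object])(values)

print(inferred.dtype)
print(as_object.dtype)
```

Forcing `otypes=[object]` sidesteps the inference, at the cost of leaving any later type inference to the caller.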
wesm added a commit that referenced this pull request
See the commit above: I added a new Cython function that's about 25% faster than np.vectorize and gets the types right. Thanks for pointing this out!
dan-nadler pushed a commit to dan-nadler/pandas that referenced this pull request
Update .travis.yml
Update setup.py