Updated pandas.io.data.Options by sglyon · Pull Request #2758 · pandas-dev/pandas

Mostly housekeeping, but there were a few functionality additions:

BUG: Yahoo has changed the URLs for current-month options, so I updated the methods to reflect that.
ENH: Now using lxml for HTML scraping under the hood (previously BeautifulSoup).
ENH: Calling Options.get_xxx_data(), where xxx is one of options, call, or put, now creates a new instance variable. For current-month options the variable is simply self.calls or self.puts; when future-month data is retrieved, it becomes self.callsMMYY or self.putsMMYY.
ENH: Options.get_near_stock_price now accepts optional month and year kwargs, so the user can get data near the stock price for a given expiry date.
ENH: Options.get_forward_data now has optional kwargs near and above_below, which let the user request forward-looking data only for options near the current stock price. In that case the data comes from Options.get_near_stock_price instead of Options.get_xxx_data().
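A minimal sketch of the kind of filtering that near and above_below imply, assuming the strikes are already available as a sorted list and that above_below counts strikes on each side of the current price. The helper name near_strikes is hypothetical; this is not the code in the PR.

```python
from bisect import bisect_left
from typing import List, Sequence


def near_strikes(strikes: Sequence[float], price: float, above_below: int = 2) -> List[float]:
    """Hypothetical helper: keep up to `above_below` strikes below `price`
    and up to `above_below` strikes at or above it.

    Assumes `strikes` is sorted in ascending order.
    """
    if above_below < 0:
        raise ValueError("above_below must be non-negative")
    # Index of the first strike >= price; everything before it is below the price.
    idx = bisect_left(strikes, price)
    lo = max(idx - above_below, 0)
    hi = min(idx + above_below, len(strikes))
    return list(strikes[lo:hi])
```

With strikes [90, 95, 100, 105, 110, 115] and a price of 102.3, the default above_below=2 keeps [95, 100, 105, 110]; near the ends of the chain it simply returns fewer strikes on the short side.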

…from my local test file and I just caught it

Hey @spencerlyon2, nice work! I am looking forward to this pull request being accepted.

In the meantime, I am also working on a few incremental advances to functionality with the Yahoo! Finance API (I haven't yet submitted a PR). I wanted to see if anyone else was working in this area, came across your PR, and had a few questions, if you don't mind. :)

In your updates to pandas.io.data.Options, line 531, you call get_quote_yahoo as:

price = float(get_quote_yahoo([self.symbol])['last'])

Any plans to change how you use the value returned by this function soon? I ask because, along with extending type checking for the input parameter symbols, I'd like to include some additional data in the returned DataFrame.

Are you working on expanding test coverage of the Yahoo! Finance API methods (found in /io/test/test_yahoo.py)? If so, I'd be interested to know, because I also have a few ideas.

Thanks!

Edit: Reworded question for clarity.

I have actually been thinking about creating a Yahoo class that could get any information downloadable from Yahoo using one of the char + int codes appended to the URL:

url = 'http://finance.yahoo.com/d/quotes.csv?s=%s&f=%s' % (self.symbol, code)

I was thinking the docstrings could tell the user which items they can request; we could then use a dictionary to look up the corresponding codes and return the data they asked for.
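The dictionary lookup described above could look roughly like this. The field-code table is a small illustrative subset recalled from the legacy quotes.csv code list and should be checked before use; QUOTE_CODES, lookup_code, and build_quote_url are hypothetical names. The sketch only builds the URL rather than fetching it.

```python
from typing import Dict, Iterable

# Illustrative subset of the legacy quotes.csv field codes (assumed; verify before use).
QUOTE_CODES: Dict[str, str] = {
    "name": "n",
    "last trade price": "l1",
    "change": "c1",
    "change in percent": "p2",
    "volume": "v",
    "market capitalization": "j1",
    "p/e ratio": "r",
    "earnings per share": "e",
}

BASE_URL = "http://finance.yahoo.com/d/quotes.csv?s=%s&f=%s"


def lookup_code(query: str) -> str:
    """Resolve a full or partial, case-insensitive field name to its code."""
    q = query.strip().lower()
    if q in QUOTE_CODES:  # exact names win over partial matches
        return QUOTE_CODES[q]
    matches = [name for name in QUOTE_CODES if q in name]
    if len(matches) != 1:
        raise KeyError("%r matches %d fields: %s" % (query, len(matches), sorted(matches)))
    return QUOTE_CODES[matches[0]]


def build_quote_url(symbols: Iterable[str], fields: Iterable[str]) -> str:
    """Join symbols with '+' and concatenate the requested codes, e.g. f=l1v."""
    codes = "".join(lookup_code(f) for f in fields)
    return BASE_URL % ("+".join(symbols), codes)
```

For example, build_quote_url(["AAPL", "GOOG"], ["price", "volume"]) resolves "price" to l1 by partial match and returns a URL ending in ?s=AAPL+GOOG&f=l1v; an ambiguous fragment such as "e" raises KeyError instead of guessing.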

In response to your comments:

1.) I am not sure I understand your question about line 531, but I just call get_quote_yahoo in that way because that is how the current API works.

2.) I would be happy to include some test coverage in test_yahoo. I haven't written nose test modules before, so if you have, or would like to learn how, feel free to do it. If not, let me know and I will dig around until I figure it out.
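As a starting point for offline test coverage, a nose-style module can exercise pure helpers that never touch the network. The sketch below tests a hypothetical options_attr_name helper that encodes the calls/puts versus callsMMYY/putsMMYY naming convention described in the PR; neither the helper nor these tests exist in pandas.

```python
import datetime as dt
from typing import Optional


def options_attr_name(kind: str, month: Optional[int] = None, year: Optional[int] = None,
                      today: Optional[dt.date] = None) -> str:
    """Hypothetical helper: 'calls'/'puts' for the current month, else 'callsMMYY'/'putsMMYY'."""
    if kind not in ("call", "put"):
        raise ValueError("kind must be 'call' or 'put'")
    base = kind + "s"
    today = today or dt.date.today()
    if month is None or year is None or (month, year) == (today.month, today.year):
        return base
    return "%s%02d%02d" % (base, month, year % 100)


# nose (and pytest) collect module-level functions whose names start with "test".
def test_current_month_uses_plain_name():
    today = dt.date(2013, 2, 11)
    assert options_attr_name("call", today=today) == "calls"
    assert options_attr_name("put", 2, 2013, today=today) == "puts"


def test_future_month_appends_mmyy():
    today = dt.date(2013, 2, 11)
    assert options_attr_name("call", 3, 2013, today=today) == "calls0313"
    assert options_attr_name("put", 1, 2014, today=today) == "puts0114"
```

Passing today explicitly keeps the tests deterministic regardless of when they run.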

What are you working on as far as the Yahoo! Finance API is concerned?

Hey @spencerlyon2, thanks for the reply. I am working on a few features and adding some convenience functions, namely to easily retrieve stock index components and to make it easier to retrieve data for, and build DataFrames from, multiple stocks. I'll back it all up with a few tests. I hope to push to my fork later today and open a PR after that, so you can see what I am doing.

While implementing these features, I also found that a lot more could be improved. In particular, the pandas.io.data namespace is getting a little too crowded; I feel it's just a matter of time before an elbow is thrown and someone gets hurt... :) Like you, I think we could provide more structure via a Yahoo class that houses all functionality for the Yahoo! Finance API. I am not working with options right now, but I like what you're doing and already see a few features we could streamline together. I asked whether you were planning to change your call to get_quote_yahoo because I wanted to expand on it a bit; apologies for the confusion. Also, I like your idea of docstrings tied to dicts: improving docstrings so users understand the available options is something I am all about.

Conclusion: we should team up and make this happen!

If you like, let's do one more iteration of hashing out our ideas, after which I'll submit a feature request that we can work on. Before we get too crazy with development, it would be useful to get feedback from the repo maintainers and the pandas community at large on how they envision development in this area. For instance, I know that there isn't much love for DataReader (see #2246 (comment)), and I understand that. I would also like to keep get_data_yahoo, which is now referenced in @wesm's Python for Data Analysis; this could easily be done by having it delegate to the Yahoo class method that does the same thing.

Let me know what you think!

@nehalecky It would be great to see what you have done.

I'm happy to work with you on enhancing this. I think the first step would be for me to see what you are doing. Then we can get a better idea of the design for the Yahoo class.

I will try to consolidate my ideas in a concise list here:

- Create a Yahoo class that can retrieve the following data:
  - Options data (just move the code from the current Options class into the new class, or have the Yahoo class point to Options)
  - Any of the items from this table (this is where I was planning on having the dictionary where users could enter strings, or partial strings, for what they want, and we could look up the code and retrieve the data)
  - Historical equity data for any number of tickers. Have this return a hierarchically indexed df with ticker and date as the index levels.
  - Potentially scrape the HTML from the Key Statistics page for a given stock to provide things that aren't accessible with any of the codes from the link above. For example, I needed dividend per share the other day and couldn't find it with any of the codes (I tried them all), so we could build some functions to parse the HTML for that data. This could easily be extended to anything else on that page.
- Better organization of returned df objects, especially when called with multiple tickers. My question here is: do we instantiate the Yahoo object as a singleton-esque object with useful methods for retrieving/organizing all this data for whatever tickers the user supplies when calling a method, or does each instance of the class retain a specific list of tickers that the data collection/organizing methods act on in a batch-like fashion? As I typed the ideas out, the second one sounded better to me, but we could take either approach. It sounds like you have been working on something like this already, so I'd be interested in hearing what you have to say about it.
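The Key Statistics scraping idea can be sketched with the standard library's html.parser, swapped in here for the lxml the PR uses so the example has no dependencies. It assumes the page lays statistics out as table rows with the label in the first cell and the value in the second, and that cells are explicitly closed; TableRowParser and key_stats are hypothetical names, and real label strings would have to be checked against the live page.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>, <span>) still belongs to the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def key_stats(html: str) -> Dict[str, str]:
    """Map each row's first cell (label, trailing ':' removed) to its second cell."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return {row[0].rstrip(":"): row[1] for row in parser.rows if len(row) >= 2}
```

Rows with fewer than two cells (section headings, for instance) are skipped, so key_stats(page_html).get(label) returns None for anything the page does not list.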

Anyway, those are some of my ideas. What would you add/take away from this?

Hey @spencerlyon2!

Sorry for the delay in the reply, busy weekend. I finally was able to get my code pushed and already submitted a PR here: #2795.

Most of the changes are pretty basic, but I think it's a start and we can use them later in the Yahoo class. I really like your ideas regarding this class, and you can see that I was already working toward some of the multiple-ticker functionality. Right now it returns a Panel, but we can change that to whatever we feel is best.
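Because pandas.Panel was later deprecated and then removed in pandas 1.0, a DataFrame with a ticker/date MultiIndex is the more durable return type for multi-ticker requests, and it fits the batch-style design where each instance keeps its own ticker list. A minimal sketch, assuming a per-ticker fetcher that returns a date-indexed DataFrame; TickerBatch and the fetcher argument are hypothetical, and the fetcher is injected so the network call can be mocked.

```python
from typing import Callable, Dict, Iterable

import pandas as pd

# A fetcher takes one ticker and returns a DataFrame indexed by date.
Fetcher = Callable[[str], pd.DataFrame]


class TickerBatch:
    """Hypothetical batch-style container: each instance keeps its own ticker list."""

    def __init__(self, tickers: Iterable[str], fetcher: Fetcher) -> None:
        self.tickers = list(tickers)
        self._fetcher = fetcher  # injected so the real download can be swapped or mocked

    def history(self) -> pd.DataFrame:
        frames: Dict[str, pd.DataFrame] = {t: self._fetcher(t) for t in self.tickers}
        # Dict keys become the outer index level; `names` labels both levels.
        combined = pd.concat(frames, names=["ticker", "date"])
        return combined.sort_index()
```

For a side-by-side view, history()["Close"].unstack("ticker") pivots the ticker level into columns, which covers much of what the Panel was used for.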

Also, I've been looking over a lot of the nice work being done on database connectivity in sql.py (see discussion #2482), and I really like how they're tackling many different database flavors. It could be useful to design our class with possible future integration of data sources beyond Yahoo (e.g., Bloomberg, TR) in mind, i.e., a more general approach.

I don't work with either data service directly, but I suspect the types of queries for financial applications are similar across the board, just as with databases. Someone could then simply instantiate the finance data class with whatever data flavor they need, and behind the scenes everything would be configured to talk to that service. This could be quite a bit of work, and perhaps not even useful; I hope others will comment on this.
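One way to structure the multi-flavor idea is a small registry keyed by flavor name, loosely in the spirit of the sql.py flavors mentioned above. Everything here (FinanceSource, register, connect, the quote_url method) is hypothetical; only a Yahoo backend is sketched, and other services would register their own classes the same way.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class FinanceSource(ABC):
    """Hypothetical common interface every data flavor would implement."""

    @abstractmethod
    def quote_url(self, symbol: str) -> str:
        ...


_FLAVORS: Dict[str, Type[FinanceSource]] = {}


def register(flavor: str) -> Callable[[Type[FinanceSource]], Type[FinanceSource]]:
    """Class decorator that records a backend under a case-insensitive flavor name."""
    def decorator(cls: Type[FinanceSource]) -> Type[FinanceSource]:
        _FLAVORS[flavor.lower()] = cls
        return cls
    return decorator


@register("yahoo")
class YahooSource(FinanceSource):
    def quote_url(self, symbol: str) -> str:
        # Same legacy quotes.csv pattern as earlier in the thread, requesting l1 only.
        return "http://finance.yahoo.com/d/quotes.csv?s=%s&f=l1" % symbol


def connect(flavor: str) -> FinanceSource:
    """Instantiate the backend registered for `flavor`, e.g. connect('yahoo')."""
    try:
        return _FLAVORS[flavor.lower()]()
    except KeyError:
        raise ValueError("unknown flavor %r; available: %s" % (flavor, sorted(_FLAVORS))) from None
```

The caller only ever sees FinanceSource, so adding a new service means writing one registered subclass rather than touching user-facing code.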

Let me know what you think of this, and when you get a moment, take a look at the code! Any suggestions are appreciated!

Thanks!

Merged. If you could make a PR to the release notes and/or documentation (including the "what's new" page), that would be great. Thanks!

I have written a quick summary in the "what's new" document. I'll attach to a pull request today.

The source is documented well enough that we could create class documentation directly from data.py. I know this is possible, but haven't done it before. I could spend some time looking it up, or if someone else already knows how to do this, and is up for it, I'd appreciate them doing it.

Pretty sure automethod-type docs are generated automatically for the entire package; look under doc/source/generated after running doc/make.py. But that's available in a different part of the documentation, intended more as a reference.

The rest of the docs have a tutorial style, with embedded usage examples, and are meant to be friendlier.

Sphinx is very copy-paste friendly; you can copy and adapt an existing piece of documentation.

Feel free to yell if you hit a snag.

@ghost mentioned this pull request on Feb 17, 2013