Incorporate Alpha Vantage as a source of historical equity prices · Issue #176 · joshuaulrich/quantmod
Closed · pteetor opened this issue on Jul 4, 2017 · 43 comments
Description
The Alpha Vantage web service provides real-time and historical equity data. The service is free, requiring only a one-time registration. It provides daily, weekly, and monthly history for both domestic and international markets, with up to 20 years of history. For daily data, adjusted close prices are available to account for dividends and splits. The service can also provide real-time intraday price bars at intervals of 1 minute or longer, for roughly the 10 most recent days. They have a clean, documented, public API that returns JSON-encoded data.
I propose incorporating Alpha Vantage into getSymbols as a source of historical data, much like Yahoo and Google.
Expected behavior
The caller would invoke the Alpha Vantage downloader through the usual mechanism; i.e., getSymbols("IBM", src="av").
The associated function, getSymbols.av, would have this signature:
getSymbols.av(Symbols, env, apikey,
return.class = "xts",
periodicity = "daily",
adjusted = FALSE,
interval = "1min",
outputsize = "compact",
...)
- Symbols - a character vector specifying the names of the symbols to be loaded
- env - where to create objects.
- apikey - the API key issued by Alpha Vantage when you registered (character)
- return.class - class of returned object
- periodicity - one of "daily", "weekly", "monthly", or "intraday"
- adjusted - if TRUE, include a column of closing prices adjusted for dividends and splits (daily data only)
- interval - one of "1min", "5min", "15min", "30min", or "60min" (intraday data only)
- outputsize - either "compact" or "full"
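For example, a typical call might look like this (a sketch based on the proposed signature, not a final API; "demo" stands in for a real API key):
getSymbols("IBM", src = "av", apikey = "demo",
           periodicity = "daily", adjusted = TRUE,
           outputsize = "full")
head(IBM)   # an xts object named IBM is created in the calling environment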
You must register with Alpha Vantage in order to download their data, but the one-time registration is fast and free. A.V. will assign an API key to you, a short string of alphanumeric characters (e.g., "FOO4").
You must provide your API key every time you call getSymbols.av. Alternatively, you can set it globally using:
setDefaults(getSymbols.av, apikey="yourKey")
The Alpha Vantage site provides daily, weekly, monthly, and intraday data. Use periodicity to select one.
Set adjusted=TRUE to include a column of closing prices adjusted for dividends and stock splits (available only for daily data).
The intraday data is provided as a sequence of OHLC bars. Use the interval argument to determine the "width" of the bars: 1 minute, 5 minutes, 15 minutes, etc.
By default Alpha Vantage returns the 100 most recent data points. Set outputsize="full" to obtain the entire available history (which requires more download time, of course). Alpha Vantage says they provide up to 20 years of daily data, or the most recent 10 to 15 days of intraday data.
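Internally, these arguments would presumably map onto the endpoint names documented by Alpha Vantage; a sketch of that dispatch (the TIME_SERIES_* names come from the A.V. documentation, everything else is illustrative):
av.function <- switch(periodicity,
    daily    = if (adjusted) "TIME_SERIES_DAILY_ADJUSTED" else "TIME_SERIES_DAILY",
    weekly   = "TIME_SERIES_WEEKLY",
    monthly  = "TIME_SERIES_MONTHLY",
    intraday = "TIME_SERIES_INTRADAY")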
Issues and Limitations
The web service is free, but registration is required. Alpha Vantage provides an API key at registration, and that key is needed to download data.
The A.V. API returns its data via JSON. In my experience, pulling large historical datasets can be slower than Google or Yahoo (but not cripplingly slow). At this point, I don't know whether the bottleneck is the A.V. server itself or the encoding/decoding overhead of the (inefficient) JSON format.
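For the record, the round trip is simple enough to sketch directly with jsonlite; the query parameters below (function, symbol, outputsize, apikey) are the ones documented by Alpha Vantage, and "demo" is a placeholder key:
library(jsonlite)
library(xts)
url <- paste0("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY",
              "&symbol=IBM&outputsize=compact&apikey=demo")
raw  <- fromJSON(url)                        # named list: "Meta Data" and "Time Series (Daily)"
bars <- raw[["Time Series (Daily)"]]
ohlcv <- do.call(rbind, lapply(bars, function(b) as.numeric(unlist(b))))
colnames(ohlcv) <- c("Open", "High", "Low", "Close", "Volume")
IBM <- xts(ohlcv, order.by = as.Date(names(bars)))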
Beyond historical data, the A.V. API can also provide sector performance data and technical indicators (e.g., moving averages). I am not planning to provide access to those data elements. The sector performance data is kind of orthogonal to quantmod's scope, and the technical indicators are already available via the TTR package.
I believe the intraday data is available only for domestic stocks, not international; but I'm not sure.
Great idea!
Please use the current getSymbols "methods" as a template for getSymbols.av(). There is a getSymbols.skeleton(), but I'm not sure if it's up-to-date. Should we consider creating one (or more?) "alias" functions? Maybe getSymbols.alphavantage()?
Great suggestion to use setDefaults() to store the user's API key.
quantmod already uses jsonlite for a few things (e.g. getOptionChain()), so please take a look at that code to see how it uses jsonlite to manipulate JSON data.
This is great! Thanks.
Thanks, Josh and Jeff, for the encouragement and suggestions.
Yes, getSymbols.skeleton is incomplete, I now know. I studied getSymbols.yahoo and incorporated the logic related to getSymbolLookup and symbol-level overrides.
I checked getOptionChain, and I believe I'm using jsonlite in a way consistent with it, including the call to requireNamespace.
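(For context, the soft-dependency pattern in question looks roughly like this; a sketch, not the exact code in the branch, and query.url is a hypothetical name for the assembled request URL:)
if (!requireNamespace("jsonlite", quietly = TRUE))
    stop("package 'jsonlite' is required to download Alpha Vantage data")
response <- jsonlite::fromJSON(query.url)   # query.url: hypothetical, built from the user's arguments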
Finally, I did create an alias, getSymbols.alphavantage.
You can see the updated code in the feature branch of my fork.
It would be great if we could have adjusted prices from a source other than Yahoo, which is not reliable. It would also be great to have a function with the same features as getSymbols that uses a different data source.
I tried to use getSymbols.alphavantage in place of getSymbols and it did not work. Is there a way to make it fully interchangeable?
Thank you
getSymbols.av and its expected behavior would be ideal. I do not have the skill set to do it myself. Just a user! My apologies.
@anozari, getSymbols is a wrapper which calls methods for different data sources. By default, it has historically called Yahoo.
Of course, the breadth, history, and quality of yahoo data are now much worse.
For some things, src='google' will provide a near-equivalent of the old yahoo data. For other things, no free source exists.
The alphavantage code is brand new, and probably needs more testing to make it robust. Saying 'it did not work' does not provide enough information for anyone to help you diagnose your problem.
Understood. Thank you.
I was not complaining. I naively used getSymbols.alphavantage instead of getSymbols and it gave me an error. I probably should not have done that.
Thank you for your suggestion.
I used alphavantage in some R code today. I found no issues to report. It looks very promising. Thank you.
Brian had a very valid point about alphavantage. I have been using it for a week and have found serious quality issues with their data. I am sure that over time they will get better and work the kinks out, but for now their data is not robust enough.
@anozari Thanks for the testing and feedback. Can you summarize what problems you found with the data? I would like to replicate the problems if possible and study them.
They are working very hard to fix their problems, but their data quality does not appear to be robust. For a few days they had issues with SPY: they had prices for some holidays, including Christmas, and they just fixed it. They also seem to have problems with some mutual funds. For example, if you look at PTTRX, you will see data on holidays, like April 30, 2017. These are simple things that I noticed while comparing long series against each other; the dates are out of sync. There might be other issues. I was surprised that they did not know about these things, so their quality assurance may not be very robust. I am just guessing.
I think their processes are just new and over time things will only get better. They will get the kinks out. Good folks!
Thanks for the detailed description of the problems. They are pretty serious. We can keep an eye on AlphaVantage and watch for improvements.
Thank you @pteetor for proposing the quantmod-alphavantage integration!
We are members of the product development team at Alpha Vantage Inc., and would like to share the following updates with the community:
- All the issues mentioned above by Mr. Nozari (anozari) were resolved shortly after they surfaced, and we have confirmed with the original poster on this matter.
- Alpha Vantage now has a dedicated quality assurance algorithm that operates on a 24/7 basis.
- In addition to JSON, we have also enabled support for the CSV (comma-separated-value) format (here is an example). The performance benchmarks for both the JSON and CSV formats are now below 500 milliseconds between the 25th and 75th percentiles.
Many thanks again - we appreciate the continued feedback and support from the community.
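(For R users, the CSV endpoint drops straight into read.csv(); a sketch using the documented datatype=csv parameter, with "demo" as a placeholder key:)
csv.url <- paste0("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY",
                  "&symbol=IBM&datatype=csv&apikey=demo")
bars <- read.csv(csv.url, stringsAsFactors = FALSE)
head(bars)   # expect columns like timestamp, open, high, low, close, volume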
Excellent post, @pteetor! Apparently there are a number of libraries built on top of Alpha Vantage on GitHub. Below is a review article on Alpha Vantage (written by a Python developer) that may provide some additional context on the Alpha Vantage API.
Source: RomelTorres/alpha_vantage#13
Alpha Vantage API Review from a Python Developer
The Experience
I was one of the "yahoo refugees" who stumbled upon Alpha Vantage via Google search. My first interactions with their website could be summarized by the good-old term WYSIWYG (what-you-see-is-what-you-get). Their documentation promised three things: (1) time series data with various resolutions, (2) over 50 technical signals, (3) sector data, and they kept their promise by providing demo URLs for all the said API functions.
They promised free API keys, and they also delivered. They promised "no promotional materials to your inbox," and indeed the only email I got from them so far was the announcement email for their new CSV feature.
This being said, there are a couple areas they could optimize upon.
- Be more specific about their business model. Being "100% free" is good, but it can also be a bit scary, especially for a yahoo refugee like me.
- Add multi-symbol support so that one can query multiple stocks with a single API call.
- Their FAQ recommends "~200 API calls per minute." It would be great if they can set a hard limit on the call volume to prevent client-side abuses in the form of ultra-high-frequency requests.
The Data
Of the thousands of US-based equities I have analyzed so far, their historical data and technical indicators seem to match other reputable data sources. Their intraday data is realtime up to the current minute, which is fine for my research purposes but may not satisfy users who want to beat the market with millisecond precision. Perhaps a premium feature for this down the road?
Their JSON output is easily readable and python-parsable. For the daily time series, however, I understand that their most recent data point is the cumulative information of the current trading day (updated realtime), but why is the first timestamp in YYYY-MM-DD HH:MM:SS format while all the others are in the normal YYYY-MM-DD format typical of the EOD data?
"Meta Data": {
"1. Information": "Daily Prices (open, high, low, close) and Volumes",
"2. Symbol": "MSFT",
"3. Last Refreshed": "2017-08-18 16:00:00",
"4. Output Size": "Compact",
"5. Time Zone": "US/Eastern"
},
"Time Series (Daily)": {
"2017-08-18 16:00:00": {
"1. open": "72.2700",
"2. high": "72.8400",
"3. low": "71.9300",
"4. close": "72.4900",
"5. volume": "18215276"
},
"2017-08-17": {
"1. open": "73.5800",
"2. high": "73.8700",
"3. low": "72.4000",
"4. close": "72.4000",
"5. volume": "21834250"
},
I would love to see a consistent YYYY-MM-DD format across all the timestamps. The "last refreshed" timestamp can be specified in Meta Data instead:
"Meta Data": {
"1. Information": "Daily Prices (open, high, low, close) and Volumes",
"2. Symbol": "MSFT",
"3. Last Refreshed": "2017-08-18 16:00:00",
"4. Output Size": "Compact",
"5. Time Zone": "US/Eastern"
},
"Time Series (Daily)": {
"2017-08-18": {
"1. open": "72.2700",
"2. high": "72.8400",
"3. low": "71.9300",
"4. close": "72.4900",
"5. volume": "18215276"
},
"2017-08-17": {
"1. open": "73.5800",
"2. high": "73.8700",
"3. low": "72.4000",
"4. close": "72.4000",
"5. volume": "21834250"
},
In addition to the data presentation aspects, below are a couple of other data-related proposals:
- Expand CSV support to all API functions. CSV is currently enabled only for the time series APIs.
- Make error messages more informative for debugging purposes.
In Summary
It is always a pleasure to have an API service that is well documented, platform/language-agnostic, and easy to integrate. The fact that we have several third-party libraries built on top of Alpha Vantage on GitHub is in a sense a testament to its developer-friendly nature. While there is still room for them to become a better version of themselves, I hope they thrive and stay true to the description on their home page - "driven by rigorous research, cutting edge technology, and a disciplined focus on democratizing access to data."
I was just about to cruft up a version of this when I stumbled upon this fork. Any plans to merge this into main anytime soon? Anything I can do to help?
@ebs238 I do plan to merge it. @pteetor has done an excellent job. I'm the roadblock.
Thanks for the great work, gents. I plan on doing data comparisons with my existing database over the next week or two and will post my findings here.
Also, I've asked them to add an API to replace yahoo's getQuote. That's my last remaining dependency on yahoo.
This all sounds good and heading in the right direction. I chuckle at Josh tagging himself as a "roadblock". The guy is prolific. As far as I'm concerned, the rest of the world is the roadblock, trying to catch up with *him*. I download AV's data nightly for a dozen ETFs. I am seeing a small data problem: duplicated last row. Haven't reported this to the vendor yet because I haven't created a minimal reproducible example yet! Could be my code. Could be b/c I download shortly after midnight (Eastern time), when the calendar is switching over. We'll see.
joshuaulrich added a commit that referenced this issue
Since Paul's initial version, Alpha Vantage has added the option to download the data via CSV. Add the data.type argument so users can specify which source they prefer.
Note that the JSON API provides timezone information that is not included in the CSV data. That's a reason to prefer the JSON API.
See #176.
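(From the caller's side, that would look something like the following; a sketch, assuming the apikey argument proposed earlier and the data.type argument added by this commit:)
getSymbols("IBM", src = "av", apikey = "demo", data.type = "csv")   # CSV transport
getSymbols("IBM", src = "av", apikey = "demo")                      # JSON (default), keeps time zone info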
Has anyone had any interaction with Alpha Vantage other than the one comment here? They offer support by email on their web site. I wrote them a couple of days ago, but haven't had a reply.
My question was about support for the TSX. By trial and error, requests like TSX:POT will get intraday quotes for regular stocks. I haven't found how to get current quotes for ETFs or REITs.
Hi, I'm a refugee from the Yahoo Finance API, like many others. I'm interested in Alpha Vantage; it seems promising. I have a question: are EU stocks (the ETFplus market) covered? I am referring to the Italian stock market, the Milan Stock Exchange. I searched but found nothing. Thanks in advance for any info.
- I have no idea, ask AlphaVantage. quantmod is not AlphaVantage.
- Others have had some luck using the exchange codes published by NASDAQ
- This is not a general help forum. It is for reporting bugs or making feature requests to quantmod.
@braverock Thank you. I will surely ask AV. Sorry for the off-topic question.
Why can't I apply for an API key?
@lishixin7, Alpha Vantage is not quantmod. You need to get the API key from them.
Has anyone seen a way to query a date range? It appears to be either 100 rows or everything. I was able to do this with the Yahoo query.
Correct. The underlying API provided by AlphaVantage provides only two choices ("compact" or "full") and does not support partial queries (unlike the Yahoo API). So I did not code the AlphaVantage downloader to accept start and end dates. That did not strike me as problematic because the returned xts object supports so many methods for subsetting by date.
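(For example, after a "full" download you can window the series yourself with standard xts date subsetting; a sketch assuming the daily history is already loaded as IBM:)
IBM["2015/2016"]                                    # calendar years 2015 through 2016
IBM["2017-01-01/"]                                  # from 2017-01-01 onward
window(IBM, start = as.Date("2016-06-01"), end = as.Date("2016-06-30"))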
If you'd enjoy the benefits of having the downloader window the data for you, please feel free to contribute that feature. Thank you!
This is the second link on google when you type in alpha vantage and data quality was discussed here so I thought I would post. For OTC stocks their data can be a bit wonkish. Ex: FMBM has a volume of -23 on 2017-08-24. There are a couple other OTC stocks I've seen with negative or humongous adjusted closing prices.
FTR, I don't remember seeing any non-OTC stocks from alpha vantage where the data was rough like this.
I know this isn't the right place to post about this, but again, it's one of the first places people will land when they google alpha vantage. Some more odd numbers:
Ticker | Date | Exchange | Open | High | Low | Close | Volume | Adjusted | High-Low |
---|---|---|---|---|---|---|---|---|---|
CXPO | 2013-01-02 | NASDAQ | 2.740 | 581.96 | 2.740 | 581.960 | 0 | 581.960 | 579.220 |
ABK | 2015-08-21 | NASDAQ | 366.510 | 377.50 | 1.640 | 1.640 | 0 | 1.640 | 375.860 |
CXPO | 2011-09-06 | NASDAQ | 364.860 | 364.86 | 2.550 | 2.590 | 140522 | 2.590 | 362.310 |
ABK | 2015-03-20 | NASDAQ | 340.000 | 342.01 | 1.550 | 1.550 | 0 | 1.550 | 340.460 |
ABK | 2015-07-06 | NASDAQ | 331.500 | 331.51 | 0.365 | 0.365 | 128408 | 0.365 | 331.145 |
ABK | 2015-08-07 | NASDAQ | 331.800 | 331.80 | 1.640 | 1.640 | 0 | 1.640 | 330.160 |
ABK | 2014-09-02 | NASDAQ | 0.425 | 285.02 | 0.420 | 277.000 | 295963 | 277.000 | 284.600 |
ABK | 2014-09-05 | NASDAQ | 285.020 | 285.02 | 1.790 | 1.790 | 0 | 1.790 | 283.230 |
I tried to email them about discrepancies but they have not gotten back to me
Most of their data is fine, but when it's not, you're on your own to both detect and fix. Let's just say that their data quality is questionable and they are unresponsive and leave it at that.
some examples for the record (these have all been raised as issues to them with no response):
- historic data on INVA
- handling of the APTV/DLPH transaction, which makes me suspect all their ticker-change and split processing
- historic data on XBKS
as with many things, you get what you pay for on this one.
Can we change the time zone to another one?
Which time zone? The time zone of the data? We use the time zone given by the vendor, Alpha Vantage. It wouldn't make sense for us to override the reported time zone. You can change the time zone yourself by using R's facilities for working with dates and times.
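(For example, for an intraday series with a POSIXct index, xts can re-express the index in another time zone; a sketch, where x is a hypothetical intraday object downloaded earlier:)
library(xts)
tzone(x) <- "Europe/London"   # same instants, displayed in another time zone
head(index(x))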
I was loving what I got from Alpha Vantage (I am doing some historical analysis), but I found that with CTXS all the values are wrong from 1999 to 2017-01-31, when the series FINALLY corrects itself:
2017-02-01,74.2700,75.0000,70.2400,71.2400,70.2453,5760300,0.0000,1.2558
2017-01-31,91.0500,91.2600,89.7881,91.1900,71.6012,2795000,0.0000,1.0000
Yahoo has:
2017-02-01,74.269997,75.000000,70.239998,71.239998,70.771149,5760300
2017-01-31,72.503586,72.670807,71.500237,72.615067,72.137161,2225700
That's enough proof to discourage me from using this service :'( as it is years and years of wrong data!!!
^ Check out the tiingo API; I find their data pretty consistent.
Thanks Steve - Similar deviation for Tiingo... (A bit harder to read!)
{"date":"2017-01-31T00:00:00.000Z","close":91.19,"high":91.26,"low":89.79,"open":91.05,"volume":2225716,"adjClose":56.8009798175,"adjHigh":56.8445818418,"adjLow":55.9289393335,"adjOpen":56.7137757691,"adjVolume":3523697,"divCash":0.0,"splitFactor":1.0},{"date":"2017-02-01T00:00:00.000Z","close":71.24,"high":75.0,"low":70.24,"open":74.27,"volume":57603,"adjClose":70.2524423293,"adjHigh":73.9603196898,"adjLow":69.2663047335,"adjOpen":73.2404392448,"adjVolume":57603,"divCash":18.57158,"splitFactor":1.2558},
$18 dividend? I'd remember that one!!!
Looks like a split issue, and CTXS has had many splits. Are you looking for adjusted or unadjusted data?
> getSplits("CTXS")
CTXS.spl
1996-06-05 0.5000000
1998-02-23 0.6666667
1999-03-26 0.5000000
2000-02-17 0.5000000
2017-02-01 0.7963051
You can specify which you want with the relevant parameter: getSymbols("CTXS", adjust = F, src="tiingo") or getSymbols("CTXS", adjusted = F, src="av").
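(One quick way to see an adjustment factor like that 1.2558 is to compare the adjusted and unadjusted closes; a sketch, assuming a daily download that includes both columns:)
CTXS <- getSymbols("CTXS", src = "av", adjusted = TRUE, auto.assign = FALSE, apikey = "demo")
plot(Ad(CTXS) / Cl(CTXS))   # jumps in this ratio line up with splits and special distributions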
Also:
- yahoo has many issues and is pretty far from a gold standard IMO
- tiingo support is fantastic. They are very committed to data quality and I have consistently had excellent interactions with them.
Not sure if @tiingo is checked too often, but yeah, if you email tiingo they are pretty quick to figure out a fix for this sort of thing. Had some very weird penny-stock data they managed to fix up a while back.
@ethanbsmith I think you nailed it! https://www.splithistory.com/ctxs/: "This was a 12558 for 10000 split, meaning for each 10000 shares of CTXS owned pre-split, the shareholder now owned 12558 shares. For example, a 12000 share position pre-split, became a 15069.6 share position following the split." But since I am in Australia and it's way past my bedtime, I'll run an analysis tomorrow, but the maths adds up. By the way, this place looks very good for API sources: www.worldtradingdata.com.
Vendor data quality issues are kind of off topic for this ticket. To try to wrap up the digression, my experience is that all vendors occasionally have problems. Older data is of correspondingly lower quality (fewer eyeballs). Cheaper data is of correspondingly lower quality too (less/zero leverage on your vendor to fix it).
Basically, you get what you pay for. If issues are found in paid data sources, you need to report them and work with the vendor to fix them. Data quality is always an issue. "Clean" data sources are extremely expensive, with services like CRSP/Compustat or Reuters or Bloomberg or Factset costing hundreds or thousands of dollars per month. Those costs come with corresponding increases in base data quality, and with dedicated tech support who can usually resolve issues when they inevitably arise.
As for adjusted data, it is useful for getting adjusted returns. Adjusted prices have all sorts of issues when considering analysis, and those issues compound the more splits and adjustments there are going back through time.
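(For the common case of computing returns, the adjusted close is the series to use; a sketch, assuming a daily series with an adjusted-close column is already loaded as CTXS:)
rets <- dailyReturn(Ad(CTXS), type = "log")   # split- and dividend-adjusted daily log returns
head(rets)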
Hi @ozchamo. Normally we try not to speak too much in GitHub issues, but you can always e-mail our support team at support@tiingo.com.
In the case of Citrix (CTXS), it appears as if we received conflicting information. The Feb 2017 split/distribution has to do with the spin-off of GoTo. Generally we reflect spin-offs as special stock dividends when possible. This is because a split should mean that the entity has not changed, and that the share price is now just the old price divided by the split factor. In the case of a spin-off, a split should not be the standard treatment, because the shares now mean something fundamentally different than they did pre-spin-off. A stock dividend is a more accurate way to describe the spin-off, especially if we have to condense it to a table.
In the case of Citrix, it appears we were passed a dissemination message of a split. In recent corporate events we check for this (split disseminations instead of stock dividends), which helps prevent the issue; that is why it happens only in rare, specific situations like this one.
While all vendors have an error term, I want to make sure all users who report errors are heard. When we hear of an error we do extensive checks to ensure it does not appear elsewhere, and we add a check if possible. Given the weird events that often happen, sometimes fully automated checks are not possible. Nonetheless, if you ever reach out to the support team, we take all error reports seriously. I modeled Tiingo to be the data firm that I myself always wanted. Anyway, you should see CTXS corrected now - thanks for the report.
And @braverock - generally I agree with you (that you get what you pay for), but we are working hard to change that. We want users to get far more than what they pay for - and many people in this thread have been instrumental in that.
Thank you all. Much appreciated. And I apologise to both Tiingo and Alpha Vantage: they do reflect the adjustment through the "splitFactor":1.2558, and they both agree on that amount. Good work!
Hi, how can I get 5 years of a time series?
For instance, I use this to get ExxonMobil (XOM) data:
from alpha_vantage.timeseries import TimeSeries
from alpha_vantage.techindicators import TechIndicators

key = "MyAPIKey"
ts = TimeSeries(key, output_format='pandas')
ti = TechIndicators(key)
xom_data, xom_meta_data = ts.get_daily(symbol='XOM')
How can I get 5 years of data instead of the default last 100 days?
Hi, please ask your question on Stackoverflow or a similar site. This issue tracker is only for the quantmod R package. It's not a general purpose support forum for the services quantmod provides access to.