Merge branch 'cpcloud_read_html' · pandas-dev/pandas@6518c79

y-p

committed

Merge branch 'cpcloud_read_html'

* cpcloud_read_html:
  DOC: update RELEASE.rst
  ENH: add ability to read html tables directly into DataFrames

Lines changed: 3 additions & 1 deletion

Original file line number	Diff line number	Diff line change
@@ -30,7 +30,8 @@ pandas 0.11.1
30	30
31	31	New features
32	32
33		- -
	33	+ - pd.read_html() can now parse HTML strings, files or URLs and return DataFrames,
	34	+ courtesy of @cpcloud. (GH3477_)
34	35
35	36	Improvements to existing features
36	37
@@ -88,6 +89,7 @@ pandas 0.11.1
88	89	.. _GH3437: https://github.com/pydata/pandas/issues/3437
89	90	.. _GH3455: https://github.com/pydata/pandas/issues/3455
90	91	.. _GH3457: https://github.com/pydata/pandas/issues/3457
	92	+.. _GH3477: https://github.com/pydata/pandas/issues/3477
91	93	.. _GH3461: https://github.com/pydata/pandas/issues/3461
92	94	.. _GH3468: https://github.com/pydata/pandas/issues/3468
93	95	.. _GH3448: https://github.com/pydata/pandas/issues/3448

Lines changed: 2 additions & 0 deletions

Original file line number	Diff line number	Diff line change
@@ -75,6 +75,8 @@ if ( ! $VENV_FILE_AVAILABLE ); then
75	75	pip install $PIP_ARGS xlrd>=0.9.0
76	76	pip install $PIP_ARGS 'http://downloads.sourceforge.net/project/pytseries/scikits.timeseries/0.91.3/scikits.timeseries-0.91.3.tar.gz?r='
77	77	pip install $PIP_ARGS patsy
	78	+ pip install $PIP_ARGS lxml
	79	+ pip install $PIP_ARGS beautifulsoup4
78	80
79	81	# fool statsmodels into thinking pandas was already installed
80	82	# so it won't refuse to install itself. We want it in the zipped venv

Lines changed: 7 additions & 0 deletions

Original file line number	Diff line number	Diff line change
@@ -50,6 +50,13 @@ File IO
50	50	read_csv
51	51	ExcelFile.parse
52	52
	53	+.. currentmodule:: pandas.io.html
	54	+
	55	+.. autosummary::
	56	+:toctree: generated/
	57	+
	58	+ read_html
	59	+
53	60	HDFStore: PyTables (HDF5)
54	61	~~~~~~~~~~~~~~~~~~~~~~~~~
55	62	.. currentmodule:: pandas.io.pytables

Lines changed: 6 additions & 0 deletions

Original file line number	Diff line number	Diff line change
@@ -99,6 +99,12 @@ Optional Dependencies
99	99	* `openpyxl <http://packages.python.org/openpyxl/>`__, `xlrd/xlwt <http://www.python-excel.org/>`__
100	100	* openpyxl version 1.6.1 or higher
101	101	* Needed for Excel I/O
	102	+ * `lxml <http://lxml.de>`__, or `Beautiful Soup 4 <http://www.crummy.com/software/BeautifulSoup>`__: for reading HTML tables
	103	+ * The main difference between lxml and Beautiful Soup 4 is speed (lxml
	104	+ is faster); however, Beautiful Soup sometimes returns results closer to
	105	+ what you might intuitively expect. Both backends are implemented, so try
	106	+ both to see which one you prefer. They should return very similar results.
	107	+ * Note that lxml requires Cython to build successfully
102	108
103	109	.. note::
104	110
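
As a rough illustration of the "try them both" advice in the install notes above, the sketch below runs the same HTML through each backend and records either the first parsed table or the ImportError raised when a backend is not installed. The `flavor` argument and its values `"lxml"` and `"bs4"` follow the current pandas documentation and are an assumption here, since this diff does not show the function signature; `HTML` and `try_flavors` are hypothetical names used only for this example.

```python
import io

import pandas as pd

# Hypothetical sample input: one table with a header row and one data row.
HTML = (
    "<table>"
    "<tr><th>a</th><th>b</th></tr>"
    "<tr><td>1</td><td>2</td></tr>"
    "</table>"
)


def try_flavors(html_text, flavors=("lxml", "bs4")):
    """Parse html_text once per backend and collect the outcome.

    Each value is either the first DataFrame found or the ImportError
    raised because that backend (or a library it needs) is missing.
    """
    results = {}
    for flavor in flavors:
        try:
            # Assumption: read_html accepts a file-like object and a
            # `flavor` keyword, as in current pandas releases.
            tables = pd.read_html(io.StringIO(html_text), flavor=flavor)
        except ImportError as exc:
            results[flavor] = exc
        else:
            results[flavor] = tables[0]
    return results


if __name__ == "__main__":
    for name, outcome in try_flavors(HTML).items():
        print(name)
        print(outcome)
```

Comparing the two outcomes on a page you actually care about is probably the quickest way to decide which backend to keep installed.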

Lines changed: 3 additions & 0 deletions

Original file line number	Diff line number	Diff line change
@@ -12,9 +12,12 @@ API changes
12	12
13	13	Enhancements
14	14	~~~~~~~~~~~~
	15	+ - pd.read_html() can now parse HTML strings, files or URLs and return DataFrames,
	16	+ courtesy of @cpcloud. (GH3477_)
15	17
16	18	See the `full release notes
17	19	<https://github.com/pydata/pandas/blob/master/RELEASE.rst>`__ or issue tracker
18	20	on GitHub for a complete list.
19	21
20	22	.. _GH2437: https://github.com/pydata/pandas/issues/2437
	23	+.. _GH3477: https://github.com/pydata/pandas/issues/3477

Lines changed: 1 addition & 0 deletions

Original file line number	Diff line number	Diff line change
@@ -33,6 +33,7 @@
33	33	read_fwf, to_clipboard, ExcelFile,
34	34	ExcelWriter)
35	35	from pandas.io.pytables import HDFStore, Term, get_store, read_hdf
	36	+from pandas.io.html import read_html
36	37	from pandas.util.testing import debug
37	38
38	39	from pandas.tools.describe import value_range
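
The import added in this last hunk is what exposes `read_html` at the top level of the package. A minimal sketch of what that means in practice, assuming a current pandas install where `pandas.io.html` is still the module that defines the function; `HTML`, `same_function` and `tables` are hypothetical names, and the parse is guarded because lxml and Beautiful Soup 4 are optional dependencies.

```python
import io

import pandas as pd
from pandas.io.html import read_html

# The top-level name and the module-level name refer to one function object.
same_function = pd.read_html is read_html

# Hypothetical sample input: one table with a header row and one data row.
HTML = (
    "<table>"
    "<tr><th>name</th><th>value</th></tr>"
    "<tr><td>x</td><td>1</td></tr>"
    "</table>"
)

try:
    # read_html returns a list with one DataFrame per table it finds.
    tables = pd.read_html(io.StringIO(HTML))
except ImportError:
    # No HTML parsing backend is installed; leave the list empty.
    tables = []

if __name__ == "__main__":
    print(same_function)
    for table in tables:
        print(table)
```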