DOC: example of DataFrame export to HDF5 and import into R · Issue #9636 · pandas-dev/pandas
When searching the web, I did not find any working example of transferring data from pandas to R through HDF5 files, even though the pandas documentation mentions that the HDF5 format it uses "can easily be imported into R using the rhdf5 library". The pandas export works as expected, and I inspected the resulting file layout with the HDF Group's viewer (HDFView).
After some experimentation I have a working sample for DataFrame export from Python/pandas and import into R, which could be added to the documentation to help future users:
Example of HDF5 export for R
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame({"first": np.random.rand(100),
                   "second": np.random.rand(100),
                   "class": np.random.randint(0, 2, (100,))},
                  index=range(100))

print(df.head())

store = pd.HDFStore("transfer.hdf5", "w", complib=str("zlib"), complevel=5)
store.put("dataframe", df, data_columns=df.columns)
store.close()
Output:
   class     first    second
0      0  0.417022  0.326645
1      0  0.720324  0.527058
2      1  0.000114  0.885942
3      1  0.302333  0.357270
4      1  0.146756  0.908535
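Before switching to R, it can be useful to confirm from Python that the file round-trips and to see which nodes were written. The following is a minimal sketch, not part of the original sample. It writes a smaller frame to a temporary directory (an assumption, made so the snippet does not overwrite transfer.hdf5), passes format="fixed" explicitly (the default for put()), and lists the child nodes through the PyTables group attribute `_v_children`. The file name roundtrip.hdf5 is made up for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(
    {
        "first": np.random.rand(10),
        "second": np.random.rand(10),
        "class": np.random.randint(0, 2, (10,)),
    },
    index=range(10),
)

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical scratch file, chosen so transfer.hdf5 is left untouched.
    path = os.path.join(tmpdir, "roundtrip.hdf5")

    # Fixed format is spelled out because the R loader looks for the
    # *_values / *_items nodes found in files written this way.
    with pd.HDFStore(path, "w", complib="zlib", complevel=5) as store:
        store.put("dataframe", df, format="fixed")

    # Read the frame back through pandas.
    restored = pd.read_hdf(path, "dataframe")

    # Names of the nodes stored directly below /dataframe.
    with pd.HDFStore(path, "r") as store:
        node_names = sorted(store.get_node("dataframe")._v_children.keys())

# Compare column labels and one column of values with the original frame.
print(list(restored.columns) == list(df.columns))
print(np.allclose(restored["first"], df["first"]))
print(node_names)
```

If the printed node names include entries ending in `_values` and `_items`, the file has the layout that the R function expects.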
# Load values and column names for all datasets from corresponding nodes and
# insert them into one data.frame object.

library(rhdf5)

loadhdf5data <- function(h5File) {

  listing <- h5ls(h5File)
  # Find all data nodes, values are stored in *_values and corresponding column
  # titles in *_items
  data_nodes <- grep("_values", listing$name)
  name_nodes <- grep("_items", listing$name)

  data_paths <- paste(listing$group[data_nodes], listing$name[data_nodes], sep = "/")
  name_paths <- paste(listing$group[name_nodes], listing$name[name_nodes], sep = "/")

  columns <- list()
  for (idx in seq_along(data_paths)) {
    data <- data.frame(t(h5read(h5File, data_paths[idx])))
    names <- t(h5read(h5File, name_paths[idx]))
    entry <- data.frame(data)
    colnames(entry) <- names
    columns <- append(columns, entry)
  }

  data <- data.frame(columns)

  return(data)
}
Now you can import the DataFrame:
data = loadhdf5data("transfer.hdf5")
head(data)
         first    second class
1 0.4170220047 0.3266449     0
2 0.7203244934 0.5270581     0
3 0.0001143748 0.8859421     1
4 0.3023325726 0.3572698     1
5 0.1467558908 0.9085352     1
6 0.0923385948 0.6233601     1
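The node layout that the R function relies on can also be explored from Python. The sketch below mirrors the same steps: find the `*_values` nodes, read the matching `*_items` node for the column titles, and combine the blocks into one frame. `load_fixed_blocks` is a hypothetical helper name, the tiny frame and the file name blocks.hdf5 are made up for illustration, and `_v_leaves` is the PyTables group attribute holding a group's leaf nodes. It assumes fixed-format output with purely numeric, two-dimensional blocks, as in the sample above.

```python
import os
import tempfile

import pandas as pd


def load_fixed_blocks(path, key):
    """Rebuild a frame from the raw *_values / *_items nodes below `key`."""
    pieces = []
    with pd.HDFStore(path, "r") as store:
        leaves = store.get_node(key)._v_leaves
        for name in sorted(leaves.keys()):
            if not name.endswith("_values"):
                continue
            values = leaves[name].read()
            items = leaves[name[: -len("_values")] + "_items"].read()
            # Column titles may come back as byte strings; decode those.
            columns = [
                item.decode("utf-8") if isinstance(item, bytes) else str(item)
                for item in items
            ]
            # Orient the block as rows x columns if it is stored the other way.
            if values.shape[1] != len(columns):
                values = values.T
            pieces.append(pd.DataFrame(values, columns=columns))
    # Blocks are joined side by side, so the column order may differ from
    # the original frame (as it does in the R output above).
    return pd.concat(pieces, axis=1)


df = pd.DataFrame(
    {"first": [0.1, 0.2, 0.3], "second": [1.5, 2.5, 3.5], "class": [0, 1, 1]}
)

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical scratch file used only for this illustration.
    path = os.path.join(tmpdir, "blocks.hdf5")
    with pd.HDFStore(path, "w") as store:
        store.put("dataframe", df, format="fixed")
    rebuilt = load_fixed_blocks(path, "dataframe")

print(rebuilt[["first", "second", "class"]])
```

For everyday use within Python, `pd.read_hdf` is the simpler route. The point of reading the raw nodes here is only to show what a reader in another language has to reassemble.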
I hope this helps someone. :-)