R and pandas data frames (rpy2 3.5.13 documentation)
from functools import partial
from rpy2.ipython import html
html.html_rdataframe = partial(html.html_rdataframe, table_class="docutils")
R data.frame and pandas.DataFrame objects share many conceptual similarities, and pandas named its DataFrame class after the R objects.
In a nutshell, both are sequences of vectors (or arrays) of consistent length or size for the first dimension (the “number of rows”). If you come from the database world, another way to look at them is as column-oriented data tables, or as a data table API.
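As a rough illustration of that shared model, the sketch below builds a tiny column-oriented table in plain Python. The ColumnTable class and its row method are hypothetical names that belong to neither pandas nor rpy2; they only demonstrate the invariant that every named column has the same length.

```python
from typing import Dict, Sequence


class ColumnTable:
    """Hypothetical minimal column-oriented table (not part of pandas or rpy2)."""

    def __init__(self, columns: Dict[str, Sequence]):
        lengths = {len(values) for values in columns.values()}
        # All columns must agree on the size of the first dimension.
        if len(lengths) > 1:
            raise ValueError(f"columns have inconsistent lengths: {sorted(lengths)}")
        self.columns = {name: list(values) for name, values in columns.items()}
        self.nrow = lengths.pop() if lengths else 0

    def row(self, i: int) -> tuple:
        """Assemble row i by picking the i-th element of each column."""
        return tuple(values[i] for values in self.columns.values())


table = ColumnTable({"int_values": [1, 2, 3], "str_values": ["abc", "def", "ghi"]})
print(table.nrow)    # 3
print(table.row(1))  # (2, 'def')
```

Data is stored per column, and a "row" is only a view assembled on demand, which is the sense in which both kinds of data frame are column-oriented.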
rpy2 provides an interface between Python and R, and a convenience conversion layer between rpy2.robjects.vectors.DataFrame and pandas.DataFrame objects, implemented in rpy2.robjects.pandas2ri.
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
From pandas to R
Pandas data frame:
pd_df = pd.DataFrame({'int_values': [1,2,3], 'str_values': ['abc', 'def', 'ghi']})
pd_df
| | int_values | str_values |
|---|---|---|
| 0 | 1 | abc |
| 1 | 2 | def |
| 2 | 3 | ghi |
R data frame converted from a pandas data frame:
with (ro.default_converter + pandas2ri.converter).context():
    r_from_pd_df = ro.conversion.get_conversion().py2rpy(pd_df)
r_from_pd_df
R/rpy2 DataFrame (3 x 2)
int_values | str_values |
---|---|
... | ... |
The conversion happens automatically when calling R functions. For example, when calling the R function base::summary:
base = importr('base')
with (ro.default_converter + pandas2ri.converter).context():
    df_summary = base.summary(pd_df)
print(df_summary)
  int_values   str_values
 Min.   :1.0   Length:3
 1st Qu.:1.5   Class :character
 Median :2.0   Mode  :character
 Mean   :2.0
 3rd Qu.:2.5
 Max.   :3.0
Note that a context manager is used to limit the scope of the conversion. Without it, rpy2 does not know how to convert a pandas data frame:
try:
    df_summary = base.summary(pd_df)
except NotImplementedError as nie:
    print('NotImplementedError:')
    print(nie)
NotImplementedError: Conversion 'py2rpy' not defined for objects of type '<class 'pandas.core.frame.DataFrame'>'
From R to pandas
Starting from an R data frame this time:
r_df = ro.DataFrame({'int_values': ro.IntVector([1,2,3]), 'str_values': ro.StrVector(['abc', 'def', 'ghi'])})
r_df
R/rpy2 DataFrame (3 x 2)
int_values | str_values |
---|---|
... | ... |
It can be converted to a pandas data frame using the same converter:
with (ro.default_converter + pandas2ri.converter).context():
    pd_from_r_df = ro.conversion.get_conversion().rpy2py(r_df)
pd_from_r_df
| | int_values | str_values |
|---|---|---|
| 1 | 1 | abc |
| 2 | 2 | def |
| 3 | 3 | ghi |
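The converted frame displays row labels 1 to 3, whereas the frame built directly in pandas earlier used 0 to 2. If zero-based labels are preferred, the index can be reset with plain pandas. The sketch below uses a hand-built stand-in for pd_from_r_df so that it runs without R, and it assumes the labels arrive as strings taken from R's row names; that detail is an assumption, not something shown in the output above.

```python
import pandas as pd

# Stand-in for pd_from_r_df; string labels "1".."3" are an assumption
# about how the R row names arrive on the pandas side.
pd_from_r_df = pd.DataFrame(
    {"int_values": [1, 2, 3], "str_values": ["abc", "def", "ghi"]},
    index=["1", "2", "3"],
)

# Discard the R-derived labels and fall back to the default 0-based index.
zero_based = pd_from_r_df.reset_index(drop=True)
print(zero_based.index.tolist())
```

reset_index(drop=True) leaves the column data untouched and only replaces the row labels.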
Date and time objects
pd_df = pd.DataFrame({ 'Timestamp': pd.date_range('2017-01-01 00:00:00', periods=10, freq='s') })
pd_df
| | Timestamp |
|---|---|
| 0 | 2017-01-01 00:00:00 |
| 1 | 2017-01-01 00:00:01 |
| 2 | 2017-01-01 00:00:02 |
| 3 | 2017-01-01 00:00:03 |
| 4 | 2017-01-01 00:00:04 |
| 5 | 2017-01-01 00:00:05 |
| 6 | 2017-01-01 00:00:06 |
| 7 | 2017-01-01 00:00:07 |
| 8 | 2017-01-01 00:00:08 |
| 9 | 2017-01-01 00:00:09 |
with (ro.default_converter + pandas2ri.converter).context():
    r_from_pd_df = ro.conversion.py2rpy(pd_df)
r_from_pd_df
R/rpy2 DataFrame (10 x 1)
Timestamp |
---|
... |
The time zone used for conversion is the system’s default time zone unless rpy2.robjects.vectors.default_timezone is specified, or unless the time zone is specified in the original time object:
pd_tz_df = pd.DataFrame({ 'Timestamp': pd.date_range('2017-01-01 00:00:00', periods=10, freq='s', tz='UTC') })
with (ro.default_converter + pandas2ri.converter).context():
    r_from_pd_tz_df = ro.conversion.py2rpy(pd_tz_df)
r_from_pd_tz_df
R/rpy2 DataFrame (10 x 1)
Timestamp |
---|
... |
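Why the time zone matters can be seen with the Python standard library alone. A naive datetime has no fixed position on the time line until a zone is assumed, while a zone-aware one does. The sketch below does not involve rpy2 or pandas; the fixed +02:00 offset is an arbitrary example value.

```python
from datetime import datetime, timedelta, timezone

naive = datetime(2017, 1, 1, 0, 0, 0)                       # no time zone attached
aware = datetime(2017, 1, 1, 0, 0, 0, tzinfo=timezone.utc)  # explicitly UTC

# An aware datetime maps to exactly one instant (seconds since the Unix epoch).
print(aware.timestamp())  # 1483228800.0

# The same wall-clock reading in another zone is a different instant.
plus_two = timezone(timedelta(hours=2))
shifted = naive.replace(tzinfo=plus_two)
print(aware - shifted)  # 2:00:00

# For a naive datetime, Python assumes the system's local zone,
# so this value depends on the machine running the code.
print(naive.timestamp())
```

Attaching the zone at the source, as done with tz='UTC' above, removes the dependency on the machine's settings.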