PyTables dates don't work when you switch to a different time zone · Issue #2852 · pandas-dev/pandas
Create a DataFrame with a date (not datetime) column as the index and save it to an HDFStore. Now open that same file on a computer in a time zone behind the one where it was written (i.e., write the file in New York, then read it from Dallas). Result: all the dates are off by one.
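The root cause is that `time.mktime()` interprets a date's midnight in the *local* time zone, so the stored number encodes an instant, not a calendar date. A minimal sketch of the shift (assuming a Unix system, since `time.tzset()` is Unix-only):

```python
import os
import time
from datetime import date

# Write side: pretend we are in New York.
os.environ["TZ"] = "America/New_York"
time.tzset()
d = date(2013, 2, 11)
# mktime() encodes "midnight Eastern" as epoch seconds, not the date itself.
ts = time.mktime(d.timetuple())

# Read side: pretend we are in Dallas (one hour behind).
os.environ["TZ"] = "America/Chicago"
time.tzset()
# The same timestamp is 11 PM the *previous* day in Central time.
print(date.fromtimestamp(ts))  # → 2013-02-10
```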
Here is a code snippet that demonstrates the problem, along with a proposed fix to the pandas code (though someone more qualified than I am should review it).
To repro the problem, first write a file in the New York time zone:
```python
import pandas
import datetime

def writeToFile(fileName):
    dates = [datetime.date.today(), datetime.date.today(), datetime.date.today()]
    numbers = [1, 2, 3]
    p = pandas.DataFrame({"date": dates, "number": numbers},
                         columns=['date', 'number'])
    p.set_index('date', inplace=True)
    store = pandas.HDFStore(fileName)
    store['obj1'] = p
    store.close()
```
Then move to the Dallas time zone and read that file back:
```python
def readFromFile(fileName):
    store = pandas.HDFStore(fileName)
    return store['obj1']
```
Notice that the dates in the DataFrame now show yesterday when they ought to show today.
Proposed fix in `pandas/io/pytables.py`:
```python
def _convert_index(index):
    ...
    elif inferred_type == 'date':
        # OLD (buggy): mktime() encodes local-midnight epoch seconds,
        # which shift when the file is read in a different time zone:
        # converted = np.array([time.mktime(v.timetuple()) for v in values],
        #                      dtype=np.int32)
        # NEW: store the timezone-independent ordinal instead:
        converted = np.array([v.toordinal() for v in values], dtype=np.int32)
```
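`toordinal()` encodes the calendar date itself (days since 0001-01-01 in the proleptic Gregorian calendar), with no clock and no zone, so the round trip is identical everywhere. A quick sanity check:

```python
from datetime import date

d = date(2013, 2, 11)
n = d.toordinal()                # pure day count; no time-of-day, no zone
assert date.fromordinal(n) == d  # round-trips the same in any time zone
```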
and then also change:
```python
def _unconvert_index(data, kind):
    ...
    elif kind == 'date':
        # Try ordinals first. Dates saved with the old mktime mechanism
        # are out of range for fromordinal() and raise ValueError; in
        # those cases fall back to the old method (with the bug!).
        try:
            index = np.array([date.fromordinal(v) for v in data],
                             dtype=object)
        except ValueError:
            index = np.array([date.fromtimestamp(v) for v in data],
                             dtype=object)
```
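The `try`/`except` fallback works because the two encodings cannot collide: mktime-era values are epoch seconds (on the order of 10^9), far above `date.max.toordinal()` (3652059), so `fromordinal()` is guaranteed to reject them. A sketch with a hypothetical legacy timestamp:

```python
from datetime import date

legacy = 1360558800  # epoch seconds, as written by the old mktime code
# Any plausible epoch timestamp exceeds the largest valid ordinal,
# so fromordinal() raises ValueError and the legacy path is taken.
assert legacy > date.max.toordinal()
try:
    date.fromordinal(legacy)
except ValueError:
    print("falls back to date.fromtimestamp()")
```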