Msgpack - ValueError: buffer source array is read-only · Issue #11880 · pandas-dev/pandas

I get a ValueError when processing data with pandas. The steps were:

  1. Write a DataFrame to msgpack format with the compress flag set.
  2. Read the file back into a DataFrame.
  3. Push it to a SQL table with to_sql.

On the third step I get ValueError: buffer source array is read-only.

This problem does not arise if I wrap the read_msgpack call inside pandas.concat.

Example

import pandas as pd
import numpy as np

from sqlalchemy import create_engine

eng = create_engine("sqlite:///:memory:")

df1 = pd.DataFrame({
    'A': 1.,
    'B': pd.Timestamp('20130102'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': 'foo',
})

df1.to_msgpack('test.msgpack', compress='zlib')
df2 = pd.read_msgpack('test.msgpack')

df2.to_sql('test', eng, if_exists='append', chunksize=1000)  # raises ValueError

df2 = pd.concat([pd.read_msgpack('test.msgpack')])

df2.to_sql('test', eng, if_exists='append', chunksize=1000) # works

This happens with both blosc and zlib compression. While I have found a workaround, this behaviour seems very odd, and for very large files the extra concat adds a small performance hit.
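One possible explanation, offered as an assumption rather than something confirmed in this report: decompression yields an immutable bytes object, an array built directly over those bytes is read-only, and pd.concat copies the data into new writable arrays. The NumPy-only sketch below illustrates that general behaviour. It does not call read_msgpack or to_sql, and the zlib round trip is only a stand-in for what the msgpack reader might do internally.

```python
import zlib

import numpy as np

# Compress and decompress the raw bytes of a small float32 array.
# This only mimics a compressed round trip; it is not pandas internals.
original = np.arange(4, dtype="float32")
payload = zlib.compress(original.tobytes())
raw = zlib.decompress(payload)  # an immutable bytes object

# An array that views immutable bytes is read-only.
view = np.frombuffer(raw, dtype="float32")
print(view.flags.writeable)  # False

try:
    view[0] = 1.0
except ValueError as exc:
    print("write failed:", exc)

# Copying allocates a new buffer owned by NumPy, so the copy is writable.
copied = view.copy()
copied[0] = 1.0
print(copied.flags.writeable)  # True
```

If this is indeed the cause, any operation that copies the underlying arrays would have the same effect as the concat workaround, at the same cost of one extra copy.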

edit: @TomAugspurger changed the sql engine to sqlite