unexpected behavior when assigning data to a multicolumn dataframe · Issue #5508 · pandas-dev/pandas

Using the latest git version of pandas (705b677), I have experienced the
following problems:

In [1]: a = pd.DataFrame(index=pd.Index(xrange(1,11)))

In [2]: a['foo'] = np.zeros(10, dtype=np.float)

In [3]: a['bar'] = np.zeros(10, dtype=np.complex)

In [4]: a.ix[2:5, 'bar']
Out[4]:
2    0j
3    0j
4    0j
5    0j
Name: bar, dtype: complex128

In [5]: a.ix[2:5, 'bar'] = np.array([2.33j, 1.23+0.1j, 2.2])

Invalid input (the RHS has the wrong size), but no exception is thrown!

(The reason no exception is thrown is the different dtype of a['foo']; compare In [9]-In [11].)

In [6]: a
Out[6]:
    foo    bar
1     0     0j
2     0  2.33j
3     0  2.33j
4     0  2.33j
5     0  2.33j
6     0     0j
7     0     0j
8     0     0j
9     0     0j
10    0     0j

In [7]: a.ix[2:5, 'bar'] = np.array([2.33j, 1.23+0.1j, 2.2, 1.0]) # valid

In [8]: a
Out[8]:
    foo          bar
1     0           0j
2     0        2.33j
3     0  (1.23+0.1j)
4     0     (2.2+0j)
5     0       (1+0j)
6     0           0j
7     0           0j
8     0           0j
9     0           0j
10    0           0j

In [9]: a = pd.DataFrame(index=pd.Index(xrange(1,11)))

In [10]: a['bar'] = np.zeros(10, dtype=np.complex)

In [11]: a.ix[2:5, 'bar'] = np.array([2.33j, 1.23+0.1j, 2.2]) # invalid RHS-> exception raised OK!

ValueError                                Traceback (most recent call last)
in <module>()
----> 1 a.ix[2:5, 'bar'] = np.array([2.33j, 1.23+0.1j, 2.2])

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/indexing.pyc in __setitem__(self, key, value)
     92         indexer = self._convert_to_indexer(key, is_setter=True)
     93
---> 94         self._setitem_with_indexer(indexer, value)
     95
     96     def _has_valid_type(self, k, axis):

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/indexing.pyc in _setitem_with_indexer(self, indexer, value)
    387             value = self._align_panel(indexer, value)
    388
--> 389         self.obj._data = self.obj._data.setitem(indexer,value)
    390         self.obj._maybe_update_cacher(clear=True)
    391

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in setitem(self, *args, **kwargs)
   2182
   2183     def setitem(self, *args, **kwargs):
-> 2184         return self.apply('setitem', *args, **kwargs)
   2185
   2186     def putmask(self, *args, **kwargs):

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in apply(self, f, *args, **kwargs)
   2162
   2163             else:
-> 2164                 applied = getattr(blk, f)(*args, **kwargs)
   2165
   2166             if isinstance(applied, list):

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in setitem(self, indexer, value)
    580         try:
    581             # set and return a block
--> 582             values[indexer] = value
    583
    584             # coerce and try to infer the dtypes of the result

ValueError: could not broadcast input array from shape (3) into shape (4)
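The last frame of the traceback is a plain array assignment (`values[indexer] = value`), so the error message is NumPy's own broadcasting check. A minimal sketch of that rule, using a bare NumPy array as a stand-in for the storage behind column 'bar' (the variable names are illustrative and not pandas internals):

```python
import numpy as np

# Stand-in for the complex128 storage behind column 'bar'.
values = np.zeros(10, dtype=complex)
rhs = np.array([2.33j, 1.23 + 0.1j, 2.2])  # 3 elements

try:
    # 4 target slots but only 3 source elements: NumPy refuses to broadcast.
    values[1:5] = rhs
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised, message)

# A length-1 RHS, by contrast, is broadcast to all 4 slots without error.
values[1:5] = np.array([2.33j])
print(values[:6])
```

NumPy only raises when the shapes cannot be broadcast; a length-1 RHS is stretched silently, so a size check that relies on this step alone will not catch every mismatch.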

Here is the 2nd problem:

In [1]: b = pd.DataFrame(index=pd.Index(xrange(1,11)))

In [2]: b['foo'] = np.zeros(10, dtype=np.float)

In [3]: b['bar'] = np.zeros(10, dtype=np.complex)

In [4]: b
Out[4]:
    foo  bar
1     0   0j
2     0   0j
3     0   0j
4     0   0j
5     0   0j
6     0   0j
7     0   0j
8     0   0j
9     0   0j
10    0   0j

In [5]: b[2:5]
Out[5]:
   foo  bar
3    0   0j
4    0   0j
5    0   0j

In [6]: b[2:5] = np.arange(1,4)*1j

invalid input (wrong size on RHS)

In [7]: b
Out[7]:
    foo  bar
1    0j   0j
2    0j   0j
3    1j   2j
4    1j   2j
5    1j   2j
6    0j   0j
7    0j   0j
8    0j   0j
9    0j   0j
10   0j   0j

Why does the expression in In [6] change the dtype of b['foo']?

Is this intended?

In [8]: b[2:5] = np.arange(1,4)*1j # invalid input (wrong size on RHS)

ValueError                                Traceback (most recent call last)
in <module>()
----> 1 b[2:5] = np.arange(1,4)*1j # invalid input (wrong size on RHS)

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in __setitem__(self, key, value)
   1831         indexer = _convert_to_index_sliceable(self, key)
   1832         if indexer is not None:
-> 1833             return self._setitem_slice(indexer, value)
   1834
   1835         if isinstance(key, (Series, np.ndarray, list)):

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in _setitem_slice(self, key, value)
   1842
   1843     def _setitem_slice(self, key, value):
-> 1844         self.ix._setitem_with_indexer(key, value)
   1845
   1846     def _setitem_array(self, key, value):

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/indexing.pyc in _setitem_with_indexer(self, indexer, value)
    387             value = self._align_panel(indexer, value)
    388
--> 389         self.obj._data = self.obj._data.setitem(indexer,value)
    390         self.obj._maybe_update_cacher(clear=True)
    391

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in setitem(self, *args, **kwargs)
   2182
   2183     def setitem(self, *args, **kwargs):
-> 2184         return self.apply('setitem', *args, **kwargs)
   2185
   2186     def putmask(self, *args, **kwargs):

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in apply(self, f, *args, **kwargs)
   2162
   2163             else:
-> 2164                 applied = getattr(blk, f)(*args, **kwargs)
   2165
   2166             if isinstance(applied, list):

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in setitem(self, indexer, value)
    580         try:
    581             # set and return a block
--> 582             values[indexer] = value
    583
    584             # coerce and try to infer the dtypes of the result

ValueError: could not broadcast input array from shape (3) into shape (3,2)
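For anyone trying to reproduce this on a current installation: `xrange`, `np.float`, `np.complex` and `.ix` no longer exist in recent Python, NumPy and pandas releases. Below is a minimal sketch of the first setup in current syntax, with the size check done explicitly before assigning. It assumes `.loc` as the label-based replacement for `.ix`; `assign_checked` is a hypothetical helper written for this example, not a pandas function, and the sketch does not claim to show how pandas itself handles the mismatch.

```python
import numpy as np
import pandas as pd


def assign_checked(frame, rows, column, values):
    """Hypothetical helper: assign to frame.loc[rows, column] only if
    the RHS has exactly one value per selected row."""
    values = np.asarray(values)
    n_target = len(frame.loc[rows, column])
    if values.shape != (n_target,):
        raise ValueError(
            f"RHS has shape {values.shape}, expected ({n_target},)"
        )
    frame.loc[rows, column] = values


# Same frame as in the report, written without the removed aliases.
a = pd.DataFrame(index=pd.Index(range(1, 11)))
a['foo'] = np.zeros(10, dtype=float)
a['bar'] = np.zeros(10, dtype=complex)

# Label slice 2:5 is inclusive, so it selects 4 rows; 4 values are valid.
assign_checked(a, slice(2, 5), 'bar', [2.33j, 1.23 + 0.1j, 2.2, 1.0])
print(a)
print(a.dtypes)

# A 3-element RHS is rejected by the helper before pandas sees it.
try:
    assign_checked(a, slice(2, 5), 'bar', [2.33j, 1.23 + 0.1j, 2.2])
except ValueError as exc:
    print("rejected:", exc)
```

Checking the length up front makes the outcome independent of which other columns (and dtypes) the frame happens to contain.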