headers management for multiparts in tabular resource by paulgirard · Pull Request #257 · frictionlessdata/datapackage-py


Overview

A multipart resource concatenates its chunks as-is, which is what we want for generic binary files.
For a tabular resource, however, this default behaviour implies that if the first chunk has a header row, the other chunks must not.
We discussed this issue here: https://github.com/frictionlessdata/forum/issues/1
The proposal is to change this behaviour for tabular-data-package: tabular chunks should either all have headers or all have none, depending on dialect.header.

The current situation, with a header row only in the first chunk, is handled as before, but it now raises a UserWarning because this layout will soon be deprecated.
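
To illustrate the backward-compatible path, here is a minimal sketch, not the code in this PR. The description does not say how the legacy layout is detected, so the check used here (comparing the first row of each later chunk with the header row) is an assumption, and merge_chunk_rows is a hypothetical helper name. It also assumes the first chunk is non-empty.

```python
import warnings


def merge_chunk_rows(chunk_rows):
    """Merge parsed chunks (lists of rows) of a tabular resource with a header.

    Hypothetical helper: the detection heuristic below is an assumption,
    not necessarily what the PR does.
    """
    header = chunk_rows[0][0]
    merged = list(chunk_rows[0])
    for rows in chunk_rows[1:]:
        if rows and rows[0] == header:
            # New layout: the chunk repeats the header, so drop it.
            merged.extend(rows[1:])
        else:
            # Legacy layout: header only in the first chunk. Keep every row,
            # but tell the user that this layout is going away.
            warnings.warn(
                "Multipart tabular chunks without their own header row "
                "will soon be deprecated",
                UserWarning,
            )
            merged.extend(rows)
    return merged
```

Note that this heuristic would misfire if a data row happened to be identical to the header row, which is one reason a real implementation may prefer a different check.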

Implementation

Multipart chunks are handled by the _MultipartSource class, which builds an iterator over the chunks' row iterators. My approach is simply to discard the first row of every chunk iterator except the first when the resource is tabular with a header.
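
The idea can be sketched independently of _MultipartSource as follows. This is only an illustration using the standard library: iter_multipart_rows and its has_header parameter are hypothetical names, and chunks are passed as in-memory CSV strings rather than the file-like sources the real class works with.

```python
import csv
import io
from itertools import chain, islice


def iter_multipart_rows(chunks, has_header=True):
    """Return one iterator over the rows of all chunks.

    Hypothetical sketch: when has_header is true, the first row of every
    chunk except the first is discarded.
    """
    def rows_of(index, text):
        reader = csv.reader(io.StringIO(text))
        if has_header and index > 0:
            # Skip the repeated header row of this chunk.
            return islice(reader, 1, None)
        return reader

    # Chain the per-chunk row iterators lazily, in chunk order.
    return chain.from_iterable(
        rows_of(index, text) for index, text in enumerate(chunks)
    )


chunks = ["id,name\n1,a\n2,b\n", "id,name\n3,c\n"]
rows = list(iter_multipart_rows(chunks))
```

With has_header=False the chunks are simply concatenated row by row, which matches the existing as-is behaviour.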
@roll pointed out possible issues with:

Previously posted at #256

Please preserve this line to notify @roll (lead of this repository)