



From: Pádraig Brady
Subject: Re: csplit feature request: allow a user to specify which pieces to output
Date: Wed, 12 Mar 2014 09:22:45 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 03/12/2014 04:26 AM, Hong Yang wrote:

> If a user is splitting a large file into pieces only to use several of them, it would be more efficient to specify just the indexes of the pieces to output.
>
> Take an extreme case as an example. "alargefile" has 12823371193 lines. "csplit alargefile 823371193 823371293" will produce three output files: xx00 with 823371192 lines, xx01 with 100 lines, and xx02 with the rest. If a user is only interested in xx01, it would be desirable to run "csplit alargefile 823371193 823371293 -o 1".

So this is a borderline one.

Functionally it is useful, as it can avoid redundant processing and storage.
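To make the "redundant processing and storage" concrete, here is a toy-scale sketch. A hypothetical 20-line file stands in for "alargefile", with split points 8 and 13 in place of the ones quoted above. It contrasts the current approach (write every piece, then delete the unwanted ones) with extracting only the wanted line range using sed, which also stops reading once that range ends. It assumes GNU coreutils and a POSIX shell. The sed command is just one possible workaround and is not part of csplit.

```shell
#!/bin/sh
# Sketch: emulate "keep only piece xx01" with existing tools, at toy scale.
# The file name and line numbers are hypothetical stand-ins for the
# 823371193/823371293 example in the quoted message.
set -eu
cd "$(mktemp -d)"
seq 20 > alargefile

# Current approach: csplit writes all three pieces, then we discard two.
csplit -s alargefile 8 13      # xx00 = lines 1-7, xx01 = 8-12, xx02 = 13-20
rm xx00 xx02

# Alternative: print only lines 8-12 and quit at line 12, so nothing
# after the wanted piece is read and nothing before it is stored.
sed -n '8,12p;12q' alargefile > piece1

cmp xx01 piece1 && echo "identical"   # prints "identical"
```

At the original scale the sed line would be `sed -n '823371193,823371292p;823371292q' alargefile`. It still has to scan the first 823371192 lines, but it never writes xx00 and never reads what would have been xx02.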

split(1) has a similar feature in that one can use -n K/N to output only the Kth of N chunks. It doesn't support -o N to select an arbitrary chunk based on size, since a single chunk can already be extracted with dd. If we were to implement this for csplit, we'd probably implement it for split also, and support specifying multiple (disjoint) chunks.
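The existing split(1) and dd alternatives can be sketched as follows, again on a hypothetical small file. This assumes GNU coreutils (split's -n K/N chunk spec and dd's iflag=skip_bytes,count_bytes); the file name and offsets are arbitrary illustration values.

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils: pull out a single chunk without
# writing every piece. "data" and the offsets are hypothetical.
set -eu
cd "$(mktemp -d)"
seq 20 > data                  # 51 bytes: "1\n".."9\n" (18 bytes), then "10\n".."20\n"

# split(1): write only chunk 2 of 3 to stdout. The "l/" prefix keeps
# lines intact; plain "2/3" would cut on exact byte boundaries.
split -n l/2/3 data > chunk2

# dd: copy one byte range (6 bytes starting at offset 18, i.e. "10\n11\n").
# skip_bytes/count_bytes make skip= and count= byte counts, so a large
# bs can be used for speed without changing which bytes are selected.
dd if=data of=slice bs=64K skip=18 count=6 iflag=skip_bytes,count_bytes 2>/dev/null

cat slice
```

Because the l/K/N chunks partition the file, concatenating chunks 1, 2 and 3 reproduces it exactly; selecting several disjoint chunks in one pass, as suggested above, has no equivalent in either tool today.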

However, given that it is only a performance feature, and I suspect not a common use case, I'm 60:40 against implementing it.

thanks, Pádraig.