bug#9455: RFE: split --balanced





From: Bob Proulx
Subject: bug#9455: RFE: split --balanced
Date: Tue, 6 Sep 2011 19:34:18 -0600
User-agent: Mutt/1.5.21 (2010-09-15)

severity 9455 wishlist thanks

Hi Dave!

Dave Yost wrote:

> Z% for x in 1 2 3 4 5 6 7
> for> do echo $x ; done | split --lines=3
> pipe> && for x in x?? ; do echo "=== $x" ; cat $x ; done

Sure. Just FYI, GNU seq can produce sequences of numbers very easily. I think this is a slightly more concise example:

    seq 1 7 | split --lines=3
    head *

But do you always mean stdin, or is this mostly a file? Because...

> In some applications, you would like split to more evenly apportion the output to the files, like this:

> Z% for x in 1 2 3 4 5 6 7
> for> do echo $x ; done | split --balanced --lines=3
> pipe> && for x in x?? ; do echo "=== $x" ; cat $x ; done
> === xaa
> 1
> 2
> 3
> === xab
> 4
> 5
> === xac
> 6
> 7

I think it would be really hard to know whether the user wanted the extra lines in the first file. It would be easier to gather that last widow line up into the next-to-last file, or easier still to leave it alone in its own file.
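For comparison, the layout Dave asks for can be sketched in plain sh. This is a minimal sketch, not a proposed split option: `balanced_split` and its PREFIX0, PREFIX1, ... output names are hypothetical. It gives each of N pieces `num / N` lines and hands the `num % N` leftover lines to the first pieces, one each. It reads its input twice (once for `wc -l`, then with `sed`), so it needs a regular file rather than stdin, which is why the stdin-or-file question matters.

```sh
#!/bin/sh
# balanced_split: hypothetical helper, NOT an option of GNU split.
# Splits FILE into N pieces whose line counts differ by at most one,
# giving the extra lines to the first pieces (the layout Dave proposed).
# Output names are PREFIX0, PREFIX1, ... (an assumption; split uses xaa, xab, ...).
# Plain sh has no `local`, so these variables are global.
balanced_split() {
    file=$1
    n=$2
    prefix=${3:-x}
    num=$(( $(wc -l < "$file") ))   # arithmetic strips any padding from wc
    base=$((num / n))                # lines every piece gets
    extra=$((num % n))               # leftover lines, one each to the first pieces
    start=1
    i=0
    while [ "$i" -lt "$n" ]; do
        count=$base
        if [ "$i" -lt "$extra" ]; then
            count=$((count + 1))
        fi
        if [ "$count" -gt 0 ]; then
            end=$((start + count - 1))
            # Print only lines START..END of the input.
            sed -n "${start},${end}p" "$file" > "${prefix}${i}"
            start=$((end + 1))
        else
            : > "${prefix}${i}"      # more pieces than lines: leave it empty
        fi
        i=$((i + 1))
    done
}
```

For example, `seq 1 7 > input.txt; balanced_split input.txt 3` should leave 1 2 3 in `x0`, 4 5 in `x1`, and 6 7 in `x2`. Re-reading the file once per piece with `sed` keeps the sketch simple but is slow on large inputs; a single `awk` pass would avoid that.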

    seq 1 7 > input.txt
    numsplits=3
    num=$(wc -l < input.txt)
    perfile=$(($num / $numsplits))
    split --lines=$perfile < input.txt
    head x*

    seq 1 17 > input.txt
    numsplits=3
    num=$(wc -l < input.txt)
    perfile=$(($num / $numsplits))
    split --lines=$perfile < input.txt
    head x*
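The outcome of those commands can also be worked out without running split. A minimal sketch, where `predict_split` is a hypothetical helper: the number of files is the ceiling of `num / perfile`, and the last file holds `num % perfile` lines when that is non-zero. For both inputs above this equals `num % numsplits`, although the two can differ when that remainder is at least `perfile` (5 lines into 3 splits gives `perfile=1` and five one-line files).

```sh
#!/bin/sh
# predict_split: hypothetical helper, not part of split. Reports what
# `split --lines=PERFILE` would produce for NUM input lines, without running it.
# Plain sh has no `local`, so these variables are global.
predict_split() {
    num=$1
    numsplits=$2
    perfile=$((num / numsplits))
    # A line count of 0 is not useful to split, so use at least 1.
    [ "$perfile" -ge 1 ] || perfile=1
    # Ceiling division: number of output files split will create.
    files=$(( (num + perfile - 1) / perfile ))
    # Lines in the final file; a zero remainder means the last file is full.
    last=$((num % perfile))
    if [ "$last" -eq 0 ] && [ "$num" -gt 0 ]; then
        last=$perfile
    fi
    echo "perfile=$perfile files=$files last=$last"
}

predict_split 7 3    # prints: perfile=2 files=4 last=1
predict_split 17 3   # prints: perfile=5 files=4 last=2
```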

And from there you could get creative and make a decision based upon the number of lines in that last file.

    if [ $(($num % $numsplits)) -lt $(($num / 2)) ]; then ...

If the widow lines are fewer than half the total number of lines, then that last file could be concatenated onto the next-to-last file; I will leave the implementation as an exercise. That is just one example. Or they could be put in the first file. But I think what users would want in something like this is so varied that there isn't any one natural result, so I think this is better left to the caller to decide.
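One way to do that exercise, as a minimal sketch: `merge_widow` is a hypothetical helper that assumes split's default `xaa`, `xab`, ... names (prefix `x`, two-letter suffixes) in the current directory. It appends the last piece to the next-to-last one when the last piece has fewer lines than a threshold the caller chooses, which keeps the policy decision with the caller.

```sh
#!/bin/sh
# merge_widow: hypothetical helper, not part of split.
# Assumes split's default output names (prefix "x", two-letter suffixes)
# in the current directory. If the last piece has fewer than LIMIT lines,
# it is appended to the next-to-last piece and removed.
merge_widow() {
    limit=$1
    prev=
    last=
    # The glob expands in sorted order, so the final two matches are
    # the next-to-last and last pieces.
    for f in x??; do
        prev=$last
        last=$f
    done
    # With fewer than two pieces there is nothing to merge.
    [ -n "$prev" ] || return 0
    widow=$(( $(wc -l < "$last") ))   # arithmetic strips any padding from wc
    if [ "$widow" -lt "$limit" ]; then
        cat "$last" >> "$prev"
        rm -f "$last"
    fi
}
```

After the 17-line commands, `merge_widow $(($num / 2))` applies the same threshold as the `if` test; that folds the 2-line `xad` into `xac`, leaving pieces of 5, 5 and 7 lines.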

Bob