Re: Sort with header/skip-lines support



From: Pádraig Brady
Subject: Re: Sort with header/skip-lines support
Date: Fri, 11 Jan 2013 00:11:14 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

On 01/10/2013 09:57 PM, Assaf Gordon wrote:

Hello,

I'd like to re-visit an old issue: adding header-line/skip-lines support to 'sort'.

It has been discussed a few times in the past, but IMHO the suggested workarounds fall short:

  1. Some rely on bash-specific constructs [1].
  2. They lack error checking (e.g. running head/tail/sed without checking for errors).
  3. Using multiple input files is convoluted.
  4. They work for regular files, but not for pipes [2].

The attached draft patch is based on Jim Hester's patch [3], rebased to the latest sort, with some fixes and tests. It seems to work fine, except for one glaring omission: it only works when the output is STDOUT, because creating the output file is a brute-force, ugly hack.

The syntax is:

  sort --skip-lines=N [other options]

That's a bit ambiguous, and might suggest that the header lines are not output after the sort. Maybe stay consistent with join and numfmt and use --header?

The two tests are:

  make check TESTS=tests/misc/sort-skip-lines SUBDIRS=.
  make check TESTS=tests/misc/sort-skip-lines-bigfiles SUBDIRS=. RUN_EXPENSIVE_TESTS=yes

If this is something you are willing to consider, I'm happy to hear comments and suggestions and improve it.

Alternatively, perhaps this is a good candidate for a "contrib" script, but I'm not sure how to go about developing a shell script that is POSIX compliant, has robust error checking, and is still a full 'drop-in' replacement for sort (many option combinations).

Thanks, -gordon

[1] - bash work-around: http://lists.gnu.org/archive/html/coreutils/2010-11/msg00084.html
[2] - no pipe support: http://lists.gnu.org/archive/html/bug-coreutils/2007-07/msg00215.html

Note the pipe issue might be handled with stdbuf -i0 head ... but head doesn't use stdio, so that won't work. But recent sed can be used for this, like:

  sed -u 1q

  http://git.sv.gnu.org/gitweb/?p=sed.git;a=commit;h=737ca5e

Note that commit is 4 years old, but only the recently released sed 4.2.2 contains it.
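
For illustration, the sed-based workaround mentioned above might be wrapped roughly as follows. This is only a sketch under stated assumptions: sort_keep_header is a hypothetical name, not an existing tool, and it assumes GNU sed 4.2.2 or later, so that -u reads unbuffered and sed takes only the header line from a pipe, leaving the rest for sort.

```shell
# Hypothetical helper (not part of coreutils): pass the first input line
# through unchanged, then sort the remaining lines.
# Assumes GNU sed >= 4.2.2, where -u makes sed read its input unbuffered,
# so it consumes only the header from a pipe and leaves the rest for sort.
sort_keep_header() {
    sed -u 1q || return 1   # print the header line; give up if sed fails
    sort "$@"               # sort what is left on stdin, passing sort options through
}
```

For example, printf 'name\nzeta\nalpha\n' | sort_keep_header should print name, then alpha, then zeta. Unlike the proposed option, this sketch handles a single header line only and reads only from standard input, so it is not a drop-in replacement for sort.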

[3] - Jim's patch: http://lists.gnu.org/archive/html/coreutils/2010-11/msg00091.html

Thanks for collating the previous threads on this subject.

I'm on the fence on how warranted this is TBH. We'd need stronger arguments for it I think.

thanks, Pádraig.