Re: [PATCH]: uniq: add "--group" option

Hello Pádraig,

Pádraig Brady wrote, On 02/20/2013 08:47 PM:

On 02/20/2013 06:44 PM, Assaf Gordon wrote:

Hello,

Attached is a suggestion for "--group" option in uniq, as discussed here:
http://lists.gnu.org/archive/html/coreutils/2011-03/msg00000.html
http://lists.gnu.org/archive/html/coreutils/2012-03/msg00052.html

The patch adds two parameters:
  --group=[method]
      separate each unique line (whether duplicated or not) with a marker.
      method={none,separate(default),prepend,append,both}
  --group-separator=SEP
      with --group, separates groups using SEP (default: empty line)

--group-sep is probably overkill. I'd just use \n or \0 if -z specified.

OK.

As for separation methods, I'd just go with what we have for --all-repeated (but remove 'none', which wouldn't be useful with --group), as we've never had requests for anything else. So: --group={prepend, separate(default)}

I'd like to have at least "append" or "both", for the added convenience of downstream analysis. It's obviously a "nice-to-have" and not "must-have" feature, and can be implemented in other ways, but knowing that there will always be a terminating marker after a group (even the last group) makes downstream processing code simpler.

Typical example:

$ cat INPUT | uniq --group=append |
    awk '$0!="" { ## item in the group, collect it
         }
         $0=="" { ## end of group, do something
         }'

Without the final group marker, any downstream code will require two points of "group processing": when a marker is found, and at EOF. Something like:

$ cat INPUT | uniq --group=separate |
    awk '$0!="" { ## item in the group, collect it
         }
         $0=="" { ## end of group, do something
         }
         END    { ## end of last group, do something, duplicated code
         }'
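
To make the difference concrete, here is a minimal Python sketch of the two downstream consumers. It is not part of the patch: the function names and sample data are hypothetical, the input is assumed to contain no empty lines of its own, and the marker is assumed to be an empty line. The first function expects a marker after every group (as "append" would produce). The second expects markers only between groups and therefore has to repeat the end-of-group handling at EOF.

```python
def groups_with_trailing_marker(lines):
    """Consume grouped output where every group, including the last,
    is followed by an empty marker line (the "append" style)."""
    results = []
    current = []
    for line in lines:
        if line != "":
            current.append(line)  # item in the group, collect it
        else:
            # end of group: the only place where a group is processed
            results.append((current[0], len(current)))
            current = []
    return results


def groups_without_trailing_marker(lines):
    """Consume grouped output where markers appear only between groups
    (the "separate" style), so the last group is flushed at EOF."""
    results = []
    current = []
    for line in lines:
        if line != "":
            current.append(line)
        else:
            results.append((current[0], len(current)))
            current = []
    if current:
        # duplicated end-of-group handling, the equivalent of awk's END
        results.append((current[0], len(current)))
    return results


# Hypothetical sample data shaped like the two output styles.
appended = ["a", "a", "", "b", "", "c", "c", "c", ""]
separated = ["a", "a", "", "b", "", "c", "c", "c"]

print(groups_with_trailing_marker(appended))
print(groups_without_trailing_marker(separated))
```

Both calls report the same (line, count) pairs. The only difference is the extra flush after the loop in the second function, which is the duplication that a guaranteed trailing marker would avoid.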

There is a similar reason for having "both": it ensures that I can put any special initialization code in the group-marker case and don't need to duplicate it in a separate 'BEGIN{}' clause. (Of course, this doesn't have to be awk; it can be perl/python/ruby/whatever does the downstream processing.)

I realize it's not a "make-or-break" feature - but if we're trying to make text processing easier, I believe "append/both" makes it even easier.