uniq - check specific fields



From: Assaf Gordon
Subject: uniq - check specific fields
Date: Thu, 07 Feb 2013 12:13:09 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.4) Gecko/20120510 Icedove/10.0.4

Hello,

Attached is a proof-of-concept patch that adds "--check-fields=N" to uniq, so that lines can be compared on specific fields only. (This is a different approach to promoting csplit-by-field [1] :) ).

It works just like '--check-chars', but counts fields instead of characters. When the option is not used, the program flow is unaffected.

# input file, every whole line is unique
$ cat input.txt 
A 1 z
A 1 y
A 2 x
B 2 w
B 3 w
C 3 w
C 4 w

# regular uniq
$ uniq -c input.txt 
      1 A 1 z
      1 A 1 y
      1 A 2 x
      1 B 2 w
      1 B 3 w
      1 C 3 w
      1 C 4 w
      
# Stop comparing after 1 field
$ uniq -c --check-fields 1 input.txt 
      3 A 1 z
      2 B 2 w
      2 C 3 w

# Stop comparing after 2 fields
$ uniq -c --check-fields 2 input.txt 
      2 A 1 z
      1 A 2 x
      1 B 2 w
      1 B 3 w
      1 C 3 w
      1 C 4 w

# Skip the first field and check 1 field (effectively, uniq on field 2)
$ uniq -c  --skip-fields 1 --check-fields 1 input.txt 
      2 A 1 z
      2 A 2 x
      2 B 3 w
      1 C 4 w

# "--field N" is a convenience shortcut for "--skip-fields N-1 --check-fields 1"
$ uniq -c --field 2 input.txt 
      2 A 1 z
      2 A 2 x
      2 B 3 w
      1 C 4 w
$ uniq -c --field 3 input.txt 
      1 A 1 z
      1 A 1 y
      1 A 2 x
      4 B 2 w
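
For comparison, the results above can be approximated with existing tools. The following is only a minimal sketch and is not part of the patch: `uniq_fields` is a hypothetical helper name, and it assumes fields are separated by runs of whitespace (awk's default splitting), which only approximates how uniq itself delimits fields. It counts adjacent lines whose fields SKIP+1 through SKIP+CHECK match, and prints the first line of each run in the style of "uniq -c".

```shell
# Hypothetical helper (not part of the patch).
# Usage: uniq_fields SKIP CHECK [FILE...]
uniq_fields() {
  skip=$1; check=$2; shift 2
  awk -v skip="$skip" -v check="$check" '
    {
      # Build the comparison key from fields skip+1 .. skip+check.
      key = ""
      for (i = skip + 1; i <= skip + check && i <= NF; i++)
        key = key " " $i

      if (NR > 1 && key == prev) {
        count++                      # same key as the previous line
      } else {
        # A new run starts: flush the previous one, if any.
        if (NR > 1) printf "%7d %s\n", count, first
        first = $0; count = 1; prev = key
      }
    }
    END { if (NR > 0) printf "%7d %s\n", count, first }
  ' "$@"
}
```

With the input.txt shown earlier, "uniq_fields 1 1 input.txt" should give the same four lines as the "--skip-fields 1 --check-fields 1" example, and "uniq_fields 0 1 input.txt" the same three lines as "--check-fields 1".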

===

What do you think?

-gordon

[1] http://lists.gnu.org/archive/html/coreutils/2013-02/msg00015.html

Attachment: 0001-uniq-support-uniq-by-field.patch
Description: Text Data