Re: Feature request: testline(tl) (RFC)

From: Pádraig Brady
Subject: Re: Feature request: testline(tl) (RFC)
Date: Tue, 09 Dec 2014 22:38:51 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

On 09/12/14 22:20, V.Krishn wrote:

Hi,

I was reading about Bloom filters and came upon this example:

http://troydhanson.github.io/misc/bloom.html
------
The bf test program

The program bf.c implements a Bloom filter. It can be used like,

./bf -n 16 members.txt test.txt

Where the lines of members.txt are the true set members and the lines of test.txt will be tested for membership. Varying n shows how the error rate increases with smaller values of n.
------
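The core of what bf does can be sketched in a few dozen lines of C. This is not the code from bf.c; it is a minimal illustration under assumed choices: a bit array of m bits, k probes per string derived by double hashing (h1 + i*h2 mod m), with h1 from 64-bit FNV-1a and h2 obtained by passing h1 through the splitmix64 finalizer. The names struct bloom, bloom_init, bloom_add and bloom_maybe_contains are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal Bloom filter over NUL-terminated strings (e.g. input lines with
   the trailing newline stripped). Hash functions, sizes and names are
   illustrative assumptions, not taken from bf.c. */
struct bloom {
    unsigned char *bits;  /* bit array of m bits, initially all zero */
    size_t m;             /* number of bits */
    unsigned k;           /* number of bit positions probed per string */
};

/* 64-bit FNV-1a hash of a string. */
uint64_t fnv1a(const char *s)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (; *s; s++) {
        h ^= (unsigned char) *s;
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}

/* splitmix64 finalizer, used here to derive a second hash from the first. */
uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

bool bloom_init(struct bloom *b, size_t m, unsigned k)
{
    assert(m > 0 && k > 0);
    b->bits = calloc((m + 7) / 8, 1);       /* zeroed bit array */
    b->m = m;
    b->k = k;
    return b->bits != NULL;
}

void bloom_free(struct bloom *b)
{
    free(b->bits);
    b->bits = NULL;
}

/* Bit index of the i-th probe by double hashing: (h1 + i*h2) mod m. */
static size_t bloom_index(const struct bloom *b, uint64_t h1, uint64_t h2,
                          unsigned i)
{
    return (size_t) ((h1 + (uint64_t) i * h2) % b->m);
}

/* Record s as a member by setting its k bits. */
void bloom_add(struct bloom *b, const char *s)
{
    uint64_t h1 = fnv1a(s), h2 = mix64(h1);
    for (unsigned i = 0; i < b->k; i++) {
        size_t p = bloom_index(b, h1, h2, i);
        b->bits[p / 8] |= (unsigned char) (1u << (p % 8));
    }
}

/* false: s was definitely never added.
   true:  s was probably added (false positives are possible). */
bool bloom_maybe_contains(const struct bloom *b, const char *s)
{
    uint64_t h1 = fnv1a(s), h2 = mix64(h1);
    for (unsigned i = 0; i < b->k; i++) {
        size_t p = bloom_index(b, h1, h2, i);
        if (!(b->bits[p / 8] & (1u << (p % 8))))
            return false;
    }
    return true;
}
```

To imitate `./bf members.txt test.txt`, one would read members.txt with getline(), strip each newline and call bloom_add(), then report every test.txt line for which bloom_maybe_contains() returns false as missing. The "missing" answers are exact; the "found" answers can include false positives, at a rate of roughly (1 - e^(-kN/m))^k for N members, which grows as the number of bits m shrinks.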

Source: https://github.com/troydhanson/misc
Code: https://raw.githubusercontent.com/troydhanson/misc/master/compression/bloom/bf.c

REQUEST: I'm wondering if a simple implementation to test lines could be added to coreutils. Features:
1. report if some lines are missing (with an option to print them)
2. option to print found lines
3. option to print missing lines
4. ... more logic possible ...

-------------
Presently, I can achieve the same with a simple shell script, either by calling grep on each line or by using comm. But I believe a method using a Bloom filter should be faster and would result in a unique and useful tool.

Please ignore this, or point me to it, if a similar utility already exists.

Maybe we should keep the existing interfaces of grep, uniq, comm etc. and use a bloom filter internally if appropriate.
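One way to read "use a bloom filter internally" while keeping exact results is to treat the filter purely as a fast "definitely absent" check and confirm every "maybe present" answer against an exact structure. The C sketch below assumes a fixed 64 Ki-bit filter, 3 probes, an FNV-1a hash split into two 32-bit halves for double hashing, and a sorted pointer array searched with bsearch(); the line_set_* names are hypothetical, and this is not how grep, uniq or comm are actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: exact whole-line membership answers, with a Bloom
   filter used only as a prefilter. Sizes and names are assumptions. */
#define FILTER_BITS (1u << 16)   /* assumed filter size in bits */
#define PROBES 3u                /* assumed probes per line */

struct line_set {
    unsigned char filter[FILTER_BITS / 8];
    const char **sorted;   /* member pointers sorted for bsearch (not copied) */
    size_t n;
};

static uint64_t hash_line(const char *s)   /* 64-bit FNV-1a */
{
    uint64_t h = 14695981039346656037ULL;
    for (; *s; s++) {
        h ^= (unsigned char) *s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* i-th probe: the two 32-bit halves of h act as two hashes (h1 + i*h2). */
static size_t probe(uint64_t h, unsigned i)
{
    uint64_t h1 = h & 0xffffffffu, h2 = (h >> 32) | 1;
    return (size_t) ((h1 + i * h2) % FILTER_BITS);
}

static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Build the set from n member lines; the strings must outlive the set. */
bool line_set_init(struct line_set *set, const char **members, size_t n)
{
    assert(members != NULL || n == 0);
    memset(set->filter, 0, sizeof set->filter);
    set->n = n;
    set->sorted = malloc((n ? n : 1) * sizeof *set->sorted);
    if (!set->sorted)
        return false;
    for (size_t i = 0; i < n; i++) {
        uint64_t h = hash_line(members[i]);
        for (unsigned j = 0; j < PROBES; j++) {
            size_t p = probe(h, j);
            set->filter[p / 8] |= (unsigned char) (1u << (p % 8));
        }
        set->sorted[i] = members[i];
    }
    qsort(set->sorted, n, sizeof *set->sorted, cmp_str);
    return true;
}

bool line_set_contains(const struct line_set *set, const char *line)
{
    uint64_t h = hash_line(line);
    for (unsigned j = 0; j < PROBES; j++) {
        size_t p = probe(h, j);
        if (!(set->filter[p / 8] & (1u << (p % 8))))
            return false;          /* definitely absent: no exact lookup needed */
    }
    /* Probably present: confirm exactly, which discards false positives. */
    return bsearch(&line, set->sorted, set->n, sizeof *set->sorted,
                   cmp_str) != NULL;
}

void line_set_free(struct line_set *set)
{
    free(set->sorted);
    set->sorted = NULL;
}
```

The answers are the same exact whole-line matches that `grep -Fx -f members.txt` reports; the filter only changes how quickly absent lines are rejected. That is also where "if appropriate" bites: when most tested lines are present, each one still needs the exact lookup, so the filter adds work instead of saving it.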

cheers, Pádraig.