Re: uniq without the need of sort (original) (raw)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From:	Pádraig Brady
Subject:	Re: uniq without the need of sort
Date:	Tue, 08 Nov 2011 17:37:44 +0000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0

On 11/08/2011 04:30 PM, Peng Yu wrote:

Hi,

'uniq' currently relies on 'sort'. When the input file is small, this is OK. But when the input file is large, this seems to be a waste (the complexity is O(n log(n)), if uniq handles a hash table its self the complexity is only O(n)). I'm wondering if it is better to relax the requirement of 'sort' when 'uniq' is used.

Well uniq with a hash, would not be O(n) unless a "pefect hash"was used, which is impossible with arbitrary inputs. So in the worst case it could go to O(n^2). Also uniq would have to worry about memory for the hash, with temp files for scalability etc.

Currently sort implements all the complexity and the other tools take advantage of this. Note sort -u is available as that can discard duplicates as encountered to reduce the sort complexity.

cheers, Pádraig.

uniq without the need of sort, Peng Yu, 2011/11/08
- Re: uniq without the need of sort, Bob Proulx, 2011/11/08
- Re: uniq without the need of sort,Pádraig Brady <=

Prev by Date:Re: uniq without the need of sort
Next by Date:[PATCH] ls: plug a leak
Previous by thread:Re: uniq without the need of sort
Next by thread:[PATCH] ls: plug a leak
Index(es):
- Date
- Thread