Re: [coreutils] join feature: auto-format
From: Pádraig Brady
Subject: Re: [coreutils] join feature: auto-format
Date: Fri, 07 Jan 2011 13:03:13 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3
On 06/01/11 12:05, Pádraig Brady wrote:
On 07/10/10 19:25, Pádraig Brady wrote:
> On 07/10/10 18:43, Assaf Gordon wrote:
>> Pádraig Brady wrote, On 10/07/2010 06:22 AM:
>>> On 07/10/10 01:03, Pádraig Brady wrote:
>>>> On 06/10/10 21:41, Assaf Gordon wrote:
>>>>>
>>>>> The "--auto-format" feature simply builds the "-o" format line
>>>>> automatically, based on the number of columns from both input files.
>>>>
>>>> Thanks for persisting with this and presenting a concise example.
>>>> I agree that this is useful and can't think of a simple workaround.
>>>> Perhaps the interface would be better as:
>>>>
>>>> -o {all (default), padded, FORMAT}
>>>>
>>>> where padded is the functionality you're suggesting?
>>>
>>> Thinking more about it, we mightn't need any new options at all.
>>> Currently -e is redundant if -o is not specified.
>>> So how about changing that so that if -e is specified
>>> we operate as above by auto inserting empty fields?
>>> Also I wouldn't base on the number of fields in the first line,
>>> instead auto padding to the biggest number of fields
>>> on the current lines under consideration.
>>
>> My concern is the principle of "least surprise" - if there are existing
>> scripts/programs that specify "-e" without "-o" (doesn't make sense, but
>> still possible) - this change will alter their behavior.
>>
>> Also, implying/forcing 'auto-format' when "-e" is used without "-o" might
>> be a bit confusing.
>
> Well seeing as -e without -o currently does nothing,
> I don't think we need to worry too much about changing that behavior.
> Also to me, specifying -e EMPTY implicitly means I want
> fields missing from one of the files replaced with EMPTY.
>
> Note POSIX is more explicit, and describes our current operation:
>
> -e EMPTY
> Replace empty output fields in the list selected by -o with EMPTY
>
> So changing that would be an extension to POSIX.
> But I still think it makes sense.
> I'll prepare a patch soon, to do as I describe above,
> unless there are objections.
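As a concrete illustration of the POSIX wording quoted above, the session below runs an unpatched GNU join (an assumption; other implementations may differ) on the same file1 and file2 used in the examples further down. Without -o, -e has nothing to fill here because every join field is present; with an explicit -o list it fills each listed field that is absent. Spelling out that -o list by hand is what the proposed auto-format would build automatically.

```bash
# Recreate the two sample inputs (both sorted on field 1).
printf 'a 1 2\nb 1\nd 1 2\n' > file1
printf 'a 3 4\nb 3 4\nc 3 4\n' > file2

# No -o list: all join fields exist, so -e changes nothing and short lines stay short.
join -a1 -a2 -e. file1 file2
# a 1 2 3 4
# b 1 3 4
# c 3 4
# d 1 2

# Explicit -o list (0 = the join field): -e fills every listed field that is missing.
join -a1 -a2 -e. -o 0,1.2,1.3,2.2,2.3 file1 file2
# a 1 2 3 4
# b 1 . 3 4
# c . . 3 4
# d 1 2 . .
```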
The attached changes join (from what's done on other platforms) so that... join -e will automatically pad missing fields from one file so that the same number of fields are output from each file. Previously -e was only used for missing fields specified with -o or -j. With this change join now does:
$ cat file1
a 1 2
b 1
d 1 2
$ cat file2
a 3 4
b 3 4
c 3 4
$ join -a1 -a2 -1 1 -2 1 -e. file1 file2
a 1 2 3 4
b 1 . 3 4
c . . 3 4
d 1 2 . .
$ join -a1 -a2 -1 1 -2 4 -e. file1 file2
. . . . a 3 4
. . . . b 3 4
. . . . c 3 4
a 1 2 . .
b 1 .
d 1 2 . .
$ join -a1 -a2 -1 4 -2 1 -e. file1 file2
. a 1 2 . . .
. b 1 . .
. d 1 2 . . .
a . . 3 4
b . . 3 4
c . . 3 4
$ join -a1 -a2 -1 4 -2 4 -e. file1 file2
. a 1 2 a 3 4
. a 1 2 b 3 4
. a 1 2 c 3 4
. b 1 . a 3 4
. b 1 . b 3 4
. b 1 . c 3 4
. d 1 2 a 3 4
. d 1 2 b 3 4
. d 1 2 c 3 4
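Reading the four outputs above, the padding appears to work line by line, in line with the earlier idea of padding to the biggest number of fields on the current lines: whichever file contributes fewer non-join fields to an output line is padded with the -e string until both files contribute the same number. The helper pad_pair below is hypothetical; it models only that rule, as inferred from the sample output rather than taken from the attached patch, and reproduces a few of the lines shown.

```bash
# pad_pair KEY FIELDS1 FIELDS2 FILLER
# Hypothetical helper modelling the inferred padding rule (not code from the patch).
#   KEY     - the join field as printed (already FILLER if it was missing)
#   FIELDS1 - non-join fields from file1 (empty for an unpaired file2 line)
#   FIELDS2 - non-join fields from file2 (empty for an unpaired file1 line)
pad_pair() {
  local key=$1 filler=$4
  local -a f1 f2
  read -r -a f1 <<< "$2"
  read -r -a f2 <<< "$3"
  # Pad whichever side is shorter until both sides have the same field count.
  while (( ${#f1[@]} < ${#f2[@]} )); do f1+=("$filler"); done
  while (( ${#f2[@]} < ${#f1[@]} )); do f2+=("$filler"); done
  printf '%s\n' "$key ${f1[*]} ${f2[*]}"
}

pad_pair b '1' '3 4' .       # paired, file1 short:  b 1 . 3 4
pad_pair c '' '3 4' .        # unpaired file2 line:  c . . 3 4
pad_pair b '1' '' .          # unpaired file1 line:  b 1 .
pad_pair . 'b 1' 'a 3 4' .   # -1 4 -2 4 case:       . b 1 . a 3 4
```

Note how "b 1 ." in the -1 1 -2 4 output gets only one filler: under this rule the padding depends on the fields of the lines being joined, not on a fixed per-file count.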
While -e without -o was previously a no-op, and so could safely be extended IMHO, this will also change the behavior when -e and -j are specified. Previously, if -j > 1 was specified and that field was missing, -e would be used in its place rather than the empty string. This still does that, but also does the padding. Without the -j issue I'd be 80:20 for just extending -e to auto pad, but given -j I'm 50:50. The alternative is to select this with, say, '-o padded', but that's less discoverable and complicates the interface somewhat.
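The existing -j interaction can be seen with two hypothetical files, file3 and file4, in which one line of file3 has no second field. Assuming an unpatched GNU join, the missing join field is printed as the -e string instead of an empty string, while the remaining fields of that line are not padded.

```bash
# Hypothetical inputs: the line "x" has no field 2, so its join key is
# empty and sorts before "1".
printf 'x\ny 1\n' > file3
printf 'p 1\n' > file4

# Join on field 2 of both files; -a1 also prints the unpairable "x" line,
# with its missing join field replaced by the -e string.
join -a1 -j 2 -e. file3 file4
# . x
# 1 y p
```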
Considering this more, I think it's safer to auto pad only when '-o padded' is specified. I notice the plan9 join man page has an example that uses -e '' to explicitly specify the empty string as filler, which would trigger our auto pad if we left it as above.
cheers, Pádraig.