Re: RFC: dd oflag=trunc to support in place filtering of files


From: Pádraig Brady
Subject: Re: RFC: dd oflag=trunc to support in place filtering of files
Date: Fri, 06 Jun 2014 12:21:51 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 06/06/2014 07:34 AM, Bernhard Voelker wrote:

> On 06/05/2014 03:27 PM, Pádraig Brady wrote:
>> The thought just occurred to me that this could be useful
>> to filter large files in place? For example:
>>
>> grep whatever file.big | dd bs=1M conv=notrunc oflag=trunc

> I guess you meant this:
>
> grep whatever file.big | dd bs=1M conv=notrunc oflag=trunc of=file.big

right

>> That would assume that grep never outputs more than it reads,
>> and would issue a final truncate along the lines of:
>>
>> ftruncate(STDOUT_FILENO, lseek(STDOUT_FILENO, 0, SEEK_CUR));
>>
>> Useful enough to add?
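The proposed final truncate can be sketched as a stand-alone C copy loop. This is a minimal illustration under stated assumptions, not dd's actual code: the helper name `copy_then_truncate` is hypothetical, the output descriptor is assumed to be opened without `O_TRUNC` (as with `conv=notrunc`), and the choice to skip the truncate after a read or write error is the sketch's own.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy in_fd to out_fd starting at out_fd's current
   offset, then apply ftruncate(fd, lseek(fd, 0, SEEK_CUR)) to out_fd.
   out_fd is assumed to be opened without O_TRUNC (like conv=notrunc).
   Returns 0 on success, -1 on error. */
static int copy_then_truncate(int in_fd, int out_fd)
{
    char buf[65536];
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            break;                      /* EOF: producer finished (or died) */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;                  /* read error: leave file untruncated */
        }
        for (ssize_t off = 0; off < n; ) {  /* retry short writes */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
    /* Cut the file at the last byte written, dropping the old tail. */
    off_t end = lseek(out_fd, 0, SEEK_CUR);
    if (end < 0)
        return -1;
    return ftruncate(out_fd, end);
}
```

In the pipeline from the thread, `in_fd` would be the pipe from grep and `out_fd` would be file.big. Note that the loop cannot tell a clean EOF from a producer that died, which is the concern raised in the reply.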

> While it sounds very useful, it looks like a powerful way to shoot oneself in the foot, e.g. when the producer command aborts:
>
> grep --unknown PAT file | dd ...
> grep: unrecognized option '--unknown'
>
> ... then dd probably wouldn't be able to detect the failure and would truncate the file, so the original data would be lost.

Good point. Also if there was an I/O error reading the file, dd would nuke any data after that.
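A small C sketch using POSIX `pipe`, `fork` and `waitpid` can illustrate why the consumer cannot see the failure. The function name is hypothetical, and the child only stands in for `grep --unknown`: it exits with status 2 (grep's error status) without writing anything. The reading side, playing dd's role, observes nothing but `read()` returning 0. This is the same EOF it would see from a producer that legitimately output nothing. The exit status goes only to whoever waits for the child, which in a real pipeline is the shell, not dd.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: a child "producer" fails before writing anything.
   Returns the number of bytes the parent (the "consumer", like dd) read
   before EOF, or -1 on error; stores the child's exit status in *status. */
static long bytes_from_failing_producer(int *status)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {          /* child: stands in for "grep --unknown ..." */
        close(fds[0]);
        _exit(2);            /* error exit, nothing written to the pipe */
    }

    close(fds[1]);           /* parent keeps only the read end */
    long total = 0;
    char buf[4096];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;          /* consumer sees data (none here), then EOF */
    close(fds[0]);

    int st;
    if (waitpid(pid, &st, 0) < 0)
        return -1;
    /* Only the waiting process (in a pipeline, the shell) learns this. */
    *status = WIFEXITED(st) ? WEXITSTATUS(st) : -1;
    return n < 0 ? -1 : total;
}
```

The I/O error case looks the same from the consumer's side: a producer that stops early just yields a shorter stream followed by EOF, after which a final truncate would discard the unread remainder.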

> Second, regarding the already mentioned restriction that the producer doesn't output more data than the original size of the input file, e.g.
>
> cat -n file | dd conv=notrunc of=file ...
>
> Is this really an issue? It (surprisingly!) already seems to work, even with "obs=1". And if it is, how could we detect this?

This could be working due to readahead buffering in the kernel, but that is not guaranteed in general and would eventually fail.
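A simplified C model can reproduce the dependence on read-ahead deterministically. Everything here is a hypothetical stand-in. Instead of `cat -n`, the filter doubles every byte, so its output is always larger than its input. The producer and consumer run in one loop, and `chunk` plays the role of how far the producer has read ahead of the writer. The model also stops reading at the file's original size, because a real producer that reads until EOF could start consuming its own output.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical model of "filter file | dd conv=notrunc of=file":
   read the original bytes of `path` up to `chunk` at a time, write each
   byte twice at the writer's offset, then truncate at the final offset.
   Returns 0 on success, -1 on error. */
static int double_in_place(const char *path, size_t chunk)
{
    int rc = -1;
    int rfd = open(path, O_RDONLY);
    int wfd = open(path, O_WRONLY);        /* no O_TRUNC, like conv=notrunc */
    char *in = malloc(chunk);
    char *out = malloc(2 * chunk);
    struct stat sb;

    if (rfd >= 0 && wfd >= 0 && in && out && fstat(rfd, &sb) == 0) {
        off_t left = sb.st_size;           /* read only the original bytes */
        rc = 0;
        while (left > 0) {
            size_t want = (off_t)chunk < left ? chunk : (size_t)left;
            ssize_t n = read(rfd, in, want);   /* reader offset advances */
            if (n <= 0) { rc = -1; break; }
            for (ssize_t i = 0; i < n; i++)    /* output is 2x the input */
                out[2 * i] = out[2 * i + 1] = in[i];
            /* The writer may now be past the reader, clobbering unread bytes. */
            if (write(wfd, out, 2 * (size_t)n) != 2 * n) { rc = -1; break; }
            left -= n;
        }
        if (rc == 0) {
            off_t end = lseek(wfd, 0, SEEK_CUR);
            if (end < 0 || ftruncate(wfd, end) != 0)
                rc = -1;
        }
    }
    if (rfd >= 0) close(rfd);
    if (wfd >= 0) close(wfd);
    free(in);
    free(out);
    return rc;
}
```

On a 3-byte file containing "abc", a `chunk` of 4096 yields "aabbcc", while a `chunk` of 1 yields "aaaaaa". In this model, any file larger than `chunk` is overwritten before it is fully read, because after the first read the writer is already `chunk` bytes past the reader.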

> As a side note, "oflag=trunc" may not describe well enough what it does: it truncates the output file after copying the data. So what about something like "oflag=truncpost"?

Yes, better.

Given the I/O error handling above, I'm not sure this is a feasible option.

thanks,
Pádraig.