output to file, then use file for input

Question

Is there a shorter way of writing this? Basically, I want to write a command's output to a file, then use that file as input for the next command. I also want to keep the file to view afterwards.

cmd1 > verylong.txt; cmd2 < verylong.txt

I know I can do

cmd1 | tee verylong.txt | cmd2

But since I expect "verylong.txt" to be a huge file, I thought it would be less efficient to use a pipe, since that would hold the entire file in memory, whereas with file input it would be processed one line at a time. (Or is my assumption wrong?)

It would be great if I could do something elegant like

cmd1 > verylong.txt > cmd2

score 16 · Accepted Answer · edited Apr 13 '17 at 12:36

As far as I know, cmd1 | tee verylong.txt | cmd2 will not hold the whole file in memory. In fact, if cmd2 were to wait too long before consuming its input, cmd1 might block on a write call and unblock only when cmd2 starts reading again.

The reason for that is that there is a buffer for the pipe, and that buffer, by default, is limited to a certain reasonable size.

Of course, the story might be different if cmd2 is sort (or something similar), where the entire input must be read before the command is able to write its output. In that case, the entire file content might be held in cmd2's memory, but that is independent of whether a pipe or an intermediary file was used for the input of that command.
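A minimal sketch of the tee pipeline from the question, with hypothetical stand-ins: seq plays the role of cmd1 and wc -l plays cmd2. The data moves through the pipe in chunks no larger than the pipe buffer (typically 64 KiB on Linux), so neither tee nor the pipe ever holds the whole file; count.txt is only there so the result can be inspected.

```shell
# Stand-in for cmd1: emit 100000 lines.
# tee copies each chunk to verylong.txt and passes it on to the stand-in for cmd2.
seq 1 100000 | tee verylong.txt | wc -l > count.txt

# The consumer saw every line...
cat count.txt

# ...and the file is still there to view afterwards.
wc -l < verylong.txt
```

Both commands should report the same number of lines, since tee writes identical data to the file and to the pipe.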

answered Nov 21 '14 at 00:15 by user43791 · edited Apr 13 '17 at 12:36 by Community

sort doesn't store the whole file in memory, it's got a buffer with a maximum size as well and resorts to temporary files when that maximum is reached. – Stéphane Chazelas Nov 21 '14 at 15:26
@StéphaneChazelas Good to know, have an upvote! ;) I'll update the answer to be less assertive in that "hold into memory" part. – user43791 Nov 21 '14 at 16:03
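
For anyone who wants to see the behaviour described in the comment above, here is a rough sketch assuming GNU sort, whose -S option caps the in-memory buffer and whose -T option picks the directory for temporary files. The 1M cap, the sort-tmp directory and the seq input are arbitrary illustrative choices, not something from the thread.

```shell
# Directory where sort may place its temporary spill files (hypothetical name).
mkdir -p sort-tmp

# Roughly 1.3 MB of descending numbers, sorted with a buffer capped at 1 MiB.
# Input that does not fit in the buffer is merged via temporary files in sort-tmp.
seq 200000 -1 1 | sort -n -S 1M -T sort-tmp > sorted.txt

# Check both ends of the sorted result.
head -n 1 sorted.txt
tail -n 1 sorted.txt
```

The result is the same as an unconstrained sort; only where the intermediate data lives differs.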

mikeserv · Answer 2 · 2014-11-21T00:31:07.883

The already given answer is correct. But if your goal is to selectively read your verylongfile.txt with cmd2, sed might be another option.

cmd1 | sed -e 'w verylongfile.txt' -e '/notinteresting/d' | cmd2

sed will write all of its input to the outfile, but only the bits that do not match the /notinteresting/ address to the pipe. Or you might negate the action with /interesting/!d which would write only the lines that match the interesting address to the pipe.

If this is not your goal, use tee instead, though - it is a more efficient tool for writing the whole of its input to both the outfile and the pipe.
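
To make the sed variant concrete, here is a small sketch with made-up input. printf stands in for cmd1, and a plain redirect to filtered.txt or matched.txt stands in for cmd2; those file names and the sample lines are hypothetical.

```shell
# Every input line is written to verylongfile.txt by the w command;
# lines matching /notinteresting/ are then deleted before reaching stdout.
printf '%s\n' 'keep 1' 'notinteresting 2' 'keep 3' |
  sed -e 'w verylongfile.txt' -e '/notinteresting/d' > filtered.txt

# Negated form: everything still goes to all.txt,
# but only lines matching /interesting/ are passed on.
printf '%s\n' 'interesting A' 'boring B' |
  sed -e 'w all.txt' -e '/interesting/!d' > matched.txt

cat verylongfile.txt filtered.txt all.txt matched.txt
```

Note that the w expression comes first, so the file receives each line before the d command has a chance to discard it.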

score 0 · Answer 3 · answered Nov 26 '14 at 14:55

There is a clever trick with tee and process substitution (the >(...) syntax, available in bash, zsh and ksh):

cat source.lst | tee >(doSomething.sh) >(somethingElse.sh) | somethingFinal.sh

I've done this before

pv -perl source.lst | tee >(doSomething.sh) >(somethingElse.sh) | md5sum

pv will give you a progress bar, an ETA, and a running line total. Then source.lst will be fed to doSomething.sh and somethingElse.sh in parallel (potentially on different CPUs). Finally, we'll get an md5sum of the whole file, just for academic purposes.

score -6 · Answer 4 · edited Nov 21 '14 at 04:54

What's wrong with a simple two-line batch file? Like:

cmd1 >filespec
cmd2 <filespec

Or

cmd1 >filespec
cmd2 filespec

Either way, the file is left in mass storage.

answered Nov 21 '14 at 04:26 by user92319 · edited Nov 21 '14 at 04:54 by slm

For some reason the site is not letting me key in a less-than symbol, and it is dropping the second part of cmd2. So, in words: cmd1 redirects out to the file; on the next line, cmd2 redirects in from the file. Or, for cmd2, just put the filename as the first parameter and cmd2 opens the file itself. – user92319 Nov 21 '14 at 04:39
You use &lt; for the < symbol... – jasonwryan Nov 21 '14 at 04:57
One difference between cmd | tee file | cmd and cmd >file; cmd <file is that the commands in the first group are executed in parallel, which is to say that they all start at the same time. So cmd2 can process cmd1's output as it is written, whereas cmd1; cmd2 are two commands executed one after the other. In other words, cmd2 must wait for cmd1 to complete before processing anything. – mikeserv Nov 21 '14 at 04:58
The author of the question specifically asked for a command that's different from that because the files are very large. Executing the files using your methodology would be a lot slower than cmd1|tee file|cmd2. – Mark D Nov 26 '14 at 14:40
