13

I am trying to interpret the following tee command:

cat colors.txt words.txt | tee colorsAndWords.txt | wc

Is my understanding, as follows, correct?

  1. cat colors.txt words.txt: This command concatenates the contents of the colors.txt and words.txt files and sends the combined output to the standard output (the terminal).

  2. | tee colorsAndWords.txt: The | (pipe) symbol takes the output of the previous command and passes it as input to the tee command. tee is used to both display the data on the standard output (usually the terminal) and write it to a file. In this case, it writes the concatenated output to a file named colorsAndWords.txt.

  3. | wc: The final | wc takes the output of the tee command, which is still the concatenated content, and passes it to the wc command. wc is used to count the number of lines, words, and characters in the text it receives as input.
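The three steps above can be tried end to end with a small sketch. The file contents below are made up for illustration, since the question does not show the real ones. One detail worth noticing when running it: inside this pipeline the standard output of cat and of tee is a pipe rather than the terminal, so the only thing that appears on screen is the summary printed by wc.

```shell
# Hypothetical sample data; the real colors.txt and words.txt are not shown in the question.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf 'red\ngreen\nblue\n' > colors.txt
printf 'apple pie\nbanana\n' > words.txt

# The pipeline from the question. tee saves one copy to colorsAndWords.txt
# and passes the other copy down the pipe to wc.
counts=$(cat colors.txt words.txt | tee colorsAndWords.txt | wc)
echo "$counts"          # lines, words and bytes, as counted by wc

# The saved copy holds the same data that wc counted.
cat colorsAndWords.txt
```

Running `tee colorsAndWords.txt` as the last command of a pipeline (with no `| wc` after it) is the case where its standard output really is the terminal.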

Nosail
  • 231
  • 1
    The tee command lets you write the result to both a file and standard output, so you can count the characters, words and lines without reading the resulting file back in. – admstg Oct 26 '23 at 09:53
  • 7
    Did you execute this command to test it? If so, did it not conform to your expectations? Computing is a practical science (or art), not a theoretical one. – Paul_Pedant Oct 26 '23 at 10:08
  • @Paul_Pedant it checked out fine. I just wanted to be sure I am understanding this right. :) – Nosail Oct 26 '23 at 10:17
  • 8
    Did you also read the manual (man tee)? It describes the operation of the tee command. – James K Oct 27 '23 at 09:22

3 Answers

22

Your understanding is correct, and there is nothing that I can see that needs correcting with regard to the tee utility. It is used to duplicate a data stream, and the example you show does this, storing one copy in a single file (there could be many) while also passing it on to the next stage of the pipeline.

It's a standard utility, meaning it will work the same on any Unix system, not just Linux. Its specification is part of the POSIX standard, and the implementations found on Linux and elsewhere should adhere to that description (but may add extensions, like new command line options).

Kusalananda
  • 333,661
  • I know this is Unix Q&A, not macOS, but I am curious about the timing. The macOS pbcopy utility grabs its input stream and stores it in the system clipboard without displaying anything. So typically you print normally to see what you're getting, then run again to pipe to pbcopy. Similar utilities exist on Linux, IIRC. Can tee be used to duplicate to pbcopy rather than to a file? Or can its screen output happen before pbcopy suppresses it? – JL Peyret Oct 27 '23 at 16:53
  • 2
    @JLPeyret: If your shell supports it, you can use process substitution to do that. – Kevin Oct 27 '23 at 16:54
  • @Kevin thanks. That’s probably my cue to ask in the Apple forum rather than here. Or look up that subject myself. – JL Peyret Oct 27 '23 at 16:58
  • @JLPeyret something | tee >(pbcopy) | otherthing is that what you're looking for? If you don't need to duplicate the data, something | pbcopy or something > >(pbcopy), although I haven't looked at the pbcopy manual. macOS questions are on topic here, if they are command-line related. We don't usually like followup questions in comments :-) – Kusalananda Oct 27 '23 at 17:29
  • @Kusalananda ooohh, sweet, that worked a charm (tested with a simple ol' ls). I'll ask a proper question instead. But I couldn't resist asking as this talk of duplication and stream routing also seemed to touch on feeding other utilities with tee, not just file outputs. Which your answer to my comment confirms. – JL Peyret Oct 27 '23 at 18:43
7

Yes, that's correct. tee is like a T-junction in a water pipe, or a dual-output headphone jack splitter. It reads one input and duplicates it to two outputs: stdout and the file specified on the command line. (Actually any number of output files; you can specify more than one on the command line. The rest of this answer assumes the simple case of one arg.)


Perhaps it would help to understand how you might write a simple implementation of tee.
In C, a loop body like this:

ssize_t bytes_read = read(0 /*stdin*/, buf, 4096);
write(1 /*stdout*/,          buf, bytes_read);  // error checking not shown; also
write(cmdline_arg_output_fd, buf, bytes_read);  // loops to finish short writes

The loop condition would be something like while (bytes_read > 0), which stops on EOF (read returns 0) or on error (read returns -1). Because the return value has to be checked before doing the write calls, it is usually written with the assignment and comparison inside the controlling expression, like while ((bytes_read = read(0, buf, 4096)) > 0).

Before that loop, open the output file with something like cmdline_arg_output_fd = open(argv[1], O_WRONLY|O_CREAT|O_TRUNC, 0666);
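The same shape (open the file once, then loop copying each chunk to two places until end of input) can be sketched in shell. This is only a line-based analogue for illustration, not how tee is implemented: the function name mytee is made up, it drops NUL bytes, and it skips a final line that lacks a trailing newline, whereas the C loop above copies raw bytes.

```shell
# mytee FILE: hypothetical line-based stand-in for tee, for illustration only.
mytee() {
    # "3> $1" on the loop opens (creates or truncates) the output file once,
    # before the first iteration, much like the open() call before the C loop.
    while IFS= read -r line; do      # read fails at end of input, ending the loop
        printf '%s\n' "$line"        # copy 1: standard output
        printf '%s\n' "$line" >&3    # copy 2: the file named as the argument
    done 3> "$1"
}

tmpdir=$(mktemp -d)
# wc -l counts the two lines that mytee passed along on standard output.
printf 'one\ntwo\n' | mytee "$tmpdir/copy.txt" | wc -l
cat "$tmpdir/copy.txt"               # the second copy, saved by mytee
```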

One fun usage is foo | bar | tee /dev/tty | baz as kind of a "debug print" to see data going through that stage of a pipeline. (Opening /dev/tty actually gives you whatever device is the controlling TTY for that process, i.e. the terminal. Like /dev/pts/19 in a Konsole terminal tab.)
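A small sketch of that "debug print" idea follows. The pipeline stages (sort, uniq -c, wc -l) are arbitrary stand-ins for foo, bar and baz, and the fallback to /dev/stderr when no controlling terminal can be opened is an addition for non-interactive runs, not part of the original suggestion.

```shell
# Use /dev/tty when a controlling terminal can be opened, else fall back to stderr.
if ( : > /dev/tty ) 2>/dev/null; then
    tap=/dev/tty
else
    tap=/dev/stderr
fi

# tee shows the sorted lines on the tap while the pipeline carries on unchanged.
result=$(printf 'b\na\nb\n' | sort | tee "$tap" | uniq -c | wc -l)
echo "distinct lines: $result"
```

Because the tapped copy goes to the terminal device and not to standard output, it does not end up in the captured result.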

Other useful args include regular files like you're using, as well as named pipes, or even process-substitution like >( grep ... > filtered.txt).
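As a sketch of the named-pipe case, assuming nothing beyond POSIX sh, mkfifo, grep and wc: a background grep reads one copy of the stream from a FIFO while the other copy continues down the pipeline. The sample data and file names are made up for illustration.

```shell
tmpdir=$(mktemp -d)
mkfifo "$tmpdir/fifo"

# Reader: filters one copy of the stream into filtered.txt, in the background.
grep e < "$tmpdir/fifo" > "$tmpdir/filtered.txt" &

# Writer: tee sends one copy into the FIFO and the other on to wc.
printf 'red\ngreen\nblue\nblack\n' | tee "$tmpdir/fifo" | wc -l

wait                          # let the background grep finish first
cat "$tmpdir/filtered.txt"    # only the lines containing "e"
```

In shells that support process substitution, tee >(grep e > filtered.txt) achieves much the same without creating the FIFO by hand.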

Toby Speight
  • 8,678
Peter Cordes
  • 6,466
0

The pipe character '|' is an instruction to the shell that the three programs (cat, tee and wc) should be run in a very specific manner: all at the same time, with the stdout of each one connected to the stdin of the next.

The programs themselves are basically oblivious to the fact that stdout (in this case for cat and tee) is redirected to something which isn't the terminal itself.

The same goes for stdin of the programs tee and wc.
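A short sketch of what "oblivious" means in practice: a program just writes to file descriptor 1 and the shell decides where that leads. A program can ask whether the descriptor is a terminal (the shell's [ -t 1 ] test does this), but it writes the same way in either case. The helper name report is made up for this example.

```shell
# report: hypothetical helper that notes on stderr where its stdout points,
# then writes its data to stdout in exactly the same way in both cases.
report() {
    if [ -t 1 ]; then
        echo "stdout is a terminal" >&2
    else
        echo "stdout is not a terminal" >&2
    fi
    echo data
}

report          # run directly from an interactive shell: stdout is the terminal
report | cat    # in a pipeline: stdout is a pipe set up by the shell
```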

Stefan Skoglund
  • 453
  • 3
  • 5