Let's say I have a very large file and I want to process every line in that file by piping it to a script.

cat large_file.txt | python processor.py

I'm not completely sure how the above operation works. Is the file iterated over, with each line passed to the processor and the next line passed only once the processor has finished with the previous one? Or is the entire file read first and then passed to the processor?

I really hope it's the first case.

Thanks

Jeff

1 Answer

The output of the cat command is presented as STDIN to the Python script. The Python script is responsible for how quickly / slowly it reads this input, and whether it processes one line before reading the next or reads all the input and then begins processing.
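
To make the two reading styles concrete, here is a minimal sketch of what processor.py might do with its input. The function names and the `handle` callback are hypothetical placeholders for illustration; the original script is not shown in the question.

```python
import sys


def process_streaming(stream, handle):
    """Handle one line at a time; memory use stays roughly constant."""
    count = 0
    for line in stream:  # pulls data from the pipe in small chunks as needed
        handle(line)     # hypothetical per-line work
        count += 1
    return count


def process_all_at_once(stream, handle):
    """Read the whole input first; memory use grows with the input size."""
    lines = stream.readlines()  # returns only after end of input is reached
    for line in lines:
        handle(line)
    return len(lines)


# In processor.py, the streaming variant would be called as, for example:
#     process_streaming(sys.stdin, sys.stdout.write)
```

With the first function, each line is handled as soon as it has been read; with the second, nothing is handled until cat has finished writing and closed the pipe.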

John
  • So if the Python script is slow enough, could cat cause the system to run out of memory (granted the file is large enough)? – Jeff Feb 06 '15 at 19:33
  • I doubt it. On Unix-like systems a pipe has a limited buffer, so cat blocks once that buffer is full and waits for the script to read more; the whole file is not loaded into memory. If that's what you're concerned about, you're far better off modifying the processor.py script to take a filename as a parameter and open that file for reading itself. You can still use STDIN as a "default" file to open if no filename is specified. – John Feb 06 '15 at 19:42
  • Buffering is usually involved but where/when/how depends on the OS in question. This has an interesting write up on this very topic. HTH – Dude named Ben Feb 06 '15 at 19:57
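
The suggestion above, taking a filename as a parameter and falling back to STDIN, could look roughly like this. It is only a sketch: `open_input` is a hypothetical helper name, and the argument layout is an assumption rather than anything stated in the thread.

```python
import argparse
import sys


def open_input(argv):
    """Return the named file opened for reading, or sys.stdin if none is given."""
    parser = argparse.ArgumentParser(description="Process a file line by line.")
    # nargs="?" makes the positional argument optional; it defaults to None
    parser.add_argument("filename", nargs="?", help="input file (default: STDIN)")
    args = parser.parse_args(argv)
    if args.filename is None:
        return sys.stdin
    return open(args.filename, "r")


# In processor.py this might be used as:
#     for line in open_input(sys.argv[1:]):
#         ...  # per-line work goes here
```

Both `python processor.py large_file.txt` and `cat large_file.txt | python processor.py` would then work. The standard library's `fileinput` module offers similar behaviour out of the box.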