I have a large input file which contains 30M lines, with `\r\n` line endings. I decided to do something silly and compare the speed of counting all lines via `read -r` to counting them via `xargs` (stripping the `\r` first, because `xargs` does not seem to be able to split on multiple characters). Here are my two commands:
```
time tr -d '\r' < input.txt | xargs -P 1 -d '\n' -I {} echo "{}" | wc -l
time while read -r p || [ -n "$p" ]; do echo "$p"; done < input.txt | wc -l
```
Here, the second solution is much faster. Why is that?
Please note that I know this is not a proper way to count the lines of a file. This question is merely out of curiosity about the observation.
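For what it's worth, a direct count with no per-line work at all would just be `wc -l`. A minimal sketch, using a small generated sample file (the name and contents are invented for the demo) rather than the 30M-line input:

```shell
# Create a small CRLF-terminated sample file (made-up name/contents for the demo).
printf 'a\r\nb\r\nc\r\n' > sample.txt

# wc -l counts newline (\n) characters, so \r\n-terminated lines are counted fine.
wc -l < sample.txt    # prints 3

rm -f sample.txt
```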
Comments:

I suspect it is `xargs` spawning a `/usr/bin/echo` process for each line, while the second command is probably using the bash `echo` builtin instead of a separate process. I'll check that. – Frazier Thien Jul 21 '21 at 13:52

[…] `tr`. And how are you measuring the time here? Timing a pipe is complicated. – terdon Jul 21 '21 at 14:03

The overhead of `tr` should be minimal. Regarding timing, I just write `time` in front of both commands, so including the overhead of `tr`, indeed. – Frazier Thien Jul 21 '21 at 14:06

The first one calls `tr`, `xargs` and `/bin/echo` on each invocation, while the shell one doesn't call any external tools at all. That will likely explain it, but I'm not sure. Can you also show the actual results you get? It's hard to understand what "much faster" means without them. – terdon Jul 21 '21 at 14:15

I tried `/usr/bin/echo` for the `read` loop and it does also seem to take ages now. I will add these measurements later on. – Frazier Thien Jul 21 '21 at 14:17

The `tr` in the first version is not doing anything -- it is waiting for input from the terminal. The `xargs` is reading from the redirection `<`, not the pipe `|`. – Paul_Pedant Jul 21 '21 at 17:27
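To make the fork/exec overhead the comments point at concrete, here is a minimal sketch (the sample file name and line count are invented for the demo) that runs the same loop once with the bash `echo` builtin and once with the external `/bin/echo`, which forks a new process per line just as `xargs -I {} echo "{}"` does:

```shell
#!/bin/bash
# Generate a small sample file; 10000 lines is an arbitrary demo size.
seq 10000 > sample.txt

# Builtin echo: runs inside the shell process, no fork/exec per line.
time while read -r p; do echo "$p"; done < sample.txt | wc -l

# External echo: one fork/exec per line, analogous to xargs spawning echo.
time while read -r p; do /bin/echo "$p"; done < sample.txt | wc -l

rm -f sample.txt
```

On a typical Linux box the second loop should be dramatically slower, since each line pays the cost of creating and tearing down a whole process.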