
Say I want to check the size of each file in some directory.

Here is what I do:

du -sh *

Also, I can do:

ls | xargs du -sh

The two commands appear to produce the same result.

I want to know whether they really are equivalent, for example in cost and efficiency. (I would guess the first command is lighter than the second?)

Yves

2 Answers


One is correct, the other isn’t.

du -sh *

(should be du -sh -- * to avoid problems with filenames starting with -)

relies on the shell to expand the glob *; du sees all the non-hidden files and directories in the current directory as individual arguments. This handles special characters correctly.

ls | xargs du -sh

relies on xargs to process ls’s output. xargs splits its input on whitespace (at least space, tab and newline; more with some implementations), also understands some form of quoting, and runs du one or more times (once even for empty input¹), passing every whitespace-separated string as an individual argument.

The two appear equivalent if your current directory contains no files with whitespace, single quote, double quote or backslash characters in their names, and if there are few enough files (but at least one) that xargs runs du only once. In general, though, they are not equivalent.
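To make the splitting visible, here is a minimal sketch that counts how many arguments each approach would hand to a command. It assumes a scratch directory made with mktemp -d, two made-up file names, and an sh -c one-liner standing in for du so that only the argument count is printed.

```shell
# Scratch directory with one name containing a space (names are hypothetical)
tmp=$(mktemp -d)
touch "$tmp/my file.txt" "$tmp/plain.txt"

# Glob: the shell hands each file name over as exactly one argument
glob_args=$(cd "$tmp" && set -- * && echo "$#")

# ls | xargs: xargs re-splits the names on whitespace
xargs_args=$(cd "$tmp" && ls | xargs sh -c 'echo "$#"' sh)

echo "glob: $glob_args arguments, ls | xargs: $xargs_args arguments"

rm -rf "$tmp"
```

With du in place of the stand-in, the pipe version would look for files named my and file.txt, neither of which exists.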

In terms of efficiency, du -sh * uses one process, while ls | xargs du -sh uses at least three. There is one scenario where the pipe approach will work and the glob won’t: if the current directory contains too many files, the shell won’t be able to run du with all their names in one go, whereas xargs will run du as many times as necessary to cover all the files. In that case the output comes from several separate du runs, and files with more than one hard link may be counted several times.
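The batching behaviour can be imitated on a small scale. This sketch prints the system's argument-size limit with getconf ARG_MAX, then uses xargs -n 2 to force tiny batches; the -n 2 is purely an illustrative assumption, since in real use xargs only splits once the list approaches the limit. echo stands in for du.

```shell
# Limit on the combined size of arguments and environment for one exec
getconf ARG_MAX

# Five names in batches of at most two: xargs has to run the command three times
batches=$(printf '%s\n' a b c d e | xargs -n 2 echo | wc -l | tr -d ' ')
echo "xargs ran the command $batches times"
```

Each run knows nothing about the others, which is why a file with several hard links can be counted once per run.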

See also Why *not* parse `ls`?
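If the pipe form is wanted anyway (for instance because of the argument limit), one possible workaround is to pass NUL-delimited names so that xargs does no whitespace or quote processing. This is only a sketch: it assumes an xargs that supports the -0 option (GNU and BSD implementations do, but it is not guaranteed everywhere) and a shell where printf is a builtin. The scratch directory and file names are made up for the demonstration.

```shell
tmp=$(mktemp -d)
touch "$tmp/my file.txt" "$tmp/plain.txt"

# Count the arguments xargs -0 passes on: one per file, spaces preserved
nul_args=$(cd "$tmp" && printf '%s\0' * | xargs -0 sh -c 'echo "$#"' sh)
echo "xargs -0 passed $nul_args arguments"

# The same pipeline with du as the command
(cd "$tmp" && printf '%s\0' * | xargs -0 du -sh --)

rm -rf "$tmp"
```

Like the glob form, this still skips hidden files, and xargs may still split a very long list across several du runs.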


¹ If there is no non-hidden file in the current directory, du -sh -- * will either fail with an error from your shell or, with some shells like bash, run du with a literal * as its argument, and du will complain that no file named * exists.

With ls | xargs du -sh --, on the other hand, most xargs implementations (the exceptions being some BSDs) will run du with no argument, and so give the disk usage of the current directory (which also includes the disk usage of the directory file itself and of all hidden files and directories in it).

Stephen Kitt

In the first case, the shell expands * into the list of matching file names and passes those as arguments to the du command. In the second case, the shell starts two processes (ls and xargs) connected via a pipe. ls prints the file names, and xargs reads them, then starts a du command. So the second version executes 3 commands, the first only one. There are some potential differences:

  • ls might list files in a different order
  • depending on environment settings, ls might even list more or fewer files (not sure about that, though)
  • when xargs receives more filenames than can be passed as arguments, it will execute du multiple times
  • I'm not aware of ls implementations that list more or fewer files depending on the environment. However, globs in several shells can be affected by the environment, for example by the FIGNORE variable in ksh, GLOBIGNORE in bash, or BASHOPTS in bash, which can enable options like dotglob and nullglob. – Stéphane Chazelas Apr 25 '18 at 09:56
  • In the unlikely event that the list of filenames is too large to pass through an exec, then I expect that du * will fail, too.  I say “unlikely” because that limit is very large. – Scott - Слава Україні Jul 25 '19 at 14:00