4

If I have a number of directories named, for example, 10001 through 10025, is there any reason to use ls 1*/foo vs. ls 100??/foo?

I have a lot more than 25 of them, so I'm mostly curious whether there's any difference in speed.

I know the difference in use between the two: the asterisk will also match longer file names, like 10001.backup. But let's say I don't have any files that don't follow my conventions. Are there any behind-the-scenes differences?

  • 2
    It isn't a speed issue. It is a specificity issue. Many times one thinks an aggressive glob will do, but one gets bitten by a corner case one didn't think about. E.g., one wants to delete dirs 10001 through 10025, only to realize there was an unrelated dir called 1world_data that became an unintended casualty. Ergo, I think it's good practice to glob only the closest match; it's the safe way in the long run. – curious_cat Aug 18 '15 at 06:21
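
To make the point in the comment above concrete, you can preview what a glob will expand to by prefixing the destructive command with echo (the directory names below are invented for illustration):

$ mkdir 10001 10002 1world_data
$ echo rm -r 1*      # 1* also catches the unrelated directory
rm -r 10001 10002 1world_data
$ echo rm -r 100??   # 100?? matches only the five-character names
rm -r 10001 10002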

2 Answers

9

Function

They mean different things. The asterisk matches zero or more characters; the question mark matches exactly one character.

From the references above:

The * character serves as a "wild card" for filename expansion in globbing.

The ? character serves as a single-character "wild card" for filename expansion in globbing…
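
One concrete consequence of the "zero or more" rule is that a pattern like 10001* also matches the bare name 10001 itself, while 10001? does not (the file names below are just for illustration):

$ touch 10001 100017
$ ls 10001?     # ? must consume exactly one character
100017
$ ls 10001*     # * may match nothing at all
10001  100017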

Performance

tl;dr: there is no detectable difference in performance.

I tested performance by using a directory filled with 36 sub-directories, each named with a single character. There were about 70 000 files in the subdirectories combined. I tested the following.

$ time ls ?/* -d >/dev/null
$ time ls */* -d >/dev/null

I alternated between the two commands, running each ten times. Here are the results for the real time, in seconds.

?       *
0.318   0.326
0.355   0.212
0.291   0.351
0.291   0.265
0.287   0.283
0.362   0.23
0.248   0.33
0.286   0.283
0.293   0.351
0.233   0.352

After statistical analysis (a paired, two-tailed t-test), I could detect no difference in performance between the two commands (p value = 0.95).

[graph of the timing results]
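
If you want to recreate a roughly comparable test directory yourself, something along these lines should work (36 single-character directories with about 2 000 files each; the names and counts are placeholders rather than the exact set I used, which, as noted in the comments, was a sorted clone of the old AUR):

$ mkdir -p globtest && cd globtest
$ for d in {0..9} {a..z}; do mkdir "$d"; done      # 36 single-character directories
$ for d in */; do touch "$d"file{1..2000}; done    # ~2 000 files each, ~72 000 in total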

EDIT: More samples

I repeated the above analysis with 200 samples each, again alternating tests.

$ for i in {1..200}; do time (ls */* -d >/dev/null) 2>> /tmp/time_asterisk; time (ls ?/* -d >/dev/null) 2>> /tmp/time_question_mark; done
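
To summarise the resulting files, an awk one-liner can average the real times, assuming bash's default time output (lines like real 0m0.318s); swap in the other file name for the * results:

$ # average the "real" lines recorded for the ? test
$ awk '/^real/ {split($2, t, /[ms]/); sum += t[1]*60 + t[2]; n++} END {printf "%d runs, mean %.3f s\n", n, sum/n}' /tmp/time_question_mark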

Here are the raw data for ? and *. Again, I could detect no significant difference (p value = 0.55), and the distributions of the two tests look even more similar.

[graph of the timing distributions with 200 samples]

Sparhawk
  • 19,941
  • You can compare results faster, and on a larger sample, with a loop: time ( for i in {1..1000}; do ls ?/* -d >/dev/null; done ). However, I would be surprised if there were any significant differences, and even if there were, they would be shell-dependent. – jimmij Aug 17 '15 at 22:26
  • @jimmij Yes, I might try that later. – Sparhawk Aug 17 '15 at 22:47
  • @jimmij Okay done. I was impatient, so I "only" did 200 samples, but it looks pretty convincing to me. I also alternated tests instead of running each consecutively, to minimise systemic effects. – Sparhawk Aug 17 '15 at 23:12
  • "There were about 70 000 files in the subdirectories combined." - Just curious, is this a significant parameter in your testing? Did you test with just a file, or no nested files? :) – h.j.k. Aug 18 '15 at 03:51
  • @h.j.k. To be honest, I just used a convenient directory I had, which was a complete clone of the old Arch Linux AUR, sorted into directories based on the first letter. If I were creating the test from scratch, I'd probably create directories named 0001–9999 instead, as I think iterating over the wildcard would be more relevant. However, in this case, I think I did enough replicates overall that any differences should be apparent. – Sparhawk Aug 18 '15 at 04:48
  • @Sparhawk ah, I see, thanks for the clarification! – h.j.k. Aug 18 '15 at 05:24
1

The ?? form is more specific, in the event there are (or could be) other, longer file names that the * glob would match.

% touch 10001 100dalmations
% ls 100??
10001
% ls 100*
10001  100dalmations
% 
thrig
  • 34,938