4

I have a text file, "foo.txt", that specifies a directory in each line:

data/bar/foo
data/bar/foo/chum
data/bar/chum/foo
...

There could be millions of directories and subdirectories. What is the quickest way to create all the directories in bulk, using a terminal command?

By quickest, I mean quickest to create all the directories. Since there are millions of directories there are many write operations.

I am using Ubuntu 12.04.

EDIT: Keep in mind, the list may not fit in memory, since there are MILLIONS of lines, each representing a directory.

EDIT: My file has 4.5 million lines, each representing a directory, composed of alphanumeric characters, the path separator "/" , and possibly "../"

When I ran xargs -d '\n' mkdir -p < foo.txt, after a while it kept printing errors like this one until I pressed Ctrl+C:

mkdir: cannot create directory `../myData/data/a/m/e/d': No space left on device

But running df -h gives the following output:

Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda        48G   20G   28G  42% /
devtmpfs        2.0G  4.0K  2.0G   1% /dev
none            401M  164K  401M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            2.0G     0  2.0G   0% /run/shm

free -m

             total       used       free     shared    buffers     cached
Mem:          4002       3743        258          0       2870         13
-/+ buffers/cache:        859       3143
Swap:          255         26        229

EDIT: df -i

Filesystem      Inodes   IUsed  IFree IUse% Mounted on
/dev/xvda      2872640 1878464 994176   66% /
devtmpfs        512053    1388 510665    1% /dev
none            512347     775 511572    1% /run
none            512347       1 512346    1% /run/lock
none            512347       1 512346    1% /run/shm

df -T

Filesystem     Type     1K-blocks     Used Available Use% Mounted on
/dev/xvda      ext4      49315312 11447636  37350680  24% /
devtmpfs       devtmpfs   2048212        4   2048208   1% /dev
none           tmpfs       409880      164    409716   1% /run
none           tmpfs         5120        0      5120   0% /run/lock
none           tmpfs      2049388        0   2049388   0% /run/shm

EDIT: I increased the number of inodes and reduced the depth of my directories, and it seemed to work. It took 2 minutes 16 seconds this time round.
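
The numbers above already hint at the cause: df -i reports 994176 free inodes, while foo.txt lists about 4.5 million directories, and on ext4 every new directory consumes one inode. A minimal pre-flight check could compare the two before starting. This is only a sketch: check_inodes is a hypothetical helper name, and it assumes a df that accepts -P and -i together (GNU df does) and a filesystem that reports inode counts.

```shell
# Hypothetical helper: compare the number of listed paths with the
# free inodes on the filesystem that holds the target directory.
check_inodes() {
    list=$1
    target=${2:-.}
    # One line per directory, so the line count estimates the inodes needed.
    needed=$(wc -l < "$list")
    # In "df -Pi" output, line 2 field 4 is the IFree column.
    free_inodes=$(df -Pi -- "$target" | awk 'NR == 2 { print $4 }')
    if [ "$needed" -gt "$free_inodes" ]; then
        echo "about $needed inodes needed, only $free_inodes free on $target" >&2
        return 1
    fi
}
```

It could be called as check_inodes foo.txt . before running mkdir. The line count is only an estimate: directories that already exist need no new inode, and mkdir -p may create parents that are not listed.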

don_crissti
  • 82,805
  • Is this a virtual machine? Does the main node have enough space? – Sreeraj Dec 15 '14 at 12:22
  • This answer can help you with the inodes problem. You will need to nuke the current filesystem on that partition, but if that's a virtual filesystem running inside a regular file that shouldn't be a major issue. – PM 2Ring Dec 15 '14 at 12:24
  • Yes. You seem to have enough space in all the partitions, there are free inodes, but still if it says you don't have enough space, probably the hypervisor on which your VPS is located has run out of space. You might have to contact your VPS provider to check that. – Sreeraj Dec 15 '14 at 12:26
  • Is that output from df -i from before or after you try to run xargs -d '\n' mkdir -p < foo.txt ? – PM 2Ring Dec 15 '14 at 12:28
  • What FS type (df -T /)? – Stéphane Chazelas Dec 15 '14 at 12:30
  • @John, then I don't know. AFAIK, there's no limit on ext4 on the number of entries in a directory (or it's very high). It would make sense to ask a separate question for your "no space left" issue. Can you not create directories at all now, or only in specific directories? – Stéphane Chazelas Dec 15 '14 at 12:49
  • @StéphaneChazelas I ignored the problem, and just increased the size of the disk image so that there are more inodes. I also reduced the depth of the directory structure and it seems to work. So now I could run your command without problem :) – Kaizer Sozay Dec 15 '14 at 13:17

3 Answers

13

With GNU xargs:

xargs -d '\n' mkdir -p -- < foo.txt

xargs will run as few mkdir commands as possible.
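
To see that batching in action, here is a small sketch (not part of the original command) that replaces mkdir with a tiny sh script reporting how many arguments each invocation received. It assumes GNU xargs (for -d) and seq; the figure of 100000 lines is arbitrary.

```shell
# Feed 100000 lines to xargs. Each sh invocation prints the number of
# arguments it was given, so every output line corresponds to one batch.
batch_sizes=$(seq 1 100000 | xargs -d '\n' sh -c 'echo "$#"' sh)

# Count the batches, i.e. how many times xargs ran the command.
batches=$(printf '%s\n' "$batch_sizes" | wc -l)
echo "xargs ran the command $batches times for 100000 arguments"
```

The exact number of batches depends on the system's argument-size limits and on the length of the lines, but it should be far smaller than the number of lines.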

With standard syntax:

(export LC_ALL=C
 sed 's/[[:blank:]"\'\'']/\\&/g' < foo.txt | xargs mkdir -p --)

The inefficiency is that mkdir -p a/b/c will attempt mkdir("a"), and possibly stat("a") and chdir("a"), and then the same for "a/b", even if "a/b" already existed.

If your foo.txt has:

a
a/b
a/b/c

in that order, that is, if every path is preceded by a line for each of its parent components, then you can omit the -p and it will be significantly more efficient. Or alternatively:

perl -lne 'mkdir $_ or warn "$_: $!\n"' < foo.txt

This avoids invoking the mkdir command (possibly many times) altogether.
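
If the list does not already contain a line for every parent, it can be expanded and sorted first so that mkdir works without -p. The following is a sketch under stated assumptions, not a tested recipe for the 4.5 million line file: it runs in a scratch directory with a two-line sample standing in for foo.txt, expanded.txt is a made-up file name, and it relies on GNU xargs for -d. sort spills to temporary files, so the list does not need to fit in memory.

```shell
workdir=$(mktemp -d)
(
    cd "$workdir" || exit 1
    # Sample input standing in for the real foo.txt.
    printf '%s\n' data/bar/foo/chum data/bar/chum/foo > foo.txt

    # Print every ancestor of every path. Empty components are skipped,
    # and "." or ".." are not printed on their own as they already exist.
    LC_ALL=C awk -F/ '{
        p = $1
        if (p != "" && p != "." && p != "..") print p
        for (i = 2; i <= NF; i++) {
            if ($i == "") continue
            p = p "/" $i
            if ($i != "." && $i != "..") print p
        }
    }' foo.txt | LC_ALL=C sort -u > expanded.txt

    # In C-locale order a path sorts after each of its prefixes, so each
    # parent is created before its children and -p is not needed.
    xargs -d '\n' mkdir -- < expanded.txt
)
```

Whether the extra awk and sort passes pay off compared with plain mkdir -p would have to be measured on the real data.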

  • In your standard syntax, does it mean POSIX? – cuonglm Dec 15 '14 at 11:49
  • My file has millions of lines, will it really be able to pass them all as arguments ? – Kaizer Sozay Dec 15 '14 at 12:09
  • @John, xargs runs as many instances of the command as needed so as to avoid the limit on the maximum number of arguments. So it will probably invoke many mkdir commands, each one of them passed a few thousand directories to create. – Stéphane Chazelas Dec 15 '14 at 12:12
  • It repeats the error "mkdir: cannot create directory `../myData/data/a/m/e/d': No space left on device" many times for each file? Could there be a bug in your command? My file seems to have only unique entries. Or is this just how the error is displayed? – Kaizer Sozay Dec 15 '14 at 12:21
  • @KaizerSozay, you're running out of space or inodes, the errors are probably about creating a directory component leading to the files. – Stéphane Chazelas May 15 '19 at 08:22
1

I know we will get a lot of answers for this question, but you can still TRY this :) :D

while IFS= read -r line; do mkdir -p -- "$line"; done < foo.txt

Thushi
  • 9,498
  • That's running one mkdir per directory and is flawed because of that wrong usage of read and the split+glob operator. – Stéphane Chazelas Dec 15 '14 at 11:36
  • Yes it is. Because of dependency. To create the folder bar we should have data, and in the same way for the others. But I didn't find any flaws in read. Can you execute my command and check it once? I did and it's working for me. – Thushi Dec 15 '14 at 11:40
  • 2
    To read a line, it's IFS= read -r line, read line does extra processing. Leaving $line unquoted means invoking the split+glob operator. mkdir can take several arguments. – Stéphane Chazelas Dec 15 '14 at 11:41
  • Oh OK. Thank you. I will improve my answer. I just took the above example ;) – Thushi Dec 15 '14 at 11:44
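
The "extra processing" mentioned in the comments can be illustrated with a small sketch. The sample string is made up for the demonstration: it has leading and trailing blanks and a backslash, which plain read alters and IFS= read -r leaves alone.

```shell
sample='  a\b  '   # leading/trailing blanks and a backslash

# Plain read strips surrounding IFS whitespace and treats the
# backslash as an escape character, removing it.
read plain <<EOF
$sample
EOF

# With an empty IFS and -r, the line is stored exactly as written.
IFS= read -r exact <<EOF
$sample
EOF

printf 'read:          [%s]\n' "$plain"
printf 'IFS= read -r:  [%s]\n' "$exact"
```

For the purely alphanumeric paths described in the question the two forms behave the same, but IFS= read -r is the form that is safe for arbitrary lines.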
-1

Try this one-liner:

for i in $(cat foo.txt) ; do mkdir -p $i ; done

This will create the directory or directory tree in the current working directory. Not in bulk (as in all the directories getting created simultaneously); the creation is done one after another.

Sreeraj
  • 5,062
  • That's running one mkdir per directory and is flawed because of that wrong usage of the split+glob operator. That also means storing that whole huge list in memory. – Stéphane Chazelas Dec 15 '14 at 11:36
  • Cool. Going through the man page of xargs now after looking at your comment in the question. Always something new to learn every time I open SE :) – Sreeraj Dec 15 '14 at 11:40
  • What about $i? Unquoted?? :P – Thushi Dec 15 '14 at 11:45
  • But how is it holding a huge list in memory, since there is only one iteration variable? Wouldn't it hold only that one variable during each iteration? – Sreeraj Dec 15 '14 at 11:45
  • 1
    @Sree, expanding $(cat...) means reading the output of cat in memory, split+glob it and iterate over the resulting huge list. – Stéphane Chazelas Dec 15 '14 at 11:48
  • 2
    To be honest, since we don't know the entire list of directories, we cannot assume that this answer is correct. A single space in a name will cause the wrong directory structure to be built. – John WH Smith Dec 15 '14 at 11:56
  • there are no spaces. the directory paths are only alpha numeric characters, "../" and the path separator "/" – Kaizer Sozay Dec 15 '14 at 12:12
  • @KaizerSozay I know this is old but - the point is that the file could have spaces; and if you're saying that files can't have spaces in them you're wrong (so can directories but directories are a file in the end). They can also have newlines (etc.). – Pryftan Aug 10 '18 at 22:27
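
The split+glob problem raised in these comments can be shown with a short sketch. The two-line list is made up for the demonstration, with one entry containing a space.

```shell
list=$(mktemp)
printf '%s\n' 'my dir' 'plain' > "$list"   # 2 lines, one with a space

# Unquoted $(cat ...) is expanded in full, then split on whitespace
# (and subject to globbing), so "my dir" becomes two separate words.
words=0
for i in $(cat "$list"); do words=$((words + 1)); done

# Reading line by line keeps each path intact.
lines=0
while IFS= read -r line; do lines=$((lines + 1)); done < "$list"

echo "for loop saw $words items, while loop saw $lines"
rm -f "$list"
```

With the for loop, mkdir would be asked to create "my" and "dir" instead of "my dir". The while loop avoids that, although it still runs one mkdir per line if used for the actual creation.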