Sort files into multiple directories based on filename?

Question

I have 1000s of files in a single directory that I want to sort into subdirectories based on their filenames. They're all consistently named with a set structure of p-[number]_n-[number]_a-[number].[ext].

Here's a small sample...

p-12345_n-987_a-1254.jpg
p-12345_n-987_a-9856.pdf
p-12345_n-987_a-926.docx
p-12345_n-384_a-583.pdf
p-12345_n-384_a-987.pdf
p-2089_n-2983_a-2348.gif
p-2089_n-1982_a-403.jpeg
p-38422_n-2311_a-126.pdf
p-38422_n-2311_a-5231.docx

What I'm after is a folder structure like this:

p-12345
  ⊢ n-987
    ⊢ p-12345_n-987_a-1254.jpg
    ⊢ p-12345_n-987_a-9856.pdf
    ⊢ p-12345_n-987_a-926.docx
  ⊢ n-384
    ⊢ p-12345_n-384_a-583.pdf
    ⊢ p-12345_n-384_a-987.pdf
p-2089
  ⊢ n-2983
    ⊢ p-2089_n-2983_a-2348.gif
  ⊢ n-1982
    ⊢ p-2089_n-1982_a-403.jpeg
p-38422
  ⊢ n-2311
    ⊢ p-38422_n-2311_a-126.pdf
    ⊢ p-38422_n-2311_a-5231.docx

I hope that makes sense.

Is it possible to write a script to organise the files in this way?

EDIT: To clarify: Yes, my question should be how can I write a script to organise the files? :) I'm very new to Unix and the command line in general. So far I've only written/used basic shell scripts. I have a hunch that the answer will probably involve regular expressions but beyond that I'm not really sure where to start.

The best idea I've come up with is to

Export the file list to a text file
Find and replace "_n" and "_a" with "/n" and "/a"
Create a series of mv commands from that
Save it as a shell script

I'm sure that's far more long-winded than it needs to be though. I'd also like to have something repeatable in case I need to do it for more files in future.
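For what it's worth, the find-and-replace idea in the list above can be sketched directly in the shell, without editing a text file by hand. This is only a sketch: gen_moves is a made-up name, and it assumes the filenames contain no double quotes, dollar signs or backticks, since the names end up inside a generated script. It reads names on standard input and only prints commands; nothing is moved.

```shell
#!/bin/sh
# gen_moves (hypothetical helper): read one filename per line on stdin and
# print a "mkdir -p ... && mv ..." command line for each of them.
gen_moves() {
    while IFS= read -r name; do
        # Replace "_n-" with "/n-" and drop everything from "_a-" onwards,
        # e.g. p-12345_n-987_a-1254.jpg -> p-12345/n-987
        dir=$(printf '%s\n' "$name" | sed -e 's|_n-|/n-|' -e 's|_a-.*||')
        printf 'mkdir -p "%s" && mv "%s" "%s/"\n' "$dir" "$name" "$dir"
    done
}

# Example with two names from the sample; the commands are printed, not run.
printf '%s\n' p-12345_n-987_a-1254.jpg p-2089_n-1982_a-403.jpeg | gen_moves
```

Redirecting the output into a file gives a script that can be reviewed before running it with sh, which keeps the process repeatable.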

The correct answer to your question will be: "Yes, it is possible to write a script to organize files the way you want" but that's not what you're after ;-). Can you give some more details on what you tried, in which language and on which part you need special assistance? — Lambert, Oct 29 '19 at 08:38
It is possible to split the lines on the underscore character, use the first elements as path to store the files and take the whole entry as filename. — Lambert, Oct 29 '19 at 08:40

score 4 · Answer 1 · answered Oct 29 '19 at 09:10

Sure:

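#!/bin/bash
for i in p-*_n-*.*; do
        Ppart=${i/_n-*}            # e.g. p-12345
        x=${i/${Ppart}_/}          # e.g. n-987_a-1254.jpg
        nPart=${x/_a-*}            # e.g. n-987
        # Quote the expansions so unusual filenames are not split by the shell
        mkdir -p "$Ppart/$nPart"
        mv "$i" "$Ppart/$nPart"
done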

First loop over all the filenames matching the pattern you gave. In each iteration, use shell substitution to remove the last part of the filename starting from the _n- part, which gives the P part (the first level directory). Now we need the N part, starting from n- up to the _a- part. I do this in two steps: first remove the Ppart, then the last part starting from the _a- part.

Now use mkdir -p to create the directories necessary. mkdir -p doesn't give an error if the path already exists, so it's easier to just execute mkdir -p instead of testing whether the directory exists or not before deciding to execute the command.

Finally mv the file into the correct directory.
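To see what each step produces, here is a small sketch that applies the same three steps to a single name. Note that it swaps the ${var/pattern} substitutions for the POSIX prefix/suffix removal forms (${var%%pattern} and ${var#pattern}), which give the same result for names of this shape; split_name is a made-up helper name.

```shell
#!/bin/sh
# split_name (hypothetical helper): print the target directory for one filename.
split_name() {
    i=$1
    Ppart=${i%%_n-*}      # drop the longest suffix starting at "_n-": p-12345
    x=${i#"${Ppart}_"}    # drop the leading "p-12345_":               n-987_a-1254.jpg
    nPart=${x%%_a-*}      # drop the longest suffix starting at "_a-": n-987
    printf '%s/%s\n' "$Ppart" "$nPart"
}

split_name p-12345_n-987_a-1254.jpg   # prints p-12345/n-987
```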

Thanks very much for your answer. I ended up using @AdminBee's but it's good to see other solutions. — itsViney, Oct 29 '19 at 11:09

AdminBee · Accepted Answer · 2019-10-29T11:57:13.053

As already noted, the short answer is "yes".

The long answer is: You can do it with a bash script that uses awk to extract the filename elements you want to base your directory structure on. It could look something like this (where more emphasis is placed on readability than "one-liner" compactness).

#!/bin/bash

# Note: the three-argument form of match() used below is a GNU awk (gawk) extension.
for FILE in p-*
do
    if [[ ! -f $FILE ]]; then continue; fi

    LVL1="$(awk '{match($1,"^p-([[:digit:]]+)_[[:print:]]*",fields); print fields[1]}' <<< "$FILE")"
    LVL2="$(awk '{match($1,"^p-([[:digit:]]+)_n-([[:digit:]]+)_[[:print:]]*",fields); print fields[2]}' <<< "$FILE")"

    echo "move $FILE to p-$LVL1/n-$LVL2"
    if [[ ! -d "p-$LVL1" ]]
    then
        mkdir "p-$LVL1"
    fi

    if [[ ! -d "p-$LVL1/n-$LVL2" ]]
    then
        mkdir "p-$LVL1/n-$LVL2"
    fi

    mv -- "$FILE" "p-$LVL1/n-$LVL2"
done

To explain:

We perform a loop over all files starting with "p-" in the current directory.
The first instruction in the loop ensures that the entry is a regular file. It also covers the case where nothing matches: the unexpanded pattern p-* is then handed to the loop as a literal string, and the test skips it. (The reason why this is necessary is that on this forum, you will always be told not to parse the output of ls, so something like FILES=$(ls p-*); for FILE in $FILES; do ... would be considered a no-go.)
Then, we extract the numerals between p- and _n needed to generate the first level of your directory structure using awk (as you suspected, with regular expressions), the same for the numerals between n- and _a for the second level. The idea is to use the match function which not only looks for the place where the specified regular expression occurs in your input, but also gives you the "completed" value of all elements enclosed in round brackets ( ... ) in the array "fields".
Third, we check if the directories for the first and second level of your intended directory structure already exist. If not, we create them.
Last, we move the file to the target directory.
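If gawk is not available, the capture-group idea described above can be tried out with sed instead. The sketch below is not the script from this answer: it uses sed's \( ... \) groups and the \1 and \2 back-references in place of the "fields" array, and extract_levels is a made-up helper name.

```shell
#!/bin/sh
# extract_levels (hypothetical helper): print the two captured numbers,
# separated by a space, or nothing if the name does not fit the pattern.
extract_levels() {
    printf '%s\n' "$1" |
        sed -n 's/^p-\([0-9][0-9]*\)_n-\([0-9][0-9]*\)_.*$/\1 \2/p'
}

extract_levels p-12345_n-384_a-583.pdf   # prints: 12345 384
extract_levels notes.txt                 # prints nothing
```

The first group plays the role of fields[1] and the second that of fields[2]; a name that does not match produces no output, which makes it easy to skip unrelated files.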

For more information, have a look at the Advanced bash scripting guide and the GNU Awk Users Guide.

Once you are more comfortable with scripting and regular expressions, you can make this much more compact; in the above script, for example, the generation of the directory/subdirectory path could easily be contracted to just one awk call.

For one, since the directory names are actually p-<number> and n-<number>, the same as in your filename, we could have let awk do the work to extract these characters for us, too, by writing match($1,"(^p-[[:digit:]]+)_(n-[[:digit:]]+)_[[:print:]]*",fields)
We can further offload work to awk by having it generate the directory-subdirectory path at the same time with a suitable argument of print:

awk '{match($1,"(^p-[[:digit:]]+)_(n-[[:digit:]]+)_[[:print:]]*",fields); print fields[1] "/" fields[2]}'

would readily yield (e.g.) p-12345/n-384 for file p-12345_n-384_a-583.pdf. If we combine that with the usage of mkdir -p as indicated by @wurtel, the script could look like

for FILE in p-*
do
    if [[ ! -f $FILE ]]; then continue; fi

    TARGET="$(awk '{match($1,"(^p-[[:digit:]]+)_(n-[[:digit:]]+)_[[:print:]]*",fields); print fields[1] "/" fields[2]}' <<< "$FILE")"
    echo "move $FILE to $TARGET"

    mkdir -p "$TARGET"
    mv -- "$FILE" "$TARGET"
done

Thanks so much for your answer. It worked perfectly! Thanks for the detailed explanation too - not only does the script work but I actually understand it (sort of). — itsViney, Oct 29 '19 at 11:06
You're welcome. I will also expand the answer a little so as to add more explanation. — AdminBee, Oct 29 '19 at 11:46

score 2 · Answer 3 · answered Oct 29 '19 at 09:33

And another version in Python (3):

import os

sourcepath='/path/to/source'
destination='/path/to/destination'

(_,_,fnames) = next(os.walk(sourcepath))
for f in fnames:
    subpath = '/'.join(f.split('_')[:-1])
    print("Moving {} to {}".format(os.path.join(sourcepath, f), os.path.join(destination, subpath , f)))
    os.makedirs(os.path.join(destination, subpath), exist_ok=True)
    os.rename(os.path.join(sourcepath, f), os.path.join(destination, subpath , f))

Thanks for your answer. I ended up using @AdminBee's solution but it's interesting to see other approaches. — itsViney, Oct 29 '19 at 11:07

score 1 · Answer 4 · 2019-10-30T10:10:17.190

This seems simplest, and gets the job done (I tested it). All we're doing is using plain old sed to transform the current name: put a "/" in place of the "_" before n-, and delete everything from _a- onwards. What is left is the directory name.

for i in p*
do
    [ -f "$i" ] || continue    # skip directories created by an earlier run
    d=$(echo "$i" | sed -e 's|_n-|/n-|' -e 's|_a-.*||')
    mkdir -p "$d"
    mv -i "$i" "$d"
done

edited Oct 30 '19 at 10:10

answered Oct 30 '19 at 09:06

Oh! Thanks for catching that; I seem to have misread the question (despite the OP providing test output). I have amended my response, which -- thanks to you -- is now even shorter :-) – Oct 30 '19 at 10:08
You're welcome. I will then delete my comment. – AdminBee Oct 30 '19 at 10:35

score 0 · Answer 5 · answered Oct 31 '19 at 06:24

How about a nice one-liner:

ls | awk -F"_" 'NF>=3 {d=$1 "/" $2; system("mkdir -p \"" d "\" && mv \"" $0 "\" \"" d "/\"")}'

Separate the filename sections on _, create the required directories, and then move the unaltered filename into the newly created directory.
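The same split-on-underscore idea also works without piping ls into awk. The following is a sketch under the assumption that every file follows the p-..._n-..._a-... pattern; dir_for is a made-up helper name, and the loop uses a glob so that the output of ls is never parsed.

```shell
#!/bin/sh
# dir_for (hypothetical helper): split a name on "_" and print "<field1>/<field2>".
dir_for() {
    # IFS=_ applies to this read only; p and n receive the first two fields.
    IFS=_ read -r p n rest <<EOF
$1
EOF
    printf '%s/%s\n' "$p" "$n"
}

for f in p-*_n-*_a-*; do
    [ -f "$f" ] || continue     # skip an unmatched glob and anything that is not a file
    d=$(dir_for "$f")
    mkdir -p -- "$d" && mv -- "$f" "$d/"
done
```

Because mv runs only if mkdir -p succeeded, a file is never moved to a directory that could not be created.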
