
I have > 35,000 scanned diapositives from a scientific archive in one directory. (Fortunately, the filenames are rather uniformly formatted: {year}-{place}-{film-#}-{photo-#} OR {year}-{year}-{place}-{film-#}-{photo-#}.)

I'd like to create directories via regex/awk named {year}-{place} OR {year}-{year}-{place}, and then I'd like to move the corresponding photos into these directories.

I'm aware of other posts that deal with very similar problems, but I can't make the "transfer" to my case.

There are several problems, but I think the biggest one is that I can't get the regexes to work in bash (although they work beautifully on regex101.com, and I do escape (, | and )). And even if I could get them to work, awk doesn't seem to support alternation groups in its regexes.

Any help is greatly appreciated =)

Non-working example to show problem with my regex:

#!/bin/bash

regex='([0-9]-[0-9]|[0-9])-[a-zA-Z]'
liste='1975-Bali.jpg'
echo $liste
[[$liste=~$regex]]

echo "${BASH_REMATCH[0]}"
echo "${BASH_REMATCH[1]}"

gives:

1975-Bali.jpg
./ALT_ordner_erstellen.sh: line 8: [[1975-Bali.jpg=~\([0-9]*-[0-9]*\|[0-9]*\)-[a-zA-Z]*]]: command not found
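
As the comments below point out, two things trip this test up: bash requires whitespace inside [[ ... ]] and around =~, and the =~ operator uses ERE syntax, so (, | and ) must not be escaped. A minimal corrected sketch of the same test (with + quantifiers added so each part matches at least one character):

#!/bin/bash

regex='([0-9]+-[0-9]+|[0-9]+)-[a-zA-Z]+'
liste='1975-Bali.jpg'
echo "$liste"
[[ $liste =~ $regex ]]

echo "${BASH_REMATCH[0]}"    # prints: 1975-Bali
echo "${BASH_REMATCH[1]}"    # prints: 1975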

EDIT:

As Paul_Pedant commented, here is my idea of the structure of the script, in pseudocode:

for every filename                # extract the prefix via awk or sed?
    mkdir based on {year}-{place} or {year}-{year}-{place}
    mv the corresponding file into that directory
end loop

mkdir has an option to not complain about a directory that already exists, correct? Because otherwise my above idea of doing it all in one loop would not be very clever: many photos are supposed to go into the same directory, and creating it anew every time would lose the photos already moved there, right?
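
It turns out mkdir -p is that option: it silently succeeds when the directory already exists (and creates missing parents), and it never touches the files already inside, so calling it once per photo is safe, if a little wasteful:

mkdir -p 1980-Bali    # first call: creates the directory
mkdir -p 1980-Bali    # second call: no error, existing contents untouched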

Examples of filenames:

1980-1981-Bali-055-21a.jpg  
1980-1981-Bali-055-21.jpg  
1980-1981-Bali-055a-21.jpg  
1980-Bali-055-21.jpg  
1980-Bali-055a-21.jpg

I also thought about not matching the filenames from the beginning (as there can definitely be year-year-place-film-photo as well as year-place-film-photo) but from the end instead; see this code snippet:

echo "1980-Bali-055a-028.jpg" | sed -e "s/-[0-9a-zA-Z]*-[0-9]*.jpg$//"

but I'm not 100 % sure that film-photo.jpg is always (!) the end of the filename; there are so many files, and I didn't scan the diapositives myself.
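
To make the idea concrete, here is a minimal bash sketch of the whole loop, under two assumptions that only hold if the filenames really are as uniform as the examples above: each name starts with a four-digit year (optionally followed by a second one), and the place name itself contains no hyphen. Anything that doesn't match is only reported, not moved:

#!/bin/bash
shopt -s nullglob

# {year}-{place} or {year}-{year}-{place}, where the place starts with a letter
regex='^([0-9]{4}(-[0-9]{4})?-[[:alpha:]][^-]*)-'

for f in *.jpg; do
    if [[ $f =~ $regex ]]; then
        dir=${BASH_REMATCH[1]}          # e.g. 1980-Bali or 1980-1981-Bali
        mkdir -p -- "$dir"
        mv -- "$f" "$dir"/
    else
        echo "pattern not recognised, skipping: $f" >&2
    fi
done

The sed idea from above would fit into the same skeleton (dir=$(printf '%s\n' "$f" | sed -e "s/-[0-9a-zA-Z]*-[0-9]*\.jpg$//")), but it stands or falls with the assumption that film-photo.jpg always closes the name.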

Thanks for your input!

Beres
  • I don't really understand how it helps you with your end goal of sorting, but the [[...]] extended test syntax has very specific whitespace requirements: [[ $liste =~ $regex ]] – steeldriver Aug 31 '20 at 10:25
  • See: https://www.shellcheck.net/ – JRFerguson Aug 31 '20 at 10:47
  • Thanks! Now it doesn't throw an error... but the desired output is not produced either. I wanted to build my script step by step, the first step being the bash-compatible regex. You're right, my end goal of sorting is not reached yet. I think the best way would be to loop over the filenames and create directories based on the regex matches. – Beres Aug 31 '20 at 11:18
  • Please clarify "even if ... awk ... groups". Bash regex is different to awk regex, and awk can combine any of its many operators. I understand step-by-step, but if your path is leading to awk you probably want to identify the end-game. Possibly, this might be: use awk to group the file names, and to generate a bash script which runs mkdir for all the required directories and then a sequence of mv commands to transfer files. More examples of variant file names would be helpful. – Paul_Pedant Aug 31 '20 at 12:11
  • AFAIK bash =~ follows the extended regex (ERE) dialect - so you'd want to remove the escaping of parentheses and OR operator: regex='([0-9]*-[0-9]*|[0-9]*)-[a-zA-Z]*'. See also Why does my regular expression work in X but not in Y? – steeldriver Aug 31 '20 at 12:26
  • i think it's possible without awk or bash, just export a little function for mkdir + mv and use find or even without function or find https://unix.stackexchange.com/q/607041 – alecxs Aug 31 '20 at 14:11
  • @alecxs The target is 35,000+ files, so I would hope to keep the process count down. In awk, I could group the files by target directory, use one mkdir per group, and mv -t with maybe 200 filenames per process iteration. Writing a script prior to actually running it also allows for user scrutiny before biting the bullet. – Paul_Pedant Aug 31 '20 at 15:34
  • Comment on the edit: If you mkdir on a directory that exists, it will throw an error, but will still retain the existing directory and the files in it. Tidier to test first: [[ -d "${myDir}" ]] || mkdir "${myDir}". Still a little wasteful, compared to grouping the files, making the directory once, and using the option for mv that accepts a list of filenames. But for a one-off, efficiency won't be significant. – Paul_Pedant Sep 06 '20 at 21:16
  • I would probably start with a script that recognised and counted matches with the patterns you expect, and listed any names it didn't know how to deal with. Refine that until it knows how to deal with every variation, then add the mkdir and mv operations. That is, use the script to investigate the data passively but more deeply, then teach it to carry out the corresponding actions. – Paul_Pedant Sep 06 '20 at 21:21

1 Answer


With zsh:

zmodload zsh/files
autoload -Uz zmv
mkmv() { mkdir -p -- "$2:h" && mv -- "$@"; }
zmv -n -P mkmv '(<1600-2020>(-<1600-2020>|)-[^0-9-][^-]#)*(#q.)' '$1/$f'

Remove -n (dry-run) when happy that it does what you want.

(zmodload zsh/files makes mkdir and mv shell builtins, which speeds things up: with thousands of files to rename, you would otherwise pay for as many separate invocations of the external mkdir and mv.)
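
For anyone not fluent in zsh globbing, my reading of the source pattern (the # and <x-y> operators are extendedglob features, which zmv enables for its patterns):

# (                       group 1: becomes $1, the target directory name
#   <1600-2020>           a year, i.e. any integer between 1600 and 2020
#   (-<1600-2020>|)       optionally a "-" and a second year
#   -[^0-9-][^-]#         "-" and the place: one non-digit, then any run of non-"-" characters
# )
# *                       the rest of the name (film number, photo number, extension)
# (#q.)                   glob qualifier restricting the match to regular files

The -P mkmv option makes zmv hand each source/destination pair to the mkmv wrapper instead of mv; the wrapper first creates the directory part of the destination ($2:h) and then performs the move.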

  • Thanks! I'd rather try to stay with bash, before trying another shell. But I might try it and let you know the results! – Beres Sep 03 '20 at 17:12