
This is my first post here. How do I group files in a directory that share the same first 3 letters followed by a sequence number? For example:

VHS-01-001.avi
VHS-01-002.avi
-------------
VHS-02-001.avi
VHS-02-002.avi
VHS-02-003.avi
---------
Hi8-01-001.avi
Hi8-01-002.avi
Hi8-01-003.avi

so that I can pass each group of video files to a function like the following:

encode() {
  for avi in "$@"; do
    # ...
  done
}
Tony Tan

2 Answers


Maybe not a very smart solution:

  • sort files by name
  • loop through names
  • compare the first 3 characters with those from the previous iteration:

    last=""
    ls -1 "$1" | sort | while IFS= read -r file; do
        sub=${file:0:3}
        [ "$last" != "$sub" ] && { echo "NEW GROUP"; last="$sub"; }
        echo "[$sub] $file"
    done
    

Instead of echoing, collect the file names in an array ...

Just an idea ... example output:

NEW GROUP
[Hi8] Hi8-01-002.avi
NEW GROUP
[VHS] VHS-01-001.avi
[VHS] VHS-01-002.avi
[VHS] VHS-02-002.avi
NEW GROUP
[XZU] XZU

Edit 1: based on Anthony Geoghegan's comment, avoid the pipes at the beginning of the loop and use bash globbing instead. Take a look at his comment.

Improved script:

last=""
for file in *avi; do
    sub=${file:0:3}
    [ "$last" != "$sub" ] && { echo "NEW GROUP"; last="$sub"; }
    echo "[$sub] $file"
done

Edit 2:

As asked by @Tony Tan in his third comment: here is a straightforward way to pass the collected file names to a function. There are many ways to do so, and I don't have much experience in bash scripting ... ;)

#!/bin/bash

SOURCE_DIR="$1"
cd "$SOURCE_DIR" || { echo "could not read dir '$SOURCE_DIR'"; exit 1; }

function parseFiles() {
  echo "parsing files:"
  echo "$1"
}

last=""
declare -a fileGroup

for file in *avi; do
  # first 3 chars of filename
  sub=${file:0:3}

  if test -z "$last"; then
    # last is empty. first loop
    last="$sub"
  elif test "$last" != "$sub"; then
    # new file group detected, parse collected
    parseFiles "${fileGroup[*]}"
    # reset array
    fileGroup=()
    last="$sub"
  fi

  # append name to array
  fileGroup+=("$file")
done

parseFiles "${fileGroup[*]}"
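
If the function should instead receive every file as its own argument (as in the for avi in "$@" loop from the question), one possible variant expands the array with "${fileGroup[@]}" rather than "${fileGroup[*]}". This is only a sketch, assuming bash: the helper name group_by_prefix and the body of encode are placeholders and not part of the script above.

```bash
#!/bin/bash

# Placeholder encoder: receives one group, one file name per argument.
encode() {
  local avi
  echo "group of $# file(s):"
  for avi in "$@"; do
    echo "  $avi"
  done
}

# Hypothetical helper: calls encode once per run of names that share the
# same first 3 characters. Expects sorted input (a glob expansion is sorted).
group_by_prefix() {
  local last="" sub file
  local -a fileGroup=()
  for file in "$@"; do
    sub=${file:0:3}
    if [ -n "$last" ] && [ "$last" != "$sub" ]; then
      # prefix changed: hand over the collected group, then start a new one
      encode "${fileGroup[@]}"
      fileGroup=()
    fi
    last=$sub
    fileGroup+=("$file")
  done
  # flush the final group, if there is one
  if [ "${#fileGroup[@]}" -gt 0 ]; then
    encode "${fileGroup[@]}"
  fi
}
```

It could be called as group_by_prefix *avi from inside the video directory. Because each name stays a separate word, file names containing spaces survive intact.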
ChristophS
  • Welcome to Unix & Linux Stack Exchange. That's a great first post. You can make it even better if you edit it to point out that it applies to the Bash shell. It's also a bad idea to parse the output of ls: you can actually simplify your first line by replacing it with for file in *avi; do. – Anthony Geoghegan May 09 '17 at 08:53
  • Thanks. ;) Maybe we edited in parallel? I've tried to improve the formatting. Your idea is fine, but the loop will only do its work if the file names are sorted. How do you sort for file in *avi; ...? On the other hand, you use 2 fewer processes ;) – ChristophS May 09 '17 at 09:02
  • When expanding a globbing pattern, POSIX-compliant shells (such as Bash) automatically list the files in alphabetical order (https://superuser.com/a/1117638/247588) ... so you don't actually need the sort command. I tried your code with the modified first line and it works fine. – Anthony Geoghegan May 09 '17 at 09:15
  • @AnthonyGeoghegan thanks for the globbing pattern hint. I did not know that. – Tony Tan May 09 '17 at 17:13
  • @ChristophS thanks, the modified code works – Tony Tan May 09 '17 at 17:16
  • @ChristophS I am still new to shell scripting. What should I do if I need to process each group of files as a whole? Something like Stephane's code below. I have a working script which encodes a list of video files in a certain format depending on their total length and authors a DVD after encoding. I hope I made myself clear. – Tony Tan May 09 '17 at 19:08
  • @Tony: use the function "parseFiles". The param $1 contains the file group. Rename the function to "encode" and it's the same as Stephane's code - as far as I can see ;) – ChristophS Jun 15 '17 at 13:14

With zsh:

files=(???-??-*.avi)
for prefix (${(Mu)files#???-??-}) encode $prefix*.avi

(or encode ${(M)files:#$prefix*})

The equivalent with the GNU shell (bash) and tools would be:

while IFS= read -u3 -rd '' prefix; do
  encode "$prefix"*.avi 3<&-
done 3< <(printf '%s\0' ???-??-*.avi | grep -oz '^...-..-' | sort -zu)

Same principle. We get the list of files matching the ???-??-*.avi pattern in the current directory, extract the part that matches ???-??- (regexp ...-..-) with (M) or grep -o, remove duplicates with (u) or sort -u, and then loop over that list of prefixes.
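
For a shell without process substitution, the same prefix idea can be sketched with parameter expansion alone. This is not the code above, only a rough equivalent under stated assumptions: encode_groups is a hypothetical helper name, encode is a placeholder, and the approach relies on glob results being sorted so that files with the same prefix are adjacent.

```sh
# Placeholder encoder: prints the group it was given on one line.
encode() {
  printf 'encode:'
  printf ' %s' "$@"
  printf '\n'
}

# Hypothetical helper, meant to be called as: encode_groups ???-??-*.avi
encode_groups() {
  last=
  for f in "$@"; do
    rest=${f#???-??-}      # name with the leading ???-??- part removed
    prefix=${f%"$rest"}    # the part that was removed, e.g. VHS-01-
    if [ "$prefix" != "$last" ]; then
      # first file of a new prefix: pass the whole group to encode
      encode "$prefix"*.avi
      last=$prefix
    fi
  done
}
```

Since only the glob and the ${var#pattern} / ${var%pattern} expansions are used, this should also run when the script is started with sh.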

  • Can you elaborate a bit more on your shell script? I am getting the following error: syntax error near unexpected token `<' – Tony Tan May 09 '17 at 18:50
  • @TonyTan, that needs to be interpreted by bash, and not by bash invoked as sh. When invoked as sh, bash doesn't recognise the <(...) process substitution syntax. – Stéphane Chazelas May 09 '17 at 19:10
  • @Chazelas Is there any reason why it works fine on my Arch Linux system but not in the Ubuntu subsystem under Windows 10? On Ubuntu it tries to use the prefixes themselves (e.g. Hi8-01-, VHS-01-, etc.) as file names. The rest of the code is working fine. – Tony Tan May 09 '17 at 21:36
  • OK. This works in the Ubuntu Bash subsystem under Windows 10: printf '%s\0' ???-??-*.avi | grep -oz '^...-..-' | sort -zu | while read a; do echo "$a*.avi"; done – Tony Tan May 10 '17 at 00:04