4

For example, I have many files that look like the output below. I'm trying to get a list of all the unique file names, disregarding the characters to the right of the "-". I have tried ls -la | grep ....- | sort --unique and some variations, but that does not give the output I need.

4855-00160880.psi
4855-00160980.ps
4855-00160980.psi
5355-00160880.ps
5355-00160880.psi
5355-00160980.ps
5355-00160980.psi
5855-00160880.ps
5855-00160880.psi
5855-00160980.ps
5855-00160980.psi
5855-00160A80.ps
5855-00160A80.psi

Ideally I would like the output to show something like

4855
5355
5855
don_crissti
sealfab

4 Answers

6

Since you really don't want to parse ls, this should do the trick:

find . -maxdepth 1 -type f -exec basename "{}" \; | cut -d'-' -f1 | sort -u
DopeGhoti
  • But then this solution is not immune from some of the issues associated with parsing the output from ls, is it? – iruvar Apr 28 '17 at 18:37
  • It could break if a filename contains a newline, yes. But that's just one of the many, many things that are wrong with ls, and probably acceptable in the OP's scenario. – tripleee Apr 28 '17 at 19:53
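Following up on that comment thread: if newline-proofing matters, the whole pipeline can carry NUL-delimited names end to end. This is a sketch assuming GNU find, sed, and sort (the -z options are GNU extensions); the final tr is only for display:

```shell
# NUL-delimited throughout, so a filename containing a newline cannot split a record
find . -maxdepth 1 -type f -printf '%f\0' \
  | sed -z 's/-.*//' \
  | sort -zu \
  | tr '\0' '\n'     # convert back to one prefix per line for display
```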
5

How's this?

printf "%-4.4s\n" ????-* | uniq

The shell expands the wildcard in alphabetical order and passes the result as arguments to printf. The format string truncates each argument to four characters and adds a newline. Now all that remains is to remove adjacent duplicates.
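To see it in action, here is a quick run against a few scratch files (the file names below are made up from the question's example):

```shell
cd "$(mktemp -d)"                  # scratch directory
touch 4855-00160880.psi 5355-00160880.ps 5855-00160A80.psi
printf "%-4.4s\n" ????-* | uniq    # prints 4855, 5355 and 5855
```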

If you don't know the number of digits before the hyphen, but you have an idea, you can loop over some candidates:

for expr in '??' '???' '????' '?????'  # Quoted (!)
do
    printf "%-${#expr}.${#expr}s\n" $expr-* |  # Unquoted!
    uniq
done

This uses the parameter expansion ${#var}, which obtains the string length of $var (it's actually POSIX, not Bash-only).

Notice the trickery of quoting the wildcards to avoid their expansion in the loop initialization, then using the variable unquoted inside the loop (which is a no-no in most other cases).
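A quick illustration of the ${#var} expansion and the format string it builds on each iteration:

```shell
expr='????'                   # quoted, so it stays a literal four-character string
echo "${#expr}"               # 4 — the string length of $expr
fmt="%-${#expr}.${#expr}s\n"
printf '%s\n' "$fmt"          # %-4.4s\n — the same format used in the first answer
```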

tripleee
2

Worth adding -type f to DopeGhoti's answer, to avoid that bogus . result.

find . -maxdepth 1 -exec basename "{}" \; | cut -d'-' -f1 | sort -u
.
4855
5355
5855
find . -maxdepth 1 -type f -exec basename "{}" \; | cut -d'-' -f1 | sort -u
4855
5355
5855
$

If you want to stay close to your original attempt, you could use this (bad practice, though, as it parses ls!):

ls -1 | grep '^....-' | cut -c1-4 | sort --unique

An awk-based solution, still parsing ls:

ls -1 | awk -F- '{print $1}' | sort --unique

There's no real need to sort in any of these cases: ls output is already sorted, so uniq suffices.

ls -1 | awk -F- '{print $1}' | uniq

A sed-based solution:

ls -1 | sed 's/-.*//' | uniq

A find/sed solution that avoids parsing ls:

find . -type f -printf "%f\n" | sed 's/-.*//g' | sort --unique

If there are always 4 digits before the "-", then this is quite elegant:

find . -type f -printf "%.4f\n" | sort -u
steve
  • find . -type f -iname '*.php' -exec grep "STRING-I-NEED-TO-FIND" {} + | sed 's/:.*//g' | sort --unique - thanks for the nudge in the unique direction! – WEBjuju Feb 15 '23 at 19:14
1

With zsh:

myfiles=(*-*(.))
print -rl -- ${(u)myfiles[@]%%-*}

This saves all regular file names that contain at least one dash in an array. It then uses parameter expansion on each element of the array to remove the first dash and everything that follows. Any duplicate elements are removed via the (u) flag.
To select hidden files too, use myfiles=(*-*(.D))
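For comparison, ${f%%-*} is the very same expansion in bash; a rough equivalent of the zsh one-liner (assuming bash 4+ for associative arrays, which stand in for zsh's (u) flag) could look like:

```shell
#!/usr/bin/env bash
declare -A seen
for f in *-*; do
    [ -f "$f" ] || continue         # regular files only, like zsh's (.) qualifier
    seen["${f%%-*}"]=1              # strip the first dash and everything after it
done
printf '%s\n' "${!seen[@]}" | sort  # key order is unspecified, hence the sort
```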

don_crissti