23

For the purpose of testing, I'd like to count how many image files are inside a directory, separating each image file type by file extension (e.g. setting jpg="yes"; this will be useful later for another script that will execute an action on each file extension). Can I use something like the following for JPEG files only?

jpg=""
count=`ls -1 *.jpg 2>/dev/null | wc -l`
if [ $count != 0 ]
then
echo jpg files found: $count ; jpg="yes"
fi

Considering the file extensions jpg, png, bmp, raw and others, should I use a while loop to do this?


9 Answers

32

My approach would be:

  1. List all files in the directory
  2. Extract their extension
  3. Sort the result
  4. Count the occurrences of each extension

Sort of like this (last awk call is purely for formatting):

ls -q -U | awk -F . '{print $NF}' | sort | uniq -c | awk '{print $2,$1}'

(assuming GNU ls here for the -U option to skip sorting as an optimisation. It can be safely removed without affecting functionality if not supported).
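One caveat not covered in the answer: a filename with no dot at all makes $NF the whole name, so it would be counted as if it were an extension. A small guard in the awk step (a tweak not in the original answer) skips such files:

ls -q -U | awk -F . 'NF > 1 {print $NF}' | sort | uniq -c | awk '{print $2,$1}'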

groxxda
  • 1,028
  • hmm... later, should I filter each extension found in order to perform an action on it? – watchmansky Jul 26 '14 at 19:24
  • It depends on what you want to do in the end. Can you give more information? – groxxda Jul 26 '14 at 19:25
  • My goal: a script that processes the files of each extension (image files only), changing their size based on user input. So I start with how many jpg files there are, then png, etc. – watchmansky Jul 26 '14 at 19:27
  • steeldriver's solution may be more appropriate then. – groxxda Jul 26 '14 at 19:30
  • 2
    I had both JPG and jpg files, and wanted it recursively so my solution was to write find . -type f | awk -F . '{print tolower($NF)}' | sort | uniq -c | awk '{print $2,":",$1}' – Kristian May 24 '17 at 12:40
  • Replace ls with ls -U for high performance. (ls sorts output by default. By turning off sorting with option -U you'd make execution much faster.) – Andriy Makukha Jan 10 '19 at 16:40
20

This recursively traverses files and counts extensions that match:

$ find . -type f | sed -e 's/.*\.//' | sort | uniq -c | sort -n | grep -Ei '(tiff|bmp|jpeg|jpg|png|gif)$'
   6 tiff
   7 bmp
  26 jpeg
  38 gif
  51 jpg
  54 png
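
A small tweak, not part of the original answer: since only the final grep is case-insensitive, JPG and jpg still end up counted on separate lines; folding case before sorting merges them:

find . -type f | sed -e 's/.*\.//' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -n | grep -E '(tiff|bmp|jpeg|jpg|png|gif)$'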
Kit
  • 389
  • 1
    This is exactly what I was looking for, thanks a lot! Needed something that could quickly run on the CLI. – Carlos F Dec 10 '20 at 14:57
14

I'd suggest a different approach, avoiding the possible word-splitting issues of ls

#!/bin/bash

shopt -s nullglob

for ext in jpg png gif; do 
  files=( *."$ext" )
  printf 'number of %s files: %d\n' "$ext" "${#files[@]}"

  # now we can loop over all the files having the current extension
  for f in "${files[@]}"; do
    # anything else you like with these files
    :
  done 

done

You can loop over the files array with any other commands you want to perform on the files of each particular extension.
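
As an illustration of that inner loop (a sketch, not part of the answer: it assumes ImageMagick's convert is installed and that $width and $height hold the user-supplied size):

  for f in "${files[@]}"; do
    convert "$f" -resize "${width}x${height}" "resized_$f"
  done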


More portably - or for shells that don't provide arrays explicitly - you could re-use the shell's positional parameter array i.e.

set -- *."$ext"

and then replace ${#files[@]} and ${files[@]} with $# and "$@"
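
Put together, a minimal POSIX sh sketch of that variant (an addition, not part of the answer; note that without nullglob an unmatched pattern stays literal, so the sketch checks for that):

#!/bin/sh

for ext in jpg png gif; do
  set -- *."$ext"
  # an unmatched glob is left as the literal pattern, so discard it
  [ -e "$1" ] || set --
  printf 'number of %s files: %d\n' "$ext" "$#"

  for f in "$@"; do
    : # process "$f" here
  done
done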

steeldriver
  • 81,074
9
find -type f | sed -e 's/.*\.//' | sort | uniq -c
Neik
  • 99
  • 3
    Don't forget a starting directory with find. Also, it can help future readers of these answers if you give a brief explanation of your solution (in case they would like to modify it for a slightly different case). – Jeff Schaller Oct 22 '15 at 16:04
  • How well does this solution deal with path names containing spaces? Newlines? – dhag Oct 22 '15 at 17:37
  • 1
    find defaults to the current directory, which is how I use this. I don't think God intended filenames to have spaces in them, but this works fine for that case. If you have newlines, then you deserve all you get. I thought about an explanation but decided it would make the answer too long, I think simplicity is what matters. 99% of the cases in 1% of the time. This is probably Version 7 compatible. – Neik Oct 22 '15 at 21:22
3

Anything involving ls is likely to produce unexpected results with special characters (spaces and other symbols). Any bashism (like arrays) isn't portable. Anything involving while read is usually slow.

On the other hand, find is VERY flexible (lots of filtering options), it has at least two syntaxes that are safe with special characters... and it scales well on large directories.

For this example, I have used -iname to match both upper- and lower-case extensions. I have also restricted the search with -maxdepth 1 to respect your question's "in current directory". Rather than counting lines, which would break for filenames containing CR/LF, -print0 prints a NULL byte at the end of each filename... so | tr -d -c "\000" | wc -c accurately counts the files (one NULL byte per file).

extensions="jpg png gif"
for ext in $extensions; do
  c=$(find . -maxdepth 1 -iname "*.$ext" -print0 | tr -d -c "\000" | wc -c)
  if [ $c -gt 0 ]; then
    echo "Found $c  *.$ext files"

    find . -maxdepth 1 -iname "*.$ext" -print0 | xargs -0 -r -n1 DOSOMETHINGHERE
    # or #  find . -maxdepth 1 -iname "*.$ext" -exec "ls" "-l" "{}" ";"
  fi
done

P.S. -print0 | tr -d -c "\000" | wc -c can be replaced with -printf "\000" | wc -c or even -printf '\n' | wc -l (both -printf forms are GNU find extensions).
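
For instance, with GNU find the counting line in the loop could become (a sketch with the same caveat):

  c=$(find . -maxdepth 1 -iname "*.$ext" -printf '\n' | wc -l)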

2

Maybe it can get shorter

shopt -s nullglob; exts=( *.jpg *.png *.gif ); printf 'There are %d image files\n' "${#exts[@]}"
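
Building on the same idea, a per-extension breakdown could look like this (a sketch not in the original answer, assuming bash 4+ for associative arrays and ${var,,} lowercasing):

shopt -s nullglob nocaseglob
declare -A count
for f in *.jpg *.png *.gif *.bmp *.raw; do
  ext=${f##*.}
  count[${ext,,}]=$(( ${count[${ext,,}]:-0} + 1 ))
done
for ext in "${!count[@]}"; do
  echo "$ext: ${count[$ext]}"
done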
1

You can just use ls for something this simple, IMO:

ls -l /opt/ssl/certs/*.pem | wc -l

or

count=$(ls -l /some/folder/*.jpg | wc -l)

or

ls *.{mp3,exe,mp4} 2>/dev/null | wc -l
Mike Q
  • 159
0

Usually this type of task is best solved by breaking it up into chunks (the Unix philosophy). Find the files, strip out all but their extensions, sort alphabetically (to break ties) then by number of occurrences:

find . -type f | egrep -o '\.[^/.]+$' | sort | uniq -c | sort -n

You might like additional flourishes. Here I removed files whose name is only an extension (like .gitignore), combined results across case (so gif and GIF are both counted under gif), and stripped the initial dot:

find . -type f | egrep -v '/\.[^/.]*$' | egrep -o '\.[^/.]+$' | tr 'A-Z' 'a-z' | sed -e 's/^\.//' | sort | uniq -c | sort -n

You might instead choose to limit the count to certain image types:

find . -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' -o -iname '*.bmp' -o -iname '*.raw' -o -iname '*.gif' \) | egrep -o '\.[^.]+$' | sort | uniq -c | sort -n

Hopefully these are both usable by themselves and show how to combine the various utilities into a nice result.

Charles
  • 279
-1

If you are sure of the extension, you can go with find, like:

find *.jpeg | wc -l
Nithish
  • 11