I listed all the PDFs under my $HOME directory:

$ find -E ~ -regex ".*/[^/].*.pdf"

It prints more than 1000 files.
I intend to sort them by size, and searching turned up stat:

$ stat -f '%z' draft.sh
184

I drafted this script:

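#! /usr/local/bin/bash

# Save IFS, then split on newlines only so paths containing spaces stay whole
OLD_IFS=$IFS
IFS=$'\n'

touch sorted_pdf.md

for file in $(find -E ~ -regex ".*/[^/].*.pdf")
do
    file_size=$(stat -c "%s" "$file")
    ....

done > sorted_pdf.md

# Restore the saved IFS
IFS=$OLD_IFS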

I'm struggling to combine these pieces to get the result I want. Could you please give me a hint?

I refactored the code:

#! /bin/zsh
OLD_IFS=$IFS
IFS=$'\n'

touch sorted_pdf.md

for file in $(find -E ~ -regex ".*/[^/].*.pdf")
do
    # file_size=$(stat -c "%s" $file)
    printf '%s\n' $file(DoL)

done > sorted_pdf.md

IFS=$OLD_IFS

but I get this error:

$ ./sort_files.sh

./sort_files.sh: line 12: syntax error near unexpected token `('
./sort_files.sh: line 12: `    printf '%s\n' $file(DoL)'
Jeff Schaller
Wizard
  • That looks like BSD find for -E and BSD stat for -f %z, but then the stat -c %s indicates GNU stat, what system is it? Do you have to use bash or can you use other shells like zsh? – Stéphane Chazelas Oct 26 '18 at 14:48
  • @StephenKitt, here there's an extra requirement that the file names end in .pdf. So it's different in that we need to sort files found by find (or other ways to find files by name). – Stéphane Chazelas Oct 26 '18 at 14:53
  • Thanks, it's BSD on macOS, but the linked question does not answer mine. @StéphaneChazelas – Wizard Oct 26 '18 at 14:55
  • no clue about BSD/macos, that is why I don't write an answer, but won't something like find ... -printf '%s %P\n' | sort -n work? – pLumo Oct 26 '18 at 15:07
  • syntax error near unexpected token '(' is a bash message, but in any case, $file(DoL) wouldn't make sense in zsh either. That's meant to be a glob qualifier to sort the glob expansion, so doesn't make sense when applied to a single file. – Stéphane Chazelas Oct 26 '18 at 15:08
  • @Stéphane the second answer on the duplicate I’d linked showed how to sort find’s output by size (using -printf on GNU find). – Stephen Kitt Oct 26 '18 at 15:09
  • @StephenKitt, printf is GNU specific. The OP is on macOS. – Stéphane Chazelas Oct 26 '18 at 15:09
  • @Stéphane which is also addressed alongside the aforementioned answer. (And when I closed the question, the macOS requirement wasn’t apparent — I agree as it stands currently, the question is better left open with your answer.) – Stephen Kitt Oct 26 '18 at 15:12
  • printf resides on BSD @StéphaneChazelas – Wizard Oct 26 '18 at 15:13
  • @riderdragon, sorry, I meant the -printf predicate of find is GNU-specific. The printf utility itself is standard. – Stéphane Chazelas Oct 26 '18 at 18:04

2 Answers

2

To sort by size, you can use zsh's glob qualifiers (zsh is installed by default on macOS, it even used to be sh there):

#! /bin/zsh -
printf '%s\n' **/*.pdf(DoL)
  • **/ is recursive globbing
  • (DoL) is a glob qualifier, D to include dot files (hidden files) as find would, oL to sort the generated list by file Length.

Note that -regex ".*/[^/].*.pdf" doesn't make much sense.

It matches /home/foo/pdf, for instance: .* matches /home, then / matches /, [^/] matches f, .* matches oo, the unescaped . matches the second /, and pdf matches pdf.
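A quick way to check this is to run both patterns through grep -E -x, which, like find -regex, has to match the whole string. This is only a sketch: grep's extended regexes stand in for find -E here, and the matches helper is a made-up name.

```shell
# matches PATH REGEX: print "yes" if REGEX matches the whole PATH, else "no".
# grep -x anchors at both ends, mimicking how find -regex matches full paths.
matches() {
    if printf '%s\n' "$1" | grep -Eqx -- "$2"; then
        echo yes
    else
        echo no
    fi
}

loose='.*/[^/].*.pdf'   # pattern from the question
strict='.*\.pdf'        # escaped dot: requires a literal ".pdf" suffix

matches /home/foo/pdf   "$loose"    # yes: the bare . matched the last /
matches /home/foo/pdf   "$strict"   # no
matches /home/foo/a.pdf "$strict"   # yes
```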

With -regex, with or without -E, you can use -regex '.*\.pdf' to match on *.pdf files, but you might as well use the standard -name '*.pdf'.

You could use:

find . -name '*.pdf' -exec stat -f '%z %N' {} + |
  sort -n |
  cut -d ' ' -f 2-

But that wouldn't work if there were file paths with newline characters.
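The stat -f '%z %N' format is the BSD spelling; GNU stat would want -c '%s %n' instead. Where the stat flavour can't be assumed, one workaround is to take the byte count from wc -c, which is standard but reads every file in full and so is much slower. A rough sketch follows; sort_pdfs_by_size is a made-up helper name, -type f is added so symlinks are skipped, and the newline caveat still applies.

```shell
# sort_pdfs_by_size DIR: list regular *.pdf files under DIR, smallest first.
# Swaps stat for wc -c so it does not depend on GNU vs BSD stat syntax.
sort_pdfs_by_size() {
    find "$1" -type f -name '*.pdf' -exec sh -c '
        for f do
            # $(( ... )) strips the padding some wc implementations print
            printf "%s %s\n" "$(( $(wc -c < "$f") ))" "$f"
        done' sh {} + |
      sort -n |
      cut -d " " -f 2-
}

# Possible use (not run here): sort_pdfs_by_size "$HOME" > sorted_pdf.md
```

As in the pipeline above, the size is prepended only so that sort has a key, and cut removes it again afterwards.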

With GNU utilities, you could do:

find . -name '*.pdf' -printf '%s %p\0' |
  sort -nz |
  cut -zd ' ' -f 2- |
  tr '\0' '\n'

Note that if any of those pdf files are symlinks, it's the size of the symlink that is considered, not the size of the target of the symlink. To sort on the size of that target, change DoL to D-oL or add the -L option to stat. And with GNU find:

find -L . \( ! -xtype l -o -prune \) -name '*.pdf' -printf '%s %p\0' |
  sort -nz |
  cut -zd ' ' -f 2- |
  tr '\0' '\n'
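To see the difference on a single file, the two helpers below (hypothetical names) print the size of a symlink itself and the size of its target. They try the GNU stat syntax first and fall back to the BSD one, on the assumption that one of the two is installed.

```shell
# Size of the path itself; for a symlink, that is the size of the link.
link_size() {
    stat -c '%s' "$1" 2>/dev/null || stat -f '%z' "$1"
}

# Size after following symlinks (-L is accepted by both GNU and BSD stat).
target_size() {
    stat -L -c '%s' "$1" 2>/dev/null || stat -L -f '%z' "$1"
}

# Demo in a scratch directory: a 100-byte file and a symlink pointing at it.
demo=$(mktemp -d)
printf '%100s' '' > "$demo/real.pdf"
ln -s real.pdf "$demo/link.pdf"

link_size   "$demo/link.pdf"    # small: the link only stores the path "real.pdf"
target_size "$demo/link.pdf"    # 100
```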

For case-insensitive matching, either replace pdf with [pP][dD][fF] or replace -name with -iname (not standard but supported by GNU and BSD find), or for zsh, enable the extendedglob option and change pdf to (#i)pdf or enable the nocaseglob option.
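A small sandbox check of the bracket-pattern variant, which needs nothing beyond standard find (count_pdfs is a made-up name):

```shell
# count_pdfs DIR: count regular files under DIR whose name ends in .pdf,
# in any mix of upper and lower case.
count_pdfs() {
    # $(( ... )) strips the padding some wc implementations print
    echo "$(( $(find "$1" -type f -name '*.[pP][dD][fF]' | wc -l) ))"
}

scratch=$(mktemp -d)
touch "$scratch/a.pdf" "$scratch/B.PDF" "$scratch/c.Pdf" "$scratch/notes.txt"

count_pdfs "$scratch"    # 3
```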

1

If you have access to GNU find and awk:

$ find $HOME -iname "*.pdf" -printf '%s\0%p\n' | sort -h -t '\0' | awk -F '\0' '{print $2}'

This command:

  • finds all files under $HOME with a (case-insensitive) .pdf extension and prints the size and path of each one, separated by a NUL character;
  • sorts the list by the first field using the -h option, which enables human-readable number comparison;
  • prints the sorted paths.
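Since %s prints plain byte counts with no K or M suffixes, an ordinary numeric sort is enough. Below is a variant of the same idea, still assuming GNU find for -printf, with a tab as the separator and an explicit sort key. The name pdfs_by_size_gnu is made up for illustration, and paths containing newlines would still break it.

```shell
# pdfs_by_size_gnu DIR: list *.pdf files (any case) under DIR, smallest first.
# Requires GNU find for -printf and -iname.
pdfs_by_size_gnu() {
    find "$1" -type f -iname '*.pdf' -printf '%s\t%p\n' |
      sort -n -k 1,1 |   # numeric sort on the size column only
      cut -f 2-          # drop the size; cut splits on tabs by default
}

# Possible use (not run here): pdfs_by_size_gnu "$HOME"
```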
fra-san