Get numbers matching a pattern from the output of ls?

Question

I have a folder, and when I execute ls in it, it outputs:

t-1-myFirstTest.c
myFile.c
t-42-my_second_test.c
t-3-test1234.c
  .
  .
  .
mySecondFile.c
t-21-tset241.c

I want to delete everything of this text except the newlines and the numbers between t- and the second -. So for the listing above, the output should be:

I have a solution, but I think it is really bad. With that folder as the current directory, I used:

ls | grep -o -E 't-[0-9]+-[a-zA-Z0-9_]+\.c' | grep -o -E 't-[0-9]+' | grep -o -E '[0-9]+'

Any better way to accomplish the same thing?

score 3 · Accepted Answer · edited Jul 17 '18 at 12:27

Parsing the output of ls is a bad idea (the output of ls is strictly for looking at). For more info on that, see the question "Why *not* parse `ls`?".
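As a quick illustration of the problem, here is a sketch assuming a POSIX shell and a system that provides mktemp -d (the file name is invented for the demo). A newline is a legal character in a filename, so anything that reads ls output line by line sees one such file as two entries, while a glob hands each name to the loop intact:

```sh
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Hypothetical example file whose name contains a newline (legal on Unix).
touch "$(printf 't-5-two\nlines.c')"

# When piped, ls typically prints the name raw, so line-based tools
# see two "lines" for this single file.
ls | wc -l

# The glob yields exactly one word per file, newline included.
count=0
for f in t-*-*.c; do
    count=$((count + 1))
done
echo "$count"   # 1

cd / && rm -rf "$dir"
```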

This is how you may do it under /bin/sh:

for filename in t-*-*.c; do
    [ ! -f "$filename" ] && continue
    number=${filename#t-}   # remove "t-" from start of filename
    number=${number%%-*}    # remove everything from first "-" in what remains
    printf '%s\n' "$number"
done

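This would iterate over all filenames in the current directory whose name matches the pattern t-*-*.c. The [ ! -f "$filename" ] test skips anything that is not a regular file, including the pattern itself, which the shell leaves unexpanded when no filename matches it. For each remaining name, the t- bit is stripped off from the start, and then the second - and everything after it is removed with another parameter expansion.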

The expansion ${variable#word} would remove the (shortest) match for word from the start of $variable, while ${variable%%word} would remove the (longest) match for word from the end of the string.
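To make the two expansions concrete, here is a small sketch applied to one of the example names, with the # / ## and % / %% variants shown side by side for comparison (the values in the comments follow from the shortest/longest rules above):

```sh
#!/bin/sh
filename='t-42-my_second_test.c'

number=${filename#t-}    # remove the "t-" prefix
echo "$number"           # 42-my_second_test.c
number=${number%%-*}     # remove the longest "-*" suffix
echo "$number"           # 42

# Shortest vs longest matches on the original name:
echo "${filename#*-}"    # 42-my_second_test.c  (prefix up to the first "-")
echo "${filename##*-}"   # my_second_test.c     (prefix up to the last "-")
echo "${filename%-*}"    # t-42                 (suffix from the last "-")
echo "${filename%%-*}"   # t                    (suffix from the first "-")
```

With this particular name, % and %% would give the same result in the second step, because only one - remains after the prefix is gone. %% is the safer choice when the descriptive part itself contains hyphens: for t-7-my-test.c, ${number%-*} would leave 7-my, while ${number%%-*} leaves 7.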

With bash, using regular expression matching on the filenames:

for filename in t-*-*.c; do
    [ ! -f "$filename" ] && continue
    if [[ "$filename" =~ ^t-([0-9]+)- ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    fi
done

This would match and capture the digits after t- in each filename. After a successful match, the captured digits are available in ${BASH_REMATCH[1]}. The index 1 refers to the first capture group (the first parenthesized part) of the regular expression, while ${BASH_REMATCH[0]} holds the whole match.
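A minimal bash sketch of what BASH_REMATCH holds for matching and non-matching names (the three names come from the question; the variable name re is just for illustration). It keeps the regular expression in a variable, which sidesteps the quoting rules for the right-hand side of =~:

```bash
#!/bin/bash
# Since bash 3.2, quoted parts of the right-hand side of =~ match
# literally, so the regex is stored in a variable and used unquoted.
re='^t-([0-9]+)-'

for name in t-1-myFirstTest.c myFile.c t-42-my_second_test.c; do
    if [[ $name =~ $re ]]; then
        # [0] is the whole match, [1] the first parenthesized group.
        printf '%s: whole=%s group1=%s\n' "$name" "${BASH_REMATCH[0]}" "${BASH_REMATCH[1]}"
    else
        printf '%s: no match\n' "$name"
    fi
done
```

Running it prints t-1-myFirstTest.c: whole=t-1- group1=1, then myFile.c: no match, then t-42-my_second_test.c: whole=t-42- group1=42.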

For a slow, but possibly comfortable (as in "familiar") solution, you may want to call an external command to parse out the bit of the string that you're interested in:

for filename in t-*-*.c; do
    [ ! -f "$filename" ] && continue
    cut -d '-' -f 2 <<<"$filename"
done

This assumes bash (the <<< "here-string" redirection is not available in plain /bin/sh) and that you're OK with calling cut once per file, which is much slower than using operations built into the shell itself. The cut command here is asked to return the second field of the string that bash passes to it, using - as the field delimiter.
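If the per-file cost matters, one alternative is to start cut only once and feed it all the matching names. This is a sketch, not part of the original answer: it assumes POSIX sh and that no filename contains a newline, and it builds a scratch directory with the question's names purely for demonstration:

```sh
#!/bin/sh
# Scratch directory with the file names from the question (demo fixture).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch t-1-myFirstTest.c myFile.c t-42-my_second_test.c \
      t-3-test1234.c mySecondFile.c t-21-tset241.c

# Put the matching names into "$@", then run cut once for all of them.
set -- t-*-*.c
if [ -e "$1" ]; then            # the glob matched at least one file
    nums=$(printf '%s\n' "$@" | cut -d '-' -f 2)
    printf '%s\n' "$nums"       # 1, 21, 3, 42 (one per line, glob order)
fi

cd / && rm -rf "$dir"
```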

score 3 · Answer 2 · YoMismo · 2018-07-17T10:56:15.197

According to your output:

ls|awk -F"-" '{print $2}'

This should work, but if you want to take the t- part into account, then

ls|grep ^t-|awk -F"-" '{print $2}'

or

ls|awk -F"t-" '{print $2}'|awk -F"-" '{print $1}'

answered Jul 17 '18 at 10:48 · edited Jul 17 '18 at 10:56

I would add a condition to this to make it a bit better (no blank lines for files that don't match): `ls | awk -F- 'NF == 3 { print $2 }'`. `NF >= 3` might also be appropriate, but the question isn't really clear on that... – twalberg Jul 17 '18 at 17:09
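A sketch combining this comment with the advice in the accepted answer: the names come from a glob rather than ls, and awk keeps only names of the form t-<digits>-<rest>. The $1 == "t" and $2 ~ /^[0-9]+$/ tests are additions of mine, not part of the comment; the scratch directory is only a demo fixture, and the sketch still assumes no filename contains a newline:

```sh
#!/bin/sh
# Scratch directory with the file names from the question (demo fixture).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch t-1-myFirstTest.c myFile.c t-42-my_second_test.c \
      t-3-test1234.c mySecondFile.c t-21-tset241.c

# -F- splits each name on "-"; the condition keeps only t-<digits>-<rest>.
nums=$(printf '%s\n' * |
    awk -F- '$1 == "t" && $2 ~ /^[0-9]+$/ && NF >= 3 { print $2 }')
printf '%s\n' "$nums"           # 1, 21, 3, 42

cd / && rm -rf "$dir"
```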

score 3 · Answer 3 · answered Jul 17 '18 at 12:26

When I created the list of files from your example, my ls sorts them this way:

$ ls -1
myFile.c
mySecondFile.c
t-1-myFirstTest.c
t-21-tset241.c
t-3-test1234.c
t-42-my_second_test.c

As a result, the bash function below outputs the newlines and numbers for the files in the same order.

I want to delete everything of this text except the newlines and the numbers between t- and the second -

I interpreted this to mean that filenames that do not match t- should be "deleted except for the newline", meaning: output a blank line for those filenames, but otherwise output the numbers between the dashes.

lsnums ()
{
    for f in *
    do
        if [[ "$f" =~ t-([[:digit:]]+)- ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
        else
            echo
        fi
    done
}

The resulting output is:

$ lsnums


1
21
3
42

... where the two blank lines correspond to the two files that begin with my instead of t-.
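For shells without [[ ]] and BASH_REMATCH, the same behavior can be sketched in POSIX sh with case patterns and the parameter expansions from the accepted answer. The name lsnums_sh is hypothetical, and there is one deliberate difference: the regex in lsnums is not anchored, so a name like my-t-5-x.c would print 5 there, while the case pattern below only accepts names that start with t-.

```sh
#!/bin/sh
# Hypothetical POSIX sh counterpart of lsnums (no [[ ]] or BASH_REMATCH).
lsnums_sh() {
    for f in *; do
        n=
        case $f in
            t-*-*)
                n=${f#t-}        # drop the leading "t-"
                n=${n%%-*}       # drop everything from the next "-"
                case $n in
                    ''|*[!0-9]*) n= ;;   # not all digits: treat as no match
                esac
                ;;
        esac
        printf '%s\n' "$n"       # an empty $n gives a blank line
    done
}
```

As with lsnums, run it from inside the directory whose names you want to scan.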

score 1 · Answer 4 · answered Jul 17 '18 at 13:50

It can be done simply with:

ls | cut -d '-' -f 2
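One caveat: for a name that contains no -, such as myFile.c, cut prints the whole name rather than a blank line. POSIX cut has an -s option that suppresses lines without the delimiter. A short sketch on two of the question's names:

```sh
#!/bin/sh
names='myFile.c
t-42-my_second_test.c'

# Without -s, a line that contains no "-" is passed through unchanged.
plain=$(printf '%s\n' "$names" | cut -d '-' -f 2)
printf '%s\n' "$plain"    # myFile.c, then 42

# With -s, lines lacking the delimiter are suppressed.
strict=$(printf '%s\n' "$names" | cut -s -d '-' -f 2)
printf '%s\n' "$strict"   # 42
```

-s does not help with a name such as my-notes.c, which does contain a - (cut would print notes.c); the grep ^t- filter from Answer 2 is still needed for that case.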

paulplusx
