4

So, I have a simple nested loop: the outer for loops over all the files in a directory, and the inner for loops over all the characters of each filename.

#!/bin/bash

if [ $# -lt 1 ]
then
    echo "Please provide an argument"
    exit
fi

for file in `ls $1`
do
    for ch in $file
    do
        echo $ch
    done
done

The script above doesn't work. The inner loop doesn't loop over all the characters in the filename but instead loops over the entire thing.

UPDATE:

Based on @ilkkachu's answer, I was able to come up with the following script, and it works as expected. But I'm curious: can't we use the for...in loop to iterate over the characters of a string?

#!/bin/bash

if [ $# -lt 1 ]
then
    echo "Please provide an argument"
    exit
fi

for file in `ls $1`; do
    for ((i=0; i<${#file}; i++)); do
        printf "%q\n" "${file:i:1}"
    done
done

  • 3
    Never use ls to gather a list of files for scripted usage! https://unix.stackexchange.com/questions/128985/why-not-parse-ls-and-what-to-do-instead – Marcus Müller Jul 04 '21 at 15:57
  • @MarcusMüller I am new to this, could you please explain why this is bad and what should I do instead? – Som Shekhar Mukherjee Jul 04 '21 at 15:58
  • 1
    I added a link to my comment that explains that. – Marcus Müller Jul 04 '21 at 15:58
  • 1
    I would change "character" to "byte" in the title, since "character" isn't well defined without an encoding, and has multiple definitions if you're dealing with Unicode. – l0b0 Jul 05 '21 at 00:53
  • To add to Marcus Müller's comment, if you are entering the filename as an argument, why do you use ls anyway? – Wastrel Jul 05 '21 at 16:35
  • @MarcusMüller The answers to that question have some great stuff. The question is exceedingly long and confusing (I think the OP was editing it to argue with the answers?) – Ben Jul 06 '21 at 01:19
  • @Wastrel I don't wish to loop over the directory provided as an argument, but I want to loop over all the files inside that directory. – Som Shekhar Mukherjee Jul 06 '21 at 06:44
  • @SomShekharMukherjee again, don't use ls for that; that's bad. for file in "$1"/* just works. – Marcus Müller Jul 06 '21 at 11:16
  • If you have a new question, please ask it separately. And seriously, don't use ls. It isn't needed here and it just makes your script less likely to work. ilkkachu gave you a version without ls, so use that! For more details on why parsing ls is bad, see: https://mywiki.wooledge.org/ParsingLs – terdon Jul 06 '21 at 12:56

6 Answers

9

Since you're using Bash:

#!/bin/bash
word=foobar
for ((i=0; i < ${#word}; i++)); do
   printf "char: %q\n" "${word:i:1}" 
done

${var:p:k} gives k characters of var starting at position p, ${#var} is the length of the contents of var. printf %q prints the output in an unambiguous format, so e.g. a newline shows as $'\n'.
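
Applied to the files in a directory, the same technique could look like the sketch below. It assumes the directory is passed as the first argument and uses the glob "$1"/* suggested in the comments on the question instead of ls. The function name print_chars is only an illustration, not part of the original answer.

```bash
#!/bin/bash
# Sketch: print every character of every filename in a directory.
# print_chars is a hypothetical helper name chosen for this example.
print_chars() {
    local dir=$1 path file i
    for path in "$dir"/*; do
        # An unmatched glob stays as the literal pattern; skip it.
        [ -e "$path" ] || [ -L "$path" ] || continue
        file=${path##*/}                  # strip the leading directory part
        for ((i = 0; i < ${#file}; i++)); do
            printf '%q\n' "${file:i:1}"   # one character per line
        done
    done
}

# Usage: print_chars /some/directory
```

Because the glob expands to one word per file, names containing spaces or newlines stay intact, which is not the case when the output of ls is word-split.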

ilkkachu
  • +1. I think you missed the $ in the word expansion – DanieleGrassini Jul 04 '21 at 16:17
  • @roaima Oh! Amazing! Good to know – DanieleGrassini Jul 04 '21 at 16:20
  • 4
    @DanieleGrassini, the inside of a for (( )) and the index and offset in ${var:p:k} are arithmetic contexts, and plain variable names without the $ work there. Or you could do stuff like ${word:i+1:1}. The ${#word} needs the ${} though, no way around that. – ilkkachu Jul 04 '21 at 16:20
  • 1
    @DanieleGrassini more details at https://github.com/koalaman/shellcheck/wiki/SC2004 – glenn jackman Jul 04 '21 at 17:17
  • +1, This is a really succinct and beginner friendly answer, I could achieve what I was looking for. But I had a question can we not use the for..in loop to iterate over strings? – Som Shekhar Mukherjee Jul 06 '21 at 06:39
  • 1
    @SomShekharMukherjee, no, you'd need to be able to split the string to individual elements somehow, and the shell can't do splitting between each and every character. for x in $var would word-split $var on the characters specified in IFS, but an empty IFS means no splitting, not splitting everywhere (unlike in Perl, where split "", "abcd" would split into characters). Also, since there are no real types, it's hard to tell a single-element list from a single word. Unlike in Python where for i in "abcd": is different from for i in ["abcd"]:. – ilkkachu Jul 06 '21 at 12:11
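
The word-splitting behaviour described in the last comment can be checked with a short sketch. The helper count_words and the sample string are made up for this demonstration; the helper just counts how many words an unquoted expansion produces under the current IFS.

```sh
#!/bin/sh
# Sketch: what "for x in $var" actually splits on.
# count_words is a hypothetical helper; its body runs in a subshell
# so that "set -f" does not leak into the caller.
count_words() (
    n=0
    set -f                  # disable globbing so only word splitting is shown
    for x in $1; do         # deliberately unquoted: subject to word splitting
        n=$((n + 1))
    done
    echo "$n"
)

var="a-b-c d"

count_words "$var"          # default IFS, splits on the space: prints 2

IFS=-
count_words "$var"          # splits on each "-": prints 3

IFS=
count_words "$var"          # empty IFS, no splitting at all: prints 1

unset IFS                   # back to the default splitting behaviour
```

No value of IFS makes the shell split between every pair of characters, which is why the index-based loop is needed.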
8

When the strings become larger than a few hundred characters (yes, unlikely for filenames), using a for-loop over the string length and extracting the character at index i becomes very slow.

This answer uses advanced bash techniques:

while IFS= read -r -d "" -n 1 char; do
    # do something with char, like adding it to an array
    chars+=( "$char" )
done < <(printf '%s' "$string")

Inspect the array:

declare -p chars

That uses a process substitution to redirect the string into the while-read loop. I'm using printf to avoid adding a newline onto the end of the string. The main advantage of using a process substitution instead of printf ... | while read ... is that the loop executes in the current shell, not a subshell, so variables set inside the loop are still available afterwards.
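
The subshell difference can be seen in a small sketch like the one below. It assumes bash's default settings (in particular, the lastpipe option is off); the array names piped and kept are chosen only for this example.

```bash
#!/bin/bash
# Sketch: pipeline versus process substitution for a while-read loop.
string="abc"

# Pipeline: the while loop runs in a subshell, so its changes are lost.
piped=()
printf '%s' "$string" | while IFS= read -r -d "" -n 1 char; do
    piped+=( "$char" )
done
echo "after pipe: ${#piped[@]} elements"                    # prints 0 elements

# Process substitution: the loop runs in the current shell.
kept=()
while IFS= read -r -d "" -n 1 char; do
    kept+=( "$char" )
done < <(printf '%s' "$string")
echo "after process substitution: ${#kept[@]} elements"     # prints 3 elements
```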

I once got curious about the magnitude of the slowness and benchmarked it.

glenn jackman
  • heh, funny, I wonder if it copies the whole string around with ${var:p:k}. Ksh isn't any better there. – ilkkachu Jul 04 '21 at 19:25
  • That ${#string} is surprisingly slow makes me think bash is walking a linked list. But I haven't looked into the code at all. – glenn jackman Jul 05 '21 at 13:45
  • not just ${#var}, but the substring expansion too. For a 9999 char string, I got about 1.5 s for for ((i=0; i < ${#word}; i++)); do : ${word:i:1} ; done, 1.2 s with ${#word} replaced with a constant var, and 0.08 s with the substring expansion removed in addition. – ilkkachu Jul 05 '21 at 15:44
5

For completeness, even though the question is tagged bash, here is an alternative that uses POSIX shell features only:

#!/bin/sh
for fname
do
  while [ "${#fname}" -ge 1 ]
  do
    rest=${fname#?} char=${fname%"$rest"} fname=$rest
    printf '%s\n' "$char"       # Do something with the current character
  done
done

What the inner loop does:

  • set rest to the value of fname minus its first character;

  • assign the single character obtained by removing rest from the end of fname to char;

  • set fname to the value of rest and repeat until all characters are processed.

Note the quotes in ${fname%"$rest"}, needed to prevent $rest's expansion from being used as a pattern.
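
A small sketch of what goes wrong without those quotes, using a made-up input that contains a bracket expression. The two helper functions are hypothetical and differ only in the quoting of $rest.

```sh
#!/bin/sh
# Sketch: effect of quoting $rest in ${fname%"$rest"}.
# first_char_quoted and first_char_unquoted are hypothetical helper names.
first_char_quoted() {
    rest=${1#?}
    printf '%s\n' "${1%"$rest"}"    # $rest is removed as a literal string
}

first_char_unquoted() {
    rest=${1#?}
    printf '%s\n' "${1%$rest}"      # $rest is interpreted as a pattern
}

first_char_quoted   'a[x]'    # rest is "[x]", removed literally: prints a
first_char_unquoted 'a[x]'    # the pattern [x] only matches a lone "x": prints a[x]
```

In the unquoted case nothing is removed, so char would hold the whole string instead of its first character.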

As an aside, for file in `ls $1` should be avoided. The most obvious reason is that it breaks if a file name contains any character that happens to be in IFS. More on this at Bash Pitfall n. 1, including what you should do instead.

fra-san
4
#!/bin/sh

for name do
    printf 'name="%s"\n' "$name"

    printf '%s\n' "$name" | fold -w 1 |
    while IFS= read -r character; do
        printf 'character="%s"\n' "$character"
    done
done

The outer loop here just loops over the arguments given to the script. Each argument is printed as is, and then passed through fold -w 1, which creates a stream of single characters separated by newline characters. This stream is then read by the inner loop, which prints each character in turn.

Testing:

$ sh script *
name="script"
character="s"
character="c"
character="r"
character="i"
character="p"
character="t"
$ sh script /*bin*
name="/bin"
character="/"
character="b"
character="i"
character="n"
name="/sbin"
character="/"
character="s"
character="b"
character="i"
character="n"

If you replace the printf that feeds the full pathname into fold with basename "$name", the inner loop sees only the filename portion of each pathname:

$ sh script /sbin/l*
name="/sbin/ldattach"
character="l"
character="d"
character="a"
character="t"
character="t"
character="a"
character="c"
character="h"
name="/sbin/ldconfig"
character="l"
character="d"
character="c"
character="o"
character="n"
character="f"
character="i"
character="g"
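
The basename variant that produced the output above is not shown in the answer; a sketch of it might look like this. It is wrapped in a function, with the hypothetical name chars_of_basename, purely so that it can be called with arbitrary arguments.

```sh
#!/bin/sh
# Sketch of the basename variant; only the producer of fold's input changes.
# chars_of_basename is a hypothetical wrapper name.
chars_of_basename() {
    for name do
        printf 'name="%s"\n' "$name"

        basename "$name" | fold -w 1 |
        while IFS= read -r character; do
            printf 'character="%s"\n' "$character"
        done
    done
}

# Usage: chars_of_basename /sbin/l*
```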
Kusalananda
1

Use the bash string slicing operator:

s="string"
for c in $(seq 0 $((${#s}-1)));  do echo "${s:c:1}"; done

s
t
r
i
n
g

Applied to your script, this could be:

#!/bin/bash
for f in *; do
    echo "Char in $f:"
    for i in $(seq 0 $((${#f}-1))); do
        echo "${f:i:1}"
    done
done
0
  1. Loop over each filename in the current directory with "*" (easily customised for specific needs, e.g. ~/*.jpg), storing the current filename in the string variable "ff". "$ff" can also be thought of as an array of single characters, with ${ff:10:1} being the single character at index 10. Note: if we wanted 2 characters starting from index 10, we would write ${ff:10:2}.
  2. echo "--> $ff" simply shows the current filename in the loop.
  3. Loop over the "string array" ${ff} to echo each of its characters, from the first (indexing starts at 0) to the last (${#ff} gives the length of the string). This is done by incrementing c by 1 (c++) from an initial value of 0 and stopping when it reaches ${#ff}.
  4. echo an empty line before moving on to the next filename.
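for ff in *; do
  echo "--> $ff"                 # show the current filename
  for ((c=0; c<${#ff}; c++)); do
    echo "${ff:c:1}"             # one character per line; no -e, so backslashes stay literal
  done
  echo                           # empty line between filenames
done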

Steps 2) and 4) (“echo” statements) are obviously for cosmetic purposes only and may be skipped.

  • Apart from the outer loop, wouldn't that be the same as the accepted answer? – AdminBee Jul 06 '21 at 10:49
  • I guess you are not wrong, my bad! Bar the aforementioned "outer loop", as well as my irrepressible bias for the humble, if basic (sic!), "echo" command vs. its more sophisticated, if less readable, "printf" cousin. – docgyneco69 Jul 15 '21 at 21:36