I have 5000 text files in a directory. Each file name starts with the prefix OG00 (for example, OG0017774.log).

The .log files that contain an asterisk (*) anywhere in their content need to be copied into a new directory.

Example file content:

cat OG0017774.log

M0~b904dbe442e0eb658d229076cacb9ef6 M1~9edeedcb1f4f315c4689bacd8075222f 0.000035**
M0~b904dbe442e0eb658d229076cacb9ef6 M2~aeba83b564ee32e0ef1a8321c8d930f4 0.000671**
M0~b904dbe442e0eb658d229076cacb9ef6 M3~006a376da2fba16185ce424bf4cba983 0.000055**
M0~b904dbe442e0eb658d229076cacb9ef6 M4~e564dbfbbbe8d1f7d9d8c8e4da202943 0.000015**
M0~b904dbe442e0eb658d229076cacb9ef6 M5~2abe603e8fee2fcb08b7fb818957e0aa 0.000006**

Suggestions appreciated.

I tried the code below, but it copies all the files in the current directory to the new directory.

I want to copy only those files that contain a * in their content.

#!/bin/bash
KEYWORD_PATTERN='*'
find . -type f |
while read FNAME
do
    if grep -Ew -q "$KEYWORD_PATTERN" $FNAME
    then
        KEYWORD=$(grep -Ew -o "$KEYWORD_PATTERN" $FNAME)
        cp -r $FNAME keywords/$KEYWORD
    fi

done

  • What is your question? Your script does not do anything since all it does is output the mv command. Do you get error messages too? Do you also want to give the files some special name in the keywords directory? If so, what should happen if you have name collisions? How should the new name be chosen? What's the reasoning behind using find if you know all files match OG00*? – Kusalananda Aug 01 '22 at 20:54
  • I updated the post. If a text file contains a * inside it, I'd like to copy it into a new directory. – sunnykevin Aug 01 '22 at 20:58
  • * by itself is an invalid extended regex, and even though I tried to figure it out, I'm not even sure what grep -Ew '*' does on my Debian. With -o, it doesn't print anything anyway. So that might be a problem. I'm also not exactly sure what you expect to get with the $(grep -Ew -o "$KEYWORD_PATTERN" $FNAME), as if you're trying to match against a literal *, the matched string would always be just *. And if $KEYWORD is * then you really better start quoting the rest of those variable expansions. – ilkkachu Aug 01 '22 at 21:25
  • see https://unix.stackexchange.com/questions/131766/why-does-my-shell-script-choke-on-whitespace-or-other-special-characters and https://unix.stackexchange.com/questions/68694/when-is-double-quoting-necessary – ilkkachu Aug 01 '22 at 21:26
  • You mean any .log file that contains even a single * on any line shall be copied ? – Paul_Pedant Aug 01 '22 at 22:44
  • Matching special characters is frequently difficult. I would count the * using tr (which is ignorant of patterns) and wc: tr -cd '*' < "${fname}" | wc -c. Then just test for non-zero. Also note that uppercase variable names are likely to clash with those in your environment. – Paul_Pedant Aug 01 '22 at 22:51
  • Yes you're right. – sunnykevin Aug 01 '22 at 22:54
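
The tr and wc counting idea from the comments could be sketched roughly as follows. The function name count_stars is hypothetical, and the destination ../keywords in the commented example is an assumed path, not something the question specifies.

```shell
#!/bin/sh
# Hypothetical helper: print how many '*' characters a file contains.
count_stars() {
    # tr -cd '*' deletes every character except '*'; wc -c counts what is left.
    # The arithmetic expansion strips the padding some wc versions print.
    echo "$(( $(tr -cd '*' < "$1" | wc -c) ))"
}

# Example: copy a file only when the count is non-zero
# (file name taken from the question; ../keywords is an assumed destination).
# [ "$(count_stars OG0017774.log)" -gt 0 ] && cp -- OG0017774.log ../keywords/
```

Because tr knows nothing about patterns, no escaping of the * is needed here beyond the shell quotes.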

2 Answers

What about something like this?

for i in OG00*; do 
    if grep -q -F '*' "$i"; then 
        mv "$i" ../keywords/
    fi
done
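
Since the question asks for a copy rather than a move, a variant of the same loop might look like the sketch below. The function name copy_starred is hypothetical, and the destination is whatever directory you pass in (../keywords in the commented example is an assumption).

```shell
#!/bin/sh
# Hypothetical helper: copy every OG00* file in the current directory
# that contains a literal '*' into the directory given as $1.
copy_starred() {
    dest=$1
    mkdir -p -- "$dest" || return 1
    for f in OG00*; do
        # Skip the unexpanded pattern when nothing matches, and non-regular files.
        [ -f "$f" ] || continue
        # -F: fixed string; -q: quiet, only the exit status is used.
        if grep -qF -- '*' "$f"; then
            cp -- "$f" "$dest"/
        fi
    done
}

# Example call (assumed destination path):
# copy_starred ../keywords
```

The originals stay in place, and the -- guards against file names that begin with a dash.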
r_31415

With GNU tools:

grep -rFlZ --include='OG00*' '*' . |
  xargs -r0 cp -t ../keywords

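grep searches recursively (-r) for the Fixed string (-F) * under the current directory (.), in files whose names start with OG00 (--include), and lists (-l) the files with at least one match, each name terminated by a NUL byte (-Z). xargs takes that output and splits it on NUL bytes (-0) to pass the names as arguments to cp; with -r, cp is not run at all if there is no input.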
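
To see why the NUL delimiters matter, here is a small throwaway demo, assuming GNU grep, xargs and cp. The directory layout and the file names are made up for illustration.

```shell
#!/bin/sh
# Demo in a temporary directory; assumes GNU grep, xargs and cp.
demo=$(mktemp -d) || exit 1
mkdir "$demo/logs" "$demo/keywords"
cd "$demo/logs" || exit 1

printf 'M0~x M1~y 0.000035**\n' > 'OG00 with space.log'   # contains '*'
printf 'M0~x M1~y 0.5\n'        > 'OG0000002.log'          # no '*'

# -Z ends each listed file name with a NUL byte instead of a newline,
# and xargs -0 splits on NUL, so the spaces in the name are harmless.
grep -rFlZ --include='OG00*' '*' . |
  xargs -r0 cp -t ../keywords

ls ../keywords
```

Only the file with the space in its name should end up in keywords. The temporary directory in $demo is left behind so you can inspect it; remove it when done.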

The POSIX equivalent would be:

find . -name 'OG00*' -type f -exec grep -qF '*' {} ';' \
   -exec sh -c 'exec cp "$@" ../keywords' sh {} +

Though that means running one grep per file, so it would be significantly less efficient.

To match a * with grep, the options are:

  • grep -F '*': fixed-string match. The easiest, and the one you want to use if you only need to search for fixed strings.
  • grep '*': in basic regular expressions, a * at the beginning of the pattern matches a literal *. grep 'a*', though, would match any sequence of 0 or more "a" characters, and you'd need grep 'a\*' or grep 'a[*]' to match a literal a*.
  • grep -E '\*' / grep -E '[*]'. With extended regexps, a * at the start of the pattern is an error, and that * needs to be escaped. Same goes for grep -P or grep -X where supported.
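
The options above can be checked against a small sample, as in this sketch. The sample content and the variable names are made up for illustration.

```shell
#!/bin/sh
# Sample input: one line containing "a*", one line without any asterisk.
sample=$(mktemp) || exit 1
printf 'xa*y\nplain\n' > "$sample"

# Each command counts the lines that match.
fixed=$(grep -cF '*' "$sample")       # fixed string
bre=$(grep -c '*' "$sample")          # BRE: a leading * is literal
bre_a=$(grep -c 'a\*' "$sample")      # BRE: escaped * after a
ere=$(grep -cE '\*' "$sample")        # ERE: escaped *
ere_b=$(grep -cE '[*]' "$sample")     # ERE: bracket expression

# Unescaped, a* matches zero or more "a", so every line matches.
loose=$(grep -c 'a*' "$sample")

echo "$fixed $bre $bre_a $ere $ere_b $loose"
rm -f "$sample"
```

The first five counts are 1 (only the line with the asterisk), while the last is 2 because the unescaped a* also matches the empty string on the line without one.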

You may also want to read:

About some of the common mistakes in your code.