I have a file, abc.sh:

search_dir='dummy'
filename='numbers.txt'

for entry in "$search_dir"/*
do
  while read p;
  do 
    sed -i '' "/$p/d" $entry
  done < $filename
done

I am trying to delete every line that matches a pattern. The pattern is just a string that I read from a file. Unfortunately, it is not working.

From my debugging, I suspect that I am not passing the variable into the pattern correctly.

EDIT: numbers.txt

2018061300006178
2018061300006179
2018061300006325
2018061300006326
2018061400006505

The content of the files in search_dir is:

1888~2018061400006505~0101~1~OWNED~SELF EMPLOYED~~~~3~~AGRICULTURE~~~OTHERS~AGRICULTURIST~~~AGRICULTURE~~~~~~~~N~N~Y~N~N~~300000-500000~~~49582E95361D5FA0C10C4C419B2940591C17E94EF329C31047A6B7DE26E68638
1889~2018061400006505~0101~2~OWNED~SELF EMPLOYED~~~~32~~AGRICULTURE~~~OTHERS~AGRIC

So numbers.txt contains 2018061400006505, and the data file has lines containing that number. I want to delete every line that matches one of the given numbers.

Kusalananda
  • 333,661
Ankur_009
  • 101
  • @steeldriver I tried double quotes things but it still not working, I have added the example in the edit section of the question. Hope it helps. – Ankur_009 Jun 17 '18 at 17:49
  • @Ankur_009 there is a single quote after -i in sed. is that a typo? – Siva Jun 17 '18 at 18:01
  • @SivaPrasath not the typo, as somewhere i have read that is necessary while using sed cmd. But I have tried without using that too. – Ankur_009 Jun 17 '18 at 18:04
  • @SivaPrasath this is not duplicate, as I have tried solution mentioned there too. – Ankur_009 Jun 17 '18 at 18:07
  • @Ankur_009 i hope its not required, can u try this ...sed -i '/'"$p"'/d' – Siva Jun 17 '18 at 18:08
  • @SivaPrasath that is giving me error "sed: -i may not be used with stdin" – Ankur_009 Jun 17 '18 at 18:09
  • @SivaPrasath, it doesn't matter one bit if the slashes are single-quoted, double-quoted or not-at-all-quoted. The thing with -i is the only issue here, GNU sed wants the backup filename suffix as part of the same argument, so you'd use -i.bak, or just -i for no backup. BSD sed is different here. – ilkkachu Jun 17 '18 at 18:12
  • @ilkkachu can you bit explain. – Ankur_009 Jun 17 '18 at 18:16
  • Related: https://unix.stackexchange.com/questions/92895/how-can-i-achieve-portability-with-sed-i-in-place-editing – Kusalananda Jun 17 '18 at 18:19
  • Note, this question is not actually related to passing variables into sed, it's about the portability of sed -i and/or about working with DOS text files. – Kusalananda Jun 18 '18 at 10:26

1 Answer

As long as the numbers in your example do not contain the delimiter that sed is using (by default /), the $p in your code will be interpreted as a regular expression, with everything that implies (a dot, for instance, would match any character).
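To illustrate what "interpreted as a regular expression" means in practice, here is a small sketch. The pattern and the input lines are made up for the illustration and are not taken from the question.

```shell
# Hypothetical pattern and input, chosen only to show the regex behaviour.
p='1.3'

# The dot matches any single character, so both "123" and "1.3" are deleted.
result=$(printf '%s\n' '123' '1.3' 'abc' | sed "/$p/d")

printf '%s\n' "$result"
```

Patterns consisting only of digits, like the ones in numbers.txt, contain no special characters, so they behave as plain substrings.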

Your code:

search_dir='dummy'
filename='numbers.txt'

for entry in "$search_dir"/*
do
  while read p;
  do 
    sed -i '' "/$p/d" $entry
  done < $filename
done

Here, you want to delete all lines in the files under $search_dir that contain any of the numbers in $filename. Whether this works or not depends on how your sed treats -i ''. With some implementations of sed (GNU sed, for example) you would have to use -i without a separate argument.

Related to sed -i and portability: How can I achieve portability with sed -i (in-place editing)?
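If you do want to keep using -i, one possible workaround is a small wrapper that picks the form the local sed accepts. This is only a sketch: the function name sed_inplace is made up, and it assumes that `sed --version` succeeds on GNU-style sed and fails on BSD/macOS sed, which may not hold for every implementation.

```shell
# sed_inplace SCRIPT FILE
# Hypothetical helper: chooses the -i form that the local sed understands.
sed_inplace() {
    if sed --version >/dev/null 2>&1; then
        # GNU-style sed: an optional backup suffix is attached to -i itself.
        sed -i -e "$1" "$2"
    else
        # BSD/macOS sed: -i needs a separate (possibly empty) suffix argument.
        sed -i '' -e "$1" "$2"
    fi
}

# Try it on a throwaway file with made-up content.
tmpfile=$(mktemp)
printf '%s\n' 'keep' 'drop 2018061400006505' >"$tmpfile"
sed_inplace '/2018061400006505/d' "$tmpfile"
inplace_result=$(cat "$tmpfile")
rm -f "$tmpfile"

printf '%s\n' "$inplace_result"
```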

It is safer to write the result to a temporary file and then to move that file to the original filename:

for entry in "$search_dir"/*
do
  while IFS= read -r p
  do 
    sed "/$p/d" "$entry" >"$entry.tmp" && mv "$entry.tmp" "$entry"
  done <"$filename"
done

This ensures that it will work regardless of what sed implementation you happen to be working with. In general, it's a bad idea to make in-place changes to files while testing out a script, so you may well want to comment out that mv until you are happy with the way the rest of the script works.

This is still a bit unsafe as a general solution though, since you're actually "using data as code" (the numbers are data, and you use them as part of your sed script). This means that you could easily cause a syntax error in the sed script just by inserting a / in one of the numbers in your numbers file.
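A quick way to see the problem is to feed sed a value that contains the delimiter. The value below is hypothetical. The exact error message differs between sed implementations, so the sketch only looks at the exit status.

```shell
# Hypothetical entry from the numbers file that contains the delimiter.
p='2018/06'

# The script becomes /2018/06/d, which sed cannot parse.
if printf '%s\n' 'some line' | sed "/$p/d" >/dev/null 2>&1; then
    sed_status='ok'
else
    sed_status='syntax error'
fi

printf '%s\n' "$sed_status"
```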

Since the operation is so simple, we may instead use grep. This also gets rid of the inner while loop:

for entry in "$search_dir"/*
do
  grep -Fv -f "$filename" "$entry" >"$entry.tmp" && mv "$entry.tmp" "$entry"
done

With -f "$filename", grep reads its patterns from $filename and applies them to the $entry file. The -v means we'll discard any line containing one of the patterns, and -F means grep will not interpret the numbers as regular expressions but as fixed strings.
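One detail worth knowing: grep exits with status 1 when it outputs no lines, so with && the mv would be skipped if every line of a file gets removed. A possible variation, sketched here with made-up sample files in a throwaway directory, treats both 0 and 1 as success and only discards the temporary file on a real error.

```shell
# Throwaway directory with made-up sample files, so nothing real is touched.
workdir=$(mktemp -d)
filename="$workdir/numbers.txt"
entry="$workdir/data.txt"

printf '%s\n' '2018061400006505' >"$filename"
printf '%s\n' '1888~2018061400006505~0101' '1889~2018061400006505~0101' >"$entry"

status=0
grep -Fv -f "$filename" "$entry" >"$entry.tmp" || status=$?

# 0: some lines were kept; 1: no lines were kept; anything else: a real error.
if [ "$status" -le 1 ]; then
    mv "$entry.tmp" "$entry"
else
    rm -f "$entry.tmp"
fi

# Every line matched here, so the data file should now be empty.
if [ -s "$entry" ]; then emptied='no'; else emptied='yes'; fi
rm -rf "$workdir"

printf '%s\n' "$emptied"
```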

If there may be directories under $search_dir we would want to skip these:

for entry in "$search_dir"/*
do
  [ ! -f "$entry" ] && continue
  grep -Fv -f "$filename" "$entry" >"$entry.tmp" && mv "$entry.tmp" "$entry"
done

Another, even safer way to do your operation is to use awk. Since with both the sed and grep solutions above, the number is matched anywhere on the line, it is conceivable that we might delete the wrong lines. With awk it's easy to match just the second ~-delimited field in the data:

for entry in "$search_dir"/*; do
    [ ! -f "$entry" ] && continue
    awk -F '~' 'NR==FNR { num[$0]; next } !($2 in num)' "$filename" "$entry" >"$entry.tmp" &&
    mv "$entry.tmp" "$entry"
done

The awk program first populates an associative array/hash with the numbers as keys, and then prints every line from the $entry file whose second ~-delimited column is not a key in that hash.
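The difference from the substring match can be seen with a small sketch. The two data lines are made up: the first has the number in the second field, the second has it only in the first field.

```shell
# Throwaway directory with made-up sample files.
workdir=$(mktemp -d)
filename="$workdir/numbers.txt"
entry="$workdir/data.txt"

printf '%s\n' '2018061400006505' >"$filename"
printf '%s\n' '1888~2018061400006505~0101' '2018061400006505~1889~0101' >"$entry"

# Only the line whose second field is listed in numbers.txt is dropped.
kept=$(awk -F '~' 'NR==FNR { num[$0]; next } !($2 in num)' "$filename" "$entry")
rm -rf "$workdir"

printf '%s\n' "$kept"
```

The sed and grep variants would have removed both lines, since the number appears somewhere on each of them.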

Kusalananda
  • 333,661