I am using the following command to remove non-ASCII characters, single quotes, and non-printable characters:

sed -i -e "s/'//g" -e's/'//g' -e's/[\d128-\d255]//g' -e's/\x0//g' filename

However, I am getting an error:

sed: -e expression #3, char 18: Invalid collation character

How can I replace these characters?

MatthewRock
  • 6,986
Azhar
  • 11

1 Answer

Try it this way:

LANG=iso-8859-1 sed -i -e"s/'//g" -e's/'//g' -e's/[\d128-\d255]//g' -e's/\x0//g' filename

Alternatively, this removes non-printable characters and single quotes:

sed -i 's/[^[:print:]]//g;s/'\''//g;s/'//g' filename
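
For comparison, here is a minimal sketch of the same clean-up done with tr instead of sed. It is not from the thread. It assumes the goal is to delete single quotes, control bytes other than newline, and every byte above 127. The function name strip_bytes is made up for illustration.

```shell
#!/bin/sh
# strip_bytes (hypothetical name): filter stdin to stdout, deleting
#   '            the single quote
#   \000-\011    NUL and the control bytes below newline (tab included)
#   \013-\037    the remaining control bytes; \012 (newline) is kept
#   \177-\377    DEL and every byte with the high bit set (non-ASCII)
# LC_ALL=C makes tr treat its input as plain bytes rather than as
# characters of the current locale.
strip_bytes() {
    LC_ALL=C tr -d "'"'\000-\011\013-\037\177-\377'
}

# Demo: an apostrophe, a control byte, a UTF-8 "é" and a NUL are dropped.
printf "it's\001 caf\303\251\000 ok\n" | strip_bytes   # prints: its caf ok
```

To apply it to a file, something like strip_bytes < filename > filename.clean writes the cleaned data to a second file (filename.clean is a placeholder), which can then be moved over the original. Whether this is faster than sed on multi-gigabyte files has not been measured here.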
rush
  • 27,403
  • Are you replacing " (double quotes) in your sed command, i.e. with the sed -i 's/"//g statement? I don't want to remove the double quotes from the file. – Azhar Jan 21 '16 at 15:44
  • Thanks rush. Both commands work, but they are very slow. I have 12 files, each between 1 GB and 6 GB, and removing the non-printable characters and single quotes takes approx. 2 min per file. Is there any way to improve the performance? – Azhar Jan 21 '16 at 16:29
  • Hello everyone, the command takes around 20 min to complete for the 12 files. Please advise whether I can improve the performance. It's urgent. – Azhar Jan 21 '16 at 16:52
  • I don't think you can easily reduce the processing time. sed -i writes the edited contents to a new file and then replaces the original with it, so the whole process takes approximately as long as a plain copy. 12 files with an average size of 3 GB (from 1 to 6 :)) make 36 GB to copy, so about 20 minutes is reasonable. – rush Jan 21 '16 at 16:58
  • Hi, when I run the same script (sed -i -e "s/'//g" -e's/'//g' -e's/[\d128-\d255]//g' -e's/\x0//g' filename) in the development environment, it runs fine and completes in 15 min. In the TEST environment, the same script fails with sed: -e expression #3, char 18: Invalid collation character. With LANG=iso-8859-1 sed -i -e"s/'//g" -e's/'//g' -e's/[\d128-\d255]//g' -e's/\x0//g' it runs fine in the TEST environment but takes around 30 min. Why is there a discrepancy? – Azhar Jan 22 '16 at 14:29
  • Are the sed versions and operating systems in the test and dev environments identical? – rush Jan 22 '16 at 14:33
  • Can you please elaborate? What exactly do I need to check? – Azhar Jan 22 '16 at 15:13
  • DEV: OS Linux, sed --version reports GNU sed version 4.1.5

    UAT: OS Linux, sed --version reports GNU sed version 4.2.1

    – Azhar Jan 22 '16 at 15:22
  • Hey Guys,

    I am using sed -i 's/\o000//g' filename or sed -i 's/\x0//g' filename to remove the NUL character. The command works fine in the DEV environment but doesn't work in UAT. The OS and sed versions are:
    DEV: OS Linux, sed --version reports GNU sed version 4.1.5
    UAT: OS Linux, sed --version reports GNU sed version 4.2.1
    Please advise.

    – Azhar Jan 27 '16 at 11:52
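
rush's comment above describes sed -i as writing a new file and then replacing the original. The sketch below shows that write-then-replace pattern by hand, which may help explain why the run time tracks the amount of data copied. It is an illustration only: edit_in_place is a hypothetical helper, the single-quote removal stands in for the full set of expressions, and details that the real sed -i takes care of, such as preserving file permissions, are left out.

```shell
#!/bin/sh
# edit_in_place (hypothetical helper): mimic the write-then-replace
# pattern described for `sed -i`, using a single-quote removal as the edit.
edit_in_place() {
    file=$1
    # Create the temporary file next to the original so the final mv
    # is a rename on the same filesystem rather than a second copy.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if LC_ALL=C sed "s/'//g" "$file" > "$tmp"; then
        mv -f "$tmp" "$file"   # replace the original with the edited copy
    else
        rm -f "$tmp"           # leave the original untouched on failure
        return 1
    fi
}
```

Calling edit_in_place filename reads every byte of the file once and writes every surviving byte once, so the disk has to move roughly twice the file size whatever the expressions are.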