I have a DE_CopyOldToNew.sh file that was created in Windows and then uploaded to Linux using WinSCP. The file contains a whole bunch of cp commands that copy files to a new folder, assigning each a new filename. The commands contain folders and files with diacritics like Gewährleistungsbürgschaft.

When I do a cat DE_CopyOldToNew.sh the diacritics are displayed in a "corrupted" way, like Gew▒hrleistungsb▒rgschaft. When I do a view DE_CopyOldToNew.sh the diacritics are displayed as they should be, like Gewährleistungsbürgschaft. When I execute my script I get cp: cannot stat errors, and the diacritics in the folders and files are displayed as Gew\344hrleistungsb\374rgschaft.

I have uploaded the file in binary mode as well as in text mode, and I have also run dos2unix DE_CopyOldToNew.sh. When I copy the content of my script in Windows and paste it into a new file in Linux, I am able to run the new script without issues.

What is causing the uploaded version to be "corrupted" (for lack of a better word)?

  • Which editor are you using? It's probably saving as latin1 / iso8859 / cp1252 instead of utf8. When saving as UTF-8, if there is an option to save with or without Byte-Order-Mark, pick UTF-8 without BOM. You can also use iconv to convert charsets. file might display the charset (but it's a guess). – frostschutz Mar 07 '24 at 09:39
  • I am creating the file with VBA using ADO Stream Writer. – Rico Strydom Mar 11 '24 at 09:11

1 Answer

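Your file is written in a single-byte Western European encoding (probably Windows CP1252, or one of the ISO-8859 family such as ISO-8859-15), whereas your Linux-based system is set up to expect UTF-8. In those encodings ä and ü are the single bytes that cp reports as the octal escapes \344 and \374; on their own, those bytes are not valid UTF-8.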

You can verify this easily enough:

    # Original text
    printf 'Gew\344hrleistungsb\374rgschaft\n'
    Gew�hrleistungsb�rgschaft

    # What character set
    printf 'Gew\344hrleistungsb\374rgschaft\n' | file -
    /dev/stdin: ISO-8859 text

    # Transcoded text
    printf 'Gew\344hrleistungsb\374rgschaft\n' | iconv -f iso-8859-15 -t utf-8
    Gewährleistungsbürgschaft

    # What character set
    printf 'Gew\344hrleistungsb\374rgschaft\n' | iconv -f iso-8859-15 -t utf-8 | file -
    /dev/stdin: UTF-8 Unicode text
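To see what the transcoding changes at the byte level, here is a minimal sketch that dumps the bytes for ä before and after conversion. The variable names latin1_hex and utf8_hex are illustrative only, and the source charset is assumed to be ISO-8859-15 as above.

```shell
# ä as the single byte used by ISO-8859-1/15 and CP1252 (octal 344 = hex e4)
latin1_hex=$(printf '\344' | od -An -tx1 | tr -d ' \n')

# The same character after transcoding: UTF-8 needs two bytes for it
utf8_hex=$(printf '\344' | iconv -f iso-8859-15 -t utf-8 | od -An -tx1 | tr -d ' \n')

printf 'single-byte: %s\nUTF-8:       %s\n' "$latin1_hex" "$utf8_hex"
```

The first value is e4 and the second is c3a4. A file name stored on disk as UTF-8 contains the two-byte form, so a cp command carrying the one-byte form names a file that does not exist, hence cannot stat.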

Solutions?

  1. Create your file as UTF-8 on the source system (Windows applications support this character set)

  2. Downgrade your Linux-based system back to ISO-8859. Not recommended (but possible)

  3. Convert the file once it's been transferred:

    iconv -f iso-8859-15 -t utf-8 DE_CopyOldToNew.sh >DE_CopyOldToNew.sh.tmp &&
        mv -f DE_CopyOldToNew.sh.tmp DE_CopyOldToNew.sh
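If the conversion might be run more than once, converting a file that is already UTF-8 would garble it. A minimal sketch of a guard follows, assuming a hypothetical helper named to_utf8 and assuming ISO-8859-15 as the default source charset (pass cp1252 as the second argument if that is what the Windows side really wrote). It relies on iconv exiting with a non-zero status when its input is not valid in the charset given with -f.

```shell
# Hypothetical helper: convert FILE to UTF-8 in place, unless it already is valid UTF-8.
# Usage: to_utf8 FILE [SOURCE_CHARSET]
to_utf8() {
    file=$1
    from=${2:-iso-8859-15}    # assumed source charset, adjust as needed

    # Already valid UTF-8 (this includes plain ASCII): leave the file alone
    if iconv -f utf-8 -t utf-8 "$file" >/dev/null 2>&1; then
        return 0
    fi

    # Otherwise transcode via a temporary file and replace the original on success
    iconv -f "$from" -t utf-8 "$file" >"$file.tmp" &&
        mv -f "$file.tmp" "$file"
}
```

Calling to_utf8 DE_CopyOldToNew.sh a second time is then a no-op, because the file passes the UTF-8 validity check.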
    
Chris Davies
  • Thanks for the response @Chris Davies. I have recreated my file in Windows in UTF-8 and uploaded it again.

        $ file DE_CopyOldToNew.sh
        DE_CopyOldToNew.sh: UTF-8 Unicode (with BOM) text, with very long lines, with CR line terminators

    When doing a cat DE_CopyOldToNew.sh the copy commands now all start on a new line. When doing a view DE_CopyOldToNew.sh the commands are all wrapped and delimited with a ^M, like <copy command 1>^M<copy command 2>. When executing the script I am getting ./DE_CopyOldToNew.sh: line 1: cp: command not found – Rico Strydom Mar 11 '24 at 06:45
  • Your file is still in the wrong format. Can you create it on the target system instead? So much easier. If not, see "How can I remove the BOM from a UTF-8 file?" and "txt File from Mac not converting properly". – Chris Davies Mar 11 '24 at 07:47
  • In my VBA code I have removed the UTFStream.LineSeparator = 10 line of code and in Linux I performed a dos2unix DE_CopyOldToNew.sh. This solved the problem. – Rico Strydom Mar 11 '24 at 09:09
  • If you put the line separator code back in, it'll probably work without needing dos2unix – Chris Davies Mar 11 '24 at 10:50
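For the symptoms in the follow-up comments (a UTF-8 BOM plus CR line terminators), the cp: command not found error is most likely the shell reading the three BOM bytes as part of the first command name. A minimal sketch of a cleanup on the Linux side follows; normalize_script is a hypothetical helper name, and head -c is assumed to be available (it is in GNU and BSD userlands).

```shell
# Hypothetical helper: print FILE with a leading UTF-8 BOM removed and
# CRLF or bare CR line endings turned into LF.
normalize_script() {
    cr=$(printf '\r')    # a literal carriage return for sed

    # The UTF-8 BOM is the three bytes EF BB BF at the very start of the file
    if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        tail -c +4 "$1"    # skip the first three bytes
    else
        cat "$1"
    fi |
        sed "s/${cr}\$//" |    # CRLF -> LF
        tr '\r' '\n'           # any remaining bare CR -> LF
}
```

Usage would be along the lines of normalize_script DE_CopyOldToNew.sh >fixed.sh followed by sh fixed.sh. Writing the file without a BOM and with LF line endings on the Windows side, as suggested above, avoids the need for this step.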