I have a DE_CopyOldToNew.sh file that was created in Windows. The file is then uploaded to Linux using WinSCP. The file contains a whole bunch of cp commands that copy files to a new folder, assigning each a new filename. The commands contain folders and files with diacritics, like Gewährleistungsbürgschaft.
When I do a cat DE_CopyOldToNew.sh, I notice that the diacritics are displayed in a "corrupted" way, like Gew▒hrleistungsb▒rgschaft. When I do a view DE_CopyOldToNew.sh, the diacritics are displayed as they should be, like Gewährleistungsbürgschaft. When I execute my script I get cp: cannot stat errors, and the diacritics in the folders and files are displayed as Gew\344hrleistungsb\374rgschaft.
I have uploaded the file using binary as well as text mode, and I have also performed a dos2unix DE_CopyOldToNew.sh.
When I copy the content of my script in Windows and paste it into a new file in Linux, I am able to run the new script without issues.
What is causing the uploaded version to be "corrupted" (for lack of a better word)?
Rico Strydom
1 Answer
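Your file is written in a single-byte encoding of the ISO-8859 kind (probably Windows CP1252 or ISO-8859-15), whereas your Linux-based system is set up to expect UTF-8. The \344 and \374 in the cp error are the octal values of the bytes that represent ä and ü in those encodings.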
You can verify this easily enough:
# Original text
printf 'Gew\344hrleistungsb\374rgschaft\n'
Gew�hrleistungsb�rgschaft

# What character set?
printf 'Gew\344hrleistungsb\374rgschaft\n' | file -
/dev/stdin: ISO-8859 text

# Transcoded text
printf 'Gew\344hrleistungsb\374rgschaft\n' | iconv -f iso-8859-15 -t utf-8
Gewährleistungsbürgschaft

# What character set?
printf 'Gew\344hrleistungsb\374rgschaft\n' | iconv -f iso-8859-15 -t utf-8 | file -
/dev/stdin: UTF-8 Unicode text
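To see the mismatch at the byte level, a small sketch like the following may help. The helper name hex_bytes is made up for illustration; it only relies on the standard od and tr utilities.

```shell
# Hypothetical helper: print stdin as one continuous string of hex bytes.
hex_bytes() {
    od -An -v -tx1 | tr -d ' \n'
    echo
}

# ISO-8859-15 / CP1252: "ä" is the single byte e4 (octal 344)
printf 'Gew\344hr' | hex_bytes      # prints 476577e46872

# UTF-8: "ä" is the two-byte sequence c3 a4 (octal 303 244)
printf 'Gew\303\244hr' | hex_bytes  # prints 476577c3a46872
```

A lone e4 byte followed by an ASCII letter is not a valid UTF-8 sequence, which is why a UTF-8 terminal shows a placeholder character and cp prints the byte as \344.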
Solutions?
- Create your file as UTF-8 on the source system (Windows applications support this character set).
- Downgrade your Linux-based system back to ISO-8859. Not recommended (but possible).
- Convert the file once it's been transferred:
  iconv -f iso-8859-15 -t utf-8 DE_CopyOldToNew.sh >DE_CopyOldToNew.sh.tmp && mv -f DE_CopyOldToNew.sh.tmp DE_CopyOldToNew.sh
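If the conversion might be run more than once, it could be wrapped in a guard so that a file that is already UTF-8 is not converted a second time. The following is only a sketch: the function name to_utf8 is hypothetical, and the default source encoding ISO-8859-15 is the same assumption made above.

```shell
# Hypothetical helper: convert FILE to UTF-8 in place, unless it already
# is valid UTF-8.
# Usage: to_utf8 FILE [SOURCE_ENCODING]
to_utf8() {
    file=$1
    from=${2:-ISO-8859-15}   # assumed source encoding, as in the answer

    # A file that already decodes cleanly as UTF-8 is left untouched,
    # so running the helper twice does not double-encode it.
    if iconv -f UTF-8 -t UTF-8 "$file" >/dev/null 2>&1; then
        return 0
    fi

    tmp=$file.tmp
    # Write back with cat rather than mv so the original file keeps
    # its permissions (for example the execute bit).
    iconv -f "$from" -t UTF-8 "$file" >"$tmp" &&
        cat "$tmp" >"$file" &&
        rm -f "$tmp"
}
```

It would be called as to_utf8 DE_CopyOldToNew.sh. The check is a heuristic: a pure ASCII file also passes it (and needs no conversion), and a single-byte-encoded file could in rare cases happen to be valid UTF-8.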

Chris Davies
Thanks for the response @Chris Davies. I have recreated my file in Windows in UTF-8 and uploaded it again.
$ file DE_CopyOldToNew.sh
DE_CopyOldToNew.sh: UTF-8 Unicode (with BOM) text, with very long lines, with CR line terminators
When doing a cat DE_CopyOldToNew.sh the copy commands are now all starting on a new line. When doing a view DE_CopyOldToNew.sh the commands are all wrapped and delimited with a ^M, like <copy command 1>^M<copy command 2>. When executing the script I am getting ./DE_CopyOldToNew.sh: line 1: cp: command not found – Rico Strydom Mar 11 '24 at 06:45
Your file is still in the wrong format. Can you create it on the target system instead? So much easier. If not, then see "How can I remove the BOM from a UTF-8 file?" and "txt File from Mac not converting properly" – Chris Davies Mar 11 '24 at 07:47
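Both remaining problems can in principle be fixed with a small filter. With a BOM in front, the shell likely sees the first word as the three BOM bytes followed by cp, which is not a command it knows, and with CR-only line endings the whole file is one long line. The sketch below is one possible approach, not necessarily what the linked questions recommend; the function name strip_bom_and_cr is made up.

```shell
# Hypothetical filter: drop a leading UTF-8 BOM (bytes ef bb bf) and turn
# CRLF or bare CR line endings into LF. Reads stdin, writes stdout.
strip_bom_and_cr() {
    LC_ALL=C awk '
        NR == 1 { sub(/^\357\273\277/, "") }  # remove the BOM, if present
        {
            sub(/\r$/, "")      # CR left over from a CRLF line ending
            gsub(/\r/, "\n")    # bare CRs become line breaks
            print
        }
    '
}
```

A possible use would be strip_bom_and_cr <DE_CopyOldToNew.sh >DE_CopyOldToNew.fixed.sh, followed by a look at the result before replacing the original.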
In my VBA code I have removed the UTFStream.LineSeparator = 10 line of code and in Linux I performed a dos2unix DE_CopyOldToNew.sh. This solved the problem. – Rico Strydom Mar 11 '24 at 09:09
If you put the line separator code back in, it'll probably work without needing dos2unix – Chris Davies Mar 11 '24 at 10:50
iconv to convert charsets. file might display the charset (but it's a guess). – frostschutz Mar 07 '24 at 09:39