#!/usr/bin/tcsh
setenv LC_ALL de_DE
rm /home/users0/me/master/me/LookupScripts/Sampa/*
foreach f ( /home/users0/me/master/me/LookupScripts/Tokenized/*.txt )
set g = "`basename $f .txt`"
set h = "`echo $g | tr "tokenized" "sampa"`"
cat $f | ./a4.lookup > Sampa/$h.txt
end

In /home/users0/me/master/me/LookupScripts/Tokenized/ I have some .txt files named randomnumber_tokenized.txt. I want to run a script on each of them and write its output to the folder Sampa/. The output files should keep the randomnumber_ part of the name, but tokenized should be replaced by sampa, so that the new files are named randomnumber_sampa.txt.

Strangely, though, the resulting files are not called randomnumber_sampa.txt but randomnumber_samaaaaaa.txt.

I suspect that it is either an issue with tcsh or caused by the setenv command.

What am I doing wrong?

doc
  • Does it output the correct name if you put the following in your loop: echo $f | tr "tokenized" "sampa"? – Julie Pelletier Jul 31 '16 at 04:18
    tr translates one character at a time, always to one character, except if you use -d then it translates to nothing. tr tokenized sampa translates t to s, k to m, e to p, and all of d e i n o z to a. To replace a string with another string use sed, in this case sed s/tokenized/sampa/. Also: you say "some" filenames are _tokenized; note this loop will run ./a4.lookup for ALL *.txt files, but it will change the name only for _tokenized and leave the filename unchanged for others. Finally cat $f | x is the same as x <$f or <$f x. – dave_thompson_085 Jul 31 '16 at 09:04
  • @Julie Pelletier No, same effect. I still get samaaaaaa – doc Jul 31 '16 at 13:38
  • @dave_thompson_085 according to what you wrote, shouldn't the files then be named sampaaaaaa? I don't get a p in the name of the output file. – doc Jul 31 '16 at 13:41
  • @doc: Simply use sed as Dave showed you and your problem will be fixed. – Julie Pelletier Jul 31 '16 at 17:10
  • @Julie Yes, thanks, that worked. I'd still like to know why tr gives me such a weird result, because, as I said, I'd expect it to output sampaaaaaa – doc Jul 31 '16 at 18:06
  • The problem comes from the e being repeated and therefore linked to the last character of the replacement characters. – Julie Pelletier Jul 31 '16 at 18:18
  • tr leaves unchanged characters that aren't in the first argument, so your random number and underscore are left unchanged. Characters that are in the first argument, but beyond the length of the second argument, reuse the last character of the second argument. – dave_thompson_085 Aug 02 '16 at 05:16

1 Answer


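Your immediate problem is that tr doesn't do what you think it does. tr performs character replacement, not string replacement: each character of the first argument is mapped to the character at the same position in the second argument, and when the second argument is shorter, its last character is repeated to fill it out (GNU and BSD tr both do this). So tr "tokenized" "sampa" replaces t by s, o by a, k by m, and n, i, z, e and d by a. The first e would map to p, but the later occurrence of e in the first argument overrides that mapping, which is why no p shows up in the result.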
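A quick way to see the mapping is to push each letter of tokenized through the same tr call. This is a small sketch that assumes GNU coreutils tr, which pads a shorter second set with its last character and lets the last occurrence of a repeated character (here e) win; other tr implementations may behave differently on either point.

```bash
#!/bin/bash
# Show what `tr tokenized sampa` does to each character of "tokenized".
# Assumes GNU tr: "sampa" is padded to "sampaaaaa" (last character repeated),
# and the second `e` in "tokenized" overrides the first one (e -> a, not e -> p).
for c in t o k e n i z e d; do
  printf '%s -> %s\n' "$c" "$(printf '%s' "$c" | tr tokenized sampa)"
done

# Digits and `_` are not in the first set, so they pass through unchanged.
printf '%s\n' 123_tokenized | tr tokenized sampa   # 123_samaaaaaa
```

The e -> a line is the answer to why the output name contains no p.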

To perform a string replacement, you can use sed. But that's somewhat inconvenient and easy to get wrong, because the pattern is a regular expression rather than a literal string. For simple string manipulation, use the shell's own string manipulation constructs instead of external tools.
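For comparison, here is a minimal sketch of the same rename done with sed. Because the pattern is a basic regular expression, anchoring it (_tokenized$) keeps it from matching elsewhere in the name, and a literal dot has to be escaped (\.txt); details like these are what make sed easy to get wrong for this job.

```bash
#!/bin/bash
# Plain string replacement: the whole word "tokenized" becomes "sampa".
printf '%s\n' 123_tokenized | sed 's/tokenized/sampa/'      # 123_sampa

# Safer: anchor the match to the end of the name so only the suffix changes.
printf '%s\n' 123_tokenized | sed 's/_tokenized$/_sampa/'   # 123_sampa

# In a regex `.` matches any character, so escape it to strip a literal ".txt".
printf '%s\n' 123_tokenized.txt | sed 's/\.txt$//'          # 123_tokenized
```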

Tcsh's string manipulation is very limited (little beyond the :s/old/new/, :t and :r modifiers on variables, e.g. $g:s/tokenized/sampa/). But (t)csh has not been a tier-1 command line shell for the past 20 years or so, and has never been good for scripting. Just don't write csh scripts.

Also:

  • Never set LC_ALL to anything other than C (or its synonym POSIX). LC_ALL overrides LANG and every LC_* category, which can cause problems. To set a default for all categories, use LANG. In scripts, C is usually what you need, except for LC_CTYPE (character set) and LC_MESSAGES (messages for the user).
  • String manipulation in sh is done via parameter expansion.
  • Plain sh has no string replacement construct, but bash does.
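  • Always use double quotes around variable substitutions ("$f", not $f), so that file names containing spaces or wildcard characters are passed through unchanged.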
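#!/bin/bash
export LC_ALL=C
for f in /home/users0/me/master/me/LookupScripts/Tokenized/*.txt; do
  g="${f##*/}"              # file name without the directory
  g="${g%.txt}"             # ... and without the .txt extension
  h="${g//tokenized/sampa}" # replace the string "tokenized" by "sampa"
  ./a4.lookup <"$f" >"Sampa/$h.txt"
done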
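The naming step of the loop can be tried on its own. The sketch below wraps it in two hypothetical helper functions (sampa_name_bash and sampa_name_posix are illustrative names, not part of the question): the first uses the bash-only ${var//old/new} replacement, the second sticks to POSIX sh, which has no replacement construct but can remove a fixed suffix with ${var%suffix} and append the new one. In both, names that do not contain tokenized come out unchanged.

```bash
#!/bin/bash
# Hypothetical helper (bash): map .../Tokenized/123_tokenized.txt to 123_sampa.txt
# using only parameter expansion.
sampa_name_bash() {
  local g="${1##*/}"                         # drop the directory part (like basename)
  g="${g%.txt}"                              # drop the .txt suffix
  printf '%s.txt\n' "${g//tokenized/sampa}"  # bash-only: replace every "tokenized"
}

# Hypothetical helper (POSIX sh): no ${var//old/new}, so remove the fixed
# suffix with ${var%suffix} and append the new one. Uses a global `g`,
# since POSIX sh has no `local`.
sampa_name_posix() {
  g=${1##*/}
  g=${g%.txt}
  case $g in
    *_tokenized) g="${g%_tokenized}_sampa" ;;  # rewrite the suffix only
  esac
  printf '%s.txt\n' "$g"
}

sampa_name_bash  /tmp/Tokenized/123_tokenized.txt   # 123_sampa.txt
sampa_name_posix /tmp/Tokenized/123_tokenized.txt   # 123_sampa.txt
```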
  • I would not use tcsh if I did not have to. The scripts I have to use only run in tcsh and with setenv LC_ALL de_DE, and they are probably older than a decade. Nothing I can do about it. The tr explanation is nice though. Thanks – doc Aug 01 '16 at 01:00
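If, as the comment above says, ./a4.lookup only works with LC_ALL set to de_DE, a bash script can still keep LC_ALL=C for itself and give that locale only to the one command, using a prefix assignment. This is a minimal sketch assuming de_DE is the locale name the lookup script expects; sh -c stands in for ./a4.lookup so the example runs anywhere.

```bash
#!/bin/bash
export LC_ALL=C

# A prefix assignment puts LC_ALL=de_DE only in the child's environment.
# `sh -c` is a stand-in for ./a4.lookup here.
LC_ALL=de_DE sh -c 'printf "child: %s\n" "$LC_ALL"'   # child: de_DE

# The script itself still runs with LC_ALL=C.
printf 'script: %s\n' "$LC_ALL"                       # script: C

# In the loop, the lookup line would then read:
#   LC_ALL=de_DE ./a4.lookup <"$f" >"Sampa/$h.txt"
```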