
Context

  • OS: Ubuntu 18.04.6 LTS
  • Text editor: GNU nano, 2.9.3

In the BASH script below, I'm trying to create directories using mkdir. The name of each directory is determined by a combination of variable values, which are determined by reading in rows of a specified CSV (see here). All of this is happening in a Linux environment.

Current script

#!/bin/bash
# Read in 4 variable values line by line
while IFS=, read -r m M n gtalpha; do
    # Create a directory based on variable values
    mkdir ./output/QUAC_m${m}_M${M}_n${n}_ga${gtalpha}
    # Dummy follow-up command here, which uses this output folder
done < subset_denovo.params

Here's the subset_denovo.params file being referenced:

3,8,8,0.01
4,4,4,0.05
5,4,4,0.05
6,2,2,0.05
7,1,1,0.01

Expected output

Folders (within the directory output/) with the names:

QUAC_m3_M8_n8_ga0.01
QUAC_m4_M4_n4_ga0.05
QUAC_m5_M4_n4_ga0.05
QUAC_m6_M2_n2_ga0.05
QUAC_m7_M1_n1_ga0.01

Problem: Actual Output

Instead, the folder names are:

'QUAC_m3_M8_n8_ga0.01'$'\r'
'QUAC_m4_M4_n4_ga0.05'$'\r'
'QUAC_m5_M4_n4_ga0.05'$'\r'
'QUAC_m6_M2_n2_ga0.05'$'\r'
'QUAC_m7_M1_n1_ga0.01'$'\r'

I understand that the $'\r' is the Linux carriage return, but I don't understand why it's being appended to the folder name, or how to prevent it. Enclosing the mkdir argument in quotes (see below) doesn't address the issue:

mkdir "./output/QUAC_m${m}_M${M}_n${n}_ga${gtalpha}" # <-- same result

I've seen this post describing a similar problem with the scp command, but am unable to recreate the solution in this (simpler) context. This post says the cause of the issue is probably an API library, but I don't know what library could cause the problem in this context.

Is this a BASH problem? A nano problem? Any input on preventing this behavior would be appreciated.

  • \r as carriage return has nothing to do with Linux as such, AFAIK it's the carriage return in ASCII already. Your input file has Windows style CR-LF line endings and Linux/Unix tools often interpret the CR as a regular character, so it gets read into the variables and ends up in the directory name. – ilkkachu Nov 15 '21 at 21:30
  • This has nothing to do with mkdir. There's a CR in your file, so mkdir is told to create a directory with CR at the end of the name, and it does what it's told. – Gilles 'SO- stop being evil' Nov 15 '21 at 21:43
  • @ilkkachu thanks for this information. My question becomes: why should my input file have Windows style CR-LF line endings when it's being created on a Linux system, with the GNU nano text editor? And how can I remove the CR from the BASH script? – akoontz11 Nov 15 '21 at 22:29
  • Are you saying your CSV file has never been in a Windows environment? Not downloaded or transferred from another system? – muru Nov 15 '21 at 22:43
  • Windows is not the only possible cause. Although Unix uses LF in files and programs, it uses (converts to) CRLF on tty interfaces including ptys; if this data was captured from something like script or ssh -t or docker -it it gets CRLF. Plus many Internet protocols and mediatypes use CRLF, so if you obtained the data from a remote system (even a remote Unix system) as certain types of application data (e.g., email) rather than a raw file it might get CRLF. – dave_thompson_085 Nov 16 '21 at 01:43
  • @akoontz11, dos2unix file.txt or tr -d '\r' <file.txt >file-new.txt. Those and some other solutions in the answers to the post Gilles linked this to. As for why they came there, well, that's hard to say from far away. Some editors have a setting to choose the newline style, and they might use what the file already had. So it's remotely possible to create a new file based on an old one and get the newline style from there. (Plus the things dave said in their comment.) – ilkkachu Nov 16 '21 at 10:30
  • @ilkkachu thanks for the feedback. I tried both of your suggestions; neither one addressed the issue (folders generated still have CR included in the name).

    I haven't yet tried any of the other solutions in the post Gilles linked--most of them mention that dos2unix should work, which it isn't currently (for me).

    FWIW: calling file on my script (suggested in the duplicate post) generates the output paramOpt_demo.sh: Bourne-Again shell script, ASCII text executable

    – akoontz11 Nov 16 '21 at 15:54
  • @akoontz11, did you check the script or the data file or both? In principle, you could have the CR in either. If it was in the script, the line would be mkdir $var\r, and the CR would come from there. If it was in the data file, you'd have e.g. foo,bar\r, and IFS=, read a b would read bar\r into $b, and subsequently mkdir $b would still have the CR there. Note that it's the same gtalpha that's the last field in read and the very last part on the mkdir line. But I don't think the issue is with the script, since a CR on the do line would give a syntax error. – ilkkachu Nov 16 '21 at 15:59
  • @ilkkachu thank you--this solved the issue. Trimming the CR from the parameter file being read (tr -d '\r' < subset_denovo.params > subset_denovo.params.new) addressed the problem. Before, I had just been calling it on the script. The source of the issue was the CR in the parameters file the script was reading. – akoontz11 Nov 16 '21 at 17:19
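
The diagnosis and fix discussed in the comments above can be sketched in one place. This is a minimal sketch, not the asker's actual script. It assumes bash, and it works on a throwaway CRLF copy of two rows of the parameter file in a temporary directory, so the workdir and params names are hypothetical. It checks the data file for a CR with grep, then strips a trailing CR from the last field inside the loop instead of rewriting the file with tr or dos2unix.

```shell
#!/bin/bash
# Hypothetical scratch area so the sketch does not touch real files.
workdir=$(mktemp -d)
params="$workdir/subset_denovo.params"

# Recreate two rows of the parameter file with CRLF line endings.
printf '3,8,8,0.01\r\n4,4,4,0.05\r\n' > "$params"

# Diagnose: report whether the data file contains any carriage returns.
if grep -q $'\r' "$params"; then
    echo "CR found in $params"
fi

mkdir -p "$workdir/output"
while IFS=, read -r m M n gtalpha; do
    # The CR, if any, ends up in the last field read; drop it if present.
    gtalpha=${gtalpha%$'\r'}
    mkdir -p "$workdir/output/QUAC_m${m}_M${M}_n${n}_ga${gtalpha}"
done < "$params"

# List the created directories; the names should no longer end in a CR.
ls "$workdir/output"
```

Stripping the CR inside the loop keeps the script working whether or not the file has already been converted. Converting the file once with tr -d '\r' or dos2unix, as suggested in the comments, is an equally valid choice.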
