1

Hello, GNU/Linux newbie here.

I want to write two variables to a two-column, tab-separated file. In my code, the variables are $sample_name and $file.

I use the commands:

  • touch to create the file and
  • echo -e $sample_name $file | column -t >> $output_file to write each line. However, this results in a one-column file.

Any ideas?

Simplified script:

touch $output_file
for file in $path/*.g.vcf; do
        sample_name=`echo $file | grep -P 'HG(\d+)(?=.g)' -o`
        echo -e $sample_name $file | column -t >> $output_file
done

Expected output (viewing the output file):

HG00321        ./.../HG00321/HG00321.g.vcf
HG00322        ./.../HG00322/HG00322.g.vcf
# and so on
  • Consider using the shellcheck utility or checking your scripts on https://www.shellcheck.net; it will show common mistakes with explanations. – Vilinkameni Feb 11 '22 at 09:58

2 Answers

3

You don't need column -t. In fact, it replaces the separators with runs of spaces so that the columns line up regardless of their widths, so the result is no longer tab-separated. Just use printf, and remember to double-quote your variables. For example:

for file in "$path/"*.g.vcf; do
  sample_name=$(echo "$file" | grep -P 'HG(\d+)(?=.g)' -o)
  printf "%s\t%s\n" "$sample_name" "$file" >> "$output_file"
done

BTW, there's no need to touch the file to create it. >> redirection will create a file if it doesn't already exist.
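
To see why the original echo line never produced a tab while the printf version does, here is a minimal sketch. The sample name and path are made-up values standing in for $sample_name and $file, and awk is used only to count the fields when splitting on tabs.

```shell
# Hypothetical values standing in for $sample_name and $file.
sample_name='HG00321'
file='./data/HG00321/HG00321.g.vcf'

# printf writes a literal tab between the two fields.
tabbed=$(printf '%s\t%s\n' "$sample_name" "$file")

# echo joins its arguments with a single space, so no tab is written.
spaced=$(echo "$sample_name" "$file")

# Count the fields on each line when splitting on tabs only.
tab_fields=$(printf '%s\n' "$tabbed" | awk -F'\t' '{ print NF }')
space_fields=$(printf '%s\n' "$spaced" | awk -F'\t' '{ print NF }')

printf 'printf version: %s tab-separated field(s)\n' "$tab_fields"
printf 'echo version:   %s tab-separated field(s)\n' "$space_fields"
```

Anything that reads the file as tab-separated will see the echo version as a single column, which matches what the question describes.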

Also, you can use a here-string (<<<) instead of piping echo into grep. For example:

  sample_name=$(grep -oP 'HG(\d+)(?=.g)' <<< "$file")

This feeds the value of the variable $file to grep on its standard input. There's no significant benefit either way, unless the variable contains values that change echo's behaviour, such as -n, -e, -E, or backslash-escaped sequences like \n, \t, \0nnn, \xHH, etc. (see help echo in bash). That is also why printf is recommended over echo these days. Still, you may find the <<< form easier to read.
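
As a small illustration of the echo pitfall mentioned above, here is a sketch with a hypothetical value that happens to look like an echo option. It assumes bash (or another shell whose builtin echo accepts -n, such as dash).

```shell
# A hypothetical value that looks like an echo option.
value='-n'

# echo interprets -n as "no trailing newline" and prints nothing.
from_echo=$(echo "$value")

# printf treats the value purely as data for the %s placeholder.
from_printf=$(printf '%s\n' "$value")

printf 'echo gave:   [%s]\n' "$from_echo"
printf 'printf gave: [%s]\n' "$from_printf"
```

With filenames this is rarely a problem in practice, but printf removes the question entirely.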

cas
1

It looks like what you want to do is something like

for pathname in "$dirpath"/*.g.vcf; do
    printf '%s\t%s\n' "$(basename "$pathname" .g.vcf)" "$pathname"
done >"$output_file"

This loops over the pathnames that match your globbing pattern. For each pathname, the filename portion is extracted using basename (which can also remove the known filename suffix .g.vcf), and it is printed along with the full pathname.

The output of the loop is redirected to the output file, which the shell will create if it does not already exist, or truncate (empty) if it does.
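
The difference between truncating with > and appending with >> on a loop can be sketched as follows. The temporary file and the printed lines are made up for illustration only.

```shell
# Hypothetical output file, created only for this demonstration.
output_file=$(mktemp)

# First loop: writes two lines to the file.
for n in 1 2; do
    printf 'line %s\n' "$n"
done >"$output_file"

# Second loop, again with ">": the file is emptied before writing.
for n in 3; do
    printf 'line %s\n' "$n"
done >"$output_file"
after_truncate=$(grep -c '' "$output_file")

# Third loop with ">>": the existing line is kept and one is added.
for n in 4; do
    printf 'line %s\n' "$n"
done >>"$output_file"
after_append=$(grep -c '' "$output_file")

printf 'after ">": %s line(s), after ">>": %s line(s)\n' \
    "$after_truncate" "$after_append"
rm -f "$output_file"
```

Redirecting the whole loop with > therefore gives a fresh file on every run of the script, which is usually what you want for a generated listing.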

I changed the name of the path variable that you used as that name collides with a special (array) variable by the same name in the zsh shell. I also added double quotes around all expansions to make sure we can handle all possible filenames. If expansions are left unquoted, you'd have issues with filenames containing spaces or globbing characters.
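
To illustrate the quoting issue, here is a rough sketch using a hypothetical directory whose name contains a space. It assumes a default IFS, no nullglob, and a temporary directory path without whitespace. The unquoted loop is wrong on purpose.

```shell
# Hypothetical directory whose name contains a space.
tmpdir=$(mktemp -d)
dirpath="$tmpdir/my samples"
mkdir -p "$dirpath"
: >"$dirpath/HG00321.g.vcf"

# Quoted: the glob expands to exactly one pathname.
quoted_count=0
for pathname in "$dirpath"/*.g.vcf; do
    quoted_count=$((quoted_count + 1))
done

# Unquoted (deliberately wrong): $dirpath is split at the space before
# globbing, so the loop sees two words and neither names the real file.
unquoted_count=0
for pathname in $dirpath/*.g.vcf; do
    unquoted_count=$((unquoted_count + 1))
done

printf 'quoted: %s iteration(s), unquoted: %s iteration(s)\n' \
    "$quoted_count" "$unquoted_count"
rm -r "$tmpdir"
```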

To avoid calling the basename utility, you can use parameter expansions to trim the unwanted parts off the pathnames. This code does not use any external utilities:

for pathname in "$dirpath"/*.g.vcf; do
    name=${pathname##*/}
    printf '%s\t%s\n' "${name%.g.vcf}" "$pathname"
done >"$output_file"
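
Step by step, the two expansions do the following. This is only a sketch on a single made-up pathname.

```shell
# Hypothetical pathname, used only for illustration.
pathname='./data/HG00321/HG00321.g.vcf'

# ${pathname##*/} removes the longest prefix matching "*/",
# i.e. everything up to and including the last slash.
name=${pathname##*/}

# ${name%.g.vcf} removes the shortest suffix matching ".g.vcf".
sample=${name%.g.vcf}

printf '%s\t%s\n' "$sample" "$pathname"
```

Here name ends up as HG00321.g.vcf and sample as HG00321, the same value basename "$pathname" .g.vcf would give for this pathname.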
Kusalananda