Thank you very much for reading this. I am very new to bash, so I need your advice on the following:

I want to write a bash script that reads a file with 2 columns:

f  2
g  1
s  4
d  5
f  2
g  5
d  9
g  10
h  1
s  5
d  29

My script would actually sort this file based on the first column (alphabet) and produce a file called alpha_sorted.txt and then I want it to do the same thing for the numbers and name it numbers_sorted.txt.

I am very new to this, so I would appreciate your help, whether by pointing me to documents or links or by helping out with the code.

The script is meant to be at an introductory level, so complicated methods are best avoided.

Update

Using john1024's answer, I have the following problem:

Hasan@HasanChr /cygdrive/c/users/Hasan/Desktop/Bash
$ chmod +x script.sh

Hasan@HasanChr /cygdrive/c/users/Hasan/Desktop/Bash
$ ./script.sh
cat: alpha_sorted.txt: No such file or directory

(A screenshot of script.sh was attached here.)

John1024
JavaFreak
  • Note that I am operating on Windows. – JavaFreak Jun 12 '16 at 01:15
  • do you really want a pure bash solution, or do you want to use tools crafted for the purpose such as sort? As in: it is possible to write a function doing the sorting in pure bash – but in most cases one would use tools made for the task. The former is far more complex. – Runium Jun 12 '16 at 01:21
  • I would probably want the most basic solution, as in a script in pure bash, to do the trick ... – JavaFreak Jun 12 '16 at 01:25
  • But of course I can still use sort, so it's fine; two versions would be great. – JavaFreak Jun 12 '16 at 01:25

2 Answers

Since this is posted on unix.stackexchange.com, I am going to assume that you have access to the usual Unix tools.

Alphabetic sorting on first column:

$ sort file.txt >alpha_sorted.txt
$ cat alpha_sorted.txt
d  29
d  5
d  9
f  2
f  2
g  1
g  10
g  5
h  1
s  4
s  5

Numeric sorting:

$ sort -nk2,2 file.txt >numbers_sorted.txt
$ cat numbers_sorted.txt
g  1
h  1
f  2
f  2
s  4
d  5
g  5
s  5
d  9
g  10
d  29

-n specifies numeric sorting. -k2,2 specifies sorting on the second column.
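
To see what -n changes, here is a small sketch that sorts three of the question's lines both ways. The data is piped in with printf only to keep the example self-contained; it assumes a standard sort as found on Linux or Cygwin.

```shell
# Without -n the second column is compared as text, so "10" sorts before "2".
printf 'g  10\ns  5\nf  2\n' | sort -k2,2      # text order: 10, 2, 5

# With -n the second column is compared as a number.
printf 'g  10\ns  5\nf  2\n' | sort -n -k2,2   # numeric order: 2, 5, 10
```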

For more information, see man sort.
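
If you want both steps in one script, a minimal sketch could look like the following. The function name sort_both and the default input name file.txt are assumptions made for illustration; the two sort commands are the ones shown above.

```shell
#!/bin/bash
# Sketch: write both sorted files from one two-column input file.
# sort_both is a hypothetical name, not a standard command.
sort_both() {
  local input=${1:-file.txt}
  # Alphabetic sort of whole lines, so the first column decides first.
  sort "$input" > alpha_sorted.txt || return 1
  # Numeric sort on the second column only.
  sort -n -k2,2 "$input" > numbers_sorted.txt
}

# Example use:
#   sort_both file.txt
#   cat alpha_sorted.txt numbers_sorted.txt
```

Saved from a Unix-style editor, such a script can be made executable with chmod +x and run from the directory that holds the input file.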

Problems editing a Unix script with Notepad

I created a script with DOS line-endings:

$ cat dos.sh
sort file.txt >alpha_sorted.txt
cat alpha_sorted.txt 

Although it is not visible, I added a space at the end of the cat command. With this file, I can reproduce the error that you saw:

$ chmod +x dos.sh
$ dos.sh
cat: alpha_sorted.txt: No such file or directory
: No such file or directory

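The redirection on the first line actually created a file named alpha_sorted.txt\r (with the invisible carriage return as part of the name), so cat could not find alpha_sorted.txt. We can correct this problem by removing the carriage returns with a utility such as dos2unix or tr. Using tr: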

$ tr -d '\r' <dos.sh >fixed.sh
$ chmod +x fixed.sh

Now, we can run the command successfully:

$ fixed.sh
d  29
d  5
d  9
f  2
f  2
g  1
g  10
g  5
h  1
s  4
s  5
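
To check a script for this problem before running it, something like the sketch below may help. The helper names has_cr and strip_cr are made up for this example; they only wrap grep and the tr command shown above.

```shell
# Hypothetical helper names; neither is a standard command.

# Succeeds (exit status 0) if the file contains at least one carriage return.
has_cr() {
  grep -q "$(printf '\r')" "$1"
}

# Writes a copy of the file with every carriage return removed.
strip_cr() {
  tr -d '\r' < "$1" > "$2"
}

# Example use:
#   has_cr script.sh && strip_cr script.sh script_unix.sh
```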
John1024
  • John, a slight nitpick: -k2 specifies a sort key that starts at column 2 and runs to the end of the line, not just column 2 (which in this case is all that is left, but hey :) ) – heemayl Jun 12 '16 at 03:32
  • @heemayl Very true. I changed it to -k2,2. While the second 2 is superfluous here, you are probably right that it is better practice to be explicit. – John1024 Jun 12 '16 at 05:16
  • I do have a small problem: if I put the commands in a text file and rename it to .sh, how am I supposed to run it? I have been trying to do so for the past few hours and it is not working. I am using Cygwin. – JavaFreak Jun 12 '16 at 13:12
  • @JavaFreak On a Linux machine, you would mark the script as executable with chmod +x scriptname.sh, and then run it with ./scriptname.sh – Alex Stragies Jun 12 '16 at 14:33
  • Check my comment below if possible sir – JavaFreak Jun 12 '16 at 14:53
  • @JavaFreak Looking at your post, my first guess is that the problem has to do with DOS/Windows line-endings. DOS ends a line with \r\n. Unix ends a line with just \n and treats \r like a valid (but invisible) character. This causes an endless number of subtle problems. Please try converting your script.sh to Unix line endings using a tool like dos2unix. If you use a native Unix editor (nano is good for beginners), then you will avoid this as well as other subtle issues. – John1024 Jun 12 '16 at 17:56
  • @JavaFreak I just transferred the info showing the no-such-file error to your question. As you have found out, StackExchange takes its format seriously. The idea is that the question is to contain a complete question and the answers are to contain only answers. – John1024 Jun 12 '16 at 18:03
  • One can also reproduce the line-endings problem in Notepad by not providing a newline at the end of the cat alpha_sorted.txt. Empirically bash is happy to infer the newline, and this then also leads to the alpha_sorted.txt\r vs alpha_sorted.txt mismatch that triggers the reported problem – Chris Davies Jun 12 '16 at 19:43
  • @roaima Very true. Since Windows is not going away, I think it is unfortunate that Unix tools do not have an option to ignore \r characters. – John1024 Jun 12 '16 at 19:57
  • That's an interesting suggestion. An environment variable to define "nonstandard" line endings perhaps, thereby handling not only Windows but also Mac in one go. ALLOW_LINE_ENDINGS='\r\n' handled by libc and recommended being set as an appropriate default at login. Mind you, how long would it be before script writers started intentionally abusing it as a shortcut? – Chris Davies Jun 13 '16 at 07:51
  • Looks like Cygwin's already thinking about this, see https://cygwin.com/cygwin-ug-net/using-textbinary.html, but it doesn't really default to DWIM. Also, thinking harder about the problem, libc wouldn't really know whether a file was text or binary unless binmode() had been used - in which case Cygwin already handles it correctly – Chris Davies Jun 13 '16 at 09:07
  • @roaima That link is very interesting. Thanks. Cygwin knows what the "ideal" behavior would be. And, they know that the actual behavior, for good reasons or otherwise, is quite different. – John1024 Jun 15 '16 at 06:31

There are better ways to sort than to do it purely in bash. This is not a good answer to your question -- it's not simple (because it uses several features of bash that aren't common-place), and it doesn't do things "The Unix Way", which is to use tools that are pre-built for doing one thing and doing it well (such as sorting).

I decided to write this answer up to help make a larger point: your account's default shell is built to run commands and redirect I/O. Just because a shell has a multitude of features, as bash does, doesn't mean it's the best tool for a particular job. You'll very often see answers here that suggest using awk or perl (or jq or sort ...) instead of trying to hack it into a shell-only script.

That being said, bash can sort -- it's just not built-in. I'll repeat myself: it's still not a good idea. But you can do it. Below are four functions, implemented in bash, that sort two different ways on each of the two fields.

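The functions use:

  • indexed arrays, filled with mapfile or with a while read loop
  • regular-expression matching with [[ =~ ]] and BASH_REMATCH
  • string comparison with [[ > ]] and integer comparison with -gt
  • nested loops that swap out-of-order elements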

The sort used here (strictly an exchange sort rather than a textbook insertion sort) is not efficient (O(n²)), but it is certainly reasonable for small datasets, such as the 11-line example. The four functions ran in sub-second time for the sample data, but for a randomly generated 1,000-line input file, the "keyed" (separate array) sorts took ~15 seconds while the "in-place" versions took ~60 seconds because of all of the re-processing of the values. Compare this to the standard sort utility, which sorted the 1,000-line file on either column in thousandths of a second.
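
To repeat this kind of comparison on your own machine, a test file can be generated with a sketch like the one below. The helper name gen_random_input and the value ranges (one lowercase letter, a number from 0 to 99) are assumptions chosen to resemble the question's data, and timings will differ from system to system.

```shell
#!/bin/bash
# Hypothetical helper: print N lines of "letter  number" test data.
gen_random_input() {
  local n=${1:-1000} i idx
  local letters=abcdefghijklmnopqrstuvwxyz
  for ((i = 0; i < n; i++)); do
    idx=$((RANDOM % 26))
    printf '%s  %d\n' "${letters:idx:1}" "$((RANDOM % 100))"
  done
}

# Example use (assumes the sort functions from this answer are already defined):
#   gen_random_input 1000 > random.txt
#   time sort_keyed_f2 random.txt > /dev/null
#   time sort -n -k2,2 random.txt > /dev/null
```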

The two "inplace" functions attempt to save a few bytes by creating only one array (and some one-off variables for looping and swapping values); on the plus side, they use a neat bash builtin, mapfile, to read the file's lines into an array. The "keyed" functions throw caution to the wind and create two separate arrays, one for the keys to sort on and the other for the actual values.
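
As a small illustration of the two features the in-place functions lean on, the sketch below loads lines with mapfile and splits one of them with the same regular expression. It assumes bash 4 or later (mapfile is not available in older versions); the sample data and variable names are only for demonstration.

```shell
#!/bin/bash
# Load two sample lines into an array, one element per line (-t drops the newline).
mapfile -t lines < <(printf 'f  2\ng  10\n')
echo "${#lines[@]} lines read"

# Split the second line into its two fields with the regex used in the functions.
if [[ ${lines[1]} =~ ([^[:space:]]+)[[:space:]]+(.*) ]]; then
  first=${BASH_REMATCH[1]}
  second=${BASH_REMATCH[2]}
  echo "first=$first second=$second"
fi
```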

function sort_inplace_f1 {
  local array
  mapfile -t array < "$1"
  local i j tmp
  for ((i=0; i <= ${#array[@]} - 2; i++))
  do
    for ((j=i + 1; j <= ${#array[@]} - 1; j++))
    do
      local ivalue jvalue
      [[ ${array[i]} =~ ([^[:space:]]+)[[:space:]]+(.*) ]]
      ivalue="${BASH_REMATCH[1]}"
      [[ ${array[j]} =~ ([^[:space:]]+)[[:space:]]+(.*) ]]
      jvalue=${BASH_REMATCH[1]}
      if [[ $ivalue > $jvalue ]]
      then
        tmp=${array[i]}
        array[i]=${array[j]}
        array[j]=$tmp
      fi
    done
  done
  printf "%s\n" "${array[@]}"
}

function sort_inplace_f2 {
  local array
  mapfile -t array < "$1"
  local i j tmp
  for ((i=0; i <= ${#array[@]} - 2; i++))
  do
    for ((j=i + 1; j <= ${#array[@]} - 1; j++))
    do
      local ivalue jvalue
      [[ ${array[i]} =~ ([^[:space:]]+)[[:space:]]+(.*) ]]
      ivalue="${BASH_REMATCH[2]}"
      [[ ${array[j]} =~ ([^[:space:]]+)[[:space:]]+(.*) ]]
      jvalue=${BASH_REMATCH[2]}
      # Compare numerically; > inside [[ ]] would compare the values as strings.
      if [[ $ivalue -gt $jvalue ]]
      then
        tmp=${array[i]}
        array[i]=${array[j]}
        array[j]=$tmp
      fi
    done
  done
  printf "%s\n" "${array[@]}"
}

function sort_keyed_f1 {
  local c1 c2 keys values
  while IFS=' ' read -r c1 c2
  do
    keys+=("$c1")
    values+=("$c1 $c2")
  done < "$1"

  local i j tmpk tmpv
  for ((i=0; i <= ${#keys[@]} - 2; i++))
  do
    for ((j=i + 1; j <= ${#keys[@]} - 1; j++))
    do
      if [[ ${keys[i]} > ${keys[j]} ]]
      then
        # swap keys
        tmpk=${keys[i]}
        keys[i]=${keys[j]}
        keys[j]=$tmpk
        # swap values
        tmpv=${values[i]}
        values[i]=${values[j]}
        values[j]=$tmpv
      fi
    done
  done
  printf "%s\n" "${values[@]}"
}

function sort_keyed_f2 {
  local c1 c2 keys values
  while IFS=' ' read -r c1 c2
  do
    keys+=("$c2")
    values+=("$c1 $c2")
  done < "$1"

  local i j tmpk tmpv
  for ((i=0; i <= ${#keys[@]} - 2; i++))
  do
    for ((j=i + 1; j <= ${#keys[@]} - 1; j++))
    do
      if [[ ${keys[i]} -gt ${keys[j]} ]]
      then
        # swap keys
        tmpk=${keys[i]}
        keys[i]=${keys[j]}
        keys[j]=$tmpk
        # swap values
        tmpv=${values[i]}
        values[i]=${values[j]}
        values[j]=$tmpv
      fi
    done
  done
  printf "%s\n" "${values[@]}"
}

Even after all of that, you still need one of your shell's core "functions": redirecting the output to a file.

sort_keyed_f1 input-file > alpha_sorted.txt
sort_keyed_f2 input-file > numbers_sorted.txt
Jeff Schaller