Sort through numbers

Question

I need to write a script (file.sh) that sorts a list by ISBN number (the third column), reading from an input file (file.input) and sending the result to an output file (file.out). The input file would contain a list like this:

Donald Smith,Fire Lands,97868545414459
Adam Barry,The Armies,97564325678855
Jennifer Lelan,Childhood dreams,97546766544237

The script should use a looping structure to process the data and print the column titles Author, Name of Book and ISBN.

The result should be:

Author          Name of Book      ISBN
Jennifer Lelan  Childhood dreams  97546766544237
Adam Barry      The Armies        97564325678855
Donald Smith    Fire Lands        97868545414459

Kusalananda · Answer 1 · 2023-01-05T22:58:21.950

This first part is a new answer added in 2023. The old answer is still available after the divider.

$ mlr --icsv --implicit-csv-header --opprint  label Author,Title,ISBN then sort -n ISBN file.input
Author         Title            ISBN
Jennifer Lelan Childhood dreams 97546766544237
Adam Barry     The Armies       97564325678855
Donald Smith   Fire Lands       97868545414459

This uses Miller (mlr) to read the data as header-less CSV, add labels to the three fields, sort the records numerically on the ISBN field, and output all data in a "pretty printed" tabular format. Since Miller is CSV-aware, this would cope with quoted fields containing embedded commas and newlines, etc.

Use 'Name of Book' (quoted) in place of Title for the longer header, as in the question. Use --otsv instead of --opprint to generate tab-separated value output.

Redirect the command with >file.output to overwrite or create the file file.output.

----

First of all, you would not loop over this data; see "Why is using a shell loop to process text considered bad practice?".

If the only commas in the file are the commas that delimit the fields, then

sort -t ',' -k3n -o file.output file.input

would sort the data numerically on the number in the third column. The output would be written to file.output.

For the given data, file.output would look like

Jennifer Lelan,Childhood dreams,97546766544237
Adam Barry,The Armies,97564325678855
Donald Smith,Fire Lands,97868545414459
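
As a small variation, here is a self-contained sketch of the same sort. Note that -k3n makes the sort key start at the third field and run to the end of the line, while -k3,3n ends the key at the third field. For the three-column data in the question both give the same result; the second form just states the intent more explicitly. The scratch directory and the variable name tmpdir are only there so that the snippet can be run as it stands.

```shell
# Recreate the sample data from the question in a scratch directory.
tmpdir=$(mktemp -d)
cat >"$tmpdir/file.input" <<'EOF'
Donald Smith,Fire Lands,97868545414459
Adam Barry,The Armies,97564325678855
Jennifer Lelan,Childhood dreams,97546766544237
EOF

# -t ','  : fields are comma-delimited
# -k3,3n  : the key starts and ends at field 3 and is compared numerically
# -o      : write the sorted result to the named file
sort -t ',' -k3,3n -o "$tmpdir/file.output" "$tmpdir/file.input"

cat "$tmpdir/file.output"
```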

To further process this data, one could consider using an awk program. Since you have not specified what kind of processing you'd like to do, the following just extracts the data into variables (not really necessary) for each line and prints them:

sort -t ',' -k3n file.input |
awk -F ',' '{ author=$1; title=$2; isbn=$3;
              printf("Author: %s\nTitle: %s\nISBN: %s\n",
                     author, title, isbn) }'

Note that there is no need to store the sorted data in an intermediate file in this case.

The output given the data in the question:

Author: Jennifer Lelan
Title: Childhood dreams
ISBN: 97546766544237
Author: Adam Barry
Title: The Armies
ISBN: 97564325678855
Author: Donald Smith
Title: Fire Lands
ISBN: 97868545414459
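
As one possible kind of further processing, the same pipeline could print a header and fixed-width columns directly from awk. This is only a sketch: the function name sort_and_format and the column width of 20 characters are assumptions made for illustration, and a field longer than that width would push the following columns out of alignment.

```shell
# Hypothetical helper: reads author,title,ISBN lines on standard input,
# sorts them numerically on the third field and prints aligned columns.
sort_and_format() {
    sort -t ',' -k3n |
    awk -F ',' '
        BEGIN { fmt = "%-20s %-20s %s\n"
                printf(fmt, "Author", "Name of book", "ISBN") }
              { printf(fmt, $1, $2, $3) }'
}

# Usage with the data from the question:
sort_and_format <<'EOF'
Donald Smith,Fire Lands,97868545414459
Adam Barry,The Armies,97564325678855
Jennifer Lelan,Childhood dreams,97546766544237
EOF
```

The header is printed from the BEGIN block, so it is not part of the sorted data and always ends up on the first line.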

For getting the data into nice looking columns, and with dashes in the ISBN number, you don't need awk. The following uses sed for the formatting of the ISBN numbers and column to format the columns:

sort -t ',' -k3n file.input |
sed -E -e 's/,([0-9]{3})([0-9]{4})([0-9]{5})/,\1-\2-\3-/' |
column -s ',' -t

The output will be

Jennifer Lelan  Childhood dreams  975-4676-65442-37
Adam Barry      The Armies        975-6432-56788-55
Donald Smith    Fire Lands        978-6854-54144-59

Note that the ISBN numbers look a bit wonky. That's because they are 14 digits long. Real ISBN numbers are either 10 or 13 digits long, and the sed expression above assumes 13 digits: it matches the first 12 digits, groups them as 3-4-5, and leaves whatever follows after the last dash.
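
To spot such entries before formatting, the length of the third field could be checked first. A rough sketch, assuming plain comma-delimited input as above; check_isbn_length is a made-up name, and the check only counts characters, it does not validate check digits.

```shell
# Hypothetical helper: reports lines whose third field is not 10 or 13
# characters long. Reads author,title,ISBN lines on standard input.
check_isbn_length() {
    awk -F ',' '
        { n = length($3) }
        n != 10 && n != 13 {
            printf("line %d: ISBN %s has %d characters\n", NR, $3, n)
        }'
}

# Usage with the data from the question:
check_isbn_length <<'EOF'
Donald Smith,Fire Lands,97868545414459
Adam Barry,The Armies,97564325678855
Jennifer Lelan,Childhood dreams,97546766544237
EOF
```

For the data in the question, this reports all three lines, since each ISBN there is 14 characters long.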

To add column headers:

sort -t ',' -k3n file.input |
{ echo 'Author,Name of book,ISBN'
  sed -E -e 's/,([0-9]{3})([0-9]{4})([0-9]{5})/,\1-\2-\3-/'
} |
column -s ',' -t

Which produces

Author          Name of book      ISBN
Jennifer Lelan  Childhood dreams  975-4676-65442-37
Adam Barry      The Armies        975-6432-56788-55
Donald Smith    Fire Lands        978-6854-54144-59

... using no explicit loops in the shell.

score 4 · Answer 2 · answered Jun 13 '18 at 15:58

sort is clearly the best tool for sorting.

If awk is required, you can use GNU awk:

gawk -F, '
    {line[$NF] = $0} 
    END {
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (isbn in line) print line[isbn]
    }
' file

See https://www.gnu.org/software/gawk/manual/html_node/Controlling-Array-Traversal.html and https://www.gnu.org/software/gawk/manual/html_node/Controlling-Scanning.html
