This first part is a new answer added in 2023. The old answer is still available after the divider.
$ mlr --icsv --implicit-csv-header --opprint label Author,Title,ISBN then sort -n ISBN file.input
Author         Title            ISBN
Jennifer Lelan Childhood dreams 97546766544237
Adam Barry     The Armies       97564325678855
Donald Smith   Fire Lands       97868545414459
This uses Miller (mlr) to read the data as header-less CSV, add labels to the three fields, sort the records numerically on the ISBN field, and output all data in a "pretty printed" tabular format. Since Miller is CSV-aware, this would cope with quoted fields containing embedded commas and newlines, etc.
Use 'Name of Book' (quoted) in place of Title for the longer header, as in the question. Use --otsv instead of --opprint to generate tab-separated value output.
Redirect the command with >file.output to overwrite or create the file file.output.
----
First of all, you would not loop over this data. See "Why is using a shell loop to process text considered bad practice?"
If the only commas in the file are the commas that delimit the fields, then
sort -t ',' -k3n -o file.output file.input
would sort the data numerically on the number in the third column. The output would be written to file.output.
For the given data, file.output would look like
Jennifer Lelan,Childhood dreams,97546766544237
Adam Barry,The Armies,97564325678855
Donald Smith,Fire Lands,97868545414459
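The condition about commas matters because sort splits each line on every comma, quoted or not. The following is a minimal sketch of what goes wrong otherwise. The quoted title "Fire Lands, The" is made-up test data, not from the question, and the temporary file stands in for file.input:

```shell
# Hypothetical data: the second record's quoted title contains a comma.
tmp=$(mktemp)
cat >"$tmp" <<'EOF'
Jennifer Lelan,Childhood dreams,97546766544237
Donald Smith,"Fire Lands, The",97868545414459
EOF

# For the second record, sort sees ' The"' as the start of field 3.
# That is not a number, so it compares as 0 and the record sorts
# first, even though its ISBN is the larger of the two.
first=$(sort -t ',' -k3n "$tmp" | head -n 1)
printf '%s\n' "$first"

rm -f "$tmp"
```

With data like this, a CSV-aware tool is the safer choice.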
To further process this data, one could consider using an awk program. Since you have not specified what kind of processing you'd like to do, the following just extracts the data into variables (not really necessary) for each line and prints them:
sort -t ',' -k3n file.input |
awk -F ',' '{ author=$1; title=$2; isbn=$3;
              printf("Author: %s\nTitle: %s\nISBN: %s\n",
                     author, title, isbn) }'
Note that there is no need to store the sorted data in an intermediate file in this case.
The output given the data in the question:
Author: Jennifer Lelan
Title: Childhood dreams
ISBN: 97546766544237
Author: Adam Barry
Title: The Armies
ISBN: 97564325678855
Author: Donald Smith
Title: Fire Lands
ISBN: 97868545414459
For getting the data into nice-looking columns, and with dashes in the ISBN number, you don't need awk. The following uses sed for the formatting of the ISBN numbers and column to format the columns:
sort -t ',' -k3n file.input |
sed -E -e 's/,([0-9]{3})([0-9]{4})([0-9]{5})/,\1-\2-\3-/' |
column -s ',' -t
The output will be
Jennifer Lelan  Childhood dreams  975-4676-65442-37
Adam Barry      The Armies        975-6432-56788-55
Donald Smith    Fire Lands        978-6854-54144-59
Note that the ISBN numbers look a bit wonky. That's because they are 14 digits long. Real ISBN numbers are either 10 or 13 digits long, and the above code assumes that they are 13 digits (or at least 12 digits).
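If you want to spot such records before formatting them, a length check along the following lines could help. This is only a sketch: the function name check_isbn_length is made up for illustration, it assumes plain comma-delimited input with the ISBN in the third field, and it tests the field length only, not the check digit.

```shell
# Print every third field that is not 10 or 13 characters long,
# together with its actual length.
check_isbn_length() {
    awk -F ',' 'length($3) != 10 && length($3) != 13 {
        printf("%s: %d characters\n", $3, length($3))
    }' "$@"
}

# One of the records from the question has a 14-digit number:
printf '%s\n' 'Adam Barry,The Armies,97564325678855' | check_isbn_length
```

It reads standard input when called without arguments, so it can also be run as check_isbn_length file.input.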
To add column headers:
sort -t ',' -k3n file.input |
{ echo 'Author,Name of book,ISBN'
  sed -E -e 's/,([0-9]{3})([0-9]{4})([0-9]{5})/,\1-\2-\3-/'
} |
column -s ',' -t
Which produces
Author          Name of book      ISBN
Jennifer Lelan  Childhood dreams  975-4676-65442-37
Adam Barry      The Armies        975-6432-56788-55
Donald Smith    Fire Lands        978-6854-54144-59
... using no explicit loops in the shell.