
I have a text file on Linux with around 8k lines in the following format:

|f_name:x|l_name:x|address:x x|city:x|state:x|zip:x|country:x|ordernumber:x|code:x|downloaded:x|exp:09/2017|ip:x.x.x.x|

And I would like to rearrange each line into the following format:

ordernumber:x,exp:09/2017,code:x,f_name:x,l_name:x,address:x x,city:x,state:x,zip:x,country:x,ip:x.x.x.x

NOTE

Some lines in the file have an issue with the |address:x x| field:

it can appear as |address:x x |, meaning there is a space before the closing |; I would like that space (or spaces) removed in the output.

There is also an issue with the |exp:09/2017| field: some lines show it as |exp:9/2017|, so I would like a 0 added when the month is a single digit, making it 09/2017 in the output.

Note that the year can be different.

Example:

|f_name:x|l_name:x|address:x x |city:x|state:x|zip:x|country:x|ordernumber:x|code:x|downloaded:x|exp:9/2017|ip:x.x.x.x|

Expected Output:

ordernumber:x,exp:09/2017,code:x,f_name:x,l_name:x,address:x x,city:x,state:x,zip:x,country:x,ip:x.x.x.x

2 Answers


Summary

I wrote an Awk script, a Python script, and a Bash script, each of which should solve your problem. They all produce identical output.

Here is the example data (taken from your question and put in the file data.csv):

|f_name:x|l_name:x|address:x x|city:x|state:x|zip:x|country:x|ordernumber:x|code:x|downloaded:x|exp:09/2017|ip:x.x.x.x|

And here is the output on running the scripts:

ordernumber:x,exp:09/2017,code:x,f_name:x,l_name:x,address:x x,city:x,state:x,zip:x,country:x,ip:x.x.x.x

Awk

Here is an awk script:

#!/usr/bin/env awk
# transformcsv.awk

# Set the input field-separator (FS) and the output field-separator (OFS)
BEGIN{
    FS="|";
    OFS=",";
}

# Skip empty lines
/^\s*$/ {next;}

# Print lines with the fields reordered as desired
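# ($9=ordernumber, $12=exp, $10=code, $2=f_name, $3=l_name, $4=address,
#  $5=city, $6=state, $7=zip, $8=country, $13=ip)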
{
   print $9,$12,$10,$2,$3,$4,$5,$6,$7,$8,$13
}

And here is how you would run it:

awk -f transformcsv.awk data.csv

You could also run this as a one-liner:

awk 'BEGIN{FS="|";OFS=",";}/^\s*$/ {next;}{print $9,$12,$10,$2,$3,$4,$5,$6,$7,$8,$13}' data.csv

Python

Here is the Python script:

#!/usr/bin/env python
# -*- coding: ascii -*-
"""transformcsv.py"""

import sys
import csv

# Make a list with the field names in their input order
# NOTE: We add padding columns because each row begins
#       and ends with the delimiter `|`
fieldnames = (
    "padding_1",
    "f_name", "l_name", "address", "city", "state", "zip",
    "country", "ordernumber", "code", "downloaded", "exp", "ip",
    "padding_2"
)

# Make a list with the field names in their output order
reordered_fieldnames = (
    "ordernumber", "exp", "code", "f_name", "l_name",
    "address", "city", "state", "zip", "country", "ip"
)

# Read each input row and print out the reordered row
with open(sys.argv[1]) as csvfile:
    reader = csv.DictReader(csvfile, fieldnames=fieldnames, delimiter='|')
    for row in reader:
        print(','.join([row[field] for field in reordered_fieldnames]))

Here is how you would run the script:

python transformcsv.py data.csv
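
Note that the script above only reorders the fields. The two clean-ups described in the question's notes (zero-padding a single-digit month in exp and removing trailing spaces from address) would still need to be applied to the row values before printing. Here is a minimal Python sketch of just those two steps (the helper names are illustrative, not part of the script above):

def pad_exp_month(exp):
    # Zero-pad a single-digit month: 'exp:9/2017' -> 'exp:09/2017'
    # (assumes the value has the form label:month/year)
    label, date = exp.split(":", 1)
    month, year = date.split("/", 1)
    return "%s:%02d/%s" % (label, int(month), year)

def strip_trailing_spaces(value):
    # Remove trailing spaces: 'address:x x ' -> 'address:x x'
    return value.rstrip()

In the for-loop above, these would be applied as row["exp"] = pad_exp_month(row["exp"]) and row["address"] = strip_trailing_spaces(row["address"]) before the print call.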

Bash

NOTE: This will probably be very slow on large files. You probably shouldn't use this - I only included it for fun.

Here is the Bash shell-script:

#!/usr/bin/env bash
# transformcsv.sh

while read LINE; do
    if [[ -n "${LINE}" ]]; then

    # Extract the field values
    f_name="$(echo "${LINE}" | cut -d'|' -f2)"
    l_name="$(echo "${LINE}" | cut -d'|' -f3)"
    address="$(echo "${LINE}" | cut -d'|' -f4)"
    city="$(echo "${LINE}" | cut -d'|' -f5)"
    state="$(echo "${LINE}" | cut -d'|' -f6)"
    zip="$(echo "${LINE}" | cut -d'|' -f7)"
    country="$(echo "${LINE}" | cut -d'|' -f8)"
    ordernumber="$(echo "${LINE}" | cut -d'|' -f9)"
    code="$(echo "${LINE}" | cut -d'|' -f10)"
    downloaded="$(echo "${LINE}" | cut -d'|' -f11)"
    exp="$(echo "${LINE}" | cut -d'|' -f12)"
    ip="$(echo "${LINE}" | cut -d'|' -f13)"

    # Output the reordered row
    printf \
        "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n" \
        "${ordernumber}" "${exp}" "${code}" "${f_name}" "${l_name}" \
        "${address}" "${city}" "${state}" "${zip}" "${country}" "${ip}"

    fi
done < "$1"

And here is how you would run it:

bash transformcsv.sh data.csv
igal
  • Oh dear, the shell variant is going to be slow, you're forking subshells and invoking cut a dozen times for each input line! You could use while IFS='|' read -r a b c d ... to do it all in the shell, which would probably be usable (though the shell is a bit slow compared to e.g. awk in any case). There's some discussion of that here: Why is using a shell loop to process text considered bad practice? – ilkkachu Dec 20 '17 at 04:35
  • Besides, you shouldn't use printf "$somevar,$othervar\n" but printf "%s,%s\n" "$somevar" "$othervar" instead. Just consider what happens if some of the variables contain a %. And this goes for any language actually, not just the shell. – ilkkachu Dec 20 '17 at 04:38
  • @ilkkachu Thank you for the comment. The bash script was just for fun. I added a warning and reordered the scripts to put it last. – igal Dec 20 '17 at 04:51
  • @igal Good morning igal, and thank you very much, but there are still the 2 points I explained in my question: if the field is exp:9/2017, I would like a 0 added when the month is a single digit, so it becomes exp:09/2017. – αԋɱҽԃ αмєяιcαη Dec 20 '17 at 08:34
  • @igal And the second point: if the address is, for example, |address:102 test st |, I would like the trailing space removed so the output is address:102 test st, with no space at the end. – αԋɱҽԃ αмєяιcαη Dec 20 '17 at 08:36

GNU awk solution:

awk '{
         # Split the exp field ($12) on ":" and "/" into label, month and year
         split($12, a, /[/:]/);
         # Zero-pad the month when it is a single digit (exp:9/2017 -> exp:09/2017)
         if (length(a[2]) == 1) $12 = sprintf("%s:%02d/%d", a[1], a[2], a[3]);
         # Remove any trailing spaces from the address field ($4)
         sub(/ *$/, "", $4);
         # Print the fields in the required output order
         print $9, $12, $10, $2, $3, $4, $5, $6, $7, $8, $13
     }' FS='|' OFS=',' file

The output:

ordernumber:x,exp:09/2017,code:x,f_name:x,l_name:x,address:x x,city:x,state:x,zip:x,country:x,ip:x.x.x.x