
I have a text file on Linux with around 8k lines in the following format:

|f_name:x|l_name:x|address:x x|city:x|state:x|zip:x|country:x|ordernumber:x|code:x|downloaded:x|exp:09/2017|ip:x.x.x.x|

And I would like to rearrange each line into the following format:

ordernumber:x,exp:09/2017,code:x,f_name:x,l_name:x,address:x x,city:x,state:x,zip:x,country:x,ip:x.x.x.x

NOTE

Some lines in the file have an issue with the |address:x x| field:

it can appear as |address:x x |, meaning there is a space before the closing |; I would like that space (or spaces) removed in the output.

There is also an issue with the |exp:09/2017| field: some lines show it as |exp:9/2017|, so I would like a 0 added when the month is a single digit, making it 09/2017 in the output.

Note that the year can be different.

Example:

|f_name:x|l_name:x|address:x x |city:x|state:x|zip:x|country:x|ordernumber:x|code:x|downloaded:x|exp:9/2017|ip:x.x.x.x|

Expected Output:

ordernumber:x,exp:09/2017,code:x,f_name:x,l_name:x,address:x x,city:x,state:x,zip:x,country:x,ip:x.x.x.x

2 Answers


Summary

I wrote an Awk script, a Python script, and a Bash script, each of which should solve your problem. They all produce identical output.

Here is the example data (taken from your question and put in the file data.csv):

|f_name:x|l_name:x|address:x x|city:x|state:x|zip:x|country:x|ordernumber:x|code:x|downloaded:x|exp:09/2017|ip:x.x.x.x|

And here is the output on running the scripts:

ordernumber:x,exp:09/2017,code:x,f_name:x,l_name:x,address:x x,city:x,state:x,zip:x,country:x,ip:x.x.x.x

Awk

Here is an awk script:

#!/usr/bin/env awk
# transformcsv.awk

# Set the input field-separator (FS) and the output field-separator (OFS)
BEGIN{
    FS="|";
    OFS=",";
}

# Skip empty lines
/^\s*$/ {next;}

# Print lines with the fields reordered as desired
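# ($9=ordernumber, $12=exp, $10=code, $2=f_name, $3=l_name, $4=address,
#  $5=city, $6=state, $7=zip, $8=country, $13=ip)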
{
   print $9,$12,$10,$2,$3,$4,$5,$6,$7,$8,$13
}

And here is how you would run it:

awk -f transformcsv.awk data.csv

You could also run this as a one-liner:

awk 'BEGIN{FS="|";OFS=",";}/^\s*$/ {next;}{print $9,$12,$10,$2,$3,$4,$5,$6,$7,$8,$13}' data.csv

Python

Here is the Python script:

#!/usr/bin/env python
# -*- coding: ascii -*-
"""transformcsv.py"""

import sys
import csv

# Make a list with the field names in their input order
# NOTE: We add padding columns because each row begins
#       and ends with the delimiter `|`
fieldnames = (
    "padding_1",
    "f_name", "l_name", "address", "city", "state", "zip",
    "country", "ordernumber", "code", "downloaded", "exp", "ip",
    "padding_2"
)

# Make a list with the field names in their output order
reordered_fieldnames = (
    "ordernumber", "exp", "code", "f_name", "l_name",
    "address", "city", "state", "zip", "country", "ip"
)

# Read each input row and print out the reordered row
with open(sys.argv[1]) as csvfile:
    reader = csv.DictReader(csvfile, fieldnames=fieldnames, delimiter='|')
    for row in reader:
        print(','.join([row[field] for field in reordered_fieldnames]))

Here is how you would run the script:

python transformcsv.py data.csv
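
Note that the script above only reorders the fields. The two clean-ups described in the question's notes (zero-padding a single-digit month in exp and removing trailing spaces from address) would still need to be applied to the row values before printing. Here is a minimal Python sketch of just those two steps (the helper names are illustrative, not part of the script above):

def pad_exp_month(exp):
    # Zero-pad a single-digit month: 'exp:9/2017' -> 'exp:09/2017'
    # (assumes the value has the form label:month/year)
    label, date = exp.split(":", 1)
    month, year = date.split("/", 1)
    return "%s:%02d/%s" % (label, int(month), year)

def strip_trailing_spaces(value):
    # Remove trailing spaces: 'address:x x ' -> 'address:x x'
    return value.rstrip()

In the for-loop above, these would be applied as row["exp"] = pad_exp_month(row["exp"]) and row["address"] = strip_trailing_spaces(row["address"]) before the print call.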

Bash

NOTE: This will probably be very slow on large files. You probably shouldn't use this - I only included it for fun.

Here is the Bash shell-script:

#!/usr/bin/env bash
# transformcsv.sh

while read LINE; do
    if [[ -n "${LINE}" ]]; then

    # Extract the field values
    f_name="$(echo "${LINE}" | cut -d'|' -f2)"
    l_name="$(echo "${LINE}" | cut -d'|' -f3)"
    address="$(echo "${LINE}" | cut -d'|' -f4)"
    city="$(echo "${LINE}" | cut -d'|' -f5)"
    state="$(echo "${LINE}" | cut -d'|' -f6)"
    zip="$(echo "${LINE}" | cut -d'|' -f7)"
    country="$(echo "${LINE}" | cut -d'|' -f8)"
    ordernumber="$(echo "${LINE}" | cut -d'|' -f9)"
    code="$(echo "${LINE}" | cut -d'|' -f10)"
    downloaded="$(echo "${LINE}" | cut -d'|' -f11)"
    exp="$(echo "${LINE}" | cut -d'|' -f12)"
    ip="$(echo "${LINE}" | cut -d'|' -f13)"

    # Output the reordered row
    printf \
        "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n" \
        "${ordernumber}" "${exp}" "${code}" "${f_name}" "${l_name}" \
        "${address}" "${city}" "${state}" "${zip}" "${country}" "${ip}"

    fi
done < "$1"

And here is how you would run it:

bash transformcsv.sh data.csv
igal
  • Oh dear, the shell variant is going to be slow, you're forking subshells and invoking cut a dozen times for each input line! You could use while IFS='|' read -r a b c d ... to do it all in the shell, which would probably be usable (though the shell is a bit slow compared to e.g. awk in any case). There's some discussion of that here: Why is using a shell loop to process text considered bad practice? – ilkkachu Dec 20 '17 at 04:35
  • Besides, you shouldn't use printf "$somevar,$othervar\n" but printf "%s,%s\n" "$somevar" "$othervar" instead. Just consider what happens if some of the variables contain a %. And this goes for any language actually, not just the shell. – ilkkachu Dec 20 '17 at 04:38
  • @ilkkachu Thank you for the comment. The bash script was just for fun. I added a warning and reordered the scripts to put it last. – igal Dec 20 '17 at 04:51
  • @igal Good morning igal, and thank you very much, but there are still the 2 points I explained in my question: if the field is exp:9/2017, I would like a 0 added when the month is a single digit, so it becomes exp:09/2017. – αԋɱҽԃ αмєяιcαη Dec 20 '17 at 08:34
  • @igal And the second point: if the address is, for example, |address:102 test st |, I would like the trailing space removed so the output is address:102 test st, with no space at the end. – αԋɱҽԃ αмєяιcαη Dec 20 '17 at 08:36

GNU awk solution:

awk '{
         # Split the exp field ($12) on ":" and "/" into label, month and year
         split($12, a, /[/:]/);
         # Zero-pad the month when it is a single digit (exp:9/2017 -> exp:09/2017)
         if (length(a[2]) == 1) $12 = sprintf("%s:%02d/%d", a[1], a[2], a[3]);
         # Remove any trailing spaces from the address field ($4)
         sub(/ *$/, "", $4);
         # Print the fields in the required output order
         print $9, $12, $10, $2, $3, $4, $5, $6, $7, $8, $13
     }' FS='|' OFS=',' file

The output:

ordernumber:x,exp:09/2017,code:x,f_name:x,l_name:x,address:x x,city:x,state:x,zip:x,country:x,ip:x.x.x.x