How to split a single column into multiple columns in a CSV file

Question

I have the text below in a CSV file and need to place FILE and TIMESTAMP into separate columns of a CSV file. Could you please let me know how I can achieve this?

FILE, TIMESTAMP
/u01/app/xxcus/12.0.0/mds/cvs/oracle/apps/xxcus/receipt/server/XxReceipt.java, Thu 28 Jun 2018 02:49:45 AM EDT
/u01/app/xxcus/12.0.0/mds/cvs/oracle/apps/xxcus/receipt/webui/XxReceiptCreateCO.java, Thu 28 Jun 2018 09:00:43 AM EDT
/u01/app/xxcus/12.0.0/mds/cvs/oracle/apps/xxcus/receipt/webui/XxOlympusReceiptPG.xml, Thu 28 Jun 2018 05:16:46 AM EDT
/u01/app/xxcus/12.0.0/reports/US/XX_POXRCPPV.rdf, Thu 28 Jun 2018 12:31:29 PM EDT
/u01/app/xxcus/12.0.0/reports/US/XX_POXRCIPS.rdf, Thu 28 Jun 2018 12:31:40 PM EDT

Note: I have tried the column command, but it did not help.

What you actually want to do is very unclear. You should give example(s) of what would be a correct output for a given input. — Mathieu CAROFF, Jul 05 '18 at 20:42
@MathieuCAROFF I need the output in tabular format, so that the full path of the file comes under the FILE column and the time under the TIMESTAMP column. Currently both are under a single column in the CSV file. — Abraham Dev Prasad, Jul 06 '18 at 13:58
@AbrahamDevPrasad The file listed looks like it is in CSV format (where CSV is comma separated). Do you want the output in a different format, such as tab-separated (TSV)? (If so, consider using a csv2tsv converter.) It would help if you clarify what you want to do with the result. The column command is used to pretty-print for reading, but is less useful for subsequent processing by other tools. — JonDeg, Jul 07 '18 at 04:50
Hi @MathieuCAROFF your above example is already set with FILE and TIMESTAMP into separate columns of a csv file. — aborruso, Jan 06 '19 at 17:59

score 1 · Accepted Answer · answered Jul 07 '18 at 13:49

The `sed` way

If you want to replace each comma followed by a space (,␣) with a tab in your file, you can pipe its content through sed. Here is an example (the \t escape in the replacement is understood by GNU sed):

$ echo '/apps/XxReceipt.java, Thu 28 Jun 2018 02:49:45 AM EDT' | sed 's:, :\t:g'
/apps/XxReceipt.java    Thu 28 Jun 2018 02:49:45 AM EDT

Explanation:

The single quotes around s:, :\t:g tell the shell to pass the string as is, as a single argument, to sed.
For sed, s in first position means substitution.
: is the delimiter between the pattern and the replacement.
,␣ is the pattern to match.
\t is the replacement: an escape sequence for a tab.
g (global) tells sed to replace every match on the line, not just the first.
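
To see the same substitution applied to a header line plus a data row, here is a small sketch. The function name csv_to_tsv is made up for illustration. It passes a literal tab to sed through a shell variable, on the assumption that your sed may not understand \t in the replacement (GNU sed does).

```shell
#!/bin/sh
# csv_to_tsv is a hypothetical helper name, not part of sed.
# It reads comma-space separated lines on stdin and writes tab-separated lines.
csv_to_tsv() {
    tab=$(printf '\t')      # a literal tab character
    sed "s:, :${tab}:g"     # same substitution as above, with the literal tab
}

# Sample input: the header and one shortened row from the question.
printf '%s\n' \
    'FILE, TIMESTAMP' \
    '/apps/XxReceipt.java, Thu 28 Jun 2018 02:49:45 AM EDT' |
    csv_to_tsv
```

Both lines come out with a single tab between the two fields, so the header lines up with the data in any tool that reads tab-separated input.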

If you need to match more complex patterns with sed, you can use the -E switch, so that patterns are interpreted as extended regular expressions (without it, sed uses basic regular expressions). You can chain multiple sed expressions if you prefix each with -e.
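
As a rough illustration of -E combined with chained -e expressions, the sketch below turns a comma followed by any number of spaces into one tab and then strips trailing spaces. The function name normalize_fields and the extra spaces in the sample line are assumptions made for the example, not something from the question's data.

```shell
#!/bin/sh
# normalize_fields is a hypothetical name used only for this illustration.
# First -e:  a comma followed by any run of spaces becomes one tab.
# Second -e: one or more spaces at the end of the line are removed
#            (the + quantifier is why -E is used here).
normalize_fields() {
    tab=$(printf '\t')      # a literal tab character
    sed -E -e "s/, */${tab}/g" -e 's/ +$//'
}

# Sample line with irregular spacing added on purpose.
printf '%s\n' '/apps/XxReceipt.java,   Thu 28 Jun 2018 02:49:45 AM EDT   ' |
    normalize_fields
```

The expressions run in the order given, each one working on the result of the previous one.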

If the CSV data is in a file, here is how to pipe it through sed, printing the result and saving it to my-data.tsv at the same time:

cat my-data.csv | sed 's:, :\t:g' | tee my-data.tsv

or, to write the file without also printing the result to the terminal:

cat my-data.csv | sed 's:, :\t:g' > my-data.tsv

The sed way worked for me, as I cannot install additional software. Thanks @Mathieu — Abraham Dev Prasad, Apr 16 '19 at 20:31

Kusalananda · Answer 2 · 2018-07-17T11:58:11.723

The way I'm reading this question is that you'd like to create CSV-formatted output from a list of files. For another interpretation, see the end of this answer.

Here's a shell script that will do this. It uses the Linux version of stat to get the timestamp of last modification.

#!/bin/sh

echo "PATHNAME,TIMESTAMP"
stat -c '"%n",%y' "$@"

After outputting a header, this script simply calls stat with the pathnames mentioned on the command line to get the timestamp of last modification (see the manual for stat on your system to figure out how to change this). It prints the pathname (quoted) and the timestamp.
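
One thing the short script does not handle is a pathname that itself contains a double quote. Here is a possible variant, sketched under the assumption that you have GNU stat (the -c option) and that pathnames contain no newlines. The helper name csv_row is made up. It doubles any embedded double quote, which is the usual CSV convention for quotes inside a quoted field.

```shell
#!/bin/sh
# csv_row is a hypothetical helper; it assumes GNU stat (the -c option).
# It prints one CSV row: the pathname in double quotes, with any embedded
# double quote doubled, followed by the last-modification timestamp.
csv_row() {
    name=$(printf '%s' "$1" | sed 's/"/""/g')
    printf '"%s",%s\n' "$name" "$(stat -c '%y' -- "$1")"
}

echo "PATHNAME,TIMESTAMP"
for pathname in "$@"; do
    csv_row "$pathname"
done
```

This calls stat once per file, so it is slower than the single stat call above when there are many pathnames.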

You would use this as

sh script.sh PATTERN >outputfile

For example:

$ sh script.sh *.log* *.tar >file.csv

$ cat file.csv
PATHNAME,TIMESTAMP
"dsmerror.log",2018-07-17 13:00:02.911711652 +0200
"dsminstr.log",2018-07-17 13:00:04.079726608 +0200
"dsminstr.log.bak",2018-05-13 18:00:03.231791181 +0200
"dsminstr.log.lock",2018-07-17 13:00:04.079726608 +0200
"archive_20170823-old.tar",2017-08-22 16:44:23.037803149 +0200
"archive_20170823.tar",2017-08-23 09:35:28.956158119 +0200
"archive_20180409.tar",2018-04-09 09:47:29.472374428 +0200
"archive-chr22.tar",2018-06-19 14:50:45.896447161 +0200
"gene_cache.tar",2018-04-25 09:44:15.518486626 +0200

Since the script is so short, its commands may be written directly on the command line. The equivalent command line for the example above would be

$ { echo "PATHNAME,TIMESTAMP"; stat -c '"%n",%y' *.log* *.tar; } >file.csv

Now that we have this file, we may want to format it nicely for reporting purposes:

$ column -s, -t file.csv
PATHNAME                    TIMESTAMP
"dsmerror.log"              2018-07-17 13:00:02.911711652 +0200
"dsminstr.log"              2018-07-17 13:00:04.079726608 +0200
"dsminstr.log.bak"          2018-05-13 18:00:03.231791181 +0200
"dsminstr.log.lock"         2018-07-17 13:00:04.079726608 +0200
"archive_20170823-old.tar"  2017-08-22 16:44:23.037803149 +0200
"archive_20170823.tar"      2017-08-23 09:35:28.956158119 +0200
"archive_20180409.tar"      2018-04-09 09:47:29.472374428 +0200
"archive-chr22.tar"         2018-06-19 14:50:45.896447161 +0200
"gene_cache.tar"            2018-04-25 09:44:15.518486626 +0200

This works unless any of the pathnames contains a comma, because column splits on every comma, including commas inside quoted fields.
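
To see why a comma in a pathname is a problem for any tool that splits on every comma, here is a tiny sketch using awk with the comma as field separator. The helper name count_comma_fields and the pathname report,final.log are made up for the example.

```shell
#!/bin/sh
# count_comma_fields is a hypothetical helper: it reports how many fields a
# naive split on every comma produces for each input line.
count_comma_fields() {
    awk -F, '{ print NF }'
}

# The pathname below is invented; it contains a comma inside the quotes.
printf '%s\n' '"report,final.log",2018-07-17 13:00:02.911711652 +0200' |
    count_comma_fields
# prints 3, although the row has only two CSV fields
```

The quotes mean nothing to a plain separator split, so the pathname is cut in two and the timestamp ends up in a third column.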

To properly format this with a CSV parser, which would also cope with pathnames containing commas:

$ csvlook file.csv
| PATHNAME                 | TIMESTAMP                           |
| ------------------------ | ----------------------------------- |
| dsmerror.log             | 2018-07-17 13:00:02.911711652 +0200 |
| dsminstr.log             | 2018-07-17 13:00:04.079726608 +0200 |
| dsminstr.log.bak         | 2018-05-13 18:00:03.231791181 +0200 |
| dsminstr.log.lock        | 2018-07-17 13:00:04.079726608 +0200 |
| archive_20170823-old.tar | 2017-08-22 16:44:23.037803149 +0200 |
| archive_20170823.tar     | 2017-08-23 09:35:28.956158119 +0200 |
| archive_20180409.tar     | 2018-04-09 09:47:29.472374428 +0200 |
| archive-chr22.tar        | 2018-06-19 14:50:45.896447161 +0200 |
| gene_cache.tar           | 2018-04-25 09:44:15.518486626 +0200 |

csvlook is part of csvkit, a Python toolkit for working with CSV files.
