
What I have

Hi, imagine a lot of files like this, where the first column is an epoch timestamp and the other column(s) are some data:

1000333,34,1
1001456,56,0
1005356,34,2

What I need

I need to transform them into this:

0,34,1
1123,56,0
5023,34,2

The first-column numbers above come from subtracting the first epoch:

 1000333 - 1000333 =    0
 1001456 - 1000333 = 1123
 1005356 - 1000333 = 5023

Context

Those files live in several folders inside a big folder called logs_swapoff, and they end with _times.csv (there are other CSVs in those folders that must not be touched).

Examples of files:

logs_swapoff/folder1/modifyMe_times.csv
logs_swapoff/folder1/dontTouchMe_cores.csv
logs_swapoff/folder2/modifyMeToo_times.csv

I am planning to use this loop in bash, but I don't know how to do the task itself.

for filename in $(find logs_swapoff/* -name '*_times.csv') ; do
    # filename without extension (to write the output with a similar name?)
    fname="$(dirname "$filename")/$(basename -s .csv "$filename")";

    ?????

done;

Thanks guys :)

1 Answer
Enumerating the files

Parsing the output of find is fragile. It's better to make find invoke the transformation program itself. To generate the output file name, a simple parameter expansion is enough to change the suffix _times.csv into _subtracted.csv (for example).

find logs_swapoff -name '*_times.csv' -exec sh -c '
  <"$1" awk "$0" >"${1%_times.csv}_subtracted.csv"
' '…' {} \;

The '…' is the awk code to run. I put it outside the shell snippet to simplify the quoting.
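As a quick illustration of that suffix substitution on one of the example paths from the question:

```shell
# ${var%suffix} strips a trailing suffix; appending builds the output name
f=logs_swapoff/folder1/modifyMe_times.csv
out="${f%_times.csv}_subtracted.csv"
echo "$out"   # prints logs_swapoff/folder1/modifyMe_subtracted.csv
```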

Transforming each file

You need to process a file line by line, and on each line do a simple text transformation involving some arithmetic. That makes awk an ideal tool for the job. The only difficulty with your sample output is that you seem to want to align to the smallest width; that can't be done without first reading the whole file to determine the maximum width. If you're content with a few extra spaces, you can process the file line by line.

awk '
    NR==1 {start = $1}
    {n = $1 - start; sub(/^ *[0-9]+/, ""); printf "%6d", n; print}
'

Explanation: on the first line, set the start variable to the first number. Then, on every line, subtract the value of start from the first number, and strip the first number. Print out the result of the subtraction (padded to 6 characters with spaces) and the rest of the line.
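For example, running this variant on whitespace-separated sample data (spaces rather than the question's commas, since the default field splitting is on whitespace):

```shell
# Feed three sample lines through the awk program above
printf '%s\n' '1000333 34 1' '1001456 56 0' '1005356 34 2' |
awk '
    NR==1 {start = $1}
    {n = $1 - start; sub(/^ *[0-9]+/, ""); printf "%6d", n; print}
'
# prints:
#      0 34 1
#   1123 56 0
#   5023 34 2
```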

This code assumes that there's always space after the first number. If this isn't the case, you can make a more precise match.

awk '
    NR==1 {match($0, /[0-9]+/); start = substr($0, RSTART, RLENGTH)}
    match($0, /[0-9]+/) {n = substr($0, RSTART, RLENGTH) - start; sub(/ *[0-9]+/, ""); printf "%6d", n; print}
'

If the fields are comma-separated and there are no spaces to worry about, declare the comma as the field separator. Also set the output field separator to a comma, because awk rebuilds the record with OFS (a space by default) as soon as you assign to a field. Then you can simply replace the first field with an updated value.

awk -F, -v OFS=, '
    NR==1 {start = $1}
    {$1 = $1 - start; print}
'
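A quick run on the sample lines from the question, with OFS set to a comma so the rebuilt record keeps its commas:

```shell
# Comma-separated variant: -F, sets the input separator, -v OFS=, the output one
printf '%s\n' 1000333,34,1 1001456,56,0 1005356,34,2 |
awk -F, -v OFS=, '
    NR==1 {start = $1}
    {$1 = $1 - start; print}
'
# prints:
# 0,34,1
# 1123,56,0
# 5023,34,2
```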

So putting it all together (comma version):

find logs_swapoff -name '*_times.csv' -exec sh -c '
  <"$1" awk -F, -v OFS=, "$0" >"${1%_times.csv}_subtracted.csv"
' '
    NR==1 {start = $1}
    {$1 = $1 - start; print}
' {} \;
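To convince yourself end to end, you can try the combined command in a throwaway directory with one sample file (a sandbox sketch, using a temporary directory rather than your real logs_swapoff tree):

```shell
# Build a miniature logs_swapoff tree, run the combined command, inspect the result
tmp=$(mktemp -d)
mkdir -p "$tmp/logs_swapoff/folder1"
printf '%s\n' 1000333,34,1 1001456,56,0 1005356,34,2 \
  > "$tmp/logs_swapoff/folder1/modifyMe_times.csv"
( cd "$tmp" &&
  find logs_swapoff -name '*_times.csv' -exec sh -c '
    <"$1" awk -F, -v OFS=, "$0" >"${1%_times.csv}_subtracted.csv"
  ' '
      NR==1 {start = $1}
      {$1 = $1 - start; print}
  ' {} \;
)
cat "$tmp/logs_swapoff/folder1/modifyMe_subtracted.csv"
# prints:
# 0,34,1
# 1123,56,0
# 5023,34,2
rm -rf "$tmp"
```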
  • Hi, thanks for your answer, I DO NOT actually need the alignment, it was just for better reading :) – onlycparra Feb 11 '16 at 01:48
  • Sorry sorry sorry, I made a mess with the spaces and alignments, the files don't actually have them, it was only an aesthetic issue, let me correct them. I'm a newbie in bash and Perl regular expressions drive me crazy, if you can, please simplify the answer assuming there are no spaces or padding. – onlycparra Feb 11 '16 at 01:58
  • @onlycparra See my edit – Gilles 'SO- stop being evil' Feb 11 '16 at 12:45