
I have some .csv log files in two subdirectories of a top-level directory. I want to empty all the .csv log files in each directory but keep the header, so that the app that creates them can repopulate them.

I can use for file in /path/to/file/*; do > "$file"; done to empty the files, but the header is also removed!

Jeff Schaller

2 Answers

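tmpfile=$( mktemp )

for pathname in /path/to/dir/*.csv; do
    # Save the header (assumed to be the first line) to the temporary file...
    head -n 1 "$pathname" >"$tmpfile"
    # ...then truncate the original file and write the header back.
    cat "$tmpfile" >"$pathname"
done

rm "$tmpfile"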

That is, extract the header with head -n 1 into a temporary file (assuming the header is only the first line), then truncate the original file and write the header back from the temporary file.
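The temporary file is there for a reason: the shell opens and truncates the target of a > redirection before the command itself runs. A minimal, throwaway demonstration of what happens without it, using a scratch directory from mktemp -d (the file name log.csv is just an example):

```shell
#!/bin/sh
# Why not simply "head -n 1 file >file"? Try it on a scratch copy.
demo=$(mktemp -d)
printf 'id,value\n1,a\n2,b\n' >"$demo/log.csv"

# The shell truncates log.csv for the redirection before head starts,
# so head reads an empty file and writes nothing back.
head -n 1 "$demo/log.csv" >"$demo/log.csv"

size=$(wc -c <"$demo/log.csv" | tr -d ' ')
echo "log.csv is now $size bytes"   # the header is gone as well

rm -r "$demo"
```

It prints log.csv is now 0 bytes, which is exactly the symptom described in the question.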

If the header is exactly identical in all files:

tmpfile=$( mktemp )
set -- /path/to/dir/*.csv

head -n 1 "$1" >"$tmpfile"

for pathname do
    cat "$tmpfile" >"$pathname"
done

rm "$tmpfile"

This first sets the positional parameters to the list of files we're interested in, then extracts the header from the first of them. The loop then iterates over the positional parameters (for pathname do is shorthand for for pathname in "$@"; do), truncating each CSV file and inserting the header.

In both examples above, the pattern /path/to/dir/*.csv is assumed to match all affected files. A real-world pattern might be

/var/log/myprogram/dir1/*.csv /var/log/myprogram/dir2/*.csv

or, if you're using a shell that understands brace expansion:

/var/log/myprogram/{dir1,dir2}/*.csv
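Putting the pieces together, here is a minimal sketch of the first approach as a reusable function. reset_csv_logs is a hypothetical name, the number of header lines is a parameter instead of being fixed at 1, and anything that isn't a regular file is skipped: if one of the directories contains no .csv files, the shell leaves its pattern unexpanded, and without that check cat would create a file literally named *.csv.

```shell
#!/bin/sh
# Hypothetical helper: reset_csv_logs NLINES FILE...
# Keeps the first NLINES lines of each FILE (that file's own header)
# and discards everything after them.
reset_csv_logs() {
    nlines=$1
    shift                             # "$@" is now the list of files
    tmpfile=$(mktemp) || return 1
    for pathname do
        # An unmatched glob stays literal; skip it and anything odd.
        [ -f "$pathname" ] || continue
        head -n "$nlines" "$pathname" >"$tmpfile" &&
            cat "$tmpfile" >"$pathname"
    done
    rm -f "$tmpfile"
}

# Example (placeholder paths, one header line):
# reset_csv_logs 1 /var/log/myprogram/dir1/*.csv /var/log/myprogram/dir2/*.csv
```

Truncating with > rather than replacing the file (for example with mv) keeps each file's inode, owner and permissions unchanged.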
Kusalananda
  • OK, I tried this:

        #!/bin/bash
        tmpfile=$( mktemp )
        set -- /u01/import/wandl/input.empty/{AORTA,EU,HU}/*
        head -n4 "$4" >"$tmpfile"
        for pathname do
            cat "$tmpfile" >"$pathname"
        done
        rm "$tmpfile"

    It worked on my local Red Hat machine and I got all the headers, but when I run it on my remote server I get the 4 lines, yet the 4th line has no header info.

    – Dwayne Pype Jun 12 '18 at 16:56
  • @DwaynePype It picks the header lines from one of the existing files (the one that happens to sort first). If that file has a broken header, then it will be propagated to the other files. – Kusalananda Jun 12 '18 at 17:33
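In the script from the comment above, "$4" is the fourth positional parameter, i.e. the fourth file the pattern matched, not a line count, so head -n4 "$4" takes its header from that file rather than from the first one. A minimal sketch that keeps the two apart, assuming (as in the answer) that every file shares the same header; propagate_header is a hypothetical name, and it also skips patterns that matched nothing:

```shell
#!/bin/bash
# Hypothetical helper: propagate_header NLINES FILE...
# Copies the first NLINES lines of the FIRST file into every FILE,
# replacing each file's contents. Assumes all files share that header.
propagate_header() {
    nlines=$1                      # how many header lines to keep
    shift                          # "$@" is now the list of files
    [ -f "$1" ] || return 1        # no files, or the pattern matched nothing
    tmpfile=$(mktemp) || return 1
    head -n "$nlines" "$1" >"$tmpfile"   # "$1" is the first file
    for pathname do
        [ -f "$pathname" ] || continue   # skip unmatched, literal patterns
        cat "$tmpfile" >"$pathname"
    done
    rm -f "$tmpfile"
}

# The commenter's case: four header lines, three directories.
# propagate_header 4 /u01/import/wandl/input.empty/{AORTA,EU,HU}/*
```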

If you have a flavor of sed that provides an --in-place or -i option, you could replace > "$file" with sed -i 4q "$file", where 4 is the number of header lines that you wish to keep (4q prints up to line 4 and then quits). Note that some implementations, such as BSD and macOS sed, require an explicit empty backup suffix, i.e. -i ''.
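Applied to the loop from the question, that substitution might look like the minimal sketch below. It assumes GNU sed; trim_with_sed is a hypothetical wrapper name, and the directory in the usage line is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: trim_with_sed NLINES FILE...
# Assumes GNU sed. On BSD/macOS sed, -i needs an explicit (empty)
# suffix argument: sed -i '' "${nlines}q" "$file"
trim_with_sed() {
    nlines=$1
    shift
    for file do
        [ -f "$file" ] || continue    # skip an unmatched, literal glob
        # "${nlines}q" prints lines 1..nlines and quits; with -i the
        # file is rewritten to contain only those lines.
        sed -i "${nlines}q" "$file"
    done
}

# The question's loop, keeping a 4-line header:
# trim_with_sed 4 /path/to/dir/*.csv
```

Running sed once per file keeps each invocation simple, at the cost of one process per file.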

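If the number of files is not too large, then you may be able to avoid looping and simply pass the list of files directly. Since q makes sed stop reading input altogether, with several files only the first one would be truncated; deleting from line 5 to the end of each file avoids that, e.g.

sed -si '5,$d' subdir1/*.csv subdir2/*.csv

(the s, which makes line numbers and $ apply to each file separately, is superfluous at least in GNU sed, since -i implies -s)

or use find

find path/to/dir -name '*.csv' -execdir sed -si '5,$d' {} +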
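If sed -i isn't available, the same find traversal can drive the head-and-temporary-file approach from the other answer instead. A minimal sketch, assuming mktemp is installed; reset_csv_tree is a hypothetical name and the path in the usage line is a placeholder:

```shell
#!/bin/sh
# Hypothetical helper: reset_csv_tree DIR NLINES
# Finds every regular *.csv file under DIR and keeps only its first
# NLINES lines, using head and a temporary file rather than sed -i.
reset_csv_tree() {
    # find passes batches of file names after the fixed arguments;
    # inside the inline script, $1 is NLINES and the rest are files.
    find "$1" -type f -name '*.csv' -exec sh -c '
        nlines=$1
        shift
        tmpfile=$(mktemp) || exit 1
        for pathname do
            head -n "$nlines" "$pathname" >"$tmpfile" &&
                cat "$tmpfile" >"$pathname"
        done
        rm -f "$tmpfile"
    ' sh "$2" {} +
}

# Example: reset_csv_tree path/to/dir 4
```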

See related How to extract only the header name in a data without listing the data itself

steeldriver