22

I have 1000 CSV files in a directory and would like to concatenate them all in order. They are named img_size_1.csv to approximately img_size_1000.csv. This answer is close but assumes a list file. Can this be done in a one-liner?

codecowboy
  • 3,442

2 Answers

31

Yes it can, with the unimaginatively named cat command:

$ cat *csv > all.csv

cat does what it says on the bottle: it conCATenates its input and prints the result to standard output. The command above will give an error if a file called all.csv already exists in the target directory:

$ cat *csv > all.csv
cat: all.csv: input file is output file

On systems where you see that error, you can ignore it: the contents of all.csv will be overwritten. Apparently, on some systems (e.g. OS X, according to the comments below this answer), cat does not stop with an error and instead enters a loop, catting all.csv back into itself until you run out of disk space. To be safe, just delete all.csv, if it exists, before running the command.
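
One way to sidestep the problem entirely, as a minimal sketch: build the result under a name that the glob cannot match and rename it afterwards. The scratch directory, the sample files, the all.tmp name and the concat_csv helper are all made up for illustration, and mktemp -d is assumed to be available (it is on GNU and BSD systems).

```shell
#!/bin/sh
# Demonstration only: work in a throwaway directory with two sample files.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a,1\n' > img_size_1.csv
printf 'b,2\n' > img_size_2.csv

# Hypothetical helper: concatenate every *.csv except a previous result.
concat_csv() {
    rm -f all.csv            # an old result must not become an input
    cat ./*.csv > all.tmp    # all.tmp does not match *.csv
    mv all.tmp all.csv       # publish the result under its final name
}

concat_csv
concat_csv                   # a second run produces the same file again
cat all.csv
```

Because all.csv never exists while cat is running, it makes no difference whether the local cat detects "input file is output file" or not.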

terdon
  • 242,166
  • If the command is carried out more than once (so all.csv will exist), one may not wish to concatenate all.csv with the other .csv files. rm all.csv first? – suspectus Feb 05 '14 at 17:58
  • @terdon, thanks. Is there any way to affect the order in which the files are added to be sure that they will be processed in numeric order? Or by date? – codecowboy Feb 05 '14 at 18:02
  • @suspectus no, that is not needed. The > all.csv will truncate all.csv (empty it) before anything else is run (the shell sets up redirections before it starts the command). Therefore, all.csv will always be empty and you will not get the repetition you are thinking of. – terdon Feb 05 '14 at 18:02
  • @codecowboy by default, the glob (the *csv) is expanded in alphanumeric order so it should do that already. If not, please [edit] your question to explain exactly what your file names look like. Do you have both file_N.csv and fileN.csv? – terdon Feb 05 '14 at 18:04
  • @terdon thanks for the illuminating explanation. – suspectus Feb 05 '14 at 18:16
  • 4
    Interestingly, on OS X with bash 3.2 the destination file is not overwritten first. If all.csv exists, then cat *.csv > all.csv does not return and keeps adding to all.csv until the disk is full. – suspectus Feb 05 '14 at 18:22
  • Thank you @suspectus for pointing this out. I've also encountered this on other platforms. It is definitely not "safe" to ignore "input file is output file". It may not have caused a problem for the poster, but it invites a race condition and could fill up your hard drive if you are away making coffee. NOT SAFE! – Owen Mar 16 '17 at 15:18
  • 1
    @Owen fair enough. I've never seen this behavior, so thanks to you and suspectus for letting me know. I edited the answer accordingly. – terdon Mar 16 '17 at 15:24
  • 1
    I ran cat ./**/*json > all.json and got this error: bash: /bin/cat: Argument list too long. I guess it doesn't like running on millions of files. Any suggestions? – balupton Apr 11 '18 at 16:27
  • Figured it out: https://unix.stackexchange.com/a/437084/50703 – balupton Apr 11 '18 at 16:46
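
For reference, one common way around "Argument list too long" is to let find hand the files to cat in batches. This is a sketch under assumptions, not necessarily what the linked answer does: the scratch directory and the sample JSON files are invented, and find returns files in no particular order, so it only suits cases where order does not matter.

```shell
#!/bin/sh
# Demonstration only: a throwaway tree with two sample files.
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir -p a/b
printf '{"id":1}\n' > a/one.json
printf '{"id":2}\n' > a/b/two.json

# find runs cat as many times as needed, each time with a batch of file
# names that fits within the system limit. The output file is excluded by
# name because the redirection creates it before find starts searching.
find . -type f -name '*.json' ! -name all.json -exec cat {} + > all.json

cat all.json
```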
7
ls -1 *.csv | while read fn ; do cat "$fn" >> output.csv.file; done

If you want to concatenate them in alphabetical order:

ls -1 *.csv | sort | while read fn ; do cat "$fn" >> output.csv.file; done

If you want to concatenate them in order of modification time (ls -t lists the newest file first; add -r for oldest first):

ls -1t *.csv | while read fn ; do cat "$fn" >> output.csv.file; done
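
None of the orderings above is numeric: names are compared as strings, so img_size_10.csv sorts before img_size_2.csv. A minimal sketch of numeric ordering follows, assuming the names follow the img_size_N.csv pattern from the question and contain no newlines. The scratch directory and the three sample files are made up for the demonstration.

```shell
#!/bin/sh
# Demonstration only: a throwaway directory with three sample files.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'one\n' > img_size_1.csv
printf 'two\n' > img_size_2.csv
printf 'ten\n' > img_size_10.csv

# printf is a shell builtin, so a long list of names is not a problem.
# sort splits each name on "_" and orders by the number that starts the
# third field (the N in img_size_N.csv).
printf '%s\n' img_size_*.csv |
    sort -t_ -k3,3n |
    while IFS= read -r fn; do
        cat -- "$fn"
    done > all.csv           # one redirection, so a rerun overwrites

cat all.csv
```

Redirecting once after done, rather than appending with >> inside the loop, means the output is rebuilt from scratch on every run.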
terdon
  • 242,166
Slyx
  • 3,885
  • I fixed the quoting and format issues, but this will break on files whose names contain newlines or backslashes, and it will re-concatenate everything into the output file every time it is run, so you should make sure that output.csv.file does not exist before running it. Oh, and the sort is unneeded: ls with no options will already sort files alphabetically. – terdon Feb 05 '14 at 18:36
  • Thanks! The sort just guards against any ls aliasing. – Slyx Feb 05 '14 at 22:01