3

I have a CSV file that is 6 gigabytes, but I don't need that much data; I only need about 100 rows. How can I truncate it?

  • @K7AAY, sorry, I have no idea, that would require me to download the whole thing from s3 and check, which will take a while. – Pavel Orekhov May 29 '19 at 15:40
  • @K7AAY, do CSV files have '\n' at the end of each line? Should I just call readline 100 times and write the result to another file? – Pavel Orekhov May 29 '19 at 15:42
  • Windows and DOS use carriage return and line feed ("\r\n") as a line ending, while Unix uses just line feed ("\n"). – K7AAY May 29 '19 at 15:44

2 Answers

8

Depending on what you want, you can:

  1. Take the first 100 rows, as suggested by @K7AAY.

    head -n100 filename.csv > file100.csv  
    
  2. Take the last 100 rows.

    tail -n100 filename.csv > file100.csv  
    
  3. Take a random selection of 100 rows. This requires that you have the GNU shuf program installed. It is part of GNU coreutils, so it should be installable from your distribution's repositories if you're on Linux.

    shuf -n100 filename.csv > file100.csv  
    

    Alternatively, if your sort supports the -R (random sort) option, you can use it instead, although it has to sort the whole file first and so will be slow on a 6 GB file:

    sort -R filename.csv | head -n100 > file100.csv 
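
If the CSV starts with a header row, the tail and random-selection commands above will drop it or move it to an arbitrary position. As a rough sketch only, assuming the first line is the header and no quoted field contains an embedded newline, something like the following could keep the header and sample from the remaining lines. The function name sample_with_header is made up for illustration and is not a standard tool.

```shell
# Hypothetical helper (not a standard command): sample_with_header INPUT N
# Prints the first line of INPUT, which is assumed to be the CSV header,
# then N lines picked at random from the rest of the file.
# Requires GNU shuf.
sample_with_header() {
  head -n 1 "$1"                  # copy the header line unchanged
  tail -n +2 "$1" | shuf -n "$2"  # skip line 1, sample N of the remaining lines
}
```

For example, `sample_with_header filename.csv 100 > file100.csv` would write the header plus 100 random data lines. Note that shuf still has to read the whole file, so this is slower than head.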
    
terdon
2

Use head to take only the first 100 lines and redirect them to a new file. Replace filename.csv with the name of your file:

    head -n100 filename.csv > file100.csv
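
head stops reading once it has printed the requested number of lines, so it does not scan the whole 6 GB file. If the file was produced on Windows, each line will end in "\r\n", and the carriage returns are copied into the output as-is. A minimal sketch, assuming no field legitimately contains a carriage return, could strip them while truncating. The function name first_rows is invented for this example.

```shell
# Hypothetical helper (not a standard command): first_rows INPUT N
# Prints the first N lines of INPUT and deletes every carriage return
# character, so "\r\n" line endings come out as plain "\n".
first_rows() {
  head -n "$2" "$1" | tr -d '\r'
}
```

For example, `first_rows filename.csv 100 > file100.csv` would produce the same 100 lines as the command above, but with Unix line endings.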
K7AAY