
I have a 67GB .csv file (I know... I know...)

I need to remove the 4,125,878th line from the file as it is corrupt. My CSV parsing tools won't ignore or skip it.

I could use sed to do something like

sed '4125878d' in.csv > out.csv

But that would be a hugely expensive operation in terms of both time and disk space.

If I use sed -i '4125878d' in.csv, the operation takes ages.

Is there a way I can quickly remove a line from the middle of a huge file?
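For context, this is how I confirmed which line is the bad one (a toy sketch with a 3-line stand-in file; the `q` command makes sed stop at that line instead of scanning the rest of the 67GB):

```shell
printf 'a,1\nCORRUPT,\nb,2\n' > demo0.csv
# Print line 2 and quit immediately -- sed never reads past the bad line.
sed -n '2{p;q}' demo0.csv   # prints: CORRUPT,
```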

  • How do you expect the computer to know where the 4,125,878th line starts without actually reading the whole file and counting line breaks? If it's a one-time issue, just wait for it to be solved, and if you need to do this regularly, I suggest you try to fix the root cause of file corruption instead. – Dmitry Grigoryev Oct 02 '15 at 14:10
  • See my reply at: http://unix.stackexchange.com/questions/66730/is-there-a-faster-way-to-remove-a-line-given-a-line-number-from-a-file/233509#233509 where I explain how to use ved which is a very fast editor. – schily Oct 02 '15 at 14:27

2 Answers


I believe not.

Even if sed, or whatever program you use, is clever enough to make the change in place rather than through a temp file, it still has to rewrite all the data that comes after the line you want to delete.

Deleting a line means shifting everything after it left to fill the gap left by the removed bytes. No matter what, the file has to be rewritten from that point to the end.
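To make that concrete, here is a sketch of what a true in-place delete looks like (assuming GNU coreutils for `dd oflag=seek_bytes` and `truncate`; the 5-line demo file stands in for the 67GB one). It avoids a second full-size copy on disk, but notice it still streams every byte after line n, which is exactly the cost described above:

```shell
#!/bin/sh
set -e

file=demo.csv
n=3   # line to delete
printf 'a,1\nb,2\nCORRUPT\nc,3\nd,4\n' > "$file"

start=$(head -n $((n - 1)) "$file" | wc -c)   # byte offset where line n begins
end=$(head -n "$n" "$file" | wc -c)           # byte offset where line n+1 begins

# Shift the tail of the file left over the deleted line, in place,
# then chop off the now-duplicated leftover bytes at the end.
tail -c +"$((end + 1))" "$file" \
    | dd of="$file" bs=1M seek="$start" oflag=seek_bytes conv=notrunc status=none
truncate -s "-$((end - start))" "$file"

cat "$file"
```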

– V13

Do it on-the-fly:

csv-parser -f <(sed '4125878d' my-huge.csv)
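A minimal, runnable version of the same trick, with `wc -l` standing in for the CSV parser (process substitution is a bash/zsh feature; `<(...)` hands the consumer a FIFO path, so no second copy of the file ever touches the disk):

```shell
#!/bin/bash
printf 'a,1\nCORRUPT\nb,2\n' > demo2.csv
# The consumer reads from a FIFO fed by sed, which drops line 2 on the fly.
lines=$(wc -l < <(sed '2d' demo2.csv))
echo "$lines"   # 2
```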
– fazie