2

I want to remove the first $rmv lines of a huge text file called $filename. This text file is so big that I cannot fit two copies of it on my hard drive.

The following leaves me with a blank file called $filename:

tail -n +"$((rmv + 1))" "$filename" > "$filename"

The following cannot execute because I do not have the storage space to fit both $filename and $filename.tmp:

tail -n +"$((rmv + 1))" "$filename" > "$filename.tmp" && mv "$filename.tmp" "$filename"

If it matters, I'm using Mac OS X El Capitan.

4 Answers

3

If you have perl:

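{
  # tail -n +N starts output AT line N, so skip $rmv lines with N = rmv + 1
  tail -n +"$((rmv + 1))"
  # cut the file off at the position where tail stopped writing
  perl -e 'truncate STDOUT, tell STDOUT'
} < "$filename" 1<> "$filename"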

Note that with this approach there is no backup of the file, so if anything goes wrong partway through, you lose your data.
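Since there is no backup, it may be worth trying the technique on a small scratch file first. The following is a minimal sketch, assuming perl and seq are available; the scratch file and the variable names demo and result are made up for illustration. The 1<> redirection opens the file for reading and writing without truncating it, so tail overwrites the file from the start while always reading ahead of where it writes.

```shell
# Hypothetical scratch file, used only to check the technique on small data.
demo=$(mktemp)
seq 1 10 > "$demo"
rmv=3

{
  # Emit everything from line rmv+1 onward, overwriting the file from offset 0.
  tail -n +"$((rmv + 1))"
  # Truncate the file at the current write offset, dropping the stale tail end.
  perl -e 'truncate STDOUT, tell STDOUT'
} < "$demo" 1<> "$demo"

result=$(cat "$demo")
rm -f "$demo"
printf '%s\n' "$result"
```

If the scratch file ends up holding only lines 4 through 10, the same construct can be pointed at the real file.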

cuonglm
  • 153,898
0

I think ed does not use temp files, so

ed -s "$filename" <<ED_SCRIPT
1,${rmv}d
w
q
ED_SCRIPT
glenn jackman
  • 85,964
  • 1
    It depends on whether you have sufficient memory to avoid the use of temp files, AFAIK. But according to this answer, it always uses temp files in /tmp. – Wildcard Mar 29 '16 at 19:29
0

For scripted file edits, the tool of choice is ex.

ex -sc "1,${rmv}d | x" "$filename"

You may also want to look into the split utility.

Wildcard
  • 36,499
  • So does ed. vi at least doesn't seem to fail if the filesystem is full, but perhaps that's only true if the whole file can be held in memory. I ran into this before and it's what started me on learning ex. – Wildcard Mar 29 '16 at 19:31
0

Your text file is x MB and you do not have 2x MB free, but do you really lack the space to hold even a compressed copy? Text files often compress to about a tenth of their original size.

You don't have enough space to hold the remaining ($total-$rmv) lines uncompressed, but what if they are compressed?

tail -n +"$((rmv + 1))" "$filename" | gzip > "$filename.tmp" && gzip -dc < "$filename.tmp" > "$filename" && rm "$filename.tmp"

Taking this as an abstract intellectual problem, I would cut up the file in chunks, sized either to fit in memory or in available disk space, starting with the last chunk and then truncating the original file. I could also compress the file in place using dd conv=notrunc trickery.
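The chunk-by-chunk idea could look roughly like the sketch below. It is only an illustration on a tiny scratch file: the chunk size, the .part.N file names, and the variables demo, offset and result are all assumptions, and a real run would use a chunk size that fits the free disk space. Truncation is done with dd if=/dev/null ... seek=N, which cuts the output file at N bytes, since a truncate command is not available on every system.

```shell
# Hypothetical scratch file standing in for the huge file.
demo=$(mktemp)
seq 1 10 > "$demo"
rmv=3
chunk=8   # bytes per chunk; deliberately tiny for the demo

# Byte offset of the first line to keep, and the total file size.
offset=$(head -n "$rmv" "$demo" | wc -c | tr -d ' ')
size=$(wc -c < "$demo" | tr -d ' ')

# Peel chunks off the END of the file, truncating after each copy,
# so at most one extra chunk of disk space is needed at a time.
i=0
while [ "$size" -gt "$offset" ]; do
  start=$((size - chunk))
  [ "$start" -lt "$offset" ] && start=$offset
  tail -c +"$((start + 1))" "$demo" > "$demo.part.$i"
  dd if=/dev/null of="$demo" bs=1 seek="$start" 2>/dev/null
  size=$start
  i=$((i + 1))
done

# What is left in the file is exactly the unwanted head: discard it,
# then append the chunks back in their original order.
: > "$demo"
while [ "$i" -gt 0 ]; do
  i=$((i - 1))
  cat "$demo.part.$i" >> "$demo"
  rm -f "$demo.part.$i"
done

result=$(cat "$demo")
rm -f "$demo"
printf '%s\n' "$result"
```

As with any in-place approach, an interruption partway through leaves the data spread across the original file and the chunk files, so this trades safety for space.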

However, in practice, I'd either

  • copy the file over the network to a server with enough disk to hold (only) the ($total-$rmv) lines, check that I got the right lines, remove the original file, and copy back.

  • add disk, since you obviously need it.

Law29
  • 1,156
  • This site is NOT a "site for problems in a business environment". It is for Unix and Linux questions in ANY environment including (but not limited to) business, educational/academic, non-profit, government, NGOs, and home. – cas Mar 30 '16 at 02:28
  • @cas So I forgot I wasn't on ServerFault – Law29 Mar 30 '16 at 05:52