
As sysadmins, we have all encountered the situation where the disk drive has filled up due to a large file. But we can't delete the file because it contains important data. So we need to copy the file to another location first - another host or another volume - before we can delete it.

For huge files, that means sitting around waiting for the copy to finish - time during which the host is virtually unusable.

So I pose the question: is there a way that as the copy is occurring, the parts already copied can be removed from the original thereby reducing disk usage so that the host can be brought back into a usable state more quickly? I'm imagining some kind of tool or command-line flag that would do this.

Michael Martinez
  • One could copy the file contents backwards, truncating the source as you go, but the copy would then need to be reassembled afterwards, since its chunks would be in reverse order (a rough sketch of this appears after the comments). Things would also get complicated if the copy or either host fails mid-copy, as you'd be left with a partially truncated file and, hopefully, some blocks of it to reassemble elsewhere... – thrig Aug 12 '22 at 16:22
  • Some other process might have this file open while the truncate is done. Would that other process see the truncation in real time, and more importantly, would the file system release blocks, at all, while there are open file descriptors? – phunsoft Aug 12 '22 at 16:29
  • maybe related https://unix.stackexchange.com/a/341473/30851 - not sure if there is a proper tool for that – frostschutz Aug 12 '22 at 16:31
  • Thanks so much for the excellent comments, suggestions and links. It appears nobody has developed a way to do this. – Michael Martinez Aug 13 '22 at 17:13
  • fallocate --punch-hole (fallocate -p) is a way, though not automatic: it's your job to know how big a hole you can safely punch at any given moment (see the sketch after these comments). – Kamil Maciorowski Aug 14 '22 at 16:51
  • The only robust copying/archiving process I know of is rsync, which can restart a file copy from the point where it was interrupted. Having said that, they might have incorporated some logic to do what you describe, but I don't think so. It is counter-intuitive to release a file's blocks before the file is fully copied: otherwise, how do you prevent a restarted copy from clobbering the beginning of the destination with the mid-file contents of the truncated source? The only alternative is to split the file into distinct pieces, copy those, and merge them on the destination. – Eric Marceau Sep 19 '22 at 02:29
  • @EricMarceau I don't see any reason why it would not be possible to design a program that will validate each data block that's copied and keep track of where it's at and therefore not clobber anything. – Michael Martinez Sep 20 '22 at 17:19
  • Two approaches: use basic OS functionality ... or ... develop low-level code that manipulates file content by micro-managing/manipulating directory inode tables assigned to individual files (and here is the gotcha) while the OS at source and OS at destination keep track of the relationship between the two files intelligently ... across a power-fail condition, ... and ... where device journalling does not confuse the OSs on which pieces have been well and truly successfully copied. That leaves only ... "slice and dice" at source, copy segments, rebuild from segments on destination. – Eric Marceau Sep 20 '22 at 18:40
  • @EricMarceau Exactly. Nobody's ever done that work. I personally feel like it would be worthwhile. It's been a "todo" project of mine for several years but I've never had the time to do it. – Michael Martinez Sep 22 '22 at 15:42
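
To make the backwards-copy idea from thrig's comment concrete, here is a minimal Python sketch. It is an illustration only: move_backwards, rebuild, and the 64 MiB chunk size are made up for this example, and there is no crash handling, so an interruption leaves the source partially truncated exactly as the comment warns.

```python
import os

CHUNK = 64 * 1024 * 1024  # arbitrary chunk size chosen for illustration

def move_backwards(src_path, dst_path):
    """Copy src tail-first into dst, truncating src after each chunk so its
    disk usage shrinks as the copy proceeds. dst ends up with the chunks in
    reverse order and must be reassembled afterwards with rebuild()."""
    size = os.path.getsize(src_path)
    with open(src_path, "r+b") as src, open(dst_path, "wb") as dst:
        offset = size
        while offset > 0:
            start = max(0, offset - CHUNK)
            src.seek(start)
            data = src.read(offset - start)
            dst.write(data)
            dst.flush()
            os.fsync(dst.fileno())  # make sure the chunk is on disk first...
            src.truncate(start)     # ...then it is safe to shrink the source
            offset = start

def rebuild(reversed_path, out_path):
    """Restore the original byte order from the tail-first copy."""
    size = os.path.getsize(reversed_path)
    starts = range(0, size, CHUNK)       # chunk i begins at i * CHUNK
    with open(reversed_path, "rb") as rev, open(out_path, "wb") as out:
        for start in reversed(starts):   # the last chunk written is the file's head
            rev.seek(start)
            out.write(rev.read(CHUNK))   # read() stops at EOF on the short head chunk
```

The destination then holds the chunks tail-first; running rebuild on the destination host restores the original order, at the cost of a second pass over the data there.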
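Kamil Maciorowski's fallocate --punch-hole suggestion can likewise be wired into an ordinary front-to-back copy loop by hand. The sketch below is an assumption-laden illustration: it assumes a 64-bit Linux with glibc, calls fallocate(2) through ctypes because the Python standard library does not expose FALLOC_FL_PUNCH_HOLE, and the copy_and_punch name and chunk size are invented for the example. The source keeps its apparent size but becomes sparse, so its blocks are handed back to the filesystem while the copy is still running.

```python
import ctypes
import ctypes.util
import os

# Flag values from <linux/falloc.h>
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

CHUNK = 64 * 1024 * 1024  # arbitrary chunk size chosen for illustration

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                           ctypes.c_int64, ctypes.c_int64]  # assumes 64-bit off_t

def copy_and_punch(src_path, dst_path):
    """Copy src to dst front to back; once each chunk is safely on disk at the
    destination, punch a hole over it in src so its blocks are freed right away.
    src keeps its nominal size but becomes sparse; remove it when done."""
    with open(src_path, "r+b") as src, open(dst_path, "wb") as dst:
        offset = 0
        while True:
            data = src.read(CHUNK)
            if not data:
                break
            dst.write(data)
            dst.flush()
            os.fsync(dst.fileno())  # don't free source blocks until the copy is durable
            rc = libc.fallocate(src.fileno(),
                                FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                                offset, len(data))
            if rc != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
            offset += len(data)
    os.remove(src_path)  # the sparse, fully copied source is no longer needed
```

Note the fsync before each hole punch: blocks are only freed at the source once the corresponding data is durable at the destination, which is the ordering concern raised in the comments above.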

0 Answers