I split a 7-gigabyte vdi file and then joined the pieces again with the cat command. The joined file works normally in VBox, but it is 14 gigabytes after being rejoined. The exact operations were:

$ split --bytes=2000M file.vdi /locationX/prefix

Then I moved the pieces to another machine and ran:

$ cat prefix* > /locationY/file.vdi

and the resulting file is twice the size of the original.

What happened?

file.vdi is in all likelihood a sparse file. This is very common with virtual machine disk images: parts that have never been written to are left as holes in the file that don't consume space. You can confirm by checking whether the length of the original file matches its disk usage:

ls -lh file.vdi; du -h file.vdi

I expect that ls -l will report about 14GB (the actual file length) but du will report about 7GB (the disk usage), meaning that about half of the image was never written to.
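The same comparison can be scripted. The sketch below assumes GNU coreutils stat (the -c format with %s, %b and %B); is_sparse is a hypothetical helper name, not an existing command. It prints the file's length next to the bytes actually allocated for it and succeeds when fewer bytes are allocated than the length, i.e. when the file has holes.

```shell
#!/bin/sh
# is_sparse: hypothetical helper, assumes GNU coreutils stat.
# Prints apparent length vs. allocated bytes; returns 0 if the file has holes,
# 1 if it does not, 2 if stat fails.
is_sparse() {
    length=$(stat -c %s "$1") || return 2      # apparent length in bytes
    blocks=$(stat -c %b "$1") || return 2      # number of allocated blocks
    block_size=$(stat -c %B "$1") || return 2  # size in bytes of each block counted by %b
    allocated=$((blocks * block_size))
    echo "$1: length=$length bytes, allocated=$allocated bytes"
    [ "$allocated" -lt "$length" ]
}

# Example: is_sparse file.vdi && echo "file.vdi has holes"
```

A small file that is fully written can show slightly more allocated space than its length (whole blocks are allocated), so only "allocated is less than length" is treated as evidence of holes.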

Sparse files are a crude form of compression performed by the filesystem. The holes in the file are defined as containing a bunch of null bytes, and that's what applications see if they read from the holes. So split (or cat, cp, dd, tar, or anything else) read a lot of null bytes, and those bytes take up space in the output.
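To see this without a multi-gigabyte image, the following sketch builds a small, fully sparse stand-in file, splits and rejoins it the same way as in the question, and compares length and allocated space. It assumes GNU coreutils and a filesystem that supports sparse files and stores zero bytes as ordinary data (for example ext4 or XFS without compression); the file names are hypothetical.

```shell
#!/bin/sh
# Demo in a scratch directory; sparse.img, part.* and rejoined.img are made-up names.
set -e
dir=$(mktemp -d)
cd "$dir"

truncate -s 64M sparse.img           # 64 MiB long, but a single hole: almost nothing allocated
split --bytes=16M sparse.img part.   # split reads the hole as NUL bytes and writes part.aa .. part.ad
cat part.* > rejoined.img            # cat writes every one of those NUL bytes

cmp sparse.img rejoined.img && echo "contents are identical"
for f in sparse.img rejoined.img; do
    echo "$f: length=$(stat -c %s "$f") allocated=$(($(stat -c %b "$f") * $(stat -c %B "$f")))"
done
```

On such a filesystem both files should report the same length and identical contents, while rejoined.img has roughly its full length allocated and sparse.img almost none.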

If you want to save space at the destination, you can make the file sparse again. This only saves space; it will not improve performance.
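One way to avoid writing the zeros in the first place is to rejoin the pieces through dd with conv=sparse, which seeks over all-NUL output blocks instead of writing them. This is a sketch that assumes a GNU dd new enough to support conv=sparse and iflag=fullblock (older coreutils releases may not); it uses a small demo file with hypothetical names rather than the real image.

```shell
#!/bin/sh
# Rejoin split pieces while leaving holes where whole 1 MiB blocks are NUL.
# Demo names are hypothetical; assumes GNU coreutils dd with conv=sparse.
set -e
dir=$(mktemp -d)
cd "$dir"

truncate -s 64M sparse.img
split --bytes=16M sparse.img part.

# iflag=fullblock makes dd collect full 1 MiB blocks from the pipe, so each
# block can be checked as a whole; all-NUL blocks are skipped with a seek.
cat part.* | dd of=rejoined.img bs=1M iflag=fullblock conv=sparse 2>/dev/null

cmp sparse.img rejoined.img && echo "contents are identical"
echo "length=$(stat -c %s rejoined.img) allocated=$(($(stat -c %b rejoined.img) * $(stat -c %B rejoined.img)))"
```

For the question's files the equivalent would be cat prefix* | dd of=/locationY/file.vdi bs=1M iflag=fullblock conv=sparse. If the file has already been joined, GNU cp --sparse=always can write a sparse copy, and on Linux fallocate --dig-holes (util-linux) can punch holes in place on filesystems that support it.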

  • I understand. But the Python script from your link, which punches holes in regions of files that are filled with zeros, does not work for me. Even dd with "conv=sparse" does not work (I'm using coreutils 8.12.197-032bb). So, for me, the solution was to redo the whole operation, but first compress the vdi file with tar.gz, then concatenate the pieces and decompress the result. – Leandros López Mar 07 '15 at 16:42
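The workaround from the comment (compress, split, move, join, extract) can be sketched as follows. One assumption is added that the comment does not mention: GNU tar's -S (--sparse) option, which makes tar record the holes so that extraction can recreate them instead of writing the zeros out. The directories, file names and the tiny --bytes value are hypothetical choices for a runnable demo; the question used --bytes=2000M.

```shell
#!/bin/sh
# Sketch of the tar.gz workaround; assumes GNU tar with gzip available.
set -e
dir=$(mktemp -d)
cd "$dir"
mkdir src dst
truncate -s 64M src/file.vdi                  # stand-in for the real image

# Source machine: archive with sparse detection, compress, then split the archive.
tar -czSf file.vdi.tar.gz -C src file.vdi
split --bytes=100 file.vdi.tar.gz prefix.

# Destination machine: join the pieces and extract in one pass.
cat prefix.* | tar -xzf - -C dst

cmp src/file.vdi dst/file.vdi && echo "contents are identical"
```

Joining before extracting matters here: the pieces are arbitrary byte ranges of one gzip stream, so only their concatenation is a valid archive.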