How can an image file be created for a directory?

Question

How can an image file be created for a directory? For example:

"Create iso image from folder via terminal commands" shows that genisoimage can create an ISO image file for a directory. I only know that an image file is created for a disk volume. Does genisoimage first create a loop device or block device file (similar to a CD/DVD) for the directory, and then create an image file for that loop device or block device file? Is it the exact opposite of the two-step process of mounting an image file on a directory, which needs to create a loop device for the image file first (similar to the losetup command)?
Is there a command which can create an ext4 image file for a directory? Is it also a two step process?

An archive file can also be created for a directory. How are the two processes different? Does creating an image file for a directory involve a loop device, while creating an archive file for a directory doesn't?

How can you create a tar for a directory? Pretty much the same, except you're creating an archive of iso9660 or UDF format. — 炸鱼薯条德里克, Feb 26 '19 at 21:54
They're not the same, but they have similar data structures, at least from a human's point of view, because they both provide a similar abstraction. You need to understand that a filesystem is nothing more than a data structure; Linux just lets you read and write it in a particular way, called "mount". However, "I allow you to" doesn't mean "you have to". — 炸鱼薯条德里克, Feb 26 '19 at 22:13
Remember, a data structure can be stored anywhere: a partition, a CD, RAM, a file. You can read and write it by reading and writing the underlying storage. What does a loop device do? It lets you access a file as a block storage device. But what for? Why not just read and write the file? Because that way you can easily access the filesystem on it using the "mount" method: the mount() syscall doesn't let you mount a non-device, so you have to use a loop device. — 炸鱼薯条德里克, Feb 26 '19 at 22:40
But let's go back. tar and ISO filesystems are not designed to be written through a mount, so "mount" is useless here because you need to write. So back to the original idea: read and write the file directly. As long as you know all the details of the iso/tar structure, you can build up that structure little by little by reading and writing the image file directly. The job gets done, although it is very hard (because it requires knowledge of every detail of the tar/iso format in userspace). — 炸鱼薯条德里克, Feb 26 '19 at 22:46
How about ext4? ext4 is designed to be mounted for both reading and writing, so you can easily read and write the ext4 structure the "mount" way. You don't need to know every detail of the ext4 structure; you just need the filesystem API and the abstraction that ext4 provides, and the ext4 kernel driver handles everything for you. Oh, the ext4 resides on … a file, not a device. What do you do then? — 炸鱼薯条德里克, Feb 26 '19 at 22:53

sourcejedi · Answer 1 · 2019-02-27T09:23:32.567

There is no equivalent tool for ext4. As you say, you can achieve an equivalent effect using a loop device -

1. Create an empty image file with more than enough space to store your files: truncate -s 1G my.img
2. Format it: mkfs.ext4 my.img
3. Mount it as a loop device: mount -oloop my.img /mnt
4. Copy the files into the mounted filesystem: cp -a directory/. /mnt/.
5. Shrink the ext4 filesystem to its minimum size using resize2fs -M my.img. It will tell you how many blocks long the filesystem is (and the size of the filesystem blocks).
6. You may now truncate the file to the smaller size, using an appropriate truncate command.
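The steps above can be sketched as a small shell script. This is only a sketch under assumptions: the mount step needs root, the 1G starting size is the placeholder from the list, and the helper names image_bytes and build_ext4_image are hypothetical. It also adds umount and e2fsck -f before the shrink, because resize2fs can only shrink an unmounted filesystem and insists on a fresh check. Rather than parsing resize2fs output, it reads the final block count and block size from dumpe2fs -h, which is one way to arrive at the "appropriate truncate command".

```shell
#!/bin/sh
# Hypothetical helper: size in bytes of a filesystem that is
# BLOCKS blocks long, with blocks of BLOCK_SIZE bytes.
image_bytes() {
    echo $(( $1 * $2 ))
}

# Hypothetical helper: build a minimal ext4 image of directory SRC in file IMG.
# Needs root for mount/umount, so it is only defined here, not called.
build_ext4_image() {
    src=$1
    img=$2
    mnt=${3:-/mnt}
    truncate -s 1G "$img" &&              # empty file, larger than needed
    mkfs.ext4 -q -F "$img" &&             # -F: target is a regular file
    mount -o loop "$img" "$mnt" &&        # loop device is set up automatically
    cp -a "$src/." "$mnt/." &&
    umount "$mnt" &&                      # shrinking requires an unmounted fs
    e2fsck -fy "$img" &&                  # resize2fs wants a fresh check
    resize2fs -M "$img" &&                # shrink to minimum size
    blocks=$(dumpe2fs -h "$img" 2>/dev/null |
        awk -F: '/^Block count/ {gsub(/ /, "", $2); print $2}') &&
    bsize=$(dumpe2fs -h "$img" 2>/dev/null |
        awk -F: '/^Block size/ {gsub(/ /, "", $2); print $2}') &&
    truncate -s "$(image_bytes "$blocks" "$bsize")" "$img"
}
```

For example, as root: build_ext4_image directory my.img /mnt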

However the Linux kernel does not have a writeable implementation of ISO9660, so this is not how genisoimage is implemented. We certainly don't have a shrinker tool for ISO9660 either :-).

It might not be very practical to mount ISO9660 as a writeable filesystem, because it was not designed for it. E.g. it might not have any structure to efficiently record free space on disk, or allow fragmenting written files to fit the spaces which are left by deletions.

genisoimage is indeed similar to creating a TAR or ZIP archive file. The filesystem image will contain a copy of all of the specified files. It will be about the same size as the total of all the specified file sizes. There is some contrast to these archive formats though. The filesystem data structures are aligned to the sectors of the disc device (2048 bytes), for efficient reads of individual files. The archive formats are not aligned to any such large boundary. The archive files are packed for compactness; note also that it is common to extract all the files at once. I expect the archive formats also require more effort in order to find one individual file and read its contents.
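To make the alignment point concrete, here is a small sketch of the arithmetic. The 2048-byte sector size comes from the paragraph above; the 512-byte figure is the standard tar block size. The helper name round_up is hypothetical, and the sketch only covers the padding of one file's data, ignoring headers and directory metadata.

```shell
# round_up SIZE UNIT: smallest multiple of UNIT that is >= SIZE
round_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

size=5000                  # example file size in bytes
round_up "$size" 2048      # data area in an ISO9660 image: prints 6144
round_up "$size" 512       # data area in a tar archive:    prints 5120
```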

I would not refer to this as an image of the directory. I suppose etymologically, it is not an image of any real block device. However it is in the same format as an image file created as a copy of a data CDROM. As the man page puts it -

genisoimage takes a snapshot of a given directory tree, and generates a binary image which will correspond to an ISO9660 and/or HFS filesystem when written to a block device.

Special-purpose technology was created for CD-Rs (and DVD-Rs) in part because it was a special case.

You cannot seek when writing them. Also, the CD-R writing procedure naturally wants to be fed a continuous stream of data. If the data is not provided fast enough and it runs out, the write will fail and the disc will be useless - except that nowadays many writers have a workaround which is not standardized and requires writing a small gap on the disc.

It is not necessarily very useful for the OS to allow writing such discs using the generic block device interface. We certainly cannot create a filesystem on a CD-R block device, mount it, and copy files into it. Instead, we use this more specialized approach.

We do not tell genisoimage to write the filesystem image directly to the Optical Disc Device e.g. /dev/sr0. We can pipe it through wodim dev=/dev/sr0 - if we like to :-). In ancient times, it was probably safer to generate the entire filesystem image in advance, because you can get more consistent performance when reading a single large file. I.e. it is less likely for the reads to be too slow, so it is less likely that the writer will run out of data and fail.

No? No iso9660 writing driver at all? How about UDF? Does the kernel have a read-write driver for that? — 炸鱼薯条德里克, Feb 27 '19 at 01:44
Thanks. Could you show the commands when creating an ext4 image file for a directory? — Tim, Feb 27 '19 at 01:44
@炸鱼薯条德里克 the kernel can write UDF. It's not necessarily very mature (reliable). UDF is not ISO9660 though. — sourcejedi, Feb 27 '19 at 01:46
Thanks. When creating an ext4 image file for a directory, is it possible to firstly create a loop device directly for the given directory (for example, by losetup), and then create an image file for the loop device? — Tim, Feb 27 '19 at 02:14
@Tim yes, that's fine too. mount -oloop is just more convenient because it destroys the loop device automatically when you unmount the filesystem. — sourcejedi, Feb 27 '19 at 02:16
In your reply, you first create an image file, then create a loop device for the image file by mount -oloop, and then adjust the image file according to the given directory. In my post and previous comment, I was wondering if we can first create a loop device directly from the given directory, without an image file created yet, and then create an image file for the newly created loop device? — Tim, Feb 27 '19 at 02:23
No, you can't. A loop device's underlying storage is always a regular file. @Tim If you need a more complex underlying storage model, use nbd and implement a server process in userspace. You may also need a program to provide a client user interface. — 炸鱼薯条德里克, Feb 27 '19 at 02:37

score 3 · Answer 2 · answered Feb 27 '19 at 09:23

When an existing filesystem is imaged, it usually means creating a block-by-block exact copy of the contents of the block device underlying the existing filesystem. It is literally an exact copy of that storage as it was at the time of imaging.

An ISO image is a bit different. It is a file that is constructed out of a set of directories and files defined by any means, in order to be a block-by-block exact copy of what a CD/DVD/Blu-Ray produced from that data will be. It includes the ISO9660 filesystem metadata, created at ISO image creation time.

The ISO image format exists because CD writer devices could initially only write to the disc in a sequential-access fashion, and could not stop in the middle of a write operation without ruining the disc. So you could not write a file here and another there, making it up as you went: you had to have all the files and all the metadata that would make up your CD-ROM (or audio CD) specified in advance, down to the last byte, before firing up the writing laser, and then write the entire disc, or at least a complete data or audio track, in one go. This was fine for the mass production of pressed audio CDs and CD-ROMs.

(Then, CD-Rs were introduced and it was found inconvenient to have to waste an entire disc even if you did not have a full 650 MiB to write... so multi-session CD writing was developed. Later, a more seamless write start/stop technology made it possible to develop packet writing for CD-RWs, and the UDF filesystem was developed to be optimized for that... but I digress.)

You cannot create an image file out of a directory quite the same way you can from an entire filesystem. A filesystem image includes the filesystem metadata that describes which blocks are allocated and which are free, and the physical locations of each block of each file and directory on the block device containing the filesystem. That information only makes sense in the context of the rest of the filesystem.

Trying to copy that information without also copying the rest of the filesystem would be mostly useless: when restoring files from such an image to a new filesystem, you would have to disregard the original block location information and let the filesystem driver place the files and directories according to which blocks on the destination filesystem are free. Otherwise you might overwrite and corrupt existing files and/or directories when restoring your "image".

So, when "imaging" directories rather than complete filesystems, it makes sense to only store the (relative) pathnames, file and directory ownerships, permissions and other attributes, and the data within the files. And when you develop a file format optimized for this, you'll get an archive file: for example, a .tar file. Add compression to the concept, and you have a .tar.gz or a .zip file, or any of the numerous compressed archive file formats.
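As a sketch of that idea, tar can archive a directory under relative pathnames and then list the attributes it recorded. The helper name archive_dir and the file names are hypothetical, and the demo runs on a throwaway directory.

```shell
# archive_dir DIR ARCHIVE: store the contents of DIR under relative pathnames
archive_dir() {
    tar -C "$1" -cf "$2" .
}

# Demo on a temporary directory tree
demo=$(mktemp -d)
mkdir -p "$demo/src/sub"
echo hello > "$demo/src/sub/file.txt"
archive_dir "$demo/src" "$demo/out.tar"

# Each entry shows permissions, owner, size, timestamp, and relative path;
# no block locations from the source filesystem are stored.
tar -tvf "$demo/out.tar"
```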

A loop device is not necessary in creation of image files: to create an image of a filesystem, you just read all the blocks of the block device containing that filesystem in order from the beginning to the end, and write them all into a single file, while ensuring that the filesystem that is being imaged is not modified during the imaging process.
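A minimal sketch of that block-by-block copy, using dd with GNU options. The helper name image_device is hypothetical, and the demo substitutes a regular file for the block device. For a real filesystem the source would be a device node (written here as the placeholder /dev/sdXN), unmounted or mounted read-only, and reading it usually needs root.

```shell
# image_device SOURCE IMAGE: read SOURCE from beginning to end, in order,
# and write everything into the single file IMAGE.
image_device() {
    dd if="$1" of="$2" bs=1M status=none
}

# Demo with a regular file standing in for a block device such as /dev/sdXN
tmp=$(mktemp -d)
head -c 3000000 /dev/urandom > "$tmp/fake-device"
image_device "$tmp/fake-device" "$tmp/fs.img"

# The image is byte-for-byte identical to the source
cmp "$tmp/fake-device" "$tmp/fs.img" && echo "image is an exact copy"
```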

A loop device allows accessing the contents of an image file without writing it to a "real" block device.
