Under normal circumstances, splitting a file will always result in file content being copied. As far as I know, Linux offers no mechanism to split files any other way.
That's almost the whole story, but not quite...
Linux uses a filesystem driver to manage the file on disk, and the filesystem structures written to disk can sometimes be manipulated by other programs, typically while the filesystem is NOT mounted by Linux (e.g. debugfs for ext2/ext3/ext4).
But there are hard limits on what such a program can do when it manipulates the bytes of the filesystem itself...
Background
In most (all?) file systems, the file and directory names are stored separately from the file data. Metadata such as file ownership and timestamps is also stored separately again, in a structure commonly known as an "inode".
This allows attaching two file names to the same file (AKA hard links).
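You can see this from any shell on a normal Linux filesystem (the file names here are just for illustration):

```shell
# work in a throwaway directory
cd "$(mktemp -d)"
echo 'hello' > original
ln original second-name       # hard link: a second name for the same inode
ls -i original second-name    # both names print the same inode number
stat -c '%h %n' original      # the inode's link count is now 2
```

Deleting one name only decrements the link count; the data survives until the last name is gone.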
File data is stored in blocks. The default block size for ext4 is 4KiB (4096 bytes). Normally, saving just one byte will require a whole block, but then more bytes can be saved to the file "for free" until the block is full.
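You can watch this with `stat` (a minimal sketch; the exact numbers depend on your filesystem, and `%b` is reported in 512-byte units, not filesystem blocks):

```shell
cd "$(mktemp -d)"
printf 'x' > one-byte                            # a file holding a single byte
stat -c 'size=%s bytes, allocated=%b units of %B bytes' one-byte
stat -fc 'filesystem block size: %s bytes' .     # e.g. 4096 on ext4
```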
When the first block is full and more bytes need to be written, the file system must find a new (available) block. Then:
- It marks the new block unavailable so that no other file tries to use it
- and records where this block is inside the file
Different file systems have different techniques for these two things, but the requirement is always the same. File system drivers do their best to ensure that new blocks follow on immediately from the previous ones in the file, but this cannot be guaranteed. Fragmentation happens when the next block has already been allocated to a different file.
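The block-at-a-time behaviour above is easy to demonstrate: fill the first block, then write one byte past it (a sketch; on filesystems with delayed allocation or inline small files the intermediate numbers can differ):

```shell
cd "$(mktemp -d)"
B=$(stat -fc '%s' .)                 # this filesystem's block size
printf 'x' > f                       # 1 byte: the first block is allocated
echo "1 byte:     $(stat -c '%b' f) units"
head -c "$((B - 1))" /dev/zero >> f  # fill the rest of that block "for free"
echo "1 block:    $(stat -c '%b' f) units"
head -c 1 /dev/zero >> f             # one byte more forces a second block
echo "1 block+1:  $(stat -c '%b' f) units"
```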
When might files quietly share some blocks?
Some file systems and some block-level storage have the ability to perform de-duplication. If the exact content written to a block matches the exact content of another block, it is possible that only one block is kept. Sometimes this means the duplicates are never written; sometimes it means they are retrospectively removed.
As Marcus Müller points out, some file systems can perform duplicate block removal.
It is also a feature of LVM, where it happens at the block level, outside the knowledge of the filesystem.
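A related, explicit form of block sharing is a reflink copy: on filesystems with shared-extent support (btrfs, XFS), `cp --reflink` creates a new file whose blocks are the same on-disk blocks as the original, copy-on-write. With `--reflink=auto` it quietly falls back to an ordinary copy on filesystems without that support (e.g. ext4), so this sketch is safe to run anywhere:

```shell
cd "$(mktemp -d)"
dd if=/dev/urandom of=orig bs=4096 count=4 status=none
# shares blocks where the filesystem allows it, copies otherwise
cp --reflink=auto orig clone
cmp orig clone && echo 'same content, possibly the same blocks'
```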
Hard limits
Hypothetically, you could manipulate a file system by creating a new inode, adding a filename for that inode, and then reallocating blocks from one inode to another.
Again, Linux does NOT have a mechanism for this, but there's nothing to stop you playing with the actual bytes on your hard drive to make it happen, if you know how.
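You don't even need root to experiment safely: build a filesystem inside an ordinary file and point the e2fsprogs tools at it. A sketch (assumes `mkfs.ext4` and `debugfs` are installed; they often live in /sbin):

```shell
PATH="$PATH:/sbin:/usr/sbin"    # e2fsprogs tools often live here
cd "$(mktemp -d)"
truncate -s 8M fs.img           # an 8 MiB sparse file to hold the filesystem
mkfs.ext4 -qF fs.img            # -F: it's a plain file, not a block device
# read-only peek at the on-disk superblock; debugfs can also *write*,
# which is exactly the kind of byte-level manipulation described above
debugfs -R 'show_super_stats -h' fs.img
```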
What's not [usually] possible is splitting a file anywhere you like without moving bytes. Block sizes are non-negotiable. You can't have half used blocks in the middle of a file for most file systems.
This means that in your example, a new block will almost certainly be written, because 10 bytes is not a multiple of 4096 or any other common block size.
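For example, splitting a 16-byte file after byte 10 with `split` necessarily copies the bytes into new files on new inodes:

```shell
cd "$(mktemp -d)"
printf '0123456789ABCDEF' > whole   # 16 bytes
split -b 10 whole part_             # part_aa = 10 bytes, part_ab = 6 bytes
cat part_aa part_ab > rejoined
cmp whole rejoined && echo 'round trip ok'
ls -i whole part_aa part_ab         # three different inodes: the data was copied
```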
However, different file systems have different capabilities. BTRFS, for example, supports tail packing.