
I want to create a "copy" of a directory tree where each file is a hardlink to the original file.

Example: I have a directory structure:

dirA/
dirA/file1
dirA/x/
dirA/x/file2
dirA/y/
dirA/y/file3

Here is the expected result, a "copy" of the directory tree where each file is a hardlink to the original file:

dirB/            #  normal directory
dirB/file1       #  hardlink to dirA/file1
dirB/x/          #  normal directory
dirB/x/file2     #  hardlink to dirA/x/file2
dirB/y/          #  normal directory
dirB/y/file3     #  hardlink to dirA/y/file3
Jeff Schaller

6 Answers


On Linux (more precisely with the GNU and busybox implementations of cp as typically found on systems that have Linux as a kernel) and recent FreeBSD, this is how:

cp -al dirA dirB

For a more portable solution, see the answer using pax and cpio by Stéphane Chazelas.
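A quick way to check the result, as a sketch assuming GNU or busybox cp: the script below rebuilds the question's dirA in a scratch directory from mktemp -d, runs cp -al, and uses the test operator -ef (true when two paths name the same device and inode; supported by bash, dash, ksh and busybox sh) to confirm that every file in dirB is a hardlink while the directories are newly created.

```shell
#!/bin/sh
# Sketch: hardlink "copy" of the question's dirA with cp -al.
# Assumes a cp that supports -a and -l (GNU coreutils or busybox).
set -eu

work=$(mktemp -d)                  # scratch area, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

# Rebuild dirA as in the question.
mkdir -p dirA/x dirA/y
echo one   > dirA/file1
echo two   > dirA/x/file2
echo three > dirA/y/file3

# dirB does not exist yet, so it becomes the copy itself.
cp -al dirA dirB

for f in file1 x/file2 y/file3; do
  # -ef: both names refer to the same inode on the same device.
  if [ "dirA/$f" -ef "dirB/$f" ]; then
    echo "dirB/$f: hardlink to dirA/$f"
  else
    echo "dirB/$f: separate file"
  fi
done

for d in x y; do
  # Directories are recreated with mkdir, not linked.
  if [ -d "dirB/$d" ] && ! [ "dirA/$d" -ef "dirB/$d" ]; then
    echo "dirB/$d/: new directory"
  fi
done
```

Each file should be reported as a hardlink and each subdirectory as a new directory, matching the layout asked for in the question.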

  • Note that, like pax, on FreeBSD cp -a doesn't hardlink symlinks. – Stéphane Chazelas May 11 '15 at 20:05
  • Be aware that hard links do not work across separate filesystem mounts. – Dave Oct 26 '15 at 23:34
  • If dirB exists, dirB will CONTAIN the new dirA. If dirB does not exist, dirB will BE the new dirA. The exact behavior may depend on your OS, though. – Kelly Bang Jan 20 '21 at 06:26
  • -a = --archive = "same as -dR --preserve=all" = "never follow symbolic links in SOURCE", --preserve=links, and "copy directories recursively", while -l means "hard link files instead of copying" – endolith Jan 20 '21 at 16:14
  • In this mode, files won't be overwritten, though. Is there a way? – Martin Braun Jun 20 '23 at 03:02
  • If you want to preserve the directory structure starting from the current directory (the start of the copy path), use --parents. This is useful when you want to back up some files deep down in the file tree and be able to recover by simply copying everything back. – WesternGun Jan 30 '24 at 12:37
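Kelly Bang's point about an existing dirB can be checked with a short sketch, assuming GNU cp. Copying to a missing destination makes it the copy; copying to an existing directory nests the copy inside it. GNU cp's -T (--no-target-directory) makes an existing, here empty, destination act as the copy itself; -T is a GNU option and may not exist in other cp implementations. The directory names dirC and dirD are only for illustration.

```shell
#!/bin/sh
# Sketch: how an existing destination changes what cp -al does.
# Assumes GNU cp (-T / --no-target-directory is a GNU extension).
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p dirA/x
echo one > dirA/file1
echo two > dirA/x/file2

# Case 1: dirB does not exist, so dirB becomes the copy.
cp -al dirA dirB
if [ -e dirB/file1 ]; then echo "missing dirB: copy at dirB/"; fi

# Case 2: dirC already exists, so the copy is nested at dirC/dirA.
mkdir dirC
cp -al dirA dirC
if [ -e dirC/dirA/file1 ]; then echo "existing dirC: copy at dirC/dirA/"; fi

# Case 3: with -T, the existing (empty) dirD is treated as the copy itself.
mkdir dirD
cp -alT dirA dirD
if [ -e dirD/file1 ]; then echo "existing dirD with -T: copy at dirD/"; fi
```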

POSIXly, you'd use pax in read+write mode with the -l option:

pax -rwlpe -s /A/B/ dirA .

(-pe preserves all possible attributes of files (in this case only directories) that are copied, like GNU cp's -a does).
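To see -s at work, here is a small sketch that rebuilds a dirA in a scratch directory and runs the command above on it. It assumes pax is installed (it simply skips otherwise, as pax is often missing on GNU/Linux) and that your pax does not have the Solaris 10 -rwl/-s bug described in this answer. The check uses the -ef test operator (same device and inode), which bash, dash, ksh and busybox sh support.

```shell
#!/bin/sh
# Sketch: pax -rwlpe -s /A/B/ dirA . on the question's tree.
# Assumes a working pax; skips if none is installed.
set -eu

if ! command -v pax >/dev/null 2>&1; then
  echo "pax not installed; skipping"
  exit 0
fi

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p dirA/x dirA/y
echo one   > dirA/file1
echo two   > dirA/x/file2
echo three > dirA/y/file3

# -rw: copy mode, -l: hardlink where possible, -pe: preserve everything,
# -s /A/B/: rewrite the first "A" in each path, so dirA/... lands as dirB/...
pax -rwlpe -s /A/B/ dirA .

for f in file1 x/file2 y/file3; do
  if [ "dirA/$f" -ef "dirB/$f" ]; then
    echo "dirB/$f: hardlink"
  else
    echo "dirB/$f: not a hardlink"
  fi
done
```

Because -s has no g flag, only the first A in each path is replaced; a source name without any A would be linked onto itself, which is the error d-b reports in the comments.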

Now, though standard, that command is not necessarily very portable.

First, many GNU/Linux-based systems don't include pax by default (even though that's a non-optional POSIX utility).

Then, bugs and non-conformances in a few implementations cause problems with that code:

  • because of a bug, Solaris 10 pax (at least) doesn't work when using -rwl in combination with -s. For some reason, it seems it applies the substitution to both the original and copied path. So above, it would attempt to do some link("dirB/file", "dirB/file") instead of link("dirA/file", "dirB/file").
  • on FreeBSD, pax doesn't create hardlinks for files of type symlink (a behaviour allowed by POSIX). Not only that, but it also applies the substitution to the targets of the symlinks (a behaviour not allowed by POSIX). So for instance if there's a foo -> AA symlink in dirA, it will become foo -> BA in dirB.

Also, if you want to do the same but with arbitrary file paths whose content is stored in $src and $dst, it's important to realise that pax -rwl -- "$src" "$dst" creates the full directory structure of $src inside $dst (that has to exist and be a directory). For instance, if $src is foo/bar, then, $dst/foo/bar is created.
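A sketch of that nesting behaviour, assuming pax is installed (it skips otherwise). With src=foo/bar and an existing dst=out, the linked file ends up at out/foo/bar/file rather than out/file; the names foo/bar and out are only illustrative.

```shell
#!/bin/sh
# Sketch: with a path operand, pax -rw recreates that path under $dst.
# Assumes pax is installed; skips otherwise.
set -eu

if ! command -v pax >/dev/null 2>&1; then
  echo "pax not installed; skipping"
  exit 0
fi

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

src=foo/bar
dst=out
mkdir -p -- "$src" "$dst"        # $dst must already exist as a directory
echo data > "$src/file"

pax -rwl -- "$src" "$dst"

if [ -e "$dst/foo/bar/file" ]; then
  echo "linked to $dst/foo/bar/file, not $dst/file"
fi
```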

If instead, you want $dst to be a copy of $src, the easiest is probably to do it as:

absolute_dst=$(umask 077 && mkdir -p -- "$dst" && cd -P -- "$dst" && pwd -P) &&
(cd -P -- "$src" && pax -rwlpe . "$absolute_dst")

(which would also work around most of the problems mentioned above but would fail if the absolute path of $dst ends in newline characters).
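Since the question asked about variables, the recipe above can be wrapped in a small function. This is only a sketch: pax_link_tree is a hypothetical name, the function sets the globals src, dst and absolute_dst (plain sh has no local), and the demo skips itself when pax isn't installed. A relative destination such as dirB works because it is resolved to an absolute path before the cd into $src.

```shell
#!/bin/sh
# Sketch: the $src/$dst pax recipe wrapped in a (hypothetical) function.
set -eu

pax_link_tree() {
  src=$1 dst=$2
  # Create $dst (umask 077 keeps a newly created one private) and resolve it
  # to an absolute path, so it still names the same place after cd "$src".
  absolute_dst=$(umask 077 && mkdir -p -- "$dst" && cd -P -- "$dst" && pwd -P) &&
  (cd -P -- "$src" && pax -rwlpe . "$absolute_dst")
}

if ! command -v pax >/dev/null 2>&1; then
  echo "pax not installed; skipping"
  exit 0
fi

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p dirA/x
echo one > dirA/file1
echo two > dirA/x/file2

pax_link_tree dirA dirB          # dirB is created as the copy, not dirB/dirA

if [ dirA/x/file2 -ef dirB/x/file2 ]; then
  echo "dirB/x/file2 is a hardlink to dirA/x/file2"
fi
```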

Now that won't help on GNU/Linux systems where there's no pax.

It's interesting to note that pax was created by POSIX to merge the features of the tar and cpio commands.

cpio is a historical Unix command (from 1977) as opposed to a POSIX invention, and there is a GNU implementation as well (not a pax one). So even though it is no longer a standard command (it was in SUSv2 though), it is still very common, and there's a core set of features you can usually rely on.

The equivalent of pax -rwl would be cpio -pl. However:

  1. cpio takes the list of input files on stdin as opposed to arguments (newline-delimited, which means file names with newline characters are not supported)
  2. All files have to be specified (typically you feed it the output of find (find and cpio were developed jointly by the same people)).
  3. metadata are not preserved (some cpio implementations have options to preserve some, but nothing portable).

So with cpio:

absolute_dst=$(umask 077 && mkdir -p -- "$dst" && cd -P -- "$dst" && pwd -P) &&
(cd -P -- "$src" && find . | cpio -pl "$absolute_dst")
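The same recipe as a function, sketched under the assumption that some cpio (GNU cpio, for instance) is installed; the demo skips otherwise, and cpio_link_tree is a hypothetical name. It relies on find listing each directory before its contents, so cpio -p can create the directories before linking the files into them.

```shell
#!/bin/sh
# Sketch: find | cpio -pl wrapped in a (hypothetical) function.
set -eu

cpio_link_tree() {
  src=$1 dst=$2
  # Resolve $dst to an absolute path first, as in the pax version.
  absolute_dst=$(umask 077 && mkdir -p -- "$dst" && cd -P -- "$dst" && pwd -P) &&
  (cd -P -- "$src" && find . | cpio -pl "$absolute_dst")
}

if ! command -v cpio >/dev/null 2>&1; then
  echo "cpio not installed; skipping"
  exit 0
fi

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p dirA/x dirA/y
echo one   > dirA/file1
echo two   > dirA/x/file2
echo three > dirA/y/file3

cpio_link_tree dirA dirB

for f in file1 x/file2 y/file3; do
  if [ "dirA/$f" -ef "dirB/$f" ]; then
    echo "dirB/$f: hardlink"
  else
    echo "dirB/$f: not a hardlink"
  fi
done
```

GNU cpio and libarchive's bsdcpio also accept -0 to read NUL-delimited names (as produced by find . -print0), which lifts the newline restriction from point 1, but those options are not available in every implementation.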
  • It seems that -s /A/B/ is specific to my example. How would you do this if the source and target directory names were variables $sourcedir and $targetdir? – Gudmundur Orn May 09 '15 at 16:12
  • @GudmundurOrn, see edit. – Stéphane Chazelas May 11 '15 at 19:26
  • I ran this command on OS X and just received the error message "pax: Unable to link file ./a.txt to itself". I used your command literally, just replacing the source directory with the actual name, leaving /A/B and the final dot as is. Am I misunderstanding something? – d-b Jul 19 '16 at 23:03
  • @d-b, -s /A/B/ replaces A with B so that dirA becomes dirB. If your source directory name has no A, then that will copy (link) it over itself. See also the rest of the answer for possibly better approaches. – Stéphane Chazelas Jul 20 '16 at 06:30

Short answer:

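cd -- "$source_folder" &&
pax -rwlpe . "$dest_folder"

($dest_folder must already exist, and should be given as an absolute path: a relative one is interpreted relative to $source_folder after the cd.)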
lkraider
  • Note that the pe toggles can cause privilege issues (Operation not permitted) because pax is calling chown. For my use case having hardlinks attributed to the executing user was fine so I ended up using simply pax -rwl – Vincent Pazeller Jul 10 '20 at 06:37
  • Another issue I encountered is with permissions. If you have a directory owned by root and try to create hard links with pax using a standard user (e.g. your web server user like www-data), the behavior of pax is creating copies of the files instead of hardlinks. This is certainly for security reasons (this would allow a user to modify root's files), but be sure to be aware of this. – Vincent Pazeller Jul 20 '20 at 12:42

rsync -av --link-dest="$PWD/dirA" dirA/ dirB

If you happen to have rsync already installed, this is a quick and simple command. To cope with symlinks you may want to choose among --links, --copy-links, --copy-unsafe-links or --safe-links.
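A sketch of the command on the question's tree, assuming rsync is installed (the demo skips otherwise). rsync's man page states that a relative --link-dest DIR is taken relative to the destination directory, which is why the absolute "$PWD/dirA" is used and why the relative --link-dest=a in the comments below produced plain copies. -v is dropped here only to keep the output short.

```shell
#!/bin/sh
# Sketch: rsync --link-dest hardlink copy of the question's dirA.
# Assumes rsync is installed; skips otherwise.
set -eu

if ! command -v rsync >/dev/null 2>&1; then
  echo "rsync not installed; skipping"
  exit 0
fi

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p dirA/x dirA/y
echo one   > dirA/file1
echo two   > dirA/x/file2
echo three > dirA/y/file3

# A relative --link-dest is resolved against the destination (dirB), so pass
# an absolute path; the trailing slash on dirA/ copies its contents into dirB.
rsync -a --link-dest="$PWD/dirA" dirA/ dirB

for f in file1 x/file2 y/file3; do
  if [ "dirA/$f" -ef "dirB/$f" ]; then
    echo "dirB/$f: hardlink"
  else
    echo "dirB/$f: copy"
  fi
done
```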

From the rsync man page:

     --link-dest=DIR         hardlink to files in DIR when unchanged
 -l, --links                 copy symlinks as symlinks
 -L, --copy-links            transform symlink into referent file/dir
     --copy-unsafe-links     only "unsafe" symlinks are transformed
     --safe-links            ignore symlinks that point outside the tree

Edit:

  • Fixed the command after the comment by @MichaelR. Thank you!
  • Tested as follows on MacOS using rsync 2.6.9
$ cd /tmp && rm -rf a b; mkdir a && touch a/c && echo "xxx" > a/c && rsync -av --link-dest="$PWD/a" a/ b
building file list ... done
created directory b
./

sent 74 bytes received 26 bytes 200.00 bytes/sec
total size is 4 speedup is 0.04
$ ls -lR a b
a:
total 8
-rw-r--r-- 2 user wheel 4 Aug 26 16:09 c

b:
total 8
-rw-r--r-- 2 user wheel 4 Aug 26 16:09 c

  • --link-dest doesn't appear to work for me using rsync v2.6.9 on macOS. I'm running rsync -av --link-dest=a a/ b/ and directory b/ contains file copies, not hardlinks. – Michael R Aug 24 '21 at 16:10
  • @MichaelR, you are right. I'm sorry! I tested the following on MacOS and it works: rsync -av --link-dest="$PWD/a" a/ b – Adan Cortes Aug 26 '21 at 21:11

If you are looking for this copy-with-hardlinks feature to make snapshots or backups of (all or part of) your files, have a look at rsnapshot.

Janis
  • That's interesting. But I guess hard links are only a good snapshot mechanism if the files will not be modified. Right? – Gudmundur Orn May 09 '15 at 15:41
  • @Gudmundur Orn: This is correct. The tool mentioned in my answer creates each new snapshot in a way that keeps files unique, i.e. existing (unmodified) files are created as hardlinks and new files (or modified versions of existing files) are created as new files. Consequently, you will have the least redundancy. – Janis May 09 '15 at 15:45
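The distinction in these two comments can be seen with a small sketch. It does not use rsnapshot; it only shows the underlying hardlink behaviour, assuming GNU or busybox cp for -al. Appending to a hardlinked file in place also changes the snapshot, whereas writing a new file and renaming it over the old name (which is what rsync does by default, unless --inplace is given) leaves the snapshot's copy untouched. The names live and snap are illustrative.

```shell
#!/bin/sh
# Sketch: hardlink snapshots stay valid only if files are replaced,
# not modified in place. Assumes a cp that supports -al.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir live
echo v1 > live/a
echo v1 > live/b
cp -al live snap            # snapshot: snap/a and snap/b are hardlinks

echo v2 >> live/a           # in-place append: same inode, so snap/a changes too

echo v2 > live/b.tmp        # write a new file, then rename it over the old name:
mv live/b.tmp live/b        # live/b is now a new inode; snap/b keeps v1

echo "snap/a: $(paste -s -d ' ' snap/a)"
echo "snap/b: $(paste -s -d ' ' snap/b)"
```

snap/a ends up containing both v1 and v2, while snap/b still holds only v1.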

@gudmundur-orn's answer is correct, but if you are on Btrfs on Linux, cp -a --reflink=auto dirA dirB should do the trick, with the difference that the files are actually separate and changing one doesn't change the other. You can achieve mostly the same with cp -c on a Mac with APFS (--reflink=auto falls back to a full copy if cloning is not possible, whereas -c will fail).

Any copy-on-write (COW) filesystem should be able to do this, but vendors haven't agreed on a standard command-line option.
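A sketch contrasting the two, assuming GNU coreutils cp. On a copy-on-write filesystem such as Btrfs, --reflink=auto shares data blocks with the original; on other filesystems it quietly makes a normal copy, so the script runs either way. In both cases the copy is a separate file, which is the difference from cp -l: rewriting the original shows up through the hardlink but not in the clone. The names linked and cloned are illustrative.

```shell
#!/bin/sh
# Sketch: cp -al (hardlinks) versus cp -a --reflink=auto (separate files).
# Assumes GNU cp; --reflink=auto falls back to a plain copy when the
# filesystem cannot clone.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p dirA/x
echo one > dirA/file1
echo two > dirA/x/file2

cp -al dirA linked                  # hardlinks: same inodes as dirA
cp -a --reflink=auto dirA cloned    # reflinks or plain copies: new inodes

echo changed > dirA/file1           # truncates and rewrites the same inode

echo "linked/file1: $(cat linked/file1)"
echo "cloned/file1: $(cat cloned/file1)"
```

linked/file1 follows the change, while cloned/file1 still reads one.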

rbanffy