
I want to copy a file from A to B, which may be on different filesystems.

There are some additional requirements:

  1. The copy is all or nothing, no partial or corrupt file B left in place on crash;
  2. Do not overwrite an existing file B;
  3. Do not compete with a concurrent execution of the same command, at most one can succeed.

I think this gets close:

cp A B.part && \
ln B.part B && \
rm B.part

But 3. is violated: cp does not fail if B.part already exists (even with the -n flag). Consequently 1. could fail if the other process 'wins' the cp and an incomplete file is linked into place. B.part could also be an unrelated file, but I'm happy to fail without trying other hidden names in that case.

I think bash's noclobber helps. Does this work fully? And is there a way to do it without requiring bash?

#!/usr/bin/env bash
set -o noclobber
cat A > B.part && \
ln B.part B && \
rm B.part

Follow-up: I know some filesystems (NFS, for example) will fail at this anyway. Is there a way to detect such filesystems?

Some other related but not quite the same questions:

Approximating atomic move across file systems?

Is mv atomic on my fs?

is there a way to atomically move file and directory from tempfs to ext4 partition on eMMC

https://rcrowley.org/2010/01/06/things-unix-can-do-atomically.html

Evan Benn
  • Are you only concerned about concurrent execution of the same command (i.e. could locking within your tool suffice), or about other outside interference with the files as well? – Michael Homer Jul 26 '19 at 03:55
  • "Transactional" might be better – muru Jul 26 '19 at 03:57
  • @MichaelHomer Within the tool is good enough, I think outside would make things very hard! If it's possible with file locks though... – Evan Benn Jul 26 '19 at 04:45
  • Notice there must be some inconsistent states stored on disk; even the filesystem won't be clean on power failure. It's just that software should be able to deal with such situations and expose a transactional concept to users – 炸鱼薯条德里克 Jul 26 '19 at 09:13
  • You are mentioning different file-systems, yet you want to employ hard-links. That does not work. This feels like an XY problem. What are you actually trying to achieve? Are databases involved by any chance? – Hermann Jul 26 '19 at 09:25
  • Is your problem with 1) only that both processes may overwrite B.part? Then you could use different randomly generated names; check mktemp for this purpose. This would of course allow the second process to overwrite the file again (you may put a check before step 3 that tries to detect whether the file now exists), but it will be consistent. – allo Jul 26 '19 at 11:25
  • @allo This sounds like a lockfile question ... but noclobber across local filesystems, and NFS locks, deal with that. – Rich Jul 26 '19 at 22:20
  • I think the ln && rm (steps 2 + 3) in your example can be replaced by simply mv. – marcelm Jul 26 '19 at 23:06
  • @marcelm mv will overwrite an existing file B. mv -n will not notify that it has failed. ln(1) (link(2)) will fail if B already exists. – Evan Benn Jul 27 '19 at 23:03
  • @EvanBenn Good point! I should have read your requirements better. (I tend to need atomic updates of an existing target, and I was replying with that in mind.) – marcelm Jul 31 '19 at 20:25

7 Answers

12

rsync does this job. A temporary file is created with O_EXCL by default (this is only disabled if you use --inplace) and then renamed over the target file. Use --ignore-existing to not overwrite B if it exists.

In practice, I never experienced any problems with this on ext4, zfs or even NFS mounts.
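As a sketch (flag names per the rsync man page; not verified against every rsync version), the approach described above can come down to a single invocation:

```shell
#!/bin/sh
# rsync copies to a temporary file first, then rename(2)s it over the
# destination; --ignore-existing skips the transfer if B already exists.
rsync --ignore-existing A B
```

Note that, as discussed in the comments below, rsync exits successfully even when --ignore-existing skips the transfer, so a pre-existing B is not reported as an error.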

Hermann
  • rsync probably does do this nicely, but the extremely complicated man page does scare me. options implying other options, being incompatible with each other etc. – Evan Benn Jul 26 '19 at 10:26
  • Rsync does not help with requirement #3, as far as I can tell. Still, it's a fantastic tool, and you should not shy away from a bit of man-page reading. You can also try either https://github.com/tldr-pages/tldr/blob/master/pages/common/rsync.md or http://cheat.sh/rsync. (tldr and cheat are two different projects that aim to help with the problem you stated, viz., "man page is TL;DR"; lots of common commands are supported, and you will see the most common usages shown.) –  Jul 26 '19 at 13:19
  • @EvanBenn rsync is an amazing tool and well worth learning! It's man page is complicated because it is so versatile. Don't be intimidated :) – Josh Jul 26 '19 at 13:35
  • @sitaram, #3 could be resolved with a pid file. A small script like in the answer here. – Robert Riedl Jul 26 '19 at 15:30
  • If you invoke multiple instances of rsync concurrently, the transfer will be executed more than once, but the final result will be valid (the last one to finish wins). – Hermann Jul 26 '19 at 16:31
  • This is the best answer. Rsync is the industry standard go-to for atomic file transfers, and in various configurations can satisfy all of your requirements. – wakey Jul 26 '19 at 18:57
  • rsync --ignore-existing A B seems to fit the bill, it doesn't indicate if B already exists though. Any way to make that happen? – Evan Benn Jul 26 '19 at 22:09
  • @RobertRiedl yes but they appear racy. I think the last example in man flock (on most Linuxes?), which is to add [ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -en "$0" "$0" "$@" || : to the top of the script, would probably work better. In fact, with that addition, and properly chosen rsync options, I do believe the OP's question is indeed solved. Haven't tested though. –  Jul 29 '19 at 13:12
4

Don't worry, noclobber is a standard feature.
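For example, this variant of the script from the question runs under plain /bin/sh; set -C is the portable spelling of noclobber, so the redirection fails if B.part already exists (a minimal sketch of the same approach, nothing more):

```shell
#!/bin/sh
# set -C is POSIX noclobber: the '>' redirection fails if B.part exists.
set -C
cat A > B.part &&
ln B.part B &&
rm B.part
```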

ilkkachu
4

You asked about NFS. This kind of code is likely to break under NFS, since the check for noclobber involves two separate NFS operations (check if file exists, create new file) and two processes from two separate NFS clients may get into a race condition where both of them succeed (both verify that B.part does not exist yet, then both proceed to successfully create it, as a result they're overwriting each other.)

There's not really a way to do a generic check for whether the filesystem you're writing to will atomically support something like noclobber. You could check the filesystem type to see whether it's NFS, but that would be a heuristic and not necessarily a guarantee. Filesystems like SMB/CIFS (Samba) are likely to suffer from the same problems. Filesystems exposed through FUSE may or may not behave correctly, depending mostly on the implementation.
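A sketch of that heuristic check (this assumes GNU stat, whose -f/%T format prints the filesystem type name; the list of "risky" type names is illustrative, not exhaustive):

```shell
#!/bin/sh
# Print the filesystem type of the target directory and warn if it is
# a network filesystem, where O_EXCL-style checks may not be atomic.
fstype=$(stat -f -c %T .)
case $fstype in
    nfs*|cifs|smb*|fuse*)
        echo "warning: $fstype may not support atomic noclobber" >&2
        ;;
esac
```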


A possibly better approach is to avoid the collision in the B.part step, by using a unique filename (through cooperation with other agents) so that you don't need to depend on noclobber. For instance, you could include, as part of the filename, your hostname, PID and a timestamp (+possibly a random number.) Since there should be a single process running under a specific PID at a host at any given time, this should guarantee uniqueness.

So either one of:

test -f B && exit 0   # skip already existing
unique=$(hostname).$$.$(date +%s).$RANDOM
cp A B.part."$unique"
# Maybe check for existence of B again, remove
# the temporary file and bail out in that case.
mv B.part."$unique" B
# mv (rename) should always succeed, overwriting a
# previously copied B if one exists.

Or:

test -f B && exit 0   # skip already existing
unique=$(hostname).$$.$(date +%s).$RANDOM
cp A B.part."$unique"
if ln B.part."$unique" B ; then
    echo "Success creating B"
else
    echo "Failed creating B, already existed"
fi
# Both cases require cleanup.
rm B.part."$unique"

So if you have a race condition between two agents, they will both proceed with the operation, but the last operation will be atomic, so either B exists with a full copy of A, or B doesn't exist.

You can reduce the size of the race window by checking again after the copy and before the mv or ln operation, but a small race condition remains. Regardless of the race condition, though, the contents of B should be consistent, assuming both processes are trying to create it from A (or to copy from a valid file as the origin).

Note that in the first situation with mv, when a race exists, the last process is the one who wins, since rename(2) will atomically replace an existing file:

If newpath already exists, it will be atomically replaced, so that there is no point at which another process attempting to access newpath will find it missing. [...]

If newpath exists but the operation fails for some reason, rename() guarantees to leave an instance of newpath in place.

So it's quite possible that processes consuming B at the time will see different versions of it (different inodes) during this process. If the writers are all just trying to copy the same contents, and the readers are simply consuming the contents of the file, that might be fine: if they get different inodes for files with the same contents, they'll be happy just the same.

The second approach using a hard link looks better, but I recall experimenting with hard links in a tight loop on NFS from many concurrent clients, counting successes, and there still seemed to be some race conditions: when two clients issued a hardlink operation at the same time with the same destination, both sometimes appeared to succeed. (It is possible that this behavior was related to the particular NFS server implementation, YMMV.) In any case, that's probably the same kind of race condition, where you might end up with two separate inodes for the same file under heavy concurrency between writers. If your writers are consistent (both copying A to B), and your readers only consume the contents, that might be enough.

Finally, you mentioned locking. Unfortunately locking is severely lacking, at least in NFSv3 (I'm not sure about NFSv4, but I'd bet it's not good either). If you're considering locking, you should look into different protocols for distributed locking, possibly out of band with the actual file copies, but that's disruptive, complex, and prone to issues such as deadlocks, so I'd say it's better avoided.


For more background on the subject of atomicity on NFS, you might want to read about the Maildir mailbox format, which was created to avoid locks and work reliably even on NFS. It does so by using unique filenames everywhere (so you don't even get a final B at the end.)

Perhaps more interesting for your particular case, the Maildir++ format extends Maildir to add support for mailbox quotas, and does so by atomically updating a file with a fixed name inside the mailbox (so that might be closer to your B.) I think Maildir++ tries to append, which is not really safe on NFS, but there's a recalculation approach which uses a procedure similar to this one and is valid as an atomic replace.

Hopefully all these pointers will be useful!

filbranden
2

You can write a program for this.

Use open(O_CREAT|O_RDWR) to open the target file, then read all the bytes and metadata to check whether the target file is complete. If it is not, there are two possibilities:

  1. An incomplete write;

  2. Another process is running the same program.

Try to acquire an open file description lock on the target file.

Failure means there's a concurrent process; the current process should exit.

Success means the last writer crashed; you should start over, or try to fix it by writing to the file.

Also note that you should fsync() after writing to the target file, before you close the file and release the lock, or other processes might read data that is not yet on disk.

https://www.gnu.org/software/libc/manual/html_node/Open-File-Description-Locks.html

This is important to help you distinguish between a concurrently running program and a previously crashed operation.
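Shell cannot take open file description locks directly, but the overall flow can be approximated with flock(1)'s whole-file advisory lock; a rough sketch only (it assumes util-linux flock, the semantics differ from OFD locks, and, as noted elsewhere on this page, advisory locks are unreliable on NFS):

```shell
#!/bin/sh
# Open (or create) the target on fd 9 and take an exclusive advisory
# lock on it; a second concurrent instance fails instead of racing.
exec 9>> B
if ! flock -n 9; then
    echo "concurrent copy in progress, exiting" >&2
    exit 1
fi
cp A B      # lock held: safe to (re)write the target
sync        # flush before the lock is released on exit
```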

  • Thanks for the info, I am interested to implement this myself and will give it a go. I am surprised it doesn't already exist as part of some coreutils / similar package! – Evan Benn Jul 26 '19 at 06:48
  • This approach can't meet the no partial or corrupt file B left in place on crash requirement. It really is best to use the standard approach of copying the file to a temporary name, then moving it into place: the move can be atomic, which copying cannot be. – reinierpost Jul 26 '19 at 15:30
  • @reinierpost If there's a crash before the data is fully copied, partially copied data will be left no matter what, but my approach will detect this and fix it. Moving a file can't be atomic: any data written to disk across a physical sector boundary will not be atomic, but software (e.g. the OS filesystem driver, or this approach) can fix it (if read-write) or report a consistent state (if read-only), as mentioned in the comment section of the question. Also, the question is about copying, not moving. – 炸鱼薯条德里克 Jul 26 '19 at 15:56
  • I also saw O_TMPFILE, which would probably help. (and if not available on the FS, should cause an error) – Evan Benn Jul 26 '19 at 21:39
  • @Evan Have you read the documentation, and have you thought about why O_TMPFILE relies on filesystem support? – 炸鱼薯条德里克 Jul 26 '19 at 22:11
  • I don't understand your question; man 2 open lists which filesystems O_TMPFILE works on. – Evan Benn Jul 26 '19 at 22:20
  • @炸鱼薯条德里克 Moving can be atomic, it's an edit operation on a directory. With the copy-then-move approach, it doesn't hurt if the copying fails halfway through, just resume or retry until you succeed. – reinierpost Jul 28 '19 at 22:55
0

You will get the correct result by doing a cp together with a mv. This will either replace "B" with a fresh copy of "A", or leave "B" as it was before.

cp A B.tmp && mv B.tmp B

Update to accommodate an existing B:

cp A B.tmp && if [ ! -e B ]; then mv B.tmp B; else rm B.tmp; fi

This isn't 100% atomic, but it gets close. There's a race condition where two of these things are running, both enter the if test at the same time, both see that B does not exist, then both execute the mv.

Kaan
  • mv B.tmp B will overwrite a pre-existing B. cp A B.tmp will overwrite a pre-existing B.tmp, both failures. – Evan Benn Jul 26 '19 at 21:38
  • mv B.tmp B will not run unless cp A B.tmp first runs and returns a success result code. how is that a failure? also, I agree that cp A B.tmp would overwrite an existing B.tmp which is what you want to do. The && guarantees that the 2nd command will run if and only if the first one completes normally. – Kaan Jul 26 '19 at 21:45
  • In the question success is defined as not overwriting pre-existing file B. Using B.tmp is one mechanism, but also must not overwrite any pre-existing file. – Evan Benn Jul 26 '19 at 21:48
  • I updated my answer. Ultimately if you need fully 100% atomicity when files may or may not exist, and multiple threads, you need a single exclusive lock somewhere (create a special file, or use a database, or...) that everyone follows as part of the copy/move process. – Kaan Jul 26 '19 at 22:04
  • This update still overwrites B.tmp, and has a race condition between the test and the mv. Yes the point is to do things correctly not roughly maybe good enough hopefully. Other answers show why locks and databases are not needed. – Evan Benn Jul 26 '19 at 22:23
  • Other answers show why locks and databases are not needed – the accepted answer doesn't solve it fully for multiple processes, one of the comments says to use a pid file – that's an exclusive lock. Also, the rsync answer you accepted states "a temporary file is O_EXCL created" - have you verified behavior with multiple processes and a single shared tmp filename? Anyway, good luck. – Kaan Jul 28 '19 at 21:28
  • yes, I have inspected the source of rsync, and run using strace and ltrace. I have tested it concurrently. – Evan Benn Jul 28 '19 at 23:33
0

You can accomplish this by creating a proper tempfile in the target directory, copying over that tempfile, and then linking the tempfile to the target like you were doing in the question.

This only relies on linkat(2) being atomic for the destination filesystem.

#!/bin/sh

ddir=$(dirname "$2")
tmpfile=$(mktemp --tmpdir="$ddir")
cp "$1" "$tmpfile" &&
ln "$tmpfile" "$2"
ret=$?
rm "$tmpfile"
exit $ret

Dev
-1

I think rsync is the appropriate tool to use.

You could use rsync -Pah --checksum /path/from/source /destination/path (note: adding rsync's -n flag would turn this into a dry run that copies nothing).

However, be careful if the files you have are very large...