
I want to write a shell script that accepts a URL and an output markdown file, and appends that URL plus some metadata to the end of that file. It is possible that this script is invoked concurrently, resulting in concurrent echo $some_multiline_thing >> file calls.

Per this question, this can result in corrupt data being written to the file. How do I synchronize the writes so that each append happens atomically? (The order of the appends doesn't matter to me.)
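For illustration, this is roughly the kind of unsynchronized script I mean (the script name and entry format here are made up, not my actual code):

#!/bin/sh
# add-bookmark.sh URL FILE  (illustrative only)
url=$1
file=$2
# Two concurrent invocations can interleave their lines in FILE, since
# the appended text may be split across several write(2) calls.
{
  echo "- $url"
  echo "  added: $(date)"
} >> "$file"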

Update: I found a half-baked solution:

function sync-append() {
    local file="$1"
    local text="$2"

    local lock_fd
    {
        # Open the target file for appending on a new fd and take an
        # exclusive lock on it before writing.
        exec {lock_fd}>>"$file"
        flock -x "$lock_fd"
        echo "$text" >> "$file"
    } always {
        # Release the lock by closing the fd.
        exec {lock_fd}>&-
    }
}

This solution relies on zsh's always block, which may not run (e.g., if the process is killed with kill -9).
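Usage would be something like this (the entry format is just an example):

url='https://example.com/some/page'
sync-append bookmarks.md "- $url
  added: $(date)"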

HappyFace
  • @pLumo I think flock is the answer, but that question does not tell me how to actually use it to accomplish my goal of synchronous appending writes. – HappyFace May 26 '20 at 12:22
  • https://linux.die.net/man/1/lockfile also seems promising. – HappyFace May 26 '20 at 12:39
  • @HappyFace, isn't that linked question just about the same situation? You're appending to a file, and they're not but other than that it's the same, locking a file for just a single concurrent process at a time? – ilkkachu May 26 '20 at 13:25
  • Also, I don't think the kill -9 should matter. The lock is associated with the open file description, and if the process dies, the fd closes and the lock is gone. – ilkkachu May 26 '20 at 13:26
  • @ilkkachu True, we can close this question. – HappyFace May 26 '20 at 13:27
  • @ilkkachu You're right, kill -9 kills the lock as well. Why's that though? Where is the lock living? Shouldn't the lock be in the OS, and not in the file descriptor? Does the OS query all open file descriptors to determine if a file is locked? – HappyFace May 26 '20 at 13:32
  • @ilkkachu https://stackoverflow.com/questions/20070943/do-flock-locks-reset-after-a-system-restart?rq=1 – HappyFace May 26 '20 at 13:42
  • @HappyFace, well, it's just bookkeeping. The system knows what fds a process is holding, and what locks they have. All it needs to do is keep a count of how many there are still left. It's a bit of the same with deleting files, it only happens when all open fds are closed. (That is, the file proper, the inode, is only deleted then. The names can be removed before.) – ilkkachu May 26 '20 at 13:59

2 Answers


Just do:

{
  flock 1 &&
    echo something
} >> "$file"

Locks are gone when the process is gone anyway, so you don't have to worry about kill -s KILL.
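Wrapped into a small script for the question's use case, it could look something like this (the script name and entry format are illustrative, not part of the question):

#!/bin/sh
# append-url.sh URL FILE -- illustrative wrapper around the pattern above
url=$1
file=$2

{
  # fd 1 (stdout) is the file opened for appending by the redirection
  # below; flock(1) takes an exclusive lock on that fd, so concurrent
  # invocations queue up instead of interleaving their output.
  flock 1 &&
    printf '%s\n' "- $url" "  added: $(date)"
} >> "$file"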


Most probably you do not have to care about locking at all. The conditions for that are:

  1. that your system is not a very old Linux (<3.14)
  2. that the amount of data you write from a single process does not exceed 32K (or rather: getconf SSIZE_MAX)

See https://serverfault.com/questions/599486/what-is-the-size-of-an-atomic-write-to-disk-on-my-system
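If those conditions hold, a plain O_APPEND append with no locking would do. A rough sketch, assuming the whole entry stays well under the limit (and keeping in mind the caveat in the comment below, that the shell may still split a large string across several write(2) calls):

# Lock-free append, relying on O_APPEND atomicity for small writes.
# The entry text is illustrative.
entry="- $url
  added: $(date)"
printf '%s\n' "$entry" >> "$file"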

Hauke Laging
  • But a single echo invocation may do more than one write(2) call and you have no control over that. With strace -fe write zsh -c "echo '${(l:10000:)}'" > /dev/null on Debian, I see two writes, one of 8192 bytes, one of 1809. Same with bash, probably down to stdio. – Stéphane Chazelas May 27 '20 at 07:05
  • @StéphaneChazelas Interesting. Nonetheless: When I was searching for that lower limit I assumed it might be as low as 512 bytes. And even that would probably be more than enough for this task. – Hauke Laging May 27 '20 at 08:10