
I want to write a shell script that accepts a URL and an output markdown file, and appends that URL plus some metadata to the end of that file. It is possible that this script is invoked concurrently, resulting in concurrent echo $some_multiline_thing >> file calls.

Per this question, this can result in corrupt data being written to the file. How do I synchronize the writes so that each append happens atomically? (The order of the appends doesn't matter to me.)
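For illustration, this is roughly the kind of unsynchronized script I mean (the script name and entry format here are made up, not my actual code):

#!/bin/sh
# add-bookmark.sh URL FILE  (illustrative only)
url=$1
file=$2
# Two concurrent invocations can interleave their lines in FILE, since
# the appended text may be split across several write(2) calls.
{
  echo "- $url"
  echo "  added: $(date)"
} >> "$file"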

Update: I found a half-baked solution:

function sync-append() {
    local file="$1"
    local text="$2"

    local lock_fd
    {
        # Open the target file for appending on a new fd and take an
        # exclusive lock on it before writing.
        exec {lock_fd}>>"$file"
        flock -x "$lock_fd"
        echo "$text" >> "$file"
    } always {
        # Release the lock by closing the fd.
        exec {lock_fd}>&-
    }
}

This solution relies on zsh's always block, which may not run (e.g., if the process is killed with kill -9).
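Usage would be something like this (the entry format is just an example):

url='https://example.com/some/page'
sync-append bookmarks.md "- $url
  added: $(date)"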

HappyFace
  • @pLumo I think flock is the answer, but that question does not tell me how to actually use it to accomplish my goal of synchronous appending writes. – HappyFace May 26 '20 at 12:22
  • https://linux.die.net/man/1/lockfile also seems promising. – HappyFace May 26 '20 at 12:39
  • @HappyFace, isn't that linked question just about the same situation? You're appending to a file, and they're not but other than that it's the same, locking a file for just a single concurrent process at a time? – ilkkachu May 26 '20 at 13:25
  • Also, I don't think the kill -9 should matter. The lock is associated with the open file description, and if the process dies, the fd closes and the lock is gone. – ilkkachu May 26 '20 at 13:26
  • @ilkkachu True, we can close this question. – HappyFace May 26 '20 at 13:27
  • @ilkkachu You're right, kill -9 kills the lock as well. Why's that though? Where is the lock living? Shouldn't the lock be in the OS, and not in the file descriptor? Does the OS query all open file descriptors to determine if a file is locked? – HappyFace May 26 '20 at 13:32
  • @ilkkachu https://stackoverflow.com/questions/20070943/do-flock-locks-reset-after-a-system-restart?rq=1 – HappyFace May 26 '20 at 13:42
  • @HappyFace, well, it's just bookkeeping. The system knows what fds a process is holding, and what locks they have. All it needs to do is keep a count of how many there are still left. It's a bit of the same with deleting files, it only happens when all open fds are closed. (That is, the file proper, the inode, is only deleted then. The names can be removed before.) – ilkkachu May 26 '20 at 13:59

2 Answers


Just do:

{
  flock 1 &&
    echo something
} >> "$file"

Locks are gone when the process is gone anyway, so you don't have to worry about kill -s KILL.
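Wrapped into a small script for the question's use case, it could look something like this (the script name and entry format are illustrative, not part of the question):

#!/bin/sh
# append-url.sh URL FILE -- illustrative wrapper around the pattern above
url=$1
file=$2

{
  # fd 1 (stdout) is the file opened for appending by the redirection
  # below; flock(1) takes an exclusive lock on that fd, so concurrent
  # invocations queue up instead of interleaving their output.
  flock 1 &&
    printf '%s\n' "- $url" "  added: $(date)"
} >> "$file"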


Most probably you do not have to care about locking at all. The conditions for that are:

  1. that your system is not a very old Linux (<3.14)
  2. that the amount of data you write from a single process does not exceed 32K (or rather: getconf SSIZE_MAX)

See https://serverfault.com/questions/599486/what-is-the-size-of-an-atomic-write-to-disk-on-my-system
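If those conditions hold, a plain O_APPEND append with no locking would do. A rough sketch, assuming the whole entry stays well under the limit (and keeping in mind the caveat in the comment below, that the shell may still split a large string across several write(2) calls):

# Lock-free append, relying on O_APPEND atomicity for small writes.
# The entry text is illustrative.
entry="- $url
  added: $(date)"
printf '%s\n' "$entry" >> "$file"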

Hauke Laging
  • But a single echo invocation may do more than one write(2) call and you have no control over that. With strace -fe write zsh -c "echo '${(l:10000:)}'" > /dev/null on Debian, I see two writes, one of 8192 bytes, one of 1809. Same with bash, probably down to stdio. – Stéphane Chazelas May 27 '20 at 07:05
  • @StéphaneChazelas Interesting. Nonetheless: When I was searching for that lower limit I assumed it might be as low as 512 bytes. And even that would probably be more than enough for this task. – Hauke Laging May 27 '20 at 08:10