5

If I have a shell/Python script that uses sed to modify a file in place based on user input, and two users run the same script at the same time (or approximately the same time), is sed thread-safe? Or is it not an issue because the file descriptor opened by the first process will be used to lock the file anyway? Thanks.

terreys
    Slightly more info wanted. How do you use sed from Python (and why, can't Python do things like that fairly effortlessly?). – Kusalananda Mar 06 '19 at 21:38

2 Answers

13

I'm not going to nitpick on the awful terminology, but yes, GNU sed with its -i ("in-place") flag can safely be used by more than one process at the same time without any extra locking, because sed does not actually modify the file in place: it redirects the output to a temporary file and, if everything goes well, rename(2)s (moves) the temporary file over the original one. rename(2) is guaranteed to be atomic:

$ strace sed -i s/o/e/g foo.txt
open("foo.txt", O_RDONLY)               = 3
...
open("./sedDe80VL", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
...
read(3, "foo\n", 4096)                  = 4
...
write(4, "fee\n", 4)                    = 4
read(3, "", 4096)                       = 0
...
close(3)                                = 0
close(4)                                = 0
rename("./sedDe80VL", "foo.txt")        = 0

At any point, foo.txt will refer either to the complete original file or to the complete processed file, never to something in between the two.
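
If you want the same guarantee in your own script (for example, when the new content is produced by something other than sed), a minimal sketch of that write-then-rename pattern could look like the following; foo.txt is a placeholder name, and this assumes mktemp(1) is available and that mv stays on the same filesystem:

tmp=$(mktemp ./foo.txt.XXXXXX) || exit 1                     # temporary file in the same directory, i.e. same filesystem
sed 's/o/e/g' foo.txt > "$tmp" || { rm -f "$tmp"; exit 1; }  # write the new content to the temporary file
mv "$tmp" foo.txt                                            # a same-filesystem mv is a rename(2), so this step is atomic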

Notes:

This does not handle the case where more than one process starts editing a file without waiting for the other processes to have finished editing it, in which case only the process which finishes last "wins" (ie wipes the changes performed by the other processes). This is not a matter of data integrity, and cannot be handled without higher level coordination between the processes (blindly locking the file will lead to deadlocks).
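
A quick way to observe that lost-update scenario (a sketch only; shared.txt and the patterns are made-up names, and the outcome depends on timing):

printf 'aaa bbb\n' > shared.txt
sed -i 's/aaa/AAA/' shared.txt &     # "user 1"
sed -i 's/bbb/BBB/' shared.txt &     # "user 2"
wait
cat shared.txt                       # never garbled, but depending on timing one of the two edits may be missing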

Currently, GNU sed will copy the standard file permissions into the new inode, but not the ACLs and extended attributes. If using sed -i on such a file, all that extra metadata will be lost. IMHO that's more of a feature than a bug or limitation.

perl -i used to work very differently from sed -i until version 5.28: it would first make a temporary copy of the file, truncate the original file, and redirect the output to it. That preserved the original inode number and extra metadata, but would completely trash the content of the file if the perl -i process was interrupted or if more than one perl -i process was editing the file at the same time. See the discussion, the original commit (which was subsequently improved) and the changelog in perl5280delta.

  • 4
    But what can happen is that two copies of sed start reading the file, make different changes, and store them in their respective temporary files, which are then renamed into place one after the other, without regard for the fact that there was another sed working on the file at the same time. The changes made by one of the sed processes would be lost. – ilkkachu Mar 06 '19 at 22:05
  • 2
    @ilkkachu that will still be completely consistent -- the file will be either modified by both processes in turn (in any order) or by just one of them. At no point will the file contain garbage resulting from both processes modifying it at the same time. –  Mar 06 '19 at 22:15
  • 9
    it won't be total garbage, but having changes lost can still be an issue. The question refers to the file "being locked", and locking would usually refer to the second sed process waiting until the first completed. That might just be awful terminology on their part; it's hard to say what they really care about, but that particular issue is still possible. – ilkkachu Mar 06 '19 at 22:25
  • Good reply, but in my opinion it needs a correction: by definition, sed -i on the same file is not thread-safe, since thread safety would mean handling the concurrency in a reasonable way. When the last one to finish "wins" and overrides the previous one's work, we are clearly looking at a race condition, something that should not happen in thread-safe code. – João Mar 13 '24 at 15:11
-2

It is not advisable to use sed -i on the same file in multiple parallel tasks, but if you have to...

The pattern that makes this work is a semaphore.

The way to allow concurrent calls to sed -i on the same file in a thread-aware way is to use an advisory lock:

flock theSharedFile sed -i s/"$userInput1"/"$userInput2"/g theSharedFile

flock creates an advisory lock on theSharedFile and will execute the sed command with the rest of the arguments when the lock is free (when no other flock is running on theSharedFile).

If two different users trigger it at the same time, the second one will wait until the previous one ends. That way it is a fully functional and thread-safe approach: no user's change will be discarded or overridden.

Since it is an advisory lock, it is only significant for other flock callers and should not cause any kind of deadlock; anyone merely reading the file will never be blocked. Note that if the file is large, the processes waiting for the lock may have to wait a significant amount of time.

You can list the current locks with lslocks.
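
If you need to hold the lock across more than a single command, a variant sketched below (theSharedFile.lock, userInput1 and userInput2 are placeholder names) takes the advisory lock on a dedicated lock file through a file descriptor instead:

(
    flock 9 || exit 1                                    # block until an exclusive advisory lock is held on fd 9
    sed -i "s/$userInput1/$userInput2/g" theSharedFile   # edit while holding the lock
) 9> theSharedFile.lock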

Hope this helps someone out there. Thanks for your time.

João
    The way I read your example, you locked a file called sed in the current directory ... not sure what the rest of the command will do. – tink Mar 13 '24 at 17:28
  • Hello, when typing this back here I missed the first argument of flock in the example. That was not what I implemented in my own situation with a concrete sed -i (an application-level data corruption issue). I have fixed my reply; hope it is now correct and useful for similar situations. Thanks for the feedback. – João Mar 14 '24 at 06:02
  • 1
    Would it not potentially be problematic to use flock with the same file you use with sed? It is more common to use flock with a separate file. After all, you are not locking the file, you are locking the operation. Also, your code needs double quoting of all shell variable expansions. – Kusalananda Mar 14 '24 at 11:28
  • Thanks, that is just an example, not real code. flock is all about controlling access to a shared resource, here the same file. sed on the same file is the not-so-great practice. In my specific case I was adding GNU parallel to an existing large shell script code base; using flock to control concurrent access to the operation worked great. – João Mar 15 '24 at 13:09