12

Due to an application bug as yet undiagnosed, I have several hundred servers with a full disk. There is one file that has been filled up with duplicate lines—not a log file, but a user environment file with variable definitions (so I can't just delete the file).

I wrote a simple sed command to check for the erroneously added lines and delete them, and tested it on a local copy of the file. It worked as intended.

However, when I tried it on the server with the full disk, I got approximately the following error (it's from memory, not copy and paste):

sed: couldn't flush /path/to/file/sed8923ABC: No space left on device

Of course, I know there's no space left. That's why I'm trying to delete stuff! (The sed command I'm using will reduce a 4000+ line file to about 90 lines.)

My sed command is just sed -i '/myregex/d' /path/to/file/filename

Is there a way I can apply this command despite the full disk?

(It must be automated, since I need to apply it to several hundred servers as a quick-fix.)

(Obviously the application bug needs to be diagnosed, but in the meantime the servers aren't working correctly....)


Update: The situation I faced was resolved by deleting something else that I found out I could delete, but I'd still like the answer to this question, which would be helpful in the future and for other people.

/tmp is a no-go; it's on the same filesystem.

Before I freed up disk space, I did test and find out that I could delete the lines in vi by opening the file and running :g/myregex/d and then successfully save the changes with :wq. It seems it should be possible to automate this, without resorting to a separate filesystem to hold a temp file.... (?)

Wildcard
  • 36,499
  • Related: http://unix.stackexchange.com/q/75889/135943 – Wildcard Dec 22 '15 at 20:09
  • For the astute readers wondering how I'm using a sed regex to check for duplicate lines: Good spotting; I'm really not checking for duplicate lines. The lines that should stay in the file all use double quotes around the values; the lines that should be deleted all use single quotes. – Wildcard Dec 22 '15 at 20:13
  • sponge of moreutils fame might be able to schlep the data off to /tmp or perhaps a memory filesystem as a workaround to the partition being full. – thrig Dec 22 '15 at 20:16
  • 1
    sed -i creates a temporary copy to operate on. I suspect that ed would be better for this, though I'm not familiar enough with it to prescribe an actual solution – Eric Renouf Dec 22 '15 at 20:29
  • 2
    With ed you'd run: printf %s\\n g/myregex/d w q | ed -s infile but keep in mind some implementations also use temporary files just like sed (you could try busybox ed - afaik it doesn't create a temporary file) – don_crissti Dec 22 '15 at 21:30
  • your vi success was probably only a success because you had the memory to handle it. a similar thing might be done with sed like: sed 'H;1h;$!d;x;P' <file | { read v&& sed "$script" >file; } – mikeserv Dec 23 '15 at 02:54
  • @mikeserv, interesting point that it is only sufficient memory that allowed me to do that...so then (except for trailing newlines which would be stripped) I could probably have done it with echo "$(sed '/myregex/d' file)" > file? – Wildcard Dec 23 '15 at 03:06
  • 1
    @Wildcard - not reliably w/ echo. use printf. and make sed append some char you drop at the last line so you can avoid losing trailing blanks. also, your shell needs to be able to handle the whole file in a single command-line. that's your risk - test first. bash is especially bad at that (i think its to do w/ stack space?) and may sick up on you at any time. the two sed's i recommended would at least use the kernel's pipe buffer to good effect between them, but the method is fairly similar. your command sub thing will also truncate file whether or not the sed w/in is successful. – mikeserv Dec 23 '15 at 03:10
  • 1
    @Wildcard - try sed '/regex/!H;$!d;x' <file|{ read v && cat >file;} and if it works read the rest of my answer. – mikeserv Dec 23 '15 at 05:52

8 Answers

11

The -i option doesn't really overwrite the original file. It creates a new file with the output, then renames it to the original filename. Since you don't have room on the filesystem for this new file, it fails.

You'll need to do that yourself in your script, but create the new file on a different filesystem.

Also, if you're just deleting lines that match a regexp, you can use grep instead of sed.

grep -v 'myregex' /path/to/filename > /tmp/filename && mv /tmp/filename /path/to/filename

In general, it's rarely possible for programs to use the same file as input and output -- as soon as it starts writing to the file, the part of the program that's reading from the file will no longer see the original contents. So it either has to copy the original file somewhere first, or write to a new file and rename it when it's done.

If you don't want to use a temporary file, you could try caching the file contents in memory:

file=$(< /path/to/filename)
echo "$file" | grep -v 'myregex' > /path/to/filename
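A hardened sketch of the same idea (my elaboration, not part of the original answer): guard each step with && so a failed read never truncates the file, and prefer printf over echo. A self-contained demo on a throwaway file:

```shell
# Demo: buffer the whole file in a variable, then overwrite it in place.
# Assumption: the file fits comfortably in memory; names are illustrative.
f=$(mktemp)
printf '%s\n' 'KEEP="a"' "DROP='b'" 'KEEP="c"' > "$f"

contents=$(cat "$f") &&                          # read everything first...
printf '%s\n' "$contents" | grep -v "='" > "$f"  # ...only then overwrite

cat "$f"    # the single-quoted line is gone
```

Note that command substitution strips trailing newlines, so a file that ends in blank lines would lose them; the comments below discuss workarounds.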
Barmar
  • 9,927
  • 1
    Does it preserve permissions, ownership and timestamps? Maybe rsync -a --no-owner --no-group --remove-source-files "$backupfile" "$destination" from here – Hastur Dec 23 '15 at 01:50
  • @Hastur - do you mean to imply that sed -i does preserve that stuff? – mikeserv Dec 23 '15 at 04:01
  • 3
    @Hastur sed -i doesn't preserve any of those things. I just tried it with a file I don't own, but located in a directory that I do own, and it let me replace the file. The replacement is owned by me, not the original owner. – Barmar Dec 23 '15 at 05:11
  • @Barmar - that's what i thought. sed -i or perl -i are both seriously insecure and I've always considered their popularity confusing. actually writing over the file is the only sure way to do it. creating a new file and moving it over the old results in a new file. – mikeserv Dec 23 '15 at 05:12
  • What about echo "$(cat FILE)" | grep '^"' > FILE? I'm guessing that would capture FILE in RAM before renewing it. – Ralph Rönnquist Dec 23 '15 at 06:52
  • @RalphRönnquist - maybe - if cat can open FILE and if the shell can handle the length of the resulting command... Probably not, though, if the shell sets up the pipeline starting at the right side, or if the subshell spawned on the right-side winds up coming around sooner than the one opened on the left. In either of those cases (which are fairly likely to occur) the subshell on the right side truncates FILE before the one on the left opens it and reads it, or perhaps it truncates it while the command sub reads it. See my answer here for how to overwrite a file in place. – mikeserv Dec 23 '15 at 08:30
  • 1
    @RalphRönnquist To be sure, you'd need to do it in two steps: var=$(< FILE); echo "$FILE" | grep '^"' > FILE – Barmar Dec 23 '15 at 15:18
  • @Barmar - how is that sure? you don't test anything. – mikeserv Dec 23 '15 at 18:07
  • @mikeserv When commands are separated by a semicolon, the first one completes before the second one begins. So there can't be any interference. I don't need to test this to know it's true. – Barmar Dec 23 '15 at 18:13
  • i know how it works - but you dont test anything - it could be an empty variable. you dont know if it worked - you just echo. – mikeserv Dec 23 '15 at 18:14
  • Why would it be an empty variable? I just assigned it from the output of a command that I know works. – Barmar Dec 23 '15 at 18:15
  • Just noticed a typo, I meant echo "$var". I got it right in my edit of the answer. – Barmar Dec 23 '15 at 18:16
  • 2
    @Barmar - you don't know it works - you don't even know you've successfully opened input. The very least you could do is v=$(<file)&& printf %s\\n "$v" >file but you don't even use &&. The asker's talking about running it in a script - automating overwriting a file with a portion of itself. you ought at least to validate you can successfully open input and output. Also, the shell might explode. – mikeserv Dec 25 '15 at 04:59
  • This is a very good answer, actually; it hadn't occurred to me to place it in a variable. Also, @mikeserv is right: for automating this, I would definitely not run it without &&. – Wildcard Dec 29 '15 at 09:27
4

That's how sed works. When used with -i (in-place edit), sed creates a temporary file with the new contents of the processed file. When it's finished, sed replaces the current working file with the temporary one. The utility does not truly edit the file in place. That's exactly the behavior of every editor.

It's like you perform the following task in a shell:

sed 'whatever' file >tmp_file
mv tmp_file file

At this point, sed tries to flush the buffered data to the file mentioned in the error message by calling fflush():

For output streams, fflush() forces a write of all user-space buffered data for the given output or update stream via the stream's underlying write function.


For your problem, I see a solution in mounting a separate filesystem (for instance a tmpfs, if you have enough memory, or an external storage device), moving some files there, processing them there, and moving them back.
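Spelled out as a guarded sequence (a sketch; here a mktemp -d directory stands in for the separately mounted filesystem, and the final cat > overwrites in place so the original inode and permissions survive):

```shell
# Sketch: process the file on another filesystem, then copy the result back.
scratch=$(mktemp -d)   # stand-in for a tmpfs or other non-full filesystem
f=$(mktemp)
printf '%s\n' 'A="1"' "B='2'" 'C="3"' > "$f"

cp "$f" "$scratch/work" &&
sed "/'/d" "$scratch/work" > "$scratch/clean" &&   # drop single-quoted lines
cat "$scratch/clean" > "$f" &&                     # overwrite, no rename
rm -rf "$scratch"

cat "$f"    # only the double-quoted lines remain
```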

chaos
  • 48,171
3

Since posting this question I've learned that ex is a POSIX-compliant program. It's almost universally symlinked to vim, but either way, the following is (I think) a key point about ex in relation to filesystems (taken from the POSIX specification):

This section uses the term edit buffer to describe the current working text. No specific implementation is implied by this term. All editing changes are performed on the edit buffer, and no changes to it shall affect any file until an editor command writes the file.

"...shall affect any file..." I believe that putting something on the filesystem (at all, even a temp file) would count as "affecting any file." Maybe?*

Careful study of the POSIX specification for ex reveals some "gotchas" about its intended portable use, compared to common scripted uses of ex found online (which are littered with vim-specific commands).

  1. Implementing +cmd is optional according to POSIX.
  2. Allowing multiple -c options is also optional.
  3. The global command :g "eats" everything up to the next non-escaped newline (and therefore runs it after each match found for the regex rather than once at the end). So -c 'g/regex/d | x' only deletes one instance and then exits the editor.

So according to what I've researched, the POSIX-compliant method for in-place editing a file on a full filesystem to delete all lines matching a specific regex, is:

ex -sc 'g/myregex/d
x' /path/to/file/filename

This should work provided you have sufficient memory to load the file into a buffer.

*If you find anything which indicates otherwise, please, mention it in the comments.

Wildcard
  • 36,499
  • 2
    but ex writes to tmpfiles... always. its spec'd to write its buffers to disk periodically. there are even spec'd commands for locating the tmp file buffers on disk. – mikeserv Jan 07 '16 at 10:25
  • @Wildcard Thanks for sharing, I've linked back to a similar post at SO. I assume ex +g/match/d -scx file is POSIX-compliant as well? – kenorb Jan 07 '16 at 10:34
  • @kenorb, not quite, according to my reading of the specs—see my point 1 in the answer above. Exact quote from POSIX is "The ex utility shall conform to XBD Utility Syntax Guidelines, except for the unspecified usage of '-', and that '+' may be recognized as an option delimiter as well as '-'." – Wildcard Jan 07 '16 at 10:46
  • 1
    I can’t prove it, except by appeal to common sense, but I believe that you’re reading more into that statement from the specification than is really there.  I suggest that the safer interpretation is that no changes to the edit buffer shall affect any file that existed before the edit session began, or that the user named.  See also my comments on my answer. – G-Man Says 'Reinstate Monica' Jan 31 '16 at 23:08
  • @G-Man, I actually think you're right; my initial interpretation was probably wishful thinking. However, since editing the file in vi worked on a full filesystem, I believe that in most cases it would work with ex as well—though maybe not for a ginormous file. sed -i doesn't work on a full filesystem regardless of filesize. – Wildcard Feb 01 '16 at 07:00
2

Use the pipe, Luke!

Read file | filter | write back

sed 's/PATTERN//' BIGFILE | dd of=BIGFILE conv=notrunc

In this case sed doesn't create a new file; it just sends its output through the pipe to dd, which opens the same file for writing. Of course, in this particular case you can use grep instead:

grep -v 'PATTERN' BIGFILE | dd of=BIGFILE conv=notrunc

then truncate the remaining bytes:

dd if=/dev/null of=BIGFILE seek=1 bs=BYTES_OF_SED_OUTPUT
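A worked, self-contained version of those three steps (my sketch, not from the answer): the surviving byte count must be captured before the rewrite, so the filter runs twice, and piping back into the same file is only safe here because the small file fits entirely in the pipe buffer before dd starts writing over it.

```shell
# Demo: filter | dd conv=notrunc, then truncate the leftover tail.
BIGFILE=$(mktemp)
printf '%s\n' alpha PATTERN-line bravo > "$BIGFILE"

n=$(($(grep -v 'PATTERN' "$BIGFILE" | wc -c)))            # bytes that survive
grep -v 'PATTERN' "$BIGFILE" | dd of="$BIGFILE" conv=notrunc 2>/dev/null
dd if=/dev/null of="$BIGFILE" seek=1 bs="$n" 2>/dev/null  # truncate at byte $n

cat "$BIGFILE"    # alpha and bravo remain
```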
1

You can truncate the file quite easily if you can get the byte count to your offset, and if the lines to be deleted run from some start point through to the end of the file.

o=$(sed -ne'/regex/q;p' <file|wc -c)
dd if=/dev/null of=file bs="$o" seek=1

Or else if your ${TMPDIR:-/tmp} is on some other file system perhaps:

{   cut -c2- | sed "$script" >file
} <file <<FILE
$(paste /dev/null -)
FILE

Because (most) shells put their here-documents in a deleted temp file there. It is perfectly safe so long as the <<FILE descriptor is maintained from start to finish and ${TMPDIR:-/tmp} has as much space as you need.

Shells which don't use temp files use pipes instead, and so are not safe to use this way. These are typically ash derivatives like busybox sh, dash, and BSD sh; zsh, bash, ksh, and the Bourne shell, however, all use temp files.

apparently I wrote a little shell program last July to do something very like this


If /tmp is not viable, then so long as you can fit the file in memory something like...

sed 'H;$!d;x' <file | { read v &&
sed "$script" >file;}

...as a general case would at least ensure that the file was fully buffered by the first sed process before attempting to truncate the in/out file.

A more targeted - and efficient - solution could be:

sed '/regex/!H;$!d;x' <file|{ read v && cat >file;}

...because it wouldn't bother buffering lines you meant to delete anyway.
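A quick self-contained check of that targeted pipeline (my demo; BADline stands in for the real regex). The read v succeeding is what proves the first sed buffered everything before the >file redirection truncates it:

```shell
# Demo: buffer the kept lines in sed's hold space, then overwrite the file.
f=$(mktemp)
printf '%s\n' keep1 BADline keep2 > "$f"

# The held lines are emitted with a leading blank line, which read consumes.
sed '/BADline/!H;$!d;x' <"$f" | { read v && cat >"$f"; }

cat "$f"    # keep1 and keep2 remain
```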

A test of the general case:

{   nums=/tmp/nums
    seq 1000000 >$nums
    ls -lh "$nums"
    wc -l  "$nums"
    sed 'H;$!d;x' <$nums | { read script &&  ### read always gets a blank
    sed "$script" >$nums;}
    wc -l  "$nums"
    ls -lh "$nums"
}

-rw-r--r-- 1 mikeserv mikeserv 6.6M Dec 22 20:26 /tmp/nums
1000000 /tmp/nums
1000000 /tmp/nums
-rw-r--r-- 1 mikeserv mikeserv 6.6M Dec 22 20:26 /tmp/nums
mikeserv
  • 58,310
  • I confess I hadn't read your answer in detail before, because it starts with unworkable (for me) solutions that involve byte count (different amongst each of the many servers) and /tmp which is on the same filesystem. I like your dual sed version. I think a combination of Barmar's and your answer would probably be best, something like: myvar="$(sed '/myregex/d' < file)" && [ -n "$myvar" ] && echo "$myvar" > file ; unset myvar (For this case I don't care about preserving trailing newlines.) – Wildcard Dec 29 '15 at 09:32
  • 2
    @Wildcard - that could be. but you shouldnt use the shell like a database. the sed | cat thing above never opens output unless sed has already buffered the entire file and is ready to start writing all of it to output. If it tries to buffer the file and fails - read is not successful because finds EOF on the | pipe before it reads its first newline and so cat >out never happens until its time to write it out from memory entirely. an overflow or anything like it just fails. also the whole pipeline returns success or failure every time. storing it in a var is just more risky. – mikeserv Dec 29 '15 at 09:37
  • @Wildcard - if i really wanted it in a variable too, i think id do it like: file=$(sed '/regex/!H;$!d;x' <file | read v && tee file) && cmp - file <<<"$file" || shite so the output file and the var would be written simultaneously, which would make either or an effective backup, which is the only reason you'd wanna complicate things further than you'd need to. – mikeserv Dec 29 '15 at 09:51
  • @mikeserv: I am dealing the same problem as the OP now and I find your solution really useful. But I don't understand the usage of read script and read v in your answer. If you can elaborate more about it I will be much appreciated, thanks! – sylye Sep 26 '16 at 08:01
  • 1
    @sylye - $script is the sed script you would use to target whatever portion of your file you wanted; its the script that gets you the end result that you want in stream. v is just a placeholder for an empty line. in a bash shell it is not necessary because bash will automatically use the $REPLY shell variable in its stead if you dont specify one, but POSIXly you should always do so. im glad you find it useful, by the way. good luck with it. im mikeserv@gmail if you need anything in depth. i should have a computer again in a few days – mikeserv Sep 30 '16 at 11:54
1

As noted in other answers, sed -i works by copying the file to a new file in the same directory, making changes in the process, and then moving the new file over the original.  That's why it doesn't work.  ed (the original line editor) works in a somewhat similar manner, but, last time I checked, it uses /tmp for the scratch file.  If your /tmp is on a different filesystem from the one that's full, ed may do the job for you.

Try this (at your interactive shell prompt):

$ ed /path/to/file/filename
P
g/myregex/d
w
q

The P (which is a capital P) is not strictly necessary.  It turns on prompting; without it, you're working in the dark, and some people find this disconcerting.  The w and q are write and quit.

ed is notorious for cryptic diagnostics.  If at any point it displays anything other than the prompt (which is *) or something that is clearly a confirmation of successful operation (especially if it contains a ?), do not write the file (with w).  Just quit (q).  If it doesn't let you out, try typing q again.

If your /tmp directory is on the filesystem that is full (or if its filesystem is full, also), try to find some space somewhere.  chaos mentioned mounting a tmpfs or an external storage device (e.g., a flash drive); but, if you have multiple filesystems, and they are not all full, you can simply use one of the other existing ones.  chaos suggests copying the file(s) to the other filesystem, editing them there (with sed), and then copying them back.  At this point, that may be the simplest solution.  But an alternative would be to create a writable directory on a filesystem that has some free space, set environment variable TMPDIR to point to that directory, and then run ed.  (Disclosure: I'm not sure whether this will work, but it can't hurt.)

Once you get ed working, you can automate this by doing

ed filename << EOF
g/myregex/d
w
q
EOF

in a script.  Or printf '%s\n' 'g/myregex/d' w q | ed -s filename, as suggested by don_crissti.

1

This answer borrows ideas from this other answer and this other answer but builds on them, creating an answer that is more generally applicable:

num_bytes=$(sed '/myregex/d' /path/to/file/filename | wc -c)
sed '/myregex/d' /path/to/file/filename 1<> /path/to/file/filename
dd if=/dev/null of=/path/to/file/filename bs="$num_bytes" seek=1

The first line runs the sed command with output written to standard output (and not to a file); specifically, to a pipe to wc to count the characters.  The second line also runs the sed command with output written to standard output, which, in this case is redirected to the input file in read/write overwrite (no truncate) mode, which is discussed here.  This is a somewhat dangerous thing to do; it is safe only when the filter command never increases the amount of data (text); i.e., for every n bytes that it reads, it writes n or fewer bytes.  This is, of course, true for the sed '/myregex/d' command; for every line that it reads, it writes the exact same line, or nothing.  (Other examples: s/foo/fu/ or s/foo/bar/ would be safe, but s/fu/foo/ and s/foo/foobar/ would not.)

For example:

$ cat filename
It was
a dark and stormy night.
$ sed '/was/d' filename 1<> filename
$ cat filename
a dark and stormy night.
night.

because these 32 bytes of data:

I  t     w  a  s \n  a     d  a  r  k     a  n  d     s  t  o  r  m  y     n  i  g  h  t  . \n

got overwritten with these 25 characters:

a     d  a  r  k     a  n  d     s  t  o  r  m  y     n  i  g  h  t  . \n

leaving the seven bytes night.\n left over at the end.

Finally, the dd command seeks to the end of the new, scrubbed data (byte 25 in this example) and removes the rest of the file; i.e., it truncates the file at that point.


If, for any reason, the 1<> trick doesn’t work, you can do

sed '/myregex/d' /path/to/file/filename | dd of=/path/to/file/filename conv=notrunc

Also, note that, as long as all you’re doing is removing lines, all you need is grep -v myregex (as pointed out by Barmar).

0

I can imagine that, even at that moment, you had some free RAM. You could create a tmpfs and work on the file there, using sed.

I assume /path/to/file/filename is not big (it's just a user environment file).

mkdir /mnt/ramdisk
mount -t tmpfs -o size=1m tmpfs /mnt/ramdisk

Test it with df -h /mnt/ramdisk or mount, or check the exit status:

echo $?
0

df -h /mnt/ramdisk
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           1.0M     0  1.0M   0% /mnt/ramdisk

Now you can move or copy the file to /mnt/ramdisk, do your stuff there, then replace the original.

More about tmpfs or ramfs on jamescoyle site.

Jeff Schaller
  • 67,283