87

I am not talking about recovering deleted files, but overwritten files. Namely by the following methods:

# move
mv new_file old_file

# copy
cp new_file old_file

# edit
vi existing_file
> D
> i new_content
> :x

Is it possible to retrieve anything if any of the above three actions is performed assuming no special programs are installed on the linux machine?

  • 4
    You mean apart from your backups? – jasonwryan Aug 09 '14 at 02:03
  • 1
    @jasonwryan, yes, of course. – Question Overflow Aug 09 '14 at 02:03
  • 4
    I just want to point out that your first example (mv) is akin to deleting old_file, not overwriting it, so methods (if any exist) for recovering deleted files, as opposed to overwritten files, would apply in that case. Your other two examples do indeed overwrite an existing old_file and existing_file, respectively. – Celada Aug 09 '14 at 04:40
  • All three examples you provided are implemented by deleting all the original file's data blocks and writing to newly-allocated blocks, and the procedure for recovering that data is the same as recovering a deleted file. An exception might be if the original files are exceedingly short (shorter than 60 bytes on ext4) where the latter two examples likely make the previous data unrecoverable. – Mark Plotnick Aug 09 '14 at 13:46
  • 2
    @MarkPlotnick, according to Celada's comment, mv is different. – Question Overflow Aug 10 '14 at 03:33

7 Answers

102

The answer is "Probably yes, but it depends on the filesystem type, and timing."

None of those three examples will overwrite the physical data blocks of old_file or existing_file, except by chance.

  • mv new_file old_file. This will unlink old_file. If there are additional hard links to old_file, the blocks will remain unchanged and reachable through those remaining links. Otherwise, the blocks will generally (it depends on the filesystem type) be placed on a free list. Then, if the mv requires copying (as opposed to just moving directory entries), new blocks will be allocated as mv writes. (A quick way to verify this behavior yourself is sketched after this list.)

    These newly-allocated blocks may or may not be the same ones that were just freed. On filesystems like UFS, blocks are allocated, if possible, from the same cylinder group as the directory the file was created in. So there's a chance that unlinking a file from a directory and creating a file in that same directory will re-use (and overwrite) some of the same blocks that were just freed. This is why the standard advice to people who accidentally remove a file is to not write any new data to files in their directory tree (and preferably not to the entire filesystem) until someone can attempt file recovery.

  • cp new_file old_file will do the following (you can use strace to see the system calls):

    open("old_file", O_WRONLY|O_TRUNC) = 4

    The O_TRUNC flag will cause all the data blocks to be freed, just like mv did above. And as above, they will generally be added to a free list, and may or may not get reused by the subsequent writes done by the cp command.

  • vi existing_file. If vi is actually vim, the :x command does the following:

    unlink("existing_file~") = -1 ENOENT (No such file or directory)
    rename("existing_file", "existing_file~") = 0
    open("existing_file", O_WRONLY|O_CREAT|O_TRUNC, 0664) = 3

    So it doesn't even remove the old data; the data is preserved in a backup file.

    On FreeBSD, vi does open("existing_file",O_WRONLY|O_CREAT|O_TRUNC, 0664), which will have the same semantics as cp, above.
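
You can verify all of this yourself by running the commands under strace, for example:

strace -e trace=open,openat cp new_file old_file

The hard-link case is also easy to demonstrate. In this sketch (file names are placeholders), the second link keeps the old inode, and therefore its data, alive across the mv:

ln old_file old_file.keep   # second directory entry for the same inode
mv new_file old_file        # replaces only the directory entry "old_file"
cat old_file.keep           # the original data is still intact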


You can recover some or all of the data without special programs; all you need is grep and dd, and access to the raw device.

For small text files, the single grep command in the answer from @Steven D in the question you linked to is the easiest way:

grep -i -a -B100 -A100 'text in the deleted file' /dev/sda1

But for larger files that may be in multiple non-contiguous blocks, I do this:

grep -a -b "text in the deleted file" /dev/sda1
13813610612:this is some text in the deleted file

which will give you the offset in bytes of the matching line. Follow this with a series of dd commands, starting with

dd if=/dev/sda1 count=1 skip=$(expr 13813610612 / 512)

You'd also want to read some blocks before and after that block. On UFS, file blocks are usually 8KB and are usually allocated fairly contiguously, a single file's blocks being interleaved alternately with 8KB blocks from other files or free space. The tail of a file on UFS is up to 7 1KB fragments, which may or may not be contiguous.
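
For example, to carve out a window of blocks around the match, reusing the byte offset from the grep output above (the output path is a placeholder; write to a different filesystem so the recovery itself can't clobber the data):

offset=13813610612
block=$((offset / 512))
# dump 32 blocks (16 KB) before the match plus 32 blocks after it
dd if=/dev/sda1 bs=512 skip=$((block - 32)) count=64 of=/mnt/usb/window.bin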

Of course, on file systems that compress or encrypt data, recovery might not be this straightforward.


There are actually very few utilities in Unix that will overwrite an existing file's data blocks. One that comes to mind is dd conv=notrunc. Another is shred.
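
For example (both of these really do destroy the old contents in place, so only try them on scratch files):

dd if=new_file of=old_file conv=notrunc   # rewrite old_file's existing blocks without truncating
shred -u old_file                         # overwrite with random passes, then unlink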

Mark Plotnick
  • 25,413
  • 6
    Thank you for explaining the inner mechanics of the three different operations. This is really useful! – Question Overflow Aug 16 '14 at 02:04
  • btrfs tends to hold on to deleted files' data for quite a while. It uses blocks in a round-robin fashion, so if you have enough space on the device the file will not be overwritten for a long time. See here – pqnet Aug 17 '14 at 18:09
  • how do I get the preceding text block, and what does skip do? – unixit Apr 27 '15 at 11:05
  • @Islam When you give dd the skip= parameter, then instead of reading from the beginning of the input it will skip that number of blocks. A block is 512 bytes by default, but can be changed with the bs= parameter. – Mark Plotnick Apr 28 '15 at 19:40
  • 1
    @Islam To get the preceding text block, I'd suggest giving a skip= value that's 1 block (512 bytes) less. In my example, $(expr 13813610612 / 512 - 1) . If that doesn't get what you want, try again while subtracting 16 or 32, which will look at the areas that are 8192 and 16384 bytes less; files are often allocated in 8192-byte chunks. If you're trying to recover a larger file, try larger counts to save time. I usually use count=16 and look at the result in an editor like emacs which doesn't mind if some of the data isn't text. – Mark Plotnick Apr 28 '15 at 19:40
  • 1
    Works great!! I used your solution with grep for an overwritten file!! – ropic Aug 11 '15 at 02:18
  • 1
    That grep trick is amazing. You sir, are a savior. – Krzysztof Jabłoński Nov 10 '15 at 16:20
  • with that grep/dd method, what do I do with the resulting data? I lost a tarfile because my downloader misbehaved and downloaded the tarfile from my local FS, saving it over the tarfile on my local FS. It totally corrupted it, so I'm trying to get it back. This is the data: http://pastebin.com/XBVPVCjn – Braden Best Nov 19 '15 at 16:24
  • @B1KMusic Stitching together a nontrivial tar file or other non-text file from fragments might not be easy. As a start, you can often find the beginning of a tar file by grepping for the string ustar. But you may want to try specialty recovery tools such as PhotoRec – Mark Plotnick Nov 19 '15 at 16:29
  • @MarkPlotnick I would really like to get that data back. Particularly, tarfile::session/todo/main and tarfile::session/todo/code. Everything else, I'm okay with losing because I was never going to get to them anyway. Even if a few lines are irrecoverable, it's fine, but I need the two aforementioned "files", because they are critical. Losing them will be a huge blow to my workflow. It has all of my project ideas, bookmarks, reminders and whatnot, it's the basket that has all of my eggs – Braden Best Nov 19 '15 at 16:34
  • @B1KMusic You can try grepping the filesystem for "session/todo/main", and dd that block and the blocks immediately after that into a new file on another filesystem. Edit the resulting file in an editor that won't choke on binary data, such as emacs, and trim the extraneous data. – Mark Plotnick Nov 22 '15 at 01:32
  • @Mark that is indeed what I ended up doing. I wrote a script to dd about 100 of the nodes grep gave me, 1mb each. I did indeed see tons of instances of ustar, but I had no idea what it meant at the time. I take it it's us-tar and not u-star, correct? Anyways, I got my data back several times over. I take it tar.vim generates a temp file, and that would explain why the data is found multiple times. – Braden Best Nov 22 '15 at 01:39
  • Good. ustar is the acronym for Uniform Standard Tar; it's the "magic number" in the header of reasonably modern tar files. – Mark Plotnick Nov 22 '15 at 02:01
  • This just saved me about an hour of work. Thank youuu! – Jazzepi Sep 26 '17 at 22:27
  • This answer saved me from losing 4 hrs of work recovering a file I overwrote in vi and then saved - the old :wq! without looking. I ran grep -i -a -B100 -A100 'text in the deleted file' /dev/sda1 | strings > /var/tmp/my-recovered-file . Then I searched this file for most of what I lost. I had a few spaces and curly brackets missing, but I managed to get the file back to the way it was in about 10 mins. – AndyM Nov 21 '17 at 09:27
  • I think this might really work, but I get grep: memory exhausted error. – Eerik Sven Puudist Aug 11 '20 at 22:35
  • 4
    @EerikSvenPuudist That can happen because grep tries to read the input line by line, and on disk partitions with random bytes, the lines can be very long. A workaround is in the answer to this question. Instead of grep -i -a -B100 -A100 'text in the deleted file' /dev/sda1, try tr -s "\0" "\n" < /dev/sda1 | grep -i -a -B100 -A100 'text in the deleted file' – Mark Plotnick Aug 12 '20 at 17:41
  • I just got back an entire, untouched config.toml file from an (unlocked) LUKS-encrypted XFS partition thanks to this technique. I just had to adjust the -A parameter to the length of the file (over 200 lines). This is absolutely incredible, thank you so much! – neitsab Mar 03 '21 at 21:02
  • Fantastic trick! You saved me many hours of brainwork, thank you so much! – Patrick Apr 23 '23 at 19:08
  • Thank you so much! – sfotiadis Mar 29 '24 at 14:26
13

Make sure you have enough disk space in /var/tmp or somewhere big.

Try

 grep -i -a -B100 -A100 'a string unique to your file' /dev/sda1 |
 strings > /var/tmp/my-recovered-file

where /dev/sda1 would be the disk partition on your system.

Then search my-recovered-file for your string.

It might mostly be there. If you find it, check for missing line breaks, brackets, symbols, etc.

Use a search word from your file that is fairly unique, or a string that will cut down the amount of data in the file. If you search for a common word such as "echo", you will get back loads of strings, since the system has lots of files containing the word echo.

AndyM
  • 514
7

I'm going to say no (with a giant asterisk).

Think about how data is laid out on a disk. You have blocks which contain data and point to the next block (if there is one).

When you overwrite data you are changing the block contents (and, if you are extending the file, the end-of-file marker). So nothing should be recoverable (see below).

If you shorten the file, then you are losing the old blocks and they will soon be recycled. If you're a programmer, think of a linked list where you "lose" half of your list without doing a free/delete. That data is still there, but good luck finding it.

Something that might be interesting to think about is fragmentation.

Fragmentation occurs when you have "holes" of non-contiguous data on your disk. This can be caused by modifying files such that you extend or shorten them and they no longer fit in their original spot on the disk.

If a file grows past its original size (it needs to move at this point), then depending on your filesystem you may copy the entire file to a new location, where the old data would still be there (but marked as free), or you may just change the old ending pointer and have it point to a new location (which leads to fragmentation).

Long story short, your data is probably lost (short of an extreme forensic process where you examine the platters under a microscope); however, there is a chance that it is still there.

SailorCire
  • 2,503
  • 4
    Your answer makes the assumption that a block-based non-copy-on-write filesystem such as ext4 or xfs is in use. With copy on write filesystems such as zfs and btrfs you are in fact never "changing the block contents"; those filesystems always use brand new blocks to contain new data. Also, log-based filesystems like jffs2 also always write new data to new locations (not "blocks", those filesystems are not block-based). That being said, this doesn't mean it's easy to find where the old data live and to do it before the space is recycled. So your answer, which is no, is still correct – Celada Aug 09 '14 at 04:51
  • @Celada Thanks! I found that very informative. I haven't had the time to look at how btrfs or zfs works, but I knew they exist. – SailorCire Aug 09 '14 at 15:18
5

TL;DR - If the overwritten file is still being held open by a running process, then this blog post might save your bacon:

https://www.linux.com/news/bring-back-deleted-files-lsof/

It talks about deleted files, but I had good luck with it even with a file that was overwritten by rsync. And I'm talking about a 60 GB file overwritten by a 4 MB one: I was able to recover the original because, luckily, I had not stopped the running process that was keeping it open.
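
The gist of the technique, in case the link dies (the PID and fd number below are placeholders you would read from the lsof output):

lsof | grep deleted                 # find processes still holding the lost file open
cp /proc/1234/fd/5 recovered_file   # if PID 1234 holds it on descriptor 5, copy the
                                    # data out via procfs before the process exits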

fulv
  • 151
1

I found myself in the same situation - I did a "mv FILE1.c FILE2.c".

You required that "no special programs are installed on the linux machine", which you can work around by installing the tools on another machine or by using a live disk.

Stop or limit writes to your disk

In this kind of situation it is best to limit any writes to the system at hand, because you could easily overwrite the very data you want to keep.

So, first of all, I hope that you are not browsing the web from the computer your data is on!

In some cases the file might still be open in a process. If you think it is, do not stop your machine just yet. You may want to suspend the process keeping the file open before looking for the handle to your data; in that case, you do not have to stop your machine at all. At least one other answer here describes the method. The file will often have a gibberish name in the directory where it was located (look using ls -lart - the most recent files appear last, and the date may hint at which is the best guess).
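
A minimal sketch of pausing such a process (the PID is a placeholder):

kill -STOP 1234        # suspend the process without closing its file handles
ls -l /proc/1234/fd    # look for a descriptor still pointing at the lost file
kill -CONT 1234        # resume it once the data has been copied out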

Use a recovery system

Then, according to your priorities, do one of the following:

If you can plug the disk into another machine, or boot from another partition, a USB key, or a live CD:

Stop the machine. If it is acceptable for you (and your system), just power it off by unplugging the power plug or battery, or by pressing the on/off button for a long time.
A clean shutdown adds some risk of overwriting the file you need.

If you can't start up another system: Limit your writes to the system. Kill programs that are likely to write to disk.
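
One blunt but effective way to do that, if the lost data sits on a partition the rest of the system can live without (the mount point here is an example), is to remount it read-only:

mount -o remount,ro /home   # will fail while files there are still open for writing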

'photorec' installed with 'testdisk'

Most of the time I use "testdisk". I landed on this page while checking whether there was another method I did not know about.

"testdisk" is a set of tools that I often install beforehand, and I had it installed in my Ubuntu 16.04 machine ("legacy" for a good reason).

You required that "no special programs are installed on the linux machine" - you can install "testdisk" on another machine and read your original disk from there. You can boot from a USB disk as well.

If you can't do that, you can install it on the system at hand. In that case I recommend removing some old big files that you have hanging around, such as a big ZIP that you downloaded a while ago, or an ISO. I do that because I suppose that recent data tends to sit near the end of the disk, not the start.

You can then just install 'testdisk' using something like (example for debian):

apt-get install testdisk

Then launch "photorec" and let it restore files to a device (partition) different from the one your data is located on. That can be a USB drive, a network drive, or even the /tmp directory in some cases (when it is mapped to RAM).

photorec /d PATH_TO_OTHER_DEVICE

After selecting the device to restore from, choose "[File Opt]" from the bottom menu. Then deselect all options and select only the file type that you are looking for. In my case it was a C file, so I selected "text"; photorec still created the .c files it found. Then start the [Search], looking only in the free space.

While the restore was running, I performed a command like:

grep minTemp recup*/*.c

in the path where the recovery directories were created by photorec. I knew that "minTemp" was present in my file, and I was looking for a C file.

I got 30 hits for different versions of the file, and examined the bigger ones first.

photorec was still running, but there were no new matches on 'minTemp', so I stopped the process, as I was confident I had the file I needed.

External service

Depending on your expertise, there is also the option of subcontracting the job. There are quite a few companies specializing in data recovery - they do not install any tools on your disk. Expect a minimum cost of something like $500 if the subcontractor manages to recover the data.

Prepare

To better cope with such a situation, prepare:

  • Learn how to recover data before it occurs, try to recover some data when you do not need to recover it.
  • Install 'testdisk' on your systems before you need it (if it is already there, you will not risk overwriting your lost data by installing it at recovery time);
  • Keep your data on a partition separate from your system files - some even recommend a separate partition for the /tmp directory;
  • Use snapshots. You can do that at the "device" level (zfs/btrfs), with snapshot tools (rsnapshot), and even with private clouds that keep older file versions. There are also NAS systems with such a function integrated (you can find previous versions in '.snapshot' directories). A one-line snapshot example follows this list;
  • Use backup tools like ShadowProtect, Acronis, and others that allow you to do frequent incremental backups of your online disks.
  • Prepare a USB Drive with recovery tools and appropriate live OS's. [I keep one on me].
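
As a one-line illustration of the snapshot point above (assuming / is a btrfs subvolume; the snapshot path is just an example):

btrfs subvolume snapshot -r / /.snapshots/root-$(date +%F)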
le_top
  • 123
1

I had this problem and managed to solve it using this tool: https://github.com/PabloLec/RecoverPy

My file was auto-saved while it was completely empty, due to an accidental select-all. I later kept working on the file but made a mistake, so I decided to close it without saving... When I reopened the file it was empty - to my horror, a 4000-line file gone. Thankfully this tool saved months of my research notes.

This tool can scan blocks in your drive using a keyword.

How to use:

  1. Select the drive you plan to search: move with the arrow keys, then press Enter to select the drive.
  2. Perform a search for a keyword you know for certain the file contains.
  3. It will bring up search results; click on one that contains the keyword you are looking for.
  4. You can then navigate to nearby blocks and select them to be added to a file.
  5. Once you have added all the blocks you want to the file, click on save file and it saves the result in your tmp folder.

This tool is amazing: I managed to recover a file with nearly 4000 lines of notes for research I have been doing. (Yes, I should have made a backup.) I learned my lesson, and now the file is stored online as well.
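
If you want to try the tool, installation and startup are roughly as follows (check the project README for current instructions, as the exact commands may have changed):

python3 -m pip install recoverpy
sudo python3 -m recoverpy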

John
  • 11
  • I realize that it’s impossible (or, at best, highly impractical) to post a tool in an answer, but can you please [edit] your answer to give some explanation of how this tool works (what are its capabilities and limitations?) and how to use it? – G-Man Says 'Reinstate Monica' Aug 02 '22 at 16:25
-1

I had overwritten a text file (VQ1.txt) holding 12 hours' worth of test data :( The notion that Unix tools often save the previous version of a file under a text.txt~ style name made me look into the folder containing the overwritten file with ls -l. The full listing showed a VQ1.txt~ that had my 'lost' data!

$ cat VQ1.txt~  
Start time at: Thu Apr  2 18:07:23 PDT 2015
User, KW: 12hrFA_OEM_HelloVoiceQ
Test Case: 
Detection:  1, 1, 04-03 01:07:00.673 D/MultiKeywordBdctReceiver( 1743): vs status 258 : 2 : 1
Detection:  2, 1, 04-03 01:09:04.813 D/MultiKeywordBdctReceiver( 1743): vs status 258 : 2 : 1
Detection:  3, 1, 04-03 04:09:26.023 D/MultiKeywordBdctReceiver( 1743): vs status 258 : 2 : 1
Detection:  4, 1, 04-03 04:11:29.893 D/MultiKeywordBdctReceiver( 1743): vs status 258 : 2 : 1
Detection:  5, 1, 04-03 07:12:27.013 D/MultiKeywordBdctReceiver( 1743): vs status 258 : 2 : 1
Detection:  6, 1, 04-03 07:14:30.803 D/MultiKeywordBdctReceiver( 1743): vs status 258 : 2 : 1
Detection:  7, 1, 04-03 08:37:13.113 D/MultiKeywordBdctReceiver( 1743): vs status 258 : 2 : 1
Detection:  8, 1, 04-03 10:21:23.533 D/MultiKeywordBdctReceiver( 1743): vs status 258 : 2 : 1
Detection:  9, 1, 04-03 10:23:27.733 D/MultiKeywordBdctReceiver( 1743): vs status 258 : 2 : 1
Detection:  10, 1, 04-03 13:23:47.893 D/MultiKeywordBdctReceiver( 1743): vs status 258 : 2 : 1
Detection:  11, 1, 04-03 13:25:52.203 D/MultiKeywordBdctReceiver( 1743): vs status 258 : 2 : 1

12hrFA_OEM_HelloVoiceQ,  
KW detect count: 11
catsat
  • 11
  • 10
    Isn't that more a feature of certain text editors instead of Unix in general? I'm not aware of a file system that saves old versions of files that way. – Joey Sep 11 '16 at 09:29
  • @Joey You are correct but I had completely forgotten that my trusty emacs does this and catsat's answer (in conjunction with your comment) did save my bacon on this occasion. – Paul Brodersen Sep 06 '22 at 12:09