
I accidentally lost a pdf file during the following process

  • I was running the PDF application PDFXCView under Wine on Ubuntu 18.04 to open a pdf file on an ext4 filesystem.

  • Then I moved the pdf file somewhere else with mv.

  • Then I edited the pdf file that was still open in PDFXCView. When I tried to save the edited file, I had to choose "save as..." to locate the file's current path, and I attempted to overwrite it. PDFXCView failed to overwrite the file, made it disappear, and then aborted.

Here is some background and what I have tried so far.

  1. If it can be helpful, I remember the pathname of the lost pdf file.

  2. I couldn't back up the partition with dd, since I don't have an additional hard drive with enough capacity to hold it.

  3. I tried debugfs according to https://unix.stackexchange.com/a/80285,

    $ sudo debugfs -w /dev/sda4
    debugfs:  lsdel
     Inode  Owner  Mode    Size      Blocks   Time deleted
    22549259   1000 100600    141      1/     1 Sat Apr  2 09:14:06 2016
    1 deleted inodes found.

    debugfs:  logdump -i 22549259
    22549259: File not found by ext2_lookup
    

    The file was lost just now, not deleted in 2016, so I am not sure lsdel found the correct inode.

  4. I saw in https://unix.stackexchange.com/a/98700/ a suggestion to use

    grep -a -C 500 'known pattern' /dev/sda | tee /tmp/recover
    

    to recover a text file which contains a known pattern.

    A while ago, I created the lost pdf file by concatenating several smaller pdf files with pdftk, and I still have those smaller files. Viewing one of them with cat smaller.pdf | less shows its binary content, which includes a readable, PDF-specific string:

    /URI (http://flask.pocoo.org/docs/1.0/api/#flask.Flask.logger)
    

    So I tried:

    sudo grep -a -C 500 'http://flask.pocoo.org/docs/1.0' /dev/sda4 >  /tmp/test/recover
    

    Those small files and the lost file all contain the string, and -C 500 is an arbitrary window that cannot mark where a file begins or ends, so I am not sure this can produce useful results.
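One possible refinement of the grep idea, offered only as a rough sketch: instead of a fixed number of context lines, record the byte offset of each match and then copy a byte window around it. The function names `match_offsets` and `carve_window` and the window sizes are hypothetical; the sketch assumes GNU grep and GNU dd (for `-b`, `-o`, `iflag=skip_bytes,count_bytes` and `status=none`). It does not solve the problem of telling the lost file apart from the smaller files that contain the same string.

```shell
#!/usr/bin/env bash

# match_offsets SOURCE PATTERN
# Print the byte offset of every occurrence of the fixed string PATTERN in
# SOURCE (a device or an image file), one offset per line.
match_offsets() {
    local src=$1 pattern=$2
    # -a: treat binary as text, -b: print byte offsets, -o: only the match,
    # -F: fixed string. Output is "OFFSET:MATCH"; keep only the OFFSET field.
    grep -a -b -o -F -- "$pattern" "$src" | cut -d: -f1
}

# carve_window SOURCE OFFSET BEFORE AFTER
# Write to stdout the bytes from OFFSET-BEFORE up to OFFSET+AFTER,
# clamping the start at the beginning of SOURCE.
carve_window() {
    local src=$1 offset=$2 before=$3 after=$4
    local start=$(( offset > before ? offset - before : 0 ))
    local length=$(( offset - start + after ))
    dd if="$src" iflag=skip_bytes,count_bytes \
       skip="$start" count="$length" status=none
}
```

Reading /dev/sda4 needs root, so the functions would have to be defined and called in a root shell, for example `match_offsets /dev/sda4 'http://flask.pocoo.org/docs/1.0'` followed by `carve_window /dev/sda4 OFFSET 1000000 1000000 > /tmp/test/window.bin` for each reported offset. The output should go to a different filesystem than the one being searched.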

What other ways can I try to recover the pdf file?

Thanks!

Tim
  • Since it sounds like the pdf is very important, don't play around: unmount the partition if you can (if not, hopefully you have a live USB lying around) and then proceed with magicrescue. I will put the steps for it in the answer below. – NetIceCat Jan 26 '20 at 19:54

1 Answer


Definitely start by leaving the partition with the data alone, if at all possible (you would be surprised what you can recover even a month later if it is not your main system partition). Then proceed with foremost. (I originally mentioned magicrescue, but foremost performs just as well and has a ready recipe for pdf.)

sudo apt update && sudo apt install foremost
sudo foremost -v -t pdf -i [PATH] -o ~/pdfrecovery/

# -t - File type [in our case pdf]
# -i - Input [can be as wide as a whole disk /dev/sdX, or narrower, such as a single partition or image file]
# -o - Output directory

I just ran it for a few seconds on one of my /dev/sdX drives and pulled 370 pdf files. The files will not have their original names and will look like this: 14348984.pdf, so it pays to make the -i input as narrow as you can.
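Because the recovered files come back without names, one way to narrow them down is to search each one for a string known to be in the lost file. The following is a minimal sketch, not a tested recovery procedure: `list_matching` is a hypothetical helper name, it assumes GNU grep, and it only searches the raw bytes, so it will miss text that is stored compressed inside the PDF (in that case, extracting the text first with a tool such as pdftotext would be needed).

```shell
#!/usr/bin/env bash

# list_matching DIR PATTERN
# Print the path of every *.pdf under DIR whose raw bytes contain the
# fixed string PATTERN. Nothing is modified or deleted.
list_matching() {
    local dir=$1 pattern=$2
    # -a: treat binary as text, -r: recurse, -l: print file names only,
    # -F: fixed string, --include: restrict the search to *.pdf files.
    grep -a -r -l -F --include='*.pdf' -- "$pattern" "$dir"
}
```

For the question's case this could be called as `list_matching ~/pdfrecovery 'http://flask.pocoo.org/docs/1.0'`, ideally after foremost has finished so that no half-written file is inspected.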

Good luck.


Update

Your second option is testdisk/photorec, which in your case may be easier since you know the path. testdisk and photorec do have some caveats: if you are not careful (and happen to confirm several dialogs asking whether you want to apply changes), they can damage the disk. But if you take it slow, this route may be more appropriate, and faster, as it will likely show you a good folder tree with a node corresponding to your missing file. If you do not find your file with foremost in, let's say, 2 hours, post a comment and I will provide a secondary testdisk approach.

Update 2

When I just tested it, testdisk crushed foremost at locating a specific deleted file. It preserved the folder tree and file names perfectly, which avoids the time spent carving out every *.pdf file. The two approaches differ substantially, and if the file is very important, I would definitely use both testdisk and foremost to locate the same file, to be sure I end up with a complete, non-corrupted file.

Archemar
NetIceCat
  • Thanks. (1) If I don't unmount the partition (my /home partition) or remount it readonly, can I run foremost? Is the risk only that some processes may write to the partition? Will foremost write to the partition? (2) If I know the time window in which the file became lost, can I specify that to foremost? – Tim Jan 26 '20 at 20:39
  • (3) does -i specify the full pathname of the directory which contained the lost file? Or has to be the partition that contained the lost file, which is /dev/sda4 in my case? – Tim Jan 26 '20 at 20:56
  • (4) does it recover all the pdf files including those that are not deleted, or just recover deleted pdf files? – Tim Jan 26 '20 at 21:00
  • 1) Yes, you can run it without unmounting, but unmounting is always better. 2) No, there is no time indexing as far as I know. 3) I have only used it on partitions or on .dd images of partitions. I am not sure you can point it at something like ~/Downloads/*, because foremost scans blocks and looks for particular headers and footers. 4) It recovers ALL pdf files, because it is meant for, and often used on, damaged partitions where files may be inaccessible but not "deleted" per se. If you know the size of the pdf, you can run a secondary script on the -o directory. – NetIceCat Jan 26 '20 at 21:13
  • Actually, if you know some specific part or excerpt of the pdf in question, you can run foremost and then, for every new pdf file found in the directory specified with -o, use pdftotext and grep for some specific word. – NetIceCat Jan 26 '20 at 21:17
  • Thanks. (1) I am running it, and it will take a long time to create all the files, so I am manually running pdfgrep over the directory. How can I automatically run pdfgrep whenever foremost creates a new file, or delete a file if it doesn't contain the specific string? (I am storing the recovered files in /tmp on my root partition, which has limited free space.) (2) I am also trying ext3grep and have some questions. I'd appreciate it if you could consider https://unix.stackexchange.com/questions/564231/does-ext3grep-work-on-ext4 – Tim Jan 26 '20 at 21:41
  • Thanks for update. Do you have some links for recommended usage of testdisk and photorec? Do I have to umount the filesystem that contains the lost file before using them? Is photorec only for recovering image files not pdf files? – Tim Jan 26 '20 at 23:06
  • Could you elaborate how you used "testdisk/photorec which in your case may be easier when dealing with the known path" and "When I just tested it, testdisk crushed foremost in terms of locating a specific deleted file"? Can they recover files only in a given pathname? – Tim Jan 26 '20 at 23:27
  • 1) photorec is not just for photos; it handles all types of files. However, it too tends to work from .dd and other image files or from partitions. 2) testdisk, on the other hand, is primarily designed for scanning and repairing partitions, but it also allows remarkable recovery of deleted files. If you would like, I can walk you through your issue with testdisk. Just give me 5 minutes to finish eating and we can move to chat. In the meantime, make sure you have testdisk installed. – NetIceCat Jan 26 '20 at 23:27
  • It would be great if you could let me know how you use testdisk, given the original pathname of the lost file, the time window of the deletion, and the type of the file (pdf). – Tim Jan 26 '20 at 23:29
  • Do I have to umount the filesystem that contains the lost file before using testdisk and photorec? (My laptop doesn't recognize my bootable flash drive, so I haven't figured out how to umount my /home filesystem.) – Tim Jan 26 '20 at 23:33
  • Ok, I'm back I'll open a chat window and walk you through it – NetIceCat Jan 26 '20 at 23:51
  • https://chat.stackexchange.com/rooms/info/103751/how-may-i-recover-a-lost-pdf-file?tab=general – NetIceCat Jan 26 '20 at 23:58
  • "Only users nominated by the room owner may talk here" – Tim Jan 27 '20 at 00:09
  • we moved to https://chat.stackexchange.com/rooms/103752/ – Tim Jan 27 '20 at 12:04