
I had a directory containing some 2000 files.
I ran the following command to move those 2000 files into a target directory:

find /opt/alfresco \
        -type f \( -iname \*.pdf -o -iname \*.xml \) \
        -exec mv {} /opt/alfresco/archived/2020-01-07  \; > /opt/alfresco/scripts/move.log

But I forgot to append a / to the end of the destination path. What the command did instead was create a file named 2020-01-07 containing some binary content that is now unreadable, and my 2000 files are gone. The 2020-01-07 file is 220 KB, whereas the 2000 files together were approximately 1 GB.

Is there any way I can recover those 2000 files? Or is there some way to convert the file 2020-01-07 into a directory 2020-01-07 and get my data back?

Kusalananda
  • Next time, back up your data. A few gigabytes fit on a cheap USB key, on some Internet server (using scp, rsync, ftp, etc.), or in a tar.gz archive stored elsewhere on your computer. – Basile Starynkevitch Jan 07 '20 at 13:50
  • You might also be interested in using some version control system such as git or svn. Both can handle *.pdf and *.xml files quite efficiently. I really suggest taking an hour to learn more about them. In 2020 a few gigabytes of data is not much. – Basile Starynkevitch Jan 07 '20 at 13:55
  • You could set up a crontab(5) job to back up your small data every hour, e.g. using rsync(1). – Basile Starynkevitch Jan 07 '20 at 14:01
  • A variant on this ("What happens when you run mv *?") is a question I've asked in interviews in the past. Understanding how globs are expanded by the shell, rather than individual commands, and reasoning about the behavior in the way Kusalananda describes are good skills to have if you're working with Linux regularly. – Xiong Chiamiov Jan 07 '20 at 21:10
  • Getting used to running mv (and also cp) with the options -i and -v (together, -iv) has saved me lots of headaches. The option -i makes mv prompt for confirmation before overwriting an existing file, and -v increases verbosity by printing the source and destination paths. Using -iv in your case would have paused the process when the second file was about to overwrite the first one, and thanks to the verbosity you would have known which file had been moved first. – woodengod Jan 08 '20 at 00:26
  • Shell commands are little programs. This is why many people test a potentially destructive command first, or at least insert "echo " before the dangerous part to review the output first. Even so, I would have done a cp first, then delete the source files after observing the expected result. – andy256 Jan 08 '20 at 11:50
  • @andy256: or maybe ln to avoid actually copying the data. Of course I would already have been using -exec mv -t /dest/dir {} + so find could pass multiple files to one invocation of mv. It seems -exec command {} more args + doesn't work, but apparently does with {} \; – Peter Cordes Jan 08 '20 at 17:33



Adding a slash at the end of the destination path /opt/alfresco/archived/2020-01-07 would have made the mv command error out, as the 2020-01-07 directory evidently does not exist. This would have saved your files.

They would also have been saved if /opt/alfresco/archived/2020-01-07 had been an existing directory (whether or not the destination path had a trailing slash): your files would then have been moved into that directory. Filename collisions could still have been an issue, though, since you were moving files from several directories into a single one. This is what you wanted to do; what you forgot was to create that directory first.
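Both points can be checked safely in a scratch directory. The sketch below is a hedged demonstration, not a recovery tool: it works in a temporary directory from mktemp -d, and the names archived, a.pdf and b.xml are made up to mimic the layout in the question (the parent directory exists, 2020-01-07 does not). With the trailing slash the rename is rejected and the file stays put; once the directory exists, both forms of the destination move the file into it.

```shell
#!/bin/sh
# Scratch demo in a throwaway directory; the file names are made up.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir archived              # the parent exists, like /opt/alfresco/archived
touch a.pdf b.xml

# Trailing slash, but archived/2020-01-07 does not exist yet:
# the rename is rejected and a.pdf stays where it was.
if mv a.pdf archived/2020-01-07/ 2>/dev/null; then
    slash_result=moved
else
    slash_result=refused
fi
echo "with trailing slash: $slash_result"

# Create the directory first; now both forms move the file *into* it.
mkdir archived/2020-01-07
mv a.pdf archived/2020-01-07/
mv b.xml archived/2020-01-07
ls archived/2020-01-07      # a.pdf and b.xml
```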

Now, since the directory did not exist, what the find command actually did was take each individual XML and PDF file, rename it to /opt/alfresco/archived/2020-01-07, and then do the same with the next file, overwriting the previous one.

The file /opt/alfresco/archived/2020-01-07 is now the last XML or PDF file found by find.
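The failure can be reproduced on a small scale to convince yourself of what happened. This is a sketch using three made-up files in a temporary directory (mktemp -d) whose layout only loosely mimics /opt/alfresco. Which file survives depends on the order in which find happens to visit them, which is why the comment does not name it.

```shell
#!/bin/sh
# Small-scale reproduction with made-up files in a temporary directory.
work=$(mktemp -d)
mkdir -p "$work/docs/sub" "$work/archived"
echo first  > "$work/docs/one.pdf"
echo second > "$work/docs/sub/two.xml"
echo third  > "$work/docs/three.pdf"

# Same shape as the original command: archived/2020-01-07 does not exist,
# so each mv *renames* the current file to that path, replacing the previous one.
find "$work/docs" -type f \( -iname '*.pdf' -o -iname '*.xml' \) \
    -exec mv {} "$work/archived/2020-01-07" \;

ls -l "$work/archived"              # a single regular file named 2020-01-07
cat "$work/archived/2020-01-07"     # contents of whichever file find visited last
find "$work/docs" -type f | wc -l   # no source files are left
```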

Also note that since you ran your find command across /opt/alfresco, any PDF or XML file below that path, for example in any directory beneath /opt/alfresco/archived, would have met the same fate.

This is such an easy error to make. There is no convenient way to recover the lost files other than restoring them from your backups.

If you do not take hourly backups of your data, this may be a good point in time to start looking into doing that. I would recommend restic or borgbackup for doing backups of personal files, preferably against some sort of off-site or at least external storage.


In your next rewrite of this script, you may want to ignore the archived subdirectory, and use mv -n -t. You also need to explicitly -print the found files (or use mv -v) as find will otherwise not output their location:

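# Create the destination first; with -t, mv refuses to run if it is missing.
mkdir -p /opt/alfresco/archived/2020-01-07

find /opt/alfresco \
    -path /opt/alfresco/archived -prune -o \
    -type f \( -iname '*.pdf' -o -iname '*.xml' \) \
    -exec mv -n -t /opt/alfresco/archived/2020-01-07 {} + \
    -print >/opt/alfresco/scripts/move.log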
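Following andy256's suggestion in the comments, you can preview the command by putting echo in front of mv, so that find prints the mv command line it would run instead of executing it. The sketch below does this on a throwaway tree from mktemp -d (the file names are made up) to show that the archived subtree is pruned and that nothing is moved. Point src at /opt/alfresco and remove echo only once the preview looks right.

```shell
#!/bin/sh
# Dry run on a made-up tree: "echo mv ..." prints the command instead of running it.
src=$(mktemp -d)
dest="$src/archived/2020-01-07"
mkdir -p "$src/in" "$dest"
touch "$src/in/report.PDF" "$src/in/data.xml" "$src/archived/old.pdf"

preview=$(find "$src" \
    -path "$src/archived" -prune -o \
    -type f \( -iname '*.pdf' -o -iname '*.xml' \) \
    -exec echo mv -n -t "$dest" {} +)

# Expect one "mv -n -t .../2020-01-07 ..." line naming the two files in in/;
# archived/old.pdf is skipped by -prune, and no file has actually moved.
printf '%s\n' "$preview"
```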

A few things from the comments that may be useful to know:

  • If GNU mv is used with -t target, it will fail if target is not a directory. You would use -exec mv -t /opt/alfresco/archived/2020-01-07 {} + to move multiple files at once with find (which would also speed up the operation).

  • If GNU mv is used with -n, it will refuse to overwrite existing files.

  • Neither -t nor -n is standard (macOS and FreeBSD have -n too, though), but that shouldn't stop you from using them in scripts that don't need to be portable between systems.
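Both safety properties can be observed directly. This sketch assumes GNU coreutils mv (for -t) and uses made-up files in a temporary directory. It ignores the exit status of mv -n and only inspects the files afterwards.

```shell
#!/bin/sh
# Assumes GNU coreutils mv; runs in a throwaway directory with made-up files.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir src dest
echo new > src/a.pdf
echo old > dest/a.pdf

# -t with a target that is not an existing directory: mv refuses to move anything.
if mv -t missing-dir src/a.pdf 2>/dev/null; then t_result=moved; else t_result=refused; fi
echo "mv -t missing-dir: $t_result"

# -n: the existing dest/a.pdf is not overwritten (exit status ignored here).
mv -n -t dest src/a.pdf 2>/dev/null || :
cat dest/a.pdf   # still the old contents
ls src           # a.pdf has not moved
```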

Kusalananda