
I have 168,307 JPEG photos in one folder, the result of recovering an accidentally formatted hard drive. Casual browsing shows that 80% of the files were recovered fine, and most of them even have valid EXIF data (including a timestamp). Some are partially recovered (part of the image is missing, but they are still usable), and some are totally useless (most of the image wasn't recovered). All of the files have random numeric names, and all have the same date and time in the file system.

In this state they are unusable. What I want to do is:

  • create a set of thumbnails so I can browse through them manually and fairly quickly remove the useless files,
  • use the preserved EXIF tags to automatically sort the remaining images into a neat tree of folders (a year/month/day/pics structure, or a set of folders with YYYY-MM-DD as the folder name).

What tools would you recommend for such a task? Should I try something like digiKam for the first part and some command-line tools for the second?

Andy
  • Shotwell will do exactly what you ask. You may need to pre-process the images if Shotwell isn't happy with malformed ones, but that's another question. – msw Jan 03 '16 at 17:24
  • What kind of pre-processing do you suggest? Do you think it will cope with 168k images in one folder? – Andy Jan 03 '16 at 20:45
  • Pre-processing would be a batch process that rejects undecodable images. I don't know of a tool offhand, but I could probably write one in 15 minutes, relying on JPEG libraries to tell me when they can't decode a file. As a batch process (one image, then the next, and so on) it could cope with 1.68M images in one folder. How much patience you would need remains to be measured. – msw Jan 03 '16 at 21:36
  • I solved my problem by reversing the process: I first sorted the images into folders by EXIF and only then used Shotwell to browse those folders. This gave me two things: first, only fully recovered images had valid EXIF; second, I ended up with manageably sized folders to deal with afterwards. I found an excellent Python script called "Sortphotos" for sorting images into folders based on EXIF; it took about 2 hours to sort all 168k pictures into folders. – Andy Jan 11 '16 at 22:14
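msw's pre-processing idea relies on a real JPEG decoder. A much cheaper first pass, swapped in here and weaker than a decode, is to check that each file starts with the JPEG start-of-image marker (FF D8) and ends with the end-of-image marker (FF D9). This is a minimal Python sketch under that assumption; the function names are hypothetical, and trailing zero bytes are ignored on the guess that recovered files may be padded to a cluster boundary. A file can pass this test and still be partly garbage, and some cameras may append extra data after the end marker, so treat the "suspect" list as candidates rather than a verdict.

```python
from __future__ import annotations

from pathlib import Path

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def looks_complete(data: bytes) -> bool:
    """Heuristic only: starts with SOI and ends with EOI, ignoring trailing zero padding."""
    return data.startswith(SOI) and data.rstrip(b"\x00").endswith(EOI)


def triage(folder: Path) -> tuple[list[Path], list[Path]]:
    """Split the JPEGs in folder into (complete-looking, suspect), one file at a time."""
    complete: list[Path] = []
    suspect: list[Path] = []
    for path in sorted(folder.iterdir()):
        if path.suffix.lower() not in {".jpg", ".jpeg"}:
            continue  # skip anything that is not named like a JPEG
        (complete if looks_complete(path.read_bytes()) else suspect).append(path)
    return complete, suspect
```

A real decode check with a JPEG library, as msw describes, could then be run on the suspect list only, or on everything if time allows.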

1 Answer


You can create thumbnails with ImageMagick, which is a command-line tool, so you just write a bash script to process the whole set. Here is an example: http://www.cyberciti.biz/tips/howto-linux-creating-a-image-thumbnails-from-shell-prompt.html or use ImageMagick's mogrify. Another alternative, from the comments on the cyberciti page, is a bash script that works without ImageMagick and creates even smaller files (see the comment there by Tim).
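The same per-file loop can be driven from Python. This is a minimal sketch assuming ImageMagick 6's `convert` is on the PATH (with ImageMagick 7 the same options work with `magick` in its place); `thumbnail_cmd`, `make_thumbnails` and the 160-pixel size are hypothetical choices, not part of the linked scripts. Each thumbnail keeps its original's file name so it is easy to match back to the original later.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def thumbnail_cmd(src: Path, thumb_dir: Path, size: int = 160) -> list[str]:
    """Build an ImageMagick command that writes a thumbnail with the same file name."""
    return [
        "convert", str(src),
        "-auto-orient",                  # rotate according to the EXIF Orientation tag
        "-thumbnail", f"{size}x{size}",  # fit inside size x size, keep aspect ratio, drop profiles
        str(thumb_dir / src.name),
    ]


def make_thumbnails(src_dir: Path, thumb_dir: Path, size: int = 160) -> list[Path]:
    """Create thumbnails one file at a time; return the sources where convert exited non-zero."""
    thumb_dir.mkdir(parents=True, exist_ok=True)
    failed: list[Path] = []
    for src in sorted(src_dir.iterdir()):
        if src.suffix.lower() not in {".jpg", ".jpeg"}:
            continue
        result = subprocess.run(thumbnail_cmd(src, thumb_dir, size), capture_output=True)
        if result.returncode != 0:
            failed.append(src)
    return failed
```

For example, `make_thumbnails(Path("recovered"), Path("thumbs"))` fills `thumbs/` and returns the files ImageMagick reported an error on, which are worth a closer look.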

Likewise, you can organize them into folders by EXIF shot date in bash. Here's a script for that: http://binaryunit.blogspot.com/2007/11/just-simple-script-to-order-your.html In the comments there's a variation that renames them as well.
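As an alternative to the linked bash script, here is a dependency-free Python sketch of the same idea. It walks the JPEG segments to the APP1 "Exif" block, follows the Exif IFD pointer (tag 0x8769) to DateTimeOriginal (tag 0x9003), and moves each file into root/YYYY/MM/DD/. It assumes the standard EXIF layout, handles both little- and big-endian headers, and treats any parse failure as "no date", so damaged files stay where they are. Following Red Anne's later suggestion, the EXIF date is prepended to the original (unique) file name, which avoids overwrites when several shots share the same second. All function names are hypothetical.

```python
from __future__ import annotations

import shutil
import struct
from datetime import datetime
from pathlib import Path

HEAD_BYTES = 128 * 1024  # EXIF sits in an APP1 segment (at most 64 KiB) near the start


def _exif_ifd_date(tiff: bytes) -> datetime | None:
    """Read DateTimeOriginal (0x9003) from the Exif sub-IFD of a TIFF block."""
    endian = {b"II": "<", b"MM": ">"}.get(tiff[:2])
    if endian is None:
        return None

    def entries(offset: int):
        # An IFD is a 2-byte entry count followed by 12-byte entries:
        # tag, type, count, value-or-offset.
        (count,) = struct.unpack(endian + "H", tiff[offset:offset + 2])
        for i in range(count):
            start = offset + 2 + 12 * i
            tag, _type, n, value = struct.unpack(endian + "HHII", tiff[start:start + 12])
            yield tag, n, value

    (ifd0,) = struct.unpack(endian + "I", tiff[4:8])
    exif_ifd = next((v for tag, _n, v in entries(ifd0) if tag == 0x8769), None)
    if exif_ifd is None:
        return None
    for tag, n, value in entries(exif_ifd):
        if tag == 0x9003:  # ASCII "YYYY:MM:DD HH:MM:SS\0" stored at offset `value`
            raw = tiff[value:value + n].rstrip(b"\x00").decode("ascii", "replace")
            return datetime.strptime(raw, "%Y:%m:%d %H:%M:%S")
    return None


def exif_datetime(data: bytes) -> datetime | None:
    """Return the EXIF shot date of a JPEG, or None if missing or unreadable."""
    if data[:2] != b"\xff\xd8":
        return None
    pos = 2
    try:
        while pos + 4 <= len(data) and data[pos] == 0xFF:
            marker = data[pos + 1]
            if marker == 0xDA:  # start of scan: image data follows, no EXIF found
                return None
            (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
            segment = data[pos + 4:pos + 2 + seg_len]
            if marker == 0xE1 and segment[:6] == b"Exif\x00\x00":
                return _exif_ifd_date(segment[6:])
            pos += 2 + seg_len
    except (struct.error, ValueError):
        return None  # truncated or garbled metadata
    return None


def dated_destination(src: Path, shot: datetime, root: Path) -> Path:
    """root/YYYY/MM/DD/YYYYMMDD-HHMMSS_<original name>; the original name keeps it unique."""
    folder = root / f"{shot:%Y}" / f"{shot:%m}" / f"{shot:%d}"
    return folder / f"{shot:%Y%m%d-%H%M%S}_{src.name}"


def sort_by_exif(src_dir: Path, root: Path) -> list[Path]:
    """Move JPEGs with a readable shot date under root; return the ones left behind."""
    undated: list[Path] = []
    for src in sorted(src_dir.iterdir()):
        if src.suffix.lower() not in {".jpg", ".jpeg"}:
            continue
        with src.open("rb") as f:
            shot = exif_datetime(f.read(HEAD_BYTES))
        if shot is None:
            undated.append(src)
            continue
        dst = dated_destination(src, shot, root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
    return undated
```

The files returned by `sort_by_exif` are the ones without a usable date, which, as Andy observed later, tend to be the badly recovered ones.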

I've used bash scripts with ImageMagick before to rename and modify photos for posting to the web, but nowhere near as many as you have, so it's going to take a while. Still, a command-line tool is going to be faster and use fewer resources than other solutions, getting the job done quicker and leaving you free to do other things while it runs. If you wanted to, you could even combine the two processes so that you don't have to touch anything until it's done.

I highly recommend testing any script on a few dozen images before you run it on all 168K. You should also make a backup copy of your raw data, so you don't finish only to find that you've made a mistake and destroyed your originals.
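One way to follow this advice, sketched in Python under the assumption that the recovered files sit in one flat folder: copy a random sample into a scratch folder and point each script at that copy first. `make_test_sample` is a hypothetical helper; the fixed seed makes the sample repeatable between runs. It copies rather than moves, so the originals are untouched, but it is not a substitute for the full backup.

```python
from __future__ import annotations

import random
import shutil
from pathlib import Path


def make_test_sample(src_dir: Path, sample_dir: Path, n: int = 36, seed: int = 0) -> list[Path]:
    """Copy up to n randomly chosen JPEGs from src_dir into sample_dir."""
    photos = sorted(p for p in src_dir.iterdir() if p.suffix.lower() in {".jpg", ".jpeg"})
    chosen = random.Random(seed).sample(photos, min(n, len(photos)))
    sample_dir.mkdir(parents=True, exist_ok=True)
    copies: list[Path] = []
    for src in chosen:
        dst = sample_dir / src.name
        shutil.copy2(src, dst)  # copy2 also preserves the file-system timestamps
        copies.append(dst)
    return copies
```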

You can then process them in any photo management software or simply browse the folders and files in a GUI file manager.

Red Anne
  • Thanks. I wanted to browse them first to weed out the useless ones (as they won't contain useful EXIF data), but maybe it would be easier to first toss them into folders with a script and only then remove the bad ones plus those without EXIF. – Andy Jan 03 '16 at 20:44
  • OK, I'll have to tinker with that script a bit: I have three pictures taken within the same second, so I need to add a check that stops them from overwriting each other when that happens. – Andy Jan 03 '16 at 21:00
  • You can easily make the thumbnails first and save them, but one issue is that each thumbnail is a separate image from its original, so deleting a thumbnail still leaves the original behind. You may want to move your bad thumbnails to a trash folder and then use that folder as the list that determines which images to delete. The script would read /home/trash/imagethumb1, imagethumb5, imagethumb8, etc. and rm the corresponding /home/heapofimages/image1, image5, image8, etc. – Red Anne Jan 03 '16 at 21:05
  • As for the names, just modify the script to prepend the EXIF date to the current file name. This puts them in sequence but keeps the original file names, which must be unique, preventing any overwrite. – Red Anne Jan 03 '16 at 21:06
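Red Anne's trash-folder approach can be sketched in Python as follows, assuming each thumbnail was saved under the same file name as its original (unlike the imagethumb1/image1 naming in her example, which would need a name mapping). `delete_rejected` is a hypothetical helper; it defaults to a dry run that only returns the originals it would remove, in line with the advice to test on a few images first.

```python
from __future__ import annotations

from pathlib import Path


def originals_for_trashed_thumbs(trash_dir: Path, originals_dir: Path) -> list[Path]:
    """Originals whose same-named thumbnail was moved into trash_dir."""
    return [
        originals_dir / thumb.name
        for thumb in sorted(trash_dir.iterdir())
        if thumb.is_file() and (originals_dir / thumb.name).is_file()
    ]


def delete_rejected(trash_dir: Path, originals_dir: Path, dry_run: bool = True) -> list[Path]:
    """Delete (or, in a dry run, just list) the originals of the trashed thumbnails."""
    victims = originals_for_trashed_thumbs(trash_dir, originals_dir)
    if not dry_run:
        for path in victims:
            path.unlink()
    return victims
```

Run it once with the default `dry_run=True`, check the returned list, then run it again with `dry_run=False`.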