2

I have been using 'find' to output the directory structure from the root down and I don’t mind that it takes a while. My problem is that I want to cut down on the redundant info of repeating every files path, I want to output the files in JSON format.

I need to be able to run this from the termin, i can not be creating python files etc on the box.

For example, this:

root
|_ fruits
|___ apple
|______images
|________ apple001.jpg
|________ apple002.jpg
|_ animals
|___ cat
|______images
|________ cat001.jpg
|________ cat002.jpg

Would become something like....

{"data" : [
  {
    "type": "folder",
    "name": "animals",
    "path": "/animals",
    "children": [
      {
        "type": "folder",
        "name": "cat",
        "path": "/animals/cat",
        "children": [
          {
            "type": "folder",
            "name": "images",
            "path": "/animals/cat/images",
            "children": [
              {
                "type": "file",
                "name": "cat001.jpg",
                "path": "/animals/cat/images/cat001.jpg"
              }, {
                "type": "file",
                "name": "cat001.jpg",
                "path": "/animals/cat/images/cat002.jpg"
              }
            ]
          }
        ]
      }
    ]
  }
]}
Fearghal
  • 315

1 Answers1

11

Here's a quick Python program that should output your desired schema, using recursion. Should work in both Python 2 and 3 (although I only tested on 2). The first argument is the directory to descend into, or by default, the script will use the current directory.

#!/usr/bin/env python

import os
import errno

def path_hierarchy(path):
    hierarchy = {
        'type': 'folder',
        'name': os.path.basename(path),
        'path': path,
    }

    try:
        hierarchy['children'] = [
            path_hierarchy(os.path.join(path, contents))
            for contents in os.listdir(path)
        ]
    except OSError as e:
        if e.errno != errno.ENOTDIR:
            raise
        hierarchy['type'] = 'file'

    return hierarchy

if __name__ == '__main__':
    import json
    import sys

    try:
        directory = sys.argv[1]
    except IndexError:
        directory = "."

    print(json.dumps(path_hierarchy(directory), indent=2, sort_keys=True))
Chris Down
  • 125,559
  • 25
  • 270
  • 266
  • Hey, so i pasted this into the cmd prompt but get a lot of 'indentation' errors eg
    print(json.dumps(path_hierarchy(directory), indent=2, sort_keys=True))
    

    File "", line 1 print(json.dumps(path_hierarchy(directory), indent=2, sort_keys=True)) ^ IndentationError: unexpected indent

    – Fearghal Oct 28 '14 at 15:30
  • yea, im not sure what is going on with the script, tryng to run it in terminal window without success, i have python 2.7 installed – Fearghal Oct 28 '14 at 15:40
  • @Fearghal Just run it as a script, I don't believe __name__ is __main__ in the REPL. – Chris Down Oct 28 '14 at 15:49
  • thx. is it possible to have a param to change the root to search the dirs from? Eg python files.py '/'? My fault but i really should have stated i question that i need it done in terminal, not via a python file. – Fearghal Oct 28 '14 at 17:02
  • @Fearghal: It already has that, read the instructions again :-) – Chris Down Oct 28 '14 at 18:41
  • Fantastic, thx Chris. Do is there anyway i can run this directly in the terminal? I cant assume this file is on the machine i am profiling, so i guess i could write a script that creates the file, runs it, gets the results, and deletes it? – Fearghal Oct 29 '14 at 11:01
  • @Fearghal Just put it in a script and run the script. – Chris Down Oct 29 '14 at 12:36
  • I cant add a script to the machine i am profiling. – Fearghal Oct 29 '14 at 12:59
  • I mentioned that in question (with typo) - it needs to be run-able from the terminal. Apologies, by this i mean, it cant be a script on the machine already. – Fearghal Oct 29 '14 at 13:19