7

I want to find all subfolders that contain a Markdown file with the same name as the folder (plus the extension .md).

For example, I want to find the following subfolders:

Apple/Banana/Orange      #Apple/Banana/Orange/Orange.md exists
Apple/Banana             #Apple/Banana/Banana.md exists
Apple/Banana/Papaya      #Apple/Banana/Papaya/Papaya.md exists
  • Note: There can be other files or subdirectories in the directory.

Any suggestions?


The solutions to the problem can be tested using the following code:

#!/usr/bin/env bash
# - goal: "Test"
# - author: Nikhil Agarwal
# - date: Wednesday, August 07, 2019
# - status: P T' (P: Prototyping, T: Tested)
# - usage: ./Test.sh
# - include:
#   1.
# - refer:
#   1. [directory - Find only those folders that contain a File with the same name as the Folder - Unix & Linux Stack Exchange](https://unix.stackexchange.com/questions/534190/find-only-those-folders-that-contain-a-file-with-the-same-name-as-the-folder)
# - formatting:
#   shellcheck disable=
#clear

main() {
    TestData
    ExpectedOutput
    TestFunction "${1:?"Please enter a test number, as the first argument, to be executed!"}"
}

TestFunction() {
    echo "Test Function"
    echo "============="
    "Test${1}"
    echo ""
}

Test1() {
    echo "Description: Thor"
    find . -type f -regextype egrep -regex '.*/([^/]+)/\1\.md$' | sort
    echo "Observation: ${Green:=}Pass, but shows filepath instead of directory path${Normal:=}"
}

Test2() {
    echo "Description: Kusalananda1"
    find . -type d -exec sh -c '
    dirpath=$1
    set -- "$dirpath"/*.md
    [ -f "$dirpath/${dirpath##*/}.md" ] && [ "$#" -eq 1 ]' sh {} \; -print | sort
    echo "Observation: ${Red:=}Fails as it ignores B.md${Normal:=}"
}

Test3() {
    echo "Description: Kusalananda2"
    find . -type d -exec sh -c '
    for dirpath do
        set -- "$dirpath"/*.md
        if [ -f "$dirpath/${dirpath##*/}.md" ] && [ "$#" -eq 1 ]
        then
            printf "%s\n" "$dirpath"
        fi
    done' sh {} + | sort
    echo "Observation: ${Red:=}Fails as it ignores B.md${Normal:=}"
}

Test4() {
    echo "Description: steeldriver1"
    find . -type d -exec sh -c '[ -f "$1/${1##*/}.md" ]' find-sh {} \; -print | sort
    echo "Observation: ${Green:=}Pass${Normal:=}"
}

Test5() {
    echo "Description: steeldriver2"
    find . -type d -exec sh -c '
  for d do
    [ -f "$d/${d##*/}.md" ] && printf "%s\n" "$d"
  done' find-sh {} + | sort
    echo "Observation: ${Green:=}Pass${Normal:=}"
}

Test6() {
    echo "Description: Stéphane Chazelas"
    find . -name '*.md' -print0 \
        | gawk -v RS='\0' -F/ -v OFS=/ '
    {filename = $NF; NF--
     if ($(NF)".md" == filename) include[$0]
     else exclude[$0]
    }
    END {for (i in include) if (!(i in exclude)) print i}'
    echo "Observation: ${Red:=}Fails as it ignores B.md${Normal:=}"
}

Test7() {
    echo "Description: Zach"
    #shellcheck disable=2044
    for fd in $(find . -type d); do
        dir=${fd##*/}
        if [ -f "${fd}/${dir}.md" ]; then
            ls "${fd}/${dir}.md"
        fi
    done
    echo "Observation: ${Green:=}Pass but shows filepath instead of directory${Normal:=}"
}
ExpectedOutput() {
    echo "Expected Output"
    echo "==============="
    cat << EOT
./GeneratedTest/A
./GeneratedTest/A/AA
./GeneratedTest/B
./GeneratedTest/C/CC1
./GeneratedTest/C/CC2
EOT
}

TestData() {
    rm -rf GeneratedTest

    mkdir -p GeneratedTest/A/AA
    touch GeneratedTest/index.md
    touch GeneratedTest/A/A.md
    touch GeneratedTest/A/AA/AA.md

    mkdir -p GeneratedTest/B
    touch GeneratedTest/B/B.md
    touch GeneratedTest/B/index.md

    mkdir -p GeneratedTest/C/CC1
    touch GeneratedTest/C/index.md
    touch GeneratedTest/C/CC1/CC1.md

    mkdir -p GeneratedTest/C/CC2
    touch GeneratedTest/C/CC2/CC2.md

    mkdir -p GeneratedTest/C/CC3
    touch GeneratedTest/C/CC3/CC.md

    mkdir -p GeneratedTest/C/CC4
}
main "$@"
Porcupine
  • 1,892
  • 1
    Regarding your final remarks. Note that some answers do different things from others. Mine and Stéphane's for example, interpreted your first "Note" as "if there are other markdown files in the directory whatsoever, don't return that directory" while the others don't (as far as I can see). Apart from that, only you can pick the answer that is most helpful to you. Answers here will continue to receive up and down votes after you have accepted an answer, depending on what other readers find most useful. – Kusalananda Aug 06 '19 at 18:47
  • When you say "Folders that contain markdown file whose names are different should not be found," do you mean to exclude directories with both? E.g. if you have foo/foo.md and foo/bar.md should foo be included or excluded? – Kevin Aug 07 '19 at 20:36
  • @Kevin In the example that you gave, I had meant to include foo. Unfortunately, many people interpreted it the other way and justified that reading, so I concluded that I had not communicated clearly and accepted an answer that does not include foo. – Porcupine Aug 07 '19 at 20:56
  • If you use -printf with find, you can get whatever part of the match you want, see my edit – Thor Aug 08 '19 at 07:26

6 Answers

12

Assuming your files are sensibly named (i.e. there is no need for -print0 etc.), you can do this with GNU find like this:

find . -type f -regextype egrep -regex '.*/([^/]+)/\1\.md$'

Output:

./Apple/Banana/Orange/Orange.md
./Apple/Banana/Papaya/Papaya.md
./Apple/Banana/Banana.md

If you only want the directory name, add a -printf argument:

find . -type f -regextype egrep -regex '.*/([^/]+)/\1\.md$' -printf '%h\n'

Output when run on your updated test data:

GeneratedTest/A/AA
GeneratedTest/A
GeneratedTest/C/CC2
GeneratedTest/C/CC1
GeneratedTest/B
Thor
  • 17,182
6
find . -type d -exec sh -c '
    dirpath=$1
    set -- "$dirpath"/*.md
    [ -f "$dirpath/${dirpath##*/}.md" ] && [ "$#" -eq 1 ]' sh {} \; -print

The above would find all directories below the current directory (including the current directory) and would execute a short shell script for each.

The shell code would test whether there's a markdown file with the same name as the directory inside the directory, and whether this is the only *.md name in that directory. If such a file exists and if it's the only *.md name, the inline shell script exits with a zero exit status. Otherwise it exits with a non-zero exit status (signalling failure).

The set -- "$dirpath"/*.md bit will set the positional parameters to the list of pathnames matching the pattern (matches any name with a suffix .md in the directory). We can then use $# later to see how many matches we got from this.
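To illustrate the counting trick, here is a minimal sketch. The scratch directory and the helper name count_md are made up for this example and are not part of the answer. It also shows a caveat of a plain POSIX shell: when nothing matches, the pattern is left unexpanded, so $# is 1 rather than 0. That is why the [ -f ... ] test is still needed alongside the count.

```shell
# Hypothetical scratch directory, created only for this illustration.
tmp=$(mktemp -d)
mkdir "$tmp/foo" "$tmp/empty"
touch "$tmp/foo/foo.md" "$tmp/foo/bar.md"

# count_md is an illustrative helper, not something from the answer.
count_md() {
    # Replace the positional parameters with the glob matches, then count them.
    set -- "$1"/*.md
    echo "$#"
}

two=$(count_md "$tmp/foo")      # two names match the pattern
none=$(count_md "$tmp/empty")   # no match: the unexpanded pattern remains, so the count is 1
echo "$two $none"

rm -rf "$tmp"
```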

If the shell script exits successfully, -print will print the path to the found directory.

A slightly speedier version uses fewer invocations of the inline script, but doesn't let you do more with the found pathnames in find itself (the inline script may be further expanded, though):

find . -type d -exec sh -c '
    for dirpath do
        set -- "$dirpath"/*.md
        [ -f "$dirpath/${dirpath##*/}.md" ] &&
        [ "$#" -eq 1 ] &&
        printf "%s\n" "$dirpath"
    done' sh {} +

The same commands but without caring about whether there are other .md files in the directories:

find . -type d -exec sh -c '
    dirpath=$1
    [ -f "$dirpath/${dirpath##*/}.md" ]' sh {} \; -print

find . -type d -exec sh -c '
    for dirpath do
        [ -f "$dirpath/${dirpath##*/}.md" ] &&
        printf "%s\n" "$dirpath"
    done' sh {} +

Kusalananda
  • 333,661
6

On a GNU system, you could do something like:

find . -name '*.md' -print0 |
  gawk -v RS='\0' -F/ -v OFS=/ '
    {filename = $NF; NF--
     if ($(NF)".md" == filename) include[$0]
     else exclude[$0]
    }
    END {for (i in include) if (!(i in exclude)) print i}'
  • 3
    would you mind re-including your proposed zsh solution as an alternate? it would be helpful for those of us trying to learn more about zsh – steeldriver Aug 06 '19 at 17:23
  • Given that this answer has received more votes: To those who are upvoting this answer, could you please specify why this is better than the rest? It would help me to choose the most suitable answer. – Porcupine Aug 06 '19 at 18:33
  • Stéphane, I agree with steeldriver. Do mention the previous zsh solution (it got, I believe, two of the upvotes), and feel free to point out any flaws in it that might have prompted you to remove it. – Kusalananda Aug 06 '19 at 18:38
  • 1
    @steeldriver, in that zsh approach I (like you) had missed the part of the requirement that dirs that contain other md files should be omitted. – Stéphane Chazelas Aug 06 '19 at 20:32
  • @StéphaneChazelas OP just clarified in the comments he actually meant for those to be included, it was just poorly phrased and people took it too literally. – Kevin Aug 07 '19 at 20:59
4

Either

find . -type d -exec sh -c '[ -f "$1/${1##*/}.md" ]' find-sh {} \; -print

or

find . -type d -exec sh -c '
  for d do
    [ -f "$d/${d##*/}.md" ] && printf "%s\n" "$d"
  done' find-sh {} +

The second form avoids running one sh per directory found.

The find-sh is an arbitrary string that becomes the shell's zeroth positional parameter $0 - making it something memorable may help with debugging in case the shell encounters errors (others may suggest using plain sh or even _ as a default "skip" parameter).
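As a quick check of how the arguments line up, here is a small sketch. The path ./Apple/Banana is only an example argument standing in for what find would substitute for {}.

```shell
# The word after the script text becomes $0; the next one becomes $1.
out=$(sh -c 'printf "%s|%s\n" "$0" "$1"' find-sh ./Apple/Banana)
echo "$out"   # prints: find-sh|./Apple/Banana
```

If the inline script hits an error, the shell typically prefixes its message with that $0 value, which is where a memorable name pays off.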

steeldriver
  • 81,074
0

Here's mine. I added some more directories and files to verify. I was also bored, so I added the last modified time and MD5. Maybe you're looking for duplicates.

GREEN='\033[0;32m'
RED='\033[0;31m'
NC='\033[0m'

mkdir -pv {Pear,Grape,Raisin,Plaintain}/{DragonFruit,Nababa,Strawberry,Grape,Raisin}
touch {Pear,Grape,Raisin,Plaintain}/{DragonFruit,Nababa,Strawberry,Grape,Raisin}/{Strawberry,Grape,Raisin}.md

for dir in $(find ./ -type d)
do
    dirname="${dir##*/}"
    fname="${dirname}.md"
    if [ -f "${dir}/${fname}" ]
    then
        STAT=$(stat --printf="%y %s" "${dir}/${fname}")
        STAT="${STAT:0:19}"
        MD5=$(md5sum "${dir}/${fname}")
        MD5="${MD5:0:32}"
        printf "${GREEN}%-60s${NC}%-40s%-40s\n" "'${dir}/${fname}' exists" "$STAT" "$MD5"
    else
        echo -e "${RED}'${dir}/${fname}' doesn't exist${NC}"
    fi
done

'.//.md' doesn't exist
'./Raisin/Raisin.md' doesn't exist
'./Raisin/Raisin/Raisin.md' exists                          2019-08-07 19:54:09      a3085274bf23c52c58dd063faba0c36a
'./Raisin/Nababa/Nababa.md' doesn't exist
'./Raisin/Strawberry/Strawberry.md' exists                  2019-08-07 19:54:09      3d2eca1d4a3c539527cb956affa8b807
'./Raisin/Grape/Grape.md' exists                            2019-08-07 19:54:09      f577b20f93a51286423c1d8973973f01
'./Raisin/DragonFruit/DragonFruit.md' doesn't exist
'./Pear/Pear.md' doesn't exist
'./Pear/Raisin/Raisin.md' exists                            2019-08-07 19:54:09      61387f5d87f125923c2962b389b0dd67
'./Pear/Nababa/Nababa.md' doesn't exist
'./Pear/Strawberry/Strawberry.md' exists                    2019-08-07 19:54:09      02c9e39ba5b77954082a61236f786d34
'./Pear/Grape/Grape.md' exists                              2019-08-07 19:54:09      43e85d5651cac069bba8ba36e754079d
'./Pear/DragonFruit/DragonFruit.md' doesn't exist
'./Apple/Apple.md' doesn't exist
'./Apple/Banana/Banana.md' exists                           2019-08-07 19:54:09      a605268f3314411ec360d7e0dd234960
'./Apple/Banana/Papaya/Papaya.md' exists                    2019-08-07 19:54:09      e759a879942fe986397e52b7ba21a9ff
'./Apple/Banana/Orange/Orange.md' exists                    2019-08-07 19:54:09      127618fe9ab73937836b809fa0593572
'./Plaintain/Plaintain.md' doesn't exist
'./Plaintain/Raisin/Raisin.md' exists                       2019-08-07 19:54:09      13ed6460f658ca9f7d222ad3d07212a2
'./Plaintain/Nababa/Nababa.md' doesn't exist
'./Plaintain/Strawberry/Strawberry.md' exists               2019-08-07 19:54:09      721d7a5a32f3eacf4b199b74d78b91f0
'./Plaintain/Grape/Grape.md' exists                         2019-08-07 19:54:09      0bdaff592bbd9e2ed5fac5a992bb3566
'./Plaintain/DragonFruit/DragonFruit.md' doesn't exist
'./Grape/Grape.md' doesn't exist
'./Grape/Raisin/Raisin.md' exists                           2019-08-07 19:54:09      aa5d4c970e7b4b6dc35cd16d1863b5bb
'./Grape/Nababa/Nababa.md' doesn't exist
'./Grape/Strawberry/Strawberry.md' exists                   2019-08-07 19:54:09      8b02f8273bbff1bb3162cb088813e0c9
'./Grape/Grape/Grape.md' exists                             2019-08-07 19:54:09      5593d7d6fdcbb48ab5901ba30469bbe8
user208145
  • 2,485
-1

This would require a bit of logic.

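# Caveat: looping over the output of find splits on whitespace,
# so this breaks on directory names containing spaces or newlines.
for fd in $(find . -type d); do
  dir=${fd##*/}
  if [ -f "${fd}/${dir}.md" ]; then
    ls "${fd}/${dir}.md"
  fi
done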

You can also adapt that into a one-liner by using code blocks.

EDIT: Bash is hard. basedir is not a command, dirname doesn't do what I thought it did, so let's go with parameter expansion.
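For reference, a short sketch of what each tool returns for a path. The path is just the example from the question, and the variable names are illustrative.

```shell
fd=./Apple/Banana/Orange

last=${fd##*/}            # strip the longest prefix ending in "/": Orange
parent=$(dirname "$fd")   # everything before the last component: ./Apple/Banana
base=$(basename "$fd")    # same result as the expansion here: Orange

printf '%s\n' "$last" "$parent" "$base"
```

The parameter expansion gives the same answer as basename for paths like these, without starting an extra process for every directory.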