I have a mix of filenames and relative paths in a markdown file that I want to refactor so that the full relative path to ~/app/ is inserted for interoperability between markdown readers.

For my files in ~/app/ I have e.g. a file ~/app/file1.md that references attachments in ~/app/attachments/

Example: file1.md contains:

Here is an image:
![[img1.png]]

Here is another image: ![[attachments/img2.png]]

(both images reside in ~/app/attachments/)

I can get the references between the ![[ and ]] with

sed -n 's/!\[\[\(.*\.png\)]]/\1/gp'

But how do I combine this with the find command to replace each reference inline with the full relative path? Note that I want this to work wherever the images are located, not just under attachments/

The resulting file1.md would be:

Here is an image:
![[attachments/img1.png]]

Here is another image: ![[attachments/img2.png]]

Chris Davies
  • What's a "full relative path" please? I'm trying to understand whether it's an absolute /full/path/to/the/image.png or a relative path from your home directory to/the/image.png? – Chris Davies Dec 01 '21 at 12:43
  • By full relative I mean relative to ~/app/ so ![[img1.png]] should be changed to ![[attachments/img1.png]] (vs. ![[~/app/attachments/img1.png]]) – Spencer Maroukis Dec 02 '21 at 13:05
  • I've put that in the question for you, where it belongs, so that people can find it easily – Chris Davies Dec 02 '21 at 14:08
  • Are all your image attachment files under ~/app/? Are there other sub-directories of ~/app/ other than ~/app/attachments/ where image files might be found? If not, that greatly simplifies the script. If so, what do you want to happen if your markdown file says ![[img1.png]] and img1.png is not in ~/app/ but there are two different files of that name, one in ~/app/attachments/ and another in, say, ~/app/more-attachments/? Are they all .png files? What should happen if file1.md says img3.png but there's only an img3.jpeg? Or if there's an IMG3.PNG? – cas Dec 03 '21 at 04:34

2 Answers

Here are three versions of a perl script to do this. All three of them require that the first argument is the directory to be searched (e.g. ./app or ./). Remaining arguments are the name(s) of any markdown file(s) to be modified (e.g. ./app/file1.md or ./app/*.md)

They're all written to search only for .png files, but that can easily be changed by changing the regular expressions and globs used.

Note: for all three scripts, delete the first #! line if you want the script to modify the markdown file(s) instead of just printing to stdout. The first #! line is for testing, to verify it does what you want. The second, with -i.bak, actually modifies the markdown file(s) in place (and copies the original to .bak; just change it to -i if you don't want a backup copy made). See man perlrun and search for -i for details on how this option works.

Also note that the File::Basename and File::Find modules used are perl core library modules, included with perl. File::Basename does essentially what the basename command does, and File::Find recursively searches directories, like the find command.

Why perl and not sh or bash? Because shell is a terrible language for text or data processing. See Why is using a shell loop to process text considered bad practice? for some of the reasons why. Shell's job is to orchestrate other programs to do data processing work, not to do the data processing itself. Using shell to do data processing is like using a shovel when you need a set of screw-drivers, or a fork when you need a ladle.

All three versions were tested with the following files & directory structure:

app/attachments/img1.png
app/attachments/img2.png
app/attachments/more/img5.png
app/file1.md
app/file2.md
app/file3.md
app/img4.png
app/other/img3.png

First version

The first version is useful if the attachment files will only be found in ./app OR ./app/attachments.

If an attachment is found in the location specified by the ![[filename]] markup, it is left as is. If not found, the script looks for it first in the top-level dir, then in the attachments/ subdirectory.

$ cat fix-paths1.pl 
#!/usr/bin/perl -p
#!/usr/bin/perl -p -i.bak

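BEGIN { $dir = shift };

use File::Basename;

if (/!\[\[([^]]*\.png)\]\]/i) {
  $file = $1;
  # leave the link alone if it already points at an existing file
  next if -f "$dir/$file";
  $bn = fileparse($file);

  # \Q...\E stops the "." in the file name from acting as a regex wildcard
  if (-f "$dir/$bn") {
    s/\Q$file\E/$bn/
  } elsif (-f "$dir/attachments/$bn") {
    s/\Q$file\E/attachments\/$bn/
  } else {
    print STDERR "WARNING: Attachment '$file' does not exist. $ARGV:$.\n"
  };
}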

Sample run - file1.md is the same as in your question:

$ ./fix-paths1.pl ./app/ app/file1.md 
Here is an image:
![[attachments/img1.png]]

Here is another image: ![[attachments/img2.png]]

Second version

The second version is useful if files may be found in any immediate sub-directory of ./app/ - i.e. app/attachments/ but not app/attachments/more/

It uses perl's glob function to build an array of .png files in the specified directory (./app/) and all immediate sub-directories. The array is used as a cache of all matching files because searching directories is a moderately "expensive" operation - definitely something you don't want to do repeatedly in a loop.

$ cat fix-paths2.pl
#!/usr/bin/perl -p
#!/usr/bin/perl -p -i.bak

use File::Basename;

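BEGIN {
  $dir = shift;
  $dir =~ s:/+$::;

  # cache every .png in $dir and in its immediate sub-directories
  @png = glob("$dir/*.png");
  push @png, glob("$dir/*/*.png");
  # store the paths relative to $dir
  @png = map { s:^\Q$dir\E/::; $_ } @png;
};

if (/!\[\[([^]]*\.png)\]\]/i) {
  $file = $1;
  # leave the link alone if it already points at an existing file
  next if -f "$dir/$file";

  $bn = fileparse($file);

  # first cached path whose last component is $bn
  ($found) = grep { m:(^|/)\Q$bn\E$: } @png;

  if ($found) {
    s/\Q$file\E/$found/;
  } else {
    print STDERR "WARNING: Attachment '$file' does not exist. $ARGV:$.\n"
  };
}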

Sample run:

$ cat app/file2.md 
Here is an image:
![[img1.png]]

Here is another image: ![[attachments/img2.png]]

and another: ![[img3.png]]

$ ./fix-paths2.pl ./app/ ./app/file2.md
Here is an image:
![[attachments/img1.png]]

Here is another image: ![[attachments/img2.png]]

and another: ![[other/img3.png]]

This version found and corrected the paths for img1.png and img3.png.

Third version

The third version is useful if attachment files can be found in any sub-directory of ./app/, no matter how many levels deep they are in the directory tree. The only difference between this and the second version is how it populates the @png array. The second version uses the glob() function, while the third uses File::Find.

The @png array to cache the search results really shows its value here - a recursive directory search is an even more expensive operation than "simple" glob searches.

$ cat fix-paths3.pl
#!/usr/bin/perl -p
#!/usr/bin/perl -p -i.bak

use File::Basename;
use File::Find;

BEGIN {
  $dir = shift;
  $dir =~ s:/+$::;

  # called by find() for every file; caches .png paths relative to $dir
  sub wanted {
    if (m/\.png$/) {
      ($f = $File::Find::name) =~ s:^\Q$dir\E/::;
      push @png, "$f";
    };
  };

  find(\&wanted, $dir);
};

if (/!\[\[([^]]*\.png)\]\]/i) {
  $file = $1;
  # leave the link alone if it already points at an existing file
  next if -f "$dir/$file";

  $bn = fileparse($file);

  # first cached path whose last component is $bn
  ($found) = grep { m:(^|/)\Q$bn\E$: } @png;

  if ($found) {
    s/\Q$file\E/$found/;
  } else {
    print STDERR "WARNING: Attachment '$file' does not exist. $ARGV:$.\n"
  };
}

Sample run:

$ cat app/file3.md 
Here is an image:
![[img1.png]]

Here is another image: ![[attachments/img2.png]]

and another: ![[img3.png]]

and another: ![[attachments/img4.png]]

and another: ![[other/img5.png]]

$ ./fix-paths3.pl ./app/ ./app/file3.md
Here is an image:
![[attachments/img1.png]]

Here is another image: ![[attachments/img2.png]]

and another: ![[other/img3.png]]

and another: ![[img4.png]]

and another: ![[attachments/more/img5.png]]

This version also found that img5.png was actually in attachments/more/ even though file3.md said it was in other/, and that img4.png was in the top-level directory rather than in attachments/.

BUGS

  1. It may be worthwhile stripping leading and trailing spaces from $file, depending on whether or not you have excess spaces in the attachment filename and on how strictly your markdown interpreter deals with excess spaces. Add the following line after $file = $1;:

    $file =~ s/^\s*|\s*$//g;
    
  2. If the .png file isn't found where the markdown file says it is, the second and third versions will return the first matching file, even if there's more than one file of the same name (yes, this is more of a design decision than an actual bug - I chose to write it this way). Sometimes this may not be the file you expect it to be - this is a natural result of the GIGO rule.

    This could be "fixed" by counting the number of matches (hint: perl's built-in grep function returns a list of every match, and the scripts above throw away all but the first result. The $found variable could be replaced with an @found array variable) and either printing an error message if there's more than one, or having some kind of heuristics to prefer attachment files in some directories over others (or to prefer more recent files over older ones, or older files over newer, or ...). The real fix is to edit the input markdown file to avoid ambiguity.

    See perldoc -f grep for details on perl's grep function.
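
Before running the second or third version, it may help to see which attachment names are ambiguous in the first place. The pipeline below is only a sketch: the function name list_ambiguous_png is made up for illustration, it assumes file names contain no newlines, and it matches *.png case-sensitively (unlike the /i match in the scripts).

```shell
# Sketch: print every .png base name that occurs more than once under a directory.
# Assumption: file names do not contain newlines.
list_ambiguous_png() {
    find "${1:-.}" -type f -name '*.png' |
        sed 's:.*/::' |   # keep only the last path component
        sort |
        uniq -d           # print names that appear more than once
}
```

Running list_ambiguous_png ./app on a tree with unique names should print nothing; any name it does print is one where the scripts would silently pick the first match.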

cas

Let's assume that you have only one file link per line, and you are going line by line, then your sed should be fine. So we have something like:

file=$(sed -n 's/!\[\[\(.*\.png\)]]/\1/gp')
name=$(basename "$file")
fullFile=$(find . -name "$name" -print -quit)

Here I use -quit to stop searching as soon as the first match is found. I hope this points you in the right direction.
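
To show how those three steps might fit into an actual rewrite, here is a minimal sketch. It is not a polished solution: the function name fix_links is made up for illustration, it assumes at most one link per line and file names without newlines, and it relies on find's -quit option (available in GNU find). The sed pattern is wrapped in .* on both sides so that only the link target is printed, not the rest of the line.

```shell
# Sketch: print a markdown file with ![[...png]] links rewritten relative to DIR.
# Usage: fix_links DIR FILE.md
# Assumptions: one link per line at most; no newlines in file names.
fix_links() {
    local dir="${1%/}" md="$2" line file name found old new
    while IFS= read -r line || [ -n "$line" ]; do
        # Extract the link target, if this line has one.
        file=$(printf '%s\n' "$line" | sed -n 's/.*!\[\[\(.*\.png\)]].*/\1/p')
        # Only search when the link does not already point at an existing file.
        if [ -n "$file" ] && [ ! -f "$dir/$file" ]; then
            name=$(basename "$file")
            found=$(find "$dir" -type f -name "$name" -print -quit)
            if [ -n "$found" ]; then
                old='![['"$file"']]'
                new='![['"${found#"$dir"/}"']]'
                # Literal (non-regex) replacement of the first occurrence.
                line=${line%%"$old"*}$new${line#*"$old"}
            else
                printf 'WARNING: %s not found under %s\n' "$file" "$dir" >&2
            fi
        fi
        printf '%s\n' "$line"
    done < "$md"
}
```

For example, fix_links ~/app ~/app/file1.md > file1.new.md writes the rewritten text to a new file so the result can be checked before replacing the original. Note that this still runs find once per unresolved link.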

v010dya
  • You don't need to use find here (and name=$(basename "$file") isn't needed either, because it's only used in the find command). Try realpath -e ~/app/"$file" - it'll be significantly faster than find because it doesn't have to recursively search for each filename. – cas Dec 01 '21 at 11:36
  • Alternatively, just test if $file exists in ~/app/. if it does, use $file as is. if it doesn't, see if the basename (i.e. $name) exists in ~/app/attachments/ and use ~/app/attachments/$name if it does. if it doesn't exist there, either give up and print a warning to STDERR or maybe try find as a last resort. – cas Dec 01 '21 at 11:42
  • @cas "I would want this to work with wherever the images are located not just under attachments/" from the original question – v010dya Dec 01 '21 at 12:21
  • yes, but there are far better ways of doing this than running find once for each and every file name. If find is required, it would be better to run find ~/app/ -type f once to match all files, redirect find's output to a temp file and extract it from there (e.g. with grep) as needed. Or, better yet, use perl and its File::Find module to populate an array or hash. It's generally a bad idea to use a shell loop to process text - see Why is using a shell loop to process text considered bad practice? – cas Dec 02 '21 at 03:10
  • yes, the OP said that but the only examples given were of files that were in ~/app/attachments/. One listed in the markdown file with the full path relative to ~/app/ and one with the attachments directory missing. Sometimes people aren't clear with their questions, so it's better to ask for clarification than make assumptions. – cas Dec 02 '21 at 03:11
  • @cas The question was pretty clear. The OP is not required to give every permutation of directories, they only need to say that the answer needs to work in every directory. – v010dya Dec 02 '21 at 07:02
  • I never said they're required to do anything. I said they were unclear in describing what they want. I suggest you tone down the obnoxious belligerence - it does neither you nor anyone else any good. – cas Dec 02 '21 at 16:11
  • @cas Please go away. I have no problem understanding the question. OP was very clear and they did a great job describing the problem. If you have a different answer you can post it, if it is helpful i will even upvote it. If you want to continue harassing me, i will not respond after this point. – v010dya Dec 03 '21 at 04:26
  • No, their question was unclear, and you didn't bother to gather essential information to answer the question. Instead you just dove in and wrote a bare minimum example which, a) doesn't bother to check if there's more than one result from find, and b) (worse) when run inside a shell loop (which is the OP's intention because there will be more than one attachment in their file1.md) will end up running find once for each file mentioned in the markdown file(s), and c) doesn't answer the inline replacement part of the question. BTW, stop taking things so personally - critique is not an attack. – cas Dec 03 '21 at 04:41
  • actually, forget about the "more than one result from find" bit of the comment. -quit deals with that. – cas Dec 03 '21 at 07:15
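
The suggestion made earlier in these comments, to run find once and then look names up in its saved output, could be sketched roughly as follows. The helper names build_index and lookup are hypothetical, file names are assumed to contain no newlines or backslashes, and awk is used for the lookup instead of grep so that the "." in a file name is compared literally rather than as a regex wildcard.

```shell
# Sketch: walk the directory once, then answer lookups from the saved list.

# build_index DIR INDEXFILE: write every file path under DIR, relative to DIR.
build_index() {
    ( cd "$1" && find . -type f | sed 's:^\./::' ) > "$2"
}

# lookup INDEXFILE NAME: print the first indexed path whose last component is NAME.
lookup() {
    awk -F/ -v n="$2" '$NF == n { print; exit }' "$1"
}
```

With an index built by build_index ~/app /tmp/app.index, a call such as lookup /tmp/app.index img1.png prints the path relative to ~/app/, or nothing if no such file was indexed. As with -quit, only one match is reported when a name occurs more than once.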