This site says that functions are faster than aliases, but its author rightly points out that aliases are easier to understand: when you want something very simple and do not need to pass arguments, aliases are convenient and sensible. With that in mind, my personal profile is about 1,000 lines long and contains both aliases and functions. It serves as a source of functions and tools that I use a lot, and as a place to keep techniques that I can refer to and reuse for other tasks.

A problem though is that aliases take precedence over functions, and re-definitions of aliases and functions can cause problems. For example, I might have a function called gg and then, later in the script, accidentally define an alias called gg. Likewise, if a function is redefined later as a function, the new definition silently overrides the previous one. The profile loads, but I end up with problems. One solution could be to eliminate all aliases and only use functions (does anyone do that? I'd be curious to know, because if I want to do alias m=man that's more intuitive and sensible than function m() { man $@; }), but I would still have the problem of function redefinitions in that case.
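
To make the precedence problem concrete, here is a minimal sketch for bash. The name gg is just the example from the paragraph above, and the shopt line is only there because a non-interactive bash does not expand aliases by default. It asks bash what a bare gg resolves to once both a function and an alias of that name exist.

```shell
#!/usr/bin/env bash
# Alias expansion is off by default in non-interactive bash; switch it
# on so this script behaves like an interactive session.
shopt -s expand_aliases

gg() { echo "function gg"; }      # defined first
alias gg='echo "alias gg"'        # same name, defined later by accident

# type -t reports what bash would use for a bare "gg";
# type -at lists every definition it knows about, in lookup order.
kind=$(type -t gg)
all_kinds=$(type -at gg | tr '\n' ' ')

echo "gg resolves to: $kind"
echo "all definitions: $all_kinds"
```

Neither definition produces a warning, which is why this kind of clash is easy to miss in a long profile.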

Is there a way to parse a script with the goal of answering: "for each declaration of an alias or function, show me all lines that contain a re-declaration (either alias or function) of that item"?

YorSubs
  • 621
  • 2
    Esp. since it should be m() { man "$@"; }. That statement about functions being faster and aliases getting looked up after functions sounds a bit suspect. – ilkkachu Oct 16 '21 at 08:59
  • 1
    Would just grep -E '^alias|^[a-zA-Z0-9_]+\(\)' do? – ilkkachu Oct 16 '21 at 09:01
  • I agree with you that the comment about functions being faster sounds suspect, but apparently this comes from the maintainers of bash if I'm reading that correctly(?). Your grep is definitely part of what I need, but could it be extended to one more generalisation, in that my aliases might not be at ^, they could be declared inside a if-then-fi or for-do-done loop, or could have multiple aliases on a line alias xx='thing1'; alias yy='thing2'; alias zz='thing3' etc? And could it only show the actual duplicates found? – YorSubs Oct 16 '21 at 09:11
  • 1
    You could declare functions read-only with declare -f -r funcname. That would trigger an error if you tried to re-declare it later in the same session. Your question currently seems to be about parsing shell code, but the correct solution may be to organize the code better instead. – Kusalananda Oct 16 '21 at 09:16
  • Reorganising code better is a utopian solution. I would love to have perfect code everywhere, optimised to the millisecond, but life is more complex I would humbly suggest. declare -f -r funcname does sound useful (new to me). If an alias is declared later, it would overwrite that function name, right? declare -f -r xxxx; xxxx() { man "$@"; }; alias xxxx=echo 1 immediately breaks the read-only funcname from what I see. – YorSubs Oct 16 '21 at 09:23
  • Reorganizing is not the same as rewriting each function and alias. Just move the code into separate source files related to functionality, project, topic, usage pattern, or if you have some other way of categorizing them. While doing so, you'll notice both accidental duplication of names and duplication of functionality. It would possibly also help you with establishing some form of name-spaces too, like prefixing everything that has to do with git with g etc. – Kusalananda Oct 16 '21 at 09:35
  • I know what reorganising means, but again, that not only does not solve the issue, but creates more problems. If I have to maintain 3 or 4 scripts instead of 1, I hit even more issues - remember the nature of this problem: when an alias overwrites another alias or a function, it does so silently as there is nothing wrong with redefining an alias or function. The problem is particularly bad with aliases-vs-functions as you get gnarly issues that are hard to unravel (had those over the last week). I maintain things in clean ways (95%) but the effort to get to utopia is too much. – YorSubs Oct 16 '21 at 11:13
  • "It would possibly also help you with establishing some form of name-spaces too, like prefixing everything that has to do with git with g etc.". I already do this and have an extensive (but also very simple) set of conventions (i.e. I almost never overwrite core binaries with aliases/functions that lock in switches as others do - I want to keep a clean and very flexible set of tools, and I only define things if certain tools/conditions exist, as here: https://github.com/roysubs/custom_bash/blob/master/.custom). BUT, the OP problem remains and it would be very generally useful to address it. – YorSubs Oct 16 '21 at 11:19
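
The read-only idea from the comments above can be tried out with a short bash sketch. It uses readonly -f rather than the declare -f -r spelling from the comment, and it assumes the function is defined before it is marked read-only, since the attribute is applied to an existing function. The name m is the example from the question.

```shell
#!/usr/bin/env bash
m() { man "$@"; }
readonly -f m                     # mark the existing function read-only

# Attempt a redefinition inside a subshell so that a failure cannot
# disturb the current shell; bash reports "m: readonly function".
if ( m() { echo "second definition"; } ) 2>/dev/null; then
  redefine_result="allowed"
else
  redefine_result="rejected"
fi
echo "redefining m as a function: $redefine_result"

# An alias with the same name is still accepted and still wins,
# so the read-only flag does not protect against that case.
shopt -s expand_aliases
alias m='echo 1'
alias_kind=$(type -t m)
echo "m now resolves to: $alias_kind"
```

This only guards against function-over-function redefinition within one session; the alias-over-function case from the question still needs a scan of the source.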

2 Answers

1

Try something like this:

$ cat find-dupes.pl
#!/usr/bin/perl

use strict;
#use Data::Dump qw(dd);

# Explanation of the regexes ($f_re and $a_re):
#
# Both $f_re and $a_re start with '(?:^|&&|\|\||;|&)' to anchor
# the remainder of the expression to the start of the line or
# immediately after a ;, &, &&, or ||. Because it begins with
# '?:', this is a non-capturing sub-expression, i.e. it just
# matches its pattern but doesn't return what it matches.

# $f_re has two main sub-expressions. One to match 'function name ()'
# (with 'function ' being optional) and the other to match
# 'function name () {' (with the '()' being optional).
#
# Each sub-expression contains more sub-expressions, with one of
# them being a capture group '([-\w.]+)' and the rest being
# non-capturing (they start with '?:'). i.e. it returns the
# function name as either $1 or $2, depending on which subexp
# matched.
my $f_re = qr/(?:^|&&|\|\||;|&)\s*(?:(?:function\s+)?([-\w.]+)\s*\(\)|function\s+([-\w.]+)\s+(?:\(\))?\s*\{)/;

# $a_re matches alias definitions and returns the name of
# the alias as $1.
my $a_re = qr/(?:^|&&|\|\||;|&)(?:\s*alias\s+)([-\w.]+)=/;

# %fa is a Hash-of-Hashes (HoH) to hold function/alias names and
# the files/lines they were found on. i.e an associative array
# where each element is another associative array. Search for
# HoH in the perldsc man page.
my %fa;

# main loop, read and process the input
while(<>) {
  s/#.*|^\s*:.*//;  # delete comments
  s/'[^']+'/''/g;   # delete everything inside ' single-quotes
  s/"[^"]+"/""/g;   # delete everything inside " double-quotes
  next if /^\s*$/;  # skip blank lines

  while(/$f_re/g) {
      my $match = $1 // $2;
      #print "found: '$match':'$&':$ARGV:$.\n";
      $fa{$match}{"function $ARGV:$."}++;
  };

  while(/$a_re/g) {
      #print "found: '$1':'$&':$ARGV:$.\n";
      $fa{$1}{"alias $ARGV:$."}++;
  };

  close(ARGV) if eof;
};

#dd \%fa;

# Iterate over the function/alias names found and print the
# details of duplicates if any were found.
foreach my $key (sort keys %fa) {
  my $p = 0;

  # Is this function/alias ($key) defined more than once on
  # different lines or in different files?
  if (keys %{ $fa{$key} } > 1) {
    $p = 1;
  } else {
    # Iterate over the keys of the second-level hash to find out
    # if there is more than one definition of a function/alias
    # ($key) in the same file on the same line ($k)
    foreach my $k (keys %{ $fa{$key} }) {
      if ($fa{$key}{$k} > 1) {
        $p = 1;

        # break out of the foreach loop, there's no need to keep
        # searching once we've found a dupe
        last;
      };
    };
  };

  # print the details if there was more than one.
  print join("\n\t", "$key:", (keys %{$fa{$key}}) ), "\n\n" if $p;
};

The commented-out Data::Dump, print, and dd lines were for debugging. Uncomment them to get a better idea of what this script does and how it works. The output of the dd function from the Data::Dump module is particularly interesting as it shows you the structure (and contents) of the %fa HoH. Data::Dump is not included with perl, it's a library module you need to install. You didn't mention what distro you're using but if you're using debian/ubuntu/mint/etc, you can install it with sudo apt install libdata-dump-perl. Other distros probably have it packaged under a slightly different name. Otherwise, you can install it with cpan.

Example output (using a file containing your aliases from your comment plus a few dummy functions):

$ cat yorsub.aliases 
function foo () { echo ; }
bar () { echo ; }
bar () { echo ; }
function baz () { echo ; } && quux () { echo ; } ; alias xyz=abc; 
type tmux  &> /dev/null && alias t='tmux'
alias cd-='cd -'; alias cd..='cd ..'; alias u1='cd ..'; alias u2='cd ../..'; alias u3='cd ../../..'; alias u4='cd ../../../../..'; alias u5='cd ../../../../../..'; alias u6='cd ../../../../../../..' alias back='cd -'; alias cd-='cd -'; alias .1="cd .."; alias .2="cd ../.."; alias .3="cd ../../.."; alias .4="cd ../../../.."; alias .5="cd ../../../../.."; alias .6='cd ../../../../../../..'
function cd.. { cd .. ; }
function abc () { xyx "$@" }; abc () { xyz } ; function abc { xyz }; alias abc=xyz
$ ./find-dupes.pl yorsub.aliases    
abc:
        function yorsub.aliases:8
        alias yorsub.aliases:8

bar:
        function yorsub.aliases:3
        function yorsub.aliases:2

cd-:
        alias yorsub.aliases:6

cd..:
        alias yorsub.aliases:6
        function yorsub.aliases:7

cas
  • 78,579
  • Thanks @cas, I can roughly see what is going on. I can't get 2nd to work at all. A problem that I have is that I have about 50 aliases / functions that only activate if an app is found. e.g. type tmux &> /dev/null && alias t='tmux' (and I also have many lines with many aliases on the same line separated by ; for readability, and this helps a lot, i.e. if I didn't do that, related aliases would all need a single line and the script would balloon to 3,000 lines I think. Is there some way that we can get regular expressions to recognise aliases and functions in arbitrary positions? – YorSubs Oct 16 '21 at 16:34
  • This is what I mean about readability (compound lines like this are much easier to scan and stop the script exploding in size) alias cd-='cd -'; alias cd..='cd ..'; alias u1='cd ..'; alias u2='cd ../..'; alias u3='cd ../../..'; alias u4='cd ../../../../..'; alias u5='cd ../../../../../..'; alias u6='cd ../../../../../../..' alias back='cd -'; alias cd-='cd -'; alias .1="cd .."; alias .2="cd ../.."; alias .3="cd ../../.."; alias .4="cd ../../../.."; alias .5="cd ../../../../.."; alias .6='cd ../../../../../../..' – YorSubs Oct 16 '21 at 16:37
  • Another thing that might work (if this is easy to do in Perl...): in memory, a) replace every occurrence of ;, &&, ||, then, do by a newline, then b) remove all whitespace at the start of every line. I think this would ensure that every alias or function is now at the start of a line and then your script should (I think) capture everything? – YorSubs Oct 16 '21 at 16:43
  • re: multiple aliases on one line. I just have one per line in my ~/.bash-aliases file. IMO, it's easier to read, easier to edit, much easier to just comment out a single alias rather than delete it, and I really don't care how long the file gets. YMMV. – cas Oct 17 '21 at 01:04
  • 1
    @YorSubs updated my answer with new version that finds multiple alias/function definitions on a line. – cas Oct 17 '21 at 04:17
  • That's perfect. I used it on a thousand line script just now and was able to prune out two issues that were hiding in there. Your solution really is very generally useful for debugging this kind of thing in scripts, and I can maybe even use it for other languages with a bit of tweaking (though I'll have to learn Perl first! ). Many thanks. – YorSubs Oct 17 '21 at 08:29
  • If you have basic familiarity with general programming concepts, awk, sed, and regular expressions perl is very easy to learn. it's fairly easy even if you're a complete newcomer to programming. and it's ideal for almost every text processing task. i very strongly recommend learning it. – cas Oct 17 '21 at 09:06
  • BTW, did you notice that you defined the cd-='cd -' alias twice in that sample in your comment. that was actually really useful for testing the script....it made me notice a bug that caused me to make significant changes to the way the script worked. – cas Oct 17 '21 at 09:09
  • Yes, I saw that as a result of using your script this morning. Some duplicates in my script I want to keep, as they are created only if certain conditions are met (e.g. if it is detected that the instance is in WSL2, then it will overwrite reboot and shutdown due to the absence of systemd), but that cd- duplicate was a mistake. This is my full script: https://github.com/roysubs/custom_bash/blob/master/.custom – YorSubs Oct 17 '21 at 17:45
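
The "split at separators, strip leading whitespace, then match at the start of the line" idea from the comments above can also be sketched without Perl, here with awk called from bash. This is an illustration under assumptions, not the answer's method: find_redefs is a hypothetical helper name, the sample file is invented, quoted text and comments are removed naively, and names are taken to be whatever precedes = or ().

```shell
#!/usr/bin/env bash
# find_redefs FILE...  (hypothetical helper): list names that are
# defined more than once as an alias and/or function, with locations.
find_redefs() {
  awk -v q="'" '
    {
      line = $0
      gsub(q "[^" q "]*" q, "", line)     # drop single-quoted text
      gsub(/"[^"]*"/, "", line)           # drop double-quoted text
      sub(/#.*/, "", line)                # drop comments (naive)
      # a) break the line at ; && || and after "then" / "do"
      gsub(/&&|\|\||;|(^|[ \t])(then|do)[ \t]/, "\n", line)
      n = split(line, part, "\n")
      for (i = 1; i <= n; i++) {
        p = part[i]
        sub(/^[ \t]+/, "", p)             # b) strip leading whitespace
        name = ""
        if (p ~ /^alias[ \t]+[^= \t]+=/) {
          sub(/^alias[ \t]+/, "", p); sub(/=.*/, "", p); name = p
        } else if (p ~ /^function[ \t]+[^ \t(]+/) {
          sub(/^function[ \t]+/, "", p); sub(/[ \t(].*/, "", p); name = p
        } else if (p ~ /^[^ \t(=]+[ \t]*\(\)/) {
          sub(/[ \t]*\(\).*/, "", p); name = p
        }
        if (name != "") {
          count[name]++
          where[name] = where[name] " " FILENAME ":" FNR
        }
      }
    }
    END {
      for (name in count)
        if (count[name] > 1) print name ":" where[name]
    }
  ' "$@" | sort
}

# Invented sample input covering the cases raised in the comments.
sample=$(mktemp)
cat > "$sample" <<'EOF'
gg() { echo one; }
type tmux &> /dev/null && alias t='tmux'
alias a='x'; alias gg='git grep'; alias a='y'
if true; then alias t='other'; fi
EOF

dupes=$(find_redefs "$sample")
echo "$dupes"
rm -f "$sample"
```

Each output line is a name followed by every file:line where it was defined, so a name listed with the same location twice was defined twice on one line.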
1

Simple grep to find definitions, but doesn't check for redefines:

$ grep -onE 'alias [[:alnum:]_]+=|[[:alnum:]_]+\(\)' .bashrc .aliases
.bashrc:47:alias foo=
.bashrc:47:alias bar=
.bashrc:49:asfdasdf()
.aliases:3:alias ls=
.aliases:6:alias foo=

This Perl one-liner keeps a count so it can mark redefines:

$ perl -lne 'while( /alias (\w+)=|(\w+)\(\)/g ) { 
                 $name = $1 // $2; $count{$name} += 1; 
                 printf "\"%s\" %s in %s line %s%s\n", $name, $count{$name} > 1 ? "REDEFINED" : "defined", $ARGV, $. 
             }' .bashrc .aliases 
"foo" defined in .bashrc line 47
"bar" defined in .bashrc line 47
"asfdasdf" defined in .bashrc line 49
"ls" defined in .aliases line 53
"foo" REDEFINED in .aliases line 56

(The order of input files influences which one is not marked as a "redefinition".)
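
If only the names that are defined more than once are wanted, the grep above can be piped through sort and uniq -d. The sketch below is one way to do that, assuming a grep that supports -o and -h (GNU and BSD grep do); the sample file is invented for the demonstration.

```shell
#!/usr/bin/env bash
# Invented sample input.
sample=$(mktemp)
cat > "$sample" <<'EOF'
alias foo='echo 1'; alias bar='echo 2'
asfdasdf() { :; }
alias ls='ls --color'
foo() { echo 3; }
EOF

# 1) extract every "alias NAME=" and "NAME()" match (-h: no file names)
# 2) reduce each match to the bare NAME so aliases and functions compare equal
# 3) keep only the names that occur more than once
redefined=$(
  grep -ohE 'alias [[:alnum:]_]+=|[[:alnum:]_]+\(\)' "$sample" |
    sed -E 's/^alias //; s/(=|\(\))$//' |
    sort | uniq -d
)
echo "$redefined"
rm -f "$sample"
```

The result is just the list of clashing names; running the original grep -n for one of them then shows where each definition lives.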

ilkkachu
  • 138,973
  • 1
    Far simpler than the existing answer, but I'd written this at some point and forgotten to post it. – ilkkachu Oct 17 '21 at 08:45
  • This is fairly amazing. Really grateful for both approaches, and you are right, this is far simpler, but it doesn't seem to detect all of my functions. Bash allows us to define alias/function names with ., - in the name etc, and I use that quite a bit as it's handy, aliases like alias vi.='vi .bashrc .aliases .functions .vimrc .inputrc'. I'm thinking that your regexp is just detecting alphabetic characters (is that the \w+?). How can it be changed to detect all legal alias / function names? – YorSubs Oct 18 '21 at 15:21
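
On the last comment: \w matches letters, digits and underscore only, which is why names containing . or - are missed. The first answer's script widens this to [-\w.]+, and the grep version can be widened the same way. The sketch below assumes names are built from letters, digits, _, . and - only, which is narrower than everything bash accepts; the sample file is invented.

```shell
#!/usr/bin/env bash
# Invented sample input with dots and dashes in the names.
sample=$(mktemp)
cat > "$sample" <<'EOF'
alias vi.='vi .bashrc .aliases'
cd..() { cd ..; }
alias u-1='cd ..'
EOF

# Bracket expression for a name; the leading "-" is literal.
name='[-._[:alnum:]]+'

names=$(
  grep -oE "alias $name=|$name\(\)" "$sample" |
    sed -E 's/^alias //; s/(=|\(\))$//'
)
echo "$names"
rm -f "$sample"
```

In the Perl one-liner the equivalent change would be to replace each \w+ with [-\w.]+.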