62

I am trying to write a bash shell function that will allow me to remove duplicate copies of directories from my PATH environment variable.

I was told that it is possible to achieve this with a one-line command using awk, but I cannot figure out how to do it. Does anybody know how?

19 Answers

48

If you don't already have duplicates in the PATH and you only want to add directories if they are not already there, you can do it easily with the shell alone.

for x in /path/to/add …; do
  case ":$PATH:" in
    *":$x:"*) :;; # already there
    *) PATH="$x:$PATH";;
  esac
done
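
A minimal sketch of the same check wrapped in a function, for the case where you want to choose between prepending and appending. The name `path_add` and its `append` keyword are made up for illustration, and the sketch assumes PATH is not empty to begin with.

```shell
# path_add DIR [append]  (hypothetical helper, not part of the answer above)
# Adds DIR to PATH only if it is not already an entry.
path_add() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already there: leave PATH untouched
    *)
      if [ "${2:-}" = append ]; then
        PATH="$PATH:$1"          # goes last, so it loses against earlier entries
      else
        PATH="$1:$PATH"          # goes first, so it wins (the default)
      fi
      ;;
  esac
}

path_add /opt/demo/bin           # prepended
path_add /opt/demo/sbin append   # appended
path_add /opt/demo/bin append    # no change: already present
```

Note that an entry that is already present keeps its old position, whichever mode is requested.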

And here's a shell snippet that removes duplicates from $PATH. It goes through the entries one by one, and copies those that haven't been seen yet.

if [ -n "$PATH" ]; then
  old_PATH=$PATH:; PATH=
  while [ -n "$old_PATH" ]; do
    x=${old_PATH%%:*}       # the first remaining entry
    case $PATH: in
      *:"$x":*) ;;          # already there
      *) PATH=$PATH:$x;;    # not there yet
    esac
    old_PATH=${old_PATH#*:}
  done
  PATH=${PATH#:}
  unset old_PATH x
fi
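
Since the question asks for a function, here is one way the loop above could be packaged. This is only a sketch: the name `dedup_colon_list` is invented, and the body runs in a subshell (parentheses instead of braces) so that its helper variables do not leak into the calling shell.

```shell
# dedup_colon_list VALUE  (hypothetical name)
# Prints VALUE with repeated colon-separated entries removed,
# keeping the first occurrence of each entry.
dedup_colon_list() (
  rest=$1:
  result=
  while [ -n "$rest" ]; do
    entry=${rest%%:*}               # the first remaining entry
    case $result: in
      *:"$entry":*) ;;              # seen before: skip it
      *) result=$result:$entry ;;   # new: append it
    esac
    rest=${rest#*:}                 # drop the entry just handled
  done
  printf '%s\n' "${result#:}"       # strip the leading colon
)

dedup_colon_list /usr/bin:/bin:/usr/bin:/usr/local/bin:/bin
PATH=$(dedup_colon_list "$PATH")
```

Because it takes the value as an argument and prints the result, the same function can be applied to other colon-separated variables.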
Tom Hale
  • It would be better to iterate over the items in $PATH in reverse, because the later ones are usually the newly added ones and may hold the up-to-date value. – Eric Sep 03 '16 at 06:48
  • @EricWang I don't understand your reasoning. PATH elements are traversed from front to back, so when there are duplicates, the second duplicate is effectively ignored. Iterating from back to front would change the order. – Gilles 'SO- stop being evil' Sep 03 '16 at 10:50
  • @Gilles When there is a duplicated variable in PATH, it was probably added like this: PATH=$PATH:x=b. The x in the original PATH might have the value a, so when iterating in order the new value is ignored, but when iterating in reverse order the new value takes effect. – Eric Sep 03 '16 at 14:38
  • @EricWang In that case, the added value has no effect so should be ignored. By going backwards, you're making the added value come before. If the added value had been supposed to go before, it would have been added as PATH=x:$PATH. – Gilles 'SO- stop being evil' Sep 03 '16 at 15:42
  • @Gilles When you append something, it means either that it's not there yet or that you want to override the old value, so you need to make the newly added value visible. And by convention it's usually appended like this: PATH=$PATH:... rather than PATH=...:$PATH. So it's more appropriate to iterate in reverse order. Your way would also work if people appended the other way around. – Eric Sep 03 '16 at 16:13
  • I almost passed over this answer because it starts with an "add only if not already there" method, which I wouldn't want to use since it loses the important property of where in PATH I'm inserting the new entry (at the beginning, if I want it to win over everything else, or at the end if I want it to lose over everything else). But then you show an excellent shell-only way to remove dups; that is the valuable part of this answer. – Don Hatch Nov 24 '17 at 20:58
  • @DonHatch When you add-only-if-not-already-there, you can choose where to insert. Ok, I only show inserting at the beginning, but it's trivial to change the code to insert at the end. – Gilles 'SO- stop being evil' Nov 24 '17 at 21:04
  • @Gilles The problem is that if the entry is already in $PATH, your first method won't change $PATH. I am suggesting that in that case it would be better to move the entry to the beginning (if overriding other entries is indeed what is desired). A nice way to accomplish that is to prepend the entry as usual, and then use your second function to remove dups. – Don Hatch Nov 24 '17 at 21:20
  • @DonHatch My own .profile is even more complicated than that (it has complex stuff to sort both existing and added entries), but not everyone needs the complexity. I generally prefer to present possibilities in order of increasing complexity. – Gilles 'SO- stop being evil' Nov 24 '17 at 21:28
  • @Gilles Certainly, but how about refraining from presenting the first possibility at all? It's an accident waiting to happen. E.g. say my original .bashrc prepends ~/bin because I want my ~/bin/cat to win over /usr/bin/cat, then I notice my path is growing so I use your first version to prevent that, without thinking about it deeply enough. Now my setup is broken in a non-obvious way. I think your answer could be improved if you would refrain from presenting the error-prone first method at all-- or, if you are attached to keeping it for some reason, at least point out that it's error prone. – Don Hatch Nov 24 '17 at 21:48
  • @DonHatch I want to keep it because it serves the needs of most people. I do point out that it assumes that there are no duplicates at the beginning, what more do you want? The order of addition is a different issue which is not mentioned in the question and not solved by the duplicate removal code. – Gilles 'SO- stop being evil' Nov 24 '17 at 21:54
40

Here's an intelligible one-liner solution that does all the right things: removes duplicates, preserves the ordering of paths, and doesn't add a colon at the end. So it should give you a deduplicated PATH that gives exactly the same behavior as the original:

PATH="$(perl -e 'print join(":", grep { not $seen{$_}++ } split(/:/, $ENV{PATH}))')"

It simply splits on colons (split(/:/, $ENV{PATH})), uses grep { not $seen{$_}++ } to filter out every repeated instance of a path except the first occurrence, and then joins the remaining ones back together separated by colons and prints the result (print join(":", ...)).

If you want some more structure around it, plus the ability to deduplicate other variables, try this snippet, which I'm currently using in my own config:

# Deduplicate path variables
get_var () {
    eval 'printf "%s\n" "${'"$1"'}"'
}
set_var () {
    eval "$1=\"\$2\""
}
dedup_pathvar () {
    pathvar_name="$1"
    pathvar_value="$(get_var "$pathvar_name")"
    deduped_path="$(perl -e 'print join(":",grep { not $seen{$_}++ } split(/:/, $ARGV[0]))' "$pathvar_value")"
    set_var "$pathvar_name" "$deduped_path"
}
dedup_pathvar PATH
dedup_pathvar MANPATH

That code will deduplicate both PATH and MANPATH, and you can easily call dedup_pathvar on other variables that hold colon-separated lists of paths (e.g. PYTHONPATH).
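
If perl is not available, the same name-based indirection can be sketched with awk doing the filtering instead. This is a swapped-in variant, not the code from the answer: the function name `dedup_pathvar_awk` and the `_dpv_` helper variables are invented here, and the variable name you pass must be a valid shell identifier because it goes through eval.

```shell
# dedup_pathvar_awk NAME  (hypothetical awk-based variant of dedup_pathvar)
# NAME is the name of a variable holding a colon-separated list.
dedup_pathvar_awk() {
  eval "_dpv_old=\$$1"                      # read the variable named by $1
  _dpv_new=$(printf %s "$_dpv_old" |
    awk -v RS=: '!seen[$0]++ { printf("%s%s", n++ ? ":" : "", $0) }')
  eval "$1=\$_dpv_new"                      # write the result back
  unset _dpv_old _dpv_new
}

DEMO_PATH=/opt/a/bin:/opt/b/bin:/opt/a/bin
dedup_pathvar_awk DEMO_PATH
printf '%s\n' "$DEMO_PATH"
```

Like the perl version, it keeps the first occurrence of each entry and does not leave a trailing colon.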

  • For some reason I had to add a chomp to remove a trailing newline. This worked for me: perl -ne 'chomp; print join(":", grep { !$seen{$_}++ } split(/:/))' <<<"$PATH" – Håkon Hægland Dec 28 '14 at 19:05
  • I can't get this fix to persist. It does clean duplicates from $PATH, but if I open a new Ubuntu WSL2 command prompt window, my $PATH is back to having duplicates. How can I make this permanent? – Kyle Vassella Dec 24 '20 at 15:39
  • @KyleVassella Did you add this code to your shell startup file? – Ryan C. Thompson Dec 25 '20 at 03:44
20

Here's a sleek one:

printf %s "$PATH" | awk -v RS=: -v ORS=: '!arr[$0]++'

Longer (to see how it works):

printf %s "$PATH" | awk -v RS=: -v ORS=: '{ if (!arr[$0]++) { print $0 } }'

OK, since you're new to Linux, here is how to actually set PATH without a trailing ":":

PATH=`printf %s "$PATH" | awk -v RS=: '{ if (!arr[$0]++) {printf("%s%s",!ln++?"":":",$0)}}'`

By the way, make sure NOT to have directories containing ":" in your PATH, otherwise the result will be messed up.
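
To see how the forms above differ, here is a small sketch that runs them on a made-up sample value instead of the real PATH. The variable names are only for illustration.

```shell
sample=/usr/bin:/bin:/usr/bin

# ORS=: puts a colon after every printed record, including the last one.
with_ors=$(printf %s "$sample" | awk -v RS=: -v ORS=: '!arr[$0]++')

# With a trailing newline on the input, the last record is "/usr/bin"
# plus a newline, so awk does not treat it as a duplicate.
with_newline=$(printf '%s\n' "$sample" | awk -v RS=: -v ORS=: '!arr[$0]++')

# The third form prints the separator before every entry except the first.
clean=$(printf %s "$sample" |
  awk -v RS=: '{ if (!arr[$0]++) {printf("%s%s",!ln++?"":":",$0)}}')

printf '[%s]\n' "$with_ors" "$with_newline" "$clean"
```

Only the last variant is safe to assign back to PATH, since a trailing colon is an empty entry, which stands for the current directory.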

some credit to:

akostadinov
  • -1 this doesn't work. I still see duplicates in my path. – dogbane Jun 14 '12 at 07:34
  • @dogbane: It removes duplicates for me. However, it has a subtle problem. The output has a : on the end which, if set as your $PATH, means the current directory is added to the path. This has security implications on a multi-user machine. – camh Jun 14 '12 at 07:42
  • @dogbane, it works and I edited post to have a one line command without the trailing : – akostadinov Jun 14 '12 at 07:59
  • @dogbane your solution has a trailing : in the output – akostadinov Jun 14 '12 at 08:12
  • hmm, your third command works, but the first two do not work unless I use echo -n. Your commands don't seem to work with "here strings" e.g. try: awk -v RS=: -v ORS=: '!arr[$0]++' <<< ".:/foo/bin:/bar/bin:/foo/bin" – dogbane Jun 14 '12 at 08:32
  • so which one will actually give me the desired result? – Johnny Williem Jun 14 '12 at 09:01
  • @dogbane, right, initially I didn't notice the extra line and when I wrote the third command I forgot to update the other two. wrt <<< it adds a new line at end like echo without -n. It is a bash extension though so not portable and does not provide any advantages over piping for this task.

    Johnny Williem, use the third command that starts with PATH=

    – akostadinov Jun 14 '12 at 10:31
  • Note that echo -n outputs -n in Unix-compliant echo implementations. The standard way to output a $string without the trailing newline character is printf %s "$string", hence Gilles' edit. Generally you can't use echo for arbitrary data. – Stéphane Chazelas Sep 05 '16 at 13:47
  • @StéphaneChazelas, ok, old UNIXes. Btw the new line was confusing awk so last entry was not deduplicated. Thanks to Gilles for catching that (and fixing portability). – akostadinov Sep 05 '16 at 14:12
  • @akostadinov, not only old. That's the Unix requirement as in the latest version of the Unix specification (from 2013, same goes for the 2016 specification which is going out shortly). For instance /bin/sh on OS/X is based on bash and echo -n outputs -n<newline> like the Unix specification requires (POSIX leaves the behaviour unspecified for echo -n) – Stéphane Chazelas Sep 05 '16 at 15:07
  • extremely sweet! I just love one-liners... – MoVod Jun 01 '19 at 11:58
  • A problem I ran into: duplicates with and without trailing slashes, e.g. "/foo/bar:/foo/bar/", will not be removed, even though they are equivalent within the PATH variable. – Christian Herenz Dec 10 '19 at 19:03
  • @ChristianHerenz, maybe awk can also split on /: and : at the same time, perhaps with a regular expression/pattern. Not sure ATM, but it might be a good thing to explore if you want to improve the current solution. – akostadinov Dec 12 '19 at 08:25
  • Why do you use printf rather than echo? – einpoklum Feb 12 '20 at 14:14
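
The trailing-slash problem raised in the comments could be handled along these lines. This is an untested-in-the-thread sketch that takes a different route from the regex idea above: it compares entries with their trailing slashes stripped, but prints the first occurrence exactly as it was written. The function name is made up.

```shell
# dedup_ignoring_trailing_slash VALUE  (hypothetical helper)
# Treats "/foo/bar" and "/foo/bar/" as the same entry.
dedup_ignoring_trailing_slash() {
  printf %s "$1" | awk -v RS=: '
    {
      key = $0
      # Strip trailing slashes for the comparison only; keep a lone "/" intact.
      while (length(key) > 1 && substr(key, length(key)) == "/") {
        key = substr(key, 1, length(key) - 1)
      }
    }
    !seen[key]++ { printf("%s%s", n++ ? ":" : "", $0) }
  '
}

dedup_ignoring_trailing_slash /foo/bar:/foo/bar/:/bin
echo
```

It does not try to resolve symlinks or ".." components, so entries that differ in those ways are still treated as distinct.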
9

Here is an AWK one liner.

$ PATH=$(printf %s "$PATH" \
     | awk -v RS=: -v ORS= '!a[$0]++ {if (NR>1) printf(":"); printf("%s", $0) }' )

where:

  • printf %s "$PATH" prints the content of $PATH without a trailing newline
  • RS=: changes the input record delimiter character (default is newline)
  • ORS= changes the output record delimiter to the empty string
  • a is the name of an implicitly created array
  • $0 references the current record
  • a[$0] is an associative array dereference
  • ++ is the post-increment operator
  • !a[$0]++ guards the right-hand side, i.e. it makes sure that the current record is only printed if it wasn't printed before
  • NR is the current record number, starting with 1

That means that AWK is used to split the PATH content along the : delimiter characters and to filter out duplicate entries without modifying the order.

Since AWK associative arrays are typically implemented as hash tables, the expected runtime is linear (i.e. in O(n)).

Note that we don't need to look for quoted : characters because shells don't provide any quoting to support directories with : in their names in the PATH variable.
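
To watch the counter behind !a[$0]++ at work, the following sketch (adapted from a command suggested in the comments, run on a made-up sample value) prints the value of a[$0]++ next to each record. The function name `show_counts` is only for illustration.

```shell
# Print the pre-increment counter value next to each entry.
show_counts() {
  printf %s "$1" | awk -v RS=: '{ print a[$0]++, $0 }'
}

show_counts /usr/bin:/bin:/usr/bin
# 0 /usr/bin
# 0 /bin
# 1 /usr/bin
```

A record is printed by the one-liner exactly when this counter is 0, that is, the first time the entry is seen.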

Awk + paste

The above can be simplified with paste:

$ PATH=$(printf %s "$PATH" | awk -v RS=: '!a[$0]++' | paste -s -d: -)

The paste command is used to intersperse the awk output with colons. This simplifies the awk action to printing (which is the default action).
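
The two stages can be looked at separately. The sketch below uses a made-up sample value; the trailing `-` asks paste to read standard input, which some implementations require.

```shell
# Stage 1: awk prints each first-seen entry on its own line.
printf %s /usr/bin:/bin:/usr/bin:/usr/local/bin | awk -v RS=: '!a[$0]++'

# Stage 2: paste -s joins those lines into one, with -d: as the separator.
joined=$(printf %s /usr/bin:/bin:/usr/bin:/usr/local/bin |
  awk -v RS=: '!a[$0]++' | paste -s -d: -)
printf '%s\n' "$joined"
```

Because paste only puts the delimiter between lines, no trailing colon is produced.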

Python

The same as a Python two-liner:

$ PATH=$(python3 -c 'import os; from collections import OrderedDict; \
    l=os.environ["PATH"].split(":"); print(":".join(OrderedDict.fromkeys(l)))' )
maxschlepzig
  • ok, but does this remove dupes from an existing colon delimited string, or does it prevent dupes from being added to a string? – Alexander Mills Dec 10 '16 at 09:59
  • looks like the former – Alexander Mills Dec 10 '16 at 10:00
  • @AlexanderMills, well, the OP just asked about removing duplicates so this is what the awk call does. – maxschlepzig Dec 10 '16 at 18:59
  • The paste command doesn't work for me unless I add a trailing - to use STDIN. – wisbucky Apr 24 '17 at 21:11
  • @wisbucky, hm, does your paste print some error message? I tested it with 'paste (GNU coreutils) 8.25'. – maxschlepzig Apr 24 '17 at 21:27
  • It prints usage: paste [-s] [-d delimiters] file .... This is on mac, which I think uses BSD not GNU versions. – wisbucky Apr 24 '17 at 21:47
  • Also, I need to add spaces after the -v or else I get an error. -v RS=: -v ORS=. Just different flavors of awk syntax. – wisbucky Apr 24 '17 at 21:56
  • For those that don't understand the !a[$0]++ part: a[$0]++ creates an associative array with the path as the key and an incrementing count as the value. The first time a unique path is seen, the value is initialized to 0 and incremented to 1. The second time a path is seen, the value is incremented to 2, etc. To see this clearly, run this command: printf %s "$PATH" | awk -v RS=: '{print a[$0]++, $0 }' – wisbucky Apr 25 '17 at 23:15
  • In awk, the statement before the {action} is a pattern. If the pattern is TRUE, the {action} is executed. Any nonzero number is TRUE, 0 is FALSE. The first time a path is seen, the value of a[$0] is 0 (remember, we are post-incrementing), which evaluates to FALSE. The negated value ! is TRUE. Therefore, it executes the {action}, which is to print the path. All subsequent occurrences of the same path will have value > 0, so they evaluate to TRUE, and the negated values are FALSE. Therefore, the {action} is not executed. – wisbucky Apr 25 '17 at 23:16