105

I really enjoy using Ctrl+R to reverse-search my command history. I've found a few good options I like to use with it:

# ignore duplicate commands, ignore commands starting with a space
export HISTCONTROL=erasedups:ignorespace

# keep the last 5000 entries
export HISTSIZE=5000

# append to the history instead of overwriting (good for multiple connections)
shopt -s histappend

The only problem for me is that erasedups only erases sequential duplicates, so with this sequence of commands:

ls
cd ~
ls

the ls command will actually be recorded twice. I've thought about periodically running this with cron:

cat .bash_history | sort | uniq > temp.txt
mv temp.txt .bash_history

This would remove the duplicates, but unfortunately the order would not be preserved. If I don't sort the file first, I don't believe uniq can work properly.
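For instance, on the three-command sequence above, uniq alone leaves the non-adjacent duplicate in place:

```shell
# uniq only collapses *adjacent* duplicate lines, so the second ls survives
printf 'ls\ncd ~\nls\n' | uniq
# prints:
# ls
# cd ~
# ls
```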

How can I remove duplicates in my .bash_history, preserving order?

Extra Credit:

Are there any problems with overwriting the .bash_history file via a script? For example, if you remove an Apache log file, I think you need to send Apache a HUP signal with kill to have it flush its handle to the file. If that is the case with the .bash_history file, perhaps I could somehow use ps to check and make sure there are no connected sessions before the filtering script is run?

cwd
  • 45,389
  • 4
    Try ignoredups instead of erasedups for a while and see how that works for you. – jw013 Sep 20 '12 at 15:54
  • 1
    I don't think bash holds an open file handle to the history file - it reads/writes it when it needs to, so it should (note - should - I haven't tested) be safe to overwrite it from elsewhere. – D_Bye Sep 20 '12 at 19:00
  • 2
    I just learned something new on the 1st sentence of your question. Good trick! – Ricardo Dec 01 '16 at 00:48
  • I'm failing to find the man page for all the options to the history command. Where should I be looking? – Jonathan Hartley Oct 21 '19 at 14:39
  • History options are in 'man bash', search for 'shell builtin commands' section, then for 'history' below that. – Jonathan Hartley Oct 21 '19 at 14:55
  • This answer https://unix.stackexchange.com/a/18443/8650 claims to erase all duplicates, not just sequential ones, using HISTCONTROL in conjunction with a PROMPT_COMMAND which re-reads the whole HISTFILE after every prompt, which gives erasedups a chance to erase older commands. – Jonathan Hartley Oct 22 '21 at 15:58

15 Answers

107

So I was looking for the same exact thing after being annoyed by duplicates, and found that if I edit my ~/.bash_profile or my ~/.bashrc with:

export HISTCONTROL=ignoreboth:erasedups

It does almost exactly what you wanted: it only keeps the latest of any command in one shell instance. ignoreboth is just shorthand for ignorespace:ignoredups, and that along with erasedups gets the job done.

At least on my Mac terminal with bash this works perfectly. Found it here on askubuntu.com.

Note that ignoredups won't help with erasing non-sequential duplicates from an existing .bash_history. Duplicates will still appear in the file when using shopt -s histappend.

oheikk
  • 3
sprite
  • 1,191
  • 19
    this should be correct answer – MitchBroadhead Mar 03 '16 at 10:55
  • tested on Max OS X Yosemite and on Ubuntu 14_04 – Ricardo Dec 01 '16 at 01:12
  • 1
    agree with @MitchBroadhead. this solves the problem within bash itself, without external cron-job. tested it on ubuntu 17.04 and 16.04 LTS – Georg Jung Jul 29 '17 at 08:10
  • works on OpenBSD too. It only removes dups of any command it is appending to the history file, which is fine for me. It has the interesting effect of shortening the history file as I enter commands that had existed as duplicates before. Now I can make my history file max shorter. – WeakPointer Jan 19 '18 at 14:09
  • 21
    This only ignores duplicate, consecutive commands. If you alternate repeatedly between two given commands, your bash history will fill up with duplicates – Dylanthepiguy Dec 22 '18 at 00:39
  • 3
    This answer contains useful information, but misleadingly claims to "do exactly what you wanted". The question states the "problem for me is that erasedups only erases sequential duplicates". This answer only explains how to use erasedups to erase sequential duplicates. It is not an answer to the actual question of how to erase all duplicates, not just sequential ones. – Jonathan Hartley Oct 22 '21 at 15:09
  • 1
    @JonathanHartley, it's not misleading anymore, thankfully, someone kindly edited it to "It does almost exactly what you wanted". – sprite Oct 23 '23 at 10:16
52

Sorting the history

This command works like sort|uniq, but keeps the lines in place

nl|sort -k 2|uniq -f 1|sort -n|cut -f 2

Basically, it prepends its line number to each line. After the sort|uniq step, the lines are sorted back into their original order (using the line-number field) and the line numbers are removed.

This solution has the flaw that it is undefined which representative of a class of equal lines will make it into the output, and therefore its position in the final output is undefined. However, if the latest representative should be chosen, you can sort the input by a second key:

nl|sort -k2 -k 1,1nr|uniq -f1|sort -n|cut -f2
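For example, on the question's three-command sequence, the first pipeline keeps the first occurrence of each line in place, while the second keeps the last:

```shell
# Decorate with line numbers, sort, dedup, restore original order (keeps first occurrence)
printf 'ls\ncd ~\nls\n' | nl | sort -k 2 | uniq -f 1 | sort -n | cut -f 2
# prints:
# ls
# cd ~

# Same, but the secondary reverse-numeric key makes uniq keep the latest occurrence
printf 'ls\ncd ~\nls\n' | nl | sort -k2 -k 1,1nr | uniq -f 1 | sort -n | cut -f 2
# prints:
# cd ~
# ls
```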

Managing .bash_history

For re-reading and writing back the history, you can use history -a and history -w respectively.

wnrph
  • 1,444
  • 10
    A version of decorate-sort-undecorate, implemented with shell tools. Nice. – ire_and_curses Sep 20 '12 at 17:21
  • With sort, the -r switch always reverses the sorting order. But this won't yield the result you have in mind. sort regards the two occurrences of ls as identical with the result that, even when reversed, the eventual order depends on the sorting algorithm. But see my update for another idea. – wnrph Sep 20 '12 at 19:29
  • 1
    In case, you don't want to modify .bash_history, you could put the following in .bashrc: alias history='history | sort -k2 -k 1,1nr | uniq -f 1 | sort -n' – Nathan Jan 15 '14 at 20:35
  • 3
    What is nl at the beginning of each code line? Shouldn't it be history? – A.L Feb 04 '15 at 09:50
  • 1
    @A.L nl adds line numbers. The command as a whole solves the general problem: removing duplicates while preserving order. The input is read from stdin. – wnrph Feb 05 '15 at 21:24
  • I'm resisting the urge to downvote, but the fact, as you noted, that there is no way to choose which of equal lines makes it in the output means the awk answer below may be much more helpful for others (including for the case that brought me here). – cbmanica Feb 05 '16 at 19:47
  • @cbmanica That was true only for the first command and meant as a help to understand the second one. The only difference between the first and the second command is, that the second one does exercise control over output sorting. – wnrph Feb 16 '16 at 17:26
  • I had to cleanup my history file to remove invalid characters. I used "iconv -f utf-8 -t utf-8 -c file.txt" – vaichidrewar May 17 '16 at 18:10
  • Fails with bash timestamps. Most things don't take timestamps into account. See my solution. – anthony Oct 09 '20 at 02:57
  • This answer is bash-fu black belt, of which I am in awe. But it cannot handle history files with multi-line commands in it, or with timestamps in it. (Enabling timestamps in the history file is required for readline to correctly retrieve multi-line commands from the history.) – Jonathan Hartley Oct 22 '21 at 15:15
39

Found this solution in the wild and tested:

awk '!x[$0]++'

The first time a specific value of a line ($0) is seen, the value of x[$0] is zero.
The value zero is inverted with ! and becomes one.
A statement that evaluates to one triggers the default action, which is print.

Therefore, the first time a specific $0 is seen, it is printed.

Every subsequent time (the repeats), the value of x[$0] has been incremented,
so its negated value is zero, and a statement that evaluates to zero doesn't print.

To keep the last repeated value, reverse the history and use the same awk:

awk '!x[$0]++' ~/.bash_history                 # keep the first value repeated.

tac ~/.bash_history | awk '!x[$0]++' | tac     # keep the last.
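A quick demonstration on the question's example sequence (reading from stdin instead of the history file):

```shell
# Keep the first occurrence of each line
printf 'ls\ncd ~\nls\n' | awk '!x[$0]++'
# prints:
# ls
# cd ~

# Keep the last occurrence: reverse, dedup, reverse back
printf 'ls\ncd ~\nls\n' | tac | awk '!x[$0]++' | tac
# prints:
# cd ~
# ls
```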
  • Wow! That just worked. But it removes all but the first occurrence I guess. I'd reversed the ordering of the lines using Sublime Text before running this. Now I'll reverse it again to get a clean history with only the last occurrence of all duplicates left behind. Thank you. – trss Aug 27 '14 at 20:05
  • Check out my answer! – Ali Shakiba Jan 19 '15 at 10:26
  • Nice clean and general answer (not restricted to the history use-case) without launching a bazilion sub-processes ;-) – JepZ Nov 14 '18 at 00:08
  • 2
    Wouldn't this sort of break if .bash_history entries are on two lines - timestamp followed by the command itself? – laur Sep 16 '20 at 22:18
  • This answer is sublime in the appropriate wielding of awk, at which I'm awestruck. However, as @laur notes, it doesn't work for history files with timestamps in. Enabling timestamps is important because these form the delimiters in the history file that enables readline to retrieve multi-line commands. – Jonathan Hartley Oct 22 '21 at 15:23
  • For extra credit, add sed 's/\s*$//' *before* the awkto remove trailing whitespace in all lines, leading to a better input list for duplicate detection. – Samveen Nov 27 '22 at 03:54
  • @JonathanHartley check my provided script using this and other stuff in this thread dealing with timestamps. There is an issue with the awk script though and multiline commands. Nothing is perfect so far. – Marlon Jan 13 '23 at 15:07
21

Extending Clayton's answer:

tac $HISTFILE | awk '!x[$0]++' | tac | sponge $HISTFILE

tac reverses the file. Make sure you have moreutils installed so that sponge is available; otherwise use a temp file.
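If moreutils isn't available, the temp-file version of the same pipeline might look like this (an intermediate file is needed because a plain redirect would truncate $HISTFILE before tac reads it):

```shell
# Dedup keeping the latest occurrence, then move the temp file over the original
tac "$HISTFILE" | awk '!x[$0]++' | tac > "$HISTFILE.tmp" && mv "$HISTFILE.tmp" "$HISTFILE"
```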

  • 2
    For those on Mac, use brew install coreutils, and notice that all the GNU utils have a g prepended to avoid confusion with the BSD built-in Mac commands (e.g. gsed is GNU whereas sed is BSD). So use gtac. – tralston Jun 11 '15 at 20:26
  • 1
    I needed history -c and history -r to get it to use the history – drescherjm Oct 08 '19 at 19:32
  • 1
    Can you explain what sponge is, and why you appended it to Clayton's answer? – Jonathan Hartley Oct 22 '21 at 15:25
  • 1
    $ sponge -h: soak up all input from stdin and write it to <file>. I don't yet understand why it has been appended to Clayton's answer. (although I suspect it is incidental, and the main value of this answer was using 'tac', which Clayton later incorporated in his answer too.) – Jonathan Hartley Oct 22 '21 at 15:38
  • 4
    Aha, from man sponge: Unlike a shell redirect, sponge soaks up all its input before writing the output file. This allows constructing pipelines that read from and write to the same file. – Jonathan Hartley Oct 22 '21 at 15:39
12

This is an old post, but a perennial issue for users who want to have multiple terminals open and have the history synced between windows, but not duplicated.

My solution in .bashrc:

shopt -s histappend
export HISTCONTROL=ignoreboth:erasedups
export PROMPT_COMMAND="history -n; history -w; history -c; history -r"
tac "$HISTFILE" | awk '!x[$0]++' > /tmp/tmpfile  &&
                tac /tmp/tmpfile > "$HISTFILE"
rm /tmp/tmpfile
  • histappend option adds the history of the buffer to the end of the history file ($HISTFILE)
  • ignoreboth and erasedups prevent duplicate entries from being saved in the $HISTFILE
  • The prompt command updates the history cache
    • history -n reads all lines from $HISTFILE that may have occurred in a different terminal since the last carriage return
    • history -w writes the updated buffer to $HISTFILE
    • history -c wipes the buffer so no duplication occurs
    • history -r re-reads the $HISTFILE, appending to the now blank buffer
  • the awk script stores the first occurrence of each line it encounters. tac reverses it, and then reverses it back so that it can be saved with the most recent commands still most recent in the history
  • rm the /tmp file

Every time you open a new shell, the history has all dupes wiped, and every time you hit the Enter key in a different shell/terminal window, it updates this history from the file.

8

These would keep the last duplicated lines:

ruby -i -e 'puts readlines.reverse.uniq.reverse' ~/.bash_history
tac ~/.bash_history | awk '!a[$0]++' | tac > t; mv t ~/.bash_history
Lri
  • 5,223
  • 1
    To be explicit, am I understanding right that you've shown two (splendid) solutions here, and a user only needs to execute one of them? Either the ruby one, or the Bash one? – Jonathan Hartley Oct 23 '19 at 14:34
  • fails with bash timestamps. Most things do! – anthony Oct 09 '20 at 02:55
  • this works, or appears to (I didnt check what it deleted, but the multiple exits are gone, leaving the last one entered, and removed ~200 of 500 entries). I just had to exit shell and reenter (reloading history file.. there is a command for that somewhere). Thanks! – alchemy Feb 01 '22 at 02:16
3

Almost every answer here fails to take into account history files with timestamps or multi-line history entries.

I needed a way to merge my memory and disk history when my shell session exits, (from multiple terminals), or just merge histories from one terminal to another.

I looked for a long time but could not find anything that did it in a way I considered correct. So I eventually DIY'ed a solution...

Here is my solution: merge the on-disk ".bash_history" with the in-memory shell 'history', preserving timestamp ordering and the command order within those timestamps.

Optionally, it removes non-unique commands (even multi-line ones), and/or cleans out simple and/or sensitive commands, according to defined Perl REs. Adjust to suit!

This is the result... https://antofthy.gitlab.io/software/history_merge.bash.txt

You can customise it as you like, or make it a bash function if you want. Or adjust the commands that it 'cleans' from the history..

I run this either on demand using an alias (like 'hm' for history merge) or when a shell logs out (from ".bash_logout"), unless I've disabled shell history (by unsetting "$HISTFILE" via an 'hd' alias).

Enjoy.

anthony
  • 610
2

I have timestamps in mine, so most solutions that mess with the files don't work. I also keep a directory of history files, one per host. I used some of the things found here to remove duplicates and such from history before writing back to the history file, but sometimes I have a few shells running on the same host, which keeps those duplicates in there. My solution to clean up the mess every now and then is to create an executable file with this in it:

#!/bin/sh

for file in ~/.bash_history/*; do
    tac "$file" | awk '!visited[$0]++' | tac | sed 'N;/^#.*\n#.*/!P;D' > tempfile
    mv tempfile "$file"
done

Save it and execute it. Basically: reverse the file and use awk to clean duplicates while keeping the last one, reverse again, then use sed to delete consecutive timestamp lines while keeping the last one. The output goes to a tempfile, which is then moved over the history file. My history directory went from 109M to 1008K :)

Marlon
  • 131
  • 1
    Found another issue with the awk is, that it works line by line. Hence it doesn't understand where the history command starts and ends. Where a single line the multi-line command matches with another command, it fails. – Garry May 02 '22 at 10:35
  • @Garry Totally true. I have seen that in my history file too so every now and then I have to go and cleanup multiline crud left over. It hasnt been annoying enough to make me revisit the script yet. I was just happy when it shrunk it so much. – Marlon Jan 13 '23 at 15:03
1

I've written a small program that lets you clean your bash/shell history, retroactively and preserving its order:

https://gitlab.com/vn971/shell-history-cleaner

USAGE:
    shell-history-cleaner [OPTIONS] <TARGET_FILE>

ARGS:
    <TARGET_FILE>    Target file to clean. You can use "$HISTFILE" to clean up the shell history.

OPTIONS:
    -d, --dedup
            De-duplicate lines to only keep one last occurrence of each dup. In contrast to
            bash built-in deduplication, this also works if the duplicates are sparse and do
            not immediately follow each other.

-r, --remove <REMOVE>
        Lines to remove. For example, 'yt-dlp.*' will remove lines starting with 'yt-dlp'.
        Can be specified multiple times.

        The patterns are regular expressions, assuming the whole line is matched, as defined
        here: https://docs.rs/regex/latest/regex/#syntax

        Another real-life example:
        '(ps aux.*|git checkout .*|git branch .*| .*|yt-dlp .*|chmod .*|echo .*|man .*)'

-h, --help
        Print help information

VasyaNovikov
  • 1,246
0

Uniquely recording every new command is tricky. First you need to add this to ~/.profile or similar:

HISTCONTROL=erasedups
PROMPT_COMMAND='history -w'

Then you need to add to ~/.bash_logout:

history -a
history -w
Zombo
  • 1
  • 5
  • 44
  • 63
  • Can you help me understand why, on logout, you need to append unwritten history to the history file before then rewriting the whole history file? Can't you just write the entire file without the 'append'? – Jonathan Hartley Oct 21 '19 at 14:57
  • The only reason I do something fancy with history during logout, is because I merge (with locks) the history, sorting by timestamps, and removing some 'sensitive' commands. I don't just simply append, which does not work will when you have multiple shell windows on the same machine. – anthony Apr 06 '21 at 01:41
0

Extending Ali's answer.

The .bash_history file may or may not contain timestamps, and timestamped records can be mixed with non-timestamped ones if HISTTIMEFORMAT was switched on and off. This script preserves .bash_history timestamps where they are present, and removes duplicate records, leaving only the latest ones.

tac $HISTFILE | awk '/^#/{if(l){print l;print;l=""}next} l{print l;l=""} !seen[$0]++{l=$0}' | tac

which can then be sponged back to $HISTFILE.
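As a quick sanity check on a tiny timestamped history (epoch lines prefixed with #), the pipeline keeps the latest occurrence of each command together with its own timestamp:

```shell
# Each command is preceded by a '#<epoch>' timestamp line, as bash writes with HISTTIMEFORMAT
printf '#100\nls\n#200\ncd ~\n#300\nls\n' |
  tac | awk '/^#/{if(l){print l;print;l=""}next} l{print l;l=""} !seen[$0]++{l=$0}' | tac
# prints:
# #200
# cd ~
# #300
# ls
```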

0

I use this code in .bash_profile:

remove_history_duplicates () {
    local i login_flag
    [ -z "$(history 1)" ] && login_flag=1 && history -r
    for i in $(history | awk '
        $1 ~ "[0-9]+" {
            id = $1
            $1=""
            if (uniq[$0]) {n++; print uniq[$0]}
            uniq[$0] = id
        }
        END {if (n) print "found",n,"duplicates" > "/dev/stderr"}
    ' | sort -nr); do history -d $i; done
    if [[ -n $login_flag ]]; then
        [[ -n $i ]] && history -w && echo history written
        history -c
    fi
}
remove_history_duplicates

It doesn't edit the history file directly but works through bash commands. The latest duplicate command is preserved; all earlier ones are deleted.

If you use history timestamps (e.g. HISTTIMEFORMAT='%F %T ') replace $1="" line with $1=""; $2=""; $3="" in awk code.

When the function runs at login time, it auto-loads the history from the file, edits it, then clears it, because bash will load the file afterwards anyway.

Ddd
  • 1
0

I am a bash noob and I don't understand these answers. Here is my own attempt at a solution.

The problem is command-1 command-2 command-1 command-2 (... 25 times more ...) the-command-i-actually-want. That sucks, obviously.

First, I create a Python script which cleans up the bash history file:

# cleanup_bash_history.py

from pathlib import Path

history_path = (Path.home() / ".bash_history").resolve(strict=True)

history: dict[str, int] = {}

with history_path.open("r", encoding="utf-8") as f:
    for lnum, command in enumerate(f):
        command = command.strip()
        history[command] = lnum

with history_path.open("w", encoding="utf-8") as f:
    for _, command in sorted((l, c) for c, l in history.items()):
        f.write(f"{command}\n")

This assumes that the .bash_history is just a list of commands without timestamps. Please let me know how you would modify the script in that case.

Then, in the .bashrc, at the end, I add:

PROMPT_COMMAND="history -a; python ~/Sys/scripts/cleanup_bash_history.py; history -c; history -r; $PROMPT_COMMAND"

I copy-and-pasted that from this answer; it makes this work even when multiple bash windows are open, since the shared history is updated whenever a command is executed.

And that's it, no duplicates anymore.

Addition: If you have problems with history entries written by VS Code, manually remove them in the Python script, or wrap the above bash command in:

if [ "$TERM_PROGRAM" != "vscode" ]; then
    ...
fi
0

Using Perl

Keeping the first appearance of a duplicate (compare to awk):

~$ perl -ne 'print unless $hash{$_}++' ~/.bash_history  > outfile

Keeping the last appearance of a duplicate (compare to awk):

~$ tac  ~/.bash_history | perl -ne 'print unless $hash{$_}++' | tac  > outfile

Using Raku (formerly known as Perl_6)

Keeping the first appearance of a duplicate (compare to awk):

~$ raku -ne 'state %hash; .put unless %hash{$_}++' ~/.bash_history  > outfile

Keeping the last appearance of a duplicate (compare to awk):

~$ tac  ~/.bash_history | raku -ne 'state %hash; .put unless %hash{$_}++' | tac  > outfile

Using Raku (formerly known as Perl_6)

Keeping the first appearance of a duplicate (compare to ruby):

~$ raku -e '.put for lines.unique;'  ~/.bash_history  > outfile

Keeping the last appearance of a duplicate (compare to ruby):

~$ tac  ~/.bash_history | raku -e '.put for lines.reverse.unique.reverse;' | tac  > outfile

https://stackoverflow.com/q/1444406/7270649
https://stackoverflow.com/a/32513573/7270649 https://unix.stackexchange.com/a/11941/227738
https://perldoc.perl.org
https://docs.raku.org

jubilatious1
  • 3,195
  • 8
  • 17
-1

Other ways to do this here as well: https://stackoverflow.com/questions/338285/prevent-duplicates-from-being-saved-in-bash-history/7449399#7449399

Excellent answer. If you would rather preserve the chronological order (instead of the input order) for your commands, modify dedup() by replacing awk '! x[$0]++' $@ with tac $@ | awk '! x[$0]++' | tac – trusktr

We can eliminate duplicate lines without sorting the file by using the awk command in the following syntax.

awk '!seen[$0]++' source.txt > target.txt

https://superuser.com/questions/722461/how-can-you-remove-duplicates-from-bash-history

Also in nano using regex:

  • Press Ctrl + \
  • Enter your search string, then press Enter
  • Enter your replacement string, then press Enter
  • Press A to replace all instances

in vim :sort u

If some of the suggestions, including the ones above, don't work immediately, then after running the code I do:

history -c

to clear the history first, then restore the no-duplicates version over it:

cp temp.txt ~/.bash_history
social
  • 299
  • 1
    Please read the question.  If I run date, cat, bc and then awk, and you run sort, you will not be preserving the order.  The question *explicitly* says that answers based on sort are unacceptable for this reason. – G-Man Says 'Reinstate Monica' Aug 11 '22 at 20:36