I have thousands of files that look like this:

  • wrfout_d03_2010-06-11_00:00:01
  • wrfout_d03_2010-06-11_00:00:08
  • wrfout_d03_2010-06-12_00:00:20
  • wrfout_d03_2010-06-12_00:00:35
  • wrfout_d03_2010-06-12_00:00:40

For each date, I need to keep only the file with the first timestamp. In this case,

  • wrfout_d03_2010-06-11_00:00:01
  • wrfout_d03_2010-06-12_00:00:20

How can I do this without deleting the files one by one? Thanks!

3 Answers

4

With zsh:

typeset -A seen=()
for f (wrfout_d*(N)) (( seen[\${f%_*}]++ )) && echo rm -f $f

(remove the echo if happy with the result)

A bash equivalent (assuming bash 4.0 or newer) would look like:

(shopt -s nullglob
typeset -A seen=()
for f in wrfout_d*; do
   (( seen[\${f%_*}]++ )) && echo rm -f "$f"
done)

Glob expansions are sorted lexically, so with your timestamp format, that coincides with chronological order. So above we're looping over the files from oldest to newest and removing a file if its name stripped of the shortest trailing _* (${f%_*}) has already been seen (as recorded in the $seen associative array). For the reason behind the \ in the arithmetic expression, see How to use associative arrays safely inside arithmetic expressions?
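
To see what key ends up in the array, here is a small sketch of the ${f%_*} expansion on its own, applied to two of the names from the question. The function name date_key is made up for illustration and is not part of the answer above.

```shell
# date_key is a hypothetical helper used only for this illustration.
# ${1%_*} removes the shortest trailing match of _*, which here is the
# time-of-day part after the last underscore.
date_key() {
    printf '%s\n' "${1%_*}"
}

date_key wrfout_d03_2010-06-11_00:00:01   # prints wrfout_d03_2010-06-11
date_key wrfout_d03_2010-06-11_00:00:08   # prints wrfout_d03_2010-06-11
```

Both names give the same key, so the second file finds a non-zero counter and is the one passed to rm.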

2
prev=
for file in wrfout_d*_*_*; do
  head=${file%_*}
  if [ "$head" = "$prev" ]; then
    # Remove "echo" if output is correct
    echo rm -f -- "$file"
  else
    prev=$head
  fi
done

The part of the filename before the last underscore is stored in the head variable. Because the glob expands in sorted order, the first file for each date only sets prev to the value of head; every later file with the same head reaches the echo rm line.
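
One way to check the loop before touching real data is to run it on empty files in a scratch directory. The sketch below assumes mktemp -d is available; the function name list_later_files is hypothetical, and it only prints names rather than calling rm.

```shell
# list_later_files is a hypothetical name. It prints every file in the
# directory $1 whose date part was already seen on an earlier file.
# The body runs in a subshell, so the cd does not affect the caller.
list_later_files() (
    cd -- "$1" || exit 1
    prev=
    for file in wrfout_d*_*_*; do
        [ -e "$file" ] || continue   # the glob matched nothing
        head=${file%_*}
        if [ "$head" = "$prev" ]; then
            printf '%s\n' "$file"
        else
            prev=$head
        fi
    done
)

# Create empty files with the names from the question and list the extras.
tmp=$(mktemp -d)
for t in 11_00:00:01 11_00:00:08 12_00:00:20 12_00:00:35 12_00:00:40; do
    : > "$tmp/wrfout_d03_2010-06-$t"
done
list_later_files "$tmp"
rm -rf -- "$tmp"
```

With the five names from the question this lists the 00:00:08, 00:00:35 and 00:00:40 files, which are the ones the loop above would hand to rm.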

rowboat
0

An admittedly brittle solution that uses bash arrays:

#!/bin/bash

workdir='/home/haxiel/testdir'
prefixes=( $(ls $workdir | cut -d '_' -f 1-3 | sort | uniq) )

for prefix in ${prefixes[@]}; do
    files=( $workdir/$prefix* )
    unset files[0]
    echo rm -- ${files[@]}
done

I'm using the ls|cut|sort|uniq pipe to build a list of unique prefixes.

Then I loop through the prefixes and use shell globbing to grab all files that match a given prefix and store them in an array. You want to keep the first one, so I remove that file from the array and pass the rest to an rm command.
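
The "drop the first element" step can also be tried in isolation. This is a minimal sketch for bash, not a replacement for the script above: the function name extra_files_for_prefix is hypothetical, it uses the slice "${files[@]:1}" instead of unset, and it adds quoting plus a guard for prefixes that match nothing.

```shell
#!/bin/bash
# extra_files_for_prefix is a hypothetical helper: it prints every file in
# directory $1 whose name starts with prefix $2, except the first match.
extra_files_for_prefix() {
    local dir=$1 prefix=$2 files
    files=( "$dir/$prefix"* )          # glob results arrive sorted
    [ -e "${files[0]}" ] || return 0   # nothing matched the pattern
    # "${files[@]:1}" is the array without its first element.
    if (( ${#files[@]} > 1 )); then
        printf '%s\n' "${files[@]:1}"
    fi
}

# Scratch-directory demo with three files for the same date.
tmp=$(mktemp -d)
touch "$tmp/wrfout_d03_2010-06-12_00:00:"{20,35,40}
extra_files_for_prefix "$tmp" wrfout_d03_2010-06-12
rm -rf -- "$tmp"
```

The demo prints the paths of the 00:00:35 and 00:00:40 files and leaves out the 00:00:20 one, which is the file that would be kept.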

This solution assumes that you have filenames without special characters. It also assumes that the shell's sort order matches your expected sort order.

Be sure to put the script outside of the working directory. Otherwise, the script name gets captured as one of the prefixes.

Run this once and examine the output to make sure that you're removing the right files. Then, remove the 'echo' command in front of rm and run it once again.

As always, data removal is a risky process, so use caution and have a backup if you think you'll need it.

Haxiel