1

I would like to number lines according to their content: the first line gets number 1, the second line gets number 2 if it's identical to the first and number 1 if it's different, and so on. For example:

asdf
asdf
asdf
asdf
dfg
dfg
dfg
qwert
qwert
er
qwert
er
asdf

Should result in:

1   asdf
2   asdf
3   asdf
4   asdf
1   dfg
2   dfg
3   dfg
1   qwert
2   qwert
1   er
3   qwert
2   er
5   asdf
martijn
  • 2
    Incremental? You are resetting the counter every time there is a new item. Or is it a per-item counter that should resume if the same token is encountered again? – Matteo Sep 05 '12 at 14:23
  • 2
    The question is underspecified. Please look at the comments on @JohnCC's answer, and update the question to clarify the ambiguity. – jw013 Sep 05 '12 at 16:09

4 Answers

4

Even simpler with awk:

awk '{ print ++c[$0],$0 }' < test

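Here test is the file that contains the data. I made a couple of assumptions that are not clear from the question. First, I assume the file is already sorted. (Without sorting, the counter for a line simply resumes when that line occurs again.) If the file is not sorted and you want identical lines grouped together, sort it first: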

sort < test | awk '{ print ++c[$0],$0 }'

Also, I assume that the whole line is significant, and not just the first word when a line contains more than one. If you only want to count by the first word, then:

awk '{ print ++c[$1],$0 }' < test
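For readers less familiar with awk's associative arrays, here is a minimal Python sketch of the same idea: one running counter per key, where the key is either the whole line or its first word. The names `number_by_key` and `first_word` are made up for this illustration and are not part of the answer above.

```python
from collections import defaultdict


def number_by_key(lines, key=lambda line: line):
    """Yield (count, line) pairs, counting how often each key has been seen so far."""
    seen = defaultdict(int)  # plays the role of awk's associative array c[]
    for line in lines:
        k = key(line)
        seen[k] += 1
        yield seen[k], line


def first_word(line):
    # Hypothetical helper roughly mirroring awk's $1; empty lines map to "".
    parts = line.split()
    return parts[0] if parts else ""


sample = ["asdf", "asdf", "dfg", "asdf x", "dfg"]
whole_line = [n for n, _ in number_by_key(sample)]
by_first_word = [n for n, _ in number_by_key(sample, key=first_word)]
print(whole_line)
print(by_first_word)
```

Keyed on the whole line, "asdf x" starts its own count; keyed on the first word, it continues the count for "asdf".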

JohnCC
  • 1
    But, if asdf occurs again, it will continue numbering, do I understand that correctly? But this was also not clear from the question. I like your approach. – Bernhard Sep 05 '12 at 15:18
  • 1
    Yes, correct. That was why I asked about sorting, since as you say, the question is not very clear. – JohnCC Sep 05 '12 at 15:20
1

You could do this with awk:

number.awk

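BEGIN { OFS = "\t" }

# Same first field as the previous line: continue the run.
last == $1 { cnt += 1 }
# Different first field: restart the count.
last != $1 { cnt = 1 }

# Print the count and the first field, then remember the field.
{ print cnt, $1; last = $1 }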

Run like this:

awk -f number.awk infile
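The "restart whenever the value changes" behaviour can also be sketched in Python with `itertools.groupby`, which groups consecutive equal items. This is only an illustration: the function name `number_runs` is made up, and it compares whole lines rather than the first field used by the awk script above.

```python
from itertools import groupby


def number_runs(lines):
    """Yield (count, line) pairs, restarting at 1 whenever the line changes."""
    for _, run in groupby(lines):
        # Each run is a stretch of consecutive identical lines.
        for count, line in enumerate(run, start=1):
            yield count, line


sample = ["asdf", "asdf", "dfg", "qwert", "er", "qwert"]
run_counts = [n for n, _ in number_runs(sample)]
print(run_counts)
```

Note that with this interpretation the second "qwert" is numbered 1 again, because it is not adjacent to the first one. The expected output in the question numbers a recurring line by continuing its earlier count, so this only matches when identical lines are adjacent.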
Thor
0

You can iterate over the input and keep a counter that resets whenever the line differs from the previous one:

#!/bin/sh

counter=1
old=""

while IFS= read -r line ; do
    # check if the line is different from the previous one
    if [ "$line" != "$old" ] ; then
        counter=1
    fi
    old="$line"
    printf '%s\t%s\n' "$counter" "$line"
    counter=$((counter+1))
done

You can run the script with:

$ sh scriptname.sh < inputfile
Matteo
0

If you need something that works regardless of whether the input is clustered (i.e. all occurrences of X appear consecutively), you need a separate counter for each distinct X. For example, you can use the following either as a filter or with a command-line argument naming the input file; it writes to stdout:

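#!/usr/bin/env python
import collections
import sys

# One counter per distinct line, so numbering resumes when a line recurs.
c = collections.Counter()
for line in sys.stdin if len(sys.argv) == 1 else open(sys.argv[1]):
    line = line.rstrip("\n")  # a final line without a newline still matches
    c[line] += 1
    sys.stdout.write("%s\t%s\n" % (c[line], line))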
Anthon