
I have a list of filenames of the form id-datetime.txt, one per line, where id is always the same and the datetimes are in ascending order.

I need the first and last datetime. With a sed script that extracts it stored in a variable $script, I can run

sed -nr "1 $script p; \$ $script p"

I get

datetime (oldest)
datetime (newest)

This also works with a single-file list.

Now, what if I also want an extra line with the id, like this:

id
datetime (oldest)
datetime (newest)

Is there an (easy) way to edit line 1 twice so that it yields two separate lines?

neurino

2 Answers


OK, I got it:

1 {
    #hold the line
    h
    #extract id
    s|^([0-9]{6}).*|\1|; p
    #put line back again
    g
    #get datetime
    $sed_str
}

This part of the sed script prints (and edits) the first line twice.
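For reference, here is a minimal sketch of how the fragment above might fit into a complete command. The sample filenames, the helper name summarize, and the concrete value of sed_str are assumptions made for illustration; the original $sed_str is not shown in the thread. It uses -E, which GNU sed treats the same as -r.

```shell
# Assumed stand-in for the answer's $sed_str: reduce a line to its datetime.
sed_str='s/^[0-9]{6}-(.*)\.txt$/\1/'

# summarize is a hypothetical helper name; it reads the list on stdin.
summarize() {
    sed -nE "
1 {
    # save the untouched first line in the hold space
    h
    # reduce the pattern space to the id and print it
    s/^([0-9]{6}).*/\1/p
    # restore the original first line, then reduce it to the datetime
    g
    $sed_str
    p
}
\$ {
    # last line: reduce it to the datetime and print it
    $sed_str
    p
}
"
}

# Hypothetical sample list: one 6-digit id, datetimes in ascending order.
result=$(printf '%s\n' \
    123456-20110501T1000.txt \
    123456-20110502T1100.txt \
    123456-20110503T1200.txt | summarize)
printf '%s\n' "$result"
```

With a one-line list, the first block leaves the bare datetime in the pattern space, the substitution in the last-line block no longer matches, and the same datetime is printed again, so the output still has three lines.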

neurino
  • +1 for "getting it", and for using sed beyond s/x/y/p :) The hold space will be handy for other sed tasks, and it certainly works here, but for your specific example (above) you don't really need it. You can simply insert \n between \1 (the id) and \2 (the datetime), as in Gilles' example (where he uses a literal line ending as the \n). – Peter.O May 09 '11 at 02:25

So you want to extract some data from the first line and the last line? Just use a single command for each, printing two lines the first time.

sed -n -e '1s/^\(.*\)-\(.*\)\.txt$/\1\
\2/p' -e '$s/^\(.*\)-\(.*\)\.txt$/\2/p'
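A rough sketch of the same single-command approach on sample data follows. The filenames are made up, and the id pattern is changed from \(.*\) to \([^-]*\) as an assumption: with a greedy .* the id would run up to the last hyphen, which matters if the datetime itself contains hyphens.

```shell
# Hypothetical sample list; the datetimes deliberately contain hyphens.
list='123456-2011-05-01.txt
123456-2011-05-02.txt
123456-2011-05-03.txt'

# [^-]* stops the id at the first hyphen, so everything after it
# (minus the .txt suffix) is captured as the datetime.
# The backslash-newline inside the first expression is the line break
# that splits the id and the datetime into two output lines.
result=$(printf '%s\n' "$list" | sed -n \
    -e '1s/^\([^-]*\)-\(.*\)\.txt$/\1\
\2/p' \
    -e '$s/^\([^-]*\)-\(.*\)\.txt$/\2/p')
printf '%s\n' "$result"
```

If the datetimes never contain a hyphen, the original \(.*\) pattern gives the same result.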

You might find awk clearer if wordier.

awk -F- '{gsub(/\.[^.]*$/,"");
          dt=$2}
         NR==1 {print $1; print $2}
         END {print dt}'

Or Perl.

perl -l -ne 's/\.[^.]*$//;
             /^(.*)-(.*)$/ or next;
             print $1 if $.==1;
             print $2 if $.==1 || eof'
  • I'm not sure, but doesn't this parse the file twice, once for each -e? I was already using two different sed calls and wanted to optimize by using a single pass; I found how to do it using the hold buffer. +1 for the awk and perl alternatives, I'll time them to see which is faster, thank you. – neurino May 06 '11 at 14:04
  • @neurino: sed is a stream editor; it never backs up in the input file. Multiple -e arguments are equivalent to specifying a multiline script. – Gilles 'SO- stop being evil' May 06 '11 at 14:07
  • @Gilles: I profiled on a dir with 5000+ files in it: time ls -1 | sed -nr -e '1 p' -e '1 p; $ p' takes 159ms, time ls -1 | sed -nr '1 {h; p; g; p}; $ p' 118ms (-25%). – neurino May 06 '11 at 14:19
  • @neurino: I found the times to be effectively the same when I used your version (which still needs a second expression for the last line). Gilles hasn't used two expressions in place of your {h; p; g; p}. He too has used a single expression, with a newline character, to create the two lines you were after. I did, however, need to add a backslash to the end of the first line, so that the newline created by pressing Enter was treated as a literal '\n'. – Peter.O May 08 '11 at 10:48
  • @fred: > I found the times to be effectively the same. Do you mean the same execution time for both statements or the same as I stated (-25% for using {h; p; g; p }) ? I repeated profiling several times and had constant results on my linux box. – neurino May 08 '11 at 20:45
  • @neurino: Perhaps I misunderstood your meaning, but it seems that you are trying to compare a single -e {h;p;g;p} expression to two -e expressions to produce the 2-line output you want (for the oldest time). My comment was in the context of pointing out that Gilles' method bypasses the need for two -e expressions for the oldest date, and that it is nominally the same "speed". Even so, I don't get the results you mention. Here is a link to the test script and the results I got for 100 iterations over 5000 files: http://pastebin.ubuntu.com/605022/ – Peter.O May 09 '11 at 01:30
  • @fred: ok, I got it, don't know why I get different speeds... – neurino May 09 '11 at 07:24
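The point made in the comments, that several -e arguments form one script applied in a single pass over the input, can be checked with a small sketch. The three-line sample input is an assumption for illustration.

```shell
# Hypothetical three-line input.
list='a
b
c'

# Two -e expressions...
two=$(printf '%s\n' "$list" | sed -n -e '1p' -e '$p')

# ...behave like one script with the same commands on separate lines.
one=$(printf '%s\n' "$list" | sed -n '1p
$p')

printf '%s\n' "$two"
```

Both variables hold the first and last line of the input, and sed reads the input only once in either form.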