
My question is the opposite of this one.

My input file is a list of locations and serial numbers (like an inventory list). Some of the serial numbers are listed as ranges (e.g., 11-17) and I need to convert each range to the full list of sequential numbers (e.g., 11, 12, 13, 14, 15, 16, 17).

The input format is like this:

Main Street # 12770-12786, 12980, 13012-13013, 13068, 13093, 13115, 13122, 13137-13156, 13548-13557, 13954-13969, 14471-14475, 14500-14508

Madison Ave # 14071-14074, 14105-14128, 14131-14140, 14603-14612

Locations are separated by an empty line. Each location starts with a name. So far I have only seen names matching [a-zA-Z -], that is, upper- and lower-case letters, spaces and dashes. The name starts at column 0 of a new row and is followed by a space, a hash and a space: " # ".

For each range in the format nnnn-mmmm, I need to produce a comma-and-space separated list of the sequential values, like n1, n2, n3, n4, n5. For example, the inventory for Madison Ave (above) will need to be listed like this:

Madison Ave # 14071, 14072, 14073, 14074, 14105, 14106, 14107, etc.

Input is one text file and output can be one text file. I'd like to do the processing in bash, but I suppose I could use Python also.

I know some possible pieces of the solution, such as:

  1. find the ranges with grep using a regex pattern like this:

    grep -o -P '\d+-\d+' <input_file>
    
  2. assume the first result of that is the range 4243-4263

    echo {4243-4263} | sed 's/-/../'
    
  3. use a for-loop on the above result like this:

    for i in {4243..4263}; do echo $i; done
    

I don't know how to put all that together into a solution. I also assume there is probably a much better way to go about it.

phuclv
  • 2,086
MountainX
  • 17,948

1 Answer


How about perl?

  • match each sequence of one-or-more digits followed by a dash followed by one-or-more digits: (\d+)-(\d+)
  • re-write the captured digit sequences as a perl range expression nnnn..mmmm within parentheses
  • evaluate the result as a perl expression (the e flag), producing a list of numbers that join turns into a suitably delimited string

So

$ perl -pe 's/(\d+)-(\d+)/join ", ", ($1..$2)/ge' input
Main Street # 12770, 12771, 12772, 12773, 12774, 12775, 12776, 12777, 12778, 12779, 12780, 12781, 12782, 12783, 12784, 12785, 12786, 12980, 13012, 13013, 13068, 13093, 13115, 13122, 13137, 13138, 13139, 13140, 13141, 13142, 13143, 13144, 13145, 13146, 13147, 13148, 13149, 13150, 13151, 13152, 13153, 13154, 13155, 13156, 13548, 13549, 13550, 13551, 13552, 13553, 13554, 13555, 13556, 13557, 13954, 13955, 13956, 13957, 13958, 13959, 13960, 13961, 13962, 13963, 13964, 13965, 13966, 13967, 13968, 13969, 14471, 14472, 14473, 14474, 14475, 14500, 14501, 14502, 14503, 14504, 14505, 14506, 14507, 14508

Madison Ave # 14071, 14072, 14073, 14074, 14105, 14106, 14107, 14108, 14109, 14110, 14111, 14112, 14113, 14114, 14115, 14116, 14117, 14118, 14119, 14120, 14121, 14122, 14123, 14124, 14125, 14126, 14127, 14128, 14131, 14132, 14133, 14134, 14135, 14136, 14137, 14138, 14139, 14140, 14603, 14604, 14605, 14606, 14607, 14608, 14609, 14610, 14611, 14612
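
Since the question mentions Python as an alternative, here is a rough sketch of the same idea there. It is not part of the perl one-liner above; the function name expand_ranges and the sample line are just illustrative choices. It assumes every range is written low-high with plain decimal digits, and it relies on re.sub accepting a function as the replacement, which plays the role of the e flag in perl.

```python
import re

# One-or-more digits, a dash, one-or-more digits: the same pattern as the perl version.
RANGE = re.compile(r"(\d+)-(\d+)")


def expand_ranges(line):
    """Return line with every nnnn-mmmm replaced by 'nnnn, ..., mmmm'."""

    def expand(match):
        start = int(match.group(1))
        end = int(match.group(2))
        # range() excludes its upper bound, so add 1 to keep the range inclusive.
        return ", ".join(str(n) for n in range(start, end + 1))

    # re.sub calls expand() once per match and substitutes its return value.
    return RANGE.sub(expand, line)


# Illustrative sample line, shortened from the input in the question.
sample = "Madison Ave # 14071-14074, 14603"
print(expand_ranges(sample))
# prints: Madison Ave # 14071, 14072, 14073, 14074, 14603
```

To process a whole file you would call expand_ranges on each line and write the result out; lines without a range (including the empty separator lines) pass through unchanged.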
steeldriver
  • 81,074
  • Thanks. Could you explain how it works? I understand the capture groups and join, but I am missing the overall picture of how this works. – MountainX Dec 15 '19 at 00:22
  • @MountainXforMonicaCellio I have added a brief explanation above - basically it relies on the e flag to evaluate a perl expression within a regex s/pattern/replacement/ substitution – steeldriver Dec 15 '19 at 00:45