
For example, given the following content

001
002
004
008
010

in a text file named file, how can I extract the missing numbers 3, 5, 6, 7 and 9?

wsdzbm
  • @don_crissti I need the digits, not the filenames – wsdzbm Mar 12 '16 at 15:53
  • Nothing stops you from extracting the number from the filename. You can adapt any of the answers there to do that. – don_crissti Mar 12 '16 at 15:56
  • @don_crissti Some strings in the filename are random, so I cannot use the same approach as in the link. I guess a regular expression is necessary. Anyhow, I edited the post to make it a new question. – wsdzbm Mar 12 '16 at 16:12
  • Sure, after a dozen edits it's no longer a duplicate... – don_crissti Mar 12 '16 at 16:13
  • This should work: comm -23 <(printf '%03d\n' {1..10}) file – iruvar Mar 12 '16 at 18:34
  • @1_CR It's so simple and really works; it's exactly the answer I wanted. Can you post it as an answer? It would also be better to strip the leading zeros, e.g. 2 rather than 002, 12 rather than 012. – wsdzbm Mar 14 '16 at 12:47
  • @don_crissti So, how about removing the duplicate remark? – wsdzbm Mar 14 '16 at 12:48
  • It's not a "remark", it's a vote and 4 other people have voted too, so I can't "remove" it. You've been here long enough... you should know by now how to properly phrase a question and that changing requirements is strongly discouraged - I mean, look at your post - now it has almost nothing in common with the initial version (and if memory serves me well, this isn't the first time you're changing the requirements; stop doing that). – don_crissti Mar 14 '16 at 12:57
  • Lee, I cannot post an answer to a closed question. To strip leading zeros you can pipe the output to a post-processor, which makes it something like comm -23 <(printf '%03d\n' {1..10}) file | awk '{print +$0}' – iruvar Mar 14 '16 at 13:07
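The comm one-liner in the comments hard-codes the range {1..10}. A minimal sketch that instead takes the range from the first and last lines of the file, and strips the leading zeros the same way, might look like this. missing_numbers is a hypothetical helper name; the sketch assumes bash (for process substitution), GNU seq, and a sorted file of three-digit, zero-padded numbers. Numbers below the first line are not reported.

```shell
# missing_numbers FILE (hypothetical helper): print the numbers absent from
# FILE between its first and last line. Assumes FILE is sorted and every line
# is a 3-digit zero-padded number, as in the question's example.
missing_numbers() {
  local f=$1 first last
  # 10# forces base 10, so values such as "008" are not parsed as octal
  first=$((10#$(head -n 1 "$f")))
  last=$((10#$(tail -n 1 "$f")))
  # comm -23 keeps only lines unique to the generated full sequence;
  # awk's +$0 converts "003" to 3
  comm -23 <(seq -f '%03g' "$first" "$last") "$f" | awk '{ print +$0 }'
}
```

With the example data, missing_numbers file prints 3, 5, 6, 7 and 9, one per line.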

3 Answers


My approach is to keep control over the range of numbers. I initialize two variables, a starting limit and an ending limit, and append the starting limit to the file name. The loop runs indefinitely: it exits as soon as the starting number is greater than the ending number; otherwise it checks whether the file exists, prints the number if it does not, and increments the starting limit.

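#!/bin/bash
StartNumber=$1
EndNumber=$2
FileName=$3   # hypothetical: the original never sets FileName; pass the prefix as a third argument

while true; do
      # stop once the start limit has passed the end limit
      [ "$StartNumber" -gt "$EndNumber" ] && exit 0
      # print every number for which no file named <prefix>_<number> exists
      if [ ! -f "${FileName}_${StartNumber}" ]; then
       echo "$StartNumber"
      fi
      ((StartNumber+=1))
done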
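The loop above tests for files named ${FileName}_N, whereas the question lists the numbers inside a single file. A minimal sketch of the same start/end-limit idea applied to the question's data could look like this. missing_in_range is a hypothetical helper name, and it assumes the values are zero-padded to three digits, as in the example.

```shell
# missing_in_range START END FILE (hypothetical helper): walk from START to END
# and print every number whose zero-padded form is not a whole line of FILE.
# Assumes 3-digit padding, as in the question's example.
missing_in_range() {
  local n
  for (( n = $1; n <= $2; n++ )); do
    # grep -q: quiet, exit status only; -x: the pattern must match a whole line
    grep -qx "$(printf '%03d' "$n")" "$3" || echo "$n"
  done
}
```

For example, missing_in_range 1 10 file prints 3, 5, 6, 7 and 9. It runs one grep per number, which is fine for small ranges.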

A couple of suggestions based on your comments:

  • Try running find . -type f and loop through the results.
  • For every file the above command produces, apply echo "${filename}" | tr -dc 0-9 to keep only the digits.
  • You would probably get "yyyyddd"; use that as your starting limit and compare it with today's date as the ending limit.
  • Thanks, but I cannot specify a $FileName; it varies. The filename contains a timestamp from when the file was saved, so it can differ. – wsdzbm Mar 12 '16 at 16:13
  • OK, at some point there needs to be something in common, or some regularity, to automate the process; otherwise the script gets long and complicated. Can you tell me the timestamp format? – ObiWanKenobi Mar 12 '16 at 16:51
  • 7 digits as yyyyddd – wsdzbm Mar 12 '16 at 16:55
  • @Lee if your files don't match the example you've given us in your question, how are we supposed to be able to come up with valid suggestions? Please edit your question to provide real examples. – Chris Davies Mar 12 '16 at 17:44
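The find and tr -dc suggestions above can be combined into one pipeline. A minimal sketch, assuming each file name contains exactly one run of digits that is the number of interest (a name such as a_001_x2 would yield 0012), might look like this. missing_from_names is a hypothetical helper name. It reports gaps between the smallest and largest numbers found, and does not handle the calendar wrap of yyyyddd timestamps at year boundaries.

```shell
# missing_from_names DIR (hypothetical helper): print numbers missing between
# the smallest and largest digit runs found in the names of regular files
# under DIR. Assumes one digit run per file name.
missing_from_names() {
  find "$1" -type f |
    awk -F/ '{ print $NF }' |   # keep only the base name; directory parts may contain digits
    tr -dc '0-9\n' |            # delete every character except digits and newlines
    awk 'NF' |                  # drop names that contained no digits
    sort -n |
    # print every integer strictly between consecutive values
    awk 'NR > 1 { for (i = prev + 1; i < $1; i++) print i } { prev = $1 }'
}
```

For a directory holding log_001_x, log_002_y and log_005_z, this prints 3 and 4.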

An awk way:

$ awk 'NR != $1 { for (i = prev + 1; i < $1; i++) {print i} } { prev = $1 }' file
3
5
6
7
9

More clearly:

awk 'NR != $1 {
  for (i = prev + 1; i < $1; i++) {
    print i
  }
}
{
  prev = $1
}' file

For each line, I check whether the line number matches the number on that line. If it does not, some number has been skipped, so I print every number strictly between the previous number (prev) and the current one (hence i = prev + 1 and i < $1).
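The NR != $1 test treats the sequence as starting at 1: a file beginning with 005 would also report 1 to 4. A variant sketch that compares each value only with the previous line, so the range starts at the first value in the file, and pads its output to the width of the input field (003 rather than 3), could look like this. missing_padded is a hypothetical helper name.

```shell
# missing_padded FILE (hypothetical helper): print the numbers missing between
# consecutive lines of FILE, zero-padded to the width of the current field.
missing_padded() {
  awk 'NR > 1 {
         # $1 + 0 forces a numeric value, so "008" is treated as 8
         for (i = prev + 1; i < $1 + 0; i++)
           printf("%0" length($1) "d\n", i)   # builds a format such as "%03d\n"
       }
       { prev = $1 + 0 }' "$1"
}
```

With the example data this prints 003, 005, 006, 007 and 009, matching the file's own format.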

muru

Assuming the example file from the question, the following command

join -a 1 -o 1.1 2.1 -e missed <(seq -f '%03g' $(tail -1 <(sort file))) file | grep missed

will produce this output

003 missed
005 missed
006 missed
007 missed
009 missed

If that's what you need, I can provide some explanations.
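A step-by-step sketch of the same join idea, written as a hypothetical missing_via_join function with a comment on each option, might look like this. It uses the comma-separated form 1.1,2.1 for -o and replaces grep with awk, so that only the missing numbers are printed, without leading zeros. It assumes bash (for process substitution), GNU seq, and three-digit zero padding.

```shell
# missing_via_join FILE (hypothetical helper): the join pipeline, split into steps.
missing_via_join() {
  local f=$1 last
  # highest value in FILE, e.g. "010"; a plain sort works because of the padding
  last=$(sort "$f" | tail -n 1)
  # seq -f '%03g' "$last" generates 001 ... 010 (seq reads "010" as decimal 10)
  # join -a 1: also print lines of the generated sequence with no match in FILE
  # -o 1.1,2.1: output the number and the matching field from FILE
  # -e missed: put the word "missed" where that second field is empty
  join -a 1 -o 1.1,2.1 -e missed <(seq -f '%03g' "$last") "$f" |
    awk '$2 == "missed" { print +$1 }'   # keep unmatched lines, drop leading zeros
}
```

With the example file, missing_via_join file prints 3, 5, 6, 7 and 9.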

Tagwint