How to generate a new file by extracting some parts?

Question

An extract of my log file looks like this:

b6227|—|  Thermometer:  CRC matched: computed: 36 == read: 36
b6227|—| SocEvaluator: Final SoC is following to midle range, aka: SoC 64.5537%
b6227|—| SocEvaluator:  Final SoC is 64.5537%
b6227|—| SocEvaluator: Final SoC is following to midle range, aka: SoC 64.5552%

From this I'd like to create a new CSV file for rendering curves in Excel. The CSV file should look like this:

64.5537

I tried this but didn't make it:

sed -nr 's/ Final SoC is (\d\.\d%)/\1/gp' ~/extremeCold.20220926.log > final.csv

What's wrong?

[UPDATE]

I am running on macOS Monterey (v12.5)

The only number to keep is 64.5537, from the line b6227|—| SocEvaluator: Final SoC is 64.5537% (for those who read the question before the update, forget about the 64.5536; it was simply the next available SoC).

Please [edit] your question and explain, in words as well as by example, what numbers should be chosen from the file. Where does the 64.5536 come from? Should we only look at lines containing the string Final SoC? Do we need to use ? And what operating system are you using? We need to know that to know what tools are available. — terdon, Sep 27 '22 at 09:33
I assume then, the second line of your desired output should actually be 64.5552, not 64.5557? — AdminBee, Sep 27 '22 at 10:35
Nope, one must only keep the b6227|—| SocEvaluator: Final SoC is pattern (aka 64.5537 here) and ignore the other lines. — Stéphane de Luca, Sep 27 '22 at 10:46
2 lines, each containing a number, isn't a CSV, it's just a text file. A CSV would have Comma Separated Values on each line. It's not at all obvious why your output would not include 64.5552 given the last line of input is b6227|—| SocEvaluator: Final SoC is following to midle range, aka: SoC 64.5552%. — Ed Morton, Sep 27 '22 at 22:54
You seem to only have one line matching your requirement(s), yet you show two lines of predicted (filtered) output? — jubilatious1, Sep 29 '22 at 04:51

Kusalananda · Answer 1 · 2022-09-29T07:23:22.000

This uses sed to match the wanted line(s) and then chops off everything up to the last space:

sed -e '/SocEvaluator:.*Final SoC is [[:digit:]]/!d' -e 's/.* //' file

For the given data, this would output

64.5537%

To remove the % character:

sed -e '/SocEvaluator:.*Final SoC is [[:digit:]]/!d' -e 's/.* //' -e 's/%$//' file

Using awk with exactly the same regular expression to detect the wanted line(s), and then printing the last field on each line that this expression matches:

awk '/SocEvaluator:.*Final SoC is [[:digit:]]/ { print $NF }' file

Removing the % sign before printing:

awk '/SocEvaluator:.*Final SoC is [[:digit:]]/ { sub("%$","",$NF); print $NF }' file

The regular expression

SocEvaluator:.*Final SoC is [[:digit:]]

... would match any line that contains the text SocEvaluator: followed later by the text Final SoC is and a digit.

Note that sed and awk on macOS do not understand Perl-compatible regular expressions like \d. Related to this point: Why does my regular expression work in X but not in Y?
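
As a sketch of how the command from the question could be adapted along these lines, the following replaces `\d` with a bracket expression and uses `-E` for extended regular expressions. The file name `sample.log` is an assumption made for illustration; the sample lines are the ones shown in the question.

```shell
# Build a small sample log (hypothetical file name; lines copied from the question).
cat > sample.log <<'EOF'
b6227|—|  Thermometer:  CRC matched: computed: 36 == read: 36
b6227|—| SocEvaluator: Final SoC is following to midle range, aka: SoC 64.5537%
b6227|—| SocEvaluator:  Final SoC is 64.5537%
b6227|—| SocEvaluator: Final SoC is following to midle range, aka: SoC 64.5552%
EOF

# -n suppresses automatic printing; -E enables extended regular expressions.
# [0-9]+\.[0-9]+ stands in for the unsupported \d, and .* on both sides makes
# the substitution replace the whole line with the captured number.
# The p flag prints only the lines where the substitution succeeded.
sed -nE 's/.*Final SoC is ([0-9]+\.[0-9]+)%.*/\1/p' sample.log > final.csv

cat final.csv
```

Lines where `Final SoC is` is followed by text rather than a number do not match, so they are not printed.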

GNU sed doesn't recognise \d either. ast-open's sed does (and -r, initially a GNUism, BSD and soon-standard equivalent being -E). — Stéphane Chazelas, Sep 29 '22 at 07:19
@StéphaneChazelas Thanks, I tweaked that point a bit in response to your comment. I'm a bit unfamiliar with what expressions GNU tools usually support. — Kusalananda, Sep 29 '22 at 07:24

score 1 · Accepted Answer · edited Sep 29 '22 at 10:51


Using sed on Monterey 12.6

$ sed -En '/.* SocEvaluator:  Final SoC is ([0-9.]+)%.*$/s//\1/pwfinal.csv' input_file
64.5537
$ cat final.csv
64.5537

answered Sep 27 '22 at 10:45 by sseLtaH · edited Sep 29 '22 at 10:51 by Stéphane de Luca

One thing to note: I imported the source file from a colleague working on Windows. I noticed the file is CRLF terminated, which stops the sed command from working. I had to replace the line endings with LF. Is there a way to tell sed to handle CRLF as the end of line? – Stéphane de Luca Sep 29 '22 at 09:56
@StéphanedeLuca Unfortunately, I cannot answer that at this time without importing a similar file myself and testing, as I use WSL Linux for files imported from Windows and not the Mac. Have you managed to find a solution for the line endings? – sseLtaH Sep 29 '22 at 10:57
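
On the CRLF question raised in the comments above, one common approach (not confirmed by either commenter) is to strip the carriage returns with `tr` before the text reaches `sed`. The sketch below assumes a hypothetical file name `crlf.log` holding a single sample line from the question, and reuses the `sed` expressions from the first answer.

```shell
# Create a CRLF-terminated sample (hypothetical file name; %% is a literal %).
printf 'b6227|—| SocEvaluator:  Final SoC is 64.5537%%\r\n' > crlf.log

# tr -d '\r' deletes every carriage return, so sed only sees LF line endings
# and the trailing "%$" pattern can match at the true end of the line.
tr -d '\r' < crlf.log |
  sed -e '/SocEvaluator:.*Final SoC is [[:digit:]]/!d' -e 's/.* //' -e 's/%$//' > final.csv

cat final.csv
```

This leaves the original file untouched and converts the line endings only in the data piped to `sed`.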

score 1 · Answer 3 · answered Sep 29 '22 at 05:42

Using Raku (formerly known as Perl_6)

~$ raku -ne 'put $<> if m/ "Final SoC is " <(\d* \. \d*)> /;'  file
#OR
~$ raku -ne 'put $0 if m/ "Final SoC is " (\d* \. \d*) /;' file

Sample Input:

b6227|—|  Thermometer:  CRC matched: computed: 36 == read: 36
b6227|—| SocEvaluator: Final SoC is following to midle range, aka: SoC 64.5537%
b6227|—| SocEvaluator:  Final SoC is 64.5537%
b6227|—| SocEvaluator: Final SoC is following to midle range, aka: SoC 64.5552%

Sample Output:

64.5537

In the first example, Raku's capture markers <( … )> are used to drop the "Final SoC is " text from the match object, and the remaining capture is output using the $<> (or synonymous $/) match variable, subject to an if conditional.

In the second example, parentheses are used to capture a portion of the match into match-variable $<>.[0] which is the same as match-variable $/.[0] which is the same as $0. This $0 capture is output, subject to an if conditional.

https://raku.org
