
I am trying to:

  1. Always add 0 0 0 as the first row of the file.
  2. Multiply the first column of a three-column file (formatted like the sample below) by 2*pi, or 6.2832, but only if the line begins with a numeral. The second and third columns are kept as is.
  3. Prepend a * to the line if it does not start with a numeral, unless it already starts with a *, i.e. comment out the current row unless it is already commented out.

This is a sample input file:

* radius, section, index
1.12 A 0
2.0 A 1
   * There is white space before this comment
* This is a comment indicating a new section
5 B 0
3.17 B 1
7.3 B 7
This row starts with an alphabet char and should be commented out by the script.
0 C 1
1 C 2

And here is the intended output:

0 0 0
* radius, section, index
7.037184 A 0
12.5664 A 1
* There is white space before this comment
* This is a comment indicating a new section
31.416 B 0
19.917744 B 1
45.86736 B 7
* This row starts with an alphabet char and should be commented out by the script.
0 C 1
6.2832 C 2

What I've done so far:

for tempfile in *.txt; do
    echo '0 0 0' > temp
    cat "$tempfile" >> temp
    awk '{$1*=6.2832}{print}' temp > "$tempfile"
    #awk '/^(0-9)/{$1*=6.2832}{print}' temp > "$tempfile"
    rm temp
done

But this is what the script produces for the sample input above:

0 0 0
0 radius, section, index
7.03718 A 0
12.5664 A 1
0 There is white space before this comment
0 This is a comment indicating a new section
31.416 B 0
19.9177 B 1
45.8674 B 7
0 row starts with an alphabet char and should be commented out by the script.
0 C 1
6.2832 C 2
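The zeros in that output come from awk's string-to-number conversion: a first field such as * or This does not look like a number, so it counts as 0 in arithmetic, and assigning to $1 makes awk rebuild the whole line with single spaces, which also drops the leading whitespace. A minimal reproduction, assuming any POSIX awk:

```shell
# Non-numeric first fields ("*", "This") become 0 when multiplied,
# and the rebuilt record loses its leading blanks.
printf '   * indented comment\nThis row\n1.12 A 0\n' | awk '{ $1 *= 6.2832; print }'
# 0 indented comment
# 0 row
# 7.03718 A 0
```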

PS. The Linux box is off-grid and does not have mlr. I also do not have root/admin credentials.

Really appreciate any help from the community. Thanks in advance,

C

Carla H.
  • Hello and welcome to unix.se. I do not understand the 3rd section of the input file: "This row starts with an alphabet char and should be ignored by the script": I see no beginning alphabet char, just a "C" in the 2nd column (but all other lines you want to modify also have letters in the 2nd column) – Olivier Dulac Aug 11 '23 at 08:50
  • That comment row has been changed to "This row starts with an alphabet char and should be commented out by the script" and refers to the comment itself. The row begins with the alphabet character "T" from "This row...". Sorry for the confusion! – Carla H. Aug 13 '23 at 06:32

4 Answers


Maybe just use awk to do the whole thing:

awk '
 ( FNR==1 ) { print "0 0 0" } # add a first line containing "0 0 0"
 /^[0-9]/   { $1 *= 6.2832 }
 /^[a-zA-Z]/ { $0="* " $0 } # comment lines that start with a letter
 1          # always true, and no {action} specified:
            # does the default "print $0" action and thus prints every line
' your_input_file > output_file
terdon
  • I updated to include the "comment ..." part – Olivier Dulac Aug 11 '23 at 08:53
  • That wouldn't quite produce the expected output, in particular it wouldn't left-shift the * There is white space before this comment string to the start of the line. It'd also fail if a to-be-commented line started with something other than a letter such as an underscore or $ or any other character, maybe change [a-zA-Z] to [^0-9]. – Ed Morton Aug 12 '23 at 13:31
  • Thank you so much for your time on this! I am able to make the script work now. – Carla H. Aug 13 '23 at 07:14

Using any POSIX awk:

$ cat tst.awk
NR == 1 {
    CONVFMT = "%.17g"
    pi = atan2(0, -1)
    two_pi = 2 * pi
    print 0, 0, 0
}
{
    if ( $1 ~ /^[0-9]/ ) {
        $1 *= two_pi
    }
    else {
        sub(/^[[:space:]]*\*?[[:space:]]*/,"* ")
    }
    print
}

$ awk -f tst.awk file
0 0 0
* radius, section, index
7.03718 A 0
12.5664 A 1
* There is white space before this comment
* This is a comment indicating a new section
31.416 B 0
19.9177 B 1
45.8674 B 7
* This row starts with an alphabet char and should be ignored by the script.
0 C 1
6.2832 C 2
Ed Morton
  • You may prefer to add the 0 0 0 in a BEGIN statement rather than an NR == 1 one so that it is also added to an empty file. – Stéphane Chazelas Aug 11 '23 at 15:31
  • It should also be noted that it may affect the spacing in lines where the first field starts with a digit. – Stéphane Chazelas Aug 11 '23 at 15:31
  • @StéphaneChazelas I deliberately put it in that NR==1 block so it would not be added to an empty file (no output is usually best given no input) but the OP can always move it to BEGIN if they really want that. – Ed Morton Aug 11 '23 at 15:32
  • I feel pretty confident that the OP won't care about white space in the numeric lines, but if they do it's an easy tweak. – Ed Morton Aug 11 '23 at 15:33
  • Thank you so much! I've added -v CONVFMT=%.17g when calling the awk script and hardcoded 2*pi to 6.28318530717958647692 as suggested so it gives more digits/precision. – Carla H. Aug 13 '23 at 07:17
  • You're welcome. You don't need to hard-code pi, I updated my answer to show how to calculate it. – Ed Morton Aug 13 '23 at 11:22
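Two points from the comments above, computing pi with atan2(0, -1) instead of hard-coding it and raising CONVFMT for more digits, can be checked in isolation. This is a small sketch assuming any POSIX awk; concatenating "" forces the number-to-string conversion that CONVFMT controls:

```shell
# pi from atan2(0, -1), and how CONVFMT sets the digits kept
# when a number is converted to a string.
awk 'BEGIN {
    two_pi = 2 * atan2(0, -1)
    print two_pi ""          # default CONVFMT "%.6g": 6.28319
    CONVFMT = "%.17g"
    print two_pi ""          # 17 significant digits: 6.2831853071795862
}'
```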

With perl:

perl -MMath::Trig -pe '
  BEGIN{print "0 0 0\n"; $x = 2 * pi}
  unless (s{^\d[\d.]*}{$& * $x}e || /^\s*\*/) {
    $_ = "* $_";
  }' your-file

Here prefixing with "* " the lines that don't start with a digit and don't start with any amount of whitespace followed by a *.
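For comparison, the same rule can be sketched in awk, assuming a POSIX awk. Unlike $1 *= ..., it rewrites only the leading number (as the perl s{^\d[\d.]*}{...}e does), so the spacing in the rest of the line is kept. scale_and_comment is a hypothetical helper name, and [ \t] is used instead of [[:space:]] for older awks:

```shell
# Hypothetical helper: multiply a leading number by 2*pi in place,
# comment out other lines unless they already start with optional blanks and a *.
scale_and_comment() {
    awk '
    BEGIN { print "0 0 0"; x = 2 * atan2(0, -1) }    # header line; x = 2*pi
    /^[0-9]/ {
        match($0, /^[0-9][0-9.]*/)                   # RLENGTH = length of leading number
        $0 = (substr($0, 1, RLENGTH) * x) substr($0, RLENGTH + 1)
        print
        next
    }
    !/^[ \t]*\*/ { $0 = "* " $0 }                    # not already a comment
    { print }
    ' "$@"
}

printf '* radius\n1.12 A 0\n   * indented\nThis row\n1 C 2\n' | scale_and_comment
```

The product is converted to text with CONVFMT (%.6g by default), so pass -v CONVFMT=%.17g for more digits.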

Prefixing with "* " only the lines that start with an alphabet would be:

perl -C -MMath::Trig -pe '
  BEGIN{print "0 0 0\n"; $x = 2 * pi}
  if (/^\pL/) {
    $_ = "* $_";
  } else {
    s{^\d[\d.]*}{$& * $x}ae;
  }' your-file

Here the -C option is added (assuming the file is encoded in UTF-8 and the locale uses that character encoding) so that \pL is not limited to ASCII letters, and the a flag is added to the s{...}{...}ae substitution so that \d only matches ASCII decimal digits, since perl can only do arithmetic with ASCII decimal digits, not with the digits of other scripts.

Without the Math::Trig module (used for the pi constant), you could change the $x factor definition to:

$x = 2 * atan2(0, -1)

Or hard code it:

$x = 6.28318530717958647692

You should use as much precision as possible (and useful) during the calculation and only round the final result; if numbers are truncated before the multiplication, the errors can end up being amplified.

For instance, if you wanted those circumferences given to the nearest micrometre (assuming those numbers are expressed in metres), you'd do:

perl -MMath::Trig -pe '
  BEGIN{print "0 0 0\n"; $x = 2 * pi}
  unless (s{^\d[\d.]*}{sprintf "%.6f", $& * $x}e || /^\s*\*/) {
    $_ = "* $_";
  }' your-file

For a circle with a 10.000000 metre radius, your 6.2832 factor gives a circumference of 62.832000, while the actual value is 62.831853.
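The arithmetic in that example can be checked with any POSIX awk; this sketch only prints the two circumferences rounded to six decimals:

```shell
# 10-metre radius: rounded factor vs. 2*pi computed at full precision.
awk 'BEGIN {
    r = 10
    printf "%.6f\n", r * 6.2832              # 62.832000
    printf "%.6f\n", r * 2 * atan2(0, -1)    # 62.831853
}'
```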


It seems like you're trying to handle a larger task than the title suggests.

This is a full bash script. It builds on Ed Morton's awesome answer, which I couldn't really improve on.

Benefits:

  1. Magic numbers (TAU) extracted to variables
  2. Error handling (TODO)
  3. Easier to run with input argument & optional output argument
  4. Progress reporting in case there are many files
  5. Avoids spacing issues (see NOTEs)
  6. Interactive and standalone execution

Drawbacks:

  1. bash isn't sh
  2. So many lines (even without the commentary)

#!/bin/bash

# TODO: Update me for more accuracy
TAU=6.2832

# TODO: handle errors
trap on_err ERR
on_err() { echo "failed!" >&2; exit 1; }

# awk processing for input files
#
# For every line starting with a number,
#   multiply the first number by tau (pi * 2)
# For every line not starting with a number,
#   change the line to start with a *
# Finally, print the line with our changes
#
# NOTE: single quotes stop the shell from expanding $1 here;
#       the value of TAU is passed to awk with -v tau=... instead
AWK_PROG='{ if ($1 ~ /^[[:space:]]*[0-9]/) { $1 *= tau } else { sub(/^[[:space:]]*\*?[[:space:]]*/, "* ") } print }'

# Process all files in a directory
#
# Simple usage (reads input from /my/input/*.txt)
#   process_all_files /my/input
#
# Advanced usage (writes output to /my/output/*.txt)
#   process_all_files /my/input /my/output
process_all_files() {
    # Arguments
    #   First argument is the input directory
    #   Second argument (optional) is the output directory
    #   (a temporary directory by default)
    local in_dir="$1"
    local out_dir="${2:-$(mktemp -d)}"

    # List all the files in the input directory (save list in tempfile)
    # NOTE: xargs details
    #   https://unix.stackexchange.com/questions/175844/use-basename-in-find-exec
    # -maxdepth 1 keeps find at the top level, so "$in_dir/$file" below is valid
    local file_list
    file_list="$(mktemp)"
    find "$in_dir" -maxdepth 1 -type f -iname '*.txt' -print0 \
        | xargs -0 -n1 -- basename \
        > "$file_list"

    # Log state
    #   Total number of files
    local n
    n=$(awk 'END{print NR}' "$file_list")
    #   Current file number
    local i=0

    # Define log function to print to stderr
    # NOTE: change log formatting here
    log() { echo " $i/$n :: $*" >&2; }

    # Log the input directory (and where the output goes)
    log "$in_dir/*.txt -> $out_dir"

    # Iterate over the list of files
    # NOTE: using while read avoids word splitting issues
    #   https://www.shellcheck.net/wiki/SC2044
    while read -r file; do
        # Update current file number and log the file
        i=$((i + 1))
        log "$file"

        # Do the actual processing
        #
        # 1. Write the first line of the file
        # 2. Process the input file with awk
        #
        # Whitespace alignment for clarity:
        #   First line truncates the output file if it already exists
        #   Second line appends to the file
        #
        echo 0 0 0                                    >  "$out_dir/$file"
        awk -v tau="$TAU" "$AWK_PROG" "$in_dir/$file" >> "$out_dir/$file"
    done < "$file_list"

    rm -f "$file_list"
}

# Only run the main function if this script was executed directly
#
# If this script is sourced by another script or an interactive session,
# the function will not run until called directly
#
# If this script is loaded over the network,
# it will not execute until it reaches the last line
# (it won't execute if it fails halfway through, for example)
[[ "$0" == "${BASH_SOURCE[0]}" ]] && process_all_files "$@"

  • Thanks for such a detailed answer! I like that it does progress reports for many files. The commentary really helps me knowing what the script does (lots that I have to learn yet in Linux scripting/tools). – Carla H. Aug 21 '23 at 03:11
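If the drawbacks listed above (bash-only, many lines) matter more than progress reporting, a much shorter POSIX sh sketch can apply Ed Morton's awk logic to each *.txt file in place, as the loop in the question tried to do. fix_file and fix_all_txt are hypothetical helper names, and mktemp, while widely available, is not required by POSIX:

```shell
#!/bin/sh
# Hypothetical helper: rewrite one file in place using Ed Morton's awk logic
# (pi computed with atan2; [ \t] used instead of [[:space:]] for older awks).
fix_file() {
    tmp=$(mktemp) || return 1        # mktemp is common but not POSIX
    awk '
    NR == 1 { print 0, 0, 0; two_pi = 2 * atan2(0, -1) }
    {
        if ($1 ~ /^[0-9]/) $1 *= two_pi
        else sub(/^[ \t]*\*?[ \t]*/, "* ")
        print
    }' "$1" > "$tmp" &&
        cat "$tmp" > "$1"            # overwrite in place, keeping the file permissions
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Hypothetical helper: apply fix_file to every *.txt in the current directory.
fix_all_txt() {
    for f in *.txt; do
        [ -f "$f" ] || continue      # skip the literal "*.txt" when nothing matches
        fix_file "$f" || echo "failed: $f" >&2
    done
}

# Usage: cd /path/to/files && fix_all_txt
```

Writing through a temporary file means the original is only overwritten after awk has succeeded.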