Separating declarative sentences from an input file into another file

Question

We want to check whether each sentence in the input file is declarative or not. Declarative sentences should be stored in declarative.txt and the rest of the sentences in others.txt. Finally, the number of lines in declarative.txt should be appended to the end of that file.

A sentence is "declarative" if it ends with a full stop.

input file ($1)

this life is good.
neverthe less.
suppppppppppppppppperb.
the best coders.
everything is good?
are you okay dude?
ma man !!

My code so far

#!/bin/sh
while read row
do
x=$row | grep "\.$"
y=$row | grep -v "\.$"
echo $x >> declarative.txt
echo $y >> others .txt
done < $1
cnt=`wc -l declarative.txt`
echo $cnt >> declarative.txt

Learn to use GNU awk or code something in C using strchr(3) – Basile Starynkevitch Apr 15 '20 at 13:03

Kusalananda · Accepted Answer · 2020-04-16T09:23:26.140

To divide the lines in the input file into those that end with a dot and those that don't, assuming there is a single sentence per line, and save them in two different output files, you can use grep twice like so:

grep    '\.$' "$1" >declarative.txt
grep -v '\.$' "$1" >others.txt

There is no need to loop over the lines in a shell loop (in fact, it is discouraged). Unix tools that handle text files have built-in loops that do this already, so grep, for example, will apply the regular expression to each line of input data in turn and output the ones that match.
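The question also asks for the number of lines in declarative.txt to be appended to that file, which the two grep commands on their own do not do. A minimal sketch of one way to add that step follows. The file name input.txt stands in for "$1" and is created only so that the example runs on its own; reading the file through wc -l with input redirection is an assumption about the wanted format (a bare number, without the file name).

```shell
#!/bin/sh
# Recreate the sample input from the question (input.txt is illustrative).
cat > input.txt <<'EOF'
this life is good.
neverthe less.
suppppppppppppppppperb.
the best coders.
everything is good?
are you okay dude?
ma man !!
EOF

# Split the lines, exactly as in the two grep commands above.
grep    '\.$' input.txt > declarative.txt
grep -v '\.$' input.txt > others.txt

# Count first, then append, so the count line is not counted itself.
# "wc -l < file" prints only the number; printf '%d' drops any padding
# that some wc implementations put in front of it.
count=$(wc -l < declarative.txt)
printf '%d\n' "$count" >> declarative.txt
```

With the sample input, declarative.txt ends up holding the four lines that end with a dot, followed by a line containing 4.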

You could also get away with parsing the input file only a single time, with e.g. awk:

awk '/\.$/ { print >"declarative.txt"; next }
           { print >"others.txt" }' "$1"

This triggers the block that prints the current line to the file declarative.txt if the line ends with a dot. The other block will be triggered for all other lines.
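If the line count requested in the question should be appended as well, the same awk program could keep a counter and print it at the end. This is only a sketch: the counter name n and the file name input.txt (standing in for "$1") are illustrative and not part of the answer.

```shell
#!/bin/sh
# Sample input from the question; input.txt is an illustrative stand-in for "$1".
printf '%s\n' 'this life is good.' 'neverthe less.' 'suppppppppppppppppperb.' \
    'the best coders.' 'everything is good?' 'are you okay dude?' 'ma man !!' > input.txt

# n counts the lines that end with a dot. In the END block the count is
# written to the same output file; awk keeps that file open for the whole
# run, so the number lands after the sentences instead of truncating them.
# printf "%d" prints 0 if no line matched and n was never set.
awk '/\.$/ { print > "declarative.txt"; n++; next }
           { print > "others.txt" }
     END   { printf "%d\n", n > "declarative.txt" }' input.txt
```

This still reads the input only once, which is the point of using awk here rather than two grep calls followed by wc.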

... or with sed:

sed -n -e '/\.$/w declarative.txt' \
       -e '//!w others.txt' "$1"

This writes the current line to declarative.txt if it ends with a dot, and to others.txt if it doesn't. The empty // expression means "re-use the last regular expression", and the ! means "do this if the expression did not match".
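For comparison, here is a sketch of the same sed command with the regular expression written out in both expressions rather than re-used through the empty //. It should behave the same way; input.txt is an illustrative stand-in for "$1".

```shell
#!/bin/sh
# Sample input from the question; input.txt is an illustrative stand-in for "$1".
printf '%s\n' 'this life is good.' 'neverthe less.' 'suppppppppppppppppperb.' \
    'the best coders.' 'everything is good?' 'are you okay dude?' 'ma man !!' > input.txt

# First expression: write lines ending with a dot to declarative.txt.
# Second expression: the same pattern, negated with !, writes all other
# lines to others.txt. -n suppresses sed's default output to the terminal.
sed -n -e '/\.$/w declarative.txt' \
       -e '/\.$/!w others.txt' input.txt
```

The empty // form avoids repeating the pattern, so there is only one place to change if the definition of "declarative" changes.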

thanks a lot ... that's a piece of great information, you are the best )) — , Apr 15 '20 at 12:37

terdon · Answer 2 · 2020-04-15T12:36:39.540

This is not a valid way of identifying declarative sentences. For one thing, none of yours start with a capital letter and many aren't even sentences at all. But if you just want to separate the lines of your input file into two files, one containing those lines that end with a full stop and the other containing the rest, you could just use awk:

awk '{/\.$/ ? f="fullStop" : f="others"; print > f}' file

If you really need to do this as a shell script, you could simply use:

#!/bin/sh
awk '{/\.$/ ? f="fullStop" : f="others"; print > f}' "$1"

And if it must be a shell loop (which is not a good idea), you can do:

#!/bin/bash
while IFS= read -r line; do
    [[ $line =~ \.$ ]] &&
        echo "$line" >> fullStop ||
        echo "$line" >> others
done < "$1"

Or, if you can't use bash-specific features:

#!/bin/sh
while IFS= read -r line; do 
    printf '%s\n' "$line" | grep -q '\.$' && 
    echo "$line" >> fullStop || 
    echo "$line" >> others
done < "$1"
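The loop above starts a grep process for every input line. A possible alternative, which is not part of the answer, is to let the shell test the line itself with a case pattern; this also works in plain sh. In the sketch below, input.txt is an illustrative stand-in for "$1", and emptying the output files first is an assumption about wanting fresh results on each run.

```shell
#!/bin/sh
# Sample input from the question; input.txt is an illustrative stand-in for "$1".
printf '%s\n' 'this life is good.' 'neverthe less.' 'suppppppppppppppppperb.' \
    'the best coders.' 'everything is good?' 'are you okay dude?' 'ma man !!' > input.txt

# Start with empty output files, since the loop appends to them.
: > fullStop
: > others

while IFS= read -r line; do
    case $line in
        *.) printf '%s\n' "$line" >> fullStop ;;  # ends with a literal dot
        *)  printf '%s\n' "$line" >> others ;;    # everything else
    esac
done < input.txt
```

In a case pattern the dot is an ordinary character, so *. matches any line whose last character is a full stop. This is still a shell loop, so the awk version remains the better choice for anything but small files.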

edited Apr 15 '20 at 12:36

answered Apr 15 '20 at 12:26


Question regarding speed: Does resetting the variable f slow the process down over e.g. '{ if ( /\.$/ ) { print > "dots" } else { print > "no_dots" } }' - I assume the "if"-query will be of the same speed in either way of writing it? – FelixJN Apr 15 '20 at 12:33

@Fiximan For a simple variable assignment to have any sort of impact on performance, you would have to measure very carefully, and on absolutely massive amounts of data. In short, it makes no difference. – Kusalananda Apr 15 '20 at 12:35

@Fiximan huh, interesting. I really doubt there would ever be a discernible difference, but I don't know. This is very, very unlikely to ever be relevant though. Even for files of several gigabytes. – terdon Apr 15 '20 at 12:35
