I have 5,000 text files of journal article citations, and I am trying to extract only the abstract portion of each. That is, I want to keep each text file but delete all of its text except the abstract. I am very new to Linux and have been trawling your board for a while.
how to extract words that after keyword
execute command on all file in a directory
for file in test
nano my.sh
while read variable do
sed '0,/^Abstract$/d'
done <file
Here is an example of a file; it is similar to a scientific journal article:
Sponsor : Beckman Res Inst Cty Hope
1500 E. Duarte Road
Duarte, CA 910103000 / -
NSF Program : 1114 CELL BIOLOGY
Fld Applictn: 0000099 Other Applications NEC
61 Life Science Biological
Program Ref : 9285,
Abstract :
Studies of chickens have provided serological and nucleic acid
probes useful in defining the major histocompatibility complex
(MHC) in other avian species. Methods used in detecting genetic
diversity at loci within the MHC of chickens and mammals will be
applied to determining the extent of MHC polymorphism within
small populations of ring-necked pheasants, wild turkeys, cranes,
Andean condors and other species. The knowledge and expertise
gained from working with the MHC of the chicken should make for
rapid progress in defining the polymorphism of the MHC in these
species and in detecting the polymorphism of MHC gene pool within
small wild and captive populations of these birds.
Abstract and line 10 is blank) and keep lines 11 (Studies of chickens …) through 21 (… these birds.)? Is line 21 the last line of the file? If not, how is the command supposed to identify the last line of the abstract? (E.g., is line 22 blank? Or does it contain some other constant heading string?) – Scott - Слава Україні Dec 07 '14 at 20:17
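One possible approach is sketched below. It rests on several assumptions: the abstract runs from the line after the heading to the end of the file, the files end in `.txt`, and the heading looks like `Abstract :` as in the sample (the pattern `^Abstract$` in the attempt above would not match it, because of the trailing spaces and colon). The function names `extract_abstract` and `strip_to_abstract` are made up for illustration. Try it on a copy of the data first, since it overwrites the files.

```shell
#!/bin/sh
# Pattern for the heading line: optional leading whitespace, "Abstract",
# optional whitespace, then a colon (matches "Abstract    :" in the sample).
heading='^[[:space:]]*Abstract[[:space:]]*:'

# Hypothetical helper: print only the lines that follow the heading.
# "1,/re/d" is the portable form; "0,/re/d" only works in GNU sed.
# Assumption: the heading is never on line 1 of the file.
extract_abstract() {
    sed "1,/$heading/d" "$1"
}

# Hypothetical helper: rewrite every .txt file in a directory
# (default: the current directory) so that only the abstract remains.
strip_to_abstract() {
    dir=${1:-.}
    for f in "$dir"/*.txt; do
        [ -f "$f" ] || continue
        # Skip files without a heading; otherwise sed would delete everything.
        grep -q "$heading" "$f" || continue
        # Write to a temporary file first, then replace the original.
        extract_abstract "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# Usage, after saving this as e.g. my.sh:
#   . ./my.sh
#   strip_to_abstract /path/to/copy/of/files
```

If the abstract is followed by another section, the `sed` range would need an end pattern as well, which depends on what that following line looks like.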