I am trying to use a Java-based program, CRFVoter, for text mining. I ran the JAR as specified in its documentation.
[user]$ java -jar CRFVoter.jar
usage: CRFVoter GPRO
Command Line Tool for CRFVoter with GPRO models.
-h,--help Print this help.
-i,--input-text <text> Input text to be processed.
-o,--output <path> Output file.
https://github.com/texttechnologylab/CRFVoter
To test it, I passed a phrase with the following command:
[user]$ java -jar CRFVoter.jar -i 'Comparison with alkaline phosphatases and 5-nucleotidase'
I get the following output:
ore.api.resources.ResourceObjectProviderBase configure
INFO: Producing resource from file:/N/slate/aplastow/tools/CRFVoter/CRFVoter/target/models/stanford/model.ser.gz
Nov 10, 2021 12:51:50 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase loadResource
INFO: Producing resource took 3804ms
Nov 10, 2021 12:51:50 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase configure
INFO: Producing resource from jar:file:/N/slate/aplastow/tools/CRFVoter/CRFVoter/target/CRFVoter.jar!/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/lib/ner-en-all.3class.distsim.crf.map
Nov 10, 2021 12:51:50 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase loadResource
INFO: Producing resource took 0ms
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.github.jcrfsuite.util.CrfSuiteLoader (file:/N/slate/aplastow/tools/CRFVoter/CRFVoter/target/CRFVoter.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.github.jcrfsuite.util.CrfSuiteLoader
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Comparison O
with O
alkaline O
phosphatases O
and O
5 O
- O
nucleotidase O
This is the output I expected from the software, which is great. So I set up a Python script that prints sentences much like the one I entered into the Java CLI, intending to pass its output to the Java CLI.
The Python script's help menu looks like this:
[user]$ ./text_parser.py -h
usage: text_parser.py [-h] [-i INDEX]
Seriously, this was just for generating a script which would output into the
unix terminal when prompted, that's it
optional arguments:
-h, --help show this help message and exit
-i INDEX, --index INDEX
input index for the element of the line of the gene
validation input
To print a sentence from a document I have elsewhere on the cluster, I call the script like this:
./text_parser.py -i $an_int
where $an_int is an integer index selecting a line of the file. This seems to work fine, and indeed I can run a for loop in bash to print specified rows from the file.
Command:
[user]$ for i in {1..2} ; do x=$(./text_parser.py -i $i) ; echo ////////////////////////////////////// ; echo $x ; echo ////////////////////////////////////// ; done
Output:
//////////////////////////////////////
'Pharmacologic aspects of neonatal hyperbilirubinemia.'
//////////////////////////////////////
//////////////////////////////////////
'When CSF [HCO3-] is shown as a function of CSF PCO2 the data of K-depleted rats are no longer displaced when compared to controls but still have a significantly greater slope (1.21 +/- 0.23 vs.'
//////////////////////////////////////
Again, the output is exactly what I expected. So I tried the final step: passing the variable I had just printed to the Java CLI.
Command:
[user]$ for i in {1..2} ; do x=$(./text_parser.py -i $i) ; echo ////////////////////////////////////// ; echo $x ; echo ////////////////////////////////////// ; java -jar CRFVoter.jar -i $x ; done
Output:
//////////////////////////////////////
'Pharmacologic aspects of neonatal hyperbilirubinemia.'
//////////////////////////////////////
Nov 10, 2021 1:06:03 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase configure
INFO: Producing resource from file:/N/slate/aplastow/tools/CRFVoter/CRFVoter/target/models/stanford/model.ser.gz
Nov 10, 2021 1:06:06 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase loadResource
INFO: Producing resource took 3063ms
Nov 10, 2021 1:06:06 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase configure
INFO: Producing resource from jar:file:/N/slate/aplastow/tools/CRFVoter/CRFVoter/target/CRFVoter.jar!/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/lib/ner-en-all.3class.distsim.crf.map
Nov 10, 2021 1:06:06 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase loadResource
INFO: Producing resource took 0ms
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.github.jcrfsuite.util.CrfSuiteLoader (file:/N/slate/aplastow/tools/CRFVoter/CRFVoter/target/CRFVoter.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.github.jcrfsuite.util.CrfSuiteLoader
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
' O
Pharmacologic O
This is not the entire output, but it captures my problem. The variable $x can be printed with the echo command, which suggests that $x carries the information I need. Hence the:
//////////////////////////////////////
'Pharmacologic aspects of neonatal hyperbilirubinemia.'
//////////////////////////////////////
bit, but when I pass this same variable to the Java CLI, it only seems to see the ' and Pharmacologic tokens. This is not consistent with the result I get when the string is entered manually. The output for manual input ends in something like this:
with O
alkaline O
phosphatases O
and O
5 O
- O
nucleotidase O
This means the CLI is fine with a string typed directly as 'this sentence for inspecting', but it cannot process the same string when it is passed as $x, where x='this sentence for inspecting'.
My question is, what gives? Why is echo $x able to produce the expected output, but not the Java CLI with $x as the input?
Is there a workaround here? Perhaps a combination of piping and redirection to input the string?
Even if there is a workaround, why is this happening?
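The difference between the manual call and the $x call can be seen by counting the arguments a command actually receives. The following is a minimal sketch, not CRFVoter itself: count_args is a hypothetical helper standing in for the java command, and x is hard-coded to the first sentence shown above, including the literal single quotes that text_parser.py prints.

```shell
#!/bin/sh
# count_args is a hypothetical stand-in for "java -jar CRFVoter.jar":
# it only reports how many arguments the shell handed to it.
count_args() {
    echo "$#"
}

# Stand-in for x=$(./text_parser.py -i 1); the single quotes are
# literal characters inside the value, not shell quoting.
x="'Pharmacologic aspects of neonatal hyperbilirubinemia.'"

# Unquoted: the shell splits the value of x on whitespace, so -i is
# followed by five separate words, the first being 'Pharmacologic.
unquoted_count=$(count_args -i $x)

# Quoted: the whole sentence is passed as a single argument after -i.
quoted_count=$(count_args -i "$x")

echo "unquoted: $unquoted_count arguments"
echo "quoted:   $quoted_count arguments"
```

echo hides the problem because it joins all of its arguments with spaces before printing, so the split words look like the original sentence again. A command that takes only the word after -i as its input text would see just 'Pharmacologic.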
java -jar CRFVoter.jar -i "$x" – steeldriver Nov 10 '21 at 18:30
Use java -jar CRFVoter.jar -i "$x" instead of java -jar CRFVoter.jar -i $x. I am closing this as a duplicate of our standard "my command chokes on whitespace" question. If that doesn't actually solve your issue, leave a comment here and ping me (@terdon) and we can reopen. – terdon Nov 10 '21 at 18:33
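Quoting the variable as the comments suggest passes the whole sentence as one argument. The single quotes, however, appear to be part of what text_parser.py prints (they show up in the echo output and as the first ' token), so they would still reach CRFVoter. A minimal sketch, assuming those quotes are unwanted, strips them with POSIX parameter expansion. Here x is a hard-coded stand-in for the script's output, and stripped is a name introduced only for illustration.

```shell
#!/bin/sh
# Stand-in for x=$(./text_parser.py -i 1); the quotes are literal characters.
x="'Pharmacologic aspects of neonatal hyperbilirubinemia.'"

# Remove one leading single quote, if present.
stripped=${x#\'}
# Remove one trailing single quote, if present.
stripped=${stripped%\'}

# printf with a quoted expansion prints the value exactly as stored.
printf '%s\n' "$stripped"
```

Inside the loop, the call would then become java -jar CRFVoter.jar -i "$stripped", with the double quotes kept so the sentence stays a single argument.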