
I am attempting to use a Java-based program, CRFVoter, for text mining. I called and executed it as specified:

[user]$ java -jar CRFVoter.jar
usage: CRFVoter GPRO
Command Line Tool for CRFVoter with GPRO models.
 -h,--help                Print this help.
 -i,--input-text <text>   Input text to be processed.
 -o,--output <path>       Output file.
https://github.com/texttechnologylab/CRFVoter

For testing, I ran it on the following phrase:

[user]$ java -jar CRFVoter.jar -i 'Comparison with alkaline phosphatases and 5-nucleotidase'

I get the following output:

ore.api.resources.ResourceObjectProviderBase configure
INFO: Producing resource from file:/N/slate/aplastow/tools/CRFVoter/CRFVoter/target/models/stanford/model.ser.gz
Nov 10, 2021 12:51:50 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase loadResource
INFO: Producing resource took 3804ms
Nov 10, 2021 12:51:50 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase configure
INFO: Producing resource from jar:file:/N/slate/aplastow/tools/CRFVoter/CRFVoter/target/CRFVoter.jar!/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/lib/ner-en-all.3class.distsim.crf.map
Nov 10, 2021 12:51:50 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase loadResource
INFO: Producing resource took 0ms
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.github.jcrfsuite.util.CrfSuiteLoader (file:/N/slate/aplastow/tools/CRFVoter/CRFVoter/target/CRFVoter.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.github.jcrfsuite.util.CrfSuiteLoader
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Comparison      O
with    O
alkaline        O
phosphatases    O
and     O
5       O
-       O
nucleotidase    O

This is exactly the output I expected from this software, which is great. So I wrote a Python script that outputs sentences much like the one I entered into the Java CLI, because I wanted to pass that script's output to the Java-based CLI.

The Python CLI's help menu looks like this:

[user]$ ./text_parser.py -h
usage: text_parser.py [-h] [-i INDEX]

Seriously, this was just for generating a script which would output into the unix terminal when prompted, that's it

optional arguments:
  -h, --help            show this help message and exit
  -i INDEX, --index INDEX
                        input index for the element of the line of the gene validation input

To print a sentence from a document stored elsewhere on the cluster, I call it like this:

./text_parser.py -i $an_int

where $an_int is an integer index selecting which line of the file to print. This works fine; I can even run a for loop in Bash to print specified rows from the file:

Command:

[user]$ for i in {1..2} ; do x=$(./text_parser.py -i $i) ; echo ////////////////////////////////////// ; echo $x ; echo ////////////////////////////////////// ; done

Output:

//////////////////////////////////////
'Pharmacologic aspects of neonatal hyperbilirubinemia.'
//////////////////////////////////////
//////////////////////////////////////
'When CSF [HCO3-] is shown as a function of CSF PCO2 the data of K-depleted rats are no longer displaced when compared to controls but still have a significantly greater slope (1.21 +/- 0.23 vs.'
//////////////////////////////////////

Again, the output is exactly what I expected. So I tried to deal the final blow and pass the variable I printed to the Java CLI.

Command:

[user]$ for i in {1..2} ; do x=$(./text_parser.py -i $i) ; echo ////////////////////////////////////// ; echo $x ; echo ////////////////////////////////////// ; java -jar CRFVoter.jar -i $x ; done

Output:

//////////////////////////////////////
'Pharmacologic aspects of neonatal hyperbilirubinemia.'
//////////////////////////////////////
Nov 10, 2021 1:06:03 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase configure
INFO: Producing resource from file:/N/slate/aplastow/tools/CRFVoter/CRFVoter/target/models/stanford/model.ser.gz
Nov 10, 2021 1:06:06 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase loadResource
INFO: Producing resource took 3063ms
Nov 10, 2021 1:06:06 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase configure
INFO: Producing resource from jar:file:/N/slate/aplastow/tools/CRFVoter/CRFVoter/target/CRFVoter.jar!/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/lib/ner-en-all.3class.distsim.crf.map
Nov 10, 2021 1:06:06 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase loadResource
INFO: Producing resource took 0ms
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.github.jcrfsuite.util.CrfSuiteLoader (file:/N/slate/aplastow/tools/CRFVoter/CRFVoter/target/CRFVoter.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.github.jcrfsuite.util.CrfSuiteLoader
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
'       O
Pharmacologic   O

This is not the entire output, but it captures my problem. The variable $x can be printed with echo, which suggests that $x holds the information I need. Hence this:

//////////////////////////////////////
'Pharmacologic aspects of neonatal hyperbilirubinemia.'
//////////////////////////////////////

bit. However, when I pass this same variable to the Java CLI, it only seems to process the ' and Pharmacologic tokens, which is not consistent with the output I get when I enter the string manually. The output for manual input ends with something like this:

with    O
alkaline        O
phosphatases    O
and     O
5       O
-       O
nucleotidase    O

So the CLI processes a string entered as 'this sentence for inspecting' just fine, but it cannot process the same string passed as $x, where x='this sentence for inspecting'.
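One way to see what the shell actually hands to a program is a helper that prints each argument it receives. The sketch below uses a hypothetical function, show_args (it is not part of CRFVoter or the Python script), together with the first sentence from the loop output. With -i $x unquoted, the shell splits the value on whitespace into five separate words, so -i only receives 'Pharmacologic, which is consistent with the output above where only ' and Pharmacologic were tagged. With -i "$x", the whole sentence arrives as a single argument.

```shell
#!/usr/bin/env bash
# show_args: hypothetical helper that prints how many arguments it got,
# then each argument in brackets, so word splitting becomes visible.
show_args() {
  printf 'argc=%d\n' "$#"
  printf '[%s]\n' "$@"
}

# Same text the Python script printed, including its literal single quotes.
x="'Pharmacologic aspects of neonatal hyperbilirubinemia.'"

echo "unquoted:"
show_args -i $x     # $x is split on whitespace: argc=6
echo "quoted:"
show_args -i "$x"   # "$x" stays one argument: argc=2
```

This also explains why echo $x looks correct: echo prints all of its arguments joined by single spaces, so the splitting is invisible in its output (unless the text contained runs of whitespace or glob characters).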

My question is, what gives? Why is echo $x able to produce the expected output, but not the Java CLI with $x as the input?

Is there a workaround here? Perhaps a combination of piping and redirection to input the string?

Even if there is a workaround, why is this happening?

  • I suspect it's just a matter of properly double-quoting the variable expansion to prevent word-splitting by the shell: java -jar CRFVoter.jar -i "$x" – steeldriver Nov 10 '21 at 18:30
  • You just need to use java -jar CRFVoter.jar -i "$x" instead of java -jar CRFVoter.jar -i $x. I am closing this as a duplicate of our standard "my command chokes on whitespace" question. If that doesn't actually solve your issue, leave a comment here and ping me (@terdon) and we can reopen. – terdon Nov 10 '21 at 18:33
  • Huh, I thought I already tried that. It works as expected after I make that addition. Thank you – Alex Plastow Nov 15 '21 at 23:48
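Following the comments, the loop from the question can be rewritten with every expansion double-quoted. This is a sketch that has not been run against CRFVoter itself. It assumes text_parser.py and CRFVoter.jar sit in the current directory, as in the original commands. It also adds an optional, hypothetical helper, strip_outer_quotes: the echo output shows text_parser.py printing each sentence wrapped in literal single quotes, and those quotes are ordinary characters inside $x. Even with "$x", CRFVoter would therefore likely still tag a ' token. When the sentence is typed by hand, the shell removes the quotes before Java ever sees them.

```shell
#!/usr/bin/env bash
# strip_outer_quotes: hypothetical helper (not part of either tool).
# Removes one leading and one trailing literal single quote, if present.
strip_outer_quotes() {
  local s=$1
  s=${s#\'}   # drop a leading ' if present
  s=${s%\'}   # drop a trailing ' if present
  printf '%s\n' "$s"
}

# Corrected loop: every expansion is double-quoted, so each sentence reaches
# CRFVoter as a single -i argument. Skipped if the tools are not present.
if [ -x ./text_parser.py ] && [ -f CRFVoter.jar ]; then
  for i in {1..2}; do
    x=$(./text_parser.py -i "$i")
    x=$(strip_outer_quotes "$x")   # optional: drop the wrapping quotes
    printf '%s\n' "$x"
    java -jar CRFVoter.jar -i "$x"
  done
fi
```

If the quote characters are actually wanted as input, delete the strip_outer_quotes line; quoting "$x" alone is what fixes the word splitting. printf '%s\n' "$x" is used instead of echo $x so the printed text matches exactly what is passed to Java.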
