
I am trying to write a bash script that will first extract files one-by-one from an archive and call a java program with the file name as a parameter to do something. I have tried with the following script, but it does not work.

I am facing the following problem:

Assume the compressed file is named compressed.7z and contains two files, sample_1.json and sample_2.json (the names could be anything). The 7za command always outputs compressed.7z as the file name, which I don't want. I just want sample_1.json extracted into the output folder and its name passed to the java command, then sample_2.json in the next iteration.

Could anybody help with this issue? Thanks in advance.

#!/bin/bash

for file in *.7z
do
    7za x -ooutput "${file}" | java -jar Remove_BoilerPlate_JSON_Updated.jar "${file}"
done

polemon
Why? Can't you just extract the archive normally, then process the file names from 7za l in a second pass? – alecxs Sep 18 '21 at 13:07

2 Answers

1

This may not be the most efficient way to do this, but here is what you asked for.

First, you need the list of files in the archive, which you can get with 7za l. The undocumented -ba switch makes the output easier to process. The last column of that output holds the names of the archived files, and awk '{print $NF}' extracts it. To use the output of that command as values in your script, use command substitution with the $() syntax.

You can use the e command instead of x in your 7za extraction command if you only want the files and do not need the directory structure from the archive. Do not forget to pass the archive name as an argument.
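As a rough sketch of the listing step, the snippet below runs the same awk filter over a hard-coded sample instead of a real archive. The sample text is only an assumption about what 7za l -ba prints (the exact columns may differ between versions), and list_names is a hypothetical helper name. Because awk splits fields on whitespace, $NF returns only the last word of a name that contains spaces, so this approach suits simple names like the ones in the question.

```shell
#!/bin/bash
# Hypothetical sample of what `7za l -ba compressed.7z` might print;
# the column layout is an assumption made for illustration only.
sample_listing='2021-09-18 13:07:00 ....A         1234          567  sample_1.json
2021-09-18 13:07:00 ....A         4321                sample_2.json'

# Print the last whitespace-separated field of every input line.
# That is the member name, as long as the name contains no whitespace.
list_names() {
    awk '{print $NF}'
}

# Command substitution captures the names so the script can loop over them.
names=$(printf '%s\n' "$sample_listing" | list_names)
for name in $names; do
    printf 'member: %s\n' "$name"
done
```

With a real archive, the printf would be replaced by 7za l -ba compressed.7z.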

Following the above, the script would be something like this:

#!/bin/bash

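# Loop over the member names listed in the archive
for file in $(7za l -ba compressed.7z | awk '{print $NF}')
do
    # Extract only this member into the output directory
    7za x -ooutput compressed.7z "$file"
    java -jar Remove_BoilerPlate_JSON_Updated.jar output/"$file"
done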
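The comment under the question suggests a simpler route: extract the whole archive once, then loop over the extracted files. A minimal sketch of that idea follows. The helper name for_each_file is hypothetical, and the commented usage assumes that 7za and the jar from the question are available.

```shell
#!/bin/bash
# Run a command once per regular file in a directory.
# Usage: for_each_file DIR COMMAND [ARGS...]
for_each_file() {
    local dir=$1
    shift
    local f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
        "$@" "$f"                 # the file path is appended as the last argument
    done
}

# Intended use (assumes 7za and the jar from the question exist):
#   7za x -ooutput compressed.7z &&
#       for_each_file output java -jar Remove_BoilerPlate_JSON_Updated.jar
```

This runs the extractor only once, at the cost of holding all extracted files on disk at the same time.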

1

With libarchive's bsdtar and GNU tar, you could do something like:

bsdtar cf - @compressed.7z |
  tar -x --to-program='
    cat > file.json &&
      java -jar Remove_BoilerPlate_JSON_Updated.jar file.json
  '

Here, bsdtar converts the 7z archive on the fly to ustar format for GNU tar (which doesn't support the 7z format), and GNU tar's --to-program feature pipes each member to a program instead of storing it on disk.

In this example, each member is always stored as file.json, and java is called on that file. If java can take its input directly from stdin, you can just use --to-program='java -jar Remove_BoilerPlate_JSON_Updated.jar' instead, or possibly --to-program='java -jar Remove_BoilerPlate_JSON_Updated.jar /dev/stdin'.

If it's important that java receives the file name as stored in the archive, you can get it from the $TAR_FILENAME environment variable: --to-program='f=${TAR_FILENAME##*/}; cat > "$f" && java -jar Remove_BoilerPlate_JSON_Updated.jar "$f" && rm -f -- "$f"'
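The one-liner above can also be written as a small helper, which may be easier to read and to try out by hand. This is only a sketch: save_member is a hypothetical name, and the work directory argument is an addition that the original command does not have. The only thing taken from GNU tar is the TAR_FILENAME environment variable, which tar sets for the --to-program command.

```shell
#!/bin/sh
# save_member WORKDIR: copy stdin to WORKDIR/<base name of $TAR_FILENAME>
# and print the resulting path. save_member is a hypothetical helper;
# TAR_FILENAME is provided by GNU tar when it runs the --to-program command.
save_member() {
    workdir=$1
    name=${TAR_FILENAME##*/}        # drop any leading directory components
    [ -n "$name" ] || return 1      # no usable name to store the data under
    cat > "$workdir/$name" || return 1
    printf '%s\n' "$workdir/$name"
}

# Intended use inside a script given to --to-program (assumes the jar from
# the question):
#   f=$(save_member /tmp) &&
#       java -jar Remove_BoilerPlate_JSON_Updated.jar "$f" && rm -f -- "$f"
```

Setting TAR_FILENAME by hand and piping some text into save_member is enough to check the naming logic without an archive.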