In an Amazon S3 bucket, we have Debian packages stored in different folders. Each folder contains a different number of files.

When listing the Debian packages from the S3 bucket (AWS), the package names are separated by spaces. I need to convert these space-separated package lists into a newline-separated list, i.e. one package file per line. The input lines don't contain equal numbers of spaces.

Each directory contains a different number of Debian packages. After converting the lists to one package per line, I want to store all packages (from the different folders) in a single file.

  • input example:
    package1.deb  package2.deb    package3.deb      package4.deb package5.deb
    
  • desired output:
    package1.deb
    package2.deb
    package3.deb
    package4.deb
    package5.deb
    

This is my current attempt: a function that runs in the background for the different folders of the S3 bucket:

function convertSpaceToNewLine(){
    for line in filename; do
       cat $line| grep '.deb$'|tr [:space:] \\t | sed 's/\t\t*/\n/g' >> folder/newfile
    done
}

I have tried many commands, such as truncate, awk, xargs -n 1, and sed.

AdminBee
Jaraar
    Ok, the picture gets clearer now. I can provide you with a relatively easy solution to the immediate problem; however, it may be worthwhile looking at your convert function to find ways of improving that. If you state for line in filename, this implies that you are iterating over a list of files, because cat is used to output the content of a file. However, filename is not used as a variable (like in $filename) but a single string, so the for loop would only run once, with $line set to (literally) filename. Is that really what you intend to do? – AdminBee Jan 19 '23 at 09:17
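The difference the comment points out can be seen in a small sketch. The value `a.txt b.txt` is made up for illustration; only the two loop heads matter.

```shell
#!/bin/sh
# Hypothetical value, chosen only for this illustration.
filename="a.txt b.txt"

# Without the dollar sign, the loop sees the single literal word "filename".
literal_count=0
for line in filename; do
    literal_count=$((literal_count + 1))
    literal_last=$line
done

# With $filename (deliberately unquoted), the value is split on whitespace.
expanded_count=0
for line in $filename; do
    expanded_count=$((expanded_count + 1))
    expanded_last=$line
done

echo "literal:  $literal_count iteration(s), last value: $literal_last"
echo "expanded: $expanded_count iteration(s), last value: $expanded_last"
```

In the first loop, `cat $line` would therefore try to open a file that is literally called `filename`.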

3 Answers

The immediate solution to the problem "convert a list of space-separated strings to a list of newline-separated strings" is rather straightforward:

awk '{for (i=1;i<=NF;i++) {print $i}}' input_file1 input_file2 ... > output_file

By default, awk splits lines into individual fields at "whitespace" (i.e. any number of consecutive spaces or tabs), so this program simply iterates over all fields (=package file names) of each line and prints the fields individually, one field per line. If a line contains no fields, there will also be no output for that line, so empty lines are not a problem.

awk is able to process more than one input file, so there is no need for a loop, either.
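As a rough, self-contained sketch of the above, the following creates two throwaway list files and runs the same awk program on both at once. The file names `list1`, `list2` and `all`, as well as the temporary directory, are assumptions made for the demonstration and do not come from the question.

```shell
#!/bin/sh
# Scratch directory for the demonstration (names below are hypothetical).
tmpdir=$(mktemp -d)

# Two input files with irregular spacing, a tab, and an empty line.
printf 'package1.deb  package2.deb    package3.deb\n' > "$tmpdir/list1"
printf '\npackage4.deb\tpackage5.deb\n' > "$tmpdir/list2"

# One awk call handles both files; each field goes on its own line.
awk '{ for (i = 1; i <= NF; i++) print $i }' \
    "$tmpdir/list1" "$tmpdir/list2" > "$tmpdir/all"

result=$(cat "$tmpdir/all")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

The empty line in the second file produces no output, because it has no fields to iterate over.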

However, the underlying task seems to be somewhat more involved, so for a more comprehensive solution, you would need to provide more details in the question.

AdminBee

Space-separated data is trivial to work with in bare bash, requiring no external programs at all. Well, I suppose cat qualifies as an external program.

Still:

$ cat << 'EOF' > test.sh
set -- $(cat)
printf '%s\n' "$@"
EOF
$ chmod 755 test.sh
$ cat << EOF > inputfile
one two three four five six

seven eight

nine ten eleven

12, 13, 14

15,16

EOF
$ ./test.sh < inputfile
one
two
three
four
five
six
seven
eight
nine
ten
eleven
12,
13,
14
15,16
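One caveat with an unquoted `$(cat)` is that the shell applies filename expansion to the resulting words as well as word splitting. The following is a minimal sketch of a variant that switches globbing off while splitting; the function name `split_words` is made up here and is not part of the script above.

```shell
#!/bin/sh
# split_words is a hypothetical helper name used only in this sketch.
split_words() {
    set -f                  # turn off filename expansion (globbing)
    set -- $(cat)           # unquoted on purpose: split stdin on whitespace
    set +f                  # turn globbing back on
    if [ "$#" -gt 0 ]; then
        printf '%s\n' "$@"  # one word per line
    fi
}

# The "*.deb" word stays literal instead of being matched against files.
words=$(printf 'one two\n\n*.deb   three\n' | split_words)
printf '%s\n' "$words"
```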

OTOH, the question says: "we have Debian packages stored in different folders. Each folder contains different amounts of files."

If what you are really trying to do is create a list of all the package files you have in a given directory tree, one filename per line, and without any path information, then:

$ find path/to/your/packages/ -name \*.deb -type f -exec basename {} \;
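To try this without touching real data, here is a small sketch that builds a throwaway directory tree and runs the same find command on it. The folder and file names are invented for the example, and the `sort` is only there to make the order predictable.

```shell
#!/bin/sh
# Throwaway tree; all folder and file names here are hypothetical.
root=$(mktemp -d)
mkdir -p "$root/folder1" "$root/folder2/sub"
: > "$root/folder1/package1.deb"
: > "$root/folder2/package2.deb"
: > "$root/folder2/sub/package3.deb"
: > "$root/folder2/notes.txt"      # not a .deb file, should be skipped

# Same find invocation as above, sorted for a stable order.
listing=$(find "$root" -name '*.deb' -type f -exec basename {} \; | sort)
printf '%s\n' "$listing"

rm -rf "$root"
```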
Jim L.

Using Raku (formerly known as Perl_6)

~$ raku -ne '.put for .words;' Jarrar.txt

If you're simply looking to read files off the command line, Raku can take their contents, break them into whitespace-separated .words, and return one word (i.e. filename) per line.

Sample Input, filename Jarrar.txt (thanks to @Jim_L):

one two three four five six

seven eight

nine ten eleven

12, 13, 14

15,16

Sample Output:

one
two
three
four
five
six
seven
eight
nine
ten
eleven
12,
13,
14
15,16

OTOH, if you want to look at a number of files within a directory, you can use Raku's dir() function, which can return a file listing of .IO objects:

~$ raku -e 'for dir("$*CWD/subdir") {.IO.say};'
"file1.jpg".IO
"file2.png".IO
"Jarrar.txt".IO

Once you've located the correct dir() with the desired file(s), you can test files against a pattern to only return the desired content:

~$ raku -e 'for dir(test => / \.txt $ /) {.words.join("\n").put};'
one
two
three
four
five
six
seven
eight
nine
ten
eleven
12,
13,
14
15,16

https://docs.raku.org/routine/dir
https://raku.org

jubilatious1