
I wrote a simple bash script that merges n dictionaries:

#! /bin/bash

file_out="res.dict"
start_time=$(date +%s)
count_overall_before=0

while read -rep file in "$@"
do
    count_in_file=$(wc -l $file | grep -o -P "^\d+")
    echo "$count_in_file lines in $file"
    let count_overall_before+=count_in_file
done
echo "Lines count before: $count_overall_before"

sort -u "$@" > $file_out
count_overall_after=$(wc -l $file_out | grep -o -P "^\d+")

time_in_seconds=$(($(date +%s)-start_time))
echo "Lines count after: $count_overall_after"
echo "Duplicate lines count: $((count_overall_before-count_overall_after))"
echo "Seconds estimated: $time_in_seconds"

I have a folder with dicts:

$ ls -l PsycOPacK/Base\ Dictionnaries/
total 8260
-rw-r--r-- 1 russich555 russich555   61288 авг 14  2011 actor-givenname
-rw-r--r-- 1 russich555 russich555  223347 авг 14  2011 actor-names
-rw-r--r-- 1 russich555 russich555  146891 авг 14  2011 actor-surname
-rw-r--r-- 1 russich555 russich555  593687 авг 14  2011 att800
-rw-r--r-- 1 russich555 russich555     152 авг 14  2011 computer-companies
-rw-r--r-- 1 russich555 russich555   26211 авг 14  2011 ComputerSoftandHardwareBrand
-rw-r--r-- 1 russich555 russich555    2409 авг 14  2011 dogs
-rw-r--r-- 1 russich555 russich555   18866 авг 14  2011 drugs
-rw-r--r-- 1 russich555 russich555  755065 авг 14  2011 english
-rw-r--r-- 1 russich555 russich555  311251 авг 14  2011 etc_hosts
-rw-r--r-- 1 russich555 russich555  106743 авг 14  2011 family-names
-rw-r--r-- 1 russich555 russich555    4518 авг 14  2011 famous_people
-rw-r--r-- 1 russich555 russich555   70852 авг 14  2011 female-names
-rw-r--r-- 1 russich555 russich555 3187771 авг 14  2011 french
-rw-r--r-- 1 russich555 russich555   11332 авг 14  2011 french-names
-rw-r--r-- 1 russich555 russich555   78190 авг 14  2011 given-names
-rw-r--r-- 1 russich555 russich555    3077 авг 14  2011 internet-domains
-rw-r--r-- 1 russich555 russich555   87507 авг 14  2011 jargon
-rw-r--r-- 1 russich555 russich555   73635 авг 14  2011 junk
-rw-r--r-- 1 russich555 russich555  353216 авг 14  2011 machine_names
-rw-r--r-- 1 russich555 russich555   46923 авг 14  2011 male-names
-rw-r--r-- 1 russich555 russich555  199597 авг 14  2011 movie-characters
-rw-r--r-- 1 russich555 russich555  296977 авг 14  2011 movie-general
-rw-r--r-- 1 russich555 russich555  101854 авг 14  2011 music-rock
-rw-r--r-- 1 russich555 russich555   11401 авг 14  2011 myths-legends
-rw-r--r-- 1 russich555 russich555    6885 авг 14  2011 places
-rw-r--r-- 1 russich555 russich555  176855 авг 14  2011 pocket-dic
-rw-r--r-- 1 russich555 russich555  101854 авг 14  2011 rock
-rw-r--r-- 1 russich555 russich555  167425 авг 14  2011 rock-groups
-rw-r--r-- 1 russich555 russich555    7350 авг 14  2011 science_fiction
-rw-r--r-- 1 russich555 russich555   14804 авг 14  2011 shows
-rw-r--r-- 1 russich555 russich555    9355 авг 14  2011 special_english
-rw-r--r-- 1 russich555 russich555    3946 авг 14  2011 sports
-rw-r--r-- 1 russich555 russich555    1677 авг 16  2011 SportsTeamsCaps.txt
-rw-r--r-- 1 russich555 russich555    1677 авг 16  2011 SportsTeamsLower.txt
-rw-r--r-- 1 russich555 russich555    1677 авг 16  2011 SportsTeamsUpper.txt
-rw-r--r-- 1 russich555 russich555   91426 авг 14  2011 surnames
-rw-r--r-- 1 russich555 russich555    1677 авг 16  2011 teams
-rw-r--r-- 1 russich555 russich555   58000 авг 14  2011 tech
-rw-r--r-- 1 russich555 russich555   63178 авг 14  2011 technical_dictionary
-rw-r--r-- 1 russich555 russich555   12598 авг 14  2011 unix
-rw-r--r-- 1 russich555 russich555  206403 авг 14  2011 Unix.dict
-rw-r--r-- 1 russich555 russich555  206403 авг 14  2011 unix-words
-rw-r--r-- 1 russich555 russich555   13422 авг 14  2011 us-counties
-rw-r--r-- 1 russich555 russich555  222742 авг 14  2011 words-english
-rw-r--r-- 1 russich555 russich555  222349 авг 14  2011 world_factbook

As you can see, the directory path contains a backslash \ to escape the space that follows it. My goal is to be able to pass several dictionary paths as arguments, like this:

$ ./scripts/merge_dicts.sh dict_1 dict_2

I also want it to work this way (with the shell expanding the directory contents):

./scripts/merge_dicts.sh dicts_folder/*

But I can't run it with * at the end if the path contains a space:

$ ./scripts/merge_dicts.sh PsycOPacK/Base\ Dictionnaries/*
wc: PsycOPacK/Base: No such file or directory
wc: Dictionnaries/actor-givenname: No such file or directory
0 lines in PsycOPacK/Base Dictionnaries/actor-givenname
wc: PsycOPacK/Base: No such file or directory

I read a few topics about this but didn't find the right solution. I'm not sure whether it's possible to read all the arguments as one line.

I also tried:

  1. for file in "$@"
  2. while file line in "$@"
  3. while file -r line in "$@"

1 Answer


Using double quotes when referring to a variable (wc "$line") ensures that the variable expands to a single argument even if its value contains spaces.

If you follow this simple pattern, you don't need to worry about spaces in your use case: on the one hand, * expands into several arguments, one per file (even for file names containing spaces); on the other hand, "$@" expands into several arguments, one per positional parameter (even for parameters containing spaces).
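The difference is easy to check with a small sketch. The helper name count_args and the sample paths are made up for illustration; the helper only prints how many arguments it received:

```shell
#!/bin/bash

# Hypothetical helper: prints the number of arguments it was called with.
count_args() {
    echo "$#"
}

file="Base Dictionnaries/dogs"   # sample value containing a space

count_args $file      # unquoted: word splitting produces 2 arguments
count_args "$file"    # quoted: the value stays 1 argument

# "$@" keeps every positional parameter intact; unquoted $@ re-splits them.
set -- "Base Dictionnaries/dogs" "Base Dictionnaries/drugs"
count_args "$@"       # prints 2
count_args $@         # prints 4
```

This is the same effect seen in the question: the unquoted $file handed wc two separate names, PsycOPacK/Base and Dictionnaries/actor-givenname.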

However, your example script is broken in several other ways too:

  1. You use the variable $file but never define it; you probably mean $line.
  2. while ... in is not a bash construct; use for ... in instead.
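To make the second point concrete, here is a short sketch contrasting the two loops. The function names print_args and print_lines are illustrative only: for ... in walks a list of words such as the script's arguments, whereas while read consumes lines from standard input.

```shell
#!/bin/bash

# for ... in iterates over a list of words, here the function's arguments.
print_args() {
    for arg in "$@"; do
        printf '<%s>\n' "$arg"
    done
}

# while read iterates over lines arriving on standard input.
print_lines() {
    while IFS= read -r line; do
        printf '<%s>\n' "$line"
    done
}

print_args "a b" c                      # arguments are passed directly
printf '%s\n' "a b" c | print_lines     # lines are piped in on stdin
```

Both calls print <a b> followed by <c>. For file names passed on the command line, the for loop is the natural fit.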

As a minimal working example, we can use the following merge_dicts.sh:

#!/bin/bash

for filename in "$@"; do
    wc -l "$filename"
done

Then

$ ./merge_dicts.sh PsycOPacK/Base\ Dictionnaries/*
0 PsycOPacK/Base Dictionnaries/first file.txt
0 PsycOPacK/Base Dictionnaries/second file.txt
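Applying the same pattern to the whole script from the question might look like the sketch below. It keeps the original variable names, messages, and the res.dict output file. Wrapping the logic in a merge_dicts function and reading the line count with wc -l < "$file" instead of wc | grep are my own substitutions, not something the question or this answer prescribes.

```shell
#!/bin/bash

# Sketch: merge the given dictionary files into one sorted, de-duplicated file.
# Usage: merge_dicts OUTPUT_FILE INPUT_FILE...
merge_dicts() {
    local file_out=$1
    shift
    local file count_in_file count_overall_after
    local count_overall_before=0
    local start_time
    start_time=$(date +%s)

    for file in "$@"; do
        # Reading from stdin makes wc print only the number; $(( )) also
        # strips the padding some wc implementations add.
        count_in_file=$(( $(wc -l < "$file") ))
        echo "$count_in_file lines in $file"
        count_overall_before=$((count_overall_before + count_in_file))
    done
    echo "Lines count before: $count_overall_before"

    sort -u "$@" > "$file_out"
    count_overall_after=$(( $(wc -l < "$file_out") ))

    echo "Lines count after: $count_overall_after"
    echo "Duplicate lines count: $((count_overall_before - count_overall_after))"
    echo "Seconds estimated: $(( $(date +%s) - start_time ))"
}

# Run only when file arguments were given, e.g.
#   ./merge_dicts.sh PsycOPacK/Base\ Dictionnaries/*
if [ "$#" -gt 0 ]; then
    merge_dicts "res.dict" "$@"
fi
```

Every expansion of a file name is quoted, so paths containing spaces reach wc and sort as single arguments.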
trosos
  • Putting quotes around the variable works! $file -> "$file". And yes, I made a few typos while trying other solutions... I've edited them in the question. Thanks a lot! – 555Russich Feb 13 '23 at 22:14