I wrote a simple bash script that merges n dicts:
#! /bin/bash
file_out="res.dict"
start_time=$(date +%s)
count_overall_before=0
while read -rep file in "$@"
do
count_in_file=$(wc -l $file | grep -o -P "^\d+")
echo "$count_in_file lines in $file"
let count_overall_before+=count_in_file
done
echo "Lines count before: $count_overall_before"
sort -u "$@" > $file_out
count_overall_after=$(wc -l $file_out | grep -o -P "^\d+")
time_in_seconds=$(($(date +%s)-start_time))
echo "Lines count after: $count_overall_after"
echo "Duplicate lines count: $((count_overall_before-count_overall_after))"
echo "Seconds estimated: $time_in_seconds"
I have a folder with dicts:
$ ls -l PsycOPacK/Base\ Dictionnaries/
total 8260
-rw-r--r-- 1 russich555 russich555 61288 авг 14 2011 actor-givenname
-rw-r--r-- 1 russich555 russich555 223347 авг 14 2011 actor-names
-rw-r--r-- 1 russich555 russich555 146891 авг 14 2011 actor-surname
-rw-r--r-- 1 russich555 russich555 593687 авг 14 2011 att800
-rw-r--r-- 1 russich555 russich555 152 авг 14 2011 computer-companies
-rw-r--r-- 1 russich555 russich555 26211 авг 14 2011 ComputerSoftandHardwareBrand
-rw-r--r-- 1 russich555 russich555 2409 авг 14 2011 dogs
-rw-r--r-- 1 russich555 russich555 18866 авг 14 2011 drugs
-rw-r--r-- 1 russich555 russich555 755065 авг 14 2011 english
-rw-r--r-- 1 russich555 russich555 311251 авг 14 2011 etc_hosts
-rw-r--r-- 1 russich555 russich555 106743 авг 14 2011 family-names
-rw-r--r-- 1 russich555 russich555 4518 авг 14 2011 famous_people
-rw-r--r-- 1 russich555 russich555 70852 авг 14 2011 female-names
-rw-r--r-- 1 russich555 russich555 3187771 авг 14 2011 french
-rw-r--r-- 1 russich555 russich555 11332 авг 14 2011 french-names
-rw-r--r-- 1 russich555 russich555 78190 авг 14 2011 given-names
-rw-r--r-- 1 russich555 russich555 3077 авг 14 2011 internet-domains
-rw-r--r-- 1 russich555 russich555 87507 авг 14 2011 jargon
-rw-r--r-- 1 russich555 russich555 73635 авг 14 2011 junk
-rw-r--r-- 1 russich555 russich555 353216 авг 14 2011 machine_names
-rw-r--r-- 1 russich555 russich555 46923 авг 14 2011 male-names
-rw-r--r-- 1 russich555 russich555 199597 авг 14 2011 movie-characters
-rw-r--r-- 1 russich555 russich555 296977 авг 14 2011 movie-general
-rw-r--r-- 1 russich555 russich555 101854 авг 14 2011 music-rock
-rw-r--r-- 1 russich555 russich555 11401 авг 14 2011 myths-legends
-rw-r--r-- 1 russich555 russich555 6885 авг 14 2011 places
-rw-r--r-- 1 russich555 russich555 176855 авг 14 2011 pocket-dic
-rw-r--r-- 1 russich555 russich555 101854 авг 14 2011 rock
-rw-r--r-- 1 russich555 russich555 167425 авг 14 2011 rock-groups
-rw-r--r-- 1 russich555 russich555 7350 авг 14 2011 science_fiction
-rw-r--r-- 1 russich555 russich555 14804 авг 14 2011 shows
-rw-r--r-- 1 russich555 russich555 9355 авг 14 2011 special_english
-rw-r--r-- 1 russich555 russich555 3946 авг 14 2011 sports
-rw-r--r-- 1 russich555 russich555 1677 авг 16 2011 SportsTeamsCaps.txt
-rw-r--r-- 1 russich555 russich555 1677 авг 16 2011 SportsTeamsLower.txt
-rw-r--r-- 1 russich555 russich555 1677 авг 16 2011 SportsTeamsUpper.txt
-rw-r--r-- 1 russich555 russich555 91426 авг 14 2011 surnames
-rw-r--r-- 1 russich555 russich555 1677 авг 16 2011 teams
-rw-r--r-- 1 russich555 russich555 58000 авг 14 2011 tech
-rw-r--r-- 1 russich555 russich555 63178 авг 14 2011 technical_dictionary
-rw-r--r-- 1 russich555 russich555 12598 авг 14 2011 unix
-rw-r--r-- 1 russich555 russich555 206403 авг 14 2011 Unix.dict
-rw-r--r-- 1 russich555 russich555 206403 авг 14 2011 unix-words
-rw-r--r-- 1 russich555 russich555 13422 авг 14 2011 us-counties
-rw-r--r-- 1 russich555 russich555 222742 авг 14 2011 words-english
-rw-r--r-- 1 russich555 russich555 222349 авг 14 2011 world_factbook
As you can see, there is a backslash \ in the directory path to escape the space that follows it. My goal is to be able to pass several dict paths as arguments, like this:
$ ./scripts/merge_dicts.sh dict_1 dict_2
It should also work this way (expanding a directory with a glob):
./scripts/merge_dicts.sh dicts_folder/*
But I can't run it with * at the end if the path contains a space:
$ ./scripts/merge_dicts.sh PsycOPacK/Base\ Dictionnaries/*
wc: PsycOPacK/Base: No such file or directory
wc: Dictionnaries/actor-givenname: No such file or directory
0 lines in PsycOPacK/Base Dictionnaries/actor-givenname
wc: PsycOPacK/Base: No such file or directory
I read a few topics about it but didn't find the right solution. I'm not sure whether it's possible to read all args as one line.
I also tried:
for file in "$@"
while file line in "$@"
while file -r line in "$@"
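For comparison, here is a minimal sketch of a version with every expansion quoted. It assumes the failure comes from word splitting of the unquoted `$file` and `$file_out` inside the script, not from the glob itself: the shell already expands `Base\ Dictionnaries/*` into one argument per file, and `"$@"` keeps each of those arguments intact. The function names `count_lines` and `merge_dicts` are hypothetical, introduced only for illustration.

```shell
#!/bin/bash

# Hypothetical helper: print the line count of each argument and the total.
count_lines() {
    local total=0 file n
    for file in "$@"; do          # "$@" yields one word per argument, spaces included
        n=$(wc -l < "$file")      # quoted path; reading from stdin makes wc print only the number
        n=$((n))                  # drop any whitespace padding some wc versions add
        echo "$n lines in $file"
        total=$((total + n))
    done
    echo "Lines count before: $total"
}

# Hypothetical helper: merge all input files into the first argument, dropping duplicates.
merge_dicts() {
    local file_out=$1
    shift
    sort -u -- "$@" > "$file_out" # both the inputs and the output path are quoted
}

# Possible usage (paths are examples):
#   count_lines PsycOPacK/Base\ Dictionnaries/*
#   merge_dicts res.dict PsycOPacK/Base\ Dictionnaries/*
```

With this shape there is no need to read all arguments as one line; a plain `for file in "$@"` loop is enough as long as `"$file"` stays quoted wherever it is used.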