_linc() ( ${sh-da}sh ${dbg+-vx} 4<&0 <&3 ) 3<<-ARGS 3<<\CMD
set -- $( [ $((i=${1%%*[!0-9]*}-1)) -gt 1 ] && {
shift && echo "\${inc=$i}" ; }
unset cmd ; [ $# -gt 0 ] || cmd='echo incr "#$((i=i+1))" ; cat'
printf '%s ' 'me=$$ ;' \
'_cmd() {' '${dbg+set -vx ;}' "$@" "$cmd" '
}' )
ARGS
s= ; sed -f - <<-INC /dev/fd/4 | . /dev/stdin
i_cmd <<"${s:=${me}SPLIT${me}}"
${inc:+$(printf '$!n\n%.0b' `seq $inc`)}
a$s
INC
CMD
The above function uses sed
to apply its argument list as a command string to an arbitrary line increment. The commands you specify on the command line are sourced into a temporary shell function which is fed a here document on stdin consisting of every increment's step worth of lines.
You use it like this:
time printf 'this is line #%d\n' `seq 1000` |
_linc 193 sed -e \$= -e r \- \| tail -n2
#output
193
this is line #193
193
this is line #386
193
this is line #579
193
this is line #772
193
this is line #965
35
this is line #1000
printf 'this is line #%d\n' `seq 1000` 0.00s user 0.00s system 0% cpu 0.004 total
The mechanism here is very simple:
i_cmd <<"${s:=${me}SPLIT${me}}"
${inc:+$(printf '$!n\n%.0b' `seq $inc`)}
a$s
That's the sed
script. Basically we just printf $increment * n;
. So if you set your increment to 100 printf
will write you a sed
script consisting of 100 lines that say only $!n
, one insert
line for the top end of the here-doc, and one append
for the bottom line - that's it. Most of the rest just handles options.
The n
ext command tells sed
to print the current line, delete it, and pull in the next one. The $!
specifies that it should only try on any line but the last.
Provided only an incrementer it will:
printf 'this is line #%d\n' `seq 10` | ⏎
_linc 3
#output
incr #1
this is line #1
this is line #2
this is line #3
incr #2
this is line #4
this is line #5
this is line #6
incr #3
this is line #7
this is line #8
this is line #9
incr #4
this is line #10
So what's happening behind the scenes here is the function is set to echo
a counter and cat
its input if not provided a command string. If you saw it on the command line it would look like:
{ echo "incr #$((i=i+1))" ; cat ; } <<HEREDOC
this is line #7
this is line #8
this is line #9
HEREDOC
It executes one of these for every increment. Look:
printf 'this is line #%d\n' `seq 10` |
dbg= _linc 3
#output
set -- ${inc=2}
+ set -- 2
me=$$ ; _cmd() { ${dbg+set -vx ;} echo incr "#$((i=i+1))" ; cat
}
+ me=19396
s= ; sed -f - <<-INC /dev/fd/4 | . /dev/stdin
i_cmd <<"${s:=${me}SPLIT${me}}"
${inc:+$(printf '$!n\n%.0b' `seq $inc`)}
a$s
INC
+ s=
+ . /dev/stdin
+ seq 2
+ printf $!n\n%.0b 1 2
+ sed -f - /dev/fd/4
_cmd <<"19396SPLIT19396"
this is line #1
this is line #2
this is line #3
19396SPLIT19396
+ _cmd
+ set -vx ; echo incr #1
+ cat
this is line #1
this is line #2
this is line #3
_cmd <<"19396SPLIT19396"
REALLY FAST
time yes | sed = | sed -n 'p;n' |
_linc 4000 'printf "current line and char count\n"
sed "1w /dev/fd/2" | wc -c
[ $((i=i+1)) -ge 5000 ] && kill "$me" || echo "$i"'
#OUTPUT
current line and char count
19992001
36000
4999
current line and char count
19996001
36000
current line and char count
[2] 17113 terminated yes |
17114 terminated sed = |
17115 terminated sed -n 'p;n'
yes 0.86s user 0.06s system 5% cpu 16.994 total
sed = 9.06s user 0.30s system 55% cpu 16.993 total
sed -n 'p;n' 7.68s user 0.38s system 47% cpu 16.992 total
Above I tell it to increment on every 4000 lines. 17s later and I've processed 20
million lines. Of course the logic isn't serious there - we only read each line twice and count all of their characters, but the possibilities are pretty open. Also if you look closely you might notice it's seemingly the filters providing the input that are taking the majority of the time anyway.
BULK INSERT
? Separate the data from the SQL statement. – bsd Apr 20 '14 at 11:47