See very end for end result.
for i in {1..1000000..1}
do
echo "$i,$(date -d "2017-08-01 + $(shuf -i 1-31 -n 1) days" +'%Y-%m-%d')" >> $F
done;
Shell loops are slow, and there are two main things that make this particular loop extra slow:
- Opening and appending to a file in each iteration.
- Two executions of external utilities (shuf and date) in each iteration. The echo is likely built into the shell, so it incurs less overhead.
The output redirection is most easily remedied:
for i in {1..1000000..1}
do
echo "$i,$(date -d "2017-08-01 + $(shuf -i 1-31 -n 1) days" +'%Y-%m-%d')"
done >"$F"
This opens the output file only once and keeps it open for the duration of the loop.
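As a small sketch of the same idea, the two redirection styles can be compared on a cheap loop with no external utilities. The scratch files created with mktemp are hypothetical names used only for this comparison, and the loop is written as a POSIX while loop so that it does not depend on brace expansion:

```shell
# Hypothetical scratch files, used only for this comparison.
slow=$(mktemp)
fast=$(mktemp)

# Redirection inside the loop: the file is opened and closed on every iteration.
i=1
while [ "$i" -le 1000 ]; do
    echo "$i" >>"$slow"
    i=$((i + 1))
done

# Redirection on the loop itself: the file is opened once.
i=1
while [ "$i" -le 1000 ]; do
    echo "$i"
    i=$((i + 1))
done >"$fast"

# Both variants write the same data; only the number of opens differs.
cmp -s "$slow" "$fast" && echo "identical output"
```

Prefixing each loop with time should show the difference, although with a loop this cheap it may be small.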
The rest of the code can be done more efficiently with awk and GNU date (since you're using shuf, I presume that you are on a Linux system, which means it's pretty likely that date is in fact GNU date).
awk 'END { for (i=0;i<100;++i) { printf("2017-08-01 + %d days\n", 1+int(31*rand())) }}' /dev/null
This thing generates 100 lines like
2017-08-01 + 22 days
2017-08-01 + 31 days
2017-08-01 + 11 days
2017-08-01 + 27 days
2017-08-01 + 27 days
2017-08-01 + 20 days
(etc.)
Let's feed these into GNU date. GNU date has a flag, -f, that lets us feed it multiple date specifications in one batch, for example those output by our awk program:
awk 'END { for (i=0;i<100;++i) { printf("2017-08-01 + %d days\n", 1+int(31*rand())) }}' /dev/null |
date -f - +'%Y-%m-%d'
Now we get
2017-08-23
2017-08-27
2017-08-21
2017-08-29
2017-08-25
2017-08-17
2017-08-07
(etc.)
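To see what date -f does without any randomness involved, here is a minimal sketch with two fixed specifications, the smallest and largest offsets used above. It assumes GNU date; the variable name batch is only for illustration:

```shell
# Two fixed date specifications, one per line, handled by a single date process.
# Assumes GNU date (BSD date uses -f for something else).
batch=$(printf '%s\n' '2017-08-01 + 1 days' '2017-08-01 + 31 days' |
    date -f - +'%Y-%m-%d')
printf '%s\n' "$batch"
```

This should print 2017-08-02 and 2017-09-01, so offsets of 1 to 31 days from 1 August cover 2 August through 1 September rather than the calendar month.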
Then it's just a matter of adding the unique ID (a sequential integer) to each line:
awk 'END { for (i=0;i<100;++i) { printf("2017-08-01 + %d days\n", 1+int(31*rand())) }}' /dev/null |
date -f - +'%Y-%m-%d' |
awk -v OFS=',' '{ print NR, $0 }'
This gives you
1,2017-08-06
2,2017-08-17
3,2017-08-25
4,2017-08-28
5,2017-08-14
6,2017-08-15
7,2017-08-17
8,2017-08-10
9,2017-08-16
10,2017-08-08
(etc.)
And now we're done. And in the process, I totally forgot we had a shell loop. Turns out it's not needed.
Just crank up the 100 to whatever value you want, and adjust the random number generator to fit your needs. rand() returns a floating point value such that 0 <= number < 1.
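One adjustment that may be needed: without a call to srand(), several awk implementations (GNU awk among them) start rand() from the same seed on every run, so the "random" dates repeat. A minimal sketch of seeding it explicitly follows; the variable names seed and n are assumptions made for this example, and it uses BEGIN so that no input file is needed:

```shell
# "seed" and "n" are hypothetical variable names chosen for this sketch.
# Seeding from the clock has one-second resolution, so two runs within the
# same second still produce the same sequence.
offsets=$(awk -v seed="$(date +%s)" -v n=5 'BEGIN {
    srand(seed)
    for (i = 0; i < n; ++i)
        printf("2017-08-01 + %d days\n", 1 + int(31 * rand()))
}')
printf '%s\n' "$offsets"
```

Passing a fixed seed instead makes a run reproducible, which can be handy when testing.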
Obviously, if you just want random dates in August (a month with 31 days), you may bypass date altogether:
awk 'END { for (i=1;i<=100;++i) { printf("%d,2017-08-%02d\n", i, 1+int(31*rand())) }}' /dev/null
With GNU awk and Mike's awk (mawk), but not with BSD awk, you may even do proper date handling directly in awk:
awk 'END { for (i=1;i<=100;++i) { printf("%d,%s\n", i, strftime("%Y-%m-%d", 1501545600 + int(2678400*rand()),1 )) }}' /dev/null
Now we're dealing with Unix timestamps rather than with days though. 1501545600 corresponds to "Tue Aug 1 00:00:00 UTC 2017" and there are 2678400 seconds in 31 days.
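The two constants can be checked from the shell. This is a quick sketch assuming GNU date, where -d @N reads N as seconds since the epoch; start, span and last are names made up for the example:

```shell
# Assumes GNU date: -d @N interprets N as seconds since the epoch.
start=$(date -u -d @1501545600 +'%Y-%m-%d %H:%M:%S')

# Number of seconds in 31 days.
span=$((31 * 24 * 60 * 60))

# Largest timestamp the awk expression can produce (rand() is always < 1).
last=$(date -u -d "@$((1501545600 + span - 1))" +'%Y-%m-%d %H:%M:%S')

printf '%s\n%s\n%s\n' "$start" "$span" "$last"
```

This should print 2017-08-01 00:00:00, 2678400 and 2017-08-31 23:59:59, confirming that the generated timestamps stay within August when formatted in UTC.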