I have a file containing about 20 million sentences. How can I extract 2 million sentences from it? I thought about using the split command, like this: "split -l 2000000 sub2016", but that creates a series of files, while I need just one. How can I specify that? Thank you!
1 Answer
If you want the first two million lines:
head -n 2000000 sub2016
If you want a block of two million lines taken at random within the file:
tail -n +$((RANDOM * RANDOM % 18000000)) sub2016 | head -n 2000000
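This picks two random numbers between 0 and 32767, multiplies them, and reduces the product modulo 18 million (20 million minus 2 million). tail -n +N then starts its output at line N of the file, and head keeps the next two million lines. Note that $RANDOM is a shell feature (bash, ksh, zsh) rather than POSIX, and that the resulting offset is not uniformly distributed, which is usually fine when any arbitrary block will do.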
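If you would rather have a uniformly chosen starting line and not hard-code the 18 million, something along these lines should work. It is only a sketch: random_block is a made-up helper name, and it assumes GNU coreutils for shuf.

```shell
# random_block FILE COUNT
# Hypothetical helper (not from the answer above): prints COUNT consecutive
# lines of FILE, starting at a uniformly chosen line. Assumes GNU shuf.
random_block() {
    file=$1
    count=$2
    # Total number of lines in the file.
    total=$(wc -l < "$file")
    # Last starting line that still leaves room for COUNT lines.
    max_start=$((total - count + 1))
    if [ "$max_start" -lt 1 ]; then
        echo "random_block: $file has fewer than $count lines" >&2
        return 1
    fi
    # shuf -i LO-HI -n 1 prints one integer picked uniformly from LO..HI.
    start=$(shuf -i "1-$max_start" -n 1)
    # tail -n +N starts output at line N; head keeps the next COUNT lines.
    tail -n "+$start" "$file" | head -n "$count"
}
```

For the file in the question that would be called as: random_block sub2016 2000000 > sample.txt. It reads the file once more than the one-liner does (for wc -l), which is the price of not hard-coding the line count.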

Stephen Kitt
Thank you! I just need the first two million lines, and it worked. By the way, if I want to extract 2 million words instead, how should I specify that? – Fangting Xu Jan 14 '16 at 13:40