I have a file containing about 20 million sentences. How can I extract 2 million sentences from it? I thought about using the split command, as in "split -l 2000000 sub2016", but that creates a series of files, while I need just one. How can I specify that? Thank you!

Fangting Xu

1 Answer

If you want the first two million lines:

head -n 2000000 sub2016
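Since the goal is a single output file rather than the series that split produces, the output of head can simply be redirected. A minimal sketch, assuming a POSIX shell; the function name first_lines and the output file name sub2016.first2M are illustrative choices, not part of the question:

```shell
# first_lines N INPUT OUTPUT: write the first N lines of INPUT to OUTPUT.
# The function name and argument order are hypothetical, not a standard tool.
first_lines() {
    head -n "$1" "$2" > "$3"
}

# For the file in the question (the output name is an assumption):
# first_lines 2000000 sub2016 sub2016.first2M
```

head stops reading once it has printed the requested lines, so the rest of the 20 million line file is never processed.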

If you want a block of two million lines taken at random within the file:

tail -n +$((RANDOM * RANDOM % 18000000 + 1)) sub2016 | head -n 2000000

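This picks two random numbers between 0 and 32767, multiplies them, and reduces the product modulo 18 million (20 million minus 2 million) to get an offset into the file; it then outputs two million lines starting from that point. Note that tail -n +K starts output at line K, and that a product of two random numbers is not uniformly distributed, so some offsets are more likely than others.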
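If the starting line should be chosen uniformly, and the line count should be measured rather than hard-coded, the same tail and head pipeline can be wrapped as follows. This is a sketch under assumptions: it swaps $RANDOM for GNU shuf (from coreutils) to pick the start, and the function name random_block is hypothetical.

```shell
# random_block COUNT FILE: print COUNT consecutive lines of FILE, starting
# at a uniformly chosen line. Hypothetical helper; requires GNU shuf.
random_block() {
    count=$1
    file=$2
    total=$(wc -l < "$file")
    # Highest starting line that still leaves COUNT lines to print.
    last_start=$((total - count + 1))
    if [ "$last_start" -lt 1 ]; then
        echo "random_block: $file has fewer than $count lines" >&2
        return 1
    fi
    # shuf -i LO-HI -n 1 prints one integer drawn uniformly from the range.
    start=$(shuf -i "1-$last_start" -n 1)
    # tail -n +K starts output at line K, i.e. it skips K-1 lines.
    tail -n "+$start" "$file" | head -n "$count"
}

# For the file in the question (the output name is an assumption):
# random_block 2000000 sub2016 > sub2016.block
```

Counting the lines with wc costs one extra pass over the file, which is the price of not assuming the 20 million figure.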

Stephen Kitt
  • Thank you! I just need the first two million lines, and it worked. By the way, if I want to extract 2 million words instead, how should I specify that? – Fangting Xu Jan 14 '16 at 13:40