27

I found a way in Windows to do such a thing:

echo "This is just a sample line appended  to create a big file. " > dummy.txt
for /L %i in (1,1,21) do type dummy.txt >> dummy.txt

http://www.windows-commandline.com/how-to-create-large-dummy-file/

Is there a way in UNIX to copy a file, append to it, and then repeat the process? Something like for .. cat file1.txt > file1.txt?

Rui F Ribeiro
  • Why copy the file and append instead of just appending ? – 123 Mar 11 '16 at 14:56
  • @123 append is good, but how to do the loop? – Thomas Lee Mar 11 '16 at 14:59
  • 4
    for i in {1..1000000};do echo "string" >> file;done in bash. – 123 Mar 11 '16 at 15:00
  • 10
    Does it have to be a text file? You can make any size of file from /dev/zero or /dev/urandom. – RealSkeptic Mar 11 '16 at 15:06
  • 2
    I'd expect type file >> file to run in an infinite loop (at least as soon as it's sufficiently large that it doesn't fit in a buffer). – Stéphane Chazelas Mar 11 '16 at 15:14
  • @StéphaneChazelas What, why would that be an infinite loop? type is going to load the contents of file into memory, then open file for writing and append the loaded contents, then exit. – cat Mar 11 '16 at 22:41
  • @tac, you mean if the file is 5TB large, type is going to load it whole in memory before display? After all it's Microsoft, why would I be surprised... – Stéphane Chazelas Mar 12 '16 at 07:07
  • @StéphaneChazelas I'm pretty sure that type would just run out of memory, because NTFS and the NT kernel lock any file that's opened for reading / writing, unlike POSIX where there are no implicit locks. – cat Mar 12 '16 at 13:41
  • @StéphaneChazelas I'm also pretty certain there aren't any seek-esque syscalls in NT, so you have no choice but to malloc the length of the file at once. – cat Mar 12 '16 at 13:44

7 Answers

44
yes "Some text" | head -n 100000 > large-file

With csh/tcsh:

repeat 10000 echo some test > large-file

With zsh:

{repeat 10000 echo some test} > large-file

On GNU systems, see also:

seq 100000 > large-file

Or:

truncate -s 10T large-file

(creates a 10 TiB sparse file: very large, but not taking any space on disk). See also the other alternatives discussed at "Create a test file with lots of zero bytes".
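
To confirm the file really is sparse, you can compare its apparent size with its disk usage (a quick check; the exact du figure may vary slightly by filesystem):

ls -lh large-file   # apparent size: 10T
du -h large-file    # disk usage: typically 0, as no data blocks are allocated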


Doing cat file >> file would be a bad idea.

First, it doesn't work with some cat implementations that refuse to read files that are the same as their output file. But even if you work around it by doing cat file | cat >> file, if file is larger than cat's internal buffer, that would cause cat to run in an infinite loop as it would end up reading the data that it has written earlier.

On file systems backed by a rotational hard drive, it would be pretty inefficient as well (once the file grows larger than what can be cached in memory), as the drive would need to seek back and forth between the location the data is read from and the location it is written to.
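
If you do want that self-doubling behaviour in a shell, a safer pattern is to have cat read and write different files so it never consumes its own output; a minimal sketch, assuming GNU stat, stopping once the file exceeds roughly 1 MiB:

echo "some text" > file
while [ "$(stat -c %s file)" -lt 1048576 ]; do
  cat file file > file.tmp && mv file.tmp file   # double the file via a temporary copy
done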

  • 20
    Or dd if=/dev/zero of=large-file bs=1024 count=1024 for a 1MB file – doneal24 Mar 11 '16 at 15:13
  • 7
    @DougO'Neal I find dd if=/dev/zero of=test bs=1M count=1 to be clearer. – 123 Mar 11 '16 at 15:27
  • 1
    Or use /dev/urandom instead of /dev/zero if you want random data. – user253751 Mar 12 '16 at 01:06
  • 1
    I appreciate Stéphane's answers but I think of Doug's as the typical one. – roberto tomás Mar 12 '16 at 01:50
  • @robertotomás, again, if it's to create a large file filled with zeroes, see Create a test file with lots of zero bytes linked in the answer, where dd if=/dev/zero is the least efficient. – Stéphane Chazelas Mar 12 '16 at 07:10
  • Hi @StéphaneChazelas, I'm a little confused. Why did you write back to tell me that? The answer you've already written has several inefficient solutions at the top. I actually thought we were in agreement that efficiency doesn't matter for such small "large files". My comment was just pointing out that one of the answers in this discussion is in my opinion the typical lazy solution. – roberto tomás Mar 12 '16 at 13:22
  • 1
    @robertotomás, why would it be typical or lazy? It's longer to type than the truncate or fallocate or dd seek=xxx ones, so less lazy in terms of human effort and also a lot less lazy in terms of computer effort. – Stéphane Chazelas Mar 12 '16 at 13:47
  • as you can see in the comment above, and other answers below, it is in fact a common answer. everyone uses dd, and it is an important tool to know if you find yourself doing cli admin tasks. /dev/zero might be relatively slow but you can always use a form like yes|dd of=... if you prefer. answers that do not use dd don't reinforce the memory of essential admin tools. — again, I want to reiterate, and this is visible in my initial comment, I don't think you are wrong, neither in your answers nor in the comment that dd if=/dev/zero can be slow. I was just talking about what is common – roberto tomás Mar 12 '16 at 15:03
  • 3
    @robertotomás yes, everyone uses dd, but I have never understood why. In fact, I think I've only ever used it to read an MBR or similar fringe tasks. In my experience, other tools are faster, simpler and safer for the vast majority of cases where people use dd. I think this is one of those cases where common != optimal, like sudo su or cat file | grep foo. – terdon Mar 12 '16 at 17:48
  • @terdon the reason people use dd for disk images is because they've seen it used in tutorials and such and assume there's some reason behind it. The original reason for using it was a bug in an early version of GNU cp (which could otherwise be used) that caused it to screw up in writing all-zero blocks. I've written about this: http://unix.stackexchange.com/a/189091/6290 – Random832 Mar 13 '16 at 05:17
  • The proper non-dd way of doing this specific task would be head -c 1048576 /dev/zero > file – Random832 Mar 13 '16 at 05:19
  • @terdon It's mostly fs-agnostic and always works. But it's more sane to use it when you really want to copy byte-by-byte images (ISOs, for example). It's fast if you give proper values for bs=. – ljrk Mar 13 '16 at 19:05
22

You can create a large file on Solaris using:

mkfile 10g /path/to/file

Another way which works on Solaris (and Linux):

truncate -s 10g /path/to/file

It is also possible to use:

dd if=/dev/zero of=/path/to/file bs=1048576 count=10240
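
For reference, that dd invocation writes 10240 blocks of 1 MiB each, i.e. 10 GiB of zeros; with GNU dd the block size can equivalently be given with a suffix:

dd if=/dev/zero of=/path/to/file bs=1M count=10240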
Lambert
14

The fastest way possible to create a big file in a Linux system is fallocate:

sudo fallocate -l 2G bigfile

fallocate manipulates the file system, and by default does not actually write to the data sectors, and as such is extremely fast. The downside is that it has to be run as root.

Running it successively in a loop, you can fill the biggest of filesystems in a matter of seconds.

From man fallocate

fallocate is used to manipulate the allocated disk space for a file, either to deallocate or preallocate it.
For filesystems which support the fallocate system call, preallocation is done quickly by allocating blocks and marking them as uninitialized, requiring no IO to the data blocks. This is much faster than creating a file by filling it with zeros.
Supported for XFS (since Linux 2.6.38), ext4 (since Linux 3.0), Btrfs (since Linux 3.7) and tmpfs (since Linux 3.5).
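
To see the effect, compare the apparent size with the allocated size (illustrative output; figures depend on the filesystem):

ls -lh bigfile   # apparent size: 2.0G
du -h bigfile    # allocated size: also about 2.0G, since real blocks are reserved, unlike a sparse file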

Rui F Ribeiro
10

This will keep going until you CTRL-C:

yes This is stuff that I want to put into my file... >> dummy.txt

Be careful though, because you can get hundreds of thousands of lines per second...

From man yes:

yes - output a string repeatedly until killed
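
If you would rather not have to interrupt it by hand, one option is to bound the run time instead; a small sketch, assuming GNU coreutils' timeout is available:

timeout 5 yes "This is stuff that I want to put into my file..." >> dummy.txt   # stops after 5 seconds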
Questionmark
  • This is a very easy method to create a big file in a Linux environment. – Chaminda Bandara Mar 16 '19 at 09:19
  • 1
    yes $BIG_STRING | head -c $TARGET_SIZE >> dummy.txt would let you get a precise amount (or use -n $TARGET_NUMBER_OF_LINES for an exact line count). yes dies automatically from a 'broken pipe' as soon as head terminates because the target has been reached. – PypeBros Jul 18 '19 at 07:46
5

If I understand you correctly, you are looking for something like:

echo "test line" > file;
for i in {1..21}; do echo "test line" >> file; done

That will create a file with 22 repetitions of "test line". If you want a specific file size, you can use something like this (on Linux). 1024 is one kilobyte:

while [ $(stat -c "%s" file) -le 1024 ]; do echo "test line" >> file; done

Personally, when I want to create a large file, I use two files and cat one into the other. You can repeat the process until you reach the desired size (1MB here):

echo "test line" > file;
while [ $(stat -c "%s" file) -le 1048576 ]; do 
    cat file >> newfile
    cat newfile >> file
done

Note that this solution will often exceed the desired size because if the file is under the limit, everything will be catted into it again.
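
If that overshoot matters, the result can be trimmed back to the exact target afterwards; a one-line sketch, assuming GNU truncate (also mentioned below):

truncate -s 1M file   # discards anything past 1 MiB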

Finally, if all you want is a file of the desired size and don't need it to actually contain anything, you can use truncate:

truncate -s 1M file
terdon
  • 1
    Does cat-ing the file actually have any advantage over just appending, though? It would seem as though it would take longer, as it has to fork two processes every loop and also move the entire contents multiple times. – 123 Mar 11 '16 at 15:14
  • 1
    @123 speed. The cat approach is much, much faster. It only makes sense for creating huge files but that created a 545M file in 10 seconds on my machine. The same while loop with echo "test line" >> file created a 96K file in the same amount of time. – terdon Mar 11 '16 at 17:15
  • I guess the thing with the "cat" approach is that it grows exponentially. On starting the second iteration, 'newfile' already has 1 line and 'file' has 2, and when it is done, 'newfile' is now 3 lines and 'file' is 5. Next, 'newfile' will be 8 and 'file' will be 13. Next (21, 34), etc. – PypeBros Jul 18 '19 at 07:35
  • downside: it may take more disk space (>= 1.5 * desired_size) than target file size while it is creating the file. – PypeBros Jul 18 '19 at 07:41
  • btw. If you have truncate around, you can truncate -s 1G to create the file in the first place. https://unix.stackexchange.com/a/269184/85549. You could replace it by a head -c $DESIRED_SIZE, possibly within the while loop. – PypeBros Jul 18 '19 at 07:43
4

By piping the contents of /dev/urandom to head, you can redirect the output to a file:

 cat /dev/urandom | head --bytes=100 >> foo.bar

This will give you a file with 100 bytes of garbage.
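
The same result can be had without the extra cat process, since head can read the device directly (a sketch; in GNU head, -c is the short form of --bytes):

head -c 100 /dev/urandom > foo.bar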

spender
1
echo "This is just a sample line appended  to create a big file. " > dummy.txt
i=1
while [ $i -le 21 ]
do
  cat dummy.txt >> bigfile
  cat bigfile > dummy.txt
  (( i++ ))
done

This has the same effect as your Windows script, but in bash; you cannot concatenate a file to itself directly.
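
To check the result of the loop above (a rough check; with the 21 iterations shown, each file should end up with 2^20 = 1048576 copies of the line):

wc -l dummy.txt bigfile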

ott--
MelBurslan