Why are some files in Linux binary? For example, the /var/log/wtmp log: more precisely, why is that log stored in binary form?
The `wtmp` (and `utmp`) files date back to the 1970s, and the designers did not give many reasons for the format. What you can see is that `utmp` and `wtmp` record accounting information using fixed-length records. A text log file would have used more disk space and taken more time to format each message than simply writing a binary record.
Further reading (the Unix 6th edition manual pages):
Also the 1st edition (no `wtmp` there):

There are some benefits to fixed-size records. For example, the historical structure of a utmp/wtmp record was:
```c
struct utmp {
    char ut_line[8];   /* tty name */
    char ut_name[8];   /* user id */
    char ut_host[16];  /* host name, if remote */
    long ut_time;      /* time on */
};
```
(That was from a SunOS 4 machine; I picked it because it's an easy one to read.)
Fixed-size records make it easy to append data to the file and to read it back in reverse order, simply by stepping backwards `sizeof(struct utmp)` bytes at a time. That makes it easy for programs like `last` to report the most recent entries first.
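A minimal sketch of the append-and-read-backwards pattern in standard C. `struct old_utmp`, `append_record` and `read_from_end` are hypothetical names; the struct only mirrors the SunOS 4 layout above and is not the system's real `struct utmp`. Because every record is the same size, record *n* counted from the end sits exactly `(n + 1) * sizeof(struct old_utmp)` bytes before the end of the last complete record, so a single `fseek` reaches it without reading anything in between.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record modelled on the SunOS 4 layout; renamed so it does
 * not clash with the system's own struct utmp. */
struct old_utmp {
    char ut_line[8];   /* tty name */
    char ut_name[8];   /* user id */
    char ut_host[16];  /* host name, if remote */
    long ut_time;      /* time on */
};

/* Append one record: a single fixed-size write at the end of the file.
 * Fields are zero-padded and, as in the old format, need not be
 * NUL-terminated when the string fills the whole field. */
int append_record(FILE *fp, const char *line, const char *name,
                  const char *host, long when)
{
    struct old_utmp rec;

    memset(&rec, 0, sizeof rec);
    strncpy(rec.ut_line, line, sizeof rec.ut_line);
    strncpy(rec.ut_name, name, sizeof rec.ut_name);
    strncpy(rec.ut_host, host, sizeof rec.ut_host);
    rec.ut_time = when;

    if (fseek(fp, 0, SEEK_END) != 0)
        return -1;
    return fwrite(&rec, sizeof rec, 1, fp) == 1 ? 0 : -1;
}

/* Read the record n places from the end (n = 0 is the newest).
 * Returns 0 on success, -1 if n is out of range or I/O fails. */
int read_from_end(FILE *fp, long n, struct old_utmp *out)
{
    long size, offset;

    if (n < 0 || fseek(fp, 0, SEEK_END) != 0)
        return -1;
    size = ftell(fp);
    if (size < 0)
        return -1;
    size -= size % (long)sizeof *out;        /* ignore a trailing partial record */
    offset = size - (n + 1) * (long)sizeof *out;
    if (offset < 0)
        return -1;
    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof *out, 1, fp) == 1 ? 0 : -1;
}
```

Calling `read_from_end(fp, 0, &rec)`, then with `1`, `2`, … walks the log newest-first until it returns -1.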
The exact data structure has changed over time, but the record is still fixed in size. For example, a current FreeBSD machine has:
```c
struct utmpx {
    short          ut_type;        /* Type of entry. */
    struct timeval ut_tv;          /* Time entry was made. */
    char           ut_id[8];       /* Record identifier. */
    pid_t          ut_pid;         /* Process ID. */
    char           ut_user[32];    /* User login name. */
    char           ut_line[16];    /* Device name. */
#if __BSD_VISIBLE
    char           ut_host[128];   /* Remote hostname. */
#else
    char           __ut_host[128];
#endif
    char           __ut_spare[64];
};
```
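To read these records you normally don't need the exact layout: POSIX specifies `setutxent()`, `getutxent()` and `endutxent()` in `<utmpx.h>`, which return one `struct utmpx` at a time (on glibc they read the current-login `utmp` file by default, not `wtmp`). A minimal sketch, assuming a POSIX system; `list_logged_in` is a hypothetical helper. Fields such as `ut_user` are fixed-width arrays that need not be NUL-terminated, so each print is bounded by the field's `sizeof`.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <utmpx.h>

/* Print user and line for every USER_PROCESS entry in the utmpx
 * database and return how many were found. */
int list_logged_in(FILE *out)
{
    struct utmpx *u;
    int count = 0;

    setutxent();                          /* rewind to the first record */
    while ((u = getutxent()) != NULL) {
        if (u->ut_type != USER_PROCESS)
            continue;
        /* Fixed-width fields: bound both width and precision by sizeof. */
        fprintf(out, "%-*.*s %.*s\n",
                (int)sizeof u->ut_user, (int)sizeof u->ut_user, u->ut_user,
                (int)sizeof u->ut_line, u->ut_line);
        count++;
    }
    endutxent();
    return count;
}
```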
Another advantage for `utmp` and `lastlog` is the ability to use sparse files.
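For example, `lastlog` (which the `finger` command uses to display the last login time) stores each user's record at offset `uid * sizeof(struct lastlog)`. So you can quickly find the last login time for uid 12345678 by seeking straight to the calculated position. UIDs that have never logged in leave unwritten holes, so on file systems that support sparse files the file's apparent size can be very large while it occupies little disk space.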
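A minimal sketch of the lastlog-style lookup, assuming a POSIX system with `pread`/`pwrite`. `struct ll_record`, `ll_write`, `ll_read` and `ll_report` are hypothetical; the real `struct lastlog` differs between systems. Each record lives at `uid * sizeof(struct ll_record)`, so a lookup is one positioned read, and a UID that was never written reads back as zeros. `ll_report` compares the apparent size with the allocated blocks; whether a hole is actually left unallocated depends on the file system.

```c
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-user record (not the real struct lastlog). */
struct ll_record {
    int64_t ll_time;      /* seconds since the epoch, 0 = never logged in */
    char    ll_line[8];   /* tty name, not necessarily NUL-terminated */
    char    ll_host[16];  /* remote host, not necessarily NUL-terminated */
};

/* The record for uid sits at a computed offset: no index, no scanning. */
static off_t ll_offset(uid_t uid)
{
    return (off_t)uid * (off_t)sizeof(struct ll_record);
}

int ll_write(int fd, uid_t uid, const struct ll_record *rec)
{
    return pwrite(fd, rec, sizeof *rec, ll_offset(uid)) == (ssize_t)sizeof *rec
               ? 0 : -1;
}

/* A uid inside a hole reads back as zeros; one past EOF is zero-filled too. */
int ll_read(int fd, uid_t uid, struct ll_record *rec)
{
    ssize_t n = pread(fd, rec, sizeof *rec, ll_offset(uid));

    if (n < 0)
        return -1;
    if ((size_t)n < sizeof *rec)
        memset((char *)rec + n, 0, sizeof *rec - (size_t)n);
    return 0;
}

/* Apparent size vs. allocated space. st_blocks is in 512-byte units on
 * Linux and the BSDs; POSIX leaves the unit unspecified. */
int ll_report(int fd, FILE *out)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    fprintf(out, "apparent size: %lld bytes, allocated: %lld bytes\n",
            (long long)st.st_size, (long long)st.st_blocks * 512LL);
    return 0;
}
```

Writing a single record for uid 12345678 into an empty file gives an apparent size of several hundred megabytes, while on a file system with sparse-file support such as ext4 or tmpfs `ll_report` typically shows only a few kilobytes allocated.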
With text files these benefits don't exist: each record is either variable-width or has to be padded. The result is larger, harder to deal with, and may require parsing (a `long ut_time` is easier to handle than an ASCII date string).
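To make the contrast concrete, here is a sketch of finding line *n* of a text log. Without fixed-size records there is no way to compute the offset, so every byte before it has to be read. `text_line_offset` is a hypothetical helper in standard C, assuming one record per newline-terminated line.

```c
#include <stdio.h>

/* Return the byte offset where line n (0-based) starts, or -1 if the file
 * has fewer lines. Cost grows with the file size, whereas a fixed-size
 * record is found with one seek to n * sizeof(record). */
long text_line_offset(FILE *fp, long n)
{
    long line = 0, pos = 0;
    int c;

    if (n < 0)
        return -1;
    rewind(fp);
    while (line < n && (c = getc(fp)) != EOF) {
        pos++;
        if (c == '\n')
            line++;                 /* the next line starts at pos */
    }
    if (line < n)
        return -1;                  /* ran out of lines */
    if (getc(fp) == EOF)
        return -1;                  /* line n would be empty: it doesn't exist */
    return pos;
}
```

With a fixed-size format the same lookup is just `n * sizeof(record)` followed by a single `fseek`.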
ASCII is great for humans and for data that humans may need to manipulate. Binary is (sometimes) better for programs, especially for raw data that humans don't necessarily need to see.
