2

Why are some files in Linux binary, for example the /var/log/wtmp log? More precisely, why is that log stored in binary form?

cyzczy
  • 366

2 Answers

4

The wtmp (and utmp) files date back to the 1970s, and the designers did not give many reasons. What you can see is that utmp and wtmp record accounting information using fixed-length records. A text logfile would have used more space on the disk, and formatting a message would have taken more time than just writing a binary record.

Further reading (the Unix 6th edition manual pages):

Also the 1st edition (no wtmp there):

Thomas Dickey
  • 76,765
1

There are some benefits to fixed-size records. The historical structure for a utmp/wtmp record was:

struct utmp {
    char    ut_line[8];             /* tty name */
    char    ut_name[8];             /* user id */
    char    ut_host[16];            /* host name, if remote */
    long    ut_time;                /* time on */
};

(That was from a SunOS 4 machine; I just picked that because it's an easy entry to read).

Fixed-size records make it easy to append data to the file and to read it back in reverse order, simply by stepping backwards sizeof(struct utmp) bytes at a time. That suits programs like last, which report the most recent entries first.
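
As a rough sketch of that idea (not the actual code of last or login), the following C functions append a fixed-size record and fetch the Nth record counting from the end of the file. The names struct demo_rec, demo_append and demo_read_from_end are made up for illustration, and the record layout only imitates the SunOS struct above. It assumes a POSIX-like system where fseek with SEEK_END works on binary files.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record, shaped like the historical struct utmp. */
struct demo_rec {
    char ut_line[8];   /* tty name */
    char ut_name[8];   /* user id */
    char ut_host[16];  /* host name, if remote */
    long ut_time;      /* time on */
};

/* Append one record to the end of the file. Returns 0 on success, -1 on error. */
int demo_append(const char *path, const struct demo_rec *rec)
{
    assert(path != NULL && rec != NULL);
    FILE *fp = fopen(path, "ab");
    if (fp == NULL)
        return -1;
    size_t n = fwrite(rec, sizeof(*rec), 1, fp);
    if (fclose(fp) != 0 || n != 1)
        return -1;
    return 0;
}

/* Read the record that is `back` records before the end (0 = newest).
 * No scanning or parsing: the offset is plain arithmetic on the record size.
 * Returns 0 on success, -1 if there is no such record or on error. */
int demo_read_from_end(const char *path, long back, struct demo_rec *out)
{
    assert(path != NULL && out != NULL);
    if (back < 0)
        return -1;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    long off = -(back + 1) * (long)sizeof(*out);
    int ok = fseek(fp, off, SEEK_END) == 0
          && fread(out, sizeof(*out), 1, fp) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

Calling demo_read_from_end with back = 0, 1, 2, ... until it returns -1 walks the file newest-first, which would not be possible with variable-length text lines without scanning for line breaks.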

The exact data structure has changed over time, but the record is still fixed in size.

For example, a current FreeBSD machine has:

struct utmpx {
    short           ut_type;        /* Type of entry. */
    struct timeval  ut_tv;          /* Time entry was made. */
    char            ut_id[8];       /* Record identifier. */
    pid_t           ut_pid;         /* Process ID. */
    char            ut_user[32];    /* User login name. */
    char            ut_line[16];    /* Device name. */
#if __BSD_VISIBLE
    char            ut_host[128];   /* Remote hostname. */
#else
    char            __ut_host[128];
#endif
    char            __ut_spare[64];
};

Another advantage for utmp and lastlog is the ability to have sparse files.

For example, with lastlog (which the finger command uses to display last login time) the data is stored at an offset based on uid * sizeof(struct lastlog). So you can quickly and easily find the last login time for uid 12345678 by seeking to the calculated position.
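
A minimal sketch of that lookup, assuming a made-up record type (the real struct lastlog differs between systems) and hypothetical helper names demo_ll_write and demo_ll_read. It relies on POSIX behaviour where seeking past the end of a file and then writing leaves a hole that reads back as zero bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record; the real struct lastlog is system specific. */
struct demo_lastlog {
    long ll_time;      /* time of last login */
    char ll_line[8];   /* tty name */
    char ll_host[16];  /* remote host */
};

/* Store the record for `uid` at offset uid * sizeof(record).
 * Returns 0 on success, -1 on error. */
int demo_ll_write(const char *path, long uid, const struct demo_lastlog *rec)
{
    assert(path != NULL && rec != NULL);
    if (uid < 0)
        return -1;
    FILE *fp = fopen(path, "r+b");
    if (fp == NULL)
        fp = fopen(path, "w+b");   /* create the file if it is missing */
    if (fp == NULL)
        return -1;
    int ok = fseek(fp, uid * (long)sizeof(*rec), SEEK_SET) == 0
          && fwrite(rec, sizeof(*rec), 1, fp) == 1;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Fetch the record for `uid` with a single seek and a single read.
 * Returns 0 on success, -1 if the file is too short or on error. */
int demo_ll_read(const char *path, long uid, struct demo_lastlog *out)
{
    assert(path != NULL && out != NULL);
    if (uid < 0)
        return -1;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    int ok = fseek(fp, uid * (long)sizeof(*out), SEEK_SET) == 0
          && fread(out, sizeof(*out), 1, fp) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

Writing a record for a high uid first leaves the lower slots as a hole; on filesystems that support sparse files the hole takes no disk blocks, and reading a slot inside it returns an all-zero record.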

With text files these benefits don't exist; each record is variable width or has to be padded. The results are larger and harder to deal with, and they may require parsing (a long ut_time is easier to handle than an ASCII date string that has to be parsed).
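
To make the parsing point concrete, here is a small sketch comparing the two. The text line format ("name tty epoch-seconds") is invented for this example and is not a real log format; demo_parse_text and demo_time_from_binary are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse an invented text record of the form "name tty seconds".
 * Each field has to be tokenised, length-limited and converted.
 * Returns 0 on success, -1 if the line does not match. */
int demo_parse_text(const char *s, char name[9], char line[9], long *t)
{
    assert(s != NULL && name != NULL && line != NULL && t != NULL);
    return sscanf(s, "%8s %8s %ld", name, line, t) == 3 ? 0 : -1;
}

/* Binary equivalent: the time is already a long at a known offset
 * inside the record, so it is copied out with no conversion. */
long demo_time_from_binary(const unsigned char *rec, size_t offset)
{
    assert(rec != NULL);
    long t;
    memcpy(&t, rec + offset, sizeof t);
    return t;
}
```

The text version also has to decide what to do with malformed lines, which the fixed binary layout mostly avoids (at the cost of being tied to one machine's word size and byte order).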

ASCII is great for humans and for data that humans may need to manipulate. Binary is (sometimes) better for programs, especially for raw data that humans don't necessarily need to see.