
I want to tail -f a file, but its content is in sjis encoding, so I need to have it converted to the native (utf-8) encoding of my terminal.

When I do

tail -f x | iconv -fsjis

there is no output. Since

tail x | iconv -fsjis

does work, at first I thought it was a buffering issue, but trying unbuffer and stdbuf as described in "Turn off buffering in pipe" did not help.

In fact, even after more than 10 KB of data had been added to x, there was still no output, so I guess it is not a buffering issue (the buffer is 4 KB, if I'm not mistaken). Rather, iconv seems to start producing output only once it receives an EOF.

So how can I tail-follow my sjis-encoded file?

Evgeniy Berezovsky

2 Answers


(take this with a pinch of salt) As far as I remember, the problem lies in the way libiconv works. Multi-byte encodings need a state machine to decode them, and libiconv prefers to receive entire characters, so you can't just give it half a character in one function call and the other half in the next.
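
To illustrate the state-machine point, here is a minimal Python sketch of incremental decoding. It is not how iconv or libiconv is implemented; it only shows what a decoder has to do when a Shift JIS character is split across two reads. The helper name decode_chunks is made up for this example, and codecs.getincrementaldecoder is the standard-library call doing the real work.

```python
import codecs


def decode_chunks(chunks, encoding="shift_jis"):
    """Yield decoded text for byte chunks that may split a character.

    decode_chunks is a hypothetical helper for illustration only.
    """
    # The incremental decoder keeps the first half of a split character
    # in its internal state until the second half arrives.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush whatever is still pending at end of input.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

Feeding it a Shift JIS byte string cut in the middle of a character yields nothing for the first half and the complete character once the second half arrives, which is the behaviour a followed (never-ending) stream needs.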

I can think of a few solutions: one is a good out-of-band method, the others are in-band.

Change Terminal Emulator encoding (out-of-band): change the character encoding in your terminal emulator, so that its native encoding is Shift JIS. I just checked konsole, and it supports this. From the menu, View→Character encoding→Japanese→sjis. You can then just tail -f the file, and konsole will take care of decoding the multibyte characters and matching them up to font glyphs.

Transcode terminal encoding on the fly (in-band; best): courtesy of Gilles, who reminded me of luit after a very long time. Use luit, which should have come with your XOrg distribution (on Debian, it's package x11-utils). Use it like this:

$ luit -encoding SJIS -- tail -f x

This runs tail -f x under luit, which transcodes between SJIS and your terminal's encoding. The downside of luit is that it doesn't support the wealth of encodings supported by libiconv. The upside is that it's available almost everywhere.

Transcode terminal encoding on the fly (in-band; hack): ttyconv is a hack I wrote many years ago (initially in C, later redone in Python) which uses libiconv to transcode terminal I/O. It spawns a new pseudoterminal and (a) transcodes the characters you type from your local encoding into the remote encoding, and (b) transcodes the characters you receive from the remote encoding to your local encoding. I used it to talk to servers that used encodings not supported by the standard Linux terminals. Please note that all of the remote encodings I tested it with were single-byte encodings, so I can't guarantee it'd work for Shift JIS. I don't often find call to use it these days, with most systems switching to Unicode.

This is how you would use it:

$ ttyconv -rsjis -- tail -f x

The downside of ttyconv is that I wrote it, no-one uses it but me, and it's probably full of bugs. I excel at those. The upside is that it uses libiconv, so if your encoding is unusual, it's your best bet. At last count, ttyconv --list showed 100 supported encodings.
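
For readers curious about the pseudoterminal approach, the following is a rough Python sketch of the idea, not the actual ttyconv code. It only transcodes the output of the child process (ttyconv also transcodes what you type), it uses Python's built-in codecs rather than libiconv, and the names make_master_read and run_transcoded are invented for this example.

```python
import codecs
import os
import pty


def make_master_read(remote_encoding, local_encoding="utf-8"):
    """Build a pty.spawn() read callback that transcodes child output.

    Hypothetical helper; it assumes Python ships a codec for remote_encoding.
    """
    decoder = codecs.getincrementaldecoder(remote_encoding)(errors="replace")

    def master_read(fd):
        while True:
            data = os.read(fd, 1024)
            if not data:
                return b""  # genuine end of file
            text = decoder.decode(data)
            if text:
                return text.encode(local_encoding, errors="replace")
            # Only part of a character arrived. pty.spawn() treats an empty
            # return value as EOF, so keep reading instead of returning b"".

    return master_read


def run_transcoded(argv, remote_encoding):
    """Run argv on a new pseudoterminal, transcoding its output only."""
    return pty.spawn(argv, make_master_read(remote_encoding))
```

Under these assumptions, run_transcoded(["tail", "-f", "x"], "shift_jis") would play roughly the role of the ttyconv command above, for output only.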

Alexios
  • Awesome, thanks. out-of-band did not work for me (gnome-terminal, although it does allow you to change the encoding), but ttyconv works like a charm. – Evgeniy Berezovsky Apr 05 '12 at 01:23
  • These days, there's luit, part of the standard X11 utility suite, which is similar to your ttyconv. – Gilles 'SO- stop being evil' Apr 05 '12 at 01:31
  • @Gilles luit is similar, except that it works far better than mine. ;) Thanks! This is why I stopped using mine in the first place. In the 12 years since, I managed to forget even the command name, and I've been looking for it ever since. – Alexios Apr 05 '12 at 09:09
  • @Gilles luit works for me too. Why don't you make it an 'official' answer? It was part of my installation (debian), and thus is the easiest to use for me. – Evgeniy Berezovsky Apr 09 '12 at 01:12
  • I updated the answer to include luit as the best choice for SJIS. Sadly, it seems it doesn't support every encoding libiconv does. Looks like I still have to use my own solution for my own surreal purposes. :) – Alexios Apr 09 '12 at 09:01

Similar to ttyconv there's also tconv, written in C by Rich Felker.

See: Re: A call for fixing aterm/rxvt/etc...

nomur