26

I am trying to send messages from kafka-console-producer.sh, which is

#!/bin/bash
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
    export KAFKA_HEAP_OPTS="-Xmx512M"
fi
exec $(dirname $0)/kafka-run-class.sh kafka.tools.ConsoleProducer "$@"

I then paste messages into it through a PuTTY terminal. On the receiving side, the messages are truncated at approximately 4096 bytes. I can't find anywhere in Kafka where this limit is set.

Could this limit come from bash, the terminal, or PuTTY?

ilkkachu
  • 138,973
Dims
  • 3,255
  • 1
    P.S. Using exec is rarely needed when running a program from a script. – Barmar Apr 07 '21 at 14:38
  • 1
    @Barmar in this case it looks like it is being used to replace the calling script process, but since it is at the end it's more or less unnecessary – cat Apr 07 '21 at 15:47
  • @cat Unless it's in conditional code, it should always be at the end, so it's unnecessary (some shells automatically replace the calling process when executing the last command, so it's truly redundant). Hence my "rarely" qualification -- I think most of the uses I see are cargo-cultish. – Barmar Apr 07 '21 at 16:17
  • Stack Overflow has a similar question providing the same sort of answer: https://stackoverflow.com/questions/18015137/linux-terminal-input-reading-user-input-from-terminal-truncating-lines-at-4095 – FooF Apr 08 '21 at 09:35
  • @Barmar This is the first time I hear of the "tail exec" optimization implemented by "some shells". Which shells implement this optimization making the exec redundant? – FooF Apr 08 '21 at 09:42
  • 1
    @FooF https://unix.stackexchange.com/questions/466496/why-is-there-no-apparent-clone-or-fork-in-simple-bash-command-and-how-its-done – Barmar Apr 08 '21 at 14:48

4 Answers

38

4095 is the limit of the tty line discipline internal editor length on Linux. From the termios(3) man page:

  • The maximum line length is 4096 chars (including the terminating newline character); lines longer than 4096 chars are truncated. After 4095 characters, input processing (e.g., ISIG and ECHO* processing) continues, but any input data after 4095 characters up to (but not including) any terminating newline is discarded. This ensures that the terminal can always receive more input until at least one line can be read.

See also the corresponding code in the Linux kernel.

For instance, if you enter:

$ wc -c [Enter]

Pressing Enter in the shell's own line editor (readline in the case of bash) submits the line to the shell. As the command line is complete, the shell is ready to execute it, so it leaves its own line editor and puts the terminal device back in canonical (aka cooked) mode, which enables that crude line editor (actually implemented in the tty driver in the kernel).

Then, if you paste a 5000 byte line, press Ctrl+D to submit that line, and once again to tell wc you're done, you'll see 4095 as output.

(Note that this limit does not apply to bash's own line editor: you can paste a lot more data at the prompt of the bash shell.)

So if your receiving application reads lines of input from its stdin and its stdin is a terminal device and that application doesn't implement its own line editor (like bash does) and doesn't change the input mode, you won't be able to enter lines longer than 4096 bytes (including the terminating newline character).
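Since the truncation is silent, it can help to check the data before pasting it. The following is a minimal sketch, not part of the original answer: `long_lines` is a hypothetical helper name, and the 4095-byte threshold is taken from the termios(3) excerpt quoted earlier in this answer. It lists the lines of a file (or of standard input) whose content would not fit in a single canonical-mode terminal line.

```shell
# Print "LINE_NUMBER BYTE_LENGTH" for every input line longer than
# 4095 bytes (newline excluded), i.e. every line that a terminal in
# canonical mode would truncate on Linux.
# LC_ALL=C makes awk's length() count bytes rather than characters.
long_lines() {
    LC_ALL=C awk -v max=4095 \
        'length($0) > max { printf "%d %d\n", NR, length($0) }' "$@"
}
```

Running `long_lines messages.txt` (the file name is only an example) prints nothing when every line is safe to paste.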

You could however disable the line editor of the terminal device (with stty -icanon) before you start that receiving application so it reads input directly as you enter it. But then you won't be able to use Backspace / Ctrl + W for instance to edit input nor Ctrl + D to end the input.

If you enter:

$ saved=$(stty -g); stty -icanon icrnl; head -n1 | wc -c; stty "$saved" [Enter]

paste your 5000 byte long line and press Enter, you'll see 5001.
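For comparison, the limit is tied to the terminal device and not to `head` or `wc`. The sketch below feeds the same `head -n1 | wc -c` pipeline through a pipe, where no tty line discipline is involved. `first_line_bytes` and `make_line` are hypothetical helper names introduced only for this illustration.

```shell
# Count the bytes in the first line of stdin, newline included.
# tr strips the padding that some wc implementations add.
first_line_bytes() {
    head -n1 | wc -c | tr -d ' '
}

# Emit one line made of N "x" characters followed by a newline.
make_line() {
    printf "%${1}s\n" '' | tr ' ' x
}
```

With these definitions, `make_line 5000 | first_line_bytes` reports 5001: the 5000 characters plus the newline, with nothing discarded.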

  • Another perfect answer, by both its clarity and explanation of the context (and insights given). Is there any way to receive notification of that truncation on stderr? (or some other way of knowing if chars were discarded?) – Olivier Dulac Apr 08 '21 at 01:36
  • 2
    @OlivierDulac. Thanks. At that point, it's just the terminal or terminal emulator (or other master side of a pty like expect, sshd, etc.) talking to the kernel. There's no process (and their stderr) involved. All the kernel could do is send BEL characters for instance back to the terminal to alert the user, but it doesn't (I've added a link to the kernel code) and I suppose it could backfire if it did by locking up communication with the terminal. You'll see it consumes but discards the excess input until the next newline to avoid deadlock situations. – Stéphane Chazelas Apr 08 '21 at 06:56
  • What is the point of stty icrnl (along putting the terminal in noncanonical mode) in this case? To support exotic terminal emulators running on non-POSIXy machines? – FooF Apr 09 '21 at 15:45
  • 1
    @FooF, no, most terminals send CR upon Enter. That converts it to LF. icrnl is generally on by default, so would not be needed. It serves as a reminder that -icanon is not the same as raw (which disables icrnl, isig...) so you can still enter lines without having to press Ctrl+J (which, contrary to Enter, sends ^J aka LF) to delimit them or press Ctrl+C to interrupt the command for instance. – Stéphane Chazelas Apr 09 '21 at 15:51
  • I've been using cat > file to paste the content of large text files in the terminal. Nice to know this can fail with unusually long lines. – Alex Jasmin Feb 15 '22 at 17:54
3

As mentioned in Stéphane Chazelas's answer, the terminal driver's input editing buffer has a limited size.

Instead of pasting into the terminal, you could redirect the output of kafka-console-producer.sh to a file:

kafka-console-producer.sh > kafka.out

Then upload the file to the server, and use it as the input to whatever program you were pasting input to.

some-program < kafka.out
Barmar
  • 9,927
  • I think in this specific scenario ("messages" by a producer script, and the name "kafka" hints at event processing) this approach does not fly very far. Rather the approach to take would be to set terminal to noncanonical mode before pasting messages through terminal to it (or make this in the script instead, removing "exec" and restoring terminal to canonical mode after it terminated). – FooF Apr 08 '21 at 09:55
  • @FooF I'm not familiar with kafka, so I didn't know that it's an event system. – Barmar Apr 08 '21 at 14:49
0

Yes, there is a limit on the command line length, or more exactly on the length of the arguments passed to execve. See also "man execve".

A long time ago, this limit was 128kB. In modern kernels, it's much higher.

So your truncation to 4096 is not related to that.

Actually, the command line is never silently truncated. If the arguments are too long for execve, the call will fail.
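Both points can be checked from a shell. The sketch below is an illustration and not part of the answer: `arg_max` and `try_arg_length` are hypothetical helper names, and the size at which execve starts failing depends on the kernel and its configuration, so no particular threshold is assumed here.

```shell
# Print the system's limit, in bytes, on the combined size of the
# arguments and environment passed to execve().
arg_max() {
    getconf ARG_MAX
}

# Try to execute an external command with a single argument of N bytes.
# Returns 0 if the command ran, non-zero if it could not be executed
# (for instance with "Argument list too long").
try_arg_length() {
    big_arg=$(printf "%${1}s" '' | tr ' ' x)
    env true "$big_arg" 2>/dev/null
}
```

A call such as `try_arg_length 1000` should succeed, while a sufficiently large value makes the call fail outright instead of running the command with a shortened argument.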

RalfFriedl
  • 8,981
  • 2
    IIRC Linux is an exception to others in that it has a limit for the length of a single command line argument, not just for the whole lot. And that limit was 128 kB the last I looked. I can't find the relevant Q&A on the site here right now. – ilkkachu Apr 06 '21 at 14:45
  • 1
    I don't think the question is about command line arguments, it's about a program reading its stdin. – Barmar Apr 07 '21 at 14:40
0

This is maybe off-topic as the question was why/where, not how. I am prompted to write this anyway, because I can currently find a wrong, already upvoted answer (resulting from a hasty reading of the question) suggesting to output the message to a file and then directing that file to the script.

Stéphane Chazelas answered the actual question very well: the limit of 4095 characters plus a newline on input length comes from a hard-coded Linux kernel limit on how terminals work in canonical mode (which is the mode terminals are typically in).

To further demonstrate this in the concrete setup (answering the question of how), we can fix the kafka-console-producer.sh script to get rid of the limit as follows:

#!/bin/bash
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
    export KAFKA_HEAP_OPTS="-Xmx512M"
fi

tty_orig=$(stty -g)
stty -icanon
$(dirname $0)/kafka-run-class.sh kafka.tools.ConsoleProducer "$@"
stty "$tty_orig"

This way, much longer messages can be pasted continuously to the script without truncation at 4095 characters. You could also write a wrapper script that calls the original kafka-console-producer.sh script, if it came from the Kafka application suite and you would rather not edit it.
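One way such a wrapper might look is sketched below. It is an assumption-laden illustration and not a tested Kafka integration: `with_noncanonical_tty` is a hypothetical function name, and the location of the original script in the usage comment is assumed. The function skips the stty calls when stdin is not a terminal, and it catches SIGINT in the wrapper shell so that the terminal settings are still restored when the command is stopped with Ctrl+C (which is the usual way to end input once Ctrl+D no longer signals end of input).

```shell
# Run a command with the terminal's canonical-mode line editor disabled,
# then restore the previous terminal settings.
with_noncanonical_tty() {
    # If stdin is not a terminal, there is no line editor to disable.
    if ! [ -t 0 ]; then
        "$@"
        return
    fi
    saved_tty=$(stty -g) || return
    # Catch SIGINT in this shell so that Ctrl+C stops only the command;
    # caught signals are reset to their defaults in the command itself.
    trap : INT
    stty -icanon
    "$@"
    status=$?
    stty "$saved_tty"
    # Note: this also drops any INT trap the caller had set earlier.
    trap - INT
    return "$status"
}

# Hypothetical wrapper usage, assuming the original script sits in the
# same directory as the wrapper:
# with_noncanonical_tty "$(dirname "$0")/kafka-console-producer.sh" "$@"
```

Keeping the save and restore logic in one function means the original script, including its `exec` line, can stay unmodified.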

FooF
  • 655