
macOS Yosemite (10.10.5). I know this is the Unix/Linux section... but I figure this question probably fits better here than in macOS land.

My terminal started hanging on startup before showing a prompt...and the CPU usage spiked at the same time. I can CTRL-C and then get a prompt (presumably exiting some hanging/running .bashrc/.profile/etc).

I figured out quickly that certain lines in my .bashrc were causing the hang. This is new (ie I didn't change anything in my .bashrc and everything used to work fine), so something changed on the system.
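
For reference, narrowing it down to specific lines was mostly a matter of tracing the startup files, roughly like this:

# Temporarily put this at the very top of ~/.bashrc; with tracing on, the last
# command echoed before the hang points at the line that is blocking
# (Ctrl-C still gets a prompt back afterwards).
set -x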

It seems piping certain longer strings results in the hang/CPU spike.

I can reproduce this by piping a string to tr -d '\n' and seeing if it hangs.

macattack:~ $ openssl rand -base64 93  | tr -d '\n'
eDsz4JqFX/HAVjplNI6WDWwPRp9l9snp6UKp/pLn+GbBvJx0+ZMvSJFS/SuCwjMRRXVXfUvBdkaH1R0UgCr2UOf283MvHVTRusLFEVPcGCIz1t3sFMU/3foRzNWVmattp@macattack:~ $ openssl rand -base64 94 | tr -d '\n'
^C
mattp@macattack:~ $ openssl rand -base64 94 | tr -du '\n'
^C

It seems 93 is the magic number for the openssl rand argument where tr starts to hang. openssl isn't hanging (i.e. if I remove the pipe to tr, everything exits). However, my original problem line happened to be a different length.
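
A quick way to bracket that threshold without eyeballing it (a sketch; it assumes GNU coreutils' gtimeout from Homebrew, since stock OS X has no timeout command):

# Kill the reader after 2 seconds so a hang shows up as a non-zero exit status
# instead of a stuck terminal; once tr is gone, openssl gets SIGPIPE and exits too.
for n in 90 91 92 93 94 95; do
    if openssl rand -base64 "$n" | gtimeout 2 tr -d '\n' > /dev/null; then
        echo "openssl rand -base64 $n: ok"
    else
        echo "openssl rand -base64 $n: hung"
    fi
done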

mattp@macattack:~ $ echo 'echo -e "$TS\t${TERM_SESSION_ID}\t$(pwd)\t$(history 1 | cut -c 8-)\n" >> $HOME/.history/history-$(date "+%Y-%m-%d")-${USER}.log;' | tr -d '\n'
^C-bash: echo: write error: Interrupted system call

mattp@macattack:~ $ echo 'echo -e "$TS\t${TERM_SESSION_ID}\t$(pwd)\t$(history 1 | cut -c 8-)\n" >> $HOME/.history/history-$(date "+%Y-%m-%d")-${USER}.log' | tr -d '\n'

mattp@macattack:~ $ echo 'echo -e "$TS\t${TERM_SESSION_ID}\t$(pwd)\t$(history 1 | cut -c 8-)\n" >> $HOME/.history/history-$(date "+%Y-%m-%d")-${USER}.log' | wc -c
     128
mattp@macattack:~ $

This is probably a pipe issue rather than a tr issue. I can reproduce the same issue with sed (the command doesn't make sense; it just illustrates the hang).

mattp@macattack:~ $ echo 'echo -e "$TS\t${TERM_SESSION_ID}\t$(pwd)\t$(history 1 | cut -c 8-)\n" >> $HOME/.history/history-$(date "+%Y-%m-%d")-${USER}.log;'  | sed 's/\n/ /g'
^C-bash: echo: write error: Interrupted system call

mattp@macattack:~ $ echo 'echo -e "$TS\t${TERM_SESSION_ID}\t$(pwd)\t$(history 1 | cut -c 8-)\n" >> $HOME/.history/history-$(date "+%Y-%m-%d")-${USER}.log'  | sed 's/\n/ /g'
echo -e "$TS\t${TERM_SESSION_ID}\t$(pwd)\t$(history 1 | cut -c 8-)\n" >> $HOME/.history/history-$(date "+%Y-%m-%d")-${USER}.log
mattp@macattack:~ $

I have run out of ideas for troubleshooting this. The hanging commands run fine on a random CentOS Linux server, and they ran fine on macOS until recently. I've never run into pipe hanging before. I thought maybe it was strange characters in the input causing an issue... but the openssl random string shows otherwise. The ulimits are the same as on another Mac that does NOT have this issue.

mattp@macattack:~ $ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 7168
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 709
virtual memory          (kbytes, -v) unlimited

Using dtruss on tr, it seems to hang on a read_nocancel call.
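
Roughly how the trace was captured (a sketch; -d adds relative timestamps and -f follows child processes so both sides of the pipe show up):

# Ctrl-C once it wedges; the interesting part is the last read/write call
# that never returns.
sudo dtruss -d -f /bin/sh -c "openssl rand -base64 94 | tr -d '\n'"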

Update

Making progress. I found a comment about hanging and pipe buffer sizes, and stole a test script from here: How big is the pipe buffer?

Running it while the problem is happening shows a pipe buffer of 128 bytes. After a reboot (the problem temporarily goes away) the pipe buffer is 65536 bytes. See the test output below.

So now the question is: why/how is "something" reducing the pipe buffer size on the system?
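
One way to catch the moment it flips is to log the measured capacity periodically and line the timestamp up against cron/launchd activity (a sketch, reusing the same pipe-buffer-test.sh; the log file name is arbitrary):

# Record the capacity once a minute; the interesting moment is when the
# reported number drops from 65536 to 128.
while sleep 60; do
    echo "$(date '+%F %T')  $(pipe-buffer-test.sh 128 0.5)"
done >> "$HOME/pipe-capacity.log"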

With problem

$ /bin/bash -c 'for p in {0..18}; do pipe-buffer-test.sh $((2 ** $p)) 0.5; done'
write size:          1; bytes successfully before error: 128
write size:          2; bytes successfully before error: 128
write size:          4; bytes successfully before error: 128
write size:          8; bytes successfully before error: 128
write size:         16; bytes successfully before error: 128
write size:         32; bytes successfully before error: 128
write size:         64; bytes successfully before error: 128
write size:        128; bytes successfully before error: 128
write size:        256; bytes successfully before error: 0
write size:        512; bytes successfully before error: 0
write size:       1024; bytes successfully before error: 0
write size:       2048; bytes successfully before error: 0
write size:       4096; bytes successfully before error: 0
write size:       8192; bytes successfully before error: 0
write size:      16384; bytes successfully before error: 0
write size:      32768; bytes successfully before error: 0
write size:      65536; bytes successfully before error: 0
write size:     131072; bytes successfully before error: 0
write size:     262144; bytes successfully before error: 0

After reboot (problem temporarily gone)

$ /bin/bash -c 'for p in {0..18}; do pipe-buffer-test.sh $((2 ** $p)) 0.5; done'
write size:          1; bytes successfully before error: 65536
write size:          2; bytes successfully before error: 65536
write size:          4; bytes successfully before error: 65536
write size:          8; bytes successfully before error: 65536
write size:         16; bytes successfully before error: 65536
write size:         32; bytes successfully before error: 65536
write size:         64; bytes successfully before error: 65536
write size:        128; bytes successfully before error: 65536
write size:        256; bytes successfully before error: 65536
write size:        512; bytes successfully before error: 65536
write size:       1024; bytes successfully before error: 65536
write size:       2048; bytes successfully before error: 65536
write size:       4096; bytes successfully before error: 65536
write size:       8192; bytes successfully before error: 65536
write size:      16384; bytes successfully before error: 65536
write size:      32768; bytes successfully before error: 65536
write size:      65536; bytes successfully before error: 65536
write size:     131072; bytes successfully before error: 0
write size:     262144; bytes successfully before error: 0
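
For what it's worth, the 128-byte capacity lines up with where things started hanging earlier (my own arithmetic; easy to re-check by writing to a file instead of a pipe, since a pipe would just hang again while the problem is active):

# base64 turns every 3 bytes into 4 characters and openssl wraps its output
# at 64 columns, newlines included:
openssl rand -base64 93 > /tmp/b93; wc -c < /tmp/b93   # 126 bytes: fits in 128
openssl rand -base64 94 > /tmp/b94; wc -c < /tmp/b94   # 130 bytes: does not fit
# And the echo line above was 128 bytes without the trailing ';' (per wc -c,
# which counts the newline) and 129 bytes with it, one byte over the limit.
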
  • I can't reproduce this in Snow Leopard or El Capitan. I suspect something specific to your environment. – Barmar May 22 '17 at 20:19
  • Why do you think it's hanging in the fstat64 call? The trace shows that the call successfully returned 0. It looks like it's hanging in read_nocancel(), it's the one that got interrupted by the signal from Ctl-C. – Barmar May 22 '17 at 20:21
  • It seems like some kind of pipe deadlock. The writing process is blocking because the pipe is full, but for some reason the reading process isn't reading what's in the pipe. – Barmar May 22 '17 at 20:28
  • Do you have the same problem if you use a different shell, such as ksh or zsh? – Barmar May 22 '17 at 20:29
  • I don't understand: why the tr -d '\n'? This will just cause problems with some commands (wc, for example, will not count the last line if it is not "\n"-terminated, since wc counts words, lines, etc. in Unix files, and a Unix text file has to end with a newline). You may be chasing a red herring. Try adding set -x at the beginning of the .bashrc and see if (and where) it hangs in there? – Olivier Dulac May 23 '17 at 08:43
  • @Barmar - good point. I edited my question. – mattpr May 23 '17 at 08:46
  • @OlivierDulac Pretty sure it is a pipe issue, not an issue with tr (I can reproduce it piping to sed as well). set -x doesn't reveal anything new, but it was useful in the beginning for figuring out which lines it was hanging on (pipes with longer input). Pipes with very small amounts of data seem okay. – mattpr May 23 '17 at 08:48
  • I also discovered that a reboot temporarily fixes the issue and then it comes back after a few hours. So some periodic process (cron or other) probably causes this, but it will be much easier to figure out what is causing it once I understand what is actually broken/happening. – mattpr May 23 '17 at 08:50
  • adding -d to dtruss may help too, as it shows relative timestamps, allowing you to see which syscall took longer – Olivier Dulac May 23 '17 at 08:52
  • @Barmar - Any ideas for troubleshooting the pipe deadlock issue, or things that can cause it? At this point it is more of an educational exercise for me, and out of laziness I would rather not do a clean install. – mattpr May 23 '17 at 08:52
  • @OlivierDulac added dtruss with -d... – mattpr May 23 '17 at 08:55
  • @Barmar - ksh also hangs. zsh doesn't. – mattpr May 23 '17 at 09:01
  • The dtruss outputs differ: near the end, the broken one does 9279/0x584a6: 4901052 read_nocancel(0x0, "\0", 0x4000) = -1 Err#4 while the OK one does 202817 read_nocancel(0x0, "echo -e \"$TS\t${TERM_SESSION_ID}\t$(pwd)\t$(history 1 | cut -c 8-)\n\" >> $HOME/.history/history-$(date \"+%Y-%m-%d\")-${USER}.log;\n$\b\0", 0x4000) = 125 0 (i.e., the broken one receives just a NUL instead of the echo string); you may want to dtruss the command on the left of the pipe to compare those as well? – Olivier Dulac May 23 '17 at 09:10
  • The fact that rebooting clears it up temporarily makes me think that something is leaking kernel buffers. I'm not sure what to look for, though. – Barmar May 23 '17 at 14:31

1 Answer


Based on @Barmar's comment about leaking kernel buffers, I took a look at the currently loaded non-OS kexts. I realized there was a relatively new one from a recent install of BlockBlock (https://objective-see.com/products/blockblock.html).
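
The check itself was nothing fancier than listing the loaded kexts and ignoring Apple's own, roughly:

# Dump the loaded kexts to a file first (piping straight into grep could
# itself hang while the problem is active), then look for anything that is
# not Apple's. /tmp/kexts.txt is just a scratch file.
kextstat > /tmp/kexts.txt
grep -v com.apple /tmp/kexts.txt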

I uninstalled BlockBlock, restarted, and the problem has not recurred. So BlockBlock was the culprit in this case, and I have reported the issue to the author.

However, this isn't particularly satisfying, since I took a mostly guess-and-check approach to finding the cause and, to be honest, I don't really understand the root cause (in terms of the OS), which means I am none the wiser for troubleshooting this kind of problem in the future.

If someone comes across this and can explain what was happening in more detail and provide a troubleshooting approach, it would be a much better answer than "uninstall BlockBlock".
