I generally have a long-running instance of `emacs --daemon`, which I connect to with various `emacsclient`s. Recently I have started experiencing random crashes. I say "random" because I haven't been able to figure out what causes them. Often it happens, e.g., when I leave my work computer over the weekend. However, I also remember at least one time when I started Emacs again after a crash, and it crashed again a couple of hours later, during the same session.

I started an instance with `emacs -Q` (and then did `M-x server-start`) on my unused laptop, and it has not crashed yet. (It has currently been running for about a week.) One day I needed to use Emacs on that laptop, so I started a separate session by running `emacs` from a terminal window, without starting the server. I left that session running, and when I came back a day or two later, I found that it had crashed with a segmentation fault. Thus, it would seem that this is not related to running Emacs in daemon mode.

I would do a binary search of my init file, but due to the randomness of the crashes, it would be a pain to go without much of the functionality I'm used to for an extended amount of time. Also, I wouldn't have a good way to tell whether I had actually fixed it or the crashes just weren't occurring.

I use Emacs 24.4 on Debian GNU/Linux 8 and on Ubuntu. (I have used Emacs on Windows before, but didn't use `--daemon` there, and didn't experience any crashes.)

How else can I debug these random crashes?

Edit: I get the same result when I launch Emacs and then do `M-x server-start`, and also when I use the Lucid toolkit.

I ran `gdb emacs`, then entered `run` at the `(gdb)` prompt; this was the output:

Starting program: /usr/bin/emacs
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Warning: Cannot convert string "-*-courier-medium-r-*-*-*-120-*-*-*-*-iso8859-*" to type FontStruct

(Note that the last line also appears as terminal output when I launch Emacs outside of GDB.)

In Emacs, I did M-x server-start, and started work. It didn't crash all day, so I left it running over the weekend. When I got back today, I found Emacs unresponsive, and this additional output in GDB:

Program received signal SIGSEGV, Segmentation fault.
mark_object (arg=42366066) at alloc.c:5601
5601    alloc.c: No such file or directory.

After that line, control returned to the GDB prompt.

Edit 2: After building Emacs from source, I again ran it from GDB and did M-x server-start. Today I again found it unresponsive, and this was the output in GDB:

Starting program: [snip]/emacs-24.5/src/emacs 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffee188700 (LWP 20124)]
[New Thread 0x7fffe7df2700 (LWP 20125)]
[New Thread 0x7fffe75f1700 (LWP 20126)]

Program received signal SIGSEGV, Segmentation fault.
mark_object (arg=47086818) at alloc.c:5973
5973    {
Drew
Scott Weldon
  • I experienced the same. There is a warning that Emacs (GTK) might crash when you launch it with the `--daemon` option. Not sure if it is related, but you could try the Lucid toolkit. I now just launch Emacs with `(server-start)` in my init file and get the same client/server behavior. – Lompik May 11 '15 at 03:24
  • Good point. I had seen that warning, but thought it was only a problem when restarting my X session. I'll report back after I've done some testing with the Lucid toolkit. – Scott Weldon May 11 '15 at 03:34
  • Crashed again while running the Lucid toolkit. Unfortunately, it looks like this isn't the problem. I'll try out your other suggestion, and see if it makes a difference if I run Emacs normally and then do `M-x server-start`. – Scott Weldon May 13 '15 at 15:33
  • Just to make it clearer: does it crash, as in "dumped core" / received a segfault, or does it freeze (and you have to kill Emacs)? – Lompik May 13 '15 at 16:08
  • It just dies quietly. All my `emacsclient` windows disappear, and immediately after that, anything like `ps -ef | grep emacs` returns empty. – Scott Weldon May 13 '15 at 16:19
  • There should be a trace of something like "emacs dumped core" in `sudo journalctl -xe` or even `dmesg | grep emacs`. My point is that it is possible to analyse the core dump file and get a backtrace. In any case, you could launch Emacs from gdb and get a backtrace of the last function called before the crash. Basically, I would try to get as much info as possible to get rid of the randomness and make it reproducible. This might or might not take some time, but once it is reproducible, do `M-x report-emacs-bug` with your findings. See http://git.savannah.gnu.org/cgit/emacs.git/tree/etc/DEBUG for details. – Lompik May 13 '15 at 16:57
  • `sudo journalctl -xe` returns `command not found` (I'm currently on Ubuntu). `dmesg | grep emacs` returns empty. Thanks for the info, I'll take a look at launching it from GDB. – Scott Weldon May 13 '15 at 17:03
  • FWIW, launching Emacs and then doing `M-x server-start` doesn't seem to help, as it just crashed again. Time to launch GDB. – Scott Weldon May 15 '15 at 15:26
  • Is it possible for you to compile Emacs and/or try another version? One simple thing you'd have to check first: does it also happen with `emacs -Q`? I'd say `M-x report-emacs-bug` and see with the experts. Be sure to read line 10: http://git.savannah.gnu.org/cgit/emacs.git/tree/etc/DEBUG#n10 – Lompik May 19 '15 at 15:04
  • Yes, I will try compiling Emacs myself. If that doesn't work, I'll probably launch with `emacs -Q` and then leave that running over the weekend and see if the crash happens. – Scott Weldon May 19 '15 at 15:26
  • http://www.reddit.com/r/emacs/comments/2ans0z/have_you_encountered_that_gtk_bug_in_daemon_mode/ https://bugs.launchpad.net/ubuntu/+source/emacs23/+bug/543611 – Lompik May 19 '15 at 17:39
  • @Lompik I've edited my question (again). – Scott Weldon Jun 01 '15 at 15:23
  • Any luck with `M-x report-emacs-bug` ? There should be plenty of information for them to help you. I am not able to reproduce this myself. – Lompik Jun 01 '15 at 19:27
  • Read [emacs-24/DEBUG](http://git.savannah.gnu.org/cgit/emacs.git/tree/etc/DEBUG?h=emacs-24); it explains the process the developers recommend for debugging Emacs. If your Emacs is unresponsive, you could try attaching gdb to the process and inspecting its state: `gdb --pid` with the Emacs instance's PID should get you more info. Also, if you haven't tried it yet, try running `emacs --debug-init` and/or adding `(autoload 'debug "debug" "emacs debugger") (setq debug-on-error t)` to your `.emacs` file. – xmonk Sep 06 '16 at 17:39
  • FWIW (1) I have run Emacs versions 23-24 on Debian for several years without encountering this problem. (2) For most of that time I ran `emacs --debug-init &`; now I'm running `emacs --daemon --debug-init &`, but (3) ... the latter can be problematic, see http://emacs.stackexchange.com/q/27376/5444 – TomRoche Sep 28 '16 at 02:22
  • My comment above: typo: `s/could it we/Could it be/`. Also, did you try emacs24 (version 24.5) from the jessie-backports repository? – jue Jun 03 '17 at 10:35
  • A crash indicates an Emacs bug. Why don't you file a bug report so that Emacs maintainers take a look at it and work with you to understand it better and fix it? You can point to this SE question, but the report itself should include most of the relevant information that you've gathered. `M-x report-emacs-bug`. – Drew Oct 15 '17 at 20:12
  • My apologies for the delay, but thanks all for all the help and suggestions so far. I've had other priorities that have prevented me from looking at this, but I'll try to find the time to work on this again eventually. – Scott Weldon May 07 '20 at 17:52
  • @jue: Yes, multiple versions of Emacs, multiple computers (a desktop and a laptop), with and without `--daemon`, all crashed. I actually don't remember now if I have confirmed a crash with `-Q`, I'll check my notes and/or try it again. I've upgraded my Debian install by at least one release since this question, and I've even had a crash when running [this Docker image](https://github.com/Silex/docker-emacs/blob/master/25.3/ubuntu/18.04/Dockerfile). – Scott Weldon May 07 '20 at 18:02
  • And yes, I'm still getting these crashes 5 years (!) later. – Scott Weldon May 07 '20 at 18:04
  • @jue: I've had an instance of Emacs launched with `-Q` running since my last comment, and during that ~2 weeks it hasn't crashed, while my main instance has crashed at least 6 times, and the Docker instance at least 4. I just noticed that I didn't actually run `(server-start)` this time, but as I mentioned in my question, Emacs has crashed even without that, so I don't think that should make a difference. – Scott Weldon May 19 '20 at 17:03
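
The two approaches suggested in the comments (getting a backtrace from a core dump, and attaching gdb with `gdb --pid` to an unresponsive instance) could be sketched roughly as below. This is an illustration, not something from the thread: it assumes a POSIX shell with gdb and `pgrep` available, and the function names are hypothetical. Whether a core file appears, and where, depends on the system's `/proc/sys/kernel/core_pattern` setting.

```shell
#!/bin/sh
# Sketch only: assumes gdb and pgrep are installed. The function names are
# hypothetical helpers, not part of Emacs or gdb.
# Where the kernel writes core files is controlled by
# /proc/sys/kernel/core_pattern (on Ubuntu it may be piped to apport instead).

# Start Emacs with core dumps enabled. The subshell keeps the raised limit
# from leaking into the calling shell.
emacs_with_cores() {
    ( ulimit -c unlimited && exec "${EMACS:-emacs}" "$@" )
}

# Print a full backtrace (with local variables) from a core file that a
# crashed Emacs left behind. $1 is the path to the core file.
backtrace_from_core() {
    gdb -q -batch -ex 'bt full' "${EMACS:-emacs}" "$1"
}

# Attach gdb to a running but unresponsive Emacs (the oldest one, if there
# are several). On Ubuntu, Yama's ptrace_scope may require running this
# with sudo.
attach_to_emacs() {
    gdb -q --pid "$(pgrep -o -x emacs)"
}
```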

1 Answer

The way to debug crashes is to capture the crash in a debugger. The debugger gives you the tools you need to diagnose why the crash happened.

Specifically, what you could do is run the Emacs server inside gdb. That is, instead of running it directly, run `gdb --args emacs --daemon` instead. Then, when it crashes, it will be kept alive by the debugger and you can diagnose the problem. You report doing this, but not whether you collected any information about the crash. For example, you didn't collect a backtrace (`bt full`) or the values of local variables.

On the other hand, it looks like it crashed during garbage collection (`mark_object` in `alloc.c` is part of the garbage collector), which means that the real bug might have occurred much earlier. Corrupted or overwritten data structures can cause all kinds of bad behavior, including crashes, later on when a program tries to use those data structures. Thus it's not always easy to find out exactly what went wrong or how to fix it. Another disadvantage is that if you exit gdb, you lose all information about the crash.
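
As a rough sketch of that workflow (assuming gdb is installed and Emacs was built with debug symbols, e.g. from source; the helper names are hypothetical), the gdb commands can go in a command file so that the backtrace is printed automatically as soon as the crash stops Emacs. One caveat: `emacs --daemon` forks into the background, so gdb may only see the parent process exit. Emacs 26 and later provide `--fg-daemon`, which keeps the server in the foreground; on older versions, starting plain Emacs and doing `M-x server-start`, as in the question, avoids the fork.

```shell
#!/bin/sh
# Sketch only: assumes gdb is installed and Emacs has debug symbols.
# write_crash_commands and debug_emacs are hypothetical helper names,
# not part of Emacs or gdb.

# Write a gdb command file that starts Emacs and, once it stops (for
# example on SIGSEGV), prints the C backtrace including local variables.
write_crash_commands() {
    cat > "$1" <<'EOF'
set pagination off
run
bt full
EOF
}

# Run Emacs under gdb with those commands. After "bt full", gdb stays at
# its prompt with the crashed process still loaded, so it can be inspected
# further (frames, variables) before exiting.
debug_emacs() {
    cmds=$(mktemp) || return 1
    write_crash_commands "$cmds"
    gdb -q -x "$cmds" --args "${EMACS:-emacs}" "$@"
}
```

Usage would be along the lines of `debug_emacs --fg-daemon` on Emacs 26+, or `debug_emacs -Q` followed by `M-x server-start` inside Emacs. Keeping the commands in a file means the same steps are repeated exactly every time the crash is reproduced.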

Another option that wasn't available in 2015, when this question was originally asked, is to record the Emacs server with rr (https://rr-project.org/). Once it crashes, you can replay the saved recording as often as necessary. You can set a data watchpoint on the corrupted data structure that caused the crash and run Emacs in reverse until something writes to that location. This frequently pinpoints the cause of the corruption: you'll be stopped right at it. Other times you might have to follow the chain of evidence further back to find the real problem. If you can't figure it out yourself, you can even pack up the recording, along with all the other files necessary to replay it, and send it to another developer who is better at debugging things.
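
A minimal sketch of the rr workflow described above, assuming rr is installed and the machine supports it (rr relies on hardware performance counters). The function names are hypothetical wrappers around the real `rr record`, `rr replay` and `rr pack` subcommands; the interactive gdb steps are listed as comments because they depend on what the crash looks like.

```shell
#!/bin/sh
# Sketch only: assumes rr is installed and usable on this machine.
# The function names are hypothetical wrappers, not part of rr.

# Record a whole Emacs session. The trace survives the crash, however long
# the session ran before it.
record_emacs() {
    rr record "${EMACS:-emacs}" "$@"
}

# Replay a trace (the most recent one if no trace directory is given).
# rr starts gdb on the replay; useful commands at its prompt include:
#   continue            run forward until the SIGSEGV is hit again
#   bt                  show where the crash happened
#   watch -l EXPR       watch the memory EXPR refers to (the corrupted data)
#   reverse-continue    run backwards to the last write to that memory
replay_emacs() {
    rr replay "$@"
}

# Copy the files the trace depends on into its directory, so it can be
# replayed elsewhere, e.g. by a more experienced developer.
pack_trace() {
    rr pack "$@"
}
```

Because the replay is deterministic, the watchpoint and reverse execution can be repeated as many times as needed without waiting for another crash.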

This specific crash has probably been fixed sometime in the last five years, but there are probably other bugs waiting to be found and eradicated.

db48x
  • Thanks for the info! rr looks super useful! And no, this crash actually hasn't been fixed yet, haha. As mentioned above, I've had other priorities preventing me from working on this. Hopefully I can find the time to dive into this at some point. – Scott Weldon May 07 '20 at 18:10