
I have compiled Emacs from source on Linux Mint 21 Cinnamon. After compilation, the src folder contains four executables of identical size (see the question "Why are there four identical executables generated while compiling from source?" for why there are four):

-rwxrwxr-x 1 neo neo 24566712 Mar 14 16:45 bootstrap-emacs
-rwxrwxr-x 2 neo neo 24566712 Mar 14 16:45 emacs
-rwxrwxr-x 2 neo neo 24566712 Mar 14 16:45 emacs-28.2.1
-rwxrwxr-x 1 neo neo 24566712 Mar 14 16:45 temacs

For comparison, the executable files of the SciTE text editor, including the lexer and Scintilla libraries, total about 8 MByte. Why is the Emacs binary so much larger (about 24 MByte)?

What makes the Emacs executable so large when most of the editing functionality is delegated to compiled elisp files?

Claudio
  • Make sure to compare files in the same condition. On my system, the ones in the build directory contain debug information that is stripped from the installed one; the size is 33MB for the former and 6MB for the latter – matteol Mar 17 '23 at 07:05
  • Is 24MB large? – shynur Mar 17 '23 at 13:48
  • @Shynur : OK ... Mr. Spock from the Enterprise would say "It is nor large, nor small, nor not as expected". It is 24 MB :) . – Claudio Mar 17 '23 at 13:59
  • My `bin/emacs.exe` (emacs-28.2 for Windows 64-bit) is `7,455,947 B`. I downloaded it from the official website. – shynur Mar 17 '23 at 16:44
  • @Shynur see my answer to the question for the actual size of the self-built 28.2.1 executable on Linux after stripping symbol and debug information from it (6,349,376 B). Try building yours yourself to see if you can reduce the size. If you can't, maybe on Windows it includes part of the GTK library Emacs is based upon, which would then explain why your Windows version is 1 MByte larger than the Linux one. – Claudio Mar 17 '23 at 17:48

2 Answers


In order to do anything with those elisp files, Emacs needs to have an elisp interpreter embedded in its executable. Additionally, a large part of the interpreter is written in elisp, which is then "dumped" into the final executable. This is akin to statically linking libraries into an executable.

nega
  • SciTE has the Lua interpreter built in, not requiring Lua to be installed, so the interpreter alone can't explain the difference in size. Is compiled elisp code so much larger than the same functionality written in C/C++? Both SciTE and Emacs are built upon Tk using the same approach to GUI elements, right? – Claudio Mar 16 '23 at 18:10
  • Dumping elisp commands into the binary? What sense does that make? It limits the flexibility ... How can I determine from the source code which .el files are dumped into the executable? I am coming from Python ... and am surprised that there is no package.class hierarchy in the code. So Lisp 'require *forgets* where it got its methods from after importing them? How do you avoid naming conflicts if that is the case? – Claudio Mar 16 '23 at 18:18
  • It makes as much sense as statically linking a C/C++ library into an executable. Or embedding data into an executable. Also, don't conflate the language that you use with the implementation of that language. – nega Mar 16 '23 at 18:54
  • Notwithstanding that the dump process is nowadays different and actually involves a separate "portable dump" file, there's no lack of sense or flexibility. Everything loaded from the dump file can still be manipulated after the fact, and there are extremely good reasons for doing it: https://emacs.stackexchange.com/a/16521 (which is old enough to refer to the old dump mechanism, but the gist is the same). – phils Mar 17 '23 at 14:29
  • And for clarity, the elisp libraries which are loaded and dumped are not "part of the interpreter". The elisp interpreters are written in C. – phils Mar 17 '23 at 14:44
  • And you "avoid conflicts in naming" by using reliably non-conflicting names. In third-party code they should typically be prefixed with the suitably-unique name of the library which implements them. Think of that as fully-qualified names, if you like, but there's no system of name-spacing here (and there are some very strong benefits to guaranteed uniqueness of names, so I consider this a good thing personally). – phils Mar 17 '23 at 14:53
  • I very much appreciate your helpful comments and your efforts to respond to my question, even if your answer is a bit misleading in suggesting that dumping the elisp libraries into the executable is the reason for the large size I observed (it isn't; see my answer for the actual reason). – Claudio Mar 17 '23 at 17:23
  • @phils yes, thank you for clarifying. i was speaking in overly broad strokes – nega Mar 17 '23 at 17:58

As a first step to understanding where the large size comes from, I suggest reading:

https://unix.stackexchange.com/questions/2969/what-are-stripped-and-not-stripped-executables-in-unix

The very large (24 MByte) size of the executable is caused by debugging information inserted into the executable by the compiler, and by the fact that the executable is not stripped and therefore still contains symbols. According to the GNU Emacs manual, the standard configuration of the source package available for download passes the -g flag to the gcc compiler to include debug information.

You can check this yourself: the file command reports whether an executable contains debugging information and whether it is stripped:

~ $ file emacs-28.2.1
ELF 64-bit [...skipped some info...], with debug_info, not stripped

In other words, before comparing sizes, first make sure that the executables being compared are stripped and contain no debugging information.
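
One way to automate that check is to classify the description that file prints. The following is only a minimal sketch: classify_file_output is a hypothetical helper name (not part of any tool), it looks only for the phrases "with debug_info", "not stripped" and "stripped" seen in the output above, and the commented usage line assumes the file utility is installed.

```shell
# Classify the one-line description that `file` prints for an ELF binary.
# Hypothetical helper: it only pattern-matches the phrases shown above.
classify_file_output() {
    case "$1" in
        *"not stripped"*)
            case "$1" in
                *"with debug_info"*) echo "unstripped, with debug info" ;;
                *) echo "unstripped, no debug info" ;;
            esac ;;
        *stripped*) echo "stripped" ;;
        *) echo "unknown" ;;
    esac
}

# Usage sketch (assumes `file` is installed; -b omits the file name prefix):
#   classify_file_output "$(file -b emacs-28.2.1)"
```

Two binaries are comparable in size only when both are reported as stripped.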

See below the effect of applying strip, which removes symbols and debug information from an executable, to temacs, one of the identically sized Emacs executables you mention in your question:

~ $ strip temacs
~ $ 
~ $ ls -il *emacs*
4873475 -rwxrwxr-x 1 neo neo 24566712 Mar 14 16:45 bootstrap-emacs
4873479 -rwxrwxr-x 2 neo neo 24566712 Mar 14 16:45 emacs
4873479 -rwxrwxr-x 2 neo neo 24566712 Mar 14 16:45 emacs-28.2.1
4873473 -rwxrwxr-x 1 neo neo  6349376 Mar 17 17:37 temacs

As you can see above, emacs and emacs-28.2.1 occupy the same storage space in the file system (they have the same inode), and the stripped temacs shrank from 24 MByte to 6 MByte.
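
Both observations can be re-checked without modifying the build output. This is a minimal sketch, not part of the Emacs build system: same_inode and percent_smaller are hypothetical helper names, and the commented usage assumes GNU binutils strip (for its -o option) and GNU coreutils stat.

```shell
# Print "same" if both paths carry the same inode number, else "different".
# Only meaningful for paths on one file system, e.g. inside the src directory.
same_inode() {
    a=$(ls -di -- "$1" | awk '{print $1}')
    b=$(ls -di -- "$2" | awk '{print $1}')
    if [ "$a" = "$b" ]; then echo same; else echo different; fi
}

# Print by how many percent (rounded down) a file shrank.
# Arguments: size before and size after, both in bytes.
percent_smaller() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

# Usage sketch (run in the src directory):
#   same_inode emacs emacs-28.2.1
#   strip -o temacs.stripped temacs      # strip a copy, keep the original
#   percent_smaller "$(stat -c %s temacs)" "$(stat -c %s temacs.stripped)"
```

With the sizes listed above, percent_smaller 24566712 6349376 prints 74, i.e. roughly three quarters of the unstripped file is symbols and debug information.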

P.S. Thanks go to matteol, whose helpful comment on my question put me on the right track and made this answer possible.

Claudio
  • [2.3 How do I use a debugger on Emacs?](https://www.gnu.org/software/emacs/manual/html_mono/efaq-w32.html#Debugging): "By default, **Emacs is compiled with debugging on**, and optimizations enabled." It says MS-Windows, but the same is true of GNU/Linux. But considering that mine is just 7MB (with debugging on), why is your original one so much larger than mine? They differ too much in size. – shynur Mar 17 '23 at 17:23
  • Notice that temacs is only another name for the same binary executable as emacs. If I stripped emacs, the result would be the same as for temacs. In other words, my version of the emacs binary is 1 MByte smaller than yours. See my comment to the question for the possible reason. – Claudio Mar 17 '23 at 18:15
  • `temacs` is the compiled executable *before* the dump process takes place, and so traditionally there was a *very* substantial difference between that and the subsequent dumped `emacs` executable (the latter containing a ton of lisp). You can still compile Emacs using the old dump system in principle, but the reason for the new portable dumper is that glibc dropped support for dumping, so you can only do this on systems with an older glibc. – phils Mar 18 '23 at 00:24
  • Myself, I've never looked closely at the portable dump mechanisms, and genuinely wasn't expecting `temacs` and `emacs` to now be identical under that system, but I do see the same thing in my current build of Emacs 29, so I guess it must have been this way since portable dumping was introduced. – phils Mar 18 '23 at 00:28
  • It does seem like a vestige from the old days, before the dawn of time^H^H^H^Hthe portable dumper. In principle the build system could be revised to eliminate the duplication when the build is configured for the portable dumper, but in practice the extra complexity might not be worth it. – db48x Mar 18 '23 at 02:28