7

As I understand from the manual (last paragraphs of http://www.gnu.org/software/emacs/manual/html_node/elisp/Creating-Hash.html) and the question https://stackoverflow.com/questions/11745097/ on stackoverflow, one can save a printed version of a hashtable on the disc in order to load it for later use.

For example the printed version of a hashtable created by

(setq ht (make-hash-table :test 'equal))
(puthash "orange" 1 ht)
(puthash "apple" 2 ht)

is as follows

#s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8 data ("orange" 1 "apple" 2))

Is this printed version already the best format (for speed consideration) that Emacs can use ? Is there an special procedure to re-format (to byte-compile, to change) the above printed format to a better format (maybe only machine-readable) in order that Emacs loads this hashtable faster. If the answer is affirmative what are the ways to do it.

Name
  • 7,689
  • 4
  • 38
  • 84

2 Answers2

5

You're going to have to hash and insert every value no matter what, and unless you're dealing with enormous hash tables, the time spent shouldn't really matter. However, if your tables are large, then you should use the :size parameter to make-hash-table so no reallocations have to occur. When a hash table reaches the threshold, having to reallocate a new place in memory to put the values and rehashing all the current entries will be a big performance loss.

If you know you are about to insert 1 million entries into a hash table, use (make-hash-table :size 1000000)

Consider the following benchmark:

(benchmark 10
           '(let ((ht (make-hash-table :size 1000000)))
              (dotimes (n 1000000) (puthash n (1+ n) ht))
              ht))
"Elapsed time: 4.156233s (2.087411s in 10 GCs)"


(benchmark 10
           '(let ((ht (make-hash-table)))
              (dotimes (n 1000000) (puthash n (1+ n) ht))
              ht))
"Elapsed time: 10.276816s (7.713422s in 41 GCs)"

You can also define your own test and hash function for hash tables. If you know your keys are going to be in a specific set, you could potentially write faster equallity and hashing functions that exploit that. See: define-hash-table-test.

Jordon Biondo
  • 12,332
  • 2
  • 41
  • 62
  • Very interesting time comparison. Thank you. As you have demonstrated, setting the size of a hash-table can significantly influence its creation time. – Name Aug 10 '15 at 20:39
  • Let me however mention that in the original question, I have asked about the speed from a slightly different point of view. I have already created a large hash-table and I have already saved this hash-table on the disc (by print command). So I have a large file with whose content is like `#s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8 data ("orange" 1 "apple" 2 ..............))`. I can load this hash-table. I was interested to know if this type of the file is the best format that Emacs can use in order to quickly load the table. – Name Aug 10 '15 at 21:19
  • So the emphasis is rather on the time of loading an already saved table on the disc than the time of creation for the first time. – Name Aug 10 '15 at 21:23
3

Yes, it is the best format (for speed consideration).

Stefan
  • 26,154
  • 3
  • 46
  • 84