Q: how do I split a complicated string when whitespace delimiters aren't discriminating enough?
Background
I'm working with BibTeX files. I want to split an author string of the form "first-name middle-name last-name" into its tokens.
Doing so is straightforward when each name is a single word because we can split by whitespace:
(split-string "Adam Smith") ; => ("Adam" "Smith")
Problem: double-barrelled names
The problem is that some names (usually last names, but not always) consist of multiple words separated by whitespace. BibTeX handles that by enclosing such a name in curly braces, as with "Martin {Van Buren}". Here, a simple whitespace split gives the wrong answer:
(split-string "Martin {Van Buren}") ; => ("Martin" "{Van" "Buren}")
Question: split complicated names?
So: how do I split complicated names where a) the full name can have an arbitrary number of components, and b) each curly-braced component can contain an arbitrary number of whitespace-separated chunks?
Example desired output
Given a hypothetical function fancy-split, I'm looking to extract a list of name components:
(fancy-split "A Simple Name") ; => ("A" "Simple" "Name")
(fancy-split "Walter J. {Fancy Pants}") ; => ("Walter" "J." "Fancy Pants")
(fancy-split "Ed {Big D} {del Mar}") ; => ("Ed" "Big D" "del Mar")
(fancy-split "John {El Guapo} Smith") ; => ("John" "El Guapo" "Smith")
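For reference, here is a minimal sketch of one way fancy-split might be written so that it produces the outputs above. It is only an illustration, not a known-good solution: it assumes braces are balanced and never nested, and that a braced group is always a whole name component (so something like "Ma{c}Donald" would be split into three tokens). The idea is to scan the string with a regexp that matches either a braced group or a run of characters containing no whitespace and no braces.

```elisp
(defun fancy-split (names)
  "Split NAMES on whitespace, keeping each {braced group} as one token.
Hypothetical helper: assumes braces are balanced and not nested."
  (let ((start 0)
        (tokens '()))
    ;; Match either a braced group (captured without its braces) or a
    ;; run of characters that are neither whitespace nor braces.
    (while (string-match "{\\([^{}]*\\)}\\|[^ \t\n{}]+" names start)
      ;; Group 1 is non-nil only when the braced alternative matched.
      (push (or (match-string 1 names) (match-string 0 names)) tokens)
      ;; Resume scanning right after the token just consumed.
      (setq start (match-end 0)))
    (nreverse tokens)))
```

The braces themselves are dropped because only the capture group is kept for braced components. Nested braces, which BibTeX does allow, would need a real parser rather than a single regexp.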