Natural Language Processing

My primary project is the Classical Language Toolkit (CLTK), a decentralized platform for natural language processing (NLP) for the languages of Ancient, Classical, and Medieval Eurasia. I am largely responsible for the project's organization of developers and the quality of its NLP. See here for materials from my public talks on the CLTK.


Outside of my professional research, most of my current academic work entails programming, lecturing, and writing about the union of NLP and philology (a predecessor discipline to linguistics). Among Classicists, I advocate for the vital importance of free, open, and decentralized NLP for advanced research into the interconnected ancient world. To NLP specialists, I make the case that philology offers compelling models for understanding human thought in semantically rich documents.

While in academia, I published a few articles and wrote a dissertation. My dissertation is a network-theoretical study of Julius Caesar's organization and leadership of the Roman army. Of my publications, two of the more interesting are one on what comic book art has to offer the study of literature (from Oxford University Press) and a short piece on Etruscan medicine, which to my utter surprise took on a life of its own as a foundation for contemporary pharmaceutical research (example).

For the Pema Ts'al Orthographic System, for which I am the lead developer, I have customized fonts and a keyboard that introduce several new punctuation characters to Tibetan orthography in order to aid beginners in reading the language.



I was born and raised in the State of Washington. I now reside in the Bay Area, where I work as a research scientist specializing in NLP and machine learning. My formal education was in Classics (BA, Reed College; PhD, NYU).