10,000 most frequent lemmata in Greek and Latin canons

This is a follow-up to a previous post, which got more attention than I anticipated, listing the 10,000 most frequent words in the Greek and Latin canons.

The difference in these latest versions is that the inflected occurrences were lemmatized first, then counted. This was made possible by the CLTK’s latest release, which now offers good lemmatization for Greek and Latin. You can see the lemmatizer in action in the notebook which generated the following two files.

Greek: The 10,000 most common words in the Classical Greek canon, grouped by lemma.

Latin: The 10,000 most common words in the Classical Latin canon, grouped by lemma.
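
For readers curious about what "lemmatized, then counted" means in practice, here is a minimal sketch of the idea. It is not the code from the notebook and it does not use the CLTK's lemmatizer: the lookup table `TOY_LEMMATA` and the functions `lemmatize` and `most_frequent_lemmata` are hypothetical stand-ins invented for illustration, using only Python's standard library.

```python
from collections import Counter

# Hypothetical toy lookup table (an assumption for this sketch);
# a real lemmatizer covers far more inflected forms.
TOY_LEMMATA = {
    "amo": "amo",
    "amas": "amo",
    "amat": "amo",
    "puella": "puella",
    "puellae": "puella",
    "puellam": "puella",
}


def lemmatize(tokens, lemmata=TOY_LEMMATA):
    """Map each token to its lemma; unknown tokens pass through unchanged."""
    return [lemmata.get(token, token) for token in tokens]


def most_frequent_lemmata(tokens, n=10000):
    """Count lemmata rather than inflected forms, most frequent first."""
    return Counter(lemmatize(tokens)).most_common(n)


tokens = "puella amat puellam amas et amo".split()
top = most_frequent_lemmata(tokens, n=2)
print(top)  # [('amo', 3), ('puella', 2)]
```

Counting the raw tokens would give six entries with one occurrence each; counting after lemmatization folds the inflected forms together, which is the whole difference between the earlier word lists and these ones.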
