10,000 most frequent words in the Greek and Latin canon

While working on the latest release of the CLTK, which now includes stopword builders, I discovered the most_common() method of Python's Counter class (from the standard library's collections module), which makes creating word-frequency lists easy (Greek notebook here, Latin here). Using some helper methods in the CLTK (namely the PHI and TLG corpus filepath builders and text cleaners), these notebooks list the frequency of words in the TLG (Greek) and PHI (Latin) corpora.
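Below is a minimal sketch of the counting step, assuming the texts have already been read into plain strings. The word_frequencies helper and its regex tokenizer are hypothetical stand-ins for the CLTK corpus readers and text cleaners, not the CLTK's API. Only collections.Counter and its most_common() method come from the standard library.

```python
from collections import Counter
import re


def word_frequencies(texts, n=10):
    """Hypothetical helper: count inflected word forms across texts, return the n most common."""
    counts = Counter()
    for text in texts:
        # Naive tokenizer (an assumption, standing in for the CLTK cleaners):
        # lowercase the text and take runs of Unicode word characters,
        # so it works for Greek as well as Latin script.
        tokens = re.findall(r"\w+", text.lower())
        counts.update(tokens)
    # most_common(n) returns (word, count) pairs, highest count first.
    return counts.most_common(n)


sample = [
    "Arma virumque cano, Troiae qui primus ab oris",
    "Italiam fato profugus Laviniaque venit litora",
    "multum ille et terris iactatus et alto",
]
print(word_frequencies(sample, n=3))
```

Counter orders elements with equal counts by first appearance, so ties come back in corpus order. The counts are over inflected forms only; no lemmatization is done, which matches the lists below.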

When I was a student, the ability to make such lists would have helped me make better use of the time I spent studying vocabulary.

Greek: The 10,000 most common Greek words, in their inflected forms, in the Classical Greek canon.

Latin: The 10,000 most common Latin words, in their inflected forms, in the Classical Latin canon.
