Cross-validates the accuracy of the CLTK's taggers, giving the mean and standard deviation of each. This is a good check of each tagger's accuracy and demonstrates that the models are not overfit to the training data.
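A minimal sketch of the idea, assuming `tagged_sents` is a POS training set in NLTK's list-of-tagged-sentences format (the fold count and the choice of `UnigramTagger` here are illustrative, not the notebook's exact setup):

```python
import statistics

from nltk.tag import UnigramTagger

def cross_validate(tagged_sents, folds=10):
    """k-fold cross-validation: train on k-1 folds, score the held-out
    fold, and report the mean and standard deviation of accuracy."""
    fold_size = len(tagged_sents) // folds
    scores = []
    for i in range(folds):
        test = tagged_sents[i * fold_size:(i + 1) * fold_size]
        train = tagged_sents[:i * fold_size] + tagged_sents[(i + 1) * fold_size:]
        tagger = UnigramTagger(train)
        scores.append(tagger.evaluate(test))  # accuracy on the held-out fold
    return statistics.mean(scores), statistics.stdev(scores)
```

A low standard deviation across folds is what suggests the model is not overfit to any one slice of the data.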
A high-level survey of average words per sentence across all of Ancient Greek literature and within several genres (e.g., history, romance, philosophy, epic, tragedy, comedy). This may not seem like much, but to my knowledge it is the first survey of its kind.
This notebook follows from "Greek authors' average words per sentence", looking instead at the PHI5 corpus of Latin. It offers basic table views sorted by words per sentence, total sentences, and total words, plus a view limited to Roman historians.
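For a sense of the underlying computation, here is a rough sketch. The sentence splitting is deliberately naive and the two-author `texts` dict is a toy stand-in for the full PHI5 corpus (in the notebooks, the CLTK's corpus readers and tokenizers do this work):

```python
import re

import pandas as pd

def sentence_stats(text):
    """Naive split on terminal punctuation, then average tokens per sentence."""
    sentences = [s.split() for s in re.split(r'[.;!?]+', text) if s.strip()]
    total_words = sum(len(s) for s in sentences)
    return total_words / len(sentences), len(sentences), total_words

# Toy stand-ins for full PHI5 author texts.
texts = {
    'Caesar': 'Gallia est omnis divisa in partes tres. Quarum unam incolunt Belgae.',
    'Sallust': 'Omnis homines qui sese student praestare ceteris animalibus summa ope niti decet.',
}

rows = []
for author, text in texts.items():
    wps, n_sents, n_words = sentence_stats(text)
    rows.append({'author': author, 'words per sentence': round(wps, 1),
                 'total sentences': n_sents, 'total words': n_words})

# The table views: sort by any of the three columns.
df = pd.DataFrame(rows).sort_values('words per sentence', ascending=False)
print(df.to_string(index=False))
```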
A quick overview of part-of-speech (POS) tagging using NLTK's unigram tagger trained on the CLTK's POS training set. Accuracy is evaluated, too.
A quick overview of part-of-speech (POS) tagging using NLTK's bigram tagger trained on the CLTK's POS training set. Accuracy is evaluated, too.
A quick overview of part-of-speech (POS) tagging using NLTK's trigram tagger trained on the CLTK's POS training set. Accuracy is evaluated, too.
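For all three, the training pattern is the same; here is a minimal sketch using NLTK's treebank sample as a stand-in for the CLTK's POS training set (the 90/10 split is illustrative):

```python
import nltk
from nltk.corpus import treebank  # stand-in for the CLTK's POS training set
from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger

nltk.download('treebank', quiet=True)
tagged_sents = list(treebank.tagged_sents())
cutoff = int(len(tagged_sents) * 0.9)
train, test = tagged_sents[:cutoff], tagged_sents[cutoff:]

# Each n-gram tagger backs off to the next-shorter context when it
# has not seen the current one in training.
unigram = UnigramTagger(train)
bigram = BigramTagger(train, backoff=unigram)
trigram = TrigramTagger(train, backoff=bigram)

for tagger in (unigram, bigram, trigram):
    print(tagger.__class__.__name__, tagger.evaluate(test))
```

Without the `backoff` chain, the bigram and trigram taggers alone score poorly, since most test contexts never occur verbatim in training.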
Demonstrates use of a TnT tagger for Greek. Its `evaluate()` method does not finish even after many hours of computing (on my machine, at least), though the tagger itself does work; see the example of it in action.
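A sketch of the TnT workflow with NLTK, again using the treebank sample as a stand-in for the Greek training data; to sidestep the slow full evaluation, this scores only a small held-out slice (the sizes are illustrative):

```python
import nltk
from nltk.corpus import treebank  # stand-in for the Greek training set
from nltk.tag import tnt

nltk.download('treebank', quiet=True)
tagged_sents = list(treebank.tagged_sents())
train, test = tagged_sents[:3000], tagged_sents[3000:3100]

tagger = tnt.TnT()
tagger.train(train)

# Tagging itself is fast; it is the full-corpus evaluate() that crawls.
print(tagger.tag('Pierre Vinken will join the board'.split()))
print(tagger.evaluate(test))  # a small slice keeps this tractable
```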