Important! This post contains incorrect accuracy scores. See http://cltk.org/blog/2015/08/02/corrected-stats-pos-tagger-accuracy.html for better information.

I have been playing around with NLTK's POS tagging, following along with Chapter 4 of Python Text Processing with NLTK 2.0 Cookbook by Jacob Perkins. (Note to self: get the recently released Python 3 Text Processing with NLTK 3 Cookbook.) I have built POS training sets out of the data in the Perseus treebanks, which I am currently collecting in several CLTK repositories for Greek and Latin.

The following notebooks (Unigram POS tagging, Bigram POS tagging, and Trigram POS tagging) evaluate the accuracy of these algorithms when trained on this data. The evaluate() method runs the trained tagger over the original training sentences and measures how accurate its output is against the known-good tags.
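
Training one of these taggers and scoring it works roughly like this. This is only a sketch: train_sents stands in for the list of tagged sentences built from the treebank data, each a list of (word, tag) tuples.

from nltk.tag import UnigramTagger

# train_sents: tagged sentences, each a list of (word, tag) tuples
unigram_tagger = UnigramTagger(train_sents)

# Score the tagger against the same sentences it was trained on
print(unigram_tagger.evaluate(train_sents))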

The results for this little experiment are:

Greek and Latin tagger accuracy

Tagger           Greek               Latin
UnigramTagger()  0.9196123340065213  0.8873793350017877
BigramTagger()   0.8125528866223641  0.7211862333703404
TrigramTagger()  0.8101779247007322  0.8162128596428504

The Greek unigram and bigram taggers were more accurate than their Latin counterparts. Both unigram taggers were significantly more accurate than the bigram and trigram taggers. The most interesting discrepancy here is the Latin bigram tagger, which is roughly 17 percentage points less accurate than the unigram and 10 points less than the trigram. Because it always conditions on just the one preceding word, the bigram tagger misinterprets ambiguous forms.

Each of these taggers probably has its own strengths. The next step would be to combine them in a backoff chain, like so:

from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger

tagger = backoff_tagger(train_sents, [UnigramTagger, BigramTagger,
    TrigramTagger], backoff=backoff)
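
Note that backoff_tagger() is not part of NLTK itself but a small helper from the Cookbook's companion code; a minimal sketch of what it does, assuming that version, looks like this:

def backoff_tagger(train_sents, tagger_classes, backoff=None):
    # Train each tagger class in turn, handing it the previously
    # trained tagger as its backoff
    for cls in tagger_classes:
        backoff = cls(train_sents, backoff=backoff)
    return backoff

With the ordering above, the returned tagger is the TrigramTagger, which falls back to the bigram and then the unigram tagger whenever it cannot tag a token, so the order of the list matters.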

My next step will be to experiment with backoff_tagger(), varying which of the three taggers are included and their order.