Resource: Train a Custom Model for Book Title Recognition using OpenNLP

OpenNLP provides trained models for Part-of-Speech (POS) tagging. It also provides trained models for Named Entity Recognition (NER), which identifies common structures such as names, locations, and organizations. These models serve general language-processing needs, but I am working in the domain of literature, where the system needs additional knowledge to extract domain-specific structures. I have shown how POS tags can be chunked into noun phrases to help identify book titles, but out of five noun phrases only one was a title. I have shown how NER can pull names out of text, potentially book authors, but a name could just as well belong to a character in a novel. In this post, I show how I trained a custom model for book title recognition using OpenNLP.
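The training data itself is not reproduced here, so what follows is a hedged sketch that assumes the custom model is trained with OpenNLP's name finder. That component reads whitespace-tokenized text, one sentence per line, with entities marked inline as `<START:type> ... <END>`. The `book` type, the class name `TitleAnnotations`, and the example sentences are hypothetical. Using only the Java standard library, the code parses such lines into tokens and labeled spans. This is roughly what OpenNLP's `NameSampleDataStream` does before samples are passed to `NameFinderME.train`.

```java
import java.util.ArrayList;
import java.util.List;

public class TitleAnnotations {

    /** A labeled token span [start, end) with an entity type such as "book" (hypothetical type). */
    record Span(int start, int end, String type) {}

    /** One parsed training sentence: the plain tokens plus the labeled spans over them. */
    record Sample(List<String> tokens, List<Span> spans) {}

    /**
     * Parses one whitespace-tokenized sentence in name-finder training format,
     * e.g. "I re-read <START:book> Moby Dick <END> last summer ."
     * Untyped "<START>" spans are labeled "default" in this sketch.
     */
    static Sample parse(String line) {
        List<String> tokens = new ArrayList<>();
        List<Span> spans = new ArrayList<>();
        String openType = null; // type of the currently open span, or null if none
        int openStart = -1;     // token index where the open span began
        for (String tok : line.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue; // blank input line yields a single empty token
            }
            if (tok.startsWith("<START") && tok.endsWith(">")) {
                if (openType != null) {
                    throw new IllegalArgumentException("Nested <START> in: " + line);
                }
                int colon = tok.indexOf(':');
                openType = colon < 0 ? "default" : tok.substring(colon + 1, tok.length() - 1);
                openStart = tokens.size();
            } else if (tok.equals("<END>")) {
                if (openType == null) {
                    throw new IllegalArgumentException("<END> without <START> in: " + line);
                }
                spans.add(new Span(openStart, tokens.size(), openType));
                openType = null;
            } else {
                tokens.add(tok); // ordinary token, part of the sentence text
            }
        }
        if (openType != null) {
            throw new IllegalArgumentException("Unclosed <START> in: " + line);
        }
        return new Sample(tokens, spans);
    }

    /** Rebuilds the surface text covered by a span. */
    static String text(Sample s, Span span) {
        return String.join(" ", s.tokens().subList(span.start(), span.end()));
    }

    public static void main(String[] args) {
        // Hypothetical annotated sentences; a real training file would need many more.
        String[] trainingLines = {
            "I re-read <START:book> Moby Dick <END> last summer .",
            "Melville wrote <START:book> Moby Dick <END> in 1851 .",
            "She compared <START:book> Pride and Prejudice <END> with <START:book> Emma <END> ."
        };
        for (String line : trainingLines) {
            Sample s = parse(line);
            for (Span span : s.spans()) {
                System.out.println(span.type() + ": " + text(s, span));
            }
        }
    }
}
```

Running `main` prints each annotated title with its type. With OpenNLP on the classpath, a file of such lines would typically be wrapped in a `PlainTextByLineStream` and then a `NameSampleDataStream` before training. The exact `NameFinderME.train` overload differs between OpenNLP versions, so check the documentation for the version in use.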

Source: Miedema, John. "Train a custom model for book title recognition using OpenNLP." Separate signal from noise like SETI@home.