Resource: Extracting Structured Data From Recipes Using Conditional Random Fields

[…]Until recently, the collection and maintenance of this structured data was a completely manual process. For years, overnight contractors have entered recipes, dropdown by dropdown, into a gray and white web form that lives in our content management system (CMS). Since the database breaks down each ingredient by name, unit, quantity and comment, an average recipe requires over 50 fields, and that number can climb above 100 for more complicated recipes.

I long suspected that the manual process of entering recipes into the database could be replaced with an algorithmic solution. The field of Natural Language Processing (NLP) has developed powerful algorithms to solve similar tasks over the past decade. If a computer can identify the part of speech of each word in a sentence, it should be able to identify an ingredient quantity from an ingredient phrase.
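To make the part-of-speech analogy concrete, here is a minimal sketch of the ingredient problem framed as token labeling: each token in an ingredient phrase gets one label, and the labels are grouped into the four database fields mentioned above. This is not the conditional random field the title refers to. A few hand-written rules stand in for a learned model, only to show the shape of the input and output. The label names, the `UNITS` list, the comma rule, and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical label set, named after the four database fields in the post.
QTY, UNIT, NAME, COMMENT, OTHER = "QTY", "UNIT", "NAME", "COMMENT", "OTHER"

# Small illustrative unit list (an assumption, not taken from the post).
UNITS = {"cup", "cups", "tablespoon", "tablespoons", "teaspoon", "teaspoons",
         "ounce", "ounces", "pound", "pounds"}

NUMBER = re.compile(r"\d+/\d+|\d+(?:\.\d+)?")
# Fractions, plain numbers, (hyphenated) words, or a single punctuation mark.
TOKEN = re.compile(r"\d+/\d+|\d+(?:\.\d+)?|[A-Za-z]+(?:-[A-Za-z]+)*|[^\sA-Za-z\d]")


def tokenize(phrase):
    """Split an ingredient phrase into tokens, dropping whitespace."""
    return TOKEN.findall(phrase)


def tag(tokens):
    """Assign one label per token using hand-written rules.

    A trained sequence model would replace this function; the rules here
    only illustrate the labeling task itself.
    """
    labels = []
    after_comma = False
    for tok in tokens:
        if tok == ",":
            after_comma = True
            labels.append(OTHER)
        elif after_comma:
            # Assumption: everything after the first comma is a comment.
            labels.append(COMMENT)
        elif NUMBER.fullmatch(tok):
            labels.append(QTY)
        elif tok.lower() in UNITS:
            labels.append(UNIT)
        else:
            labels.append(NAME)
    return labels


def to_record(tokens, labels):
    """Group labeled tokens into one string per database field."""
    record = {QTY: [], UNIT: [], NAME: [], COMMENT: []}
    for tok, label in zip(tokens, labels):
        if label in record:
            record[label].append(tok)
    return {label.lower(): " ".join(toks) for label, toks in record.items()}


tokens = tokenize("2 1/2 cups all-purpose flour, sifted")
labels = tag(tokens)
print(list(zip(tokens, labels)))
print(to_record(tokens, labels))
```

Rules like these break quickly on real recipes (for example, "1 pound cake" or a phrase with no comma before the comment), which is the kind of ambiguity a statistical sequence model is meant to handle by learning from labeled examples.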

Read the full post here.