Probabilistic Declarative Information Extraction

Abstract: Unstructured text represents a large fraction of the world's data. It often contains snippets of structured information (e.g., people's names and zip codes). Information Extraction (IE) techniques identify such structured information in text. In recent years, database research has pursued IE on two fronts: declarative languages and systems for managing IE tasks, and probabilistic databases for querying the output of IE. In this paper, we take the first steps toward merging these two directions, without loss of statistical robustness, by implementing a state-of-the-art statistical IE model, Conditional Random Fields (CRFs), in the setting of a Probabilistic Database that treats statistical models as first-class data objects. We show that the Viterbi algorithm for CRF inference can be specified declaratively in recursive SQL. We also show the performance benefits relative to a standalone open-source Viterbi implementation. This work opens up the optimization oppo...
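The abstract states that Viterbi inference for a linear-chain CRF can be written as recursive SQL. The paper's SQL is not reproduced here. As a hedged illustration only, the Python sketch below stores the model as a hypothetical "factor relation" of (pos, prev_label, label, log_score) rows and evaluates the standard Viterbi recurrence, V(t, y) = max over y' of V(t-1, y') + score(t, y', y), in the shape a recursive query would take: a base case at position 0, then a repeated join of the previous V rows with the factor rows, keeping the MAX per label. The row layout, the function name `viterbi`, and the toy scores are assumptions, not the authors' schema or code.

```python
from typing import Dict, Hashable, List, Optional, Tuple

# Hypothetical factor-relation row: (pos, prev_label, label, log_score).
# prev_label is None at position 0, where there is no predecessor.
FactorRow = Tuple[int, Optional[Hashable], Hashable, float]


def viterbi(factors: List[FactorRow], length: int) -> Tuple[List[Hashable], float]:
    """Return the highest-scoring label sequence and its total log score."""
    # Group factor rows by position (think: an index on factor.pos).
    by_pos: Dict[int, List[FactorRow]] = {}
    for row in factors:
        by_pos.setdefault(row[0], []).append(row)

    # V relation: (pos, label) -> (best score of a prefix ending in label, argmax prev label).
    # Base case, analogous to the non-recursive branch of a recursive query.
    v: Dict[Tuple[int, Hashable], Tuple[float, Optional[Hashable]]] = {}
    for _, _, label, score in by_pos[0]:
        v[(0, label)] = (score, None)

    # Recursive step: join V at pos-1 with the factors at pos, keep the MAX per label.
    for pos in range(1, length):
        for _, prev, label, score in by_pos[pos]:
            if (pos - 1, prev) not in v:
                continue
            cand = v[(pos - 1, prev)][0] + score
            if (pos, label) not in v or cand > v[(pos, label)][0]:
                v[(pos, label)] = (cand, prev)

    # Choose the best label at the last position, then follow back-pointers.
    last = length - 1
    best_label = max((lab for (p, lab) in v if p == last), key=lambda lab: v[(last, lab)][0])
    best_score = v[(last, best_label)][0]
    path = [best_label]
    for pos in range(last, 0, -1):
        path.append(v[(pos, path[-1])][1])
    path.reverse()
    return path, best_score


if __name__ == "__main__":
    # Toy two-token model with made-up log scores (illustrative only).
    factors = [
        (0, None, "NAME", 2.0), (0, None, "O", 1.0),
        (1, "NAME", "NAME", 0.5), (1, "NAME", "O", 1.5),
        (1, "O", "NAME", 3.0), (1, "O", "O", 0.0),
    ]
    print(viterbi(factors, 2))  # (['O', 'NAME'], 4.0)
```

Each pass of the loop over `pos` plays the role of one iteration of the recursive query, and the back-pointer column is what lets the best sequence be reconstructed after the MAX has been taken at every position.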
Added 20 Dec 2009
Updated 03 Jan 2010
Type Conference
Year 2010
Where ICDE
Authors Daisy Zhe Wang, Eirinaios Michelakis, Joseph M. Hellerstein, Michael J. Franklin, Minos N. Garofalakis