Abstract
This paper addresses the task of aligning a database with a corresponding text. The goal is to link individual database entries with sentences that verbalize the same information. By providing explicit semantics-to-text links, these alignments can aid the training of natural language generation and information extraction systems. Beyond these pragmatic benefits, the alignment problem is appealing from a modeling perspective: the mappings between database entries and text sentences exhibit rich structural dependencies, unique to this task. Thus, the key challenge is to make use of as many global dependencies as possible without sacrificing tractability. To this end, we cast text-database alignment as a structured multilabel classification task where each sentence is labeled with a subset of matching database entries. In contrast to existing multilabel classifiers, our approach operates over arbitrary global features of inputs and proposed labels. We compare our model with a baseline classifier that makes locally optimal decisions. Our results show that the proposed model yields a 15% relative reduction in error, and compares favorably with human performance.
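To make the setup concrete, below is a minimal sketch of the structured multilabel decision the abstract describes: each sentence is assigned the subset of database entries whose joint score, computed from global features of the sentence and the whole proposed label set, is highest. The feature function, weights, and brute-force enumeration over subsets here are illustrative assumptions for a toy database, not the paper's actual model or learning algorithm.

```python
from itertools import combinations

# Hypothetical global feature function: maps a (sentence, entry-subset)
# pair to feature values defined over the whole label set, not one entry
# at a time. A real model would use learned weights over richer features.
def global_features(sentence, entries):
    return {
        "num_entries": len(entries),  # size of the proposed label set
        "word_overlap": sum(          # crude lexical overlap signal
            len(set(sentence.split()) & set(e.split())) for e in entries
        ),
    }

def score(sentence, entries, weights):
    feats = global_features(sentence, entries)
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

def align(sentence, database, weights, max_size=3):
    """Label a sentence with the highest-scoring subset of database entries.

    Enumerates all subsets up to max_size, which is tractable only for
    small databases; the paper's contribution is handling such global
    dependencies without this exhaustive search.
    """
    best, best_score = (), float("-inf")
    for k in range(max_size + 1):
        for subset in combinations(database, k):
            s = score(sentence, subset, weights)
            if s > best_score:
                best, best_score = subset, s
    return best

# Toy usage with assumed weights.
db = ["goals 2", "assists 1", "team Rangers"]
weights = {"num_entries": -0.1, "word_overlap": 1.0}
print(align("He scored 2 goals for the Rangers", db, weights))
```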
Publication Info
- Year: 2007
- Type: article
- Pages: 1713-1718