Abstract
We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. The entire network is trained jointly on all these tasks using weight-sharing, an instance of multitask learning. All the tasks use labeled data except the language model, which is learnt from unlabeled text and represents a novel form of semi-supervised learning for the shared tasks. We show how both multitask learning and semi-supervised learning improve the generalization of the shared tasks, resulting in state-of-the-art performance.
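To make the weight-sharing idea concrete, the sketch below shows hard parameter sharing in PyTorch: a single embedding-plus-convolution encoder is shared by all tasks, and each task gets its own small linear head, so gradients from every task update the shared weights. This is a minimal illustration, not the paper's implementation; all module names, layer sizes, and tag-set sizes are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Embedding + convolution layers shared by every task (illustrative sizes)."""
    def __init__(self, vocab_size=30000, emb_dim=50, hidden=100, kernel=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, hidden, kernel, padding=kernel // 2)

    def forward(self, token_ids):             # (batch, seq_len)
        x = self.embed(token_ids)             # (batch, seq_len, emb_dim)
        x = self.conv(x.transpose(1, 2))      # (batch, hidden, seq_len)
        return torch.relu(x).transpose(1, 2)  # (batch, seq_len, hidden)

class MultiTaskTagger(nn.Module):
    """One shared encoder, one linear head per task (hard parameter sharing)."""
    def __init__(self, encoder, hidden, task_sizes):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n_tags) for task, n_tags in task_sizes.items()}
        )

    def forward(self, token_ids, task):
        # Per-token logits for the requested task; the encoder is reused
        # unchanged across tasks, which is what "weight-sharing" means here.
        return self.heads[task](self.encoder(token_ids))

# Hypothetical tag-set sizes for POS tagging, chunking, and NER.
model = MultiTaskTagger(SharedEncoder(), hidden=100,
                        task_sizes={"pos": 45, "chunk": 23, "ner": 9})
logits = model(torch.randint(0, 30000, (2, 12)), task="pos")
print(logits.shape)  # torch.Size([2, 12, 45])
```

Joint training would alternate minibatches between tasks; a language-model head trained on unlabeled text could be added as one more head over the same encoder, which is how the abstract's semi-supervised signal reaches the shared layers.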
Related Publications
Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations (Short Paper)
This research focuses on assessing the ability of large language models (LLMs) in representing geometries and their spatial relations. We utilize LLMs including GPT-2 and BERT t...
UNet++: A Nested U-Net Architecture for Medical Image Segmentation
In this paper, we present UNet++, a new, more powerful architecture for medical image segmentation. Our architecture is essentially a deeply-supervised encoder-decoder network w...
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also co...
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increas...
Momentum Contrast for Unsupervised Visual Representation Learning
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic diction...
Publication Info
- Year: 2008
- Type: article
- Pages: 160-167
- Citations: 5151
- Access: Closed
Identifiers
- DOI: 10.1145/1390156.1390177