Technical terminology: some linguistic properties and an algorithm for identification in text

John S. Justeson; Slava M. Katz

doi:10.1017/s1351324900000048

Abstract

This paper identifies some linguistic properties of technical terminology, and uses them to formulate an algorithm for identifying technical terms in running text. The grammatical properties discussed are preferred phrase structures: technical terms consist mostly of noun phrases containing adjectives, nouns, and occasionally prepositions; terms rarely contain verbs, adverbs, or conjunctions. The discourse properties are patterns of repetition that distinguish noun phrases that are technical terms, especially the multi-word phrases that constitute a substantial majority of all technical vocabulary, from other types of noun phrase. The paper presents a terminology identification algorithm that is motivated by these linguistic properties. An implementation of the algorithm is described; it recovers a high proportion of the technical terms in a text, and a high proportion of the recovered strings are valid technical terms. The algorithm proves to be effective regardless of the domain of the text to which it is applied.
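The two properties named in the abstract (a preferred part-of-speech structure and repetition within a text) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration, not a detail stated in the abstract: the input is assumed to be already tagged with one-letter tags (`A` adjective, `N` noun, `P` preposition, anything else ignored), the tag pattern is one plausible way to encode "adjectives and nouns, occasionally a preposition, ending in a noun", and `candidate_terms` and `min_count` are hypothetical names.

```python
import re
from collections import Counter

# Assumed tag pattern (illustrative): a multi-word run of adjectives/nouns,
# optionally containing one noun + preposition pair, and ending in a noun.
TERM_PATTERN = re.compile(r"(?:[AN]+|[AN]*NP[AN]*)N")


def candidate_terms(tagged_sentences, min_count=2):
    """Return {phrase: count} for multi-word phrases whose tag sequence
    matches TERM_PATTERN and that occur at least min_count times.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        words = [word.lower() for word, _ in sentence]
        # Collapse every tag outside the assumed tag set to a filler symbol.
        tags = "".join(t if t in ("A", "N", "P") else "x" for _, t in sentence)
        # Test every span of two or more tokens, so nested candidates
        # (e.g. a two-word phrase inside a three-word one) are counted too.
        for start in range(len(tags)):
            for end in range(start + 2, len(tags) + 1):
                if TERM_PATTERN.fullmatch(tags, start, end):
                    counts[" ".join(words[start:end])] += 1
    # Repetition filter: keep only phrases seen at least min_count times.
    return {term: n for term, n in counts.items() if n >= min_count}
```

Called on a list of tagged sentences, the function returns only the repeated candidates; lowering `min_count` to 1 disables the repetition filter and keeps every phrase that fits the structural pattern. A real system would also need a tagger or lexicon to produce the tags, which this sketch leaves out.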

Keywords

Computer science, Noun phrase, Terminology, Natural language processing, Linguistics, Noun, Vocabulary, Artificial intelligence, Phrase, Domain (mathematical analysis), Identification (biology), Determiner phrase, Mathematics

Affiliated Institutions

University at Albany, State University of New York US

Related Publications

Analysis of polarity information in medical text.

Yun Niu, Xiaodan Zhu, Jianhua Li +1 more

Knowing the polarity of clinical outcomes is important in answering questions posed by clinicians in patient treatment. We treat analysis of this information as a classification...

2005 PubMed 83 citations

A fuzzy set approach to modifiers and vagueness in natural language.

Harry M. Hersh, Alfonso Caramazza

SUMMARY Recent developments in semantic theory, such as the work of Labov (1973) and Lakoff (1973), have brought into question the assumption that meanings are precise. It has b...

1976 Journal of Experimental Psychology Ge... 309 citations

Term Extraction and Automatic Indexing

Christian Jacquemin, Didier Bourigault

Terms are pervasive in scientific and technical documents and their identification is a crucial issue for any application dealing with the analysis, understanding, generation, o...

2012 Oxford University Press eBooks 95 citations

Publicly Available Clinical

Emily Alsentzer, John R. Murphy, William Boag +4 more

Contextual word embedding models such as ELMo and BERT have dramatically improved performance for many natural language processing (NLP) tasks in recent months. However, these m...

2019 Proceedings of the 2nd Clinical Natur... 1422 citations

A Comparative Study on Feature Selection in Text Categorization

Yiming Yang, Jan Pedersen

This paper is a comparative study of feature selection methods in statistical learning of text categorization. The focus is on aggressive dimensionality reduction. Five methods ...

1997 4766 citations

Publication Info

Year: 1995
Type: article
Volume: 1
Issue: 1
Pages: 9-27
Citations: 812
Access: Closed


Citation Metrics

812

OpenAlex

Influential

340

CrossRef

Cite This

APA Style

                            
                                    John S. Justeson, 
                                
                                    Slava M. Katz
                                
                            (1995). 
                            Technical terminology: some linguistic properties and an algorithm for identification in text. 
                            Natural Language Engineering
                            , 1
                            (1)
                            , 9-27.
                            https://doi.org/10.1017/s1351324900000048

Identifiers

DOI: 10.1017/s1351324900000048
