Abstract

This paper identifies some linguistic properties of technical terminology, and uses them to formulate an algorithm for identifying technical terms in running text. The grammatical properties discussed are preferred phrase structures: technical terms consist mostly of noun phrases containing adjectives, nouns, and occasionally prepositions; rarely do terms contain verbs, adverbs, or conjunctions. The discourse properties are patterns of repetition that distinguish noun phrases that are technical terms, especially those multi-word phrases that constitute a substantial majority of all technical vocabulary, from other types of noun phrase. The paper presents a terminology identification algorithm that is motivated by these linguistic properties. An implementation of the algorithm is described; it recovers a high proportion of the technical terms in a text, and a high proportion of the recovered strings are valid technical terms. The algorithm proves to be effective regardless of the domain of the text to which it is applied.
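
The two properties named in the abstract, a preferred part-of-speech structure and repetition, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the single-letter tags (A for adjective, N for noun, P for preposition), the exact pattern, the function name candidate_terms, and the repetition threshold min_count. The input is assumed to be already part-of-speech tagged.

```python
import re
from collections import Counter

# Assumed pattern over one-letter tags: a run of adjectives/nouns, optionally
# containing one noun + preposition link, always ending in a noun.
# Verbs, adverbs, conjunctions, etc. are mapped to "x" and never match.
TERM_PATTERN = re.compile(r"(?:[AN]+|[AN]*NP[AN]*)N")
ALLOWED_TAGS = {"A", "N", "P"}


def candidate_terms(tagged_tokens, min_count=2):
    """Return multi-word phrases that match the pattern and repeat.

    tagged_tokens: list of (word, tag) pairs; tags outside A/N/P are ignored.
    min_count: hypothetical repetition threshold for keeping a phrase.
    """
    words = [word.lower() for word, _ in tagged_tokens]
    tags = "".join(tag if tag in ALLOWED_TAGS else "x" for _, tag in tagged_tokens)

    counts = Counter()
    for start in range(len(tags)):
        for end in range(start + 2, len(tags) + 1):  # spans of 2+ words
            if tags[end - 1] == "x":
                break  # a disallowed tag ends every span starting here
            if TERM_PATTERN.fullmatch(tags, start, end):
                counts[" ".join(words[start:end])] += 1

    # Simplification: sub-phrases of a longer match are counted as well.
    return {term: n for term, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    sample = [
        ("the", "D"), ("central", "A"), ("processing", "N"), ("unit", "N"),
        ("reads", "V"), ("data", "N"), (".", "x"),
        ("the", "D"), ("central", "A"), ("processing", "N"), ("unit", "N"),
        ("is", "V"), ("fast", "A"),
    ]
    print(candidate_terms(sample))
```

The sketch counts every matching span, so a repeated three-word phrase also contributes its two-word sub-phrases; a fuller implementation would need a policy for nested candidates, which the abstract does not specify.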

Keywords

Computer science, Noun phrase, Terminology, Natural language processing, Linguistics, Noun, Vocabulary, Artificial intelligence, Phrase, Domain (mathematical analysis), Identification (biology), Determiner phrase, Mathematics

Publication Info

Year: 1995
Type: article
Volume: 1
Issue: 1
Pages: 9-27
Citations: 812
Access: Closed

Citation Metrics

OpenAlex: 812
Influential: 58
CrossRef: 340

Cite This

John S. Justeson, Slava M. Katz (1995). Technical terminology: some linguistic properties and an algorithm for identification in text. Natural Language Engineering, 1(1), 9-27. https://doi.org/10.1017/s1351324900000048

Identifiers

DOI
10.1017/s1351324900000048
