Abstract
This paper identifies some linguistic properties of technical terminology, and uses them to formulate an algorithm for identifying technical terms in running text. The grammatical properties discussed are preferred phrase structures: technical terms consist mostly of noun phrases containing adjectives, nouns, and occasionally prepositions; rarely do terms contain verbs, adverbs, or conjunctions. The discourse properties are patterns of repetition that distinguish noun phrases that are technical terms, especially those multi-word phrases that constitute a substantial majority of all technical vocabulary, from other types of noun phrase. The paper presents a terminology identification algorithm that is motivated by these linguistic properties. An implementation of the algorithm is described; it recovers a high proportion of the technical terms in a text, and a high proportion of the recovered strings are valid technical terms. The algorithm proves to be effective regardless of the domain of the text to which it is applied.
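The two properties described in the abstract, a preferred phrase structure and repetition, can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the one-letter tags (`A` adjective, `N` noun, `P` preposition, `O` anything else), the regular expression over those tags, the `min_count` threshold of two, and the `max_len` bound. The input is assumed to be already part-of-speech tagged by some external tagger.

```python
import re
from collections import Counter

# Assumed tag alphabet: A = adjective, N = noun, P = preposition, O = other.
# Assumed pattern: a multi-word noun phrase built from adjectives and nouns,
# optionally linked by one noun + preposition, and always ending in a noun.
CANDIDATE = re.compile(r"(?:[AN]+|[AN]*NP[AN]*)N")


def candidate_terms(tagged, min_count=2, max_len=6):
    """Return {phrase: count} for repeated multi-word spans matching CANDIDATE.

    tagged    -- list of (word, tag) pairs, each tag a single letter
    min_count -- hypothetical repetition threshold
    max_len   -- hypothetical upper bound on phrase length in words
    """
    words = [word.lower() for word, _ in tagged]
    tags = "".join(tag for _, tag in tagged)
    if len(tags) != len(words):
        raise ValueError("each tag must be exactly one character")

    counts = Counter()
    for start in range(len(tags)):
        last_end = min(len(tags), start + max_len)
        # Spans of at least two words; every matching sub-span is counted.
        for end in range(start + 2, last_end + 1):
            if CANDIDATE.fullmatch(tags, start, end):
                counts[" ".join(words[start:end])] += 1

    # Repetition filter: keep only phrases seen at least min_count times.
    return {term: n for term, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    sample = [
        ("the", "O"), ("central", "A"), ("processing", "N"), ("unit", "N"),
        ("runs", "O"), ("fast", "O"), ("and", "O"),
        ("the", "O"), ("central", "A"), ("processing", "N"), ("unit", "N"),
        ("heats", "O"),
    ]
    for term, n in sorted(candidate_terms(sample).items()):
        print(n, term)
```

The sketch counts every matching sub-span, so nested candidates such as "processing unit" inside "central processing unit" are reported separately; a fuller implementation would need a policy for such overlaps, which the abstract does not specify.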
Related Publications
Analysis of polarity information in medical text.
Knowing the polarity of clinical outcomes is important in answering questions posed by clinicians in patient treatment. We treat analysis of this information as a classification...
A fuzzy set approach to modifiers and vagueness in natural language.
SUMMARY Recent developments in semantic theory, such as the work of Labov (1973) and Lakoff (1973), have brought into question the assumption that meanings are precise. It has b...
Term Extraction and Automatic Indexing
Terms are pervasive in scientific and technical documents and their identification is a crucial issue for any application dealing with the analysis, understanding, generation, o...
Publicly Available Clinical
Contextual word embedding models such as ELMo and BERT have dramatically improved performance for many natural language processing (NLP) tasks in recent months. However, these m...
A Comparative Study on Feature Selection in Text Categorization
This paper is a comparative study of feature selection methods in statistical learning of text categorization. The focus is on aggressive dimensionality reduction. Five methods ...
Publication Info
- Year: 1995
- Type: article
- Volume: 1
- Issue: 1
- Pages: 9-27
- Citations: 812
- Access: Closed
Identifiers
- DOI: 10.1017/s1351324900000048