Word Sense Disambiguation
Edited by Eneko Agirre and Philip Edmonds
Chapter 6: Unsupervised Corpus-Based Methods for WSD
Ted Pedersen
Abstract
This chapter focuses on unsupervised corpus-based methods of word sense discrimination that are knowledge-lean and do not rely on external knowledge sources such as machine-readable dictionaries, concept hierarchies, or sense-tagged text. They do not assign sense tags to words; rather, they discriminate among word meanings based on information found in unannotated corpora. This chapter reviews distributional approaches that rely on monolingual corpora and methods based on translational equivalence as found in word-aligned parallel corpora. These techniques are organized into type- and token-based approaches. The former identify sets of related words, while the latter distinguish among the senses of a word used in multiple contexts.
Links
Latent Semantic Analysis (LSA)
Clustering By Committee (CBC)
SenseClusters Perl package
Contents
6.1 Introduction
6.1.1 Scope
6.1.2 Motivation
Distributional methods
Translational equivalence
6.1.3 Approaches
6.2 Type-based discrimination
6.2.1 Representation of context
6.2.2 Algorithms
Latent Semantic Analysis (LSA)
Hyperspace Analogue to Language (HAL)
Clustering By Committee (CBC)
6.2.3 Discussion
6.3 Token-based discrimination
6.3.1 Representation of context
6.3.2 Algorithms
Context group discrimination
McQuitty's similarity analysis
6.3.3 Discussion
6.4 Translational equivalence
6.4.1 Representation of context
6.4.2 Algorithms
6.4.3 Discussion
6.5 Conclusions and the way forward
Acknowledgements
References
Copyright © 2006 Springer. All rights reserved.