Chapter 7: Supervised Corpus-Based Methods for WSD
Lluís Màrquez, Gerard Escudero, David Martínez, German Rigau
Abstract
This chapter presents the supervised approach to word sense disambiguation (WSD), which consists of automatically inducing classification models or rules from annotated examples. We start by introducing the machine learning framework for classification and some important related concepts. We then review the main approaches in the literature, focusing on the following issues: learning paradigms, corpora used, sense repositories, and feature representation. We also describe in more detail five statistical and machine learning algorithms, which are evaluated and compared experimentally on the DSO corpus. Finally, we briefly discuss the current challenges of the supervised learning approach to WSD.
Links
SVMlight Support Vector Machine implementation
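The core idea in the abstract, inducing a classifier from annotated examples and applying it to new occurrences of a word, can be illustrated with one of the five algorithms the chapter studies, Naive Bayes. The following is a minimal sketch under stated assumptions, not the chapter's exact configuration: each example is assumed to be a bag of context words, probabilities use add-one (Laplace) smoothing, unseen words are ignored, and the training examples and sense labels for "bank" are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Estimate sense frequencies and per-sense word counts from
    (context_words, sense) pairs (bag-of-words features: an assumption)."""
    sense_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        for w in words:
            feature_counts[sense][w] += 1
            vocabulary.add(w)
    return sense_counts, feature_counts, vocabulary


def classify(model, words):
    """Return the sense maximising log P(s) + sum_w log P(w | s).
    Uses add-one smoothing; words never seen in training are skipped."""
    sense_counts, feature_counts, vocabulary = model
    total = sum(sense_counts.values())
    v = len(vocabulary)
    best_sense, best_score = None, float("-inf")
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # log prior of the sense
        n_feats = sum(feature_counts[sense].values())
        for w in words:
            if w not in vocabulary:
                continue
            score += math.log((feature_counts[sense][w] + 1) / (n_feats + v))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical toy training data for the ambiguous word "bank".
train = [
    (["money", "deposit", "account"], "bank/FINANCE"),
    (["loan", "money", "interest"], "bank/FINANCE"),
    (["river", "water", "fishing"], "bank/RIVER"),
    (["river", "grass", "shore"], "bank/RIVER"),
]
model = train_naive_bayes(train)
print(classify(model, ["deposit", "money"]))  # → bank/FINANCE
print(classify(model, ["river", "water"]))    # → bank/RIVER
```

Working in log space avoids numeric underflow when many context features are multiplied together. Real systems use much richer feature sets (local collocations, part-of-speech tags, topical words) than this bag-of-words toy.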
Contents
7.1 Introduction to supervised WSD
7.1.1 Machine learning for classification
An example on WSD
7.2 A survey of supervised WSD
7.2.1 Main corpora used
7.2.2 Main sense repositories
7.2.3 Representation of examples by means of features
7.2.4 Main approaches to supervised WSD
Probabilistic methods
Methods based on the similarity of the examples
Methods based on discriminating rules
Methods based on rule combination
Linear classifiers and kernel-based approaches
Discourse properties: The Yarowsky bootstrapping algorithm
7.2.5 Supervised systems in the Senseval evaluations
7.3 An empirical study of supervised algorithms for WSD
7.3.1 Five learning algorithms under study
Naive Bayes (NB)
Exemplar-based learning (kNN)
Decision lists (DL)
AdaBoost (AB)
Support Vector Machines (SVM)
7.3.2 Empirical evaluation on the DSO corpus
Experiments
7.4 Current challenges of the supervised approach
7.4.1 Right-sized training sets
7.4.2 Porting across corpora
7.4.3 The knowledge acquisition bottleneck
Automatic acquisition of training examples
Active learning
Combining training examples from different words
Parallel corpora
7.4.4 Bootstrapping
7.4.5 Feature selection and parameter optimization
7.4.6 Combination of algorithms and knowledge sources
7.5 Conclusions and future trends
Acknowledgements
References