
Combining Labeled and Unlabeled data for semi-supervised learning

    July 22, 2009 - Aditi.

    In machine learning, a small labeled training set often leads to problems such as overfitting and poor generalization. I'll talk about the co-training algorithm and some of its variants, and how they can make effective use of unlabeled data to improve the accuracy of the learned model.
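    To make the idea concrete, here is a minimal sketch of a basic co-training loop in Python. It assumes each example has two views, each a list of tokens (for instance, words on a web page and words in the hyperlinks pointing to it, the setting Blum and Mitchell use), and it trains a small hand-rolled naive Bayes classifier per view. The function names (`train_nb`, `prob_positive`, `co_train`, `predict`), the toy data, and the defaults `p=1`, `n=1` are hypothetical. Treating the whole unlabeled set as the candidate pool, rather than a smaller pool that is replenished each round, is a simplification; none of this is taken from the papers.

```python
import math
from collections import Counter

def train_nb(examples):
    """Fit a multinomial naive Bayes model on (tokens, label) pairs, label in {0, 1}."""
    class_counts = Counter(label for _, label in examples)
    token_counts = {0: Counter(), 1: Counter()}
    for tokens, label in examples:
        token_counts[label].update(tokens)
    vocab = set(token_counts[0]) | set(token_counts[1])
    return class_counts, token_counts, vocab

def prob_positive(model, tokens):
    """Return P(label = 1 | tokens), using Laplace (add-one) smoothing."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = []
    for c in (0, 1):
        score = math.log((class_counts[c] + 1) / (total + 2))
        # The extra +1 reserves probability mass for tokens never seen in training.
        denom = sum(token_counts[c].values()) + len(vocab) + 1
        for t in tokens:
            score += math.log((token_counts[c][t] + 1) / denom)
        log_scores.append(score)
    # Normalize in log space to avoid underflow on long token lists.
    m = max(log_scores)
    p0, p1 = (math.exp(s - m) for s in log_scores)
    return p1 / (p0 + p1)

def co_train(labeled, unlabeled, rounds=10, p=1, n=1):
    """Simplified co-training loop (hypothetical helper).

    labeled:   list of ((view0_tokens, view1_tokens), label)
    unlabeled: list of (view0_tokens, view1_tokens)
    Each round trains one classifier per view on the current labeled set;
    each classifier then pseudo-labels its n most confidently negative and
    p most confidently positive pool examples, which join the labeled set.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        models = [train_nb([(x[v], y) for x, y in labeled]) for v in (0, 1)]
        chosen = {}  # pool index -> pseudo-label; the first pick of an index wins
        for v, model in enumerate(models):
            # Sort pool indices from least to most confidently positive.
            ranked = sorted(range(len(pool)),
                            key=lambda i: prob_positive(model, pool[i][v]))
            for i in ranked[:n]:
                chosen.setdefault(i, 0)
            for i in ranked[::-1][:p]:
                chosen.setdefault(i, 1)
        # Each view's confident guesses become training data for both views.
        labeled += [(pool[i], y) for i, y in chosen.items()]
        pool = [x for i, x in enumerate(pool) if i not in chosen]
    return [train_nb([(x[v], y) for x, y in labeled]) for v in (0, 1)]

def predict(models, x):
    """Combine the two views by multiplying their class probabilities."""
    p1 = prob_positive(models[0], x[0])
    p2 = prob_positive(models[1], x[1])
    return 1 if p1 * p2 > (1 - p1) * (1 - p2) else 0

# Hypothetical toy data: view 0 = words on a page, view 1 = words in links to it.
labeled = [
    ((["lecture", "homework"], ["course"]), 1),
    ((["research", "publications"], ["faculty"]), 0),
]
unlabeled = [
    (["lecture", "syllabus"], ["course", "class"]),
    (["homework", "exam"], ["class"]),
    (["publications", "students"], ["professor"]),
    (["research", "grants"], ["faculty", "professor"]),
]
models = co_train(labeled, unlabeled)
print(predict(models, (["syllabus", "exam"], ["class"])))        # prints 1
print(predict(models, (["grants", "students"], ["professor"])))  # prints 0
```

    Note that "syllabus", "exam" and "class" never occur in the two hand-labeled examples. Trained on those two examples alone, both classifiers would score the first test page at exactly 0.5. They can classify it only because co-training moved the pseudo-labeled pages into the training set.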

    Reading (required):

    1. A. Blum and T. Mitchell (1998). Combining labeled and unlabeled data with co-training. Proceedings of the Eleventh Annual Conference on Computational Learning Theory (COLT '98).

    - This paper introduces the idea of co-training and outlines the problem structures to which it can be applied.

    2. K. Nigam and R. Ghani (2000). Analyzing the Effectiveness and Applicability of Co-training. Proceedings of the Ninth International Conference on Information and Knowledge Management (CIKM 2000).

    - This paper discusses a few other algorithms that make use of unlabeled data, such as EM, and compares them with co-training.

    3. Q. Xu, D. H. Hu, H. Xue, W. Yu and Q. Yang (2009). Semi-supervised protein subcellular localization. BMC Bioinformatics.

    - An application of the algorithm to a biological problem: protein subcellular localization.

    4. A quick and simple explanation of co-training (Chapter 4). If time permits, I'll also delve into multi-view learning.

    Further recommended reading:

    1. Michael Davy. A Review of Active Learning and Co-Training in Text Classification.

    2. Krogel and Scheffer (2004). Multi-Relational Learning, Text Mining, and Semi-Supervised Learning for Functional Genomics. Machine Learning, 57, 61–81.


    Questions/suggestions are welcome!
