Enabling Software Agents to Mine Knowledge from Unstructured Texts


Textmining

TextMiner is one of the results of our text learning research. Text learning, also called text mining, refers to the application of machine learning (or data mining) techniques to problems in information retrieval and natural language processing. Loosely speaking, it is the discovery of knowledge from the ubiquitous text data that is readily accessible over the Internet or an intranet. I believe that the study of text learning is another way of understanding natural language, which is one of the primary media humans use to communicate with each other. The field comprises various sub-fields: text classification, clustering, summarization, extraction, and others. So far, our research has covered two of them: classification and clustering. Conceptually, TextMiner consists of four layers: User-Interface, Task, Learning Model, and Pre-processing. Read the following papers to learn more about this work:

  • Young-Woo Seo, Anupriya Ankolekar, and Katia Sycara, Feature selections for extracting semantically rich words for ontology learning, In Proceedings of Dagstuhl Seminar Machine Learning for the Semantic Web, February, 2005.
  • Young-Woo Seo and Katia Sycara, Text clustering for topic detection, Tech Report CMU-RI-TR-04-03, the Robotics Institute, Carnegie Mellon University, 2004.
  • Anupriya Ankolekar, Young-Woo Seo, and Katia P. Sycara, Investigating semantic knowledge for text learning, In Proceedings of the ACM SIGIR-2003 Workshop on Semantic Web, pp. 9-17, Toronto, Canada, July, 2003.
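
To make the layering described above more concrete, here is a minimal sketch of how a Pre-processing layer, a Learning Model layer, and a Task layer might fit together for text classification. It is not TextMiner's actual code: the stopword list, the nearest-centroid model, the cosine similarity measure, the toy training data, and all function and class names are assumptions chosen only for illustration, and the User-Interface layer is omitted.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption, not TextMiner's own list).
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for", "on", "as", "with"}

def preprocess(text):
    """Pre-processing layer: lowercase, tokenize, drop stopwords, count terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

class CentroidModel:
    """Learning Model layer: one summed term vector (centroid) per class."""

    def __init__(self):
        self.centroids = {}

    def fit(self, labeled_texts):
        for text, label in labeled_texts:
            self.centroids.setdefault(label, Counter()).update(preprocess(text))
        return self

    def predict(self, text):
        vec = preprocess(text)
        return max(self.centroids, key=lambda label: cosine(vec, self.centroids[label]))

def classify(model, text):
    """Task layer: text classification built on top of the learning model."""
    return model.predict(text)

# Toy training data, invented for this example.
training = [
    ("stock markets fell as investors sold shares", "finance"),
    ("the bank raised interest rates on loans", "finance"),
    ("the team won the match with a late goal", "sports"),
    ("the striker scored twice in the final game", "sports"),
]
model = CentroidModel().fit(training)
print(classify(model, "investors worry about interest rates"))  # prints "finance"
```

Keeping the layers separate in this way means a clustering task could reuse the same `preprocess` function while swapping in a different learning model.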

Personalized Information Filtering

Document filtering is increasingly deployed in Web environments to reduce users' information overload. We formulate online information filtering as a reinforcement learning problem, specifically using TD(0). The goal is to learn a user profile that best represents the user's information needs and thus maximizes the expected value of user relevance feedback. We then present a method that acquires reinforcement signals automatically by estimating the user's implicit feedback from direct observation of browsing behavior. This "learning by observation" approach contrasts with conventional relevance feedback methods, which require explicit user feedback. Field tests were performed in which 10 users read a total of 18,750 HTML documents over 45 days. Compared with existing document filtering techniques, the proposed learning method showed superior information quality and faster adaptation to user preferences in online filtering. Read the following papers to learn more about this work: