Friday, October 30, 2009

Week of 10/26/09

Content: Both of this week's lectures were on natural language processing (NLP). Text mining is a technique used in NLP. Some challenges in text mining are ambiguity of terms, lexical variants (different written forms of the same term), and tokenizing text. The second lecture covered regular expressions and some of their uses for finding text. Both false positive and false negative results in text searching were discussed. Efforts to increase accuracy and efforts to increase coverage affect error rates in text mining differently: a stricter search tends to reduce false positives, while a broader search tends to reduce false negatives. Regular expressions are one formalism for capturing language; others are finite state automata and regular grammars.

Another text mining topic that was covered was morphological parsing and the related, simpler technique of stemming. Morphology can be either inflectional or derivational. Inflectional morphology combines a stem with affixes to produce different forms of the same word that serve different grammatical purposes, such as plural nouns or past-tense verbs. Derivational morphology creates a new word, often of a different class; an example is changing a verb into a noun (e.g., "regulate" into "regulation"). NLP can help with research by extracting literature articles that contain terms such as gene names. Associations between terms, such as a gene and a disease, can be discovered through NLP techniques. An open-source program named BANNER can help with named entity recognition.
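To make the regular expression, lexical variant, and stemming ideas more concrete for myself, here is a small sketch. The gene name IL-2, the pattern, and the function names are my own hypothetical choices, not anything from the lectures or from BANNER. The optional separator `[- ]?` catches variants that would otherwise be false negatives, and the trailing word boundary `\b` rejects "IL-21" and "IL-2R" (different entities) to avoid false positives. The `naive_stem` function is only a crude suffix stripper, not a real morphological parser.

```python
import re

# Hypothetical example gene name. The pattern accepts the lexical variants
# "IL-2", "IL2", "IL 2", "interleukin-2", and "interleukin 2" (any case).
# The trailing \b rejects "IL-21" and "IL-2R", which are different entities.
IL2_PATTERN = re.compile(r"\b(?:IL|interleukin)[- ]?2\b", re.IGNORECASE)


def tokenize(text):
    """Split text into word tokens, keeping internal hyphens so 'IL-2' stays one token."""
    return re.findall(r"\w+(?:-\w+)*", text)


def find_mentions(text):
    """Return every substring of text that matches one of the IL-2 variants."""
    return [m.group(0) for m in IL2_PATTERN.finditer(text)]


def naive_stem(word):
    """Crude suffix stripping for a few inflectional endings.

    This is NOT a full morphological parser or the Porter stemmer; it only
    illustrates reducing inflected forms to a shared stem.
    """
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words like "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    sentence = "IL-2 and interleukin 2 differ from IL-21; IL2 levels rose."
    print(tokenize(sentence))
    print(find_mentions(sentence))  # ['IL-2', 'interleukin 2', 'IL2']
    print([naive_stem(t) for t in ["levels", "binding", "is"]])  # ['level', 'bind', 'is']
```

Even this tiny pattern shows the trade-off from the lecture: loosening it (say, dropping the final `\b`) would raise coverage but start matching IL-21.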


I liked how Dr. Gonzalez had students do a practice exercise on recognizing lexical variations. She also did a nice job of making her lectures interactive. The questions she asked the audience gave us helpful opportunities to apply the knowledge we were learning. The idea of automating specific literature retrieval seems particularly useful to me. Even if some literature retrieval were automated, it seems like someone would still want to search other literature manually. However, some of the literature search tools seem so effective that searching beyond what they find may not be very useful. The picture that Debbie posted definitely helps with understanding how precision and recall relate to sensitivity and specificity (recall is the same measure as sensitivity). Thanks for posting that, Debbie.
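Here is a short sketch of how the four measures come out of the same confusion-matrix counts. The counts are made up for illustration (a hypothetical search over 1,000 abstracts, 60 of them relevant), and the function names are my own.

```python
def _ratio(numerator, denominator):
    """Return numerator/denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def retrieval_metrics(tp, fp, fn, tn):
    """Compute precision, recall/sensitivity, and specificity from confusion counts.

    tp: relevant documents retrieved      fp: irrelevant documents retrieved
    fn: relevant documents missed         tn: irrelevant documents not retrieved
    """
    recall = _ratio(tp, tp + fn)  # identical to sensitivity
    return {
        "precision": _ratio(tp, tp + fp),  # also called positive predictive value
        "recall": recall,
        "sensitivity": recall,
        "specificity": _ratio(tn, tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical search over 1,000 abstracts, 60 of which are relevant.
    m = retrieval_metrics(tp=40, fp=10, fn=20, tn=930)
    for name, value in m.items():
        print(f"{name}: {value:.3f}")
```

With these numbers specificity is about 0.989 even though 10 of the 50 retrieved abstracts are irrelevant, because true negatives dominate. Precision (0.800) exposes that noise, which is likely one reason text mining results are usually reported as precision and recall.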


Some potential NLP research areas are presented in this article:
One of those potential research areas is building tools for the automated extraction of gene and/or protein interactions (GPI) from literature that are both easy to install and use and achieve state-of-the-art performance. Perhaps making an existing state-of-the-art GPI extraction tool easier to install and use would be an effective way to approach that research.

Posted by: Nate
