Sunday, November 1, 2009

Interesting lectures on Natural Language Processing

Dr. Graciela Gonzalez gave us very interesting lectures on Natural Language Processing and its usefulness in dealing with biomedical literature. Many people tend to use the terms ‘text’ mining and ‘data’ mining synonymously. In the lecture, Dr. Gonzalez gave a clear picture of how data mining differs from text mining. Ambiguity is one of the key problems in handling any data, especially biomedical data. The primary step in text mining is tokenization, which involves identifying tokens (words) in a given piece of text. It's very challenging to tokenize data because we encounter a lot of variants, like abbreviations, hyphens, and apostrophes. The key to effective biomedical text mining lies in properly handling all these variants and the ambiguity in the data.
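
To make the tokenization problem a bit more concrete, here is a small sketch in Python. It is not from the lecture: the pattern and the sample sentence are my own assumptions, picked only to show how abbreviations, hyphens, and apostrophes trip up a naive tokenizer.

```python
import re

# A naive tokenizer: every run of word characters is a token.
# It breaks hyphenated names and contractions into pieces.
def naive_tokenize(text):
    return re.findall(r"\w+", text)

# A hypothetical, slightly more careful pattern (illustration only).
# Alternatives are tried left to right at each position.
TOKEN_PATTERN = re.compile(
    r"""
    (?:[A-Za-z]\.){2,}                    # dotted abbreviations such as i.v. or e.g.
  | [A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*     # words, keeping internal hyphens and apostrophes
  | [^\sA-Za-z0-9]                        # any other single non-space character
    """,
    re.VERBOSE,
)

def tokenize(text):
    return TOKEN_PATTERN.findall(text)

sentence = "IL-2 wasn't given i.v. today."
print(naive_tokenize("IL-2 wasn't"))  # ['IL', '2', 'wasn', 't']
print(tokenize(sentence))             # ['IL-2', "wasn't", 'given', 'i.v.', 'today', '.']
```

Even this pattern is far from complete. Real biomedical text has many more variants than three alternatives can cover, so treat it as a starting point rather than a working tokenizer.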

In the next lecture, we discussed some details of using regular expressions in text processing. This is a good link for understanding various regular expressions:
http://www.zytrax.com/tech/web/regex.htm#intro
We also dealt with finite state automata (FSA) in the lecture. I referred to this page for a better understanding of FSA:
http://www.eti.pg.gda.pl/katedry/kiw/pracownicy/Jan.Daciuk/personal/thesis/node12.html
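
As a rough illustration of how an FSA relates to a regular expression, below is a minimal sketch of a deterministic automaton in Python. The state names and the pattern it recognizes (letters, then a hyphen, then digits, as in "IL-2") are my own assumptions for the example and are not taken from the lecture or the linked page. It should accept the same strings as the regular expression [A-Za-z]+-[0-9]+.

```python
# Map each character to a coarse input class used by the automaton.
def char_class(ch):
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    if ch == "-":
        return "hyphen"
    return "other"

# Transition table: (current state, input class) -> next state.
# Any pair that is missing means the automaton rejects the input.
TRANSITIONS = {
    ("start", "letter"): "letters",
    ("letters", "letter"): "letters",
    ("letters", "hyphen"): "hyphen",
    ("hyphen", "digit"): "digits",
    ("digits", "digit"): "digits",
}

# The input is accepted only if it ends in one of these states.
ACCEPTING = {"digits"}

def accepts(text):
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False  # no valid transition for this character
    return state in ACCEPTING

print(accepts("IL-2"))   # True
print(accepts("CD-28"))  # True
print(accepts("IL-"))    # False
print(accepts("p53"))    # False
```

The automaton reads one character at a time and only remembers its current state, which is what makes this kind of matching simple and fast.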

Posted by
Harsha Undapalli
 
