Friday, October 2, 2009

Ontology and Machine Learning revisited!!

Content:
This week we had two interesting sessions, one each from Dr. Fridsma and Dr. Ji. The first lecture, by Dr. Fridsma, was about ontologies and the Semantic Web. As Dr. Fridsma said, an ontology is just a representation of a set of concepts within a domain and of how those concepts are related to each other (Wikipedia). The best way to measure whether an ontology is correct is to compare the output produced using it against the desired output. Based on this definition, we can imagine using an ontology in a semantic web, where everything is done based on what the system can infer.

A simple example of semantic search: I want to search for something that is burning. "Burning" means something that is on fire, so I would also like to see pages that include "on fire". But unfortunately, when I typed "burning" into the Google search page, instead of giving me anything based on the actual meaning of "burning", it stemmed the word (stemming is the process of reducing a word to its base or root form), searched its huge library of web pages for ones that included "burn", used some ranking algorithm to cluster the related sites, and presented those to me; and that was not what I was looking for. Similarly, when I typed "on fire" and clicked 'Search', the first item on the list was a song by 50 Cent featuring Lloyd Banks :). There were times when I was embarrassed by Google's stupid matches, but I still think it's the best among the rest (Bing lovers might not like that statement; try the same thing with Bing and compare the results).

So, to get what I really wanted, I could create an ontology of "burning", categorize it according to its different meanings (and the meanings of those meanings, up to a certain depth), and then search for pages using the new words that carry the meaning of "burning". It's just a concept, and we all know this. If anyone has used Prolog before, try this once; it's interesting. I love using Prolog for these kinds of funny stuff.
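Just to make the idea concrete, here is a minimal sketch of ontology-based query expansion, written in Python rather than Prolog. The "ontology" here is a hand-made toy for illustration, not a real one like WordNet:

```python
# Toy "ontology": each term maps to related terms (synonyms / narrower meanings).
# Hand-made for illustration; a real system would use something like WordNet.
ontology = {
    "burning": ["on fire", "combustion", "blazing"],
    "combustion": ["oxidation", "flame"],
}

def expand_query(term, depth=2):
    """Collect a term plus the terms it is related to, up to a given depth."""
    terms = {term}
    frontier = {term}
    for _ in range(depth):
        frontier = {related
                    for t in frontier
                    for related in ontology.get(t, [])} - terms
        terms |= frontier
    return terms

print(expand_query("burning"))
# e.g. {'burning', 'on fire', 'combustion', 'blazing', 'oxidation', 'flame'}
```

Searching for every term in the expanded set is, very roughly, what a semantic search could do with an ontology instead of plain keyword matching.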

There was another interesting slide (I say that because it is the most primitive concept of how to make machines as intelligent as humans) on first-order predicate logic (FOPL). The notation is easy, but rewriting a sentence in FOPL is challenging. FOPL is mainly used to infer something from a given set of knowledge (or simply a knowledge base). We humans can easily understand the meaning of "child" and "parent", but we cannot simply add that concept to a machine's knowledge base; we have to translate the meaning of parent and child into a machine-"understandable" form first. FOPL provides a very simple notation for converting a human-readable sentence into a machine-readable one. If anybody wants to go into the details of FOPL, I suggest the books "Artificial Intelligence" by Rich and Knight, and "Artificial Intelligence: A Modern Approach" by Russell and Norvig.
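To make that concrete, here is a tiny sketch of the parent/child example as FOPL-style facts and rules, hand-coded in Python (all the names are made up; in FOPL you would write, e.g., ∀x ∀y parent(x, y) → child(y, x)):

```python
# Knowledge base: parent(X, Y) means "X is a parent of Y".
# All names are invented for illustration.
parent_facts = {("Tom", "Bob"), ("Pam", "Bob"), ("Bob", "Ann")}

# Rule: forall X, Y: parent(X, Y) -> child(Y, X)
def child(y, x):
    return (x, y) in parent_facts

# Rule: forall X, Y, Z: parent(X, Y) and parent(Y, Z) -> grandparent(X, Z)
def grandparent(x, z):
    middles = {y for (p, y) in parent_facts if p == x}
    return any((y, z) in parent_facts for y in middles)

print(child("Bob", "Tom"))        # True: Bob is a child of Tom
print(grandparent("Tom", "Ann"))  # True: Tom -> Bob -> Ann
```

This is exactly the kind of thing Prolog does natively; the point is just that once the facts and rules are in a machine-readable form, new facts (like grandparent) can be inferred automatically.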

There was a brief talk on XML as well. According to w3schools.com, "XML was designed to transport and store data, with focus on what data is; HTML was designed to display data, with focus on how data looks". So, by combining the concept of ontology with the objective of XML, we can deal with the limitations of a purely syntax-based approach. We would have to take a detailed course on XML, ontologies and semantic webs to learn how to implement these concepts together. I recommend the link http://www.w3schools.com/xml/xml_whatis.asp if you want to know more about XML.
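As a quick illustration of "XML stores and transports data", here is a small made-up note record (in the spirit of the w3schools examples), parsed with Python's standard library:

```python
import xml.etree.ElementTree as ET

# A made-up record: the tags describe WHAT the data is;
# displaying it nicely would be HTML's job.
doc = """
<note>
    <to>Tove</to>
    <from>Jani</from>
    <body>Don't forget me this weekend!</body>
</note>
"""

root = ET.fromstring(doc)
print(root.find("to").text)    # Tove
print(root.find("body").text)  # Don't forget me this weekend!
```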

The second lecture, by Dr. Ji, was the second part of the machine learning series. He started with a review of classification and then explained the other important part of machine learning: clustering. The advantage of clustering is that it does not require a supervisor; data sets are grouped together according to their features or values. The main disadvantage is that we don't know whether the resulting clusters are good or bad (Google's search results are an example). However, if we don't know the decision boundary and don't have any guidance to create one, then clustering is the best technique to train a machine. He went further into the basic classes of clustering: flat clustering and hierarchical clustering. Regression and semi-supervised learning were covered after that. The examples in the slides were helpful, but understanding each topic in depth will be very difficult. For now, I'm sticking with K-nearest neighbour and K-means, as per his suggestion. :)
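Since K-means was his suggested starting point, here is a bare-bones sketch of its assign-then-update loop on made-up 1-D data (plain Python, for illustration only; a real implementation would check for convergence instead of running a fixed number of iterations):

```python
import random

def kmeans(points, k, iters=20):
    """Very bare-bones K-means on 1-D points, for illustration only."""
    centers = random.sample(points, k)          # pick k initial centers
    for _ in range(iters):
        # Assignment step: put each point in the nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centers, clusters = kmeans(data, k=2)
print(centers)   # roughly [1.0, 5.07] (order may vary)
```

Notice that no labels are used anywhere; the grouping emerges purely from the values themselves, which is exactly why clustering needs no supervisor.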



Posted by
Prabal Khanal
