Friday, October 2, 2009

Ontology and datamining

Content:

I have to say the two classes this week were tough, but I got a lot of information from Dr. Fridsma's introduction to ontology and Mr. Ji's introduction to data mining. Although I felt a little lost during class because so many new concepts and terms were flowing into my mind, things are clearer now that I have reviewed the material.

In Dr. Fridsma's class on ontology, he started from the semantic web and compared semantic and syntactic approaches, which gave me a clear idea of the core value of semantics and the significance of semantic languages, especially in the health care field. Compared with the traditionally used syntactic languages, a semantic language focuses more on the underlying meaning of language, and on how to compose and structure statements so that this meaning is delivered to the computer and the computer can extract meaningful information, much as a human being understands the information received. I think this is one of the important obstacles in Artificial Intelligence that prevents direct communication between human beings and machines. When I first understood what semantics means, the first example I thought of was one given in Dr. Shortliffe's class. The example is to input the following sentences into a computer: "The distance between city A and city B is 360 miles. One person drives a car from city A to city B, taking 6 hours. What is the average speed of the car?" After these sentences are entered, the computer pops up a dialog box showing "The average speed is 60 miles/hour". Although this is a trivial question for a human being, the two real key questions are how the computer can understand the input and how it knows what to do to answer the question. The important step in this example is to set up a semantic language that first translates the human-readable language into a computer-readable one; the computer then does the further processing to choose the right response to the question and calculate the result. For the current technologies and standards used for semantic representation of medical language, Dr. Fridsma also introduced many examples, such as XML, ICD-10, RDF and so on. The principle of each technology and standard was discussed briefly, and their advantages and disadvantages were compared.
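To make the average-speed example a little more concrete for myself, here is a minimal sketch in Python. It assumes the sentences have already been translated into simple (subject, predicate, object) facts, loosely in the spirit of RDF triples. The names `facts`, `lookup`, `average_speed`, `distance_miles` and `duration_hours` are all made up for illustration; this is not real RDF and not something shown in class, and the hard part (turning the English sentences into these facts) is not covered here.

```python
# Hypothetical machine-readable facts for the trip between city A and city B.
# Each fact is a (subject, predicate, object) triple; the names are invented.
facts = [
    ("trip", "from_city", "A"),
    ("trip", "to_city", "B"),
    ("trip", "distance_miles", 360.0),
    ("trip", "duration_hours", 6.0),
]


def lookup(facts, subject, predicate):
    """Return the object of the first triple matching subject and predicate."""
    for s, p, o in facts:
        if s == subject and p == predicate:
            return o
    raise KeyError((subject, predicate))


def average_speed(facts, subject):
    """Answer the 'average speed' question from the stored facts."""
    distance = lookup(facts, subject, "distance_miles")
    duration = lookup(facts, subject, "duration_hours")
    return distance / duration


print(f"The average speed is {average_speed(facts, 'trip'):g} miles/hour")
```

Once the meaning is stored in a structured form like this, the calculation itself is the easy part, which is why the translation step seems to me to be the real problem.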

In Mr. Ji's second class on the introduction to data mining, the main topic was the unsupervised data mining method, clustering. Like the supervised method, classification, clustering builds its model from the predictors. But unsupervised clustering does not aim to assign a predicted label to a case whose label is unknown; it focuses on how to group similar cases together. For the supervised and unsupervised methods respectively, two algorithms, k-nearest neighbor and k-means, were introduced in more depth, and simple demos were shown in class to help understanding. In bioinformatics, clustering is one of the most common methods in genetics and protein studies. For example, clustering can be used to sample representative protein structures from molecular simulation results, and to group similar genomes into subcategories. Although the class did not cover how to implement the algorithms in practice, there is a powerful and very easy-to-use software package, WEKA, which includes most data mining algorithms and has been used in some bioinformatics data mining studies.

The link to WEKA is below. It is also free, so you can try playing with it.

http://www.cs.waikato.ac.nz/ml/weka/
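For anyone who wants to see the difference between the two algorithms without installing anything, here is a minimal sketch in plain Python. It is my own toy illustration, not the demo from class and not how WEKA implements them. The function names, the toy points, the labels "low" and "high", and the choice of starting centers are all assumptions made up for the example.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Supervised: majority label among the k labeled points nearest to query."""
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


def kmeans(points, centers, iterations=10):
    """Unsupervised: alternate assignment and center update; no labels are used."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: distance(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    # clusters holds the grouping from the last assignment step.
    return centers, clusters


# Toy 2-D data: two visibly separate groups (made up for this example).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# k-nearest neighbor needs labels for the training cases.
labeled = [(p, "low") for p in points[:3]] + [(p, "high") for p in points[3:]]
print(knn_predict(labeled, (2.0, 2.0)))

# k-means only sees the points and two starting centers.
centers, clusters = kmeans(points, centers=[points[0], points[3]])
print(centers)
print(clusters)
```

The point to notice is that `knn_predict` needs the labels to predict one for a new case, while `kmeans` never sees a label and only groups similar cases together.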

Post by Di Pan
