There are mainly two types of learning: supervised and unsupervised. In supervised learning, labeled examples "guide" the computer toward the best result; in unsupervised learning, there is no such guidance. According to the lecture, there are four major topics in machine learning: classification, clustering, regression, and semi-supervised learning.

In classification, we want to classify things into categories (yes/no, good/bad, healthy/unhealthy, and so on). Several techniques can be used to train a computer to do this. We first build a model from training data; the model is then used to classify new data (test data). In the case of a surgical training simulator, if a surgeon performs a task, we would want to classify the performance as good or bad (let's not consider fuzzy answers). To evaluate the result, we look at previous similar cases and note key parameters such as how the task was done and how long it took. Based on the majority of those results, we can classify the new surgeon's performance. The results from previous cases form the "training set", and the performance we want to classify is the "test set". Many tools can be used for classification; a few mentioned in class are k-nearest neighbor, (artificial) neural networks, the Naive Bayes classifier, and SVMs. Naive Bayes classifiers are used for spam filtering (as in SpamAssassin). SVMs can handle nonlinear classification problems (like the XOR problem). Neural networks are widely used in computer vision to train models that recognize parts of images. Classification falls under the supervised learning category.
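As a rough illustration of this train-then-classify workflow, here is a minimal sketch using a k-nearest neighbor classifier from scikit-learn. The "surgical performance" features (task duration, instrument path length) and their distributions are hypothetical, made up purely to show how a model is fit on a training set and then applied to a test set and to a new case.

```python
# Sketch only: hypothetical features and data, not real surgical metrics.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical observations: [task duration (s), instrument path length (cm)]
# with labels 1 = good performance, 0 = bad performance.
good = rng.normal(loc=[120, 80], scale=[15, 10], size=(30, 2))
bad = rng.normal(loc=[200, 140], scale=[25, 20], size=(30, 2))
X = np.vstack([good, bad])
y = np.array([1] * 30 + [0] * 30)

# Hold out part of the data as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit the model on the training set, then classify the test set.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))

# Classify a new surgeon's trial (150 s, 100 cm of instrument travel).
label = clf.predict([[150, 100]])[0]
print("New trial classified as:", "good" if label == 1 else "bad")
```

The same train/test pattern applies if we swap in Naive Bayes, an SVM, or a neural network for the classifier.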
In unsupervised learning, clustering is one of the popular techniques. In clustering, we take observations and group them into subsets (clusters) such that the observations within a cluster are similar in some sense (Wikipedia). We didn't go into much detail during the lecture; it would be interesting to learn more about clustering and semi-supervised learning in the next class.
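To show the contrast with classification, here is a minimal clustering sketch using k-means from scikit-learn on made-up, unlabeled data. The key point is that no labels guide the algorithm; it groups observations purely by similarity (distance to the cluster centers).

```python
# Sketch only: synthetic, unlabeled two-dimensional data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Unlabeled observations drawn from two loosely separated groups.
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(25, 2)),
    rng.normal(loc=[3, 3], scale=0.5, size=(25, 2)),
])

# k-means partitions the observations into k clusters by similarity.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", kmeans.labels_)
print("Cluster centers:\n", kmeans.cluster_centers_)
```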
The next lecture, by Dr. Petiti, was on study design, which I found very informative. She talked about various design techniques and how information can be manipulated in an experiment. We went into detail on descriptive, observational, experimental, and quasi-experimental studies. The main thing I took from the lecture was that experiments are randomized; in fact, randomization is what makes an experiment an experiment. If an experiment is not randomized, we cannot trust its results. But randomized experiments are difficult to carry out for a few reasons. The examples given in the lecture slides were practical ("we can't randomize smoking") and ethical ("we can't randomize cocaine use"). The classic studies presented during the lecture were interesting: for each study design there was a classic study, and they really made things easier to understand.
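Just to make the randomization point concrete, here is a tiny sketch (with hypothetical subject names) of random assignment to treatment and control groups, the step that distinguishes a true experiment from an observational study.

```python
# Sketch only: hypothetical subjects, randomly assigned to two arms.
import random

subjects = [f"subject_{i}" for i in range(1, 11)]
random.seed(42)        # fixed seed so the example is reproducible
random.shuffle(subjects)

half = len(subjects) // 2
treatment, control = subjects[:half], subjects[half:]
print("Treatment group:", treatment)
print("Control group:  ", control)
```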
Posted by Prabal Khanal