Machine learning - A machine learns whenever it changes its structure, program, or data, based on its inputs or in response to external information, in such a way that its expected future performance improves. For example, when a speech recognition machine performs better after hearing several samples of a person's speech, we are justified in saying that the machine has learned. Important relationships and correlations may be hidden in large piles of data, and machine learning methods can often be used to extract them.
Classification Algorithms
The goal of classification is to build a model that can correctly predict the class of different objects. The input to these methods is a set of objects (i.e., training data), the classes to which these objects belong (i.e., dependent variables), and a set of variables describing different characteristics of the objects (i.e., independent variables). Once such a predictive model is built, it can be used to predict the class of objects whose class is not known a priori. The key advantage of supervised learning methods over unsupervised methods (for example, clustering) is that, because they know explicitly which class each object belongs to, they can select the features that lead to better prediction accuracy.
k-Nearest Neighbor (KNN) Algorithm
The KNN classifier is an instance-based learning algorithm built on a distance or similarity function for pairs of observations, such as the Euclidean distance or cosine similarity. In this classification paradigm, the k training samples nearest to a given test sample are found first. The similarities of the test sample to these k nearest neighbors are then aggregated by the class of each neighbor, and the test sample is assigned to the most similar class.
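The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming Euclidean distance and a plain majority vote among the neighbors; the function names (euclidean, knn_predict) and the toy data are made up for this example and do not come from the lecture.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest training points."""
    # Sort the training samples by distance to the query and keep the k closest.
    neighbors = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )[:k]
    # Count the class labels among the neighbors; the most frequent label wins.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups in the plane.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (1.2, 1.4)))
print(knn_predict(train_points, train_labels, (6.8, 7.0)))
```

Note that nothing is "trained" here: the training data itself is the model, which is what makes the method instance-based. A weighted vote (closer neighbors count more) is a common variation that this sketch does not implement.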
Support vector machines
Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. A support vector machine constructs a hyperplane or set of hyperplanes in a high-dimensional space, which can be used for classification, regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (the so-called margin), since in general the larger the margin, the lower the generalization error of the classifier.
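To make the "largest distance to the nearest training point" idea concrete, here is a small sketch that compares two candidate separating hyperplanes on toy data and keeps the one with the wider margin. It is not an SVM solver (a real SVM searches over all hyperplanes by solving an optimization problem); the data, the two candidates, and the function name geometric_margin are assumptions made up for this illustration.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    Labels are +1 or -1. A positive result means every point lies on the
    correct side of the hyperplane.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical toy data: class -1 at the lower left, class +1 at the upper right.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Two hand-picked candidate hyperplanes (w, b); both separate the classes.
candidates = {
    "diagonal": ((1.0, 1.0), -5.5),
    "vertical": ((1.0, 0.0), -3.0),
}

margins = {
    name: geometric_margin(w, b, points, labels)
    for name, (w, b) in candidates.items()
}
best = max(margins, key=margins.get)
print(margins)
print("wider margin:", best)
```

Both candidates classify the toy data correctly, but the one with the wider margin leaves more room between the boundary and the nearest points of each class, which is the property the paragraph above associates with lower generalization error.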
----
Various types of study designs were discussed in the lecture. They include:
Experimental Design - Experimental designs are often touted as the most "rigorous" of all research designs, or as the "gold standard" against which all other designs are judged. In one sense, they probably are. If you can implement an experimental design well (and that is a big "if" indeed), then the experiment is probably the strongest design with respect to internal validity.
I think this is a good reference -
http://www.socialresearchmethods.net/kb/desexper.php
A randomized controlled trial (RCT) is a type of scientific experiment most commonly used in testing the efficacy or effectiveness of healthcare services (such as medicine or nursing) or health technologies. RCTs involve the random allocation of different interventions (treatments or conditions) to subjects. As long as the number of subjects is sufficient, randomization is an effective method for balancing confounding factors between treatment groups.
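The point about sample size can be illustrated with a small simulation. This is only a sketch under made-up assumptions: age stands in for a confounding factor, ages are drawn uniformly between 20 and 80, and subjects are split into two equal arms by shuffling. The function names (randomize, age_gap) are hypothetical.

```python
import random
from statistics import mean

def randomize(subjects, rng):
    """Randomly allocate subjects to two arms of (nearly) equal size."""
    shuffled = subjects[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def age_gap(n, rng):
    """Absolute difference in mean age between the two arms for n subjects."""
    # Age plays the role of a confounder that we would like balanced across arms.
    ages = [rng.randint(20, 80) for _ in range(n)]
    treatment, control = randomize(ages, rng)
    return abs(mean(treatment) - mean(control))

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
for n in (10, 100, 10_000):
    print(n, round(age_gap(n, rng), 2))
```

With only a handful of subjects the two arms can differ noticeably in mean age by chance alone; with thousands of subjects the gap is typically small, which is the sense in which randomization balances confounders when the number of subjects is sufficient.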
A longitudinal study is a correlational research study that involves repeated observations of the same items over long periods of time, often many decades. It is a type of observational study.
Case-control is a type of epidemiological study design. Case-control studies are used to identify factors that may contribute to a medical condition by comparing subjects who have that condition (the 'cases') with subjects who do not have the condition but are otherwise similar (the 'controls').
Posted by
Harsha Undapalli