I grew up on the South Side of Chicago, on the campus of the University of Chicago. It was an interesting neighborhood: a site of the Manhattan Project was one block from my home, and the streets were full of very literal individuals arguing about every possible concept. When I saw the slide of the computer scientists discussing and setting up guidelines, theories, and new languages and programs, I thought maybe I had returned home. The clear limitations of that singular approach and focus must be balanced against the need for interoperability if we are going to advance clinical care. In contrast to these left-brained folks, Dr. Fridsma gives us both right- and left-brain excellence in balance, and the conflict between the two mindsets is easy to see. These blogs are useful because they provide a means to set down our thoughts and understanding of topics that are not immediately clear. As such, I will lay out my understanding, right or wrong, of his latest 502 lecture.

It is extremely important to move to machine-computable applications so that information and meaning can be shared; this sharing should be not only syntactic but also semantic. Domain-specific ontologies are the underpinning of semantics and are purposeful as part of an engineered artifact; as quoted in the lecture, they serve as explicit specifications of conceptualizations. Ontologies can be applied to the syntactic as well as the semantic realm and provide interoperability, an essential feature for the exchange of data and of meaning, though with limitations. Interoperability is defined as the ability of systems to exchange information and to use the information that has been exchanged. These ontologies, which supply names for concepts as well as domain-specific knowledge, depend on controlled vocabularies, which may have fundamental issues related to enumeration. Controlled vocabularies provide a means to resolve the semiotic triangle, in which different representations of the same item or phenomenon are tied to a single term. Controlled vocabularies, along with ontology languages, specify ontologies, and as the degree of formality increases, so does the ability of machines to communicate appropriately. RDFS, the Resource Description Framework with a schema vocabulary and expanded URIs, is one such language, though it has limitations. It is not clear to me how RDFS or other ontology languages provide a means of using XML to foster interoperability. RDFS is limited in attaching meaning to information and in supporting reasoning, yet it is still an improvement over the purely syntactic web of hyperlinked information. Unfortunately, as illustrated by the opposing ideas and groups trying to achieve interoperability, there is neither uniformity nor unanimity, nor even a definitive model, for handling these issues of ontologies, interoperability, and the functionality of the semantic web. The OWL language has the potential to meet some of these objectives, but still occupies only a few levels of the “layer cake.” Comments on these interpretations would be useful.
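To make the RDFS piece concrete for myself, here is a minimal sketch using the Python rdflib library; the clinical namespace, classes, and property are my own hypothetical example, not something given in the lecture. It shows roughly what RDFS can express (names for concepts, a subclass hierarchy, and domain/range constraints on a property) and, by omission, what it cannot:

```python
from rdflib import Graph, Namespace, RDF, RDFS

# A hypothetical clinical namespace; the URI and names are invented for illustration.
EX = Namespace("http://example.org/clinical#")

g = Graph()
g.bind("ex", EX)

# Name the concepts: Patient and Diagnosis are classes, and Diagnosis
# sits under ClinicalConcept in a simple subclass hierarchy.
g.add((EX.ClinicalConcept, RDF.type, RDFS.Class))
g.add((EX.Patient, RDF.type, RDFS.Class))
g.add((EX.Diagnosis, RDF.type, RDFS.Class))
g.add((EX.Diagnosis, RDFS.subClassOf, EX.ClinicalConcept))

# A property with an explicit domain and range: hasDiagnosis links
# a Patient to a Diagnosis.
g.add((EX.hasDiagnosis, RDF.type, RDF.Property))
g.add((EX.hasDiagnosis, RDFS.domain, EX.Patient))
g.add((EX.hasDiagnosis, RDFS.range, EX.Diagnosis))

# Serialize the schema so that another system could, in principle, consume it.
print(g.serialize(format="turtle"))
```

Even this small example makes the limitation visible: RDFS can name and organize concepts, but it cannot say, for instance, that two classes are disjoint, or restrict how many diagnoses a patient may have; that, as I understand it, is where OWL comes in.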
In Mr. Ji’s second lecture, we again move into an area that is not completely transparent to the uninitiated. His handout has improved from the first lecture to this one and now makes more sense. I suspect that Tan et al.’s book on data mining would be instructive. Given the sheer amount of data in biomedical informatics and bioinformatics, the ability to use machine learning and data mining is important, whether to find relationships that are not immediately identifiable (unsupervised learning) or to ask discrete questions of large data sets (supervised learning). Models are built on training sets and then evaluated by further testing on test sets. Classification is one approach to machine learning and data mining; classifiers such as decision trees, support vector machines, regression, and neural networks seem most applicable to clinical and bioinformatics questions. The nearest-neighbor analysis that is part of classification can also become part of clustering methods; it is unclear to me how the attributes in this situation distinguish supervised from unsupervised learning. Part of k-means clustering involves the use of centroids, each of which is the mean of the points in its cluster. I need, or maybe I should say I would like, to know what Euclidean distance and cosine similarity are; I have attempted a small sketch of both below. I suspect that hierarchical clustering with dendrograms, including the agglomerative and divisive types, is more useful in this data mining approach. Inter-cluster similarity is an important piece of this approach, and cluster validity can be further assessed. Clustering and classification protocols can be brought together in supervised and unsupervised applications, along with regression, another data mining technique; a combination of supervised and unsupervised machine learning may allow both approaches to be brought to bear on a variety of BMI and bioinformatics problems. The difference between labeled data in supervised learning and unlabeled data in unsupervised learning still needs to be clarified for me.
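To pin down those two measures for myself, here is a small Python sketch (using numpy; the vectors are made-up values, not data from the lecture) that computes Euclidean distance, cosine similarity, and a k-means centroid as the mean of a cluster’s points:

```python
import numpy as np

# Two feature vectors, e.g. attribute values for two records (made-up numbers).
a = np.array([2.0, 0.0, 1.0])
b = np.array([1.0, 1.0, 1.0])

# Euclidean distance: the straight-line distance between the points;
# it is sensitive to the magnitudes of the vectors.
euclidean = np.linalg.norm(a - b)

# Cosine similarity: the cosine of the angle between the vectors;
# it depends only on direction, not magnitude (1.0 means same direction).
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# A k-means centroid is simply the mean of the points assigned to a cluster.
cluster = np.array([[2.0, 0.0, 1.0],
                    [1.0, 1.0, 1.0],
                    [3.0, 1.0, 2.0]])
centroid = cluster.mean(axis=0)

print(f"Euclidean distance: {euclidean:.3f}")
print(f"Cosine similarity:  {cosine:.3f}")
print("Centroid:", centroid)
```

If I have this right, the choice between the two matters for clustering because Euclidean distance changes when a vector is scaled, while cosine similarity does not.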
Stuart