Saturday, October 31, 2009

Natural Language Processing

Content:

The method used in natural language processing (NLP) is text mining: the process of discovering and extracting knowledge from unstructured data. NLP concerns human-computer interaction from the perspective of language; it aims to make machine language more readable to humans and human language more understandable to computers. The basic text mining activities include information retrieval, information extraction, and data mining. Because of the ambiguity of language, the analysis of information usually cannot be done in a single step, so natural language is dealt with at several levels.


In the biomedical field, the degree of ambiguity and the complexity of analysis are much greater. At the lexical level alone, tokenization and lexical variants are distinct problems that need to be addressed. Morphological analysis is the way to unify lexical variants by assigning each a canonical base form. The concepts of precision and recall in morphological analysis are probabilistic measures of how well that unification performs.
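To make the "canonical base form" idea concrete, here is a toy suffix-stripping normalizer; the suffix list and example words are my own invention, not the lecture's method (real stemmers such as the Porter stemmer use far richer rule sets):

```python
# Toy suffix stripping in the spirit of morphological analysis: map lexical
# variants to a canonical base form. The suffix list is an invented example.
SUFFIXES = ["ations", "ation", "ing", "ies", "es", "ed", "s"]

def normalize(token):
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

for word in ["mutations", "mutation", "mutated", "mutates"]:
    print(word, "->", normalize(word))
# Note: "mutations" -> "mut" but "mutated" -> "mutat"; this kind of over-
# and under-stemming is exactly why precision and recall come up here.
```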

After the two lectures given by Dr. Gonzalez, I experienced the significance of ontology and the ubiquity of statistics once again. If, from the beginning, we had followed an ontology to create words and terms, maybe it would be much easier for scientists in the NLP field. But that ambiguity is also the glamour of natural language.


Posted by Xiaoxiao

Natural Language Processing

Content:

This week was a pretty interesting week that covered the topics of natural language processing and text mining. This is a very interesting field since it really touches our everyday communication. We write, we speak, we chat. All of this, if it's recorded, eventually becomes text and just adds to the massive amount of data out there. (I'm adding some more text by writing this blog.)

As for Di's question about how Google works, the original publication by the creators of Google at Stanford is still online. In case anyone wants to do some mining, I used the terms "google search algorithm stanford" to pull up the Stanford page with the information on how Google works (at least when it started).

Link: http://infolab.stanford.edu/~backrub/google.html

The second lecture touched on finite state automata, which is an interesting topic. It was mentioned in the lecture that an FSA is similar to a game where you advance from state to state. This reminded me of the Cootie Bug game. The goal of the game is to be the first to finish a bug by attaching its head to its body and adding legs, eyes, a tongue, and a hairpiece. Seems simple? Well, in order to add parts you have to roll a die to get certain body parts. It was required that you get a body before the head, and that the head and body are attached before getting any other parts. So, for example, you have to roll a 1 for the body and then a 2 for the head. It requires that you first get a 1 and then a 2; the rest of the parts can then be taken in any order. So I think it can be represented like this: 12[3456]+.
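Since that sketch is already a regular expression, here is a quick way to test it in Python; the pattern is the one from the paragraph above, kept deliberately simple (it allows repeats and doesn't force each remaining part to appear exactly once, which a full FSA with more states could enforce):

```python
import re

# Cootie order: body (1) first, then head (2), then parts 3-6 in any order.
cootie = re.compile(r"^12[3456]+$")

for rolls in ["123456", "125634", "213456", "12333"]:
    print(rolls, "matches" if cootie.match(rolls) else "does not match")
```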

Robotics Conference
Lastly, I have extra stuff. The Fulton School of Engineering is hosting a Robotics Conference next Friday and Saturday (Nov. 5-6). The cost to attend is 50 dollars, but the program looks fairly interesting. It includes robotics in surgery, laparoscopic surgery, etc.  I've attached the link to the site.

Link: http://roboticsaz.asu.edu/

Posted by
Eric
Content:

This week Dr. Gonzalez gave us two great lectures on text mining. I have heard the term ‘text mining’ thousands of times, but this is the first time I have gotten in touch with this field. Our language is really ambiguous. That ambiguity can occur at the lexical level or the syntactic level. I like the example she gave us in class about “big stars resist dieting”; our misunderstanding of this sentence is caused by ambiguity at the lexical level. But text mining asks a computer to extract information from such ambiguous language, which is even more difficult. According to my simple understanding, the key point of text mining is to find certain patterns in a large amount of natural language text. Text is written for people to read; how can we make computers read it? Dr. Gonzalez introduced some basic concepts of the algorithms and the underlying principles, which I think are really interesting.

Posted by
Jing Lu

Friday, October 30, 2009

NLP Methods

As we have seen, one important aspect of the discovery of new scientific and medical information is the process of mining for information. This can take the form of data mining, which retrieves patterns from already processed text, or text mining, which extracts information from unstructured data. Text mining can seem intimidating at times, so it was good to see the breakdown of the divisions/steps in the text mining process. Going in the order of the steps (lexical, syntactic, semantic, discourse) makes the process seem much more manageable. Still, there are many challenges in text mining, such as the validation of patterns that are found and the processing of such an overwhelming amount of information. Even if patterns are found, it is important to make sure that proper validation and analyses are conducted to prove that there is a legitimate finding.

Posted by Annie

Text mining in Biomedical informatics

Content:

The class this week was very interesting. With Dr. Gonzalez's introduction, I got my first chance to touch text mining, a field I have found mysterious and attractive for a long time. The most common application of text mining technology in our daily life is probably Google, which provides a powerful text searching and analysis service over tons of web pages, articles, and even pictures to find what we are looking for. Although I still do not know how the powerful Google search engine realizes text mining technology, or the details of what kinds of algorithms are involved, these two lectures absolutely stimulated my curiosity about the text mining field. Beyond web search engines, text mining technology also has broad applications in the biomedical field. In biomedical research, text mining is usually combined with natural language processing and computational linguistics and applied in bioinformatics and medical informatics. One big application is one we have already heard of a thousand times, NCBI PubMed, which uses text mining technology to extract biomedical and molecular biology literature from a growing number of electronically available publications stored in databases. The other main development of text mining in the biomedical field is the identification of biological entities, so-called entity recognition: protein and gene names in free text, the association of gene clusters obtained from microarray experiments with the biological context provided by the corresponding literature, automatic extraction of protein interactions, and associations of proteins with functional concepts. One software tool developed using text mining technology is FABLE, a gene-centric text-mining search engine for MEDLINE. Anyone interested can try it here: http://fable.chop.edu/.



Posted by
Di Pan
Content: NLP, as I understood it, is the processing of natural language, i.e., human language, to make it understandable to the computer. In the bioinformatics domain specifically, NLP is a very useful tool to extract relevant information from the huge amount of literature, where it's humanly impossible to read through every article. The first step in NLP is tokenization, which breaks a sentence into relevant words that are then used for searches. Morphological analysis takes lexical words and identifies variants that can be linked back to the base word. The three formalisms for capturing such language are regular expressions, finite state automata, and regular grammars. However, I still need to understand these methodological aspects better. What I understood is that using these methods in the bioinformatics world helps to recognize various related terms and therefore consolidate knowledge that can be scattered all over the literature, especially since the literature on genes is so ambiguous.
I found the Stanford website very resourceful for various programs which are freely downloadable for NLP. http://nlp.stanford.edu/
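As a small illustration of that first tokenization step, here is a minimal regex tokenizer sketch; the pattern is my own illustrative choice, not a tool from the lecture or the Stanford page:

```python
import re

def tokenize(sentence):
    # Keep hyphenated words together; split punctuation into its own tokens.
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*|[^\sA-Za-z0-9]", sentence)

print(tokenize("Gene-expression data, e.g. BRCA1, is ambiguous."))
# Note how the abbreviation "e.g." gets split into four tokens -- a small
# example of why tokenization is harder than it looks.
```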

Posted by Sheetal Shetty

Text Mining

Another week of new info. Wouldn't it be great if someone wrote a textbook to include all of these methodologies? I agree with what others have posted. Dr. Gonzalez is an enthusiastic lecturer and her classroom integration is good reinforcement. Still, I need to do some searches to really comprehend the topic. I do know that Precision = TP/(TP + FP), and as was debated in class, this is the same as predictive value. Likewise, Recall = TP/(TP + FN), which is the same as sensitivity.
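For anyone who wants to check the arithmetic, here is a tiny sketch with made-up counts:

```python
def precision(tp, fp):
    # Of everything retrieved, the fraction that was relevant (= predictive value).
    return tp / (tp + fp)

def recall(tp, fn):
    # Of everything relevant, the fraction that was retrieved (= sensitivity).
    return tp / (tp + fn)

# Hypothetical search: 40 relevant documents retrieved (TP), 10 irrelevant
# retrieved (FP), and 20 relevant documents missed (FN).
print(precision(40, 10))  # 0.8
print(recall(40, 20))     # 0.666...
```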

Text mining is the discipline where patterns are extracted from natural language text (not structured databases). Although new to me, this appears to be a discipline that has been studied for years. I found that scientists have hypothesized causes of rare diseases by looking for indirect links in different subsets of the bioscience literature. I have read in numerous articles that text mining is believed to be extremely useful in the field of biomedicine. Guess that's why it's been presented to us.

Happy Halloween!!!! Lee

last week of october

Content:

This week was interesting; I never thought I would revisit verbs and nouns in class again. Dr. Gonzalez did a really good job presenting this new topic. I also enjoyed that she involved the class in the process.

Dr. Gonzalez introduced text mining methods. The overall goal of text mining in biomedical informatics is to empower discovery by extracting knowledge that is embedded in the literature. I really liked how Dr. Gonzalez explained finite state automata as playing a board game, where you are unable to advance unless you get the input necessary. In these lectures it was also discussed how to deal with ambiguity. In addition, we did an in-class exercise where we tried to look for ambiguity. Overall these were two great lectures. I am still a bit confused by all the symbols used, but I think it is just a matter of practice.

I found a good article on challenges and advancements in biomedical text mining that I thought was really interesting: http://intl-bib.oxfordjournals.org/cgi/reprint/8/5/358


Posted by

Week of 10/26/09

Content: Both of this week's lectures were on natural language processing (NLP). Text mining is a technique used in NLP. Some challenges in text mining are ambiguity of terms, lexical variants, and tokenizing text. Regular expressions, and some of their uses for finding text, were covered in the second lecture. Both false positive and false negative results for text searching were taught. Increasing accuracy and increasing coverage can affect error rates in text mining. Regular expressions are one type of formalism for capturing language; others are finite state automata and regular grammars. Another topic in text mining that was covered was morphological parsing, a.k.a. stemming. Morphology can be either inflectional or derivational. Inflectional morphologies can have different combinations of stems and affixes, where the resulting words serve different grammatical and semantic purposes. An example of derivational morphology is changing a verb into a noun. NLP can help with doing research by extracting literature articles that contain terms like gene names. Associations between terms, like a gene and a disease, can be discovered through NLP techniques. An open source program named BANNER can help with named entity recognition.
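BANNER itself is a machine-learning system, so the following is not how it works; it is just a hedged regex toy, with a completely made-up "gene-like" pattern, to show why spotting gene names in free text is hard:

```python
import re

# Hypothetical pattern: 2-5 capital letters followed by 1-2 digits.
gene_like = re.compile(r"\b[A-Z]{2,5}[0-9]{1,2}\b")

text = "Mutations in BRCA1 and TP53 were found; samples arrived on flight UA90."
print(gene_like.findall(text))
# ['BRCA1', 'TP53', 'UA90'] -- the flight number is a false positive,
# which is exactly the kind of error that hurts precision.
```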


I liked how Dr. Gonzalez had students do a practice exercise on recognizing lexical variations. Also, she did a nice job of making her lectures interactive with students. The component of her lectures where she asked students questions added helpful opportunities for us in the audience to try to apply the knowledge we were learning. Also, the idea of automating specific literature retrieval seems particularly useful to me. Even if some literature retrieval were automated, it seems like someone would want to manually search through other literature too. However, some of the literature search tools seem so effective that searching through literature other than what they found may not be highly useful. The picture that Debbie posted definitely helps with understanding how precision and recall are similar to sensitivity and specificity. Thanks for posting that, Debbie.


Some potential NLP research areas are presented in this article:
One of those potential research areas is improving tools for the automated extraction of gene and/or protein interactions (GPI) from literature so that they are both easy to install and use and achieve state-of-the-art levels of performance. Perhaps making a state-of-the-art GPI extraction tool easier to install and use would be an effective way to go about that research.

Posted by:

Nate

Wednesday, October 28, 2009

Natural Language Processing Lectures

It never ceases to amaze me how many new topics can be presented in the course of a week, let alone a semester.  NLP is rather intriguing: what seems like the rather simple concept of breaking down words becomes very complex when you need to apply semantic meaning to the words to keep them in context.  Dr. Gonzalez is very enthusiastic about this subject.  The first lecture was not as technical as the second one.  When we started getting into the actual coding and search strings, it was very interesting, but this is what I like to learn.  There's a lot of information presented, almost overwhelming again, similar to ontologies and taxonomies, so it would be nice to have a review of the important things to take away from this lecture. 

I need to learn more about finite state automata.  I understand the concept but would like to see more examples.  A link I found helpful for applying NLP to the clinical field was a study using NLP to identify heart failure.  Within the study, they quote sensitivity, specificity, and predictive value; they didn't use the words recall or precision.  Here's the link:  http://www.informatics-review.com/wiki/index.php/Electronic_Medical_Records_for_Clinical_Research:
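Since I wanted more FSA examples, here is a minimal deterministic finite state automaton written as a plain transition table; the alphabet and pattern are invented purely for illustration:

```python
# A toy DFA over the alphabet {'a', 'b'} that accepts strings ending in "ab".
TRANSITIONS = {
    ("start",  "a"): "saw_a",
    ("start",  "b"): "start",
    ("saw_a",  "a"): "saw_a",
    ("saw_a",  "b"): "saw_ab",
    ("saw_ab", "a"): "saw_a",
    ("saw_ab", "b"): "start",
}
ACCEPTING = {"saw_ab"}

def accepts(text):
    state = "start"
    for symbol in text:
        state = TRANSITIONS[(state, symbol)]  # advance only on valid input
    return state in ACCEPTING

for s in ["ab", "aab", "abb", "ba"]:
    print(s, accepts(s))  # True, True, False, False
```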

I also found a simple diagram showing relevance as it relates to recall and precision calculations, similar to sensitivity and specificity.





Here's the link for the diagram:  http://pages.cs.wisc.edu/~jerryzhu/cs838/IR.pdf
If anyone else finds some great introductory links, please share them.
Posted by :  Debbie Carter

Monday, October 26, 2009

Bioinformatics Methods

Content:
Well first off, oops I fell asleep and then this slipped my mind. I find this proves that we can't multitask too well and that computers are needed for memory assistance.

Now off to what we did last week. We learned about the different bioinformatics data types as well as the flow of genetic data. In the following lecture, we got an introduction to the different methods used in bioinformatics.



Posted by Eric

Saturday, October 24, 2009

Methods in Bioinformatics

Content:

In Professor Dinu’s first lecture, he introduced the 4 fundamental sources of bioinformatics data: DNA, mRNA, protein and metabolite. As the researches of these data go on, some “omes ” are introduced: Genome, Transcriptome, Proteome and Metabolome. Biomarkers are gathered from the biological information flow: DNA->RNA->Protein.





In the second lecture, Dr. Dinu gave an overview of bioinformatics techniques. Mathematics contributes a great deal to bioinformatics data analysis, including statistics and machine learning, which were topics in previous classes. For sequence alignment, BLAST is the most popular tool. The purpose of sequence alignment is to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.


Posted by  Xiaoxiao

Friday, October 23, 2009

Bioinformatics introduction

Content:

The two lectures this week focused mainly on bioinformatics. Dr. Dinu gave us a well-organized introduction, from the kinds of data in bioinformatics studies to the computational methods commonly used in this field. In the first part, many important bioinformatics data types were introduced, including microarray data, genomic data, and DNA/protein sequences; many important bioinformatics data sources were also covered, such as NCBI, UniProt, and the PDB, which are usually the starting points for getting initial bioinformatics information for research. In the second part of the class, the prevalent bioinformatics research methods were discussed, covering statistical methods such as the chi-square test and t-test, as well as the data mining methods used in bioinformatics studies. From the second part, I got an understanding of how to connect the previously learned data mining methods to applications in real bioinformatics research. Take clustering as an example: functionally categorizing genes with similar expression into a group is based on the clustering method. The other commonly used bioinformatics analysis method is BLAST, which can search a gene or protein database to match the most similar gene or protein sequence to the query sequence. In bioinformatics studies of proteins, homology modeling based on the most similar sequence can be performed to build up the three-dimensional protein structure for further molecular simulation. These applications are mainly in the physical simulation domain, a subdomain of the computational biology field. In the other fields of bioinformatics, I feel statistical techniques are more involved.


Posted by
Di Pan

Bioinformatics

Content: Excellent overview of methods used in bioinformatics. The first class talked about the data sources in bioinformatics, which were broken down very systematically into DNA, RNA, proteins, and metabolites. DNA is studied for one of the following: 1. Sequence variation, i.e., single nucleotide polymorphisms (SNPs); 2. Epigenetic modification, i.e., methylation, deacetylation; 3. Structural variation, i.e., translocation, copy number variation.
RNA is studied using gene expression, for example to identify microRNAs that are associated with disease.
Proteins are studied using mass spectrometry to identify protein expression. The main problem with protein expression studies is the size of the protein molecule, which is a deterrent for high-throughput studies using array technology.

The other data source mentioned was metabolites, which are byproducts of a disease process and can be easily detected. Thus, assays detecting these are more robust, as people with the disease will most likely have the metabolite.

The second lecture covered the methods used for analysis: 1. Biomarker tests, 2. Data mining, 3. Statistics, 4. Sequence alignment.
Biomarker testing includes identification of the marker --> sequencing the marker --> identifying the mRNA expression --> identifying the protein expression level --> validation of the biomarker by screening for it in clinical tumors.
Data mining techniques include: 1. Unsupervised algorithms, e.g., k-means clustering; 2. Supervised algorithms: regression and classification (random forests, decision trees, neural networks).
The statistics mainly used with genetic data are linear and logistic regression, t-tests, and chi-squared tests.
Sequence alignment uses dynamic programming, which was difficult to understand from the slide. I found this simple website which explains it very well.
http://www.avatar.se/molbioinfo2001/dynprog/dynamic.html
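To make the dynamic programming idea from that page concrete, here is a minimal sketch of global alignment scoring in Python; the match/mismatch/gap values are arbitrary illustrative choices, not the site's exact example:

```python
# Needleman-Wunsch style global alignment score via dynamic programming.
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dp[i][0] = dp[i - 1][0] + gap          # a[:i] aligned against gaps
    for j in range(1, len(b) + 1):
        dp[0][j] = dp[0][j - 1] + gap          # gaps aligned against b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]                          # best score for the full sequences

print(nw_score("GATTACA", "GCATGCU"))
```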



Posted by
Sheetal Shetty

Bioinformatics data and techniques

Content:
In the first lecture by Dr. Dinu, the various kinds of bioinformatics data – genome, proteome, metabolome, and transcriptome – were discussed. The “Central Dogma of Life,” which involves the conversion of DNA to mRNA (transcription), which is in turn converted to protein (translation), was described, and its relevance in obtaining various sources of “-omic” data was elaborated. Biomarkers for various diseases like cancer, diabetes, hypertension, and Alzheimer’s disease were also discussed.

In the second lecture, various techniques that could be used to analyze bioinformatics data to understand disease processes were discussed. These techniques include biomarker tests; machine learning approaches like clustering, decision trees, classification, and regression; and statistical tests of significance like the t-test, chi-square test, and odds ratio. Pairwise sequence alignment using BLAST (several versions: protein BLAST, nucleotide BLAST, blastx, tblastn) and global alignment using dynamic programming approaches are of significant importance in genomic studies.

Both the lectures gave a good understanding of various concepts and tools in bioinformatics and possible areas of research emphasis.

Posted by
Harsha Undapalli

Bioinformatics Week

Genomic variation is a main focus of research because it has a big impact on the well-being of individuals. Because we now have such large volumes of genomic data, there is more interest in comparing sequences of individuals in similar and different populations. This, in turn, gives us better insight into the propagation of certain genetic disorders in society. One type of genetic variation is a polymorphism, which is generated from some type of individual mutational change (insertion, deletion, etc.). Most polymorphisms are eliminated from the population, but some can become fixed in a population. A SNP is a variation of a single nucleotide. Other variations include copy number variants and structural variations. It is a pretty daunting task to identify all the types of changes (neutral or harmful) in a genome because they can occur in just a single nucleotide, a larger portion of a gene, several genes, parts of one chromosome, or even multiple chromosomes. We will need very powerful tools in the future to detect tiny variations in disorders that are hard to pinpoint to one root cause or locus, like cancer.

Posted by Annie
This week, it was the turn of bioinformatics. The first lecture was mainly focused on the "what"s. I had no idea about these things before the lecture, and it was difficult for me to catch them. I got to know more about different bioinformatics data like genomes and proteomes. Dr. Dinu also covered different biomarkers. He also mentioned DNA, RNA, and proteins.

In the second lecture, Dr. Dinu talked about how machines are trained to find biomarkers and categorize them. He also mentioned clustering (k-means, k=4) and different approaches to clustering like average linkage, single linkage, and complete linkage. He discussed the classification of high-grade brain tumors using gene expression. This week was more informative to me given the novelty of the content we covered in this course. Now it's time to prepare for the mid-term exam of the BMI 501 course. Good luck to everyone.

And thanks Laura for the song, it was great. Ashu's animations were also cool.

-Prabal
Content: Well, Laura and Ashutosh, it's your week, but definitely not mine! Most of the topics that were covered this week, I heard for the 1st time in my life! Genetics is a really tough field; I totally agree with Pier. My salute to the researchers in this field for doing an awesome and very important job.
I really gotta go through this week's lectures again and again. The 1st lecture introduced the different data types in genomics, and the 2nd lecture taught the methods for working with these data. That's all I can contribute to the blog for now.

Posted by Gazi

Week of 10/19/09


Content:

This week's lectures were on bioinformatics. Some sources of bioinformatics data were covered, including genomes, proteomes, and metabolomes. Those data sources can be used to detect biomarkers that are indications of diseases. Some types of genetic analyses that are done in bioinformatics include analyses of SNPs, DNA methylation, and copy number variations. Microarrays are one type of technology that is used for those analyses. Mass spectrometry can be used for protein analyses. The FDA has approved the use of some biomarkers for testing for the presence of illnesses. Data mining can be used to identify disease associations with genetics. Chi-square tests can be used to test whether the occurrence of genotypes or specific alleles is more frequent in people with diseases than in people without diseases. Genetic sequence alignment algorithms can be used to find differences and similarities in genetic sequences.
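As a hedged sketch of that chi-square idea with completely made-up counts (assuming SciPy is available):

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: allele carriers vs. non-carriers in cases/controls.
observed = [
    [30, 70],   # cases:    carriers, non-carriers
    [15, 85],   # controls: carriers, non-carriers
]
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
```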

Dr. Dinu covered a lot of topics that are important in the field of bioinformatics. I liked how he talked about a variety of different biomarkers that can be associated with diseases. It's interesting to me to look at the list of FDA-approved biomarkers for cancers, though that list was from 2005. A list of genomic biomarkers for drugs that seems current is here:
According to the FDA site, “Pharmacogenomic information is contained in about ten percent of labels for drugs approved by the FDA.” That is quite a lot of relevance that pharmacogenomics has for medication. Warfarin is on that list, as mentioned in previous classes. It will be interesting to see what new pharmacogenomic discoveries happen and become relevant to medications in upcoming years.

Posted by:

Nate

bioinformatics

Content: Laura, that's an awesome song! Ashutosh: your animations are pretty cool too, though I was unable to open the last one.

Bioinformatics was the focus of this week's lectures.  I really respect people who study/research this area, because I think it is very hard to do.  For me, this stuff is not new, but I am no expert.  Dr. Dinu's first lecture introduced us to the types of data gathered in bioinformatics, such as gene expression data and the transcriptome, proteome, and metabolome.  In the second lecture he talked about methods used in bioinformatics, such as data mining and statistical tools.

As I reviewed the lecture I found a really good site with definitions about everything and anything that has to do with genes: http://www.genome.gov/Glossary
Hope it helps.


Posted by P.Ortiz

Bioinformatics

Content: I agree with Laura that this was the first lecture where I knew most of the topics covered. However, the deeper we dig into it, the more complex it gets. It's a very vast domain, and even if I dedicated all my years to learning about bioinformatics, I would still be left with a lot to know. For my friends who do not have a biology background, here are some animation links showing how microarrays and PCR work, and their applications.
Microarray
www.bio.davidson.edu/Courses/genomics/chip/chip.html
learn.genetics.utah.edu/content/labs/microarray/
highered.mcgraw-hill.com/sites/.../animation_quiz_2.html

PCR
www.maxanim.com/genetics/PCR/PCR.htm
www.sumanasinc.com/webcontent/animations/.../pcr.html
highered.mcgraw-hill.com/sites/.../student.../animation_quiz_6.html
Hope you people will enjoy it and learn from it.


Posted by
Ashutosh
Bioinformatics-Data and Techniques

First of all, Laura's song recommendation is great. Turn up the volume! I was thinking that we might have a class sing-a-long to facilitate memorization. (Much like some of the "pi" songs, it's catchy!) Secondly, Debbie apparently blogged for me this week. Bioinformatics is not purely logical or intuitive, and I am at an extreme disadvantage on the subject.

I reviewed Dr. Dinu's presentation from Week 1 in 501. It's a helpful re-introduction to the subject. In the slides he recommended this website, which gives some helpful background information: http://www.personalizedmedicinecoalition.org/ .

In addition, another search brought me to this website http://sunsite.berkeley.edu/PCR/foundationalPCR.html#anchor1238784
The references seem credible, albeit dated, and offer an explanation and discussions of Laura's favorite subject, PCR.

Lee





Bioinformatics Data and Data Mining

Slowly but surely the information is coming together, but the entire domain of bioinformatics is very new to me, even with a clinical background.  It is very hard to follow the concepts being taught when I can't apply them.  It would be helpful for me to understand how these tests are even set up, how a microarray is set up (actually seeing it happen), and then to walk through how the information is presented, the tools being used for data mining, etc.  Having it just in lecture makes it very difficult to understand without applying some of these methods in a lab setting. 

At least there is repetitive use of some common terms, which are slowly being understood, but I feel that truly understanding bioinformatics in the short term of a master's program is rather difficult without an undergraduate focus.  This is probably the same, though, for non-clinical people in understanding clinical informatics, so I truly understand the frustration others have in trying to focus on an area where they have no expertise.

Posted by :  Debbie Carter

Tuesday, October 20, 2009

Bioinformatics Lecture

Content:  YEAH, a topic where I finally understood the entire lecture!!  This is my area: I have taught biotechnology, so I have a lot of experience with bioinformatics data.  I taught these same concepts of DNA, genes, genomes, arrays, etc.  I thought I would share a fun video/song with you guys; it is a silly way to remember PCR, a method of amplifying DNA so that you can have lots of it to analyze, usually by sequencing it.  My favorite part of the song is "who's your daddy?"  What part is your favorite?

http://bio-rad.cnpg.com/lsca/videos/ScientistsForBetterPCR/


Posted by  Laura Wojtulewicz

Saturday, October 17, 2009

After Midterm

Content:

In BMI 502, I get tons of information in every lecture, so when I did my reviewing I was worried that a lot of knowledge would be covered in the exam. Also, this was the first real exam for me in the US, though I took a midterm exam at the end of September for my elective course. But I had learned something about that course before, so it was much easier for me.


My strategy for reviewing for the BMI 502 midterm was to remember everything. Honestly, I failed to remember it all. There was so much knowledge, and I was tired and too nervous to fall asleep the night before the exam. When I saw the questions, at first I tried to answer using the exact words from the slides, but because of the time limit, I just wrote in my own way based on my understanding of the questions. I think at the moment the exam ended, I forgot everything that I had put into my brain over those days. I felt relieved.

Thinking back, I learned more about the topics mentioned in class, especially ontology. Now there is space for the structures, semantics, and relationships. Also, the lectures on research design and research methods made me think more about research. Undoubtedly, the way of studying here is totally different from China. I am in the process of converting myself, and I enjoy it.

For some reason ASU shut off my computer's access to the Internet until Friday, hence I spent a lot of time in the library this week, and I found it is really a good place to help me focus on studying. And I went to the State Fair on Friday; it was so fantastic that I strongly recommend it. Just go and have fun. :)


Posted by  Xiaoxiao

Friday, October 16, 2009

Week Eight: Midterm

Content:
So for this week, Kanav held a review session in which he gave lots of hints about what might be and what might not be on the midterm. Too bad I didn't review some of these hints as thoroughly as I should have. The review was a great help in our understanding of topics and I would compare the review to Windex, which cleans up any dirty window and makes it nice and shiny.

Like I mentioned above, the exam did have some questions that were hinted at. The exam was overall okay except for the amount of writing required. Can't we have a computer-based test next time?

Well, after all the review of information, I saw a commercial for Dex. A few questions came to mind when they advertised "Dex knows best". Is Dex an attempt at semantics? Or is it just an advertising scheme claiming understanding when really it's just another query? What are everyone's thoughts on this?

Posted by Eric

Week

Content: The review was awesome.  It really helped to clear up so many concepts. I really liked the format of the review session: nothing planned, open to any questions, straight to the point, and long. I hope there will be one for the final.

The test was much better than I anticipated.  My hand did hurt after, but nothing I couldn't take.  I also agree it was nice to not have any mathematics in this test.


Posted by: P Ortiz

Mid-term Blah-g

I think the test was a very fair assessment of the topics we covered.    The review session was helpful; I wish those simple math calculations had been incorporated into the initial lecture.  Things made a lot more sense after discussing that.  I have to say I was thrown off by the experimental design question.  I think the majority of that lecture was spent outlining the major categories of observational and experimental design and examples of the most widely used methods in research.  Study design is such an important part of research.  It is applicable to everyone, even if you do not want to pursue studies directly pertaining to patients.  It is important to study research design because it forces you to delve into the real analysis of the problem and question the overall significance of your initial thoughts.  Also, implementing the right statistics is vital to every study's success.  Choosing the right study design is critical and is usually the most time-consuming part of the process, but very crucial to successful research.

Posted by Annie

Review session and Mid-term

Content:
The review session was very helpful, and it gave a clear idea of the concepts we mainly needed to concentrate on for the test. The explanation of the machine learning topics was really good.

This was the first test I took in the US. I was quite overwhelmed because I thought that the test pattern would be completely different from what I had been exposed to in my studies back home. But after seeing the paper, I felt better. The space constraint was a bit new to me; I actually have a habit of writing pages and pages in an exam. With all these new experiences, I hope I can put more emphasis on the things I need to improve for the next test.

A very happy Diwali to all my friends... It's a festival of glow and glory... Let it spread a lot of happiness and joy among us.

Note: I faced a problem with uploading HTML links in the blog last time. If anyone had the same problem while posting references as links, please be sure to fix it before posting, as we should not post information on the web without references. It would be great if someone could let me know how to figure out and fix that problem.


Posted by
Harsha Undapalli

Midterm review and exam

Content:


This week, no new information was introduced in class, but I feel this is the week I obtained the most information, from both the review session and the preparation for the exam. So I think the real aim of an exam is not to test you, but to drive the knowledge gained and the deep understanding of the class content during exam preparation. Therefore, thanks for the midterm exam. About the exam itself, I have nothing more to say, because I am the kind of person who does not like to think about an exam after it finishes. However, I feel that if there had been more mathematics or probability problems in the exam, I would have felt more comfortable.


Posted by
Di Pan

Week of 10/12/09

Content:

This week we had a midterm review and a lecture by Dr. Petitti.  I have liked Dr. Petitti's lectures a lot because they seem very practical for helping students to understand journal articles.  Dr. Petitti went over how experimental designs can sometimes be impractical for studies, and also how other study designs can have a lot of merit.  An article on parachutes counteracting the dangers of gravity was an example of strong study evidence being present without randomized trials.  It seems that the people who published the article on parachuting were brave.  Those people really took on researchers who challenged non-randomized studies.  They may have faced some tough criticism for stating their point the way that they did.

The midterm review helped me to go over several of the topics that we covered in the class.  I appreciate that Dr. Kahol took time outside of our usual class time to have the review.  I was impressed by how fast some people were doing the k-means similarity measures calculations in the class.  Choosing a similarity measure to use is an interesting concept.  Here is an article that I read some of and found interesting on some similarity measures for data mining with categorical data:
http://www.siam.org/proceedings/datamining/2008/dm08_22_Boriah.pdf

Posted By:

Nate

Content: With Kanav’s creative skills that involved bartering, he traded me a blog on a medical meeting that I attended in San Diego for my absence on October 7th. As such, Mithra was kind enough to record Dr. Greenes’ lecture so that I was able to listen to it. First, I will comment on the lecture, given by one of the top men in the field. Even though decision analysis has really only been applied to public health situations, its utility in clinical situations is clear to me. Why the decision analysis service only functions at Tufts and has not been disseminated is not clear. I have used the hypothetico-deductive approach for many years, and the logic that can be attached to decision analysis is really going to be useful in the future for me. I do not have straightforward patients. I have very complex neurology patients with multiple problems and multiple courses of action that have to be weighed. To have this type of analysis available, and to be able to apply it, is useful. Unfortunately, the way to derive the probabilities and the numerical values for outcomes is problematic. I will have more to say on this as this evolves. The mathematical models for TPR, TNR, PV+, etc. are neat and will also change my formal thinking.

Second, the meeting in San Diego was for the American Association of Neuromuscular and Electrodiagnostic Medicine. This organization involves a unification of two disparate medical fields, Neurology and Physiatry (Physical and Rehabilitative Medicine), much like what we see with biomedical informatics. In this field, we look at disorders of nerve, muscle, and motor function from an electrophysiological standpoint and also a neuro-rehabilitative standpoint. With regard to decision analysis, the most important presentation at the meeting used these principles. A disorder called Chronic Inflammatory Demyelinating Polyneuropathy (CIDP) is an immunological disorder with varied causes that leads to chronic muscle weakness due to nerve dysfunction throughout the body. The standard criteria for making the diagnosis involve clinical and electrophysiological indices and have been poorly validated. Gorson et al., in a recent article in the New England Journal of Medicine that was also presented at this meeting, used a decision analysis approach to develop the best diagnostic criteria for CIDP. An expert group was first put together to generate a set of candidate variables that distinguish CIDP from other neuropathies (disorders of the nerves). In the second step, detailed case descriptions were obtained internationally on patients with putative CIDP that included clinical examination, electrophysiological studies, spinal fluid and blood evaluation and testing, genetic tests, and nerve biopsy information on 267 patients. A gold standard was designated as a consensus diagnosis on these cases by 11/13 experts. The cases were then divided into a “derivation sample” used to define a classification rule and a “validation sample” to validate the classification rule. A decision tree was developed that involved all possible partitions of the cases to distinguish CIDP from non-CIDP. Regression between variables was used. A validated set of rules was determined from this. Using the initial set of rules and testing them in the validation sample, the sensitivity of the diagnostic criteria was 83% and the specificity was 97%. This important study will now allow more accurate diagnosis and appropriate treatment of patients with this disorder. 
Importantly, this disorder and its acute variant, Guillain-Barré syndrome, may result from vaccination. There is current concern about the risk of the swine flu vaccine raising the risk of these disorders.




Third, another presentation at the meeting focused on the concept that when we injure the motor strip in the brain, there is weakness on the opposite side of the body. Each part of the opposite side of the body is represented in an organized fashion in the brain. In fact, on the motor strip of the cortex, the face and body are represented upside down, based on ablation and many other studies. However, if the neurons in this motor strip are examined electrophysiologically, using recordings and complex mathematical models, precise areas of the body may be less localized than previously thought. The nerve cells associated with movement, however, may have very different functions for the direction of movement. In a virtual reality paradigm with a monkey with an artificial opposite-sided limb, the plasticity of these nerve cells and the directionality of their primary electrophysiological function can be altered by learning and by cortical electrical stimulation. The monkey controls movement from the brain and not with the arm. If presented with a virtual set of targets for precision and reaching, the brain and the subsequent movement associated with the artificial limb go through a learning phase, with more and more precision over time. This is similar to the sorts of things that Kanav does with his virtual reality paradigms. The utility of the presented work is that it provides a means to think about how to rehabilitate stroke victims with hemiplegia, or weakness of varying degrees on one side of the body. However, the mathematical conceptualization may be an oversimplification of the total system. Systems models of how the brain works are narrow and must be part of a broader approach.



Fourth, I went to a 2-hour practical session on the electrical evaluation and distinction of weakness and pain in the distribution of one nerve, the ulnar nerve, which controls the movement of our fifth finger. Disorders of this nerve are exceedingly common due to compression of the nerve as it runs through the elbow. Our tools for electrophysiological diagnosis have evolved over time, but still have real issues with usability. To get results, there is certainly an art involved in addition to the science and engineering aspects of the machine that we use. For my machine, the software code is written in Japanese, and this machine and its software for recording and reporting are not necessarily always in sync with taking care of American patients. There is a broad range of these types of machines, and all have various advantages and disadvantages from an HCI standpoint, not to mention control of noise and other issues.



I found the review session by Kanav to be exceptionally useful. I continue to appreciate his approach to biomedical informatics problems. As I have said before, the diversity of backgrounds in this class is great. I recently learned that Nate Sutton has a significant programming background. His expertise is very helpful. Daily, in these areas, I learn from other individuals, including Prabal, Di Pan, and Eric.



Stuart




Midterm week

Content:
After a long, long time, an exam without any problem-solving questions in it! I hadn't had a midterm paper with no mathematical problems since my freshman year in college! Well, it was kind of fun though. But you gotta be expert in 2 things to rock these types of tests: the ability to neatly present your thoughts and, of course, time management skills. And I'm pathetic at both! Anyway, I totally agree with Prabal regarding the review session; it was extremely effective and really helped me to clarify my confusions in machine learning. Plus, the so-called "big-O" ontology doesn't seem that blurry anymore!
Like the other not-so-good-at-time-management guys in the class, I spent lots of time on the 2-pointers! After realizing that one-third of the time was gone, I started on the 10-pointers. As Jing mentioned, I was also confused by "pre-design," and that killed some time! Finally, when I started the 5-pointers, I had almost no time to finish them all. I barely touched the HCI question, the only topic Dr. Kahol had covered! I can visualize his expression when he finds that on the paper!
Anyway, there’s nothing to do about it now, but I’m glad that the prep for the midterm helped me deeply understand the topics. I actually had a group study session with a few of our genius fellow classmates, and the brainstorming in that session was the most effective part of the learning process.



Posted by Gazi

Mid-term exam week

Content:

The review session by Kanav was very important before the mid-term exam. Like Laura said, a different perspective definitely made complex things easier. The explanation of SVMs was amazing. It is one of the most confusing topics in machine learning, and he explained it with the best possible example: weak and strong machines. The review session was also helpful for building a study plan for the exam.

About the exam, no comments. I still think that I could have done better than that (everyone might feel the same). I spent more time on the 2-mark questions and had less (very little) time to answer the 10-mark questions. As Kanav said, it's all about time management. I should have started with the 10-mark questions ;). My fault. We're always learning, and that was a good lesson.

Thanks Ashu for the wishes. We also celebrate "Deepawali" or "Tihar" in Nepal (which lasts for 5 days), and it is similar to "Diwali" in India. Wish you all a happy Deepawali.

Posted by
Prabal

personal opinion: a mistake in question?

Content:

I wish so much that I had brought a pencil and an eraser to the exam. I used a ballpoint pen, so I could only scratch out my wrong answers. I think my exam paper will definitely give the grader a headache, and I am so sorry about that...

By the way, one question asked us to describe pre-design, but I don't remember Dr. Petitti ever mentioning a 'pre-design'. So I am wondering if it is a mistake in the question. However, this is only my personal opinion, and I am not quite sure about it.


Posted by
Jing

Midterm Week

Content:
On Monday we had a review session of all the previous lectures of BMI 502. This session actually helped me a lot to understand some basic concepts which initially were beyond my grasp. The way in which Kanav explained the difference between classification and clustering, and his use of examples, actually resolved a lot of confusion. Initially I had no idea how an SVM worked, but after his explanation it made a lot of sense. This session indeed helped me a lot in preparing for the midterm exam.
As far as the midterm is concerned, I was glad that he was considerate in reducing the number of questions, and I still feel that some of my 2-mark answers were bigger than the 10-mark ones. I am also glad that he stuck to his promise of not asking any questions involving mathematics, and sorry for Laura, because she couldn't get to use her big, sophisticated graphical scientific calculator. On this forum I also take the opportunity to wish everyone Happy Diwali. For people who don't know, Diwali is one of the biggest festivals in India, just like Christmas. It marks the victory of good over evil. Hope you all have a great and festive weekend.


Posted by
Ashutosh Singraur

Mid Term and Review

I for one am glad the midterm is over.  Although I thought it was a fair assessment of our knowledge, I feel like I'm just scratching the surface on many of the subjects, such as machine learning.  I very much appreciated the time spent reviewing.  This course has a lot of new information given in every class, but we have very little application of the knowledge, so it's hard to pull the concepts together.  I agree with Lee and Laura that the review was extremely helpful and the math examples for k-means truly helped paint a clearer picture. 

I thought we would have plenty of time and was surprised how long it took to write the exam, so I very much appreciate the reduction in questions to help us finish.

I'd like to see a periodic review in the next half of the semester to help pull important concepts together.

Posted by :  Debbie Carter

Midterm Week

Here's an ego boost for our esteemed professor... Monday's review was one of the best classes of the semester and provided clarity for the first half of the semester. My only regret is that we didn't have the foresight to request the review a week prior to the exam. Following the review, I felt I had direction for preparing for the exam. BMI 501 seems better suited to guest lecturers, but given the content of 502, I would prefer to have Kanav as the primary presenter.

As for the exam, I felt it wasn't unreasonable, and appreciated the format whereby we could choose questions. I must admit, I will feel better guided when/as I prepare for the final, assuming the format is similar.

Lee

Thursday, October 15, 2009

Test Review and Test!!!

Content:  Wow!! We are halfway done with this semester already!!!  This week we had a midterm review led by Kahol.  It was amazing how a different perspective can clear up some fuzzy topics for us.  Even though Kahol added some math into the review, it was helpful.  I think that adding some of the math in really did help me understand the actual algorithm used in k-means, and therefore understand k-means.  Overall I thought the review was helpful. 
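For anyone else who wants to trace the algorithm by hand, here is a bare-bones k-means sketch on one-dimensional points; the numbers are mine, not the review's example:

```python
# Minimal k-means: alternate between assigning points to their nearest
# centroid and moving each centroid to the mean of its assigned points.
def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

print(kmeans([1, 2, 3, 10, 11, 12], centroids=[2.0, 11.0]))
# -> ([2.0, 11.0], [[1, 2, 3], [10, 11, 12]])
```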
The test wasn't nearly as bad as I had envisioned.  I am very thankful for the "wiggle room" of having an extra option for the longer questions.  I was also glad to see that Kahol dropped five of the short answer questions, because it was a very, VERY long test; even though I was the first to finish, I thought it was long, and I don't think I would have been able to finish had there been five more short answer questions. 



Posted by Laura Wojtulewicz

Saturday, October 10, 2009

Content:
Dr. Greenes' lectures introduced us to the world of clinical decision making. On Monday he talked about types of models, for example physical models, flow charts, and decision trees.  He then went on to explain Bayesian-type models.  These models have a backbone of probability. A decision tree can be built with every branch having a different probability, which can help a clinician decide what route to take with a patient.  One advantage of models is that they give insight for understanding.  Although Bayesian models seem useful, in practice it is very hard to get probability estimates.


On Wednesday, Dr. Greenes continued his introduction to clinical decision making by first stating that decision science is a combination of different domains such as economics, statistics, and psychology. Thereafter he gave a full explanation of decision trees. Decision trees include decision nodes, chance nodes, probabilities, and utilities.  He explained a measure that is used when creating a decision tree: Quality Adjusted Life Years (QALY).  He went on to introduce conditional probabilities and their applications.




Posted by

Friday, October 9, 2009

Week Seven: Decision Science

Content:

This week Dr. Greenes covered the basics of decision science, where decision science is decision making using a statistical approach. Dr. Greenes noted in his lecture that decision making in medicine is complex and that the goals, options, assessments, outcomes, etc. are not always clear. The lecture then covered the basics of constructing a decision tree as well as solving one. Solving a decision tree involves replacing chance nodes with expected values and then working backwards from the terminal nodes to the starting decision node, in a process that can be labeled backward induction. Sensitivity analysis of the decision tree was also considered, to identify how the tree changes with changes in its inputs, the robustness of the decision, and other analyses involving single or multiple parameters.
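Here is a minimal backward-induction sketch of that process; the tree shape and all of the numbers are invented for illustration, not from the lecture:

```python
# Replace chance nodes with expected values, working back from the leaves;
# decision nodes take the best (maximum expected value) option.
def solve(node):
    if node["type"] == "terminal":
        return node["value"]                       # e.g., life years at a leaf
    if node["type"] == "chance":
        return sum(p * solve(child) for p, child in node["branches"])
    return max(solve(child) for _, child in node["branches"])  # decision node

treat = {"type": "chance", "branches": [
    (0.9, {"type": "terminal", "value": 20}),      # treatment succeeds
    (0.1, {"type": "terminal", "value": 5}),       # treatment fails
]}
wait = {"type": "chance", "branches": [
    (0.6, {"type": "terminal", "value": 18}),
    (0.4, {"type": "terminal", "value": 10}),
]}
root = {"type": "decision", "branches": [("treat", treat), ("wait", wait)]}
print(solve(root))  # 18.5 -> treating wins here (vs. 14.8 for waiting)
```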


The second lecture focused more on the use and roles of decision analysis as well as its limitations. Decision science tries to be explicit/quantitative; however, it really depends on the input/data. Dr. Greenes sums this up nicely with GIGO (Garbage In, Garbage Out). Decision trees were broken down into their elements: decision nodes, chance nodes, probabilities, and utilities. It was covered that a decision tree node can have more than 2 branches; however, it is important that the branches or choices are mutually exclusive. It was also important to recognize that chance nodes can draw on two different types of data: objective (data from literature) and subjective (expert opinions). Also, for the tree to be finished, the terminal nodes have to be labeled with some value, in many cases, such as those presented, life years (LY). Quality Adjusted Life Years can also be used for this purpose. The Markov process was also covered. The Markov process is used when there is continuous risk involving a temporal sequence as well as multiple simultaneous events. This model assumes that the patient is always in one of a finite set of health states. The presentation then went on to cover conditional probability as well as sensitivity and specificity. It continued with the application of Bayes' rule using some of the basic probability statements. Last but notable, Dr. Greenes pointed out that the decision science concepts presented this week are rarely used in actual cases and are more prevalent in the development and determination of guidelines and policy.

Posted by Eric
Content: Clinical Decision Support
Dr. Greenes introduced us to the realm of using computers to aid the diagnosis and management of the patient. The first lecture introduced decision trees and modeling in clinical decision support, with the second lecture expanding on these models and methods of analysis.
The earliest of these systems included MYCIN, for determining the best antibiotic therapy, and HELP, used for developing rules to issue medical alerts for inpatients in a hospital.
These systems use a Bayesian method of analysis. Statistics is broadly divided into 2 categories based on the way a problem is approached and analyzed. The Bayesian approach uses new knowledge and research data to make inferences about the problem. The second approach is the frequentist approach (which I still have to understand well), which uses a well-defined set of experiments to determine the probabilities and does not allow new information to change these probabilities.
CDS systems use Bayesian methods for obvious reasons, as the entire set of probabilities for all possibilities cannot be predetermined.
What I found most challenging to comprehend in this system was the assignment of these probabilities. Some of these could be determined by previous research and literature searches. However, some were based purely on "expert opinion," and these can be subjective.
I definitely think that CDS systems can be useful for the development of guidelines, but I doubt whether they can ever replace the clinical acumen required in bedside medicine, which comes through experience, using simple algorithms for diagnosis and even management. Then again, maybe it's too early to say that.



Posted by
Sheetal Shetty

Issues related to CDS

Content:



This week's BMI 502 focused on clinical decision support systems, taught by Dr. Greenes. He is definitely one of the founders of the development of clinical decision support systems, and his introduction in class gave us full coverage of this field from different perspectives, including technology, fundamental principles, and the challenges and problems related to the application of CDSS in clinical practice. The CDSS technologies discussed in class mainly included rule-based classification, artificial intelligence, and Bayes' rule; the advantages and disadvantages of CDSS in clinical application were also discussed.

With Dr. Greenes' introduction to rule-based clinical decision support systems, the first thought in my mind was the decision tree model introduced in the data mining class. They are two really similar mechanisms handled by a computer to give a final, unknown decision based on known information or predictors. But one important difference between the decision tree model and a rule-based clinical decision support system is that the rule-based system uses previously set rules, based on clinical knowledge, to build its ensemble of rules, unlike the decision tree, which uses training data to let the computer learn the rules by itself. So the rule-based clinical decision support system relies more on knowledge-based rules, acting as a model that abstracts clinical diagnostic standards and transfers those standards into computer-understandable concepts. Therefore, on this point, the clinical decision support system is no more than a semi-artificial intelligence system, and what it can do is efficiently remind clinicians to notice things, increase working efficiency, and reduce the working pressure on physicians. In this way, medical errors can be reduced and a higher quality of health service can be offered to patients. An open question is whether CDSS will develop toward a smarter level that can self-learn from previous patient cases to derive new rules that fit new patient cases and provide decision advice for physicians.

Dr. Greenes also discussed an important issue about the application of CDSS in clinical practice: the response of physicians to CDSS. From his introduction, I think the resistance of physicians to accepting CDSS is a genuinely complex issue. One of the most important concerns is the trustworthiness of the CDSS. Although Dr. Greenes mentioned that the aim of CDSS is not to make decisions instead of physicians, and that physicians still need to base the final decision on their own analysis, the reliability and error rate of CDSS remain a big obstacle to wide acceptance, because if a CDSS gives erroneous or wrong decision advice, it is highly likely to mislead the physician into a wrong decision. Moreover, another issue related to the application of CDSS is who will take responsibility for an erroneous diagnostic result. Will the machine take the responsibility? It is a problem...

By Di Pan


Decision support

Content:

Decision making is one of the most difficult tasks when there is no evidence or proof. Dr. Greenes used appropriate examples to explain the decision analysis process. In the first lecture, he explained on what basis we should make decisions. For the analysis process, he described a framework that includes establishing the context, finding the alternatives, predicting the consequences, assigning values to the outcomes, and finally choosing the best option based on those values. He also explained the different types of models. For decision making, decision trees (which he also called first cousins to flow charts) are very helpful for identifying the different stages of a process.

The next lecture was more specific about decision-making techniques using probability. There are mainly two kinds of probabilities: prior probability and posterior probability. Prior probability represents knowledge about an unknown quantity, whereas posterior probability is a conditional probability representing knowledge about some data given some other data. We use probabilities to assign values to different outcomes. While discussing decision trees, we also got to know about decision nodes, chance nodes, and terminal nodes. Hidden Markov Models (HMMs) and Bayesian networks were also discussed in the lecture; these are very helpful tools for deciding under uncertainty. The importance of Bayes' theorem is that we can calculate the probability of A given B when we know the prior probabilities of A and B and the conditional probability of B given A. For example, when we want to find the probability of disease given a symptom, we can do that by knowing the probability of the symptom given the disease. Dr. Greenes also mentioned other interesting topics like odds (the ratio of two probabilities), sensitivity, specificity, and the likelihood ratio. The examples shown in the lecture were interesting as well as informative.
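As a concrete sketch of that disease-given-test calculation, the snippet below applies Bayes' theorem in both the probability form and the odds/likelihood-ratio form mentioned above; the prevalence, sensitivity, and specificity are made-up numbers.

# Sketch of Bayes' theorem for P(disease | positive test), in both the
# probability form and the odds form. All inputs are illustrative.

prior = 0.01        # P(disease): prevalence
sensitivity = 0.95  # P(test+ | disease)
specificity = 0.90  # P(test- | no disease)

# Probability form: P(D|+) = P(+|D) P(D) / P(+)
p_pos = sensitivity * prior + (1 - specificity) * (1 - prior)
posterior = sensitivity * prior / p_pos
print(f"P(disease | test+) = {posterior:.3f}")  # ~0.088

# Odds form: posterior odds = prior odds x likelihood ratio
lr_pos = sensitivity / (1 - specificity)        # LR+ = 9.5
prior_odds = prior / (1 - prior)
posterior_odds = prior_odds * lr_pos
print(f"Same answer via odds: {posterior_odds / (1 + posterior_odds):.3f}")

Note how a positive result on a quite good test still leaves the disease probability under 10% when the prior is low; that is exactly why the prior matters.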

Posted by
Prabal

Decision Science and Methods

The overall structure of decision making in medicine was laid out for us this week.  Key components include analysis of the problem, outlining alternatives, inclusion of predictive tools, and ultimately making a decision based on these methods.  Once this foundation is established, complexity can be added to each layer of the problem.  Some main points to consider in medical decision making are the characteristics of the patient (medical history, diagnosis, etc.), the probabilities of future events, and the assignment of value to each factor in the analysis.  Construction of an accurate decision tree and a well-designed statistical analysis should accompany any major medical decision.  However, this type of decision analysis seems to be used more to develop guidelines for populations than for individual patients.  It will be interesting to see if there will be a shift toward more decision analysis for individual medical outcomes in the future.


Posted by Annie

Clinical Decision Support - Dr. Greenes

Clinical Decision Support (CDS) should be viewed as support to help clinicians reach an informed decision. The amount of support that can be provided to a clinician should be carefully considered, and this area of informatics is continually evolving. The EMR offers immediate access to patient data, which the system can evaluate to formulate support alerts, but the number of alerts to provide, and how to prioritize what should and shouldn't be a CDS alert, needs to be evaluated. Just as Internet searching brings massive amounts of data with a few clicks, the EMR could do the same, but it could also bring the same noise and actually become detrimental because of frustration. At some point, interrupting a physician's thought at the wrong time can lead to additional errors or omissions. So presenting the CDS information at just the right time requires understanding physician and nursing workflows, how they approach ordering and planning care for their patients, and the timing associated with the plan of care.

Medicine is a science, and making major decisions should use similar decision analysis concepts. If probabilities are not known for some arms of the decision tree because of a lack of information, the models may not be of much assistance. Also, measuring a patient's utility and physicians' practice biases, and incorporating them into the decision analysis model, adds complexity.

In creating CDS, then, the EMR can't take utility or quality of life into account, so CDS is meant to give the clinician additional information to combine with their knowledge of the patient to help make informed decisions. Even though the support might recommend a certain action, it may not be the correct action for every patient. Thus CDS is always an open-loop action requiring human interaction to decline or implement the recommendation.

Just as in medical studies, the different models used to reach a decision analysis conclusion have to be reviewed, and the data used for the probabilities are only as good as the studies they come from. As was shown in the balance sheet for decision analysis, the same advantage can also be a disadvantage, such as needing diverse data. Although models are an important step in trying to analyze and make decisions, I think it must be understood that this information, and the decisions made, can be quite variable when you add in all the elements of the tree for an individual patient.

Posted by: Debbie Carter


Decision Science

Content:

Two sessions on decision science were given by Dr. Greenes this week. So what is decision science? It is a statistical approach to decision making. In everyday life, there are many decisions waiting for us to make. Should I get up at 7:30 or 8:00 am? Is it better to do homework first or attend the yoga class first? In our minds, we have our own process of analysis and evaluation, and we finally make a decision.


The framework for analysis includes establishing the context, laying out the alternatives, predicting the consequences, valuing the outcomes, and making a choice. A good decision is the one that maximizes the value of the outcomes. Based on this framework, a decision tree is built to structure the problem. The way a decision tree is used to make a choice is by "folding back": we start from the right, summarize each possible combination of choice and chance, and end at the left.
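Here is a minimal sketch of that fold-back procedure on an invented surgery-versus-medication tree; all probabilities and utilities are made up for illustration.

# A tiny decision tree folded back from right to left. Chance nodes take
# probability-weighted averages; the decision node takes the best
# expected value. (Probabilities and utilities are invented numbers.)

def fold_back(node):
    kind = node["type"]
    if kind == "terminal":
        return node["value"]
    if kind == "chance":
        return sum(p * fold_back(child) for p, child in node["branches"])
    # decision node: pick the alternative with the highest expected value
    return max(fold_back(child) for _, child in node["branches"])

tree = {"type": "decision", "branches": [
    ("surgery", {"type": "chance", "branches": [
        (0.9, {"type": "terminal", "value": 20}),   # e.g. QALYs if cured
        (0.1, {"type": "terminal", "value": 0}),    # surgical failure
    ]}),
    ("medication", {"type": "chance", "branches": [
        (1.0, {"type": "terminal", "value": 16}),
    ]}),
]}

print(fold_back(tree))  # surgery: 0.9 * 20 = 18 beats medication's 16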

For a basic decision tree dealing with a single disease, we can find the decision by computing the expected value. But for decisions involving risks whose timing is unknown, a Markov process replaces the decision tree. The famous Bayes' rule is applied to compute the probability that the disease is present, given the results of the tests. However, the threshold is set according to expert opinion, so it influences the sensitivity and specificity, and hence the decision.

Unfortunately, decision science is rarely used for actual clinical decision making and is most often used for determining health guidelines or policy, but it can still provide decision support.


Posted by  Xiaoxiao
Content:  I agree with Lee; it was helpful to have Dr. Greenes' lectures back to back.  I also liked that this wasn't our first introduction to decision trees and Bayes' theorem, so the material wasn't completely foreign to us.  It is almost like we are no longer "newbies" in this BMI world.  I liked how the lectures built on each other: in the first lecture Dr. Greenes gave us the basics of decision science, then in the second he went into great detail about one example, decision trees.
One tool he touched on was DXplain, which can help clinicians determine a diagnosis.  I wanted to share the access information with you, in case you want to test it out: http://dxplain.org/dxp2/dxp.asp  The log-in codes are: account webmw, password ze4527.
Your browser must be AJAX-enabled to run DXplain.
You can see how changing a person's sex or age will change the list of possible diseases, or at least their ranking.
Also, Dr. Greenes brought up the article "The Incidentalome: A Threat to Genomic Medicine." It is short and very interesting; here is where you can find it: http://www.commed.vcu.edu/IntroPH/Genetics/threatgenomejul06.pdf



Posted by Laura Wojtulewicz

Decision Science

Content:
Dr. Greenes' lectures on decision science discussed the use of decision trees in decision support systems. A decision tree over a surgery choice and a medication choice can help decide which one is the better option, but in his example both options came out with almost the same probability. This is one way to make the system more acceptable, but I doubt its effectiveness.

He also discussed the Hidden Markov Model (HMM). A Markov model considers each state of the patient as a Markov state and then makes assumptions about the transitions between states. I think it is a very effective tool in decision support systems.
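As a rough sketch of the Markov idea (an ordinary, fully observed Markov model rather than a hidden one), the snippet below follows a patient cohort through assumed yearly transitions; every probability here is invented.

# Minimal Markov cohort sketch: patient states and assumed per-year
# transition probabilities (all numbers invented). Running the cohort
# forward accumulates expected years spent in each living state.

states = ["well", "sick", "dead"]
transitions = {            # P(next state | current state), per cycle
    "well": {"well": 0.90, "sick": 0.08, "dead": 0.02},
    "sick": {"well": 0.10, "sick": 0.70, "dead": 0.20},
    "dead": {"well": 0.00, "sick": 0.00, "dead": 1.00},
}

cohort = {"well": 1.0, "sick": 0.0, "dead": 0.0}
years_well = years_sick = 0.0
for _ in range(50):                       # 50 one-year cycles
    years_well += cohort["well"]
    years_sick += cohort["sick"]
    cohort = {s: sum(cohort[c] * transitions[c][s] for c in states)
              for s in states}

print(f"Expected years well: {years_well:.1f}, sick: {years_sick:.1f}")

This is what lets a model capture risks whose timing is unknown: the same transition probabilities apply at every cycle, however long the patient survives.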

The Bayesian concept comes up again when a decision needs to be made. The threshold determines the sensitivity and specificity of the tests.  We also became familiar with some new concepts:
-Odds are the ratio of two probabilities.
-Posterior odds = prior odds × likelihood ratio


Posted by Gazi

Week 10/3-10/5

Content:
The lectures by Dr. Greenes were very informative. In the first class I learned about the statistical approach to decision making using decision models; the model he mainly talked about was the decision tree. Decision analysis with these models follows a framework: the initial step is to diagnose the problem and think of all possible alternatives; then we try to predict the possible consequences of each alternative; and then, based on our preferences, we make the best choice. The probabilistic approach Dr. Greenes used in the example of finding the best treatment for abdominal pain helped me understand the implications of decision making in the real world.
I believe the second lecture was partly a revision of topics discussed by Dr. Shortliffe in his BMI 501 lecture, where terms like sensitivity, specificity, and prevalence were defined. It helped me a lot when Dr. Greenes discussed using Bayes' theorem to calculate the posterior probability from the prior probability. I also learned new terms like quality-adjusted life years (QALYs) and the Declining Exponential Approximation of Life Expectancy (DEALE) and how they can be used in decision analysis tasks.
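As a rough sketch of the DEALE idea: survival is approximated as a declining exponential, so a constant mortality rate mu implies a life expectancy of 1/mu, and an excess disease-specific rate simply adds to the baseline rate; applying a quality weight to the resulting years gives a quality-adjusted life expectancy. All the numbers below are invented.

# DEALE sketch: exponential survival means life expectancy = 1 / mortality
# rate, and an assumed excess disease rate adds to the baseline rate.
# A utility weight then converts life years into QALYs. (Numbers invented.)

baseline_le = 30.0           # remaining life expectancy if healthy (years)
mu_baseline = 1 / baseline_le
mu_disease = 0.05            # assumed excess mortality rate from disease

le_with_disease = 1 / (mu_baseline + mu_disease)   # ~12 years
utility = 0.8                # assumed quality weight while living with disease
qale = utility * le_with_disease

print(f"Life expectancy with disease: {le_with_disease:.1f} years")
print(f"Quality-adjusted life expectancy: {qale:.1f} QALYs")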
The other important thing I learned is that decision analysis not only revolves around the best clinical treatment; it also encompasses the benefits, the quality of life, the financial and economic costs, and other factors that could influence the final outcome.
The seminar given by Dr. Fram was really a good one. He made some sharp points: sometimes even producing high-quality data, like CT and MRI images, is not enough to solve the problem, because visual perception also plays a very important role, as he showed in an example where physicians failed to recognize a missing bone in an X-ray image. He presented some amazing facts about human visual perception that we would never ordinarily notice. He also demonstrated how playing video games increased the visual perception of physicians and how they made fewer errors performing laparoscopic surgeries. All in all, the week was really fun.


Posted by
Ashutosh Singraur
Content:

This week Dr. Greenes introduced us to some basics of decision science. Decision science is a statistical approach to decision making. I believe it can be used to solve various problems related to the allocation of scarce resources subject to constraints. Dr. Greenes' lectures focused on applying decision science to healthcare, and he showed us how to construct and analyze decision trees. The four important elements of a decision tree are decision nodes, chance nodes, probabilities for each possible outcome of a chance event, and utilities.

Another important aspect of applying decision science to diagnosis is testing decisions. In this section, statistical terminology was introduced to us. In diagnosis, the prior probability is easier to obtain than the posterior probability. I think that is why we apply Bayes' rule: to calculate the posterior probability using the prior probability.


Posted by  Jing

Week 7-Decision Support with Dr. Greenes

Two consecutive lectures with Dr. Greenes were helpful. I believe he customized the second lecture based on our apparent knowledge deficiencies. Decision support is a remarkable area. Admittedly, the methods are a bit less inspiring than the actual function, but you can't have one without the other. I think the way this tied in with the 501 lectures was also useful.
I attended the seminar on Thursday with Dr. Fram. He mentioned that the brain devotes 70% of its activity to visual matters. This seems to give credence to the development of models, as they are frequently displayed in decision support. I downloaded the freeware Dr. Greenes recommended from http://infolab.umdnj.edu/windm/. It appears to be a useful tool for mapping decision analyses, but also a valuable learning aid if you're new to the subject.

Overall, I did have difficulty assigning values on the decision tree. Perhaps, if important, this could be further addressed in our “math bootcamp”.

Lee

Week of 10/05/09

Content:

The two lectures this week were on the basics of decision science.  The first lecture taught how decision science includes statistical approaches to decision making.  One of the models used to represent those statistical approaches is the decision tree.  Expected values in decision trees help medical practitioners decide which tests or treatments to administer.  An example of a lottery was used to explain expected values in the first lecture.  Folding back nodes is the technique used to evaluate the combined expected values of a decision tree: trees are folded back from right to left by taking the probability-weighted sums across series of chance and terminal nodes.  Sensitivity analyses can be performed to evaluate how robust a decision tree is to its probabilities: a conclusion may not be robust if a small change in a parameter results in a large change in the result.  Models in decision science can help medical practitioners compare the possible outcomes of the actions they may take, and also compare the inter-relations between the variables affecting those outcomes.  Through those comparisons, the models can help medical practitioners make efficient and accurate predictions of the outcomes of the actions they can take.
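A small sketch of a one-way sensitivity analysis in that spirit: sweep a single probability and watch whether the preferred choice flips. All payoffs and probabilities below are invented.

# One-way sensitivity analysis sketch: vary one probability and check
# whether the decision changes. If the choice flips within a plausible
# range, the tree is not robust to that parameter. (Numbers invented.)

def expected_value_surgery(p_success, value_success=20.0, value_failure=0.0):
    return p_success * value_success + (1 - p_success) * value_failure

VALUE_MEDICATION = 16.0  # fixed expected value of the alternative

for p in [0.70, 0.75, 0.80, 0.85, 0.90]:
    ev = expected_value_surgery(p)
    best = "surgery" if ev > VALUE_MEDICATION else "medication"
    print(f"P(success)={p:.2f}  EV(surgery)={ev:4.1f}  -> {best}")
# The decision flips at p = 0.80, the threshold probability.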

The second lecture included information about using decision science to decide whether to do more tests on or to treat a patient.  Elements of economics, statistics, and psychology are all often used for those decisions.  The consequences of those decisions can involve long term benefits, quality of life, and economic costs.  Some of the values of outcomes can be morbid events averted, life years saved, and quality of future life years.  Conditional probabilities were discussed in the class.  An example of a conditional probability is the probability that a disease is present when a test result is positive.  Sensitivity and specificity that are used in Bayes' theorem are conditional probabilities.

The lectures caused me to look at medical decision making in a variety of ways.  One element of the decision-making systems that was interesting to me was the measures of utility that the decision analyses use.  Even if surgery results on average in a higher net gain of quality-adjusted life years, the risks might be more than some people would want to endure.  It could be a particularly hard choice to face if the evidence presented to someone included death by surgery as a possibility.  Perhaps someone would choose an option with a lower number of quality-adjusted life years but virtually no death risk instead of an option with a higher number of quality-adjusted life years and a death risk.  I think that decision analysis models can be especially useful for providing a visual representation of what tests and treatments can be performed for a patient.  Additionally, I think those models can be highly useful for sorting through the statistical analyses that have been made for those tests and treatments.

An article that discussed a study that evaluated the performance of the Quick Medical Reference (QMR) decision support tool is available here:
http://www.pubmedcentral.nih.gov.ezproxy1.lib.asu.edu/picrender.fcgi?artid=1230623&blobtype=pdf&tool=pmcentrez
The article describes a study done by two physicians using 154 cases of illness.  It includes information about specific cases that were evaluated, as well as about some of the diseases that were found in QMR's records of over 600 diseases and some that were not.

Another study that evaluated the performance of QMR and the Iliad decision support tool is available here:
http://www.pubmedcentral.nih.gov.ezproxy1.lib.asu.edu/picrender.fcgi?artid=1726199&blobtype=pdf&tool=pmcentrez
That study reported both QMR and Iliad results in an emergency department at a tertiary care academic medical centre.

Posted by:

Nate

Monday, October 5, 2009

Ontologies and Machine Learning

Content:
The second lecture on ontology greatly helped me sharpen my vague idea of it. Dr. Fridsma discussed different web-based ontology languages. It's the first time I have really understood the difference between syntax and semantics, by learning their actual meanings. On today's web, it is clear that syntax matters more than semantics when placing information on a page. The idea is to encode the information in a meaningful way so that a machine can be capable of making decisions.


In the second machine learning lecture, Dr. Shuiwang Ji discussed k-means and hierarchical clustering. Clustering has applications in detecting events in biomedical informatics.  As I discussed before, it tries to cluster the data in terms of the shortest Euclidean distance between points. Hopefully we will learn about some applications of clustering in the next class, which might make the idea clearer.
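For anyone who wants to see the mechanics, here is a bare-bones k-means sketch on made-up 2-D points: assign each point to its nearest centroid by Euclidean distance, then move each centroid to the mean of its cluster, and repeat.

# Bare-bones k-means: alternate between assigning points to the nearest
# centroid and recomputing each centroid as its cluster's mean.
# The data are two obvious 2-D blobs, invented for the demo.

import math

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids = [(0.0, 0.0), (10.0, 10.0)]          # initial guesses

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))

for _ in range(10):                              # a few iterations suffice
    clusters = [[] for _ in centroids]
    for p in points:
        clusters[nearest(p, centroids)].append(p)
    centroids = [tuple(sum(coord) / len(coord) for coord in zip(*members))
                 for members in clusters if members]

print(centroids)  # roughly [(1.33, 1.33), (8.33, 8.33)]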


Posted by Gazi

Ontology Reading Resource

Content:
I noticed that some students are confused about the concept of ontology, as they try to understand it from a philosophical perspective. The following link may be helpful for them.
http://en.wikipedia.org/wiki/Ontology_%28information_science%29

Posted by Shuiwang Ji

Machine Learning Followup

It is my great pleasure to give an introduction to machine learning algorithms. I went through the blog posts, and it seems that some of you are confused about support vector machines (SVMs), which are difficult to understand without going into the math derivations. Very good tutorial slides on SVMs can be found at

http://www.autonlab.org/tutorials/svm15.pdf

and there are a lot of other machine learning tutorials at

http://www.autonlab.org/tutorials/index.html

The most widely used SVM software is LIBSVM:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/

and there is a Java demo on the web site that you can play with. Moreover, there is a short introduction to SVMs at:

http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf

For a complete understanding of SVMs, you can refer to this long tutorial paper, which is one of the classic papers in machine learning:

http://research.microsoft.com/en-us/um/people/cburges/papers/SVMTutorial.pdf

It turns out that the SVM decision boundary depends only on the data points close to the boundary. Since the boundary seems to be supported by only these vectors around it, it is called a support vector machine. This is clear if you go through the slides mentioned above (http://www.autonlab.org/tutorials/svm15.pdf). I talked to one of the inventors of the SVM, Bernhard Schölkopf, and he said that initially they planned to call it "support vector networks". Since "networks" reminds many people of neural networks, it is instead called "machines".
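For a quick, hands-on illustration, here is a toy example using scikit-learn, whose SVC class is built on the LIBSVM library linked above; the data points are invented. After fitting, the support_vectors_ attribute holds exactly those training points closest to the boundary.

# Toy linear SVM on two invented 2-D classes. The fitted model exposes
# its support vectors, the only points the boundary depends on.

from sklearn.svm import SVC

X = [[0, 0], [1, 1], [1, 0], [3, 3], [4, 4], [4, 3]]  # two toy classes
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print("Support vectors:", clf.support_vectors_)
print("Predictions:", clf.predict([[0.5, 0.5], [3.5, 3.5]]))  # [0, 1]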

Posted by

Shuiwang Ji

Saturday, October 3, 2009

Content:

Dr. Fridsma gave us a lecture on ontologies, information models, and the semantic web. The web we use now is the syntactic web, which is more about structure. It is a digital library, a database, a platform for multimedia, and a naming scheme. However, on the syntactic web, computers do the presentation (easy) and people do the linking and interpreting (hard). The goal of the semantic web is to be machine-readable and machine-analyzable, and to make machines smarter and more useful. Though HTML and XML provide annotation tags for machine-read information, many things are impossible for the syntactic web and need semantics to deal with them.


Because even the same thing can represent different concepts to different people, an ontology is used to specify the meanings of annotations. Given the limitations of conventional enumeration, languages for "explicit specification" were invented to represent the two components of an ontology: names for the important concepts in the domain, and background knowledge about the domain. Dr. Fridsma put much emphasis on RDF, the Resource Description Framework. RDF is object-oriented, based on objects, types, and relations. The RDF data model provides the mechanism to express semantics, and RDF Schema helps to build the relations. However, RDF is too weak to describe resources in sufficient detail, which is the reason for the development of the OWL language. The "layer cake" of OWL is much like the TCP/IP stack, but the relationships between the layers are not as clear.
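To make the triple idea concrete, here is a small sketch using the Python rdflib package (version 6 or later); the example namespace and terms are invented for illustration.

# RDF data is just subject-predicate-object triples. The namespace
# http://example.org/ and the terms below are made up for this demo.

from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()

g.add((EX.aspirin, RDF.type, EX.Drug))          # aspirin is a Drug
g.add((EX.aspirin, EX.treats, EX.headache))     # a domain relation
g.add((EX.aspirin, EX.label, Literal("Aspirin")))

print(g.serialize(format="turtle"))             # human-readable triples

RDF Schema and OWL then add the layer that says what Drug and treats actually mean, which is exactly the extra expressiveness discussed above.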



In Wednesday's class, continuing the machine learning topic, we learned about clustering, i.e., unsupervised learning. It was easy for me to understand clustering and its types because less logic is involved, and the online demo and examples helped a lot. I am sure the student example will be useful for explaining k-nearest neighbours and k-means. I know that the processes of machine learning are more complicated than what we talk about in class. And the integration of the semantic web and machine learning is the equation: Semantics + Web + AI = a more useful web.


Posted by  Xiaoxiao