Talk: Creating and Data Mining Semi-Structured Personal Genomics Data with Python

Presented by Michael Cariaso in Scientific Applications 2009 on 2009/07/26 from 09:45 to 10:30
Abstract
Genome-wide association studies have recently uncovered thousands of variations in the human genome that predict risk of disease, response to medications, ancestry, and aspects of physical and cognitive development. For $400, anyone can learn their own genotypes from 23andMe.com or one of several similar companies. This creates a diverse and motivated pool of amateur data analysts interested in identifying replications and conflicts within the rapidly growing body of scientific literature. SNPedia.com is a Creative Commons licensed Semantic MediaWiki of semi-structured data summarizing peer-reviewed papers. Its content is generated primarily by numerous Python programs that mine data from PubMed, Yahoo! Pipes, and dbSNP. Promethease is written in Python and available as a free, closed-source desktop app that reads the formats of the various consumer genotyping companies, parses SNPedia, and prepares a user-friendly HTML report. This approach is applicable to other domains and is well suited to labeling large, sparse, discrete datasets.
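The read, label, and report steps described in the abstract can be illustrated with a minimal sketch. It is not Promethease's code, which is closed source. It assumes a tab-separated genotype file with `#` comment lines and four columns (rsid, chromosome, position, genotype), and it uses a hypothetical in-memory `ANNOTATIONS` table with made-up rsids in place of real SNPedia content. The function names are likewise invented for illustration.

```python
import csv
import html
import io

# Hypothetical annotations keyed by (rsid, genotype). These are placeholders,
# not real SNPedia entries.
ANNOTATIONS = {
    ("rs0000001", "AG"): "example note for a heterozygous genotype",
    ("rs0000002", "CC"): "example note for a homozygous genotype",
}


def parse_genotypes(handle):
    """Yield (rsid, genotype) pairs, skipping '#' comments and short rows."""
    data_lines = (line for line in handle if not line.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) >= 4:
            yield row[0], row[3]


def label(genotypes, annotations):
    """Keep only genotypes that have an annotation; most will have none."""
    return [
        (rsid, gt, annotations[(rsid, gt)])
        for rsid, gt in genotypes
        if (rsid, gt) in annotations
    ]


def to_html(labeled):
    """Render labeled genotypes as an HTML list, escaping every field."""
    items = "".join(
        "<li>{} ({}): {}</li>".format(
            html.escape(rsid), html.escape(gt), html.escape(note)
        )
        for rsid, gt, note in labeled
    )
    return "<ul>" + items + "</ul>"


# Assumed sample input in the four-column layout described above.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000003\t2\t2000\tTT\n"
)

report = to_html(label(parse_genotypes(io.StringIO(SAMPLE)), ANNOTATIONS))
print(report)
```

Keying the lookup on the (rsid, genotype) pair is one simple way to label a sparse discrete dataset: rows without a matching annotation are dropped, so the report lists only genotypes that something is known about.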