Uncovering the missing link between genes and phenotypes is one of the key questions of genetics. The information available in both genomics and phenomics is increasing rapidly, thanks to reduced DNA sequencing costs and the development of automated techniques such as the Phenotype Microarray (PM); nevertheless, clear strategies to interpret and link genomic and phenomic data are still missing.
To overcome these problems, we used many of the available Python scientific libraries to develop DuctApe, a robust and efficient set of analysis tools for genomic data (pangenome creation, metabolic map reconstruction) and phenomic data (PM data analysis, differential activity calling, metabolic map reconstruction). Once the metabolic map is reconstructed (using the KEGG database), genes that can account for the observed phenotypic differences are highlighted for further experimental validation. DuctApe was built using biopython for sequence data handling; numpy, scipy, matplotlib and scikits.learn for PM data analysis (fitting, clustering) and plotting; and networkx for the exploration of nodes in the metabolic network connected to the observed genotypic/phenotypic variability.
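As an illustration of the last step, the sketch below shows one way a metabolic network could be queried with networkx to find genes connected to a compound of interest. It is not DuctApe's actual code: the function names, the toy identifiers (`cpd_A`, `rxn_1`, `gene_x`) and the choice of a bipartite compound/reaction graph are all assumptions made for this example.

```python
import networkx as nx


def build_metabolic_graph(reactions):
    """Build a directed compound -> reaction -> compound graph.

    `reactions` maps a reaction id to a (substrates, products) pair.
    This layout is a hypothetical simplification of a KEGG-style map.
    """
    graph = nx.DiGraph()
    for reaction_id, (substrates, products) in reactions.items():
        graph.add_node(reaction_id, kind="reaction")
        for compound in substrates:
            graph.add_node(compound, kind="compound")
            graph.add_edge(compound, reaction_id)
        for compound in products:
            graph.add_node(compound, kind="compound")
            graph.add_edge(reaction_id, compound)
    return graph


def candidate_genes(graph, compound, variable_reactions, gene_map):
    """Return genes mapped to variable reactions that directly
    consume or produce `compound`."""
    if compound not in graph:
        return set()
    neighbours = set(graph.predecessors(compound)) | set(graph.successors(compound))
    hits = neighbours & set(variable_reactions)
    return {gene for reaction_id in hits for gene in gene_map.get(reaction_id, ())}


# Toy data (entirely made up for illustration).
reactions = {
    "rxn_1": (["cpd_A"], ["cpd_B"]),
    "rxn_2": (["cpd_B"], ["cpd_C"]),
}
gene_map = {"rxn_1": {"gene_x"}, "rxn_2": {"gene_y"}}
variable_reactions = {"rxn_2"}  # e.g. reactions present in only some genomes

graph = build_metabolic_graph(reactions)
print(sorted(candidate_genes(graph, "cpd_B", variable_reactions, gene_map)))
# -> ['gene_y']
```

Here `cpd_B` stands in for a compound on which strains show different PM activity, and `variable_reactions` for reactions whose genes are not shared by all genomes; only genes at the intersection of the two are reported.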
Thanks to its modular structure, DuctApe comes in several forms: as a command-line tool for expert bioinformaticians, as a GUI for further analysis, and possibly as a web server for wide adoption in the scientific community.