Software Tools to Simplify Gene Function Prediction
The development of GeneMANIA has reduced the need for significant bioinformatics support when analyzing large datasets to study gene function. This both facilitates the use of the substantial genomics resources already built through massive government investment and reduces the future costs of exploiting these resources.
GeneMANIA has promoted advances in Canadian biomedical science. We enhanced Canada’s core bioinformatics capabilities as demonstrated by the 6 published articles that arose directly out of the GeneMANIA project. In addition, GeneMANIA contributed to high-quality biomolecular research that resulted in 12 additional publications in outstanding journals.
We have trained a number of individuals as software developers in the nationally underrepresented field of Computational Biology and Bioinformatics, and we have provided training in network analysis using GeneMANIA to bench biologists.
The national and international communities of biologists interested in gene function are now able to access the easy-to-use GeneMANIA website. They can also use the intuitive stand-alone Cytoscape plug-in to analyze their own molecular profile data. Similarly, computational biologists and bioinformatics software developers can freely access the functional association network data we collect, and can use the open-source software we have developed in their own computational analysis methods and software development.
Gene function prediction has proven beneficial in reducing the cost of identifying and characterizing genes of clinical interest, such as biomarkers for cancer or targets for rational drug design. We have found that GeneMANIA can identify biomarkers for pancreatic cancer with high accuracy. This demonstrates the potential for users of the GeneMANIA system to discover improved medical treatments that will help Canadians live longer, more comfortable lives. These treatments will yield economic benefits through the development of new Canadian biotechnology companies and cost reductions in government-funded health care.
In the late 1990s, Google brought the power of web search to the average Internet user by providing, for the first time, a simple, intuitive interface backed by powerful analysis software and a massive data warehouse. We propose to bring the power of the genomics revolution to the average biologist by extending existing software prototypes to build the “Google of Biology”: a software system that lets users draw on all existing genomics and proteomics data to answer specific biological questions about gene function.
Until now, this revolution has been both a blessing and a curse. While it is now possible to collect data about every gene in the genome, the public repositories that store these data have grown exponentially in size and complexity, making the data difficult for biologists to use in their research without advanced statistical and computational expertise. This is a missed opportunity, because computational analyses have shown that important insights into how cells work and how biological systems fail in disease lie buried in these data. Since few biologists have the necessary analysis expertise, effective tools for navigating this resource must be made available. Otherwise, researchers will repeat the efforts of others, adding significant cost to the already massive international investment in genomics and proteomics data.
We propose to build computer software to help overcome two barriers to progress in genomic and proteomic research: i) the average biologist is unable to participate in, and rarely benefits from, advanced genomics technology, and ii) collecting and analyzing genome-scale datasets is expensive and time-consuming. Surprisingly, the two problems have similar solutions. For the average biologist, we will build a user-friendly website for making predictions about gene function using all available genomics and proteomics data. For the functional genomics researcher, we will build a decision support system for evaluating data collection strategies by comparing the data quality and novel information content of preliminary data with published genomics, proteomics and gene function data. Both of these tools rely on the same underlying software architecture, which we will make freely available over the web. To ensure that our software tools have a maximal enabling impact on a diverse biological user community, we will develop and disseminate extensive user training resources. Our website and decision support system will be supported by a data warehouse of genomics and proteomics data, automatically collected and maintained by a series of web-crawling software agents that we will develop. Both tools will also benefit from an advanced network visualization interface that will help users browse and understand the results of their queries.
To build this enabling technology, we are requesting funding for programmers to develop these tools, a training specialist to design educational resources, and servers to host our web service. Through the “Google of Biology”, we will increase the value of genomics and proteomics data by making them accessible to the entire biological research community. Although it will take two years to write and test our software, once built, our automatically updating website and decision support system will require minimal upkeep, so the benefits of our work will be felt for many years to come.