Integrating Open Literature and Databases
The establishment of open access literature makes it possible for knowledge to be extracted from scholarly articles and included in other resources. BioLit aims to extract database identifiers and rich meta-data from open access articles in the life sciences and integrate that information with existing biological databases. We have begun prototyping this effort using a clone of the RCSB Protein Data Bank, a database of macromolecular structures.
Cyberinfrastructure is integral to all aspects of conducting experimental research and distributing its results. However, it has yet to make a similar impact on the way we communicate that information. Peer-reviewed publications have long been the currency of scientific research, as they are the fundamental unit through which scientists communicate with and evaluate each other. In striking contrast to the data, however, publications have yet to benefit from the opportunities offered by cyberinfrastructure. While the means of distributing publications have vastly improved, publishers have done little else to capitalize on the electronic medium. In particular, semantic information describing the content of these publications is sorely lacking, as is the integration of this information with data in public repositories. This is puzzling considering that many basic tools for marking up and integrating publication content in this manner already exist, such as a centralized literature database, relevant ontologies, and machine-readable document standards.
We believe that the research community is ripe for a revolution in scientific communication and that the current generation of scientists will be the one to push it forward. These scientists, generally graduate students and new post-docs, have grown up with cyberinfrastructure as a part of their daily lives, not just a specialized aspect of their profession. They have a natural ability to do science in an electronic environment without the need for printed publications or static documents and, in fact, can feel quite limited by the traditional format of a publication. Perhaps most importantly, they appreciate that the sheer amount of data and the number of publications make the traditional methods of keeping current with the literature impractical.