In separate “big data” presentations at the Digital Preservation 2012 meeting, Myron Gutmann of the National Science Foundation and Leslie Johnston of the Library of Congress described scenarios that seemed futuristic and fantastic but were in fact present-day realities. Both presenters spoke about researchers using powerful new processing tools to distill information from massive pools of data.
Imagine, say, a researcher gathering social-science and health information about a troubled area of Africa. Using her home computer, she connects online to a high-performance processing station, which in turn accesses repositories of medical, atmospheric, political and demographic data. She analyzes the data with specialized visualization tools, draws fact-based conclusions and generates several possible courses of action.
Professional researchers, particularly in the scientific community, can do that now. And it won’t be long before advanced research capabilities such as large-scale data filtering and analysis, or data-driven research, are available outside the professional-research community.
Gutmann is fervent about the possibilities of data-driven research and about how it is revolutionizing scientific exploration and engineering innovation. He said that data is now gathered at an ever-increasing rate from a range of sources, and that virtualization and advanced server architectures will enable complex data mining, machine learning and automatic extraction of new knowledge. Gutmann said, “We want to increasingly make use of ‘data that drives our questions’ as well as ‘questions that drive our data.’”