The Third CADRE Pillar: Data

1/6/20
CADRE brings together communities of researchers who want to work with big bibliometric data and a community of builders who want to provide better access to the data.

Community and Access, however, are only two of the five pillars that support CADRE’s ultimate mission. The third pillar in this series is Data: the content at the heart of what we are working to provide.

So far, the datasets CADRE has centralized include the Web of Science, Microsoft Academic Graph, and U.S. Patent and Trademark Office data. The CADRE team used these initial datasets to build an infrastructure that will solve the technological challenges of working with big data beyond the scope of a few datasets or institutions.

The data infrastructure CADRE has put in place will go on to solve some major issues that have arisen with the introduction of big data, issues that IBM Big Data & Analytics Hub summarizes as the “Four V’s of Big Data.”

Four V’s of Big Data

IBM uses the Four V’s to break down the aspects of big data that must be overcome before people can work with it effectively. The Four V’s are: Volume (Scale of Data), Variety (Different Forms of Data), Velocity (Analysis of Streaming Data), and Veracity (Uncertainty of Data).

CADRE’s solution addresses each of the V’s. Our shared cloud storage allows institutions to purchase and work with a standardized version of the data that is cleaned, parsed, and updated in the same way for everyone. This promotes high-quality data that can support reproducible research, answering the Veracity challenge. CADRE’s shared data infrastructure also ensures that data won’t be duplicated across institutions, and it allows users to query millions of scientific publications, export results quickly, and analyze and visualize data with ease, removing the Volume and Velocity barriers that users would otherwise encounter.

Researchers across disciplines and experience levels can use CADRE to work with a variety of big datasets in ways that serve their specific goals, and they can use the platform either to build their own data-analysis tools or to use tools created by others. This flexibility addresses the Variety challenge by broadening the range of data users can work with on the platform.

The techniques CADRE has incorporated for harnessing big data, wrapped in an affordable and open platform, make it easier for all researchers and institutions to leverage big data effectively and advance research that incorporates it.

Want to learn more about CADRE? Follow us on Twitter to stay in the loop on future CADRE Pillars.