Network science researchers interested in analyzing large bibliographic data.
Network science approaches, and the data to power them, have enabled the study of science as a complex system (Zeng et al., 2017), resulting in the emergence of a new field, the “science of science” (Fortunato et al., 2018). Studies in this nascent field rely on open and proprietary big bibliometric data sets such as Web of Science (WoS) and Microsoft Academic Graph (MAG). Yet the cost and expertise needed to host and service large open and proprietary data are significant access barriers for many researchers. Moreover, data use agreements often prohibit sharing data and algorithms, hampering collaboration and reproducibility.
CADRE is a new, cloud-based science gateway that overcomes these barriers. CADRE hosts large proprietary and open datasets in native and graph formats and offers a suite of analytic tools (Mabry et al., 2020). The tasks of updating and maintaining data and version control are centralized, and user-generated digital objects, including code and datasets, are written in personal Jupyter notebooks and encapsulated using cloud-native containerized technologies. This makes research reproducible and research assets easy to share, cite, and reuse, while increasing efficiency and reducing cost. Users without programming skills can execute queries through a graphical interface. By pooling resources to build a single shared instance, member institutions obtain a superior solution at a fraction of the cost they would pay to develop their own. CADRE’s open datasets and basic tools are free for public use.
This tutorial is intended for scientometric and Science of Science (SoS) researchers interested in analyzing large bibliographic data using machine learning (ML) and network science (NS). We developed CADRE, a new cloud-based platform-as-a-service, to serve as a SoS researcher’s workbench by facilitating reproducibility, research asset sharing, and big bibliographic data analytics (Mabry et al., 2020). Following a brief overview of the CADRE project, our featured speaker, Yong-Yeol Ahn, will present his “science genome” project as an example of SoS research featuring ML/NS conducted on CADRE. Next, attendees will be guided through a series of online exercises to familiarize them with CADRE’s ML/NS capabilities. Participants will log in to CADRE’s free public tier via an online portal to access the Microsoft Academic Graph (MAG) database hosted on CADRE.
A hands-on tutorial will walk attendees through CADRE’s registration process and a series of exercises: 1) create a working network embedding pipeline, similar to that developed and presented by Dr. Ahn, and save it to their CADRE workspace; 2) conduct their own CADRE queries and reproduce complex analyses on these queries; and 3) share complex computational workflows on CADRE. Exercises will be conducted on CADRE’s free tier using the open Microsoft Academic Graph dataset, either through a programming interface or, for those without programming skills, through an intuitive graphical user interface. Real-time technical support for CADRE will be available during the tutorial and throughout the conference.
Yong-Yeol Ahn (he/him/his, Indiana University)
The Science Genome Project on CADRE (recorded)
Filipi N. Silva (he/him/his, Indiana University)
Patricia L. Mabry (she/her/hers, Health Partners Institute)
Xiaoran Yan (he/him/his, AI Research Institute, Zhejiang Lab)
Valentin Pentchev (he/him/his, Indiana University)
Matthew Hutchinson (he/him/his, Indiana University)
Maksymilian Szostalo (he/him/his, Indiana University)