Network science researchers interested in analyzing large bibliographic data.
Network science approaches, and the data to power them, have enabled the study of science as a complex system (Zeng et al., 2017), resulting in the emergence of a new field, the “science of science” (Fortunato et al., 2018). Studies in this nascent field rely on open and proprietary big bibliometric data sets such as Web of Science (WoS) and Microsoft Academic Graph (MAG). Yet the cost and expertise needed to host and service large open and proprietary data sets are significant access barriers for many. Moreover, data use agreements often prohibit data and algorithm sharing, hampering collaboration and reproducibility.
CADRE is a new, cloud-based science gateway that overcomes these barriers. CADRE hosts large proprietary and open datasets in native and graph formats and offers a suite of analytic tools (Mabry et al., 2020). The tasks of updating and maintaining data and version control are centralized, and user-generated digital objects, including code and datasets, are written in personal Jupyter notebooks and encapsulated using cloud-native container technologies. This makes research reproducible and research assets easy to share, cite, and reuse, while increasing efficiency and reducing cost. Users without programming skills can execute queries through a graphical interface. By pooling resources to build a single shared instance, member institutions obtain a superior solution at a fraction of the cost they would pay to develop their own. CADRE’s open datasets and basic tools are free for public use.
Drawing on examples from her own research and her experience with CADRE, Staša Milojević will kick off the workshop with reflections on how network science has influenced the emergence of the science of science and will discuss the role that CADRE can play in realizing its future potential. Next, a series of four to five talks from CADRE users will showcase research projects conducted on CADRE. Presenters will explain why they used CADRE and how it benefited their projects. They will also comment on limitations or challenges they encountered and offer recommendations for CADRE enhancements. The portfolio of presentations will intentionally reflect a range of topics while also illustrating as many of CADRE’s capabilities as possible.
Yong-Yeol Ahn will present his “science genome” project as an example of research conducted on CADRE. A hands-on tutorial will walk attendees through CADRE’s registration process and a series of exercises in which they will: 1) create a working network embedding pipeline similar to the one developed and presented by Dr. Ahn and save it to their CADRE workspace; 2) conduct their own CADRE queries and reproduce complex analyses on the query results; and 3) share complex computational workflows on CADRE. Exercises will be conducted on CADRE’s free tier using the open Microsoft Academic Graph dataset, through either a programming interface or an intuitive graphical user interface for those without programming skills. Real-time technical support for CADRE will be available during the tutorial and throughout the conference.
Staša Milojević (she/her/hers, Indiana University)
Science of Science
Yong-Yeol Ahn (he/him/his, Indiana University)
The Science Genome Project on CADRE
Dr. Chao Min (he/him/his, Nanjing University) and Dr. Yi Bu (he/him/his, Peking University)
Towards multi-generation citations and references
Abstract: Scientometric studies have extended from direct citations to higher-order citations, as a simple citation count has been found to tell only part of the story of scientific impact. This extension is considered beneficial in scenarios such as research evaluation, science history modelling, and information retrieval. In this presentation, we will discuss our empirical studies on multi-generation citations and references, called forward and backward citations, respectively. We adopt a series of metrics that measure how the backward and forward citations of a focal paper unfold, in order to better understand knowledge foundation and diffusion.
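As a rough illustration of the idea of citation generations (not the presenters' actual metrics, which the abstract does not specify), the sketch below counts how many papers are first reached at each generation from a focal paper. The toy `references` mapping, the function names `generation_sizes` and `invert`, and the paper labels are all hypothetical and chosen only for this example.

```python
def generation_sizes(graph, focal, max_depth):
    """Return the number of papers first reached at each generation.

    graph maps a paper to the papers it links to. Passing a
    paper -> references mapping walks backward citations; passing the
    inverted mapping (paper -> citing papers) walks forward citations.
    """
    seen = {focal}
    frontier = [focal]
    sizes = []
    for _ in range(max_depth):
        next_frontier = []
        for paper in frontier:
            for neighbor in graph.get(paper, ()):
                if neighbor not in seen:  # count each paper once, at its earliest generation
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        sizes.append(len(next_frontier))
        frontier = next_frontier
    return sizes


def invert(graph):
    """Turn a paper -> references mapping into a paper -> citing papers mapping."""
    inverted = {}
    for paper, cited_papers in graph.items():
        for cited in cited_papers:
            inverted.setdefault(cited, []).append(paper)
    return inverted


# Hypothetical toy data: A cites B and C; B and C both cite D.
references = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

backward = generation_sizes(references, "A", max_depth=2)         # references of A
forward = generation_sizes(invert(references), "D", max_depth=2)  # citations of D
print(backward, forward)
```

In this toy graph, both walks find two papers in the first generation and one in the second, because D is counted only once even though two first-generation papers cite it.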
Dr. Yulia V. Sevryugina (she/her/hers, University of Michigan), Andrew Dicks (he/him/his, University of Michigan)
Publication practices in biomedical sciences during the COVID-19 pandemic
Abstract: The coronavirus pandemic introduced many changes to our society and deeply affected established publication practices in the biomedical sciences. We will present a comprehensive study of the changes in the scholarly publication landscape for biomedical sciences during the COVID-19 pandemic, with special emphasis on preprints posted on the bioRxiv and medRxiv servers. Specifically, we observe the emergence of a new category of preprint authors, who used preprint platforms extensively during the pandemic to share their immediate findings. The majority of these findings were works-in-progress that were published at unprecedented speed. Only one third of COVID-19 preprints posted during the first nine months of the pandemic appeared as peer-reviewed journal articles. These journal articles display high Altmetric Attention Scores, further emphasizing the significance of COVID-19 research during 2020. Our study may be of interest to editors, publishers, open science enthusiasts, and anyone interested in the changes that the 2020 crisis brought to publication practices and the culture of preprints in the life sciences.
Dr. Russell J. Funk (he/him/his, University of Minnesota)
Collaborators: Michael Park, Erin Leahey
Large scale analysis of the dynamics of disruption in science and technology
Abstract: Theories of scientific and technological change largely view discovery and invention as recombinative processes, wherein prior accumulated knowledge serves as the basis for future progress. Recent decades have witnessed exponential growth in the volume of new scientific and technological knowledge, thereby creating conditions that should in principle be ripe for fostering major advances. Yet contrary to this view, a growing stream of literature reports evidence of slowing rates of discovery and invention. To reconcile this tension, we examine changes in the nature of contributions to science and technology using large-scale data on 25 million research papers and 4 million patents published over more than six decades. Using novel bibliometric and text-based measures, we observe a consistent pattern: over time, papers and patents are increasingly less likely to push science and technology in new directions by disrupting existing streams of knowledge. Network simulations suggest that this decline in disruption is unlikely to be driven by changes in citation practices, and subsample analyses of high-quality papers (i.e., Nobel-prize-winning papers and papers published in Nature, PNAS, and Science) show that the decline is likely not due to changes in the quality of published science over time. Further, we find an inverse relationship between the growth of knowledge and its utilization: as the volume of scientific and technological knowledge increases (as proxied by papers and patents), scientists and technologists focus their attention on increasingly narrow spectra of prior work. These changes in utilization may help account for the simultaneous growth of scientific and technological knowledge alongside slowing progress in discovery and invention, such as the decline in disruption we document.
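The abstract does not say which measures are used. Purely as an illustration of what a citation-based disruption measure can look like, the sketch below computes a simplified form of the CD index known from the bibliometrics literature: later papers that cite the focal work but none of its references count toward disruption, while later papers that cite both count against it. Treating this as related to the measures in the talk is an assumption, and the function name `cd_index`, its arguments, and the toy data are hypothetical.

```python
def cd_index(focal, focal_references, later_citations):
    """Simplified CD-style disruption score in [-1, 1] (illustrative assumption).

    focal_references: set of papers cited by the focal paper.
    later_citations: dict mapping each later paper to the set of papers it cites.
    Returns None when no later paper cites the focal paper or its references.
    """
    focal_only = both = references_only = 0
    for cited in later_citations.values():
        cites_focal = focal in cited
        cites_references = bool(cited & focal_references)
        if cites_focal and cites_references:
            both += 1              # builds on the focal paper and its predecessors
        elif cites_focal:
            focal_only += 1        # treats the focal paper as a new starting point
        elif cites_references:
            references_only += 1   # bypasses the focal paper
    total = focal_only + both + references_only
    if total == 0:
        return None
    return (focal_only - both) / total


# Hypothetical toy data: focal paper F cites R1 and R2.
later = {
    "P1": {"F"},
    "P2": {"F"},
    "P3": {"F", "R1"},
    "P4": {"R2"},
}
score = cd_index("F", {"R1", "R2"}, later)
print(score)
```

Here two later papers cite only F, one cites both F and a reference, and one cites only a reference, so the score is (2 - 1) / 4 = 0.25. Values near 1 indicate that later work tends to cite the focal paper without its predecessors, and values near -1 indicate the opposite.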
2:20 pm – 2:25 pm Any final questions and open discussion.
2:25 pm – 2:30 pm Eastern U.S. Break (5 min).
Patricia L. Mabry (she/her/hers, Health Partners Institute)
Xiaoran Yan (he/him/his, AI Research Institute, Zhejiang Lab)
Valentin Pentchev (he/him/his, Indiana University)
Filipi N. Silva (he/him/his, Indiana University)
Matthew Hutchinson (he/him/his, Indiana University)
Maksymilian Szostalo (he/him/his, Indiana University)