The Science of Science --- many numbers, what do they mean?
I have now taken a more careful look at "The Science of Science" book, and revised my comments accordingly.
One might hope that a book with the title "The Science of Science" would address the concerns highlighted by the literature on the reproducibility/replication crisis. It is then disappointing that a recent book with that title, by Dashun Wang and Albert-László Barabási, gives the matter scant attention. I found nothing on the important role that the sharing of data and code has played in advances in genomics, climate science, earthquake science, etc. Areas where the gains are less obvious (my comment) need to follow suit. The authors do comment on the benefits that flow from having larger groups of scientists working together. Where the effect is to bring together diverse skills (including analysis skills) and data sources, I'd judge that the critique that really matters mostly happens before papers are submitted for publication.
The blurb on the cover claims:
This is the first comprehensive overview of the 'science of science,' an emerging interdisciplinary field that relies on big data to unveil the reproducible patterns that govern individual scientific careers and the workings of science. It explores the roots of scientific impact, the role of productivity and creativity, when and what kind of collaborations are effective, the impact of failure and success in a scientific career, and what metrics can tell us about the fundamental workings of science. The book relies on data to draw actionable insights, which can be applied by individuals to further their career or decision makers to enhance the role of science in society. With anecdotes and detailed, easy-to-follow explanations of the research, this book is accessible to all scientists and graduate students, policymakers, and administrators with an interest in the wider scientific enterprise.
The attention is on papers published, citations, and patents, together with commentary on very high-impact work from scientists whose achievements were exceptional. There is scant attention to what these counts might mean. Is more really better, or would the public benefit be better served, at least in laboratory experimental work, by fewer and more carefully considered papers? Some points of consequence do emerge. All publicity benefits citation counts, even where papers are identified as seriously flawed. Those who narrowly miss out on US NIH funding and remain in the field publish papers that, in the long run, make greater "impact". Papers that are initially rejected by referees end up making greater impact. Little or nothing is known of what insights and ideas may have been lost because of early career failures. The US spends around 55% of its biomedical funding on genetic approaches, even though genome-based variation can explain only 15-30% of disease causation. Environmental effects and diet get even less. There is a suggestion that incentives and institutions are needed that encourage researchers to direct their work more towards societal benefit. After Covid-19, would they still say this?
For experimental work, the authors do note the file drawer problem. "Instead of being discarded, negative results should be shared, compiled, and analyzed." The only attention to fraud is a comment, incidental to discussion of the media's role, on the Wakefield MMR scandal.
There is a warning of the potential of algorithms (AI) to amplify and perpetuate human biases, and yes, that comment does apply to the tools and metrics that the science community (and funding agencies?) builds.