Thursday, November 10, 2011

Book Review: Reinventing Discovery

I believe that the process of science—how discoveries are made—will change more in the next twenty years than it has in the past 300 years. --Michael Nielsen, Reinventing Discovery
I appreciate an author who's not afraid to make bold claims, and Michael Nielsen certainly fits that description. He even goes on to say that
To historians looking back a hundred years from now, there will be two eras of science: pre-network science, and networked science.  We are living in the time of transition to the second era of science.
I grew up feeling that the golden age of science was the first half of the twentieth century, which gave us marvelous advances like relativity and quantum mechanics.  According to Nielsen, though, I'm witnessing the most transformative period of scientific development since the invention of the scholarly journal in the 1600s.  Although I'm a firm believer in the power of the internet to accelerate scientific advances, I was skeptical.
I downloaded Michael Nielsen's Reinventing Discovery on Tuesday and read it in less than 48 hours (between shopping trips while on vacation in Dubai).  Although I was familiar with much of the material in the book, it was an engaging and highly thought-provoking read that I think both scientists and laypersons will enjoy.  I'll focus here on the ideas that struck me as especially insightful.
Nielsen gives several examples to illustrate the beginnings of his foretold revolution; some are scientific (the Polymath project, GalaxyZoo, FoldIt) while others simply illustrate the power of our new networked world (Kasparov versus the World, InnoCentive).  These examples are used extensively and lend a convincing empiricism to a book that claims to predict the future.  They also allow Nielsen to dive into actual science, adding to the fun.
Many scientific advances are the result of combinations of knowledge from different fields, communities, or traditions that are brought together by fortuitous encounters among different people.  In a well-networked world, these encounters can be made to happen by giving individuals enough accessible information and communication. Nielsen refers to this as "designed serendipity".
The reason designed serendipity is important is because in creative work, most of us...spend much of our time blocked by problems that would be routine, if only we could find the right expert to help us. As recently as 20 years ago, finding that right expert was likely to be difficult. But, as examples such as InnoCentive and Kasparov versus the World show, we can now design systems that make it routine.
Offline, it can take months to track down a new collaborator with expertise that complements your own in just the right way. But that changes when you can ask a question in an online forum and get a response ten minutes later from one of the world’s leading experts on the topic you asked about.
The trouble is, of course, that the forum in question doesn't exist -- and if it did, who would have time to read all the messages?  Nielsen delves into this question, discussing how to design an "architecture of attention" that allows individuals to focus on the bits most relevant to them, so that large groups of people can work on a single problem in a way that allows each of them to exercise his particular expertise.  Taking the idea of designed serendipity to its logical yet astounding conclusion, Nielsen presents a science fiction (pun intended) portrayal of a future network that connects all researchers across disciplines to the collaborations they are most aptly suited for.  I found this imaginary future world both fascinating and believable.
The second part of the book explores the powers that are being unleashed as torrents of data are made accessible and analyzable.  Here Nielsen draws examples from Medline, Google Flu Trends, and GalaxyZoo.  While the importance of "data science" is already widely recognized, Nielsen expresses it nicely:
Confronted by such a wealth of data, in many ways we are not so much knowledge-limited as we are question-limited...the questions you can answer are actually an emergent property of complex systems of knowledge: the number of questions you can answer grows much faster than your knowledge.
In my opinion, he gets a bit carried away, suggesting that huge, complex models generated by analyzing mountains of data "might...contain more truth than our conventional theories" and arguing that "in the history of science the distinction between models and explanations is blurred to the point of non-existence", using Planck's study of thermal radiation as an example.  Planck's "model" explained a tiny amount of data with terse mathematical equations.  The suggestion that such a model is similar to linguistic models based on fitting terabytes (or more!) of data, and that the latter hold some kind of "truth", surprised me -- I suspect rather that models informed by so much data are accurate because they never need to do more than interpolate between nearby known values.  Nevertheless, it was interesting to see Nielsen's different and audacious perspective so well defended.
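For concreteness, the terse equation Planck eventually arrived at is his radiation law, which fits the entire measured blackbody spectrum with just a couple of physical constants:

    \[ B_\nu(\nu, T) = \frac{2 h \nu^3}{c^2} \, \frac{1}{e^{h\nu/(k_B T)} - 1} \]

A formula this compact genuinely compresses the data it summarizes; a statistical model with millions of fitted parameters can succeed largely by interpolating between nearby observations, which is exactly the distinction I would draw between the two kinds of "model".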
A question of more practical importance is how to get all those terabytes of data out in the open, and Nielsen brings an interesting point of view to this discussion as well, comparing the current situation to that of the pre-journal scientific era, when figures like Galileo and Newton announced their discoveries as anagrams, ensuring that the discoverer could claim credit later while keeping competitors from reading the discovery in the meantime.  The solution then was imposed top-down: wealthy patrons demanded that the discoveries they funded be published openly, so that one had to publish in order to get and keep a job.
The logical conclusion is that governments and granting agencies should now adopt policies urging researchers to release their data and code publicly, and that employment decisions should give preference to researchers who do.  At present, the current of incentives runs against such "open science", but like Nielsen I am hopeful that the tide will soon turn.  I was left pondering what I could do to help; Nielsen provides numerous suggestions.  I'll conclude with two of the most relevant for computational scientists like myself.
...a lot of scientific knowledge is far better expressed as code than in the form of a scientific paper. But today, that knowledge often either remains hidden, or else is shoehorned into papers, because there’s no incentive to do otherwise. But if we got a citation-measurement-reward cycle going for code, then writing and sharing code would start to help rather than hurt scientists’ careers. This would have many positive consequences, but it would have one particularly crucial consequence: it would give scientists a strong motivation to create new tools for doing science.
...
Work in cahoots with your scientist programmer friends to establish shared norms for citation, and for sharing of code. And then work together to gradually ratchet up the pressure on other scientists to follow those norms. Don’t just promote your own work, but also insist more broadly on the value of code as a scientific contribution in its own right, every bit as valuable as more traditional forms.
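To make those "shared norms for citation" concrete: one lightweight convention a group could adopt today is to cite code releases in BibTeX just as it cites papers.  Here is a minimal sketch (the author, package name, and URL below are invented purely for illustration):

    @misc{doe2011hypersolver,
      author       = {Doe, Jane},
      title        = {hypersolver: a spectral {PDE} solver (source code)},
      year         = {2011},
      howpublished = {\url{https://example.org/hypersolver}},
      note         = {Version 1.2; invented example for illustration}
    }

An entry like this costs the code's author nothing to provide, and it gives collaborators an unambiguous way to credit the software in their reference lists -- a first step toward the citation-measurement-reward cycle Nielsen describes.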
