Wednesday, November 30, 2011

Support the Stack Exchange site for computational science

There is a new site on Stack Exchange for discussing computational science!  Please come participate -- there are already some great discussions.  The site is now in public beta, so you don't need an invitation.

In case you don't know about Stack Exchange, you may also be interested in the following:

Tuesday, November 15, 2011

Do you know what your colleagues are reading?

Up until Google's recent (catastrophic) changes to Reader, I used it to share and discuss interesting journal articles.  It was a near-perfect platform for this, and I'm hopeful that we'll have a replacement soon.

The great utility of it was that my colleagues are very good at discerning which articles may be of interest to others in our circle.  This is no surprise, since we have similar research interests.  The fraction of articles that are actually interesting to me, for most journal RSS feeds that I check, is 1%-5%, which means I spend a lot of time scanning article titles.  In contrast, the fraction of papers shared by my colleagues that I find interesting is probably closer to 50%!

I've found a nice way to display a public RSS feed of papers that I read*, via Mendeley (it's shown here on the right).  Now, ideally, Mendeley would allow me to publish a feed that includes all papers in my Mendeley library as I add them.  They don't, but they do something almost as good: they provide a public RSS feed showing all papers for any public Mendeley group, as they are added.  So here's what I did:

1. I created a public Mendeley group for my own library.

2. Whenever I import a new reference to Mendeley, I also add it to the group (note that you can do this via the dropdown menu in the popup that appears whenever you use the 'Import to Mendeley' bookmarklet).

3. I got the address for the feed from Mendeley (log in, click the 'Groups' tab, click 'Papers' on the left, and look for the RSS feed icon at the top right) and added a widget here on my blog, as well as on my professional webpage.

That's it.  If you want to subscribe to this RSS feed, here it is:



[*] Note that 'read' here means 'read at least the abstract.'
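If you'd rather consume the feed with a script than a widget, here's a minimal sketch in Python using only the standard library.  The feed URL below is a placeholder, not a real address -- substitute the one Mendeley shows you in step 3:

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

def paper_titles(rss_text):
    """Return the <title> of each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title") for item in root.iter("item")]

# Placeholder URL -- use the address from your group's 'Papers' page:
# feed_url = "http://www.mendeley.com/groups/YOUR_GROUP/feed/rss/"
# print("\n".join(paper_titles(urlopen(feed_url).read())))
```

The same few lines work for any RSS 2.0 feed, so you could just as easily point it at a journal's feed and filter titles by keyword.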

Thursday, November 10, 2011

Book Review: Reinventing Discovery

I believe that the process of science—how discoveries are made—will change more in the next twenty years than it has in the past 300 years. --Michael Nielsen, Reinventing Discovery
I appreciate an author who's not afraid to make bold claims, and Michael Nielsen certainly fits that description.  He even goes on to say that
To historians looking back a hundred years from now, there will be two eras of science: pre-network science, and networked science.  We are living in the time of transition to the second era of science.
I grew up feeling that the golden age of science was the first half of the twentieth century, which gave us marvelous advances like relativity and quantum mechanics.  According to Nielsen, though, I'm witnessing the most transformative period of scientific development since the invention of the scholarly journal in the 1700's.  Although I'm a firm believer in the power of the internet to accelerate scientific advances, I was skeptical.
I downloaded Michael Nielsen's Reinventing Discovery on Tuesday and read it in less than 48 hours (between shopping trips while on vacation in Dubai).  Although I was familiar with much of the material in the book, it was an engaging and highly thought-provoking read that I think both scientists and laypersons will enjoy.  I'll focus here on the ideas that struck me as especially insightful.
Nielsen gives several examples to illustrate the beginnings of his foretold revolution; some are scientific (the Polymath project, GalaxyZoo, FoldIt) while others simply illustrate the power of our new networked world (Kasparov versus the World, InnoCentive).  These examples are used extensively and lend a convincing empiricism to a book that claims to predict the future.  They also allow Nielsen to dive into actual science, adding to the fun.
Many scientific advances are the result of combinations of knowledge from different fields, communities, or traditions that are brought together by fortuitous encounters among different people.  In a well-networked world, these encounters can be made to happen by giving individuals enough accessible information and communication. Nielsen refers to this as "designed serendipity".
The reason designed serendipity is important is because in creative work, most of us...spend much of our time blocked by problems that would be routine, if only we could find the right expert to help us. As recently as 20 years ago, finding that right expert was likely to be difficult. But, as examples such as InnoCentive and Kasparov versus the World show, we can now design systems that make it routine.
Offline, it can take months to track down a new collaborator with expertise that complements your own in just the right way. But that changes when you can ask a question in an online forum and get a response ten minutes later from one of the world’s leading experts on the topic you asked about.
The trouble is, of course, that the forum in question doesn't exist -- and if it did, who would have time to read all the messages?  Nielsen delves into this question, discussing how to design an "architecture of attention" that allows individuals to focus on the bits most relevant to them, so that large groups of people can work on a single problem in a way that allows each of them to exercise his particular expertise.  Taking the idea of designed serendipity to its logical yet astounding conclusion, Nielsen presents a science fiction (pun intended) portrayal of a future network that connects all researchers across disciplines to the collaborations they are most aptly suited for.  I found this imaginary future world both fascinating and believable.
The second part of the book explores the powers that are being unleashed as torrents of data are made accessible and analyzable.  Here Nielsen draws examples from Medline, Google Flu Trends, and GalaxyZoo.  While the importance of "data science" is already widely recognized, Nielsen expresses it nicely:
Confronted by such a wealth of data, in many ways we are not so much knowledge-limited as we are question-limited...the questions you can answer are actually an emergent property of complex systems of knowledge: the number of questions you can answer grows much faster than your knowledge.
In my opinion, he gets a bit carried away, suggesting that huge, complex models generated by analyzing mountains of data "might...contain more truth than our conventional theories" and arguing that "in the history of science the distinction between models and explanations is blurred to the point of non-existence", using Planck's study of thermal radiation as an example.  Planck's "model" was trying to explain a tiny amount of data and came up with terse mathematical equations to do so.  The suggestion that such a model is similar to linguistic models based on fitting terabytes (or more!) of data, and that the latter hold some kind of "truth" surprised me -- I suspect rather that models informed by so much data are accurate because they never need to do more than interpolate between nearby known values.  Nevertheless, it was interesting to see Nielsen's different and audacious perspective well-defended.
A question of more practical importance is how to get all those terabytes of data out in the open, and Nielsen brings an interesting point of view to this discussion as well, comparing the current situation to that of the pre-journal scientific era, when figures like Galileo and Newton communicated their discoveries by anagrams, in order to ensure the discoverer could claim credit later but also that his competitors couldn't read the discovery until then.  The solution then was imposed top-down: wealthy patrons demanded that the discoveries they funded be published openly, which meant that one had to publish in order to get and maintain a job.
The logical conclusion is that policies (from governments and granting agencies) should now be used to urge researchers to release their data and code publicly.  Employment decisions should give preference to researchers who follow this approach.  At present, the current of incentives rather discourages such "open science", but like Nielsen I am hopeful that the tide will soon turn.  I was left pondering what I could do to help; Nielsen provides numerous suggestions.  I'll conclude with some of the most relevant for computational scientists like myself.
...a lot of scientific knowledge is far better expressed as code than in the form of a scientific paper. But today, that knowledge often either remains hidden, or else is shoehorned into papers, because there’s no incentive to do otherwise. But if we got a citation-measurement-reward cycle going for code, then writing and sharing code would start to help rather than hurt scientists’ careers. This would have many positive consequences, but it would have one particularly crucial consequence: it would give scientists a strong motivation to create new tools for doing science.
Work in cahoots with your scientist programmer friends to establish shared norms for citation, and for sharing of code. And then work together to gradually ratchet up the pressure on other scientists to follow those norms. Don’t just promote your own work, but also insist more broadly on the value of code as a scientific contribution in its own right, every bit as valuable as more traditional forms.

Thursday, November 3, 2011

Collaborative scientific reading

I often feel that the deluge of mathematical publications, fueled by the ever-increasing number of researchers and mounting pressure to publish, threatens to overwhelm my ability to keep up with advances.  I don't think this is peculiar to applied mathematics.  No matter how adept you are at sifting the chaff and finding the most relevant work in your field, you can't possibly have time to read every paper that is germane to your research, let alone those of tangential interest that might open new research avenues.  For my part, although I take time to read new papers every week, I've resigned myself to the fact that I won't see more than the abstract of most of the papers I'd like to read, because I need to conduct new research, teach, write, and so forth.

Reading and digesting a mathematical paper takes time and concentration.  Nevertheless, I find that perhaps 80% of the value I get out of reading most papers can be summed up in a paragraph or two that is easy to read and understand.  We all have practice producing those terse paragraphs because we regularly referee papers and provide a concise summary for the editor.  This summary includes things like "what's really new in this work" or "how this relates to previous work", as well as an evaluation of its merit.  Unfortunately, those referee reports are kept secret and unavailable to our colleagues.  I mentally create a similar report for most papers that I read in depth, although I don't usually write my evaluation down and I certainly don't send it to anyone.  What if every reader of a paper had access to the summaries and evaluations made by all the other readers?  I think we could all learn a lot more, a lot faster, about what our colleagues are accomplishing.

Recently, Fields medalist Timothy Gowers proposed an approach to accomplishing just that. The idea is to bring the functionality of StackOverflow to the arXiv, creating a place where everyone can publish and everyone can openly referee or comment.  The StackOverflow system of reputation and up-/down-voting would be used to help the best papers and best comments float to the top.  As Gowers admits, there are plenty of obstacles, but I'm hopeful that people with his level of clout in the mathematical community could really bring this to pass.  His interest seems mostly based on issues with the current journal publication system, but I see it primarily as a way to "collaboratively read" the literature.  Indeed, it might be best if the site had no implications for decisions on hiring or tenure, to avoid any motivation to game the system.  The site would also be a great place for expository writing that can't be published in a journal.

It's encouraging to see that some things are already moving in this direction.  A new website named PaperCritic has just been launched to accomplish something roughly along these lines.  It doesn't involve the StackOverflow system, but has Mendeley integration and allows you to post a public review of any paper.  Meanwhile, an increasing number of scientists are including paper reviews in their blog posts -- something I would like to do here.

I think Mendeley could accomplish something useful in this direction if they would give users the option to make their library and notes public.  Then when I find a paper on Mendeley that says "20 Readers", I could find out who they are, see what they've written about that paper, and see what else they're reading.

Note: I know that we already have Mathematical Reviews, but in my opinion it doesn't accomplish the goals mentioned above, mainly because the reviewer of a paper is often not sufficiently knowledgeable about the paper to say anything more insightful than what's in the abstract.  I find that Mathematical Reviews gives me papers to review that I would never have read otherwise.  What I'd like to see are reviews from the people who read the paper because it's germane to their own work.

I discovered while writing this post that there was until very recently a successful site of this kind used by quantum computing researchers called  Perhaps we should focus on helping this guy get the site back up and start using it for math too.

Edit: Another brand-new open review system:


Wednesday, November 2, 2011

A better way to do multiple Gmail signatures: canned responses

I have both my personal and professional e-mail forwarded to a single Gmail account for convenience. One complication this causes is the need to use different signatures for correspondence from a single account. In the past, I've used the Blank Canvas Gmail Signatures extension in Firefox, but that has two drawbacks:

1. It has to be installed and the signatures configured separately on each computer I use.

2. It only works in Firefox.

Credit goes to an entry at for pointing out a better way.  Just use the Gmail Labs feature "canned responses".  Save each of your signatures as a canned response, and then you can add it automatically when composing messages.  This works in every browser and only needs to be set up once.  Contrary to what it says on, you can include HTML in your signatures when using this method.

Something to watch out for: canned responses are actually saved as messages in your drafts folder.  They are hidden in the usual Gmail web view, but are visible in basic HTML mode or if you access mail through your phone.  Don't delete them.