The value of distributed computing: The return of Markov chain Monte Carlo methods

A while back I wrote something on doing Monte Carlo simulations with Web Services and SharePoint. Halfway through I mentioned that Google PageRank is defined by a Markov chain, which ties it to the family of techniques known as Markov chain Monte Carlo methods. Not that it concerned me, but only one person picked up on this, and even then it was only a vague mention. Huh…

This actually is a big deal. A very big deal. A multibillion-dollar deal, in the case of Google PageRank. Distributed computing has the power to help us solve many things if applied correctly. The “cloud” does not. (A topic for later.) Probably the greatest hurdle in getting people back on track is that this technology has uses beyond the scope of most people’s daily lives. For example…

A paper was published in PLoS last week, September 4th 2009, called “Can an Eigenvector Measure Species’ Importance for Coextinctions?” In it the authors state that “PageRank” can be applied to the study of food webs. Food webs are the complex networks of who eats whom in an ecosystem. Typically we’re at the top, unless Hollywood or very bad planning is involved. Essentially, the scientists are saying that their particular version of PageRank could be a simple way of working out which extinctions would lead to ecosystem collapse. A relatively handy thing to have these days… As every species is embedded in a complex network of relationships with others, even a single extinction can rapidly cascade into the loss of seemingly unrelated species. Investigating when this might happen using more conventional methods is complicated because, even in simple ecosystems, the number of combinations exceeds the number of atoms in the universe… To get a feel for how fast such numbers grow: a lottery that draws 8 numbers, each anywhere between 1 and 50, has 50^8 = 39,062,500,000,000 possible outcomes if order matters and numbers can repeat…
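The lottery figure is easy to check. The snippet below is only a quick sanity check of the arithmetic, not anything from the paper: the 39,062,500,000,000 figure is 50 raised to the 8th power, which counts ordered draws where a number may repeat. For comparison it also counts the draws where order is ignored and no number repeats, which is what "combinations" means in the strict sense. The variable names are my own.

```python
import math

NUMBERS = 50  # each pick ranges from 1 to 50
PICKS = 8     # eight numbers are drawn

# Ordered draws, repeats allowed: 50 choices for each of the 8 picks.
ordered_with_repeats = NUMBERS ** PICKS

# Unordered draws, no repeats: "50 choose 8".
unordered_no_repeats = math.comb(NUMBERS, PICKS)

print(f"{ordered_with_repeats:,}")  # 39,062,500,000,000
print(f"{unordered_no_repeats:,}")
```

Either way the count balloons quickly, and an ecosystem has far more than 8 "picks" to consider.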

The researchers had to tweak PageRank to adapt it for their ecology-focused purposes.

“First of all we had to reverse the definition of the algorithm.” “In PageRank, a web page is important if important pages point to it. In our approach a species is important if it points to important species.”
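To make the “reversed” definition concrete, here is a minimal sketch in Python. It is not the researchers’ algorithm, which includes further adjustments; it is textbook PageRank by power iteration, run on a food web whose arrows have been flipped. The toy food web, the function names and the damping value of 0.85 are all assumptions of mine for illustration. Arrows point from food to eater, so after flipping them importance flows from each eater back to what it eats.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Plain PageRank by power iteration.

    links maps each node to the list of nodes it points to.
    """
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new = {node: (1.0 - damping) / n for node in nodes}
        for src in nodes:
            targets = links.get(src, [])
            if targets:
                # Split this node's rank evenly among the nodes it points to.
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A node with no outgoing arrows spreads its rank over everyone.
                share = damping * rank[src] / n
                for node in nodes:
                    new[node] += share
        rank = new
    return rank


def reverse(links):
    """Flip every arrow: a -> b becomes b -> a."""
    flipped = {}
    for src, targets in links.items():
        flipped.setdefault(src, [])
        for t in targets:
            flipped.setdefault(t, []).append(src)
    return flipped


# Hypothetical toy food web; arrows point from food to eater.
food_web = {
    "grass": ["rabbit", "mouse"],
    "rabbit": ["fox"],
    "mouse": ["fox", "owl"],
}

# Ranking the reversed web makes a species important if it feeds important species.
importance = pagerank(reverse(food_web))
for species in sorted(importance, key=importance.get, reverse=True):
    print(f"{species}: {importance[species]:.3f}")
```

In this toy web grass comes out on top, because everything else ultimately depends on it. Run ordinary PageRank on the unflipped web and the ranking favours the predators instead.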

They also tested against algorithms that were already in use in computational biology to find a solution to the same problem. PageRank, in its adjusted form, gave them exactly the same solution as these much more complicated algorithms.

With the right design, SharePoint can be an extremely useful, and totally appropriate, interface for accessing and disseminating the inputs and outputs of such an effort. It can store and present this data with all of the benefits one would expect from a collaborative platform. Certainly there’s a world of work involved in doing something like this, but the key point is that the “right tool for the right job” mantra works here. “All” you need is:

  • IIS
  • .NET
  • SharePoint
  • PowerShell
  • Visual Studio
  • SQL
  • Skill