Peer (to Peer) Pressure: It's a Good Thing


April 2002

"We work in a mature field now." Yeah, right. Working software pros know better. Can you imagine concepts like mindshare and the "hype trajectory" transplanted to, say, aeronautics? Daring engineers design … concrete aircraft! "They're maintenance-free! Built to last, with a mold-on upgrade path (MOUP)." Venture capitalists sign on. Catapulted prototypes (briefly) leave the ground. "Early Q3 2002, transatlantic airliners! Early adoption equals market advantage." A hype-storm brews, acronyms proliferate, pundits pontificate, startups start up. Fleets of concrete aircraft taxi ponderously about, almost leaving the ground. Passengers desert in droves to the freeways. A suspiciously similar set of pundits begins to cast aspersions. Promoters blame halfhearted adoption:

"Remember, we said you'd need 20-mile-long runways." Big-business-conspiracy theorists mutter about Boeing and Airbus crippling the concept. Concrete airplanes fade as the new baking-soda-and-vinegar jet engine captures the attention of the trade media …

Beyond Immoderate Praise
OK, so maybe I'm "piling" it on (sorry). The tricky part is that there's often a nugget of value buried under the mountain of hype. Certainly, peer-to-peer technologies have gotten their share of immoderate praise, and suffered the inevitable backlash. In his April 25, 2001 article for Slashdot, "Does Peer-To-Peer Suck?", Jon Katz quotes "respected Stanford Law Net guru Lawrence Lessig: 'Peer-to-peer,' [Lessig] exults, 'is the next great thing for the Internet.'" Then Katz observes sourly, "If we've learned anything in the past decade or so, it's to run for your life whenever you hear anybody say that. One thing you can take to the bank (if it's still letting you in the door): Peer-to-peer is not the next great thing, on the Net or off." But now that some of the transform-the-planet fervor has died down, useful tools are emerging—and good work has already been accomplished.

What was all the hype about in the first place? What is this P2P stuff, anyway? Ah, that's a slippery question in itself. Do computing clusters count? Napster? ICQ? SETI@Home? Heck, what about Windows file shares? Well, that way lies flame war, friends, so let's not go there. I think we can agree on a spectrum, from definitely not peer-to-peer (your browser retrieves this article from Software Development's Web server) to certainly is (a decentralized file-sharing system like Gnutella). Columnist and Accelerator Group partner Clay Shirky drew a useful line around the middle way back in November 2000 when he published "What is P2P … And What Isn't" on O'Reilly's Openp2p.com.

"P2P," he wrote, "is a class of applications that takes advantage of resources—storage, cycles, content, human presence—available at the edges of the Internet." And "If you're looking for a litmus test for P2P, this is it: First, does it treat variable connectivity and temporary network addresses as the norm, and second, does it give the nodes at the edges of the network significant autonomy?" Finally, and famously, "PCs are the dark matter of the Internet, and their underused resources are fueling P2P."

From MP3s to Mapping the Genome
Unless you've been living with a volleyball on some island for the past year, you're more than familiar with Napster, the absurdly popular music-file sharing service that ran afoul of multiple copyright lawsuits. I work at the University of Wisconsin, a major research institution. We've got your high-speed distributed computing, your distance-education streaming video, your professors swapping data on global climate—all bandwidth hogs. None of those even came close to top honors, however. Statistics showed that, at its height, the single greatest consumer of network bandwidth at UW was Napster traffic to the dorms, by a factor of two or better. Napster's popularity helped jump-start the peer-to-peer movement—to many, the two are synonymous. In fact, back in 2000, Lincoln Stein, a researcher involved in the Human Genome Project, got written up in Wired magazine's online journal for examining Napster as a mechanism for publishing gene-sequence information to researchers worldwide. But when I contacted him, Stein, who is an associate professor at the Cold Spring Harbor Laboratory in New York, told me that so far, peer-to-peer hadn't measured up.

"Napster wasn't scalable because it relies on a central directory. Also, it uses hard-coded attribute fields, such as artist, that apply only to song files. To distribute genome sequence information, I needed a flexible way of describing and searching for attributes." Stein also investigated using the decentralized Gnutella protocol: "Gnutella provides much better handling of attribute fields. However, its concept of a 'network horizon' means that the world is inevitably fragmented into many small subnets that aren't connected. Genome researchers need access to all the data, not to the subset that happens to be connected at the time." And Freenet, according to Stein, wasn't just skimpy with attributes; due to privacy concerns, it also made it impossible to discern a datum's provenance—a critical item for a researcher.

"I haven't given up on P2P," Stein claims, "but I'll need more robust protocols that are available as open-source implementations before I can use it for serious work."

Anyone Out There?
But there's a lot more to P2P than file sharing. Chances are, right now your PC and Internet connection are running at some fraction of their capacity—capacity you paid for whether you're using it or not. There are several big computing projects out there that would be happy to put that capacity to use. This is the key idea behind distributed computing—donating or renting currently unused computing resources to a large project. One of Gnutella's lead developers, Gene Kan, put it this way in an August 2, 2001 article for O'Reilly: "The price of performance is decreasing constantly while the performance itself is increasing ridiculously. That means I'm pretty happy to share my 'Pentium 8 50gHz' because I only need all that horsepower while Windows boots. After that, the CPU is hardly utilized because I can't hit 50 billion keys in a second. Between keys, my computer could be cracking RC5 or musing on colon cancer."

The obvious example of such a system is SETI@Home—download a client, hook up to a server, and whenever your machine is idle, it begins to analyze radio-telescope data for patterns indicating signals from sentient life. The SETI@Home Web site shows that 3.5 million users have registered, donating more than 869,000 years of compute time (working out to something like 30 teraflops—trillions of floating-point operations—per second). Making it simple for volunteers to participate is no trivial proposition when you think about it. Since many clients connect via ISPs, their IP numbers change all the time (that "dark matter" problem again). So the system designers worked out a protocol whereby the newly connected client uploads its current IP to a known server address; after that, communication proceeds between peers.
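The shape of that workaround is easy to sketch: the registry is the one fixed, well-known address in the system, and everything else announces its current whereabouts to it. The class, method and message names below are hypothetical, not SETI@Home's actual wire protocol:

```python
# Sketch of the dynamic-address workaround described above: clients with
# ever-changing IPs announce themselves to a well-known registry, then
# communicate peer to peer. All names here are invented for illustration.

class Registry:
    """The one fixed, well-known address in the system."""

    def __init__(self):
        self.peers = {}  # client_id -> current (ip, port)

    def announce(self, client_id, ip, port):
        # Called each time a client reconnects and gets a fresh IP.
        self.peers[client_id] = (ip, port)

    def locate(self, client_id):
        # Peers ask the registry where a client is *right now*.
        return self.peers.get(client_id)


registry = Registry()
registry.announce("client-42", "66.31.5.9", 7001)    # first dial-up session
registry.announce("client-42", "66.31.88.2", 7001)   # reconnect, new IP
print(registry.locate("client-42"))                  # -> ('66.31.88.2', 7001)
```

The design choice worth noticing is that only the registry needs a stable address; once two peers have looked each other up, the central server drops out of the conversation entirely.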

CRYSTAL Becomes CONDOR
Harnessing networked computers together into a big parallel processor is nothing new, of course. Look at where I work, for example: Way back in 1985, when I walked into the lab for my first systems-admin job at UW, I did a double-take: Why on earth were we operating a laundromat? It was, in fact, a big rack of 20-some DEC VAX 11/780s, all networked together into a parallel computer called CRYSTAL. The network connection was a highly customized token ring, the operating systems were built practically from scratch, and of course the application software was purpose-built to run on CRYSTAL. But it definitely hummed, and proved the point that shared-memory, close-coupled parallelism wasn't the only way to build a supercomputer.

CRYSTAL made a darned fine space-heater for the lab, too—but it was already obsolete. Three years later, noticing how many faculty members had their own high-powered workstations, UW researchers started building CONDOR. Where CRYSTAL needed a dedicated group of identical machines, CONDOR was software, intended to exploit idle cycles from a pool of workstations. To join the CONDOR pool on the local area network, researchers could simply run a daemon, allowing them to submit jobs to the grid and to add their computer to its resources.

Of course, now that people's personal workstations were involved, the CONDOR team had to develop techniques for keeping participants happy, or they'd drop right out of the flock. (Yes, CONDOR has its own little ornithological jargon: flocks, gliding in to computations, you name it.) So they built an entire language for users to specify things like the maximum permissible load on the machine, who was allowed to submit jobs to it and when, and so on.
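That specification language lives on in Condor's configuration macros. The expressions below are a hedged sketch in that style—macro and attribute names vary across versions, so check the manual before borrowing them—saying, roughly: run jobs only when the owner has been away a while and the load is low, and suspend them the moment the keyboard is touched again.

```
## Owner-appeasement policy, in Condor-config style (a sketch only).
## Start a job after 15 minutes of keyboard idleness on a nearly idle
## machine; suspend it as soon as the owner comes back; resume after
## five quiet minutes.
START    = (KeyboardIdle > 15 * 60) && (LoadAvg < 0.3)
SUSPEND  = (KeyboardIdle < 60)
CONTINUE = (KeyboardIdle > 5 * 60)
```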

Technical challenges arose, too: What happens to a half-completed computation when the computer's owner turns back to it and starts to type? The team had to add checkpointing logic, which let CONDOR save its computational state periodically so that, if interrupted, it could pick up from the most recent checkpoint. Their efforts were fruitful: CONDOR is still in use today, and is still being improved (in fact, the last time I looked, there were three job openings on the Web site!). In September 2001, a new release made CONDOR pools available as resources for something even bigger. CONDOR 6.3.0 included support for the emerging standard in grid computing: the Globus Toolkit.
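The checkpoint-and-resume pattern is simple to illustrate, though CONDOR's real machinery snapshots the whole process image, not just application state as this toy does. File and function names here are invented; the "computation" is just summing integers:

```python
# Toy checkpointing: periodically save progress to disk so an
# interrupted computation can resume from the last snapshot.
# (CONDOR checkpoints the entire process; this sketch saves only
# explicit application state. All names are hypothetical.)
import os
import pickle

CHECKPOINT = "sum.ckpt"  # hypothetical checkpoint file

def load_state():
    # Resume from the last checkpoint if one exists, else start fresh.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"next_i": 0, "total": 0}

def save_state(state):
    # Write to a temp file, then rename, so a crash mid-write
    # can't corrupt the checkpoint.
    with open(CHECKPOINT + ".tmp", "wb") as f:
        pickle.dump(state, f)
    os.replace(CHECKPOINT + ".tmp", CHECKPOINT)

def run(limit, interrupt_at=None):
    # Sum the integers below `limit`, checkpointing every 1,000 steps.
    # `interrupt_at` simulates the owner returning mid-computation.
    state = load_state()
    for i in range(state["next_i"], limit):
        if interrupt_at is not None and i == interrupt_at:
            return None              # owner came back; vacate the machine
        state["total"] += i
        state["next_i"] = i + 1
        if state["next_i"] % 1000 == 0:
            save_state(state)
    if os.path.exists(CHECKPOINT):
        os.remove(CHECKPOINT)        # finished; clean up
    return state["total"]

run(100_000, interrupt_at=37_500)    # interrupted mid-run
print(run(100_000))                  # resumes from checkpoint; 4999950000
```

Note the trade-off the checkpoint interval encodes: work done since the last snapshot is simply redone on resume, which is cheap, while checkpointing every step would thrash the disk.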

Technological Toolbox
The Toolkit is an open-source set of software tools, developed primarily at the Argonne National Laboratory, the University of Southern California's Information Sciences Institute and the University of Chicago's Distributed Systems Laboratory. (OK, here's another plug for UW: The CONDOR team is listed as a "major contributor.") The Globus Toolkit tackles problems like resource discovery and directory services, resource allocation, and single-sign-on authentication, to enable users to create Net-wide infrastructures of jaw-dropping computing power. And for anyone who still thinks that "open-source project" means "impractical, wild-eyed dream," the list of Globus supporters is almost a Who's Who of Big Fast Computers: Compaq, Cray, Entropia, Fujitsu, Hitachi, IBM, Microsoft, NEC, SGI, Sun Microsystems and Veridian have all publicly committed to adopt the Toolkit for their platforms; Platform Computing plans to build a commercial implementation of it. Other institutions building grids with the Toolkit include the U.S. National Partnership for Advanced Computational Infrastructure and the U.S. National Computational Science Alliance, the European Datagrid Project, and NASA's Information Power Grid.

The Nitty-Griddy
What's a "grid"? Glad you asked. In "The Anatomy of the Grid: Enabling Scalable Virtual Organizations" (International Journal of Supercomputer Applications, 15(3), 2001), Ian Foster, Carl Kesselman and Steven Tuecke point out that "The real and specific problem underlying the Grid concept is coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations … This sharing is, necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs … The following are examples of VOs: the application service providers, storage service providers, cycle providers and consultants engaged by a car manufacturer to perform scenario evaluation during planning for a new factory; members of an industrial consortium bidding on a new aircraft … and members of a large, international, multiyear high-energy physics collaboration."

Foster, Kesselman and Tuecke go on to note that "Current distributed computing technologies do not address [these] concerns and requirements." Technologies like CORBA and J2EE share resources in an organization; commercial solutions for distributed computing require "highly centralized access to those resources." So to qualify as a grid, a distributed-computing setup must provide decentralized access to powerful computing resources, allowing virtual organizations to come and go as needed. In addition, issues like security, access control and who pays for what must be built in from the start, not slathered on as an afterthought.

If your mind is beginning to boggle at the grandiose scale of these plans, I'm right with you, but the companies involved don't seem daunted. IBM, for instance, is jumping in with both big blue feet, implementing the Globus Toolkit in their eServer Linux systems. They're working on the Distributed Terascale Facility, a National Science Foundation project to build a grid with well over 10 teraflops peak capacity, by mid-2003. According to the NSF, the primary vendors are IBM (servers), Intel (processors) and Qwest (40-gigabit/second network); clusters of high-speed Itanium-processor machines are being set up at four sites. The National Center for Supercomputing Applications in Illinois will provide the biggest number-cruncher, with a new 6-teraflop cluster added onto existing resources for a total of 8 teraflops available, plus 240 terabytes of secondary storage. The San Diego Supercomputer Center will handle data and "knowledge management" with a 4-teraflop cluster and another 225 terabytes of storage. At Chicago's Argonne National Laboratory, a 1-teraflop cluster will be available for visualization and data rendering; Caltech will chime in with scientific data, to the tune of 0.4 teraflops and 86 terabytes of storage.

In other words: we're talking serious gaming platform here. SimGalaxy, anyone?

Sharing Standards
There's more to P2P than teraflops and terabytes, though. At the other end of the spectrum, peer-to-peer isn't about what you've got (that is, brute computing power); it's about who you're working with. The file-sharing frameworks pointed the way, but as more people discover the power of end-to-end application connectivity, they're putting P2P to use in all kinds of applications: sharing LANDSAT images, collaborative computer-aided design work, shared diaries, auctions and much more.

Until fairly recently, developing such software was a challenge, not least because the most common P2P architecture was the "silo": a monolithic piece of software that handled everything from getting through firewalls, to discovering peers on the Net, to the nitty-gritty of passing messages back and forth. It's crazy, but your SETI@Home client, your ICQ chat client, and your Gnutella file-sharing client are all doing pretty much the same thing, with independently developed protocols. Fortunately, however, standards are beginning to emerge. For one thing, the burgeoning interest in Web services is creating a whole culture of programmers who grok Simple Object Access Protocol (SOAP) and its friends. While not designed expressly for P2P, the Web services protocols certainly get the job done. So much has been written about SOAP and its companion technologies that it's hardly worth going into here. Suffice it to say that the widespread familiarity with SOAP helps solve the chicken-and-egg problem common to P2P adoption. And there's nothing technically wrong with using SOAP for P2P; in Shirky's opinion, "the Web services stack is a better attempt at encoding and serialization than anything the P2P folks could come up with on their own—SOAP looks like the P2P implementation language to me."
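As a reminder of how lightweight the encoding is, a SOAP 1.1 message is just an XML envelope, usually carried over HTTP. The envelope structure below is the standard one; the method name and namespace are invented for illustration:

```
<?xml version="1.0"?>
<soap:Envelope
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <!-- hypothetical P2P request: find peers holding a given file -->
    <findPeers xmlns="urn:example-p2p">
      <fileHash>a1b2c3</fileHash>
      <maxResults>10</maxResults>
    </findPeers>
  </soap:Body>
</soap:Envelope>
```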

The Next Big Thing?
But Shirky, and many others, are also keeping a close eye on Project JXTA (pronounced juxta, as in juxtapose: it's not an acronym). JXTA is an open-source initiative to develop a complete peer-to-peer infrastructure. Initially bootstrapped by Sun Microsystems and now fueled in large part by its developer community, Project JXTA is intended to promote collaboration among everything from servers to cell phones. Central to JXTA are three concepts: groups, pipes and monitoring. Groups pertain to how peers come online, receive unique IDs, get around firewalls, and, most importantly, discover others in their group. Pipes are the basic communication facility in JXTA, and come in various flavors: unidirectional, bidirectional, propagate-to-group, "reliable" (like TCP) and "unreliable" (like UDP). And with hooks for monitoring services built-in, IT staff can track traffic for peer nodes, and will someday be able to manage them—shutting a node off if it swamps the local net, for instance.

Juan Carlos Soto, whose extra-wide business card bears the title "Group Marketing Manager for Project JXTA and Community Manager for JXTA.org," thinks the project may be positioned to catch the Next Big Thing. "In the 1980s," says Soto, "a big turning point was the adoption of TCP/IP—not because it was a superior network protocol, but because it had been broadly adopted. In the 1990s, the innovation all blew up around HTML. We think that there's a similar phenomenon underway with P2P. The key idea is that the devices on the edges are not just consumers, they're pretty powerful, and able to be providers, too."

Mind you, that's devices, not computers: "People are talking to us about putting it in light switches," says Soto. He claims that's an important distinction from the PC-server-HTTP model of SOAP, UDDI and their ilk: "Most of the Web services protocols seemed like overkill for having your PDA interact with your cell phone," and he goes on to point out that for other kinds of devices, HTTP connectivity can't be assumed in the first place. But Soto says Project JXTA certainly hasn't written off communication with the Web services world, either. "It's still being looked at. There's a project on JXTA.org, Network Services, to find out where it makes sense to either have seamless links into existing Web services or adopt their protocols."

Not Just Java
JXTA is intended to be language-, platform- and transport-agnostic. The reference implementation is in Java; you might assume that since JXTA starts with a J, and Sun started JXTA, it's Yet Another Java Extension. Not so, says Soto: Sun's resources are going into both C and Java 2 Micro Edition versions of the protocols, and the community is busily coming up with others: Objective-C, Perl and Python have all been demonstrated. And while most applications built with JXTA currently use TCP/IP as the low-level transport layer, that's not a requirement.

Soto is quick to point out that Sun hopes to profit substantially from JXTA's success, though not from the protocols themselves. "JXTA is available for anybody to use under an Apache-style OS license. Sun is a player just like anybody else; our view was that a lot of our product line would benefit from having P2P resources available. We hope that JXTA becomes the protocols that trigger the next wave of innovation. One company couldn't do this alone, so the best way was through an open-source effort." He cites the project discussion lists as evidence of JXTA's vitality: "If somebody posts a question, more likely than not, it's answered by a community member, not somebody paid by Sun."

The worlds of grid computing and protocols like JXTA aren't mutually exclusive, of course; in fact, some companies have built systems that use the JXTA protocols for peer discovery and initial communication, then use heavy-duty APIs like the Globus Toolkit to get the computing done.

So real companies are indeed building real applications today with P2P. There's Groove Networks, Consilient and Ikimbo, building their business-to-business collaboration platforms; OpenCola, with collaborative search-and-discovery tools; and Entropia, makers of distributed-computing software. As of this writing, even the beleaguered Napster hasn't tossed in the towel, still hoping to launch a new service in 2002. When the hype-meisters have long since moved on to the next concrete-airplane fad, I'm confident that peer-to-peer technologies will still be delivering real value to their users, and real advantages to software developers.

Information, Please
While there's no map through the forest of peer-to-peer options, these resources might help you find your way to distributed computing success.

The Virtual Bookshelf

  • O'Reilly's OpenP2P (www.openp2p.com): This Web site contains a slew of articles on P2P technology. While most of them are positive, they're hardly starry-eyed.
  • "Writing Peer-to-Peer Apps With the Microsoft .NET Framework" (http://msdn.microsoft.com/msdnmag/issues/01/02/default.asp): A useful article found on the Microsoft Developer's Network.
  • Grid Computing Info Centre (www.gridcomputing.com): A portal with more grid-computing links than you can shake a cluster at.
  • "Bill Joy's New Passion: Industrial-Strength P2P" by Erick Schonfeld (Business 2.0, Jan. 2002, www.business2.com/articles/mag/0,1640,35849,FF.html): A quick sketch of the history and prospects of JXTA ("Jini, the Sequel"? Ouch!)
  • "IAAL*: Peer-to-Peer File Sharing and Copyright Law after Napster" by Fred von Lohmann (Electronic Frontier Foundation, www.eff.org/IP/P2P/Napster/20010309_p2p_exec_sum.html): An excellent analysis of the legal pitfalls awaiting the unwary P2P implementer. "IAAL" stands for "I Am A Lawyer," in contrast to the common online "IANAL," "I Am Not A Lawyer."

In-Process Projects

  • SETI@Home (setiathome.ssl.berkeley.edu/): The famous distributed-computing pioneer. Download a client, and whenever your machine is idle, it analyzes radio-telescope data for patterns indicating signals from sentient life; 3.5 million registered users have donated more than 869,000 years of compute time.
  • Folding@Home (folding.stanford.edu): Similar to SETI@home, this project exploits distributed computing to investigate protein folding.

Peer-to-Peer Toolkits and Products

  • The Globus Toolkit (www.globus.org): A distributed-computing toolkit for implementing grids.
  • Jabber (www.jabber.org): The self-described "coolest Instant Messaging system on the planet." XML-based, Jabber bridges operating systems, protocols and architectures. (See also this month's New and Noteworthy, in which the book Programming Jabber is described.)
  • Project JXTA (www.jxta.org): A P2P infrastructure project begun by Sun Microsystems, JXTA is now open source.
  • The Beowulf Project (www.beowulf.org): Although a Beowulf cluster isn't really peer-to-peer, it's probably the most common means of assembling a truly high-performance computer on a budget.
  • CONDOR (www.cs.wisc.edu/condor/): High Throughput Computing software toolkit. Optimized for applications like exploiting a network of machines with idle time to run the same simulation with many different sets of parameters, rather than a single parallelized computation at blinding speed.
  • GridSim (www.csse.monash.edu.au/~rajkumar/gridsim/): A toolkit, not for implementing a computational grid, but for simulating one—useful to get the dynamics and resource-allocation policies down.
  • Cluster Development Kit (www.pgroup.com): A commercial toolkit for exploiting Linux clusters with optimized parallelizing compilers and spiffy tools.
  • Sun Grid Engine (www.sun.com/software/gridware/): Grid management software targeted at intra-organization (campus) grids.

