A new computing collaboration aims to benefit the burgeoning bioinformatics community across the nation.
Situated in the National Research Council of Canada’s (NRC) Institute for Marine Biosciences in Halifax, N.S., the Canadian Bioinformatics Resource (CBR) was officially designated by Sun Microsystems Inc. in late October of last year as a Sun Center of Excellence (COE) in Distributed Bioinformatics, the first COE in Atlantic Canada. Now as one of four COEs in Canada and joining 40 others worldwide, CBR will use Sun™ ONE Grid Engine software to expand the accessibility of its resources and accelerate research efforts by creating a computational grid for biologists.
Established in 1996, the CBR currently provides its service to NRC scientists and to academic and not-for-profit users associated with a university, hospital or government department. Although the resource can be used without registering, becoming a CBR member vastly increases access to the available pool of tools, applications and databases, and also provides basic training and user support.
“NRC is very proud of CBR and its achievements; it has accomplished a great deal in a very short time,” said Arthur Carty, PhD, president of the NRC, while addressing the official announcement gathering at Dalhousie University in Halifax.
The CBR was created, Carty explained, in response to the “explosion” within the biotech industry, which saw the number of biotechnology companies in Canada grow from 50 in 1984 to 227 in 1997 and to over 400 in 2001. Grid computing, he continued, offers a means to help deal with the data influx that has accompanied this industry growth.
“Many applications used in bioinformatics are well suited to a grid environment because they are computationally intensive or require access to very large databases of biological information,” Carty said.
This new Sun-CBR relationship builds on CBR’s initial acquisition of Sun™ hardware and software for its intranet, unveiled in 1997. That resource has served Canadian researchers by providing shared access to the latest DNA sequencing data and research findings, and it now includes 96 biological databases and over 800 executable applications from around the world.
It can be rather costly for researchers to independently access all the software and hardware they may require, said CBR’s manager, Simon Mercer, PhD. For this reason, a bioinformatics grid represents a flexible, larger nationwide resource that will promote collaboration, will reduce redundancy of resources and can be continuously extended. Goals for developing this grid, Mercer said, include providing a simple user interface through a Web-based browser, and hiding the complexity of the network from the user.
“The great thing about the grid is a lot of people are interested in it now and a lot of people want to collaborate. So we’re hoping to get quite a large number of machines involved in the grid from people who aren’t members (of CBR),” Mercer said later in an interview. He gave the example of HPCVL (the High Performance Computing Virtual Laboratory — formed in 1998 by a consortium of four universities in eastern Ontario), also a Sun COE (in Secure Grid and Portal Computing) and one that has an extensive high-performance computing environment.
The COE designation requires that an organization display excellence in its area of specialization, and it includes provision of latest-generation Sun™ products, Mercer said. In turn, COEs channel product feedback to Sun.
As part of the day’s proceedings, Mercer demonstrated three possible uses of the grid setup involving 50 machines from CBR’s 13 nationwide member sites, which are connected to the national research network, CA
This observed power of the grid — which can accommodate any operating system — is not surprising, Mercer said, as many jobs can be run in parallel.
Involving a network of capable machines greatly improves the efficiency of work, said Wolfgang Gentzsch, PhD, Sun’s director of Grid Computing.
The problem today, Gentzsch said, is that when there is only one system and it goes down, work is lost and everybody has to wait while the system is being recovered.
“In a grid, however, immediately among 100 nodes, when one node is going down, the grid master software immediately recognizes that one system is down and it doesn’t send any more work to that specific node but distributes it to other suitable locations,” he said. “Now that you distributed your work over 100 nodes, only one per cent is affected and in addition, the other 99 per cent continue working.”
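The failover behaviour Gentzsch describes can be illustrated with a short Python sketch. This is not Sun ONE Grid Engine’s scheduler: it assumes, purely for illustration, that jobs are spread round-robin and that the master already knows which node is down. The functions `assign` and `handle_failure` and the node names are hypothetical.

```python
def assign(jobs, nodes):
    """Spread jobs round-robin over the given nodes (hypothetical policy)."""
    if not nodes:
        raise RuntimeError("no healthy nodes left to take work")
    plan = {node: [] for node in nodes}
    for i, job in enumerate(jobs):
        plan[nodes[i % len(nodes)]].append(job)
    return plan


def handle_failure(plan, down):
    """Stop sending work to `down` and hand its queued jobs to the survivors.

    Returns the updated plan and the number of jobs that had to move.
    """
    orphaned = plan.pop(down)      # the failed node receives no further work
    survivors = list(plan)         # every node that is still up
    for node, jobs in assign(orphaned, survivors).items():
        plan[node].extend(jobs)
    return plan, len(orphaned)


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(100)]
    jobs = list(range(1000))
    plan = assign(jobs, nodes)
    plan, moved = handle_failure(plan, "node42")
    print(moved, moved / len(jobs))  # prints: 10 0.01
```

With 1,000 jobs on 100 nodes, the failed node held 10 jobs, so one per cent of the work has to move while the other 99 nodes keep running what they already had.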
The grid can also identify old systems that no longer match the requirements of current jobs and grid software, so that they are no longer used. In this way, Gentzsch said, the grid optimally matches the user’s requirements to the underlying systems: a match is made, and a system selected for a particular task, only where all conditions are met.
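The all-conditions-must-match rule can be sketched as a simple filter. The requirement fields used here (operating system, CPU count, memory) and the host names are hypothetical examples chosen for illustration; a real grid engine supports its own, richer set of resource requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    os: str
    cpus: int
    mem_gb: int


def eligible_hosts(hosts, required_os=None, min_cpus=1, min_mem_gb=0):
    """Return only the hosts that meet *every* requirement of a job.

    A host that satisfies some conditions but not all of them is rejected,
    so an old machine that falls short simply stops receiving work.
    """
    return [
        h for h in hosts
        if (required_os is None or h.os == required_os)
        and h.cpus >= min_cpus
        and h.mem_gb >= min_mem_gb
    ]


if __name__ == "__main__":
    hosts = [
        Host("old-sparc", "solaris", cpus=1, mem_gb=1),
        Host("new-sparc", "solaris", cpus=8, mem_gb=16),
        Host("linux-box", "linux", cpus=4, mem_gb=8),
    ]
    picks = eligible_hosts(hosts, required_os="solaris", min_cpus=2, min_mem_gb=4)
    print([h.name for h in picks])  # prints: ['new-sparc']
```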
Among enthusiastic proponents of the CBR grid is John Nash, PhD, research officer with the Pathogen Genomics Group, Institute for Biological Sciences, NRC, whose research group studies the genomics of bacterial pathogens, including that of Campylobacter jejuni, the major bacterial cause of food poisoning. The group has created a DNA array containing almost every gene of C. jejuni — comprising about 1,600 genes — and is examining under what environmental conditions these genes are turned on, as well as performing serotype work on the microbe.
For Nash and his group, the grid will offer greatly enhanced database searching power.
With one central processing unit (CPU) mining one database, a DNA search of a couple of thousand genes may take several minutes, if not hours. “When you want to do whole genome sets and you have 2,000 genes, or you’re looking at the human genome that’s 30 to 40 thousand genes, you can’t afford to wait a few minutes each,” Nash added. “You want a second to search; that’s an ideal amount of time.”
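Because each gene can be searched independently, the work divides cleanly across CPUs. The sketch below estimates elapsed time under that assumption; the figure of 120 seconds per gene is a hypothetical stand-in for Nash’s “a few minutes each”, and 50 CPUs simply mirrors the 50-machine demonstration.

```python
import math


def split_queries(genes, n_cpus):
    """Deal the query genes into n_cpus batches whose sizes differ by at most one."""
    return [genes[i::n_cpus] for i in range(n_cpus)]


def wall_clock_seconds(n_genes, seconds_per_gene, n_cpus):
    """Elapsed time when every CPU searches its own batch independently.

    The busiest CPU holds ceil(n_genes / n_cpus) genes, so it sets the pace.
    Start-up costs such as loading the database are ignored.
    """
    return math.ceil(n_genes / n_cpus) * seconds_per_gene


if __name__ == "__main__":
    genes = [f"gene{i}" for i in range(2_000)]
    batches = split_queries(genes, 50)
    print(len(batches), len(batches[0]))        # prints: 50 40
    print(wall_clock_seconds(2_000, 120, 1))    # prints: 240000 (about 67 hours)
    print(wall_clock_seconds(2_000, 120, 50))   # prints: 4800 (80 minutes)
```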
Researchers could also increase their computing speed, Nash pointed out, if they had the appropriate finances to buy top-notch systems. Grid-enabled computers offer this capability without costing millions. “It’s almost like a co-operative,” he said.
Contributing to this efficiency, Nash said, is the setup’s ability to use idle processor cycles on others’ computers. “That’s a really effective way of using a CPU that is not doing anything; it doesn’t hurt to be used and it’s not affecting anything,” he said.
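Using idle cycles means a contributed machine should take grid work only when its owner is not using it. A minimal sketch, assuming “idle” is judged by the Unix load average; the 0.25-per-CPU threshold and the function names are hypothetical choices, not CBR’s or Grid Engine’s settings.

```python
import os


def is_idle(load_average, n_cpus, threshold=0.25):
    """Call a machine idle when its load per CPU is below `threshold`."""
    return load_average / n_cpus < threshold


def local_host_is_idle(threshold=0.25):
    """Check this machine's 1-minute load average (os.getloadavg is Unix-only)."""
    one_minute, _, _ = os.getloadavg()
    return is_idle(one_minute, os.cpu_count() or 1, threshold)


if __name__ == "__main__":
    if hasattr(os, "getloadavg"):
        print("accept grid work" if local_host_is_idle() else "owner is busy, skip")
    else:
        print("load average not available on this platform")
```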
While grid computing presents many potential benefits to prospective users, companies are still reluctant to purchase new IT infrastructure, Gentzsch said. Companies invested a good chunk of their funds in IT from 1998 through 2000, even though equipment becomes obsolete very quickly, and that spending was followed by a blow to the economy, he said.
Gentzsch advocated making such investments now. “When you buy new IT equipment, with the new software, with the new expertise that we have, you at least double your productivity,” he said.
In a financial example Gentzsch gave of using the grid, one engineer can submit five jobs and see five results, rather than waiting for the jobs to run one by one as in the non-grid case. An engineering project in Silicon Valley that uses one $5,000 machine may cost about $205,000 in total. Using five machines on the grid, however, does not multiply that cost five-fold, because the hardware is only a small part of it; the total would be about $225,000. “So, you invest 10 per cent more and you get a factor of five efficiency,” he said.
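The arithmetic behind Gentzsch’s example can be checked directly. The $200,000 of non-hardware cost is not stated in the article; it is inferred here by subtracting the $5,000 machine from the $205,000 total.

```python
def project_cost(n_machines, machine_price=5_000, other_costs=200_000):
    """Total project cost: non-hardware costs plus n_machines at machine_price each."""
    return other_costs + n_machines * machine_price


if __name__ == "__main__":
    single = project_cost(1)        # 205,000 with one machine
    grid = project_cost(5)          # 225,000 with five machines
    extra = grid / single - 1       # fractional increase in total cost
    print(single, grid, f"{extra:.1%}")  # prints: 205000 225000 9.8%
```

The roughly 10 per cent rise in cost buys five machines working at once, which is where the factor of five comes from, assuming the five jobs are independent and of similar length.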
Ultimately, return on investment is a critical reason for opting to get on the grid, said Stefan Unger, PhD, Sun’s business development manager for Computational Biology.
Besides the monetary investment in projects, there is a time component, Unger said. It includes execution time, or the run time of the project, which is what most people focus on, but also development time (the time needed to write the code and get it working) and maintenance time for the machines.
“There are some people who want to focus on computer science, and for them they love the fact that the computer crashes because they get to learn something,” Unger said. “If you have a graduate student who’s supposed to be getting a PhD in biochemistry or bioinformatics, and instead they’re spending half their time fixing their broken machines, that’s a wasted resource.”
Effective time management is particularly important when coping with large quantities of information. Data such as those in GenBank, an annotated collection of all publicly available DNA sequences, are multiplying every six months or so, and the real increase is even greater, Unger said, because hundreds of databases worldwide are derived from that vault of information.
“So, GenBank goes up one megabyte, the world’s amount of information data goes up by a hundred fold,” he said. “Having people spend time administering computers is not going to solve that problem. It’s going to be solved by having them write more intelligent algorithms.”
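Data that multiply on a fixed schedule grow geometrically, which is why the problem outpaces manual administration. The article does not give the growth factor, so the sketch below assumes doubling every six months purely for illustration; the function name is hypothetical.

```python
def projected_size(initial, growth_factor, period_months, months):
    """Size after `months` if the data multiply by `growth_factor` every `period_months`."""
    return initial * growth_factor ** (months / period_months)


if __name__ == "__main__":
    # Hypothetical: start from 1 unit and assume doubling every six months.
    for months in (6, 12, 24):
        print(months, projected_size(1.0, 2, 6, months))
    # prints 2.0, 4.0 and 16.0 on successive lines
```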
Grid usage policies exist, Mercer said, such as allowing participants to make only a certain percentage of their computing power available. Even so, having machines rather than humans set up collaborations will require something of a paradigm shift on the part of resource managers.
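A percentage-share policy of the kind Mercer mentions can be expressed as a cap on how many CPUs a participant lends. This is a minimal sketch under the assumption that the share is counted in whole CPUs and rounded down; the function names are hypothetical.

```python
def grid_slots(total_cpus, share_percent):
    """CPUs a participant lends to the grid under a percentage-share policy."""
    if not 0 <= share_percent <= 100:
        raise ValueError("share_percent must be between 0 and 100")
    return total_cpus * share_percent // 100   # round down to whole CPUs


def can_accept(running_grid_jobs, total_cpus, share_percent):
    """True while the participant is still below its agreed share."""
    return running_grid_jobs < grid_slots(total_cpus, share_percent)


if __name__ == "__main__":
    print(grid_slots(16, 25))          # prints: 4
    print(can_accept(4, 16, 25))       # prints: False
```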
“The idea that you’ll always have jobs lined up in sequence won’t necessarily be the picture anymore,” Mercer said. “You’ve just got to trust the software.”
Mercer said that by early this year the CBR will have dismantled the grid and implemented a restructured, second-generation version based on lessons learned from the initial setup. Regarding the experience thus far, “It’s been more pleasing than frustrating, but of course as with anything else, there are teething troubles,” Mercer admitted.
As for what may evolve beyond the grid, Gentzsch said it is perhaps too soon to predict. However, he said, progress will certainly involve complete virtualization of resources and options for wireless grid access.
“We will be very busy over the next 10 to 15 years to improve the grid, to make the grid really a successor of the Internet,” Gentzsch said. “It’s basically an enrichment of the current Internet infrastructure.”