MCNC's North Carolina Statewide Grid Computing Initiative
Deploying Emerging Technologies to Spark Innovation and Drive Technology-Based Economic Development
By Phil Emer, Chuck Kesler and Lavanya Ramakrishnan, MCNC
MCNC Grid Computing & Networking Services, a member of
the MCNC non-profit independent family of companies located in
North Carolina's Research Triangle Park, and the University of North Carolina's
16-campus system are grid-enabling the existing statewide research
and education network that interconnects universities throughout North Carolina.
The grid is expected to be the first statewide research
and education grid in the country. The initiative is viewed as the
most ambitious upgrade to the state's computing infrastructure in history
and a catalyst for economic development. It will support multiple
scientific disciplines in addition to other grid information technology
applications, such as administrative and library services.
The initiative emerged from MCNC's development of one of
the country's first scientific computing grid networks in 2001, the North
Carolina Bioinformatics Grid Test Bed, and has now been expanded to include:
- The MCNC Enterprise Grid
- The statewide grid network
- The development of a Grid Technology Evaluation Center
Each of these components is described in the following
sections of this article.
Grid computing will benefit urban and rural areas of the state,
spanning business, academia and government. It is especially important to
smaller institutions that only need computing resources periodically and often
cannot afford to invest in new technologies.
Most of the state's high-performance computing resources
are at the large research universities surrounding Research Triangle Park:
the University of North Carolina at Chapel Hill, Duke University and
N.C. State University. Smaller universities, many in the more rural areas of
the state, have historically lacked access to advanced computing
resources. Enabling researchers and educators throughout the state to take
advantage of computing resources that already exist at the large universities,
and enabling all researchers in the state to pool resources and
intellectual expertise, will vastly increase the resources available to an
individual researcher anywhere in the state. The statewide grid will be a catalyst for greater
levels of innovation, the creation of more intellectual property and
ultimately more businesses started with local entrepreneurial leadership
throughout North Carolina.
The Foundation of the Statewide Grid
The North Carolina Research and Education Network
(NCREN), established in 1985 through a collaboration of MCNC and
the University of North Carolina system, is the backbone infrastructure for
the statewide grid.
Operated by MCNC, it has evolved along with the Internet from a
research project to a critical infrastructure for the research and education
community. NCREN is a production-level, IP (Internet Protocol) network
providing advanced communications and Internet services to more than 180
locations, including universities and other government and non-profit
institutions throughout North Carolina. It serves about half a million students,
faculty and staff from the University of North Carolina's 16-campus system,
Duke University, Wake Forest University, and others. The network provides
high-speed Internet service, access to Internet2 and the national research
and education Abilene network, and interactive, near-broadcast-quality
video conferencing for distance-learning classes.

The N.C. Bioinformatics Grid
Scientific communities, with exponentially increasing storage
and resource needs, are driving the development of grid
computing frameworks to support the next generation of innovation. Biology
and life sciences researchers, historically not heavy users of
high-performance computing, are now at the leading
edge of this evolution as the sequencing of entire genomes has unlocked a
new horizon of opportunity. The availability of massive compilations of
genomic and related data is merging biology with information science. Storage
and management of these data sets will require systems capable of
managing petabytes of data, and analysis and modeling will require
high-performance computing capabilities.
With enhanced capabilities to address computational and
data requirements, grid computing is an ideal solution to the needs of life sciences
researchers. Also, the transparent and seamless access
to compute and data storage resources, as if they are located on a
single computer, makes grid solutions an even more compelling fit.
MCNC and the North Carolina Biotechnology Center's Genomics
and Bioinformatics Consortium, in collaboration with IBM, launched
the N.C. BioGrid in 2001 as one of the nation's first grid test beds
for computing, data storage and networking resources for life
sciences research. The N.C. Bioinformatics Consortium includes more than
80 organizations representing academia, business and industry.
Members include the University of North Carolina's 16-campus system,
Duke University, Wake Forest University, GlaxoSmithKline Inc., IBM,
the Research Triangle Institute, SAS Institute, Biogen, the National
Institute of Environmental Health Sciences, and the U.S. Environmental
Protection Agency.
"As the grid virtual team has learned more about grid computing
and delved further into the NC BioGrid and NCREN, we have come to
view the BioGrid as one of NCREN's most innovative applications and
the one area that can demonstrate networking's greatest potential."
Wayne Clark
Networking Services Architect
Work is being conducted to test software for a better understanding
of the issues associated with storage, analysis and movement of
large bioinformatics data sets in a high-speed networked environment. The
objective is to enable participants to share data and computing resources,
thus eliminating the need for costly duplication of data sets and
computing resources at each institution.
Currently, the test bed involves resources from four organizations:
the University of North Carolina at Chapel Hill, North Carolina State
University, Duke University and MCNC.
In planning the BioGrid test bed, a number of key objectives
were identified:
- The grid must span multiple administrative domains and allow for the independent management of resources.
- A diverse set of hardware and operating system platforms should be supported in the grid.
- Systems that participate in the grid must be configured to meet a superset of the security policies of each organization.
- Data files should be organized in a global namespace so that they can be accessed at the file system level with consistent pathnames throughout the grid.
- The grid must minimize network traffic through data caching and replication.
- For sensitive data being transported across the network, the grid must provide facilities for data encryption and integrity checks.
- The grid requires a "meta-scheduler" that can intelligently and transparently select compute resources capable of running a user's job.
- The grid must create a uniform namespace so that resources can be addressed consistently across the grid.
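The meta-scheduler objective above can be illustrated with a minimal Python sketch. The resource names, fields, and selection policy are hypothetical and are not taken from the BioGrid's middleware; the sketch only shows the idea of transparently choosing a resource that is capable of running a user's job.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str          # hypothetical site/cluster label
    platform: str      # operating system / architecture tag
    free_cpus: int     # CPUs currently idle
    free_mem_gb: int   # memory currently available


@dataclass
class Job:
    cpus: int
    mem_gb: int
    platforms: List[str]  # platforms the job's binary can run on


def select_resource(job: Job, resources: List[Resource]) -> Optional[Resource]:
    """Return the capable resource with the most idle CPUs, or None."""
    capable = [
        r for r in resources
        if r.platform in job.platforms
        and r.free_cpus >= job.cpus
        and r.free_mem_gb >= job.mem_gb
    ]
    # Prefer the least-loaded capable resource; the name makes ties deterministic.
    return max(capable, key=lambda r: (r.free_cpus, r.name), default=None)


# Hypothetical resources spread over several administrative domains.
resources = [
    Resource("site-a-cluster", "linux-x86", free_cpus=16, free_mem_gb=32),
    Resource("site-b-smp", "linux-ia64", free_cpus=8, free_mem_gb=64),
    Resource("site-c-cluster", "linux-x86", free_cpus=40, free_mem_gb=80),
]
job = Job(cpus=12, mem_gb=24, platforms=["linux-x86"])
chosen = select_resource(job, resources)
```

A production meta-scheduler would also weigh queue lengths, data locality, and each site's policies; this sketch filters on capability only.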
BioGrid Middleware and Supporting Technologies
Perhaps the biggest challenge for the N.C. BioGrid was to identify
the appropriate grid middleware. This led to an evaluation of numerous
grid platforms and testing a mix of solutions. A hybrid solution of
multiple grid middleware platforms was developed, working with
technologies that have a relationship with or roadmap to Open Grid
Services Architecture (OGSA) standards. Currently, the following grid
platforms are deployed:
Globus Toolkit 2.4 - for core grid functionality such as
job scheduling across administrative domains, a resource registry, and
a framework for developing grid-aware applications.
Avaki Data Grid 3.0 - to provide a globally available file system
using a global namespace.
In addition, we are working with the following supporting technologies:
Platform LSF - for job scheduling on clusters and large SMP servers.
Sun Grid Engine - for job scheduling on clusters.
Sun ONE Directory Server - LDAP infrastructure for managing user accounts.
CHEF - a framework for building grid-aware collaborative portals.
MyProxy - an online repository that enables remote management
of grid credentials.
BioGrid Applications
The first prototype application selected for the N.C. BioGrid
was NCBI BLAST. This tool is widely used in the bioinformatics
research community to search for similarities between candidate protein
or nucleotide sequences and target genomes.
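One common way to spread a sequence search across grid nodes, not necessarily the approach used on the BioGrid, is to split the query set into chunks that can be searched as independent jobs. The Python sketch below splits FASTA-formatted text round-robin; the function name and sample sequences are hypothetical.

```python
from typing import List


def split_fasta(fasta_text: str, n_chunks: int) -> List[str]:
    """Split FASTA records round-robin into at most n_chunks FASTA strings."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    records: List[str] = []
    for line in fasta_text.strip().splitlines():
        if line.startswith(">"):
            records.append(line)            # a header line starts a new record
        elif records:
            records[-1] += "\n" + line      # sequence lines join the current record
    chunks: List[List[str]] = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    # Drop empty chunks when there are fewer records than chunks.
    return ["\n".join(c) + "\n" for c in chunks if c]


# Three made-up query sequences split into two independent work units.
queries = ">seq1\nACGT\n>seq2\nGGCC\nTTAA\n>seq3\nATAT\n"
parts = split_fasta(queries, 2)
```

Each chunk could then be submitted as a separate job against the same target database, with the per-chunk results merged afterwards.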
IBM worked with MCNC and its university partners to develop
more sophisticated grid applications. In an Extreme Blue project conducted
during 2003, IBM teamed a group of four student interns with mentors to build
a grid-enabled interface to the BioPerl libraries to address the
computational needs of the Fungal Genomics Lab at N.C. State University.
Results that took one to two weeks using a
single system are now produced in near real time on the BioGrid. The Fungal Genomics Lab has also integrated
one of its clusters with the BioGrid test bed.
In a second example, IBM and MCNC worked with researchers at
the University of North Carolina at Chapel Hill to build a grid-enabled
drug discovery application that screens candidate chemical compounds
for biological activity. This is accomplished by performing a
parameter space study to produce a training set to develop a model. The model is
then applied to other data. Work that previously took a month is
now accomplished in a single day.
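A parameter space study of this kind lends itself to grid execution because every parameter combination can run as an independent job. The Python sketch below enumerates such combinations; the parameter names and values are hypothetical and do not describe the actual drug discovery application.

```python
from itertools import product
from typing import Dict, Iterator, List


def parameter_grid(space: Dict[str, List]) -> Iterator[Dict]:
    """Yield one dict per combination of parameter values."""
    names = list(space)
    for values in product(*(space[n] for n in names)):
        yield dict(zip(names, values))


# Hypothetical modeling parameters; each combination would be one grid job.
space = {
    "descriptor_set": ["A", "B"],
    "neighbors": [3, 5, 7],
    "seed": [0, 1],
}
jobs = list(parameter_grid(space))
```

Here the 2 x 3 x 2 space expands into 12 independent jobs, which is why adding compute resources shortens the study almost linearly until the job count is exhausted.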
The MCNC Enterprise Grid
To address multiple disciplines beyond the original scope of the
N.C. BioGrid test bed, MCNC has reconfigured its compute, storage,
data and application resources into a grid architecture: the MCNC
Enterprise Grid.
Biology research is one application for the grid, but the test bed
infrastructure is now evolving to address multiple research disciplines and
applications. As campus infrastructures move to grid frameworks, the development
of the MCNC Enterprise Grid is a step towards the "grid of grids" concept.
As computing and storage clusters evolve into grids, they will be
interconnected into larger grids that will cross
multiple organizational boundaries (firewalls), such as through the North
Carolina statewide grid.
The initial launch of the MCNC Enterprise Grid in October
2003 included two high-performance computing systems:
- A 64-node Massively Parallel Processor (MPP), distributed memory IBM Cluster with a total of 128 Intel 2.8-GHz, 32-bit CPUs running RedHat Linux
- A Symmetric Multi-Processor (SMP), shared memory SGI server with 32 Intel 1.3-GHz, 64-bit CPUs running a variant of the Advanced Server edition of RedHat Linux
A combination of direct-attached and network-attached
(Network Appliance) disks complements the computer systems with over
10 terabytes of storage. Gigabit Ethernet (Cisco), InfiniBand (Topspin),
and Fibre Channel (IBM) are the technologies used
for interconnection and switching between the compute and storage nodes.
In addition to academic use, North Carolina commercial
organizations may also use the MCNC Enterprise Grid as a fee-based service for
research purposes. Charges are per CPU-hour or at negotiated rates
for dedicated access. Services include up to 2 gigabytes of home directory
space and a selection of software packages.
Researchers are using the grid resources for a variety of
tasks, including scientific modeling and analysis. The on-demand
utility computing services model allows customers to pay only for what
they need, when they need it. The shared resources reduce the requirement
for large investments in high performance computing hardware and support
staff at businesses and universities.
MCNC's Enterprise Grid supplements the N.C. BioGrid test
bed. It is a resource for the development of a new Grid Technology
Evaluation Center and the North Carolina statewide grid.
The Grid Technology Evaluation Center
The Grid Technology Evaluation Center (GTEC) is another
development that emerged from MCNC's experience gained with the N.C. BioGrid.
MCNC is working with commercial industry partners to develop the center,
which will further address the challenges associated with moving
grid technologies from the research lab and test bed environment to the core
of enterprise infrastructure and of the emerging "Next Generation Internet"
that delivers a new generation of digital consumer services.
The GTEC will facilitate, enhance, enable, and expedite the
development and deployment of grid computing infrastructure and services through:
- A test bed supporting integration, experimentation, development, and training in grid deployment across multiple research disciplines and applications.
- A platform for the integration, testing, and development of grid infrastructure, middleware, applications, APIs and other grid-related technologies.
- A facility to support grid development activities.
GTEC services will include application benchmarking, interoperability verification, systems integration (including integration with legacy systems), and
operational training.
MCNC's Related Grid Research & Development
Because grid computing is an emerging technology, it will take years before its
ubiquitous use on MCNC's statewide network is realized. As an early
grid technology adopter and active participant in standards
bodies, MCNC has been able to identify the challenges in deploying, operating,
and scaling a grid infrastructure beyond the test bed phase. MCNC Research
& Development Institute's grid-related research focuses on filling the gaps in existing grid-ware to address
these challenges, as shown in the accompanying illustration.
Some of the research efforts include:
Grid Middleware
GridIR: A grid-based information retrieval system that provides
a scalable framework to uniformly search and retrieve diverse public
and private data across the grid while allowing local control over
the data. MCNC is also actively involved in the Global Grid
Forum (GGF), forming and participating in the GGF GridIR working
group to standardize the interfaces for providing information
retrieval capability in a grid environment.
GridScope: An effort to build a grid monitoring and tracking tool
that captures the interactions within and between grid applications
and presents a logical view of them.
Cluster-on-Demand (COD): A system to enable rapid,
automated, on-the-fly partitioning of a physical cluster into multiple
independent virtual clusters.
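The partitioning idea behind Cluster-on-Demand can be sketched in a few lines of Python. This is only an illustration under simple assumptions (hypothetical node and cluster names, first-come contiguous assignment) and is not the COD implementation.

```python
from typing import Dict, List, Tuple


def partition_cluster(
    nodes: List[str], requests: Dict[str, int]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign disjoint sets of physical nodes to named virtual clusters.

    Returns the per-virtual-cluster assignment and the remaining free nodes.
    """
    needed = sum(requests.values())
    if needed > len(nodes):
        raise ValueError(f"requested {needed} nodes, only {len(nodes)} available")
    assignment: Dict[str, List[str]] = {}
    start = 0
    for vcluster, count in requests.items():
        assignment[vcluster] = nodes[start:start + count]
        start += count
    # Unassigned nodes stay in a free pool for later requests.
    return assignment, nodes[start:]


# Eight hypothetical physical nodes carved into two virtual clusters.
nodes = [f"node{i:02d}" for i in range(8)]
virtual, free = partition_cluster(nodes, {"bio": 4, "chem": 2})
```

A real system would also reimage or reconfigure each node for its virtual cluster and reclaim nodes as demand changes; the sketch covers only the bookkeeping.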
Security Infrastructure
Most organizations today have firewalls around their
organizational computer resources to protect their sensitive and proprietary data.
Grid topologies span multiple administrative domains with autonomous
security mechanisms. Unlike the Internet, the grid allows an outsider complete
access to the resource, thus increasing the risk associated with it. The central idea
of grid computing, sharing resources across existing
organizational and geographical boundaries, makes
it difficult to use existing security mechanisms such as firewalls on
the grid.
Though organizations may be willing to share resources and data
with others for collaborative or monetary reasons, information assurance must be guaranteed for participation in a
grid environment. It is imperative to bridge the gap between different
security mechanisms, while providing local autonomy.
Joint Control of Virtual Organizations
(JoVO): JoVO seeks to address the difficulty of mapping
into virtual organizations some of the typical social and political
arrangements that are associated with shared resources that need joint control.
The objective of JoVO is to develop a scalable, reliable, distributed
identity, authentication and authorization infrastructure to facilitate
secure collaborations in grid environments. The envisioned JoVO framework
will enable multiple parties to form a virtual coalition with jointly agreed
and enforceable rules to enable timely information sharing and
collaborative processing. The joint control of identity, attributes and access
control policy is achieved through the use of threshold-based
certification authorities. The framework is a
public key infrastructure (PKI) that is both fault and intrusion tolerant.
The technical approach to JoVO is based on MCNC's completed
research project funded by the Defense Advanced Research Projects
Agency (DARPA).
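The threshold principle that underlies such certification authorities can be illustrated with Shamir secret sharing, a standard building block of threshold cryptography. The Python sketch below is a toy example and not JoVO's protocol: it shows only that any k of n parties can jointly recover a value that fewer than k cannot. The prime, the function names, and the seeded random generator are assumptions made for illustration; a real system would use a cryptographically secure random source and would sign jointly without ever reassembling the key in one place.

```python
import random
from typing import List, Tuple

PRIME = 2**127 - 1  # a Mersenne prime used as the field size in this toy example


def make_shares(secret: int, k: int, n: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Split secret into n shares so that any k of them reconstruct it."""
    if not (1 <= k <= n) or not (0 <= secret < PRIME):
        raise ValueError("need 1 <= k <= n and 0 <= secret < PRIME")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]

    def f(x: int) -> int:
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    return [(x, f(x)) for x in range(1, n + 1)]


def reconstruct(shares: List[Tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * -xm) % PRIME
                den = (den * (xj - xm)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yj * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret


# Seeded only to make the demo repeatable; not suitable for real keys.
shares = make_shares(123456789, k=3, n=5, rng=random.Random(0))
recovered = reconstruct(shares[:3])
```

Any three of the five shares yield the same value, so the loss or compromise of up to two parties neither blocks nor controls the outcome, which is the fault and intrusion tolerance property the paragraph describes.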
SITAR: Providing information assurance for data
and applications requires intrusion- and fault-tolerant
security capabilities. SITAR (Scalable Intrusion-Tolerant Architecture)
is designed to ensure that critical services and applications remain
operational, even while under attack.
SITAR, a completed DARPA-funded project, is an
extensible framework that incorporates the fault tolerant concepts of
redundancy, diversity, and ballot voting along with adaptive reconfiguration and
proactive monitoring for extending fault tolerance to distributed services.
Its fault tolerance approach focuses on detecting and mitigating the effects
of known and unknown intruder attacks that attempt to interrupt
service availability.
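The combination of redundancy and ballot voting can be sketched as a majority vote over the answers returned by redundant replicas of a service. The Python function below is a hypothetical illustration of that general technique, not SITAR's code.

```python
from collections import Counter
from typing import Hashable, List, Optional


def ballot_vote(responses: List[Hashable]) -> Optional[Hashable]:
    """Return the response given by a strict majority of replicas, else None."""
    if not responses:
        return None
    value, count = Counter(responses).most_common(1)[0]
    return value if count > len(responses) / 2 else None


# Three redundant replicas answer the same request; one has been tampered with.
accepted = ballot_vote(["balance=100", "balance=100", "balance=999"])
```

With three replicas, a single compromised or faulty server is outvoted; when no strict majority exists the function returns None, which a caller could treat as a signal to reconfigure or retry.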
Network Provisioning
High-speed, on-demand, application-initiated provisioning
of bandwidth will improve the efficiency and reduce the latency in a
grid network. In January, MCNC announced the successful
demonstration of an optical network provisioning protocol to enable more
efficient computing applications. The demonstration of the Just-in-Time
(JIT) protocol for provisioning and managing light path connections in
the all-optical Advanced Technology Demonstration Network (ATDnet)
in Washington, D.C., confirmed the viability of user-initiated,
ultra-fast provisioning of all-optical network connections. The light paths linked
host systems at the U.S. Department of Defense's Laboratory for
Telecommunications Sciences, the Naval Research Laboratory's Center for
Computational Science and the Defense Intelligence Agency.
With JIT, optical connections can be provisioned between sites in a
few milliseconds through microelectromechanical switches, and in a few microseconds when
faster photonic switches are deployed.
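To see why setup time matters for application-initiated provisioning, the Python sketch below estimates what fraction of a connection's lifetime goes to setup rather than data transfer. The line rate, payload size, and the specific 3 ms and 3 microsecond setup times are assumed values chosen only to represent "a few milliseconds" and "a few microseconds"; they are not measurements from the demonstration.

```python
def setup_overhead_fraction(setup_s: float, payload_bits: float, rate_bps: float) -> float:
    """Fraction of total connection time spent on setup rather than transfer."""
    transfer_s = payload_bits / rate_bps
    return setup_s / (setup_s + transfer_s)


RATE_BPS = 10e9             # assumed 10 Gb/s light path
PAYLOAD_BITS = 8 * 100e6    # assumed 100 MB transfer

# Assumed setup times standing in for the two switch technologies.
mems = setup_overhead_fraction(3e-3, PAYLOAD_BITS, RATE_BPS)
photonic = setup_overhead_fraction(3e-6, PAYLOAD_BITS, RATE_BPS)
```

Under these assumptions the millisecond-scale setup costs a few percent of the connection time, while the microsecond-scale setup is negligible, so shorter transfers become worthwhile to provision on demand.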
JIT research was partially funded by NASA and supported by
the Advanced Research and Development Activity, a Department of
Defense research and development community.

Figure: MCNC Grid Project Stack
Reaching the Potential of Grid Computing
In developing the N.C. BioGrid, MCNC created the foundation for
a statewide grid infrastructure that will support on-demand access to
resources and services. The MCNC Enterprise Grid, which provides core
computing and storage resources to researchers, serves as a model for integration
with and access to inter-domain and global grids. The GTEC provides a
platform for the development and integration of grid infrastructure, applications
and services.
Grid computing represents a fundamental turn of the technology crank and
is the next big thing in the evolution of the Internet. MCNC is
moving aggressively to accelerate the integration of grid technology into
the fabric of both North Carolina's Internet infrastructure and
the new economy.
About MCNC
MCNC is a private, independent, non-profit corporation established
in 1980 to advance technology-led economic development and
job creation throughout North Carolina. MCNC Research &
Development Institute develops new technologies through its own initiatives and as
a research partner for private industry and the U.S. government,
conducting advanced and applied research across a broad technology spectrum,
including microsystems, flexible electronics, sensor development, signal
electronics, wireless systems, microfabrication, high-speed secure networks and
grid computing. MCNC Grid Computing & Networking Services delivers
advanced communications resources statewide to more than 180 public and
private institutions. MCNC Ventures provides early-stage funding and assistance
to entrepreneurial start-up companies. The MCNC family of companies is
located in North Carolina's Research Triangle Park. For more information,
please visit www.mcnc.org.
About the Authors
Phil Emer is a senior member of MCNC Grid Computing &
Networking Services' Advance Technologies Group technical staff and program director
for the Grid Technology Evaluation Center. He has spent nearly 15
years working at the intersections of networking, research and academia.
He was the chief architect of the N.C. BioGrid, managing the development
of a heterogeneous, multi-institutional, grid test bed that spans MCNC,
N.C. State University, Duke University, and the University of North Carolina
at Chapel Hill.
Chuck Kesler is a program manager for MCNC's Grid
Computing & Networking Services' grid deployment and data center
services. He provides technical architecture and project management for MCNC's
grid computing and hosting initiatives. His activities have included
spearheading the deployment of the North Carolina BioGrid test bed and leading
a collaborative grid infrastructure working group that
includes representatives from the local university community.
Lavanya Ramakrishnan is a research engineer for MCNC
Research & Development Institute. She is currently involved with various
grid and security projects, including the development of a security
infrastructure for grid applications. She is also involved with designing
portal-based interfaces to grid functionality and with developing, testing and evaluating various grid middleware to
be deployed on several grid test beds, including the NASA-funded
Virtual Collaborative Center and N.C. BioGrid. She serves as a senior
leader for the Cluster-on-Demand project, a collaboration with Duke
University funded by the NSF Middleware Initiative, to develop a grid service
for dynamic virtual clusters. She is also actively involved in the
security working groups at the Global Grid Forum (GGF).
Contact Information
Scott Yates
Director of Corporate Communications
MCNC
P.O. Box 13910
3021 Cornwallis Road
Research Triangle Park, NC 27709
PH : (919)248-1907
FAX : (866)773-5617
E-mail : syates@mcnc.org