Scalable Concurrent Programming Lab's advantage in micro-electronics, satellite manufacturing and launch vehicle design
California Institute of Technology
Pasadena, CA
USA

Year: 1996
Status: Finalist
Category: Science
Nominating Company: Cray Research, Inc.

Parallel supercomputers help lower the costs of space operations by simulating the behavior of satellite launch vehicles, allowing flaws to be found in systems that cannot be tested on the ground because they create too much heat.
Satellites now provide us with a broad range of services, including
long-distance telephone service, television, modern navigation used by
commercial airlines and ships, and weather surveillance. Launch
vehicles, such as the Titan IV and Delta II, are the U.S.A.'s primary
method for placing a satellite into orbit around the earth. Due to the
temperatures involved, it is not possible to fully test a launch vehicle
and its boosters on the ground. If a problem occurs on a flight, there
is little information available to allow engineers to isolate the
problem and suggest a fix. An alternative is to recreate the flight on a
computer using the basic laws of physics and aerodynamics.
Unfortunately, this is a complex, costly, and very time-consuming task.
As a result, until recently simulations have had little impact on real
missions. A capability has recently been developed that allows the
flight of a launch vehicle, with its boosters burning, to be simulated
using not one computer, but hundreds and potentially thousands of
computers. The computers are connected together using new information
technology that allows them to exchange information and cooperate to
achieve a common goal. In this case, the computers work together to
create a single flight simulation in a fraction of the normal time: a
simulation that might take months on a single computer can execute in a
matter of hours on 500 computers. The simulations can be of direct use
to engineers who are attempting to resolve flight problems under the
time constraints of a mission. The capability was recently used in the
investigation of an anomaly that occurred on the flight of a Delta II
vehicle. This anomaly caused the release of a satellite into the wrong
orbit. A set of simulations was produced in less than two weeks that
showed how the heat from the boosters affected the flight of the
vehicle. Simulations of the flight with the boosters on and with them
off were produced to allow comparisons with recorded flight data. This
information was used by the investigating team to understand and resolve
the issue for future flights. The simulations were conducted using 512
computers coupled together in a machine called the Cray T3D. This machine
uses fast switches to allow the computers to exchange information. The
simulation capability can operate on a broad range of computers, and the
computers can be connected using a wide variety of networking
technologies. The capability has already had a direct impact on our
industrial competitiveness in the satellite launch business. As
information technology improves, the capability will allow increasingly
large numbers of computers to be brought to bear on problems associated
with vehicle flights. This increased computing power not only allows
faster simulations, but also more accurate simulations. Thus we expect
that in the future the capability can provide direct improvements in
both the cost and safety of space travel.
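As a rough check on the figures quoted above, the following sketch applies Amdahl's law to estimate how long a simulation would run on many computers. Only the count of 500 computers comes from the text. The two-month (1,440-hour) single-computer run time and the parallel fractions are illustrative assumptions, not measurements from the project, and the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, computers: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed-size problem."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if computers < 1:
        raise ValueError("computers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / computers)


def parallel_hours(serial_hours: float, parallel_fraction: float,
                   computers: int) -> float:
    """Estimated wall-clock hours when the work is spread over `computers`."""
    return serial_hours / amdahl_speedup(parallel_fraction, computers)


# Assumption: a two-month run on one computer (60 days * 24 hours).
serial_hours = 60 * 24

# Assumption: three illustrative values for the parallelizable share of work.
for fraction in (1.0, 0.999, 0.99):
    hours = parallel_hours(serial_hours, fraction, 500)
    print(f"parallel fraction {fraction}: about {hours:.1f} hours on 500 computers")
```

Under these assumptions, a perfectly parallel code would finish in under three hours, and even a small serial fraction lengthens the run noticeably.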

As information technology improves, the capability will allow increasingly
large numbers of computers to be brought to bear on problems associated
with launch vehicle flights. This increased computing power not only
allows faster simulations, but also more accurate simulations. Thus we
expect that in the future the capability can provide not simply an aid
to fixing problems but a tool for designing new vehicles. This can
provide direct improvements in both the cost and safety of space travel.
The capability leverages two new information technologies: 1) SCPlib --
a new software technology developed at Caltech for programming large
numbers of computers and 2) new scalable parallel computer technology --
machines that are able to grow by adding more and more computers to them
with advanced networking technology. The former provides an efficient
method for the design of new programs that operate on potentially
thousands of computers. This technology is capable of automatically
redistributing work among the computers while the program is executing
so as to increase performance. It also allows the program to execute on
many different types of parallel computer. These new computers include
multicomputers like the Intel Paragon or Cray T3D, shared memory
multiprocessors such as the Silicon Graphics Power Challenge, and a
broad variety of networked workstations such as the IBM RS/6000, Sun
SPARCstation, Hewlett-Packard 700 series, and Silicon Graphics Indigo
family. To cope with the differences in these machines, programs must
incorporate methods to trade communication on the network for additional
work to be executed on a computer.
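To illustrate the two ideas in this paragraph, redistributing work while a program runs and weighing network communication against local computation, here is a minimal sketch. It is not SCPlib's interface or algorithm, which the text does not describe. The `Worker` class, the `rebalance` function, the greedy strategy, and the fixed per-task `transfer_cost` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """One computer, holding estimated run times (seconds) of its tasks."""
    name: str
    tasks: list = field(default_factory=list)
    comm_overhead: float = 0.0  # seconds spent receiving migrated tasks

    @property
    def load(self) -> float:
        return sum(self.tasks) + self.comm_overhead


def rebalance(workers, transfer_cost, max_moves=1000):
    """Greedily migrate tasks from the busiest worker to the idlest one.

    A task moves only if the receiver, after paying `transfer_cost` seconds
    of communication, would still finish sooner than the sender does now.
    Returns the number of tasks migrated.
    """
    moves = 0
    while moves < max_moves:
        busiest = max(workers, key=lambda w: w.load)
        idlest = min(workers, key=lambda w: w.load)
        limit = busiest.load
        # Tasks whose migration lowers the finishing time of this pair.
        candidates = [t for t in busiest.tasks
                      if t > 0 and idlest.load + t + transfer_cost < limit]
        if not candidates:
            break  # communication would cost more than it saves
        # Choose the task that leaves the two workers most evenly loaded.
        task = min(candidates,
                   key=lambda t: max(busiest.load - t,
                                     idlest.load + t + transfer_cost))
        busiest.tasks.remove(task)
        idlest.tasks.append(task)
        idlest.comm_overhead += transfer_cost
        moves += 1
    return moves


# Hypothetical example: one overloaded computer and two lightly loaded ones.
workers = [Worker("A", [4.0, 3.0, 3.0, 2.0]), Worker("B", [1.0]), Worker("C")]
moved = rebalance(workers, transfer_cost=0.5)
print(f"migrated {moved} tasks")
for w in workers:
    print(w.name, w.load)
```

Raising `transfer_cost` models a slower network such as networked workstations. In this sketch the balancer then migrates fewer tasks and leaves more work where it already is.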
The capability is unique in both its scaling and portability aspects and
utilizes state-of-the-art parallel computing techniques. We are not
aware of other capabilities that have been used successfully for
simulations of launch vehicles on mission-critical time scales. The
numerical techniques for solution of the physics grew out of a
proprietary uniprocessor code that was run on vector supercomputers
by The Aerospace Corporation. This original code expressed the physics
to be solved but could not operate on parallel machines. It took four
years to develop the parallel code, include the physics from the
original code, validate it on standard tests, and apply the new parallel
code to real launch vehicle problems. This period involved periodic
redesign to take advantage of changing technology.
The application is fully functional and in constant use by scientists at
The Aerospace Corporation. It has achieved the goals of the program,
namely to create a capability that can scale to utilize thousands of
computers and drastically reduce the turnaround time for simulations.
At the outset, four years ago, the work was considered genuine research
in new technology and, although potentially viable, was not then expected
to contribute directly to missions. The primary beneficiaries are all
users of satellite technology -- without reliable launch technology these
devices cannot be maintained in service.

In the future we plan to develop versions of the capability that can cope
with vehicles that separate during flight. Handling this moving-boundary
type of problem will allow us to broaden the scope of simulations that
can be performed.
There were three primary difficulties:

1) Communication between individuals. The team is multidisciplinary in
nature, leveraging skills in parallel computing, graphics, grid
generation, networking, and physics. Each individual speaks a different
language from a technical perspective and comes with a different
educational background. As a result, it is difficult to express an idea
and have that idea understood by other members of the team. Moreover,
since not every team member is an expert in every discipline, there is a
significant gap in understanding. Much patience, practice, and hard work
are required to overcome the prejudices and frustrations of
communication so as to couple technologies effectively.

2) Technology Evolution and Design. There have been numerous types of
parallel computing and information technology available over the last
five years. A pressing task has been to evolve both the ideas and the
underlying software technology to keep up with and take advantage of this
changing hardware technology. Today, the capability can operate on
virtually every interesting hardware platform available, and can scale
to machines that include thousands of computers. However, this has been
achieved only through a constant reevaluation of direction in light of
changes in the underlying information technology. Software engineering
to facilitate this change required a substantial investment in time.

3) Available Machine Resources. Even today there remain only a few
places in the world where sufficiently large machines exist to be able
to conduct launch vehicle simulations. As time has passed, the number
of large machines has increased; however, so has the number of competing
groups who require access. As a result, large blocks of machine hours to
carry out complex simulations are still difficult to arrange. Typically
it requires many long hours of detailed work setting up networks, disks,
usage quotas, etc. However, the cost of the basic technology is decreasing
every year due to the explosion in commodity microprocessor use. As a
result, we expect large simulations to be commonplace in the next
century.