Scalable/Technical Computing
Pittsburgh Supercomputing Center
Pittsburgh, Pennsylvania
USA

Year: 2004
Status: Finalist
Category: Science
Nominating Company: HP

America’s most powerful supercomputer dedicated solely to public research supports dramatically more detailed understanding of earthquake intensity and how it varies with local subsurface conditions, improving the relevance of building codes in earthquake zones.

Earthquake Modeling to Save Lives

The Northridge earthquake, which struck the densely populated San Fernando Valley of northern Los Angeles in January 1994, was the second time in 60 years that the earth ruptured directly beneath a major U.S. urban area. By the time the dust settled and officials assessed the toll, 57 people were dead and more than 1,500 seriously injured. Collapsed freeways choked traffic for days. Over 12,000 buildings and 170 bridges sustained moderate to severe damage. Total economic loss was estimated at $20 billion.

But for building codes that require earthquake-resistant structures, fatalities and damage would have been much worse. Other recent urban earthquakes tell a similar tale. Mexico City, 1985: 300 collapsed buildings, nearly 10,000 people dead. San Francisco, 1989: $5 billion in losses, 62 dead. One of the main lessons of these urban earthquakes has been the need for better information about where and how much the ground will shake.

Studies from these quakes have shown that the severity of ground shaking and the consequent damage patterns vary significantly within relatively small areas. Even from one block to the next, the level of shaking can change dramatically, depending on the types of subsurface soil and rock, other geological characteristics, and the nature of the seismic waves.

Jacobo Bielak, Omar Ghattas and David O'Hallaron of Carnegie Mellon University lead the Quake Group, a large multi-university collaborative research team. Using sophisticated computational methods, they work to create realistic 3D models of earthquakes in geologically complex basins. Their objective is to provide accurate forecasts of earthquake ground motion as a necessary step toward building codes that provide for the safest possible structures at reasonable cost.

Using LeMieux, PSC's terascale system, along with innovative algorithms that maximize their ability to use it well, the Quake Group carried out the most detailed and accurate simulations to date of the Northridge earthquake -- at twice the frequency of prior models. They’ve also made major inroads on an important problem called "the inverse problem," the goal of which is to determine subsurface geology by working backward from ground motion measurements on the surface.

A Unique Tool for Science

LeMieux (French for "The Best") was developed by PSC through a grant from the National Science Foundation to design and operate the first NSF terascale computing system. Terascale refers to computational power beyond a "teraflop" -- a trillion calculations per second. While several terascale systems have been developed for classified research at national laboratories, the PSC system was, at its installation in October 2001, the most powerful system in the United States committed solely to public research.

PSC and HP collaborated to produce the highest performance possible with available funding. LeMieux is built from high-end microprocessors and exploits the reduced cost of commodity technology. Careful engineering combined special hardware components and system software with commodity technology designed for a large market, whose development is largely underwritten by sales into other markets.

LeMieux is distinguished from most other computer systems by its scale -- 3,000 high-performance Alpha EV68 processors. By itself, this scale confers unique capabilities both in total processing power (6 teraflops) and aggregate available memory (3 terabytes). It enables most computations to be done much faster than on other systems, and this transforms the research paradigm in many important fields.

This work will save countless lives. One of the major goals of engineering seismology is to better predict the ground motion generated by earthquakes. The Quake Group has taken large steps in that direction by using LeMieux to simulate, for the first time, ground motion in the frequency range critical for structural design purposes.

Using all 3,000 LeMieux processors, itself a significant technical accomplishment, the Quake Group simulated the Northridge quake at one vibration cycle per second (1 Hz). Prior to this, earthquake modeling was limited to much lower frequencies, even though the higher frequency range, from 0.5 to 5 Hz, presents the greatest danger to "low-to-medium rise" structures -- which include most city buildings, predominantly two-to-ten stories. Knowledge of the ground motion at higher frequencies will lead to more effective building standards.
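
A rough way to see why higher frequencies are so demanding: on a uniform mesh, doubling the highest resolved frequency halves the shortest wavelength, so the number of grid points grows eightfold in three dimensions, and an explicit time-stepping scheme also needs roughly twice as many time steps. The sketch below works through that rule of thumb. The function name, the uniform mesh, and the explicit time-stepping assumption are illustrative only and are not taken from the Quake Group's software.

```python
def relative_cost(freq_ratio: float, dims: int = 3) -> float:
    """Rough cost multiplier when the highest resolved frequency is scaled by freq_ratio.

    Assumes a uniform mesh whose spacing is proportional to the shortest
    wavelength, and an explicit time-stepping scheme whose stable step
    shrinks in proportion to the mesh spacing.
    """
    grid_points = freq_ratio ** dims  # finer spacing in every spatial direction
    time_steps = freq_ratio           # smaller stable time step
    return grid_points * time_steps


if __name__ == "__main__":
    # Doubling the frequency under these assumptions.
    print(relative_cost(2.0))
```

Under these assumptions, doubling the frequency multiplies the work by 16, which is why each step up in frequency calls for both larger machines and better algorithms.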

With LeMieux, the Quake Group has also made significant progress on another key aspect of the earthquake problem, the inherent incompleteness of the subsurface geological information. Because of uncertainties in earthquake source and basin material properties, a critical challenge is to obtain these properties by seismic inversion from records of past earthquakes.

This problem -- the inverse problem -- is one of the important challenges of computational science and engineering, with potential applications in many fields such as medicine, energy, and atmospheric sciences. Using a sophisticated mathematical approach, the Quake Group, for the first time, solved a test case that proves the feasibility of this inverse approach.

This work, and the development of LeMieux, demonstrated that as the power of computational platforms increases, important new fields of research open up that previously were simply not feasible.

LeMieux is information technology, and the availability of this resource was essential to the Quake Group's success at earthquake simulation.

The Quake Group also relied on information technology in creating innovative algorithms essential to achieving its results. Their adaptive-mesh techniques (see Originality) make it possible to achieve accuracy that would otherwise require 1,000 times more computing power. Their inverse algorithm uses sophisticated mathematical concepts to infer unknown basin geology and faults and implements this approach very efficiently for large-scale parallel computing systems.

The terascale computing system, LeMieux, is an original computing architecture with no prior models. The PSC-HP team selected node components to optimize trade-offs between cost and the many aspects of performance, including processing speed, memory size, memory bandwidth (transfer rate) and memory latency (access time). A unique connection scheme was employed. A unique approach of integrated "hot spares" combined with "on-the-fly" spare implementation was developed to provide high availability of the total resource.

Accurately capturing the wide range of ground motion in a large earthquake-prone basin like Los Angeles poses enormous challenges. One of the Quake Group's key strategies is to tailor their computational mesh -- which divides the basin into hundreds of millions of subvolumes -- to the local wavelengths of the seismic waves. Compared to a regular mesh, this strategy yields a three-orders-of-magnitude reduction in the number of equations that must be solved to obtain the same accuracy. To generate the large meshes required, they used disk space instead of computer memory; their out-of-core algorithms can generate an extremely large mesh. For the recent simulations, they represented the basin as a volume 100 kilometers on each side by 37.5 kilometers deep. Within this volume, their irregular mesh contains more than 900 million subvolumes, making their computations with LeMieux the largest unstructured mesh simulations ever done.
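
The idea of matching the mesh to the local seismic wavelength can be sketched with a little arithmetic. The wavelength is the wave speed divided by the frequency, so soft soil (slow waves) needs much smaller elements than hard rock (fast waves). In the minimal sketch below, the two wave speeds and the rule of ten points per wavelength are illustrative assumptions, not values from the Quake Group's models, and the function names are hypothetical.

```python
def element_size_m(wave_speed_m_s, max_freq_hz, points_per_wavelength=10.0):
    """Element edge length needed to resolve the shortest local wavelength."""
    wavelength_m = wave_speed_m_s / max_freq_hz
    return wavelength_m / points_per_wavelength


def elements_per_km3(wave_speed_m_s, max_freq_hz, points_per_wavelength=10.0):
    """Number of cubic elements of that size that fit in one cubic kilometer."""
    h = element_size_m(wave_speed_m_s, max_freq_hz, points_per_wavelength)
    return (1000.0 / h) ** 3


if __name__ == "__main__":
    MAX_FREQ_HZ = 1.0        # frequency quoted in the text
    SOFT_SOIL_M_S = 100.0    # assumed shear-wave speed, illustrative only
    HARD_ROCK_M_S = 3000.0   # assumed shear-wave speed, illustrative only

    soft = elements_per_km3(SOFT_SOIL_M_S, MAX_FREQ_HZ)
    rock = elements_per_km3(HARD_ROCK_M_S, MAX_FREQ_HZ)
    print(f"soft soil: {soft:.3g} elements per km^3")
    print(f"hard rock: {rock:.3g} elements per km^3")
    print(f"density ratio: {soft / rock:.0f}")
```

A regular mesh must use the smallest element size everywhere, including in stiff rock where far larger elements would do. An adapted mesh avoids that waste, which is where the large savings come from.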

A straightforward approach to solving the inverse problem would require 10,000 years on a petascale computer (1,000 trillion operations per second) -- which is not expected to become available until the end of the decade. The group’s innovative "preconditioned matrix-free" algorithms employ a special formulation that requires only hundreds of earthquake simulations, instead of the millions that would be required by the standard approach. This reduces the required solution time to 24 hours on the terascale machine, allowing solution of the 3D inverse problem for realistic basin properties, a feat unthinkable until now.
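
The text does not describe the Quake Group's algorithm in detail, so the following is only a generic textbook illustration of what "preconditioned matrix-free" means: a preconditioned conjugate gradient solver that never stores the system matrix and only calls a function that applies it to a vector. The toy tridiagonal operator, the diagonal (Jacobi) preconditioner, and all names are assumptions chosen for illustration. In an inverse problem, each such operator application would correspond to running simulations rather than to a cheap loop.

```python
def apply_operator(x, coeff):
    """Return A @ x for a toy tridiagonal operator without storing A.

    Row i is: (2 + coeff[i]) * x[i] - x[i-1] - x[i+1]
    """
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append((2.0 + coeff[i]) * x[i] - left - right)
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(apply_a, apply_minv, b, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients using only operator applications."""
    x = [0.0] * len(b)
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    r = list(b)                  # residual b - A x for the zero initial guess
    z = apply_minv(r)            # preconditioned residual
    p = list(z)                  # first search direction
    rz = dot(r, z)
    for iteration in range(1, max_iter + 1):
        ap = apply_a(p)          # the only place the operator is used
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, iteration
        z = apply_minv(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    n = 50
    coeff = [0.1 * (i + 1) for i in range(n)]   # toy coefficients
    b = [1.0] * n

    def apply_a(v):
        return apply_operator(v, coeff)

    def apply_minv(v):
        # Jacobi preconditioner: divide by the operator's diagonal.
        return [vi / (2.0 + c) for vi, c in zip(v, coeff)]

    x, iters = pcg(apply_a, apply_minv, b)
    residual = [bi - ai for bi, ai in zip(b, apply_a(x))]
    print("iterations:", iters)
    print("residual norm:", dot(residual, residual) ** 0.5)
```

The point of the design is that the cost is counted in operator applications. A good preconditioner cuts the number of iterations, and therefore the number of expensive applications, without ever forming the matrix.
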
The Quake Group received the 2003 Gordon Bell Prize, the major prize in high-performance computing, in recognition of their work.

Their technical success included using 3,000 LeMieux processors simultaneously to achieve unprecedented performance on an unstructured mesh, more than one teraflop at over 80% efficiency. Their software is highly scalable -- able to use a larger number of processors with little reduction in per-processor performance.
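
The text does not say how the 80% efficiency figure was measured. As a hedged illustration, the sketch below uses one common definition for a fixed problem size: the speedup actually achieved divided by the ideal speedup for the increase in processor count. The function name and the timings are hypothetical and are not measurements from LeMieux.

```python
def parallel_efficiency(t_ref_s, p_ref, t_p_s, p):
    """Efficiency of a run on p processors relative to a reference run on p_ref.

    Fixed problem size is assumed: ideal speedup equals p / p_ref.
    """
    achieved_speedup = t_ref_s / t_p_s
    ideal_speedup = p / p_ref
    return achieved_speedup / ideal_speedup


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    eff = parallel_efficiency(t_ref_s=1000.0, p_ref=128, t_p_s=52.0, p=3000)
    print(f"efficiency: {eff:.2f}")
```

Software is called highly scalable when this ratio stays close to 1 as the processor count grows.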

Their award-winning simulations included an adaptive mesh of 900 million elements and 3.2 billion unknowns. This mesh resolved frequencies up to 1.85 Hz. Their inverse calculation involved 17 million inversion parameters and 70 billion total unknowns. This represents the largest nonlinear inverse problem ever solved.

LeMieux, installed on schedule in October 2001, is fully operational. This was achieved through a strong commitment from HP engineers and staff working side-by-side with PSC staff.

The key challenge in the development of LeMieux was that of scale. The unprecedented number of components required innovative solutions to extend hardware and software capabilities far beyond their design limits. Typically, connected systems comprise 32 to 64 nodes. The interconnect hardware was designed for a maximum of 128 nodes. LeMieux was constructed to use, continuously and simultaneously, 750 nodes.

Simulating high frequencies of earthquake motion presents exceptional difficulties. The complexity arises from several sources. First, multiple spatial scales characterize the earthquake source and basin response: the shortest wavelengths are measured in tens of meters and the longest in kilometers. Basin dimensions are on the order of tens of kilometers and earthquake sources are up to hundreds of kilometers. Second, temporal scales vary from the hundredths of a second required to resolve the highest frequencies of the earthquake source up to several minutes for the duration of shaking within the basin. Third, many basins have highly irregular geometry. Fourth, the material properties of the soil are highly variable. Finally, geology and earthquake source parameters are observable only indirectly, thus introducing uncertainty into the modeling process.

The group was faced with the enormous challenge of developing methods to overcome all of these difficulties and also to exploit the unique capabilities offered by the 3,000-processor LeMieux computing system. It took a full decade of work for the group to develop the effective algorithms that enabled the milestone calculations described in the previous sections.

Few applications have efficiently used all 3,000 processors of LeMieux simultaneously, and the Quake Project has been exemplary at carrying out productive, landmark research at this scale. Performing the large-scale simulations that won the Gordon Bell Prize required a high degree of teamwork between the PSC staff and the Quake Group.