The Trends Of Supercomputing

Computers have come a long way, even in just the last five years. But when you expand your reach and look at the trend of computer speeds over the last 20 years, it’s quite mind-boggling. Expand that to 30 years and…well, you get the picture.

The article excerpted below is a great example of how much fun it can be to look back on how far we have come. It shows how researchers saw supercomputing in the 1980s, along with their interesting predictions for the future.

Trends in Supercomputer Performance and Architecture

Improvement in the performance of supercomputers in the 1950’s and the 1960’s was rapid. First came the switch from cumbersome and capricious vacuum tubes to small and reliable semiconductor transistors. Then in 1958 a method was invented for fabricating many transistors on a single silicon chip a fraction of an inch on a side, the so-called integrated circuit. In the early 1960’s computer switching circuits were made of chips each containing about a dozen transistors. This number increased to several thousand (medium-scale integration) in the early 1970’s and to several hundred thousand (very large scale integration, or VLSI) in the early 1980’s. Furthermore, since 1960 the cost of transistor circuits has decreased by a factor of about 10,000.

The increased circuit density and decreased cost have had two major impacts on computer power. First, it became possible to build very large, very fast memories at a tolerable cost. Large memories are essential for complex problems and for problems involving a large database. Second, increased circuit density reduced the time needed for each cycle of logical operations in the computer.

Until recently a major limiting factor on computer cycle time has been the gate, or switch, delays. For vacuum tubes these delays are 10⁻⁵ second, for single transistors 10⁻⁷ second, and for integrated circuits 10⁻⁹ second. With gate delays reduced to a nanosecond, cycle times are now limited by the time required for signals to propagate from one part of the machine to another. The cycle times of today’s supercomputers are between 9 and 20 nanoseconds and are roughly proportional to the linear dimensions of the computer, that is, to the length of the longest wire in the machine.
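To get a feel for why cycle time scales with machine size, here is a rough back-of-the-envelope sketch in Python (not from the article), assuming signals travel at roughly two-thirds the speed of light through the machine’s wiring:

# Rough estimate (not from the article) of the minimum cycle time imposed by
# signal propagation across a machine. Assumes signals travel at about
# two-thirds the speed of light along the wiring.

SPEED_OF_LIGHT_M_PER_NS = 0.2998       # metres per nanosecond in vacuum
PROPAGATION_FRACTION = 2.0 / 3.0       # assumed fraction of c in real wiring

def min_cycle_time_ns(longest_wire_m):
    """Lower bound on cycle time (ns) set by the longest signal path."""
    signal_speed = SPEED_OF_LIGHT_M_PER_NS * PROPAGATION_FRACTION
    return longest_wire_m / signal_speed

for wire_length in (1.0, 2.0, 4.0):    # plausible cabinet-scale wire lengths in metres
    print("%.1f m longest wire -> at least %.1f ns per cycle"
          % (wire_length, min_cycle_time_ns(wire_length)))

At a couple of metres of wire, that lower bound already lands in the 9-to-20-nanosecond range quoted above, which is why shrinking the machine physically mattered as much as shrinking the transistors.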

Figure 2 summarizes the history of computer performance, and the data have been extrapolated into the future by approximation with a modified Gompertz curve. The asymptote to the curve, which represents an upper limit on the speed of a single-processor machine, is about 3 billion operations per second. Is this an accurate forecast in view of developments in integrated circuit technology? Estimates (9, 10) are that a supercomputer built with Josephson-junction technology would have a speed of at most 1 billion operations per second, which is greater than the speed of the Cray-1 or the CYBER 205 by only a factor of 10.
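For readers unfamiliar with the curve the article fits, a Gompertz function rises quickly and then flattens toward a fixed ceiling. The short Python sketch below shows the general shape; the asymptote is the roughly 3-billion-operations-per-second limit cited above, while the other two parameters are purely illustrative guesses, not values from the article:

# Sketch of a Gompertz growth curve of the kind the article uses to extrapolate
# single-processor performance. The asymptote is the ~3 billion operations per
# second ceiling cited above; the shape parameters B and C are illustrative
# guesses, not fitted values from the article.
import math

def gompertz(t, asymptote, b, c):
    """Gompertz curve: rises steeply, then flattens toward the asymptote."""
    return asymptote * math.exp(-b * math.exp(-c * t))

ASYMPTOTE_OPS = 3e9      # operations per second
B, C = 16.0, 0.045       # illustrative shape/rate parameters

for year in (1960, 1970, 1980, 1990, 2000):
    t = year - 1950      # years since 1950
    print(year, "%.2g ops/s" % gompertz(t, ASYMPTOTE_OPS, B, C))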

Thus, supercomputers appear to be close to the performance maximum based on our experience with single-processor machines. However, most scientists engaged in solving complex problems of the kind outlined above feel that an increase in speed of at least two orders of magnitude is required. If we are to achieve an increase in speed of this size, we must look to machines with multiple processors arranged in parallel architectures, that is, to machines that perform many operations concurrently. Three types of parallel architecture hold promise of providing the needed hundredfold increase in performance: lockstep vector processors, tightly coupled parallel processors, and massively parallel machines.

Vector processors may be the least promising. It has been shown that achieving maximum performance from a vector processor requires vectorizing at least 90 percent of the operations involved, but a decade of experience with vector processors has revealed that only about 50 percent of the average problem can be vectorized. However, vector processing may be ideal for those special cases that are amenable to high vectorization.
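The gap between 90 percent and 50 percent vectorization is easier to appreciate with Amdahl’s law, which the article alludes to but does not spell out. The Python sketch below assumes a hypothetical vector unit ten times faster than the scalar unit:

# Illustration (not from the article) of why the vectorizable fraction matters,
# using Amdahl's law. VECTOR_SPEED is an assumed ratio of vector-unit speed to
# scalar-unit speed for the vectorized portion of the work.

def overall_speedup(vector_fraction, vector_speed):
    """Amdahl's law: normalized time = (1 - f) + f / v; speedup is its reciprocal."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_speed)

VECTOR_SPEED = 10.0      # assume the vector unit is 10x the scalar unit

for fraction in (0.50, 0.90, 0.99):
    print("%.0f%% vectorized -> %.1fx overall speedup"
          % (fraction * 100, overall_speedup(fraction, VECTOR_SPEED)))

At 50 percent vectorization the overall gain stays under 2x no matter how fast the vector hardware is, which is the article’s point about vector processors being the least promising route to a hundredfold improvement.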

The second type of architecture employs tightly coupled systems of a few high-performance processors. The so-called asynchronous systems that use a few tightly coupled high-speed processors are a natural evolution from high-speed single-processor systems. Indeed, systems with two to four processors are becoming available (for example, the Cray X-MP, the Cray-2, the Denelcor HEP, and the Control Data Cyber 2XX). Systems with 8 to 16 processors are likely to be available by the end of this decade.

What are the prospects for using the parallelism in such systems to achieve high speed in the execution of a single application? Experience with vector processing has shown that plunging forward without a precise understanding of the factors involved can lead to disastrous results. Such understanding will be even more critical for systems now contemplated that may use up to 1000 processors.

A key issue in the parallel processing of a single application is the speedup achieved, especially its dependence on the number of processors used. We define speedup (S) as the factor by which the execution time for the application changes, that is, the ratio of the execution time on a single processor to the execution time on p processors.

Buzbee, B.L., and D.H. Sharp. “Perspectives on supercomputing.” Science 227 (1985): 591+.
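The excerpt ends right at the definition of speedup, so here is a small Python sketch of that definition, the ratio of single-processor time to p-processor time, using invented timings for illustration:

# Sketch of the speedup definition the excerpt ends on: the single-processor
# execution time divided by the time on p processors. The timings below are
# invented purely for illustration.

def speedup(time_one_proc, time_p_procs):
    """S = T(1) / T(p): how many times faster the p-processor run completes."""
    return time_one_proc / time_p_procs

timings = {1: 1000.0, 4: 280.0, 16: 95.0}   # hypothetical seconds for the same job

for procs, t in timings.items():
    s = speedup(timings[1], t)
    print("%2d processors: S = %.1f, efficiency = %.0f%%" % (procs, s, 100.0 * s / procs))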