Many problems in scientific computing and mathematical modeling can be reduced to solving large systems of linear equations. Since such systems might be very large (having millions of unknowns), they are often solved on supercomputers. Current supercomputing consist of several hundred thousand of processors, and thus their energy consumption skyrocketed. Therefore, it is a very important problem to optimize energy consumption of scientific computations.
TOP500 List is a biannually updated list of world's fastest supercomputers, which are measured by the number of FLOPS (floating-point operations per second) in solving a dense system of linear equations using Gaussian elimination. Green500 List tries to solve the problem of energy efficiency by solving the same problem but using a different metric of FLOPS/W.
The issue is that both metrics promote power-hungry algorithms instead of other more energy-efficient techniques. By using metrics based on both time-to-solution and energy, it is possible to get a much more realistic picture, which can drive and help the design of future energy-efficient machines and algorithms. In [1], we compare these metrics on an example of solving dense positive definite linear systems using different methods (Cholesky decomposition, CG, specialized iterative method). Our results show clearly that standard metrics are not able to capture the real behaviour of applications, and this is mainly due to the fact that they rank methods and machines on the basis of FLOPS and FLOPS/W rather than real quantities of practical interest, i.e. time-to-solution and energy:
To obtain these data, we develop a fully automated, online, minimally invasive system for measuring power performance on IBM POWER7. Each POWER7 chip has several embedded thermal sensors that measure the temperature of the different components. Data from sensors are transferred to a separate service processor that is embedded in the POWER7 server. We get detailed power measurements as shown below:
Wilkinson's iterative refinement (IR) allows to solve a linear system in high precision while performing most of computational operations in low precision. Let x = 0 be the initial solution and r = b. It works in three steps which are repeated till r is small enough:
It was pointed out by Bekas et al. that a relatively low accuracy of z in step 1 leads to a similar number of iterations as high accuracy:
In [1], we introduce something called an approximative iterative refinement (AIR). It is a modification of IR, solving in step 1 a linear system A'z=r where A' is some approximative matrix. In the standard IR, we may view A' as the low precision approximation of A. Different class of solvers can be obtained when A' contains the key structural elements of A. In [1], we solved model dense linear systems with decaying structure (with largest values around the diagonal). Then a good approximation A' is the choice of a banded submatrix of A, which leads to finding the solution extremely fast.
Also, we have found the surprising connection to the splitting methods. When A = S-T, then Ax=b leads to Sx=Tx+b, or x = S^{-1}Tx+S^{-1}b. The splitting methods iterate the last equation as x_{n+1} = S^{-1}Tx_{n}+S^{-1}b. The surprising connection is that the splitting methods are precisely AIR with A'=S. This allows to give a clear explanation of the splitting methods, which I was using for several years in my advance linear algebra classes.