Using the SVD to solve linear sy

Using the SVD Algorithm to Solve Linear Systems

Rodger Rosenbaum

I'm going to discuss the Singular Value Decomposition (SVD) and some of its properties. I'll be doing calculations on the HP49G+ calculator; the HP48G will give the same results, but I'm not so sure about the HP49 anymore!

The book, "Linear Algebra and its Applications", by Gilbert Strang is a good reference for those who want more information. Another very nice reference on the web is the paper:

http://www.american.edu/academic.depts/cas/mathstat/People/kalman/pdffiles/svd.pdf

I will refer to the length of a vector and the Euclidian norm of a matrix; these are calculated by the ABS command in the HP49G+ (Frobenius norm, FNORM, in the HP71). Another matrix norm is the spectral norm which is the largest singular value of a matrix, calculated by the SNRM command. The (Euclidian) distance between a couple of matrices is calculated by subtracting the two matrices and then applying the ABS command (the matrices have to be conformable for subtraction; the same size, in other words). I will be referring to pure real matrices unless I state otherwise.

Given a square n x n matrix M and an n-dimensional vector V, it is well known that forming the product M*V gives an result vector V_r which, in general, is rotated and changed in length with respect to the original vector. For a concrete example, consider the 2 x 2 matrix M:

[[ 1 2 ]
[ 3 4 ]]

and the vector V, [ 1 0 ]. This vector is oriented along the x-axis on the 2-dimensional plane. After the multiplication by M, the result is [ 1 3 ]. The length is increased by a factor of 3.16 and its direction is changed--it no longer points directly in the direction of the positive x-axis. Imagine that we start with the unit vector
[ 1 0 ] and rotate it slowly in a counter-clockwise direction; the tip of the vector will describe a circle of radius 1. Now, as the vector rotates, multiply it by the matrix M at each position, and plot the curve described by the tip of the result vector. This curve will not be a circle, because as the position of the input vector changes, the transformation caused by multiplication by M also changes. For some positions of the input vector, the multiplication causes the vector to become longer, as in the example above. For other positions, the multiplication by M causes the input vector to become shorter, such as for the vector [ .8 -.5 ] which becomes [ -.2 .4 ]. The curve plotted is an ellipse showing the maximum and minimum change in length of the input vector after multiplication by the matrix M.

The same thing happens for matrices and vectors with more than 2 dimensions. If we could plot the magnification factor caused by the matrix multiplication for all possible vectors in the hyperspace, it would be a hyper ellipse in n-dimensional hyperspace. For a typical matrix, the hyper ellipse would have a maximum size in one particular direction, and along the orthogonal directions relative to the maximum, we would find other multiplication factors.

The SVD can be applied to any matrix, square or rectangular, real or complex. The result of the decomposition as traditionally described in texts is three matrices, U, S, and V (the texts use the capital greek letter sigma instead of S; also the matrix V is usually the transpose of the V returned by the HP49G+, but I will describe what the calculator does).

U and V are orthogonal matrices; that is, they are square matrices whose columns and rows are of unit length, and the rows and columns are linearly independent of one another (orthogonal matrices have the property that when THEY multiply a vector, they don't change its length). In texts, the matrix S is a diagonal matrix with the singular values on the main diagonal, and zeroes elsewhere. From an m x n matrix initially on the stack, the calculator returns on the stack, U V S, with S as a vector rather than a diagonal matrix; the singular values are sorted in decreasing order in the vector S. The number of singular values is the lesser of m and n, and some of them may be zero.

Given an input matrix A, the relationship that holds is given by this expression: A = U * DIAG(S) * V. If the input matrix, A, is an m x n dimensional matrix, then the U matrix is m x m, the V matrix is n x n, and the vector S returned by the calculator must be converted to an m x n diagonal matrix with the singular values on the main diagonal before the multiplication can be carried out.

So, just what are those singular values? Above we plotted a hyper ellipse showing the magnification factor caused when a matrix multiplies all possible vectors. The singular values are just the magnification factors in n orthogonal directions in n-dimensional hyperspace. To use our concrete example above, the 2 x 2 matrix has 2 singular values, namely

5.46498570422 and .365966190626.

This means that the largest magnification caused by that matrix is 5.464..., and the smallest magnification (maximum shrinkage) is .365... We saw that the effect caused by the matrix multiplication depended on the vector we operated on, so we might ask, what vector would experience the maximum magnification of 5.464...? The answer is given by the SVD (I'll explain the details later). The vector

[ -.576048436766 -.817415560470 ] (which is a unit length vector) becomes:

[ -2.21087955771 -4.99780755218 ] after premultiplication by the matrix:

[[ 1 2 ]
[ 3 4 ]]

and we can determine the length of this vector with the ABS command to be 5.46498570422, exactly equal to the largest singular value.

If a square matrix is singular, its determinant is zero, and its columns are linearly dependent. This linear dependence of the columns means that some combination of the columns is zero. How can we find the particular combination of columns that adds to zero? The SVD, of course!

If a square matrix is singular, then at least one of its singular values will be zero. The number of non-zero singular values is the rank of the matrix. In floating point arithmetic, we may not be able to exactly represent a matrix we want to work with. Let's say that a particular element of a singular matrix should be 1/3, but because of the limitations of floating point arithmetic, this element is represented as .333333333333 on the HP49G+. This peturbation will probably make the matrix as represented in the calculator not singular. But the SVD will have an extremely small singular value in the S vector, so we will know that the matrix should be considered "effectively" singular. In fact, it can be shown that the smallest singular value is the distance to the nearest singular matrix! Thus, the SVD provides us with a way to know just how close to being singular the matrix is.

Another property of the singular values is that their product is the determinant of the matrix, if it's a square matrix; this is why at least one of the singular values is zero if the matrix is square and singular. If the matrix is not square, the presence of zero singular values means that the matrix is rank deficient in the smaller of m or n. Also, the singular values are always positive numbers, even if the matrix is a complex matrix.

We can use the SVD to find the inverse of a matrix. The original matrix is reconstituted by the product U * DIAG(s) * V. If we form the product,

TRANSPOSE(V) * DIAG(1/s) * TRANSPOSE(U),

it will be found to be the inverse of the original matrix. The expression DIAG(1/S) means to form an n x m (not m x n) diagonal matrix, with the reciprocals of the singular values on the main diagonal. Since the SVD can be applied to a rectangular matrix, with m not equal to n, we might wonder what we get if we form that product with the SVD of a rectangular matrix. In that case, the result is called the pseudo inverse. The pseudo inverse can be used to solve least squares problems and ill-conditioned systems.

Premultiplication of a vector by a matrix A is the way we get a linear combination of the columns of the matrix. Each component of the vector multiplies a column of the matrix and then the products are added up like this: let the columns of the vector be denoted C₁, C₂, C₃,...C_n and the components of the vector are V₁, V₂, V₃,...V_n. Premultiplying the vector by the matrix forms the linear combination V₁*C₁ + V₂*C₂ + V₃*C₃...+V_n*C_n, the result of which is a vector. If this vector is all zeroes (length zero), we say that the columns are linearly dependent. If the result vector is nearly zero, then the matrix is nearly singular (or rank deficient if it isn't square).

The singular values can all be formed from some linear combination of columns of the matrix A, whose SVD we have. Extract the first row of the V matrix from the SVD into a vector; premultiplying this vector (which has unit length) by the matrix A will give a result vector whose length is the first (largest) singular value. We have thus found a vector which demonstrates the magnification factor which is the first singular value. The second row of the V matrix will give the second singular value and so on. If any of the singular values are zero, such as the last one, the corresponding row of the V matrix will give the combination of the columns of A that result in a zero vector for a result. The singular values can also be formed from linear combinations of the rows of matrix A in a similar way by using the columns of the U matrix.

If the matrix A is very close to a singular matrix, then the procedure of the previous paragraph can find the vector of unit length which comes closest to solving the system Ax = 0.

Solving Ill-Conditioned Systems

In this section, I'm going to show how to use the SVD to solve ill-conditioned systems. I'll be using the HP49G+ to do the computations. I have a special directory for trying out various SVD related schemes. Valentin's original ill-conditioned system will be solved. I created several variables in this directory, namely AA, BB, A, B, U, V and S. I use AA and BB to hold backups of Valentin's A and B matrices. The A and B variables hold the linear system under investigation and U, V and S hold the SVD of the A matrix. Sometimes A and B will hold AA and BB with small changes, so having AA and BB save one from having to type in Valentin's system again.

Some programs used are as follows (sometimes I leave results on the stack, sometimes not, with no particular organizational scheme in mind!):

→SV (this program computes the Singular Value Decomposition of a matrix on level 1 of the stack, and leaves the results, U V S, on the stack)

« SVD »

SV→ (this program expects the same objects on the stack that SVD would leave there, namely U V S. It then makes a suitable diagonal matrix from S and multiplies the 3 objects, re-creating the matrix whose SVD was in U, V and S.)

« 3 PICK SIZE OBJ→ DROP2 3 PICK SIZE OBJ→ DROP2 2 →LIST DIAG→ SWAP * * »

PSU (The product TRANSPOSE(V) * DIAG(1/s) * TRANSPOSE(U) is computed. This program expects an integer N on the stack. It then recalls U, V, and S from the variables and computes a pseudo inverse for the matrix whose SVD is stored in those variables. This pseudo inverse can be rank reduced by means of the integer initially on the stack when PSU is invoked. What it does is to zero N of the reciprocals of the singular values starting with the smallest and working toward the largest. If the matrix SVD stored in the variables is of a square matrix, and if N is 0, then the ordinary inverse is returned.)

« → N << U TRN V TRN S OBJ→ OBJ→ DROP →LIST 1E-499 ADD INV OBJ→ 1 SWAP 2 →LIST
→ARRY DUP SIZE OBJ→ DROP N - 2 →LIST { 1 1 } SWAP SUB 3 PICK SIZE OBJ→ DROP2 3
PICK SIZE OBJ→ DROP2 SWAP 2 →LIST DIAG→ ROT * * »

Cond (this program computes the condition number of a matrix on the stack as the ratio of the largest to the smallest singular values. Be sure to spell the name with a capital "C" and lower case "ond" so as not to conflict with the built-in COND command.)

« SVL DUP 1 GET SWAP DUP SIZE 1 GET GET / »

MADJ (This program mean adjusts the columns of a matrix on the stack. It computes the mean of each column and subtracts that value from all the elements of the corresponding column.)

« DUP TRN DUP SIZE OBJ→ DROP→ c r « r 1 2 ?LIST 1 CON * r / c DIAG→ c r 2 →LIST
1 CON * TRN - » »

CORM (This program computes the column-wise correlation matrix of a matrix on the stack. It uses MADJ. It only works right for a square matrix, but it doesn't check for this.)

« MADJ DUP TRN SWAP * DUP →DIAG OBJ→ OBJ→ DROP →LIST SQRT INV OBJ→ 1 →LIST →ARRY
DUP SIZE OBJ→ DROP DIAG→ SWAP OVER * * »

Three other programs I gave in another post, on the HP Museum web site (visit www.hpmuseum.com), named IREF, SRREF and TST0 should be in the directory, too.

First, type Valentin's system into AA and BB; the 7 x 7 design matrix into AA and the 7 x 1 column matrix into BB; then recall AA and BB and store them into the A and B variables. Recall A and press Cond and see 3.17E12; this is a very ill-conditioned system, and the usual rule of thumb says that you will lose about LOG10(3.17E12) ~= 12 digits in your solution. Fortunately, the HP49G+ uses 15 digits internally to solve the system with the B/A method, so we actually get a reasonable solution Clear the stack.

Type A →SV and when it's done (takes a while), use the blue left arrow key as a short-cut for store, and store the 3 objects on the stack into the variables S, V, and U in that order. Type U V S SV→ and see the re-constituted matrix A. Recall A and type -. You can see some tiny residual errors from all the number crunching that has occurred. Clear the stack.

Now type 0 PSU. You will see the inverse of A on the stack. (this assumes that the SVD of A is stored in U, V, and S. If not, then you will see the (pseudo) inverse of whatever matrix had its SVD stored in those variables.) Then recall B to the stack and press *. You have just calculated INVERSE(A) * B, which is a traditional way to solve the system. You won't get the same solution as if you calculate the inverse of A with the 1/x key because the A matrix is so ill-conditioned that numerical errors are becoming overwhelming, and cause differences in the two methods of computing the inverse. Recall the variable S and look at the last element of that vector (the smallest singular value of the matrix A); it is 1.257E-11. This is the distance to the nearest singular matrix; the small size of this number is an indication of severe ill-conditioning. Clear the stack.

When PSU computes the reciprocals of the singular values to create DIAG(1/s), the last one is very large, namely, 7.955E10. This is many orders of magnitude larger than any of the other singular value reciprocals, and its size is the reason the x = INVERSE(A) * B method has difficulties. If the solution vector x computed by this method is used with a perturbed A matrix to compute A * x = B' and the residual (B'-B) is computed, it can be shown that the perturbations can be magnified by a factor about equal to that very large last element in DIAG(1/s).

We can eliminate this difficulty by putting the number 1 on the stack and then executing PSU; then recall B and press *; this replaces that very large reciprocal of the last singular value by zero. We now have a solution for the system that has avoided the ill-conditioning of the nearness of the matrix A to a singular matrix. This solution has avoided the magnification of small perturbations (errors) in the A matrix at the expense of the size of the residual. If the solution is computed with the B/A method, the residual is 0. As we compute successive solutions with N PSU B * (N starting at 0 and increasing 1 at at time), the error magnification decreases because the largest singular value reciprocal in DIAG(1/s) is smaller, but the residual size increases. The residual size obtained with 1 PSU B * B - ABS is 2E-9. We can go further; compute 2 PSU B * B - ABS, dropping the last two singular value reciprocals from the solution, and we get a residual size of 7.6E-2. We can go further still, but the size of the residual should not be much larger than the expected error in the elements of the A matrix. Since the A matrix was a series of measurements with only two significant digits of accuracy, the error in each element should be taken to be +-.05 (1/2 LSD), so the solution given by 2 PSU B * is probably acceptable.

To see the power of the SVD method of solution, modify the A matrix (in the variable A) so that column 7 is identical to column 3. Now the condition number of the system is infinite and it cannot be solved by the B/A method, or the INVERSE(A)*B method, no matter how many significant digits your calculator or computer has. But type A →SV and then store the U V S objects off the stack into the variables U, V and S. Now type 1 PSU B * and see a solution with a residual size of about 3.7E-9, a quite respectable result.

Try modifying the A and B variables as I suggested in earlier posts and experiment with solutions you get with the SVD method vs. the traditional methods.

You can use the CORM command to see which of the columns of a matrix are highly correlated. If you want to see how well the rows are correlated, transpose the matrix first. Recall the AA matrix and transpose it; then type CORM. You will see that the {7 3} element of the correlation matrix is .999035, indicating that rows 3 and 7 are closely correlated. This is a clue as to how Valentin created this matrix. Normally, a random matrix won't have such a high correlation between any 2 rows. Test this by executing {7 7} RANM CORM repeatedly and examining the off-diagonal elements; you won't see .999 often.

Note from the editor: This article first appeared as multiple messages on a discussion forum in the HP Museum web site (www.hpmuseum.com), which is a web site that deals with vintage and new HP calculators. Rodger Rosenbaum has given his permission to publish the material on this web site.

BACK