Interpretation of Matrices
When I study and implement machine learning algorithms, it is crucial (and tricky) to use matrix-matrix multiplication, which generalizes matrix-vector and vector-vector products, to speed up the algorithms. However, the vectorized expressions are difficult to interpret without a solid linear algebra background. This post summarizes some basic concepts in linear algebra with a focus on interpretation, which can be very helpful for understanding machine learning algorithms such as neural networks (essentially a chain of matrix-matrix multiplications).
1. n linear equations with n unknowns
Example with n = 2 (two dimensions):
Row picture
This is the way we usually interpret the two equations above: just two lines in 2-D space, and the solution is the point that lies on both lines.

Column picture - linear combination of columns
Following the columns, we can rewrite the equations above as follows:
We can interpret the above equation as a linear combination of columns, which are vectors in 2-D; here the + is overloaded to mean 2-D vector addition, in contrast with the scalar addition of the row picture. The geometry is shown below.

When considering a higher-dimensional problem (say n = 10, i.e., 10 linear equations with 10 unknowns), it is not easy to imagine 10-D space in the row picture. In the column picture, however, the result is just a linear combination of 10 vectors.
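The two pictures can be checked numerically. The system below is a hypothetical 2x2 example (the post's own equations are not shown); the point is that the same solution vector serves as the intersection point in the row picture and as the combination coefficients in the column picture. A minimal NumPy sketch:

```python
import numpy as np

# Hypothetical system (illustrative only):
#   2x - y  = 0
#  -x + 2y = 3
A = np.array([[2., -1.],
              [-1., 2.]])
b = np.array([0., 3.])

# Row picture: the solution is the intersection point of the two lines.
sol = np.linalg.solve(A, b)
print(sol)  # [1. 2.]

# Column picture: the same numbers are the coefficients of the
# linear combination of A's columns that produces b.
combo = sol[0] * A[:, 0] + sol[1] * A[:, 1]
assert np.allclose(combo, b)
```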
2. Matrix multiplication as linear combination
Usually we do matrix multiplication by computing each cell of the result as the dot product of a row of the first matrix with a column of the second matrix. However, there is a very useful interpretation in terms of linear combinations, a core concept in linear algebra.
2.1 Linear combination of columns of matrix
Representing the columns of a matrix by colored boxes helps visualize this, as follows (the picture is from Eli Bendersky):

When a matrix multiplies a column vector, the result is a column vector that is a linear combination of the columns of the matrix, with the entries of the vector as coefficients. This idea generalizes to matrix-matrix multiplication: each column of the result matrix is the first matrix multiplied by the corresponding column (vector) of the second matrix. The following picture shows the idea.

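The column interpretation can be verified directly with NumPy; the matrices below are made up for illustration:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
x = np.array([10., 20.])

# A @ x is the combination x[0] * (column 0) + x[1] * (column 1).
assert np.allclose(A @ x, x[0] * A[:, 0] + x[1] * A[:, 1])

# For A @ B, each column of the result is A times the matching column of B.
B = np.array([[1., 0., 2.],
              [3., 1., 0.]])
C = A @ B
for j in range(B.shape[1]):
    assert np.allclose(C[:, j], A @ B[:, j])
```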
2.2 Linear combination of rows of matrix
Similarly, we can view the matrix in terms of its rows.

For matrix-matrix multiplication, each row of the result matrix is the corresponding row (vector) of the first matrix multiplied by the second matrix, i.e., a linear combination of the rows of the second matrix. The idea can be represented graphically as follows:

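The row interpretation checks out the same way; again the matrices are arbitrary examples:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[5., 6., 7.],
              [8., 9., 10.]])
C = A @ B

# Each row of C is the matching row of A times B,
# i.e. a linear combination of the rows of B.
for i in range(A.shape[0]):
    assert np.allclose(C[i, :], A[i, :] @ B)
    assert np.allclose(C[i, :], A[i, 0] * B[0, :] + A[i, 1] * B[1, :])
```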
2.3 Column-row multiplication
There is another interpretation of matrix multiplication: columns of the first matrix times rows of the second matrix, i.e., a sum of outer products.
Each column-times-row product above is a 3 by 3 matrix. In general, if the two matrices are m by n and n by p, the shape of the result matrix is m by p, and the result is the sum of the n matrices (each m by p) computed from the n column-row pairs.
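A quick sketch of the column-times-row view, with random matrices standing in for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 4))   # m x n
B = rng.random((4, 5))   # n x p

# Sum of n outer products: column k of A times row k of B.
C = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
assert np.allclose(C, A @ B)
```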
2.4 Block matrix multiplication
There is yet another interesting interpretation of matrix multiplication. It is often convenient to partition a matrix A into smaller matrices called blocks. We can then treat the blocks as matrix entries when doing matrix multiplication; multiplying block by block gives the same result as multiplying the full matrices.
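Block multiplication can be demonstrated by partitioning two 4x4 matrices into 2x2 blocks; the matrices here are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 4))
B = rng.random((4, 4))

# Partition each matrix into four 2x2 blocks.
A11, A12 = A[:2, :2], A[:2, 2:]
A21, A22 = A[2:, :2], A[2:, 2:]
B11, B12 = B[:2, :2], B[:2, 2:]
B21, B22 = B[2:, :2], B[2:, 2:]

# Multiply treating the blocks as entries, then reassemble.
C = np.block([[A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
              [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22]])
assert np.allclose(C, A @ B)
```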
2.5 Elimination matrices
The linear combination interpretation of matrix multiplication is very useful for understanding matrix transformations. In particular, row operations let us carry out elimination to solve a system of linear equations. Take the matrix multiplication AX = B for example: we want to choose the first matrix A so that it transforms matrix X into B.
Recall that each row of the result matrix is a row of A multiplied by X, which is a linear combination of the rows of X. Because the first and third rows of B are the same, the first and third rows of A should be the same as well.
2.6 Permutation of matrix
To exchange two rows of a matrix A, we just need to multiply by a suitable matrix on the left, as shown below. For example, the first row of the result matrix is the linear combination of the rows of the second matrix with coefficients from the first row of the first matrix. What we want there is the second row, so the second cell of that first row should be 1, and the first cell should be 0 so that it contributes nothing.
If we want to exchange the columns, we do a column operation instead:
In short, to do column operations the matrix multiplies on the right, and to do row operations it multiplies on the left.
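The left-versus-right rule for permutation matrices can be sketched as follows (example matrix chosen for illustration):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
P = np.array([[0., 1.],
              [1., 0.]])

# Multiplying on the left permutes rows; on the right, columns.
assert np.allclose(P @ A, [[3., 4.], [1., 2.]])   # rows exchanged
assert np.allclose(A @ P, [[2., 1.], [4., 3.]])   # columns exchanged
```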
3. Inverse and non-singular matrices
Suppose A is a square matrix. If there exists a matrix A^-1 such that A^-1 A = I = A A^-1, then A is invertible (non-singular).
The first matrix above cannot have an inverse. To see this, note that its first and second columns point in the same direction, so some nonzero linear combination of them gives 0; that is, Ax = 0 has a nonzero solution x. If A had an inverse, multiplying both sides by A^-1 would force x = 0, a contradiction.
If matrix A does have an inverse, then Gauss-Jordan elimination can find it: reduce the augmented matrix [A I] until the left half becomes I, and the right half is then A^-1.
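A minimal sketch of Gauss-Jordan inversion via the augmented matrix [A | I], with partial pivoting added for numerical safety (the test matrix is an arbitrary invertible example):

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Invert A by reducing the augmented matrix [A | I] to [I | A^-1]."""
    n = A.shape[0]
    aug = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column into place.
        pivot = col + np.argmax(np.abs(aug[col:, col]))
        aug[[col, pivot]] = aug[[pivot, col]]
        aug[col] /= aug[col, col]          # scale pivot row so the pivot is 1
        for row in range(n):
            if row != col:                 # eliminate the column everywhere else
                aug[row] -= aug[row, col] * aug[col]
    return aug[:, n:]

A = np.array([[2., 1.],
              [5., 3.]])
assert np.allclose(gauss_jordan_inverse(A) @ A, np.eye(2))
```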
Another property of an invertible square matrix is that transposing and inverting can be exchanged: (A^T)^-1 = (A^-1)^T.
3. Vector Spaces
A vector space is a "space" of vectors that satisfies some rules: we can multiply any vector v in the space by any scalar c, and add any two vectors, with the results staying in the space; that is to say, the space is closed under linear combinations. For example, R^2, the set of all 2-D real vectors, is a vector space.
A subspace is a vector space contained within another vector space: it consists of some or all of the vectors of the larger space and must itself satisfy the definition of a space (closure under linear combinations). For instance, there are four kinds of subspaces of R^3: R^3 itself, any plane through the origin, any line through the origin, and the zero vector alone.
Conversely, we can use some vectors or a matrix to construct a vector space, because all we need is linear combinations. Given a matrix, the linear combinations of all the columns of the matrix form a space, called the column space. For example:
The column vectors of A live in R^m (one component per row), and all their linear combinations form the column space C(A), a subspace of R^m.
4. Interpreting Ax = b
Take the above equation for example: we cannot find a solution for every b, because b can be any vector in R^m while the achievable right-hand sides form only the column space C(A). Ax = b is solvable exactly when b is in C(A).
Null space
In particular, for the equation Ax = 0, the set of all solutions x forms a subspace called the null space N(A).
5. Computing the solutions of Ax = b
Null space: b = 0
We can do elimination on matrix A to get an upper triangular matrix U, and then continue simplifying U to get the matrix R in reduced row echelon form. After a column exchange if necessary, R has the block form [I F; 0 0], where I sits on the r pivot columns; setting one free variable to 1 at a time gives the special solutions, which together span the null space.
b != 0
First we should consider whether the equation has a solution at all: as mentioned above, Ax = b is solvable when b is in the column space of A, i.e., C(A). Equivalently, after the elimination step, if a combination of rows of A gives a zero row, the same combination of the entries of b must give 0.
If the equation does have solutions, we can use elimination to find a particular solution x_p (set all free variables to 0 and solve for the pivot variables). Once we have one particular solution, the complete solution is the particular solution plus any vector x_n in the null space of A; that is to say, x = x_p + x_n.
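The particular-plus-null-space structure can be sketched numerically. The 3x4 system below is a hypothetical rank-2 example, with b chosen to lie in the column space; the null space basis is pulled from the SVD (right singular vectors with zero singular value), a convenient numerical stand-in for the special solutions from elimination:

```python
import numpy as np

# Hypothetical 3x4 system of rank 2 (so 2 free variables);
# the third row equals row1 + row2 and b respects that, so it is solvable.
A = np.array([[1., 2., 2., 2.],
              [2., 4., 6., 8.],
              [3., 6., 8., 10.]])
b = np.array([1., 5., 6.])

# One particular solution (lstsq returns an exact solution since b is in C(A)).
x_p = np.linalg.lstsq(A, b, rcond=None)[0]

# Null space basis: right singular vectors whose singular values vanish.
_, s, Vt = np.linalg.svd(A)
rank = int((s > 1e-10).sum())
null_basis = Vt[rank:]

# Complete solution: x_p plus any combination of null space vectors.
x = x_p + 3.0 * null_basis[0] - 2.0 * null_basis[1]
assert np.allclose(A @ x, b)
```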
Solution discussion – m by n matrix A of rank r
Full column rank, i.e., r = n
- There are no free variables
- The null space is {0}
- Unique solution if it exists (0 or 1 solution)
- The reduced row echelon form is R = [I; 0] (the identity on top of m - n zero rows)
Full row rank, i.e., r = m
- Ax = b can be solved for every b, because every row has a pivot and there are no zero rows
- There are n - r free variables, so there are infinitely many solutions when n > m
- The reduced row echelon form is R = [I F] (after a column exchange if necessary)
Full column and row rank, i.e., r = m = n
- A is an invertible matrix
- Unique solution
- The reduced row echelon form is the identity I
Not full rank, i.e., r < m and r < n
- There are either no solutions or infinitely many solutions
- The reduced row echelon form is R = [I F; 0 0]
6. Independence, span, basis, dimension
Independence describes a relation between vectors. Vectors x1, ..., xn are independent if no linear combination c1 x1 + ... + cn xn gives the zero vector, except the trivial combination with every ci = 0.
Vectors x1, ..., xn span a space when the space consists of all linear combinations of those vectors.
We are most interested in the case where the vectors spanning a space are also independent, which means we have the right (minimal) number of vectors to span the space; the notion of a basis captures this idea. A basis for a vector space is a set of vectors with two properties: (1) they are independent; (2) they span the space.
For a given space such as R^n, every basis contains the same number of vectors, and that number is called the dimension of the space.
7. Orthogonal
Vector x is orthogonal to vector y when x^T y = 0.
8. “Solve” when there are no solutions
We know that Ax = b may have no solution, which often happens when there are more equations than unknowns (m > n): b is usually not in the column space of A.
When there is no solution, the best we can do is solve the closest solvable problem: project b onto the column space of A and solve A x_hat = p instead, where p is the projection.
For simplicity, we first consider projecting a vector onto a vector (see the left diagram above). We use p to denote the projection of b onto a; p is some multiple of a, i.e., p = x_hat * a. The error e = b - p must be perpendicular to a, so a^T (b - x_hat * a) = 0, which gives x_hat = (a^T b) / (a^T a) and p = P b with P = (a a^T) / (a^T a).
Notice that P is an n by n matrix if the vectors b and a have n elements, and it is determined only by the vector a we project onto. P is called the projection matrix, and we can interpret it as a function built from a that projects any other vector onto the line through a. Additionally, we can observe that P^T = P and P^2 = P (projecting twice changes nothing).
Next we consider projecting a vector onto a space (see the right diagram above). The plane is the column space of A = [a1 a2], and the error e = b - p is perpendicular to that plane. The projection of b onto C(A) is p = A x_hat, where the normal equations A^T A x_hat = A^T b give x_hat = (A^T A)^-1 A^T b and P = A (A^T A)^-1 A^T.
From the above equation we can see that (b - A x_hat) is in the null space of A^T; in other words, the error e is perpendicular to the column space of A.
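The normal equations can be sketched as a tiny least-squares fit; the data points below are a made-up example of fitting a line b ≈ C + D*t through three points:

```python
import numpy as np

# Fit b ~ C + D*t by projecting b onto the column space of A = [1, t].
t = np.array([1., 2., 3.])
b = np.array([1., 2., 2.])
A = np.column_stack([np.ones_like(t), t])

# Normal equations: A^T A x_hat = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
p = A @ x_hat          # projection of b onto C(A)
e = b - p              # error, perpendicular to the columns of A
assert np.allclose(A.T @ e, 0)
```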
9. Determinant
Square matrices are relatively easy to deal with, and the determinant is a number associated with a square matrix, written det A or |A|. Its key properties are:
- Determinant of I is 1
- Exchange two rows of a matrix: reverse sign of determinant.
- The determinant is linear in each row separately (with the other rows fixed); note that det(A+B) != det(A) + det(B) in general.
- If two rows are equal, the determinant is 0. We can see this by exchanging the two equal rows: the determinant must flip sign yet stay the same, so it must be 0.
- Subtracting l * row i from row k doesn't change the determinant, i.e., the elimination process preserves it.
- Row of zeros, the determinant is equal to 0
- For a triangular matrix, the determinant is the product of the pivots (the diagonal entries), which follows from the linearity property above.
- det A = 0 exactly when A is singular, and det A != 0 when A is invertible.
- det(AB) = det(A)*det(B), and therefore det(A^-1) = 1/det(A)
- det(A^T) = det(A)
Cofactor of one entry
The cofactor of entry a_ij is C_ij = (-1)^(i+j) det(M_ij), where M_ij is the submatrix obtained by removing row i and column j. The determinant can then be expanded along any row, e.g., det A = a_11 C_11 + a_12 C_12 + ... + a_1n C_1n.
Another very good interpretation is that the determinant is the (signed) volume of a box whose edges are the row vectors.
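Several of the properties above can be spot-checked numerically on random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 4))
B = rng.random((4, 4))

# Multiplicative property and transpose invariance.
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))

# Exchanging two rows reverses the sign of the determinant.
A_swapped = A[[1, 0, 2, 3]]
assert np.isclose(np.linalg.det(A_swapped), -np.linalg.det(A))
```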
10. Eigenvalue and eigenvector
For most vectors x, the result vector Ax points in a different direction from x. The special vectors for which Ax is parallel to x are the eigenvectors: Ax = λx, where the number λ is the eigenvalue.
The eigenvalues are found by rewriting the equation as (A - λI)x = 0 and solving det(A - λI) = 0.
Notice that A - λI must be singular for a nonzero eigenvector x to exist, and the eigenvectors for each λ form the null space of A - λI.
Now that we know how to compute eigenvalues and eigenvectors, how do we use them? Suppose the n by n matrix A has n independent eigenvectors, and put them in the columns of a matrix S. Then AS = SΛ, where Λ is the diagonal matrix of eigenvalues, so S^-1 A S = Λ and A = S Λ S^-1 (diagonalization).
In fact, the eigenvalues and eigenvectors reveal what is going on inside a matrix and help us understand powers of a matrix. For example, A^k = S Λ^k S^-1: taking powers of A only takes powers of the eigenvalues.
Based on the above equation, A is sure to have n independent eigenvectors, and hence is diagonalizable, if all the eigenvalues are distinct. (With repeated eigenvalues there may or may not be enough independent eigenvectors.)
We can use eigenvectors to solve difference equations of the form u_{k+1} = A u_k: the solution is u_k = A^k u_0 = S Λ^k S^-1 u_0.
Using the above idea to solve the Fibonacci problem [0, 1, 1, 2, 3, 5, 8, 13, ...]: how do we get a far-away term quickly?
We want the form u_{k+1} = A u_k, so let u_k = [F_{k+1}, F_k]^T. Then F_{k+2} = F_{k+1} + F_k gives u_{k+1} = [[1, 1], [1, 0]] u_k, and the growth rate of the Fibonacci numbers is governed by the largest eigenvalue, (1 + sqrt(5))/2.
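The Fibonacci recurrence can be sketched through diagonalization; since the matrix is symmetric, its eigen-decomposition is well behaved, and powers of A reduce to powers of the two eigenvalues:

```python
import numpy as np

# u_{k+1} = A u_k with u_k = [F_{k+1}, F_k]; diagonalize A = S Λ S^-1.
A = np.array([[1., 1.],
              [1., 0.]])
lam, S = np.linalg.eig(A)
u0 = np.array([1., 0.])          # [F_1, F_0]

k = 20
# A^k u0 = S Λ^k S^-1 u0 -- the power acts only on the eigenvalues.
uk = S @ np.diag(lam ** k) @ np.linalg.solve(S, u0)
F_20 = round(uk[1])              # uk = [F_21, F_20]
assert F_20 == 6765
```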
11. First order differential equation
We can also arrange two or more differential equations into matrix form du/dt = Au and try to solve them from the matrix perspective. For example:
We can see from the above equations that u1 and u2 affect each other, and all of that coupling information is contained in the matrix A. So we may solve the differential equations as a single system, usually called a linear dynamical system.
We can use the eigenvectors and eigenvalues of matrix A to solve the above system. The result is very simple: u(t) = S e^{Λt} S^-1 u(0), i.e., each eigenvector component of u(0) just grows or decays like e^{λt}.
In the above equation, S and Λ are the eigenvector and eigenvalue matrices of A, and S^-1 u(0) expands the initial condition in the eigenvector basis.
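A minimal sketch of this recipe on a hypothetical coupled system du/dt = Au; the symmetric example matrix has eigenvalues -1 and -3, so the closed-form answer is a mix of e^{-t} and e^{-3t}:

```python
import numpy as np

# Hypothetical coupled system: du1/dt = -2 u1 + u2, du2/dt = u1 - 2 u2.
A = np.array([[-2., 1.],
              [1., -2.]])
u0 = np.array([1., 0.])
t = 0.5

# u(t) = S e^{Λt} S^-1 u(0); eigh works since A is symmetric.
lam, S = np.linalg.eigh(A)
u_t = S @ np.diag(np.exp(lam * t)) @ np.linalg.solve(S, u0)

# Cross-check against the hand-derived solution:
# u0 = 0.5*[1,1] + 0.5*[1,-1], eigenvalues -1 and -3 respectively.
expected = 0.5 * np.exp(-t) * np.array([1., 1.]) \
         + 0.5 * np.exp(-3 * t) * np.array([1., -1.])
assert np.allclose(u_t, expected)
```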
12. Symmetric matrix and positive definite
Symmetric matrices are very special and very good matrices: their eigenvalues are REAL and their eigenvectors can be chosen PERPENDICULAR. In the usual case A = S Λ S^-1, but for a symmetric matrix A = Q Λ Q^T, where Q has orthonormal eigenvector columns (so Q^-1 = Q^T).
In addition, A = Q Λ Q^T = λ1 q1 q1^T + λ2 q2 q2^T + ...: every symmetric matrix is a combination of projection matrices onto its eigenvector directions.
Then what about the signs of the eigenvalues? A symmetric matrix with all eigenvalues positive is called positive definite; equivalently, x^T A x > 0 for every nonzero vector x.
Another fact is that the signs of the pivots are the same as the signs of the eigenvalues: the number of positive pivots equals the number of positive eigenvalues.
We can interpret the quadratic form x^T A x geometrically: for a positive definite A, the equation x^T A x = 1 describes an ellipse (an ellipsoid in higher dimensions).
Using eigenvectors and eigenvalues to diagonalize the quadratic form: with the change of variables x = QX we get x^T A x = X^T Λ X = λ1 X1^2 + λ2 X2^2 + ....
So the cross terms disappear in the new coordinates.
So in the xy system, the axes of the ellipse are along the eigenvectors of A, while in the XY system the ellipse is axis-aligned, with half-axis lengths 1/sqrt(λi).
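These facts about symmetric matrices can be spot-checked with a small positive definite example (eigenvalues 9 and 1):

```python
import numpy as np

A = np.array([[5., 4.],
              [4., 5.]])                 # symmetric

lam, Q = np.linalg.eigh(A)               # real eigenvalues, orthonormal Q
assert np.all(lam > 0)                   # positive definite
assert np.allclose(Q @ np.diag(lam) @ Q.T, A)   # A = Q Λ Q^T

# x^T A x > 0 for random nonzero x, as positive definiteness promises.
rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.standard_normal(2)
    assert x @ A @ x > 0
```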
13. Singular value decomposition (SVD)
For a full rank square matrix we can diagonalize it as A = S Λ S^-1, and for a symmetric matrix as A = Q Λ Q^T. For a general m by n matrix neither applies, and the singular value decomposition fills the gap: A = U Σ V^T, where U and V are orthogonal and Σ is diagonal with nonnegative singular values.
Based on the symmetric matrices A^T A and A A^T, we can find V, U, and Σ.
Because A^T A = V Σ^T Σ V^T and A A^T = U Σ Σ^T U^T, the columns of V are the eigenvectors of A^T A, the columns of U are the eigenvectors of A A^T, and the singular values are the square roots of their shared nonzero eigenvalues.
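The connection between the SVD and the eigen-decomposition of A^T A can be verified on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 2))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Squared singular values are the eigenvalues of A^T A.
lam, _ = np.linalg.eigh(A.T @ A)
assert np.allclose(np.sort(s ** 2), np.sort(lam))

# And the factors reassemble A: A = U Σ V^T.
assert np.allclose(U @ np.diag(s) @ Vt, A)
```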