GNU Octave SOCIS16 : Improve iterative methods for sparse linear systems

CGS, TFQMR and final revision of all the codes

2016-08-30T09:34:00.000-07:00

Dear all,

During these weeks I worked on improving CGS, which I rewrite from scratch. Then I implemented TFQMR, which is not available in Octave. Finally I made a revision on the codes that I improved the past weeks.
All the file revisioned (gmres, pcg, bicg, bicgstab) and the new ones (cgs and tfqmr) are available here:

https://CrisDorigo@bitbucket.org/CrisDorigo/socis16-octave-iter_meth

in the folder definitive_codes

CGS

As written in my second post, the actual Octave's CGS implementation is not very clear. Indeed I didn't understand most of the code (I was not able to recognize how the is was adapted). Moreover I tested it and the results are not correct (in my second post there is an example of this).
Then I decided to rewrite if from scratch following closely the algorithm given by Saad in the reference book suggested.
I wrote it following the pattern that I used in the other codes (i.e. to wrote three subscripts, each for the different cases: all matrices, all functions handle or mixed). Iit has also the same characteristics of the other methods improved, for example: the approximation returned is the one with the minimum residual, there are five types of flag (0 convergence, 1 maximum number of iterations reached, 2 preconditioner singular, 3 stagnation, 4 division by zero during the process)... I don't go in deep about this details to make less boring this post, since they are already explained in my previous posts.

TFQMR

In my last post I wrote that the mentors told me to implement a new algorithm not available in Octave. Since I had no suggestion about which method write, and the community had no particular needs, I chose the transpose-free qmr algorithm because it is the only methods treated in the Saad book.
Also with this algorithm, I use the same pattern of the others (i.e. the three subscripts) and synchronized in such a way it has the same type of input and output as the others.

One mention to the number of iterations effectively performed. In the Saad book, the method has different behaviours if the iteration number is even or if it is odd, but at every iteration the approximation is computed. Making some test in Matlab with its tfmqr, I noticed that if the method converges, the vector of the residuals in long two times the number of iteration that Matlab tells that are necessary to convergence. Then I think that Matlab counts as one iteration the block (odd iteration + even iteration), i.e. I think that Matlab_it = odd_Saad_it + even_Saad_it. In this way the iterations performed by Matlab is two times the effective iterations performed. I decided to count the iterations in this way to be compatible.

LAST REVISION

Finally I checked (one more time) the codes that I improved in these weeks (i.e. pcg, gmres, bicg, bicgstab).
Indeed I updated the documentation (for example: making all similar, giving more details about the preconditioning, adding the reference book,...).
Then I checked the codes and I tried to make it more readable and to adapt it at the Octave conventions.
Then I added some tests in all the codes, mostly to check if all the subscripts works (i.e. checked every combination of A matrix/function, M1 matrix/function, M2 matrix/function), but also to check that the flags works well (I checked only the flag 0, 1 and 2, since the others are difficult to reproduce) and also to check that these methods solves also complex linear systems.

The mentors suggest me to make a patch with the methods improved (pcg, gmres, bicg, bicgstab and cgs), in such a way it can be easily the review of them and if it is in time (maybe) they can be included in Octave 4.2.
This patch is available here

https://CrisDorigo@bitbucket.org/CrisDorigo/socis16-octave-iter_meth

and it is the file SoCiS16_Improve_iterative_methods.diff

I made this patch from the Octave changeset 22406:dc4e2203cd15 of the Octave 4.1.0+ version.

THANKS TO OCTAVE

Since the official deadline for the SoCiS is tomorrow (the 31-th of August), this is my last post in this blog.

To make a small summary, during this summer of codes I make an improving on
most of the iterative methods to solve linear systems. In particular I improved pcg, gmres, bicg, bicgstab and cgs.
I wrote for all of these methods three subscripts to make them more efficient, dividing the particular cases "matrix", "function handle" and "mixed", in such a way that, according to the input data, the method uses the appropriate subscripts to make computations.
I tried to make these methods more compatible with Matlab (for example noticing that Matlab gives as output the approximation with the minimum residual and not the last performed, as used in Octave, or adding the possibility of "flag =2", i.e. singular preconditioner).
I tried to fix some "non-clear" situation that I noticed, for example pcg does not noticed if the matrix was not Hermitian positive definite if it was a complex matrix, or to face the problem of a division by zero in bicg, bicgstab and cgs.
Then I implemented tfqmr which is available in Matlab, but not in Octave.

Unfortunately, there are other two methods that needs an improve (pcr and qmr), but due to unexpected problems with the above methods I had not enough time to improve also these. When my scholastic commitments are not so big, I want to try to improve also these two methods.

I want to thank all the Octave community that gives me this fantastic opportunity to face a big project as this and then to learn much theoretically and practical things about these iterative solvers for linear systems.
I want to thank specially the mentors, who were always available to give suggestions, advices and ideas of how to proceed in the work, mostly when came some unexpected problems.
I hope that this improving will be useful to Octave, and to contact me if there is something not clear or wrong.
I also hope to continue to help the Octave community in the following times.

One more thanks to all of you.

BICG and BICGSTAB

2016-08-02T05:52:00.001-07:00

Dear all,

In this last period I worked on the BICG and BICGSTAB algorithm.
You can find the codes in the folder bicg_bicgstab at
https://CrisDorigo@bitbucket.org/CrisDorigo/socis16-octave-iter_meth

Since the main loop of both of them have more or less 30-40 lines of code, I wrote them from scratch, because in my opinion was easier instead to modify and to adapt the original code at the many situations that occurred.

The strategy that I used is the same of PCG and GMRES: for each algorithm I wrote three sub-scripts to subdivide the cases which: (a) the matrix and the preconditioners are all matrices, (b) all functions handle or (c) some are matrices and some functions handle.

As for PCG and GMRES, I tried to make these codes compatible with Matlab.

In my BICG and BICGSTAB codes there are not actually the documentation, but I'll add them as soon as possible

Main modifications on BICGSTAB and BICG

Many of the changes that I applied at the BICGSTAB and BICG were quite similar to those that I made in PCG and GMRES. I make a little list to summarize them:

Now the approximation of the solution X returned is the one that has the minimum residual and not the last one computed by the algorithm. Also the output variable ITER contains the iteration in which X was computed, not the number of iterations performed.
In the original Octave BICGSTAB code the stagnation criterion is
resvec(end) == resvec(end - 1).
Instead in the BICG code it is: res0 <= res1 (where res1 is the actual residual and res0 the previous)
I changed this criterion with:
norm(x - x_pr) <= norm(x)*eps
(where x_pr is the previous iteration approximation)
This criterion is more efficient, since it checks the variation of the approximations and not the variation of the residual, and it seems more similar to the "stagnation" described in the Matlab help. Moreover in BICGSTAB (and in BICG) there are many fluctuations of the residuals (they are not decrescent in general), so it's better to check directly the approximations.
Due to these fluctuactions, the BICG criterion is wrong, since in this manner the Octave BICG usually stops also if it is not necessary.
I added the "flag = 2" criterion, the one that Matlab describes as "preconditioner matrix ill-conditioned", that in Octave is skipped. The check that I implemented is for the singularity of the preconditioner matrix M, since seems that in Matlab this flag appears only in this situation.

Now I will focus on some situations regarding BICGSTAB:

I added the "half iterations" as in Matlab: studying the Saad implementation, I noticed that the approximation X is made in two steps at every iteration, indeed it is defined as:
x = x + alpha*p + omega*s
Then to be compatible with Matlab (since p/alpha and s/omega are not related and computed in different situations), I set x = x + alpha*p (and its residual) as the half iteration, and x = x + alpha*p + omega*s as the complete iteration.
In this way the "real" total number of iteration performed (recalling that ITER contains the iteration which X was computed) are
(length(RESVEC) - 1) / 2.
There was a problem when the matrices A, M1, M2 and/or the right-hand-side b are complex, indeed many times the residual blow up and the method does not converge. In the Saad reference book (where it was taken the Octave BICGSTAB, I think), the method is formulated only for the real case. The only difference is that using this formulation, some inner product are inverted (e.g. it uses <v, w>, instead of <w, v>). In the real case there is no problem since the inner real product is symmetric, but in the complex not, indeed the result of <v, w> is the complex conjugate of <w, v>.
In the Matlab help I noticed that there is the possibility of "flag = 4". I tried to reproduce it, then I used the same input data in the Octave BICGSTAB: the approximation X was all composed by NaN, and also for the residuals. Then I study in deep the Octave code and the reference book. I discovered that ih this situation there was some diviosn by zero probably caused by the Non-Symmetric Lanzcos Biorthogonalization (that is used in both BICG and BICGSTAB).
With this Lanczos algorithm is possible the situation of a "serious breakdown" (not going into deep details, but it happens when a certain inner product is zero, instead of a non-zero number), and there is not easy examples to reproduce it. The refernce book suggest two possibility to avoid this situation: to restart the Lanczos biortho, or to use a difficult variant called "Look-Ahead Lanczos".
I tried to study the "Look-Ahead" but in the Saad book is only mentioned the general idea, and it is not simple to implement and to understand quickly. Then I tried the second strategy it is not efficient (an easy experiment to try this strategy is: run BICGSTAB to find the last iteration without NaN, then run another time the BICGSTAB with the output X as initial guess), indeed the method does not converge and the residual does not decrease.
Then (since in Matlab is not mentioned if it uses or not the Look Ahead) in the BICGSTAB implementation that I wrote I don't use any strategy and I set "flag = 4" when the guilty inner product is zero.
To see an example of this situation in the Octave BICGSTAB see the file example_bicgistab.m

Instead, during the implementation of BICG there was not (for now) strange situations. For both BICG and BICGSTAB I tried the same tests that are given in their Octave implementations and I add some test to try the "flag =2" and "flag = 4" situations.

DURING THE REMAINING TIME

I know that to fix PCG, GMRES, BICG and BICGSTAB I used a lot of time. This because in their implementation emerged a lot of strange situations that needed a deep revision of these algorithms and many times also a discussion (with the mentors or the community) of how to proceed to solve particular problems. Also the mentors told me that they expected less time/work to improve and make a revision of the codes already implemented.
They also told me to spend the remaning time of the SoCiS to implement some methods not present in Octave. For this, I want to ask you if there is one of the suggested algorithms (minres, symmlq, symmetr, tfqmr, bicgstabl or lsqr) that you prefer to implement.

As usual, if there is something wrong/not clear, please contact me via e-mail or write a comment.

Pcg and Gmres

2016-07-18T06:58:00.000-07:00

Hi all,

during these two weeks I tried to fix some problems in pcg and gmres.
You can find the codes at
https://bitbucket.org/CrisDorigo/socis16-octave-iter_meth/src
in the folder "pcg_gmres".

PCG

Question on my previous post

In my previous post I had some problem in pcg with the check if the matrix was or not positive definite. Indeed the check done in the Octave code was not precise, especially if involved complex numbers. (To recall: in the algorithm is computed alpha = (r' * r) / (p' * (A*p)) and the control was: if alpha <= 0, break. The problem is that also if alpha was complex the algorithm must stop, moreover for Octave every complex number is greater than 0, so pcg didn't notice if alpha has an imaginary part).

After a (brief) discussion with the Octave community I changed the control on alpha. First of all it checks separately the numerator and the denominator of alpha (because if there is a preconditioner matrix then the definition of alpha "slightly" changes and also the numerator can be negative and/or complex), then the algorithm stops if

(abs (imag (num)) >= eps (real (num))*tol/eps) or real (num) <= 0

(the same control is made for den).

In this way the algorithm stops if really the numerator (or denominator) of alpha is negative or if it has an imaginary part "too big" (this is relative to the tolerance, because we thought that it is not the case to stop the algorithm if the imaginary part is, for example, 1e-16). If this happen then the FLAG output is setted as 4.

Unfortunately, alpha can be real and positive at every iteration also if the matrix is not positive definite, so there are cases when this check doesn't works, but there are nothing that we can do.

Other changes

I made other changes to make the code more compatible with Matlab. The most relevant are:

Making some tests with Matlab I noticed that the output approximation X is not the last computed, but the one with the minimum residual and the output variable IT gives the iteration which X was computed. The Matlab help is not very explicit about this. I noticed this fact because controlling its outputs, sometimes lenght(RESVEC) is greater than IT, where RESVEC is the vector of the residuals. So at first I thought that there was something wrong (it was strange that there are 10, 20 or 30 residuals more than iterations), then I noticed this "minimum" relation. Then I modified the pcg in such a way also the Octave one has this output.
I added the check about the stagnation. Indeed there is the possibility that the algorithm stagnates (in general when the (preconditioned) matrix A is not positive definite but the algorithm doesn't notice it). So I add the control:
if (norm (x - x_pr) <= eps*norm (x)) , where x_pr is the approximation of the previous iteration, then the algorithm stagnates and the FLAG output is setted as 3.
I added a check to control if the preconditioner matrix is singular, and if this happens then FLAG = 2. To do this, I use a try-catch approach:

try

warning("error","Octave:singular-matrix");

. . .   (pcg computations)

catch
flag = 2;
end_try_catch

In this way the warning "Octave:singular-matrix" is setted as an error, and with the try-catch I control if this error appears.
I used this approach because in Matlab and Octave there are different results when we try to solve M \ b, with M singular:
In Matlab there is always a warning and the output has some NaN or Inf;
I Octave the warning appears only the first time that we try to solve it and the output is the least squares minimization.
Unfortunately, since in Octave this warning appears only the first time, if we use pcg for a second time (without quitting Octave) with the same linear system with singular M, then FLAG = 2 is not setted.
There is another case where this approach doesn't work: when the matrix is setted as "Diagonal Matrix" then this warning doesn't appear. For example:

octave-cli:3> M1 = eye(3);
octave-cli:4> M1(3,3) = 0
M1 =

Diagonal Matrix

   1   0   0
   0   1   0
   0   0   0

octave-cli:5> M2 = [0 0 0;0 1 0; 0 0 1]
M2 =

   0   0   0
   0   1   0
   0   0   1

octave-cli:6> M1 \ ones(3,1)
ans =

   1
   1
   0

octave-cli:7> M2 \ ones(3,1)
warning: matrix singular to machine precision
ans =

   0
   1
   1

In this case M1 and M2 are different singular matrices, but if I try to solve a linear system with them there is a warning only with M2.
I setted the output flag as in Matlab:
FLAG = 0 : algorithm converged
FLAG = 1 : maximum number of iterations reached
FLAG = 2 : preconditioner matrix is singular
FLAG = 3 : stagnation
FLAG = 4: (preconditioned) matrix A not positive definite

About FLAG = 2 and FLAG = 4:

"FLAG = 2": the Matlab help tell that this flag appears when M is ill-conditioned, but the only case in which I can reproduce this flag is when M is singular. But singular and ill-conditioned matrix are different concepts, (ill-conditioned when the condition number is high, singular when det(A) is zero). Indeed, for example, Hilbert matrices are ill-conditioned but not singular. Since in Matlab I had FLAG = 2 only when M singular, in the code and in my pcg help is written that this flag appears only when M is singular, and not when M ill-conditioned.

"FLAG = 4": Also with this flag Matlab is not very clear. In the help is written that this flag appears when the algorithm can not continue due to some values too big or too small. But I can reproduce it only when A is not positive definite, so in my pcg help I wrote that it appears only when the algorithm detect that A is not positive definite.
I fixed the output strings that appears when there are asked less than 2 outputs and I wrote them in such a way they are similar to the Matlab ones.
I update the pcg help, in such a way it is synchronized with these changes.

GMRES

More or less I made the same changes of the pcg algorithm also in gmres (the discussion about the positive definitiveness of the matrix doesn't care here, since the gmres algorithms must works with any matrix, so there are 3 types of flag and not 4 as in the pcg).
One little things about the variable IT. In both Matlab and Octave, there are written in the help that in the variable IT are memorized the inner and the outer iterations, but they are not so clear about what they are. For this reason in the help I added a little deep explanation about what they are and how they are related with the total number of iterations. In particular the inner iterations are the iteration before that the restart is applied, instead the outer iterations count how many times the restart is applied. Then the total number of iterations are:
(Outer Iterations)*RESTART + (Inner Iterations).

I think that the most of the work about these two algorithms is done. Up to small changes and/or bugs that I can find trying my codes I think that they are more or less definitive.

If you find some bugs, you find something not clear, or have some advices please contact me.

Pcg analysis

2016-07-03T02:59:00.002-07:00

Dear all,

In this second and half week I worked on the pcg to try to improve it.
All the codes can be found at
https://bitbucket.org/CrisDorigo/socis16-octave-iter_meth/src
in the folder "pcg_codes".

The first thing that I made is the private function __checkandstring__ (the name is not definitive).
This function checks if the input argument A, b, M1 and M2 are consistent, i.e. if A, M1, M2 are matrices or function handle, and if A is a matrix, checks also that the number of columns of A are equal to the number of rows of b.
In addition, this function has as output two strings: is_prec and type.
The first one can be "prec" if M1 is not empty, or "no_prec" otherwise (I check only M1 because in the pcg documentation is written that if M1 is empty then no preconditioner is applied).
Instead the string type can be "matrix", "function_handle" or "mixed". If M1 empty then type is "matrix" if A is a matrix, "function handle" otherwise.
If M1 is not empty then the string type is "matrix" if A, M1, M2 matrices, "function_handle" if they are all function handle or "mixed" if not all of them are matrices or function handle.
I made this function as general as possible to use it also for the other algorithms that need an improvement in my project.
You can see the code in the function __checkandstring__.m.

After that I take the pcg Octave code and I adapted it to the different situation:
I made a switch-case that treat differently the three cases mentioned before ("matrix", "function handle" and "mixed"). Moreover the cases "matrix" and "function handle", have an if block that check if there exist the preconditioners: if they are not passed, then applies the unpreconditioned conjugate gradient, otherwise applies the preconditioned one.
For the "mixed" case instead, differently from my previous post (in which I said that if A, M1 and M2 are not of the same type, then I set them as functions handle) I check every time the type of A / M1 / M2 and if it is a matrix then I apply the simple matrix-vector product, if it is a function handle then I made the evaluation.
You can see this code in the function pcg_tmp_02.m.

I made this choice because I noticed that it is more (time) efficient than to transform all in functions handle. Indeed, before the two codes mentioned above, I made a function similar to __checkandstring__ that in case A, M1 and M2 not of the same type then transform them all in function handle, and then I made a version of pcg that works with this function (you can see these code in the files __check02__ and pcg_tmp_01 respectively). Then I tested the octave pcg, pcg_tmp_01 and pcg_tmp_02 20 times on the same example, with and without preconditioners, and passing all the variables as matrices, as function handles and not all of the same type. At the end I plotted the times and I compared it and I noticed that the better performs are given form pcg_tmp_02.
You can see this example in the file example_pcg.m.

Talking with the mentors, they told me that (using as example pcg_tmp_02) the code is too long. This because the pcg code is "repeated" 5 times:

matrix case without preconditioners
matrix case with preconditioners
function handle case without preconditioners
function handle case with preconditioners
mixed case (it is only with preconditiones because otherwise there is only A and it is or a matrix or a function handle)

They suggested me to substitute this switch-case approach with another approach, for example to use nested function or to insert some scripts in the private folder. They told me to ask to all the maintainers which is your preferred approach or to suggest me another one in case.

Complex case

Making some tests with complex matrices, I noticed strange behaviour of the pcg.
For example if I test the Octave pcg with the matrix

A = [2 , 2+1i, 4;
        2-1i , 3, 1i;
        4 , -1i , 1 ]

that is an hermitian matrix, but not positive definite (indeed its eigenvalue are not all positive: -3.16512506440744e+00   2.85307928516935e+00 6.31204577923809e+00 )
and with right-hand-side

b = [ 5.59693687377178e-01;
       2.42878791850453e-02;
         9.77795880620014e-01 ]

the Octave pcg doesn't notice that this matrix is not positive definite.
Indeed the result is:

x = pcg(A,b)
warning: pcg: maximum number of iterations (3) reached
warning: the initial residual norm was reduced 2.12736e+15 times.
x =

   2.04219909327648e-01 - 3.21781116721895e-02i
-1.11630690554651e-01 + 5.38299446813016e-02i
   1.07086298628122e-01 + 1.70817561341068e-02i

Studying the code, I notice that the only check for the positive definitiveness of the matrix is on alpha. Not going into deep details, this alpha is necessary to compute the approximant solution at every step (indeed at every iteration, x = x + alpha*p for a certain p).
If there are no preconditioners (as in our example) alpha is defined as:
alpha = < r, r > / < p, A*p >
(where < , > is the vector scalar product, and r is the residual of the previous iteration).
Then, since we suppose that A is positive definite, alpha must be real and positive (both in case A real or complex matrix).
Since the numerator is positive for every r, if alpha is negative then surely A is not positive definite. Indeed the check in the Octave pcg code is:

if (alpha <= 0.0)
      ## Negative matrix.
      matrix_positive_definite = false;
    endif

If A is not positive definite then < p, A*p> is not always negative, so if we are "lucky", if we pass a not positive definite matrix in pcg and from the computation alpha is positive at every iteration, there is nothing that we can do.
But if we use A and b definite as previous, pcg makes three iterations and the alpha are:

alpha = 2.11061741184511e-01 - 6.08511684384459e-20i
alpha = -5.59991838158184e-01 + 1.05798887633281e-17i
alpha = 1.48434184396776e-01 + 8.13133738226767e-18i

(the function pcg01.m in the folder "pcg_codes" is the octave version of pcg that print at every iteration alpha)

We notice that these three alpha have an imaginary part, but it is under the machine precision, so they are not computationally relevant.
But the second alpha is (obviously) negative, and the pcg doesn't notice it!

This because for Octave:

octave4.1-cli:29> -5.59991838158184e-01 + 1.05798887633281e-17i < 0
ans = 0

and

octave4.1-cli:30> -5.59991838158184e-01 + 1.05798887633281e-17i > 0
ans = 1

Making some tests I notice that a number with an imaginary part is greater than 0, also if this imaginary part is negative, indeed

octave4.1-cli:31> -3-1i > 0
ans = 1

I don't understand very well what Octave do to compare two complex numbers (I think that it compares the modules, i.e. if we want to verify if a < b, where at least one of a or b is complex, then it compares abs(a) and abs(b), but I'm not sure).
I compared some complex numbers also in Matlab, and I think that it compares only the real part (and "forgetting" the imaginary part).

I talked with the mentors about this fact, but there are some questions:

Is it correct to check only the real part of alpha?
Also if the imaginary part of alpha is relevant?
(if for example we use the matrix

A = [ 0.71290 + 0.59353i   0.97470 + 0.36591i   0.50060 + 0.53652i;
   0.37411 + 0.11662i   0.38904 + 0.43489i   0.03555 + 0.23431i;
   0.35482 + 0.23601i   0.44859 + 0.31402i   0.54356 + 0.72676i]

that is a matrix not symmetric, and not positive definite,
and b = ones(3,1), we obtain as alpha:

alpha = 0.47882 - 0.51052i
alpha = 1.4276 - 1.0314i
alpha = 0.43535 - 0.36971i

and the pcg doesn't give any flag of this "non-correctness" of A, but also if we check only the real part of alpha the code doesn't give any flag or any error)
Do we break the algorithm if alpha is complex? Also if the imaginary part is under the machine precision?
(it is possible that also if A is positive definite and symmetric but the alpha are complex, because of computations and the machine precision)

The mentors suggested me two things:

First of all to check both numerator and denominator of alpha: this because if there are preconditioners also the numerator can be negative, since it becomes <M*r, r>, where M is the inverse of M1*M2.
To break the algorithm if the imaginary part of (the numerator or the denominator of) alpha is relevant. But, how to decide if it is relevant?
For example, 1e-15 is relevant? And 1e-14?
So they suggested me a possible criterion to use to determine the relevancy of this imaginary part:

if real(alpha) + (eps / tol)*imag(alpha) == real(alpha)
break
endif

The motivation of this criterion is because, since we want to find a solution within a tolerance (tol), we "normalize" the relevancy of this imaginary part with this tolerance.

The mentors also told me to ask also to all the maintainers, because this is their suggestion, but can be different from the direction that wants the Octave community.
Then I please ask from the community some advices of how to proceed for this situation.

Actually (and until I'll have some advices) I'm working on the gmres, and I'll try to improve it in the same manner of the first part of this post for the pcg.

Please contact me with some feedback via e-mail at cristiano.dorigo@hotmail.it or as a comment under this post.

First week - Studying codes and some problems

2016-06-20T06:45:00.000-07:00

Dear all,

In this first and half week, as written in the first post, I studied the algorithms and also their codes in Octave. I recall that the algorithms are gmres, pcg, bicg, bicgstab, qmr, cgs and pcr.

The codes that I used that are not in Octave can be found at
https://bitbucket.org/CrisDorigo/socis16-octave-iter_meth
in the folder "First_week".

Octave codes

Studying the implementations I noticed problems in some of them:

pcg - Trying to solve the simple problem 1*x = 1, with initial guess 1, there is the following output:

octave4.1-cli:29> x = pcg(1,1,[ ],[ ],[ ],[ ],1)
warning: division by zero
warning: called from
pcg at line 367 column 10
pcg: converged in 0 iterations. pcg: the initial residual norm was reduced NaN times.
x = 1

We can see that in 0 iteration there is a warning for a division by zero and that the residual is reduced NaN times. Then this output text needs to be fixed.
bicg - I tried to solve the small linear system A*x = b, with
A = [7 7 8 4 4; 7 0 6 3 4; 3 2 3 7 6; 6 0 9 7 7; 1 0 0 1 7];
b = sum(A,2);
(I obtained this matrix with floor(10*rand(5)) so it is a simple integer matrix, and b computed as the sum of the columns of A in such a way the exact solution is the vector x = [1; 1; 1; 1; 1])
With this configuration I discovered that the algorithm stops before the convergence, because of stagnation. Indeed the output is:

octave4.1-cli:39> [x, flag, relres, iter, resvec] = bicg(A,b)
bicg stopped at iteration 4 without converging to the desired tolerance 1.000000e-06
because the method stagnated.
The iterate returned (number 3) has relative residual 3.772464e-02
x =

   1.67650
   1.04764
   0.47523
   0.92726
   0.97121

flag = 3
relres = 0.038052
iter = 4
resvec =

   0.070581   0.042765   0.037725

As we can see the last residual (relres = 0.038052) is greater than the third (that is the third component of resvec, i.e. 0.037725). Then the algorithm detect a stagnation and stops.
I tried to solve the same linear system with Matlab, and I noticed that also its bicg has the fourth iteration with the residual greater than the third one, but it doesn't stop and at the fifth iteration it converges at the solution.
So I tried to comment the part of the stagnation (i.e. lines from 187 to 195 in the Octave's bicg) and tried to solve this linear system, obtaining:

octave4.1-cli:41> [x,flag,relres,iter,resvec] = bicg_1(A,b)
bicg stopped at iteration 5 without converging to the desired tolerance 1.000000e-06
because the maximum number of iterations was reached. The iterate returned (number 5) has relative residual 3.371209e-14
x =

   1.0000
   1.0000
   1.0000
   1.0000
   1.0000

flag = 1
relres =    3.3712e-14
iter = 5
resvec =

   0.070581   0.042765   0.037725   0.038052

Except that an error text that says that the maximum number of iterations was reached, we can see that at the fifth iteration the algorithm converges with tolerance smaller than the default one 1e-6.
The bicg with the "stagnation" commented is the file bigc_1.m, instead the matrix A and the test with bigc and bicg_1, can be found in example_bicg.m, in the folder First_week .
cgs - I studied the algorithm from the Saad's book, and then I tried to study the codes, but I noticed that its Octave implementation was not taken from this book. Since in the help there is not the reference book I can't compare this implementation with the reference.
I also tried to solve the same linear system discussed in the bicg section, and I noticed that the algorithm does not converge. The obtained output is:

octave4.1-cli:56> [x,flag,relres,iter,resvec] = cgs(A,b)
x =

   0.9799298
-0.0060874
   0.7904128
   0.6945865
   1.8885042

flag = 1
relres = 0.17700
iter = 5
resvec =

   1.000000
   0.070581
   0.129415
   0.158692
   0.159693
   0.177003

We can see that the solution given by the algorithm is far from the exact solution, also the flag equal to 1 indicates that the algorithm reached the maximum number of iterations without convergence.
Then I tried to make a simple and basic implementation from scratch using the formulation in the Saad book, I tried to solve the linear system with this implementation and this implementation converges. The output is:

octave4.1-cli:58> [x,resvec] = cgs_1(A,b)
x =

   1.0000
   1.0000
   1.0000
   1.0000
   1.0000

resvec =

   5.1604e+01
   3.1724e+00
   3.3616e+00
   1.4778e+00
   3.0564e+00
   3.6586e-10

To be sure that is not a pathological case I tried to solve with the Octave implementation and mine the linear system A*x = b, with
A = [6 6 8 7 2; 5 4 1 0 3; 4 1 2 6 3; 6 9 3 8 3; 3 5 6 9 4];
b = sum(A,2);
obtaining similar results.

Then, after a discussion with the mentors, we decided that maybe it is easier to implement this method from scratch, instead to search the problems, due to the fact that there is no reference for its implementation.
My simple and basic cgs implementation is the file cgs_1.m, and the test with the second matrix can be found in the file example_cgs.m, in the folder First_week.
bicgstab - I noticed that in Matlab are displayed also the "half iteration" with the "half residual", but not in Octave. Then talking with the mentors we decided to display them also in Octave.

Matrix-vector product in the algorithms

Since all of the algorithms of the project are solvers for linear systems, the matrix-vector product is essential in all of their implementation, but since the matrix A and the (eventual) preconditioners can be passed as matrices or as function handle, it is not easy to decide how to compute all these matrix-vector products.

For example in the gmres algorithm the strategy is to transform A, M1, M2 into function handle, i.e. (tanking A as example) to create the function handle

Ax = @(x) A*x; if A is a matrix
Ax = A;              if A is a function handle

and then every time that there is a matrix-vector product, for example A*v, it is simply computed as Ax(v) or feval(Ax, v).

Instead, in the pcg implementation, every time that there is a matrix-vector product it is checked if A is a matrix or a function handle. For example if we need to compute A*v then there are the following lines of codes:

if (isnumeric(A))
w = A*v;
else
w = feval(A,v);
endif

(the other algorithms use one of these two strategies, or similar variations).

Then it is natural the question: which is the best strategy?
To answer, I simulate these strategy and the simple matrix-vector product, then I computed the times, using sparse matrices of different dimensions (n = 100,
n = 200, n = 500, n= 1000, n = 2000) in this way

A = sprand(n, n, rand(1));
Ax = @(x) A*x;
w = rand(length(A),1);

# Estimation of the simply A*w
tic
for j =1:100
    u = A*w;
endfor
t1 = toc;

# Estimation of feval(Ax, w)
tic
for j = 1:100
    u = feval (Ax, w);
endfor
t2 = toc;

# Estimation of the if block with A as matrix
tic
for j = 1:100
   if (isnumeric (A))
     u = A*w;
   else
     u = feval(A,w);
   endif
endfor
t3 = toc;

# Estimation of the if block with Ax the function handle
tic
    for j = 1:100
   if (isnumeric (Ax))
     u = Ax*w;
   else
     u = feval(Ax,w);
   endif
endfor
t4 = toc;

obtaining (for example, since I used random matrices) as results:

n = 100
t1 = 0.0038838
t2 = 0.0068600
t3 = 0.0049579
t4 = 0.0076430

n = 200
t1 = 0.0051990
t2 = 0.0083339
t3 = 0.0062981
t4 = 0.0091600

n = 500
t1 = 0.0042169
t2 = 0.0071800
t3 = 0.0051830
t4 = 0.0081360

n = 1000
t1 = 0.055708
t2 = 0.059194
t3 = 0.056590
t4 = 0.060054

n = 2000
t1 = 0.15554
t2 = 0.16691
t3 = 0.16327
t4 = 0.16677

This test with different matrices can be found in the file quicktest.m in the folder First_week.

We can see that if we consider n<=1000, the best strategy is to use the simple matix-vector product (indeed t1 is the less value). Instead if we consider the two strategies of gmres and pcg, we notice that the "if block" used in pcg takes more time than the simple matrix-vector product and the evaluation of the function handle.
Instead the case n = 2000 is not a good example to see the differences of these approaches, probably because the matrix is too big and requires a lot of operations.
Then, after a discussion with the mentors, we decided that the "best strategy" is to write in every algorithm two nested function "ad hoc" to the type of the data, and if A, M1 and M2 are not of the same type, to transform all of them in function handle. To be clear, for example, as sketch of how it will be:

function [x, flag, relres, iter, resvec] = pcg (A, b, tol, maxit, M1, M2, x0)

if A, M1, M2 are matrices
[x, flag, relres, iter, resvec] = pcg_matrix (A, b, tol, maxit, M1, M2, x0)

if A, M1, M2 are function handle
   [x, flag, relres, iter, resvec] = pcg_function (A, b, tol, maxit, M1, M2, x0)

if A, M1, M2 are not of the same type
   [x, flag, relres, iter, resvec] = pcg_mixed (A, b, tol, maxit, M1, M2, x0)

endfunction

Priority of the algorithms

The last part of this post is about the chosen priority of the improvements of the algorithms for the next weeks.
The mentors suggested me to improve the algorithms in the following order:

pcg
gmres
bicgstab
pcr
qmr
cgs
bicg

This order is suggested because pcg, gmres and bicgstab are the most used to solve iteratively linear systems.
Then follows pcr because it is the only method already implemented in Octave that solve linear system when the input matrix A is Hermitian (bicgstab and gmres work for general matrices, instead pcg works for SPD matrices).
And in the last follows qmr, cgs and bicg because they are the less used methods, with cgs above bicg because it has to be written from scratch.

If there are something that is not clear or if there are some suggestions/advices, comment below or contact me via email at cristiano.dorigo@hotmail.it

Presentation of the Project and Timeline

2016-06-08T02:08:00.000-07:00

Dear all,

I'm Cristiano Dorigo, a bachelor graduated student in Applied Mathematics in Verona, Italy, and now I'm a student of Master Degree in Mathematics in the University of Verona.

I was selected as partecipant student for the SOCIS 2016 under GNU Octave for the project "Improve iterative methods for sparse linear systems".

The idea of this project is at first to improve the methods actually implemented in Octave to solve linear systems as gmres, pcg, bicg, bicgstab, qmr, pcr. Ideed these algorithms have, for example, a similar structure in the initial part (where the inputs are checked). Other possible improvement can be the synchronization of their error messages or to check the documentation. Another critical point is to check the compatibility of the results with Matlab, i.e. to check, for example, if the quality of the solution with the same method is equal both in Matlab and Octave, to see if they use the same number of iterations, same residual,...

The second step is to implement, as much as possible, some iterative methods that, for example, are in Matlab, but not in Octave. The suggested methods are minres, symmlq, tfqmr, bicgstabl, lsqr. The order of the implementation can be decided with the mentors giving priority to the useful or the necessary ones.

Timeline:

I think to split the work in the same two step mentioned before, i.e. improvements and implementation.

The first two weeks I want to study the methods already implemented but that I haven't faced off during my student career, i.e. bicg, bicgstab, qmr and pcr and to study their codes.
The second two/three weeks I want at first to consult the mentors to decide where and how to improve these methods, then to apply these ideas.
The remaining time will be spent to implement the new methods. I want to implement these methods one by one, after a discussion with the mentors about which methods have priority.

The reference book is "Iterative Methods for sparse linear systems" 2-nd Edition by Yousef Saad, which is available here www-users.cs.umn.edu/~saad/IterMethBook_2ndEd.pdf