[Scipy-svn] r6675 - in trunk: doc/source scipy/optimize scipy/optimize/tests

scipy-svn at scipy.org
Sat Sep 4 18:53:52 EDT 2010


Author: ptvirtan
Date: 2010-09-04 17:53:52 -0500 (Sat, 04 Sep 2010)
New Revision: 6675

Added:
   trunk/doc/source/optimize.nonlin.rst
Modified:
   trunk/doc/source/optimize.rst
   trunk/scipy/optimize/__init__.py
   trunk/scipy/optimize/nonlin.py
   trunk/scipy/optimize/tests/test_minpack.py
   trunk/scipy/optimize/tests/test_nonlin.py
Log:
ENH: optimize: Complete rewrite of scipy.optimize.nonlin

Rewrite the large-scale equation solver module scipy.optimize.nonlin.

The following new features are added (a brief usage sketch is given below):

  - Proper tolerance-based termination conditions
  - Generic framework for tolerance-controlled inexact Newton methods,
    where all the different methods are represented by different Jacobian
    approximations.
  - Backtracking (i.e. line searches)
  - Limited-memory Broyden methods, including SVD rank reduction
    in addition to more basic options
  - Newton-Krylov solver, using any of the solvers from scipy.sparse.linalg

The new API is mostly backward-compatible with the old one.  The following
methods are deprecated:

  - broyden_modified (not useful)
  - broyden1_modified (not useful)
  - broyden_generalized (equivalent to anderson)
  - anderson2 (equivalent to anderson)
  - broyden3 (obsoleted by the new limited-memory Broyden methods)
  - vackar (renamed to diagbroyden)

Tests for all implemented methods are included.
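
For illustration, here is a minimal usage sketch of the new API, assembled
from the docstring examples and parameter documentation added below (it is
not part of the committed code):

    import numpy as np
    from scipy.optimize import broyden1, newton_krylov

    def F(x):
        return np.cos(x) + x[::-1] - [1, 2, 3, 4]

    # Tolerance-based termination: iterate until the max-norm of the
    # residual drops below f_tol, rather than for a fixed iteration count.
    x = broyden1(F, [1, 1, 1, 1], f_tol=1e-14)

    # The same problem solved via Newton-Krylov; the inner solver is any
    # iterative method from scipy.sparse.linalg, and a backtracking
    # 'armijo' line search is enabled by default.
    x = newton_krylov(F, [1, 1, 1, 1], method='lgmres', line_search='armijo')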

Added: trunk/doc/source/optimize.nonlin.rst
===================================================================
--- trunk/doc/source/optimize.nonlin.rst	                        (rev 0)
+++ trunk/doc/source/optimize.nonlin.rst	2010-09-04 22:53:52 UTC (rev 6675)
@@ -0,0 +1 @@
+.. automodule:: scipy.optimize.nonlin

Modified: trunk/doc/source/optimize.rst
===================================================================
--- trunk/doc/source/optimize.rst	2010-09-04 22:53:26 UTC (rev 6674)
+++ trunk/doc/source/optimize.rst	2010-09-04 22:53:52 UTC (rev 6675)
@@ -69,8 +69,8 @@
 
    fsolve
 
-Scalar function solvers
------------------------
+Scalar functions
+----------------
 
 .. autosummary::
    :toctree: generated/
@@ -88,19 +88,41 @@
 
    fixed_point
 
-General-purpose nonlinear (multidimensional)
---------------------------------------------
+Multidimensional
+----------------
 
+.. toctree::
+   :maxdepth: 1
+
+   optimize.nonlin
+
+General nonlinear solvers:
+
 .. autosummary::
    :toctree: generated/
 
+   fsolve
    broyden1
    broyden2
-   broyden3
-   broyden_generalized
+
+Large-scale nonlinear solvers:
+
+.. autosummary::
+   :toctree: generated/
+
+   newton_krylov
    anderson
-   anderson2
 
+Simple iterations:
+
+.. autosummary::
+   :toctree: generated/
+
+   excitingmixing
+   linearmixing
+   diagbroyden
+
+
 Utility Functions
 =================
 

Modified: trunk/scipy/optimize/__init__.py
===================================================================
--- trunk/scipy/optimize/__init__.py	2010-09-04 22:53:26 UTC (rev 6674)
+++ trunk/scipy/optimize/__init__.py	2010-09-04 22:53:52 UTC (rev 6675)
@@ -11,8 +11,7 @@
 from lbfgsb import fmin_l_bfgs_b
 from tnc import fmin_tnc
 from cobyla import fmin_cobyla
-from nonlin import broyden1, broyden2, broyden3, broyden_generalized, \
-    anderson, anderson2
+from nonlin import *
 from slsqp import fmin_slsqp
 from nnls import nnls
 

Modified: trunk/scipy/optimize/nonlin.py
===================================================================
--- trunk/scipy/optimize/nonlin.py	2010-09-04 22:53:26 UTC (rev 6674)
+++ trunk/scipy/optimize/nonlin.py	2010-09-04 22:53:52 UTC (rev 6675)
@@ -1,468 +1,1505 @@
-"""
+r"""
 Nonlinear solvers
 =================
 
-These solvers find x for which F(x)=0. Both x and F is multidimensional.
+.. currentmodule:: scipy.optimize
 
-They accept the user defined function F, which accepts a python tuple x and it
-should return F(x), which can be either a tuple, or numpy array.
+This is a collection of general-purpose nonlinear multidimensional
+solvers.  These solvers find *x* for which *F(x) = 0*. Both *x*
+and *F* can be multidimensional.
 
-Example:
+Routines
+--------
 
-def F(x):
-    "Should converge to x=[0,0,0,0,0]"
-    import numpy
-    d = numpy.array([3,2,1.5,1,0.5])
-    c = 0.01
-    return -d*numpy.array(x)-c*numpy.array(x)**3
+Large-scale nonlinear solvers:
 
-from scipy import optimize
-x = optimize.broyden2(F,[1,1,1,1,1])
+.. autosummary::
 
-All solvers have the parameter iter (the number of iterations to compute), some
-of them have other parameters of the solver, see the particular solver for
-details.
+   newton_krylov
+   anderson
 
- A collection of general-purpose nonlinear multidimensional solvers.
+General nonlinear solvers:
 
-   broyden1            --  Broyden's first method - is a quasi-Newton-Raphson
-                           method for updating an approximate Jacobian and then
-                           inverting it
-   broyden2            --  Broyden's second method - the same as broyden1, but
-                           updates the inverse Jacobian directly
-   broyden3            --  Broyden's second method - the same as broyden2, but
-                           instead of directly computing the inverse Jacobian,
-                           it remembers how to construct it using vectors, and
-                           when computing inv(J)*F, it uses those vectors to
-                           compute this product, thus avoding the expensive NxN
-                           matrix multiplication.
-   broyden_generalized --  Generalized Broyden's method, the same as broyden2,
-                           but instead of approximating the full NxN Jacobian,
-                           it construct it at every iteration in a way that
-                           avoids the NxN matrix multiplication.  This is not
-                           as precise as broyden3.
-   anderson            --  extended Anderson method, the same as the
-                           broyden_generalized, but added w_0^2*I to before
-                           taking inversion to improve the stability
-   anderson2           --  the Anderson method, the same as anderson, but
-                           formulated differently
+.. autosummary::
 
- The broyden2 is the best. For large systems, use broyden3. excitingmixing is
- also very effective. There are some more solvers implemented (see their
- docstrings), however, those are of mediocre quality.
+   broyden1
+   broyden2
 
+Simple iterations:
 
- Utility Functions
+.. autosummary::
 
-   norm --  Returns an L2 norm of the vector
+   excitingmixing
+   linearmixing
+   diagbroyden
 
+
+Examples
+========
+
+Small problem
+-------------
+
+>>> def F(x):
+...    return np.cos(x) + x[::-1] - [1, 2, 3, 4]
+>>> import scipy.optimize
+>>> x = scipy.optimize.broyden1(F, [1,1,1,1], f_tol=1e-14)
+>>> x
+array([ 4.04674914,  3.91158389,  2.71791677,  1.61756251])
+>>> np.cos(x) + x[::-1]
+array([ 1.,  2.,  3.,  4.])
+
+
+Large problem
+-------------
+
+Suppose that we needed to solve the following integrodifferential
+equation on the square :math:`[0,1]\times[0,1]`:
+
+.. math::
+
+   \nabla^2 P = 10 \left(\int_0^1\int_0^1\cosh(P)\,dx\,dy\right)^2
+
+with :math:`P(x,1) = 1` and :math:`P=0` elsewhere on the boundary of
+the square.
+
+The solution can be found using the `newton_krylov` solver:
+
+.. plot::
+
+   import numpy as np
+   from scipy.optimize import newton_krylov
+   from numpy import cosh, zeros_like, mgrid, zeros
+
+   # parameters
+   nx, ny = 75, 75
+   hx, hy = 1./(nx-1), 1./(ny-1)
+
+   P_left, P_right = 0, 0
+   P_top, P_bottom = 1, 0
+
+   def residual(P):
+       d2x = zeros_like(P)
+       d2y = zeros_like(P)
+
+       d2x[1:-1] = (P[2:]   - 2*P[1:-1] + P[:-2]) / hx/hx
+       d2x[0]    = (P[1]    - 2*P[0]    + P_left)/hx/hx
+       d2x[-1]   = (P_right - 2*P[-1]   + P[-2])/hx/hx
+
+       d2y[:,1:-1] = (P[:,2:] - 2*P[:,1:-1] + P[:,:-2])/hy/hy
+       d2y[:,0]    = (P[:,1]  - 2*P[:,0]    + P_bottom)/hy/hy
+       d2y[:,-1]   = (P_top   - 2*P[:,-1]   + P[:,-2])/hy/hy
+
+       return d2x + d2y - 10*cosh(P).mean()**2
+
+   # solve
+   guess = zeros((nx, ny), float)
+   sol = newton_krylov(residual, guess, method='lgmres', verbose=1)
+   print 'Residual', abs(residual(sol)).max()
+
+   # visualize
+   import matplotlib.pyplot as plt
+   x, y = mgrid[0:1:(nx*1j), 0:1:(ny*1j)]
+   plt.pcolor(x, y, sol)
+   plt.colorbar()
+   plt.show()
+   
 """
+# Copyright (C) 2009, Pauli Virtanen <pav at iki.fi>
+# Distributed under the same license as Scipy.
 
-import math
+import sys
+import numpy as np
+from scipy.linalg import norm, solve, inv, qr, svd, lstsq, LinAlgError
+from numpy import asarray, dot, vdot
+import scipy.sparse.linalg
+import scipy.sparse
+import scipy.lib.blas as blas
+import inspect
+from linesearch import scalar_search_wolfe1, scalar_search_armijo
 
-import numpy
+__all__ = [
+    'broyden1', 'broyden2', 'anderson', 'linearmixing',
+    'diagbroyden', 'excitingmixing', 'newton_krylov',
+    # Deprecated functions:
+    'broyden_generalized', 'anderson2', 'broyden3']
 
-def mlog(x):
-    if x==0.:
-        return 13
-    else:
-        return math.log(x)
+#------------------------------------------------------------------------------
+# Utility functions
+#------------------------------------------------------------------------------
 
-def norm(v):
-    """Returns an L2 norm of the vector."""
-    return math.sqrt(numpy.sum((numpy.array(v)**2).flat))
+class NoConvergence(Exception):
+    pass
 
-def myF(F,xm):
-    return numpy.matrix(F(tuple(xm.flat))).T
+def maxnorm(x):
+    return np.absolute(x).max()
 
-def difference(a,b):
-    m=0.
-    for x,y in zip(a,b):
-        m+=(x-y)**2
-    return math.sqrt(m)
+def _as_inexact(x):
+    """Return `x` as an array, of either floats or complex floats"""
+    x = asarray(x)
+    if not np.issubdtype(x.dtype, np.inexact):
+        return asarray(x, dtype=np.float_)
+    return x
 
-def sum(a,b):
-    return [ai+bi for ai,bi in zip(a,b)]
+def _array_like(x, x0):
+    """Return ndarray `x` as same array subclass and shape as `x0`"""
+    x = np.reshape(x, np.shape(x0))
+    wrap = getattr(x0, '__array_wrap__', x.__array_wrap__)
+    return wrap(x)
 
-def mul(C,b):
-    return [C*bi for bi in b]
+def _safe_norm(v):
+    if not np.isfinite(v).all():
+        return np.array(np.inf)
+    return norm(v)
 
-def solve(A,b):
-    """Solve Ax=b, returns x"""
-    try:
-        from scipy import linalg
-        return linalg.solve(A,b)
-    except:
-        return A.I*b
+#------------------------------------------------------------------------------
+# Generic nonlinear solver machinery
+#------------------------------------------------------------------------------
 
-def broyden2(F, xin, iter=10, alpha=0.4, verbose = False):
-    """Broyden's second method.
+_doc_parts = dict(
+    params_basic="""
+    F : function(x) -> f
+        Function whose root to find; should take and return an array-like
+        object.
+    x0 : array-like
+        Initial guess for the solution
+    """.strip(),
+    params_extra="""
+    iter : int, optional
+        Number of iterations to make. If omitted (default), make as many
+        as required to meet tolerances.
+    verbose : bool, optional
+        Print status to stdout on every iteration.
+    maxiter : int, optional
+        Maximum number of iterations to make. If more are needed to
+        meet convergence, `NoConvergence` is raised.
+    f_tol : float, optional
+        Absolute tolerance (in max-norm) for the residual.
+        If omitted, default is 6e-6.
+    f_rtol : float, optional
+        Relative tolerance for the residual. If omitted, not used.
+    x_tol : float, optional
+        Absolute minimum step size, as determined from the Jacobian
+        approximation. If the step size is smaller than this, optimization
+        is terminated as successful. If omitted, not used.
+    x_rtol : float, optional
+        Relative minimum step size. If omitted, not used.
+    tol_norm : function(vector) -> scalar, optional
+        Norm to use in convergence check. Default is the maximum norm.
+    line_search : {None, 'armijo' (default), 'wolfe'}, optional
+        Which type of line search to use to determine the step size in the
+        direction given by the Jacobian approximation. Defaults to 'armijo'.
+    callback : function, optional
+        Optional callback function. It is called on every iteration as
+        ``callback(x, f)`` where `x` is the current solution and `f`
+        the corresponding residual.
 
-    Updates inverse Jacobian by an optimal formula.
-    There is NxN matrix multiplication in every iteration.
+    Returns
+    -------
+    sol : array-like
+        An array (of similar array type as `x0`) containing the final solution.
 
-    The best norm |F(x)|=0.003 achieved in ~20 iterations.
+    Raises
+    ------
+    NoConvergence
+        When a solution was not found.
 
-    Recommended.
+    """.strip()
+)
+
+def _set_doc(obj):
+    if obj.__doc__:
+        obj.__doc__ = obj.__doc__ % _doc_parts
+
+def nonlin_solve(F, x0, jacobian='krylov', iter=None, verbose=False,
+                 maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None,
+                 tol_norm=None, line_search='armijo', callback=None):
     """
-    xm=numpy.matrix(xin).T
-    Fxm=myF(F,xm)
-    Gm=-alpha*numpy.matrix(numpy.identity(len(xin)))
-    for n in range(iter):
-        deltaxm=-Gm*Fxm
-        xm=xm+deltaxm
-        Fxm1=myF(F,xm)
-        deltaFxm=Fxm1-Fxm
-        Fxm=Fxm1
-        Gm=Gm+(deltaxm-Gm*deltaFxm)*deltaFxm.T/norm(deltaFxm)**2
-        if verbose:
-            print "%d:  |F(x)|=%.3f"%(n+1, norm(Fxm))
-    return xm.flat
+    Find a root of a function, in a way suitable for large-scale problems.
 
-def broyden3(F, xin, iter=10, alpha=0.4, verbose = False):
-    """Broyden's second method.
+    Parameters
+    ----------
+    %(params_basic)s
+    jacobian : Jacobian
+        A Jacobian approximation: `Jacobian` object or something that
+        `asjacobian` can transform to one. Alternatively, a string specifying
+        which of the builtin Jacobian approximations to use:
 
-    Updates inverse Jacobian by an optimal formula.
-    The NxN matrix multiplication is avoided.
+            krylov, broyden1, broyden2, anderson
+            diagbroyden, linearmixing, excitingmixing
 
-    The best norm |F(x)|=0.003 achieved in ~20 iterations.
+    %(params_extra)s
 
-    Recommended.
+    See Also
+    --------
+    asjacobian, Jacobian
+
+    Notes
+    -----
+    This algorithm implements the inexact Newton method, with
+    backtracking or full line searches. Several Jacobian
+    approximations are available, including Krylov and Quasi-Newton
+    methods.
+
+    References
+    ----------
+    .. [KIM] C. T. Kelley, \"Iterative Methods for Linear and Nonlinear
+       Equations\". Society for Industrial and Applied Mathematics. (1995)
+       http://www.siam.org/books/kelley/
+
     """
-    zy=[]
-    def updateG(z,y):
-        "G:=G+z*y.T"
-        zy.append((z,y))
-    def Gmul(f):
-        "G=-alpha*1+z*y.T+z*y.T ..."
-        s=-alpha*f
-        for z,y in zy:
-            s=s+z*(y.T*f)
-        return s
-    xm=numpy.matrix(xin).T
-    Fxm=myF(F,xm)
-#    Gm=-alpha*numpy.matrix(numpy.identity(len(xin)))
-    for n in range(iter):
-        #deltaxm=-Gm*Fxm
-        deltaxm=Gmul(-Fxm)
-        xm=xm+deltaxm
-        Fxm1=myF(F,xm)
-        deltaFxm=Fxm1-Fxm
-        Fxm=Fxm1
-        #Gm=Gm+(deltaxm-Gm*deltaFxm)*deltaFxm.T/norm(deltaFxm)**2
-        updateG(deltaxm-Gmul(deltaFxm),deltaFxm/norm(deltaFxm)**2)
+
+    condition = TerminationCondition(f_tol=f_tol, f_rtol=f_rtol,
+                                     x_tol=x_tol, x_rtol=x_rtol,
+                                     iter=iter, norm=tol_norm)
+
+    x0 = _as_inexact(x0)
+    func = lambda z: _as_inexact(F(_array_like(z, x0))).flatten()
+    x = x0.flatten()
+
+    dx = np.inf
+    Fx = func(x)
+    Fx_norm = norm(Fx)
+
+    jacobian = asjacobian(jacobian)
+    jacobian.setup(x.copy(), Fx, func)
+
+    if maxiter is None:
+        if iter is not None:
+            maxiter = iter + 1
+        else:
+            maxiter = 100*(x.size+1)
+
+    if line_search is True:
+        line_search = 'armijo'
+    elif line_search is False:
+        line_search = None
+
+    if line_search not in (None, 'armijo', 'wolfe'):
+        raise ValueError("Invalid line search")
+
+    # Solver tolerance selection
+    gamma = 0.9
+    eta_max = 0.9999
+    eta_threshold = 0.1
+    eta = 1e-3
+
+    for n in xrange(maxiter):
+        if condition.check(Fx, x, dx):
+            break
+
+        # The tolerance, as computed for scipy.sparse.linalg.* routines
+        tol = min(eta, eta*Fx_norm)
+        dx = -jacobian.solve(Fx, tol=tol)
+
+        if norm(dx) == 0:
+            raise ValueError("Jacobian inversion yielded zero vector. "
+                             "This indicates a bug in the Jacobian "
+                             "approximation.")
+
+        # Line search, or Newton step
+        if line_search:
+            s, x, Fx, Fx_norm_new = _nonlin_line_search(func, x, Fx, dx,
+                                                        line_search)
+        else:
+            s = 1.0
+            x += dx
+            Fx = func(x)
+            Fx_norm_new = norm(Fx)
+
+        jacobian.update(x.copy(), Fx)
+
+        if callback:
+            callback(x, Fx)
+
+        # Adjust forcing parameters for inexact methods
+        eta_A = gamma * Fx_norm_new**2 / Fx_norm**2
+        if gamma * eta**2 < eta_threshold:
+            eta = min(eta_max, eta_A)
+        else:
+            eta = min(eta_max, max(eta_A, gamma*eta**2))
+
+        Fx_norm = Fx_norm_new
+
+        # Print status
         if verbose:
-            print "%d:  |F(x)|=%.3f"%(n+1, norm(Fxm))
-    return xm.flat
+            sys.stdout.write("%d:  |F(x)| = %g; step %g; tol %g\n" % (
+                n, norm(Fx), s, eta))
+            sys.stdout.flush()
+    else:
+        raise NoConvergence(_array_like(x, x0))
 
-def broyden_generalized(F, xin, iter=10, alpha=0.1, M=5, verbose = False):
-    """Generalized Broyden's method.
+    return _array_like(x, x0)
 
-    Computes an approximation to the inverse Jacobian from the last M
-    interations. Avoids NxN matrix multiplication, it only has MxM matrix
-    multiplication and inversion.
+_set_doc(nonlin_solve)
 
-    M=0 .... linear mixing
-    M=1 .... Anderson mixing with 2 iterations
-    M=2 .... Anderson mixing with 3 iterations
-    etc.
-    optimal is M=5
+def _nonlin_line_search(func, x, Fx, dx, search_type='armijo', rdiff=1e-8,
+                        smin=1e-2):
+    tmp_s = [0]
+    tmp_Fx = [Fx]
+    tmp_phi = [norm(Fx)**2]
+    s_norm = norm(x) / norm(dx)
 
+    def phi(s, store=True):
+        if s == tmp_s[0]:
+            return tmp_phi[0]
+        xt = x + s*dx
+        v = func(xt)
+        p = _safe_norm(v)**2
+        if store:
+            tmp_s[0] = s
+            tmp_phi[0] = p
+            tmp_Fx[0] = v
+        return p
+
+    def derphi(s):
+        ds = (abs(s) + s_norm + 1) * rdiff
+        return (phi(s+ds, store=False) - phi(s)) / ds
+
+    if search_type == 'wolfe':
+        s, phi1, phi0 = scalar_search_wolfe1(phi, derphi, tmp_phi[0],
+                                             xtol=1e-2, amin=smin)
+    elif search_type == 'armijo':
+        s, phi1 = scalar_search_armijo(phi, tmp_phi[0], -tmp_phi[0],
+                                       amin=smin)
+
+    if s is None:
+        # XXX: No suitable step length found. Take the full Newton step,
+        #      and hope for the best.
+        s = 1.0
+
+    x = x + s*dx
+    if s == tmp_s[0]:
+        Fx = tmp_Fx[0]
+    else:
+        Fx = func(x)
+    Fx_norm = norm(Fx)
+
+    return s, x, Fx, Fx_norm
+
+class TerminationCondition(object):
     """
-    xm=numpy.matrix(xin).T
-    Fxm=myF(F,xm)
-    G0=-alpha
-    dxm=[]
-    dFxm=[]
-    for n in range(iter):
-        deltaxm=-G0*Fxm
-        if M>0:
-            MM=min(M,n)
-            for m in range(n-MM,n):
-                deltaxm=deltaxm-(float(gamma[m-(n-MM)])*dxm[m]-G0*dFxm[m])
-        xm=xm+deltaxm
-        Fxm1=myF(F,xm)
-        deltaFxm=Fxm1-Fxm
-        Fxm=Fxm1
+    Termination condition for an iteration. The iteration is terminated if
 
-        if M>0:
-            dxm.append(deltaxm)
-            dFxm.append(deltaFxm)
-            MM=min(M,n+1)
-            a=numpy.matrix(numpy.empty((MM,MM)))
-            for i in range(n+1-MM,n+1):
-                for j in range(n+1-MM,n+1):
-                    a[i-(n+1-MM),j-(n+1-MM)]=dFxm[i].T*dFxm[j]
+    - |F| < f_rtol*|F_0|, AND
+    - |F| < f_tol
 
-            dFF=numpy.matrix(numpy.empty(MM)).T
-            for k in range(n+1-MM,n+1):
-                dFF[k-(n+1-MM)]=dFxm[k].T*Fxm
-            gamma=a.I*dFF
+    AND
 
-        if verbose:
-            print "%d:  |F(x)|=%.3f"%(n, norm(Fxm))
-    return xm.flat
+    - |dx| < x_rtol*|x|, AND
+    - |dx| < x_tol
 
-def anderson(F, xin, iter=10, alpha=0.1, M=5, w0=0.01, verbose = False):
-    """Extended Anderson method.
+    """
+    def __init__(self, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None,
+                 iter=None, norm=maxnorm):
 
-    Computes an approximation to the inverse Jacobian from the last M
-    interations. Avoids NxN matrix multiplication, it only has MxM matrix
-    multiplication and inversion.
+        if f_tol is None:
+            f_tol = np.finfo(np.float_).eps ** (1./3)
+        if f_rtol is None:
+            f_rtol = np.inf
+        if x_tol is None:
+            x_tol = np.inf
+        if x_rtol is None:
+            x_rtol = np.inf
+        
+        self.x_tol = x_tol
+        self.x_rtol = x_rtol
+        self.f_tol = f_tol
+        self.f_rtol = f_rtol
 
-    M=0 .... linear mixing
-    M=1 .... Anderson mixing with 2 iterations
-    M=2 .... Anderson mixing with 3 iterations
-    etc.
-    optimal is M=5
+        if norm is None:
+            norm = maxnorm
+        self.norm = norm
+        self.iter = iter
 
+        self.f0_norm = None
+        self.iteration = 0
+        
+    def check(self, f, x, dx):
+        self.iteration += 1
+        f_norm = self.norm(f)
+        x_norm = self.norm(x)
+        dx_norm = self.norm(dx)
+
+        if self.f0_norm is None:
+            self.f0_norm = f_norm
+            
+        if f_norm == 0:
+            return True
+
+        if self.iter is not None:
+            # backwards compatibility with Scipy 0.6.0
+            return self.iteration > self.iter
+
+        # NB: condition must succeed for rtol=inf even if norm == 0
+        return ((f_norm <= self.f_tol and f_norm/self.f_rtol <= self.f0_norm)
+                and (dx_norm <= self.x_tol and dx_norm/self.x_rtol <= x_norm))
+
+
+#------------------------------------------------------------------------------
+# Generic Jacobian approximation
+#------------------------------------------------------------------------------
+
+class Jacobian(object):
     """
-    xm=numpy.matrix(xin).T
-    Fxm=myF(F,xm)
-    dxm=[]
-    dFxm=[]
-    for n in range(iter):
-        deltaxm=alpha*Fxm
-        if M>0:
-            MM=min(M,n)
-            for m in range(n-MM,n):
-                deltaxm=deltaxm-(float(gamma[m-(n-MM)])*dxm[m]+alpha*dFxm[m])
-        xm=xm+deltaxm
-        Fxm1=myF(F,xm)
-        deltaFxm=Fxm1-Fxm
-        Fxm=Fxm1
+    Common interface for Jacobians or Jacobian approximations.
 
-        if M>0:
-            dxm.append(deltaxm)
-            dFxm.append(deltaFxm)
-            MM=min(M,n+1)
-            a=numpy.matrix(numpy.empty((MM,MM)))
-            for i in range(n+1-MM,n+1):
-                for j in range(n+1-MM,n+1):
-                    if i==j: wd=w0**2
-                    else: wd=0
-                    a[i-(n+1-MM),j-(n+1-MM)]=(1+wd)*dFxm[i].T*dFxm[j]
+    The optional methods are useful when implementing trust region
+    and similar algorithms that often require evaluating transposes of the
+    Jacobian.
 
-            dFF=numpy.matrix(numpy.empty(MM)).T
-            for k in range(n+1-MM,n+1):
-                dFF[k-(n+1-MM)]=dFxm[k].T*Fxm
-            gamma=solve(a,dFF)
-#            print gamma
+    Methods
+    -------
+    solve
+        Returns J^-1 * v
+    update
+        Updates Jacobian to point `x` (where the function has residual `Fx`)
 
-        if verbose:
-            print "%d:  |F(x)|=%.3f"%(n, norm(Fxm))
-    return xm.flat
+    matvec : optional
+        Returns J * v
+    rmatvec : optional
+        Returns J^H * v
+    rsolve : optional
+        Returns J^-H * v
+    matmat : optional
+        Returns J * V, where V is a dense matrix with dimensions (N,K).
+    todense : optional
+        Form the dense Jacobian matrix. Necessary for dense trust region
+        algorithms, and useful for testing.
+        
+    Attributes
+    ----------
+    shape
+        Matrix dimensions (M, N)
+    dtype
+        Data type of the matrix.
+    func : callable, optional
+        Function the Jacobian corresponds to
 
-def anderson2(F, xin, iter=10, alpha=0.1, M=5, w0=0.01, verbose = False):
-    """Anderson method.
+    """
 
-    M=0 .... linear mixing
-    M=1 .... Anderson mixing with 2 iterations
-    M=2 .... Anderson mixing with 3 iterations
-    etc.
-    optimal is M=5
+    def __init__(self, **kw):
+        names = ["solve", "update", "matvec", "rmatvec", "rsolve",
+                 "matmat", "todense", "shape", "dtype"]
+        for name, value in kw.items():
+            if name not in names:
+                raise ValueError("Unknown keyword argument %s" % name)
+            if value is not None:
+                setattr(self, name, kw[name])
 
+        if hasattr(self, 'todense'):
+            self.__array__ = lambda: self.todense()
+
+    def aspreconditioner(self):
+        return InverseJacobian(self)
+
+    def solve(self, v, tol=0):
+        raise NotImplementedError
+
+    def update(self, x, F):
+        pass
+
+    def setup(self, x, F, func):
+        self.func = func
+        self.shape = (F.size, x.size)
+        self.dtype = F.dtype
+        if self.__class__.setup is Jacobian.setup:
+            # Call on the first point unless overridden
+            self.update(x, F)
+
+class InverseJacobian(object):
+    def __init__(self, jacobian):
+        self.jacobian = jacobian
+        self.matvec = jacobian.solve
+        self.update = jacobian.update
+        if hasattr(jacobian, 'setup'):
+            self.setup = jacobian.setup
+        if hasattr(jacobian, 'rsolve'):
+            self.rmatvec = jacobian.rsolve
+
+    @property
+    def shape(self):
+        return self.jacobian.shape
+
+    @property
+    def dtype(self):
+        return self.jacobian.dtype
+
+def asjacobian(J):
     """
-    xm=numpy.matrix(xin).T
-    Fxm=myF(F,xm)
-    dFxm=[]
-    for n in range(iter):
-        deltaxm=Fxm
-        if M>0:
-            MM=min(M,n)
-            for m in range(n-MM,n):
-                deltaxm=deltaxm+float(theta[m-(n-MM)])*(dFxm[m]-Fxm)
-        deltaxm=deltaxm*alpha
-        xm=xm+deltaxm
-        Fxm1=myF(F,xm)
-        deltaFxm=Fxm1-Fxm
-        Fxm=Fxm1
+    Convert given object to one suitable for use as a Jacobian.
+    """
+    spsolve = scipy.sparse.linalg.spsolve
+    if isinstance(J, Jacobian):
+        return J
+    elif inspect.isclass(J) and issubclass(J, Jacobian):
+        return J()
+    elif isinstance(J, np.ndarray):
+        if J.ndim > 2:
+            raise ValueError('array must have rank <= 2')
+        J = np.atleast_2d(np.asarray(J))
+        if J.shape[0] != J.shape[1]:
+            raise ValueError('array must be square')
 
-        if M>0:
-            dFxm.append(Fxm-deltaFxm)
-            MM=min(M,n+1)
-            a=numpy.matrix(numpy.empty((MM,MM)))
-            for i in range(n+1-MM,n+1):
-                for j in range(n+1-MM,n+1):
-                    if i==j: wd=w0**2
-                    else: wd=0
-                    a[i-(n+1-MM),j-(n+1-MM)]= \
-                        (1+wd)*(Fxm-dFxm[i]).T*(Fxm-dFxm[j])
+        return Jacobian(matvec=lambda v: dot(J, v),
+                        rmatvec=lambda v: dot(J.conj().T, v),
+                        solve=lambda v: solve(J, v),
+                        rsolve=lambda v: solve(J.conj().T, v),
+                        dtype=J.dtype, shape=J.shape)
+    elif scipy.sparse.isspmatrix(J):
+        if J.shape[0] != J.shape[1]:
+            raise ValueError('matrix must be square')
+        return Jacobian(matvec=lambda v: J*v,
+                        rmatvec=lambda v: J.conj().T * v,
+                        solve=lambda v: spsolve(J, v),
+                        rsolve=lambda v: spsolve(J.conj().T, v),
+                        dtype=J.dtype, shape=J.shape)
+    elif hasattr(J, 'shape') and hasattr(J, 'dtype') and hasattr(J, 'solve'):
+        return Jacobian(matvec=getattr(J, 'matvec'),
+                        rmatvec=getattr(J, 'rmatvec'),
+                        solve=J.solve,
+                        rsolve=getattr(J, 'rsolve'),
+                        update=getattr(J, 'update'),
+                        setup=getattr(J, 'setup'),
+                        dtype=J.dtype,
+                        shape=J.shape)
+    elif callable(J):
+        # Assume it's a function J(x) that returns the Jacobian
+        class Jac(Jacobian):
+            def update(self, x, F):
+                self.x = x
+            def solve(self, v, tol=0):
+                m = J(self.x)
+                if isinstance(m, np.ndarray):
+                    return solve(m, v)
+                elif scipy.sparse.isspmatrix(m):
+                    return spsolve(m, v)
+                else:
+                    raise ValueError("Unknown matrix type")
+            def matvec(self, v):
+                m = J(self.x)
+                if isinstance(m, np.ndarray):
+                    return dot(m, v)
+                elif scipy.sparse.isspmatrix(m):
+                    return m*v
+                else:
+                    raise ValueError("Unknown matrix type")
+            def rsolve(self, v, tol=0):
+                m = J(self.x)
+                if isinstance(m, np.ndarray):
+                    return solve(m.conj().T, v)
+                elif scipy.sparse.isspmatrix(m):
+                    return spsolve(m.conj().T, v)
+                else:
+                    raise ValueError("Unknown matrix type")
+            def rmatvec(self, v):
+                m = J(self.x)
+                if isinstance(m, np.ndarray):
+                    return dot(m.conj().T, v)
+                elif scipy.sparse.isspmatrix(m):
+                    return m.conj().T * v
+                else:
+                    raise ValueError("Unknown matrix type")
+        return Jac()
+    elif isinstance(J, str):
+        return dict(broyden1=BroydenFirst,
+                    broyden2=BroydenSecond,
+                    anderson=Anderson,
+                    diagbroyden=DiagBroyden,
+                    linearmixing=LinearMixing,
+                    excitingmixing=ExcitingMixing,
+                    krylov=KrylovJacobian)[J]()
+    else:
+        raise TypeError('Cannot convert object to a Jacobian')
 
-            dFF=numpy.matrix(numpy.empty(MM)).T
-            for k in range(n+1-MM,n+1):
-                dFF[k-(n+1-MM)]=(Fxm-dFxm[k]).T*Fxm
-            theta=solve(a,dFF)
-#            print gamma
 
-        if verbose:
-            print "%d:  |F(x)|=%.3f"%(n, norm(Fxm))
-    return xm.flat
+#------------------------------------------------------------------------------
+# Broyden
+#------------------------------------------------------------------------------
 
-def broyden_modified(F, xin, iter=10, alpha=0.35, w0=0.01, wl=5, verbose = False):
-    """Modified Broyden's method.
+class GenericBroyden(Jacobian):
+    def setup(self, x0, f0, func):
+        Jacobian.setup(self, x0, f0, func)
+        self.last_f = f0
+        self.last_x = x0
 
-    Updates inverse Jacobian using information from all the iterations and
-    avoiding the NxN matrix multiplication. The problem is with the weights,
-    it converges the same or worse than broyden2 or broyden_generalized
+        if hasattr(self, 'alpha') and self.alpha is None:
+            # autoscale the initial Jacobian parameter
+            self.alpha = 0.5*max(norm(x0), 1) / norm(f0)
 
+    def _update(self, x, f, dx, df, dx_norm, df_norm):
+        raise NotImplementedError
+
+    def update(self, x, f):
+        df = f - self.last_f
+        dx = x - self.last_x
+        self._update(x, f, dx, df, norm(dx), norm(df))
+        self.last_f = f
+        self.last_x = x
+
+class LowRankMatrix(object):
+    r"""
+    A matrix represented as
+
+    .. math:: \alpha I + \sum_{n=0}^{n=M} c_n d_n^\dagger
+
+    However, if the rank of the matrix reaches the dimension of the vectors,
+    a full matrix representation is used from then on.
+
     """
-    xm=numpy.matrix(xin).T
-    Fxm=myF(F,xm)
-    G0=alpha
-    w=[]
-    u=[]
-    dFxm=[]
-    for n in range(iter):
-        deltaxm=G0*Fxm
-        for i in range(n):
-            for j in range(n):
-                deltaxm-=w[i]*w[j]*betta[i,j]*u[j]*(dFxm[i].T*Fxm)
-        xm+=deltaxm
-        Fxm1=myF(F,xm)
-        deltaFxm=Fxm1-Fxm
-        Fxm=Fxm1
 
-        w.append(wl/norm(Fxm))
+    def __init__(self, alpha, n, dtype):
+        self.alpha = alpha
+        self.cs = []
+        self.ds = []
+        self.n = n
+        self.dtype = dtype
+        self.collapsed = None
 
-        u.append((G0*deltaFxm+deltaxm)/norm(deltaFxm))
-        dFxm.append(deltaFxm/norm(deltaFxm))
-        a=numpy.matrix(numpy.empty((n+1,n+1)))
-        for i in range(n+1):
-            for j in range(n+1):
-                a[i,j]=w[i]*w[j]*dFxm[j].T*dFxm[i]
-        betta=(w0**2*numpy.matrix(numpy.identity(n+1))+a).I
+    @staticmethod
+    def _matvec(v, alpha, cs, ds):
+        axpy, scal, dotc = blas.get_blas_funcs(['axpy', 'scal', 'dotc'],
+                                               cs[:1] + [v])
+        w = alpha * v
+        for c, d in zip(cs, ds):
+            a = dotc(d, v)
+            w = axpy(c, w, w.size, a)
+        return w
 
-        if verbose:
-            print "%d:  |F(x)|=%.3f"%(n, norm(Fxm))
-    return xm.flat
+    @staticmethod
+    def _solve(v, alpha, cs, ds):
+        """Evaluate w = M^-1 v"""
+        if len(cs) == 0:
+            return v/alpha
 
-def broyden1(F, xin, iter=10, alpha=0.1, verbose = False):
-    """Broyden's first method.
+        # (B + C D^H)^-1 = B^-1 - B^-1 C (I + D^H B^-1 C)^-1 D^H B^-1
 
-    Updates Jacobian and computes inv(J) by a matrix inversion at every
-    iteration. It's very slow.
+        axpy, dotc = blas.get_blas_funcs(['axpy', 'dotc'], cs[:1] + [v])
 
-    The best norm |F(x)|=0.005 achieved in ~45 iterations.
+        c0 = cs[0]
+        A = alpha * np.identity(len(cs), dtype=c0.dtype)
+        for i, d in enumerate(ds):
+            for j, c in enumerate(cs):
+                A[i,j] += dotc(d, c)
+
+        q = np.zeros(len(cs), dtype=c0.dtype)
+        for j, d in enumerate(ds):
+            q[j] = dotc(d, v)
+        q /= alpha
+        q = solve(A, q)
+
+        w = v/alpha
+        for c, qc in zip(cs, q):
+            w = axpy(c, w, w.size, -qc)
+
+        return w
+
+    def matvec(self, v):
+        """Evaluate w = M v"""
+        if self.collapsed is not None:
+            return np.dot(self.collapsed, v)
+        return LowRankMatrix._matvec(v, self.alpha, self.cs, self.ds)
+
+    def rmatvec(self, v):
+        """Evaluate w = M^H v"""
+        if self.collapsed is not None:
+            return np.dot(self.collapsed.T.conj(), v)
+        return LowRankMatrix._matvec(v, np.conj(self.alpha), self.ds, self.cs)
+
+    def solve(self, v, tol=0):
+        """Evaluate w = M^-1 v"""
+        if self.collapsed is not None:
+            return solve(self.collapsed, v)
+        return LowRankMatrix._solve(v, self.alpha, self.cs, self.ds)
+
+    def rsolve(self, v, tol=0):
+        """Evaluate w = M^-H v"""
+        if self.collapsed is not None:
+            return solve(self.collapsed.T.conj(), v)
+        return LowRankMatrix._solve(v, np.conj(self.alpha), self.ds, self.cs)
+
+    def append(self, c, d):
+        if self.collapsed is not None:
+            self.collapsed += c[:,None] * d[None,:].conj()
+            return
+
+        self.cs.append(c)
+        self.ds.append(d)
+
+        if len(self.cs) > c.size:
+            self.collapse()
+
+    def __array__(self):
+        if self.collapsed is not None:
+            return self.collapsed
+
+        Gm = self.alpha*np.identity(self.n, dtype=self.dtype)
+        for c, d in zip(self.cs, self.ds):
+            Gm += c[:,None]*d[None,:].conj()
+        return Gm
+
+    def collapse(self):
+        """Collapse the low-rank matrix to a full-rank one."""
+        self.collapsed = np.array(self)
+        self.cs = None
+        self.ds = None
+        self.alpha = None
+
+    def restart_reduce(self, rank):
+        """
+        Reduce the rank of the matrix by dropping all vectors.
+        """
+        if self.collapsed is not None:
+            return
+        assert rank > 0
+        if len(self.cs) > rank:
+            del self.cs[:]
+            del self.ds[:]
+
+    def simple_reduce(self, rank):
+        """
+        Reduce the rank of the matrix by dropping oldest vectors.
+        """
+        if self.collapsed is not None:
+            return
+        assert rank > 0
+        while len(self.cs) > rank:
+            del self.cs[0]
+            del self.ds[0]
+
+    def svd_reduce(self, max_rank, to_retain=None):
+        """
+        Reduce the rank of the matrix by retaining some SVD components.
+
+        This corresponds to the \"Broyden Rank Reduction Inverse\"
+        algorithm described in [vR]_.
+
+        Note that the SVD decomposition can be done by solving only a
+        problem whose size is the effective rank of this matrix, which
+        is viable even for large problems.
+
+        Parameters
+        ----------
+        max_rank : int
+            Maximum rank of this matrix after reduction.
+        to_retain : int, optional
+            Number of SVD components to retain when reduction is done
+            (i.e., rank > max_rank). Default is ``max_rank - 2``.
+
+        References
+        ----------
+        .. [vR] B.A. van der Rotten, PhD thesis,
+           \"A limited memory Broyden method to solve high-dimensional
+           systems of nonlinear equations\". Mathematisch Instituut,
+           Universiteit Leiden, The Netherlands (2003).
+           
+           http://www.math.leidenuniv.nl/scripties/Rotten.pdf
+
+        """
+        if self.collapsed is not None:
+            return
+
+        p = max_rank
+        if to_retain is not None:
+            q = to_retain
+        else:
+            q = p - 2
+
+        if self.cs:
+            p = min(p, len(self.cs[0]))
+        q = max(0, min(q, p-1))
+
+        m = len(self.cs)
+        if m < p:
+            # nothing to do
+            return
+
+        C = np.array(self.cs).T
+        D = np.array(self.ds).T
+
+        D, R = qr(D, mode='qr', econ=True)
+        C = dot(C, R.T.conj())
+
+        U, S, WH = svd(C, full_matrices=False, compute_uv=True)
+
+        C = dot(C, inv(WH))
+        D = dot(D, WH.T.conj())
+
+        for k in xrange(q):
+            self.cs[k] = C[:,k].copy()
+            self.ds[k] = D[:,k].copy()
+
+        del self.cs[q:]
+        del self.ds[q:]
+
+_doc_parts['broyden_params'] = """
+    alpha : float, optional
+        Initial guess for the Jacobian is (-1/alpha).
+    reduction_method : str or tuple, optional
+        Method used in ensuring that the rank of the Broyden matrix
+        stays low. Can either be a string giving the name of the method,
+        or a tuple of the form ``(method, param1, param2, ...)``
+        that gives the name of the method and values for additional parameters.
+
+        Methods available:
+            - ``restart``: drop all matrix columns. Has no extra parameters.
+            - ``simple``: drop oldest matrix column. Has no extra parameters.
+            - ``svd``: keep only the most significant SVD components.
+              Extra parameters:
+                  - ``to_retain``: number of SVD components to retain when
+                    rank reduction is done. Default is ``max_rank - 2``.
+    max_rank : int, optional
+        Maximum rank for the Broyden matrix.
+        Default is infinity (i.e., no rank reduction).
+    """.strip()
+
+class BroydenFirst(GenericBroyden):
+    r"""
+    Find a root of a function, using Broyden's first Jacobian approximation.
+
+    This method is also known as \"Broyden's good method\".
+
+    Parameters
+    ----------
+    %(params_basic)s
+    %(broyden_params)s
+    %(params_extra)s
+
+    Notes
+    -----
+    This algorithm implements the inverse Jacobian Quasi-Newton update
+
+    .. math:: H_+ = H + (dx - H df) dx^\dagger H / ( dx^\dagger H df)
+
+    which corresponds to Broyden's first Jacobian update
+
+    .. math:: J_+ = J + (df - J dx) dx^\dagger / dx^\dagger dx
+
+
+    References
+    ----------
+    .. [vR] B.A. van der Rotten, PhD thesis,
+       \"A limited memory Broyden method to solve high-dimensional
+       systems of nonlinear equations\". Mathematisch Instituut,
+       Universiteit Leiden, The Netherlands (2003).
+
+       http://www.math.leidenuniv.nl/scripties/Rotten.pdf
+
     """
-    xm=numpy.matrix(xin).T
-    Fxm=myF(F,xm)
-    Jm=-1/alpha*numpy.matrix(numpy.identity(len(xin)))
 
-    for n in range(iter):
-        deltaxm=solve(-Jm,Fxm)
-        #!!!! What the fuck?!
-        #xm+=deltaxm
-        xm=xm+deltaxm
-        Fxm1=myF(F,xm)
-        deltaFxm=Fxm1-Fxm
-        Fxm=Fxm1
-        Jm=Jm+(deltaFxm-Jm*deltaxm)*deltaxm.T/norm(deltaxm)**2
-        if verbose:
-            print "%d:  |F(x)|=%.3f"%(n, norm(Fxm))
-    return xm.flat
+    def __init__(self, alpha=None, reduction_method='restart', max_rank=None):
+        GenericBroyden.__init__(self)
+        self.alpha = alpha
+        self.Gm = None
+        
+        if max_rank is None:
+            max_rank = np.inf
+        self.max_rank = max_rank
 
-def broyden1_modified(F, xin, iter=10, alpha=0.1, verbose = False):
-    """Broyden's first method, modified by O. Certik.
+        if isinstance(reduction_method, str):
+            reduce_params = ()
+        else:
+            reduce_params = reduction_method[1:]
+            reduction_method = reduction_method[0]
+        reduce_params = (max_rank - 1,) + reduce_params
 
-    Updates inverse Jacobian using some matrix identities at every iteration,
-    its faster then newton_slow, but still not optimal.
+        if reduction_method == 'svd':
+            self._reduce = lambda: self.Gm.svd_reduce(*reduce_params)
+        elif reduction_method == 'simple':
+            self._reduce = lambda: self.Gm.simple_reduce(*reduce_params)
+        elif reduction_method == 'restart':
+            self._reduce = lambda: self.Gm.restart_reduce(*reduce_params)
+        else:
+            raise ValueError("Unknown rank reduction method '%s'" %
+                             reduction_method)
 
-    The best norm |F(x)|=0.005 achieved in ~45 iterations.
+    def setup(self, x, F, func):
+        GenericBroyden.setup(self, x, F, func)
+        self.Gm = LowRankMatrix(-self.alpha, self.shape[0], self.dtype)
+
+    def todense(self):
+        return inv(self.Gm)
+
+    def solve(self, f, tol=0):
+        r = self.Gm.matvec(f)
+        if not np.isfinite(r).all():
+            # singular; reset the Jacobian approximation
+            self.setup(self.last_x, self.last_f, self.func)
+        return self.Gm.matvec(f)
+
+    def matvec(self, f):
+        return self.Gm.solve(f)
+
+    def rsolve(self, f, tol=0):
+        return self.Gm.rmatvec(f)
+
+    def rmatvec(self, f):
+        return self.Gm.rsolve(f)
+
+    def _update(self, x, f, dx, df, dx_norm, df_norm):
+        self._reduce() # reduce first to preserve secant condition
+
+        v = self.Gm.rmatvec(dx)
+        c = dx - self.Gm.matvec(df)
+        d = v / vdot(df, v)
+
+        self.Gm.append(c, d)
+
+
+class BroydenSecond(BroydenFirst):
     """
-    def inv(A,u,v):
+    Find a root of a function, using Broyden\'s second Jacobian approximation.
 
-        #interesting is that this
-        #return (A.I+u*v.T).I
-        #is more stable than
-        #return A-A*u*v.T*A/float(1+v.T*A*u)
-        Au=A*u
-        return A-Au*(v.T*A)/float(1+v.T*Au)
-    xm=numpy.matrix(xin).T
-    Fxm=myF(F,xm)
-    Jm=alpha*numpy.matrix(numpy.identity(len(xin)))
-    for n in range(iter):
-        deltaxm=Jm*Fxm
-        xm=xm+deltaxm
-        Fxm1=myF(F,xm)
-        deltaFxm=Fxm1-Fxm
-        Fxm=Fxm1
-#        print "-------------",norm(deltaFxm),norm(deltaxm)
-        deltaFxm/=norm(deltaxm)
-        deltaxm/=norm(deltaxm)
-        Jm=inv(Jm+deltaxm*deltaxm.T*Jm,-deltaFxm,deltaxm)
+    This method is also known as \"Broyden's bad method\".
 
-        if verbose:
-            print "%d:  |F(x)|=%.3f"%(n, norm(Fxm))
-    return xm
+    Parameters
+    ----------
+    %(params_basic)s
+    %(broyden_params)s
+    %(params_extra)s
 
-def vackar(F, xin, iter=10, alpha=0.1, verbose = False):
-    """J=diag(d1,d2,...,dN)
+    Notes
+    -----
+    This algorithm implements the inverse Jacobian Quasi-Newton update
 
-    The best norm |F(x)|=0.005 achieved in ~110 iterations.
+    .. math:: H_+ = H + (dx - H df) df^\dagger / ( df^\dagger df)
+
+    corresponding to Broyden's second method.
+
+    References
+    ----------
+    .. [vR] B.A. van der Rotten, PhD thesis,
+       \"A limited memory Broyden method to solve high-dimensional
+       systems of nonlinear equations\". Mathematisch Instituut,
+       Universiteit Leiden, The Netherlands (2003).
+
+       http://www.math.leidenuniv.nl/scripties/Rotten.pdf
+
     """
-    def myF(F,xm):
-        return numpy.array(F(tuple(xm.flat))).T
-    xm=numpy.array(xin)
-    Fxm=myF(F,xm)
-    d=1/alpha*numpy.ones(len(xin))
-    Jm=numpy.matrix(numpy.diag(d))
 
-    for n in range(iter):
-        deltaxm=1/d*Fxm
-        xm=xm+deltaxm
-        Fxm1=myF(F,xm)
-        deltaFxm=Fxm1-Fxm
-        Fxm=Fxm1
-        d=d-(deltaFxm+d*deltaxm)*deltaxm/norm(deltaxm)**2
-        if verbose:
-            print "%d:  |F(x)|=%.3f"%(n, norm(Fxm))
-    return xm
+    def _update(self, x, f, dx, df, dx_norm, df_norm):
+        self._reduce() # reduce first to preserve secant condition
 
-def linearmixing(F,xin, iter=10, alpha=0.1, verbose = False):
-    """J=-1/alpha
+        v = df
+        c = dx - self.Gm.matvec(df)
+        d = v / df_norm**2
+        self.Gm.append(c, d)
 
-    The best norm |F(x)|=0.005 achieved in ~140 iterations.
+
+#------------------------------------------------------------------------------
+# Broyden-like (restricted memory)
+#------------------------------------------------------------------------------
+
+class Anderson(GenericBroyden):
     """
-    def myF(F,xm):
-        return numpy.array(F(tuple(xm.flat))).T
-    xm=numpy.array(xin)
-    Fxm=myF(F,xm)
-    for n in range(iter):
-        deltaxm=alpha*Fxm
-        xm=xm+deltaxm
-        Fxm1=myF(F,xm)
-        deltaFxm=Fxm1-Fxm
-        Fxm=Fxm1
-        if verbose:
-            print "%d: |F(x)|=%.3f" %(n,norm(Fxm))
+    Find a root of a function, using (extended) Anderson mixing.
 
-    return xm
+    The Jacobian is formed from a 'best' solution in the space
+    spanned by the last `M` vectors. As a result, only an MxM matrix
+    inversion and MxN multiplications are required. [Ey]_
 
-def excitingmixing(F,xin,iter=10,alpha=0.1,alphamax=1.0, verbose = False):
-    """J=-1/alpha
+    Parameters
+    ----------
+    %(params_basic)s
+    alpha : float, optional
+        Initial guess for the Jacobian is (-1/alpha).
+    M : int, optional
+        Number of previous vectors to retain. Defaults to 5.
+    w0 : float, optional
+        Regularization parameter for numerical stability.
+        Compared to unity, good values of the order of 0.01.
+    %(params_extra)s
 
-    The best norm |F(x)|=0.005 achieved in ~140 iterations.
+    References
+    ----------
+    .. [Ey] V. Eyert, J. Comp. Phys., 124, 271 (1996).
+
     """
-    def myF(F,xm):
-        return numpy.array(F(tuple(xm.flat))).T
-    xm=numpy.array(xin)
-    beta=numpy.array([alpha]*len(xm))
-    Fxm=myF(F,xm)
-    for n in range(iter):
-        deltaxm=beta*Fxm
-        xm=xm+deltaxm
-        Fxm1=myF(F,xm)
-        deltaFxm=Fxm1-Fxm
-        for i in range(len(xm)):
-            if Fxm1[i]*Fxm[i] > 0:
-                beta[i]=beta[i]+alpha
-                if beta[i] > alphamax:
-                    beta[i] = alphamax
-            else:
-                beta[i]=alpha
-        Fxm=Fxm1
-        if verbose:
-            print "%d: |F(x)|=%.3f" %(n,norm(Fxm))
 
-    return xm
+    # Note:
+    #
+    # Anderson method maintains a rank M approximation of the inverse Jacobian,
+    #
+    #     J^-1 v ~ -v*alpha + (dX + alpha dF) A^-1 dF^H v
+    #     A      = W + dF^H dF
+    #     W      = w0^2 diag(dF^H dF)
+    #
+    # so that for w0 = 0 the secant condition applies for the last M iterates, i.e.,
+    #
+    #     J^-1 df_j = dx_j
+    #
+    # for all j = 0 ... M-1.
+    #
+    # Moreover, (from Sherman-Morrison-Woodbury formula)
+    #
+    #    J v ~ [ b I - b^2 C (I + b dF^H A^-1 C)^-1 dF^H ] v
+    #    C   = (dX + alpha dF) A^-1
+    #    b   = -1/alpha
+    #
+    # and after simplification
+    #
+    #    J v ~ -v/alpha + (dX/alpha + dF) (dF^H dX - alpha W)^-1 dF^H v
+    #
+
+    def __init__(self, alpha=None, w0=0.01, M=5):
+        GenericBroyden.__init__(self)
+        self.alpha = alpha
+        self.M = M
+        self.dx = []
+        self.df = []
+        self.gamma = None
+        self.w0 = w0
+
+    def solve(self, f, tol=0):
+        dx = -self.alpha*f
+
+        n = len(self.dx)
+        if n == 0:
+            return dx
+
+        df_f = np.empty(n, dtype=f.dtype)
+        for k in xrange(n):
+            df_f[k] = vdot(self.df[k], f)
+
+        try:
+            gamma = solve(self.a, df_f)
+        except LinAlgError:
+            # singular; reset the Jacobian approximation
+            del self.dx[:]
+            del self.df[:]
+            return dx
+
+        for m in xrange(n):
+            dx += gamma[m]*(self.dx[m] + self.alpha*self.df[m])
+        return dx
+
+    def matvec(self, f):
+        dx = -f/self.alpha
+
+        n = len(self.dx)
+        if n == 0:
+            return dx
+
+        df_f = np.empty(n, dtype=f.dtype)
+        for k in xrange(n):
+            df_f[k] = vdot(self.df[k], f)
+
+        b = np.empty((n, n), dtype=f.dtype)
+        for i in xrange(n):
+            for j in xrange(n):
+                b[i,j] = vdot(self.df[i], self.dx[j])
+                if i == j and self.w0 != 0:
+                    b[i,j] -= vdot(self.df[i], self.df[i])*self.w0**2*self.alpha
+        gamma = solve(b, df_f)
+
+        for m in xrange(n):
+            dx += gamma[m]*(self.df[m] + self.dx[m]/self.alpha)
+        return dx
+
+    def _update(self, x, f, dx, df, dx_norm, df_norm):
+        if self.M == 0:
+            return
+        
+        self.dx.append(dx)
+        self.df.append(df)
+
+        while len(self.dx) > self.M:
+            self.dx.pop(0)
+            self.df.pop(0)
+
+        n = len(self.dx)
+        a = np.zeros((n, n), dtype=f.dtype)
+
+        for i in xrange(n):
+            for j in xrange(i, n):
+                if i == j:
+                    wd = self.w0**2
+                else:
+                    wd = 0
+                a[i,j] = (1+wd)*vdot(self.df[i], self.df[j])
+
+        a += np.triu(a, 1).T.conj()
+        self.a = a
+
+#------------------------------------------------------------------------------
+# Simple iterations
+#------------------------------------------------------------------------------
+
+class DiagBroyden(GenericBroyden):
+    """
+    Find a root of a function, using diagonal Broyden Jacobian approximation.
+    
+    The Jacobian approximation is derived from previous iterations, by
+    retaining only the diagonal of Broyden matrices.
+
+    .. warning::
+
+       This algorithm may be useful for specific problems, but whether
+       it will work may depend strongly on the problem.
+
+    Parameters
+    ----------
+    %(params_basic)s
+    alpha : float, optional
+        Initial guess for the Jacobian is (-1/alpha).
+    %(params_extra)s
+    """
+
+    def __init__(self, alpha=None):
+        GenericBroyden.__init__(self)
+        self.alpha = alpha
+
+    def setup(self, x, F, func):
+        GenericBroyden.setup(self, x, F, func)
+        self.d = np.ones((self.shape[0],), dtype=self.dtype) / self.alpha
+
+    def solve(self, f, tol=0):
+        return -f / self.d
+
+    def matvec(self, f):
+        return -f * self.d
+
+    def rsolve(self, f, tol=0):
+        return -f / self.d.conj()
+
+    def rmatvec(self, f):
+        return -f * self.d.conj()
+
+    def todense(self):
+        return np.diag(-self.d)
+
+    def _update(self, x, f, dx, df, dx_norm, df_norm):
+        self.d -= (df + self.d*dx)*dx/dx_norm**2
+
+class LinearMixing(GenericBroyden):
+    """
+    Find a root of a function, using a scalar Jacobian approximation.
+
+    .. warning::
+
+       This algorithm may be useful for specific problems, but whether
+       it will work may depend strongly on the problem.
+
+    Parameters
+    ----------
+    %(params_basic)s
+    alpha : float, optional
+        The Jacobian approximation is (-1/alpha).
+    %(params_extra)s
+    """
+
+    def __init__(self, alpha=None):
+        GenericBroyden.__init__(self)
+        self.alpha = alpha
+
+    def solve(self, f, tol=0):
+        return -f*self.alpha
+
+    def matvec(self, f):
+        return -f/self.alpha
+
+    def rsolve(self, f, tol=0):
+        return -f*np.conj(self.alpha)
+
+    def rmatvec(self, f):
+        return -f/np.conj(self.alpha)
+
+    def todense(self):
+        return np.diag(-np.ones(self.shape[0])/self.alpha)
+
+    def _update(self, x, f, dx, df, dx_norm, df_norm):
+        pass
+
+class ExcitingMixing(GenericBroyden):
+    """
+    Find a root of a function, using a tuned diagonal Jacobian approximation.
+
+    The Jacobian matrix is diagonal and is tuned on each iteration.
+
+    .. warning::
+
+       This algorithm may be useful for specific problems, but whether
+       it will work may depend strongly on the problem.
+
+    Parameters
+    ----------
+    %(params_basic)s
+    alpha : float, optional
+        Initial Jacobian approximation is (-1/alpha).
+    alphamax : float, optional
+        The entries of the diagonal Jacobian are kept in the range
+        ``[alpha, alphamax]``.
+    %(params_extra)s
+    """
+
+    def __init__(self, alpha=None, alphamax=1.0):
+        GenericBroyden.__init__(self)
+        self.alpha = alpha
+        self.alphamax = alphamax
+        self.beta = None
+
+    def setup(self, x, F, func):
+        GenericBroyden.setup(self, x, F, func)
+        self.beta = self.alpha * np.ones((self.shape[0],), dtype=self.dtype)
+
+    def solve(self, f, tol=0):
+        return -f*self.beta
+
+    def matvec(self, f):
+        return -f/self.beta
+
+    def rsolve(self, f, tol=0):
+        return -f*self.beta.conj()
+
+    def rmatvec(self, f):
+        return -f/self.beta.conj()
+
+    def todense(self):
+        return np.diag(-1/self.beta)
+
+    def _update(self, x, f, dx, df, dx_norm, df_norm):
+        incr = f*self.last_f > 0
+        self.beta[incr] += self.alpha
+        self.beta[~incr] = self.alpha
+        np.clip(self.beta, 0, self.alphamax, out=self.beta)
+
+
+#------------------------------------------------------------------------------
+# Iterative/Krylov approximated Jacobians
+#------------------------------------------------------------------------------
+
+class KrylovJacobian(Jacobian):
+    r"""
+    Find a root of a function, using Krylov approximation for inverse Jacobian.
+
+    This method is suitable for solving large-scale problems.
+
+    Parameters
+    ----------
+    %(params_basic)s
+    rdiff : float, optional
+        Relative step size to use in numerical differentiation.
+    method : {'lgmres', 'gmres', 'bicgstab', 'cgs', 'minres'} or function
+        Krylov method to use to approximate the Jacobian.
+        Can be a string, or a function implementing the same interface as
+        the iterative solvers in `scipy.sparse.linalg`.
+
+        The default is `scipy.sparse.linalg.lgmres`.
+    inner_M : LinearOperator or InverseJacobian
+        Preconditioner for the inner Krylov iteration.
+        Note that you can also use inverse Jacobians as (adaptive)
+        preconditioners. For example,
+
+        >>> jac = BroydenFirst()
+        >>> kjac = KrylovJacobian(inner_M=jac.inverse)
+    inner_tol, inner_maxiter, ...
+        Parameters to pass on to the "inner" Krylov solver.
+        See `scipy.sparse.linalg.gmres` for details.
+    outer_k : int, optional
+        Size of the subspace kept across LGMRES nonlinear iterations.
+        See `scipy.sparse.linalg.lgmres` for details.
+    %(params_extra)s
+
+    See Also
+    --------
+    scipy.sparse.linalg.gmres
+    scipy.sparse.linalg.lgmres
+
+    Notes
+    -----
+    This function implements a Newton-Krylov solver. The basic idea is
+    to apply the inverse of the Jacobian with an iterative Krylov
+    method, rather than forming the Jacobian explicitly. These methods
+    require only Jacobian-vector products, which are conveniently
+    approximated by numerical differentiation:
+
+    .. math:: J v \approx (f(x + \omega v/\|v\|) - f(x)) / \omega
+
+    Due to the use of iterative matrix inverses, these methods can
+    deal with large nonlinear problems.
+
+    Scipy's `scipy.sparse.linalg` module offers a selection of Krylov
+    solvers to choose from. The default here is `lgmres`, which is a
+    variant of restarted GMRES iteration that reuses some of the
+    information obtained in the previous Newton steps to invert
+    Jacobians in subsequent steps.
+
+    For a review on Newton-Krylov methods, see for example [KK]_,
+    and for the LGMRES sparse inverse method, see [BJM]_.
+
+    References
+    ----------
+    .. [KK] D.A. Knoll and D.E. Keyes, J. Comp. Phys. 193, 357 (2004).
+    .. [BJM] A.H. Baker and E.R. Jessup and T. Manteuffel,
+             SIAM J. Matrix Anal. Appl. 26, 962 (2005).
+
+    """
+
+    def __init__(self, rdiff=None, method='lgmres', inner_maxiter=20,
+                 inner_M=None, outer_k=10, **kw):
+        self.preconditioner = inner_M
+        self.rdiff = rdiff
+        self.method = dict(
+            bicgstab=scipy.sparse.linalg.bicgstab,
+            gmres=scipy.sparse.linalg.gmres,
+            lgmres=scipy.sparse.linalg.lgmres,
+            cgs=scipy.sparse.linalg.cgs,
+            minres=scipy.sparse.linalg.minres,
+            ).get(method, method)
+
+        self.method_kw = dict(maxiter=inner_maxiter, M=self.preconditioner)
+
+        if self.method is scipy.sparse.linalg.gmres:
+            # Replace GMRES's outer iteration with Newton steps
+            self.method_kw['restrt'] = inner_maxiter
+            self.method_kw['maxiter'] = 1
+        elif self.method is scipy.sparse.linalg.lgmres:
+            self.method_kw['outer_k'] = outer_k
+            # Replace LGMRES's outer iteration with Newton steps
+            self.method_kw['maxiter'] = 1
+            # Carry LGMRES's `outer_v` vectors across nonlinear iterations
+            self.method_kw.setdefault('outer_v', [])
+            # But don't carry the corresponding Jacobian*v products, in case
+            # the Jacobian changes a lot in the nonlinear step
+            #
+            # XXX: some trust-region inspired ideas might be more efficient...
+            #      See eg. Brown & Saad. But needs to be implemented separately
+            #      since it's not an inexact Newton method.
+            self.method_kw.setdefault('store_outer_Av', False)
+
+        for key, value in kw.items():
+            if not key.startswith('inner_'):
+                raise ValueError("Unknown parameter %s" % key)
+            self.method_kw[key[6:]] = value
+
+    def _update_diff_step(self):
+        mx = abs(self.x0).max()
+        mf = abs(self.f0).max()
+        self.omega = self.rdiff * max(1, mx) / max(1, mf)
+
+    def matvec(self, v):
+        nv = norm(v)
+        if nv == 0:
+            return 0*v
+        sc = self.omega / nv
+        r = (self.func(self.x0 + sc*v) - self.f0) / sc
+        if not np.all(np.isfinite(r)) and np.all(np.isfinite(v)):
+            raise ValueError('Function returned non-finite results')
+        return r
+
+    def solve(self, rhs, tol=0):
+        sol, info = self.method(self.op, rhs, tol=tol, **self.method_kw)
+        return sol
+
+    def update(self, x, f):
+        self.x0 = x
+        self.f0 = f
+        self._update_diff_step()
+
+        # Update also the preconditioner, if possible
+        if self.preconditioner is not None:
+            if hasattr(self.preconditioner, 'update'):
+                self.preconditioner.update(x, f)
+
+    def setup(self, x, f, func):
+        Jacobian.setup(self, x, f, func)
+        self.x0 = x
+        self.f0 = f
+        self.op = scipy.sparse.linalg.aslinearoperator(self)
+
+        if self.rdiff is None:
+            self.rdiff = np.finfo(x.dtype).eps ** (1./2)
+
+        self._update_diff_step()
+
+
+        # Setup also the preconditioner, if possible
+        if self.preconditioner is not None:
+            if hasattr(self.preconditioner, 'setup'):
+                self.preconditioner.setup(x, f, func)
+
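
A usage sketch for the `newton_krylov` wrapper defined further below, which
drives this class. The residual system is hypothetical (decoupled, with
known roots sqrt(c)), chosen so the default LGMRES inner iteration
converges quickly; it is not part of this commit:

    import numpy as np
    from scipy.optimize import newton_krylov

    def residual(x):
        c = np.linspace(1.0, 2.0, 8)
        return x**2 - c                # roots: x = sqrt(c)

    x = newton_krylov(residual, np.ones(8), method='lgmres', f_tol=1e-10)
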
+
+#------------------------------------------------------------------------------
+# Wrapper functions
+#------------------------------------------------------------------------------
+
+def _nonlin_wrapper(name, jac):
+    """
+    Construct a solver wrapper with given name and jacobian approx.
+
+    It inspects the keyword arguments of ``jac.__init__``, and makes the
+    same arguments available in the wrapper function, in addition to the
+    keyword arguments of `nonlin_solve`.
+
+    """
+    import inspect
+    args, varargs, varkw, defaults = inspect.getargspec(jac.__init__)
+    kwargs = zip(args[-len(defaults):], defaults)
+    kw_str = ", ".join(["%s=%r" % (k, v) for k, v in kwargs])
+    if kw_str:
+        kw_str = ", " + kw_str
+    kwkw_str = ", ".join(["%s=%s" % (k, k) for k, v in kwargs])
+    if kwkw_str:
+        kwkw_str = kwkw_str + ", "
+
+    # Construct the wrapper function so that its keyword arguments are
+    # visible in pydoc.help etc.
+    wrapper = """
+def %(name)s(F, xin, iter=None%(kw)s, verbose=False, maxiter=None,
+             f_tol=None, f_rtol=None, x_tol=None, x_rtol=None,
+             tol_norm=None, line_search='armijo', callback=None, **kw):
+    jac = %(jac)s(%(kwkw)s **kw)
+    return nonlin_solve(F, xin, jac, iter, verbose, maxiter,
+                        f_tol, f_rtol, x_tol, x_rtol, tol_norm, line_search,
+                        callback)
+"""
+
+    wrapper = wrapper % dict(name=name, kw=kw_str, jac=jac.__name__,
+                             kwkw=kwkw_str)
+    ns = {}
+    ns.update(globals())
+    exec wrapper in ns
+    func = ns[name]
+    func.__doc__ = jac.__doc__
+    _set_doc(func)
+    return func
+
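
For reference, the source that this template generates for, e.g.,
``linearmixing`` can be reconstructed as follows (modulo whitespace):

    def linearmixing(F, xin, iter=None, alpha=None, verbose=False,
                     maxiter=None, f_tol=None, f_rtol=None, x_tol=None,
                     x_rtol=None, tol_norm=None, line_search='armijo',
                     callback=None, **kw):
        jac = LinearMixing(alpha=alpha, **kw)
        return nonlin_solve(F, xin, jac, iter, verbose, maxiter,
                            f_tol, f_rtol, x_tol, x_rtol, tol_norm,
                            line_search, callback)
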
+broyden1 = _nonlin_wrapper('broyden1', BroydenFirst)
+broyden2 = _nonlin_wrapper('broyden2', BroydenSecond)
+anderson = _nonlin_wrapper('anderson', Anderson)
+linearmixing = _nonlin_wrapper('linearmixing', LinearMixing)
+diagbroyden = _nonlin_wrapper('diagbroyden', DiagBroyden)
+excitingmixing = _nonlin_wrapper('excitingmixing', ExcitingMixing)
+newton_krylov = _nonlin_wrapper('newton_krylov', KrylovJacobian)
+
+
+# Deprecated functions
+
+@np.deprecate
+def broyden_generalized(*a, **kw):
+    """Use *anderson(..., w0=0)* instead"""
+    kw.setdefault('w0', 0)
+    return anderson(*a, **kw)
+
+@np.deprecate
+def broyden1_modified(*a, **kw):
+    """Use `broyden1` instead"""
+    return broyden1(*a, **kw)
+
+@np.deprecate
+def broyden_modified(*a, **kw):
+    """Use `anderson` instead"""
+    return anderson(*a, **kw)
+
+@np.deprecate
+def anderson2(*a, **kw):
+    """Use `anderson` instead"""
+    return anderson(*a, **kw)
+
+@np.deprecate
+def broyden3(*a, **kw):
+    """Use `broyden2` instead"""
+    return broyden2(*a, **kw)
+
+@np.deprecate
+def vackar(*a, **kw):
+    """Use `diagbroyden` instead"""
+    return diagbroyden(*a, **kw)
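
Migration for callers of the deprecated names is one-to-one; for example:

    # before (now emits a DeprecationWarning):
    x = vackar(F, xin, iter=10, alpha=1)
    x = broyden_generalized(F, xin, iter=60, alpha=0.5)

    # after:
    x = diagbroyden(F, xin, iter=10, alpha=1)
    x = anderson(F, xin, iter=60, alpha=0.5, w0=0)
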

Modified: trunk/scipy/optimize/tests/test_minpack.py
===================================================================
--- trunk/scipy/optimize/tests/test_minpack.py	2010-09-04 22:53:26 UTC (rev 6674)
+++ trunk/scipy/optimize/tests/test_minpack.py	2010-09-04 22:53:52 UTC (rev 6675)
@@ -9,7 +9,7 @@
 from scipy import optimize
 from scipy.optimize.minpack import fsolve, leastsq, curve_fit
 
-class TestFSolve(TestCase):
+class TestFSolve(object):
     def pressure_network(self, flow_rates, Qtot, k):
         """Evaluate non-linear equation system representing
         the pressures and flows in a system of n parallel pipes::

Modified: trunk/scipy/optimize/tests/test_nonlin.py
===================================================================
--- trunk/scipy/optimize/tests/test_nonlin.py	2010-09-04 22:53:26 UTC (rev 6674)
+++ trunk/scipy/optimize/tests/test_nonlin.py	2010-09-04 22:53:52 UTC (rev 6675)
@@ -6,89 +6,340 @@
 from numpy.testing import *
 
 from scipy.optimize import nonlin
-from numpy import matrix, diag
+from numpy import matrix, diag, dot
+from numpy.linalg import inv
+import numpy as np
 
+SOLVERS = [nonlin.anderson, nonlin.diagbroyden, nonlin.linearmixing,
+           nonlin.excitingmixing, nonlin.broyden1, nonlin.broyden2,
+           nonlin.newton_krylov]
+MUST_WORK = [nonlin.anderson, nonlin.broyden1, nonlin.broyden2,
+             nonlin.newton_krylov]
 
+#-------------------------------------------------------------------------------
+# Test problems
+#-------------------------------------------------------------------------------
+
 def F(x):
-    def p3(y):
-        return float(y.T*y)*y
-    try:
-        x=tuple(x.flat)
-    except:
+    x = np.asmatrix(x).T
+    d = matrix(diag([3,2,1.5,1,0.5]))
+    c = 0.01
+    f = -d*x - c*float(x.T*x)*x
+    return f
+F.xin = [1,1,1,1,1]
+F.KNOWN_BAD = []
+
+def F2(x):
+    return x
+F2.xin = [1,2,3,4,5,6]
+F2.KNOWN_BAD = [nonlin.linearmixing, nonlin.excitingmixing]
+
+def F3(x):
+    A = np.mat('-2 1 0; 1 -2 1; 0 1 -2')
+    b = np.mat('1 2 3')
+    return np.dot(A, x) - b
+F3.xin = [1,2,3]
+F3.KNOWN_BAD = []
+
+def F4_powell(x):
+    A = 1e4
+    return [A*x[0]*x[1] - 1, np.exp(-x[0]) + np.exp(-x[1]) - (1 + 1/A)]
+F4_powell.xin = [-1, -2]
+F4_powell.KNOWN_BAD = [nonlin.linearmixing, nonlin.excitingmixing,
+                       nonlin.diagbroyden]
+
+from test_minpack import TestFSolve as F5_class
+F5_object = F5_class()
+def F5(x):
+    return F5_object.pressure_network(x, 4, np.array([.5, .5, .5, .5]))
+F5.xin = [2., 0, 2, 0]
+F5.KNOWN_BAD = [nonlin.excitingmixing, nonlin.linearmixing, nonlin.diagbroyden]
+
+def F6(x):
+    x1, x2 = x
+    J0 = np.array([[ -4.256     ,  14.7       ],
+                [  0.8394989 ,   0.59964207]])
+    v = np.array([(x1 + 3) * (x2**5 - 7) + 3*6,
+                  np.sin(x2 * np.exp(x1) - 1)])
+    return -np.linalg.solve(J0, v)
+F6.xin = [-0.5, 1.4]
+F6.KNOWN_BAD = [nonlin.excitingmixing, nonlin.linearmixing, nonlin.diagbroyden]
+
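
A new test problem would follow the same convention -- a callable carrying
`xin` and `KNOWN_BAD` attributes -- for example (hypothetical, and the
KNOWN_BAD list would have to be filled in empirically):

    def F7(x):
        return -np.tanh(x)     # single root at x = 0
    F7.xin = [0.5]
    F7.KNOWN_BAD = []
    # ...and would then be added to the list in TestNonlin.test_problem.
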
+#-------------------------------------------------------------------------------
+# Tests
+#-------------------------------------------------------------------------------
+
+class TestNonlin(object):
+    """
+    Check the Broyden methods for a few test problems.
+
+    anderson, broyden1, broyden2, and newton_krylov are expected to succeed
+    for all test functions. Some of the others aren't -- solvers listed in a
+    problem's KNOWN_BAD are skipped (or marked as known failures if they
+    are in MUST_WORK).
+
+    """
+
+    def _check_func(self, f, func, f_tol=1e-2):
+        x = func(f, f.xin, f_tol=f_tol, maxiter=200, verbose=0)
+        assert np.absolute(f(x)).max() < f_tol
+
+    @dec.knownfailureif(True)
+    def _check_func_fail(self, *a, **kw):
         pass
-    x=matrix(x).T
 
-    d=matrix(diag([3,2,1.5,1,0.5]))
-    c=0.01
-    f=-d*x-c*p3(x)
+    def test_problem(self):
+        for f in [F, F2, F3, F4_powell, F5, F6]:
+            for func in SOLVERS:
+                if func in f.KNOWN_BAD:
+                    if func in MUST_WORK:
+                        yield self._check_func_fail, f, func
+                    continue
+                yield self._check_func, f, func
 
-    return tuple(f.flat)
 
-class TestNonlin(TestCase):
+class TestSecant(TestCase):
+    """Check that some Jacobian approximations satisfy the secant condition"""
+
+    xs = [np.array([1,2,3,4,5], float),
+          np.array([2,3,4,5,1], float),
+          np.array([3,4,5,1,2], float),
+          np.array([4,5,1,2,3], float),
+          np.array([9,1,9,1,3], float),
+          np.array([0,1,9,1,3], float),
+          np.array([5,5,7,1,1], float),
+          np.array([1,2,7,5,1], float),]
+    fs = [x**2 - 1 for x in xs]
+
+    def _check_secant(self, jac_cls, npoints=1, **kw):
+        """
+        Check that the given Jacobian approximation satisfies secant
+        conditions for the last `npoints` points.
+        """
+        jac = jac_cls(**kw)
+        jac.setup(self.xs[0], self.fs[0], None)
+        for j, (x, f) in enumerate(zip(self.xs[1:], self.fs[1:])):
+            jac.update(x, f)
+
+            for k in xrange(min(npoints, j+1)):
+                dx = self.xs[j-k+1] - self.xs[j-k]
+                df = self.fs[j-k+1] - self.fs[j-k]
+                assert np.allclose(dx, jac.solve(df))
+
+            # Check that the `npoints` secant bound is strict
+            if j >= npoints:
+                dx = self.xs[j-npoints+1] - self.xs[j-npoints]
+                df = self.fs[j-npoints+1] - self.fs[j-npoints]
+                assert not np.allclose(dx, jac.solve(df))
+
+    def test_broyden1(self):
+        self._check_secant(nonlin.BroydenFirst)
+        
+    def test_broyden2(self):
+        self._check_secant(nonlin.BroydenSecond)
+
+    def test_broyden1_update(self):
+        # Check that BroydenFirst update works as for a dense matrix
+        jac = nonlin.BroydenFirst(alpha=0.1)
+        jac.setup(self.xs[0], self.fs[0], None)
+
+        B = np.identity(5) * (-1/0.1)
+
+        for last_j, (x, f) in enumerate(zip(self.xs[1:], self.fs[1:])):
+            df = f - self.fs[last_j]
+            dx = x - self.xs[last_j]
+            B += (df - dot(B, dx))[:,None] * dx[None,:] / dot(dx, dx)
+            jac.update(x, f)
+            assert np.allclose(jac.todense(), B, rtol=1e-10, atol=1e-13)
+
+    def test_broyden2_update(self):
+        # Check that BroydenSecond update works as for a dense matrix
+        jac = nonlin.BroydenSecond(alpha=0.1)
+        jac.setup(self.xs[0], self.fs[0], None)
+
+        H = np.identity(5) * (-0.1)
+
+        for last_j, (x, f) in enumerate(zip(self.xs[1:], self.fs[1:])):
+            df = f - self.fs[last_j]
+            dx = x - self.xs[last_j]
+            H += (dx - dot(H, df))[:,None] * df[None,:] / dot(df, df)
+            jac.update(x, f)
+            assert np.allclose(jac.todense(), inv(H), rtol=1e-10, atol=1e-13)
+            
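
In formulas, the dense reference updates in the two tests above are
Broyden's "good" and "bad" methods, for the Jacobian approximation B and
the inverse-Jacobian approximation H respectively:

.. math::

   B_{k+1} = B_k + \frac{(\Delta f_k - B_k \Delta x_k)\,\Delta x_k^T}
                        {\Delta x_k^T \Delta x_k},
   \qquad
   H_{k+1} = H_k + \frac{(\Delta x_k - H_k \Delta f_k)\,\Delta f_k^T}
                        {\Delta f_k^T \Delta f_k}
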
+    def test_anderson(self):
+        # Anderson mixing (with w0=0) satisfies secant conditions
+        # for the last M iterates, see [Ey]_
+        #
+        # .. [Ey] V. Eyert, J. Comp. Phys., 124, 271 (1996).
+        self._check_secant(nonlin.Anderson, M=3, w0=0, npoints=3)
+
+class TestLinear(TestCase):
+    """Solve a linear equation;
+    some methods find the exact solution in a finite number of steps"""
+
+    def _check(self, jac, N, maxiter, complex=False, **kw):
+        np.random.seed(123)
+
+        A = np.random.randn(N, N)
+        if complex:
+            A = A + 1j*np.random.randn(N, N)
+        b = np.random.randn(N)
+        if complex:
+            b = b + 1j*np.random.randn(N)
+
+        def func(x):
+            return dot(A, x) - b
+
+        sol = nonlin.nonlin_solve(func, b*0, jac, maxiter=maxiter,
+                                  f_tol=1e-6, line_search=None, verbose=0)
+        assert np.allclose(dot(A, sol), b, atol=1e-6)
+
+    def test_broyden1(self):
+        # Broyden methods solve linear systems exactly in 2*N steps
+        self._check(nonlin.BroydenFirst(alpha=1.0), 20, 41, False)
+        self._check(nonlin.BroydenFirst(alpha=1.0), 20, 41, True)
+        
+    def test_broyden2(self):
+        # Broyden methods solve linear systems exactly in 2*N steps
+        self._check(nonlin.BroydenSecond(alpha=1.0), 20, 41, False)
+        self._check(nonlin.BroydenSecond(alpha=1.0), 20, 41, True)
+
+    def test_anderson(self):
+        # Anderson is rather similar to Broyden, if given enough storage space
+        self._check(nonlin.Anderson(M=50, alpha=1.0), 20, 29, False)
+        self._check(nonlin.Anderson(M=50, alpha=1.0), 20, 29, True)
+
+    def test_krylov(self):
+        # Krylov methods solve linear systems exactly in N inner steps
+        self._check(nonlin.KrylovJacobian, 20, 2, False, inner_m=10)
+        self._check(nonlin.KrylovJacobian, 20, 2, True, inner_m=10)
+
+
+class TestJacobianDotSolve(object):
+    """Check that solve/dot methods in Jacobian approximations are consistent"""
+
+    def _func(self, x):
+        return x**2 - 1 + np.dot(self.A, x)
+
+    def _check_dot(self, jac_cls, complex=False, tol=1e-6, **kw):
+        np.random.seed(123)
+
+        N = 7
+        def rand(*a):
+            q = np.random.rand(*a)
+            if complex:
+                q = q + 1j*np.random.rand(*a)
+            return q
+
+        def assert_close(a, b, msg):
+            d = abs(a - b).max()
+            f = tol + abs(b).max()*tol
+            if d > f:
+                raise AssertionError('%s: err %g' % (msg, d))
+
+        self.A = rand(N, N)
+
+        # initialize
+        x0 = np.random.rand(N)
+        jac = jac_cls(**kw)
+        jac.setup(x0, self._func(x0), self._func)
+
+        # check consistency
+        for k in xrange(2*N):
+            v = rand(N)
+
+            if hasattr(jac, '__array__'):
+                Jd = np.array(jac)
+                if hasattr(jac, 'solve'):
+                    Gv = jac.solve(v)
+                    Gv2 = np.linalg.solve(Jd, v)
+                    assert_close(Gv, Gv2, 'solve vs array')
+                if hasattr(jac, 'rsolve'):
+                    Gv = jac.rsolve(v)
+                    Gv2 = np.linalg.solve(Jd.T.conj(), v)
+                    assert_close(Gv, Gv2, 'rsolve vs array')
+                if hasattr(jac, 'matvec'):
+                    Jv = jac.matvec(v)
+                    Jv2 = np.dot(Jd, v)
+                    assert_close(Jv, Jv2, 'dot vs array')
+                if hasattr(jac, 'rmatvec'):
+                    Jv = jac.rmatvec(v)
+                    Jv2 = np.dot(Jd.T.conj(), v)
+                    assert_close(Jv, Jv2, 'rmatvec vs array')
+
+            if hasattr(jac, 'matvec') and hasattr(jac, 'solve'):
+                Jv = jac.matvec(v)
+                Jv2 = jac.solve(jac.matvec(Jv))
+                assert_close(Jv, Jv2, 'dot vs solve')
+
+            if hasattr(jac, 'rmatvec') and hasattr(jac, 'rsolve'):
+                Jv = jac.rmatvec(v)
+                Jv2 = jac.rmatvec(jac.rsolve(Jv))
+                assert_close(Jv, Jv2, 'rmatvec vs rsolve')
+
+            x = rand(N)
+            jac.update(x, self._func(x))
+
+    def test_broyden1(self):
+        self._check_dot(nonlin.BroydenFirst, complex=False)
+        self._check_dot(nonlin.BroydenFirst, complex=True)
+
+    def test_broyden2(self):
+        self._check_dot(nonlin.BroydenSecond, complex=False)
+        self._check_dot(nonlin.BroydenSecond, complex=True)
+
+    def test_anderson(self):
+        self._check_dot(nonlin.Anderson, complex=False)
+        self._check_dot(nonlin.Anderson, complex=True)
+
+    def test_diagbroyden(self):
+        self._check_dot(nonlin.DiagBroyden, complex=False)
+        self._check_dot(nonlin.DiagBroyden, complex=True)
+
+    def test_linearmixing(self):
+        self._check_dot(nonlin.LinearMixing, complex=False)
+        self._check_dot(nonlin.LinearMixing, complex=True)
+
+    def test_excitingmixing(self):
+        self._check_dot(nonlin.ExcitingMixing, complex=False)
+        self._check_dot(nonlin.ExcitingMixing, complex=True)
+
+    def test_krylov(self):
+        self._check_dot(nonlin.KrylovJacobian, complex=False, tol=1e-4)
+        self._check_dot(nonlin.KrylovJacobian, complex=True, tol=1e-4)
+
+class TestNonlinOldTests(TestCase):
     """ Test case for a simple constrained entropy maximization problem
     (the machine translation example of Berger et al in
     Computational Linguistics, vol 22, num 1, pp 39--72, 1996.)
     """
-    def setUp(self):
-        self.xin=[1,1,1,1,1]
 
-
-    def test_linearmixing(self):
-        x = nonlin.linearmixing(F,self.xin,iter=60,alpha=0.5)
-        assert nonlin.norm(x)<1e-7
-        assert nonlin.norm(F(x))<1e-7
-
     def test_broyden1(self):
-        x= nonlin.broyden1(F,self.xin,iter=11,alpha=1)
+        x= nonlin.broyden1(F,F.xin,iter=12,alpha=1)
         assert nonlin.norm(x)<1e-9
         assert nonlin.norm(F(x))<1e-9
 
     def test_broyden2(self):
-        x= nonlin.broyden2(F,self.xin,iter=12,alpha=1)
+        x= nonlin.broyden2(F,F.xin,iter=12,alpha=1)
         assert nonlin.norm(x)<1e-9
         assert nonlin.norm(F(x))<1e-9
 
-    def test_broyden3(self):
-        x= nonlin.broyden3(F,self.xin,iter=12,alpha=1)
-        assert nonlin.norm(x)<1e-9
-        assert nonlin.norm(F(x))<1e-9
-
-    def test_exciting(self):
-        x= nonlin.excitingmixing(F,self.xin,iter=20,alpha=0.5)
-        assert nonlin.norm(x)<1e-5
-        assert nonlin.norm(F(x))<1e-5
-
     def test_anderson(self):
-        x= nonlin.anderson(F,self.xin,iter=12,alpha=0.03,M=5)
+        x= nonlin.anderson(F,F.xin,iter=12,alpha=0.03,M=5)
         assert nonlin.norm(x)<0.33
 
-    def test_anderson2(self):
-        x= nonlin.anderson2(F,self.xin,iter=12,alpha=0.6,M=5)
-        assert nonlin.norm(x)<0.2
-
-    def test_broydengeneralized(self):
-        x= nonlin.broyden_generalized(F,self.xin,iter=60,alpha=0.5,M=0)
+    def test_linearmixing(self):
+        x = nonlin.linearmixing(F,F.xin,iter=60,alpha=0.5)
         assert nonlin.norm(x)<1e-7
         assert nonlin.norm(F(x))<1e-7
-        x= nonlin.broyden_generalized(F,self.xin,iter=61,alpha=0.1,M=1)
-        assert nonlin.norm(x)<2e-4
-        assert nonlin.norm(F(x))<2e-4
 
-    def xtest_broydenmodified(self):
-        x= nonlin.broyden_modified(F,self.xin,iter=12,alpha=1)
-        assert nonlin.norm(x)<1e-9
-        assert nonlin.norm(F(x))<1e-9
+    def test_exciting(self):
+        x= nonlin.excitingmixing(F,F.xin,iter=20,alpha=0.5)
+        assert nonlin.norm(x)<1e-5
+        assert nonlin.norm(F(x))<1e-5
 
-    def test_broyden1modified(self):
-        x= nonlin.broyden1_modified(F,self.xin,iter=35,alpha=1)
-        assert nonlin.norm(x)<1e-9
-        assert nonlin.norm(F(x))<1e-9
+    def test_diagbroyden(self):
+        x= nonlin.diagbroyden(F,F.xin,iter=11,alpha=1)
+        assert nonlin.norm(x)<1e-8
+        assert nonlin.norm(F(x))<1e-8
 
-    def test_vackar(self):
-        x= nonlin.vackar(F,self.xin,iter=11,alpha=1)
-        assert nonlin.norm(x)<1e-9
-        assert nonlin.norm(F(x))<1e-9
-
-
 if __name__ == "__main__":
     run_module_suite()



