|
"""Trust Region Reflective algorithm for least-squares optimization. |
|
|
|
The algorithm is based on ideas from the paper [STIR]_. The main idea is to
account for the presence of the bounds by appropriate scaling of the
variables (or, equivalently, by changing the trust-region shape). Let's
introduce a vector v:
|
|
|
| ub[i] - x[i], if g[i] < 0 and ub[i] < np.inf |
|
v[i] = | x[i] - lb[i], if g[i] > 0 and lb[i] > -np.inf |
|
| 1, otherwise |
|
|
|
where g is the gradient of the cost function and lb, ub are the bounds. Its
components are the distances to the respective bounds at which the
anti-gradient points (whenever this distance is finite). Define a scaling
matrix D = diag(v**0.5). The first-order optimality conditions can then be
stated as
|
|
|
D^2 g(x) = 0. |
|
|
|
This means that components of the gradient must be zero for strictly
interior variables, and must point inside the feasible region for variables
on a bound.
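
As a minimal sketch of this scaling (an illustration only; the library
computes v, together with the derivative dv used later, in
``CL_scaling_vector`` from ``common.py``):

    import numpy as np

    def cl_scaling(x, g, lb, ub):
        # Compute v from the piecewise definition above.
        v = np.ones_like(x)
        mask = (g < 0) & (ub < np.inf)
        v[mask] = ub[mask] - x[mask]
        mask = (g > 0) & (lb > -np.inf)
        v[mask] = x[mask] - lb[mask]
        return v

    # Scaled optimality measure: norm(D^2 g, inf) = norm(v * g, inf) must
    # vanish at a first-order optimal point:
    # g_norm = np.linalg.norm(cl_scaling(x, g, lb, ub) * g, ord=np.inf)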
|
|
|
Now consider this system of equations as a new optimization problem. If the |
|
point x is strictly interior (not on the bound), then the left-hand side is |
|
differentiable and the Newton step for it satisfies |
|
|
|
(D^2 H + diag(g) Jv) p = -D^2 g |
|
|
|
where H is the Hessian matrix (or its J^T J approximation in least squares), |
|
Jv is the Jacobian matrix of v with components -1, 1 or 0, such that all |
|
elements of the matrix C = diag(g) Jv are non-negative. Introduce the change
of variables x = D x_h (_h stands for "hat" in LaTeX). In the new variables,
|
we have a Newton step satisfying |
|
|
|
B_h p_h = -g_h, |
|
|
|
where B_h = D H D + C, g_h = D g. In least squares B_h = J_h^T J_h, where |
|
J_h = J D. Note that J_h and g_h are proper Jacobian and gradient with respect |
|
to "hat" variables. To guarantee global convergence we formulate a |
|
trust-region problem based on the Newton step in the new variables: |
|
|
|
0.5 * p_h^T B_h p_h + g_h^T p_h -> min, ||p_h|| <= Delta
|
|
|
In the original space B = H + D^{-1} C D^{-1}, and the equivalent trust-region |
|
problem is |
|
|
|
0.5 * p^T B p + g^T p -> min, ||D^{-1} p|| <= Delta |
|
|
|
Here the meaning of the matrix D becomes clearer: it alters the shape of
the trust region, such that large steps towards the bounds are not allowed.
In the implementation, the trust-region problem is solved in "hat" space,
but the handling of the bounds is done in the original space (see below and
read the code).
|
|
|
The introduction of the matrix D doesn't allow the algorithm to ignore
bounds: iterates must be kept strictly feasible (to satisfy the
aforementioned differentiability requirement). The parameter theta controls
how far the step backs off from the boundary (see the code for details).
|
|
|
The algorithm does another important trick. If the trust-region solution
doesn't fit into the bounds, then a search direction reflected from the
first encountered bound is considered. For motivation and analysis refer to
the [STIR]_ paper (and other papers by the same authors). In practice, not
much extra justification is needed; the algorithm simply chooses the best
among three candidate steps: a constrained trust-region step, a reflected
step, and a constrained Cauchy step (a minimizer along -g_h in "hat" space,
or along -D^2 g in the original space).
|
|
|
Another feature is that the trust-region radius control strategy is
modified to account for the appearance of the diagonal matrix C (called
diag_h in the code).
|
|
|
Note that all of the described peculiarities vanish for problems without
bounds (the algorithm then becomes a standard trust-region method, very
similar to the ones implemented in MINPACK).
|
|
|
The implementation supports two methods of solving the trust-region
problem. The first, called 'exact', applies SVD to the Jacobian and then
solves the problem very accurately using the algorithm described in
[JJMore]_. It is not applicable to large problems. The second, called
'lsmr', uses the 2-D subspace approach (sometimes called "indefinite
dogleg"), where the problem is solved in a subspace spanned by the gradient
and the approximate Gauss-Newton step found by
``scipy.sparse.linalg.lsmr``. The 2-D trust-region problem is reformulated
as a 4th order algebraic equation and solved very accurately by
``numpy.roots``. The subspace approach allows solving very large problems
(up to a couple of million residuals on a regular PC), provided the
Jacobian matrix is sufficiently sparse.
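
To make the 2-D reduction concrete, below is a simplified sketch, derived
here for illustration only (the production routine is
``solve_trust_region_2d`` from ``common.py``, whose bookkeeping differs),
of minimizing 0.5 * p^T B p + g^T p over ||p|| <= Delta for a 2x2 matrix B:

    import numpy as np

    def solve_2d_trust_region(B, g, Delta):
        # If B is positive definite and the Newton point lies inside the
        # ball, it is the global minimizer.
        if np.all(np.linalg.eigvalsh(B) > 0):
            p = -np.linalg.solve(B, g)
            if p.dot(p) <= Delta**2:
                return p

        # Otherwise the solution lies on the boundary. Parametrize
        # p = Delta * (cos t, sin t); the stationarity condition in t,
        # after the substitution u = tan(t / 2), becomes a 4th order
        # polynomial in u.
        a, b, c = B[0, 0] * Delta**2, B[0, 1] * Delta**2, B[1, 1] * Delta**2
        d, f = g[0] * Delta, g[1] * Delta
        coeffs = [b - f, 2 * (a - c - d), -6 * b, 2 * (c - a - d), b + f]
        u = np.roots(coeffs)  # np.roots handles leading zeros.
        u = np.real(u[np.isreal(u)])

        # Map the roots back to the circle; u -> inf (t = pi) is lost by
        # the substitution, so append that candidate explicitly.
        p = Delta * np.vstack(((1 - u**2) / (1 + u**2), 2 * u / (1 + u**2)))
        p = np.hstack((p, Delta * np.array([[-1.0], [0.0]])))
        values = 0.5 * np.sum(p * B.dot(p), axis=0) + g.dot(p)
        return p[:, np.argmin(values)]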
|
|
|
References |
|
---------- |
|
.. [STIR] Branch, M.A., T.F. Coleman, and Y. Li, "A Subspace, Interior, |
|
and Conjugate Gradient Method for Large-Scale Bound-Constrained |
|
Minimization Problems," SIAM Journal on Scientific Computing, |
|
Vol. 21, Number 1, pp 1-23, 1999. |
|
.. [JJMore] More, J. J., "The Levenberg-Marquardt Algorithm: Implementation
    and Theory," Numerical Analysis, ed. G. A. Watson, Lecture Notes
    in Mathematics 630, Springer-Verlag, pp. 105-116, 1977.
|
""" |
|
import numpy as np |
|
from numpy.linalg import norm |
|
from scipy.linalg import svd, qr |
|
from scipy.sparse.linalg import lsmr |
|
from scipy.optimize import OptimizeResult |
|
|
|
from .common import ( |
|
step_size_to_bound, find_active_constraints, in_bounds, |
|
make_strictly_feasible, intersect_trust_region, solve_lsq_trust_region, |
|
solve_trust_region_2d, minimize_quadratic_1d, build_quadratic_1d, |
|
evaluate_quadratic, right_multiplied_operator, regularized_lsq_operator, |
|
CL_scaling_vector, compute_grad, compute_jac_scale, check_termination, |
|
update_tr_radius, scale_for_robust_loss_function, print_header_nonlinear, |
|
print_iteration_nonlinear) |
|
|
|
|
|
def trf(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale, |
|
loss_function, tr_solver, tr_options, verbose): |
|
|
|
|
|
|
|
|
|
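    # For efficiency, run a simplified version of the algorithm when no
    # bounds are imposed. Keeping two separate functions duplicates some
    # code, but keeps each variant readable.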
if np.all(lb == -np.inf) and np.all(ub == np.inf): |
|
return trf_no_bounds( |
|
fun, jac, x0, f0, J0, ftol, xtol, gtol, max_nfev, x_scale, |
|
loss_function, tr_solver, tr_options, verbose) |
|
else: |
|
return trf_bounds( |
|
fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale, |
|
loss_function, tr_solver, tr_options, verbose) |
|
|
|
|
|
def select_step(x, J_h, diag_h, g_h, p, p_h, d, Delta, lb, ub, theta): |
|
"""Select the best step according to Trust Region Reflective algorithm.""" |
|
if in_bounds(x + p, lb, ub): |
|
p_value = evaluate_quadratic(J_h, g_h, p_h, diag=diag_h) |
|
return p, p_h, -p_value |
|
|
|
p_stride, hits = step_size_to_bound(x, p, lb, ub) |
|
|
|
|
|
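    # Compute the reflected direction: components of p_h that hit a bound
    # change sign.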
r_h = np.copy(p_h) |
|
r_h[hits.astype(bool)] *= -1 |
|
r = d * r_h |
|
|
|
|
|
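    # Restrict the trust-region step, such that it exactly hits the bound.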
p *= p_stride |
|
p_h *= p_stride |
|
x_on_bound = x + p |
|
|
|
|
|
|
|
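    # The reflected direction will first cross either the feasible-region
    # boundary or the trust-region boundary.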
_, to_tr = intersect_trust_region(p_h, r_h, Delta) |
|
to_bound, _ = step_size_to_bound(x_on_bound, r, lb, ub) |
|
|
|
|
|
|
|
|
|
|
|
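    # Find lower and upper bounds on the step size along the reflected
    # direction that keep the iterate strictly feasible. There is no single
    # correct way to do this; the heuristic below works well in practice.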
r_stride = min(to_bound, to_tr) |
|
if r_stride > 0: |
|
r_stride_l = (1 - theta) * p_stride / r_stride |
|
if r_stride == to_bound: |
|
r_stride_u = theta * to_bound |
|
else: |
|
r_stride_u = to_tr |
|
else: |
|
r_stride_l = 0 |
|
r_stride_u = -1 |
|
|
|
|
|
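    # Check if the reflected step is available.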
if r_stride_l <= r_stride_u: |
|
a, b, c = build_quadratic_1d(J_h, g_h, r_h, s0=p_h, diag=diag_h) |
|
r_stride, r_value = minimize_quadratic_1d( |
|
a, b, r_stride_l, r_stride_u, c=c) |
|
r_h *= r_stride |
|
r_h += p_h |
|
r = r_h * d |
|
else: |
|
r_value = np.inf |
|
|
|
|
|
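    # Scale the constrained trust-region step back by theta to keep the new
    # iterate strictly interior.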
p *= theta |
|
p_h *= theta |
|
p_value = evaluate_quadratic(J_h, g_h, p_h, diag=diag_h) |
|
|
|
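    # Finally, consider the constrained Cauchy step along the scaled
    # anti-gradient.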
ag_h = -g_h |
|
ag = d * ag_h |
|
|
|
to_tr = Delta / norm(ag_h) |
|
to_bound, _ = step_size_to_bound(x, ag, lb, ub) |
|
if to_bound < to_tr: |
|
ag_stride = theta * to_bound |
|
else: |
|
ag_stride = to_tr |
|
|
|
a, b = build_quadratic_1d(J_h, g_h, ag_h, diag=diag_h) |
|
ag_stride, ag_value = minimize_quadratic_1d(a, b, 0, ag_stride) |
|
ag_h *= ag_stride |
|
ag *= ag_stride |
|
|
|
if p_value < r_value and p_value < ag_value: |
|
return p, p_h, -p_value |
|
elif r_value < p_value and r_value < ag_value: |
|
return r, r_h, -r_value |
|
else: |
|
return ag, ag_h, -ag_value |
|
|
|
|
|
def trf_bounds(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, |
|
x_scale, loss_function, tr_solver, tr_options, verbose): |
|
x = x0.copy() |
|
|
|
f = f0 |
|
f_true = f.copy() |
|
nfev = 1 |
|
|
|
J = J0 |
|
njev = 1 |
|
m, n = J.shape |
|
|
|
if loss_function is not None: |
|
rho = loss_function(f) |
|
cost = 0.5 * np.sum(rho[0]) |
|
J, f = scale_for_robust_loss_function(J, f, rho) |
|
else: |
|
cost = 0.5 * np.dot(f, f) |
|
|
|
g = compute_grad(J, f) |
|
|
|
jac_scale = isinstance(x_scale, str) and x_scale == 'jac' |
|
if jac_scale: |
|
scale, scale_inv = compute_jac_scale(J) |
|
else: |
|
scale, scale_inv = x_scale, 1 / x_scale |
|
|
|
v, dv = CL_scaling_vector(x, g, lb, ub) |
|
v[dv != 0] *= scale_inv[dv != 0] |
|
Delta = norm(x0 * scale_inv / v**0.5) |
|
if Delta == 0: |
|
Delta = 1.0 |
|
|
|
g_norm = norm(g * v, ord=np.inf) |
|
|
|
f_augmented = np.zeros(m + n) |
|
if tr_solver == 'exact': |
|
J_augmented = np.empty((m + n, n)) |
|
elif tr_solver == 'lsmr': |
|
reg_term = 0.0 |
|
regularize = tr_options.pop('regularize', True) |
|
|
|
if max_nfev is None: |
|
max_nfev = x0.size * 100 |
|
|
|
alpha = 0.0 |
|
|
|
termination_status = None |
|
iteration = 0 |
|
step_norm = None |
|
actual_reduction = None |
|
|
|
if verbose == 2: |
|
print_header_nonlinear() |
|
|
|
while True: |
|
v, dv = CL_scaling_vector(x, g, lb, ub) |
|
|
|
g_norm = norm(g * v, ord=np.inf) |
|
if g_norm < gtol: |
|
termination_status = 1 |
|
|
|
if verbose == 2: |
|
print_iteration_nonlinear(iteration, nfev, cost, actual_reduction, |
|
step_norm, g_norm) |
|
|
|
if termination_status is not None or nfev == max_nfev: |
|
break |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
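        # Compute quantities in "hat" space, also accounting for the scaling
        # introduced by `x_scale`. Recompute v in the scaled variables;
        # components of v that are identically 1 are not affected.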
v[dv != 0] *= scale_inv[dv != 0] |
|
|
|
|
|
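        # d combines the CL scaling v**0.5 with `x_scale`.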
d = v**0.5 * scale |
|
|
|
|
|
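        # Diagonal of C = diag(g) Jv (dv holds the diagonal of Jv), with an
        # extra `x_scale` factor.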
diag_h = g * dv * scale |
|
|
|
|
|
|
|
|
|
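        # Gradient in "hat" variables.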
g_h = d * g |
|
|
|
f_augmented[:m] = f |
|
if tr_solver == 'exact': |
|
J_augmented[:m] = J * d |
|
J_h = J_augmented[:m] |
|
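            # Augmenting J_h with diag(diag_h)**0.5 makes the SVD solve the
            # problem with B_h = J_h^T J_h + diag(diag_h).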
J_augmented[m:] = np.diag(diag_h**0.5) |
|
U, s, V = svd(J_augmented, full_matrices=False) |
|
V = V.T |
|
uf = U.T.dot(f_augmented) |
|
elif tr_solver == 'lsmr': |
|
J_h = right_multiplied_operator(J, d) |
|
|
|
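            # Estimate the regularization term for lsmr from the value of
            # the constrained Cauchy step.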
if regularize: |
|
a, b = build_quadratic_1d(J_h, g_h, -g_h, diag=diag_h) |
|
to_tr = Delta / norm(g_h) |
|
ag_value = minimize_quadratic_1d(a, b, 0, to_tr)[1] |
|
reg_term = -ag_value / Delta**2 |
|
|
|
lsmr_op = regularized_lsq_operator(J_h, (diag_h + reg_term)**0.5) |
|
gn_h = lsmr(lsmr_op, f_augmented, **tr_options)[0] |
|
S = np.vstack((g_h, gn_h)).T |
|
S, _ = qr(S, mode='economic') |
|
JS = J_h.dot(S) |
|
B_S = np.dot(JS.T, JS) + np.dot(S.T * diag_h, S) |
|
g_S = S.T.dot(g_h) |
|
|
|
|
|
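        # theta controls how far the step may go towards the boundary; it
        # approaches 1 as the gradient norm goes to 0.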
theta = max(0.995, 1 - g_norm) |
|
|
|
actual_reduction = -1 |
|
while actual_reduction <= 0 and nfev < max_nfev: |
|
if tr_solver == 'exact': |
|
p_h, alpha, n_iter = solve_lsq_trust_region( |
|
n, m, uf, s, V, Delta, initial_alpha=alpha) |
|
elif tr_solver == 'lsmr': |
|
p_S, _ = solve_trust_region_2d(B_S, g_S, Delta) |
|
p_h = S.dot(p_S) |
|
|
|
p = d * p_h |
|
step, step_h, predicted_reduction = select_step( |
|
x, J_h, diag_h, g_h, p, p_h, d, Delta, lb, ub, theta) |
|
|
|
x_new = make_strictly_feasible(x + step, lb, ub, rstep=0) |
|
f_new = fun(x_new) |
|
nfev += 1 |
|
|
|
step_h_norm = norm(step_h) |
|
|
|
if not np.all(np.isfinite(f_new)): |
|
Delta = 0.25 * step_h_norm |
|
continue |
|
|
|
|
|
if loss_function is not None: |
|
cost_new = loss_function(f_new, cost_only=True) |
|
else: |
|
cost_new = 0.5 * np.dot(f_new, f_new) |
|
actual_reduction = cost - cost_new |
|
Delta_new, ratio = update_tr_radius( |
|
Delta, actual_reduction, predicted_reduction, |
|
step_h_norm, step_h_norm > 0.95 * Delta) |
|
|
|
step_norm = norm(step) |
|
termination_status = check_termination( |
|
actual_reduction, cost, step_norm, norm(x), ratio, ftol, xtol) |
|
if termination_status is not None: |
|
break |
|
|
|
alpha *= Delta / Delta_new |
|
Delta = Delta_new |
|
|
|
if actual_reduction > 0: |
|
x = x_new |
|
|
|
f = f_new |
|
f_true = f.copy() |
|
|
|
cost = cost_new |
|
|
|
J = jac(x, f) |
|
njev += 1 |
|
|
|
if loss_function is not None: |
|
rho = loss_function(f) |
|
J, f = scale_for_robust_loss_function(J, f, rho) |
|
|
|
g = compute_grad(J, f) |
|
|
|
if jac_scale: |
|
scale, scale_inv = compute_jac_scale(J, scale_inv) |
|
else: |
|
step_norm = 0 |
|
actual_reduction = 0 |
|
|
|
iteration += 1 |
|
|
|
if termination_status is None: |
|
termination_status = 0 |
|
|
|
active_mask = find_active_constraints(x, lb, ub, rtol=xtol) |
|
return OptimizeResult( |
|
x=x, cost=cost, fun=f_true, jac=J, grad=g, optimality=g_norm, |
|
active_mask=active_mask, nfev=nfev, njev=njev, |
|
status=termination_status) |
|
|
|
|
|
def trf_no_bounds(fun, jac, x0, f0, J0, ftol, xtol, gtol, max_nfev, |
|
x_scale, loss_function, tr_solver, tr_options, verbose): |
|
x = x0.copy() |
|
|
|
f = f0 |
|
f_true = f.copy() |
|
nfev = 1 |
|
|
|
J = J0 |
|
njev = 1 |
|
m, n = J.shape |
|
|
|
if loss_function is not None: |
|
rho = loss_function(f) |
|
cost = 0.5 * np.sum(rho[0]) |
|
J, f = scale_for_robust_loss_function(J, f, rho) |
|
else: |
|
cost = 0.5 * np.dot(f, f) |
|
|
|
g = compute_grad(J, f) |
|
|
|
jac_scale = isinstance(x_scale, str) and x_scale == 'jac' |
|
if jac_scale: |
|
scale, scale_inv = compute_jac_scale(J) |
|
else: |
|
scale, scale_inv = x_scale, 1 / x_scale |
|
|
|
Delta = norm(x0 * scale_inv) |
|
if Delta == 0: |
|
Delta = 1.0 |
|
|
|
if tr_solver == 'lsmr': |
|
reg_term = 0 |
|
damp = tr_options.pop('damp', 0.0) |
|
regularize = tr_options.pop('regularize', True) |
|
|
|
if max_nfev is None: |
|
max_nfev = x0.size * 100 |
|
|
|
alpha = 0.0 |
|
|
|
termination_status = None |
|
iteration = 0 |
|
step_norm = None |
|
actual_reduction = None |
|
|
|
if verbose == 2: |
|
print_header_nonlinear() |
|
|
|
while True: |
|
g_norm = norm(g, ord=np.inf) |
|
if g_norm < gtol: |
|
termination_status = 1 |
|
|
|
if verbose == 2: |
|
print_iteration_nonlinear(iteration, nfev, cost, actual_reduction, |
|
step_norm, g_norm) |
|
|
|
if termination_status is not None or nfev == max_nfev: |
|
break |
|
|
|
d = scale |
|
g_h = d * g |
|
|
|
if tr_solver == 'exact': |
|
J_h = J * d |
|
U, s, V = svd(J_h, full_matrices=False) |
|
V = V.T |
|
uf = U.T.dot(f) |
|
elif tr_solver == 'lsmr': |
|
J_h = right_multiplied_operator(J, d) |
|
|
|
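            # Same adaptive regularization as in `trf_bounds`.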
if regularize: |
|
a, b = build_quadratic_1d(J_h, g_h, -g_h) |
|
to_tr = Delta / norm(g_h) |
|
ag_value = minimize_quadratic_1d(a, b, 0, to_tr)[1] |
|
reg_term = -ag_value / Delta**2 |
|
|
|
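            # Combine the user-supplied `damp` with the adaptive term.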
damp_full = (damp**2 + reg_term)**0.5 |
|
gn_h = lsmr(J_h, f, damp=damp_full, **tr_options)[0] |
|
S = np.vstack((g_h, gn_h)).T |
|
S, _ = qr(S, mode='economic') |
|
JS = J_h.dot(S) |
|
B_S = np.dot(JS.T, JS) |
|
g_S = S.T.dot(g_h) |
|
|
|
actual_reduction = -1 |
|
while actual_reduction <= 0 and nfev < max_nfev: |
|
if tr_solver == 'exact': |
|
step_h, alpha, n_iter = solve_lsq_trust_region( |
|
n, m, uf, s, V, Delta, initial_alpha=alpha) |
|
elif tr_solver == 'lsmr': |
|
p_S, _ = solve_trust_region_2d(B_S, g_S, Delta) |
|
step_h = S.dot(p_S) |
|
|
|
predicted_reduction = -evaluate_quadratic(J_h, g_h, step_h) |
|
step = d * step_h |
|
x_new = x + step |
|
f_new = fun(x_new) |
|
nfev += 1 |
|
|
|
step_h_norm = norm(step_h) |
|
|
|
if not np.all(np.isfinite(f_new)): |
|
Delta = 0.25 * step_h_norm |
|
continue |
|
|
|
|
|
if loss_function is not None: |
|
cost_new = loss_function(f_new, cost_only=True) |
|
else: |
|
cost_new = 0.5 * np.dot(f_new, f_new) |
|
actual_reduction = cost - cost_new |
|
|
|
Delta_new, ratio = update_tr_radius( |
|
Delta, actual_reduction, predicted_reduction, |
|
step_h_norm, step_h_norm > 0.95 * Delta) |
|
|
|
step_norm = norm(step) |
|
termination_status = check_termination( |
|
actual_reduction, cost, step_norm, norm(x), ratio, ftol, xtol) |
|
if termination_status is not None: |
|
break |
|
|
|
alpha *= Delta / Delta_new |
|
Delta = Delta_new |
|
|
|
if actual_reduction > 0: |
|
x = x_new |
|
|
|
f = f_new |
|
f_true = f.copy() |
|
|
|
cost = cost_new |
|
|
|
J = jac(x, f) |
|
njev += 1 |
|
|
|
if loss_function is not None: |
|
rho = loss_function(f) |
|
J, f = scale_for_robust_loss_function(J, f, rho) |
|
|
|
g = compute_grad(J, f) |
|
|
|
if jac_scale: |
|
scale, scale_inv = compute_jac_scale(J, scale_inv) |
|
else: |
|
step_norm = 0 |
|
actual_reduction = 0 |
|
|
|
iteration += 1 |
|
|
|
if termination_status is None: |
|
termination_status = 0 |
|
|
|
active_mask = np.zeros_like(x) |
|
return OptimizeResult( |
|
x=x, cost=cost, fun=f_true, jac=J, grad=g, optimality=g_norm, |
|
active_mask=active_mask, nfev=nfev, njev=njev, |
|
status=termination_status) |
|
|