Derivatives

Applications should provide partial first derivatives whenever possible, to make Knitro more efficient and more robust. If first derivatives cannot be supplied, then the application should instruct Knitro to calculate finite-difference approximations.

First derivatives are represented by the gradient of the objective function and the Jacobian matrix of the constraints. Second derivatives are represented by the Hessian matrix, a linear combination of the second derivatives of the objective function and the constraints.

First derivatives

The default version of Knitro assumes that the user can provide exact first derivatives to compute the objective function gradient and constraint gradients. It is highly recommended that the user provide exact first derivatives if at all possible, since using first derivative approximations may seriously degrade the performance of the code and reduce the likelihood of converging to a solution. However, if this is not possible, the following first derivative approximation options may be used.

  • Forward finite-differences This option uses a forward finite-difference approximation of the objective and constraint gradients. The cost of computing this approximation is n function evaluations (where n is the number of variables). The option is invoked by choosing user option gradopt = 2.
  • Centered finite-differences This option uses a centered finite-difference approximation of the objective and constraint gradients. The cost of computing this approximation is 2n function evaluations (where n is the number of variables). The option is invoked by choosing user option gradopt = 3. The centered finite-difference approximation is often more accurate than the forward finite-difference approximation; however, it is more expensive to compute if the cost of evaluating a function is high.

Note

When using finite-difference gradients, the sparsity pattern for the constraint Jacobian must still be provided. To tell Knitro to use the full, dense pattern (i.e., assume all partial derivatives are nonzero), you can simply set nnzJ<0 in the call to KTR_init_problem() or KTR_mip_init_problem().

Although these finite-difference approximations should be avoided in general, they are useful for tracking errors: whenever derivatives are provided by the user, it is worth checking that the differentiation (and the subsequent implementation of the derivatives) is correct. Indeed, providing derivatives that are inconsistent with the function values is one of the most common errors when solving a nonlinear program. This check can be done automatically by comparing finite-difference approximations with user-provided derivatives. This is explained below (Checking derivatives).

Second derivatives

The default version of Knitro assumes that the application can provide exact second derivatives to compute the Hessian of the Lagrangian function. If the application is able to do so and the cost of computing the second derivatives is not overly expensive, it is highly recommended to provide exact second derivatives. However, Knitro also offers other options which are described in detail below.

  • (Dense) Quasi-Newton BFGS:

    The quasi-Newton BFGS option uses gradient information to compute a symmetric, positive-definite approximation to the Hessian matrix. Typically this method requires more iterations to converge than the exact Hessian version. However, since it is only computing gradients rather than Hessians, this approach may be more efficient in some cases. This option stores a dense quasi-Newton Hessian approximation so it is only recommended for small to medium problems (e.g., n < 1000). The quasi-Newton BFGS option is chosen by setting user option hessopt = 2.

  • (Dense) Quasi-Newton SR1:

    As with the BFGS approach, the quasi-Newton SR1 approach builds an approximate Hessian using gradient information. However, unlike the BFGS approximation, the SR1 Hessian approximation is not restricted to be positive-definite. Therefore the quasi-Newton SR1 approximation may be a better approach, compared to the BFGS method, if there is a lot of negative curvature in the problem (i.e., the problem is not convex) since it may be able to maintain a better approximation to the true Hessian in this case. The quasi-Newton SR1 approximation maintains a dense Hessian approximation and so is only recommended for small to medium problems (e.g., n < 1000). The quasi-Newton SR1 option is chosen by setting user option hessopt = 3.

  • Finite-difference Hessian-vector product option:

    If the problem is large and gradient evaluations are not a dominant cost, then Knitro can internally compute Hessian-vector products using finite-differences. Each Hessian-vector product in this case requires one additional gradient evaluation. This option is chosen by setting user option hessopt = 4. The option is only recommended if the exact gradients are provided.

Note

This option may not be used when algorithm = 1 or 4 since the Interior/Direct and SQP algorithms need the full expression of the Hessian matrix (Hessian-vector products are not sufficient).

  • Exact Hessian-vector products:

    In some cases the application may prefer to provide exact Hessian-vector products, but not the full Hessian (for instance, if the problem has a large, dense Hessian). The application must provide a routine which, given a vector v (stored in hessVector), computes the Hessian-vector product, H*v, and returns the result (again in hessVector). This option is chosen by setting user option hessopt = 5.

Note

This option may not be used when algorithm = 1 or 4 since, as mentioned above, the Interior/Direct and SQP algorithms need the full expression of the Hessian matrix (Hessian-vector products are not sufficient).

  • Limited-memory Quasi-Newton BFGS:

    The limited-memory quasi-Newton BFGS option is similar to the dense quasi-Newton BFGS option described above. However, it is better suited for large-scale problems since, instead of storing a dense Hessian approximation, it stores only a limited number of gradient vectors used to approximate the Hessian. The number of gradient vectors used to approximate the Hessian is controlled by user option lmsize.

    A larger value of lmsize may result in a more accurate, but also more expensive, Hessian approximation. A smaller value may give a less accurate, but faster, Hessian approximation. When using the limited memory BFGS approach it is recommended to experiment with different values of this parameter (e.g. between 5 and 15).

    In general, the limited-memory BFGS option requires more iterations to converge than the dense quasi-Newton BFGS approach, but will be much more efficient on large-scale problems. The limited-memory quasi-Newton option is chosen by setting user option hessopt = 6.

Note

When using a Hessian approximation option (i.e. hessopt > 1), you do not need to provide any sparsity pattern for the Hessian matrix.

As with exact first derivatives, exact second derivatives often provide a substantial benefit to Knitro and it is advised to provide them whenever possible. If the exact second derivative (i.e. the Hessian) matrix is provided by the user, it can (and should) be checked against a finite-difference approximation for errors using the Knitro derivative checker. See (Checking derivatives) below.

Jacobian and Hessian derivative matrices

The Jacobian matrix of the constraints is defined as

J(x) = \begin{bmatrix}
\nabla c_0(x) &
\dots &
\nabla c_{m-1}(x)
\end{bmatrix}

and the Hessian matrix of the Lagrangian is defined as

H(x,\lambda) = \sigma \nabla^2 f(x) + \sum_{i=0}^{m-1} \lambda_i \nabla^2 c_i(x)

where \lambda is the vector of Lagrange multipliers (dual variables), and \sigma is a scalar (either 0 or 1) for the objective component of the Hessian that was introduced in Knitro 8.0.

Note

For backwards compatibility with older versions of Knitro, the user can always assume that \sigma=1 if the user option hessian_no_f=0 (which is the default setting). However, if hessian_no_f=1, then Knitro will provide a status flag to the user when it needs a Hessian evaluation indicating whether the Hessian should be evaluated with \sigma=0 or \sigma=1. The user must then evaluate the Hessian with the proper value of \sigma based on this status flag. Setting hessian_no_f=1 and computing the Hessian with the requested value of \sigma may improve Knitro efficiency in some cases. Examples of how to do this can be found in the examples/C directory.

Example

Assume we want to use Knitro to solve the following problem:

\min \; x_0 + x_1 x_2^3

\mbox{subject to:}

\cos(x_0) = 0.5 \\
3  \le x_0^2+x_1^2 \le 8 \\
x_0 + x_1 + x_2 \le 10 \\
x_0, x_1, x_2 \ge 1.

Rewriting in the Knitro standard notation, we have

f(x)   & = x_0 + x_1 x_2^3  \\
c_0(x) & = \cos(x_0) \\
c_1(x) & = x_0^2+x_1^2 \\
c_2(x) & = x_0 + x_1 + x_2.

Computing the Sparse Jacobian Matrix

The gradients (first derivatives) of the objective and constraint functions are given by

\nabla f(x) = \begin{bmatrix} 1 \\ x_2^3 \\ 3 x_1 x_2^2 \end{bmatrix},
\nabla c_0(x) = \begin{bmatrix} -\sin(x_0) \\ 0 \\ 0 \end{bmatrix},
\nabla c_1(x) = \begin{bmatrix} 2 x_0 \\ 2 x_1 \\ 0 \end{bmatrix},
\nabla c_2(x) = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.

The constraint Jacobian matrix J(x) is the matrix whose rows store the (transposed) constraint gradients, i.e.,

J(x) = \begin{bmatrix} \nabla c_0(x)^T \\ \nabla c_1(x)^T \\ \nabla c_2(x)^T \end{bmatrix} =
\begin{bmatrix}
-\sin(x_0)  & 0      & 0 \\
2 x_0       & 2 x_1  & 0 \\
1           & 1      & 1
\end{bmatrix} .

The values of J(x) depend on the value of x and change during the solution process. The indices specifying the nonzero elements of this matrix remain constant and are set in KTR_init_problem() by the values of jacIndexCons and jacIndexVars.

Computing the Sparse Hessian Matrix

For the example above, the Hessian (matrix of second derivatives) of the objective function is given by

\nabla^2 f(x) =
\begin{bmatrix}
     0           & 0       & 0 \\
     0           & 0       & 3 x_2^2 \\
         0           & 3 x_2^2 & 6 x_1 x_2
\end{bmatrix},

and the Hessians of the constraints are given by

\nabla^2 c_0(x) =
\begin{bmatrix}
     -\cos(x_0)  & 0       & 0 \\
     0           & 0       & 0 \\
         0           & 0       & 0
\end{bmatrix},
    \nabla^2 c_1(x) =
\begin{bmatrix}
     2           & 0       & 0 \\
     0           & 2       & 0 \\
         0           & 0       & 0
\end{bmatrix},
    \nabla^2 c_2(x) =
\begin{bmatrix}
     0           & 0       & 0 \\
     0           & 0       & 0 \\
         0           & 0       & 0
\end{bmatrix}.

Scaling the objective Hessian by \sigma and the constraint Hessians by their corresponding Lagrange multipliers, then summing, we get

H(x,\lambda) =
\begin{bmatrix}
  -\lambda_0 \cos(x_0) + 2 \lambda_1  & 0 & 0 \\
  0 & 2 \lambda_1  & \sigma 3 x_2^2 \\
      0 & \sigma 3 x_2^2      & \sigma 6 x_1 x_2
\end{bmatrix} .

The values of H(x,\lambda) depend on the value of x and \lambda (and \sigma, which is either 0 or 1) and change during the solution process. The indices specifying the nonzero elements of this matrix remain constant and are set in KTR_init_problem() by the values of hessIndexRows and hessIndexCols.

Inputting derivatives

MATLAB users can provide the Jacobian and Hessian matrices in standard MATLAB format, either dense or sparse. See the fmincon documentation, http://www.mathworks.com/help/optim/ug/writing-constraints.html#brhkghv-16, for more information. Users of the callable library must provide derivatives to Knitro in sparse format. In the above example, the number of nonzero elements nnzJ in J(x) is 6, and these arrays would be specified as follows (here in column-wise order, but the order is arbitrary) using the callable library.

jac[0] = -sin(x[0]);  jacIndexCons[0] = 0; jacIndexVars[0] = 0;
jac[1] = 2*x[0];      jacIndexCons[1] = 1; jacIndexVars[1] = 0;
jac[2] = 1;           jacIndexCons[2] = 2; jacIndexVars[2] = 0;
jac[3] = 2*x[1];      jacIndexCons[3] = 1; jacIndexVars[3] = 1;
jac[4] = 1;           jacIndexCons[4] = 2; jacIndexVars[4] = 1;
jac[5] = 1;           jacIndexCons[5] = 2; jacIndexVars[5] = 2;

In the object-oriented interface, these values are set in the user-defined problem class by implementing:

std::vector<int> KTRIProblem::getJacIndexCons();
std::vector<int> KTRIProblem::getJacIndexVars();

to return vectors with the constraint and variable indices in either column-wise or row-wise order. If using the KTRProblem class, setting the values with:

KTRProblem::setJacIndexCons(int id, int val);
KTRProblem::setJacIndexVars(int id, int val);

will store the values to be returned by the appropriate get functions.

Note

When using the KTRProblem class, the Jacobian is by default assumed to be dense and stored row-wise.

Note

Even if the application does not evaluate derivatives (i.e. finite-difference first derivatives are used), it must still provide a sparsity pattern for the constraint Jacobian matrix that specifies which partial derivatives are nonzero. Knitro uses the sparsity pattern to speed up linear algebra computations.

Note

When using finite-difference first derivatives (gradopt > 1), if the sparsity pattern is unknown, then the application should specify a fully dense pattern (i.e., assume all partial derivatives are nonzero). This can easily and automatically be done by setting nnzJ<0 in the callable library interface beginning with Knitro 10.0 (and setting jacIndexCons and jacIndexVars to be NULL).

Since the Hessian matrix will always be a symmetric matrix, Knitro only stores the nonzero elements corresponding to the upper triangular part (including the diagonal). In the example here, the number of nonzero elements in the upper triangular part of the Hessian matrix nnzH is 4. The Knitro array hess stores the values of these elements, while the arrays hessIndexRows and hessIndexCols store the row and column indices respectively. The order in which these nonzero elements are stored is not important. If we store them column-wise, the arrays hess, hessIndexRows and hessIndexCols are as follows:

hess[0] = -lambda[0]*cos(x[0]) + 2*lambda[1];
hessIndexRows[0] = 0;
hessIndexCols[0] = 0;

hess[1] = 2*lambda[1];
hessIndexRows[1] = 1;
hessIndexCols[1] = 1;

hess[2] = sigma*3*x[2]*x[2];
hessIndexRows[2] = 1;
hessIndexCols[2] = 2;

hess[3] = sigma*6*x[1]*x[2];
hessIndexRows[3] = 2;
hessIndexCols[3] = 2;

In the object-oriented interface, the Hessian matrix column indices are set in the user-defined problem class by implementing:

std::vector<int> KTRProblem::getHessIndexRows();
std::vector<int> KTRProblem::getHessIndexCols();

and having them return vectors with the Hessian row and column indices, respectively. If using the KTRProblem class, setting the values with:

KTRProblem::setHessIndexRows(int id, int val);
KTRProblem::setHessIndexCols(int id, int val);

will store the values to be returned by the appropriate get functions.

Note

In Knitro, the array objGrad stores all of the elements of \nabla f(x), while the arrays jac, jacIndexCons, and jacIndexVars store information concerning only the nonzero elements of J(x). The array jac stores the nonzero values in J(x) evaluated at the current solution estimate x, jacIndexCons stores the constraint function (or row) indices corresponding to these values, and jacIndexVars stores the variable (or column) indices. There is no restriction on the order in which these elements are stored; however, it is common to store the nonzero elements of J(x) in column-wise fashion.

MATLAB example

Let us modify our example from Getting started with MATLAB so that the first derivatives are provided as well. In MATLAB, you only need to provide the derivatives for the nonlinear functions, whereas in the callable library API you need to provide the derivatives for both linear and nonlinear constraints in J(x). In the example below, only the inequality constraint is nonlinear, so we only provide the derivative for this constraint.

function firstDer()

        function [f, g] = obj(x)
                f = 1000 - x(1)^2 - 2*x(2)^2 - x(3)^2 - x(1)*x(2) - x(1)*x(3);
                if nargout == 2
                   g = [-2*x(1) - x(2) - x(3); - 4*x(2) - x(1); -2*x(3) - x(1)];
                end
        end

        % nlcon should return [c, ceq, GC, GCeq]
        % with c(x) <= 0 and ceq(x) = 0
        function [c, ceq, GC, GCeq] = nlcon(x)
                c = -(x(1)^2 + x(2)^2 + x(3)^2 - 25);
                ceq = [];
                if nargout==4
                        GC = -([2*x(1); 2*x(2); 2*x(3)]);
                        GCeq = [];
                end
        end

        x0  = [2; 2; 2];
        A = []; b = []; % no linear inequality constraints ("A*x <= b")
        Aeq = [8 14 7]; beq = [56]; % linear equality constraints ("Aeq*x = beq")
        lb = zeros(3,1); ub = []; % lower and upper bounds

        options = optimset('GradObj', 'on', 'GradConstr', 'on');
        knitromatlab(@obj, x0, A, b, Aeq, beq, lb, ub, @nlcon, [], options);

end

The only difference with the derivative-free case is that the code that computes the objective function and the constraints also returns the first derivatives along with function values. The output is as follows.

=======================================
          Commercial License
         Artelys Knitro 10.0.0
=======================================

Knitro presolve eliminated 0 variables and 0 constraints.

algorithm:            1
gradopt:              4
hessopt:              2
honorbnds:            1
maxit:                10000
outlev:               1
par_concurrent_evals: 0
Knitro changing bar_initpt from AUTO to 3.
Knitro changing bar_murule from AUTO to 4.
Knitro changing bar_penaltycons from AUTO to 1.
Knitro changing bar_penaltyrule from AUTO to 2.
Knitro changing bar_switchrule from AUTO to 2.
Knitro changing linsolver from AUTO to 2.

Problem Characteristics                    ( Presolved)
-----------------------
Objective goal:  Minimize
Number of variables:                     3 (         3)
    bounded below:                       3 (         3)
    bounded above:                       0 (         0)
    bounded below and above:             0 (         0)
    fixed:                               0 (         0)
    free:                                0 (         0)
Number of constraints:                   2 (         2)
    linear equalities:                   1 (         1)
    nonlinear equalities:                0 (         0)
    linear inequalities:                 0 (         0)
    nonlinear inequalities:              1 (         1)
    range:                               0 (         0)
Number of nonzeros in Jacobian:          6 (         6)
Number of nonzeros in Hessian:           6 (         6)

EXIT: Locally optimal solution found.

Final Statistics
----------------
Final objective value               =   9.36000000000340e+02
Final feasibility error (abs / rel) =   0.00e+00 / 0.00e+00
Final optimality error  (abs / rel) =   3.59e-09 / 2.24e-10
# of iterations                     =          9
# of CG iterations                  =          0
# of function evaluations           =         11
# of gradient evaluations           =         11
Total program time (secs)           =       0.01467 (     0.025 CPU time)
Time spent in evaluations (secs)    =       0.01161

===============================================================================

The number of function evaluations was reduced to 11, simply by providing exact first derivatives. This small example shows the practical importance of being able to provide exact derivatives; since (unlike modeling environments like AMPL) MATLAB does not provide automatic differentiation, the user must compute these derivatives analytically and then code them manually as in the above example.

C/C++ example

Let us now modify our C example from Getting started with the callable library similarly, so as to provide first derivatives.

#include <stdio.h>
#include <stdlib.h>
#include "knitro.h"


/* callback function that evaluates the objective
   and constraints */
int  callback (const int evalRequestCode,
        const int             n,
        const int             m,
        const int             nnzJ,
        const int             nnzH,
        const double * const  x,
        const double * const  lambda,
              double * const  obj,
              double * const  c,
              double * const  objGrad,
              double * const  jac,
              double * const  hessian,
              double * const  hessVector,
              void   *        userParams)
{
        if (evalRequestCode == KTR_RC_EVALFC) {
                /* objective function */
                *obj    = 1000 - x[0]*x[0] - 2*x[1]*x[1] - x[2]*x[2] - x[0]*x[1] - x[0]*x[2];

                /* constraints */
                c[0]    = 8*x[0] + 14*x[1] + 7*x[2] - 56;
                c[1]    = x[0]*x[0] + x[1]*x[1] + x[2]*x[2] -25;

                return(0);
        }
        else if (evalRequestCode == KTR_RC_EVALGA) {

                /* gradient of objective */
                objGrad[0] = -2*x[0] - x[1] - x[2];
                objGrad[1] = -4*x[1] - x[0];
                objGrad[2] = -2*x[2] - x[0];

                /* Jacobian matrix of constraints */
                jac[0] = 8;
                jac[1] = 2*x[0];
                jac[2] = 14;
                jac[3] = 2*x[1];
                jac[4] = 7;
                jac[5] = 2*x[2];

                return( 0 );
        } else {
                printf ("Wrong evalRequestCode in callback function.\n");
                return(-1);
        }
}

/* main */
int  main (int  argc, char  *argv[]) {
        int  nStatus;

        /* variables that are passed to Knitro */
        KTR_context     *kc;
        int n, m, nnzJ, nnzH, objGoal, objType;
        int *cType;
        int *jacIndexVars, *jacIndexCons;
        double obj, *x, *lambda;
        double *xLoBnds, *xUpBnds, *xInitial, *cLoBnds, *cUpBnds;
        int i, j, k; // convenience variables

        /*problem size and mem allocation */
        n = 3;
        m = 2;
        nnzJ = n*m;
        nnzH = 0;
        x      = (double *) malloc (n     * sizeof(double));
        lambda = (double *) malloc ((m+n) * sizeof(double));
        xLoBnds      = (double *) malloc (n * sizeof(double));
        xUpBnds      = (double *) malloc (n * sizeof(double));
        xInitial     = (double *) malloc (n * sizeof(double));
        cType        = (int    *) malloc (m * sizeof(int));
        cLoBnds      = (double *) malloc (m * sizeof(double));
        cUpBnds      = (double *) malloc (m * sizeof(double));
        jacIndexVars = (int    *) malloc (nnzJ * sizeof(int));
        jacIndexCons = (int    *) malloc (nnzJ * sizeof(int));

        /* objective type */
        objType = KTR_OBJTYPE_GENERAL;
        objGoal = KTR_OBJGOAL_MINIMIZE;

        /* bounds and constraints type */
        for (i = 0; i < n; i++) {
                xLoBnds[i] = 0.0;
                xUpBnds[i] = KTR_INFBOUND;
        }
        for (j = 0; j < m; j++) {
                cType[j] = KTR_CONTYPE_GENERAL;
                cLoBnds[j] = 0.0;
                cUpBnds[j] = (j == 0 ? 0.0 : KTR_INFBOUND);
        }

        /* initial point */
        for (i = 0; i < n; i++)
                xInitial[i] = 2.0;

        /* sparsity pattern (here, of a full matrix) */
        k = 0;
        for (i = 0; i < n; i++)
                for (j = 0; j < m; j++) {
                        jacIndexCons[k] = j;
                        jacIndexVars[k] = i;
                        k++;
                }

        /* create a Knitro instance */
        kc = KTR_new();
        if (kc == NULL)
                exit( -1 ); // probably a license issue

        /* set options: exact/user-supplied gradient option */
        //if (KTR_set_int_param_by_name (kc, "gradopt", KTR_GRADOPT_FORWARD) != 0)
        if (KTR_set_int_param_by_name (kc, "gradopt", KTR_GRADOPT_EXACT) != 0)
                exit( -1 );
        if (KTR_set_int_param_by_name (kc, "hessopt", KTR_HESSOPT_BFGS) != 0)
                exit( -1 );
        if (KTR_set_int_param_by_name (kc, "outlev", 1) != 0)
                exit( -1 );

        /* register the callback function */
        if (KTR_set_func_callback (kc, &callback) != 0)
                exit( -1 );
        if (KTR_set_grad_callback (kc, &callback) != 0)
                exit( -1 );

        /* pass the problem definition to Knitro */
        nStatus = KTR_init_problem (kc, n, objGoal, objType,
                xLoBnds, xUpBnds,
                m, cType, cLoBnds, cUpBnds,
                nnzJ, jacIndexVars, jacIndexCons,
                nnzH, NULL, NULL, xInitial, NULL);

        /* free memory (Knitro maintains its own copy) */
        free (xLoBnds);
        free (xUpBnds);
        free (xInitial);
        free (cType);
        free (cLoBnds);
        free (cUpBnds);
        free (jacIndexVars);
        free (jacIndexCons);

        /* solver call */
        nStatus = KTR_solve (kc, x, lambda, 0, &obj,
                NULL, NULL, NULL, NULL, NULL, NULL);
        if (nStatus != 0)
                printf ("\nKnitro failed to solve the problem, final status = %d\n",
                                nStatus);
        else
                printf ("\nKnitro successful, objective is = %e\n", obj);

        /* delete the Knitro instance and primal/dual solution */
        KTR_free (&kc);
        free (x);
        free (lambda);

        getchar();
        return( 0 );
}

The callback function was simply updated to provide the derivatives, and then registered with:

KTR_set_grad_callback (kc, &callback)

Last, the gradopt option was set to exact/user-supplied instead of forward finite-differences using:

KTR_set_int_param_by_name (kc, "gradopt", KTR_GRADOPT_EXACT)

Running this code produces the following output.

=======================================
          Commercial License
         Artelys Knitro 10.0.0
=======================================

Knitro presolve eliminated 0 variables and 0 constraints.

hessopt:              2
outlev:               1
Knitro changing algorithm from AUTO to 1.
Knitro changing bar_initpt from AUTO to 3.
Knitro changing bar_murule from AUTO to 4.
Knitro changing bar_penaltycons from AUTO to 1.
Knitro changing bar_penaltyrule from AUTO to 2.
Knitro changing bar_switchrule from AUTO to 2.
Knitro changing linsolver from AUTO to 2.

Problem Characteristics                    ( Presolved)
-----------------------
Objective goal:  Minimize
Number of variables:                     3 (         3)
    bounded below:                       3 (         3)
    bounded above:                       0 (         0)
    bounded below and above:             0 (         0)
    fixed:                               0 (         0)
    free:                                0 (         0)
Number of constraints:                   2 (         2)
    linear equalities:                   0 (         0)
    nonlinear equalities:                1 (         1)
    linear inequalities:                 0 (         0)
    nonlinear inequalities:              1 (         1)
    range:                               0 (         0)
Number of nonzeros in Jacobian:          6 (         6)
Number of nonzeros in Hessian:           6 (         6)

EXIT: Locally optimal solution found.

Final Statistics
----------------
Final objective value               =   9.36000000000340e+02
Final feasibility error (abs / rel) =   0.00e+00 / 0.00e+00
Final optimality error  (abs / rel) =   3.59e-09 / 2.24e-10
# of iterations                     =          9
# of CG iterations                  =          0
# of function evaluations           =         11
# of gradient evaluations           =         11
Total program time (secs)           =       0.00134 (     0.001 CPU time)
Time spent in evaluations (secs)    =       0.00000

===============================================================================


Knitro successful, objective is = 9.360000e+02

Again, the number of function calls is reduced with respect to the derivative-free case.

Note

Automatic differentiation packages like ADOL-C and ADIFOR can help in generating code with derivatives. These codes are an alternative to differentiating the functions manually. Another option is to use symbolic differentiation software to compute an analytical formula for the derivatives.

Object-oriented C++ example

Let us now modify our C++ example from Getting started with the object-oriented interface, so as to provide first derivatives.

#include "KTRSolver.h"
#include "KTRProblem.h"
#include <iostream>

class ProblemExample : public KNITRO::KTRProblem {
    // objective properties
    void setObjectiveProperties() {
        setObjType(KTR_OBJTYPE_GENERAL);
        setObjGoal(KTR_OBJGOAL_MINIMIZE);
    }

    // constraint properties
    void setConstraintProperties()
    {
        // set constraint types
        setConTypes(0, KNITRO::KTREnums::ConstraintType::ConLinear);
        setConTypes(1, KNITRO::KTREnums::ConstraintType::ConQuadratic);

        // set constraint lower bounds
        setConLoBnds(0.0);

        // set constraint upper bounds
        setConUpBnds(0, 0.0);
        setConUpBnds(1, KTR_INFBOUND);
    }

    // Variable bounds. All variables 0 <= x.
    void setVariableProperties() {
        setVarLoBnds(0.0);
    }

public:
    // constructor: pass number of variables and constraints to base class
    ProblemExample() : KTRProblem(3, 2) {
        // set problem properties
        setObjectiveProperties();
        setVariableProperties();
        setConstraintProperties();
    }

    // Objective and constraint evaluation function
    // overrides KTRProblem class
    double evaluateFC(const std::vector<double>& x,
        std::vector<double>& c,
        std::vector<double>& objGrad,
        std::vector<double>& jac) {

        // constraints
        c[0] = 8.0e0*x[0] + 14.0e0*x[1] + 7.0e0*x[2] - 56.0e0;
        c[1] = x[0] * x[0] + x[1] * x[1] + x[2] * x[2] - 25.0e0;

        // return objective function value
        return 1000 - x[0] * x[0] - 2.0e0*x[1] * x[1] - x[2] * x[2]
            - x[0] * x[1] - x[0] * x[2];
    }

    // Gradient and Jacobian evaluation function
    // overrides KTRProblem class
    int evaluateGA(const std::vector<double>& x,
        std::vector<double>& objGrad,
        std::vector<double>& jac) override {

        objGrad[0] = -2.0e0*x[0] - x[1] - x[2];
        objGrad[1] = -4.0e0*x[1] - x[0];
        objGrad[2] = -2.0e0*x[2] - x[0];

        // gradient of the first constraint, c[0].
        jac[0] = 8.0e0;
        jac[1] = 14.0e0;
        jac[2] = 7.0e0;

        // gradient of the second constraint, c[1].
        jac[3] = 2.0e0*x[0];
        jac[4] = 2.0e0*x[1];
        jac[5] = 2.0e0*x[2];
        return 0;
    }
  };

  int main(int argc, char *argv[]) {
      // Create a problem instance.
      ProblemExample problem = ProblemExample();

      // Create a solver - optional arguments:
      // exact first derivatives
      // BFGS approximate second derivatives
      KNITRO::KTRSolver solver(&problem, KTR_GRADOPT_EXACT, KTR_HESSOPT_BFGS);

      int solveStatus = solver.solve();

      if (solveStatus != 0) {
          std::cout << std::endl;
          std::cout << "KNITRO failed to solve the problem, final status = ";
          std::cout << solveStatus << std::endl;
      }
      else {
          std::cout << std::endl << "KNITRO successful, objective is = ";
          std::cout << solver.getObj() << std::endl;
      }

      return 0;
  }

Two changes were made to the previous example: the evaluateGA() function was added to the problem class to define the derivatives, and the KTRSolver constructor is passed KTR_GRADOPT_EXACT instead of KTR_GRADOPT_FORWARD, since the exact gradient function is now defined. Running this example produces the same output as the callable library example.

Checking derivatives

One drawback of user-supplied derivatives is the risk of error in computing or implementing them, which would provide Knitro with wrong, inconsistent information: the computed function values would not match the computed derivatives. Approximate derivatives computed by finite differences are useful for checking whether user-supplied derivatives match user-supplied function evaluations.

Users of modeling languages such as AMPL need not worry about this, since derivatives are computed automatically by the modeling software. However, for users of MATLAB and the callable library it is good practice to check one's exact derivatives against finite-difference approximations. Note that small differences between exact and finite-difference values are to be expected.

Knitro offers the following user options to check for errors in the user-supplied first derivatives (i.e., the objective gradient and the Jacobian matrix) and second derivatives (i.e. the Hessian matrix).

Derivative Check Options

Option                Meaning
derivcheck            Specifies whether to enable the derivative checker, and which derivatives to check (first, second, or both)
derivcheck_terminate  Specifies whether to terminate after the derivative check or continue with the optimization if it is successful
derivcheck_tol        Specifies the relative tolerance used for identifying potential errors in the derivatives
derivcheck_type       Specifies whether to use forward or central finite differences for the derivative check

Note that to use the derivative checker, you must set gradopt = 1 (to check the first derivatives) and/or hessopt = 1 (to check the second derivatives/Hessian). You must also supply callback routines that compute the objective and constraint functions and analytic first derivatives (to check the first derivatives), and/or analytic second derivatives/Hessian (to check the second derivatives). By default, the derivative checker is turned off. To check first derivatives only, set derivcheck = 1; to check second derivatives only, set derivcheck = 2; to check both first and second derivatives, set derivcheck = 3. Additionally, you can set derivcheck_type to specify what type of finite differencing to use for the check, and derivcheck_tol to change the default relative tolerance used to detect derivative errors. Setting derivcheck_terminate determines whether Knitro always stops after the derivative check is completed, or continues with the optimization when the check is successful.
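For example, these options can be collected in a Knitro options file (loaded with KTR_load_param_file() in the callable library). The fragment below is an illustrative sketch; the numeric values shown are assumptions, so consult the Reference Manual for the exact encodings:

```
# Illustrative Knitro options-file fragment: check first derivatives,
# then stop before optimizing (values are examples, not authoritative).
derivcheck            1         # check first derivatives only
derivcheck_type       2         # central differences (1 = forward)
derivcheck_tol        1.0e-6    # relative tolerance for flagging errors
derivcheck_terminate  1         # always terminate after the check
```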

It is best to check the derivatives at different points, and to avoid points where partial derivatives happen to equal zero. If an initial point was provided by the user (e.g., via KTR_init_problem()), then Knitro will perform the derivative check at this point. Otherwise, if no initial point is provided, Knitro will perform the derivative check at a randomly generated point that satisfies the variable bounds. To perform a derivative check at different points, simply feed different initial points to Knitro.

Using the example problem above, if the Knitro derivative checker is run with derivcheck = 1 and the relative differences between all user-supplied first derivatives and their finite-difference approximations satisfy the tolerance defined by derivcheck_tol, then you will see the following output:

-------------------------------------------------------------------------
Knitro Derivative Check Information

Checking 1st derivatives with forward finite differences.
Derivative check performed at user-supplied initial 'x' point.
Printing relative differences >   1.0000e-06.

Maximum relative difference in the objective gradient =   0.0000e+00.
Maximum relative difference in the Jacobian           =   0.0000e+00.
Derivative check passed.
-------------------------------------------------------------------------

This output appears before the optimization begins. Since the derivative check passed, Knitro automatically proceeds with the optimization using the user-supplied derivatives.

Now let us modify the objective gradient computation in the example problem above as follows:

// gradient of objective
// objGrad[0] = -2.0e0*x[0] - x[1] - x[2];
objGrad[0] = -2.0e0*x[0] - x[1]; // BUG HERE !!!

Running the code again, we obtain:

-------------------------------------------------------------------------
Knitro Derivative Check Information

Checking 1st derivatives with forward finite differences.
Derivative check performed at user-supplied initial 'x' point.
Printing relative differences >   1.0000e-06.

WARNING: The discrepancy for objective gradient element objGrad[0]
     exceeds the derivative check relative tolerance of 1.000000e-06.
     analytic (user-supplied) value =  -6.000000000000e+00,
     finite-difference value        =  -8.000000000000e+00,
     |rel diff| =  3.3333e-01, |abs diff| =  2.0000e+00

Maximum relative difference in the objective gradient =   3.3333e-01.
Maximum relative difference in the Jacobian           =   0.0000e+00.
Derivative check failed.
-------------------------------------------------------------------------

EXIT: Derivative check failed.

Knitro is warning us that the finite-difference approximation of the first coordinate of the gradient at the initial point is about -8, whereas its (supposedly) exact user-supplied value is -6: there is a bug in our implementation of the objective gradient. Knitro prints a message indicating the derivative discrepancy it found and terminates immediately with a failure message.

Knitro also provides a separate API function KTR_check_first_ders() that can be used for checking first derivatives. See the Knitro API section in the Reference Manual and the Knitro header file for more information about this function. This function will be deprecated in the near future, however, so you should instead use the derivcheck option described above to perform derivative checks.