Parallelism

Knitro offers several features to exploit parallel computations on shared memory multi-processor machines. These features are implemented using OpenMP.

Note

The parallel features offered through Knitro are not available through all interfaces. Check with your modeling language vendor to see if these features are included. The parallel features are included in the AMPL interface, the object-oriented interfaces, and through the callable library. Parallel features are also available through the MATLAB interface, but some may be less efficient in this environment.

Knitro offers the following parallel features:

Parallel Finite-Difference Gradients

As described in Evaluation callbacks, if you are unable to provide the exact first derivatives, Knitro offers the option to approximate first derivatives using either a forward or central finite-difference approach, by setting the option gradopt. Knitro will compute these finite difference gradient values in parallel if the user specifies that Knitro should use multiple threads through the option findiff_numthreads. This parallel feature only applies to first derivative finite-difference evaluations. Using more than one thread for the finite-difference gradient evaluation is only beneficial when concurrent_evals=1).

Note

In the Knitro-MATLAB interface, the parallel finite-difference feature is controlled by the UseParallel MATLAB option, rather than the Knitro findiff_numthreads option. See Knitro / MATLAB reference for more information.

Parallel Multi-Start

The multi-start procedure described in Multi-Start can run in parallel by setting either numthreads or ms_numthreads to use multiple threads.

When the multi-start procedure is run in parallel, Knitro will produce the same sequence of initial points and solves that you see when running multi-start sequentially (though, perhaps, not in the same order).

Therefore, as long as you run multi-start to completion (ms_terminate =0) and use the deterministic option (deterministic =1), you should visit the same initial points encountered when running multi-start sequentially, and get the same final solution. By default ms_terminate =0 and deterministic =1 so that the parallel multi-start produces the same solution as the sequential multi-start.

However, if deterministic =0, or ms_terminate >0, there is no guarantee that the final solution reported by multi-start will be the same when run in parallel compared to the solution when run sequentially, and even the parallel solution may change when run at different times.

Parallel Algorithms

If the user option nlp_algorithm is set to multi, then Knitro will run all four algorithms (see Algorithms). When numthreads is set to use multiple threads, the four Knitro algorithms will run in parallel. The termination of the parallel algorithms procedure is controlled by the user option opt-ma_terminate. See Algorithms for more details on the multi algorithm procedure.

Parallel Mixed-Integer Branch-and-Bound

The branch-and-bound method for mixed-integer programming is designed to execute in parallel by default. This can result in significant speedups by solving multiple node subproblems or heuristics in parallel. The option mip_numthreads can be used to customize the number of threads used for the branch-and-bound algorithm (mip_method =1).

Parallel Tuning

The Knitro-Tuner can help you identify some non-default options settings that may improve performance on a particular model or set of models. When numthreads is set to use multiple threads, Knitro will test the Tuner options in parallel.

Parallel Basic Linear Algebra Subroutine (BLAS)

The Knitro algorithms - in particular the interior-point/barrier algorithms - rely heavily on BLAS operations (e.g. dot products of vectors, dense matrix-matrix and matrix-vector products, etc.). For large-scale problems, these operations may often take 35%-50% of the overall solution time, and sometimes more.

These operations can be computed in parallel using multiple threads by setting the user option blas_numthreads >1. This option is currently only active when using the default Intel BLAS (blasoption =1) provided with Knitro.

Parallel Sparse Linear System Solves

The primary computational cost each iteration in the Knitro interior-point algorithms is the solution of a linear system of equations. The linsolver user option specifies the linear system, solver to use. You can use the multi-threaded Intel MKL PARDISO solver in Knitro by choosing linsolver =6 on x86-64 machines. By default the Intel MKL PARDISO solver will use one thread, however, it can solve linear systems in parallel by choosing linsolver_numthreads >1 (in combination with linsolver =6).

It is also possible to use multi-threaded linear solves with the linear solvers MA97 (linsolver =7), MA86 (linsolver =8), and the Apple Accelerate sparse linear solver on macOS machines (linsolver =9). In this case, you should set the more general option numthreads >1, to do this (linsolver_numthreads is currently specific to the MKL PARDISO linear solver).

Note

Generally you should not use BOTH parallel BLAS and a parallel linear solver as they may conflict with each other. If blas_numthreads >1 one should set linsolver_numthreads =1 and vice versa.

Parallel Options

Option	Meaning
numthreads	Specifies the max number of threads to use for all parallel features. You can just set this and let Knitro decide how to distribute the threads.
concurrent_evals	Specifies whether or not to allow concurrent evaluations.
blas_numthreads	Specifies the number of threads to use for parallel BLAS (when blasoption =1).
findiff_numthreads	Specifies the number of threads to use for finite-difference gradients.
linsolver_numthreads	Specifies the number of threads to use for parallel linear system solves (when linsolver =6)
mip_numthreads	Specifies the number of threads to use for the mixed- integer branch-and-bound method (mip_method =1).
ms_numthreads	Specifies the number of threads to use for the parallel multi-start procedure.

The user option numthreads is used to determine the number of threads Knitro can use for all parallel computations. Knitro will decide how to apply the threads. If numthreads > 0, then the number of threads is determined by the value of numthreads. If numthreads = 0, then the number of threads is determined by the value of the environment variables OMP_NUM_THREADS. If numthreads = 0 and OMP_NUM_THREADS is not set, then the number of threads to use will be automatically determined by OpenMP. If numthreads < 0, Knitro will automatically determine the number of threads to use.

Generally, if you are unsure of how best to apply parallel threads in Knitro you should just set the general option numthreads to the maximum number of threads you want Knitro to use, and leave blas_numthreads, linsolver_numthreads, and ms_numthreads at their default values. Then Knitro will try to allocate work to these different threads in the most sensible way. Typically, if you are performing a single solve, the threads will get applied to the BLAS operations. If, for example, you are using multi-start then the multi-start solves are run in parallel but BLAS is sequential (typically applying 2 layers of parallelism is not good). You should then compare this with using one thread to see if you achieve any speedup from parallelism. Using parallelism will not always result in a speedup since sometimes the overhead work associated with using parallelism will outweigh any improvements achieved through parallel computations inside Knitro. This may be especially likely on small problems that are solved quickly.

The options blas_numthreads, findiff_numthreads, linsolver_numthreads, mip_numthreads and ms_numthreads allow the expert user more fine-grained control over parallelism of these specific features.

The user option blas_numthreads is used to determine the number of threads Knitro can use for parallel BLAS computations. This option is only active when using the default Intel BLAS (blasoption =1). The domain specific blas_numthreads, will override the general thread setting specified by numthreads for BLAS operations.

The user option findiff_numthreads is used to determine the number of threads to use for finite-difference gradient evaluations when gradopt =2 or gradopt =3.

The user option linsolver_numthreads is used to determine the number of threads Knitro can use for parallel linear system solves when using the Intel MKL PARDISO linear solver (linsolver =6). The domain specific linsolver_numthreads, will override the general thread setting specified by numthreads for linear system solve operations in this case.

The user option mip_numthreads is used to determine the number of threads to use for the branch-and-bound algorithm for solving mixed-integer models when mip_method =1.

The user option ms_numthreads is used to determine the number of threads to use for the multi-start procedure. See Multi-Start for more details.

The user option concurrent_evals determines whether or not the user provided callback functions used for function and derivative evaluations can take place concurrently in parallel (for possibly different values of “x”). If it is not safe to have concurrent evaluations, then setting concurrent_evals =0, will put these evaluations in a critical region so that only one evaluation can take place at a time. If concurrent_evals =1 then concurrent evaluations are allowed when Knitro is run in parallel, and it is the responsibility of the user to ensure that these evaluations are stable.

Preventing concurrent evaluations will decrease the efficiency of the parallel features, particularly when the evaluations are expensive or there are many threads and these evaluations create a bottleneck.

AMPL example

Let us consider again our AMPL example from Section Getting started with AMPL and run it with the parallel multi algorithm procedure. We specify that Knitro should run in parallel with four threads (one for each algorithm):

ampl: reset;
ampl: option solver knitroampl;
ampl: option knitro_options "algorithm=5 ma_terminate=0 numthreads=4";
ampl: model testproblem.mod;
ampl: solve;

The Knitro log printed to the screen shows the results of each algorithm (one per line):

=======================================
          Commercial License
         Artelys Knitro 16.0.0
=======================================

Knitro presolve eliminated 0 variables and 0 constraints.

concurrent_evals:        0
datacheck                0
feastol                  1e-06
feastol_abs              0.001
hessian_no_f             1
hessopt                  1
ma_terminate             0
nlp_algorithm            5
numthreads               4
opttol                   1e-06
opttol_abs               0.001

Problem Characteristics                     |           Presolved
-----------------------
Problem type: QCQP
Objective: minimize / quadratic
Number of variables:                      3 |                             3
  bounds:         lower     upper     range |     lower     upper     range
                      3         0         0 |         3         0         0
                             free     fixed |                free     fixed
                                0         0 |                   0         0
Number of constraints:                    2 |                             2
                    eq.     ineq.     range |       eq.     ineq.     range
  linear:             1         0         0 |         1         0         0
  quadratic:          0         1         0 |         0         1         0
Number of nonzeros:
              objective  Jacobian   Hessian | objective  Jacobian   Hessian
  linear:             0         3           |         0         3
  quadratic:          3         3         5 |         3         3         5
  total:              3         6         5 |         3         6         5

Knitro running multiple algorithms in parallel with 4 threads.

   Alg      Status    Objective      FeasError   OptError   Real Time
--------  --------- --------------  ----------  ---------- ----------
       2         0    9.360000e+02   0.000e+00   1.945e-07      0.001
       3         0    9.360000e+02   0.000e+00   0.000e+00      0.002
       1         0    9.360000e+02   0.000e+00   1.250e-09      0.002
       4         0    9.360000e+02   0.000e+00   0.000e+00      0.006
Multiple algorithms stopping, all solves have completed.

EXIT: Locally optimal solution found.

Final Statistics
----------------
Final objective value               =   9.36000000000000e+02
Final feasibility error (abs / rel) =   0.00e+00 / 0.00e+00
Final optimality error  (abs / rel) =   0.00e+00 / 0.00e+00
# of iterations                     =         21
# of CG iterations                  =         12
# of function evaluations           =          0
# of gradient evaluations           =          0
# of Hessian evaluations            =          0
Total program time (secs)           =       0.00798 (     0.028 CPU time)

===============================================================================

Knitro 16.0.0: Locally optimal or satisfactory solution.
objective 936; feasibility error 0
21 iterations; 0 function evaluations

As can be seen, all four Knitro algorithms solve the problem and find the same local solution. However, the interior-point conjugate gradient algorithm (nlp_algorithm=2) is the fastest.

C example

As an example, the C example can also be easily modified to enable parallel multi-algorithms by adding the following lines before the call to KN_solve():

// parallelism
if (KN_set_int_param_by_name (kc, "nlp_algorithm", KN_NLP_ALG_MULTI) != 0)
exit( -1 );
if (KN_set_int_param_by_name (kc, "ma_terminate", 0) != 0)
exit( -1 );
if (KN_set_int_param_by_name (kc, "numthreads", 4) != 0)
exit( -1 );

Again, running this example we get a Knitro log that looks simlar to what we observed with AMPL.