Parallelism

Knitro offers several features to exploit parallel computations on shared memory multi-processor machines. These features are implemented using OpenMP.

Note

The parallel features offered through Knitro are not available through all interfaces. Check with your modeling language vendor to see if these features are included. The parallel features are included in the AMPL interface, the object-oriented interfaces, and through the callable library. Parallel features are also available through the MATLAB interface, but some may be less efficient in this environment.

Knitro offers the following parallel features:

Parallel Finite-Difference Gradients

As described in Derivatives, if you are unable to provide the exact first derivatives, Knitro offers the option to approximate first derivatives using either a forward or central finite-difference approach, by setting the option gradopt. Knitro will compute these finite difference gradient values in parallel if the user specifies that Knitro should use multiple threads through the option par_numthreads (see below). This parallel feature only applies to first derivative finite-difference evaluations.

Note

In the Knitro-MATLAB interface, the parallel finite-difference feature is controlled by the UseParallel MATLAB option, rather than the Knitro par_numthreads option. See Knitro / MATLAB reference for more information.

Parallel Multistart

The multistart procedure described in Multistart can run in parallel by setting par_numthreads to use multiple threads.

When the multistart procedure is run in parallel, Knitro will produce the same sequence of initial points and solves that you see when running multistart sequentially (though, perhaps, not in the same order).

Therefore, as long as you run multistart to completion (ms_terminate =0) and use the deterministic option (ms_deterministic =1), you should visit the same initial points encountered when running multistart sequentially, and get the same final solution. By default ms_terminate =0 and ms_deterministic =1 so that the parallel multistart produces the same solution as the sequential multistart.

However, if ms_deterministic =0, or ms_terminate >0, there is no guarantee that the final solution reported by multistart will be the same when run in parallel compared to the solution when run sequentially, and even the parallel solution may change when run at different times.

The option par_msnumthreads can be used to set the number of threads used by the multistart procedure. For instance, if par_numthreads =16 and par_msnumthreads =8, Knitro will run 8 solves in parallel and each solve will be allocated 2 threads.

Parallel Algorithms

If the user option alg is set to multi, then Knitro will run all four algorithms (see Algorithms). When par_numthreads is set to use multiple threads, the four Knitro algorithms will run in parallel. The termination of the parallel algorithms procedure is controlled by the user option ma_terminate. See Algorithms for more details on the multi algorithm procedure.

Parallel Tuning

The Knitro-Tuner can help you identify some non-default options settings that may improve performance on a particular model or set of models. When par_numthreads is set to use multiple threads, Knitro will test the Tuner options in parallel.

Parallel Basic Linear Algebra Subroutine (BLAS)

The Knitro algorithms - in particular the interior-point/barrier algorithms - rely heavily on BLAS operations (e.g. dot products of vectors, dense matrix-matrix and matrix-vector products, etc.). For large-scale problems, these operations may often take 35%-50% of the overall solution time, and sometimes more.

These operations can be computed in parallel using multiple threads by setting the user option par_blasnumthreads >1 (by default par_blasnumthreads =1). This option is currently only active when using the default Intel BLAS (blasoption =1) provided with Knitro.

Parallel Sparse Linear System Solves

The primary computational cost each iteration in the Knitro interior-point algorithms is the solution of a linear system of equations. The linsolver user option specifies the linear system, solver to use. You can use the multi-threaded Intel MKL PARDISO solver in Knitro by choosing linsolver =6. By default the Intel MKL PARDISO solver will use one thread, however, it can solve linear systems in parallel by choosing par_lsnumthreads >1 (in combination with linsolver =6). It is also possible to use par_lsnumthreads >1 with the linear solvers MA86 and MA97.

Note

Generally you should not use BOTH parallel BLAS and a parallel linear solver as they may conflict with each other. If par_blasnumthreads >1 one should set par_lsnumthreads =1 and vise versa.

Parallel Options

Option Meaning
par_numthreads Specifies the max number of threads to use for all parallel features. You can just set this and let Knitro decide how to distribute the threads.
par_concurrent_evals Whether or not to allow concurrent evaluations
par_blasnumthreads Specifies the number of threads to use for parallel BLAS (when blasoption =1)
par_lsnumthreads Specifies the number of threads to use for parallel linear system solves (when linsolver =6)
par_msnumthreads Specifies the number of threads to use for the parallel multi-start procedure.

The user option par_numthreads is used to determine the number of threads Knitro can use for all parallel computations. Knitro will decide how to apply the threads. If par_numthreads > 0, then the number of threads is determined by the value of par_numthreads. If par_numthreads = 0, then the number of threads is determined by the value of the environment variables OMP_NUM_THREADS. If par_numthreads = 0 and OMP_NUM_THREADS is not set, then the number of threads to use will be automatically deteremined by OpenMP. If par_numthreads < 0, Knitro will run in sequential mode.

Generally, if you are unsure of how best to apply parallel threads in Knitro you should just set the general option par_numthreads to the maximum number of threads you want Knitro to use, and leave par_blasnumthreads and par_lsnumthreads at their default values. Then Knitro will try to allocate work to these different threads in the most sensible way. Typically, if you are performing a single solve, the threads will get applied to the BLAS operations. If, for example, you are using multi-start then the multi-start solves are run in parallel but BLAS is sequential (typically applying 2 layers of parallelism is not good).

The options par_blasnumthreads and par_lsnumthreads allow the expert user more fine-grained control over parallelism of these specific features.

The user option par_blasnumthreads is used to determine the number of threads Knitro can use for parallel BLAS computations. This option is only active when using the default Intel BLAS (blasoption =1). The domain specific par_blasnumthreads, will override the general thread setting specified by par_numthreads for BLAS operations.

The user option par_lsnumthreads is used to determine the number of threads Knitro can use for parallel linear system solves. This option is only active when using the Intel MKL PARDISO linear solver (linsolver =6), the HSL MA97 linear solver (linsolver =7) and the HSL MA86 linear solver (linsolver =8). The domain specific par_lsnumthreads, will override the general thread setting specified by par_numthreads for linear system solve operations.

The user option par_msnumthreads is used to determine the number of threads to use for the multi-start procedure. See Multistart for more details.

The user option par_concurrent_evals determines whether or not the user provided callback functions used for function and derivative evaluations can take place concurrently in parallel (for possibly different values of “x”). If it is not safe to have concurrent evaluations, then setting par_concurrent_evals =0, will put these evaluations in a critical region so that only one evaluation can take place at a time. If par_concurrent_evals =1 then concurrent evaluations are allowed when Knitro is run in parallel, and it is the responsibility of the user to ensure that these evaluations are stable.

Preventing concurrent evaluations will decrease the efficiency of the parallel features, particularly when the evaluations are expensive or there are many threads and these evaluations create a bottleneck.

AMPL example

Let us consider again our AMPL example from Section Getting started with AMPL and run it with the parallel multi algorithm procedure. We specify that Knitro should run in parallel with four threads (one for each algorithm):

1
2
3
4
5
ampl: reset;
ampl: option solver knitroampl;
ampl: option knitro_options "alg=5 ma_terminate=0 par_numthreads=4";
ampl: model testproblem.mod;
ampl: solve;

The Knitro log printed to the screen shows the results of each algorithm (one per line):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
=======================================
          Commercial License
         Artelys Knitro 11.0.0
=======================================

Knitro presolve eliminated 0 variables and 0 constraints.

algorithm:            5
datacheck:            0
hessian_no_f:         1
ma_terminate:         0
par_concurrent_evals: 0
par_numthreads:       4

Problem Characteristics                                 (   Presolved)
-----------------------
Objective goal:  Minimize
Objective type:  quadratic
Number of variables:                                  3 (           3)
    bounded below only:                               3 (           3)
    bounded above only:                               0 (           0)
    bounded below and above:                          0 (           0)
    fixed:                                            0 (           0)
    free:                                             0 (           0)
Number of constraints:                                2 (           2)
    linear equalities:                                1 (           1)
    quadratic equalities:                             0 (           0)
    gen. nonlinear equalities:                        0 (           0)
    linear one-sided inequalities:                    0 (           0)
    quadratic one-sided inequalities:                 1 (           1)
    gen. nonlinear one-sided inequalities:            0 (           0)
    linear two-sided inequalities:                    0 (           0)
    quadratic two-sided inequalities:                 0 (           0)
    gen. nonlinear two-sided inequalities:            0 (           0)
Number of nonzeros in Jacobian:                       6 (           6)
Number of nonzeros in Hessian:                        5 (           5)

Knitro running multiple algorithms in parallel with 4 threads.

   Alg      Status    Objective      FeasError   OptError   Real Time
--------  --------- --------------  ----------  ---------- ----------
       2         0    9.360000e+02   0.000e+00   1.945e-07      0.002
       1         0    9.360000e+02   6.738e-08   6.614e-08      0.002
       4         0    9.360000e+02   0.000e+00   2.387e-12      0.005
       3         0    9.360000e+02   0.000e+00   0.000e+00      0.010
Multiple algorithms stopping, all solves have completed.

EXIT: Locally optimal solution found.

Final Statistics
----------------
Final objective value               =   9.35999997829394e+02
Final feasibility error (abs / rel) =   6.74e-08 / 5.18e-09
Final optimality error  (abs / rel) =   6.61e-08 / 4.13e-09
# of iterations                     =         16
# of CG iterations                  =         12
# of function evaluations           =         28
# of gradient evaluations           =         24
# of Hessian evaluations            =         16
Total program time (secs)           =       0.01169 (     0.023 CPU time)

===============================================================================

Knitro 11.0.0: Locally optimal or satisfactory solution.
objective 935.9999978293937; feasibility error 6.74e-08
16 iterations; 28 function evaluations

As can be seen, all four Knitro algorithms solve the problem and find the same local solution. However, the two interior-point algorithms (alg=1 and 2) are the fastest.

C example

As an example, the C example can also be easily modified to enable parallel multi-algorithms by adding the following lines before the call to KN_solve():

// parallelism
if (KN_set_int_param_by_name (kc, "algorithm", KN_ALG_MULTI) != 0)
exit( -1 );
if (KN_set_int_param_by_name (kc, "ma_terminate", 0) != 0)
exit( -1 );
if (KN_set_int_param_by_name (kc, "par_numthreads", 4) != 0)
exit( -1 );

Again, running this example we get a Knitro log that looks simlar to what we observed with AMPL.