# Parallelism

Knitro offers several features to exploit parallel computations on shared memory multi-processor machines. These features are implemented using OpenMP.

Note

The parallel features offered through Knitro are not available through all interfaces. Check with your modeling language vendor to see if these features are included. The parallel features are included in the AMPL interface, the object-oriented interfaces, and through the callable library. Parallel features are also available through the MATLAB interface, but some may be less efficient in this environment.

Knitro offers the following parallel features:

## Parallel Finite-Difference Gradients

As described in Derivatives, if you are unable to provide the exact
first derivatives, Knitro offers the option to approximate
first derivatives using either a forward or central finite-difference approach,
by setting the option `gradopt`

. Knitro will compute these finite
difference gradient values in parallel if the user specifies that Knitro should
use multiple threads through the option `par_numthreads`

(see below).
This parallel feature only applies to first derivative finite-difference
evaluations.

Note

In the Knitro-MATLAB interface, the parallel finite-difference feature is
controlled by the *UseParallel* MATLAB option, rather than the Knitro
`par_numthreads`

option. See Knitro / MATLAB reference for
more information.

## Parallel Multistart

The multistart procedure described in Multistart
can run in parallel by setting `par_numthreads`

to use
multiple threads.

When the multistart procedure is run in parallel, Knitro will produce the same sequence of initial points and solves that you see when running multistart sequentially (though, perhaps, not in the same order).

Therefore, as long as you run multistart to completion (`ms_terminate`

=0)
and use the deterministic option (`ms_deterministic`

=1),
you should visit
the same initial points encountered when running multistart sequentially,
and get the same final solution. By default `ms_terminate`

=0
and `ms_deterministic`

=1 so that the parallel multistart produces
the same solution as the sequential multistart.

However, if `ms_deterministic`

=0, or `ms_terminate`

>0,
there is no guarantee that the final solution reported by multistart will
be the same when run in parallel compared to the solution when run sequentially,
and even the parallel solution may change when run at different times.

The option `par_msnumthreads`

can be used to set the number
of threads used by the multistart procedure. For instance, if
`par_numthreads`

=16 and `par_msnumthreads`

=8,
Knitro will run 8 solves in parallel and each solve will be
allocated 2 threads.

## Parallel Algorithms

If the user option `alg`

is set to `multi`

, then Knitro
will run all four algorithms (see Algorithms). When
`par_numthreads`

is set to use multiple threads, the four Knitro
algorithms will run in parallel. The termination of the parallel algorithms
procedure is controlled by the user option `ma_terminate`

. See
Algorithms for more details on the multi algorithm procedure.

## Parallel Tuning

The Knitro-Tuner can help you identify some non-default options settings that may improve performance on a particular model or set of models.
When `par_numthreads`

is set to use multiple threads, Knitro will test the Tuner options in parallel.

## Parallel Basic Linear Algebra Subroutine (BLAS)

The Knitro algorithms - in particular the interior-point/barrier algorithms - rely heavily on BLAS operations (e.g. dot products of vectors, dense matrix-matrix and matrix-vector products, etc.). For large-scale problems, these operations may often take 35%-50% of the overall solution time, and sometimes more.

These operations can be computed in parallel using multiple threads by
setting the user option `par_blasnumthreads`

>1 (by default
`par_blasnumthreads`

=1). This option is currently only active when
using the default Intel BLAS (`blasoption`

=1) provided with Knitro.

## Parallel Sparse Linear System Solves

The primary computational cost each iteration in the Knitro interior-point
algorithms is the solution of a linear system of equations. The `linsolver`

user option specifies the linear system, solver to use. You can
use the multi-threaded Intel MKL PARDISO solver in Knitro by
choosing `linsolver`

=6. By default the Intel MKL
PARDISO solver will use one thread, however, it can solve linear systems in
parallel by choosing `par_lsnumthreads`

>1 (in combination with
`linsolver`

=6).

Note

Generally you should not use BOTH parallel BLAS and a parallel linear solver
as they may conflict with each other. If `par_blasnumthreads`

>1
one should set `par_lsnumthreads`

=1 and vise versa.

## Parallel Options

Option | Meaning |
---|---|

`par_blasnumthreads` |
Specifies the number of threads to use for parallel BLAS
(when `blasoption` =1) |

`par_lsnumthreads` |
Specifies the number of threads to use for parallel linear
system solves (when `linsolver` =6) |

`par_numthreads` |
Specifies the number of threads to use for all parallel features |

`par_concurrent_evals` |
Whether or not to allow concurrent evaluations |

The user option `par_numthreads`

is used to determine the number of threads
Knitro can use for all parallel computations. Knitro will decide how to apply the threads.
If `par_numthreads`

> 0, then
the number of threads is determined by the value of `par_numthreads`

.
If `par_numthreads`

= 0, then the number of threads is determined by the
value of the environment variables `OMP_NUM_THREADS`

. If
`par_numthreads`

= 0 and `OMP_NUM_THREADS`

is not set, then the
number of threads to use will be automatically deteremined by OpenMP. If
`par_numthreads`

< 0, Knitro will run in sequential mode.

Generally, if you are unsure of how best to apply parallel threads in Knitro
you should just set the general option `par_numthreads`

to the maximum
number of threads you want Knitro to use, and leave `par_blasnumthreads`

and
`par_lsnumthreads`

at their default values.
Then Knitro will try to allocate work to these different
threads in the most sensible way. Typically, if you are performing a single solve,
the threads will get applied to the BLAS operations. If, for example,
you are using multi-start then the multi-start solves are run in
parallel but BLAS is sequential (typically applying 2 layers of parallelism is
not good).

The options `par_blasnumthreads`

and `par_lsnumthreads`

allow
the expert user more fine-grained control over parallelism of these specific features.

The user option `par_blasnumthreads`

is used to determine the number of threads
Knitro can use for parallel BLAS computations. This option is only active when using
the default Intel BLAS (`blasoption`

=1).
The domain specific `par_blasnumthreads`

,
will override the general thread setting specified by `par_numthreads`

for BLAS operations.

The user option `par_lsnumthreads`

is used to determine the number of threads
Knitro can use for parallel linear system solves. This option is only active when using
the default Intel MKL PARDISO linear solver (`linsolver`

=6).
The domain specific `par_lsnumthreads`

,
will override the general thread setting specified by `par_numthreads`

for linear system solve operations.

The user option `par_concurrent_evals`

determines whether or not the user
provided callback functions used for function and derivative evaluations can
take place concurrently in parallel (for possibly different values of “*x*”).
If it is not safe to have concurrent evaluations, then setting
`par_concurrent_evals`

=0, will put these evaluations in a critical region
so that only one evaluation can take place at a time. If `par_concurrent_evals`

=1
then concurrent evaluations are allowed when Knitro is run in parallel, and it
is the responsibility of the user to ensure that these evaluations are stable.

Preventing concurrent evaluations will decrease the efficiency of the parallel features, particularly when the evaluations are expensive or there are many threads and these evaluations create a bottleneck.

## AMPL example

Let us consider again our AMPL example from Section Getting started with AMPL and run it with the parallel multi algorithm procedure. We specify that Knitro should run in parallel with four threads (one for each algorithm):

1 2 3 4 5 | ```
ampl: reset;
ampl: option solver knitroampl;
ampl: option knitro_options "alg=5 ma_terminate=0 par_numthreads=4";
ampl: model testproblem.mod;
ampl: solve;
``` |

The Knitro log printed to the screen shows the results of each algorithm (one per line):

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 | ```
=======================================
Commercial License
Artelys Knitro 10.0.0
=======================================
Knitro presolve eliminated 0 variables and 0 constraints.
algorithm: 5
datacheck: 0
hessian_no_f: 1
ma_terminate: 0
par_concurrent_evals: 0
par_numthreads: 4
Problem Characteristics ( Presolved)
-----------------------
Objective goal: Minimize
Number of variables: 3 ( 3)
bounded below: 3 ( 3)
bounded above: 0 ( 0)
bounded below and above: 0 ( 0)
fixed: 0 ( 0)
free: 0 ( 0)
Number of constraints: 2 ( 2)
linear equalities: 1 ( 1)
nonlinear equalities: 0 ( 0)
linear inequalities: 0 ( 0)
nonlinear inequalities: 1 ( 1)
range: 0 ( 0)
Number of nonzeros in Jacobian: 6 ( 6)
Number of nonzeros in Hessian: 5 ( 5)
Knitro running multiple algorithms in parallel with 4 threads.
Alg Status Objective FeasError OptError Real Time
-------- --------- -------------- ---------- ---------- ----------
2 0 9.360000e+02 0.000e+00 1.945e-07 0.002
1 0 9.360000e+02 6.738e-08 6.614e-08 0.002
4 0 9.360000e+02 0.000e+00 2.387e-12 0.005
3 0 9.360000e+02 0.000e+00 0.000e+00 0.010
Multiple algorithms stopping, all solves have completed.
EXIT: Locally optimal solution found.
Final Statistics
----------------
Final objective value = 9.35999997829394e+02
Final feasibility error (abs / rel) = 6.74e-08 / 5.18e-09
Final optimality error (abs / rel) = 6.61e-08 / 4.13e-09
# of iterations = 16
# of CG iterations = 12
# of function evaluations = 28
# of gradient evaluations = 24
# of Hessian evaluations = 16
Total program time (secs) = 0.01169 ( 0.023 CPU time)
===============================================================================
Knitro 10.0.0: Locally optimal or satisfactory solution.
objective 935.9999978293937; feasibility error 6.74e-08
16 iterations; 28 function evaluations
``` |

As can be seen, all four Knitro algorithms solve the problem and find the same
local solution. However, the two interior-point algorithms (`alg=1`

and `2`

)
are the fastest.

## C example

As an example, the C example can also be easily modified to enable
parallel multi-algorithms by adding the following lines before the
call to `KTR_init_problem()`

:

```
// parallelism
if (KTR_set_int_param_by_name (kc, "algorithm", KTR_ALG_MULTI) != 0)
exit( -1 );
if (KTR_set_int_param_by_name (kc, "ma_terminate", 0) != 0)
exit( -1 );
if (KTR_set_int_param_by_name (kc, "par_numthreads", 4) != 0)
exit( -1 );
```

Again, running this example we get a Knitro log that looks simlar to what we observed with AMPL.