Parallelism
Knitro offers several features to exploit parallel computations on shared memory multiprocessor machines. These features are implemented using OpenMP.
Note
The parallel features offered through Knitro are not available through all interfaces. Check with your modeling language vendor to see if these features are included. The parallel features are included in the AMPL interface, the objectoriented interfaces, and through the callable library. Parallel features are also available through the MATLAB interface, but some may be less efficient in this environment.
Knitro offers the following parallel features:
Parallel FiniteDifference Gradients
As described in Derivatives, if you are unable to provide the exact
first derivatives, Knitro offers the option to approximate
first derivatives using either a forward or central finitedifference approach,
by setting the option gradopt
. Knitro will compute these finite
difference gradient values in parallel if the user specifies that Knitro should
use multiple threads through the option findiff_numthreads
.
This parallel feature only applies to first derivative finitedifference
evaluations. Using more than one thread for the finitedifference gradient evaluation is only
beneficial when concurrent_evals
=1
).
Note
In the KnitroMATLAB interface, the parallel finitedifference feature is
controlled by the UseParallel MATLAB option, rather than the Knitro
findiff_numthreads
option. See Knitro / MATLAB reference for
more information.
Parallel MultiStart
The multistart procedure described in MultiStart
can run in parallel by setting numthreads
to use
multiple threads.
When the multistart procedure is run in parallel, Knitro will produce the same sequence of initial points and solves that you see when running multistart sequentially (though, perhaps, not in the same order).
Therefore, as long as you run multistart to completion (ms_terminate
=0)
and use the deterministic option (ms_deterministic
=1),
you should visit
the same initial points encountered when running multistart sequentially,
and get the same final solution. By default ms_terminate
=0
and ms_deterministic
=1 so that the parallel multistart produces
the same solution as the sequential multistart.
However, if ms_deterministic
=0, or ms_terminate
>0,
there is no guarantee that the final solution reported by multistart will
be the same when run in parallel compared to the solution when run sequentially,
and even the parallel solution may change when run at different times.
The option ms_numthreads
can be used to set the number
of threads used by the multistart procedure. For instance, if
numthreads
=16 and ms_numthreads
=8,
Knitro will run 8 solves in parallel and each solve will be
allocated 2 threads.
Parallel Algorithms
If the user option alg
is set to multi
, then Knitro
will run all four algorithms (see Algorithms). When
numthreads
is set to use multiple threads, the four Knitro
algorithms will run in parallel. The termination of the parallel algorithms
procedure is controlled by the user option ma_terminate
. See
Algorithms for more details on the multi algorithm procedure.
Parallel MixedInteger BranchandBound
The branchandbound method for mixedinteger programming
is designed to execute in parallel by default. This can result in
significant speedups by solving multiple node subproblems or heuristics in parallel.
The option mip_numthreads
can be used to customize the number
of threads used for the branchandbound algorithm (mip_method
=1).
Parallel Tuning
The KnitroTuner can help you identify some nondefault options settings that may improve performance on a particular model or set of models.
When numthreads
is set to use multiple threads, Knitro will test the Tuner options in parallel.
Parallel Basic Linear Algebra Subroutine (BLAS)
The Knitro algorithms  in particular the interiorpoint/barrier algorithms  rely heavily on BLAS operations (e.g. dot products of vectors, dense matrixmatrix and matrixvector products, etc.). For largescale problems, these operations may often take 35%50% of the overall solution time, and sometimes more.
These operations can be computed in parallel using multiple threads by
setting the user option blas_numthreads
>1. This option is currently only active when
using the default Intel BLAS (blasoption
=1) provided with Knitro.
Parallel Sparse Linear System Solves
The primary computational cost each iteration in the Knitro interiorpoint
algorithms is the solution of a linear system of equations. The linsolver
user option specifies the linear system, solver to use. You can
use the multithreaded Intel MKL PARDISO solver in Knitro by
choosing linsolver
=6. By default the Intel MKL
PARDISO solver will use one thread, however, it can solve linear systems in
parallel by choosing linsolver_numthreads
>1 (in combination with
linsolver
=6). It is also possible to use linsolver_numthreads
>1
with the linear solvers MA86 and MA97.
Note
Generally you should not use BOTH parallel BLAS and a parallel linear solver
as they may conflict with each other. If blas_numthreads
>1
one should set linsolver_numthreads
=1 and vice versa.
Parallel Options
Option 
Meaning 

Specifies the max number of threads to use for all parallel features. You can just set this and let Knitro decide how to distribute the threads. 

Specifies whether or not to allow concurrent evaluations. 

Specifies the number of threads to use for parallel BLAS
(when 

Specifies the number of threads to use for finitedifference gradients. 

Specifies the number of threads to use for parallel linear
system solves (when 

Specifies the number of threads to use for the mixed
integer branchandbound method ( 

Specifies the number of threads to use for the parallel multistart procedure. 
The user option numthreads
is used to determine the number of threads
Knitro can use for all parallel computations. Knitro will decide how to apply the threads.
If numthreads
> 0, then
the number of threads is determined by the value of numthreads
.
If numthreads
= 0, then the number of threads is determined by the
value of the environment variables OMP_NUM_THREADS
. If
numthreads
= 0 and OMP_NUM_THREADS
is not set, then the
number of threads to use will be automatically determined by OpenMP. If
numthreads
< 0, Knitro will automatically determine the number of
threads to use.
Generally, if you are unsure of how best to apply parallel threads in Knitro
you should just set the general option numthreads
to the maximum
number of threads you want Knitro to use, and leave blas_numthreads
,
linsolver_numthreads
, and ms_numthreads
at their default values.
Then Knitro will try to allocate work to these different
threads in the most sensible way. Typically, if you are performing a single solve,
the threads will get applied to the BLAS operations. If, for example,
you are using multistart then the multistart solves are run in
parallel but BLAS is sequential (typically applying 2 layers of parallelism is
not good). You should then compare this with using one thread to see if you
achieve any speedup from parallelism. Using parallelism will not always result in a
speedup since sometimes the overhead work associated with using parallelism will
outweigh any improvements achieved through parallel computations inside Knitro.
This may be especially likely on small problems that are solved quickly.
The options blas_numthreads
, findiff_numthreads
, linsolver_numthreads
,
mip_numthreads
and ms_numthreads
allow
the expert user more finegrained control over parallelism of these specific features.
The user option blas_numthreads
is used to determine the number of threads
Knitro can use for parallel BLAS computations. This option is only active when using
the default Intel BLAS (blasoption
=1).
The domain specific blas_numthreads
,
will override the general thread setting specified by numthreads
for BLAS operations.
The user option findiff_numthreads
is used to determine the number of threads
to use for finitedifference gradient evaluations when gradopt
=2 or
gradopt
=3.
The user option linsolver_numthreads
is used to determine the number of threads
Knitro can use for parallel linear system solves. This option is only active when using
the Intel MKL PARDISO linear solver (linsolver
=6), the HSL MA97 linear solver
(linsolver
=7) and the HSL MA86 linear solver (linsolver
=8).
The domain specific linsolver_numthreads
,
will override the general thread setting specified by numthreads
for linear system solve operations.
The user option mip_numthreads
is used to determine the number of threads
to use for the branchandbound algorithm for solving mixedinteger models when
mip_method
=1.
The user option ms_numthreads
is used to determine the number of threads
to use for the multistart procedure. See MultiStart for more details.
The user option concurrent_evals
determines whether or not the user
provided callback functions used for function and derivative evaluations can
take place concurrently in parallel (for possibly different values of “x”).
If it is not safe to have concurrent evaluations, then setting
concurrent_evals
=0, will put these evaluations in a critical region
so that only one evaluation can take place at a time. If concurrent_evals
=1
then concurrent evaluations are allowed when Knitro is run in parallel, and it
is the responsibility of the user to ensure that these evaluations are stable.
Preventing concurrent evaluations will decrease the efficiency of the parallel features, particularly when the evaluations are expensive or there are many threads and these evaluations create a bottleneck.
AMPL example
Let us consider again our AMPL example from Section Getting started with AMPL and run it with the parallel multi algorithm procedure. We specify that Knitro should run in parallel with four threads (one for each algorithm):
1 2 3 4 5  ampl: reset;
ampl: option solver knitroampl;
ampl: option knitro_options "alg=5 ma_terminate=0 numthreads=4";
ampl: model testproblem.mod;
ampl: solve;

The Knitro log printed to the screen shows the results of each algorithm (one per line):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66  =======================================
Commercial License
Artelys Knitro 13.1.0
=======================================
Knitro presolve eliminated 0 variables and 0 constraints.
algorithm: 5
concurrent_evals: 0
datacheck: 0
hessian_no_f: 1
ma_terminate: 0
numthreads: 4
Problem Characteristics ( Presolved)

Objective goal: Minimize
Objective type: quadratic
Number of variables: 3 ( 3)
bounded below only: 3 ( 3)
bounded above only: 0 ( 0)
bounded below and above: 0 ( 0)
fixed: 0 ( 0)
free: 0 ( 0)
Number of constraints: 2 ( 2)
linear equalities: 1 ( 1)
quadratic equalities: 0 ( 0)
gen. nonlinear equalities: 0 ( 0)
linear onesided inequalities: 0 ( 0)
quadratic onesided inequalities: 1 ( 1)
gen. nonlinear onesided inequalities: 0 ( 0)
linear twosided inequalities: 0 ( 0)
quadratic twosided inequalities: 0 ( 0)
gen. nonlinear twosided inequalities: 0 ( 0)
Number of nonzeros in Jacobian: 6 ( 6)
Number of nonzeros in Hessian: 5 ( 5)
Knitro running multiple algorithms in parallel with 4 threads.
Alg Status Objective FeasError OptError Real Time
     
2 0 9.360000e+02 0.000e+00 1.945e07 0.001
3 0 9.360000e+02 0.000e+00 0.000e+00 0.002
1 0 9.360000e+02 0.000e+00 1.250e09 0.002
4 0 9.360000e+02 0.000e+00 0.000e+00 0.006
Multiple algorithms stopping, all solves have completed.
EXIT: Locally optimal solution found.
Final Statistics

Final objective value = 9.36000000000000e+02
Final feasibility error (abs / rel) = 0.00e+00 / 0.00e+00
Final optimality error (abs / rel) = 0.00e+00 / 0.00e+00
# of iterations = 21
# of CG iterations = 12
# of function evaluations = 0
# of gradient evaluations = 0
# of Hessian evaluations = 0
Total program time (secs) = 0.00798 ( 0.028 CPU time)
===============================================================================
Knitro 13.1.0: Locally optimal or satisfactory solution.
objective 936; feasibility error 0
21 iterations; 0 function evaluations

As can be seen, all four Knitro algorithms solve the problem and find the same
local solution. However, the interiorpoint conjugate gradient algorithm (alg
=2
)
is the fastest.
C example
As an example, the C example can also be easily modified to enable
parallel multialgorithms by adding the following lines before the
call to KN_solve()
:
// parallelism
if (KN_set_int_param_by_name (kc, "algorithm", KN_ALG_MULTI) != 0)
exit( 1 );
if (KN_set_int_param_by_name (kc, "ma_terminate", 0) != 0)
exit( 1 );
if (KN_set_int_param_by_name (kc, "numthreads", 4) != 0)
exit( 1 );
Again, running this example we get a Knitro log that looks simlar to what we observed with AMPL.