Simplex Optimization

Larry examines the simplex algorithm and discovers it is an efficient technique for finding solutions to mathematical problems and improving program performance.


November 01, 2003
URL:http://drdobbs.com/simplex-optimization/184401716

Simplex Optimization

Figure 1: The initial simplex points are shown as black circles. Point B is the best point known; point W is the worst. The centroid (C) of the simplex without the worst point is shown as an unfilled circle. New points are at the ends of the arrows.

Simplex Optimization

Figure 2: The succession of simplexes from the starting point at ( 11, -5 ) to the end at ( 2, 1 ). All cases of reflection, expansion, contraction, and shrinking occur.

Simplex Optimization

Listing 1: The basic simplex template.

#if !defined ( SIMPLEX_H_INCLUDED )
#define SIMPLEX_H_INCLUDED
#include <cmath>
#include <cfloat>
#include <valarray>
#include <utility>
using namespace std;
typedef valarray<double> vd;
template < typename T > class simplex
{
public:
//---------------------------------------------------------------
   simplex ( const T t ) : m_nVar( t( ).size( ) ) // constructor
   {
      m_FunctionObject       = t;  // copy the function object in
                                   // for partial safety
      m_vInitialVector.resize( m_nVar );
      m_vInitialVector       = t( );
      m_vDeltaVector.resize  ( m_nVar );
      m_vFinalVector.resize  ( m_nVar );
      m_bDeltaVectorIsValid  = false;
      m_dFinalFOM            = DBL_MAX; // make huge starting FOM
      m_lMaximumIterations   = 5000;
      m_dSmallestAllowedShift = 1.0E-7;
       //if the change falls below this value, minimization stops
   }; 
//---------------------------------------------------------------
   std::pair<double, vd> fnMinimize( void )  
   {  //  this does all the work for simplex
      if ( !m_bDeltaVectorIsValid ) m_fnComputeDeltaVector( );
      m_fnSetupSimplex( );
      valarray<double> vShift(m_nVar);
      valarray<double> vPreviousResponseVector( m_nVar+1 );
      vPreviousResponseVector = DBL_MAX;
      m_lIterationsPerformed = 0;
      for ( size_t iterations=0; 
                 iterations<m_lMaximumIterations; iterations++ )
      {  double dResponse1, dResponse2;
         valarray<double> vCentroid( m_nVar );
         valarray<double> vTest1( m_nVar );
         vCentroid = 0.0;
         vTest1 = 0;
         fnFindMinMaxResponse( );
         vCentroid = fnCalculateSimplexCentroid( );
         vShift    = vCentroid - m_vSimplexVertex[ m_lWorstFitVertex ];
         vTest1    = vCentroid + vShift;
         dResponse1 = m_FunctionObject( vTest1 );
         //  if better than the best then try a bigger step
         if (dResponse1 < m_vVectorOfFOM[ m_lBestFitVertex ] )
         {
            valarray<double> vTest2( m_nVar );
            vTest2 = vTest1 + vShift;
            dResponse2 = m_FunctionObject ( vTest2 );
            //  if better than the best, use this one
            if ( dResponse2 < m_vVectorOfFOM[m_lBestFitVertex] )
            {  // doubled reflection shift
               m_vSimplexVertex[ m_lWorstFitVertex ] = vTest2;
               m_vVectorOfFOM[ m_lWorstFitVertex ] = dResponse2;
            }
            else
            {  // reflected the worst vertex
               m_vSimplexVertex[ m_lWorstFitVertex ] = vTest1;
               m_vVectorOfFOM[ m_lWorstFitVertex ] = dResponse1;
            }
         }
         // if better than the worst, then just replace the worst
         else if ( dResponse1 < m_vVectorOfFOM[ m_lWorstFitVertex ] )
         {  // simple replacement
            m_vSimplexVertex[ m_lWorstFitVertex ] = vTest1;
            m_vVectorOfFOM[ m_lWorstFitVertex ]     = dResponse1;
         }
         //  otherwise, it was worse than the worst we know
         else
         {
            //try the point midway between worst and the centroid
            vTest1 = vCentroid - 0.5 * vShift;
            dResponse1 = m_FunctionObject ( vTest1 );
            // if better than the worst we know, then just use it
            if (dResponse1 < m_vVectorOfFOM[m_lWorstFitVertex] )
            {  // replace with halfway between worst and centroid
               m_vSimplexVertex[ m_lWorstFitVertex ] = vTest1;
               m_vVectorOfFOM[ m_lWorstFitVertex ]  = dResponse1;
            }
            else
            //  nothing worked; contract toward the best point 
            {  //  just replace each point (except best) with the
              //  point halfway between best and itself
              size_t i;
              for ( i=0; i<m_nVar; i++ )
              {
                 if ( i != m_lBestFitVertex )
                 {
                    m_vSimplexVertex[ i ] =
                    ( m_vSimplexVertex[ i ] + 
                       m_vSimplexVertex[m_lBestFitVertex] ) /2.0;
                    m_vVectorOfFOM[ i ] = 
                       m_FunctionObject( m_vSimplexVertex[ i ] );
                 }
              }
            }
         }
         if ( iterations > 1 )
         {
            valarray<double> vTest = vPreviousResponseVector - 
              m_vVectorOfFOM;
            if ( sqrt( vTest.sum( ) ) <= m_dSmallestAllowedShift ) break;
         }
         vPreviousResponseVector = m_vVectorOfFOM;
         fnFindMinMaxResponse(  );
      } // iterations
      m_lIterationsPerformed = iterations;
      fnFindMinMaxResponse(  );
      m_vFinalVector = m_vSimplexVertex[ m_lBestFitVertex ];
      m_dFinalFOM    = m_vVectorOfFOM[ m_lBestFitVertex ];
      return ( make_pair( // return best FOM and best vertex
           m_dFinalFOM, m_vSimplexVertex[ m_lBestFitVertex ] ) );
   };
//---------------------------------------------------------------
private:
    const size_t     m_nVar;
    size_t           m_lMaximumIterations;
    double           m_dSmallestAllowedShift;
    valarray<double> m_vInitialVector;
    valarray<double> m_vFinalVector;
    valarray<double> m_vDeltaVector;
    valarray<double> m_vVectorOfFOM;
    valarray<vd>     m_vSimplexVertex; // vd is just a typedef
                                       // for the Microsoft compiler 
    bool             m_bDeltaVectorIsValid;
    double           m_dFinalFOM;
    size_t           m_lBestFitVertex;
    size_t           m_lWorstFitVertex;
    double           m_dBestFitValue;
    double           m_dWorstFitValue;
    size_t           m_lIterationsPerformed;
    // copy the function object in for safety    
    T                m_FunctionObject; 
//---------------------------------------------------------------
   void m_fnComputeDeltaVector ( void )
   {
      size_t i;
      for ( i=0; i<m_vDeltaVector.size( ); ++i )
      {
         if ( fabs( m_vInitialVector[i] ) > 100.0 * DBL_MIN )
         {
            m_vDeltaVector[i] = 0.01 * m_vInitialVector[i]; 
         }
         else
         {
            m_vDeltaVector = 0.1;
         }
      }
      m_bDeltaVectorIsValid = true;
   };
//---------------------------------------------------------------
   void m_fnSetupSimplex ( void )
   {
      size_t i;
      m_vVectorOfFOM.resize( m_nVar + 1 );
      m_vSimplexVertex.resize( m_nVar + 1 );
      valarray<double> tempVertex( m_nVar );
      for ( i=0; i<m_nVar+1; ++i )
      {
         tempVertex = m_vInitialVector;
         if ( i < m_nVar ) tempVertex[i] += m_vDeltaVector[i];
         m_vVectorOfFOM[i] = m_FunctionObject( tempVertex );
         m_vSimplexVertex[ i ] = tempVertex;
      }
      fnFindMinMaxResponse ( );
   };
//---------------------------------------------------------------
   void fnFindMinMaxResponse( void )
   {
      m_dBestFitValue  = m_vVectorOfFOM.min( );
      m_dWorstFitValue = m_vVectorOfFOM.max( );
      size_t i;
      m_lBestFitVertex = -1;
      for ( i=0; i<m_nVar+1; ++i )
      {
         if ( m_vVectorOfFOM[i] == m_dBestFitValue )
         {
            m_lBestFitVertex = i;
            break;
         }
      }
      for ( i=0; i<m_nVar+1; ++i )
      {
         if ( m_vVectorOfFOM[i] == m_dWorstFitValue )
         {
            m_lWorstFitVertex = i;
            break;
         }
      }
   };
//---------------------------------------------------------------
   valarray<double> fnCalculateSimplexCentroid( void ) const
   {
      return ( ( m_vSimplexVertex.sum( ) - 
          m_vSimplexVertex[ m_lWorstFitVertex ] ) / 
          double( m_nVar ) );
   }
};  //  class simplex
#endif  // if !defined ( SIMPLEX_H_INCLUDED )

Simplex Optimization

Listing 2: Access routines and example main program.


#include "FuncObject.h"
CFuncObject::CFuncObject ( ) 
{
}
CFuncObject::CFuncObject ( const double& a, const double& b )
{
   m_vdDataArray.resize ( 2 );
   m_iNVar = 2;
   m_vdDataArray[0] = a;
   m_vdDataArray[1] = b;
}
double CFuncObject::operator ( ) ( const valarray<double> v ) const
{
   return (v[1]-1.0)*(v[1]-1.0) + (v[0]-2.0)*(v[0]-2.0) ;
}
valarray< double > CFuncObject::operator ( ) ( ) const
{
   return ( m_vdDataArray );
}
#if !defined( FUNCOBJECT_H_INCLUDED )
#define FUNCOBJECT_H_INCLUDED
#include <valarray>
using namespace std;
class CFuncObject
{
public:
   CFuncObject ( );
   CFuncObject ( const double& , const double&  );
   double operator ( ) ( const valarray<double> v ) const;
   valarray< double > operator( ) ( ) const;
   //  NOTE THAT COPYING THE FUNCTION OBJECT INTO THE SIMPLEX OBJECT 
   //  IMPLIES THAT BOTH A COPY CONSTRUCTOR AND AN ASSIGNMENT 
   //  OPERATOR
   //  WILL BE REQUIRED FOR FULL SAFETY
   //  THE VALARRAY SEEMS TO BE COPIED CORRECTLY SINCE DELETING THE 
   //  ORIGINAL
   //  FUNCOBJECT DOES NOT PERTURB OPERATION, BUT POINTERS IN THE 
   //  FUNCOBJECT
   //  CLEARLY COULD GENERATE TROUBLE FROM A SHALLOW COPY
private:
   valarray<double> m_vdDataArray;
   int m_iNVar;
};
#endif // (FUNCOBJECT_H_INCLUDED)
#include <stdlib.h>
#include <cstdio>
#include <iostream>
#include <cmath>
#include <valarray>
// #include "publishedsimplexsource.h"
#include "simplex.h"
#include "FuncObject.h"

using namespace std;
typedef valarray<double> vd;
void OutputVector ( const valarray<double>& dV );
void OutputVector ( const valarray<double>& dV )
{
      int i;
      cout << "{ ";
      int nVar = dV.size( );
      for ( i=0; i<nVar; ++i )
      {
         cout << dV[i];
         if ( i < nVar - 1 ) cout  << ", ";
      }
      cout << " }" << endl;
}
int main (void)
{
   valarray<double> VOutput;
   CFuncObject f ( 11.0, -5.0 );
   simplex<CFuncObject> s( f );
   f.~CFuncObject( );
   // s.SetMaximumIterations( 5000 );
   // s.SetLimitingFractionalChange( 0.0 );
   pair<double,vd> pReturn = s.fnMinimize( );
   cout << endl << " final FOM " << pReturn.first << endl;
   // cout << endl << " final FOM " << s.GetFinalFOM( ) << endl;
   // cout << " total interations performed " <<  
   // s.GetTotalIterations( ) << endl;
   cout << "best point found = ";
   cout << pReturn.second[0] << "  " << pReturn.second[1] << endl;
   // OutputVector( s.GetFinalVector( ) );
   // cout << "best point found (again) = ";
   // OutputVector( pReturn.second );
   // s.OutputSimplex( );
    return ( EXIT_SUCCESS );
}

Function Objects

Function Objects

I started writing this template after reading Qiang Liu's C/C++ Users Journal article [2] on the use of function objects for solving a Newton-Raphson problem. Function objects are just classes that implement a "call operator" [ operator ( ) (. . .)]. For Liu's problem, simple function objects that are exactly drop-in replacements for C's function pointers elegantly solved his problem: how to have different function signatures for different problems without hardcoding them. Function objects also provide a convenient method for holding data specific to the problem being solved. So you get several benefits from using function objects: no hardcoded function signatures, a convenient place to store problem-specific data, and compatibility with pointers to functions (for a C-like solution to function evaluations).

Modeling my solution after Liu's, I wrote a template for the simplex method that used a function object for function evaluation. Unlike Liu, I needed at least three different kinds of information from the parameterized class: the function value evaluated at some point, a starting point for the simplex, and the number of variables in the simplex. I solved this by making function object's class more complex. I implemented two different call operators, using the overloading ability of C++.

double operator ( ) ( const T t );

T operator ( ) ( );

So the function evaluation at a point is returned if a point is supplied. If no parameters are supplied, then a vector of the parameters is returned. In my case, I returned the starting point so that the simplex could be told where to start. Since I used std::valarray to hold the parameter values, the same call to the call operator that returns the starting point to the simplex template can use:

m_nVar = t.size( );
to determine how many parameters there are in the problem. The simplex template needs that value to compose m_nVar+1 points that make up the simplex and to assign the size of each of those points (stored in valarrays). Any program that uses this simplex template will need to implement both of these call operators.

Why function object? Originally, the idea was as a simple replacement for pointers to functions. An important result of using function objects is that the source code (template in this case) is not dependent on a particular set of function signatures.

— L.A.

Back to Article

Simplex Optimization

Simplex Optimization

By Larry Andrews

Sidebar: Function Objects


In science, engineering, finance, and other fields, practitioners often want to find the "best" solution to some problem. The problems change, but the need for stable, robust methods for finding best solutions does not. Writing a program using least-squares, conjugate-gradients, and similar optimization methods can be complex; it is often necessary to code the derivatives of the function being optimized. These methods may require exquisite care in the program design so that unexpected results (such as divergence from any solution) do not occur. Often overlooked, simplex optimization is a reasonable and quick-to-implement solution, useful at least for the initial stages of rapid development and for testing methodologies.

I first saw the simplex method in MINUIT, a Fortran package from CERN. MINUIT includes several optimization techniques, but simplex met my needs because it was simple to integrate into a program I was writing. Later, I used the same code in other programs. When I stopped using Fortran, I still needed to do optimizations, and I've coded the simplex method in Pascal, Modula-2, Visual Basic, and now C++.

C++ offers several advantages in implementing the simplex algorithm — good memory-management methods, valarrays that implement simple array arithmetic, and templates all contribute to elegant packaging of the simplex method. Use of function objects allows the creation of a self-contained simplex template that requires no special knowledge of the problem domain or the programmatic implementation of the problem domain.

Simplex optimization is easy to implement since it does not require the evaluation of derivatives. Its functioning is simple to understand, and since its internal method is inherently iterative, it is possible to monitor the progress of the algorithm. Its stability is well known. It does not go off to infinity searching for nonexistent minima. It often works well when the function being evaluated is discontinuous (or one that does not even have derivatives). In fact, sometimes it is about the only general method that can solve a problem.

Most scientific and engineering optimization problems use experimental data. Because of measurement error, the problems often have no actual zero, just some minimum that you want to get to. Least-squares is a common method of implementing optimization, but it often suffers from several problems. First, you need to derive and code the derivatives, sometimes a substantial task. Writing the derivatives tends to be an error-prone process. Then you need to implement or find matrix and vector code, including matrix inversion (and hope that what you choose is sufficiently robust). Finally, nonlinear least-squares is notorious for poor convergence in many cases; sometimes, it just diverges from any acceptable answer for reasons that are hard to diagnose. The simplex method, on the other hand, is straightforward, and if you need to, you can get inside and see the actual steps. Simplex just walks around in a stylized way, and you are guaranteed that you will at least have the best answer you have ever stumbled upon.

How Simplex Works

A simplex is defined as the simplest kind of object that can fill space. For example, in two dimensions, the simplex is the triangle. In three dimensions, it is a tetrahedron. The simplex optimization method begins with an initial (often guessed) point (that is, a set of parameter values) and constructs a simplex near that guessed point. The objective function is evaluated at each of the simplex points, the list of values is kept, and the best/worst vertices are located in the list. In any step, the best-known vertex is never replaced, and the worst known vertex is always replaced.

The centroid (average) of all of the points in the simplex without the worst point is computed. From this point, at each step in the process one of the following methods is invoked to compute the next simplex:

  1. Reflection. The worst point is reflected through the centroid. If the new point is better than the worst and not better than the best, the worst point is replaced; Figure 1(a).

  2. Expansion. If the point tested in step 1 is better than the best point in the simplex, then the step from the worst to the new point is doubled, and the new point is tested. If it is better than the best, then the worst is replaced with the second new point; Figure 1(b).

  3. Replacement. If the expansion point in step 2 is not better than the best, then the worst is replaced with the point tested in step 1.

  4. Contraction. The more aggressive attempts to find a better point having failed, the point midway between the worst point and the centroid is tested. If that point is better than the worst, it replaces the worst; Figure 1(c).

  5. Shrinking. Okay! Nothing else worked. Replace every point except the best with the point midway between that point and the best; Figure 1(d).

The original publication of the simplex method [1] only used the Reflection method. That approach suffered often from slow convergence and, since it did not have any way to reduce the size of the simplex, there was no way to converge down to the best answer. Others have improved the algorithm by including steps 2 through 5.

Implementation

Listing 1 is the basic simplex template. Some simple access routines for setting and getting simplex parameters have been deleted for clarity, but they are included in the files on the CUJ site (http://www.cuj.com/code/). Several of those access routines are in the commented code in Listing 2.

The first, obvious thing that is needed is some way to store the simplex. I have implemented the simplex as a Standard Library valarray of the points of that define the simplex. The points are also implemented as Standard Library valarrays. Valarrays implement simple array arithmetic such as sums and differences and multiplication by scalars, considerably simplifying several functions. For instance, calculating the centroid of the points of the simplex without the worst point just becomes (in pseudocode): simplex.sum( ) - simplex[worst point]. Also, the Standard Library allows defining the sizes of containers at run time, making the implementation independent of the details of the particular problem being solved. For instance, in the 2D example case, the valarrays for the points of the simplex (triangle) are just valarrays of length 2. Since there are three points, the valarray that contains the points is of length 3.

To provide more safety (for instance, protection from the user changing the data), I copied the function object into the template instantiation. This is not really necessary, but it provides some protection against the data being changed or the external function object's being deleted.

The other important component of the interface to the simplex class is fnMinimize(), which returns an std::pair. The first element of the returned pair is the agreement factor at the end point, and the second is the std::valarray of the best set of parameters found. For this reason, the calling function (see Listing 2) needs to provide:

pair<double, valarray> p;

An Example

To demonstrate simplex templates, Listing 2 provides a simple driver program and a small function object with its objective function a parabolic bowl with its minimum at (x=2, y=1). The objective function is (x-2)^2 + (y-1)^2, so there is an actual point where the objective function is zero (at x=2, y=1). The demonstration function object is quite simple. It provides only a constructor that stores the initial values of the two parameters (a and b in the example), and two call operators:

double operator () ( const std::valarray<double> );

std::valarray<double> operator () ();

The first call operator returns ((x-2)^2+(y-1)^2). The second returns the starting set of values of the parameters as an std::valarray. In the example main program and Figure 2, the starting point is (11, -5). The progress of the simplex is in Figure 2. The simplex starts off rapidly moving along the x-direction until it passes the x-coordinate where the minimum lies (that is, x=2). At that point, it turns and starts walking toward the global minimum. The figure shows the sequence of simplexes as the algorithm moves along.

Limitations

Sometimes the simplex method is the best method you can use to get something working quickly. If the computational burden of function evaluation is not too large, then any method may work well, and this one is simple to get up and running, but there are times when simplex doesn't work well, and there can (occasionally) be problems with getting it to run right. It also does not return an estimate of the standard deviations of the parameters.

Some of the problems that can be encountered using simplex were mentioned above. Because it does not converge as quickly as some other methods, it isn't effective if the function evaluations are very time consuming. The slow convergence can sometimes be a problem, but that may be offset by simplex being the only simple way to find a minimum at all. When there are a lot of parameters (that is, a high-dimension minimization problem), typically the simplex may get quite slow, just because there are a lot of directions to go.

A deeper problem in the use of the simplex method is getting it to stop. Basically, you want it to stop when it's not finding solutions that are any better than the best one known. But it's not always easy to see that it has happened. For one thing, you may not be interested in the absolute best solution. Any answer that is within 1.0E-7 or so of the best solution is good enough for many purposes, especially when using real data with measurement errors in it. In this implementation, I have included two termination methods. First (and most powerful), is a limit on the number of iterations (5000 iterations in the template); it stops when you tell it to stop. Second, the iterations stop when the magnitude of the vector of function evaluations changes by less than some preset amount (in the template, the default value is set to 1.0E-7). Sometimes, the simplex will cycle back and forth between two positions, and then it will simple go on forever (or at least until the iteration limit).

Another peculiarity of the simplex method is the need to figure out a starting simplex. The default method in the template is simple to increase each of the input parameters' values by 1%, one at a time. For instance, if the starting point were (1,1,1 ), then the simplex would be (1,1,1 ), (1.01,1,1), (1,1.01,1), and (1,1,1.01). (For a zero input parameter, 0.1 is used.) Lots of other schemes could be used. The signs could be changed, or all the values up to the current one could be changed. Different starting simplexes can lead to different rates of convergence. If there are multiple minima, then different starting simplexes can even lead to different final results.

In the interest of simplicity, certain checks have been left out of the template. For instance, there are no checks ensuring that the size of the valarrays is not changed by the function object; in fact, there is no sure way to do this from within the template. Likewise, there are no checks ensuring that the data is not being changed as the simplex moves along. If the copy constructor for the function object uses deep copies, then the data will simply reside in the simplex instantiation, and there will be no problem with changing data. I have chosen not to add simplifying niceties such as a typedef for valarray<double> so that the structure of the data flow remains clear.

References

[1] J.A. Nelder and R. Mead, Computer Journal, 1965, Vol. 7, p. 308.

[2] Qiang Liu, "Implementing Reusable Mathematical Procedures Using C++," C/C++ Users Journal, June 2001.


Larry Andrews has a Ph.D. in chemistry and is a software developer and software QA manager recently unemployed in the current downturn. He can be reached at [email protected].


Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.