Author Topic: [C++] Recursive auto differentiation library  (Read 2763 times)

0 Members and 1 Guest are viewing this topic.

Offline nslay

  • Hero Member
  • *****
  • Posts: 786
  • Giraffe meat, mmm
    • View Profile
[C++] Recursive auto differentiation library
« on: March 21, 2007, 02:04:50 am »
I wrote a library to perform recursive auto differentiation - In other words, it computes arbitrary order derivatives and partial derivatives exactly.

How it works:
For simple auto differentiation (suppose first order), you define a set on RxR with elements <u,du>
The operators +,-,/,*,sin,cos, etc... act on the pair in accordance to differentiation rules
e.g. <u,du>*<v,dv> = <u*v,u*dv+v*du>
For second order, you define a threee tuple vector and the associated operators, for third, etc...

Some examples:
Two dependent variables
<5,0>*<4,0> = <20,0>
One independent variable
<5,1>*<4,0> = <20,4>
This example can be thought of as x*4...x's realization happens to be 5.
Two independent variables - partial derivatives
<5,<1,0>>*<4,<0,1>> = <20,<4,5>>
This example can be thought of as x*y...x's realization is 5 while y's is 4.  x and y are independent.
d/dx[x*y] = y=4
d/dy[x*y] = x=5
in the derivative vector, x is index 1 while y is index 2
Statically defining derivative rules can be tedious and most libraries I've seen only do up to the hessian (2nd order).

Two extend this idea, I define a set U on RxU ... the set is recursively defined.
u = <x,du> in U, with x in R, and du in U

Now operations are recursive and perform all derivatives.
e.g.
Let u = <x,du>, v = <y,dv>
Define <x,du>*<y,dv> = <x*y,u*dv+v*du>

Notice u*dv and v*du additionally use our multiplication operator...This process repeats forever.

The Implimentation
This is essentially a tree data structure.  Each node holds a double and a pointer of a vector of leaves.  The order of the auto derivative is defined by the depth of the tree ... the top of the tree is reached when our pointer is NULL.  All the operators and most of the math.h functions are overloaded very carefully to ensure no deadlocks or infinite recursions occurr.

Here's the header:
Code: [Select]
/*-
 * Copyright (c) 2007
 *      Nathan Lay <nslay@hotmail.com>. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice unmodified, this list of conditions, and the following
 *    disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#ifndef ADVAR_H
#define ADVAR_H

class ADvar {
// Friends
// NOTE: the optional parameters are for indicating exit conditions in recursion and should be ignored.  Specifying values for the optional parameters can cause undefined behavior!
// NOTE: Some of these functions can exhibit oddities at critical numbers.  For example erf(x)'s derivative is 2/sqrt(pi)*exp(-u**2)*du/dx - If the auto derivative's order is 4, pow(u,2) runs 4 times computing u**-1 at some point.  If u had the value 0, then your 4th derivative wouldn't exist, even though in practice it should be 0
friend ADvar operator+( double nx, const ADvar &ad );
friend ADvar operator-( double nx, const ADvar &ad );
friend ADvar operator*( double nx, const ADvar &ad );
friend ADvar operator/( double nx, const ADvar &ad );
friend ADvar pow( const ADvar &ad, double nx, const ADvar *dx = NULL );
friend ADvar pow( double nx, const ADvar &ad, const ADvar *dx = NULL );
friend ADvar pow( const ADvar &ad1, const ADvar &ad2, const ADvar *dx = NULL );
friend ADvar exp( const ADvar &ad, const ADvar *dx = NULL );
friend ADvar log( const ADvar &ad );
friend ADvar log10( const ADvar &ad );
friend ADvar sqrt( const ADvar &ad, const ADvar *dx = NULL );
friend ADvar cbrt( const ADvar &ad, const ADvar *dx = NULL );
friend ADvar sin( const ADvar &ad, const ADvar *dx = NULL );
friend ADvar cos( const ADvar &ad, const ADvar *dx = NULL );
friend ADvar tan( const ADvar &ad );
friend ADvar sinh( const ADvar &ad, const ADvar *dx = NULL );
friend ADvar cosh( const ADvar &ad, const ADvar *dx = NULL );
friend ADvar tanh( const ADvar &ad );
friend ADvar asin( const ADvar &ad );
friend ADvar acos( const ADvar &ad );
friend ADvar atan( const ADvar &ad );
friend ADvar asinh( const ADvar &ad );
friend ADvar acosh( const ADvar &ad );
friend ADvar atanh( const ADvar &ad );
friend ADvar erf( const ADvar &ad );
friend ADvar fabs( const ADvar &ad, const ADvar *dx = NULL );

private:
unsigned long dim;
double x;
ADvar *dx;

public:
// Constructors
explicit ADvar( double x = 0 );
ADvar( const ADvar &ad );
~ADvar();

// Misc
// This functions sets the order and dimension (i.e. independent variables) of the auto derivative
int SetMode( unsigned long order, unsigned long dim );
// This function sets the value and dependency of the auto derivative
// an index of 0 will set the auto derivative to dependent
// NOTE: index should not be larger than dim! Should index > dim, index will be assumed 0
void SetDep( double x, unsigned long index );

// Accessors
// This function returns the derivative vector
const ADvar *Getdx() const;
// This function returns the dimension of the derivative vector
unsigned long GetDim() const;
// This function returns the order of the auto derivative
unsigned long GetOrder() const;

// Operators
// This operator is a convenience, allowing one to reference derivatives by index, e.g. ad[1][1] would give me the 2nd derivative in respect to index 2 and ad[0][1] would give me the derivative in respect to index 1 and 2.
const ADvar& operator[]( unsigned long n ) const;
ADvar operator-() const;
ADvar operator+( const ADvar &ad ) const;
ADvar operator+( double nx ) const;
ADvar operator-( const ADvar &ad ) const;
ADvar operator-( double nx ) const;
ADvar operator*( const ADvar &ad ) const;
ADvar operator*( double nx ) const;
ADvar operator/( const ADvar &ad ) const;
ADvar operator/( double nx ) const;
ADvar& operator=( const ADvar &ad );
ADvar& operator=( double nx );
ADvar& operator++();
ADvar operator++(int);
ADvar& operator--();
ADvar operator--(int);
ADvar& operator+=( const ADvar &ad );
ADvar& operator+=( double nx );
ADvar& operator-=( const ADvar &ad );
ADvar& operator-=( double nx );
ADvar& operator*=( const ADvar &ad );
ADvar& operator*=( double nx );
ADvar& operator/=( const ADvar &ad );
ADvar& operator/=( double nx );

// Typecasts
operator double() const;
};

#endif

Using the library:
Declare ADvar variables with an optional double
ADvar y; // Creates an order 0 auto derivative with value 0
ADvar x(5.2); // Creates an order 0 auto derivative with value 5.2

Because of limitations in C++ (which are very few!), it is not possible to set the order or dimension of the auto derivatives at constructor...the constructor must be able to recurse itself to initialize an auto derivative of order N...however, since it must new a dx array, it is not possible to call a constructor other than the default when allocating an array (that I know of).
e.g.
ADvar x(5.2,2,3); // Hypothetically create auto derivative of order 2 and dimension 3
The constructor allocates an array of 3 ADvars for dx, however it is not able to explicitly call itself.
e.g.
You cannot do: dx = new ADvar[3](0,1,3); // Create 3 order 1 auto derivatives with dimension 3
or: dx = new ADvar(0,1,3)[3];

We could keep a static counter in ADvar, but then this would not be thread safe.

Therefore, you must manually initialize your auto derivatives using SetMode( unsigned long order, unsigned long dim )

ADvar x, y, z;
x.SetMode( 3, 2 ); // Set x to compute order 3 derivatives with at most 2 independent variables
y.SetMode( 3, 2 );

Now that you have set desired order and dimension, you can now specify dependence.
x.SetDep( 5.2, 1 ); // Set x to be index 1 in derivative vector
y.SetDep( 3.1, 2 ); // Set y to be index 2 in derivative vector
NOTE: A zero index will set the auto derivative to be dependent

z = x*y+pow(cos(x),2.0)+2.0/x;

NOTE: It is important that you always use floats and doubles in your calculations, otherwise the C++ compiler will complain about ambiguity.  It is not practical to define operators for every single type!

You can extract derivatives in two ways ... the easiest is with the [] operator.
cout << "d/dx = " << z[0] << endl;
cout << "d^2/dxdy = " << z[0][1] << endl;
cout << "d^3/dx^3 = " << z[0][0][0] << endl;

Another way is defined since it is not always known in advance how many []s you're allowed to use on an ADvar.
const ADvar *dx = z.Getdx(), *ddx = dx[0].Getdx();
dx[0] describes z[0]
ddx[0] describes z[0][0], while ddx[1] describes z[0][1]

Two accessors are available to get the dimension and order of an ADvar.
unsigned long GetDim() const;
unsigned long GetOrder() const;

Examples
One of the first neat examples that came to mind was to compute arbitrary order taylor series:
Code: [Select]
#include <math.h>
#include <iostream>
#include <time.h>
#include <stdio.h>
#include <string.h>
#include "advar.h"

#define N 10
#define DX 0.1
#define X0 0.0
#define BEGIN -4.0
#define END 4.0

#define PREFIX "plot"

using namespace std;

double taylor( double ax, double x0, unsigned long n );

int main( int argc, char **argv ) {
unsigned long i;
double x;
FILE *file, *pfile;
char fn[256];

pfile = fopen( "graph", "w" );

fprintf( pfile, "set grid\n" );
fprintf( pfile, "set xlabel 'x'\n" );
fprintf( pfile, "set ylabel 'y'\n" );
fprintf( pfile, "plot cos(x), " );

for ( i = 0; i <= N; ++i ) {
snprintf( fn, sizeof(fn), "%s%d", PREFIX, i );
file = fopen( fn, "w" );
for ( x = BEGIN; x <= END; x+=DX ) {
fprintf( file, "%lf %lf\n", x, taylor(x,X0,i) );
}
fprintf( pfile, "'%s' w linespoints title 'Taylor n=%d'", fn, i );
if ( i < N ) fprintf( pfile, ", " );
fclose( file );
}
fprintf( pfile, "\n" );
fclose( pfile );

return 0;
}

double taylor( double nx, double x0, unsigned long n ) {
unsigned long i, prod;
ADvar x, z;
const ADvar *dx;
double sum;
x.SetMode( n, 1 );
x.SetDep( x0, 1 );

z = cos(x);
for ( i = 0, dx = &z, sum = 0, prod = 1; dx != NULL; ++i, dx = dx->Getdx() ) {
sum += ((double)dx[0])*pow(nx-x0,i)/prod;
prod *= (i+1);
}
return sum;
}

This example computes the taylor expansion for orders 0-10 of cos(x) and outputs datafiles and a GNUplot script that can be executed as follows:
'gnuplot -persist < graph'

Here's the plot


Conclusions
Recursive auto differentiation is extremely inefficient and the derivative operations grow exponentially as order and dimension increase.  I think I did a test with u*v, order 7 and 2 dimensions would take 5 seconds to compute.  Statically defining each order is more efficient but becomes increasingly difficult (e.g. imagining defining order 7 product rule).  Additionally, this library computes all derivatives regardless of symmetry...it is not possible to keep track of symmetry in my implementation (e.g. I cannot compute only the lower triangle of the hessian).  So, with recursion, you gain general usage but lose efficiency...you give and take something.

You can get ADvar library here
Enjoy!
An adorable giant isopod!