Genetic programming and the evolution of math functions

#PCI group

Figure cited from here

What is genertic programming ?

As introduced by Wikipedia, genetic programming (GP) is an evolutionary algorithm-based methodology inspired by biological evolution to find computer programs that perform a user-defined task. Essentially GP is a set of instructions and a fitness function to measure how well a computer has performed a task. It is a specialization of genetic algorithms (GA) where each individual is a computer program.

For example, given the dataset

[[27, 23, 861], 
[14, 36, 315],
... 
[2, 38, 91]]

The frist two columns are x and y and the third is z, how can we find a function that predicts z from a new pair of x and y? Regression, Artificial Neural Network, …, these are all options. But the genetic programming provides a new possibility.

Programs as parse trees

A program is an operation that turns input into output. So basically functions defined by codes and math symbols are the same. A function can be understood as a parse tree, which organize a serious of operations together. The following Python code defines a class that can be used to express parse trees.

from random import random,randint,choice
from copy import deepcopy
from math import log

#classes for parse tree
class fwrapper:
    def __init__(self,function,childcount,name):
        self.function=function
        self.childcount=childcount
        self.name=name
class node:
    def __init__(self,fw,children):
        self.function=fw.function
        self.name=fw.name
        self.children=children
    def evaluate(self,inp):
        results=[n.evaluate(inp) for n in self.children]
        return self.function(results)
    def display(self,indent=0):
        print (' '*indent)+self.name
        for c in self.children:
            c.display(indent+1)
class paramnode:
    def __init__(self,idx):
        self.idx=idx
    def evaluate(self,inp):
        return inp[self.idx]
    def display(self,indent=0):
        print '%sp%d' % (' '*indent,self.idx)
class constnode:
    def __init__(self,v):
        self.v=v
    def evaluate(self,inp):
        return self.v
    def display(self,indent=0):
        print '%s%d' % (' '*indent,self.v)

# construct operations by using fwrapper on functions
addw=fwrapper(lambda l:l[0]+l[1],2,'add')
subw=fwrapper(lambda l:l[0]-l[1],2,'subtract')
mulw=fwrapper(lambda l:l[0]*l[1],2,'multiply')
def iffunc(l):
    if l[0]>0: return l[1]
    else: return l[2]
ifw=fwrapper(iffunc,3,'if')
def isgreater(l):
    if l[0]>l[1]: return 1
    else: return 0
gtw=fwrapper(isgreater,2,'isgreater')
flist=[addw,mulw,ifw,gtw,subw]

Using the above code we can define the tree shown in the figue by

# an exmaple tree
def exampletree(): 
    return node(ifw,[
                     node(gtw,[paramnode(0),constnode(3)]),
                     node(addw,[paramnode(1),constnode(5)]),
                     node(subw,[paramnode(1),constnode(2)]),
                     ]
                )

and evaluate the tree by

exampletree().evaluate([5,3])

or print the tree by

def display(self,indent=0):
    print (' '*indent)+self.name
    for c in self.children:
        c.display(indent+1)
exampletree().display()

Setting up an example datset

We set up an exmple data set using the following function:

def hiddenfunction(x,y):
    return x**2+2*y+3*x+5
def buildhiddenset(): 
    rows=[]
    for i in range(100):
        x=randint(0,40)
        y=randint(0,40)
        rows.append([x,y,hiddenfunction(x,y)])
    return rows
hiddenset=buildhiddenset()

Genertic evolutions

We now define the following functions to simulate the mutation and crossover behavior in nature.

def scorefunction(tree,s):
    dif=0
    for data in s:
        v=tree.evaluate([data[0],data[1]])
        dif+=abs(v-data[2])
    return dif
#scorefunction(makerandomtree(2),hiddenset)

def mutate(t,pc,probchange=0.1): 
    if random()<probchange:
        return makerandomtree(pc)
    else:
        result=deepcopy(t)
        if isinstance(t,node):
            result.children=[mutate(c,pc,probchange) for c in t.children]
    return result
#mutate(makerandomtree(2),2).display()

def crossover(t1,t2,probswap=0.7,top=1): 
    if random()<probswap and not top:
        return deepcopy(t2)
    else:
        result=deepcopy(t1)
        if isinstance(t1,node) and isinstance(t2,node):
            result.children=[crossover(c,choice(t2.children),probswap,0) for c in t1.children]
        return result
#crossover(makerandomtree(2),makerandomtree(2)).display()

Setting up the evolution environment

After we preparing the data set and setting up the mutation and crossover functions we are ready to move on to set up the evironment for evolution, i.e., fitting the data.

def evolve(pc,popsize,rankfunction,maxgen=500,mutationrate=0.1,breedingrate=0.4,pexp=0.7,pnew=0.05):
    # Returns a random number, tending towards lower numbers. The lower pexp # is, more lower numbers you will get
    def selectindex():
        return int(log(random())/log(pexp))
    # Create a random initial population
    population=[makerandomtree(pc) for i in range(popsize)]
    for i in range(maxgen):
        scores=rankfunction(population)
        print scores[0][0]
        if scores[0][0]==0: break
        # The two best always make it
        newpop=[scores[0][1],scores[1][1]]
        # Build the next generation
        while len(newpop)<popsize:
            if random()>pnew: 
                newpop.append(mutate(crossover(scores[selectindex( )][1], \
                                               scores[selectindex( )][1], 
                                               probswap=breedingrate), 
                                     pc,probchange=mutationrate))
            else:
            # Add a random node to mix things up
                newpop.append(makerandomtree(pc))
        population=newpop 
    scores[0][1].display( ) 
    return scores[0][1]

def getrankfunction(dataset):
    def rankfunction(population):
        scores=[(scorefunction(t,dataset),t) for t in population] 
        scores.sort( )
        return scores
    return rankfunction

The reuslt can be obtain by

rf=getrankfunction(buildhiddenset())
evolve(2,500,rf,mutationrate=0.2,breedingrate=0.1,pexp=0.7,pnew=0.1)

The function survived in the data selection

After an evolution that lasted for more than ten minutes (a disadvantage of GP is that it is very time conusming in some cases), we got our result as shown in the following figure. If we do some simple math we will find that it equals to the function we defined above.