Evolutionary Computation is under-appreciated. Objections include “Evolution takes millennia” and “I don’t get the point”.
They are (or should be) very important to people in the AI community because they are a primitive precursor to intelligence. Understanding Evolutionary Computation (EC) will make understanding LLMs easier.
In order to not have to define “AI” again, I will now define a new term SHAG which stands for “SuperHuman Answer Generator”.
A SHAG is a computer based system which can provide answers to questions that the client or programmer that is using the system cannot (or cannot be bothered to) compute an answer to themselves. Or shorter: A SHAG generates answers that humans cannot generate.
Unsurprisingly, all LLMs are SHAGs. They can generate answers that their programmers could not. Such as understand and reply in Finnish.
To many, it is more surprising that all SHAGs (and hence all LLMs) are Holistic. Since SHAGs are Holistic we can identify several problems shared by all Holistic systems (discussed at length in my Red Pill):
– The answers provided may not be correct
– The answers provided are not known to be optimal, complete, repeatable, parsimonious, explainable, or transparent.
As evidence, we recognize these problems as endemic to current LLMs.
But are there SHAGs that are not LLMs? Besides current LLMs (including my own Deep-Discrete-Neuron-Network LLMs), Genetic Algorithms (GA), Genetic Programming (GP), and simulated annealing (SA) are SHAGs. I will mainly discuss GAs in what follows.
My favorite way of using GAs:
– Define an individual that defines a solution but has to compete for survival, initialized randomly for diversity.
– Create a population of those and store all of them in an Array. Say, 1000 individuals in the population.
– Define a goal function that returns a number indicating how good (fit) the individual is.
– Define an crossover function that breeds two successful individuals together, hoping to create an even better offspring
– Define a mutation function that reintroduces more diversity into some individuals.
Loop until system stops improving, which is detectable by noticing that the elite is not changing. This might take as little as 10 cycles for well behaved problems:
– Compute fit for each individual using the goal function
– Sort the array by fitness of the individuals
– Start with the worst individual and move towards better ones:
– Replace the individual with crossover of two superior individuals
– Stop replacements a bit below the top (to preserve the “elite”)
GAs all use individuals containing a genome of some sort. Part of the design challenge for the crossover and the goal function is that the practitioner needs to understand the problem well enough to determine not only which individual (viewed as a solution) is better, but also to be able to determine the parameters that comprise all solutions.
There really is not a Phenotype in simple GAs. We evaluate the genotype directly using the goal function. This is a pretty radical shortcut but one that is actually starting to get used in wetlab genomics: Labs make grains with better yields without the bother to grow the seeds for a year, because they know what a higher yield DNA genome looks like.
Suppose you wanted to use GA to optimize shipping cost to design a square cornered box big enough for 200lbs of grain (with a known average density) as cheaply as possible. The genome would contain the X, Y, and Z sizes of the box and those would initially be totally randomized in each individual. The goal function returns zero fitness for every box with insufficient volume for the required amount of grain and otherwise returns the length and circumference so that we can compute the shipping cost the traditional way.
The crossover function might take two parents and use X from one and Y and Z from the other, and sometimes X and Y from one and Z from the other. All parents have better fit than the replaced individual had; we are hoping recombination produces an even better offspring by b enefiting from partial solutions in the parents.
Very rapidly we will note that the best boxes in each generation become smaller and smaller and cheaper to ship. When the Elite is stable, they all have the same X, Y, and Z which is the optimal solution.
Now consider a larger problem with 500 numerical parameters and a goal function that uses every single one of them. This may be expensive to evolve, but if it is the only way forward, we’ll take it. Well-behaved problems will converge rapidly.
A typical individual would keep those 500 values in an array (much like genes in a chromosome), and crossover would brutally create the new individual using “DNA” segments from one of the parent alternating with segments from the other parent at one or more randomly chosen cut points.
-
Mutation is MUCH LESS important than crossover to the point of being optional. Beginners get this backwards and even textbooks fail to emphasize this enough. If you are not using crossover, then you are just using random search and are discarding the entire point and power of GA.
In a population of 1000 I might use an elite of 10 and will therefore replace 990 worst individuals with potentially superior offspring each turn through loop. I generally apply mutation to at most a couple percent of all individuals. -
There has to be some chance for the offspring to inherit some feature(s) of what made the parent(s) successful. I’ve seen beginners make crossover functions that mistakenly discard all history from the parents and hence the system degrades to random search. This matters for things like deciding on cut points (when using array representations), where some properties will tend to travel together for synergy reasons.
-
We can turn this mistake into a measurement. In order to test that your GA works, compare convergence of your full GA with convergence of a version where you replace the crossover function with just creation of a new random (starting point) individual without history from the parents. Now you have a random search system. If it isn’t significantly slower than your fully functioning GA, then you need to go back to the drawing board. It will also show you how much an EV can speed things up over linear or random search (those two are equivalent).
We can now deal with the objection that “Evolution takes millennia”. It does, in nature, especially if you only get one chance to create offspring per year. Computers do it faster.
A modern CPU runs at 3GHz — 3E9 cps, which is 3,000,000,000 clock cycles per second.
Suppose we have a problem where computing the goal function value takes 1000 clock cycles per individual. This is often generous. A GA with a population of 1000 can therefore run 3,000 generations per second… per thread. If we are in a hurry, we can use multithreading and there are special versions of GA frameworks that can run on a cloud.
So speed is not the issue.
As to the other objection: The point of SHAGs is that they can provide answers problems humans can’t solve, including problems without reliable and complete input data, and NP-Hard and NP-Complete problems such as knapsack problems. These are discussed in The Red Pill.
GAs shine in situations where many parameters influence the outcome in complicated (even complex) ways, and where nobody knows how to find an optimal answer, but where we can rather cheaply determine how good the answer represented by any individual’s genome is.
If you are in this situation, you might do well to explore Holistic Methods, and you need to know that a GA may sometimes be an option. A GA may be a million times cheaper than an LLM for many matched tasks, which becomes economically important for frequently occurring problems.
So SHAGs in computers are LLMs and other (future?) AIs, GA, GP, and SA. But the biggest SHAG of all is Darwinian Evolution of Species in Nature. Humans certainly could not make a platypus from scratch, but Evolution did. More about this in the next article.
In order to understand how LLMs are solving problems we humans cannot solve ourselves (e.g. Protein Folding) we should first study the simpler case of Genetic Algorithms.