| <p>Attached is a plot containing bar graphs with error bars representing the average number of generations it took for the GA to achieve a chi-squared value of 0.25 (roughly equated to a 0.8 out of a max 1.0 fitness score). Unlike the fitness scores used by the GA, these values do not have simulated error attached to them and are therefore a better measure of how well the GA is optimizing. These results were obtained by running 10 tests in the test loop for each design, population, and error combination and solving for the average number of generations to meet our threshold and the standard deviation of those scores. From this, a few things immediately pop out, I will address the more obvious one later. But essential for this test, we can see that increasing our population size seems to have a more direct impact on the number of generations needed to reach our threshold than decreased error does in both designs. My best guess regarding this is that the GA depends on diversity in its population in order to produce efficient growth, and an increase in the number of individuals contributes to this, allowing the GA to explore more options.</p>
<p> </p>
<p>This leads me to the more easily spotted trend; PUEO is much slower than ARA. This presents an anomaly as this is the opposite of what we would expect from this test loop as PUEO has fewer genes (7) to optimize than ARA (8). It is also important to note that no genes are being held constant in this test for either design, so both designs have the full range of designs provided they are within the constraints. With that in mind, my guess as to what causes this phenomenon is that PUEO's constraints are much stronger than ARA's. How this may affect the growth is that it more heavily bounds the possible solutions, which makes it harder for the GA to iterate on designs. It is possible that during a function like a crossover, the only possible combinations for a pair of children are identical to their parents, effectively performing reproduction. This could limit the genetic diversity in a population and therefore cause an increase of generations needed to reach an answer. We could in theory test this by relaxing the constraints done by PUEO and then running the test again to see how it compares.</p>
<p> </p>
<p>Finally, You will notice that no AREA designs are shown. This is because, under current conditions, they never reached the threshold within 50 generations. However, Bryan and I think we know what is happening there. AREA has 28 genes, about four times the amount of genes that our other designs use. Given that our current test loop fitness measure is dependent on a chi-squared value given by: SUM | ((observed(gene) - expected(gene))^2) / expected(gene) |, we can see that given more genes, the harder it gets to approach zero. For example, we can imagine in an ARA design if each index of the sum equals 0.1, you would get a total value of 0.8, while AREA would get a value of 2.8, which seems considerably worse, despite each index being the same. Upon further thinking about it, Bryan and I do not think that a chi-squared is the best measure of fit we could be using in this context. Another thing we thought about is that we have negative expected values in some cases. We have skirted around this by using absolute values, but upon reflection believe this to be an indicator of a poor choice in metric. Chi-Squared calculations seem to be a better fit for positive, independent, and normally distributed values, rather than our discrete values provided by our GA. With this in mind, we propose changing to a Normalized Euclidean Distance metric to calculate our fitness scores moving forward. This is given by the calculation: d = sqrt((1/max_genes) * SUM (observed(gene) - expected(gene))^2). This accomplishes a few things. First, it keeps the same 0 -> infinity bounds that our current measure has, allowing our 1/(1+d) fitness score to be bounded between 0 and 1. Second, it forces all indices to be positive so we don't need to worry about negative values in the calculation. Third, this function is weighted by the number of genes present for any given design, making them easier to compare than our current measure. Finally, as our GA is technically performing a search in an N-dimensional space for a location that provides a maximized fitness score, it makes sense to provide it the distance a solution is from that location as a measure of fit in our test loop. We created a branch on the test loop repository to test this and the results are promising as results from the three designs are much more comparable for the most part (though we still see some slowdown we think is contributed to constraints as mentioned above). Though some further input would be appreciated before we begin doing tests like the one we have done in the plot below.</p> |