Limited Evolutionary Potential

Sean Pitman, M.D.

February 2010

Abstract:

Perhaps the greatest contribution of Charles Darwin to the thinking of mainstream science was his proposal of a reasonable mechanism to explain the diversity and complexity of living things. His proposed mechanism was, of course, random mutations combined with the non-random bias of natural selection (RM/NS). This paper takes a detailed look at the creative potential and limitations of Darwin's mechanism and proposes a dramatic exponential decline of evolutionary potential over a given span of time as the minimum structural threshold requirements for systems under consideration increase in a linear manner. The final outcome of this analysis is that the unique qualitative functionality of beneficial biosystems that require a minimum of more than 1000 amino acid residues, with an average degree of specificity of arrangement, are well beyond the powers of RM/NS to discover in the vastness of sequence space even given trillions of years of time.

Acronyms used:

FSAARS: Fairly Specified Amino Acid Residues
FC: Functional Complexity
FSC: Functional Sequence Complexity
RM/NS: Random Mutations/Natural Selection
SC: Specified Complexity
IC: Irreducible Complexity

Introduction:

Charles Darwin was not the first to consider the idea that perhaps all living things evolved over long periods of time from a common ancestor. Many before Darwin had evolutionary ideas. Darwin's own grandfather, Erasmus Darwin (1731-1802), and Jean-Baptiste Lamarck (1744-1829) had considered and published evolutionary views. (Southgate, 1999) Even very early philosophers, writing as early as 2000 years before Darwin, were thinking and writing about evolutionary ideas - including Anaximander, Epicurus, Lucretius, and the Atomists. Evolutionary ideas were therefore not at all new to Darwin or the scientific minds of his day. (Ashcraft, 2009) Yet, none of Darwin's predecessors had presented a convincing mechanism to explain how such an evolutionary process might be tenable. This is perhaps why evolutionary ideas were more popular with the lay public than with the scientific establishment. "Those who were best informed about biology, and especially about classification and morphology, upheld most strongly...the constancy of species." (Mayr, 1964)

Therefore, Darwin's unique contribution, and what propelled the theory of evolution into mainstream acceptance within the scientific community, was his proposal of the underlying mechanism of natural selection (Darwin, 1859) or, as Herbert Spencer famously described it, "survival of the fittest". (Spencer, 1864)

What made the concept of natural selection so popular is that it clearly worked. Living things were not static or rigidly fixed, as had been believed by many, but were in fact changeable over time. Beyond this, certain desired traits were actually selectable - as had been well known and demonstrated by breeders of domesticated animals over centuries. Darwin simply noted that nature herself could play the role of breeder and preferentially select certain traits among the many randomly generated small variations in each generation: creatures with traits that happened to be favored by nature enjoyed improved survivability and reproductive potential. In this way, nature selected the best features for survival and reproductive success. So, over a course of many generations, each having slight beneficial variations preferentially selected by nature for various environments, Darwin concluded that very dramatic changes could be realized over long periods of time - including the production of all the diversity of life starting with a relatively simple common ancestor life form. (Darwin, 1859)

Beyond this, Darwin's theory was testable and potentially falsifiable. In other words, it had all of the elements of a very good scientific theory. Darwin himself noted in Origins that,

If it could be demonstrated that any complex organ existed, which could not possibly have been formed by numerous, successive, slight modifications, my theory would absolutely break down. But I can find out no such case. (Darwin, 1859)

From Darwin's perspective, it all fit. All of life seemed to be related within a nested hierarchical pattern. The ability of natural selection to modify a species through a sequence of "insensible gradations" seemed to easily explain the observed pattern of differences as well as the real time changes that Darwin saw taking place with his own eyes.

Sometimes though, what seems obvious at first approximation is not so clear after closer examination. The purpose of this paper is to do just that: to consider Darwin's proposed mechanism in closer detail in order to better appreciate its creative potential as well as any limitations that might become apparent. The rest of this paper will investigate what is known about evolutionary potential and limitations from experimental observations. This empirical data will be compared to concepts of sequence space and the density and distribution of potentially beneficial sequences within that space at various levels of functional complexity. The correlation between changes in this density and observed evolutionary potential can then be extrapolated to predict evolutionary potential at various levels of functional complexity.

Relevant Conceptual Definitions and Formulas:

Hamming Distance and Sequence Space:

In information theory, the Hamming distance between two character sequences of equal length is the number of positions at which the corresponding characters differ. Put another way, it measures the minimum number of substitutions required to change one string of characters into the other.

For example, the Hamming distance between:

"toned" and "roses" is 3. 1011101 and 1001001 is 2. 2173896 and 2233797 is 4.

(National Communications System Technology & Standards Division, 1996)

The Hamming distance can therefore be used to measure the "distance" between different sequences of characters of the same length within a given "sequence space" - the space that contains all possible sequences of that particular length (Levenshtein distance is used to measure the distance between strings of different lengths). (Black, 2008)
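As a concrete illustration, a minimal Python sketch of this measure (added here purely for illustration; the function name is not from any cited source) reproduces the three examples above:

def hamming(a, b):
    """Return the Hamming distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))

print(hamming("toned", "roses"))      # 3
print(hamming("1011101", "1001001"))  # 2
print(hamming("2173896", "2233797"))  # 4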

It is difficult to graphically demonstrate Hamming distances in sequence space since sequence space is "hyperdimensional" - with two times as many dimensions as there are characters in the sequence given a binary alphabet (which makes things rather hard to visualize since most people can only visualize three or maybe four dimensions). However, as illustrated above, the Hamming distance can be measured by direct sequence comparisons.

A distance measure is a function that associates a numeric value with a pair of sequences, but the sequences are not numeric values themselves. Hamming (1950) introduces the concept of sequence space as a geometric model for detection of errors in transmitting information and also introduces a metric, D(x,y), into this space of 2n dimensions. Some time later Maynard Smith (1970) introduces the concept of "protein sequence" as a space that all possible amino acid sequences can be arranged in. In the eighties Eigen (1986) and Eigen et al. (1988) introduced a new concept of sequence space and the concept of statistical geometry. This space is a high-dimensional space. In such a space, a sequence of length v (where v is the number of nucleotides) has 4^v = 2^2v degrees of freedom and it requires a space of dimension 2v. (Lareo, 1999)

There are ways to "project" hyperdimensional sequence space onto two or three dimensions, like projecting the shadows of three-dimensional objects onto two dimensions. Of course, the apparent distances between objects projected onto lower dimensions will appear to be smaller than they really are. For example, imagine a room that has several floating helium-filled balloons in a scattered distribution throughout the room. A light shining from one side of the room will cast a two-dimensional shadow of the balloons on the opposing wall. The shadows cast will make the shadow balloons on the wall appear closer together than the actual distance that exists between the real balloons in three-dimensional space. The same thing is true of projections of real sequences that exist within much greater dimensions. A projection from a sequence space of 2000 dimensions to two or three dimensions will make the shadows of the objects that within 2000 dimensions appear to be much closer together than the real objects really are.

Regardless of the difficulties of visualizing hyperdimensional spaces, the concept of "distance" between different sequences in sequence space can be used to evaluate the odds of success of various methods of searching sequence space for novel functionally beneficial sequences. As it turns out, the likelihood of success of a search algorithm, regardless of the algorithm chosen, is related to the relative density and distribution of potentially beneficial sequences in sequence space at various levels of functional complexity. If the density or distribution of potentially beneficial sequences changes at different levels of functional complexity, the odds of finding them will also change.

Levels of functional complexity:

How is a level of functional complexity defined? There are many definitions of "complexity" in the literature related to "information" or "information theory". Often these definitions invoke the ideas of Kolmogorov/Chaitin complexity or Shannon information. It is important to note, in the context of this paper, that neither Shannon information nor Kolmogorov/Chaitin complexity is concerned with the actual "meaning" of the "information" being evaluated. Rather, their use of the term "complexity" has more to do with a description of chaos or randomness than with the functional integration of many parts into a "system" of function. (Pitman, 2007)

Another relevant and well-known effort to define informational complexity is that of William Dembski and his description of "specified complexity" (SC). (Dembski, 2002) Dembski defines SC as anything that requires a long description that is also a specific description. Dembski explains:

A single letter of the alphabet is specified without being complex (i.e., it conforms to an independently given pattern but is simple). A long sequence of random letters is complex without being specified (i.e., it requires a complicated instruction-set to characterize but conforms to no independently given pattern). A Shakespearean sonnet is both complex and specified. (Dembski, 2002)

Note Dembski's requirement for conformity with an "independently given pattern" as part of his definition of SC. In other words, the pattern in question has to be previously known before it will qualify as having SC. Dembski goes on to argue that SC cannot be generated without the backing of intelligent design. In other words, no mindless mechanism can reproduce a previously known complex pattern without the backing of intelligent design. "The Darwinian mechanism does not generate actual specified complexity..." (Dembski, 2002)

The problem here is that there are numerous undeniable examples of evolution in action producing novel useful systems of function - systems requiring both complexity and specificity of arrangement to meet minimum structural requirements - in populations of living things where those systems were not present before (see the further discussion of Barry Hall's demonstration of lactase evolution below).

This highlights a basic problem with Dembski's argument: he is too absolute. He does not seem to allow for the production of SC at any level via the Darwinian mechanism of RM/NS. Or, at the least, he is not clear on the levels at which SC can and cannot be produced without the backing of intelligent design.

So, to remedy this problem, one of the goals of this paper is to define functional complexity (FC) as the minimum size and structural threshold requirement needed to achieve a beneficial level of a particular qualitative type of function. In other words, all systems of function have a minimum size and specificity of arrangement of subparts necessary before the particular type of function in question can be realized to a useful degree. Until this minimum is realized, the particular function in question will not be realized at all - not to any selectable level of usefulness.

This is similar to Behe's definition of "irreducible complexity" (IC), (Behe, 1996) but differs in its emphasis on the idea that there are different levels of irreducibility, and on the importance of these levels when estimating the ease or difficulty of evolvability. Behe does not seem to be as clear on the idea that all systems of function do in fact have minimum structural threshold requirements, even when it comes to very simple systems that use only a handful of amino acids. It is just that these requirements are much lower for such simple systems compared to systems that may require, at minimum, multiple proteins in a specific arrangement, totaling several thousand specifically arranged amino acid positions interacting at the same time to achieve a particular type of function. It is like comparing a three-letter word to a Shakespearean sonnet. Just because a three-letter word only requires three letters does not mean it is not therefore IC. A three-letter word is IC since it requires all three letters to be in their proper place at the same time.

In this sense, all qualitatively unique functional systems have a degree of FC, but not all are on the same level in that not all have the same minimum structural threshold requirements before they can be achieved to a useful level of activity. Some systems require more parts, or a greater specificity of part arrangement, at minimum. Such systems would therefore be at a higher level of "functional complexity".

This concept is similar to the ideas of Hazen et al. published in their 2007 paper entitled, "Functional Information and the Emergence of Biocomplexity". In this paper they directly associate system complexity with "functional information".

Complex emergent systems of many interacting components, including complex biological systems, have the potential to perform quantifiable functions. Accordingly, we define 'functional information,' I(Ex), as a measure of system complexity. For a given system and function, x (e.g., a folded RNA sequence that binds to GTP), and degree of function, Ex (e.g., the RNA-GTP binding energy), I(Ex)= -log2 [F(Ex)], where F(Ex) is the fraction of all possible configurations of the system that possess a degree of function > Ex. Functional information, which we illustrate with letter sequences, artificial life, and biopolymers, thus represents the probability that an arbitrary configuration of a system will achieve a specific function to a specified degree. In each case we observe evidence for several distinct solutions with different maximum degrees of function, features that lead to steps in plots of information versus degree of functions. (Hazen, et al., 2007)  

Specificity is defined as the degree of tolerance for substitution changes that can be realized before all usefulness of a particular type of system is completely lost - even if the minimum absolute number of parts is present. For example, many if not all of the letters in this paragraph could be changed, individually, without a complete loss of its original intended meaning or function. However, there is a certain limitation to the number of changes that can be realized at the same time before all of the original meaning/functionality would be lost. The very same thing is true of protein-based systems.

Experiments have demonstrated that proteins can be extremely tolerant to single substitutions; for example, 84% of single-residue mutants of T4 lysozyme and 65% of single-residue mutants of lac repressor were scored as functional. For multiple substitutions, the fraction of functional proteins decreases roughly exponentially with the number of substitutions, although the severity of this decline varies among proteins. (Bloom, et al., 2005)

Of course, it is also true that multiple functionally neutral substitutions can be achieved at the same time in a given system without any substantive loss of function - to the point of complete or near complete non-identity with the original system. It is because of this feature that functional islands in sequence space have the appearance of branching trees with numerous dendritic thin-armed branches that can spread over long distances within sequence space. However, the distances between these very thin branching arms and the branches that belong to other dendritic islands are still enormous at higher levels of functional complexity. So, despite the rather extensive branching and wide distribution of functional islands within higher levels of sequence space, their remote isolation within that space is nonetheless real and significant. This isolation, from any other potentially beneficial island on all sides, becomes exponentially more and more remote (due to a linear increase in the minimum Hamming distance) with each step up the ladder of functional complexity.

Experimental Example of a Successful Sequence Space Search:

Kenneth Miller, a well-known biologist from Brown University, described the very interesting evolution experiments of Barry Hall in his popular book, "Finding Darwin's God". (Miller, 1999) What Hall did was to delete the genes (the lacZ genes) that produced a lactase enzyme in a clonal bacterial colony (E. coli). This lactase enzyme enabled the bacteria to utilize lactose sugar for energy, and deleting it prevented the mutant bacteria from utilizing lactose. Hall wondered if these mutant bacteria might not be able to evolve the lactase enzyme back again if they were put into a lactose-rich environment. And, quite surprisingly (even to Hall), within one or two generations these mutant bacteria were indeed able to evolve a novel lactase enzyme from a pre-existing genetic sequence which produced a protein that previously had no detectable lactase activity - and with just a single point mutation. (Hall, 1982) Even Hall was surprised to learn that only a single point mutation was needed to turn a pre-existing sequence into an effective lactase-producing gene.

“The realization that a single mutation in ebgA [ebg: evolved b-galactosidase gene] was sufficient to convert ebg0 enzyme into an efficient lactase was therefore disappointing.” (Hall, 1982)

Even so, this experiment did demonstrate that random mutations searching the surrounding sequence space can successfully discover new island clusters of sequences that produce qualitatively novel beneficial functions that were not present in the parent population. Such a demonstration is a clear example of evolution in action taking place within a very short period of time. This experiment also differed from the usual examples of antibiotic resistance evolution (Pitman, 2004) in that the novel function demonstrated by Hall was not based on the loss of a pre-existing system or interaction, but was an entirely new, independently functional system that was not present before in the population.

In Kenneth Miller's book, this is where the description of Hall's experiment ended. However, Hall's experiment did not actually end here. Pleased with his initial success, Hall wondered what would happen if he deleted the newly evolved lactase gene (ebg) along with the original one. Would the double mutant bacterial colony evolve yet another lactase enzyme from some other genetic sequence?

Despite observations over the course of several years and over 40,000 generations (as many generations as it is thought to have taken to turn apes into men), the double mutant bacteria never did evolve the lactase enzyme back again. Frustrated, Hall described these double mutant bacteria as having, "Limited evolutionary potential." (Hall, 1982)

It is practically unique in mainstream science literature for anyone to describe a limit to evolutionary potential of any kind. So, what is it that limited the success of Hall's second experiment? Can the nature of this limitation be determined? And, if Hall's experiment was so successful the first time, why not the second time? What are the odds of something being close enough to any usable lactase sequence in sequence space for it to be just one character change or "point mutation" away (a Hamming distance of one)?

Formulas:

The Likely Minimum Gap Distance (Experimental Basis and Calculation):

What is the likely gap distance from a given starting sequence to the closest target sequence that can produce a qualitatively unique beneficial function at various levels of functional complexity? In order to address this question, one has to have at least a rough idea as to the number of likely targets (i.e., potentially beneficial sequences) in sequence space at different levels of functional complexity.

In a paper published in 2000, Thirumalai and Klimov argue that:

Even though the sequence space grows exponentially with N (the number of amino acid residues [by 20^N]) we conjecture that the number of low energy compact structures only scales as lnN [The natural logarithm or the power to which e (2.718 . . . ) would have to be raised to reach N] . . . The number of sequences for which a given fold emerges as a native structure is further reduced by the dual requirements of stability and kinetic accessibility. . .  We also suggest that the functional requirement may further reduce the number of sequences that are biologically competent. (Thirumalai, et al., 2000)

If, as sequence space size grows by 20^N, the number of even theoretically useful protein structures or systems scales by no more than the natural log of N, then this differential would rapidly produce an unimaginably huge discrepancy between potential target and non-target systems. For example, the size of the sequence space necessary to contain all sequences of 1000aa is 20^1000 = ~10^1301 sequences. According to these authors, what is the number of potentially stable protein structures contained within this space of ~10^1301 sequences? It is 20^ln(1000) = ~10^9.
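The scale of this discrepancy is easy to check numerically. The short Python sketch below, included only as an illustration, works in log10 terms since the actual quantities are far too large to represent directly:

import math

N = 1000    # number of amino acid residues
AC = 20     # size of the amino acid alphabet

log10_sequence_space = N * math.log10(AC)               # log10 of 20^N
log10_stable_structures = math.log(N) * math.log10(AC)  # log10 of 20^ln(N)

print(round(log10_sequence_space))      # 1301 -> a space of ~10^1301 sequences
print(round(log10_stable_structures))   # 9    -> only ~10^9 potentially stable structures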

This calculated number is backed up by numerous other published estimates of the extreme rarity of viable (or stable) amino acid residue sequences in protein sequence space. According to the literature, there seems to be general agreement that the likely number of stable protein "folds" is less than 10,000.

 

"The estimated number of folds ranges widely from 700 to ~8000 based on differences in assumptions and data sets (Orengo et al. 1994; Zhang and DeLisi 1998; Govindarajan et al. 1999; Wolf et al. 2000; Coulson and Moult 2002; X. Liu et al. 2004). (Oberai, et al., 2006)  

And, according to Govindarajan et al.:

"Our results suggest that there are approximately 4,000 possible folds, some so unlikely that only approximately 2,000 folds exist among naturally-occurring proteins." (Govindarajan, et al., 1999)

So, how large are viable protein folds?  The vast majority of them are less than 300aa in size.   

"Folding domains are typically 50-200 residues in length and utilize a specific sequence of side chains to encode tertiary structures enriched in secondary structure with hydrophobic cores that are shielded from solvent by a predominately hydrophilic surface." (Richardson, 1981) 

 

With fewer than 10,000 viable protein folds of 300aa or less, how many viable sequences are there per fold? Given an average sequence specificity (Durston, et al., 2007) for a viable fold of just 2.0 fits per residue, the estimate goes as follows:

2.0 x 300aa = fit value of 600. 1 in 2^600, or 10^-180 x 20^300 = ~10^210 sequences per stable/viable 300aa fold. Multiplying 10^210 by 10,000 folds = ~10^214 total stable/viable sequences in all of 300aa sequence space. The ratio of viable vs. non-viable is therefore 10^214 / 20^300 = ~10^-176.

 So, it seems reasonable that a useful rough estimate for the maximum likely number of beneficial sequences (B) at the 1000 fsaar level of functional complexity would be given by:

2.0 x 1000aa = fit value of 2000. 1 in 2^2000 = 10^-602, and 10^-602 x 20^1000 = ~10^699 sequences per stable/viable 1000aa fold. The total estimated number of stable unique structures in 1000aa sequence space is 20^ln(1000) = ~10^9. (Thirumalai, et al., 2000) Multiplying 10^699 by 10^9 = ~10^707 total potentially beneficial sequences at the 1000 fsaar level of functional complexity.

So, the formula for the total number of beneficial sequences (B) in sequence space is:

B = (AC^N x AC^lnN) / 2^(Sa x N)     Formula 1

The ratio of beneficial vs. non-beneficial (BR) at a given level of functional complexity is:

BR = B / AC^N     Formula 2

BR = ratio of beneficial vs. non-beneficial sequences in sequence space
B = beneficial sequences in sequence space (Formula 1)
AC = number of character options in the alphabet for the system
N = minimum required size of the system
lnN = natural log of N (Thirumalai, et al., 2000)
Sa = minimum required average specificity for the system (Durston, et al., 2007)

Notice that this formula results in an exponentially decreasing ratio of beneficial vs. non-beneficial sequences in sequence space as the level of functional complexity (i.e., the N and Sa values) increases.
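As a rough numerical check, the worked 300aa and 1000aa examples above can be reproduced in log10 terms with a short Python sketch. This is an illustration only, assuming (as in those examples) that Formula 1 combines the per-fold estimate AC^N / 2^(Sa x N) with either the ~10,000-fold literature estimate or the AC^lnN structure-count scaling:

import math

log10 = math.log10

def log10_sequences_per_fold(AC, N, Sa):
    """log10 of AC^N / 2^(Sa*N): viable sequences per stable fold."""
    return N * log10(AC) - Sa * N * log10(2)

def log10_fold_count(AC, N):
    """log10 of AC^ln(N): the structure-count scaling used above."""
    return math.log(N) * log10(AC)

# 300aa example (Sa = 2.0), using the ~10,000-fold literature estimate:
pf300 = log10_sequences_per_fold(20, 300, 2.0)   # ~209.7  -> ~10^210 per fold
b300 = pf300 + 4                                 # ~213.7  -> ~10^214 viable sequences
ratio300 = b300 - 300 * log10(20)                # ~-176.6 -> the roughly 10^-176 ratio noted above
print(round(pf300, 1), round(b300, 1), round(ratio300, 1))

# 1000aa example (Sa = 2.0), using the AC^ln(N) ~ 10^9 fold estimate:
pf1000 = log10_sequences_per_fold(20, 1000, 2.0)   # ~699 -> ~10^699 per fold
b1000 = pf1000 + log10_fold_count(20, 1000)        # ~708 -> the ~10^707 figure quoted above (within rounding)
print(round(pf1000, 1), round(b1000, 1))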

The mean average Hamming gap distance (HAX) between beneficial sequences, given an essentially uniform distribution of sequences, is:

HAX = Formula 3

Or more loosely approximated by:

HAX = Formula 3'

HAX = average Hamming gap distance
AC = number of character options in the alphabet for the system
N = minimum required size of the system
B = total number of beneficial sequences in sequence space (Formula 1)

Notice that this formula results in a linearly expanding average Hamming gap distance with linearly increasing levels of functional complexity. This indicates a corresponding linear increase in the likely minimum Hamming distance between a given sequence and the next closest beneficial sequence in sequence space. As the minimum likely distance increases, the time required for a random search algorithm to cross the distance (i.e., multiple or single character mutational changes) increases exponentially according to the following formula:

MA = AC^HML     Formula 4

MA = average number of mutations required to cross the likely Hamming gap distance
AC = number of character options in the alphabet for the system
HML = minimum likely Hamming gap distance

The minimum likely* Hamming gap distance (HML) for a particular level of functional complexity, given a uniformly random distribution of target sequences can be calculated by assigning our statistical level of significance* to a p-value (Pv) and then calculating the z-value (Zv) to find the HML according to the following formulas:

P(Zv) = 1 - Pv     Formula 5

x = µ - (Zv x σ)     Formula 6

Zv = z-value
Pv = p-value
µ = mean of the population (HAX; Formula 3)
x = minimum likely Hamming distance (HML)
σ = standard deviation

* What is the definition of "likely"? In considering this question it is relevant to note that the upper bound on the computational resources of the universe over its entire history has been estimated by Seth Lloyd as 10^120 elementary logic operations on a register of 10^90 bits. (Lloyd, 2002) Therefore, anything beyond this upper bound limitation of the universe will be considered so "unlikely" to be repeatable in a reasonable amount of time as to be essentially impossible.
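A sketch of how Formulas 4 through 6 fit together is given below in Python, for illustration only. The standard z-score relation HML = µ - (Zv x σ) is assumed here, and the µ, σ, and p-value inputs are purely hypothetical placeholders rather than values derived in this paper:

import math
from statistics import NormalDist

def minimum_likely_gap(mu, sigma, p_value):
    """Formulas 5-6 (assumed z-score form): HML = mu - Zv*sigma, where P(Zv) = 1 - p_value."""
    zv = NormalDist().inv_cdf(1 - p_value)
    return mu - zv * sigma

def log10_mutations_to_cross(AC, hml):
    """Formula 4 in log10 terms: MA = AC^HML."""
    return hml * math.log10(AC)

# Hypothetical inputs for illustration only (not estimates taken from the text):
hml = minimum_likely_gap(mu=450, sigma=100, p_value=1e-5)
print(round(hml, 1), round(log10_mutations_to_cross(20, hml), 1))   # ~23.5 and ~30.6: a ~24-residue gap, ~10^30.6 mutations

# A 50-residue gap, the conservative figure used in the Discussion below,
# corresponds to 20^50 = ~10^65 possible variants:
print(round(log10_mutations_to_cross(20, 50)))   # 65

Note how a linear increase in the assumed gap distance translates into an exponential increase in the number of mutations needed to cross it, which is the point of Formula 4.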

Discussion:

To continue the illustration above dealing with a level of functional complexity that requires a minimum of 1000aa with an average degree of sequence specificity, what is the likely minimum gap distance at this level of functional complexity?

It is quite unlikely that the starting point will be maximally distant from all 10^707 possible targets. The odds that a single 1000aa sequence would be maximally distant from a particular target are (19/20)^1000 = ~1e-23. So, the odds that all 10^707 target sequences would be maximally distant are 10^-23 raised to the power of 10^707 - an extraordinarily unlikely number. The odds that a starting sequence will be closer than the maximum possible distance of 1000 differences to at least one target in sequence space are therefore essentially 100% certain. This is looking promising for the evolutionary mechanism of RM/NS.

However, consider the odds that a starting point sequence will be within 50 residue differences of a particular target sequence with an unknown location (i.e., having at least 950 of 1000 sites the same as the target). The odds of having at least 950 residue positions the same can be calculated using the formula for binomial probabilities. (VassarStats, 2007)

P(k out of N) = [N! / (k! x (N - k)!)] x p^k x (1 - p)^(N - k)

N = the number of opportunities for event x to occur
k = the number of times that event x occurs or is stipulated to occur
p = the probability that event x will occur on any particular occasion

Plugging in the numbers:

P(950 out of 1000) = [1000! / (950! x 50!)] x (0.05)^950 x (0.95)^50 = ~1e-1153

These are the odds of producing exactly 950 matches out of a 1000aa sequence. Adding together the results for matches of 950 up to 1000 positions results in a slight improvement of the odds of there being at least one match within 50aa differences - to about 10^-1152.

Another way to approach this problem would be to use the cumulative distribution function (CDF) of the standard normal distribution curve. The CDF of a random variable X evaluated at a number x is the probability of the event that X is less than or equal to x. The CDF of the standard normal distribution is denoted with the capital Greek letter Φ (phi), and can be computed as an integral of the probability density function as follows:

Φ(x) = (1 / √(2π)) ∫ e^(-t²/2) dt, integrated from -∞ to x

Obviously then, 1 - Φ equals the odds of there being at least one match at or beyond the x value. However, using this formula requires tables to convert the calculated Φ value into a probability, and such tables are difficult to find for very low values of Φ.

Either way, it would take about 10^1152 sequences of 1000aa before one of them would likely match the needed 950 residue positions to make a gap of 50aa or fewer differences between any one starting point and a particular target in sequence space - on average.
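These very small probabilities can be checked in log space (they are far below what ordinary floating-point arithmetic can represent). The Python sketch below, included for illustration, reproduces the ~1e-1153 and ~10^-1152 figures above:

import math

def log10_binomial_pmf(N, k, p):
    """log10 of P(k out of N) = [N! / (k!(N-k)!)] * p^k * (1-p)^(N-k)."""
    log_comb = math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
    return log_comb / math.log(10) + k * math.log10(p) + (N - k) * math.log10(1 - p)

# Probability that a random 1000aa sequence matches a particular target at exactly 950 sites,
# with a 1-in-20 chance of a match at each site:
exact = log10_binomial_pmf(1000, 950, 1 / 20)
print(round(exact, 1))   # ~-1152.1, i.e. about 8e-1153

# Cumulative probability of matching at 950 or more of the 1000 sites:
terms = [log10_binomial_pmf(1000, k, 1 / 20) for k in range(950, 1001)]
top = max(terms)
cumulative = top + math.log10(sum(10 ** (t - top) for t in terms))
print(round(cumulative, 1))   # ~-1152.1 as well; the tail is dominated by the k = 950 term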

However, there are about 10^707 target sequences at the 1000 fsaar level of functional complexity (Formula 1). Surely, given so many potential target sequences, it would seem that at least one of them would likely be within a Hamming distance of 50 of any given starting point. The problem is that this assumption is mistaken. The odds are indeed improved, but not significantly. Take the individual odds of success for finding a single target sequence, 10^-1152, and multiply those odds by the number of potential target sequences, 10^707, and the resulting odds are 1 chance in 10^445 cycles of 10^707 tries each - essentially impossible (Formulas 5 & 6). To illustrate the problem, the odds of success at this level are about as likely as a blind man finding a specific grain of sand in the Sahara Desert (~10^25 grains) over 17 times - in a row. Most would consider such odds truly "impossible" - i.e., making a gap distance as small as 50aa differences at the 1000 fsaar level of functional complexity essentially impossible as well. (Lloyd, 2002)

However, given a very large population of 10^30 bacteria with 10^7 codons each, there would be a total of 10^37 codons. This is equivalent to 10^34 sequences of 1000aa each (or 10^34 starting points). Of course, in real life most of these starting points would be from the same or a very similar location in sequence space; but, for the purposes of this paper, it is assumed that each of these starting points is unique. Given these assumptions, the odds are improved, but not significantly: 10^445 / 10^34 = one chance in 10^411 tries. This is an improvement in the odds of success, since fewer sequences would be needed, on average, to achieve a gap distance of 50aa differences or less, but it hardly solves the problem. Obviously then, the odds of successfully crossing such an enormous gap with random search algorithms are still essentially nil.

So what are the implications of these numbers? What does it mean if the likely minimum gap distance is essentially certain to be at least 50 residue differences wide beyond the 1000 fsaar level of functional complexity? A minimum Hamming gap distance of just 50 specifically required residue differences from each one of a large population of 10^34 starting point sequences of 1000aa (or 1000 codons of DNA) means that each one of these 10^34 sequences is surrounded by at least 10^65 non-beneficial options (i.e., > 20^50).

Given these very generous assumptions in favor of the success of RM/NS (the actual minimum gap distance is very likely to be far greater than 50 at the 1000 fsaar level of functional complexity), how long will it take to get these 50 needed character changes in at least one sequence in one bacterium in our huge 10^30 population of bacteria - a population as large as all the bacteria on Earth? Each one of our starting point sequences would have to search through a sequence space of at least 10^65 / 10^34 = ~10^31 sequences before success would likely be realized. With a mutation rate of 10^-8 per codon per generation, each 1,000-codon sequence will get mutated once every 10^5 generations. With a generation time of 20 minutes, that is one mutational step every 2,000,000 minutes, which equals ~4 years. So, with one random walk/mutation step every 4 years, it would take 10^31 x 4 = 4 x 10^31 years for at least one individual in the entire population to achieve success - on average (i.e., trillions upon trillions of years).
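The waiting-time arithmetic above can be laid out step by step. This Python sketch simply restates the assumptions used in the preceding paragraph (population size, genome size, mutation rate, and generation time) rather than introducing any new estimates:

# Assumptions restated from the text above:
population = 1e30                  # bacteria (roughly all the bacteria on Earth)
codons_per_genome = 1e7
mutation_rate = 1e-8               # mutations per codon per generation
generation_minutes = 20
non_beneficial_neighbors = 1e65    # ~20^50 options surrounding each starting sequence

total_codons = population * codons_per_genome                            # 10^37 codons
starting_points = total_codons / 1000                                    # 10^34 sequences of 1000 codons each
space_per_starting_point = non_beneficial_neighbors / starting_points    # ~10^31 sequences to sample

generations_per_step = 1 / (mutation_rate * 1000)   # 10^5 generations per mutation per sequence
years_per_step = generations_per_step * generation_minutes / (60 * 24 * 365)

print(round(years_per_step, 1))                              # ~3.8 years per random-walk step
print(f"{space_per_starting_point * years_per_step:.1e}")    # ~3.8e+31, i.e. ~4 x 10^31 years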

Potential Counter Arguments:

Large leaps within sequence space:

But what about the fact that the evolutionary mechanism is not limited to single point mutations, but can in fact co-opt larger pre-existing subsystems, linking them together to form larger and more complex systems in one fell swoop - without having to traverse large distances in sequence space via tiny little steps?

When the gap distances are larger, the odds that a large mutation connecting pre-existing sequences together will end up being functionally beneficial are essentially the same as the odds that a point mutation will be successful - not very good.

"When the landscape is rugged, the number of sequence explored by point mutations alone is comparable to that explored by point mutations plus [non-homologous] crossovers. This is because point mutations are more effective in finding a low-mortality area from an already well populated spot nearby, whereas when the landscape is rugged many crossover offspring are likely to end up at high-mortality spots." (Cui, et al., 2002)

It seems then that rugged (functionally "quantized") landscapes are crippling for both point mutations and non-homologous crossover or "multicharacter" mutations - to essentially the same degree beyond a certain level of landscape "ruggedness" (and "ruggedness", or the minimum likely Hamming gap distance, is increased at higher and higher levels of functional complexity). The reason for this effect is that a large step within sequence space, like a non-homologous crossover, is far more likely to produce a non-folding, nonfunctional sequence than a stably folding, functional one.

In support of this argument, consider a paper published by Blanco et al. (Blanco, et al., 1999) This study involved two small, dissimilar proteins: the alpha-spectrin SH3 domain (SH3) and the B1 domain of streptococcal protein G (PG). The researchers tried creating hybrid proteins to gradually move through sequence space from the SH3 protein to PG via non-homologous crossover mutations. They found that the intermediate sequences did not fold into stable structures. Even when they added most of the residues from PG to the SH3 sequence, while maintaining the SH3 core residues, the protein did not fold into a stable shape. Only when the core SH3 residues were removed was the sequence able to fold into the PG shape. Furthermore, the SH3 structure was found to be non-folding even when only two non-core residues were mutated. This implies that folding is often a holistic feature, requiring the cooperation of the hydrophobic core residues and the hydrophilic non-core residues to specify the unique "low-frustration" structure of a given protein family. In addition, it appears that the core residues from different protein families can actually counteract each other, producing an unstable, non-folded protein when combined.

"The set of sequences analyzed here are hybrids of the sequences of SH3 and PG and represent a more or less uniform sampling of sequence identities between 100 and ~10% with each protein, but only those sequences very similar to the wild-type proteins have unique folds." (Blanco, et al., 1999)

The implication is that the new daughter protein, in order to fold stably, must have precisely coordinated hydrophobic and hydrophilic residues that work together to produce a stable structure. This, in turn, means that not just any parents will do - only certain ones which contain just the right complementary sequences which, when combined (at the correct position in the sequence), give a uniquely stable daughter fold. 

The likely explanation for this effect is that the requirements for stable folding are becoming so stringent that the probabilities are just too low for recombination to do any good. In other words, perhaps one or more of the needed parental sequences are not present, or the probability is just too small that the correct two will recombine in precisely the right way to produce a functional daughter sequence. The fact that the folding requirements for many types of proteins are indeed very stringent is well supported in the literature - as is the concept that with each increase in the minimum size and/or specificity of the systems in question the rarity of viable sequences increases exponentially. 

This problem only increases for those systems which require, at minimum, multiple separate proteins working together in a specific arrangement at the same time - such as a rotary flagellar motility system. The prokaryotic flagellar system, in particular, seems to require the services of dozens of different proteins with qualitatively unique functionality all oriented in a very specific three dimensional arrangement at the same time - for a total of around 10,000 fsaars. (Pitman, 2009)

While there are many stories about the "likely" evolutionary pathways which could have produced such higher level systems, none of these stories is backed up in the literature by actual demonstration. Not even one of the proposed steppingstones within the evolutionary pathways of such higher level systems has been demonstrated in nature or under laboratory conditions. Beyond this, none of these stories has the support of a statistical analysis demonstrating a reasonable likelihood that the gap distance between any likely precursor system and the next step along the proposed evolutionary pathway is remotely small enough to cross in a reasonable amount of time. It is simply assumed by all authors in the literature that, given a few billion years, anything is possible. However, this assumption is just that - an assumption. It is not backed up by what anyone would call real scientific support. It remains, at best, just-so storytelling and, at worst, an appeal to a statistically untenable miracle, since not even trillions upon trillions of years would be remotely enough time to cross the likely non-beneficial gap distances in these proposed stories. (Pitman, 2009)

Crowded sequence space?

There are many who argue that sequence space is far more crowded with potentially beneficial sequences than this paper suggests.

Functional sequences are not so rare and isolated. Experiments show that roughly 1 in 10^11 of all random-sequence proteins have ATP-binding activity (Keefe and Szostak 2001), and theoretical work by H. P. Yockey (1992, 326-330) shows that at this density all functional sequences are connected by single amino acid changes. (Isaak, 2006)

While this statement is true, it fails to explain that this data is only relevant to systems of functional complexity that have minimum size and specificity requirements of no more than 100 fsaars. This level of sequence space is not remotely close to the 1000 fsaar level. This is why the evolutionary mechanism works so well below the 100 fsaar level in observable time, but not at all beyond the 1000 fsaar level.

After all, if all functional sequences were connected by single amino acid changes regardless of the level of functional complexity of the system in question, it would be very difficult to explain the complete lack of observed evolution in action beyond the 1000 fsaar threshold of functional complexity. It would also be hard to explain why Hall's large population of double mutant E. coli bacteria failed to evolve a novel lactase enzyme despite observation for tens of thousands of generations in a highly selective environment. Given the assumption of a likely Hamming distance of one between all existing sequences and at least one potential lactase island within sequence space, Hall should have realized success in very short order regardless of the type of bacterial colony he was using at the time.

The reason for Hall's observed limit to evolutionary potential is that the minimum threshold requirement for a useful lactase enzyme appears to be around 300 fsaars. At this level, the likely Hamming gap distance is a bit more than one - more like 5 or 6 mutational changes from what exists to the edge of the closest lactase island within sequence space (Formulas 5 & 6). The fact that Hall's experiment was at first successful suggests that occasionally the Hamming gap distance can be as small as one at this level, but not very commonly. In other words, at the level of 300 fsaars the likely minimum Hamming gap distance starts to increase beyond one - even for very large populations of pre-existing genetic options or starting point islands.

Natural selection to the rescue?

Another potential counter argument is that natural selection is able to guide evolution across the non-beneficial gaps. This argument is mistaken because natural selection does not select, in a positive manner, for any newly evolved sequence that is not already functionally beneficial. If the newly discovered sequence does not provide an immediate survival/reproductive advantage, natural selection will not preferentially select to keep it in the gene pool. In other words, natural selection only works once at least the edge of a novel beneficial island within sequence space is discovered by purely random chance. Until this point is realized, natural selection is completely blind as a helpful biasing agent when it comes to crossing non-beneficial gap distances in sequence space in shorter periods of time than random chance alone would allow.

Cascading systems:

Another interesting counter argument forwarded by Kenneth Miller has to do with a very interesting paper by Johnson et al. reporting on the very real evolution of a novel enzymatic pathway for degrading the synthetic compound 2,4-dinitrotoluene (2,4-DNT). (Johnson, et al., 2002) What is interesting here is that 2,4-DNT was first synthesized in the 1930s and is used in the production of the famous explosive TNT as well as expanded polyurethane foam. Johnson et al. noticed that certain types of bacteria in the surface water and soil of the Radford Army Ammunition Plant in Virginia were actually eating or metabolizing 2,4-DNT. The bacteria identified were Burkholderia cepacia R34, a strain that grew using 2,4-DNT as a sole carbon, energy, and nitrogen source. The genes in the evolved degradative pathway were identified within a 27 kb region of DNA.

Now, what is most interesting is the way in which these bacteria achieved this feat. They co-opted enzymes that were already present and working as parts of other enzymatic pathways to perform an entirely new type of function - the digestion or metabolism of 2,4-DNT. As it turns out, the 2,4-DNT pathway that evolved ultimately involved the use of seven different enzymes, totaling well over the 1000 fsaar limit for evolutionary progress suggested in this paper.

"Inferences from the comparison of the structural genes of the 2,4-DNT pathway suggest that the pathway came together from three sources." (Johnson, et al., 2002) 

Of the seven enzymes in the 2,4-DNT metabolism pathway, four of the key enzymes are dntAaAbAcAd (745aa), dntB (548aa), dntD (314aa) and dntG (281aa). The first two steps produce the byproduct NO2- (nitrite). As it turns out, nitrite can be used for energy by bacteria known as nitrifying bacteria - and Burkholderia cepacia are nitrifying bacteria. Why is this important? Because it means that each one of the first two steps in the pathway is functionally beneficial, since they both provide a source of additional energy to the bacteria that gain such enzymatic activities. (Matsuzaka, et al., 2003) In addition, each of these steppingstones has independent function in that no specific arrangement or orientation is needed, relative to the other elements in the enzymatic cascade, before its own function can be realized. Statistically this is very important because far less structural specificity is required before the next functional step can be realized - especially if a functionally-equivalent enzyme already exists as part of any other system of function. And, all of the parts in the 2,4-DNT cascade already existed, preformed, as parts of other systems of function within the bacterium.

If all the needed enzymes are already being made, as parts of other systems, then obviously not much change or evolution is required to be able to use the 2,4-DNT molecule for energy. Unlike bacterial motility systems, enzymatic cascades need not assemble themselves in any particular way before the function in question can be realized. All that needs to happen is for all the required enzymes to be present somewhere in the intracellular environment, in any order or arrangement. This is not the case for non-cascading functions (like bacterial motility systems), where all the protein parts are required to be in a particular order (i.e., a particular three-dimensional arrangement), all working together at the same time, before the function in question will be realized at all.

This is not to say that cascading systems have no significant functional complexity. Many of them are quite complex, but none are significantly more complex than their most complex single specifically arranged component part. The most complex single part in the 2,4-DNT cascade seems to be the dntAaAbAcAd enzyme, which requires around 745 fsaars. Given just this degree of specificity alone, without the original genes and enzymes in place to begin with, even this relatively simple enzymatic function would most likely not have evolved. The authors themselves state as much when they note that

"De novo evolution of genes for nitrotoluene degradation during the short period of time seems unlikely." (Johnson, et al., 2002)  

Compare such cascading functional systems to a functional system like flagellar motility, where all the parts are required to be in a very specific arrangement relative to all the other parts in the system to achieve the next beneficial steppingstone function. What this means is that the odds of getting all the needed parts for a cascade are exponentially better than the odds of getting all the right parts, in the right arrangement, for a fully specified system of equal overall size.

For example, say there is a need for five specific 3aa sequences to form a certain cascading-type function. What are the odds that all five will exist within a pool of 1 billion different 3aa sequences? Since there are only 8,000 possible 3aa sequences (20^3), the odds that all five will exist preformed somewhere in the gene pool are very good - much better than a 99% chance. (Pitman, 2007)

Now, compare this with the odds of achieving a system that requires all five specific 3aa sequences to be specifically arranged relative to each other. The number of different specific sequences possible is at least 20^15 = 32,768,000,000,000,000,000 (~3.3 x 10^19). So, the odds that one particular 15aa arrangement will appear within a pool of just 1 billion different options are around 1 in 10^10 pools. (Pitman, 2007) This produces a huge difference between systems that require subpart specificity and those that do not when it comes to the odds of evolvability within a given span of time.
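The two probabilities being contrasted here can be computed directly. The Python sketch below illustrates the combinatorial point only and is not a model of any particular biological system:

import math

pool_size = 1e9   # number of pre-existing sequences available in the gene pool
aa = 20           # amino acid alphabet size

# Cascade case: five specific 3aa sequences, each needed somewhere in the pool, in any order.
p_single_3mer = 1 / aa ** 3                                      # 1 in 8,000
p_missing = math.exp(pool_size * math.log(1 - p_single_3mer))    # chance a given 3mer is absent from the pool
p_all_five_present = (1 - p_missing) ** 5
print(p_all_five_present)        # effectively 1.0, i.e. much better than a 99% chance

# Specified-arrangement case: one particular 15aa sequence among a billion random options.
expected_occurrences = pool_size / aa ** 15
print(f"{expected_occurrences:.1e}")   # ~3.1e-11, i.e. roughly 1 chance in ~3 x 10^10 pools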

Conclusions:

So, does the density of potentially beneficial sequences in sequence space change as the level of functional complexity changes? Yes, to an exponential degree (Formula 2). As the level of functional complexity under consideration increases, the density of potentially beneficial sequences in sequence space does in fact decrease exponentially. And, the distribution of these target sequences, although somewhat clumpy in island clusters, is essentially uniform and becomes more and more uniform as the levels of functional complexity under consideration increase.

Although the evidence presented in support of this argument is somewhat technical and admittedly tedious, the basic concept is fairly simple. This exponential decline in the ratio of potentially beneficial sequences or systems is clearly evident in all types of language-based or functional/meaningful informational systems - not just biosystems.

As an example, consider a three-letter word like "cat". What are the odds that a random mutation to this sequence will happen to hit upon another three-letter sequence that is defined as meaningful in the English language system? In order to answer this question it is helpful to know the ratio of defined vs. non-defined three-letter words in the English language system. The total size of three-letter sequence space is 26^3 = 17,576 possible three-letter sequences, given the 26 characters of the English alphabet. Of these, according to the Scrabble® dictionary, the total number of defined three-letter words is 972. (Scr09) This produces a ratio of about 1 in 18 "words" vs. "non-words" in the English language system. Therefore, on average, a given "word" will be surrounded in sequence space by 18 non-words for every one word that is defined, and the average number of random mutations necessary to find a new three-letter "word" from one particular starting point is about 18. Of course, odds of success of 1 in 18 do not seem so bad. Because of such favorable odds, it would be very easy to "evolve" from a sequence like "cat" to a sequence like "dog" along a pathway of closely-spaced steppingstones, as in the following sequence:

CAT-HAT-BAT-BAD-BID-DID-DIG-DOG
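The steppingstone path above, and the roughly 1-in-18 ratio behind it, can be checked mechanically with a short Python sketch (included for illustration; whether each intermediate counts as a defined word of course depends on the dictionary used):

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

path = ["CAT", "HAT", "BAT", "BAD", "BID", "DID", "DIG", "DOG"]
print(all(hamming(a, b) == 1 for a, b in zip(path, path[1:])))   # True: every step is a single-letter change

total_three_letter = 26 ** 3    # 17,576 possible three-letter sequences
defined_words = 972             # the Scrabble-dictionary count cited above
print(total_three_letter, round(total_three_letter / defined_words))   # 17576 18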

However, these odds decrease dramatically as the minimum size requirement increases linearly. The ratio of defined vs. non-defined 7-letter sequences in sequence space (including multiple-word phrases adding up to 7 characters) is around 1 in 250,000 - and so on, along an exponential curve very similar to that described above for biosystem informational complexity.

As with functional biosystems in sequence space, all meaningful informational systems show some clustering effect in sequence space where defined or meaningful sequences tend to be grouped together into islands. As already noted, these islands are not round, but are branching with long thin arms that produce a dendritic appearance in sequence space. At low levels of functional complexity, these arms are interconnected with each other like a fishing net so that no one island is separated in sequence space from its closest neighbor by more than a single point mutation (a Hamming distance of one), as in the above illustration of evolution from "cat" to "dog" along a pathway where each step has a Hamming distance of no more than one.

However, as one moves up the ladder of functional complexity, the exponential increase in non-beneficial sequences relative to beneficial sequences starts to break up this dendritic network of close connections. Very quickly the likely minimum Hamming distance between a given island cluster and its next closest neighbor in sequence space is no longer 1, but 2, then 3, then 4, and so on (Formulas 5 & 6). With each linear increase in the level of functional complexity under consideration, the likely minimum Hamming distance between what is and what might be beneficial increases linearly as well. Also, the islands themselves are not as clustered as they were at lower levels of functional complexity, but form a more and more uniform distribution within sequence space (i.e., they are not clustered together in one tiny corner of sequence space but are distributed throughout all of sequence space in a relatively uniform manner). This exponentially changing ratio, combined with a more uniform distribution, has a dramatic effect on the odds that a given type of functional system will be within striking distance of the mindless mechanism of RM/NS (starting with any pool of pre-established options in nature).
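
The paper's Formulas 5 & 6 are not reproduced here, but the general tendency described in this paragraph can be illustrated with a rough combinatorial sketch. It assumes, purely for simplicity, that beneficial sequences are scattered uniformly and independently through sequence space, which is a simplification of the clustered, island-like picture described above.

```python
# Rough combinatorial sketch (NOT the paper's Formulas 5 & 6) of how the
# expected gap to the nearest beneficial sequence widens as the beneficial
# ratio shrinks.  It assumes beneficial sequences are scattered uniformly
# and independently through sequence space -- a simplification of the
# clustered, island-like picture described in the text.

from math import comb

def neighbors_within(length: int, alphabet: int, d: int) -> int:
    """Count all sequences within Hamming distance d of a sequence of the given length."""
    return sum(comb(length, k) * (alphabet - 1) ** k for k in range(d + 1))

def expected_min_gap(length: int, alphabet: int, beneficial_ratio: float) -> int:
    """Smallest Hamming distance d at which the neighborhood is expected
    to contain at least one beneficial sequence."""
    d = 0
    while neighbors_within(length, alphabet, d) * beneficial_ratio < 1:
        d += 1
    return d

# As the beneficial ratio drops exponentially, the expected gap grows step by step.
for ratio in (1e-2, 1e-6, 1e-12, 1e-24, 1e-48):
    print(f"ratio {ratio:.0e} -> expected minimum gap {expected_min_gap(1000, 20, ratio)}")
```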


Even the well-known and outspoken apologist for mainstream evolutionary thinking, Richard Dawkins, seems to subconsciously support this rather obvious concept in his book, The Blind Watchmaker:

"However many ways there are of being alive, it is certain that there are vastly more ways of being dead, or rather not alive." (Dawkins, 1996)

This seems to also hold true for functional or meaningful language systems. However many ways there are of being functional or meaningful, there are vastly more ways of being non-functional or non-meaningful (and even more ways of being non-beneficial). What is most interesting about this truism, however, is that its force increases exponentially (as demonstrated in this paper) as one considers systems at higher and higher levels of functional complexity. This is true regardless of the type of informational or language system under consideration, be it English, Morse code, computer code, genetic information, or protein-based systems of function. All of these meaningful informational systems show essentially the same exponential decline in the ratio of potentially beneficial vs. non-beneficial sequences in sequence space as the functional complexity of the systems under consideration increases.

This is the reason why "simulations (Taverna and Goldstein 2002a) and experiments (Davidson et al. 1995; Keefe and Szostak 2001) clearly show that the vast majority of protein sequences do not stably fold into any structure (meaning the least stable folded protein is still far more stable than the typical random sequence)." (Bloom, et al., 2007)

Along these lines, it is interesting to note that Hazen et al. published a similar conclusion:

In every system, the fraction of configurations, F(Ex), capable of achieving a specified degree of function will generally decrease with increasing Ex.  [note that this decrease is an exponential decrease with each linear increase in Ex.] (Hazen, et al., 2007)  
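
Hazen et al. (2007) define functional information as I(Ex) = -log2[F(Ex)], so each exponential drop in the fraction F(Ex) corresponds to a linear increase in the amount of information required. The short sketch below simply illustrates that relationship; the example fractions are arbitrary and chosen only for illustration.

```python
# Hazen et al. (2007) define functional information as I(Ex) = -log2[F(Ex)],
# where F(Ex) is the fraction of configurations achieving at least degree Ex
# of function.  This snippet simply shows the relationship: each exponential
# drop in F(Ex) is a linear increase in the information required (in bits).
# The example fractions are arbitrary, chosen only for illustration.

from math import log2

for fraction in (1e-3, 1e-6, 1e-9, 1e-12):
    print(f"F(Ex) = {fraction:.0e}  ->  I(Ex) = {-log2(fraction):.1f} bits")
```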

This exponentially declining ratio is lethal to mainstream evolutionary views of origins when it comes to any sort of useful scientific understanding of the mechanism behind supposed evolutionary progress over vast periods of time. The mechanism of RM/NS is simply assumed to be valid in the literature, not because of scientific demonstration or relevant statistical analysis beyond very low levels of functional complexity, but because there simply are no other options outside of an appeal to pre-existing higher-level informational systems, including arguments for intelligent manipulation (to explain high levels of functional biosystem complexity and diversity).

As already noted, the biasing powers of natural selection do not solve this problem, and the use of pre-existing lower-level systems does not help to cross the non-beneficial gaps created at higher levels of functional complexity in a reasonable amount of time. What, then, is left as a viable mechanism for evolutionary progress beyond the 1000 FSAAR level of functional complexity, besides an ultimate appeal to deliberate manipulation by a very intelligent designer? After all, the only directly known and observable force in nature that is clearly able to cross the 1000 FSAAR threshold is human-level intelligence and creativity. What, then, is the most likely scientific conclusion given what is currently known about the potential and limitations of candidate mechanisms or sources for high levels of meaningful or functional information?

Some might argue that intricate spider webs and honeycombs require a fair amount of meaningful informational complexity, yet are produced without the input of human-level intelligence - just mindless spiders and bees. This argument, however, simply pushes the question back to the ultimate origin of that information. It is like giving credit to the robots that produced the car instead of to the human engineer who produced the robots that produced the car.

Ultimately, the source of higher levels of functional informational complexity must itself have access to yet higher levels of informational complexity - to the point of what most would recognize as truly mindful, thoughtful, deliberate intelligence and creativity.


Bibliography

Ashcraft Evolutionism [Online] // Creation Wiki. - June 19, 2009. - February 18, 2010. - http://creationwiki.org/Evolutionism.

Behe Michael Darwin's Black Box: The Biochemical Challenge to Evolution. [Book]. - [s.l.] : Free Press, 1996.

Black Paul E. Levenshtein Distance [Online] // National Institute of Standards and Technology. - August 14, 2008. - February 19, 2010. - http://www.itl.nist.gov/div897/sqg/dads/HTML/Levenshtein.html. - http://en.wikipedia.org/wiki/Levenshtein_distance.

Blanco Francisco J., Angrand Isabelle and Serrano Luis Exploring the conformational properties of the sequence space between two proteins with different folds: an experimental study [Journal]. - [s.l.] : Journal of Molecular Biology, January 1999. - 2 : Vol. 285. - pp. 741-753.

Bloom Jesse D. [et al.] Thermodynamic prediction of protein neutrality [Journal]. - [s.l.] : Proc Natl Acad Sci, January 2005. - 3 : Vol. 102. - pp. 606-611.

Bloom Jesse D., Raval Alpan and Wilke Claus O. Thermodynamics of Neutral Protein Evolution [Journal]. - [s.l.] : Genetics, January 2007. - 1 : Vol. 175. - pp. 255-266.

Cui Yan [et al.] Recombinatoric exploration of novel folded structures: A heteropolymer-based model of protein evolutionary landscapes [Journal]. - [s.l.] : PNAS, January 2002. - 2 : Vol. 99. - pp. 809-814.

Darwin Charles On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life [Book]. - London : John Murray, 1859. - Modern reprint: Charles Darwin, Julian Huxley (2003).

Darwin Charles On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life [Online] // Literature.org. - 1859. - November 29, 2009. - http://www.literature.org/authors/darwin-charles/the-origin-of-species/chapter-06.html.

Dawkins Richard The Blind Watchmaker [Book]. - 1996. - p. 9.

Dembski William Explaining Specified Complexity [Online] // The Virtual Office of William A. Dembski. - July 13, 2002. - February 18, 2010. - http://www.leaderu.com/offices/dembski/docs/bd-specified.html.

Durston Kirk K [et al.] Measuring the functional sequence complexity of proteins [Journal]. - [s.l.] : Theoretical Biology and Medical Modelling, 2007. - 47 : Vol. 4.

Govindarajan S, Recabarren R and Goldstein RA Estimating the total number of protein folds, [Journal]. - [s.l.] : Proteins, 1999. - 4 : Vol. 35. - pp. 408-414.


Hall Barry Evolution on a Petri Dish: The Evolved B-Galactosidase System as a Model for Studying Acquisitive Evolution in the Laboratory [Journal]. - [s.l.] : Evolutionary Biology, 1982. - 85 : Vol. 15. - p. 150.

Hazen Robert M. [et al.] Functional information and the emergence of biocomplexity [Journal]. - [s.l.] : PNAS, May 15, 2007. - 1 : Vol. 104. - pp. 8574-8581.

Isaak Mark Index [Online] // Talk.Origins. - 2006. - November 29, 2009. - http://www.talkorigins.org/indexcc/CB/CB150.html.

Johnson GR, Jain RK and Spain JC. Origins of the 2,4-dinitrotoluene pathway. [Journal]. - [s.l.] : J Bacteriol, 2002. - 15 : Vol. 184. - pp. 4219-4232.

Lareo Leonard R. and Acevedo Orlando E. Sequence mapping in three dimensional space by a numeric method and some of its applications [Journal]. - [s.l.] : Acta Biotheoretica, June 1999. - 2 : Vol. 47. - pp. 123-128. - http://www.springerlink.com/content/uq14741411704025/.

Lloyd Seth Computational Capacity of the Universe [Journal]. - [s.l.] : Phys.Rev.Lett., 2002. - arXiv:quant-ph/0110141v1.

Matsuzaka Emiko [et al.] Participation of Nitrite Reductase in Conversion of NO2- to NO3- in a Heterotrophic Nitrifier, Burkholderia cepacia NH-17 [Journal]. - [s.l.] : Microbes and Environments, 2003. - 4 : Vol. 18. - pp. 203-209.

Mayr E. Introduction to Charles Darwin's On the Origin of Species: A facsimile of the First Edition [Journal]. - Cambridge : Harvard University Press, 1964. - Vol. IX.

Miller Kenneth Finding Darwin's God [Book]. - [s.l.] : Cliff Street Books, 1999. - 0-06-017593-1.

National Communications System Technology & Standards Division Telecommunications: Glossary of Telecommunication Terms [Book]. - Federal Publication : General Services Administration Information Technology Service, 1996. - http://en.wikipedia.org/wiki/Hamming_distance.

Oberai Amit [et al.] A limited universe of membrane protein families and folds, [Journal]. - [s.l.] : Protein Sci., 2006. - 7 : Vol. 15. - pp. 1723-1734.

Pitman Sean Antibiotic Resistance [Online] // DetectingDesign.com. - 2004. - February 18, 2010. - http://www.detectingdesign.com/antibioticresistance.html.

Pitman Sean Kenneth Miller's Best Arguments Against Intelligent Design [Online] // DetectingDesign.com. - 2007. - December 1, 2009. - http://www.detectingdesign.com/kennethmiller.html#Examples.

Pitman Sean Meaningful Information and Artifact [Online] // DetectingDesign.com. - March 2007. - February 18, 2010. - http://www.detectingdesign.com/meaningfulinformation.html.


Pitman Sean The Evolution of the Flagellum [Online] // DetectingDesign.com. - 2009. - December 1, 2009. - http://www.detectingdesign.com/flagellum.html.

Richardson J.S. The anatomy and taxonomy of protein structure. [Journal]. - [s.l.] : Adv. Protein Chem., 1981. - Vol. 34. - pp. 167-339.

Scrabble [Online] // Yak.net. - November 29, 2009. - http://www.yak.net/kablooey/scrabble.html.

Southgate Dr. Christopher CounterBallance - New Views on Complex Issues [Book]. - [s.l.] : T&T Clark, 1999. - http://www.counterbalance.org/ghc-evo/impor-frame.html.

Spencer Herbert Principles of Biology [Article]. - 1864. - Vol. 1. - p. 444.

Thirumalai D. and Klimov D. K. Emergence of stable and fast folding protein structures [Journal]. - [s.l.] : Stochastic Dynamics and Pattern Formation in Biological and Complex Systems: The APCTP Conference, AIP Conference Proceedings, 2000. - Vol. 501. - pp. 95-111.

VassarStats [Online] // Vassar.edu. - 2007. - December 1, 2009. - http://faculty.vassar.edu/lowry/VassarStats.html.