The Mathematics of DNA

Imagine that someone gives you a mystery novel with an entire page ripped out.

And let’s suppose someone else comes up with a computer program that reconstructs the missing page, by assembling sentences and paragraphs lifted from other places in the book.

Imagine that this computer program does such a beautiful job that most people can’t tell the page was ever missing.

DNA does that.

In the 1940s, the eminent scientist Barbara McClintock damaged parts of the DNA in maize (corn). To her amazement, the plants could reconstruct the damaged section. They did so by copying other parts of the DNA strand, then pasting them into the damaged area.

This discovery was so radical at the time, hardly anyone believed her reports. (40 years later she won the Nobel Prize for this work.)

And we still wonder: How does a tiny cell possibly know how to do…. that???

A French HIV researcher and computer scientist has now found part of the answer. Hint: The instructions in DNA are not only linguistic, they’re beautifully mathematical. There is an Evolutionary Matrix that governs the structure of DNA.

Computers use something called a “checksum” to detect data errors. It turns out DNA uses checksums too. But DNA’s checksum is not only able to detect missing data; sometimes it can even calculate what’s missing. Here’s how it works.
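First, a quick look at the computer version. The Python sketch below shows the general idea of a checksum: detect a one-letter copying error, and in the simplest case work backward from the checksum to a single missing value. It illustrates the computing concept only and is not a model of what a cell does; the function names and the modulus 256 are assumptions chosen for the example.

```python
def simple_checksum(data: bytes) -> int:
    """Add up all byte values and keep the remainder modulo 256."""
    return sum(data) % 256


def recover_missing_byte(known: bytes, checksum: int) -> int:
    """Given every byte except one, plus the checksum, compute the missing byte."""
    return (checksum - sum(known)) % 256


message = b"GATTACA"
stored = simple_checksum(message)

# Simulate a one-letter copying error in the last position.
corrupted = b"GATTACC"
print(simple_checksum(message) == stored)    # intact copy matches the stored checksum
print(simple_checksum(corrupted) == stored)  # damaged copy does not match

# Simulate a copy that lost its final letter, then rebuild that letter.
partial = b"GATTAC"
print(chr(recover_missing_byte(partial, stored)))
```

A plain sum like this can recover the value of one missing byte but not where it belongs, which is why real error-correcting codes carry more than a single number.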

In English, the letter E appears 12.7% of the time. The letter Z appears 0.7% of the time. The other letters fall somewhere in between. So it’s possible to detect data errors in English just by counting letters.
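Counting letters is easy to try yourself. Here is a minimal Python sketch; the function name and the sample sentence are assumptions for illustration, and a sentence this short will not reproduce the 12.7% figure, which comes from large bodies of English text.

```python
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Return each letter's share of all A-Z letters in the text, most common first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(letters)
    total = len(letters)
    return {letter: n / total for letter, n in counts.most_common()}


sample = "The quick brown fox jumps over the lazy dog"
for letter, share in letter_frequencies(sample).items():
    print(f"{letter}: {share:.1%}")
```

Run it on a long English text and on the same text with a chunk deleted or scrambled, and the shift in the percentages is what a frequency-based error check would look for.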

In DNA, some letters also appear a lot more often (like E in English) and some much less often. But… unlike English, how often each letter appears in DNA is controlled by an exact mathematical formula that is hidden within the genetic code table.

When cells replicate, they count the total number of letters in the DNA strand of the daughter cell. If the letter counts don’t match certain exact ratios, the cell knows that an error has been made. So it abandons the operation and kills the new cell.

Failure of this checksum mechanism causes birth defects and cancer.

Dr. Jean-Claude Perez started counting letters in DNA. He discovered that these ratios are highly mathematical and based on “Phi”, the Golden Ratio 1.618. This is a very special number, sort of like Pi. Perez’ discovery was published in the scientific journal Interdisciplinary Sciences / Computational Life Sciences in September 2010.

Jean-Claude Perez discovered an evolutionary mathematical matrix in DNA, based on the Golden Ratio 1.618

Before I tell you about it, allow me to explain just a little bit about the genetic code.

DNA has four symbols, T, C, A and G. These symbols are grouped into letters made from combinations of 3 symbols, called triplets.  There are 4x4x4=64 possible combinations.

So the genetic alphabet has 64 letters. The 64 letters are used to write the instructions that make amino acids and proteins.
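The 4x4x4 arithmetic can be checked in a few lines of Python. This sketch simply lists every triplet in T-C-A-G order; the variable names are my own.

```python
from itertools import product

SYMBOLS = "TCAG"

# Every ordered combination of three symbols, in T-C-A-G order.
triplets = ["".join(combo) for combo in product(SYMBOLS, repeat=3)]

print(len(triplets))   # 4 * 4 * 4
print(triplets[:4])    # the first few: TTT, TTC, TTA, TTG
print(triplets[-1])    # the last one: GGG
```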

Perez somehow figured out that if he arranged the letters in DNA according to a T-C-A-G table, an interesting pattern appeared when he counted the letters.

He divided the table in half as you see below. He took single stranded DNA of the human genome, which has 1 billion triplets. He counted the population of each triplet in the DNA and put the total in each slot:

[Figure: the T-C-A-G table divided in half, white and black]

When he added up the letters, the ratio of total white letters to black letters was 1:1. And this turned out to not just be roughly true. It was exactly true, to better than one part in one thousand, i.e. 1.000:1.000.
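The counting step itself can be sketched in Python. This is a rough illustration, not Perez's actual procedure: I am assuming non-overlapping triplets read from the first base, the function name is hypothetical, and the short sequence is made up. The paper should be consulted for the exact counting rules.

```python
from collections import Counter


def count_triplets(sequence: str) -> Counter:
    """Count non-overlapping triplets, reading from the first base.

    Any one or two leftover bases at the end are ignored.
    """
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


toy_strand = "TTTAAATTTGC"  # made-up example, not real genome data
print(count_triplets(toy_strand))
```

With real data, each of the 64 totals would go into its slot in the table, and the white and black slots would then be summed separately.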

Then Perez divided the table this way:

[Figure: the T-C-A-G table, second division]

Perez discovered that the ratio of white letters to black letters is exactly 0.690983, which is (3-Phi)/2. Phi is the number 1.618, the “Golden Ratio.”
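The arithmetic behind that number is easy to verify. The sketch below computes Phi from its standard definition, (1 + √5)/2, and then (3 - Phi)/2; it checks only the arithmetic, not the genome counts.

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # the Golden Ratio
ratio = (3 - PHI) / 2

print(round(PHI, 6))     # 1.618034
print(round(ratio, 6))   # 0.690983

# The same value written without Phi: (5 - sqrt(5)) / 4
print(math.isclose(ratio, (5 - math.sqrt(5)) / 4))
```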

He also discovered the exact same ratio, 0.690983, when he divided the table the following two alternative ways:

[Figures: the T-C-A-G table, two alternative divisions]

Again, the total number of white letters divided by the total number of black letters is 0.6909, to a precision of better than one part in 1,000.

Perez discovered two more symmetries:

[Figure: the T-C-A-G table, fifth division]
Above: Total ratio of white:black letters = 1:1

[Figure: the T-C-A-G table, sixth division]
Again, total ratio of white:black letters = 1:1

So for three ways of dividing the table, the ratio of white to black is 1.000:1.000.

And for the other three ways of dividing it, the ratio is 0.690983 or (3-Phi)/2.

When you overlay these 6 symmetries on top of each other, you get a set of mathematical stairs with 32 golden steps. Then an absolutely fascinating geometrical pattern emerges: The “Dragon Curve” which is well known in fractal geometry. Here it is, labeled with DNA letters in descending frequency:

[Figure: Dragon Curve labeled with DNA letters in descending frequency]

[Figure: Animated Dragon Curve]

You can see other non-DNA, computer generated versions of this same curve here.
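If you want to generate the curve yourself, here is a minimal Python sketch of the standard paper-folding construction of the Dragon Curve. It produces only the sequence of left and right turns for the ordinary fractal; it does not reproduce Perez's labeling with DNA letters, and the function name is my own.

```python
def dragon_turns(iterations: int) -> str:
    """Return the Dragon Curve's turn sequence as a string of 'R' and 'L'.

    Each iteration keeps the existing turns, adds one right turn,
    then appends the existing turns reversed with left and right swapped.
    """
    turns = ""
    for _ in range(iterations):
        flipped = "".join("L" if t == "R" else "R" for t in reversed(turns))
        turns = turns + "R" + flipped
    return turns


print(dragon_turns(3))        # RRLRRLL
print(len(dragon_turns(10)))  # 2**10 - 1 turns
```

Feeding the turns to any turtle-graphics tool, one fixed-length step per turn, draws the familiar shape.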

Other interesting facts:

  • Similar patterns with variations on these same rules are seen across a range of 20 different species, from the AIDS virus to bacteria, primates and humans.
  • Each character in DNA occurs a precise number of times, and each has a twin. TTT and AAA are twins and appear the most often; they’re the DNA equivalent of the letter E.
  • This pattern creates a stair step of 32 frequencies, a specific frequency for each pair.
  • The number of triplets that begin with a T is precisely the same as the number of triplets that begin with A (to within 0.1%).
  • The number of triplets that begin with a C is precisely the same as the number of triplets that begin with G.
  • The genetic code table is fractal – the same pattern repeats itself at every level. The micro scale controls conversion of triplets to amino acids, and it’s in every biology book. The macro scale, newly discovered by Dr. Perez, checks the integrity of the entire organism.
  • Perez is also discovering additional patterns within the pattern.
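The first-letter comparisons in the list above are simple to test once you have a table of triplet counts. Here is a Python sketch; the counts are made-up numbers for illustration only, not data from Perez's paper, and the function names are hypothetical.

```python
from collections import Counter


def totals_by_first_letter(triplet_counts: dict[str, int]) -> Counter:
    """Add up triplet counts according to the triplet's first symbol."""
    totals = Counter()
    for triplet, n in triplet_counts.items():
        totals[triplet[0]] += n
    return totals


def relative_difference(a: int, b: int) -> float:
    """Difference between two totals as a fraction of the larger one."""
    return abs(a - b) / max(a, b)


# Made-up counts, purely to exercise the functions.
toy_counts = {"TTT": 500, "TCA": 300, "AAA": 499, "AGT": 301, "CCG": 120, "GGC": 121}

totals = totals_by_first_letter(toy_counts)
print(totals["T"], totals["A"], relative_difference(totals["T"], totals["A"]))
print(totals["C"], totals["G"], relative_difference(totals["C"], totals["G"]))
```

A relative difference below 0.001 would correspond to the "within 0.1%" claim.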

I am only giving you the tip of the iceberg. There are other rules and layers of detail that I’m omitting for simplicity. Perez presses forward with his research; more papers are in the works, and if you’re able to read French, I recommend his book “Codex Biogenesis” and his French website. Here is an English translation.

(By the way, he found some of his most interesting data in what used to be called “Junk DNA.” It turns out to not be junk at all.)

OK, so what does all this mean?

  • Copying errors cannot be the source of evolutionary progress, because if that were true, eventually all the letters would be equally probable.
  • This proves that useful evolutionary mutations are not random. Instead, they are controlled by a precise Evolutionary Matrix to within 0.1%
  • When organisms exchange DNA with each other through Horizontal Gene Transfer, the end result still obeys specific mathematical patterns
  • DNA is able to re-create destroyed data by computing checksums in reverse – like calculating the missing contents of a page ripped out of a novel.

No man-made language has this kind of precise mathematical structure. DNA is a tightly woven, highly efficient language that follows extremely specific rules. Its alphabet, grammar and overall structure are ordered by a beautiful set of mathematical functions.

More interesting factoids:

The most common pair of letters (TTT and AAA) appears exactly 1/13 as often as all the letters combined, consistently across the genomes of humans and chimpanzees.

If you put the 32 most common triplets in Group 1 and the 32 least common triplets in Group 2, the ratio of letters in Group1:Group2 is exactly 2:1. And since triplet counts occur in symmetrical pairs (TTT-AAA, TAT-ATA, etc), you can group them into four groups of 16.
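The Group 1 to Group 2 comparison can be sketched in Python as follows. The function name and the four-entry table are assumptions for illustration; a real test would use all 64 triplet counts from a genome.

```python
def top_to_bottom_ratio(triplet_counts: dict[str, int]) -> float:
    """Rank counts from most to least common, split the ranking in half,
    and divide the total of the top half by the total of the bottom half."""
    ranked = sorted(triplet_counts.values(), reverse=True)
    half = len(ranked) // 2
    return sum(ranked[:half]) / sum(ranked[half:])


# Made-up counts chosen so the answer is easy to check by hand.
toy_counts = {"TTT": 40, "AAA": 40, "CGA": 20, "TCG": 20}
print(top_to_bottom_ratio(toy_counts))  # (40 + 40) / (20 + 20)
```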

When you put those four triplet populations on a graph, you get the peace symbol:

[Figure: DNA peace symbol]

Does this precise set of rules and symmetries appear random or accidental to you?

My friend, this is how it is possible for DNA to be a code that is self-repairing, self-correcting, self-re-writing and self-evolving. It reveals a level of engineering and sophistication that human engineers could only dream of. Most of all, it’s elegant.

Cancer has sometimes been described as “evolution run amok.” Dr. Perez has noted interesting distortions of this matrix in cancer cells. I strongly suspect that new breakthroughs in cancer research are hidden in this matrix.

I submit to you that the most productive research that can possibly be conducted in medicine and computer science is intensive study of the DNA Evolution Matrix. Like I said, this is just the tip of the iceberg.

There is so much more here to discover!

When we develop computer languages based on DNA language, they will be capable of extreme data compression, error correction, and yes, self-evolution. Imagine: Computer programs that add features and improve with time. All by themselves.

What would that be like?

Perry Marshall

P.S.: Dr. Perez and I are friends. Perez worked on HIV research with the man who originally discovered HIV, Luc Montagnier. Perez also worked in biomathematics and Artificial Intelligence at IBM. I’m familiar with this work because last spring I had the privilege of helping him translate his groundbreaking research paper about this into English.

You can read it here: “Codon Populations in Single-stranded Whole Human Genome DNA Are Fractal and Fine-tuned by the Golden Ratio 1.618”


32 Comments

Old Git Tom says:

Mr Marshall,
many thanks for that – it’s absolutely astonishing. OGT

DDD says:

Perry,
I enjoy your site. I do however have a problem with some of the math–
“Perez discovered that the ratio of white letters to black letters is exactly 0.690983, which is (3-Phi)/2. Phi is the number 1.618, the “Golden Ratio.”
He also discovered the exact same ratio, 0.690983, when he divided the table the following two alternative ways:”
Right below this is a table 1st letter T –White, next first letter C is black, next first letter A is white, next first G is Black
White / Black = Sum T+ Sum A / Sum C + Sum G =0.690983
Later, in another matrix, it is stated “There are two more symmetries that Perez discovered:” and that matrix has the White/Black reversed
Sum C + Sum G /Sum T+ Sum A = 1 to 1
Simplifying T+A / C +G = 0.690983 and C+G / T+A =1 I don’t believe math works that way unless you are changing the inputs and not calling them out.
or am I missing something here?

The 2nd matrix does not reverse the black/white, there are a total of 6 distinct matrices, 3 are 1:1 and 3 are 0.69. Perez’s paper makes this clear. If I’ve made an error in representing Perez’s paper I’m open to having that pointed out.

NoMoreGames says:

Very interesting article, but being biologically educated, I have a few comments.

Junk DNA has long been believed to have some sort of function, we are just unsure of what that function is (most likely regulatory). The term “junk” has just stuck around from an older time.

“Copying errors cannot be the source of evolutionary progress, because if that were true, eventually all the letters would be equally probable. This proves that useful evolutionary mutations are not random. Instead, they are controlled by a precise Evolutionary Matrix to within 0.1%”

Considering there are about 3 billion base pairs in the human genome, that would still allow for some 3 million bases to not be controlled by this matrix, essentially offering counter evidence for your first example. That’s a lot of potential random mutations. Please correct me if I interpreted this data incorrectly.

I haven’t had a chance to read it yet, but I’m wondering if his paper mentioned if he factored the highly variable and repeating telomeres into his matrices?

Thank you for your input!

If the Evolution Matrix controls codon populations to 0.1% then that means that copying errors cannot account for more than 0.1% of the difference between, say, bacteria and humans. It means that 99.9% comes from processes that obey the rules of the matrix.

I’m sure there is some teeny tiny percentage of random mutations that have turned out to be beneficial. But then saying that random mutations are therefore the source of evolutionary progress is a complete non-sequitur. It’s sort of like a story I remember hearing somewhere, where a guy had some kind of physical problem and he was struck by lightning and it went away. It could be true, and freak accidents do happen, but nobody I know is volunteering to get struck by lightning. Science is not about freak accidents, it’s about systematic explanations.

Evolution is driven by transposition, horizontal gene transfer, epigenetics, symbiogenesis and genome doubling. All of those things are very well documented, all are systematic processes, and they obey the rules of the matrix. Random Mutation doesn’t obey the matrix and is dead last in the lineup of beneficial evolutionary mechanisms.

I’m not sure about your last question. I’ll forward it to Dr. Perez.

BTW The term “junk DNA” needs to be discarded. As do other derisive terms like “degenerate code” which is a misnomer for a brilliant error minimization scheme.

(Or maybe we need to keep the term ‘junk DNA’ around as a reminder of how much damage atheism has done to the study of biology and the practice of science.)

tetrahedral says:

Languages may be as precisely tuned re the Golden Ratio as DNA. There is a doctoral student at UArizona doing his thesis on Phi in phrase and clause structure. Others have shown strong typological relations between prosody type and syllable type, between numbers of features in phonemes and total average sentence length, etc. And language types tend to be highly coherent, re word order and other factors between heads and dependents. It just hasn’t been looked at through the window of fractals, the Golden Ratio, etc. Since languages change these things cyclically, and pack/unpack the individual feature bundles into new configurations, linguists specializing in one aspect or another can’t help but fail to see the larger picture.

There are also many strong parallel analogical relations between linguistic and genomic structure. Languages can separate basic meaning bearing units, adjoin them, or overlap them. Same thing happens with genomes, in terms of protein coding sequences. Meaning bearing units can bootstrap their forms from their underlying sequence (sound symbolism) or get them imposed from above. Generally the relative importance of these two extrema depends on how elaborated morphosyntax is, or how often used. There is evidence of the same thing in the genome.

Finally, re ‘junk’, it has been shown that in eukaryotic organisms the more junk, generally, the more environmental context sensitivity is present for determining the right time to become sexually mature- they wait for optimal conditions. They also tend to have larger, less complex cells, less internally ramified or externally connected. or overlapping organs. Those with reduced junk tend to have smaller more specialized cells, often mixed in organs (sometimes from separate origins). They ignore the resource environment and are instead ‘on the clock’ for maturation, so that the process is more or less in synch with the seasons, the day/night cycle, etc.- automated. Also earlier life stages tend to be de-emphasized or reduced. Thus metamorphoses- in plants and animals.

Junk cumulates either by wholesale duplications or from inputs from outside the organism (viruses). And it can be lost as well. Constant updating. Old viruses get defanged by breaking up genes into fragments, etc. And there is exchange between the junk and the split gene system.

Same thing happens in languages. But I’ll leave off here unless there is more interest.

As for Phi in nature, in the past year I discovered a link between the Periodic Table and Pascal’s Triangle, and Fibonacci and related sequences. Just for one example if one divides Fib numbers into triplets (two odds, one even- that is, an even number of odds vs. an odd number of evens), and maps them AS atomic numbers in the periodic system, then within known elements ALL the odd Fib numbers map to leftmost positions in the table where there is one electron in one new orbital lobe: s1,p1,d1,f1. But orbitals are split in two, the left/first half with singlet electrons in lobes, and the second/right half with two per lobe. Within the known elements ALL the EVEN Fib numbers map to the leftmost positions in the second half of the orbital, where the first doublet electron is in one lobe.

The related Lucas numbers map instead to RIGHTMOST positions within the orbitals, with half or completely filled status- where there are exceptions, the electron configurations or the behaviors of the elements themselves are altered to better fit the Lucas trend (ex. 29Cu and 47Ag steal an electron from a filled s to donate it internally to make a full d. Half s is just as ‘Lucas’ as full).

This thing goes on and on with other related sequences. And the exceptions themselves seem to be patterned between the sequences. One finds such things in languages as well.

3rdMLNM says:

Good-News to All,

Here is the wonderful “Symmetry & Mathematics”
–perhaps the equivalent of DNA codes in some respects–
within the WORD of one and only true GOD,
that Jesus unmistakably promised for this Third and Last Day (=Millennium),
thus also as an “Eternal Food” for all truth-seeking brains, souls and minds (John 6/27-40)
herein now:

http://www.holy-19-harvest.com

Recently it seems the gospel of St. John wasn’t written by John at all but Mary Magdalene. God isn’t in the material world but Spirit world. These thoughts go back to Aristotle 329BCE and the Gospel according to Thomas. They weren’t included in the canon of scripture; nevertheless they were written.

helixbender says:

Perry do you even have an idea of what Barbara McClintock discovered? Look up class II Transposons, read about them, what they do and how they work. See Genes IX page 538.

Please be extremely specific about what you’re trying to say. I have a half dozen books by or about Barbara McClintock. I’m not going to go find some new book and read it just because you vaguely suggest that I don’t know what I’m talking about.

helixbender says:

Well did you look up class II transposons and compare to what you wrote in your introduction to this article?

No I did not. Make a statement of what you agree or disagree with.

helixbender says:

I’m not sure I get what you mean when you try to explain McClintock’s experiments. It doesn’t match what I’ve seen in the literature about class II transposons. I’ve never seen a paper talking about fixing the ‘damaged’ gene using ‘other parts of the DNA strand’. Could you give me the reference for this?

From http://shapiro.bsd.uchicago.edu/21st_Cent_View_Evol.html

In addition to proofreading systems, cells have a wide variety of repair systems to prevent or correct DNA damage from agents that include superoxides, alkylating chemicals and irradiation (33). Some of these repair systems encode mutator DNA polymerases which are clearly the source of DNA damage-induced mutations and also appear to be the source of so-called “spontaneous” mutations that appear in the absence of an obvious source of DNA damage (34). Results illustrating the effectiveness of cellular systems for genome repair and the essential role of enzymes in mutagenesis emphasize the importance of McClintock’s revolutionary discovery of internal systems generating genome, particularly when an organism has been challenged by a stress affecting genome function (Fig. 4; 5).

In repair responses, we know that DNA damage triggers the activation of mutator polymerases and non-homologous end joining activities

Sometimes, much larger multiprotein assemblages are involved, like the apparatus for carrying out homologous genetic recombination or for repairing severed DNA molecules by non-homologous joining of broken ends (36). Among the most important systems are those called “mobile genetic elements” (MGEs; 7, 8), which make up about 43% of the human genome (21). These MGEs include the transposable “controlling elements” discovered by McClintock, and they comprise integrated systems of proteins and nucleic acids that interact to mobilize DNA to new locations in the genome.

See the illustration next to this text of segments of DNA being re-arranged via transposition.

helixbender says:

Thanks I see where the misunderstanding comes from. Again it is thought provoking. I’ll find some papers that will help you understand my understanding of how this kind of thing works.

livemike says:

Ok, so basically the author claims that there is a pattern in something that is supposed to have evolved, therefore it cannot have evolved. The problem is that evolution produces patterns, indeed if it didn’t it wouldn’t be evolution. The fact that certain mathematical ratios are produced in nature, particularly the “Golden Ratio” is not news. Nor is the presence of fractal patterns (indeed if you’re describing fractals it sometimes helps to say “they’re like tree branches” everybody gets that). There have been many examples of scientists finding such ratios and then explaining them in terms of reproductive advantage. This is almost certainly another of those. Biologists have long known (as this article points out) that DNA has error correction, indeed this is easily predicted by evolutionary theory because a creature that didn’t have error-correcting DNA would be more likely to produce mutated nonviable children. It doesn’t prove that the mutations aren’t random, it just proves that the sum of the mutations over billions of years has a strong tendency to produce patterns. Which is pretty much exactly what Darwin said.

Of course it could be argued that if DNA acts as described above that heavily limits the way in which random mutations could occur. But that’s exactly what an error-correction capacity is SUPPOSED to do. Nobody with any scientific credibility claims that an error correction mechanism in DNA prevents evolution.

Nobody here is denying that evolution happened. What I am saying is that none of these checksum rules can be derived by the laws of physics; and most importantly they have to be in place before evolution itself can be possible. Perez is saying that these patterns could not have emerged from randomness.

What this does mean is that the Neo-Darwinian theory that random mutation and natural selection explains everything is wrong. The mutations are not random. They’re non-random, modular and systematic. (Transposition, Horizontal Gene Transfer, Epigenetics, Genome Doubling – ALL governed by this matrix.)

james says:

Has anyone heard of A. E. Wilder-Smith?

MikeFromOhio says:

I believe Perry is correct when he says that mutations are not strictly random and can involve Transposition, Epigenetics etc.

However evolution happens at many levels. It’s possible that the genome matrix has evolved in a coupled manner to the cells that it represents. This is typically referred to as “the evolution of evolvability”.

As I said elsewhere, we need to work out a prototype with Genetic Programming and *try* to evolve a simple design/code and see what happens. Doing such an experiment might add weight to Perry’s argument, or it might prove otherwise. Either way we would learn more.

In other words, saying that evolution is mutation/random based and that its not very powerful, is like saying that a car with 1.5 wheels won’t get you very far so why believe in cars.

Evolution = Variation In Population + Phenotype Selection + Time
Evolution ≠ Random Mutation

Perry, your TextMutator is very much the car with 1.5 wheels. It does not use a population and only looks at the genotype (there is no phenotype and no environment for selection).

With that said, I still think you make brilliant points about Design and Language. The question of where did the DNA mechanism come from is spot on. I want to help get deeper into that question. Why not open that box and see what is inside?

Old Git Tom says:

http://www.rexresearch.com/gajarev/gajarev.htm

This site may be of interest. It mainly concerns the radical findings of Russian scientist Garjajev, or Garaiaiev (?). Ie., DNA is stored information, a language, & a communications medium. DNA (allegedly) also formed the template of the original grammar of all languages – much sought after by linguists of the Chomsky school.

I’m scientifically ignorant, so can’t comment, beyond saying that it supports Perry Marshall (et al) in rejecting the idea of ‘junk’ DNA, & the materialist dogma that DNA is ‘dumb’ chemical matter. OGT

[...] the process – that all this, could just happen, from the beauty of the oceans and mountains, to the perfection of dna. To decide this happened by accident and not plan, is in itself pretty unrealistic to me. To [...]

Local fractional functional analysis, gradually conquering one stronghold after another, may become a nearly new universal mathematical doctrine, not merely a new area of mathematics, but a new mathematical world view. Its appearance was the inevitable consequence of the evolution of all of twenty-first-century mathematics, in particular analysis and mathematical physics in fractional-dimension spaces. Its original basis is formed by the theory of sets, from Cantor sets to fractional sets. Its existence will answer the question of how to state general principles of a broadly interpreting fractal mathematics and fractal engineering.

http://www.nonlinearscience.com/downloads/toc-yang.pdf

Local fractional Fourier analysis, Advances in Mechanical Engineering and its Applications, 1(1) (2012)12-16

Local fractional calculus (LFC) deals with everywhere continuous but nowhere differentiable functions in fractal space. In this letter we point out local fractional Fourier analysis in generalized Hilbert space. We first investigate the local fractional calculus and complex number of fractional-order based on the complex Mittag-Leffler function in fractal space. Then we study the local fractional Fourier analysis from the theory of local fractional functional analysis point of view. We finally propose the fractional-order trigonometric and complex Mittag-Leffler functions expressions of local fractional Fourier series.

http://www.worldsciencepublisher.org/journals/index.php/AMEA/article/view/264

Old Git Tom says:

Xiao-Jun Yang,
mathematics; yes, super stuff, but what exactly are you stating? In ordinary language, if possible, please? Thanks, OGT

In this paper we point out the interpretations of local fractional derivative and local fractional integration from the fractal geometry point of view. From Cantor set to fractional set, local fractional derivative and local fractional integration are investigated in detail, and some applications are given to elaborate the local fractional Fourier series, the Yang-Fourier transform, the Yang-Laplace transform, the local fractional short time transform, the local fractional wavelet transform in fractal space.
Cited from: Local fractional calculus and its applications, FDA 2012, http://em.hhu.edu.cn/fda12/index.html

You need to obtain the original paper, and find these results. I look forward to hearing from you, and thank you very much.

Old Git Tom says:

Xiao-Jun Yang,

your post writes about some kinds of higher mathematics – a closed book to me, so I fear the original paper would be even less comprehensible! But thanks anyhow. OGT

[...] write up on the golden ratio. I understand that he helped translated the work into English. Same? The Mathematics of DNA. [...]

God Chaser says:

“Premise #2 – All codes we know the origin of, that are capable of storing and retrieving pictures of aunt Harriet, a couple of your favorite novels, and back up your computer data, come from a mind!”

That brings the concept home!

Hey Perry, are you going to do a write-up on being able to code and store a biology book in DNA?

Old Git Tom says:

God Chaser,
no problem: there are many forms of ‘code’, not just mathematical. Broadly, which one is used depends on the kind of info to be best transmitted. Eg., a movie reel contains graphics as code. In principle, the moving images might be translated into scrolling formulae, but it would not be a rewarding viewing experience! And not all codes are interchangeable. Musical notation cannot become language text, & only skilled musicians can ‘decode’ a composition to unlock the audio ‘message’. AFAIK, we are not sure how many code-modes there are, so we cannot talk about transformations ‘in principle’. Some codes might not translate into others at all. But any biology book might be encoded as DNA. I’ve read that computer scientists are researching this very area, since DNA encodes far more densely than any known alternative – so, libraries on a pinhead, etc. OGT
