The Origin of Information: How to Solve It

 

Cosmic Fingerprints has issued a challenge to the scientific community:

“Show an example of Information that doesn’t come from a mind. All you need is one.”

“Information” is defined as digital communication between an encoder and a decoder, using agreed-upon symbols. To date, no one has shown an example of a naturally occurring encoding/decoding system, i.e., one that has demonstrably come into existence without a designer.

Cosmic Fingerprints will freely acknowledge and publicize the first person who can solve this. Solving this problem is far more than a matter of abstract religious or philosophical discussion. It would demonstrate a mechanism for producing coding systems, thus opening up new channels of scientific discovery. Such a find would have major implications for Artificial Intelligence research.

Such a demonstration would solve the most perplexing problem currently facing the origin-of-life field, namely the origin of coded information. How could the genetic code (or any coding system) come into being? An answer would represent a landmark discovery in the history of science and alter our fundamental understanding of the universe.

The following specification defines the criteria for identifying a naturally occurring code:

1.     Humans can design the experiment, with all manner of state-of-the-art laboratory equipment, ideal conditions etc. They just can’t cheat: the submitted system cannot be pre-programmed with any form of code whatsoever.

2.     Since the origin of DNA is unknown, the submitted system cannot be a direct derivative of DNA or produced by a living organism. Bee waggles, dogs barking, RNA strands and mating calls of birds don’t count. Such codes are products of animal intelligence, genetically hard-coded and/or instinctual.

3.     The origin of the submitted system must be documented such that its process of origin can be observed in nature and/or duplicated in a real-world laboratory according to the scientific method.

4.     The submitted system must be digital, not analog.

5.     The submitted system must have the three integral components of communication functioning together: encoder, code, decoder.

6.     The message passed between encoder and decoder must be a sequence of symbols from a finite alphabet.

7.     A symbol is a group of k bits considered as a unit. We refer to this unit as a message symbol m_i (i = 1, 2, …, M) from a finite symbol set or alphabet. The size of the alphabet is M = 2^k, where k is the number of bits in the symbol. For a binary symbol, k = 1 and M = 2. For a quaternary symbol in DNA, k = 2 and M = 4.

8.     A character is a group of n symbols considered as a unit. We refer to this unit as a message character c_i (i = 1, 2, …, C) from a finite word set or vocabulary. The maximum size of the character set is C = M^n. For a standard computer byte, M = 2, n = 8, and C = 256. For a triplet group of quaternary symbols in DNA, M = 4, n = 3, and C = 64.

9.     The submitted system must be labeled, with the values of both the encoding table and the decoding table filled out.

10.    For the submitted system, it must be possible to objectively determine whether encoding and decoding have been carried out correctly. For example when you press the “A” key on the keyboard, a letter “A” is supposed to appear on the screen and there is an observable correspondence between the two. In defining biological gender, a combination of X and Y chromosomes should correspond to male, while XX should correspond to female. For any given system, a procedure should exist for determining whether input correctly corresponds to output.

(Above definitions adapted from Digital Communications: Fundamentals and Applications by Bernard Sklar, page 13, Prentice Hall, 2nd edition, 2001)
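The sizes given in items 7 and 8 can be checked with a few lines of arithmetic. The Python sketch below is illustrative only; the function names `alphabet_size` and `character_set_size` are hypothetical and do not come from Sklar's text.

```python
def alphabet_size(k: int) -> int:
    """Alphabet size M for symbols of k bits each (M = 2^k)."""
    return 2 ** k


def character_set_size(m: int, n: int) -> int:
    """Maximum character set size C for n symbols per character (C = M^n)."""
    return m ** n


# Standard computer byte: binary symbols (k = 1), 8 symbols per character.
byte_m = alphabet_size(1)
byte_c = character_set_size(byte_m, 8)

# DNA codon: quaternary symbols (k = 2), 3 symbols per character.
dna_m = alphabet_size(2)
dna_c = character_set_size(dna_m, 3)

print("byte:  M =", byte_m, " C =", byte_c)   # byte:  M = 2  C = 256
print("codon: M =", dna_m, " C =", dna_c)     # codon: M = 4  C = 64
```

The two printed rows reproduce the worked values quoted in the specification (C = 256 for a byte, C = 64 for a codon).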

Isomorphism between Shannon’s Communication System and DNA:

Claude Shannon's communication model (From The Mathematical Theory of Communication, University of Illinois Press, 1998).

Above: Hubert Yockey's DNA communication channel model. Notice that it contains exactly the same components as Shannon's; the two systems are isomorphic. My thesis is that communication systems of this type are always, without exception, products of design. (From Hubert Yockey, Information Theory, Evolution, and the Origin of Life, Cambridge University Press, 2005.)


Essential Components of a Communication System (after Shannon, 1948):

[Figure: essential components of a communication system]

Example Communication Systems:

Example #1: The ASCII Code

Keyboard > ASCII > Computer Screen: When you press the letter “A” on the keyboard, the letter is encoded into ASCII, decoded by the computer, and displayed as a letter “A” on the screen.

Each ASCII character consists of 7 binary symbols, so M = 2 and n = 7. The ASCII character set C is 2^7, or 128 characters.

Encoding tables for ASCII (letter on keyboard > binary code):

Input (letter on keyboard)    Encoded Message
A                             1000001
B                             1000010
a                             1100001
b                             1100010

The complete ASCII table is available at http://en.wikipedia.org/wiki/Ascii#ASCII_printable_characters

Decoding tables for ASCII (binary code > letter on screen or printer):

Encoded Message    Output (displayed as an arrangement of pixels on screen or printer)
1000001            A
1000010            B
1100001            a
1100010            b
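The two ASCII tables above can be reproduced, and the correspondence between input and output checked, with a short Python sketch. This is only an illustration of the table lookup; the names `encode_char`, `decode_bits`, `ENCODING_TABLE`, and `DECODING_TABLE` are hypothetical, and a real keyboard and display pipeline involves many more steps.

```python
def encode_char(ch: str) -> str:
    """Return the 7-bit ASCII code of one character as a string of 0s and 1s."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
    return format(code, "07b")


def decode_bits(bits: str) -> str:
    """Return the character whose 7-bit ASCII code is given as a bit string."""
    if len(bits) != 7 or set(bits) - {"0", "1"}:
        raise ValueError(f"{bits!r} is not a 7-bit binary string")
    return chr(int(bits, 2))


# Encoding table for the four letters used in the example above.
ENCODING_TABLE = {ch: encode_char(ch) for ch in "ABab"}
# The decoding table is the inverse mapping.
DECODING_TABLE = {bits: ch for ch, bits in ENCODING_TABLE.items()}

for ch, bits in ENCODING_TABLE.items():
    print(ch, bits)

# Check that every input comes back unchanged after encode + decode.
round_trip_ok = all(decode_bits(encode_char(ch)) == ch for ch in "ABab")
print("round trip ok:", round_trip_ok)
```

The final check is one simple way to decide objectively whether encoding and decoding were carried out correctly: each input must map back to itself.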

~

Example #2: The Genetic Code

Nucleotides > mRNA > Proteins: Base pairs are grouped into codons and encoded (transcribed) into messenger RNA, then decoded (translated) by the ribosomes into proteins.

The DNA symbol unit is a nucleotide, forming a 4-letter alphabet of adenine, cytosine, guanine, and thymine (M = 4). Each base pair contains k = 2 bits of information. A character consists of n = 3 symbol units. The character set C is 4^3, or 64 characters. DNA’s redundancy scheme maps these 64 characters to 20 amino acids and stop signals.

Encoding tables for DNA (base pairs > mRNA):

Nucleotides (Input)    Amino Acid (Encoded Message)
CCC                    Proline
ACC                    Threonine
GGG                    Glycine
AAA                    Lysine

The complete genetic code chart is available at http://en.wikipedia.org/wiki/Genetic_code#RNA_codon_table
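The codon lookup can be sketched in Python as follows. This is a minimal illustration, not a full translation tool: `CODON_TABLE` holds only the four entries listed above rather than all 64 codons, and the names `CODON_TABLE`, `split_codons`, and `translate` are hypothetical. The four codons shown contain no thymine, so they read the same in DNA and in mRNA.

```python
# Partial codon table: only the four entries from the example above.
CODON_TABLE = {
    "CCC": "Proline",
    "ACC": "Threonine",
    "GGG": "Glycine",
    "AAA": "Lysine",
}


def split_codons(sequence: str) -> list[str]:
    """Split a nucleotide string into consecutive 3-letter codons."""
    seq = sequence.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(sequence: str) -> list[str]:
    """Look up each codon; raises KeyError for codons missing from the partial table."""
    return [CODON_TABLE[codon] for codon in split_codons(sequence)]


print(translate("CCCACCGGGAAA"))
# ['Proline', 'Threonine', 'Glycine', 'Lysine']
```

Extending `CODON_TABLE` to the full chart linked above would make the same lookup cover every codon, including the stop signals.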

Decoding tables for DNA (amino acids > proteins):

Amino Acid Sequence (encoded message)    Peptide/Protein (organism name) (output)    #**
YGGFM Met-enkephalin (HS) 5
MRTGNAN Microcin C7 (EC) 7
DRVYIHPF Angiotensin 2 (HS) 8
CYIQNCPLG Oxytocin (HS) 9
CYFQNCPRG Vasopressin (HS) 9
QHWSYGLRPG Gonadoliberin-1 (HS) 10
RPKPQQFFGLM Substance P (HS) 11
DVPKSDQFVGLM Kassinin (KS) 12
GGAGHVPEYFVGIGTPISFYG Microcin J25 (EC) 21
RSCCPCYWGGCPWGQNCYPEGCSGPKV Neurotoxin 3 (AS) 27
HSQGTFTSDYSKYLDSRRAQDFVQWLMNT Glucagon (HS) 29
APLEPVYPGDNATPEQMAQYAADLRRYINMLTRPRY Pancreatic Hormone  (HS) 36
KCNTATCATQRLANFLVHSSNNFGAILSSTNVGSNTY Islet amyloid polypeptide (HS) 37
CTPGSRKYDGCNWCTCSSGGAWICTLKYCPPSSGGGLTFA Serine protease inhibitor 3 (SG) 40
DDGLCYEGTNCGKVGKYCCSPIGKYCVCYDSKAICNKNCT Pollen allergen Amb t 5 (AT) 40
VGIGGGGGGGGGGSCGGQGGGCGGCSNGCSGGNGGSGGSGSHI Microcin B17 (EC) 43
TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN Crambin (CA) 46
ATYNGKCYKKDNICKYKAQSGKTAICKCYVKKCPRDGAKCEFDSYKGKCYC Antifungal protein (AG) 51
GIVEQCCTSICSLYQLENYCN-FVNQHLCGSHLVEALYLVCGERGFFYTPKT Insulin A-B chains (HS) 51
DIPEVVVSLAWDESLAPKHPGSRKNMACYCRIPACIAGERRYGTCIYQGRLWAFCC Neutrophil defensin 1 (HS) 56
CSSNAKIDQLSSDVQTLNAKVDQLSNDVNAMRSDVQAAKDDAARANQRLDNMATKYRK Major outer membrane lipoprotein (EC) 58
RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA Pancreatic trypsin inhibitor (BT) 58
EEYVGLSANQCAVPAKDRVDCGYPHVTPKECNNRGCCFDSRIPGVPWCFKPLQEAECTF Trefoil factor 3 (HS) 59
MDPNCSCAAGDSCTCAGSCKCKECKCTSCKKSCCSCCPVGCAKCAQGCICKGASDKCSCCA Metallothionein (HS) 61
IRCFITPDITSKDCPNGHVCYTKTWCDAFCSIRGKRVDLGCAATCPTVKTGVDIQCCSTDNCNPFPTRKRP Long neurotoxin 1 (NK) 71

*Mature form; **#: number of amino acids. Source: http://www.uniprot.org

This is only a partial listing of the simplest proteins. There are about a million known proteins, many of them extremely complex. More information on protein structures is available at http://www.uniprot.org and http://www.ncbi.nlm.nih.gov/.
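The sequences in the table use the standard one-letter amino acid abbreviations. As a reading aid, the Python sketch below expands a sequence into three-letter abbreviations and counts its residues; the names `ONE_TO_THREE` and `expand` are hypothetical, and the dictionary covers only the 20 standard amino acids.

```python
# Standard one-letter to three-letter amino acid abbreviations.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def expand(sequence: str) -> str:
    """Spell out a one-letter sequence as hyphen-separated three-letter codes."""
    return "-".join(ONE_TO_THREE[residue] for residue in sequence)


met_enkephalin = "YGGFM"   # first row of the table above
oxytocin = "CYIQNCPLG"     # fourth row of the table above

print(expand(met_enkephalin), len(met_enkephalin))
# Tyr-Gly-Gly-Phe-Met 5
print(expand(oxytocin), len(oxytocin))
# Cys-Tyr-Ile-Gln-Asn-Cys-Pro-Leu-Gly 9
```

The residue counts printed here match the last column of the table for those two rows.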

Both ASCII and DNA are formal communication systems according to Shannon’s model because they encode and decode messages using a system of symbols. DNA is not like a communication system, or analogous to a communication system; it is formally defined as a communication system.

“Information, transcription, translation, code, redundancy, synonymous, messenger, editing, and proofreading are all appropriate terms in biology. They take their meaning from information theory (Shannon, 1948) and are not synonyms, metaphors, or analogies.” (Hubert P. Yockey, Information Theory, Evolution, and the Origin of Life, Cambridge University Press, 2005).

Similar tables are easily made for other codes and communication systems, like HTML, bar codes, postal codes, Morse code, computer file formats and programming languages.

Miller-Urey Experiment and the Origin of Life

The 1953 “Miller-Urey” experiment*** produced organic compounds from gases thought to be present in Earth’s early atmosphere. It is widely cited in textbooks as an explanation of how early life was formed in the ocean.

This experiment only attempted to explain where a handful of the chemicals came from, and it certainly didn’t begin to explain how replication got started. Still, it provided useful insights.

If the Miller-Urey experiment had produced encoding, decoding, and information transmission as defined here, it would most certainly qualify as meeting this challenge.

Public recognition will be awarded to the first person who demonstrates a naturally occurring communication system that meets the engineering specification outlined in this document. Submissions must be identical in format to the above examples of ASCII and DNA. Submissions must include a definition of all symbols, alphabet and the associated encoding / decoding tables.

Download the application form and instructions for submission here.

All submissions, along with our evaluations of those submissions, are available in their entirety for public review at the following page:

http://www.cosmicfingerprints.com/submissions/

***Miller, Stanley L. (May 1953). “Production of Amino Acids Under Possible Primitive Earth Conditions” Science 117 (3046): 528.
