I have a PhD in a related field and I can't understand exactly what is being said here. From what I can tell, the author claims to have engineered a protein whose sequence maps (through a chosen translation table) to a human-readable text. But at the same time, the protein folds into a well-defined shape (predicted, then experimentally determined), and somehow also enciphers... another poem?
You've got the right idea. The "poem" ("any style of life / is prim...") is encoded as a DNA sequence. This DNA codes for a protein, whose amino acids can be read as English text as well ("the faery is rosy / of glow..."), and which causes the bacterium to glow red. Watts mentions this work in his book Echopraxia as follows:
"The sequence spells a message and codes for a protein. The protein fluoresces and contains a response. It’s not contamination or lateral transfer. It’s a poem."
There's a more detailed explanation in this interview with Bök:
I honestly can't tell if this is truly clever, like in the way a skilled poet can combine vocabulary and meter constraints to generate wonderful phrases (Kubla Khan poem being a nice example), or just a mechanical process. I am also confused how he managed to engineer and predict the folding and functionality of a fluorescent protein (presumably by borrowing a known sequence?). Ultimately, I see this more as an incomplete quine, or something similar to, but not identical to, a quine.
> I honestly can't tell if this is truly clever, like in the way a skilled poet can combine vocabulary and meter constraints to generate wonderful phrases (Kubla Khan poem being a nice example), or just a mechanical process.
I can't recall if Bök gave details about his methodology, but my guess is that he brute-forced ciphers until he found a suitable one. And he’s quite good at constraint-based (Oulipo) poetry.
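If the guess is right, a brute-force search might look roughly like this minimal sketch. It assumes the cipher is a reciprocal pairing of the 26 letters and scores each candidate by how many vocabulary words encipher to other vocabulary words. All names are hypothetical, and the toy vocabulary is just the words from the two fragments quoted earlier in the thread. This illustrates the guess, not Bök's documented method.

```python
import random
import string


def from_pairs(pairs: str) -> dict[str, str]:
    """Build a reciprocal cipher from space-separated letter pairs, e.g. "at nh"."""
    cipher: dict[str, str] = {}
    for a, b in pairs.split():
        cipher[a], cipher[b] = b, a
    return cipher


def random_involution(rng: random.Random) -> dict[str, str]:
    """Pair up the 26 letters at random; each letter enciphers to its partner."""
    letters = list(string.ascii_lowercase)
    rng.shuffle(letters)
    return from_pairs(" ".join(a + b for a, b in zip(letters[::2], letters[1::2])))


def score(cipher: dict[str, str], words: set[str]) -> int:
    """Count words whose enciphered form is also in the vocabulary."""
    return sum("".join(cipher[ch] for ch in w) in words for w in words)


def brute_force(words: set[str], tries: int, seed: int = 0) -> tuple[dict[str, str], int]:
    """Sample `tries` random pairings and keep the best-scoring one."""
    rng = random.Random(seed)
    candidates = (random_involution(rng) for _ in range(tries))
    best = max(candidates, key=lambda c: score(c, words))
    return best, score(best, words)


# Toy vocabulary from the quoted fragments; a real search would use a full dictionary.
WORDS = {"any", "the", "style", "faery", "of", "is", "life", "rosy", "prim", "glow"}

# A pairing consistent with those fragments; the last five pairs are arbitrary fillers.
KNOWN = from_pairs("at nh ey fs lr io gp mw bc dj kq uv xz")
print(score(KNOWN, WORDS))  # 10

best, best_score = brute_force(WORDS, tries=1000)
print(best_score)
```

Random sampling rather than exhaustive enumeration is deliberate: there are 25!! (about 7.9 × 10^12) ways to pair up 26 letters, so a practical search would sample, or fix some pairs by hand and search the rest.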
> I am also confused how he managed to engineer and predict the folding and functionality of a fluorescent protein (presumably by borrowing a known sequence?).
The designed protein (Protein 13) is not fluorescent. He’s expressing it as a Protein 13-mCherry[1] fusion construct; mCherry is a red fluorescent protein, which is where the red glow comes from.
There is a bit more flexibility here than a 1:1 mapping would allow, since there are more codons (64) than encoded amino acids (20). For example, CUU and CUC both code for leucine, so they could be different characters on the DNA side while both mapping to the same character on the protein side.
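A toy illustration of that slack: two synonymous leucine codons get different letters on the DNA side but necessarily the same letter on the protein side. Only the codon-to-amino-acid entries are real (standard genetic code); both letter ciphers are hypothetical and chosen arbitrarily.

```python
# Real entries from the standard genetic code (RNA alphabet): CUU and CUC are both leucine.
GENETIC_CODE = {"CUU": "L", "CUC": "L", "GGA": "G"}

# Hypothetical ciphers, for illustration only.
DNA_CIPHER = {"CUU": "e", "CUC": "t", "GGA": "a"}  # codon -> letter of the DNA-side text
PROTEIN_CIPHER = {"L": "o", "G": "n"}              # amino acid -> letter of the protein-side text


def read_both(codons: list[str]) -> tuple[str, str]:
    """Read the same codon sequence as DNA-side text and as protein-side text."""
    dna_text = "".join(DNA_CIPHER[c] for c in codons)
    protein_text = "".join(PROTEIN_CIPHER[GENETIC_CODE[c]] for c in codons)
    return dna_text, protein_text


print(read_both(["CUU", "GGA", "CUC"]))  # ('eat', 'ono')
```

The DNA-side text can distinguish CUU from CUC ("e" vs "t"); the protein-side text cannot, since both become leucine before the second cipher is applied.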
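A plausible alternative would be to keep the codons (or amino acids) encoding the other half, but have pairs of nucleotides encode the poem: since a codon is three nucleotides and a pair is two, the same sequence would yield a poem 1.5 times as long. This would restrict you to 16 different characters (one per possible nucleotide pair), vs. 64 possible codons (minus the three stop codons of the standard code).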
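The arithmetic can be checked directly: reading the same sequence in non-overlapping pairs instead of triplets gives 1.5 times as many symbols from a 16-symbol alphabet, vs. 61 sense codons. The 12-nucleotide sequence is arbitrary, and `chunks` is a hypothetical helper.

```python
from itertools import product

BASES = "ACGU"
STOP_CODONS = {"UAA", "UAG", "UGA"}  # stop codons of the standard genetic code


def chunks(seq: str, k: int) -> list[str]:
    """Split seq into consecutive non-overlapping k-mers, dropping any leftover tail."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]


n_pair_symbols = len(list(product(BASES, repeat=2)))  # 4**2 = 16
n_codons = len(list(product(BASES, repeat=3)))        # 4**3 = 64
n_sense_codons = n_codons - len(STOP_CODONS)          # 61

mrna = "AUGCUUCUCGGA"            # arbitrary 12-nt example
codon_reading = chunks(mrna, 3)  # ['AUG', 'CUU', 'CUC', 'GGA'] -> 4 characters
pair_reading = chunks(mrna, 2)   # ['AU', 'GC', 'UU', 'CU', 'CG', 'GA'] -> 6 characters

print(n_pair_symbols, n_sense_codons, len(pair_reading) / len(codon_reading))  # 16 61 1.5
```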
Known life uses 20 standard amino acids (22 counting selenocysteine and pyrrolysine), so that already restricts you to an alphabet slightly smaller than the 26 letters of English.
In this particular case, codons coding for the same amino acid can be treated as synonymous, which, as you mentioned, restricts the possible protein-side mappings to the ~20 proteinogenic amino acids.
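The collapse from 64 codons to 20 protein-side symbols can be checked against the standard genetic code (NCBI translation table 1). This sketch hard-codes that table as the usual 64-character string in UCAG order ("*" marks stop); the variable names are my own.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order (first base slowest); "*" = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# How many codons are synonymous for each amino acid (stops excluded).
synonyms = Counter(aa for aa in CODE.values() if aa != "*")

print(len(CODE), len(synonyms), sum(synonyms.values()))  # 64 20 61
print(synonyms["L"], CODE["CUU"], CODE["CUC"])           # 6 L L
```

So at most 20 distinct symbols survive on the protein side, with leucine, serine and arginine each absorbing six codons.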
Another possibility for expansion would be to take advantage of the genetic code’s degeneracy/redundancy and reprogram it to allow non-canonical amino acids in certain synonymous codons.
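At the level of the lookup table, that reprogramming amounts to pointing one synonymous codon at a new symbol, which grows the protein-side alphabet from 20 to 21. In this minimal sketch, "X" stands for a hypothetical non-canonical amino acid, CUA is an arbitrarily chosen leucine codon, and `reassign` is a hypothetical helper.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI translation table 1) in UCAG order; "*" = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reassign(code: dict[str, str], codon: str, new_symbol: str) -> dict[str, str]:
    """Return a copy of `code` with one codon pointing at a new amino acid symbol.

    Refuses to take away the last codon of an amino acid, since that would
    remove it from the alphabet instead of adding a symbol.
    """
    old = code[codon]
    if sum(aa == old for aa in code.values()) < 2:
        raise ValueError(f"{codon} is the only codon for {old}")
    recoded = dict(code)
    recoded[codon] = new_symbol
    return recoded


def alphabet(code: dict[str, str]) -> set[str]:
    """Amino acid symbols available on the protein side (stops excluded)."""
    return {aa for aa in code.values() if aa != "*"}


recoded = reassign(STANDARD, "CUA", "X")  # hypothetical: CUA now reads as "X"
print(len(alphabet(STANDARD)), len(alphabet(recoded)))  # 20 21
```

This only models the table. Making a cell actually read a codon this way is the hard part in practice: it typically needs an orthogonal tRNA/aminoacyl-tRNA synthetase pair, and the codon has to be freed from its natural decoding.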