Dennis, a professor of computer science at New York University, is the author of four puzzle books. He can be contacted at [email protected].
Pulling a card out of the inside pocket of his well-tailored, dark suit, the professor presented it to Ecco. It read Ming Thomas, PhD, protein industrialist. "I've come with a project," Thomas began after greeting us and taking a seat. "In the early days of molecular biology, people assertedwith the authority that only uncertainty could inspirethat every gene generates one protein.
"Now it seems that there are at least a few genes that produce thousands of proteins. Let me explain how.
"A gene is a sequence of DNA, but, in higher organisms, that DNA alternates between strings that in fact produce portions of proteins (called 'exons') and strings that don't (called 'introns'). Thus, a gene sequence has the form E1 I1 E2 I2 E3 I3... where the Es represent exons and the Is represent introns.
"Genes can produce many proteins because any (not necessarily consecutive) subsequence of exons can form a protein. For example, E2 E4 E5 can form a protein as can E1 E2 E7, but E6 E4 E5 cannot because the ordering E6 E4 violates the order of the original exon sequence. E3 E3 E5 cannot form a protein either because an exon at a given position cannot be repeated.
"When manufacturing proteins at industrial scale, we can handle up to seven exons. Our expense is directly related to the total length of those exons. We hope you can minimize our expense.
"Our first client wants us to generate 15 hydrophobic proteins that are alanine heavy. They believe these will act like sticky balls floating on top of water allowing translucent water sculpture. Think Los Angeles swimming pools. We want help designing the exons in order to minimize their size. I know you like warm-ups, so here is one. Suppose we could use only three exons and we wanted to generate the following proteins (where each amino acid is represented by a single letter; for example, Alanine is A):
GA
GAGAS
GAS
RAGA
RAGAS
What would the exons have to be to generate these proteins, trying to minimize the total length of the exons?"
Solution to Warm-Up:
The following three exons could do this, having a total length of seven.
RA
GA
GAS
"Just a minute," Ecco interrupted turning to his 17-year-old niece Liane, who had been listening in. "Liane, isn't the biology here somewhat more complicated?"
"Well, yes, but probably not in an essential way," Liane responded. "DNA doesn't literally consist of amino acids, but rather, an alphabet of 'nucleotides' whose nonoverlapping consecutive triplets are translated to amino acids. So, when Dr. Thomas speaks of minimizing the length of the exons, he formally means minimizing the number of nucleotides. Provided each exon's length is a multiple of three, however, the problems are mathematically identical because minimizing the number of amino acids produced by the exons minimizes the number of nucleotides in the exons themselves."
"I couldn't have explained this better myself," said Thomas visibly impressed.
"For many reasons, we want each exon to generate full amino acids, so each exon's length is in fact a multiple of three. Therefore, we can view each exon as consisting of the amino acid string it generates. Now do you understand the warm-up?"
"Sure," said 11-year-old Tyler.
"The protein RAGAS is generated from the RA and GAS exons, for example. RAGA is generated from the first two exons and GAGAS from the last two. So give us your big challenge."
Ming Thomas chuckled. "May I hire your whole family, Dr. Ecco?"
"We're all confirmed puzzle freaks," Ecco responded with a smile. "Do tell us which proteins you want."
"Here they are," said Thomas. "Remember that you are allowed seven exons and we want to minimize the total length (in amino acids) of those exons:
AGPA
APASAG
APASARAGPA
APASARASA
APASARASAPA
CAAPASAGASAPA
CAAPASARAG
CAAPASARPA
CARAPAPAS
CARAPAPASAGASA
CARAPAPASPA
CARAPASA
RAPAPASAGPA
RAPAPASASAPA
RAPASA
- Can you find an encoding into exons whose total amino acid length is 20 or less?
- Please give it a shot.
Ecco turned to the children after Thomas left: "The longest protein in Dr. Thomas's last problem had a length of only 18. It is therefore conceivable that nine two-amino-acid exons would have been sufficient. Our solution required 11. Could we have done better?"
- What do you think?
Liane and Tyler worked this out.
"Very nice," said Thomas. "That's better than the solution we had thought of. Very nice work.
"Here is a follow-up question: One of our biochemists says he can manipulate up to 11 exons provided each produces two amino acids. In that case, what is the smallest total amino acid length of exons to create the following 15 proteins?
BAPAFADAFACA
BAPAGAPADA
RABAPAGADAFACA
RASA
RASAGAPAFAFACA
RASATABAPAGAPAFACA
RASATABAPAGAPAFAFA
RATAGAPAFADAFA
SABAPAFADACA
SAPADA
SAPAPAFADAFACA
SATABAGAPADAFA
SATABAPAGADAFACA
SATAPAGAPAFA
TABACA
Ecco helped his nephew and niece solve the problem this time. When Thomas saw the solution, he nodded and said, "Excellent. We have a long consulting arrangement ahead of us."
DDJ