Team:MoWestern Davidson/project

Understanding SAT: a Lock and Door Analogy
A member of the NP-complete family (the most challenging of the non-deterministic polynomial time problems), the satisfiability (SAT) problem can be compared to an analogy of locks and keys. Imagine a hallway of 4 doors, each with one lock.



Every lock on each door can be opened by one of two different keys, represented by the upper case and lower case letters on each door - so the first lock can be opened by either the "G" (first green key) or the "B" (first blue key). As long as you have at least one of the keys for a particular lock, you can open the door. A person with three sets of keys, blue (B, b), green (G, g), and red (R, r), wants to find the combination of one blue key, one green  key, and one red key that will open all four of the doors. The graphic below illustrates three of the person's eight possible three-key combinations, and out of this set, only the first key ring can open all four doors. The first key ring is said to "satisfy" the door problem.

The Real Math
 Descriptions in parentheses are reference to the "lock and door" analogy. 

Literal=(key hole)= half of a variable; example (a)

Variable=(color)= both literals from a single variable; example (a,a')

Clause=(door)=a group of literals joined by OR; example (a OR b)

SAT Problem=(hallway of doors)= clauses joined together by AND; example (a or b) AND (a' OR b')

SAT

A SAT problem is made by joining a certain number of clauses (doors) together with the logical operator "AND" (AND implies that you have to satisfy (open) all of the clauses (doors) to find a solution to the problem). Each clause contains literals (key holes) joined together by the logical operator "OR" (here OR implies that you need at least one literal (keyhole) for the clause (door) to be satisfied (opened)) and these literals (key holes) are selected from the number of variables (the different key colors) being used. So translating the janitor's problem from above into mathematical terms would yield:

"(a OR b) AND (a OR b') AND (a OR c') AND (a' or c)"

Where variable (a,a' ) replaces the blue keys, (b,b' ) replaces the green keys and (c,c' ) replaces the red keys.

In mathematical terms, this problem is a 3-variable (3 colors), 2-SAT (2 key holes), 4 clause (4 doors) SAT problem. Keep in mind that you can change any of these parameters by adding more variables (colors), more clauses (more doors), and you can make the problem 3-SAT, 4-SAT, etc (3, 4, etc. key holes per door).

MAX SAT

A related problem, MAX SATuses the same terminology, but instead of wanting to be able to satisfy (open) all of the clauses (doors) in an all or none fashion, MAX SAT problems ask what is the greatest number of clauses (doors) that can be satisfied (opened). We studied a modified MAX SAT problem because we were interested in looking at all patterns of satisfaction, recording when one clause (door) was satisfied, 2 clauses, 3 clauses etc. (not just the maximum number satisfied (opened).

For this project we examined methods for solving both our modified MAX SAT problem and regular SAT

The Math in E. coli
Our team translated the MAX SAT NP-complete problem into a modified biological process. In cells, the process of transcription codes DNA into mRNA, and then the mRNA is translated into a protein that expresses a function. Just as an English sentence requires a specific grouping of letters to form words, DNA must be read in 3-nucleotide groupings called codons to continue successfully the process of protein production. Transfer RNAs (tRNAs) are RNA molecules with 3 nucleotide anticodon loops that match specifically to codon sequences and supply the amino acids according to those sequences. Amino acids are required to construct all proteins. If nucleotides are inserted into or deleted from an mRNA, the reading frame is shifted, causing a frameshift mutation. Frameshift mutations cause tRNAs to supply the wrong amino acids in the protein that produces a different function than the original function.

With this biological background, the connection can be made between the “door and key” analogy and the biological interpretation. Our “doors” are engineered reporter gene DNA sequences including the 5 nucleotide frameshift mutation “keyhole”, resulting in our frameshift leaders (FSL) (see figure below). The “keys” that may or may not open the door are engineered 5 nucleotide anticodon suppressor tRNAs. Opening the door, thus solving the MAX SAT problem, would allow the cell to communicate its outcome in the expression of the reporter gene (fluorescence/antibiotic resistance). The idea of using suppressor tRNAs to bypass designed mutation to solve a SAT problem is an extension of earlier work using suppressor logic from papers by Thomas J. Magliery, J. Christopher Anderson and Peter G. Schultz (Magliery et al. & Anderson et al.). These two publications used 2,4,5, and 6 nucleotide frameshift mutations. Their research demonstrated 4 and 5 base codon suppression was biologically feasible in their search to expand the genetic code. The difference in the original amino acid and the modified 5mer amino acid is the addition of the 2 nucleotides and the insertion of the amino acid serine.

Diagram of 2-SAT problem with mRNA encoding RFP and the mRNA contains two clauses, each with two 5mer frame shift mutations. Only one of the two 5mers can be suppressed and permit translation of RFP.

History of SAT
SAT was the first problem proven to be NP-complete in 1971 by Steve Cook, and with this proof came the concept of NP-completeness. The Soviet computer scientist Leonid Levin independently arrived at a similar proof at around the same time. In 1972, Dick Karp published a list of NP-complete problems, all linked to SAT. (3, 4)

Significance of SAT

 * Computer aided design applications
 * Computer circuit design
 * Logic synthesis
 * Automatic test pattern generation
 * Internet search algorithms