Cut-offs:

Cut-offs for core and matrix similarity: The matrix similarity is a score that describes the quality of a match between a matrix and an arbitrary part of the input sequence. Analogously, the core similarity denotes the quality of a match between the core sequence of a matrix (i.e. the five most conserved positions within the matrix) and a part of the input sequence. A match has to contain the "core sequence" of a matrix, i.e. the core sequence has to match with a score higher than or equal to the core similarity cut-off. In addition, only those matches that score higher than or equal to the matrix similarity cut-off appear in the output.
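The two-stage filter described above can be sketched as follows. This is only an illustration, not the Match™ implementation: the `Candidate` record, the `filter_matches` function and the example scores are hypothetical, and the similarity scores are assumed to have been computed already.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate site with its two precomputed scores (hypothetical record)."""
    position: int
    core_similarity: float
    matrix_similarity: float


def filter_matches(candidates, core_cutoff, matrix_cutoff):
    """Keep candidates whose scores are >= both cut-offs."""
    return [
        c for c in candidates
        if c.core_similarity >= core_cutoff
        and c.matrix_similarity >= matrix_cutoff
    ]


candidates = [
    Candidate(12, 1.00, 0.93),  # passes both cut-offs
    Candidate(40, 0.70, 0.95),  # rejected: core similarity too low
    Candidate(77, 0.90, 0.78),  # rejected: matrix similarity too low
]
kept = filter_matches(candidates, core_cutoff=0.85, matrix_cutoff=0.80)
print([c.position for c in kept])  # [12]
```

Note that a score exactly equal to a cut-off still counts as a match, in line with "higher than or equal to".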

Cut-off to minimize false positive matches (minFP):
To estimate this cut-off, which reduces the number of random sites found by Match™, we applied the algorithm described above to second exon and third exon sequences, because these sequences are presumed to contain no biologically relevant TF binding sites. For every matrix, the lowest cut-off for which no match is found in the set of exon sequences is taken as the minFP cut-off.
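As a minimal sketch of this estimate, assume the matrix similarity scores of all candidate sites in the exon set are already available for one matrix. The function name `min_fp_cutoff`, the example scores and the scan over a grid of hundredths are assumptions made for illustration; the resolution actually used is not stated here.

```python
def min_fp_cutoff(exon_scores):
    """Return the lowest cut-off (in steps of 0.01) at which no exon site matches.

    A site matches when its score is >= the cut-off, so the cut-off must be
    strictly greater than every exon score. Returns None if no cut-off up to
    1.00 excludes all exon sites.
    """
    for k in range(0, 101):
        cutoff = k / 100
        if all(score < cutoff for score in exon_scores):
            return cutoff
    return None


# Hypothetical matrix similarity scores observed in exon sequences
exon_scores = [0.62, 0.871, 0.80]
print(min_fp_cutoff(exon_scores))  # 0.88
```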

When a minFP cut-off is applied to a DNA sequence search, the algorithm finds a relatively low number of matches per nucleotide. The output contains only putative sites with a good similarity to the weight matrix; however, some known genomic binding sites may not be recognized. This kind of cut-off is useful, for example, when searching for the most promising potential binding sites in extended genomic DNA sequences.

Cut-off to minimize false negative matches (minFN):
To estimate the cut-offs that minimize the false negative rate, we used sets of generated oligonucleotides, using the actual weight matrices to calculate the probability of a nucleotide occurring at a certain position of a binding site.
For each matrix we applied the Match™ algorithm to these test sequence sets without using any cut-offs. We then set the cut-off to a value at which at least 90% of the oligonucleotides are recognized, i.e. we tolerate a false negative rate of 10%. We call this set of cut-offs minFN cut-offs.
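The procedure can be sketched in two steps: drawing oligonucleotides position by position from a matrix, and choosing the cut-off from their scores. The sketch below is an illustration under assumptions: the count matrix, the function names `sample_oligo` and `fn_cutoff`, and the example scores are hypothetical, and the scoring step itself (the Match™ algorithm) is not reproduced.

```python
import random

BASES = "ACGT"


def sample_oligo(count_matrix, rng):
    """Draw one oligonucleotide; each position is sampled in proportion to its counts."""
    return "".join(
        rng.choices(BASES, weights=[column[base] for base in BASES])[0]
        for column in count_matrix
    )


def fn_cutoff(scores, recognized_percent=90):
    """Highest cut-off at which at least `recognized_percent` of scores are >= cut-off."""
    if not scores:
        raise ValueError("at least one score is required")
    ranked = sorted(scores, reverse=True)
    # Integer ceiling of recognized_percent * n / 100 (avoids float rounding)
    needed = (recognized_percent * len(ranked) + 99) // 100
    return ranked[needed - 1]


# Hypothetical 3-position count matrix (one dict of base counts per position)
matrix = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 2, "C": 6, "G": 1, "T": 1},
    {"A": 0, "C": 0, "G": 9, "T": 1},
]
oligo = sample_oligo(matrix, random.Random(0))

# Hypothetical matrix similarity scores of ten generated oligonucleotides
scores = [0.99, 0.97, 0.95, 0.93, 0.91, 0.90, 0.88, 0.86, 0.84, 0.60]
print(fn_cutoff(scores))  # 0.84: nine of the ten oligonucleotides score >= 0.84
```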

Applying the minFN cut-offs, the user will find most genomic binding sites, but in this case a high rate of false positives should be taken into account as well. The minFN cut-offs are useful for the detailed analysis of relatively short DNA fragments.

Cut-off to minimize the sum of both error rates (minSum):
We compute the sum of both error rates to find cut-offs that give the best trade-off between false positives and false negatives. To do so, we compute for each matrix the number of matches found in the exon sequences using a cut-off that allows 10% false negative matches (minFN10). This number is defined as 100% false positives. The sum of the corresponding percentages of false positives and false negatives is then computed for every cut-off ranging from minFN10 to minFP. We refer to the cut-off that gives the minimum sum as the minSum cut-off.
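A minimal sketch of this computation for a single matrix is given below. It assumes that the scores of the exon sites and of the generated oligonucleotides are already available and that the candidate cut-offs between minFN10 and minFP are passed in explicitly; the function name `min_sum_cutoff` and all numbers are hypothetical.

```python
def count_hits(scores, cutoff):
    """Number of scores that match at the given cut-off (score >= cut-off)."""
    return sum(score >= cutoff for score in scores)


def min_sum_cutoff(exon_scores, oligo_scores, candidates):
    """Return (cut-off, summed error percentage) with the smallest sum.

    `candidates` are ascending cut-offs; the first one is taken as minFN10,
    and the exon matches found at that cut-off are defined as 100% false positives.
    """
    fp_reference = count_hits(exon_scores, candidates[0])
    best_cutoff, best_sum = None, None
    for cutoff in candidates:
        fp_percent = (
            100.0 * count_hits(exon_scores, cutoff) / fp_reference
            if fp_reference else 0.0
        )
        missed = len(oligo_scores) - count_hits(oligo_scores, cutoff)
        fn_percent = 100.0 * missed / len(oligo_scores)
        total = fp_percent + fn_percent
        if best_sum is None or total < best_sum:
            best_cutoff, best_sum = cutoff, total
    return best_cutoff, best_sum


# Hypothetical scores for one matrix
exon_scores = [0.85, 0.85, 0.86, 0.86, 0.93]
oligo_scores = [0.99, 0.97, 0.95, 0.93, 0.91, 0.90, 0.88, 0.86, 0.84, 0.60]
candidates = [0.84, 0.87, 0.90, 0.94]  # from minFN10 (0.84) up to minFP (0.94)

result = min_sum_cutoff(exon_scores, oligo_scores, candidates)
print(result)  # (0.87, 50.0): 20% false positives + 30% false negatives
```

In this example the sums for the four candidate cut-offs are 110, 50, 60 and 70, so the cut-off 0.87 is selected.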
 

FN10:
This cut-off allows a false negative rate of 10%. Please keep in mind that minFN and FN10 are identical. The number of false positive matches for this cut-off is given in brackets. The false positive rate was estimated on exon 3 sequences, while sets of generated oligonucleotides were used to calculate the false negative rate.

FN30:
This cut-off allows a false negative rate of 30%. The number of false positive matches for this cut-off is given in brackets. The false positive rate was estimated on exon 3 sequences, while sets of generated oligonucleotides were used to calculate the false negative rate.

FN50:
This cut-off allows a false negative rate of 50%. The false positive rate was estimated on exon 3 sequences, while sets of generated oligonucleotides were used to calculate the false negative rate.

FN70:
This cut-off allows a false negative rate of 70%. The number of false positive matches for this cut-off is given in brackets. The false positive rate was estimated on exon 3 sequences, while sets of generated oligonucleotides were used to calculate the false negative rate.

FN90:
This cut-off allows a false negative rate of 90%. The number of false positive matches for this cut-off is given in brackets. The false positive rate was estimated on exon 3 sequences, while sets of generated oligonucleotides were used to calculate the false negative rate.
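To illustrate how such a series of cut-offs relates to the bracketed false positive counts, the sketch below derives one cut-off per allowed false negative rate and counts the exon matches at that cut-off. The function name `cutoff_for_fn_rate` and all scores are hypothetical, and the output layout is an assumption for illustration only.

```python
def cutoff_for_fn_rate(oligo_scores, fn_percent):
    """Highest cut-off that misses at most `fn_percent` of the oligonucleotides.

    Intended for fn_percent below 100.
    """
    ranked = sorted(oligo_scores, reverse=True)
    allowed_misses = fn_percent * len(ranked) // 100
    return ranked[len(ranked) - allowed_misses - 1]


# Hypothetical scores for one matrix
oligo_scores = [0.99, 0.97, 0.95, 0.93, 0.91, 0.90, 0.88, 0.86, 0.84, 0.60]
exon_scores = [0.85, 0.85, 0.86, 0.86, 0.93]

table = []
for fn_percent in (10, 30, 50, 70, 90):
    cutoff = cutoff_for_fn_rate(oligo_scores, fn_percent)
    false_positives = sum(score >= cutoff for score in exon_scores)
    table.append((fn_percent, cutoff, false_positives))
    # Cut-off followed by the number of false positive matches in brackets
    print(f"{fn_percent}% false negatives: {cutoff:.2f} ({false_positives})")
```

As expected, the number of false positive matches falls as the allowed false negative rate rises.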
 

Current Cut-offs:
You can enter your own cut-offs in the field "current". If you are editing an existing profile, this field shows the cut-offs that are currently stored in the profile.

Profile:
We use the term "profile" for a specific subset of weight matrices from the TRANSFAC® library with core similarity cut-off values and matrix similarity cut-off values for each matrix.

Matrix Quality: The high quality criterion denotes the following:
When a matrix is used with a cut-off that allows a false negative rate of 50%, the frequency of matches found in exon 3 sequences (false positive rate) must drop below a certain threshold. This threshold is chosen so that the matrices which produce the highest number of false positive matches are classified as low quality matrices (about 30% of the TRANSFAC® matrices).