How to use PatchTM
 
 
 
 
Introduction
 
The PatchTM tool is designed to search for potential transcription factor (TF) binding sites in any sequence of interest. The patterns that PatchTM searches for are the TF binding sites of the TRANSFAC® Professional database and the consensus sequences of the weight matrices of TRANSFAC® Professional.

These patterns consist of IUPAC code characters:


code description
A Adenine
C Cytosine
G Guanine
T Thymine
U Uracil
R Purine (A or G)
Y Pyrimidine (C, T, or U)
M C or A
K T, U, or G
W T, U, or A
S C or G
B C, T, U, or G (not A)
D A, T, U, or G (not C)
H A, T, U, or C (not G)
V A, C, or G (not T, not U)
N Any base (A, C, G, T, or U)
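
The code table above can be written as a small lookup. The following Python sketch is illustrative only and is not taken from PatchTM: the names IUPAC_CODES and code_matches are hypothetical, and treating U as equivalent to T is an assumption made to keep the example short.

```python
# IUPAC nucleotide codes and the bases each one stands for.
# Assumption for this sketch: U is treated as equivalent to T.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def code_matches(pattern_code, base):
    """Return True if `base` (A, C, G, T or U) is allowed by `pattern_code`."""
    concrete = IUPAC_CODES[base.upper()]
    if len(concrete) != 1:
        raise ValueError("base must be one of A, C, G, T or U")
    return concrete in IUPAC_CODES[pattern_code.upper()]


print(code_matches("R", "G"))  # True: R stands for A or G
print(code_matches("R", "C"))  # False
```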




Viewing and deleting user files (the box at the top of the page)
 
Viewing or deleting previous results
The results of each search you perform with PatchTM are stored in your user directory. You can view or delete these results at the top of the page.
 
Deleting a previously stored sequence
Each sequence you enter into the PatchTM form is stored (see also below). You can delete sequences you no longer want to keep at the top of the page.
 
Viewing or deleting a user defined set of sites
If you have searched for TF binding sites in TRANSFAC® Professional, you have the option to start a PatchTM search for exactly these binding sites. For this purpose, your selected set of sites is stored.
Such sets of sites can be found in the list "user defined set of sites" at the top of the page. You can delete sets you no longer need, or view them, by pressing the buttons Delete or View. If you choose to view a set, a new page is displayed that lists all sites included in this set. This page also lets you specify a new name for the set.



Starting a new search
 
1. Enter a name for your search
You should first enter a name for your search, since PatchTM will store your search result under that name. If you do not enter a name, PatchTM uses "default" as result name.
 
2. Select a sequence
You have three options for selecting a sequence you would like to search:
  a) Select one of your stored sequences:
If you select this option, you can choose among the sequences you have entered for a previous search.

  b) Select an example:
If you choose this option, an example sequence will be used for your search. It is the 5' flank of the Rat tyrosine aminotransferase (TAT) gene (EMBL: M34257).

  c) Enter a new sequence:
To run the search with a new sequence, you should first enter a name for it. The sequence will be stored under that name so that you can use it again for a later search. Next, you can insert your sequence.
The following formats are accepted: FASTA, TRANSFAC, EMBL, GenBank, IG, and RAW. (RAW format means the pure sequence.) Examples of each format are given below. The IUPAC code characters 'B', 'D', 'H', 'K', 'M', 'R', 'S', 'V', 'W', 'Y' within a sequence are changed to 'N'. You can enter one or several sequences at a time, provided that all of them use the same format. The only exception is RAW format, in which only one sequence can be entered at a time.
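
The replacement of ambiguity codes described above can be pictured with a few lines of Python. This is only a sketch of the stated rule, not PatchTM's own code: the function name mask_ambiguity_codes is hypothetical, and preserving the letter case of the input is an assumption.

```python
# The ten ambiguity codes that, according to the text, are changed to N.
AMBIGUITY_CODES = set("BDHKMRSVWY")


def mask_ambiguity_codes(sequence):
    """Replace every ambiguity code with N, keeping the original letter case."""
    masked = []
    for ch in sequence:
        if ch.upper() in AMBIGUITY_CODES:
            # Assumption: lowercase input stays lowercase.
            masked.append("N" if ch.isupper() else "n")
        else:
            masked.append(ch)
    return "".join(masked)


print(mask_ambiguity_codes("ACGTRYKM"))  # ACGTNNNN
print(mask_ambiguity_codes("acgtwsn"))   # acgtnnn
```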

 
  RAW format:
  (all newlines and other whitespace will be ignored)
 
acacgtagctagctagctgatcgtagctagtcgatcgtagctagctagctgatcgatgctagctgatcgtagctagtcgatag
tctagctagctagtcgatcgtagctagtcgatgctagctagctgtgtgtagctagtcgatcgatgctagctgatcgatcgtaa
gtctgatctagctagctagcgatcgtagctgatcgtagctagcatgctagtcgatgca
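
Because newlines and other whitespace are ignored in RAW format, a multi-line entry such as the one above is equivalent to a single continuous string. A minimal Python sketch of that clean-up step follows; the helper name clean_raw_sequence is hypothetical.

```python
def clean_raw_sequence(text):
    """Join a RAW-format sequence into one string, dropping all whitespace."""
    # str.split() without arguments splits on any run of whitespace,
    # including spaces, tabs and newlines.
    return "".join(text.split())


raw_text = """acacgtagct agctag
ctgatcg
"""
print(clean_raw_sequence(raw_text))  # acacgtagctagctagctgatcg
```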


  FASTA format:
 
>seq1
acagctagctacgatgatcgatcgatgctacgtcgtagtacgatcgtacg
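
Since several sequences can be entered at once in FASTA format, it may help to see how such input splits into named records. The Python sketch below is a generic FASTA reader, not PatchTM's parser; the function name parse_fasta is hypothetical, and using the whole header line after '>' as the sequence name is an assumption.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


example = ">seq1\nacag\nctag\n>seq2\nttga\n"
print(parse_fasta(example))  # [('seq1', 'acagctag'), ('seq2', 'ttga')]
```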


  TRANSFAC format:
  (Only the fields essentially needed to recognize an entry in TRANSFAC format are
  shown. More fields may be included.)
 
AC  R00106
XX
ID  MOUSE$AAMY_02
XX
SQ  CTCCATGGGAGTTTCTGAAGAACCTTCAGCTGTGCAC.
XX
//


  EMBL format:
  (Only the fields essentially needed to recognize an entry in EMBL format are shown. More fields may be included.)
 
ID   ZMADH1P standard; DNA; PLN; 360 BP.
XX 
SQ   Sequence 360 BP; 63 A; 92 C; 97 G; 108 T; 0 other;
     ctgcagcccc ggtttcgcaa gccgcgcacg tggtttgctt gcccacaggc ggccaaaccg 60
     caccctcctt cccgtcgttt cccatctctt cctcctttag agctaccact atataaatca 120
     gggctcattt tctcgctcct cacaggctca tctcgctttg gatcgattgg tttcgtaact 180
     ggtgagggac tgagggtctc ggagtggatt gatttgggat tctgttcgaa gatttgcgga 240
     ggggggcaat ggcgaccgcg gggaaggtga tcaagtgcaa aggtccgcct tgtttctcct 300
     ctgtctcttg atctgactaa tcttggttta tgattcgttg agtaattttg gggaaagctt 360
//


  GenBank format:
  (Only the fields essentially needed to recognize GenBank format are shown. You may include more fields.)
 
LOCUS     MZEADH1P 360 bp DNA PLN 13-JUN-1996
ACCESSION K03285
            1 ctgcagcccc ggtttcgcaa gccgcgcacg tggtttgctt gcccacaggc ggccaaaccg
           61 caccctcctt cccgtcgttt cccatctctt cctcctttag agctaccact atataaatca
          121 gggctcattt tctcgctcct cacaggctca tctcgctttg gatcgattgg tttcgtaact
          181 ggtgagggac tgagggtctc ggagtggatt gatttgggat tctgttcgaa gatttgcgga
          241 ggggggcaat ggcgaccgcg gggaaggtga tcaagtgcaa aggtccgcct tgtttctcct
          301 ctgtctcttg atctgactaa tcttggttta tgattcgttg agtaattttg gggaaagctt
//


  IG format:
 
;seq_1
seq_1
acagctagtcgatcgatcgatgctagctgatcgtagctgatcgtagctaacgtgtagctagtcgacgtagctacgg1




3. Select a set of sites to search for
At the top of the left column of the PatchTM interface you can specify which sites you would like to use for your search. You can use either our predefined sets of sites or user defined sets.
To select one of our predefined sets, mark this option and select one or several of the following sets from the list: all sites from TRANSFAC®, consensus sites, virus sites, vertebrate sites, plant sites, fungi sites, nematode sites and insect sites. All these sets of sites include both sequences from the TRANSFAC® site table and consensus sequences of the matrices from the TRANSFAC® matrix table.
If you have created a set of sites using the TRANSFAC® search engine, you will find this set among the user defined sets. The names of these sets are set up in the following way: "month_day_hour-min-sec.sequences"

To create a set of sites with the TRANSFAC® search engine, please follow these steps:
  1. Use the TRANSFAC® query form "SITE SEARCH" to search for specific binding sites in TRANSFAC®. For example, you can enter AP-1 in the text field "Search Term" and then select "Binding Factor" in the "Quick Search Fields". When you press the "Submit Query" button, you will receive a list of AP-1 binding sites. Next to each site entry you will find a box.
  2. Mark the boxes for those entries that you would like to include in a PatchTM search.
  3. Scroll to the bottom of the list. Here you will find a box with the text "Run PatchTM with marked entries". Mark this box as well.
  4. Click on "Show marked entries/Start PATCHTM". PatchTM will then be started and you will find your selection of sites among the user defined set of sites.




4. Additional parameters
  Minimum length
This parameter specifies the minimum length of the sites that are shown in the PatchTM output. With the default value of 4, only sites with a length of at least 4 will appear in the output.

  Maximum number of mismatches
It would be more precise to call this parameter the maximum number of local mismatches, as it specifies how many positions may differ when comparing a binding site (search pattern) with some part of the input sequence. A match between the whole site and the input sequence has been found if the actual number of (local) mismatches is lower than or equal to the maximum number of (local) mismatches.
Please be careful when selecting this value. The minimum length of the sites searched for (search patterns) is restricted to 2*maximal_number_of_mismatches+1. That means, if you set the maximal number of mismatches to 5, PatchTM searches only for sites that are at least 11 bp long. All shorter ones will be ignored. The default value for this parameter is 2.
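
The interplay between the mismatch limit and the length restriction can be sketched in Python. This is an illustration under stated assumptions, not PatchTM's algorithm: all function names are hypothetical, characters are compared literally (no IUPAC ambiguity handling), positions are reported 1-based, and a site of exactly 2*max_mismatches+1 positions is assumed to be still searched.

```python
def minimum_searchable_length(max_mismatches):
    """Shortest site length still searched for a given mismatch limit."""
    return 2 * max_mismatches + 1


def count_mismatches(site, window):
    """Count the positions at which two equal-length strings differ."""
    if len(site) != len(window):
        raise ValueError("site and window must have the same length")
    return sum(1 for a, b in zip(site.upper(), window.upper()) if a != b)


def find_matches(site, sequence, max_mismatches=2):
    """Return (position, mismatches) for each window within the limit."""
    if len(site) < minimum_searchable_length(max_mismatches):
        return []  # site too short for this mismatch setting: ignored
    hits = []
    for start in range(len(sequence) - len(site) + 1):
        window = sequence[start:start + len(site)]
        mismatches = count_mismatches(site, window)
        if mismatches <= max_mismatches:
            hits.append((start + 1, mismatches))  # 1-based position (assumption)
    return hits


print(find_matches("TGACTCA", "ccTGACTCAgg", max_mismatches=1))  # [(3, 0)]
print(find_matches("TGACTCA", "ccTGACTCAgg", max_mismatches=5))  # []
```

In the second call the 7 bp site is skipped entirely, because a limit of 5 mismatches requires sites of at least 11 bp.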

  Mismatch penalty
When comparing a binding site (search pattern) with some part of the input sequence, each mismatching position will receive a mismatch penalty. This penalty value will have a negative influence on the overall score for the match between the whole site (search pattern) and the input sequence. Each matching nucleotide receives a bonus weight of 100. So, the default value for the mismatch penalty is also 100, and the negative influence of a mismatching position corresponds to the positive influence of a matching position. If you reduce the mismatch penalty, you will receive high scoring sites containing mismatches in the Patch output. If you increase this parameter, high scoring sites are not likely to contain mismatches.

  Lower score boundary
The lower score boundary is a cut-off that defines which matches between a site (search pattern) and the input sequence are listed in the output. The score calculated for a match has to be higher than or equal to this cut-off. The default value for the lower score boundary is 87.5.
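
The text does not state how the bonus and penalty values are combined into the overall score. Purely as an illustration, the Python sketch below assumes that they are summed over all positions and divided by the site length. This formula is an assumption and may differ from what PatchTM actually computes, and the names match_score and passes_cutoff are hypothetical.

```python
MATCH_BONUS = 100  # bonus per matching nucleotide, as stated in the text


def match_score(site_length, mismatches, mismatch_penalty=100):
    """Hypothetical score: (bonuses - penalties) averaged over the site length."""
    matches = site_length - mismatches
    return (MATCH_BONUS * matches - mismatch_penalty * mismatches) / site_length


def passes_cutoff(score, lower_score_boundary=87.5):
    """A match is listed if its score is at least the lower score boundary."""
    return score >= lower_score_boundary


print(match_score(8, 0))                  # 100.0
print(match_score(8, 1))                  # 75.0
print(passes_cutoff(match_score(8, 1)))   # False
print(passes_cutoff(match_score(16, 1)))  # True (score is exactly 87.5)
```

Under this assumed formula, lowering the mismatch penalty raises the score of matches that contain mismatches, which is consistent with the behaviour described for the mismatch penalty parameter.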



The results page

On the results page you will find a table of the matches found. The table consists of the following columns:

  identifier: identifier of the site that has been found
(For consensus sites of matrices, the identifier of the respective matrix is given.) Each identifier is linked to the respective TRANSFAC® entry.
  position: position of the match in the input sequence
(+) or (-) means the match was found on the (+)-strand or the (-)-strand, respectively.
  mismatches: number of mismatching positions found within a match between the whole site (search pattern) and the input sequence
  score: score for the match between the whole site (search pattern) and the input sequence
  binding factor: list of factors known to bind to the site that has been found
(Each factor is linked to the respective TRANSFAC® entry.) For some sites no binding factors are given, because it is not yet known which particular factor binds to them. This may be the case, for example, for sites that have been identified experimentally by footprint analysis.
  sequence: sequence of the site found at that position
This sequence was the actual search pattern.
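
By the usual convention, a site on the (-)-strand reads as the reverse complement of the corresponding stretch of the (+)-strand. Whether PatchTM reports (-)-strand matches in exactly this way is not stated here, so the Python sketch below only illustrates the general convention; the function name reverse_complement is hypothetical.

```python
# Complement table for bases and IUPAC ambiguity codes.
# W, S and N are their own complements and are therefore left unchanged.
_COMPLEMENT = str.maketrans(
    "ACGTRYMKBDHVacgtrymkbdhv",
    "TGCAYRKMVHDBtgcayrkmvhdb",
)


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


print(reverse_complement("TGACTCA"))  # TGAGTCA
print(reverse_complement("AAGC"))     # GCTT
```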

You can also view a graphic of your results, in which arrows mark the position of matches. 
The last line of the results page gives the total length of all sequences which have been searched.

A flatfile version of the results can be found in the directory:
"<CGI-BIN-DIRECTORY-OF-WEB-SERVER>/biobase/transfac/<VERSION>/patch/etc/usr/<USER-LOGIN>/"
The flatfiles have the ending ".out".