|
How to use PatchTM
|
|
Introduction |
The PatchTM tool
searches for potential transcription factor binding sites
(TF binding sites) in any sequence of interest. The patterns
that PatchTM searches with
are the TF binding sites of the TRANSFAC® Professional database and
the consensus sequences of the TRANSFAC® Professional weight matrices.
These patterns consist of IUPAC code characters, which are:
|
code | description |
A | Adenine |
C | Cytosine |
G | Guanine |
T | Thymine |
U | Uracil |
R | Purine (A or G) |
Y | Pyrimidine (C, T, or U) |
M | C or A |
K | T, U, or G |
W | T, U, or A |
S | C or G |
B | C, T, U, or G (not A) |
D | A, T, U, or G (not C) |
H | A, T, U, or C (not G) |
V | A, C, or G (not T, not U) |
N | Any base (A, C, G, T, or U) |
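The code table above can be written down as a simple lookup. The following is a minimal Python sketch, not PatchTM's own implementation; the names IUPAC and code_matches are hypothetical, and treating U as equivalent to T is an assumption made here for DNA sequences.

```python
# IUPAC nucleotide codes mapped to the concrete bases they stand for.
# Assumption: U is treated as T, so every set is written with T only.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def code_matches(code: str, base: str) -> bool:
    """Return True if the concrete base is allowed by the IUPAC code."""
    base = base.upper()
    if base == "U":
        base = "T"
    return base in IUPAC[code.upper()]
```

For example, code_matches("R", "g") is true because R stands for A or G, while code_matches("B", "A") is false because B excludes A.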
|
Viewing
and deleting user files (the box at the top of the page) |
|
Viewing or
deleting previous results |
The results of each search you perform with
PatchTM are stored in your user
directory. You can view or delete these results at the top of the page. |
|
Deleting
a previously stored sequence |
Each sequence you enter into the PatchTM
form is stored (see also below). You can delete the
sequences you no longer want to keep at the top of the page. |
|
Viewing or
deleting a user defined set of sites |
If you have searched for TF binding sites in
TRANSFAC® Professional, you have the option to start a PatchTM
search for exactly these binding sites. For this purpose, your selected set
of sites is stored.
Such sets of sites can be found in the list
"user defined set of sites" at the top of the page. You can delete sets
you no longer need, or view them, by pressing the Delete
or View button. If you choose to view a set, a new page is displayed
that lists all sites included in this set. This page also
lets you specify a new name for the set.
|
Starting
a new search |
|
1. Enter a name
for your search |
You should first enter a name for your search,
since PatchTM stores your search
result under that name. If you do not enter a name, PatchTM
uses "default" as the result name. |
|
2. Select
a sequence |
You have three options for selecting a sequence
you would like to search:
|
|
a) Select one of your stored
sequences:
If you select this option, you can choose among
the sequences you have entered for a previous search.
|
|
b) Select an example:
If you choose this option, an example sequence
will be used for your search. It is the 5' flank of the rat tyrosine aminotransferase (TAT) gene (EMBL: M34257).
|
|
c) Enter a new sequence:
To run the search with a new sequence, you
should first enter a name for it. The sequence will be stored under that
name so that you can use it again in a later search. Next, insert
your sequence.
The following formats are accepted: FASTA,
TRANSFAC, EMBL, GenBank, IG, and RAW. (RAW format means the plain sequence.)
Examples of each format are given below. The IUPAC code characters 'B',
'D', 'H', 'K', 'M', 'R', 'S', 'V', 'W', and 'Y' within a sequence are changed
to 'N'. As long as all sequences use the same format, you can enter one
or several sequences at a time, with one exception: in RAW format it is
only possible to enter one sequence at a time.
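The replacement of ambiguity codes described above can be sketched in a few lines of Python. This is only an illustration of the rule, not PatchTM's code; the function name normalize_sequence is hypothetical, and converting to upper case and stripping whitespace are assumptions made here for convenience.

```python
import re


def normalize_sequence(raw: str) -> str:
    """Strip whitespace and replace IUPAC ambiguity codes with N."""
    # Assumption: whitespace and newlines are dropped, output is upper case.
    seq = re.sub(r"\s+", "", raw).upper()
    # B, D, H, K, M, R, S, V, W and Y are the codes that get changed to N.
    return re.sub(r"[BDHKMRSVWY]", "N", seq)
```

For example, normalize_sequence("acgr\nty w") returns "ACGNTNN".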
|
|
RAW format: (all
newlines and whitespace are ignored) |
|
acacgtagctagctagctgatcgtagctagtcgatcgtagctagctagctgatcgatgctagctgatcgtagctagtcgatag
tctagctagctagtcgatcgtagctagtcgatgctagctagctgtgtgtagctagtcgatcgatgctagctgatcgatcgtaa
gtctgatctagctagctagcgatcgtagctgatcgtagctagcatgctagtcgatgca
|
FASTA format: |
|
>seq1
acagctagctacgatgatcgatcgatgctacgtcgtagtacgatcgtacg
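
If you prepare several sequences in one FASTA input, it can help to check beforehand how the input splits into records. The following Python sketch shows one simple way to read FASTA text; the function name parse_fasta is hypothetical, and using the first word of the header line as the sequence name is an assumption, not a documented PatchTM rule.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into a {name: sequence} dictionary."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Assumption: the name is the first word after '>'.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            # Sequence lines are appended to the current record.
            records[name] += line
    return records
```

Calling parse_fasta on the example above yields one record named "seq1".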
|
TRANSFAC
format: (Only the fields essential for recognizing an entry in TRANSFAC
format are shown. More fields may be included.) |
|
AC R00106
XX
ID MOUSE$AAMY_02
XX
SQ CTCCATGGGAGTTTCTGAAGAACCTTCAGCTGTGCAC.
XX
//
|
EMBL format:
(Only the fields essential for recognizing an entry in EMBL format
are shown. More fields may be included.) |
|
ID ZMADH1P standard; DNA; PLN; 360 BP.
XX
SQ Sequence 360 BP; 63 A; 92 C; 97 G; 108 T; 0 other;
ctgcagcccc ggtttcgcaa gccgcgcacg tggtttgctt gcccacaggc ggccaaaccg 60
caccctcctt cccgtcgttt cccatctctt cctcctttag agctaccact atataaatca 120
gggctcattt tctcgctcct cacaggctca tctcgctttg gatcgattgg tttcgtaact 180
ggtgagggac tgagggtctc ggagtggatt gatttgggat tctgttcgaa gatttgcgga 240
ggggggcaat ggcgaccgcg gggaaggtga tcaagtgcaa aggtccgcct tgtttctcct 300
ctgtctcttg atctgactaa tcttggttta tgattcgttg agtaattttg gggaaagctt 360
//
|
GenBank
format: (Only the fields essential for recognizing an entry in GenBank format
are shown. More fields may be included.) |
|
LOCUS MZEADH1P 360 bp DNA PLN 13-JUN-1996
ACCESSION K03285
1 ctgcagcccc ggtttcgcaa gccgcgcacg tggtttgctt gcccacaggc ggccaaaccg
61 caccctcctt cccgtcgttt cccatctctt cctcctttag agctaccact atataaatca
121 gggctcattt tctcgctcct cacaggctca tctcgctttg gatcgattgg tttcgtaact
181 ggtgagggac tgagggtctc ggagtggatt gatttgggat tctgttcgaa gatttgcgga
241 ggggggcaat ggcgaccgcg gggaaggtga tcaagtgcaa aggtccgcct tgtttctcct
301 ctgtctcttg atctgactaa tcttggttta tgattcgttg agtaattttg gggaaagctt
//
|
IG format: |
|
;seq_1
seq_1
acagctagtcgatcgatcgatgctagctgatcgtagctgatcgtagctaacgtgtagctagtcgacgtagctacgg1
|
|
|
3. Select a set of sites
to search for |
At the top of the left column of the PatchTM
interface you can specify which sites you would like to use for your search.
You can use either our predefined sets of sites or user defined sets.
To select one of our predefined sets,
mark this option and select one or several of the following sets from the list: all
sites from TRANSFAC®, consensus sites, virus sites, vertebrate sites,
plant sites, fungi sites, nematode sites and insect sites. All these sets
of sites include both sequences from the TRANSFAC® site table and consensus
sequences of the matrices from the TRANSFAC® matrix table.
If you have created a set of
sites using the TRANSFAC® search engine, you will find this set among the user defined sets. The names of these sets are built in the following way: "month_day_hour-min-sec.sequences"
To create a set of sites with the TRANSFAC® search engine, please follow these steps:
- Use the TRANSFAC® query form "SITE SEARCH" to search for specific
binding sites in TRANSFAC®. For example, you can enter AP-1 in the
text field "Search Term" and then select "Binding Factor" in the "Quick
Search Fields". When you then press the "Submit Query" button, you will
receive a list of AP-1 binding sites. Next to each site entry you will
find a box.
- Mark the boxes for those entries that you would like to
include in a PatchTM search.
- Then scroll to the bottom of the list.
Here you will find a box with the text "Run PatchTM with marked
entries". Mark this box as well.
- Now click on "Show marked
entries/Start PATCHTM". PatchTM will then be started and you will find your
selection of sites among the user defined sets of sites.
|
4. Additional
parameters |
|
Minimum length
This parameter specifies the minimum length of the sites shown in the Patch output.
With the default value of 4, only sites of length 4 or more appear in the output.
|
|
Maximum number of mismatches
It would be more precise to call this parameter
the maximum number of local mismatches, as it specifies how many positions
may differ when a binding site (search pattern) is compared with some part
of the input sequence. A match between the whole site and
the input sequence is reported if the actual number of (local) mismatches is lower
than or equal to the maximum number of (local) mismatches.
Please be careful when selecting this value.
The minimum length of the sites searched for (search patterns) is
2*maximal_number_of_mismatches+1. That means, if you set the
maximal number of mismatches to 5, PatchTM searches only for sites
that are at least 11 bp long. All shorter ones will be ignored. The default
value for this parameter is 2.
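
The interplay between the mismatch limit and the site length can be illustrated with a small Python sketch. It is not PatchTM's algorithm: the function names are hypothetical, the rule is read here as a minimum site length of 2*maximal_number_of_mismatches+1, bases are compared literally (IUPAC ambiguity codes are not expanded), only the given strand is scanned, and reporting 1-based positions is an assumption.

```python
def min_site_length(max_mismatches: int) -> int:
    """Shortest site that is still searched for at this mismatch limit."""
    return 2 * max_mismatches + 1


def count_mismatches(site: str, window: str) -> int:
    """Number of positions at which site and window differ."""
    return sum(1 for s, w in zip(site.upper(), window.upper()) if s != w)


def find_matches(site: str, sequence: str, max_mismatches: int = 2):
    """Return (position, mismatches) for every window within the limit."""
    hits = []
    if len(site) < min_site_length(max_mismatches):
        return hits  # site is too short for this limit and is ignored
    for start in range(len(sequence) - len(site) + 1):
        window = sequence[start:start + len(site)]
        mismatches = count_mismatches(site, window)
        if mismatches <= max_mismatches:
            hits.append((start + 1, mismatches))  # 1-based position
    return hits
```

With a limit of 5 mismatches, min_site_length(5) is 11, so a 4 bp site such as "ACGT" produces no hits at all, regardless of the input sequence.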
|
|
Mismatch penalty
When a binding site (search pattern) is compared with some part of the input sequence, each mismatching position receives a mismatch penalty. This penalty lowers the overall score for the match between the whole site (search pattern) and the input sequence. Each matching nucleotide receives a bonus weight of 100. The default mismatch penalty is also 100, so by default a mismatching position lowers the score by as much as a matching position raises it. If you reduce the mismatch penalty, sites containing mismatches can still reach high scores and appear in the Patch output. If you increase it, high-scoring sites are unlikely to contain mismatches.
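
The exact scoring formula is not given on this page. The Python sketch below only illustrates how a match bonus and a mismatch penalty can trade off; normalizing to a 0 to 100 scale by dividing by the best possible score is purely an assumption made for illustration, and illustrative_score is a hypothetical name, not a PatchTM function.

```python
def illustrative_score(length: int, mismatches: int,
                       match_bonus: float = 100.0,
                       mismatch_penalty: float = 100.0) -> float:
    """Toy score: bonus per match minus penalty per mismatch, scaled to 100.

    Assumption: the result is normalized by the score of a perfect match.
    This is NOT the documented PatchTM formula.
    """
    matches = length - mismatches
    raw = matches * match_bonus - mismatches * mismatch_penalty
    return 100.0 * raw / (length * match_bonus)
```

Under this assumed normalization, a 16 bp site with one mismatch scores 87.5 with the default penalty of 100 and 90.625 with a penalty of 50, which shows how lowering the penalty lets sites with mismatches score higher.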
|
|
Lower score boundary
The lower score boundary is a cut-off that
determines which matches between a site (search pattern) and the input sequence
are listed in the output. The score computed for a match
has to be higher than or equal to this cut-off. The default value for the
lower score boundary is 87.5.
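
The effect of the cut-off can be pictured as a simple filter over scored matches. The following Python sketch uses made-up match records; the dictionary layout, the identifiers "site_a" to "site_c" and the function name apply_cutoff are all hypothetical.

```python
def apply_cutoff(matches, lower_score_boundary: float = 87.5):
    """Keep only matches whose score reaches the lower score boundary."""
    return [m for m in matches if m["score"] >= lower_score_boundary]


# Made-up example records, not real PatchTM output.
example_matches = [
    {"identifier": "site_a", "score": 100.0},
    {"identifier": "site_b", "score": 87.5},
    {"identifier": "site_c", "score": 75.0},
]
kept = apply_cutoff(example_matches)
```

Here kept contains site_a and site_b; site_c falls below the default boundary of 87.5 and is dropped.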
|
|
The results
page |
On the results page you will find a table of
the matches found. The table consists of the following columns:
|
identifier |
identifier of the site that has been found
(For consensus sites of matrices the identifier
of the respective matrix is given.) Each identifier is linked to the respective
TRANSFAC® entry. |
|
position |
position of the match in the input sequence.
(+) or (-) indicates whether the match was found on the
(+)-strand or the (-)-strand. |
|
mismatches |
number of mismatching positions found within
a match between the whole site (search pattern) and the input sequence |
|
score |
score for the match between the whole site
(search pattern) and the input sequence |
|
binding factor |
list of factors known to bind to the site that has been found.
(Each factor is linked to the respective TRANSFAC® entry.) For some sites no binding factors are given, because it is not yet known which particular factor binds to them. This may be the case, for example, for sites that have been identified experimentally by footprint analysis. |
|
sequence |
sequence of the site found at that position.
This sequence was the actual search pattern. |
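A match reported on the (-)-strand corresponds to the reverse complement of the input sequence at that position. The following Python sketch shows how a reverse complement can be computed for a plain DNA sequence; the function name reverse_complement is hypothetical, and only A, C, G, T and N are handled.

```python
# Translation table for complementing A, C, G, T and N in both cases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]
```

For example, reverse_complement("TGACTCA") returns "TGAGTCA".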
You can also view a graphic of your results, in
which arrows mark the position of matches.
The last line of the results page gives the total length of all sequences which have been searched.
A flatfile version of the results can be found in the directory:
"<CGI-BIN-DIRECTORY-OF-WEB-SERVER>/biobase/transfac/<VERSION>/patch/etc/usr/<USER-LOGIN>/"
The flatfiles have the extension ".out". |
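
If you have file system access to that directory, the stored result files can be listed with a few lines of Python. This is a minimal sketch: the function name list_result_files is hypothetical, the directory must be supplied by you because it depends on the server, version and user login, and nothing is assumed here about the content of the ".out" files.

```python
from pathlib import Path


def list_result_files(user_dir: str) -> list[str]:
    """Return the names of all '.out' flatfiles in the given directory."""
    return sorted(p.name for p in Path(user_dir).glob("*.out"))
```

A search stored under the name "default" would then be expected to show up as "default.out", assuming the file is named after the result name.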
|