How to use Patch^TM

Introduction

Viewing and deleting user files (the box on top of the page)

Viewing or deleting previous results
Deleting a previously stored sequence
Viewing or deleting a user defined set of sites

Starting a new search

1. Enter a name for your search
2. Select a sequence

a) Select one of your stored sequences
b) Select an example
c) Enter a new sequence

3. Select a set of sites to search for
4. Additional parameters

The results page

Introduction

The Patch^TM tool is designed to search for potential transcription factor binding sites (TF binding sites) in any sequence of interest. The patterns that Patch^TM uses for searching are the TF binding sites from the TRANSFAC^® Professional database and the consensus sequences of the TRANSFAC^® Professional weight matrices.

These patterns therefore consist of IUPAC code characters, which are:

code	description
A	Adenine
C	Cytosine
G	Guanine
T	Thymine
U	Uracil
R	Purine (A or G)
Y	Pyrimidine (C, T, or U)
M	C or A
K	T, U, or G
W	T, U, or A
S	C or G
B	C, T, U, or G (not A)
D	A, T, U, or G (not C)
H	A, T, U, or C (not G)
V	A, C, or G (not T, not U)
N	Any base (A, C, G, T, or U)
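To make the table concrete, here is a minimal Python sketch of how an IUPAC code in a search pattern can be checked against a single base of the input sequence. It is an illustration only, not Patch^TM's own code: the names `IUPAC` and `base_matches` are hypothetical, and treating U exactly like T is an assumption based on the table.

```python
# Hypothetical sketch, not Patch^TM source code: each IUPAC code from the
# table above, mapped to the set of bases it stands for. U is treated like T.
IUPAC = {
    code: frozenset(bases)
    for code, bases in {
        "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
        "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT",
        "S": "CG", "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
        "N": "ACGT",
    }.items()
}


def base_matches(code: str, base: str) -> bool:
    """Return True if the concrete base is one of the bases the code allows."""
    base = base.upper().replace("U", "T")  # assumption: uracil counts as thymine
    return base in IUPAC[code.upper()]
```

For example, `base_matches("R", "g")` is true because R stands for A or G, while `base_matches("B", "A")` is false because B excludes A.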

Viewing and deleting user files (the box on top of the page)

Viewing or deleting previous results

The results of each search you perform with Patch^TM are stored in your user directory. You can view or delete these results in the box at the top of the page.

Deleting a previously stored sequence

Each sequence you enter into the Patch^TM form is stored (see also below). You can delete sequences you no longer want to keep in the box at the top of the page.

Viewing or deleting a user defined set of sites

If you have searched for TF binding sites in TRANSFAC^® Professional, you have the option to start a Patch^TM search for exactly these binding sites. For this purpose, your selected set of sites is stored.
Such sets of sites can be found in the list "user defined set of sites" at the top of the page. You can view or delete sets you no longer need by pressing the View or Delete button. If you choose to view a set, a new page is displayed listing all sites included in this set. This page also lets you specify a new name for the set.

Starting a new search

1. Enter a name for your search

You should first enter a name for your search, since Patch^TM stores your search result under that name. If you do not enter a name, Patch^TM uses "default" as the result name.

2. Select a sequence

You have three options for selecting a sequence you would like to search:

a) Select one of your stored sequences:
If you select this option, you can choose among the sequences you entered for previous searches.

b) Select an example:
If you choose this option, an example sequence is used for your search: the 5' flank of the rat tyrosine aminotransferase (TAT) gene (EMBL: M34257).

c) Enter a new sequence:
To run the search with a new sequence, first enter a name for it. The sequence is stored under that name so that you can use it again in a later search. Then enter your sequence.
The following formats are accepted: FASTA, TRANSFAC, EMBL, GenBank, IG, and RAW (RAW format means the bare sequence). Examples of each format are given below. The ambiguous IUPAC code characters 'B', 'D', 'H', 'K', 'M', 'R', 'S', 'V', 'W', and 'Y' within a sequence are converted to 'N'. You can enter one or several sequences at a time, as long as all of them use the same format. The only exception is RAW format, which accepts only one sequence at a time.
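The preprocessing described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Patch^TM's actual parser: the function name `normalize_sequence` is hypothetical, and converting the whole sequence to upper case first is a simplification chosen here.

```python
import re

# The ambiguous IUPAC codes that, according to the text, are converted to 'N'.
AMBIGUOUS = "BDHKMRSVWY"
TO_N = str.maketrans(AMBIGUOUS, "N" * len(AMBIGUOUS))


def normalize_sequence(raw: str) -> str:
    """Hypothetical sketch: drop all whitespace (as for RAW input),
    upper-case the sequence (an assumption) and map ambiguous codes to 'N'."""
    seq = re.sub(r"\s+", "", raw).upper()
    return seq.translate(TO_N)
```

For example, `normalize_sequence("acgr\n tty")` returns `"ACGNTTN"`: the newline and space are removed, and R and Y become N.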

RAW format: (all newlines and whitespace are ignored)
	acacgtagctagctagctgatcgtagctagtcgatcgtagctagctagctgatcgatgctagctgatcgtagctagtcgatag tctagctagctagtcgatcgtagctagtcgatgctagctagctgtgtgtagctagtcgatcgatgctagctgatcgatcgtaa gtctgatctagctagctagcgatcgtagctgatcgtagctagcatgctagtcgatgca
FASTA format:
	>seq1
	acagctagctacgatgatcgatcgatgctacgtcgtagtacgatcgtacg
TRANSFAC format: (Only the fields essentially needed to recognize an entry in TRANSFAC format are shown. More fields may be included.)
	AC R00106
	XX
	ID MOUSE$AAMY_02
	XX
	SQ CTCCATGGGAGTTTCTGAAGAACCTTCAGCTGTGCAC.
	XX
	//
EMBL format: (Only the fields essentially needed to recognize an entry in EMBL format are shown. More fields may be included.)
	ID ZMADH1P standard; DNA; PLN; 360 BP.
	XX
	SQ Sequence 360 BP; 63 A; 92 C; 97 G; 108 T; 0 other;
	ctgcagcccc ggtttcgcaa gccgcgcacg tggtttgctt gcccacaggc ggccaaaccg 60
	caccctcctt cccgtcgttt cccatctctt cctcctttag agctaccact atataaatca 120
	gggctcattt tctcgctcct cacaggctca tctcgctttg gatcgattgg tttcgtaact 180
	ggtgagggac tgagggtctc ggagtggatt gatttgggat tctgttcgaa gatttgcgga 240
	ggggggcaat ggcgaccgcg gggaaggtga tcaagtgcaa aggtccgcct tgtttctcct 300
	ctgtctcttg atctgactaa tcttggttta tgattcgttg agtaattttg gggaaagctt 360
	//
GenBank format: (Only the fields essentially needed to recognize GenBank format are shown. You may include more fields.)
	LOCUS MZEADH1P 360 bp DNA PLN 13-JUN-1996
	ACCESSION K03285
	1 ctgcagcccc ggtttcgcaa gccgcgcacg tggtttgctt gcccacaggc ggccaaaccg
	61 caccctcctt cccgtcgttt cccatctctt cctcctttag agctaccact atataaatca
	121 gggctcattt tctcgctcct cacaggctca tctcgctttg gatcgattgg tttcgtaact
	181 ggtgagggac tgagggtctc ggagtggatt gatttgggat tctgttcgaa gatttgcgga
	241 ggggggcaat ggcgaccgcg gggaaggtga tcaagtgcaa aggtccgcct tgtttctcct
	301 ctgtctcttg atctgactaa tcttggttta tgattcgttg agtaattttg gggaaagctt
	//
IG format:
	;seq_1
	seq_1
	acagctagtcgatcgatcgatgctagctgatcgtagctgatcgtagctaacgtgtagctagtcgacgtagctacgg1
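Each format above can be recognized by the start of its first line. The following Python heuristic is a hypothetical sketch (the name `guess_format` is made up) that relies only on the markers visible in these examples; it is not how Patch^TM itself detects formats, and real entries with additional fields may need more careful parsing.

```python
def guess_format(text: str) -> str:
    """Heuristic sketch: guess the input format from the first non-blank line,
    using only the markers shown in the examples above."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input")
    first = lines[0]
    if first.startswith(">"):
        return "FASTA"
    if first.startswith(";"):
        return "IG"
    if first.startswith("LOCUS"):
        return "GenBank"
    if first.startswith("AC "):   # TRANSFAC entries start with the AC line
        return "TRANSFAC"
    if first.startswith("ID "):   # EMBL entries start with the ID line
        return "EMBL"
    return "RAW"                  # anything else is treated as a bare sequence
```

The order of the checks matters only for the two-letter line codes: the TRANSFAC example also contains an ID line, but since its first line is AC, it is classified before the EMBL check is reached.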

3. Select a set of sites to search for

At the top of the left column of the Patch^TM interface you can specify which sites you would like to use for your search. You can use either our predefined sets of sites or user defined sets.
To select one of our predefined sets, mark this option and select one or several of the following sets from the list: all sites from TRANSFAC^®, consensus sites, virus sites, vertebrate sites, plant sites, fungi sites, nematode sites and insect sites. All these sets of sites include both sequences from the TRANSFAC^® site table and consensus sequences of the matrices from the TRANSFAC^® matrix table.
If you have created a set of sites using the TRANSFAC^® search engine, you will find this set among the user defined sets. These sets are named according to the pattern "month_day_hour-min-sec.sequences".

To create a set of sites with the TRANSFAC^® search engine, please follow these steps:

Use the TRANSFAC^® query from "SITE SEARCH" to search for specific binding sites in TRANSFAC^®. For example, you can enter AP-1 in the text field "Search Term" and then select "Binding Factor" in the "Quick Search Fields". When you press the "Submit Query" button, you will receive a list of AP-1 binding sites. Next to each site entry you will find a box.
Mark the boxes of those entries that you would like to include in a Patch^TM search.
Then scroll to the bottom of the list. There you will find a box with the text "Run Patch^TM with marked entries". Mark this box as well.
Now click on "Show marked entries/Start PATCH^TM". Patch^TM then starts, and you will find your selection of sites among the user defined sets of sites.

4. Additional parameters

	Minimum length: This parameter specifies the minimum length of sites shown in the Patch^TM output. With the default value of 4, only sites of length 4 or longer appear in the output.
	Maximum number of mismatches: It would be more precise to call this parameter the maximum number of local mismatches, as it specifies how many positions may differ when a binding site (search pattern) is compared with part of the input sequence. A match between the whole site and the input sequence is reported only if the actual number of (local) mismatches is lower than or equal to this maximum. Please be careful when choosing this value: the minimum length of the sites searched for (search patterns) is restricted to 2 * (maximum number of mismatches) + 1. This means that if you set the maximum number of mismatches to 5, Patch^TM searches only for sites that are at least 11 bp long; all shorter sites are ignored. The default value for this parameter is 2.
	Mismatch penalty: When a binding site (search pattern) is compared with part of the input sequence, each mismatching position receives a mismatch penalty, which lowers the overall score for the match between the whole site and the input sequence. Each matching nucleotide receives a bonus weight of 100. The default mismatch penalty is also 100, so a mismatching position lowers the score by exactly as much as a matching position raises it. If you reduce the mismatch penalty, high-scoring sites containing mismatches will appear in the Patch^TM output. If you increase it, high-scoring sites are unlikely to contain mismatches.
	Lower score boundary: The lower score boundary is a cut-off that determines which matches between a site (search pattern) and the input sequence are listed in the output. The score calculated for each match must be higher than or equal to this cut-off. The default value for the lower score boundary is 87.5.
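The interplay of the four parameters can be sketched in Python. The raw score (a bonus of 100 per matching nucleotide minus the mismatch penalty per mismatching position) and the filters follow the descriptions above. Several points are assumptions rather than documented Patch^TM behaviour: the raw score is expressed here as a percentage of the best possible score so that it can be compared with the 87.5 cut-off, both length rules are applied together as one lower bound, and any base not allowed by the IUPAC code (including an 'N' in the input) counts as a mismatch. All names are hypothetical.

```python
from dataclasses import dataclass

# Compact IUPAC table (see the code table above); T and U are treated alike.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "TU", "U": "TU", "R": "AG",
         "Y": "CTU", "M": "AC", "K": "GTU", "W": "ATU", "S": "CG",
         "B": "CGTU", "D": "AGTU", "H": "ACTU", "V": "ACG", "N": "ACGTU"}

MATCH_BONUS = 100.0  # bonus weight per matching nucleotide, as documented


@dataclass
class Params:
    """Default values as documented for the Patch^TM form."""
    min_length: int = 4
    max_mismatches: int = 2
    mismatch_penalty: float = 100.0
    lower_score: float = 87.5


def site_is_searched(site: str, p: Params) -> bool:
    # Assumption: the "Minimum length" parameter and the
    # 2 * max_mismatches + 1 restriction are combined into one lower bound.
    return len(site) >= max(p.min_length, 2 * p.max_mismatches + 1)


def count_mismatches(site: str, window: str) -> int:
    """Count positions where the input base is not allowed by the site's code."""
    if len(site) != len(window):
        raise ValueError("site and window must have the same length")
    return sum(1 for code, base in zip(site.upper(), window.upper())
               if base not in IUPAC[code])


def score(site: str, window: str, p: Params) -> float:
    mm = count_mismatches(site, window)
    raw = MATCH_BONUS * (len(site) - mm) - p.mismatch_penalty * mm
    # ASSUMPTION: normalize to a percentage of the best possible raw score.
    return 100.0 * raw / (MATCH_BONUS * len(site))


def is_reported(site: str, window: str, p: Params) -> bool:
    return (site_is_searched(site, p)
            and count_mismatches(site, window) <= p.max_mismatches
            and score(site, window, p) >= p.lower_score)
```

With the default parameters, a perfect match of a 7-bp site scores 100 and is reported, while a single mismatch in the same site gives 100 * (600 - 100) / 700, roughly 71.4, which falls below the cut-off.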

The results page

On the results page you will find a table of the matches found. The table consists of the following columns:

	identifier	identifier of the site that has been found (For consensus sites of matrices the identifier of the respective matrix is given.) Each identifier is linked to the respective TRANSFAC^® entry.
	position	position of the match in the input sequence; (+) or (-) indicates that the match was found on the (+)-strand or the (-)-strand, respectively.
	mismatches	number of mismatching positions found within a match between the whole site (search pattern) and the input sequence
	score	score for the match between the whole site (search pattern) and the input sequence
	binding factor	list of factors known to bind to the site that has been found. (Each factor is linked to the respective TRANSFAC^® entry.) For some sites no binding factors are given, because it is not yet known which particular factor binds to them. This may be the case, for example, for sites that have been identified experimentally by footprint analysis.
	sequence	sequence of the site found at that position. This sequence was the actual search pattern.
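The (+)/(-) notation in the position column reflects that both strands of the input are searched. Searching the (-)-strand for a site is equivalent to searching the (+)-strand for the site's reverse complement, as this hypothetical Python sketch shows. For brevity it uses exact matching only, and reporting (-)-strand hits by their 1-based start on the (+)-strand is an assumed convention, not necessarily the one Patch^TM uses. The names `Hit` and `find_both_strands` are made up for illustration.

```python
from typing import List, NamedTuple

COMPLEMENT = str.maketrans("ACGTUN", "TGCAAN")


class Hit(NamedTuple):
    identifier: str
    position: int  # 1-based start on the (+)-strand (assumed convention)
    strand: str    # "(+)" or "(-)"
    sequence: str  # the site (search pattern) that matched


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_both_strands(identifier: str, site: str, seq: str) -> List[Hit]:
    """Exact-match sketch: report occurrences of `site` on both strands."""
    site, seq = site.upper(), seq.upper()
    rc = reverse_complement(site)
    hits = []
    for start in range(len(seq) - len(site) + 1):
        window = seq[start:start + len(site)]
        if window == site:
            hits.append(Hit(identifier, start + 1, "(+)", site))
        if window == rc:  # a (-)-strand hit appears as the reverse complement on (+)
            hits.append(Hit(identifier, start + 1, "(-)", site))
    return hits
```

A palindromic site equals its own reverse complement, so this sketch reports it twice at the same position, once per strand.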

You can also view a graphic of your results, in which arrows mark the position of matches.
The last line of the results page gives the total length of all sequences which have been searched.

A flatfile version of the results can be found in the directory:
"<CGI-BIN-DIRECTORY-OF-WEB-SERVER>/biobase/transfac/<VERSION>/patch/etc/usr/<USER-LOGIN>/"
The flatfiles have the file extension ".out".
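If you have file-system access to the web server, a small Python helper such as the following lists the result flatfiles in a user directory. It is a hypothetical sketch (`list_result_files` is not part of Patch^TM), and the placeholders in the path above must first be replaced with the values of your installation.

```python
from pathlib import Path
from typing import List


def list_result_files(user_dir: str) -> List[str]:
    """Return the sorted names of all *.out flatfiles in the given directory."""
    return sorted(p.name for p in Path(user_dir).glob("*.out") if p.is_file())
```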