NetPro™
 

Documentation


Preface


NetPro™ is a biomolecular interaction database curated by the Indian bioinformatics company Molecular Connections. It covers more than 88,000 expert-curated and annotated molecular interactions (as of March 2005). NetPro™ has been built from interaction data extracted with the proprietary information extraction engine M-CAPS™, and the data has been cross-validated through manual curation. All interactions come from peer-reviewed, published scientific literature and have undergone quality checks in the form of expert cross-checking by Molecular Connections' in-house scientific team.

NetPro™ data is linked to public IDs (such as Entrez Gene), facilitating the integration of interaction information into proprietary drug discovery databases. The protein-protein interactions in NetPro™ are complemented by annotations of the profiled proteins with information from the scientific literature on a variety of important subjects, including:
  • Biological Pathways
  • Diseases implicated
  • Domain information
  • Experimental Methods
  • Localization
  • Nature or property of the interacting molecules
  • Regulators
  • Species
Where applicable, NetPro™ uses controlled vocabulary for standardization and searchability.

TRANSPATH® & NetPro™


TRANSPATH® is a signal transduction database that emphasizes detailed manual annotation of signaling mechanisms and pathway modeling. The data that NetPro™ adds to the integrated version is for the most part not redundant with the TRANSPATH® data, but augments it with a large number of mostly indirect relations that give valuable support for network creation and analysis. The NetPro™ data has been enriched by BIOBASE, which added more hyperlinks to external databases and connected the NetPro™ molecules to the hierarchical classification of TRANSPATH®.

Technically, the NetPro™ data is added to the TRANSPATH® flat files and matched to the existing fields. The flat file integration makes it possible to apply the TRANSPATH® search engine, the PathwayBuilder™, and the ArrayAnalyzer™ to the combined data. By default, both data sets are used, but the use of the NetPro™ data can be switched off for each of the three tools. In the visualization, NetPro™ molecules and genes appear in a brighter color, while reaction nodes are smaller in size.

Statistics


Table:         TRANSPATH® 6.2 *   NetPro™ 2.1
MOLECULE       28779              15766
REACTION       52977              88907
GENE           11157              3624
REFERENCE      17618              25136

*Without Swiss-Prot imported entries


Molecules


In the integration process, NetPro™ molecules of the classes 'Complex', 'Family', 'Fusion', 'Group', 'Multi subunit', 'Protein', 'Small molecule', 'Subunit group' and 'Synergy' were added to the molecule flat file, whereas the classes 'DNA' and 'RNA' were placed in the gene flat file.

Field   Content and format
AC Accession number   The accession number is the unique identifier for each entry. Its format is "NM" in capital letters followed by nine digits (e.g. NM000012345).
CO Copyright-information    
NA Molecule name   As standard practice, the symbol or preferred symbol of the molecule as given by Entrez Gene is captured as the molecule name.
For visualization purposes, the TRANSPATH® species tag has been added to the name. This short identifier is useful in reaction names, because the experimental evidence for a reaction is frequently based on molecules from different species. The tag (v.s.) for vertebrate species is used when the exact (vertebrate) species is not described in the reference and could not be determined from cited references or websites. NetPro™ uses the term 'Homo sapiens-e' (extrapolated) to indicate this fact.

List of species tags

For small molecules, the CAS ID specified in CHEMINDEX (http://ccinfoweb.ccohs.ca/chemindex/search.html) is used. If the molecule is not present in CHEMINDEX, an internal ID is assigned by Molecular Connections.

OS Species   The species captured in NetPro™ are restricted to those mentioned in the Entrez Gene database.
  • Brachydanio rerio
  • Bos taurus
  • Caenorhabditis elegans
  • Drosophila melanogaster
  • Gallus gallus
  • Homo sapiens
  • Human immunodeficiency virus 1
  • Mus musculus
  • Rattus norvegicus
  • Sus scrofa
  • Strongylocentrotus purpuratus
  • Xenopus laevis
Rules for species determination for interacting molecules in NetPro™:

If the molecule is endogenous to a cell, tissue, body fluid, etc. whose species can be determined, the molecule symbol/name/ID for that specific species is assigned. Generally, we consider the molecule to be endogenous:
  1. If the experiment is an in vivo study
  2. If there is no mention of the molecule in question as a recombinant type by the authors
If the molecule is a recombinant or a transgene and the source is specified by the authors, the molecule symbol/name/ID is captured as given. In several instances, however, the origin of the interacting molecules is not mentioned by the authors or cannot be determined due to lack of accessory information. In such cases, the molecule is assumed to be of human origin; to distinguish it from molecules that are clearly of human origin, its species is assigned as Homo sapiens-e (extrapolated to be human).
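The species rules above can be read as a small decision procedure. The following Python sketch is only an illustration of that reading, not the curation software itself; the function name `assign_species` and the idea of passing `None` when the origin cannot be determined are assumptions made for this example.

```python
# Species that this document lists as captured in NetPro.
ENTREZ_SPECIES = {
    "Brachydanio rerio", "Bos taurus", "Caenorhabditis elegans",
    "Drosophila melanogaster", "Gallus gallus", "Homo sapiens",
    "Human immunodeficiency virus 1", "Mus musculus", "Rattus norvegicus",
    "Sus scrofa", "Strongylocentrotus purpuratus", "Xenopus laevis",
}

# Marker used when the origin is extrapolated to be human.
EXTRAPOLATED = "Homo sapiens-e"


def assign_species(determined_species=None):
    """Return the species to record for an interacting molecule.

    determined_species is the species of the endogenous source, or the
    source stated by the authors for a recombinant or transgene.
    Pass None (hypothetical convention) when it cannot be determined.
    """
    if determined_species is None:
        return EXTRAPOLATED
    if determined_species not in ENTREZ_SPECIES:
        raise ValueError(f"species not in the listed set: {determined_species!r}")
    return determined_species


print(assign_species("Mus musculus"))
print(assign_species())
```

Keeping the extrapolated case as a distinct value, rather than plain 'Homo sapiens', preserves the distinction the rules draw between confirmed and assumed human origin.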
CL Classification  
  • protein
  • small molecule
TY Type   The type of this molecule entry. Possible values are:
  • basic, for real isoforms which have a polypeptide chain
  • other, for small molecules such as lipids and second messengers (such as DAG, IP3, NO, cAMP, etc.)
  • Family, for families in the absence of evidence for the individual members (such as MAPK family, Aldehyde oxidase, etc.)
  • Group, for groups or classes in the absence of evidence for the individual members (such as Histone deacetylase class I, Alpha amylase, etc.)
  • Multi subunit, for multisubunit proteins that do not have a single locus ID (such as Phosphoinositide-3-kinase, NF-kappa-B, etc.)
  • Subunit group, for groups of subunits that could form a multisubunit protein (such as Phosphoinositide-3-kinase regulatory subunit p85; Phosphoinositide-3-kinase, regulatory subunit, etc.)
  • Complex, for complexes formed between more than one protein (such as ITGA4:ITGB1 in complex, IL12A:IL12B in complex, etc.)
  • Fusion, for naturally occurring fusion proteins (such as Abl1:Bcr, ABL1:ETV6, etc.)
  • Synergy, for molecules that act in synergy on a second molecule (such as IFNG:TNF synergistically acting on molecule B, etc.)
HP Superfamilies   Lists all groups or families to which this component directly belongs (one hierarchical level above). This is a very important field, since abstracting common signaling behaviour is needed to avoid an explosion of entries. Where available, the NetPro™ basic entries have been linked to TRANSPATH® orthogroup and orthobasic entries.
DR External database hyperlink   Database name (e.g. Entrez Gene): database accession number; identifier.

The focus lies on linking to EMBL, Entrez Gene, UniGene, RefSeq and OMIM.
Corresponding Affymetrix microarray probe set identifiers are also listed. Data is available for the following chips: U95A, U95B, U95C, U95D, U95E, U95Av2, U133A, U133B, HuGeneFL. The format is AFFYMETRIX:chip:probeset. Except for those from the HuGeneFL chip, the Affymetrix links are based on those in Ensembl, v.14.31 for human and v.14.30 for mouse.
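As an illustration of the AFFYMETRIX:chip:probeset format, the following Python sketch splits such a link into its chip and probe set parts and checks the chip against the list given above. The function name `parse_affymetrix_link` is hypothetical, and the probe set identifier in the usage line is only an illustrative value, not one taken from NetPro™.

```python
# Chips for which the document states that data is available.
KNOWN_CHIPS = {
    "U95A", "U95B", "U95C", "U95D", "U95E", "U95Av2",
    "U133A", "U133B", "HuGeneFL",
}


def parse_affymetrix_link(link):
    """Split 'AFFYMETRIX:chip:probeset' into (chip, probeset)."""
    parts = link.split(":")
    if len(parts) != 3 or parts[0] != "AFFYMETRIX":
        raise ValueError(f"not an Affymetrix link: {link!r}")
    _, chip, probeset = parts
    if chip not in KNOWN_CHIPS:
        raise ValueError(f"unknown chip: {chip!r}")
    return chip, probeset


# Illustrative input only.
print(parse_affymetrix_link("AFFYMETRIX:U133A:1007_s_at"))
```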
TP TRANSPATH entry   The corresponding TRANSPATH® entry.
ST Complex or modified form of   A list of molecules of which this molecule entry is a modified form or, if the entry is a complex, synergy or fusion, a list of its subunits.
CX Complexes   A list of complexes this molecule is engaged in.
SN Synergy   A list of synergistically acting molecules this entry is engaged in.
FN Fusion   A list of fusion proteins this molecule is engaged in.
XB Reaction upstream   A list of reactions which produce this molecule (in the mechanistic view), or which lead to this molecule (in the semantic view). So the molecule serves either as a product or a signal acceptor.
XA Reaction downstream   A list of reactions that consume this molecule (in the mechanistic view), or which go out from this molecule (in the semantic view). Here the molecule serves either as a substrate or as a signal donor.
RN Reference number   [consecutive entry reference number].
A list of the papers from which the information in this entry was extracted.
RX PubMed database hyperlink   The PMID number in PubMed.
RA Reference author(s)   List of authors.
RT Reference title   Title of the paper.
RL Reference publication   Publication details.
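The accession number format described for the AC field lends itself to a simple check. The Python sketch below validates the "NM" format for molecules and, for convenience, also recognizes the "NG"/"NR" and "NX" prefixes that this document describes for genes and reactions. The function name `classify_accession` and the returned labels are assumptions made for this example.

```python
import re

# Prefixes as described in this document.
ACCESSION_KINDS = {
    "NM": "molecule",
    "NG": "gene (DNA)",
    "NR": "gene (RNA)",
    "NX": "reaction",
}

# Two capital letters followed by exactly nine digits.
ACCESSION_RE = re.compile(r"(NM|NG|NR|NX)(\d{9})")


def classify_accession(accession):
    """Return the entry kind for an accession number, or None if malformed."""
    match = ACCESSION_RE.fullmatch(accession)
    if match is None:
        return None
    return ACCESSION_KINDS[match.group(1)]


print(classify_accession("NM000012345"))
print(classify_accession("nm000012345"))
```

The second call returns None because the prefix must be in capital letters.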


Genes


Field   Content and format
AC Accession number   The accession number is the unique identifier for each entry. Its format is "NG" for DNA entries or "NR" for RNA entries, in capital letters, followed by nine digits (e.g. NG000012345).
CO Copyright-information    
NA Gene name   As standard practice, the symbol or preferred symbol of the gene as given by Entrez Gene is captured as the gene name.

Tags are appended to the name to indicate the species the gene comes from. This short identifier is useful in reaction names, because the experimental evidence for a reaction frequently involves molecules and genes from different species.

List of species tags

Queries with the search field name automatically include the fields fullname and synonyms.
OS Species   The species captured in NetPro™ are restricted to those mentioned in the Entrez Gene database.
  • Brachydanio rerio
  • Bos taurus
  • Caenorhabditis elegans
  • Drosophila melanogaster
  • Gallus gallus
  • Homo sapiens
  • Human immunodeficiency virus 1
  • Mus musculus
  • Rattus norvegicus
  • Sus scrofa
  • Strongylocentrotus purpuratus
  • Xenopus laevis
Rules for species determination for interacting molecules in NetPro™:

If the molecule is endogenous to a cell, tissue, body fluid, etc. whose species can be determined, the molecule symbol/name/ID for that specific species is assigned. Generally, we consider the molecule to be endogenous:
  1. If the experiment is an in vivo study
  2. If there is no mention of the molecule in question as a recombinant type by the authors
If the molecule is a recombinant or a transgene and the source is specified by the authors, the molecule symbol/name/ID is captured as given. In several instances, however, the origin of the interacting molecules is not mentioned by the authors or cannot be determined due to lack of accessory information. In such cases, the molecule is assumed to be of human origin; to distinguish it from molecules that are clearly of human origin, its species is assigned as Homo sapiens-e (extrapolated to be human).
DR External database hyperlink   Database name (e.g. EMBL/GenBank/DDBJ): database accession number; identifier.
TP TRANSPATH entry   The corresponding TRANSPATH® entry.
XB Reaction upstream   A list of reactions that lead to this gene. Here the gene serves as a signal acceptor. So far, these are all semantic transregulation reactions.
XA Reaction downstream   A list of expression reactions that go out from this gene.
RN Reference number   [consecutive entry reference number].
A list of the papers from which the information in this entry was extracted.
RX PubMed database hyperlink   The PMID number in PubMed.
RA Reference author(s)   List of authors.
RT Reference title   Title of the paper.
RL Reference publication   Publication details.



Reactions


Field   Content and format
AC Accession number   The accession number is the unique identifier for each entry. Its format is "NX" in capital letters followed by nine digits (e.g. NX000012345).
CO Copyright-Information    
NA Reaction name   Contains the name of a reaction. Reaction names have different arrows indicating their type:
  • -> semantic activation
  • -/ semantic inhibition
  • <-> semantic interaction
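A reaction's type can be read off the arrow in its name. The Python sketch below is a minimal illustration; it assumes that the arrow appears literally somewhere in the reaction name, and the example names used are hypothetical, since the exact layout of reaction names is not specified here. The longest arrow is tested first because "<->" contains "->".

```python
# Arrow notation from the list above; order matters ("<->" contains "->").
ARROWS = [
    ("<->", "semantic interaction"),
    ("->", "semantic activation"),
    ("-/", "semantic inhibition"),
]


def reaction_kind(name):
    """Return the reaction type implied by the arrow in a reaction name."""
    for arrow, kind in ARROWS:
        if arrow in name:
            return kind
    return None


# Hypothetical reaction names, for illustration only.
for name in ("A <-> B", "A -> B", "A -/ B"):
    print(name, "=", reaction_kind(name))
```

A plain substring test is deliberately simple; names that contain "-/" for other reasons (for example a genotype such as Igf1r-/-) would need a stricter pattern.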
EF Effect   The effect field contains the NetPro™ interaction verb. An interaction verb defines the type of relationship between two entities (molecules) A and B. Since version 1.3, the controlled vocabulary for verbs has been redefined, and a few changes have been made to the pre-existing list so that the data is presented in a way that more closely reflects the authors' statements in the abstract.

The verbs are now classified into:

INTERACTION VERBS (OR PRIMARY VERBS)
SUB-INTERACTION VERBS (OR SECONDARY VERBS)

The following table shows the (sub-)interaction verbs used in NetPro™:

VERB (PRIMARY)
Decreases
Increases
Inhibits
Regulates
Acetylate
Amidate
Associate
Autophosphorylate
Bind
Cleave
Deacetylate
Deamidate
Deglycosylate
Degrade
Demethylate
Deneddylate
Dephosphorylate
Desumoylate
Deubiquitinate
Farnesylate
Geranylgeranylate
Glycosylate
Homodimerize
Hydroxylate
Internalize
Methylate
Mobilize
Neddylate
Nitrate
Oxidize
Phosphorylate
Prenylate
Reduce (chemical reduction)
Sumoylate
Ubiquitinate
      
SUB-VERB (SECONDARY)
Acetylation
Amidation
Autophosphorylation
Cleavage
Deacetylation
Degradation
Dephosphorylation
Glycosylation
Hydroxylation
Internalization
Methylation
Neddylation
Nitration
Oxidation
Phosphorylation
Polymerization
Prenylation
Pro-cleavage
Reduction (chemical)
Release
Secretion
Sumoylation
Ubiquitination

Interaction verbs are used as stand-alone verbs except for the four verbs "Decreases, Increases, Regulates, Inhibits" which are supported by a sub-interaction verb.

Each interaction has either a Primary verb or a Primary (Decreases, Increases, Regulates, Inhibits) + Secondary verb.
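The combination rule for primary and secondary verbs can be expressed as a small validity check. The Python sketch below uses the verb lists given above; the function name `is_valid_effect` is hypothetical, and the sketch treats the secondary verb as optional for the four supporting verbs, since the text does not state that it is mandatory.

```python
# The four primary verbs that may be supported by a sub-interaction verb.
SUPPORTED_VERBS = {"Decreases", "Increases", "Inhibits", "Regulates"}

# Primary verbs that are used stand-alone.
STANDALONE_VERBS = {
    "Acetylate", "Amidate", "Associate", "Autophosphorylate", "Bind",
    "Cleave", "Deacetylate", "Deamidate", "Deglycosylate", "Degrade",
    "Demethylate", "Deneddylate", "Dephosphorylate", "Desumoylate",
    "Deubiquitinate", "Farnesylate", "Geranylgeranylate", "Glycosylate",
    "Homodimerize", "Hydroxylate", "Internalize", "Methylate", "Mobilize",
    "Neddylate", "Nitrate", "Oxidize", "Phosphorylate", "Prenylate",
    "Reduce (chemical reduction)", "Sumoylate", "Ubiquitinate",
}

# Sub-interaction (secondary) verbs.
SECONDARY_VERBS = {
    "Acetylation", "Amidation", "Autophosphorylation", "Cleavage",
    "Deacetylation", "Degradation", "Dephosphorylation", "Glycosylation",
    "Hydroxylation", "Internalization", "Methylation", "Neddylation",
    "Nitration", "Oxidation", "Phosphorylation", "Polymerization",
    "Prenylation", "Pro-cleavage", "Reduction (chemical)", "Release",
    "Secretion", "Sumoylation", "Ubiquitination",
}


def is_valid_effect(primary, secondary=None):
    """Check a primary verb, optionally combined with a secondary verb."""
    if primary in SUPPORTED_VERBS:
        # Assumption: the secondary verb may be omitted.
        return secondary is None or secondary in SECONDARY_VERBS
    if primary in STANDALONE_VERBS:
        return secondary is None
    return False


print(is_valid_effect("Bind"))
print(is_valid_effect("Increases", "Phosphorylation"))
print(is_valid_effect("Bind", "Phosphorylation"))
```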

TY Type  

The version update also takes into consideration the "nature of interaction", i.e. Direct/Indirect. For example, although the verbs bind and associate have the same meaning, in the database "Bind" and "Associate" are used to separate molecules that show physical interactions from those that do not. Bind is used to mark a physical binding, hence the nature is Direct, whereas Associate is used for non-physical interactions or when the interaction between two molecules is uncertain, hence the nature is Indirect.

The reactions from NetPro™ are depicted as indirect if the interaction nature is indirect. Otherwise, if the interaction nature is direct, they are depicted as semantic.

Interaction nature is the nature of the link between two molecules that are related by an interaction verb. This slot defines whether the effect of one molecule on another is a direct or an indirect event.

direct (semantic)

  1. A physical association between two entities specifies a 'direct' interaction.
  2. During post-translational modification, when it is known that molecule A is directly involved in the event. Example: kinases, phosphatases, etc.
  3. When the abstract describes a binding event between the molecules and the same molecules are involved in another interaction. Example:
    1. EGF activates EGFR by direct binding.
    2. NFKB1 activates COX2 promoter by direct binding to NFKappaB binding site.
    3. P38 kinase phosphorylates ATF6 (this is a direct event).
    4. Casp3 cleaves Grap2 (this is a direct event).

indirect

  1. If cellular stimulation using one molecule affects the status of a second molecule, and if we are not sure whether there is a direct association between these two entities. Example:
    1. Stimulation of cells with IL6 increases expression of COX2. In this case, IL6 does not directly mediate increase in COX2 expression. (This is an indirect event).
    2. EGF induces phosphorylation of EGFR. Here, EGF promotes autophosphorylation of EGFR (This is an indirect event).
    3. TGF beta1 activates p38 kinase. Here, TGF beta1 doesn't physically interact with p38 kinase (This is an indirect event).
  2. Sometimes, for interactions where the verb is Interact, Complex formation, Associate or Bind, the nature of interaction is taken as 'indirect' if the abstract says so and if more than two molecules are part of the complex/immunoprecipitate. Example:
    1. The transmembrane adapter LAT coprecipitates with SLP-76 and PLCgamma2, as well as with a number of other adapter proteins, some of which have not been previously described in platelets, including Cbl, Grb2, Gads, and SKAP-HOM. The interaction/association between LAT and SLP-76 is an indirect event.
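The mapping from interaction nature to depiction described for the TY field can be sketched in Python as follows. The function names are hypothetical. The verb-based default covers only the Bind/Associate convention; as the rules above note, the nature recorded for a given interaction can differ from this default, so the recorded nature should take precedence where it is available.

```python
# Default nature suggested by the Bind/Associate convention.
DEFAULT_NATURE = {"Bind": "direct", "Associate": "indirect"}


def depiction(nature):
    """Map an interaction nature to how the reaction is depicted."""
    nature = nature.strip().lower()
    if nature == "indirect":
        return "indirect"
    if nature == "direct":
        return "semantic"
    raise ValueError(f"unknown interaction nature: {nature!r}")


def default_depiction(verb):
    """Depiction implied by the verb alone, or None if the verb gives no hint."""
    nature = DEFAULT_NATURE.get(verb)
    return None if nature is None else depiction(nature)


print(depiction("Direct"))
print(default_depiction("Associate"))
```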

CC Comments   This field contains different categories of annotations to the reaction:

Evidence line

The statement(s) taken from the abstract that are used to curate information about the interaction.

Pathway

Pathway/sub-pathway ontologies in NetPro™ have been developed in-house, compliant with the Gene Ontology (GO). The context of the abstract is used as a guideline in determining the pathway combination to which the interaction can be mapped. Generally, for an indirect interaction, the pathway through which Molecule A affects Molecule B is captured, if it can be determined. For an interaction of direct nature, it can be the pathway/process that leads to the interaction. Sometimes the biological process to which the interacting molecules contribute is also captured in this field.

For example: Morphogenesis:Ossification/osteogenesis

The main pathway always precedes a sub-pathway. If a sub-pathway is not found in the controlled vocabulary (CV), the main pathway alone is captured to keep the data comprehensive.

Example: Modification-dependent protein catabolism:Ubiquitin-dependent protein catabolism

Multiple pathways, separated by semicolons, can be present in an interaction. Example: Modification-dependent protein catabolism:Ubiquitin-dependent protein catabolism; Transmembrane receptor protein tyrosine kinase signaling pathway:Epidermal growth factor receptor signaling pathway; Cell differentiation:Epidermal cell differentiation
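The pathway notation (main pathway, a colon, an optional sub-pathway, with multiple pathways separated by semicolons) can be split mechanically. The Python sketch below is a minimal illustration using the example string from this section; the function name `parse_pathways` is hypothetical.

```python
def parse_pathways(field):
    """Split a pathway field into (main pathway, sub-pathway or None) pairs."""
    pathways = []
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        # The main pathway always precedes the sub-pathway.
        main, _, sub = item.partition(":")
        pathways.append((main.strip(), sub.strip() or None))
    return pathways


example = (
    "Modification-dependent protein catabolism:"
    "Ubiquitin-dependent protein catabolism; "
    "Transmembrane receptor protein tyrosine kinase signaling pathway:"
    "Epidermal growth factor receptor signaling pathway; "
    "Cell differentiation:Epidermal cell differentiation"
)

for main, sub in parse_pathways(example):
    print(main, "|", sub)
```

An entry without a colon is returned with None as its sub-pathway, which corresponds to the case where only the main pathway is captured.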

Disease associations

Disease entry shows disease information mined from the literature. A controlled vocabulary derived from MeSH or OMIM is used to capture disease records. In case a disease could not be mapped to any public database, the same description as used by the authors is captured.

Disease conditions are associated with an interaction through the following fields:

  • disease, the name of the disease pertaining to the interaction. Even if only one of the molecules (A or B) is involved in a disease, the name of the disease is captured in the interaction field.
  • expression, any altered expression pattern of the molecules (A or B) in the disease condition.
  • mutation, mutation(s) in Molecules A or B in the disease condition.
  • significance, any other significance of the molecules (A or B) in the disease condition. For example, if the molecule can be of therapeutic value, the information is captured in this field.

Positive regulator/ Negative regulator/ Regulator

Information regarding any protein/small molecule/condition/post-translational modification/biological process that influences the interaction between the two molecules is recorded in these fields, depending on whether the influence is positive, negative or unspecified. Also, if the specific interaction is enhanced or suppressed in a disease state, the disease is recorded in these fields, which means that the interaction is quantitatively affected during the disease state. In addition, a time- or temperature-dependent increase or decrease in the influence on molecule B is also recorded in these fields. E.g. Long time treatment, Low concentration.

Coactivator/ Corepressor

Information regarding molecules that act as coactivators/corepressors, or that are defined as coregulators by the authors, in affecting the transcriptional activity of Molecule A.

Domain/motif/site/residue

In interactions with the verb 'bind' or 'associate', the information in this field refers to the domains/motifs or residues involved in the physical association of the two molecules. In other interaction records, which depict the influence of one molecule on the other, entries in this field correspond to the region(s) of Molecule A that affect Molecule B or the region(s) of Molecule B that are affected.
Format of entry: Domain/motif/site/residue information followed by molecule name in brackets; Domain/motif/residue (Molecule A/B).
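An entry in this format can be separated into its region and molecule parts by taking the last bracketed term as the molecule name. The Python sketch below is an illustration only; the function name `parse_domain_entry` and the example entries are hypothetical and not taken from NetPro™.

```python
import re

# Region text followed by the molecule name in the final pair of brackets.
DOMAIN_RE = re.compile(r"^\s*(?P<region>.+?)\s*\((?P<molecule>[^()]+)\)\s*$")


def parse_domain_entry(entry):
    """Return (region, molecule) for 'Domain/motif/residue (Molecule)'."""
    match = DOMAIN_RE.match(entry)
    if match is None:
        return None
    return match.group("region"), match.group("molecule").strip()


# Hypothetical entries, for illustration only.
print(parse_domain_entry("SH2 domain (GRB2)"))
print(parse_domain_entry("Tyr(1068) residue (EGFR)"))
```

Anchoring the molecule name at the end of the entry lets the region text itself contain brackets, as in the second example.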

Function

Contains information about the effect of the interaction, which is generally a change in a biological process.

Property

Contains information regarding any specific property of the molecules A or B involved in the interaction, such as post-translational modification(s) or isoform information. E.g. Thr298 phosphorylated EGFR.

Condition

Entries in this field substantiate the location entry, which gives information on where the interaction takes place. The in vivo or in vitro condition under which the experiment was carried out is captured in this field.
E.g. Streptozocin/Alloxan induced diabetes, UV treated PC12 cells, Diabetic patient.

General

Contains information about the interaction that does not fit into any of the fields above.

CP Location positive and experiment(s)   Location comprises the data regarding the site of occurrence of the interaction, whenever it is given in the literature. To make this field more granular, the information is split into the four fields given below.

species, the age, strain or genotype of the experimental system is also recorded in this field. E.g. Wistar rat (Young), Igf1r-/- mouse

organ, e.g. Liver, Heart

cell, e.g. Hepatocytes, PC5 cells

subcellular, the subcellular location is generally captured for translocation/secretion-related interactions; the prefix 'From/To' is used to indicate the direction of movement. E.g. Cytoplasm to nucleus.
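A subcellular entry that describes a movement, such as "Cytoplasm to nucleus", can be split into its source and destination. The Python sketch below is a minimal illustration; it assumes that such entries follow the pattern "[From] X to Y", which is inferred from the single example above, and the function name `parse_translocation` is hypothetical.

```python
import re

# Optional leading "From", then source, "to", destination (case-insensitive).
MOVE_RE = re.compile(
    r"^\s*(?:from\s+)?(?P<src>.+?)\s+to\s+(?P<dst>.+?)\s*$",
    re.IGNORECASE,
)


def parse_translocation(entry):
    """Return (source, destination), or None if no movement is described."""
    match = MOVE_RE.match(entry)
    if match is None:
        return None
    return match.group("src"), match.group("dst")


print(parse_translocation("Cytoplasm to nucleus"))
print(parse_translocation("Nucleus"))
```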

Experimental method is the visualization method mentioned in a study to validate an interaction. A controlled vocabulary for experimental methods has been developed in-house and is used to populate this field whenever data is available from the literature.
MB Molecule/gene upstream   All molecules/genes which the reaction consumes, receives a signal from, or represents an interaction for.
MA Molecule/gene downstream   All the molecules/genes which the reaction produces or signals to.
RN Reference number   [consecutive entry reference number].
A list of the papers from which the information in this entry was extracted.
RX Medline database hyperlink   The PMID number in PubMed.
RA Reference author(s)   List of authors.
RT Reference title   Title of the paper.
RL Reference publication   Publication details.
