Quality Management in the Process of Annotation for TRANSFAC

Mathias Krull (mkl@biobase.de), Edgar Wingender (ewi@biobase.de)
Biobase Biological Databases GmbH, Mascheroder Weg 1b, D-38124 Braunschweig

Introduction

The process of data annotation and the quality of its results depend mainly on the qualification, knowledge and experience of the annotator. Annotation as a process can hardly be described in full detail, as it consists of an enormous number of individual intellectual tasks. Nevertheless, it is essential to define this process and to standardize it to an extent that ensures the consistency of the extracted data and the compatibility of the database entries with the database query language in use. There should be a pool for all quality-relevant information, comprising the internal database documentation, conventions, syntax rules and format statements. This information pool is best integrated into the information system of a company's intranet.

High Quality Annotation

The quality of the data annotation process can be assessed by its products. But what are high-quality data entries? Some criteria can be set up that are closely linked to the annotation process (Fig. 1). The main focus is, of course, on accuracy. No error of any kind should occur: syntax error, systematic error, textual error or typing error. Zero tolerance for errors is an important principle of modern quality management.
There should be no redundancy in the database, and the complete extraction of the given, relevant data is a desirable aim. Other criteria are the degree of integration and the analysis and densification of data, i.e. their transfer to a higher information level (e.g. the generation of TRANSFAC matrices).

Finally, as quality is the fulfillment of customer requirements (ISO 8402), the implemented data have to be fit for use.

Fig. 1: Criteria for the quality of database entries

Fig. 2 shows a simple model for the regulation of quality, by analogy with control theory. The annotator acts as the controller of the data annotation process (the controlled system) and can draw on the pool of quality data. The regular annotator meeting serves as a communication tool, important for the transfer of knowledge and for the definition of format, syntax and other conventions. The aim of the internal documentation system is to collect all this information; it is itself part of a company-wide information system.

Fig. 2: Model for quality regulation in data annotation

Internal Documentation for TRANSFAC

The internal documentation for TRANSFAC exists as a set of browsable, HTML-formatted files that are accessible through a portal site. It is integrated into the BIOBASE annotation homepage. As the documentation contains proprietary information of BIOBASE, it is for internal use only.

The central document of the portal site contains a short description of the aims and the scope of the documentation. It visualizes the process of data annotation using a flowchart. Each step of the process is linked to informational documents that set the standards for the implementation of data into TRANSFAC. The whole internal documentation is reviewed and updated continuously.
The informational documents include two flowcharts (Fig. 3 and Fig. 4) that show the general processes for the implementation of factor, gene and site data into TRANSFAC. Links to database search engines such as SRS (Sequence Retrieval System) and to so-called "checklists" for the different entry masks of the current TRANSFAC client (Fig. 5) are integrated into the charts.

Fig. 3: Flowchart for the process of factor data entry

Fig. 4: Flowchart for the process of gene and site data entry

Fig. 5: Example of a checklist from the internal TRANSFAC documentation, containing information about functions, conventions and syntax for a certain TRANSFAC database table

The checklists contain information about function, syntax and conventions for every single field and button of the entry masks. They also list the possible links that can be set to internal tables and external databases like EMBL or SwissProt.
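
One way to picture such a checklist in machine-readable form is sketched below. This is only an illustration, not the format of the BIOBASE documentation: the field names, the syntax patterns and the helpers `FieldRule`, `SITE_CHECKLIST` and `validate_entry` are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    """One checklist row: what a field is for and which syntax it must follow."""
    function: str
    pattern: str          # regular expression the whole field value must match
    required: bool = True


# Hypothetical checklist for a hypothetical "site" entry mask.
# Field names and patterns are illustrative assumptions only.
SITE_CHECKLIST = {
    "accession": FieldRule("unique identifier of the entry", r"R\d{5}"),
    "sequence": FieldRule("binding site sequence", r"[ACGTN]+"),
    "embl_link": FieldRule("link to an external database entry",
                           r"[A-Z]{1,2}\d{5,6}", required=False),
}


def validate_entry(entry, checklist):
    """Return a list of violations; an empty list means the entry passes."""
    problems = []
    for name, rule in checklist.items():
        value = entry.get(name, "")
        if not value:
            # Empty optional fields are fine, empty required fields are not.
            if rule.required:
                problems.append(f"{name}: missing required value")
            continue
        if re.fullmatch(rule.pattern, value) is None:
            problems.append(f"{name}: {value!r} violates the syntax rule")
    # Fields the checklist does not know about are reported as well.
    for name in entry:
        if name not in checklist:
            problems.append(f"{name}: field not defined in checklist")
    return problems
```

Calling `validate_entry(entry, SITE_CHECKLIST)` on a dictionary of field values returns the list of convention violations, so the same rules that annotators read in the checklist could in principle also be checked automatically at entry time.
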
The internal documentation is part of the quality improvement efforts for TRANSFAC. In addition, there are several consistency checks, e.g. automatic SQL queries or the reconciliation of site sequence data with EMBL. Quality management is thus increasingly coming into focus at BIOBASE.
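
As an illustration of what automatic SQL consistency queries might look like, the following minimal sketch runs two checks against an in-memory SQLite database. The table and column names (`gene`, `site`, `gene_id`, `sequence`) and the functions `find_inconsistencies` and `build_demo_db` are hypothetical and do not reflect the actual TRANSFAC schema or the checks used at BIOBASE.

```python
import sqlite3


def find_inconsistencies(conn):
    """Run two consistency queries and return their result rows by check name."""
    checks = {
        # Sites whose gene reference points at no existing gene entry.
        "orphan_sites": """
            SELECT s.site_id FROM site AS s
            LEFT JOIN gene AS g ON g.gene_id = s.gene_id
            WHERE g.gene_id IS NULL
            ORDER BY s.site_id
        """,
        # The same sequence stored more than once for one gene (redundancy).
        "duplicate_sites": """
            SELECT gene_id, sequence, COUNT(*) FROM site
            GROUP BY gene_id, sequence
            HAVING COUNT(*) > 1
            ORDER BY gene_id, sequence
        """,
    }
    return {name: conn.execute(sql).fetchall() for name, sql in checks.items()}


def build_demo_db():
    """Create a tiny in-memory database with one orphan and one duplicate."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene (gene_id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE site (site_id TEXT PRIMARY KEY, gene_id TEXT, sequence TEXT);
        INSERT INTO gene VALUES ('G1', 'demo gene');
        INSERT INTO site VALUES ('S1', 'G1', 'ACGT');
        INSERT INTO site VALUES ('S2', 'G1', 'ACGT');
        INSERT INTO site VALUES ('S3', 'G9', 'TTGA');
    """)
    return conn
```

Running `find_inconsistencies(build_demo_db())` reports site `S3` as referring to a missing gene and the sequence `ACGT` as entered twice for gene `G1`. Checks of this kind address two of the quality criteria named earlier, accuracy and absence of redundancy.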

Literature

Pfeifer, T. (1996): "Qualitätsmanagement: Strategien, Methoden, Techniken"; 2nd edition, Hanser Verlag, Munich, Vienna