T. N. Bhat
Biochemical Science Division
National
A. Wlodawer and J. Vondrasek
NCI/FCRDC,
· How to Use this Database
· Credits
· Annotation of inhibitors and search using data-tree
The cure for AIDS is still far from a reality. Almost all current methods of AIDS treatment fall into the category of either containment or prevention. Advancing our knowledge of the AIDS virus offers the promise of developing effective treatments for the disease. Currently, much of the research on the treatment of AIDS is directed towards either vaccine development or drug development. Although several promising vaccine leads have been reported (Shiver et al. 2002; Barouch et al. 2002), no effective vaccine has been developed at this time (Veljkovic et al. 2002). Virtually all vaccines work by stimulating the immune system to make antibodies that target invading microbes, coat them and tag them for destruction. In the case of AIDS, it has been very hard to identify what type of antibody, if any, actually protects a person against infection, not to mention how to get the immune system to make such a substance. For that reason, much of the AIDS research in recent years has focused on vaccines that do not prevent infection, but prime the body to fight a permanent war of suppression through ‘cell-mediated’ immunity (Sasaki et al. 1999; Toda et al. 1997).
Another approach to AIDS treatment is the use of drugs that selectively inhibit specific molecules such as the HIV protease (Gulick et al. 1997). In fact, such drugs provide the only proven method for the treatment of AIDS. Drugs specifically designed to inhibit the HIV protease, an aspartic protease that carries out the posttranslational processing of the viral gag-pol polypeptide into functional viral components, have been quite successful. The processing of the gag-pol translational product by HIV protease releases the viral replication enzymes (protease, reverse transcriptase/ribonuclease H, and integrase) (Kohl et al. 1988). This activity is essential for the viral life cycle, and therefore disrupting the proteolytic activity through inhibitors results in non-infectious virions, preventing infection (Lambert et al. 1992). For this reason, a concentrated effort by many laboratories has gone into the elucidation of enzyme/inhibitor interactions of this and other enzymes, and many efforts have focused on developing strategies for disrupting critical macromolecular interactions required for the viral life cycle (Turner and Summers 1999). Thus, there is a critical need for structural information on these systems as long as drug development for the treatment of AIDS is a work in progress. Support and infrastructure for such activities are all the more important at this time.
Need for a central structural information resource on AIDS related macromolecules: Enormous resources have been brought to bear by the research community on drug discovery activities that target various molecules associated with the AIDS virus. Frequently, the scientists associated with these efforts publish the results and deposit structural information into the Protein Data Bank (PDB, Berman et al. 2000). This trend began changing in the late nineties owing to changing emphases in industry and academia. The changes are particularly prominent in technological research like that associated with structure-based drug design. Completely refining the structure of a drug-protein complex, publishing the results and depositing the structural data in the PDB require time and resources that may not advance the overall goals of the effort. Thus, these activities may be difficult to justify as structure solution becomes more and more a means to an end rather than a large-scale scientific endeavor. As a result, structural results from much of the research on HIV drug development are not reaching the PDB, nor are they available in any other form to related research efforts. Whether suitable for publication or not, structural results are unique and provide road maps in the march towards technological development and drug discovery. For this reason, Alex Wlodawer at NCI developed a special archival and distribution facility for structural results on HIV protease (Vondrasek and Wlodawer 2002). The goal of this resource was to archive and distribute structural results and associated data on HIV protease from as many sources as possible, regardless of whether they were published or not.
Why NIST and how does it fit with the overall mission of NIST: NIST, particularly CSTL, has been focusing on data-related work on biological and chemical-structure-based data collections (see, for instance, | http://webbook.nist.gov/chemistry/ and | http://www.nist.gov/srd/ and, for prior work, | PDB ). The production and dissemination of chemical information in NIST Standard Reference Data collections is part of the NIST mission. This program intends to provide advances in the annotation and dissemination of AIDS related ligand data for AIDS research, with particular emphasis on drug design interests.
Data Standards: The Chemical Semantic Web appears to be a possible solution for the future needs of chemical databases. For this reason, HIVSDB efforts are also focused on developing Semantic Web technology ( Semantic Web ) using AIDS inhibitors.
Industrial interest: The majority of drug development activities for AIDS have been carried out by industry and this continues to be the case. The proposed HIVSDB is expected to be an archive and distribution system for structural data on both wild type and mutant enzyme complexes with AIDS drugs. Drug resistance mutations are the most troubling aspect of AIDS drug development (John, Marra and Ensom 2001), and the analysis and annotation of structural data are crucial for elucidating how to circumvent this problem. The HIVSDB is expected to actively facilitate this work.
Healthcare: Since its outbreak, AIDS has caused the deaths of more than 20 million people, and the death toll is expected to triple by 2010. It has shattered millions of families and orphaned more than 14 million children. AIDS is a major health concern in both the East and the West and in both developed and developing countries. The rising infection rate of the virus and its ability to develop drug resistance have made effective treatment and eradication an international imperative. The cure for AIDS is still far from a reality. The most proven approach to AIDS treatment is the use of drugs that selectively inhibit specific molecules such as the HIV protease. In fact, such drugs provide the only proven method for the treatment of AIDS. Drugs specifically designed to inhibit the HIV protease, an aspartic protease that carries out the posttranslational processing of the viral gag-pol polypeptide into functional viral components, have been quite successful. The proposal is to develop a centralized archival, annotation, and distribution system for structural data on AIDS.
Uniqueness of the HIVSDB: The HIVSDB is quite different from other structural biology resources such as the PDB (Berman et al. 2000). The HIVSDB holdings include structural data from the PDB, but also structural data supplied directly by industrial and other laboratories. In addition, value-added data obtained from disparate sources are provided for each entry through the annotation process. The goal is to provide those investigating AIDS drug targets with a centralized structural data resource.
Unlike HIVSDB, the PDB does not hold two-dimensional structural, biological, or drug resistance data. Rational drug design carried out by industry critically depends on a comprehensive collection of all related data.
Annotation work carried out by the PDB focuses on user-provided and value-added data. HIVSDB has maintained the far more ambitious goal of providing both value-added and evaluated data to the public. At present, technology and standards for providing evaluated structural data on macromolecules are not available. The proposal for HIVSDB is to develop and deploy such technology on an important subset of the structural data available in the public domain, using the new concepts of Chem-BLAST and the Semantic Web Use Case.
Annotation of inhibitors: |(Inhibitor Searches using data-tree) Despite the wide and expanding availability and use of chemical and biochemical data collections, the ability to organize and retrieve structure-based data remains primitive. While it is possible to readily find compounds whose structures are known in advance, the ability of a user or automated search method to find similar substances in large, complex structural collections is generally unsatisfactory. Such searching or browsing serves at least two purposes: 1) to find the most closely related information when data for a specific substance are not available and 2) to enable users to discover compounds with desired structural characteristics. The principal difficulty in such searching is that structural features of interest to a user often cannot be defined (and indexed) in advance, owing to the natural complexity of structure/property relations, which can depend on discipline, task and user.
One of the objectives of the proposal is to organize the inhibitor data in a tree-like arrangement and to develop sophisticated navigation tools for the web interface. A sample of the inhibitor data tree is shown below (Fig. 1), and examples may be found at Chem-BLAST for AIDS inhibitors.

The data-tree described above is novel and provides dynamic navigation paths for a user. For instance, a user may start with a ring and get all ligands with ring structures. Alternatively, a user may start with a given ligand, such as that of 1HPX, and traverse back up the data tree to locate 1AAQ and 1HPX, both of which have the TBA and PHE motif. The utility of a data-tree increases with the number, size and complexity of the molecules of interest.
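
The two navigation directions described above can be sketched with a small tree in which each node is a structural feature and records the entries whose ligands contain it. This Python sketch is illustrative only: the node layout, the function names and the placement of fragments in the hierarchy are assumptions made for the example, not the HIVSDB implementation. Only the entry codes 1HPX and 1AAQ and the fragment labels TBA and PHE come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One structural feature in a hypothetical inhibitor data-tree."""
    name: str
    entries: set = field(default_factory=set)    # PDB IDs whose ligand has this feature
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child


def entries_below(node):
    """Walk down from a node and collect every entry in its subtree."""
    found = set(node.entries)
    for child in node.children:
        found |= entries_below(child)
    return found


def features_of(node, pdb_id, path=()):
    """Yield the path from the root to every node that lists the given entry."""
    here = path + (node.name,)
    if pdb_id in node.entries:
        yield here
    for child in node.children:
        yield from features_of(child, pdb_id, here)


# Toy tree; the hierarchy itself is invented for illustration.
root = Node("ligand")
ring = root.add(Node("ring"))
phe = ring.add(Node("PHE", entries={"1HPX", "1AAQ"}))
tba = root.add(Node("TBA", entries={"1HPX", "1AAQ"}))

ring_ligands = entries_below(ring)                 # start from a ring, move down
paths_for_1hpx = list(features_of(root, "1HPX"))   # start from an entry, trace its features
shared = phe.entries & tba.entries                 # entries that carry both motifs
```

Starting from `ring` collects every entry beneath it, while starting from an entry recovers the features it carries, which can then be intersected to find related entries.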
Impact on PDB data uniformity work: The number of entries in the PDB has more than doubled in the last five years. This growth is likely to be sustained in the future by structural genomics efforts. In view of this growth in the already large PDB holdings, the PDB advisory committee suggested last year that pilot projects on data uniformity be initiated using a subset of the PDB entries. The technology and experience may then be used to annotate the complete holdings. The proposal is to use AIDS related structures in a pilot project for developing and testing methods of annotating structural data, and then to use this experience to annotate PDB entries.
· Information in the database: The following information is given for each entry in this database: the citation and abstract for the data, when available; validation data such as R-factor and resolution; unit cell parameters; crystallization data; and the IUPAC names of the fragments of the inhibitor. The absence of a piece of information indicates that it is not available at present. The database may be queried in two different ways:
o Using information on inhibitors through the data-tree, as described in |Annotation of inhibitors (Chem-BLAST)
o Text search on all data by user input values, as described in |(General Text Searches)
· Using inhibitor information stored in the database: |(Chem-BLAST)
Data are contained in the database as several distinct classes of information (i.e. columns): the PDBID, the method used in the study, citation, abstract, unit cell data, inhibitor names, quality evaluation data such as R-factor and resolution, and so on. Inhibitor data are critically important for successful use of the web resource. For this reason, considerable effort has been invested in annotating and presenting the inhibitor data both as a single molecule and as several smaller standard fragments. A user may type in a text string and select inhibitors using any of the key words that may exist in any of the columns. A user may also choose to view certain optional information such as the abstract, unit cell and refinement parameters. Once a selection is made, the user is presented with a sketch of the molecule, its fragments and certain descriptive information such as the citation. At this point the user may make a new query using new strings in the text box, or may decide to perform fragment searches on the inhibitor. To search by fragment, the user selects one of the fragments displayed immediately after the sketch of the entire molecule. For instance, if the user selects valine, the next page displays all the inhibitor molecules that have valine as a fragment (a total of ~10 pages). The user may then select another fragment, for instance benzyl formate, and the tool will present all the molecules that have both valine and benzyl formate (~2 pages). At this stage the user may select yet another fragment, for instance phenylethanamine, resulting in one page of results. Using this tool, a user may rapidly search for multiple fragments and thus perform homology searches.
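
The successive narrowing described above amounts to intersecting the sets of inhibitors that contain each selected fragment. A minimal Python sketch follows, under stated assumptions: the inhibitor identifiers and their fragment lists are made up for illustration, and `build_index` and `narrow` are hypothetical names rather than part of the HIVSDB tool.

```python
from collections import defaultdict


def build_index(inhibitors):
    """Map each fragment name to the set of inhibitor IDs that contain it."""
    index = defaultdict(set)
    for inhibitor_id, fragments in inhibitors.items():
        for fragment in fragments:
            index[fragment].add(inhibitor_id)
    return index


def narrow(index, selected):
    """Return the inhibitors that contain every selected fragment, sorted by ID."""
    if not selected:
        return []
    hits = None
    for fragment in selected:
        matches = index.get(fragment, set())
        # First fragment seeds the result; each later one can only shrink it.
        hits = set(matches) if hits is None else hits & matches
    return sorted(hits)


# Hypothetical annotation table: inhibitor ID -> standard fragments.
inhibitors = {
    "INH-A": ["valine", "benzyl formate", "phenylethanamine"],
    "INH-B": ["valine", "benzyl formate"],
    "INH-C": ["valine"],
    "INH-D": ["phenylethanamine"],
}
index = build_index(inhibitors)

step1 = narrow(index, ["valine"])
step2 = narrow(index, ["valine", "benzyl formate"])
step3 = narrow(index, ["valine", "benzyl formate", "phenylethanamine"])
```

Each additional fragment can only keep or reduce the result set, which mirrors the drop from many pages of hits to a single page in the walkthrough.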
· Text search on all data by user input values: |(General Text Searches)
Values for data are contained in the database as several distinct classes of information (i.e. columns): the enzyme name, the method used in the study, citation, abstract, unit cell data, inhibitor names, quality evaluation data such as R-factor and resolution, and so on. One may query the database with user-entered input values for a given column by using |user input values. Input text strings may be chosen from one or more key-words found in abstracts, author names, journal information, crystallization data, space group, inhibitor names, mutation information, etc. In this query option, one may use AND or OR to specify multiple key-words. For instance, one may use Bhat AND Erickson, phe OR val, SAIC OR NCI, or Roche OR Dupont. Several AND operators or several OR operators may be used in a given selection box, but AND and OR may not be mixed in the same box.
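
The AND/OR rule above can be illustrated with a short Python sketch. It is not the database's actual query code: the function names, the case-insensitive substring matching and the treatment of AND and OR as uppercase whole words are all assumptions made for the example.

```python
import re


def parse_query(query):
    """Split a query into (operator, key-words); reject mixed AND/OR."""
    tokens = query.split()
    has_and = "AND" in tokens
    has_or = "OR" in tokens
    if has_and and has_or:
        raise ValueError("AND and OR may not be combined in one query")
    operator = "OR" if has_or else "AND"
    # Split on the operator as a whole word so multi-word key-words survive.
    parts = re.split(r"\s+(?:AND|OR)\s+", query.strip())
    keywords = [part.strip() for part in parts if part.strip()]
    if not keywords:
        raise ValueError("empty query")
    return operator, keywords


def matches(record_text, query):
    """Return True if the record text satisfies the query (assumed substring match)."""
    operator, keywords = parse_query(query)
    text = record_text.lower()
    found = [keyword.lower() in text for keyword in keywords]
    return all(found) if operator == "AND" else any(found)


# Hypothetical record text, not a real database entry.
record = "authors: Bhat, Erickson; inhibitor fragments: phe"

both_authors = matches(record, "Bhat AND Erickson")
either_fragment = matches(record, "phe OR val")
wrong_pair = matches(record, "Bhat AND Roche")
```

A query such as `phe AND val OR leu` raises `ValueError` in this sketch, reflecting the restriction that the two operators cannot be used together in one box.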
A user may display and/or download structural data in two formats. A user may choose to download only the inhibitor data or the entire data set. The data download page allows download of already selected data, or of additional selections based on PDBID or text searches.
· Picture gallery |(Picture gallery)
A collection of molecular pictures of the HIV protease is provided.
Primary Correspondence: T. N. Bhat
Contributors: Anh Dao Nguyen, M. D. Prasanna, J. Vondrasek, A. Wlodawer
Prior Publications: Vondrasek, J. and Wlodawer, A. (2002) “HIVdb: a database of the structure of human immunodeficiency virus protease”, Proteins, 49(4) 29 – 31.
A black-on-white title is provided for each web page.
All ‘gif’ representations have been provided with text explanations using ‘ALT=explanation’ in the web pages.
Structural or chemical descriptions, usually given as lengthy text in most other related web resources, have been replaced or augmented in this web resource by 2-D drawings. Standard conventions for bonds and molecules have been used to draw such 2-D sketches. Links for molecular fragments are usually provided using 2-D sketches that denote the fragment selected by the link. Whenever a 2-D drawing is not hyperlinked, an explanation is provided. All atoms are colored according to IUPAC conventions when colors are used. All molecular names are provided using IUPAC conventions.
Descriptions are provided using ‘ALT=description’ in web links that use ‘gif’ files to clarify the result of the action. All text based links are either numbered or preceded by ‘|’ to accommodate customers with difficulties in recognizing links that are shown in color by default.