Biological Macromolecule Crystallization Database (version 4.03)
(c) 1995, 1997, 1998, 2005, 2006, 2008, 2009 copyright by the U.S. Department of Commerce on behalf of the United States. All rights reserved.

Search Engine

The search engine for BMCD release 4.0 was built with Apache Lucene 1.4.3, a high-performance, full-featured text search engine library written entirely in Java. Lucene is suitable for nearly any application that requires full-text search, especially a cross-platform one, and it lets you create your own complex queries (see below).

This page covers the basics of using the BMCD search engine and some features of the Lucene query language. For more information, including details of query syntax, please consult the Apache Lucene Web site.

Query Syntax

A query consists of terms and operators. Search terms are case-insensitive; upper case and lower case are equivalent. There are two types of terms: single terms and phrases. A single term is a single word such as "triclinic" or "watson". A phrase is a group of words surrounded by double quotes such as "dna complex". The double quotes are required: without them, the search engine treats the words as a compound query joined by OR operator(s).
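The distinction between single terms and phrases can be illustrated with a minimal Python sketch. The function split_terms is a hypothetical helper written for this page; it is not part of BMCD or Lucene, and it ignores operators and field names entirely.

```python
import re

# Hypothetical helper (not BMCD or Lucene code): separates quoted phrases
# from single terms and lowercases both, since search terms are
# case-insensitive. Operators and field names are NOT handled here.
_TOKEN = re.compile(r'"([^"]+)"|(\S+)')

def split_terms(query):
    """Return a list of (kind, text) pairs; kind is 'phrase' or 'term'."""
    result = []
    for phrase, term in _TOKEN.findall(query):
        if phrase:
            result.append(("phrase", phrase.lower()))
        else:
            result.append(("term", term.lower()))
    return result

# With quotes the two words stay together as one phrase.
print(split_terms('"dna complex"'))  # [('phrase', 'dna complex')]
# Without quotes they become two separate terms (combined with OR by default).
print(split_terms('dna complex'))    # [('term', 'dna'), ('term', 'complex')]
```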

The search engine supports wildcards in text searches. The wildcard symbols are "?" and "*": "?" matches any single character, and "*" matches any sequence of characters. A term may not begin with a wildcard.
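As a rough illustration of these wildcard rules, the following Python sketch translates a wildcard term into a regular expression. The names wildcard_to_regex and matches are hypothetical, and the sketch assumes that "*" may also match zero characters, as described in the Lucene query syntax documentation. It is not how the search engine itself evaluates wildcards.

```python
import re

def wildcard_to_regex(term):
    """Translate a wildcard term into a compiled, case-insensitive regex.

    Illustrative only: '?' becomes '.', '*' becomes '.*' (assumed to allow
    zero characters), and every other character is matched literally.
    """
    if term[:1] in ("?", "*"):
        raise ValueError("a term may not begin with a wildcard")
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def matches(term, word):
    """Return True if the whole word matches the wildcard term."""
    return wildcard_to_regex(term).fullmatch(word) is not None

print(matches("gluc*", "glucose"))  # True
print(matches("te?t", "text"))      # True
print(matches("te?t", "tet"))       # False: '?' needs exactly one character
```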

The BMCD search engine implements the logical operators AND, OR, NOT, "+" and "-" to enable boolean combinations. The operators AND, OR and NOT must be typed in UPPER CASE. The default operator is OR; thus the query "watson richardson" returns the same results as "watson OR richardson", whereas the query "watson AND richardson" returns only entries that contain both terms, usually a much smaller set. More information can be found at the Apache Lucene Web site mentioned above.
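The difference between the default OR and an explicit AND can be sketched in Python with a toy collection. The entry IDs and word sets below are invented for illustration and do not come from the database; search_or and search_and are hypothetical helpers, not part of the search engine.

```python
# Toy data, invented for illustration: entry ID -> words found in the entry.
ENTRIES = {
    "entry-1": {"watson", "crick", "dna"},
    "entry-2": {"richardson", "protein"},
    "entry-3": {"watson", "richardson"},
}

def search_or(*terms):
    """Entries containing at least one term (the default behaviour)."""
    return {eid for eid, words in ENTRIES.items()
            if any(t.lower() in words for t in terms)}

def search_and(*terms):
    """Entries containing every term."""
    return {eid for eid, words in ENTRIES.items()
            if all(t.lower() in words for t in terms)}

# "watson richardson" behaves like "watson OR richardson".
print(sorted(search_or("watson", "richardson")))
# ['entry-1', 'entry-2', 'entry-3']

# "watson AND richardson" keeps only entries with both terms.
print(sorted(search_and("watson", "richardson")))
# ['entry-3']
```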

Searching by Fields

Ordinary searches cover the entire content of each entry (general search). To restrict a search to a specific field, use a field name from the table below, followed by a colon and then the search item. For example, the query

title:recognition
will find any entry whose publication title includes the word "recognition". The query

title:"recognition helix"
will return the smaller set of entries whose title includes this phrase. Multiple field searches may be combined using boolean operators. A term or phrase not preceded by a field name is searched through the entire entry (general search). However, you may not combine field searching and non-field searching in the same query. To combine the two, use the "Content:" field, which is effectively a general search over all fields. Field names are case-sensitive: all are entirely lower case except "Content".

Here are a few examples of correct syntax for field searches:
title:"HIV-1 protease" AND spgrp:P61
title:antibody AND common_name:mouse
title:antibody AND Content:mouse AND NOT common_name:mouse
chem_name:aden* AND (author:mckay OR author:steitz) 
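
Query strings like the ones above can also be assembled programmatically. The following Python sketch is one possible approach, not part of BMCD: field_clause and all_of are hypothetical helpers, and FIELD_NAMES simply copies the names from the table of searchable field names plus "Content". The check is case-sensitive because field names are case specific.

```python
# Field names copied from the table of searchable field names, plus the
# special "Content" field. The helpers below are hypothetical.
FIELD_NAMES = {
    "mol_id", "macromol", "mol_name", "scientific_name", "common_name",
    "spgrp", "crys_systm", "crysl_methods", "chem_name", "jrnl", "year",
    "title", "pubmed_id", "author", "Content",
}

def field_clause(field, value):
    """Build 'field:value', quoting the value if it is a multi-word phrase."""
    if field not in FIELD_NAMES:  # case-sensitive on purpose
        raise ValueError(f"unknown field name: {field!r}")
    value = value.strip()
    already_quoted = value.startswith('"') and value.endswith('"')
    if " " in value and not already_quoted:
        value = f'"{value}"'
    return f"{field}:{value}"

def all_of(*clauses):
    """Join clauses with AND (the operator must be upper case)."""
    return " AND ".join(clauses)

query = all_of(field_clause("title", "HIV-1 protease"),
               field_clause("spgrp", "P61"))
print(query)  # title:"HIV-1 protease" AND spgrp:P61
```

Because every clause carries a field name, a query built this way never mixes field and non-field searching; a general term would be written as field_clause("Content", "mouse").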

Table of Searchable Field Names

Field name       Description                        Examples
---------------  ---------------------------------  ----------------------------
mol_id           macromolecule entry ID             1GTR or M1E5 (older entries)
macromol         macromolecule description          kinase, protease, antibody
mol_name         molecular name                     hemoglobin, "HIV-1 protease"
scientific_name  scientific name in taxonomy        plasmodium
common_name      common name in taxonomy            malaria
spgrp            space group                        P21, P41212
crys_systm       crystal system                     triclinic
crysl_methods    crystallization methods            "vapor diffusion"
chem_name        chemical additives                 butanol, gluc*
jrnl             journal abbreviation for citation  acta
year             published year of citation         2003
title            article title of citation          ribosome, "cyclin-dependent"
pubmed_id        PubMed identity number             10666622
author           author name                        richardson


NIST is an agency of the U.S. Commerce Department's Technology Administration.