Thursday, 19 December 2013

Understanding the PDB (Protein Data Bank)

What is PDB?

  • A repository for 3-D biological macromolecular structure.
  • All data are available to the public.
  • It includes proteins, nucleic acids and viruses.
  • Structures are obtained by X-ray crystallography (80%) or NMR spectroscopy (16%).
  • Submitted by biologists and biochemists from around the world.
  • PDB is an important resource for research in the academic, pharmaceutical, and biotechnology sectors.
  • Examples of questions it can help answer:
               - Will this molecule turn a cell cancerous?
               - Can this combination of molecules cure the common cold?
               - How does radiation affect RNA and DNA?

 Looking at Structures

Looking at Structures is designed to help you get started with charting a path through this material and to help you avoid a few common pitfalls. These chapters are intertwined with one another. The topics are:
  • PDB Data
    The primary information stored in the PDB archive consists of coordinate files for biological molecules. These files list the atoms in each protein, and their 3D location in space. These files are available in several formats (PDB, mmCIF, XML). A typical PDB formatted file includes a large "header" section of text that summarizes the protein, citation information, and the details of the structure solution, followed by the sequence and a long list of the atoms and their coordinates. The archive also contains the experimental observations that are used to determine these atomic coordinates.
  • Visualizing Structures
    While you can view PDB files directly using a text editor, it is often most useful to use a browsing or visualization program to look at them. Online tools, such as the ones on the RCSB PDB website, allow you to search and explore the information under the PDB header, including information on experimental methods and the chemistry and biology of the protein. Once you have found the PDB entries that you are interested in, you may use visualization programs to read in the PDB file, display the protein structure on your computer, and create custom pictures of it. These programs also often include analysis tools that allow you to measure distances and bond angles, and identify interesting structural features.
  • Reading Coordinate Files
    When you start exploring the structures in the PDB archive, you will need to know a few things about the coordinate files. In a typical entry, you will find a diverse mixture of biological molecules, small molecules, ions, and water. Often, you can use the names and chain IDs to help sort these out. In structures determined from crystallography, atoms are annotated with temperature factors that describe their vibration and occupancies that show if they are seen in several conformations. NMR structures often include several different models of the molecule.
  • Potential Challenges
    You may run into several challenges as you explore the PDB archive. For example, many structures, particularly those determined by crystallography, only include information about part of the functional biological assembly. Fortunately the PDB can help with this. Also, many PDB entries are missing portions of the molecule that were not observed in the experiment. These include structures that contain only alpha carbon positions, structures with missing loops, structures of individual domains, or subunits from a larger molecule. In addition, most of the crystallographic structure entries do not have information on hydrogen atoms.
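
To make the coordinate-file description above a bit more concrete, here is a minimal Python sketch that reads the atom list from PDB-formatted text. It assumes the standard fixed-width ATOM/HETATM column layout of the PDB format. The sample records and their coordinates are made up for illustration, and the names Atom, parse_atom_line and parse_atoms are my own, not part of any PDB tool.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    record: str         # "ATOM" (polymer) or "HETATM" (ligands, ions, water)
    serial: int         # atom serial number
    name: str           # atom name, e.g. "CA" for an alpha carbon
    res_name: str       # residue name, e.g. "MET" or "HOH"
    chain_id: str       # chain identifier
    res_seq: int        # residue sequence number
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float  # temperature factor (B-factor)


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-width ATOM/HETATM record by column position."""
    return Atom(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        temp_factor=float(line[60:66]),
    )


def parse_atoms(text: str) -> List[Atom]:
    """Keep only coordinate records; header and other lines are skipped."""
    return [
        parse_atom_line(line)
        for line in text.splitlines()
        if line.startswith(("ATOM  ", "HETATM"))
    ]


# Made-up sample records (hypothetical values, not from a real entry).
SAMPLE = "\n".join([
    "HEADER    HYPOTHETICAL SAMPLE",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38",
    "HETATM  100  O   HOH A 201      10.000  11.000  12.000  1.00 20.00",
])

atoms = parse_atoms(SAMPLE)
waters = [a for a in atoms if a.res_name == "HOH"]
alpha_carbons = [a for a in atoms if a.name == "CA"]
```

Filtering on res_name and chain_id like this is one simple way to sort the protein atoms from water and small molecules. For real work, a visualization program or a dedicated parsing library is the safer choice.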

Classification                 Molecule                              Polymer  Type
Amylase                        Alpha-1,4-glucan-4-glucanohydrolase   1        Protein
Trypsin                        Beta-trypsin                          1        Protein
Hydrolase (acid proteinase)    Pepsin                                1        Protein
Carboxypeptidase               Human protective protein              1        Protein
HtrA                           Probable serine protease HTRA3        1        Protein


You can search for many more types of proteins on the PDB website. Have a blast :)

SMILES

Simplified Molecular Input Line Entry System

SMILES (Simplified Molecular Input Line Entry System) is a chemical notation that allows a user to represent a chemical structure in a way that can be used by the computer. SMILES is an easily learned and flexible notation. The SMILES notation requires that you learn a handful of rules. You do not need to worry about ambiguous representations because the software will automatically reorder your entry into a unique SMILES string when necessary.

The term SMILES refers to a line notation for encoding molecular structures and specific instances should strictly be called SMILES strings. However, the term SMILES is also commonly used to refer to both a single SMILES string and a number of SMILES strings; the exact meaning is usually apparent from the context. The terms Canonical and Isomeric can lead to some confusion when applied to SMILES. The terms describe different attributes of SMILES strings and are not mutually exclusive.

 






SMILES describes the structure of chemical molecules using short ASCII strings. These strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.

               

Bond type   Symbol
Single      -
Double      =
Triple      #
Aromatic    :

Table of SMILES bond symbols.


  • Canonical SMILES
    The version of the SMILES specification that includes rules for ensuring that each distinct chemical molecule has a single unique SMILES representation.
  • Isomeric SMILES
    The version of the SMILES specification that includes extensions to support the specification of isotopes and chirality.



SMILES also has branches. A branch is represented by enclosing it in parentheses, and branches can be categorised as nested or stacked.
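
As a rough illustration of nested versus stacked branches, the Python sketch below walks through a SMILES string and reports how many branches it opens and how deeply they nest. The function name branch_info is my own, and this only looks at the parentheses. It does not validate the chemistry.

```python
from typing import Tuple


def branch_info(smiles: str) -> Tuple[int, int]:
    """Return (number of branches, deepest nesting level) for a SMILES string.

    Raises ValueError if the parentheses are not balanced.
    """
    depth = 0
    max_depth = 0
    branches = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
            branches += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing parenthesis without an opening one")
    if depth != 0:
        raise ValueError("unclosed branch")
    return branches, max_depth


single = branch_info("CC(C)C")       # one branch
stacked = branch_info("CC(C)(C)C")   # two branches side by side on one atom
nested = branch_info("CC(C(C)C)C")   # a branch inside another branch
```

Stacked branches sit next to each other on the same atom, so the depth stays at 1. Nested branches open a new parenthesis before the previous one is closed, so the depth goes up.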


  Examples of SMILES Bonds
Ethene                    C=C
Chloroethene              ClC=C
1,1-Dichloroethene        ClC(Cl)=C
cis-1,2-Dichloroethene    ClC=CCl (the cis geometry is only encoded in isomeric SMILES, as Cl/C=C\Cl)
Trichloroethene           ClC(Cl)=CCl
Perchloroethene           ClC(Cl)=C(Cl)Cl
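
To see how strings like these break down into atoms, bonds and branches, here is a small Python sketch of a tokenizer. It is only a sketch under stated assumptions: it handles unbracketed atoms from the usual organic subset (B, C, N, O, P, S, F, Cl, Br, I), the four bond symbols and parentheses, and nothing else (no rings, no brackets, no aromatic lowercase atoms). The names TOKEN, tokenize, heavy_atom_counts and bond_symbol_counts are my own.

```python
import re
from collections import Counter
from typing import List

# Two-letter symbols come first so "Cl" is not read as "C" followed by "l".
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#:]|[()]")


def tokenize(smiles: str) -> List[str]:
    """Split a simple SMILES string into atom, bond and parenthesis tokens."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported characters in: " + smiles)
    return tokens


def heavy_atom_counts(smiles: str) -> Counter:
    """Count the non-hydrogen atoms written in the string."""
    return Counter(t for t in tokenize(smiles) if t[0].isalpha())


def bond_symbol_counts(smiles: str) -> Counter:
    """Count the explicitly written bond symbols."""
    return Counter(t for t in tokenize(smiles) if t in "-=#:")


tokens = tokenize("ClC(Cl)=CCl")                # trichloroethene
atoms = heavy_atom_counts("ClC(Cl)=C(Cl)Cl")    # perchloroethene
bonds = bond_symbol_counts("ClC(Cl)=C(Cl)Cl")
```

One thing this makes obvious: chlorine is written Cl with a lowercase L. If you type CI with a capital i, the string is read as a carbon followed by an iodine, which is a different molecule.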
   
     SMILES specifies attached hydrogens and charges in square brackets. Here are some examples:
[H+]    proton
[OH-]    hydroxide anion
[OH3+]    hydronium cation
[Fe++]    iron(II) cation
[NH4+]    ammonium cation
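
The bracket examples above can be pulled apart with a short regular expression. This Python sketch is an assumption-laden simplification: it only understands an element symbol, an optional attached-hydrogen count and an optional charge, and it ignores other bracket features such as isotopes. The names BRACKET_ATOM and parse_bracket_atom are my own.

```python
import re
from typing import Tuple

# [ element, optional H with optional count, optional charge ]
BRACKET_ATOM = re.compile(r"\[([A-Z][a-z]?)(?:H(\d*))?(?:(\++|-+)(\d*))?\]")


def parse_bracket_atom(text: str) -> Tuple[str, int, int]:
    """Return (element, attached hydrogens, charge) for a bracket atom."""
    match = BRACKET_ATOM.fullmatch(text)
    if match is None:
        raise ValueError("not a supported bracket atom: " + text)
    element, h_digits, signs, charge_digits = match.groups()

    # "H" with no number means one hydrogen; no "H" at all means none.
    if h_digits is None:
        hydrogens = 0
    else:
        hydrogens = int(h_digits) if h_digits else 1

    # "++" and "+2" both mean a charge of +2.
    if signs is None:
        charge = 0
    else:
        magnitude = int(charge_digits) if charge_digits else len(signs)
        charge = magnitude if signs[0] == "+" else -magnitude

    return element, hydrogens, charge


examples = ["[H+]", "[OH-]", "[OH3+]", "[Fe++]", "[NH4+]"]
parsed = {text: parse_bracket_atom(text) for text in examples}
```

For example, [NH4+] comes out as nitrogen with four attached hydrogens and a charge of +1, and [Fe++] as iron with no hydrogens and a charge of +2.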
   
 A cyclic compound is a compound in which a series of atoms is connected to form a loop or ring. To get more information on cyclic compounds, you can click HERE.