Nucleotide and amino acid sequences are easily reduced to digital data by using single-letter codes. The nomenclature system adopted in bioinformatics is based on the recommendations of the International Union of Pure and Applied Chemistry (IUPAC).

a) DNA sequences: The symbols used to represent DNA sequence data are A (Adenine), T (Thymine), G (Guanine), C (Cytosine), R (purine: A or G), Y (pyrimidine: C or T), M (amino: A or C) and K (keto: G or T).

b) Amino Acid Sequences of Proteins: In bioinformatics, amino acids are denoted by single letters, e.g., A for Alanine, C for Cysteine, D for Aspartic acid, etc. The amino acid sequences in databases are listed from the N-terminus (at the extreme left of the sequence) to the C-terminus (at the extreme right) of the polypeptide.

c) Types of sequences in nucleotide sequence databases: DNA sequence databases contain a variety of sequence types. A brief description of each type is given below:

  1. cDNA Sequence: A cDNA molecule is obtained by reverse transcription of an RNA molecule. Because the cDNA is obtained from mRNA, it represents only the exon sequences of the genes expressed in the concerned cell/tissue/organism.

  2. Genomic DNA Sequence: These sequences represent the genomic DNA of the organism, irrespective of whether it is expressed or not. When genome sequencing is complete, the database will contain the sequence of the entire genome of the organism. In prokaryotes, the genome usually consists of a single chromosome, while in eukaryotes the term refers to the nuclear DNA.

  3. Expressed Sequence Tag (EST) Sequences: These sequences are obtained by sequencing only a part of each cDNA molecule produced from mRNA. They are dubbed ‘tags’ because they can be used as probes for isolating the concerned genes from genomic DNA.

  4. Genomic Sequence Tag (GST) Sequences: GSTs were developed for identifying the genes of Plasmodium falciparum. It was observed that the enzyme mung bean nuclease (Mnase) cuts P. falciparum genomic DNA between genes. GSTs are generated by sequencing the DNA fragments on either side of the cut points produced by Mnase.

  5. Organellar DNA Sequences: Organellar DNA is the DNA found in mitochondria (mtDNA) and chloroplasts (cpDNA). The sequences of organellar DNA are also compiled in databases.

  6. Sequences of Other Molecules: In addition to the DNA sequences, the sequences of molecules such as tRNA, small RNAs, etc. are also compiled into databases.
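
To make the single-letter coding in (a) concrete, the following is a minimal Python sketch of how the listed nucleotide symbols might be stored and used. It covers only the eight symbols named above. The names IUPAC_DNA, is_valid_dna and matches are hypothetical helpers chosen for illustration, not part of any standard library or database tool.

```python
# Illustrative sketch only: the eight nucleotide symbols listed in (a).
# IUPAC_DNA, is_valid_dna and matches are hypothetical names for this example.

# Each symbol maps to the set of concrete bases it can stand for.
IUPAC_DNA = {
    "A": {"A"},        # Adenine
    "T": {"T"},        # Thymine
    "G": {"G"},        # Guanine
    "C": {"C"},        # Cytosine
    "R": {"A", "G"},   # purine
    "Y": {"C", "T"},   # pyrimidine
    "M": {"A", "C"},   # amino
    "K": {"G", "T"},   # keto
}


def is_valid_dna(seq: str) -> bool:
    """Return True if every character of seq is one of the symbols above."""
    return all(symbol in IUPAC_DNA for symbol in seq.upper())


def matches(pattern: str, seq: str) -> bool:
    """Return True if seq is compatible with pattern, position by position.

    pattern may contain ambiguity symbols (R, Y, M, K); each base of seq
    must belong to the set allowed by the symbol at the same position.
    """
    if len(pattern) != len(seq):
        return False
    return all(
        base in IUPAC_DNA.get(symbol, set())
        for symbol, base in zip(pattern.upper(), seq.upper())
    )
```

For instance, matches("ARYT", "AGCT") holds because G is a purine and C is a pyrimidine, whereas matches("ARYT", "ACCT") does not, since C is not a purine. Storing each symbol as a set of bases keeps the ambiguity codes and the plain bases in one table, so the same lookup serves both validation and comparison.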

