a) Detection of Genes: The first task in identifying genes in a DNA sequence is to determine the correct reading frame. There are three possible reading frames on each of the two strands of the DNA double helix, giving six in total.
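The six reading frames can be enumerated with a short Python sketch. It is illustrative only: the function names are made up for this example, and the sequence is assumed to contain only unambiguous A, C, G and T bases.

```python
# Complement table for the four DNA bases (assumes an unambiguous A/C/G/T sequence).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5'->3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def reading_frames(seq: str) -> dict:
    """Map frame labels (+1..+3, -1..-3) to lists of complete codons."""
    frames = {}
    for sign, strand in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Step through the strand three bases at a time, dropping any
            # incomplete codon at the end.
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            frames[f"{sign}{offset + 1}"] = codons
    return frames


if __name__ == "__main__":
    for label, codons in reading_frames("ATGAAATAG").items():
        print(label, " ".join(codons))
```

Only one of the six frames normally carries the coding sequence of a given gene, which is why the frame has to be settled before anything else.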
Gene prediction programs search either for gene-specific signals such as promoters, splice sites and polyadenylation sites, or for gene content such as open reading frames (ORFs). The programs used for eukaryotes combine the output of several algorithms to generate a whole gene model. These signals and content measures are the features of eukaryotic genes recognized during gene detection.
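As a rough illustration of a content-based search, the sketch below scans all six frames for simple ORFs, defined here as an ATG followed in frame by the first stop codon. This is a deliberate simplification and not how any particular gene prediction program works: it ignores introns, splice sites and promoters, and the function name and the `min_codons` threshold are assumptions made for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (strand, frame, start, end, orf) tuples for simple ATG..stop ORFs.

    Coordinates are 0-based, end-exclusive, and relative to the strand being
    scanned (for "-" that is the reverse complement). The codon count used
    for the length filter includes the stop codon.
    """
    forward = seq.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    results = []
    for strand, dna in (("+", forward), ("-", reverse)):
        for frame in range(3):
            start = None  # position of the currently open ATG, if any
            for i in range(frame, len(dna) - 2, 3):
                codon = dna[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        results.append((strand, frame + 1, start, i + 3, dna[start:i + 3]))
                    start = None
    return results


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTTAGCC"):
        print(orf)
```

In practice the length threshold would be far higher than three codons, since short ORFs occur by chance in any sequence.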
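b) Identification of function of a new gene: The gene sequence is translated into the amino acid sequence of the protein it is expected to encode, and this protein sequence is then compared with a protein database. A program like BLASTX performs both steps: it translates the nucleotide query in all six reading frames and searches a protein database with the translations. (tBLASTx, by contrast, compares the translated query with a translated nucleotide database.)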
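The translation step can be sketched in a few lines of Python using the standard genetic code. The database comparison is not shown, since it requires a search program and a protein database. The function name and the use of "X" for unrecognized codons are choices made for this example.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds, to_stop=True):
    """Translate a coding DNA sequence, read from its first base, into protein."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        # Codons containing ambiguous bases are reported as "X".
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCAAATAG"))
```

The resulting protein string is what would then be submitted to a similarity search against a protein database.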
c) Identification of functional Domains: There are several bioinformatics tools for the identification of protein motifs and protein domains. Some of these tools are PRINTS, PROSITE, SMART, BLOCKS, etc.
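To give a feel for how motif scanning works, the sketch below converts a PROSITE-style pattern into a regular expression and searches a protein sequence with it. It handles only a simplified subset of the pattern syntax (`x`, `[..]`, `{..}`, repeat counts and the `<` / `>` anchors), and the function names are assumptions made for this example rather than part of any PROSITE tool.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a Python regular expression."""
    regex = pattern.rstrip(".")                          # patterns end with a period
    regex = regex.replace("-", "")                       # "-" only separates positions
    regex = regex.replace("x", ".")                      # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")   # {P} = anything except P
    regex = regex.replace("(", "{").replace(")", "}")    # x(2,4) = repeat count
    regex = regex.replace("<", "^").replace(">", "$")    # sequence-end anchors
    return regex


def find_motifs(protein, pattern):
    """Return (position, matched_text) pairs, including overlapping matches."""
    regex = prosite_to_regex(pattern)
    # A lookahead lets the scan restart at every position, so overlapping
    # occurrences are reported as well.
    return [(m.start(), m.group(1)) for m in re.finditer(f"(?=({regex}))", protein)]


if __name__ == "__main__":
    # N-glycosylation site pattern: N, not P, S or T, not P.
    print(find_motifs("MKNGSAANPTQ", "N-{P}-[ST]-{P}."))
```

Short patterns like this one match by chance quite often, so a hit suggests a possible site rather than confirming one.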
d) Detection of Noncoding RNA: Several types of RNA are noncoding, e.g., rRNA, tRNA, small RNA, etc. Of these, rRNAs are the easiest to find; this is done by sequence similarity search.
e) Genome Annotation: Standard genome annotation languages have become widely accepted. GAME (Genome Annotation Markup Elements) is an XML-based language for describing annotations and the experimental evidence supporting them. Similarly, DAS (Distributed Annotation System) is particularly useful for indexing and visualization.
f) Molecular Phylogenesis: DNA and protein sequence data can be used to investigate the evolution of genes and their protein products; this is called molecular phylogeny. Bioinformatics tools are used to determine the phylogenetic relationships, which may be presented in the form of either a phylogenetic tree or a dendrogram.
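As a minimal sketch of how a dendrogram can be derived from sequence data, the example below computes pairwise p-distances (the proportion of differing sites) between aligned sequences and clusters them with UPGMA. UPGMA is chosen here only because it is short to implement; the text does not name a specific method, and real analyses often use other approaches. The sequences and function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Cluster aligned sequences by UPGMA; return the topology as a Newick string."""
    sizes = {name: 1 for name in seqs}  # cluster label -> number of leaves
    dist = {
        frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }
    while len(sizes) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        merged = f"({a},{b})"
        for other in sizes:
            if other not in (a, b):
                # Size-weighted average, i.e. the mean distance over all leaf pairs.
                dist[frozenset((merged, other))] = (
                    dist[frozenset((a, other))] * sizes[a]
                    + dist[frozenset((b, other))] * sizes[b]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


if __name__ == "__main__":
    # Hypothetical aligned sequences, for illustration only.
    aligned = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACCA", "D": "TTGAACCA"}
    print(upgma(aligned))
```

The output records only the branching order; branch lengths are omitted to keep the sketch short.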