Tissue wise SNP Discovery from Expressed Sequence Tags (ESTs)

National Bureau of Animal Genetic Resources, Karnal, India

National Agricultural Bioinformatics Grid



The database consists of tissue-wise SNPs discovered from Expressed Sequence Tags (ESTs) of livestock species available online. The EST data for different tissues were retrieved from the National Center for Biotechnology Information (NCBI). The ESTs were processed to trim low-quality DNA sequences, to remove contaminating sequences such as vector sequences, and to remove repeats.

This was done using the online tool EGassembler (Masoudi-Nejad et al., 2006), which performs all of these tasks as a sequence of operations.

The Repbase repeat library was selected to mask the repeats, and NCBI's core vector library was used to mask the vector sequences. The processed sequences were therefore free of low-quality, vector, and repetitive sequences. The cleaned EST sequences were then assembled into contigs using the CAP3 tool (Huang and Madan, 1999).

The contigs assembled by the CAP3 program were analyzed for single nucleotide polymorphisms (SNPs) using the QualitySNP tool (Tang et al., 2006), which is freely available online.
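To illustrate the idea of scanning a contig alignment for candidate SNPs, here is a minimal Python sketch. It is not the QualitySNP algorithm, which is haplotype-based and also uses sequence quality. It is a simplified stand-in that flags an alignment column when the second most common base is supported by at least `min_minor_count` reads. The function name, the input layout (equal-length aligned read strings with `-` for gaps or missing coverage), and the threshold are assumptions made for this example.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_minor_count=2):
    """Return (column, major_allele, minor_allele) for candidate SNP columns.

    aligned_reads: equal-length strings, one per EST in the contig,
    with '-' where a read has a gap or does not cover the column.
    Simplified minor-allele-count filter, NOT the QualitySNP method.
    """
    if not aligned_reads:
        return []
    snps = []
    for col in range(len(aligned_reads[0])):
        # Count only unambiguous bases in this column.
        counts = Counter(r[col] for r in aligned_reads if r[col] in "ACGT")
        top_two = counts.most_common(2)
        # Require a second allele seen in enough reads to be plausible.
        if len(top_two) == 2 and top_two[1][1] >= min_minor_count:
            snps.append((col, top_two[0][0], top_two[1][0]))
    return snps


if __name__ == "__main__":
    reads = ["ACGTA", "ACGTA", "ATGTA", "ATGTA", "ACGTC"]
    print(candidate_snps(reads))  # [(1, 'C', 'T')]
```

In the toy alignment, column 1 is reported because both C and T occur in at least two reads, while the single C at column 4 is treated as a likely sequencing error.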

The contigs were searched against the genome assembly of the respective species in the UCSC Genome Browser using the BLAT tool. Perl scripts were written to find the location of each SNP in the alignment between the contig and the genomic sequence reported by BLAT. These SNPs were then checked for their presence in the dbSNP database as available in Ensembl BioMart. Additional information, such as the Ensembl gene id and the Ensembl gene name, was also retrieved.
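The original Perl scripts are not shown here. As a rough illustration of the coordinate mapping they would need to perform, the following Python sketch converts a SNP position on a contig into a genomic position, assuming the BLAT result is in PSL format and that all positions are 0-based. The function names are hypothetical. In PSL nucleotide alignments on the `-` strand, the block starts on the query side are given in reverse-complemented query coordinates, so the contig position is flipped before the block lookup.

```python
def parse_psl_line(line):
    """Extract the fields needed for coordinate mapping from one PSL row."""
    f = line.rstrip("\n").split("\t")

    def to_ints(s):
        # PSL lists are comma-separated with a trailing comma.
        return [int(x) for x in s.split(",") if x]

    return {
        "strand": f[8],
        "q_name": f[9],
        "q_size": int(f[10]),
        "t_name": f[13],
        "block_sizes": to_ints(f[18]),
        "q_starts": to_ints(f[19]),
        "t_starts": to_ints(f[20]),
    }


def contig_to_genome(pos, aln):
    """Map a 0-based contig position to a 0-based genomic position.

    Returns None when the position falls in an unaligned part of the contig.
    """
    # On the minus strand, block query starts refer to the reverse complement.
    q = pos if aln["strand"] == "+" else aln["q_size"] - 1 - pos
    for size, qs, ts in zip(aln["block_sizes"], aln["q_starts"], aln["t_starts"]):
        if qs <= q < qs + size:
            return ts + (q - qs)
    return None
```

A position that falls between two aligned blocks returns `None`, which is a design choice of this sketch: such SNPs cannot be placed on the genome from the alignment alone.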

The database of tissue-wise SNPs was developed using MySQL, and the user interface was developed using PHP. This user-friendly web interface lets users retrieve SNPs by output parameters such as chromosome name, contig name, gene name, and dbSNP id.

The database can be searched by species and tissue. For each SNP, it stores the contig position, high-quality SNP status, reliable SNP status, chromosome name, chromosome position, BLAT alignment strand of the contig, Ensembl allele strand, QualitySNP alleles, Ensembl alleles, Ensembl gene id, and Ensembl gene name.
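The kind of search described above can be sketched as follows. The actual database uses MySQL with a PHP front end and its schema is not published here, so this Python example uses the standard-library `sqlite3` module as a stand-in. The table name `snp`, all column names, and the demo rows are hypothetical placeholders, not data from the database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with hypothetical columns and placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE snp (
               species TEXT, tissue TEXT, contig_name TEXT,
               contig_position INTEGER, chromosome TEXT,
               chromosome_position INTEGER, blat_strand TEXT,
               qualitysnp_alleles TEXT, dbsnp_id TEXT,
               ensembl_gene_id TEXT, ensembl_gene_name TEXT)"""
    )
    rows = [  # placeholder values for illustration only
        ("cattle", "liver", "Contig1", 60, "5", 1210, "+", "C/T", "rsPLACEHOLDER1", "GENE_ID_1", "GENE_A"),
        ("cattle", "liver", "Contig2", 15, "7", 5015, "-", "A/G", None, "GENE_ID_2", "GENE_B"),
        ("cattle", "muscle", "Contig9", 33, "5", 9033, "+", "G/T", None, None, None),
    ]
    conn.executemany("INSERT INTO snp VALUES (?,?,?,?,?,?,?,?,?,?,?)", rows)
    return conn


def search_snps(conn, species, tissue, chromosome=None):
    """Return SNP rows for a species and tissue, optionally limited to one chromosome."""
    sql = ("SELECT contig_name, contig_position, chromosome, "
           "chromosome_position, dbsnp_id FROM snp "
           "WHERE species = ? AND tissue = ?")
    params = [species, tissue]
    if chromosome is not None:
        sql += " AND chromosome = ?"
        params.append(chromosome)
    sql += " ORDER BY contig_name, contig_position"
    # Parameterized queries keep user input out of the SQL text.
    return conn.execute(sql, params).fetchall()
```

Calling `search_snps(build_demo_db(), "cattle", "liver")` returns the two placeholder liver rows. Passing `chromosome="5"` narrows the result to the first of them.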

The dbSNP id and the Ensembl gene id are linked to the dbSNP database and the Ensembl genome browser, respectively.
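As a small illustration of how such outbound links could be built, the sketch below formats the two identifiers into URLs. The function names are hypothetical, and the URL patterns are assumptions based on the current public NCBI and Ensembl sites. They are not necessarily the patterns used by this database.

```python
from urllib.parse import quote


def dbsnp_url(rs_id):
    """Build a link to a dbSNP record from an rs identifier (assumed URL pattern)."""
    return "https://www.ncbi.nlm.nih.gov/snp/" + quote(rs_id, safe="")


def ensembl_gene_url(gene_id):
    """Build a link to Ensembl from a stable gene id (assumed URL pattern)."""
    return "https://www.ensembl.org/id/" + quote(gene_id, safe="")
```

Escaping the identifier with `quote` keeps unexpected characters in a stored id from altering the URL.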


References:


Jifeng Tang, Ben Vosman, Roeland E Voorrips, C Gerard van der Linden and Jack AM Leunissen (2006). QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species. BMC Bioinformatics 7:438. doi:10.1186/1471-2105-7-438.


Masoudi-Nejad A, Tonomura K, Kawashima S, Moriya Y, Suzuki M, Itoh M, Kanehisa M, Endo T, Goto S (2006). EGassembler: online bioinformatics service for large-scale processing, clustering and assembling ESTs and genomic DNA fragments. Nucleic Acids Research 34:W459-462.


Xiaoqiu Huang and Anup Madan (1999). CAP3: A DNA Sequence Assembly Program. Genome Research 9:868-877.