Last modified at: 2008-11-20 17:27
1. What can I find in the YH Database?
The YH Database is the companion database of the ‘YH genome’, the diploid genome of the first Asian individual to be sequenced. We set up this database to present the entire DNA sequence, assembled from 3.3 billion reads (117.7 Gbp of raw data) generated by the Illumina Genome Analyzer. In total, 102.9 Gbp of nucleotides were mapped onto the NCBI human reference genome (Build 36) using SOAP (Short Oligonucleotide Alignment Program), software developed in-house, and 3.07 million SNPs were identified. The genome and polymorphism data are presented in Map View, and the relationship between YH genotypes and phenotypes is also provided. Furthermore, a BLAST web service offers online alignment of query sequences against the YH genome. Additionally, raw sequences, alignments, the consensus genome, variants and relevant tools are freely available for download.
2. How did you make the YH genome and organize the personal genome data?
The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. Aligning the short reads onto the NCBI reference genome covered 99.6% of the genome. The assembled high-quality consensus sequence, which was used to detect variations, covered 92% of the whole genome. The heterozygotes were phased, and haplotypes were predicted against HapMap CHB/JPT haplotypes. Paired-end reads were used to discover structural variations and short indels. The YH genome data were organized within the GBrowse framework. A new module was developed for browsing large-scale short-read alignments; it lets users track detailed differences between the consensus and the sequencing reads. For all variants, we present their genotype in the YH genome, OMIM records, and frequencies among HapMap populations where available. The potential relationship between genotypes and phenotypes was established by scanning a dataset collected from public databases.
3. How was the database constructed?
We aligned and assembled most of the short reads onto the NCBI human reference genome (Build 36), and YH variants were identified while we built the YH consensus. Combined with the HGMD and OMIM databases, YH phenotypes are available for very preliminary health studies. Together with HapMap variants, all of this information can be viewed in the YH genome browser, which is built on GBrowse.
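As a rough cross-check of the sequencing figures quoted in this FAQ, the minimal Python sketch below recomputes the mean read length, the fraction of raw bases that were mapped, and the average depth. The read count, raw bases and mapped bases are the numbers given above. The reference length of 2.86 Gbp is an assumption (an approximate non-gap size for Build 36) and is not stated on this page. The function name `sequencing_summary` is only illustrative.

```python
# Figures quoted in this FAQ.
READS = 3.3e9            # number of reads
RAW_BASES = 117.7e9      # raw data, in bp
MAPPED_BASES = 102.9e9   # bases mapped onto NCBI Build 36, in bp

# Assumption, not taken from this page: approximate non-gap reference length.
ASSUMED_REFERENCE_BP = 2.86e9


def sequencing_summary(reads=READS, raw=RAW_BASES, mapped=MAPPED_BASES,
                       reference=ASSUMED_REFERENCE_BP):
    """Derive simple summary statistics from the quoted totals."""
    return {
        "mean_read_length_bp": raw / reads,   # bases per read
        "mapped_fraction": mapped / raw,      # share of raw bases mapped
        "mean_depth": mapped / reference,     # average fold coverage
    }


summary = sequencing_summary()
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under that assumption the numbers are mutually consistent: reads average roughly 36 bp, about 87% of the raw bases were mapped, and the mapped bases correspond to close to 36-fold coverage.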
The browser provides YH genome annotation and variants, together with dbSNP SNPs, OMIM disease associations and HapMap genotypes. The consensus YH genome sequence is presented with the large number of aligned short reads that support it, and disagreements are highlighted. To view the short reads, make sure the “READs” track box is checked. Please note that the reads view works only when zoomed in to 80 bp or less (100 bp might also work).
Common track options and view:
Genotypes and short reads alignments view:
An example of heterozygous SNP:
We collected hundreds of genotypes and their associated phenotypes from public databases, and scanned the 3.07 million polymorphisms of the YH genome against them. The ‘Phenotype’ module provided here is a platform describing their relevance in the YH genome, as an early attempt at personalized medicine. Results can be queried by variation ID, gene symbol or phenotype.
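The scan described above can be pictured as a lookup of each collected association in the table of YH genotypes. The Python sketch below is only an illustration of that idea and is not the code behind the ‘Phenotype’ module. The record layout, the risk-allele field, the function names and all identifiers in the toy data (`rsDEMO1`, `GENE1` and so on) are hypothetical placeholders.

```python
from typing import NamedTuple


class Association(NamedTuple):
    variation_id: str   # e.g. a dbSNP-style identifier
    gene_symbol: str
    risk_allele: str    # assumed field; the real dataset layout is not documented here
    phenotype: str


def scan_genotypes(genotypes, associations):
    """Return (association, genotype, risk_allele_copies) for each genotyped variant."""
    hits = []
    for assoc in associations:
        genotype = genotypes.get(assoc.variation_id)
        if genotype is None:
            continue  # variant not called in this genome
        copies = genotype.count(assoc.risk_allele)
        hits.append((assoc, genotype, copies))
    return hits


def query(hits, term):
    """Filter hits by exact variation ID or gene symbol, or by phenotype substring."""
    term = term.lower()
    return [
        hit for hit in hits
        if term in (hit[0].variation_id.lower(), hit[0].gene_symbol.lower())
        or term in hit[0].phenotype.lower()
    ]


# Toy, made-up data for illustration only.
yh_genotypes = {"rsDEMO1": "AG", "rsDEMO2": "CC"}
associations = [
    Association("rsDEMO1", "GENE1", "G", "example trait A"),
    Association("rsDEMO3", "GENE2", "T", "example trait B"),
]

hits = scan_genotypes(yh_genotypes, associations)
for assoc, genotype, copies in query(hits, "GENE1"):
    print(assoc.variation_id, genotype, copies, assoc.phenotype)
```

Keeping the scan and the query separate mirrors the description above: the genome is scanned once against the collected records, and the results are then filtered by variation ID, gene symbol or phenotype.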
- Contact Us
Beijing Genomics Institute (BGI) - Shenzhen
Tel: +86 (0) 755 25273910
Fax: +86 (0) 755 25273620
Address: Complex Building, Beishan Industrial Zone, Beishan Road, Yantian District, Shenzhen, China, 518083.