Introduction
On October 11th, 2007, Beijing Genomics Institute at Shenzhen (BGI-Shenzhen) announced the completion of the first diploid genome sequence of a Han Chinese individual, a representative of the Asian population. The genome, named YH, marks the start of the YanHuang Project, which aims to sequence 100 Chinese individuals in 3 years.
We set up this ‘YH database’ to present the entire DNA sequence, assembled from 3.3 billion reads (117.7 Gbp of raw data) generated by the Illumina Genome Analyzer. In total, 102.9 Gbp of nucleotides were mapped onto the NCBI human reference genome (Build 36) using SOAP (Short Oligonucleotide Alignment Program), software developed in-house, and 3.07 million SNPs were identified.
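As a quick sanity check on these figures, the short sketch below derives the mean read length and the fraction of raw bases that mapped to the reference. It uses only the three numbers quoted above; the function name `summarize_sequencing` is hypothetical and is not part of the YH database or SOAP.

```python
def summarize_sequencing(raw_bases, mapped_bases, read_count):
    """Return (mean read length in bp, fraction of raw bases mapped).

    Hypothetical helper for illustration only; inputs are the totals
    reported in the text, not values read from the database.
    """
    mean_read_length = raw_bases / read_count
    mapped_fraction = mapped_bases / raw_bases
    return mean_read_length, mapped_fraction


# Figures quoted in the text: 117.7 Gbp raw, 102.9 Gbp mapped, 3.3 billion reads.
mean_len, mapped_frac = summarize_sequencing(117.7e9, 102.9e9, 3.3e9)

print(f"Mean read length: {mean_len:.1f} bp")
print(f"Mapped fraction:  {mapped_frac:.1%}")
```

Because the read count and base totals are rounded in the text, the derived values are approximate.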
We present the personal genome data in a MapView, which is powered by GBrowse. A new module was developed for browsing large-scale short-read alignments; it lets users track detailed differences between the consensus sequence and the individual sequencing reads. A total of 53,643 HGMD records were used to screen the YH SNPs for phenotype-related information, providing a preliminary interpretation of the donor’s genome. A BLAST service for aligning query sequences against the YH genome consensus is also provided.
| Data Statistics | | |
|---|---|---|
| Nucleotide | Total | 117.7 Gbp |
| | Mapped to genome | 102.9 Gbp |
| | Coverage of genome | 99.97% |
| Polymorphism | SNP | 3.07 M |
| | Indel | 135,262 |
| | Structural variation | 2,682 |
In designing the YH database, we have attempted to organize and present personal genome data in a way that makes it a useful resource for genomic and medical research. As the third published personal genome, the YH diploid genome should accelerate the discovery of disease genes and mutations in the Asian population. Together with other personal genome projects, this endeavor contributes to the fundamental goal of establishing personalized medicine.