The YH genome was assembled from 3.3 billion reads generated on the
Illumina Genome Analyzer. We obtained 117.7G nucleotides of data, and the
genome was sequenced to 36-fold average coverage. By aligning the short
reads with SOAP, 102.9G nucleotides were mapped onto the NCBI reference
genome, covering 99.97% of it. The raw sequences, alignments, consensus
genome, variants and relevant tools are released for public use. The
documents about donor consent and sample collection are available at the
bottom of this page.
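As a quick sanity check, the figures in the summary can be related to one another. The sketch below uses only the numbers quoted above; the derived quantities (mean read length, mapped fraction, and the genome size implied by 36-fold coverage) are back-of-the-envelope estimates, and the variable names are illustrative only. The denominator actually used for the 36-fold figure is not stated on this page, so the implied genome size is an inference, not a reported value.

```python
# Figures quoted in the summary above.
total_reads = 3.3e9        # reads generated
total_bases = 117.7e9      # nucleotides of sequence data
mapped_bases = 102.9e9     # nucleotides mapped by SOAP
average_coverage = 36      # fold coverage

# Derived, approximate quantities.
mean_read_length = total_bases / total_reads          # nucleotides per read
mapped_fraction = mapped_bases / total_bases          # share of data that mapped
implied_genome_size = total_bases / average_coverage  # bases, if all data counts

print(f"mean read length:    {mean_read_length:.1f} nt")
print(f"mapped fraction:     {mapped_fraction:.1%}")
print(f"implied genome size: {implied_genome_size / 1e9:.2f} Gb")
```

The results come out at roughly 36 nt per read, about 87% of nucleotides mapped, and an implied genome size of about 3.3 Gb, which is consistent with a human genome sequenced with short reads.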
Note:
The BGI-Shenzhen FTP server has changed to ftp://public.genomics.org.cn
Please use the new site.
We suggest using FTP tools to download the files, such as FileZilla on Windows
or wget on Linux.
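For scripted downloads, the same transfer can be sketched with Python's standard ftplib. This is a minimal sketch, assuming the server accepts anonymous logins; the function names and the example path are hypothetical, since the directory layout of the FTP site is not listed on this page.

```python
from ftplib import FTP
from pathlib import Path
from urllib.parse import urlparse


def split_ftp_url(url):
    """Split an ftp:// URL into (host, remote path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    return parts.hostname, parts.path


def download(url, dest_dir="."):
    """Fetch one file over FTP and return the local path.

    Assumes anonymous login is allowed (an assumption, not confirmed here).
    """
    host, remote_path = split_ftp_url(url)
    local_path = Path(dest_dir) / Path(remote_path).name
    with FTP(host) as ftp:
        ftp.login()  # anonymous login
        with open(local_path, "wb") as out:
            ftp.retrbinary(f"RETR {remote_path}", out.write)
    return local_path


# Hypothetical file path, for illustration only.
EXAMPLE_URL = "ftp://public.genomics.org.cn/path/to/file.fa"
print(split_ftp_url(EXAMPLE_URL))
```

Calling download(EXAMPLE_URL) with a real file path would write the file into the current directory. For large files, a dedicated tool such as wget remains the simpler choice because it can resume interrupted transfers.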
Raw Data
Processed Data
1 YH genome sequence
a) Fasta by chromosome FTP1
These files are YH chromosome sequences in FASTA format. To make them easy to use, indels are not incorporated into the sequences, so the sequences have the same coordinates as UCSC build hg18, which is essentially the same as NCBI human v36.1.
b) Fasta.qual by chromosome FTP1
The files contain quality information of consensus bases.
c) Total genome : YH Fasta YH Fasta.qual
The files are concatenated from the above FASTA sequences and quality information.
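Because the YH chromosome sequences keep hg18 coordinates, the YH base at a given hg18 position can be read directly by index. The sketch below is a minimal FASTA reader with a 1-based lookup; the function names and the demo record are made up for illustration, and the layout of the Fasta.qual files is not covered here because it is not described on this page.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def base_at(sequence, position):
    """Return the base at a 1-based coordinate (hg18 positions are 1-based)."""
    if not 1 <= position <= len(sequence):
        raise IndexError(f"position {position} is outside the sequence")
    return sequence[position - 1]


# Made-up record standing in for a chromosome file.
demo = io.StringIO(">chrDemo\nACGT\nNNAC\n")
records = dict(read_fasta(demo))
print(base_at(records["chrDemo"], 1))
```

For a real chromosome file, open it with open(path) and pass the handle to read_fasta. Whole chromosomes are held in memory as a single string, which is workable for one chromosome at a time.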
2 YH variants and annotations
These files are variant results extracted from the SOAP alignments.
- SNPs: YH-SNPs.gff Download Open
- Indels: YH-Indels.gff Download Open
- Structural Variations: YH-sv.gff Download Open
README: GFFDefinitions.doc
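The variant files can be read line by line. The following is a minimal sketch, assuming they follow the common tab-delimited GFF layout with eight mandatory columns plus an optional attributes column; the authoritative field definitions are in GFFDefinitions.doc, which is not reproduced here, so the attributes column is kept as raw text. The record type, function name and demo line are illustrative only.

```python
from typing import NamedTuple, Optional


class GffRecord(NamedTuple):
    seqid: str
    source: str
    feature: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: str  # left unparsed; see GFFDefinitions.doc for its meaning


def parse_gff_line(line: str) -> Optional[GffRecord]:
    """Parse one GFF line; return None for blank lines and comments."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 8:
        fields.append("")  # attributes column is optional
    if len(fields) != 9:
        raise ValueError(f"expected 8 or 9 tab-separated fields, got {len(fields)}")
    seqid, source, feature, start, end, score, strand, phase, attributes = fields
    return GffRecord(seqid, source, feature, int(start), int(end),
                     score, strand, phase, attributes)


# Made-up line, not taken from the YH files.
demo_line = "chrDemo\tdemo_source\tSNP\t100\t100\t.\t+\t.\tnote=made-up example\n"
record = parse_gff_line(demo_line)
print(record.seqid, record.start, record.end)
```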
3 Short read alignments
- Alignment by chr FTP1 These files are alignments in SOAP output format.
README: AlignmentDTD.doc
4 Genotyping
README: README
5 Original Illumina 1M Duo genotyping result.
- FTP1
These are the original base calls from the Illumina 1M Duo bead chip. They include sites that are inconsistent between the two replicates and sites that failed.
6 Phased haplotype blocks of YH based on HapMap data.
7 YH and NA18507 genome assembly
The sequences of the de novo assemblies of YH and NA18507
8 Novel sequences
- Novel sequences of YH genome
- Novel sequences of NA18507 genome
- MD5 checksum
- Additional comments on genetic diversity of novel sequences in Ruiqiang Li, Yingrui Li, Hancheng Zheng, Ruibang Luo, et al. 2009. Building the sequence map of the human pan-genome. Nature Biotechnology (published online Dec 7)
- FTP1
HTTP
FTP
Novel sequences detected in the de novo assemblies of YH and NA18507
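The MD5 checksum listed with the novel sequences can be used to confirm that a download is intact. A minimal sketch, assuming the checksum is published as a hexadecimal digest; the function names are illustrative and the format of the checksum file itself is not described on this page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Return True if the file's MD5 digest equals the published value."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat for multi-gigabyte files. A call such as matches_checksum("downloaded_file", "<published digest>") returns False when the file was truncated or corrupted in transit.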
9 Predicted genes
Predicted genes on novel sequences of YH and NA18507
10 YH and NA18507 Novel sequences PCR gel picture
11 Sequencing depth at each site of the YH genome
12 YH exome raw data
13 YH SV case
YH SV cases detected by comparing the YH de novo assembly with NA18507
14 African genome SV case
15 Structural variation sets of YH and NA18507 identified by de novo assembly.
16 YanHuang fosmid sequence
17 SOAPSV pipeline
HTTP
- SOAPSV's Programs and scripts
- SOAPSV's Documents
About donor consent
These are documents about donor consent to participate.
- Donor consent of the Yanhuang Project-Phase I (in Chinese): Download
- IRB Validation: Download
- The Informed Consent forms: Download
- The Protocol: Download