Similar articles
20 similar articles found.
1.
Structural variation (SV) is a significant component of the genetic etiology of both neurodevelopmental and psychiatric disorders; however, routine guidelines for clinical genetic screening have been established only in the former category. Genome-wide chromosomal microarray (CMA) can detect genomic imbalances such as copy-number variants (CNVs), but balanced chromosomal abnormalities (BCAs) still require karyotyping for clinical detection. Moreover, submicroscopic BCAs and subarray threshold CNVs are intractable, or cryptic, to both CMA and karyotyping. Here, we performed whole-genome sequencing using large-insert jumping libraries to delineate both cytogenetically visible and cryptic SVs in a single test among 30 clinically referred youth representing a range of severe neuropsychiatric conditions. We detected 96 SVs per person on average that passed filtering criteria above our highest-confidence resolution (6,305 bp) and an additional 111 SVs per genome below this resolution. These SVs rearranged 3.8 Mb of genomic sequence and resulted in 42 putative loss-of-function (LoF) or gain-of-function mutations per person. We estimate that 80% of the LoF variants were cryptic to clinical CMA. We found myriad complex and cryptic rearrangements, including a “paired” duplication (360 kb, 169 kb) that flanks a 5.25 Mb inversion that appears in 7 additional cases from clinical CNV data among 47,562 individuals. Following convergent genomic profiling of these independent clinical CNV data, we interpreted three SVs to be of potential clinical significance. These data indicate that sequence-based delineation of the full SV mutational spectrum warrants exploration in youth referred for neuropsychiatric evaluation and clinical diagnostic SV screening more broadly.

2.

Background

Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods.

Results

We demonstrate Parliament’s efficacy via integrated analyses of data from whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific BioSciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus.

Conclusions

HS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, the HS1011 data constitute a public resource for novel SV discovery, software calibration, and personal genome structural variation analysis.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1479-3) contains supplementary material, which is available to authorized users.

3.
The cause of mental retardation in one-third to one-half of all affected individuals is unknown. Microscopically detectable chromosomal abnormalities are the most frequently recognized cause, but gain or loss of chromosomal segments that are too small to be seen by conventional cytogenetic analysis has been found to be another important cause. Array-based methods offer a practical means of performing a high-resolution survey of the entire genome for submicroscopic copy-number variants. We studied 100 children with idiopathic mental retardation and normal results of standard chromosomal analysis, by use of whole-genome sampling analysis with Affymetrix GeneChip Human Mapping 100K arrays. We found de novo deletions as small as 178 kb in eight cases, de novo duplications as small as 1.1 Mb in two cases, and unsuspected mosaic trisomy 9 in another case. This technology can detect at least twice as many potentially pathogenic de novo copy-number variants as conventional cytogenetic analysis can in people with mental retardation.

4.
The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mapability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.

5.
Diagnostic genome profiling in mental retardation
Mental retardation (MR) occurs in 2%-3% of the general population. Conventional karyotyping has a resolution of 5-10 million bases and detects chromosomal alterations in approximately 5% of individuals with unexplained MR. The frequency of smaller submicroscopic chromosomal alterations in these patients is unknown. Novel molecular karyotyping methods, such as array-based comparative genomic hybridization (array CGH), can detect submicroscopic chromosome alterations at a resolution of 100 kb. In this study, 100 patients with unexplained MR were analyzed using array CGH for DNA copy-number changes by use of a novel tiling-resolution genomewide microarray containing 32,447 bacterial artificial clones. Alterations were validated by fluorescence in situ hybridization and/or multiplex ligation-dependent probe amplification, and parents were tested to determine de novo occurrence. Reproducible DNA copy-number changes were present in 97% of patients. The majority of these alterations were inherited from phenotypically normal parents, which reflects normal large-scale copy-number variation. In 10% of the patients, de novo alterations considered to be clinically relevant were found: seven deletions and three duplications. These alterations varied in size from 540 kb to 12 Mb and were scattered throughout the genome. Our results indicate that the diagnostic yield of this approach in the general population of patients with MR is at least twice as high as that of standard GTG-banded karyotyping.

6.
Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly larger numbers of usable sequences per instrument-run continue to make whole-genome assembly an appealing target application. In this paper we evaluate the feasibility of de novo genome assembly from short reads (≤100 nucleotides) through a detailed study involving genomic sequences of various lengths and origin, in conjunction with several of the currently popular assembly programs. Our extensive analysis demonstrates that, in addition to sequencing coverage, attributes such as the architecture of the target genome, the assembly program used, the average read length and the observed sequencing error rates are powerful variables that affect the best achievable assembly of the target sequence in terms of size and correctness.

7.
8.
This work reports the completion and annotation of the genome sequence of Corynebacterium pseudotuberculosis I19, isolated from an Israeli dairy cow with severe clinical mastitis. To present the whole-genome sequence, a de novo assembly approach using only 33 million short (25-bp) mate-paired SOLiD reads was applied. Furthermore, the automatic, functional, and manual annotations were attained with the use of several algorithms in a multistep process.

9.
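Library preparation methods for next-generation sequencing (NGS) play an important role in genome assembly. However, ordinary DNA libraries prepared with NGS technology have fragments of only about 500 bp, which cannot meet the requirements of de novo assembly of complex genomes. Third-generation sequencing can reach read lengths of 20 kb, but its high error rate and high sequencing cost make it hard to adopt widely. Mate-paired library preparation for second-generation sequencing has therefore long played a very important role in de novo genome assembly. Mate-paired libraries prepared on Illumina, currently the mainstream NGS platform, span only 2 to 5 kb. To obtain longer Mate-paired libraries that can be sequenced on the Illumina platform, this study is the first to integrate and optimize the Mate-paired library preparation techniques of the Illumina and Roche/454 sequencing platforms, using an inducible circularization enzyme to improve the circularization efficiency of long genomic DNA fragments. A 20 kb Mate-paired library preparation technique was successfully established and has been applied to preparing 20 kb Mate-paired libraries of the human genome. The technique provides methodological guidance for preparing long-fragment Mate-paired libraries on the Illumina platform.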

10.
A 10-fold BAC library for the giant panda was constructed and nine BACs were selected to generate finished sequences. These BACs could be used as a validation resource for the de novo assembly accuracy of the whole genome shotgun sequencing reads of the giant panda newly generated by Illumina GA sequencing technology. Complete Sanger sequencing, assembly, annotation and comparative analysis were carried out on the selected BACs of a joint length 878 kb. Homologue search and de novo prediction methods were used to ...

11.
Annotated genomes can provide new perspectives on the biology of species. We present the first de novo whole genome sequencing for the pink-footed goose. In order to obtain a high-quality de novo assembly the strategy used was to combine one short insert paired-end library with two mate-pair libraries. The pink-footed goose genome was assembled de novo using three different assemblers and an assembly evaluation was subsequently performed in order to choose the best assembler. For our data, ALLPATHS-LG performed the best, since the assembly produced covers most of the genome, while introducing the fewest errors. A total of 26,134 genes were annotated, with bird species accounting for virtually all BLAST hits. We also estimated the substitution rate in the pink-footed goose, which can be of use in future demographic studies, by using a comparative approach with the genome of the chicken, the mallard and the swan goose. A substitution rate of 1.38 × 10⁻⁷ per nucleotide per generation was obtained when comparing the genomes of the two closely-related goose species (the pink-footed and the swan goose). Altogether, we provide a valuable tool for future genomic studies aiming at particular genes and regions of the pink-footed goose genome as well as other bird species.

12.

Background

The relatively short read lengths from next generation sequencing (NGS) technologies still pose a challenge for de novo assembly of complex mammal genomes. One important solution is to use paired-end (PE) sequence information experimentally obtained from long-range DNA fragments (>1 kb). Here, we characterize and extend a long-range PE library construction method based on direct intra-molecule ligation (or molecular linker-free circularization) for NGS.

Results

We found that the method performs stably for PE sequencing of 2- to 5-kb DNA fragments, and can be extended to 10–20 kb (and even, in extremes, up to ~35 kb). We also characterized the impact of low quality input DNA on the method, and developed a whole-genome amplification (WGA) based protocol using limited input DNA (<1 µg). Using this PE dataset, we accurately assembled the YanHuang (YH) genome, the first sequenced Asian genome, into a scaffold N50 size of >2 Mb, which is over 100 times greater than the initial size produced with only small insert PE reads (17 kb). In addition, we mapped two 7- to 8-kb insertions in the YH genome using the larger insert sizes of the long-range PE data.

Conclusions

In conclusion, we demonstrate here the effectiveness of this long-range PE sequencing method and its use for the de novo assembly of a large, complex genome using NGS short reads.
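
The scaffold N50 quoted in the Results above is the length L such that scaffolds of length L or longer contain at least half of all assembled bases. A minimal Python sketch of that calculation follows; the function name and the toy lengths are illustrative assumptions, not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the largest length L such that scaffolds of length >= L
    together cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths): total is 32, half is 16,
# and the two longest scaffolds (8 + 8) already reach 16.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
```

Because N50 depends only on the length distribution, it says nothing about correctness; a misjoined scaffold raises N50 just as a correct one does.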

13.
14.
MOTIVATION: Second-generation sequencing technology makes it feasible for many researchers to obtain enough sequence reads to attempt the de novo assembly of higher eukaryotes (including mammals). De novo assembly not only provides a tool for understanding wide scale biological variation, but within human biomedicine, it offers a direct way of observing both large-scale structural variation and fine-scale sequence variation. Unfortunately, improvements in the computational feasibility for de novo assembly have not matched the improvements in the gathering of sequence data. This is for two reasons: the inherent computational complexity of the problem and the in-practice memory requirements of tools. RESULTS: In this article, we use entropy compressed or succinct data structures to create a practical representation of the de Bruijn assembly graph, which requires at least a factor of 10 less storage than the kinds of structures used by deployed methods. Moreover, because our representation is entropy compressed, in the presence of sequencing errors it has better scaling behaviour asymptotically than conventional approaches. We present results of a proof-of-concept assembly of a human genome performed on a modest commodity server.
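
For orientation, the de Bruijn assembly graph mentioned above links each (k-1)-mer to the (k-1)-mers that follow it in some read. The Python sketch below builds that graph with ordinary dictionaries and sets. It is a plain illustration of the graph itself, not the entropy-compressed (succinct) representation the article describes, and the function name and toy reads are assumptions.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph from reads.

    Nodes are (k-1)-mers; each k-mer found in a read contributes one
    edge from its (k-1)-length prefix to its (k-1)-length suffix.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two overlapping toy reads (hypothetical data) with k = 3.
graph = de_bruijn_graph(["ACGT", "CGTA"], k=3)
print(graph)
```

Storing every node as a Python string in a hash table is exactly the kind of memory-hungry layout that motivates a compressed representation for genomes of mammalian size.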

15.
The use of high throughput genome-sequencing technologies has uncovered a large extent of structural variation in eukaryotic genomes that makes important contributions to genomic diversity and phenotypic variation. When the genomes of different strains of a given organism are compared, whole genome resequencing data are typically aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate. Here, we use rice as a model to demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next generation sequence data into high-quality assemblies that can be directly compared using whole genome alignment to provide an unbiased assessment. Using this approach, we are able to accurately assess the ‘pan-genome’ of three divergent rice varieties and document several megabases of each genome absent in the other two. Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that would be missed by standard reference-mapping approaches. We further provide a detailed analysis of several loci associated with agriculturally important traits, including the S5 hybrid sterility locus, the Sub1 submergence tolerance locus, the LRK gene cluster associated with improved yield, and the Pup1 cluster associated with phosphorus deficiency, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.

16.
Understanding the genetic variations of the horse (Equus caballus) genome will improve breeding conservation and welfare. However, genetic variations in long segments, such as structural variants (SVs), remain understudied. We de novo assembled 10 chromosome-level three-dimensional horse genomes, each representing a distinct breed, and analysed horse SVs using a multi-assembly approach. Our findings suggest that SVs with the accumulation of mammalian-wide interspersed repeats related to long interspersed nuclear elements might be a horse-specific mechanism to modulate genome-wide gene regulatory networks. We found that olfactory receptors were commonly lost and accumulated deleterious mutations, but no purge of deleterious mutations occurred during horse domestication. We examined the potential effects of SVs on the spatial structure of chromatin via topologically associating domains (TADs). Breed-specific TADs were significantly enriched by breed-specific SVs. We identified 4199 unique breakpoint-resolved novel insertions across all chromosomes that account for 2.84 Mb sequences missing from the reference genome. Several novel insertions might have potential functional consequences, as 519 appeared to reside within 449 gene bodies. These genes are primarily involved in pathogen recognition, innate immune responses and drug metabolism. Moreover, 37 diverse horses were resequenced. Combining this with public data, we analysed 97 horses through a comparative population genomics approach to identify the genetic basis underlying breed characteristics using Thoroughbreds as a case study. We provide new scientific evidence for horse domestication, an understanding of the genetic mechanism underlying the phenotypic evolution of horses, and a comprehensive genetic variation resource for further genetic studies of horses.

17.
18.
While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology.

19.
The introduction of next generation sequencing methods in genome studies has made it possible to shift research from a gene-centric approach to a genome wide view. Although methods and tools to detect single nucleotide polymorphisms are becoming more mature, methods to identify and visualize structural variation (SV) are still in their infancy. Most genome browsers can only compare a given sequence to a reference genome; therefore, direct comparison of multiple individuals still remains a challenge. Therefore, the implementation of efficient approaches to explore and visualize SVs and directly compare two or more individuals is desirable. In this article, we present a visualization approach that uses space-filling Hilbert curves to explore SVs based on both read-depth and paired-end information. An interactive open-source Java application, called Meander, implements the proposed methodology, and its functionality is demonstrated using two cases. With Meander, users can explore variations at different levels of resolution and simultaneously compare up to four different individuals against a common reference. The application was developed using Java version 1.6 and Processing.org and can be run on any platform. It can be found at http://homes.esat.kuleuven.be/~bioiuser/meander.
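
The space-filling layout mentioned above can be illustrated with the standard index-to-coordinate conversion for a Hilbert curve. The Python sketch below maps a one-dimensional position (for example, a genomic bin index) to a cell on an n × n grid, with n a power of two. It is a generic textbook routine, not Meander's code, and the name hilbert_d2xy is an assumption.

```python
def hilbert_d2xy(n, d):
    """Map index d (0 <= d < n*n) along a Hilbert curve to (x, y).

    n is the side length of the grid and must be a power of two.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if not 0 <= d < n * n:
        raise ValueError("d out of range")
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the already-placed sub-square.
                x = s - 1 - x
                y = s - 1 - y
            # Transpose so the sub-curves join end to end.
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Order of cells visited on a 2 x 2 grid.
print([hilbert_d2xy(2, d) for d in range(4)])
# prints [(0, 0), (0, 1), (1, 1), (1, 0)]
```

The property that makes this layout useful for genome-scale plots is locality: consecutive indices always land in neighbouring cells, so a contiguous genomic region appears as a compact patch rather than a long thin line.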

20.
Next-generation sequencing has transformed the fields of ecological and evolutionary genetics by allowing for cost-effective identification of genome-wide variation. Single nucleotide polymorphism (SNP) arrays, or “SNP chips”, enable very large numbers of individuals to be consistently genotyped at a selected set of these identified markers, and also offer the advantage of being able to analyse samples of variable DNA quality. We used reduced representation restriction-aided digest sequencing (RAD-seq) of 31 birds of the threatened hihi (Notiomystis cincta; stitchbird) and low-coverage whole genome sequencing (WGS) of 10 of these birds to develop an Affymetrix 50 K SNP chip. We overcame the limitations of having no hihi reference genome and a low quantity of sequence data by separate and pooled de novo assembly of each of the 10 WGS birds. Reads from all individuals were mapped back to these de novo assemblies to identify SNPs. A subset of RAD-seq and WGS SNPs were selected for inclusion on the chip, prioritising SNPs with the highest quality scores whose flanking sequence uniquely aligned to the zebra finch (Taeniopygia guttata) genome. Of the 58,466 SNPs manufactured on the chip, 72% passed filtering metrics and were polymorphic. By genotyping 1,536 hihi on the array, we found that SNPs detected in multiple assemblies were more likely to successfully genotype, representing a cost-effective approach to identify SNPs for genotyping. Here, we demonstrate the utility of the SNP chip by describing the high rates of linkage disequilibrium in the hihi genome, reflecting the history of population bottlenecks in the species.
