Similar literature
20 similar articles found (search time: 46 ms)
1.
Genomic rearrangements can result in losses, amplifications, translocations and inversions of DNA fragments, thereby modifying genome architecture and potentially having clinical consequences. Many genomic disorders caused by structural variation were initially uncovered by early cytogenetic methods. The last decade has seen significant progress in molecular cytogenetic techniques, allowing rapid and precise detection of structural rearrangements on a whole-genome scale. The high resolution attainable with these recently developed techniques has also uncovered the role of structural variants in normal genetic variation alongside single-nucleotide polymorphisms (SNPs). We describe how array-based comparative genomic hybridisation, SNP arrays, array painting and next-generation sequencing analytical methods (read depth, read pair and split read) allow the extensive characterisation of chromosome rearrangements in human genomes.

2.

Background

Characterization of genomic structural variation (SV) is essential to expanding the research and clinical applications of genome sequencing. Reliance upon short DNA fragment paired end sequencing has yielded a wealth of single nucleotide variants and internal sequencing read insertions-deletions, at the cost of limited SV detection. Multi-kilobase DNA fragment mate pair sequencing has filled the void in SV detection, but has introduced new analytic challenges that require SV detection tools specifically designed for mate pair sequencing data. Here, we introduce SVachra – Structural Variation Assessment of CHRomosomal Aberrations, a breakpoint calling program that identifies large insertions-deletions, inversions, and inter- and intra-chromosomal translocations utilizing both inward and outward facing read types generated by mate pair sequencing.

Results

We demonstrate SVachra’s utility by executing the program on large-insert (Illumina Nextera) mate pair sequencing data from the personal genome of a single subject (HS1011). An additional long-read data set (Pacific Biosciences RSII) was also generated to validate SV calls from SVachra and from the other SV calling programs used for comparison. SVachra exhibited the highest validation rate and reported the widest distribution of SV types and size ranges when compared to other SV callers.

Conclusions

SVachra is a highly specific breakpoint calling program that exhibits a less biased SV detection methodology than other callers.
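
The distinction between inward and outward facing read pairs mentioned in the Background can be illustrated with a short sketch. This is not SVachra's code: the function name and arguments are hypothetical, and it assumes both reads of a pair map to the same chromosome and that strand information has already been decoded from the alignment records.

```python
def pair_orientation(pos, is_reverse, mate_pos, mate_is_reverse):
    """Classify a read pair mapped to one chromosome (hypothetical helper).

    pos, mate_pos                : leftmost mapped coordinates of the two reads
    is_reverse, mate_is_reverse  : True if the read aligns to the reverse strand
    """
    if is_reverse == mate_is_reverse:
        # Both reads on the same strand: neither inward nor outward facing.
        return "same_strand"
    # Find the strand of whichever read maps further left on the reference.
    left_is_reverse = is_reverse if pos <= mate_pos else mate_is_reverse
    # Left read on the forward strand points towards its mate (inward);
    # left read on the reverse strand points away from it (outward).
    return "outward" if left_is_reverse else "inward"


print(pair_orientation(1000, False, 1400, True))   # inward
print(pair_orientation(1000, True, 4000, False))   # outward
```

In a real pipeline the four arguments would come from the position and strand fields of the alignment file; here they are passed in directly to keep the example self-contained.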

3.
Vectors based on murine retroviruses are among the most efficient means of inserting reporter constructs into a vertebrate chromosome in order to visualize the cis-regulatory information available to a basal promoter at the site of insertion. Combined with the zebrafish embryo as a readout for the activity of regulatory elements, enhancer detection becomes a powerful technique for gene discovery and for mapping the extent of regulatory domains in a vertebrate genome. Our laboratory has performed the only large-scale enhancer detection screen to date in any vertebrate, and in this paper we describe the methods we developed to generate viral particles, to insert reporter constructs into the zebrafish germ line, to screen for detection events in heterozygous F1 embryos, and to isolate genomic sequence flanking the inserted vector for the purpose of genomic mapping. Given sufficient scale, the technology described here can be used to obtain cis-regulatory information across the entire zebrafish genome for any given basal promoter.

4.
We present SWAN, a statistical framework for robust detection of genomic structural variants in next-generation sequencing data, together with an analysis of mid-range insertions and deletions (<10 Kb) for whole genome analysis and DNA mixtures. To identify these mid-range events, SWAN collectively uses information from read-pair, read-depth and one-end-mapped reads through statistical likelihoods based on Poisson field models. SWAN also uses soft-clip/split read remapping to supplement the likelihood analysis and determine variant boundaries. The accuracy of SWAN is demonstrated by in silico spike-ins and by identification of known variants in the NA12878 genome. We used SWAN to identify a set of novel mid-range insertions and deletions that were confirmed by targeted deep re-sequencing. An R package implementation of SWAN is open source and freely available.
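
To make the read-depth part of such a likelihood concrete, the sketch below scores a single window under a plain Poisson model. It is not SWAN's actual model or API: the function name, the fixed copy ratio of 0.5 for a heterozygous deletion, and the example depths are all assumptions chosen for illustration.

```python
import math


def poisson_depth_llr(observed, expected, copy_ratio=0.5):
    """Log-likelihood ratio of 'altered copy number' versus 'normal' for one window.

    observed   : number of reads counted in the window
    expected   : mean read count under normal copy number (hypothetical value)
    copy_ratio : assumed depth ratio under the alternative
                 (0.5 corresponds to a heterozygous deletion)
    """
    if expected <= 0 or copy_ratio <= 0:
        raise ValueError("expected and copy_ratio must be positive")
    alt_mean = expected * copy_ratio
    # log P(n | mu) = n*log(mu) - mu - log(n!); the log(n!) term cancels in the ratio.
    return observed * math.log(alt_mean / expected) - (alt_mean - expected)


# A window with about half the expected depth favours the deletion model (positive score),
# while a window at the expected depth favours the normal model (negative score).
print(poisson_depth_llr(14, 30))
print(poisson_depth_llr(30, 30))
```

A full framework would combine such per-window scores with read-pair and clipped-read evidence; this fragment only shows the shape of one term.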

5.

Background

Structural variation (SV) represents a significant, yet poorly understood contribution to an individual’s genetic makeup. Advanced next-generation sequencing technologies are widely used to discover such variations, but there is no single detection tool that is considered a community standard. In an attempt to fulfil this need, we developed an algorithm, SoftSearch, for discovering structural variant breakpoints in Illumina paired-end next-generation sequencing data. SoftSearch combines multiple strategies for detecting SV including split-read, discordant read-pair, and unmated pairs. Co-localized split-reads and discordant read pairs are used to refine the breakpoints.

Results

We developed and validated SoftSearch using real and synthetic datasets. SoftSearch’s key features are 1) no requirement for secondary (or exhaustive primary) alignment, 2) portability into established sequencing workflows, and 3) applicability to any DNA-sequencing experiment (e.g. whole genome, exome, custom capture, etc.). SoftSearch identifies breakpoints from a small number of soft-clipped bases from split reads and a few discordant read-pairs which on their own would not be sufficient to make an SV call.

Conclusions

We show that SoftSearch can identify more true SVs by combining multiple sequence features. SoftSearch was able to call clinically relevant SVs in the BRCA2 gene not reported by other tools while offering significantly improved overall performance.
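
The idea of reading candidate breakpoints off soft-clipped alignments can be sketched as follows. This is an illustration only, not SoftSearch's implementation: the function name and the `min_clip` threshold are hypothetical, and the only thing taken as given is the standard SAM convention that the alignment position is the 1-based coordinate of the first aligned base and that `S` in a CIGAR string marks soft-clipped bases.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that advance along the reference sequence.
REF_CONSUMING = set("MDN=X")


def softclip_breakpoints(pos, cigar, min_clip=5):
    """Return candidate breakpoints implied by soft clips on one alignment.

    pos      : 1-based leftmost aligned reference position
    cigar    : CIGAR string, e.g. "30M20S"
    min_clip : minimum soft-clip length to report (hypothetical threshold)
    """
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Hard clips carry no sequence, so ignore them when looking at read ends.
    core = [item for item in ops if item[1] != "H"]
    ref_len = sum(length for length, op in core if op in REF_CONSUMING)
    breakpoints = []
    if core and core[0][1] == "S" and core[0][0] >= min_clip:
        # Clip at the start of the read: breakpoint at the first aligned base.
        breakpoints.append((pos, "left"))
    if len(core) > 1 and core[-1][1] == "S" and core[-1][0] >= min_clip:
        # Clip at the end of the read: breakpoint at the last aligned base.
        breakpoints.append((pos + ref_len - 1, "right"))
    return breakpoints


print(softclip_breakpoints(100, "30M20S"))           # [(129, 'right')]
print(softclip_breakpoints(1000, "10S20M2D18M12S"))  # [(1000, 'left'), (1039, 'right')]
```

Positions reported by several reads, together with nearby discordant pairs, would then be pooled before any call is made; that aggregation step is left out here.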

6.
Three simian virus 40 (SV40)-phi X174 recombinant genomes were isolated from single BSC-1 monkey cells cotransfected with SV40 and phi X174 RF1 DNAs. The individual cell progenies were amplified, cloned, and mapped by a combination of restriction endonuclease and heteroduplex analyses. In each case, the 600 to 1,000 base pairs of phi X174 DNA (derived from different regions of the phi X174 genome) were present as single inserts, located in either the early or late SV40 regions; the deletion of SV40 DNA was greater than the size of the insert; and the remaining portions of the hybrid genome were indistinguishable from wild-type SV40 DNA, as judged by both mapping and biological tests. Hence, apart from the deletion which accommodates the phi X174 DNA insert, no other rearrangements of SV40 DNA were detected. The restriction map of an SV40-phi X174 recombinant DNA isolate before molecular cloning was indistinguishable from those of two separate cloned derivatives of that isolate, indicating that the species cloned was the major amplifiable recombinant structure generated by a single recombinant-producing cell. The relative simplicity of the SV40-phi X174 recombinant DNA examined is consistent with the notion that most recombinant-producing BSC-1 cells support single recombination events generating only one amplifiable recombinant structure.

7.
We have isolated genomic clones for human fibronectin (FN), by screening a human gene library with previously isolated FN cDNA clones. We have recently reported two different FN mRNAs, one of them containing an additional 270 nucleotide insert coding for a structural domain ED. Restriction mapping and DNA sequencing of the genomic clones show that the ED type III unit corresponds to exactly one exon in the gene, whilst the two flanking type III units are split in two exons at variable positions. When an alpha-globin/FN gene hybrid construct, containing the ED exon, flanking introns and neighbouring FN exons, is transfected into HeLa cells, two hybrid mRNAs differing by the ED exon are synthesized. These experiments confirmed that the two FN mRNAs observed in vivo arise from the same gene by alternative splicing.

8.
Read alignment is an ongoing challenge for the analysis of data from sequencing technologies. This article proposes an elegantly simple multi-seed strategy, called seed-and-vote, for mapping reads to a reference genome. The new strategy chooses the mapped genomic location for the read directly from the seeds. It uses a relatively large number of short seeds (called subreads) extracted from each read and allows all the seeds to vote on the optimal location. When the read length is <160 bp, overlapping subreads are used. More conventional alignment algorithms are then used to fill in detailed mismatch and indel information between the subreads that make up the winning voting block. The strategy is fast because the overall genomic location has already been chosen before the detailed alignment is done. It is sensitive because no individual subread is required to map exactly, nor are individual subreads constrained to map close by other subreads. It is accurate because the final location must be supported by several different subreads. The strategy extends easily to find exon junctions, by locating reads that contain sets of subreads mapping to different exons of the same gene. It scales up efficiently for longer reads.
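
The voting step described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the published aligner: the reference is a plain string indexed by exact k-mers, the function names and the `n_subreads` parameter are hypothetical, and the detailed alignment that fills in mismatches and indels afterwards is omitted.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_vote(read, index, k, n_subreads=10):
    """Return (best_start, votes) for the read, or (None, 0) if no subread maps."""
    last = len(read) - k
    if last < 0:
        return None, 0
    if n_subreads > 1:
        # Evenly spaced subread offsets; they overlap when the read is short.
        offsets = sorted({round(i * last / (n_subreads - 1)) for i in range(n_subreads)})
    else:
        offsets = [0]
    votes = Counter()
    for off in offsets:
        for hit in index.get(read[off:off + k], ()):
            # Each exact subread hit votes for the read start position it implies.
            votes[hit - off] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]


reference = "ACGTTGCAAGGCTTACGGATCCGATTAGCCATGCAATGGC"
index = build_index(reference, 5)
# A 20-base read taken from position 10 with one substituted base.
read = reference[10:19] + "A" + reference[20:30]
print(seed_and_vote(read, index, 5, n_subreads=6))  # (10, 4)
```

In the example, the two subreads that overlap the substituted base find no match, yet the remaining four agree on the same start position, which is the sense in which no individual subread is required to map.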

9.
Structural genomic variations play an important role in human disease and phenotypic diversity. With the rise of high-throughput sequencing tools, mate-pair/paired-end/single-read sequencing has become an important technique for the detection and exploration of structural variation. Several analysis tools exist to handle different parts and aspects of such sequencing-based structural variation analysis pipelines. A comprehensive analysis platform to handle all steps, from processing the sequencing data to the discovery and visualization of structural variants, is missing. The ViVar platform is built to handle the discovery of structural variants, from depth-of-coverage analysis and aberrant read pair clustering to split read analysis. ViVar provides you with powerful visualization options, enables easy reporting of results, and offers better usability and data management. The platform facilitates the processing, analysis and visualization of structural variation based on massively parallel sequencing data, enabling the rapid identification of disease loci or genes. ViVar allows you to scale your analysis with your workload over multiple (cloud) servers, has user access control to keep your data safe, and is easily expandable as analysis techniques advance. URL: https://www.cmgg.be/vivar/

10.
11.

Background

Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods.

Results

We demonstrate Parliament’s efficacy via integrated analyses of data from whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific BioSciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus.

Conclusions

HS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, the HS1011 data constitute a public resource for novel SV discovery, software calibration, and personal genome structural variation analysis.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1479-3) contains supplementary material, which is available to authorized users.
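
One common way to merge calls from several SV detection methods is to group calls of the same type that overlap reciprocally and keep those supported by more than one method. The sketch below shows that generic idea only; it is not Parliament's actual algorithm, and the function names, the 50% reciprocal-overlap threshold, the greedy grouping, and the caller names are assumptions for illustration.

```python
def reciprocal_overlap(a, b):
    """Reciprocal overlap of two calls given as (chrom, start, end, svtype)."""
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    # Fraction of each call covered by the overlap; report the smaller one.
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def consensus_calls(callsets, min_ro=0.5, min_support=2):
    """Greedily group calls across callers and keep groups with enough support.

    callsets : dict mapping a caller name to its list of calls
    Returns a list of (representative_call, sorted_supporting_callers).
    """
    clusters = []
    for name, calls in callsets.items():
        for call in calls:
            for cluster in clusters:
                if reciprocal_overlap(cluster["rep"], call) >= min_ro:
                    cluster["support"].add(name)
                    break
            else:
                # No existing group matches: this call starts a new one.
                clusters.append({"rep": call, "support": {name}})
    return [
        (cluster["rep"], sorted(cluster["support"]))
        for cluster in clusters
        if len(cluster["support"]) >= min_support
    ]


callsets = {
    "caller_a": [("chr1", 1000, 2000, "DEL"), ("chr2", 500, 900, "INV")],
    "caller_b": [("chr1", 1100, 2100, "DEL")],
    "caller_c": [("chr1", 1050, 1950, "DEL"), ("chr2", 5000, 6000, "INV")],
}
print(consensus_calls(callsets))
# [(('chr1', 1000, 2000, 'DEL'), ['caller_a', 'caller_b', 'caller_c'])]
```

The first call seen in each group serves as its representative, so results depend on caller order; a production merger would typically refine the breakpoints from all members of a group.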

12.

Background

Accurate catalogs of structural variants (SVs) in mammalian genomes are necessary to elucidate the potential mechanisms that drive SV formation and to assess their functional impact. Next generation sequencing methods for SV detection are an advance on array-based methods, but are almost exclusively limited to four basic types: deletions, insertions, inversions and copy number gains.

Results

By visual inspection of 100 Mbp of genome to which next generation sequence data from 17 inbred mouse strains had been aligned, we identify and interpret 21 paired-end mapping patterns, which we validate by PCR. These paired-end mapping patterns reveal a greater diversity and complexity in SVs than previously recognized. In addition, Sanger-based sequence analysis of 4,176 breakpoints at 261 SV sites reveals additional complexity at approximately a quarter of the structural variants analyzed. We find micro-deletions and micro-insertions at SV breakpoints, ranging from 1 to 107 bp, and SNPs that extend breakpoint micro-homology and may catalyze SV formation.

Conclusions

An integrative approach using experimental analyses to train computational SV calling is essential for the accurate resolution of the architecture of SVs. We find considerable complexity in SV formation; about a quarter of SVs in the mouse are composed of a complex mixture of deletion, insertion, inversion and copy number gain. Computational methods can be adapted to identify most paired-end mapping patterns.

13.
14.
Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as a test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.

15.
Bacterial artificial chromosomes (BACs) are used in genomic variation studies due to their capacity to carry a large insert, their high clonal stability, low rate of chimerism and ease of manipulation. In the present study, an attempt was made to create the first genomic BAC library of an anonymous Indian male (IMBL4), consisting of 100,224 clones covering the human genome more than three times. Restriction mapping of 255 BAC clones by pulsed-field gel electrophoresis confirmed an average insert size of 120 kb. The library was screened by PCR using SHANK3 (SH3 and multiple ankyrin repeat domains 3) and OLFM3 (olfactomedin 3) specific primers. A selection of clones was analyzed by fluorescence in situ hybridization (FISH) and sequencing. Fine mapping of copy number variable regions by array-based comparative genomic hybridization identified 467 CNVRs in the IMBL4 genome. The IMBL4 BAC library represents the first cataloged Indian genome resource for applications in basic and clinical research.
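
The coverage figure can be checked with a short calculation from the clone count and average insert size given above. The haploid genome size of 3.1 Gbp is an assumption not stated in the abstract, the function names are hypothetical, and the calculation treats every clone as carrying an average-sized insert, so it gives a nominal upper estimate. The second function uses the standard Clarke-Carbon relation P = 1 - (1 - f)^N, where f is the fraction of the genome in one insert and N is the number of clones.

```python
def nominal_coverage(n_clones, mean_insert_bp, genome_bp):
    """Total cloned sequence divided by genome size (fold coverage)."""
    return n_clones * mean_insert_bp / genome_bp


def locus_representation_probability(n_clones, mean_insert_bp, genome_bp):
    """Clarke-Carbon probability that a given locus is present in the library."""
    fraction = mean_insert_bp / genome_bp
    return 1.0 - (1.0 - fraction) ** n_clones


GENOME_BP = 3.1e9  # assumed haploid human genome size, not taken from the abstract

coverage = nominal_coverage(100_224, 120_000, GENOME_BP)
probability = locus_representation_probability(100_224, 120_000, GENOME_BP)
print(f"nominal coverage: {coverage:.2f}x")
print(f"probability a locus is represented: {probability:.3f}")
```

Under these assumptions the nominal coverage comes out a little under fourfold, which is consistent with the abstract's statement that the library covers the genome more than three times.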

16.
The three-dimensional (3D) structure of the genome is important for orchestration of gene expression and cell differentiation. While mapping genomes in 3D has long been elusive, recent adaptations of high-throughput sequencing to chromosome conformation capture (3C) techniques allow genome-wide structural characterization for the first time. However, reconstruction of "consensus" 3D genomes from 3C-based data is a challenging problem, since the data are aggregated over millions of cells. Recent single-cell adaptations of the 3C technique allow non-aggregated assessment of genome structure, but the data suffer from sparse and noisy interaction sampling. We present a manifold-based optimization (MBO) approach for the reconstruction of 3D genome structure from chromosomal contact data. We show that MBO is able to reconstruct 3D structures based on the chromosomal contacts, imposing fewer structural violations than comparable methods. Additionally, MBO is suitable for efficient high-throughput reconstruction of large systems, such as entire genomes, allowing for comparative studies of genomic structure across cell lines and different species.

17.
Complementary packing of alpha-helices in proteins   Total citations: 10; self-citations: 0; citations by others: 10
Efimov AV. FEBS Letters, 1999, 452(1-2): 3-6

18.
MOTIVATION: Array comparative genomic hybridization (CGH) allows detection and mapping of copy number of DNA segments. A challenge is to make inferences about the copy number structure of the genome. Several statistical methods have been proposed to determine genomic segments with different copy number levels. However, to date, no comprehensive comparison of various characteristics of these methods exists. Moreover, the segmentation results have not been utilized in downstream analyses. RESULTS: We describe a comparison of three popular and publicly available methods for the analysis of array CGH data and we demonstrate how segmentation results may be utilized in downstream analyses such as testing and classification, yielding higher power and prediction accuracy. Since the methods operate on individual chromosomes, we also propose a novel procedure for merging segments across the genome, which results in an interpretable set of copy number levels and thus facilitates identification of copy number alterations in each genome. AVAILABILITY: http://www.bioconductor.org

19.
20.
The biomedical utility of induced pluripotent stem cells (iPSCs) will be diminished if most iPSC lines harbor deleterious genetic mutations. Recent microarray studies have shown that human iPSCs carry elevated levels of DNA copy number variation compared with those in embryonic stem cells, suggesting that these and other classes of genomic structural variation (SV), including inversions, smaller duplications and deletions, complex rearrangements, and retroelement transpositions, may frequently arise as a consequence of reprogramming. Here we employ whole-genome paired-end DNA sequencing and sensitive mapping algorithms to identify all classes of SV in three fully pluripotent mouse iPSC lines. Despite the improved scope and resolution of this study, we find few spontaneous mutations per line (one or two) and no evidence for endogenous retroelement transposition. These results show that genome stability can persist throughout reprogramming, and argue that it is possible to generate iPSCs lacking gene-disrupting mutations using current reprogramming methods.
