Similar articles
1.
Asan  Xu Y  Jiang H  Tyler-Smith C  Xue Y  Jiang T  Wang J  Wu M  Liu X  Tian G  Wang J  Wang J  Yang H  Zhang X 《Genome biology》2011,12(9):R95-12

Background

Exome sequencing, which allows the global analysis of protein coding sequences in the human genome, has become an effective and affordable approach to detecting causative genetic mutations in diseases. Currently, there are several commercial human exome capture platforms; however, the relative performances of these have not been characterized sufficiently to know which is best for a particular study.

Results

We comprehensively compared three platforms: NimbleGen's Sequence Capture Array and SeqCap EZ, and Agilent's SureSelect. We assessed their performance in a variety of ways, including number of genes covered and capture efficacy. Differences that may impact on the choice of platform were that Agilent SureSelect covered approximately 1,100 more genes, while NimbleGen provided better flanking sequence capture. Although all three platforms achieved similar capture specificity of targeted regions, the NimbleGen platforms showed better uniformity of coverage and greater genotype sensitivity at 30- to 100-fold sequencing depth. All three platforms showed similar power in exome SNP calling, including medically relevant SNPs. Compared with genotyping and whole-genome sequencing data, the three platforms achieved a similar accuracy of genotype assignment and SNP detection. Importantly, all three platforms showed similar levels of reproducibility, GC bias and reference allele bias.

Conclusions

We demonstrate key differences between the three platforms, particularly the advantages of solution-based capture over array-based capture and the importance of a large gene target set.

2.
Whole exome sequencing (WES) is increasingly used in research and diagnostics. WES users expect coverage of the entire coding region of known genes as well as sufficient read depth for the covered regions. It is, however, unknown which recent WES platform is most suitable to meet these expectations. We present insights into the performance of the most recent standard exome enrichment platforms from Agilent, NimbleGen and Illumina applied to six different DNA samples by two sequencing vendors per platform. Our results suggest that both Agilent and NimbleGen overall perform better than Illumina and that the high enrichment performance of Agilent is stable among samples and between vendors, whereas NimbleGen is only able to achieve vendor- and sample-specific best exome coverage. Moreover, the recent Agilent platform overall captures more coding exons with sufficient read depth than NimbleGen and Illumina. Due to considerable gaps in effective exome coverage, however, the three platforms cannot capture all known coding exons alone or in combination, requiring improvement. Our data emphasize the importance of evaluation of updated platform versions and suggest that enrichment-free whole genome sequencing can overcome the limitations of WES in sufficiently covering coding exons, especially GC-rich regions, and in characterizing structural variants.

3.

Background

Recent developments in deep (next-generation) sequencing technologies are significantly impacting medical research. The global analysis of protein coding regions in genomes of interest by whole exome sequencing is a widely used application. Many technologies for exome capture are commercially available; here we compare the performance of four of them: NimbleGen’s SeqCap EZ v3.0, Agilent’s SureSelect v4.0, Illumina’s TruSeq Exome, and Illumina’s Nextera Exome, all applied to the same human tumor DNA sample.

Results

Each capture technology was evaluated for its coverage of different exome databases, target coverage efficiency, GC bias, sensitivity in single nucleotide variant detection, sensitivity in small indel detection, and technical reproducibility. In general, all technologies performed well; however, our data demonstrated small, but consistent differences between the four capture technologies. Illumina technologies cover more bases in coding and untranslated regions. Furthermore, whereas most of the technologies provide reduced coverage in regions with low or high GC content, the Nextera technology tends to bias towards target regions with high GC content.

Conclusions

We show key differences in performance between the four technologies. Our data should help researchers who are planning exome sequencing to select appropriate exome capture technology for their particular application.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-449) contains supplementary material, which is available to authorized users.

4.
Isolating high-priority segments of genomes greatly enhances the efficiency of next-generation sequencing (NGS) by allowing researchers to focus on their regions of interest. For the 2010–11 DNA Sequencing Research Group (DSRG) study, we compared outcomes from two leading companies, Agilent Technologies (Santa Clara, CA, USA) and Roche NimbleGen (Madison, WI, USA), which offer custom-targeted genomic enrichment methods. Both companies were provided with the same genomic sample and challenged to capture identical genomic locations for DNA NGS. The target region totaled 3.5 Mb and included 31 individual genes and a 2-Mb contiguous interval. Each company was asked to design its best assay, perform the capture in replicates, and return the captured material to the DSRG-participating laboratories. Sequencing was performed in two different laboratories on Genome Analyzer IIx systems (Illumina, San Diego, CA, USA). Sequencing data were analyzed for sensitivity, specificity, and coverage of the desired regions. The success of the enrichment was highly dependent on the design of the capture probes. Overall, coverage variability was higher for the Agilent samples. As variant discovery is the ultimate goal for a typical targeted sequencing project, we compared samples for their ability to sequence single-nucleotide polymorphisms (SNPs) as a test of the ability to capture both chromosomes from the sample. In the targeted regions, we detected 2546 SNPs with the NimbleGen samples and 2071 with Agilent's. When limited to the regions that both companies included as baits, the number of SNPs was ∼1000 for each, with Agilent and NimbleGen each finding a small number of unique SNPs not found by the other.

5.

Background

To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest updated semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as HiSeq 2000, which is the most popular sequencing platform for the human genome. The Ion Proton sequencer and the Ion TargetSeq™ Exome Enrichment Kit together make up TargetSeq-Proton, whereas SureSelect-HiSeq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer.

Results

Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by the two sequencing platforms ranged from 68.0 to 75.3 % across the four samples, whereas the concordance of co-detected variant loci reached 99 %. Sanger sequencing validation revealed that the validated rate of concordant single nucleotide polymorphisms (SNPs) (91.5 %) was higher than that of SNPs specific to TargetSeq-Proton (60.0 %) or specific to SureSelect-HiSeq (88.3 %). With regard to 1-bp small insertions and deletions (InDels), the Sanger sequencing validated rates of concordant variants (100.0 %) and SureSelect-HiSeq-specific variants (89.6 %) were higher than that of TargetSeq-Proton-specific variants (15.8 %).

Conclusions

In the sequencing of exonic regions, combining the two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by any one platform. However, for platform-specific variants, the accuracy of variant calling by HiSeq 2000 was higher than that of Ion Proton, particularly for InDel detection. Moreover, the variant calling software also influences the detection of SNPs and, in particular, InDels in Ion Proton exome sequencing.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1796-6) contains supplementary material, which is available to authorized users.

6.
Sequence capture technologies, pioneered in mammalian genomes, enable the resequencing of targeted genomic regions. Most capture protocols require blocking DNA, the production of which in large quantities can prove challenging. A blocker‐free, two‐stage capture protocol was developed using NimbleGen arrays. The first capture depletes the library of repetitive sequences, while the second enriches for target loci. This strategy was used to resequence non‐repetitive portions of an approximately 2.2 Mb chromosomal interval and a set of 43 genes dispersed in the 2.3 Gb maize genome. This approach achieved approximately 1800–3000‐fold enrichment and 80–98% coverage of targeted bases. More than 2500 SNPs were identified in target genes. Low rates of false‐positive SNP predictions were obtained, even in the presence of captured paralogous sequences. Importantly, it was possible to recover novel sequences from non‐reference alleles. The ability to design novel repeat‐subtraction and target capture arrays makes this technology accessible in any species.
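The abstract above reports fold enrichment but does not say how it was computed. One common convention, shown here as a minimal sketch and not as the authors' method, divides the observed on-target fraction of sequenced bases by the fraction expected if reads were spread evenly over the genome. The function name, parameter names, and example numbers are all hypothetical.

```python
def fold_enrichment(on_target_bases, total_mapped_bases, target_size_bp, genome_size_bp):
    """Observed on-target fraction divided by the fraction expected by chance."""
    if total_mapped_bases <= 0 or target_size_bp <= 0 or genome_size_bp <= 0:
        raise ValueError("base counts and sizes must be positive")
    observed_fraction = on_target_bases / total_mapped_bases
    expected_fraction = target_size_bp / genome_size_bp
    return observed_fraction / expected_fraction


# Hypothetical run: half of all mapped bases fall on a target that
# makes up one thousandth of the genome.
print(fold_enrichment(on_target_bases=5e8, total_mapped_bases=1e9,
                      target_size_bp=2e6, genome_size_bp=2e9))
```

With this definition the attainable maximum is genome size divided by target size, so very high values imply a target that is small relative to the genome.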

7.

Background  

Human exome resequencing using commercial target capture kits has been and is being used for sequencing large numbers of individuals to search for variants associated with various human diseases. We rigorously evaluated the capabilities of two solution exome capture kits. These analyses help clarify the strengths and limitations of those data as well as systematically identify variables that should be considered in the use of those data.

8.

Background  

Whole exome capture sequencing allows researchers to cost-effectively sequence the coding regions of the genome. Although the exome capture sequencing methods have become routine and well established, there is currently a lack of tools specialized for variant calling in this type of data.

9.

Background

A typical human genome differs from the reference genome at 4-5 million sites. This diversity is increasingly catalogued in repositories such as ExAC/gnomAD, consisting of >15,000 whole-genomes and >126,000 exome sequences from different individuals. Despite this enormous diversity, resequencing data workflows are still based on a single human reference genome. Identification and genotyping of genetic variants is typically carried out on short-read data aligned to a single reference, disregarding the underlying variation.

Results

We propose a new unified framework for variant calling with short-read data utilizing a representation of human genetic variation – a pan-genomic reference. We provide a modular pipeline that can be seamlessly incorporated into existing sequencing data analysis workflows. Our tool is open source and available online: https://gitlab.com/dvalenzu/PanVC.

Conclusions

Our experiments show that by replacing a standard human reference with a pan-genomic one we achieve an improvement in single-nucleotide variant calling accuracy and in short indel calling accuracy over the widely adopted Genome Analysis Toolkit (GATK) in difficult genomic regions.

10.
Generalized linear mixed model for segregation distortion analysis   (Total citations: 1; self-citations: 0; citations by others: 1)

Background

Concerted evolution refers to the pattern in which copies of multigene families show high intraspecific sequence homogeneity but high interspecific sequence diversity. Sequence homogeneity of these copies depends on relative rates of mutation and recombination, including gene conversion and unequal crossing over, between misaligned copies. The internally repetitive intergenic spacer (IGS) is located between the genes for the 28S and 18S ribosomal RNAs. To identify patterns of recombination and/or homogenization within IGS repeat arrays, and to identify regions of the IGS that are under functional constraint, we analyzed 13 complete IGS sequences from 10 individuals representing four species in the Daphnia pulex complex.

Results

Gene conversion and unequal crossing over between misaligned IGS repeats generate variation in copy number between arrays, as has been observed in previous studies. Moreover, terminal repeats are rarely involved in these events. Despite the occurrence of recombination, orthologous repeats in different species are more similar to one another than are paralogous repeats within species that diverged less than 4 million years ago. Patterns consistent with concerted evolution of these repeats were observed between species that diverged 8-10 million years ago. Sequence homogeneity varies along the IGS; the most homogeneous regions are downstream of the 28S rRNA gene and in the region containing the core promoter. The inadvertent inclusion of interspecific hybrids in our analysis uncovered evidence of both inter- and intrachromosomal recombination in the nonrepetitive regions of the IGS.

Conclusions

Our analysis of variation in ribosomal IGS from Daphnia shows that levels of homogeneity within and between species result from the interaction between rates of recombination and selective constraint. Consequently, different regions of the IGS are on substantially different evolutionary trajectories.

11.

Background

Whole exome sequencing (WES) is the state-of-the-art method for identification of pathogenic mutations in patients with a Mendelian disorder. WES comprehensively covers the coding sequence of the genome and is a fast and cost-effective technique.

Purpose

As most of the technical difficulties have been overcome for WES, the major issue is data processing and analysis to find the pathogenic sequence variation among tens of thousands of sequence changes. Bioinformatic analysis pipelines for filtering sequence variants have to be adapted according to the patients and family members examined by WES and the most likely inheritance pattern underlying the disease.

Possible approaches

Based on four cases, different variant prioritization strategies that led to identification of the most likely causative changes in the index patients are described.

12.

Background

Faces, as socially relevant stimuli, readily capture human visuospatial attention. Although faces also play important roles in the social lives of chimpanzees, the closest living species to humans, the way in which faces are attentionally processed remains unclear from a comparative-cognitive perspective. In the present study, three young chimpanzees (Pan troglodytes) were tested with a simple manual response task in which various kinds of photographs, including faces as non-informative cues, were followed by a target.

Results

When the target appeared at the location that had been occupied by the face immediately before target onset, response times were significantly faster than when the target appeared at the opposite location, which had been occupied by the other object. Such an advantage was not observed when a photograph of a banana was paired with the other object. Furthermore, this attentional capture was also observed when upright human faces were presented, indicating that this effect is not limited to own-species faces. In contrast, when the participants were tested with inverted chimpanzee faces, the effect was weakened, suggesting specificity to upright faces.

Conclusion

Chimpanzees' visuospatial attention was easily captured by the face stimuli. This effect was face specific and stronger for upright faces than for inverted ones. These results are consistent with those from typically developing humans.

13.

Background

The domestic pig (Sus scrofa) is both an important livestock species and a model for biomedical research. Exome sequencing has accelerated identification of protein-coding variants underlying phenotypic traits in human and mouse. We aimed to develop and validate a similar resource for the pig.

Results

We developed probe sets to capture pig exonic sequences based upon the current Ensembl pig gene annotation supplemented with mapped expressed sequence tags (ESTs) and demonstrated proof-of-principle capture and sequencing of the pig exome in 96 pigs, encompassing 24 capture experiments. For most of the samples at least 10x sequence coverage was achieved for more than 90% of the target bases. Bioinformatic analysis of the data revealed over 236,000 high confidence predicted SNPs and over 28,000 predicted indels.

Conclusions

We have achieved coverage statistics similar to those seen with commercially available human and mouse exome kits. Exome capture in pigs provides a tool to identify coding region variation associated with production traits, including loss of function mutations which may explain embryonic and neonatal losses, and to improve genomic assemblies in the vicinity of protein coding genes in the pig.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-550) contains supplementary material, which is available to authorized users.
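The coverage statistic quoted in the Results above (at least 10x coverage for more than 90% of target bases) is a breadth-of-coverage measure. A minimal sketch of how such a figure can be computed from per-base depths is shown below; the function name and the depth values are hypothetical, and the abstract does not describe the authors' own tooling.

```python
def fraction_covered(depths, min_depth=10):
    """Fraction of target bases whose read depth is at least min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


# Hypothetical per-base depths over a ten-base target region.
depths = [0, 4, 12, 15, 30, 9, 22, 18, 11, 40]
print(fraction_covered(depths))               # breadth at the default 10x threshold
print(fraction_covered(depths, min_depth=1))  # breadth at 1x
```

In practice the depth list would come from a per-base depth report restricted to the capture target intervals.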

14.
15.

Background

Well differentiated papillary mesothelioma of the peritoneum (WDPMP) is a rare variant of epithelial mesothelioma of low malignancy potential, usually found in women with no history of asbestos exposure. In this study, we perform the first exome sequencing of WDPMP.

Results

WDPMP exome sequencing reveals the first somatic mutation of E2F1, R166H, to be identified in human cancer. The location is in the evolutionarily conserved DNA binding domain and computationally predicted to be mutated in the critical contact point between E2F1 and its DNA target. We show that the R166H mutation abrogates E2F1's DNA binding ability and is associated with reduced activation of E2F1 downstream target genes. Mutant E2F1 proteins are also observed in higher quantities when compared with wild-type E2F1 protein levels and the mutant protein's resistance to degradation was found to be the cause of its accumulation within mutant over-expressing cells. Cells over-expressing wild-type E2F1 show decreased proliferation compared to mutant over-expressing cells, but cell proliferation rates of mutant over-expressing cells were comparable to cells over-expressing the empty vector.

Conclusions

The R166H mutation in E2F1 is shown to have a deleterious effect on its DNA binding ability as well as increasing its stability and subsequent accumulation in R166H mutant cells. Based on the results, two compatible theories can be formed: R166H mutation appears to allow for protein over-expression while minimizing the apoptotic consequence and the R166H mutation may behave similarly to SV40 large T antigen, inhibiting tumor suppressive functions of retinoblastoma protein 1.

16.

Background

We recently described Hi-Plex, a highly multiplexed PCR-based target-enrichment system for massively parallel sequencing (MPS), which allows the uniform definition of library size so that subsequent paired-end sequencing can achieve complete overlap of read pairs. Variant calling from Hi-Plex-derived datasets can thus rely on the identification of variants appearing in both reads of read-pairs, permitting stringent filtering of sequencing chemistry-induced errors. These principles underlie ROVER software (derived from Read Overlap PCR-MPS variant caller), which we have recently used to report the screening for genetic mutations in the breast cancer predisposition gene PALB2. Here, we describe the algorithms underlying ROVER and its usage.

Results

ROVER enables users to quickly and accurately identify genetic variants from PCR-targeted, overlapping paired-end MPS datasets. The open-source availability of the software and threshold tailorability enables broad access for a range of PCR-MPS users.

Methods

ROVER is implemented in Python and runs on all popular POSIX-like operating systems (Linux, OS X). The software accepts a tab-delimited text file listing the coordinates of the target-specific primers used for targeted enrichment based on a specified genome-build. It also accepts aligned sequence files resulting from mapping to the same genome-build. ROVER identifies the amplicon a given read-pair represents and removes the primer sequences by using the mapping co-ordinates and primer co-ordinates. It considers overlapping read-pairs with respect to primer-intervening sequence. Only when a variant is observed in both reads of a read-pair does the signal contribute to a tally of read-pairs containing or not containing the variant. A user-defined threshold informs the minimum number of, and proportion of, read-pairs a variant must be observed in for a ‘call’ to be made. ROVER also reports the depth of coverage across amplicons to facilitate the identification of any regions that may require further screening.

Conclusions

ROVER can facilitate rapid and accurate genetic variant calling for a broad range of PCR-MPS users.
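The read-pair tallying described in the ROVER Methods above can be illustrated with a minimal sketch. This is not ROVER's actual code: it assumes each read pair has already been reduced to two sets of variants (one per read) for a single amplicon, and that every pair covers every variant site. The function name `call_variants`, the `(position, ref, alt)` tuple layout, and the default thresholds are hypothetical.

```python
from collections import Counter


def call_variants(read_pairs, min_pairs=2, min_proportion=0.15):
    """Call variants that are seen in both reads of enough read pairs.

    read_pairs: list of (read1_variants, read2_variants), each an iterable
    of hashable variant descriptors such as (position, ref, alt).
    """
    total_pairs = len(read_pairs)
    tally = Counter()
    for read1_variants, read2_variants in read_pairs:
        # A variant counts only if both reads of the pair agree on it.
        tally.update(set(read1_variants) & set(read2_variants))
    return sorted(
        variant
        for variant, pairs_with_variant in tally.items()
        if pairs_with_variant >= min_pairs
        and pairs_with_variant / total_pairs >= min_proportion
    )


# Hypothetical amplicon with four read pairs.
pairs = [
    ({(101, "A", "G")}, {(101, "A", "G")}),
    ({(101, "A", "G")}, {(101, "A", "G"), (150, "C", "T")}),
    ({(150, "C", "T")}, set()),  # seen in one read only: ignored
    (set(), set()),
]
print(call_variants(pairs))
```

Requiring agreement between the two reads is what filters out errors that arise independently in one read, while the two thresholds correspond to the user-defined minimum number and minimum proportion of read pairs mentioned in the abstract.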

17.
Li C  Hung Wong W 《Genome biology》2001,2(8):research0032.1-research003211

Background

A model-based analysis of oligonucleotide expression arrays we developed previously uses a probe-sensitivity index to capture the response characteristic of a specific probe pair and calculates model-based expression indexes (MBEI). MBEI has standard error attached to it as a measure of accuracy. Here we investigate the stability of the probe-sensitivity index across different tissue types, the reproducibility of results in replicate experiments, and the use of MBEI in perfect match (PM)-only arrays.

Results

Probe-sensitivity indexes are stable across tissue types. The target gene's presence in many arrays of an array set allows the probe-sensitivity index to be estimated accurately. We extended the model to obtain expression values for PM-only arrays, and found that the 20-probe PM-only model is comparable to the 10-probe PM/MM difference model, in terms of the expression correlations with the original 20-probe PM/MM difference model. The MBEI method is able to extend the reliable detection limit of expression to a lower mRNA concentration. The standard errors of MBEI can be used to construct confidence intervals of fold changes, and the lower confidence bound of fold change is a better ranking statistic for filtering genes. We can assign reliability indexes for genes in a specific cluster of interest in hierarchical clustering by resampling clustering trees. The dChip software, which implements many of these analysis methods, is made available.

Conclusions

The model-based approach reduces the variability of low expression estimates, and provides a natural method of calculating expression values for PM-only arrays. The standard errors attached to expression values can be used to assess the reliability of downstream analysis.

18.

Background

Halibuts are commercially important flatfish species confined to the North Pacific and North Atlantic Oceans. We have determined the complete mitochondrial genome sequences of four specimens each of Atlantic halibut (Hippoglossus hippoglossus), Pacific halibut (Hippoglossus stenolepis) and Greenland halibut (Reinhardtius hippoglossoides), and assessed the nucleotide variability within and between species.

Results

About 100 variable positions were identified within the four specimens in each halibut species, with the control regions as the most variable parts of the genomes (10 times that of the mitochondrial ribosomal DNA). Due to tandem repeat arrays, the control regions have unusually large sizes compared to most vertebrate mtDNAs. The arrays are highly heteroplasmic in size and consist mainly of different variants of a 61-bp motif. Halibut mitochondrial genomes lacking arrays were also detected.

Conclusion

The complexity, distribution, and biological role of the heteroplasmic tandem repeat arrays in halibut mitochondrial control regions are discussed. We conclude that the most plausible explanation for array maintenance includes both the slipped-strand mispairing and DNA recombination mechanisms.

19.

Background

Less than two percent of the human genome is protein coding, yet that small fraction harbours the majority of known disease causing mutations. Despite rapidly falling whole genome sequencing (WGS) costs, much research and increasingly the clinical use of sequence data is likely to remain focused on the protein coding exome. We set out to quantify and understand how WGS compares with the targeted capture and sequencing of the exome (exome-seq), for the specific purpose of identifying single nucleotide polymorphisms (SNPs) in exome targeted regions.

Results

We have compared polymorphism detection sensitivity and systematic biases using a set of tissue samples that have been subject to both deep exome and whole genome sequencing. The scoring of detection sensitivity was based on sequence down sampling and reference to a set of gold-standard SNP calls for each sample. Despite evidence of incremental improvements in exome capture technology over time, whole genome sequencing has greater uniformity of sequence read coverage and reduced biases in the detection of non-reference alleles than exome-seq. Exome-seq achieves 95% SNP detection sensitivity at a mean on-target depth of 40 reads, whereas WGS only requires a mean of 14 reads. Known disease causing mutations are not biased towards easy or hard to sequence areas of the genome for either exome-seq or WGS.

Conclusions

From an economic perspective, WGS is at parity with exome-seq for variant detection in the targeted coding regions. WGS offers benefits in uniformity of read coverage and more balanced allele ratio calls, both of which can in most cases be offset by deeper exome-seq, with the caveat that some exome-seq targets will never achieve sufficient mapped read depth for variant detection due to technical difficulties or probe failures. As WGS is intrinsically richer data that can provide insight into polymorphisms outside coding regions and reveal genomic rearrangements, it is likely to progressively replace exome-seq for many applications.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-247) contains supplementary material, which is available to authorized users.

20.

Background

Abel and Trevors have delineated three aspects of sequence complexity, Random Sequence Complexity (RSC), Ordered Sequence Complexity (OSC) and Functional Sequence Complexity (FSC) observed in biosequences such as proteins. In this paper, we provide a method to measure functional sequence complexity.

Methods and Results

We have extended Shannon uncertainty by incorporating the data variable with a functionality variable. The resulting measured unit, which we call Functional bit (Fit), is calculated from the sequence data jointly with the defined functionality variable. To demonstrate the relevance to functional bioinformatics, a method to measure functional sequence complexity was developed and applied to 35 protein families. Considerations were made in determining how the measure can be used to correlate functionality when relating to the whole molecule and sub-molecule. In the experiment, we show that when the proposed measure is applied to the aligned protein sequences of ubiquitin, 6 of the 7 highest value sites correlate with the binding domain.

Conclusion

For future extensions, measures of functional bioinformatics may provide a means to evaluate potential evolving pathways from effects such as mutations, as well as analyzing the internal structural and functional relationships within the 3-D structure of proteins.
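The abstract above does not give the formula for the Functional bit (Fit). As a rough illustration only, the sketch below treats the per-site value as the drop in Shannon uncertainty from an assumed null state, in which all 20 amino acids are equally likely, to the residue distribution observed in an alignment column of functional sequences. The uniform null state, the handling of gaps, and the function names are assumptions, not the authors' definition.

```python
import math
from collections import Counter


def site_fits(column, alphabet_size=20):
    """Assumed per-site measure: log2(alphabet_size) minus the column's Shannon entropy."""
    residues = [residue for residue in column if residue != "-"]  # gaps are ignored
    if not residues:
        return 0.0  # an all-gap column is treated as carrying no information
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def alignment_fits(sequences):
    """Per-site values for equal-length aligned sequences."""
    return [site_fits(column) for column in zip(*sequences)]


# Hypothetical three-sequence alignment: site 1 is fully conserved,
# site 2 is variable, site 3 has one gap.
print(alignment_fits(["MKV", "MRV", "MQ-"]))
```

Under these assumptions a fully conserved site scores log2(20), about 4.32, and the score falls as the column becomes more variable.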
