Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
2.
The importance of next-generation sequencing (NGS) in cancer research is rising as access to this key technology becomes easier for researchers. The sequence data created by NGS technologies must be processed by various bioinformatics algorithms within a pipeline in order to convert raw data into meaningful information. Mapping and variant calling are the two main steps of these analysis pipelines, and many algorithms are available for these steps. Therefore, detailed benchmarking of these algorithms in different scenarios is crucial for the efficient utilization of sequencing technologies. In this study, we compared the performance of twelve pipelines (three mapping and four variant discovery algorithms) with recommended settings to capture single nucleotide variants. We observed significant discrepancies in variant calls among the tested pipelines for different heterogeneity levels in real and simulated samples, with overall high specificity and low sensitivity. In addition to evaluating the pipelines individually, we also constructed pipeline combinations and tested their performance. In these analyses, we observed that certain pipelines complement each other much better than others and outperform individual pipelines. This suggests that adhering to a single pipeline is not optimal for cancer sequencing analysis and that sample heterogeneity should be considered in algorithm optimization.
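The idea of combining pipelines can be illustrated with a minimal sketch. It assumes each pipeline's output has already been reduced to a set of `(chrom, pos, ref, alt)` tuples and that a truth set is available (for example from a simulated sample). The variant data, the pipeline names, and the choice of union and intersection as combination rules are illustrative assumptions, not the combinations tested in the study. Precision is reported instead of specificity because specificity would also require the number of true-negative sites.

```python
# Hypothetical toy data: each variant is (chrom, pos, ref, alt).
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")}
pipeline_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
              ("chr3", 10, "A", "T")}          # one false positive
pipeline_b = {("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}


def evaluate(calls, truth_set):
    """Return (sensitivity, precision) of a call set against a truth set."""
    true_pos = len(calls & truth_set)
    false_pos = len(calls - truth_set)
    false_neg = len(truth_set - calls)
    sensitivity = true_pos / (true_pos + false_neg) if truth_set else 0.0
    precision = true_pos / (true_pos + false_pos) if calls else 0.0
    return sensitivity, precision


# Two simple combination rules (assumptions for illustration):
union_calls = pipeline_a | pipeline_b          # favours sensitivity
intersection_calls = pipeline_a & pipeline_b   # favours precision

for name, calls in [("A", pipeline_a), ("B", pipeline_b),
                    ("A or B", union_calls), ("A and B", intersection_calls)]:
    sens, prec = evaluate(calls, truth)
    print(f"{name:8s} sensitivity={sens:.2f} precision={prec:.2f}")
```

In this toy case the union recovers more true variants than either pipeline alone, while the intersection removes the false positive at the cost of sensitivity, which is the kind of trade-off a combination study has to weigh.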

3.

Background

Advances in human genomics have allowed unprecedented productivity in terms of algorithms, software, and literature available for translating raw next-generation sequence data into high-quality information. The challenges of variant identification in organisms with lower quality reference genomes are less well documented. We explored the consequences of commonly recommended preparatory steps and the effects of single and multi sample variant identification methods using four publicly available software applications (Platypus, HaplotypeCaller, Samtools and UnifiedGenotyper) on whole genome sequence data of 65 key ancestors of Swiss dairy cattle populations. Accuracy of calling next-generation sequence variants was assessed by comparison to the same loci from medium and high-density single nucleotide variant (SNV) arrays.

Results

The total number of SNVs identified varied by software and method, with single (multi) sample results ranging from 17.7 to 22.0 (16.9 to 22.0) million variants. Computing time varied considerably between software. Preparatory realignment of insertions and deletions and subsequent base quality score recalibration had only minor effects on the number and quality of SNVs identified by different software, but increased computing time considerably. Average concordance for single (multi) sample results with high-density chip data was 58.3% (87.0%) and average genotype concordance in correctly identified SNVs was 99.2% (99.2%) across software. The average quality of SNVs identified, measured as the ratio of transitions to transversions, was higher using single sample methods than multi sample methods. A consensus approach using results of different software generally provided the highest variant quality in terms of transition/transversion ratio.
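The transition/transversion ratio used above as a quality measure can be computed directly from the reference and alternate alleles of each SNV. The following is a minimal sketch, assuming biallelic SNVs supplied as `(ref, alt)` pairs; the function names are hypothetical and the code is not taken from the pipeline that accompanies the article.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snvs):
    """Return transitions / transversions for an iterable of (ref, alt) pairs."""
    transitions = transversions = 0
    for ref, alt in snvs:
        if ref.upper() == alt.upper():
            continue  # not a variant; skip rather than count it
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return transitions / transversions
```

A call such as `ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")])` counts two transitions and one transversion. Because random errors tend to produce transversions about twice as often as transitions, a higher ratio is commonly read as a sign of fewer false-positive calls.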

Conclusions

Our findings serve as a reference for variant identification pipeline development in non-human organisms and help assess the implication of preparatory steps in next-generation sequencing pipelines for organisms with incomplete reference genomes (pipeline code is included). Benchmarking this information should prove particularly useful in processing next-generation sequencing data for use in genome-wide association studies and genomic selection.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-948) contains supplementary material, which is available to authorized users.

4.
This article reviews basic concepts, general applications, and the potential impact of next-generation sequencing (NGS) technologies on genomics, with particular reference to currently available and possible future platforms and bioinformatics. NGS technologies have demonstrated the capacity to sequence DNA at unprecedented speed, thereby enabling previously unimaginable scientific achievements and novel biological applications. However, the massive data produced by NGS also present a significant challenge for data storage, analysis, and management solutions. Advanced bioinformatic tools are essential for the successful application of NGS technology. As evidenced throughout this review, NGS technologies will have a striking impact on genomic research and the entire biological field. With its ability to tackle challenges left unsolved by previous genomic technologies, NGS is likely to unravel the complexity of the human genome in terms of genetic variations, some of which may be confined to susceptibility loci for some common human conditions. The impact of NGS technologies on genomics will be far reaching and is likely to change the field for years to come.

5.
BM-map: Bayesian mapping of multireads for next-generation sequencing data   Total citations: 1, self-citations: 0, citations by others: 1
Ji Y  Xu Y  Zhang Q  Tsui KW  Yuan Y  Norris C  Liang S  Liang H 《Biometrics》2011,67(4):1215-1224
Next-generation sequencing (NGS) technology generates millions of short reads, which provide valuable information for various aspects of cellular activities and biological functions. A key step in NGS applications (e.g., RNA-Seq) is to map short reads to correct genomic locations within the source genome. While most reads are mapped to a unique location, a significant proportion of reads align to multiple genomic locations with equal or similar numbers of mismatches; these are called multireads. The ambiguity in mapping the multireads may lead to bias in downstream analyses. Currently, most practitioners discard the multireads in their analysis, resulting in a loss of valuable information, especially for genes with similar sequences. To refine the read mapping, we develop a Bayesian model that computes the posterior probability of mapping a multiread to each competing location. The probabilities are used for downstream analyses, such as the quantification of gene expression. We show through simulation studies and RNA-Seq analysis of real-life data that the Bayesian method yields better mapping than the current leading methods. We provide a C++ program for download, which is being packaged into user-friendly software.
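To make the idea of a posterior mapping probability concrete, here is a deliberately simplified sketch. It is not the BM-map model: it assumes a single per-base error rate, independent mismatches, and a prior over candidate locations (uniform by default), and the function name and parameters are hypothetical. Under those assumptions, the likelihood of a read with `m` mismatches over `read_length` bases is proportional to `error_rate**m * (1 - error_rate)**(read_length - m)`, and Bayes' rule normalizes prior times likelihood across the competing locations.

```python
def multiread_posteriors(mismatches, read_length, error_rate=0.01, priors=None):
    """Posterior probability of each candidate location for one multiread.

    mismatches  -- number of mismatches at each candidate location
    read_length -- length of the read in bases
    error_rate  -- assumed per-base sequencing error rate (illustrative)
    priors      -- optional prior weight per location; uniform if omitted
    """
    n = len(mismatches)
    if n == 0:
        raise ValueError("at least one candidate location is required")
    if priors is None:
        priors = [1.0 / n] * n
    # Unnormalized posterior: prior x likelihood of the observed mismatches.
    weights = [
        p * (error_rate ** m) * ((1.0 - error_rate) ** (read_length - m))
        for p, m in zip(priors, mismatches)
    ]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `multiread_posteriors([0, 1], read_length=50)` assigns most of the probability to the perfect match, while two locations with equal mismatch counts split the read evenly. Downstream quantification could then add these fractional weights to each location instead of discarding the read.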

6.
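Copy number variation (CNV) refers to an increase or decrease in the copy number of large DNA segments in the genome. Existing studies show that CNVs cause a variety of human diseases and are closely related to the mechanisms of their onset and progression. The advent of high-throughput sequencing has provided technical support for CNV detection, and in fields such as human disease research and clinical diagnosis and treatment, high-throughput sequencing has become the mainstream CNV detection technology. Although new algorithms and software based on high-throughput sequencing continue to be developed, their accuracy remains unsatisfactory. This article comprehensively reviews CNV detection methods based on high-throughput sequencing data, including read-depth methods, paired-end mapping methods, split-read methods, de novo assembly methods, and methods that combine these four approaches. It discusses in depth the principles of each class of methods, representative software tools, the data to which each class is suited, and their advantages and disadvantages, and it looks ahead to future directions.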

7.
The impact of next-generation sequencing technology on genetics   Total citations: 26, self-citations: 2, citations by others: 24
If one accepts that the fundamental pursuit of genetics is to determine the genotypes that explain phenotypes, the meteoric increase of DNA sequence information applied toward that pursuit has nowhere to go but up. The recent introduction of instruments capable of producing millions of DNA sequence reads in a single run is rapidly changing the landscape of genetics, providing the ability to answer questions with heretofore unimaginable speed. These technologies will provide an inexpensive, genome-wide sequence readout as an endpoint to applications ranging from chromatin immunoprecipitation, mutation mapping and polymorphism discovery to noncoding RNA discovery. Here I survey next-generation sequencing technologies and consider how they can provide a more complete picture of how the genome shapes the organism.

8.
罗佳  李薇  段云峰  王沥  金锋 《微生物学通报》2014,41(7):1368-1375
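[Objective] A large body of evidence shows that normal immune activity depends to a large extent on the interaction between the immune system and the gut microbiota, which manifests as immune clearance of pathogens and tolerance of probiotics. Toll-like receptors (TLRs) of the immune system and microorganism-associated molecular patterns (MAMPs) from the gut microbiota are thought to play an important role in the host immune system's discrimination between pathogens and probiotics, because recognition of MAMPs by TLRs can activate innate and adaptive immune responses. Among TLR-MAMP recognition events, only the recognition of bacterial flagellin by TLR5 is based on a protein-protein interaction, which makes its binding mode comparatively easy to study. The main aim of our study was therefore to determine how the interaction between TLR5 and flagellin affects the host's ability to distinguish pathogens from probiotics. [Methods] We constructed a phylogenetic tree of flagellins from a variety of gut bacteria (including probiotics and pathogens) and aligned the TLR5 recognition sequences of the flagellins. [Results] The flagellin sequences of pathogens and probiotics differ, particularly at the sites that TLR5 binds and recognizes. [Conclusion] The differing flagellin recognition regions of pathogens and probiotics may result from flagellated bacteria adapting to survive under TLR5 recognition, and they allow the host to distinguish pathogens from probiotics. In addition, related studies indicate that the distribution of TLRs in intestinal epithelial cells is polarized between the basolateral and apical sides, which can trigger an immune response to pathogens and immune tolerance of probiotics, respectively, thereby resisting invasion and infection by pathogens while coexisting peacefully with probiotics. The interaction between flagellin and TLR5 reflects the molecular-level interaction and coevolution of the gut microbiota and the immune system, and is one of the molecular mechanisms by which the host distinguishes pathogens from probiotics.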

9.
Single nucleotide polymorphisms (SNPs) are rapidly replacing anonymous markers in population genomic studies, but their use in non‐model organisms is hampered by the scarcity of cost‐effective approaches to uncover genome‐wide variation in a comprehensive subset of individuals. The screening of one or only a few individuals induces ascertainment bias. To discover SNPs for a population genomic study of the Pyrenean rocket (Sisymbrium austriacum subsp. chrysanthum), we undertook a pooled RAD‐PE (Restriction site Associated DNA Paired‐End sequencing) approach. RAD tags were generated from the PstI‐digested pooled genomic DNA of 12 individuals sampled across the species distribution range and paired‐end sequenced using Illumina technology to produce ~24.5 Mb of sequences, covering ~7% of the species' genome. Sequences were assembled into ~76 000 contigs with a mean length of 323 bp (N50 = 357 bp, sequencing depth = 24x). In all, >15 000 SNPs were called, of which 47% were annotated in putative genic regions based on homology with the Arabidopsis thaliana genome. Gene ontology (GO) slim categorization demonstrated that the identified SNPs covered extant genic variation well. The validation of 300 SNPs on a larger set of individuals using a KASPar assay underpinned the utility of pooled RAD‐PE as an inexpensive genome‐wide SNP discovery technique (success rate: 87%). In addition to SNPs, we discovered >600 putative SSR markers.
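The N50 statistic quoted for the contig assembly can be computed from contig lengths alone. The sketch below uses the common definition (the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly length); the function name is hypothetical, and the abstract does not state which tool or exact convention the authors used.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
```

Unlike the mean contig length, N50 is weighted toward longer contigs, which is why an assembly can have a mean of 323 bp and an N50 of 357 bp at the same time.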

10.

11.
High‐throughput sequencing (HTS) technologies generate millions of sequence reads from DNA/RNA molecules rapidly and cost‐effectively, enabling single investigator laboratories to address a variety of ‘omics’ questions in nonmodel organisms, fundamentally changing the way genomic approaches are used to advance biological research. One major challenge posed by HTS is the complexity and difficulty of data quality control (QC). While QC issues associated with sample isolation, library preparation and sequencing are well known and protocols for their handling are widely available, the QC of the actual sequence reads generated by HTS is often overlooked. HTS‐generated sequence reads can contain various errors, biases and artefacts whose identification and amelioration can greatly impact subsequent data analysis. However, a systematic survey on QC procedures for HTS data is still lacking. In this review, we begin by presenting standard ‘health check‐up’ QC procedures recommended for HTS data sets and establishing what ‘healthy’ HTS data look like. We next proceed by classifying errors, biases and artefacts present in HTS data into three major types of ‘pathologies’, discussing their causes and symptoms and illustrating with examples their diagnosis and impact on downstream analyses. We conclude this review by offering examples of successful ‘treatment’ protocols and recommendations on standard practices and treatment options. Notwithstanding the speed with which HTS technologies – and consequently their pathologies – change, we argue that careful QC of HTS data is an important – yet often neglected – aspect of their application in molecular ecology, and lay the groundwork for developing an HTS data QC ‘best practices’ guide.

12.
张林 《生物信息学》2014,12(3):179-184
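The aim was to find a biological sequence local alignment method that is accurate, efficient, low-cost, and general-purpose. Among the various local alignment algorithms, such as dot-matrix and heuristic algorithms, the dynamic-programming local alignment algorithm is the most accurate; it was implemented on a computer and mapped onto graphics hardware through a streaming model to accelerate it. Alignment time and performance in millions of cell updates per second (MCUPS) were then evaluated by searching a database with example queries. The results show that the accelerated algorithm markedly increases alignment speed while preserving alignment accuracy. Compared with the fastest current heuristic algorithm, the average speedup is 14.5-fold and the maximum speedup reaches 22.9-fold.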
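The dynamic-programming local alignment referred to here is commonly the Smith-Waterman algorithm. The sketch below is a plain CPU version that returns only the best local score, together with the MCUPS metric (millions of cell updates per second). The scoring values, the linear gap penalty, and the function names are assumptions for illustration; the GPU streaming implementation described in the abstract is not reproduced.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def mcups(query_length, database_length, seconds):
    """Millions of DP cell updates per second for one database search."""
    return query_length * database_length / seconds / 1e6
```

Each cell depends only on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal are independent of each other. That independence is what makes the algorithm a good fit for graphics hardware, and MCUPS normalizes run time by the amount of matrix work so that searches of different sizes can be compared.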

13.
14.
We aimed to evaluate whether the occurrence of cryptic species of Paracoccidioides brasiliensis, S1, PS2, PS3 and Paracoccidioides lutzii, has implications in the immunodiagnosis of paracoccidioidomycosis (PCM). Small quantities of the antigen gp43 were found in culture filtrates of P. lutzii strains and this molecule appeared to be more variable within P. lutzii because the synonymous-nonsynonymous mutation rate was lower, indicating an evolutionary process different from that of the remaining genotypes. The production of gp43 also varied between isolates belonging to the same species, indicating that speciation events are important, but not sufficient to fully explain the diversity in the production of this antigen. The culture filtrate antigen AgEpm83, which was obtained from a PS3 isolate, showed large quantities of gp43 and reactivity by immunodiffusion assays, similar to the standard antigen (AgB-339) from an S1 isolate. Furthermore, AgEpm83 was capable of serologically differentiating five serum samples from patients from the Botucatu and Jundiaí regions. These patients had confirmed PCM but were non-reactive to the standard antigen, thus demonstrating an alternative for serological diagnosis in regions in which S1 and PS2 occur. We also emphasise that it is not advisable to use a single antigen preparation to diagnose PCM, a disease that is caused by highly diverse pathogens.

15.
Small‐scale sequencing has improved substantially in recent decades, culminating in the development of next‐generation sequencing (NGS) technologies. Modern NGS methods have helped the discovery of many new plant viruses. Nevertheless, there is still a need to establish solid assembly pipelines targeting small genomes characterised by low identities to known viral sequences. Here, we describe and discuss the fundamental steps required for discovering and sequencing new plant viral genomes by NGS. A practical pipeline and standard alternative tools used in NGS analysis are presented.

16.
17.
The size distributions of deletions, insertions, and indels (i.e., insertions or deletions) were studied, using 78 human processed pseudogenes and other published data sets. The following results were obtained: (1) Deletions occur more frequently than do insertions in sequence evolution; none of the pseudogenes studied shows significantly more insertions than deletions. (2) Empirically, the size distributions of deletions, insertions, and indels can be described well by a power law, i.e., $f_k = C k^{-b}$, where $f_k$ is the frequency of deletion, insertion, or indel with gap length $k$, $b$ is the power parameter, and $C$ is the normalization factor. (3) The estimates of $b$ for deletions and insertions from the same data set are approximately equal to each other, indicating that the size distributions for deletions and insertions are approximately identical. (4) The variation in the estimates of $b$ among various data sets is small, indicating that the effect of local structure exists but only plays a secondary role in the size distribution of deletions and insertions. (5) The linear gap penalty, which is most commonly used in sequence alignment, is not supported by our analysis; rather, the power law for the size distribution of indels suggests that an appropriate gap penalty is $w_k = a + b \ln k$, where $a$ is the gap creation cost and $b \ln k$ is the gap extension cost. (6) The higher frequency of deletion over insertion suggests that the gap creation cost of insertion ($a_i$) should be larger than that of deletion ($a_d$); that is, $a_i - a_d = \ln R$, where $R$ is the frequency ratio of deletions to insertions. Correspondence to: W.-H. Li
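The relationship between the power law and the logarithmic gap penalty can be sketched in a few lines. The code assumes the power law has the form f_k = C k^(-b) and the penalty the form w_k = a + b ln k, so that the penalty is, up to an additive constant, the negative log-frequency of a gap of length k. The log-log least-squares fit and all function names are illustrative assumptions; the abstract does not say how the authors estimated b.

```python
import math


def log_gap_penalty(k, a, b):
    """Gap penalty w_k = a + b * ln(k) for a gap of length k >= 1."""
    return a + b * math.log(k)


def insertion_creation_cost(a_d, deletion_to_insertion_ratio):
    """a_i = a_d + ln(R), with R the frequency ratio of deletions to insertions."""
    return a_d + math.log(deletion_to_insertion_ratio)


def estimate_power_parameter(freq_by_length):
    """Estimate b in f_k = C * k**(-b) by least squares on (ln k, ln f_k).

    freq_by_length maps gap length k to its observed frequency f_k.
    """
    xs = [math.log(k) for k in freq_by_length]
    ys = [math.log(f) for f in freq_by_length.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx   # the slope of the log-log line is -b
```

Because ln k grows slowly, this penalty charges much less for lengthening an existing gap than a linear penalty does: with a = 10 and b = 2, for instance, a gap of length 20 costs about 16, whereas a linear penalty with extension cost 2 would charge 10 + 2 × 19 = 48.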

18.
G-protein coupled receptors (GPCRs) are important therapeutic targets for the treatment of human disease. Although GPCRs are highly successful drug targets, there are many challenges associated with the discovery and translation of small molecule ligands that target the endogenous ligand-binding site for GPCRs. Allosteric modulators are a class of ligands that target alternative binding sites known as allosteric sites and offer fresh opportunities for the development of new therapeutics. However, only a few allosteric modulators have been approved as drugs. Advances in GPCR structural biology enabled by the cryogenic electron microscopy (cryo-EM) revolution have provided new insights into the molecular mechanism and binding location of small molecule allosteric modulators. This review highlights the latest findings from allosteric modulator-bound structures of Class A, B, and C GPCRs with a focus on small molecule ligands. Emerging methods that will facilitate cryo-EM structures of more difficult ligand-bound GPCR complexes are also discussed. The results of these studies are anticipated to aid future structure-based drug discovery efforts across many different GPCRs.

19.
Thirty-two ejaculates from four Holstein bulls were utilized in the study. Neat semen was diluted (1:10) with 20% egg yolk-citrate and cooled to 5 °C within either 0.5 or 3 h, then glycerolated, and equilibrated for 2 h. Straws were frozen over liquid nitrogen, stored at -196 °C, and thawed at 46, 37, or 23 °C for 12, 20, or 60 sec, respectively. The average percentages of progressive motility and unstained cells post thawing increased (34.5 vs 31.4%; P<0.01 and 41.9 vs 37.1%; P<0.002, respectively) for semen collected in polyethylene rather than in rubber liner-collection cones. The post-thaw mean progressive motility for spermatozoa cooled to 5 °C within 3 vs 0.5 h was higher (34.5 vs 31.5%; P<0.0001, respectively). Similar differences were observed for unstained cells (40.7 vs 38.2%; P<0.0001, respectively). Mean acrosomal scores were lower when cells were collected in polyethylene (1.00 vs 1.03; P<0.02) and also when a 3-h cooling period was used (1.01 vs 1.03; P<0.0001). Spermatozoa thawed at higher temperatures exhibited increases (P<0.0001) in both progressive motility and the percentage of unstained cells. The average percentages of progressive motility and unstained cells were 37.1, 35.7 and 26.1% and 43.4, 41.5 and 33.5%, respectively, at thawing temperatures of 46, 37 and 23 °C. Based on the results of this study, optimal conditions for bovine semen handling were the collection of semen in polyethylene liner-collection cones, preglycerolation cooling within 3 h, and thawing at 46 °C for 12 sec.

20.
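Bioinformatics methods were used to analyze the phylogenetic relationships of serum albumins from different species and to characterize how the amino acids at the sites where the hypoglycemic drugs miglitol and voglibose interact with human serum albumin vary in other closely related species. The results show that the binding sites of miglitol and voglibose on human serum albumin both lie in the hydrophobic cavity of subdomain IB, and that the main forces involved are hydrogen bonds and hydrophobic interactions. Most of the amino acids at the miglitol and voglibose binding sites of serum albumin are conserved in the other species; only a few differ, and those differ in polarity as well. Hydrophobicity analysis of serum albumin found that hydrophilic residues outnumber hydrophobic ones at the miglitol and voglibose binding sites, and the same holds in the other four closely related species. These results provide an important scientific basis for further study of the behavior and interactions of hypoglycemic drugs in other species.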
