3.
Background: High-throughput DNA sequencing technologies are generating vast amounts of data. Fast, flexible and memory-efficient implementations are needed to facilitate analyses of thousands of samples simultaneously. Results: We present a multithreaded program suite called ANGSD. This program can calculate various summary statistics and perform association mapping and population genetic analyses, utilizing the full information in next generation sequencing data by working directly on the raw sequencing data or by using genotype likelihoods. Conclusions: The open source C/C++ program ANGSD is available at http://www.popgen.dk/angsd. The program is tested and validated on GNU/Linux systems. It supports multiple input formats, including BAM and imputed beagle genotype probability files. It allows the user to choose between combinations of existing methods and can perform analyses that are not implemented elsewhere. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-014-0356-4) contains supplementary material, which is available to authorized users.
4.
Next generation sequencing (NGS) of PCR amplicons is a standard approach to detect genetic variations in personalized medicine such as cancer diagnostics. Computer programs used in the NGS community often miss insertions and deletions (indels) that constitute a large part of known human mutations. We have developed HeurAA, an open source, heuristic amplicon aligner program. We tested the program on simulated datasets as well as experimental data from multiplex sequencing of 40 amplicons in 12 oncogenes collected on a 454 Genome Sequencer from lung cancer cell lines. We found that HeurAA can accurately detect all indels, and is more than an order of magnitude faster than previous programs. HeurAA can compare reads and reference sequences up to several thousand base pairs in length, and it can evaluate data from complex mixtures containing reads of different gene segments from different samples. HeurAA is written in C and Perl for Linux operating systems; the code and the documentation are available for research applications at http://sourceforge.net/projects/heuraa/
5.
We consider the design and evaluation of short barcodes, with a length between six and eight nucleotides, used for parallel sequencing on platforms where substitution errors dominate. Such codes should not only have good error correction properties; their code words should also fulfil certain biological constraints (experimental parameters). We compare published barcodes with codes obtained by two new construction methods, one based on the currently best known linear codes and the other a simple randomized construction. The evaluation is done with respect to error correction capabilities, barcode size, experimental parameters, and fundamental bounds on code size and distance properties. We provide a list of codes for lengths between six and eight nucleotides, where for length eight, two substitution errors can be corrected. In fact, no code with larger minimum distance can exist.
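The link between minimum distance and error correction mentioned in this abstract can be illustrated with a short sketch. It relies only on the standard coding-theory fact that a code with minimum Hamming distance d can correct floor((d - 1) / 2) substitution errors. The barcode set `toy_barcodes` is made up for illustration and is not one of the published codes.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(codes: list[str]) -> int:
    """Smallest pairwise Hamming distance in a barcode set."""
    return min(hamming(a, b) for a, b in combinations(codes, 2))


def correctable_substitutions(d_min: int) -> int:
    """Substitution errors guaranteed correctable at minimum distance d_min."""
    return (d_min - 1) // 2


# Hypothetical length-6 barcodes, chosen only to demonstrate the functions.
toy_barcodes = ["AAAAAA", "AACCGG", "CCAAGT", "GGTTAC"]

d = min_distance(toy_barcodes)
print(d, correctable_substitutions(d))
```

Under this rule, correcting two substitution errors, as reported for length eight, requires a minimum distance of at least five.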
6.
ABSTRACT: BACKGROUND: With the advent of next-generation sequencing there is an increased demand for tools to pre-process and handle the vast amounts of data generated. One recurring problem is adapter contamination in the reads, i.e. the partial or complete sequencing of adapter sequences. These adapter sequences have to be removed as they can hinder correct mapping of the reads and influence SNP calling and other downstream analyses. FINDINGS: We present a tool called AdapterRemoval which is able to pre-process both single and paired-end data. The program locates and removes adapter residues from the reads, it is able to combine paired reads if they overlap, and it can optionally trim low-quality nucleotides. Furthermore, it can look for adapter sequence in both the 5' and 3' ends of the reads. This is a flexible method that can be tuned to accommodate different experimental settings and sequencing platforms producing FASTQ files. AdapterRemoval is shown to be good at trimming adapters from both single-end and paired-end data. CONCLUSIONS: AdapterRemoval is a comprehensive tool for analyzing next-generation sequencing data. It exhibits good performance both in terms of sensitivity and specificity. AdapterRemoval has already been used in various large projects and it is possible to extend it further to accommodate application-specific biases in the data.
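To make the idea of partial or complete adapter contamination concrete, here is a minimal sketch of 3' adapter trimming on a single read. It is not AdapterRemoval's algorithm. The function name, the mismatch-rate threshold, the minimum overlap and the example adapter string are all assumptions chosen for illustration.

```python
def trim_3prime_adapter(
    read: str,
    adapter: str,
    max_mismatch_rate: float = 0.1,  # assumed tolerance, not a tool default
    min_overlap: int = 3,            # assumed minimum overlap length
) -> str:
    """Return `read` with a full or partial 3' adapter removed.

    Scans start positions from left to right and cuts at the first
    position where the rest of the read matches the beginning of the
    adapter within the allowed mismatch rate.
    """
    for start in range(len(read)):
        overlap = min(len(read) - start, len(adapter))
        if overlap < min_overlap:
            break  # remaining suffix is too short to call an adapter
        window = read[start:start + overlap]
        mismatches = sum(r != a for r, a in zip(window, adapter))
        if mismatches <= max_mismatch_rate * overlap:
            return read[:start]
    return read


# Illustrative adapter prefix and reads (made up for this example).
adapter = "AGATCGGAAGAGC"
contaminated = "ACGTACGT" + "AGATCGGAAG"  # insert + partial adapter
clean = "ACGTACGTACGT"

print(trim_3prime_adapter(contaminated, adapter))
print(trim_3prime_adapter(clean, adapter))
```

A production trimmer would also use base qualities and handle paired-end overlap, which this sketch leaves out.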
7.
Next generation sequencing (NGS) allows whole exome or whole genome sequencing for a given patient to be performed in a timely manner and at reasonable cost. This diagnostic quantum leap not only has various legal, ethical and economic aspects but will naturally also affect patient care. Currently, however, the widespread introduction of NGS into routine diagnostics is facing many obstacles. In particular, it is to be expected that NGS will identify a large number of rare variants in a given patient that are of (yet) unknown clinical significance. As a first step towards solving this problem, we introduce the concept of a database that will systematically integrate genotypic and phenotypic information from the German health care context. Not only will this resource be of great scientific value, but the database shall also provide human geneticists with the evidence base necessary for the reliable evaluation of their patient-related sequencing data.
9.
Background: Epilepsy is a genetically complex neurological disorder that affects millions of people of different age groups and varies in type and severity. Copy number variants (CNVs) are key players in the genetic etiology of numerous neurodevelopmental disorders, and prior findings also suggest that chromosomal aberrations contribute to the pathogenesis of epilepsy. Novel technologies, such as array comparative genomic hybridization (array-CGH), may help to uncover the pathogenic CNVs in patients with epilepsy. Results: This study used high-density whole genome array-CGH analysis of blood DNA samples from a cohort of 22 epilepsy patients to search for CNVs associated with epilepsy. Pathogenic rearrangements were detected, including 6p12.1 microduplications in 5 patients covering a total region of 99.9 kb and 7q32.3 microdeletions in 3 patients covering a total region of 63.9 kb. Two genes, BMP5 and PODXL, were located in the predicted duplicated and deleted regions, respectively. Furthermore, these CNV findings were confirmed by qPCR. Conclusion: We have described, for the first time, several novel CNVs/genes implicated in epilepsy in the Saudi population. These findings enable us to better describe the genetic variations in epilepsy, and could provide a foundation for understanding the critical regions of the genome which might be involved in the development of epilepsy.
10.
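Next generation sequencing (NGS), which has developed rapidly in recent years, has fully demonstrated its advantages of low cost, high throughput and broad applicability in every field of life science research. In modern agricultural biotechnology, NGS enables scientists not only to carry out in-depth whole genome sequencing and resequencing of crops, model plants or different cultivars more economically and efficiently, but also to perform efficient and accurate analyses of genetic variation, molecular markers, linkage maps, epigenetics and transcriptomes across hundreds or thousands of cultivars, thereby improving crop breeding techniques and accelerating research on breeding new varieties. Obtaining the whole genome sequence of a crop is the foundation for the other studies and analyses. By introducing recently published work on whole genome sequencing and assembly of crops using NGS, this article illustrates the broad prospects of NGS in modern agricultural biotechnology and the research foundation it has established.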
11.
The preferred source of DNA in human genetics research is blood, or cell lines derived from blood, as these sources yield large quantities of high quality DNA. However, DNA extraction from saliva can yield high quality DNA with little to no degradation/fragmentation that is suitable for a variety of DNA assays without the expense of a phlebotomist and can even be acquired through the mail. However, at present, no saliva DNA collection/extraction protocols for next generation sequencing have been presented in the literature. This protocol optimizes parameters of saliva collection/storage and DNA extraction to be of sufficient quality and quantity for DNA assays with the highest standards, including microarray genotyping and next generation sequencing.
12.
Taking Maxam-Gilbert and Sanger sequencing as the first generation, recent years have seen an explosion of newly developed sequencing strategies, which are usually referred to as next generation sequencing (NGS) techniques. NGS techniques have high throughput and produce thousands or even millions of sequences at the same time. These sequences allow for the accurate identification of microbial taxa, including uncultivable organisms and those present in small numbers. In specific applications, NGS provides a complete inventory of all microbial operons and genes present or being expressed under different study conditions. NGS techniques are revolutionizing the field of microbial ecology and have recently been used to examine several food ecosystems. After a short introduction to the most common NGS systems and platforms, this review addresses how NGS techniques have been employed in the study of food microbiota and food fermentations, and discusses their limits and perspectives. The most important findings are reviewed, including those made in the study of the microbiota of milk, fermented dairy products, and plant-, meat- and fish-derived fermented foods. The knowledge that can be gained on microbial diversity, population structure and population dynamics via the use of these technologies could be vital in improving the monitoring and manipulation of foods and fermented food products, and should also improve their safety.
13.
Inference of population structure and individual ancestry is important both for population genetics and for association studies. With next generation sequencing technologies it is possible to obtain genetic data for all accessible genetic variations in the genome. Existing methods for admixture analysis rely on known genotypes. However, individual genotypes cannot be inferred from low-depth sequencing data without introducing errors. This article presents a new method for inferring an individual’s ancestry that takes the uncertainty introduced in next generation sequencing data into account. This is achieved by working directly with genotype likelihoods that contain all relevant information of the unobserved genotypes. Using simulations as well as publicly available sequencing data, we demonstrate that the presented method has great accuracy even for very low-depth data. At the same time, we demonstrate that applying existing methods to genotypes called from the same data can introduce severe biases. The presented method is implemented in the NGSadmix software available at http://www.popgen.dk/software.
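As a rough illustration of what a genotype likelihood is, the sketch below computes log-likelihoods for the ten unordered diploid genotypes at one site from the observed bases and their Phred qualities. It assumes a simple, widely used error model in which each read samples one of the two alleles with probability 1/2 and a sequencing error yields any other base with equal probability. This is not claimed to be the model used by NGSadmix or ANGSD, and all names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def phred_to_error(q: int) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def genotype_log_likelihoods(bases: str, quals: list[int]) -> dict[str, float]:
    """Log P(observed bases | genotype) for the 10 diploid genotypes.

    Assumed model: each read comes from either allele with probability
    0.5, and an erroneous call is uniform over the three other bases.
    """
    result = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        log_lik = 0.0
        for base, qual in zip(bases, quals):
            err = phred_to_error(qual)
            p1 = 1 - err if base == a1 else err / 3
            p2 = 1 - err if base == a2 else err / 3
            log_lik += math.log(0.5 * p1 + 0.5 * p2)
        result[a1 + a2] = log_lik
    return result


# One read: the homozygote is favoured, but the heterozygotes containing A
# are only a factor of about two less likely, so a hard call is unreliable.
single = genotype_log_likelihoods("A", [30])
# Three reads with two different bases: the heterozygote AG is favoured.
triple = genotype_log_likelihoods("AAG", [30, 30, 30])

print(max(single, key=single.get), max(triple, key=triple.get))
```

Keeping the full likelihood vector instead of the single best genotype is what lets downstream methods account for the uncertainty at low depth.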
14.
Next generation sequencing technologies, like ultra-deep pyrosequencing (UDPS), allow detailed investigation of complex populations, like RNA viruses, but their utility is limited by errors introduced during sample preparation and sequencing. By tagging each individual cDNA molecule with barcodes, referred to as Primer IDs, before PCR and sequencing, these errors could theoretically be removed. Here we evaluated the Primer ID methodology on 257,846 UDPS reads generated from an HIV-1 SG3Δenv plasmid clone and plasma samples from three HIV-infected patients. The Primer ID consisted of 11 randomized nucleotides (4,194,304 combinations) in the primer for cDNA synthesis, which introduced a unique sequence tag into each cDNA molecule. Consensus template sequences were constructed for reads with Primer IDs that were observed three or more times. Despite high numbers of input template molecules, the number of consensus template sequences was low. With 10,000 input molecules for the clone, as few as 97 consensus template sequences were obtained due to a highly skewed frequency of resampling. Furthermore, the number of sequenced templates was overestimated due to PCR errors in the Primer IDs. Finally, some consensus template sequences were erroneous due to hotspots for UDPS errors. The Primer ID methodology has the potential to provide highly accurate deep sequencing. However, it is important to be aware that there are remaining challenges with the methodology. In particular, it is important to find ways to obtain a more even frequency of resampling of template molecules, as well as to identify and remove artefactual consensus template sequences that have been generated by PCR errors in the Primer IDs.
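The two quantitative ideas in this abstract, the size of the Primer ID space and consensus building from resampled tags, can be sketched as follows. The grouping and per-position majority vote are an assumed, simplified procedure, not the authors' pipeline. The reads are taken to be already aligned and of equal length, and the function and variable names are hypothetical.

```python
from collections import Counter, defaultdict

PRIMER_ID_LENGTH = 11
# Four possible nucleotides at each of 11 positions: 4**11 = 4,194,304.
N_PRIMER_IDS = 4 ** PRIMER_ID_LENGTH


def consensus_templates(
    tagged_reads: list[tuple[str, str]],
    min_reads: int = 3,  # threshold stated in the abstract
) -> dict[str, str]:
    """Build one consensus sequence per Primer ID seen >= min_reads times.

    `tagged_reads` holds (primer_id, sequence) pairs. The consensus is a
    per-position majority vote; ties go to the base seen first, and zip()
    truncates to the shortest read in a group.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for primer_id, sequence in tagged_reads:
        groups[primer_id].append(sequence)

    consensus = {}
    for primer_id, sequences in groups.items():
        if len(sequences) < min_reads:
            continue  # too few resamplings to correct errors
        consensus[primer_id] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*sequences)
        )
    return consensus


# Made-up reads: the first tag is seen three times (one read has an error),
# the second only twice and is therefore dropped.
reads = [
    ("AAAAAAAAAAA", "ACGT"),
    ("AAAAAAAAAAA", "ACGT"),
    ("AAAAAAAAAAA", "ACTT"),
    ("CCCCCCCCCCC", "GGGG"),
    ("CCCCCCCCCCC", "GGGG"),
]

print(N_PRIMER_IDS, consensus_templates(reads))
```

The sketch also hints at the pitfall the abstract reports: a PCR error inside the tag itself would create a new group and inflate the apparent number of templates.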
15.
Retinal dystrophies (RD) constitute a group of blinding diseases that are characterized by clinical variability and pronounced genetic heterogeneity. The different nonsyndromic and syndromic forms of RD can be attributed to mutations in more than 200 genes. Consequently, next generation sequencing (NGS) technologies are among the most promising approaches to identify mutations in RD. We screened a large cohort of patients comprising 89 independent cases and families with various subforms of RD applying different NGS platforms. While mutation screening in 50 cases was performed using a RD gene capture panel, 47 cases were analyzed using whole exome sequencing. One family was analyzed using whole genome sequencing. A detection rate of 61% was achieved including mutations in 34 known and two novel RD genes. A total of 69 distinct mutations were identified, including 39 novel mutations. Notably, genetic findings in several families were not consistent with the initial clinical diagnosis. Clinical reassessment resulted in refinement of the clinical diagnosis in some of these families and confirmed the broad clinical spectrum associated with mutations in RD genes.
17.
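Prevention and treatment of genetic diseases is a major issue in public health, and identifying the cause is a key step in that effort. High-throughput sequencing (also called second generation sequencing) offers high throughput, low cost and high accuracy, provides direct evidence for genetic diagnosis and counseling, and has become an indispensable tool for genetic testing. Third generation sequencing has also earned a place in clinical applications thanks to its unique advantage of long read lengths. Second and third generation sequencing each have their own characteristics and complement each other, and a variety of sequencing strategies are available in the clinic for different testing needs. On this basis, this article reviews the principles and classification of second and third generation sequencing technologies and the progress of their application in genetic diagnosis, with the aim of providing ideas and guidance for choosing clinical sequencing strategies.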
18.
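Many human diseases are associated with gene mutations, which play a crucial role in the diagnosis and treatment of disease. Second generation high-throughput sequencing, characterized by high throughput, high speed and low cost, has revolutionized the detection of gene mutations. The workflow for detecting mutations with this technology is simple: using whole genome resequencing, targeted genome sequencing and transcriptome sequencing, researchers can achieve comprehensive and highly accurate detection of gene mutations.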
19.
Histopathological samples are a treasure trove of DNA for clinical research. However, the quality of DNA can vary depending on the source or extraction method applied. Thus a standardized and cost-effective workflow for the qualification of DNA preparations is essential to guarantee results that are reproducible between laboratories. The qualification process consists of the quantification of double-stranded DNA (dsDNA) and the assessment of its suitability for downstream applications, such as high-throughput next-generation sequencing. We tested the two most frequently used instruments to define their role in this process: NanoDrop, based on UV spectroscopy, and Qubit 2.0, which uses fluorochromes that bind specifically to dsDNA. Quantitative PCR (qPCR) was used as the reference technique as it simultaneously assesses DNA concentration and suitability for PCR amplification. We used 17 genomic DNAs from 6 fresh-frozen (FF) tissues, 6 formalin-fixed paraffin-embedded (FFPE) tissues, 3 cell lines, and 2 commercial preparations. Intra- and inter-operator variability was negligible, and intra-methodology variability was minimal, while consistent inter-methodology divergences were observed. In fact, NanoDrop measured DNA concentrations higher than Qubit and its consistency with dsDNA quantification by qPCR was limited to high molecular weight DNA from FF samples and cell lines, where total DNA and dsDNA quantity virtually coincide. In partially degraded DNA from FFPE samples, only Qubit proved highly reproducible and consistent with qPCR measurements. Multiplex PCR amplifying 191 regions of 46 cancer-related genes was designated the downstream application, using 40 ng dsDNA from FFPE samples calculated by Qubit. All but one sample produced amplicon libraries suitable for next-generation sequencing. The NanoDrop UV spectrum confirmed contamination of the unsuccessful sample.
In conclusion, as qPCR has high costs and is labor intensive, an alternative effective standard workflow for qualification of DNA preparations should include the sequential combination of NanoDrop and Qubit to assess the purity and quantity of dsDNA, respectively.
20.
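Transcriptome research has long been an important direction in the life sciences. Before the advent of second generation sequencing, a number of effective methods for transcriptome research had already been developed, but these methods had certain limitations. The emergence of second generation sequencing not only quickly brought transcriptome research into a period of rapid development, but also provided an entirely new technology platform for mining genetic resources. This article briefly introduces the chemical principles and characteristics of second generation sequencing technologies, and focuses on research that uses them for transcriptome sequencing and, on that basis, mines genetic resources.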