Similar articles
20 similar articles found
1.
2.

Background

High-throughput DNA sequencing technologies are generating vast amounts of data. Fast, flexible and memory-efficient implementations are needed to facilitate analyses of thousands of samples simultaneously.

Results

We present a multithreaded program suite called ANGSD. This program can calculate various summary statistics and perform association mapping and population genetic analyses, using the full information in next-generation sequencing data by working directly on the raw sequencing data or on genotype likelihoods.

Conclusions

The open-source C/C++ program ANGSD is available at http://www.popgen.dk/angsd. The program is tested and validated on GNU/Linux systems. It accepts multiple input formats, including BAM and imputed beagle genotype probability files. The program allows the user to choose between combinations of existing methods and can perform analyses that are not implemented elsewhere.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0356-4) contains supplementary material, which is available to authorized users.
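The abstract above refers to working with genotype likelihoods rather than called genotypes. As a rough illustration of what such a likelihood is, the following Python sketch computes diploid genotype log-likelihoods at one site from the observed bases and their Phred quality scores. It assumes independent reads and a simple symmetric error model in which a miscalled base is equally likely to be any of the other three; the function names are hypothetical and this is not claimed to be the model ANGSD uses.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred quality score to a base-calling error probability."""
    return 10 ** (-q / 10)


def base_prob(observed, allele, err):
    """P(observed base | true allele) under a symmetric error model (assumption)."""
    return 1 - err if observed == allele else err / 3


def genotype_log_likelihoods(bases, quals):
    """Return {genotype: log P(reads | genotype)} for the 10 diploid genotypes.

    Each read is assumed to sample one of the two alleles with probability 1/2.
    """
    result = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        ll = 0.0
        for b, q in zip(bases, quals):
            e = phred_to_error(q)
            ll += math.log(0.5 * base_prob(b, a1, e) + 0.5 * base_prob(b, a2, e))
        result[a1 + a2] = ll
    return result


if __name__ == "__main__":
    # Four reads at one site: three A and one G, all with base quality 30.
    gl = genotype_log_likelihoods("AAAG", [30, 30, 30, 30])
    for genotype, ll in sorted(gl.items(), key=lambda kv: -kv[1]):
        print(genotype, round(ll, 3))
```

With only four reads the heterozygous genotype AG has the highest likelihood here, but downstream methods can use the whole vector of likelihoods and so carry the uncertainty forward instead of committing to a single call.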

3.
Cell, 2012, 148(6): 1073-1075

4.
We consider the design and evaluation of short barcodes, with a length between six and eight nucleotides, used for parallel sequencing on platforms where substitution errors dominate. Such codes should not only have good error correction properties; their code words should also fulfil certain biological constraints (experimental parameters). We compare published barcodes with codes obtained by two new construction methods, one based on the currently best known linear codes and the other a simple randomized construction method. The evaluation considers the error correction capabilities, barcode size and experimental parameters of the codes, as well as fundamental bounds on code size and distance properties. We provide a list of codes for lengths between six and eight nucleotides, where for length eight, two substitution errors can be corrected. In fact, no code with larger minimum distance can exist.
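The error correction capability discussed above follows from the minimum Hamming distance of the barcode set: a code with minimum distance d can correct floor((d - 1) / 2) substitution errors, so correcting two substitutions requires d of at least 5. The Python sketch below checks this for a small set of barcodes. The four length-6 barcodes are made up for illustration, ignore the biological constraints mentioned in the abstract, and are not taken from the paper.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(codes):
    """Minimum pairwise Hamming distance of a barcode set."""
    return min(hamming(a, b) for a, b in combinations(codes, 2))


def correctable_errors(d):
    """Substitution errors guaranteed correctable at minimum distance d."""
    return (d - 1) // 2


def decode(read, codes, t):
    """Return the nearest barcode if it lies within t substitutions, else None."""
    best = min(codes, key=lambda c: hamming(read, c))
    return best if hamming(read, best) <= t else None


if __name__ == "__main__":
    # Hypothetical toy barcodes (illustration only).
    barcodes = ["AAAAAA", "AACCCC", "CCAACC", "CCCCAA"]
    d = min_distance(barcodes)
    t = correctable_errors(d)
    print("minimum distance:", d, "correctable substitutions:", t)
    print(decode("AAAAAC", barcodes, t))  # one substitution away from AAAAAA
    print(decode("AACCAA", barcodes, t))  # too far from every barcode
```

Rejecting reads that fall outside the correction radius, rather than assigning them to the nearest barcode regardless, is one design choice among several; it trades some read loss for fewer sample misassignments.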

5.
Noncoding DNA sequences (NCS) have attracted much attention recently due to their functional potential. Here we attempted to reveal the functional roles of noncoding sequences from the point of view of natural selection, which typically indicates the functional potential of genomic elements. We analyzed nearly 37 million single nucleotide polymorphisms (SNPs) from the Phase I data of the 1000 Genomes Project. We estimated a series of key parameters of population genetics and molecular evolution to characterize sequence variation in the noncoding genome within and between populations, and identified footprints of natural selection in NCS in worldwide human populations. Our results showed that purifying selection is prevalent and that variation in NCS is substantially constrained, while positive selection is more likely to be specific to particular genomic regions and regional populations. Intriguingly, we observed a larger fraction of non-conserved NCS variants with lower derived allele frequency in the genome, indicating possible functional gain of non-conserved NCS. Notably, NCS elements are enriched for potentially functional markers such as eQTLs, TF motifs, and DNase I footprints in the genome. More interestingly, some NCS variants associated with diseases such as Alzheimer's disease, Type 1 diabetes, and immune-related bowel disorder (IBD) showed signatures of positive selection, although the majority of NCS variants reported as risk alleles by genome-wide association studies showed signatures of negative selection. Our analyses provide compelling evidence of natural selection acting on noncoding sequences in the human genome and advance our understanding of their functional potential, which plays important roles in disease etiology and human evolution.

6.
7.
Next generation sequencing (NGS) allows whole exome or whole genome sequencing for a given patient to be performed in a timely manner and at reasonable cost. This diagnostic quantum leap not only has various legal, ethical and economic aspects but will naturally also affect patient care. Currently, however, the widespread introduction of NGS into routine diagnostics faces many obstacles. In particular, it is to be expected that NGS will identify a large number of rare variants in a given patient that are of (as yet) unknown clinical significance. As a first step towards solving this problem, we introduce the concept of a database that will systematically integrate genotypic and phenotypic information from the German health care context. Not only will this resource be of great scientific value, but the database will also provide human geneticists with the evidence base necessary for the reliable evaluation of their patient-related sequencing data.

8.
The preferred source of DNA in human genetics research is blood, or cell lines derived from blood, as these sources yield large quantities of high quality DNA. However, DNA extraction from saliva can also yield high quality DNA, with little to no degradation or fragmentation, that is suitable for a variety of DNA assays; saliva can be collected without the expense of a phlebotomist and can even be acquired through the mail. At present, no saliva DNA collection/extraction protocols for next generation sequencing have been presented in the literature. This protocol optimizes the parameters of saliva collection/storage and DNA extraction so that the resulting DNA is of sufficient quality and quantity for assays with the highest standards, including microarray genotyping and next generation sequencing.

9.
ABSTRACT: BACKGROUND: With the advent of next-generation sequencing there is an increased demand for tools to pre-process and handle the vast amounts of data generated. One recurring problem is adapter contamination in the reads, i.e. the partial or complete sequencing of adapter sequences. These adapter sequences have to be removed as they can hinder correct mapping of the reads and influence SNP calling and other downstream analyses. FINDINGS: We present a tool called AdapterRemoval which is able to pre-process both single and paired-end data. The program locates and removes adapter residues from the reads, it is able to combine paired reads if they overlap, and it can optionally trim low-quality nucleotides. Furthermore, it can look for adapter sequence in both the 5' and 3' ends of the reads. This is a flexible method that can be tuned to accommodate different experimental settings and sequencing platforms producing FASTQ files. AdapterRemoval is shown to be good at trimming adapters from both single-end and paired-end data. CONCLUSIONS: AdapterRemoval is a comprehensive tool for analyzing next-generation sequencing data. It exhibits good performance both in terms of sensitivity and specificity. AdapterRemoval has already been used in various large projects and it is possible to extend it further to accommodate application-specific biases in the data.
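To make the adapter contamination problem described above concrete, here is a minimal Python sketch of 3' adapter trimming for a single read. It slides the adapter along the read, allows the adapter to run off the end of the read (partial adapter), and cuts at the first position where the overlap matches within a mismatch tolerance. The function name, the mismatch rate and the minimum overlap are assumptions chosen for illustration; this naive scan is not AdapterRemoval's actual algorithm, which also handles paired reads and base qualities.

```python
def trim_adapter(read, adapter, max_mismatch_rate=0.1, min_overlap=3):
    """Remove a (possibly partial) 3' adapter from a read.

    Scans start positions from left to right and returns the read truncated
    at the first position where the adapter aligns with at most
    max_mismatch_rate * overlap mismatches. Overlaps shorter than
    min_overlap are ignored to limit chance matches at the read end.
    """
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break
        window = read[start:start + overlap]
        mismatches = sum(r != a for r, a in zip(window, adapter))
        if mismatches <= max_mismatch_rate * overlap:
            return read[:start]
    return read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # example adapter prefix used for illustration
    insert = "ACGTACGTTTGCA"
    contaminated = insert + adapter[:10]  # read ends inside the adapter
    print(trim_adapter(contaminated, adapter))
    print(trim_adapter(insert, adapter))
```

The minimum overlap matters in practice: without it, a read that merely ends in the first base of the adapter would lose that base, so the threshold trades a little residual adapter for fewer false trims.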

10.
Taking Maxam-Gilbert and Sanger sequencing as the first generation, recent years have seen an explosion of newly developed sequencing strategies, usually referred to as next generation sequencing (NGS) techniques. NGS techniques are high-throughput and produce thousands or even millions of sequences at the same time. These sequences allow for the accurate identification of microbial taxa, including uncultivable organisms and those present in small numbers. In specific applications, NGS provides a complete inventory of all microbial operons and genes present or being expressed under different study conditions. NGS techniques are revolutionizing the field of microbial ecology and have recently been used to examine several food ecosystems. After a short introduction to the most common NGS systems and platforms, this review addresses how NGS techniques have been employed in the study of food microbiota and food fermentations, and discusses their limits and perspectives. The most important findings are reviewed, including those made in the study of the microbiota of milk, fermented dairy products, and plant-, meat- and fish-derived fermented foods. The knowledge that can be gained on microbial diversity, population structure and population dynamics via the use of these technologies could be vital in improving the monitoring and manipulation of foods and fermented food products. They should also improve their safety.

11.
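Cloning and nucleotide sequence analysis of human lysozyme cDNA amplified by RT-PCR   Cited by: 1 (self-citations: 0, citations by others: 1)
In this study, total RNA from human placenta was used as the template for reverse transcription-polymerase chain reaction (RT-PCR) to prepare a cDNA fragment of human lysozyme. In a ligation system containing the restriction endonuclease Sma I, this cDNA was cloned into the Sma I site of the vector pUC12. The complete nucleotide sequence was determined by the chain-termination method using double-stranded DNA of the recombinant plasmid. The cDNA is 444 bp long and encodes a lysozyme precursor protein consisting of an 18-amino-acid signal peptide and a 130-amino-acid mature protein. Sequencing also confirmed that two stop codons and a recognition site for the restriction endonuclease Sal I had been successfully introduced at the 3′ end of the cDNA. The amino acid sequence deduced from the Chinese human lysozyme cDNA differs from previous reports, with five amino acid changes. Studies on expression of the protein are in progress.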

12.
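Using high-throughput sequencing to analyze changes in microbial populations in complex samples has become one of the hot topics in microbiology research. Sample preparation, such as DNA extraction and amplification of the 16S variable regions, has a crucial influence on the data analysis after sequencing and on the recovered composition of the original microbial community. A domestic kit (TIANGEN soil microbial genomic DNA extraction kit) and an imported kit (MOBIO soil microbial genomic DNA extraction kit) were each used to extract DNA from soil samples and from sheep rumen digesta samples. Starting from 25 ng of total DNA, the 16S V3 variable region was then amplified by PCR and libraries were constructed. Finally, data analysis was used to compare the effect of DNA extracted with the different kits on measured microbial diversity, including the number of OTUs, rarefaction curves, microbial abundance and species composition. The study found that, with the same amount of DNA template and the same PCR conditions, DNA extracted with the imported kit yielded more microbial species.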

13.
Inference of population structure and individual ancestry is important both for population genetics and for association studies. With next generation sequencing technologies it is possible to obtain genetic data for all accessible genetic variation in the genome. Existing methods for admixture analysis rely on known genotypes. However, individual genotypes cannot be inferred from low-depth sequencing data without introducing errors. This article presents a new method for inferring an individual's ancestry that takes the uncertainty introduced in next generation sequencing data into account. This is achieved by working directly with genotype likelihoods, which contain all relevant information about the unobserved genotypes. Using simulations as well as publicly available sequencing data, we demonstrate that the presented method is highly accurate even for very low-depth data. At the same time, we demonstrate that applying existing methods to genotypes called from the same data can introduce severe biases. The presented method is implemented in the NGSadmix software available at http://www.popgen.dk/software.

14.
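The prevention and treatment of genetic diseases is a major issue in public health, and identifying the cause is a key step in their prevention and treatment. High-throughput sequencing (also known as second-generation sequencing), with its advantages of high throughput, low cost and high accuracy, provides direct evidence for genetic diagnosis and counseling and has become an indispensable tool for genetic testing; third-generation sequencing has also secured a place in clinical applications thanks to its unique advantage of long read lengths. Second- and third-generation sequencing technologies each have their own characteristics and complement each other, and multiple types of sequencing strategies are available in the clinic for different testing needs. On this basis, this review summarizes the principles and classification of second- and third-generation sequencing technologies and the progress in their application to genetic diagnosis, with the aim of providing ideas and guidance for choosing clinical sequencing strategies.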

15.
Histopathological samples are a treasure-trove of DNA for clinical research. However, the quality of DNA can vary depending on the source or extraction method applied. Thus a standardized and cost-effective workflow for the qualification of DNA preparations is essential to guarantee interlaboratory reproducible results. The qualification process consists of the quantification of double-stranded DNA (dsDNA) and the assessment of its suitability for downstream applications, such as high-throughput next-generation sequencing. We tested the two most frequently used instruments to define their role in this process: NanoDrop, based on UV spectroscopy, and Qubit 2.0, which uses fluorochromes that specifically bind dsDNA. Quantitative PCR (qPCR) was used as the reference technique as it simultaneously assesses DNA concentration and suitability for PCR amplification. We used 17 genomic DNAs from 6 fresh-frozen (FF) tissues, 6 formalin-fixed paraffin-embedded (FFPE) tissues, 3 cell lines, and 2 commercial preparations. Intra- and inter-operator variability was negligible, and intra-methodology variability was minimal, while consistent inter-methodology divergences were observed. In fact, NanoDrop measured DNA concentrations higher than Qubit, and its consistency with dsDNA quantification by qPCR was limited to high molecular weight DNA from FF samples and cell lines, where total DNA and dsDNA quantity virtually coincide. In partially degraded DNA from FFPE samples, only Qubit proved highly reproducible and consistent with qPCR measurements. Multiplex PCR amplifying 191 regions of 46 cancer-related genes was designated the downstream application, using 40 ng dsDNA from FFPE samples as calculated by Qubit. All but one sample produced amplicon libraries suitable for next-generation sequencing. The NanoDrop UV spectrum verified contamination of the unsuccessful sample. In conclusion, as qPCR has high costs and is labor intensive, an alternative effective standard workflow for qualification of DNA preparations should include the sequential combination of NanoDrop and Qubit to assess the purity and quantity of dsDNA, respectively.

16.

Background

Zooplankton play an important role in our oceans, both in biogeochemical cycling and as a food source for commercially important fish larvae. However, difficulties in correctly identifying zooplankton hinder our understanding of their roles in marine ecosystem functioning, and can prevent detection of long-term changes in their community structure. The advent of massively parallel next generation sequencing technology allows DNA sequence data to be recovered directly from whole community samples. Here we assess the ability of such sequencing to quantify richness and diversity of a mixed zooplankton assemblage from a productive time series site in the Western English Channel.

Methodology/Principal Findings

Plankton net hauls (200 µm) were taken at the Western Channel Observatory station L4 in September 2010 and January 2011. These samples were analysed by microscopy and by metagenetic analysis of the 18S nuclear small subunit ribosomal RNA gene using the 454 pyrosequencing platform. Following quality control, a total of 419,041 sequences were obtained for all samples. The sequences clustered into 205 operational taxonomic units (OTUs) using a 97% similarity cut-off. Allocation of taxonomy by comparison with the National Center for Biotechnology Information database identified 135 OTUs to species level, 11 to genus level and 1 to order; <2.5% of sequences were classified as unknowns. By comparison, a skilled microscopic analyst was able to routinely enumerate only 58 taxonomic groups.

Conclusions

Metagenetics reveals a previously hidden taxonomic richness, especially for Copepoda and hard-to-identify meroplankton such as Bivalvia, Gastropoda and Polychaeta. It also reveals rare species and parasites. We conclude that Next Generation Sequencing of 18S amplicons is a powerful tool for elucidating the true diversity and species richness of zooplankton communities. While this approach allows for broad diversity assessments of plankton, it may become increasingly attractive in future if sequence reference libraries of accurately identified individuals are better populated.
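The clustering of sequences into operational taxonomic units at a 97% similarity cut-off, mentioned in the findings above, can be sketched as a greedy centroid procedure in Python. This sketch assumes pre-aligned sequences of equal length and measures similarity as the fraction of identical positions; real pipelines compute identity from pairwise alignments and usually process sequences in order of abundance. The function names are hypothetical and nothing here reproduces the software used in the study.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Greedy centroid clustering.

    Each sequence joins the first existing cluster whose centroid it matches
    at or above the threshold; otherwise it founds a new cluster and becomes
    that cluster's centroid. Returns a list of clusters (lists of sequences).
    """
    centroids = []
    clusters = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            centroids.append(s)
            clusters.append([s])
    return clusters


if __name__ == "__main__":
    base = "ACGT" * 25               # 100-base toy sequence
    near = "TT" + base[2:]           # 2 differences from base (98% identity)
    far = "T" * 10 + base[10:]       # 8 differences from base (92% identity)
    otus = greedy_otus([base, near, far])
    print(len(otus), [len(c) for c in otus])
```

Because the result of greedy clustering depends on input order, the number of OTUs it reports is an estimate shaped by that choice as well as by the cut-off.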

17.
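Transcriptome research has long been an important direction in the life sciences. Before the advent of second-generation sequencing, several effective methods for transcriptome research had already been developed, but these methods had certain limitations. The emergence of second-generation sequencing not only quickly brought transcriptome research into a period of rapid development, but also provided a completely new technology platform for mining genetic resources. This article briefly introduces the chemical principles and characteristics of second-generation sequencing technologies, and focuses on research that uses them for transcriptome sequencing and, on that basis, for mining genetic resources.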

18.
19.
Demand for the commercial use of genetically modified (GM) crops has been increasing in light of the projected growth of the world population to nine billion by 2050. A prerequisite of paramount importance for regulatory submissions is the rigorous safety assessment of GM crops. One of the components of safety assessment is molecular characterization at the DNA level, which helps to determine the copy number, integrity and stability of a transgene; characterize the integration site within a host genome; and confirm the absence of vector DNA. Historically, molecular characterization has been carried out using Southern blot analysis coupled with Sanger sequencing. While this is a robust approach to characterizing transgenic crops, it is both time- and resource-consuming. The emergence of next-generation sequencing (NGS) technologies has provided a highly sensitive, cost- and labor-effective alternative to traditional Southern blot analysis for molecular characterization. Herein, we demonstrate the successful application of both whole genome sequencing and target capture sequencing approaches for the characterization of single and stacked transgenic events, and compare the results and inferences with those of the traditional method with respect to key criteria required for regulatory submissions.

20.
Background

Epilepsy is a genetically complex neurological disorder that affects millions of people of different age groups and varies in type and severity. Copy number variants (CNVs) are key players in the genetic etiology of numerous neurodevelopmental disorders, and prior findings also indicate that chromosomal aberrations confer increased susceptibility to epilepsy. Novel technologies, such as array comparative genomic hybridization (array-CGH), may help to uncover the pathogenic CNVs in patients with epilepsy.

Results

This study used high-density whole genome array-CGH analysis of blood DNA samples from a cohort of 22 epilepsy patients to search for CNVs associated with epilepsy. Pathogenic rearrangements were detected, including 6p12.1 microduplications in 5 patients covering a total region of 99.9 kb and 7q32.3 microdeletions in 3 patients covering a total region of 63.9 kb. Two genes, BMP5 and PODXL, were located in the predicted duplicated and deleted regions, respectively. Furthermore, these CNV findings were confirmed by qPCR.

Conclusion

We have described, for the first time, several novel CNVs/genes implicated in epilepsy in the Saudi population. These findings enable us to better describe the genetic variations in epilepsy, and could provide a foundation for understanding the critical regions of the genome which might be involved in the development of epilepsy.
