
1.

ANGSD: Analysis of Next Generation Sequencing Data

Thorfinn Sand Korneliussen Anders Albrechtsen Rasmus Nielsen 《BMC bioinformatics》2014,15(1)

Background

High-throughput DNA sequencing technologies are generating vast amounts of data. Fast, flexible and memory efficient implementations are needed in order to facilitate analyses of thousands of samples simultaneously.

Results

We present a multithreaded program suite called ANGSD. This program can calculate various summary statistics, and perform association mapping and population genetic analyses utilizing the full information in next generation sequencing data by working directly on the raw sequencing data or by using genotype likelihoods.

Conclusions

The open source C/C++ program ANGSD is available at http://www.popgen.dk/angsd. The program is tested and validated on GNU/Linux systems. The program supports multiple input formats, including BAM and imputed beagle genotype probability files. The program allows the user to choose between combinations of existing methods and can perform analyses that are not implemented elsewhere.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0356-4) contains supplementary material, which is available to authorized users.

2.

Estimating Individual Admixture Proportions from Next Generation Sequencing Data

Line Skotte Thorfinn Sand Korneliussen Anders Albrechtsen 《Genetics》2013,195(3):693-702

Inference of population structure and individual ancestry is important both for population genetics and for association studies. With next generation sequencing technologies it is possible to obtain genetic data for all accessible genetic variations in the genome. Existing methods for admixture analysis rely on known genotypes. However, individual genotypes cannot be inferred from low-depth sequencing data without introducing errors. This article presents a new method for inferring an individual’s ancestry that takes the uncertainty introduced in next generation sequencing data into account. This is achieved by working directly with genotype likelihoods that contain all relevant information of the unobserved genotypes. Using simulations as well as publicly available sequencing data, we demonstrate that the presented method has great accuracy even for very low-depth data. At the same time, we demonstrate that applying existing methods to genotypes called from the same data can introduce severe biases. The presented method is implemented in the NGSadmix software available at http://www.popgen.dk/software.
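To make the idea of a genotype likelihood concrete, the following is a minimal Python sketch. It assumes a simple, widely used per-read error model (each read samples one of the two alleles with probability 0.5, and a miscalled base is equally likely to be any of the other three bases). This is an illustrative assumption, not necessarily the model used by NGSadmix or ANGSD, and the function name `genotype_log_likelihoods` is hypothetical.

```python
import math
from itertools import combinations_with_replacement


def genotype_log_likelihoods(bases, quals):
    """Log-likelihood of the observed bases for each of the 10 diploid genotypes.

    bases: observed base calls at one site, e.g. "AAG"
    quals: Phred-scaled base qualities, one per base call
    Assumed model (illustrative only): P(b | allele a) = 1 - e if b == a else e / 3,
    where e = 10 ** (-Q / 10), and each read comes from either allele with prob. 0.5.
    """
    result = {}
    for a1, a2 in combinations_with_replacement("ACGT", 2):
        log_lik = 0.0
        for base, qual in zip(bases, quals):
            err = 10 ** (-qual / 10)
            p1 = 1 - err if base == a1 else err / 3
            p2 = 1 - err if base == a2 else err / 3
            log_lik += math.log(0.5 * p1 + 0.5 * p2)
        result[a1 + a2] = log_lik
    return result


# Three reads at one site: the heterozygote AG explains the data best,
# but AA and GG keep non-zero likelihood, which is the uncertainty that
# a hard genotype call would discard.
gl = genotype_log_likelihoods("AAG", [30, 30, 30])
best = max(gl, key=gl.get)
```

With a single read, the homozygous genotype matching that read scores highest even if the individual is truly heterozygous, which illustrates why calling genotypes from very low-depth data introduces errors.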

3.

Pediatrics: Sequencing the Next Generation

《Cell》2012,148(6):1073-1075

4.

Correction: Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data

The PLOS ONE Staff 《PloS one》2014,9(11)

5.

Genome-Wide SNP Calling Using Next Generation Sequencing Data in Tomato

Ji-Eun Kim Sang-Keun Oh Jeong-Hee Lee Bo-Mi Lee Sung-Hwan Jo 《Molecules and cells》2014,37(1):36-42

6.

Quantitative and Sensitive Detection of GNAS Mutations Causing McCune-Albright Syndrome with Next Generation Sequencing

Satoshi Narumi Kumihiro Matsuo Tomohiro Ishii Yusuke Tanahashi Tomonobu Hasegawa 《PloS one》2013,8(3)

Somatic activating GNAS mutations cause McCune-Albright syndrome (MAS). Owing to low mutation abundance, mutant-specific enrichment procedures, such as the peptide nucleic acid (PNA) method, are required to detect mutations in peripheral blood. Next generation sequencing (NGS) can analyze millions of PCR amplicons independently, thus it is expected to detect low-abundance GNAS mutations quantitatively. In the present study, we aimed to develop an NGS-based method to detect low-abundance somatic GNAS mutations. PCR amplicons encompassing exons 8 and 9 of GNAS, in which most activating mutations occur, were sequenced on the MiSeq instrument. As expected, our NGS-based method could sequence the GNAS locus with very high read depth (approximately 100,000) and low error rate. A serial dilution study using cloned mutant and wildtype DNA samples showed a linear correlation between dilution and measured mutation abundance, indicating the reliability of quantification of the mutation. Using the serially diluted samples, the detection limits of three mutation detection methods (the PNA method, NGS, and combinatory use of PNA and NGS [PNA-NGS]) were determined. The lowest detectable mutation abundance was 1% for the PNA method, 0.03% for NGS and 0.01% for PNA-NGS. Finally, we analyzed 16 MAS patient-derived leukocytic DNA samples with the three methods and compared their mutation detection rates. The mutation detection rates of the PNA method, NGS and PNA-NGS in 16 patient-derived peripheral blood samples were 56%, 63% and 75%, respectively. In conclusion, NGS can detect somatic activating GNAS mutations quantitatively and sensitively from peripheral blood samples. At present, the PNA-NGS method is likely the most sensitive method to detect low-abundance GNAS mutations.

7.

AdapterRemoval: Easy Cleaning of Next Generation Sequencing Reads

Lindgreen S 《BMC research notes》2012,5(1):337

Background: With the advent of next-generation sequencing there is an increased demand for tools to pre-process and handle the vast amounts of data generated. One recurring problem is adapter contamination in the reads, i.e. the partial or complete sequencing of adapter sequences. These adapter sequences have to be removed as they can hinder correct mapping of the reads and influence SNP calling and other downstream analyses. Findings: We present a tool called AdapterRemoval which is able to pre-process both single and paired-end data. The program locates and removes adapter residues from the reads, it is able to combine paired reads if they overlap, and it can optionally trim low-quality nucleotides. Furthermore, it can look for adapter sequence in both the 5' and 3' ends of the reads. This is a flexible method that can be tuned to accommodate different experimental settings and sequencing platforms producing FASTQ files. AdapterRemoval is shown to be good at trimming adapters from both single-end and paired-end data. Conclusions: AdapterRemoval is a comprehensive tool for analyzing next-generation sequencing data. It exhibits good performance both in terms of sensitivity and specificity. AdapterRemoval has already been used in various large projects and it is possible to extend it further to accommodate application-specific biases in the data.
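The adapter contamination problem described above can be illustrated with a small Python sketch. It is not AdapterRemoval's algorithm: it uses exact string matching only, handles the 3' end only, and ignores sequencing errors and base qualities. The function name `trim_3prime_adapter`, the `min_overlap` parameter and the example adapter string are assumptions made for illustration.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a complete or partial adapter from the 3' end of a read.

    Exact matching only (an illustrative simplification); real tools
    tolerate mismatches and use alignment scores.
    """
    # Case 1: the complete adapter occurs in the read; cut from its start.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: only the beginning of the adapter was sequenced, so the read
    # ends with a prefix of the adapter. Try the longest prefix first.
    max_len = min(len(adapter) - 1, len(read))
    for k in range(max_len, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter evidence: return the read unchanged.
    return read


adapter = "AGATCGGAAGAGC"  # example adapter sequence, assumed for this sketch
full = trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTT", adapter)
partial = trim_3prime_adapter("ACGTACGTAGATCG", adapter)
```

The `min_overlap` threshold trades sensitivity against specificity: a small value removes more partial adapters but also clips genuine bases that happen to match the first few adapter bases.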

8.

Connectivity Mapping for Candidate Therapeutics Identification Using Next Generation Sequencing RNA-Seq Data

Darragh G. McArt Philip D. Dunne Jaine K. Blayney Manuel Salto-Tellez Sandra Van Schaeybroeck Peter W. Hamilton Shu-Dong Zhang 《PloS one》2013,8(6)

The advent of next generation sequencing technologies (NGS) has expanded the area of genomic research, offering high coverage and increased sensitivity over older microarray platforms. Although the current cost of next generation sequencing is still exceeding that of microarray approaches, the rapid advances in NGS will likely make it the platform of choice for future research in differential gene expression. Connectivity mapping is a procedure for examining the connections among diseases, genes and drugs by differential gene expression initially based on microarray technology, with which a large collection of compound-induced reference gene expression profiles have been accumulated. In this work, we aim to test the feasibility of incorporating NGS RNA-Seq data into the current connectivity mapping framework by utilizing the microarray based reference profiles and the construction of a differentially expressed gene signature from a NGS dataset. This would allow for the establishment of connections between the NGS gene signature and those microarray reference profiles, alleviating the associated incurring cost of re-creating drug profiles with NGS technology. We examined the connectivity mapping approach on a publicly available NGS dataset with androgen stimulation of LNCaP cells in order to extract candidate compounds that could inhibit the proliferative phenotype of LNCaP cells and to elucidate their potential in a laboratory setting. In addition, we also analyzed an independent microarray dataset of similar experimental settings. We found a high level of concordance between the top compounds identified using the gene signatures from the two datasets. The nicotine derivative cotinine was returned as the top candidate among the overlapping compounds with potential to suppress this proliferative phenotype. Subsequent lab experiments validated this connectivity mapping hit, showing that cotinine inhibits cell proliferation in an androgen dependent manner. 
Thus the results in this study suggest a promising prospect of integrating NGS data with connectivity mapping.

9.

NGS-QCbox and Raspberry for Parallel, Automated and Rapid Quality Control Analysis of Large-Scale Next Generation Sequencing (Illumina) Data

Mohan A. V. S. K. Katta Aamir W. Khan Dadakhalandar Doddamani Mahendar Thudi Rajeev K. Varshney 《PloS one》2015,10(10)

10.

A Pipeline for the Development of Microsatellite Markers using Next Generation Sequencing Data

Adriana Maria Antunes Júlio Gabriel Nunes Stival Cíntia Pelegrineti Targueta Mariana Pires de Campos Telles Thannya Nascimentos Soares 《Current Genomics》2022,23(3):175

Background: Also known as Simple Sequence Repetitions (SSRs), microsatellites are profoundly informative molecular markers and powerful tools in genetics and ecology studies on plants. Objective: This research presents a workflow for developing microsatellite markers using genome skimming. Methods: The pipeline was proposed in several stages that must be performed sequentially: obtaining DNA sequences, identifying microsatellite regions, designing primers, and selecting candidate microsatellite regions to develop the markers. Our pipeline efficiency was analyzed using Illumina sequencing data from the non-model tree species Pterodon emarginatus Vog. Results: The pipeline revealed 4,382 microsatellite regions and drew 7,411 pairs of primers for P. emarginatus. However, a much larger number of microsatellite regions with the potential to develop markers were discovered from our pipeline. We selected 50 microsatellite regions with high potential for developing markers and organized 29 microsatellite regions in sets for multiplex PCR. Conclusion: The proposed pipeline is a powerful tool for fast and efficient development of microsatellite markers on a large scale in several species, especially non-model plant species.
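The "identifying microsatellite regions" stage can be sketched with a regular expression. This is a hedged illustration in Python, not the pipeline described in the article: the function name `find_ssrs` and the default thresholds (motifs of 2 to 6 bases, at least 4 tandem copies) are assumptions chosen for the example.

```python
import re


def find_ssrs(seq: str, min_unit: int = 2, max_unit: int = 6, min_repeats: int = 4):
    """Return (start, motif, copies) for perfect tandem repeats in a DNA sequence.

    Thresholds are illustrative assumptions. Only perfect repeats are
    reported; interrupted or compound microsatellites are not handled.
    """
    # ([ACGT]{2,6}?) captures the shortest candidate motif lazily, and
    # \1{n,} requires at least n further consecutive copies of that motif.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# An (AC)5 repeat starting at 0-based position 3.
hits = find_ssrs("TTGACACACACACGGT")
```

In a full workflow, the coordinates returned here would be padded with flanking sequence and passed to a primer design step; that step is outside the scope of this sketch.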

11.

Next Generation Sequencing of Acute Myeloid Leukemia: Influencing Prognosis

Ilyas Asad Muhammad Ahmad Sultan Faheem Muhammad Naseer Muhammad Imran Kumosani Taha A Al-Qahtani Muhammad Hussain Gari Mamdooh Ahmed Farid 《BMC genomics》2015,16(1):1-7

Background

Epilepsy is a genetically complex neurological disorder affecting millions of people of different age groups, varying in its type and severity. Copy number variants (CNVs) are key players in the genetic etiology of numerous neurodevelopmental disorders and prior findings also revealed that chromosomal aberrations are more susceptible against the pathogenesis of epilepsy. Novel technologies, such as array comparative genomic hybridization (array-CGH), may help to uncover the pathogenic CNVs in patients with epilepsy.

Results

This study was carried out by high density whole genome array-CGH analysis with blood DNA samples from a cohort of 22 epilepsy patients to search for CNVs associated with epilepsy. Pathogenic rearrangements which include 6p12.1 microduplications in 5 patients covering a total region of 99.9kb and 7q32.3 microdeletions in 3 patients covering a total region of 63.9kb were detected. Two genes BMP5 and PODXL were located in the predicted duplicated and deleted regions respectively. Furthermore, these CNV findings were confirmed by qPCR.

Conclusion

We have described, for the first time, several novel CNVs/genes implicated in epilepsy in the Saudi population. These findings enable us to better describe the genetic variations in epilepsy, and could provide a foundation for understanding the critical regions of the genome which might be involved in the development of epilepsy.

12.

BG7: A New Approach for Bacterial Genome Annotation Designed for Next Generation Sequencing Data

Pablo Pareja-Tobes Marina Manrique Eduardo Pareja-Tobes Eduardo Pareja Raquel Tobes 《PloS one》2012,7(11)

BG7 is a new system for de novo bacterial, archaeal and viral genome annotation based on a new approach specifically designed for annotating genomes sequenced with next generation sequencing technologies. The system is versatile and able to annotate genes even in the step of preliminary assembly of the genome. It is especially efficient at detecting unexpected genes horizontally acquired from distant bacterial or archaeal genomes, phages, plasmids, and mobile elements. From the initial phases of the gene annotation process, BG7 exploits the massive availability of annotated protein sequences in databases. BG7 predicts ORFs and infers their function based on protein similarity with a wide set of reference proteins, integrating ORF prediction and functional annotation phases in just one step. BG7 is especially tolerant to sequencing errors in start and stop codons, to frameshifts, and to assembly or scaffolding errors. The system is also tolerant to the high level of gene fragmentation which is frequently found in not fully assembled genomes. The current version of BG7, which is developed in Java, takes advantage of Amazon Web Services (AWS) cloud computing features, but it can also be run locally in any operating system. BG7 is a fast, automated and scalable system that can cope with the challenge of analyzing the huge number of genomes that are being sequenced with NGS technologies. Its capabilities and efficiency were demonstrated in the 2011 EHEC Germany outbreak in which BG7 was used to get the first annotations the day after the first entero-hemorrhagic E. coli genome sequences were made publicly available. The suitability of BG7 for genome annotation has been proved for Illumina, 454, Ion Torrent, and PacBio sequencing technologies. Besides, thanks to its plasticity, our system could be very easily adapted to work with new technologies in the future.

13.

An Integrated SNP Mining and Utilization (ISMU) Pipeline for Next Generation Sequencing Data

Sarwar Azam Abhishek Rathore Trushar M. Shah Mohan Telluri BhanuPrakash Amindala Pradeep Ruperao Mohan A. V. S. K. Katta Rajeev K. Varshney 《PloS one》2014,9(7)

14.

Identification of Optimum Sequencing Depth Especially for De Novo Genome Assembly of Small Genomes Using Next Generation Sequencing Data

Aarti Desai Veer Singh Marwah Akshay Yadav Vineet Jha Kishor Dhaygude Ujwala Bangar Vivek Kulkarni Abhay Jere 《PloS one》2013,8(4)

Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 MB), S. kudriavzevii (11.18 MB) and C. elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6–40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.
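Read depth figures such as 50X translate into read counts through the standard relation depth = (number of reads × read length) / genome size. The Python sketch below applies it to the E. coli genome size quoted above. The 100 bp read length is an assumption for illustration and is not taken from the abstract; the function names are hypothetical.

```python
def mean_depth(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Expected average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(genome_size_bp: int, read_length_bp: int, target_depth: int) -> int:
    """Smallest number of reads whose total bases reach the target depth."""
    total_bases = target_depth * genome_size_bp
    # Integer ceiling division avoids floating-point rounding.
    return (total_bases + read_length_bp - 1) // read_length_bp


# E. coli genome of about 4.6 million bases, assumed 100 bp reads, 50X target.
ecoli_reads = reads_needed(4_600_000, 100, 50)
check = mean_depth(ecoli_reads, 100, 4_600_000)
```

For paired-end data each fragment contributes two reads, so the number of read pairs required is half the figure returned by `reads_needed`.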

15.

Collection and Extraction of Saliva DNA for Next Generation Sequencing

Michael R. Goode Soo Yeon Cheong Ning Li William C. Ray Christopher W. Bartlett 《Journal of visualized experiments : JoVE》2014,(90)

The preferred source of DNA in human genetics research is blood, or cell lines derived from blood, as these sources yield large quantities of high quality DNA. However, DNA extraction from saliva can yield high quality DNA with little to no degradation/fragmentation that is suitable for a variety of DNA assays without the expense of a phlebotomist and can even be acquired through the mail. However, at present, no saliva DNA collection/extraction protocols for next generation sequencing have been presented in the literature. This protocol optimizes parameters of saliva collection/storage and DNA extraction to be of sufficient quality and quantity for DNA assays with the highest standards, including microarray genotyping and next generation sequencing.

16.

Identification of Hepatotropic Viruses from Plasma Using Deep Sequencing: A Next Generation Diagnostic Tool

John Law Juan Jovel Jordan Patterson Glenn Ford Sandra O’keefe Weiwei Wang Bo Meng Deyong Song Yong Zhang Zhijian Tian Shawn T. Wasilenko Mandana Rahbari Troy Mitchell Tracy Jordan Eric Carpenter Andrew L. Mason Gane Ka-Shu Wong 《PloS one》2013,8(4)

We conducted an unbiased metagenomics survey using plasma from patients with chronic hepatitis B, chronic hepatitis C, autoimmune hepatitis (AIH), non-alcoholic steatohepatitis (NASH), and patients without liver disease (control). RNA and DNA libraries were sequenced from plasma filtrates enriched in viral particles to catalog virus populations. Hepatitis viruses were readily detected at high coverage in patients with chronic viral hepatitis B and C, but only a limited number of sequences resembling other viruses were found. The exception was a library from a patient diagnosed with hepatitis C virus (HCV) infection that contained multiple sequences matching GB virus C (GBV-C). Abundant GBV-C reads were also found in plasma from patients with AIH, whereas Torque teno virus (TTV) was found at high frequency in samples from patients with AIH and NASH. After taxonomic classification of sequences by BLASTn, a substantial fraction in each library, ranging from 35% to 76%, remained unclassified. These unknown sequences were assembled into scaffolds along with virus, phage and endogenous retrovirus sequences and then analyzed by BLASTx against the non-redundant protein database. Nearly the full genome of a heretofore-unknown circovirus was assembled and many scaffolds that encoded proteins with similarity to plant, insect and mammalian viruses. The presence of this novel circovirus was confirmed by PCR. BLASTx also identified many polypeptides resembling nucleo-cytoplasmic large DNA viruses (NCLDV) proteins. We re-evaluated these alignments with a profile hidden Markov method, HHblits, and observed inconsistencies in the target proteins reported by the different algorithms. This suggests that sequence alignments are insufficient to identify NCLDV proteins, especially when these alignments are only to small portions of the target protein. 
Nevertheless, we have now established a reliable protocol for the identification of viruses in plasma that can also be adapted to other patient samples such as urine, bile, saliva and other body fluids.

17.

Impact of Next Generation Sequencing Techniques in Food Microbiology

Baltasar Mayo Caio T. C. C Rachid Ángel Alegría Analy M. O Leite Raquel S Peixoto Susana Delgado 《Current Genomics》2014,15(4):293-309

Understanding the Maxam-Gilbert and Sanger sequencing as the first generation, in recent years there has been an explosion of newly-developed sequencing strategies, which are usually referred to as next generation sequencing (NGS) techniques. NGS techniques have high-throughputs and produce thousands or even millions of sequences at the same time. These sequences allow for the accurate identification of microbial taxa, including uncultivable organisms and those present in small numbers. In specific applications, NGS provides a complete inventory of all microbial operons and genes present or being expressed under different study conditions. NGS techniques are revolutionizing the field of microbial ecology and have recently been used to examine several food ecosystems. After a short introduction to the most common NGS systems and platforms, this review addresses how NGS techniques have been employed in the study of food microbiota and food fermentations, and discusses their limits and perspectives. The most important findings are reviewed, including those made in the study of the microbiota of milk, fermented dairy products, and plant-, meat- and fish-derived fermented foods. The knowledge that can be gained on microbial diversity, population structure and population dynamics via the use of these technologies could be vital in improving the monitoring and manipulation of foods and fermented food products. They should also improve their safety.

18.

Host Subtraction, Filtering and Assembly Validations for Novel Viral Discovery Using Next Generation Sequencing Data

Gordon M. Daly Richard M. Leggett William Rowe Samuel Stubbs Maxim Wilkinson Ricardo H. Ramirez-Gonzalez Mario Caccamo William Bernal Jonathan L. Heeney 《PloS one》2015,10(6)

The use of next generation sequencing (NGS) to identify novel viral sequences from eukaryotic tissue samples is challenging. Issues can include the low proportion and copy number of viral reads and the high number of contigs (post-assembly), making subsequent viral analysis difficult. Comparison of assembly algorithms with pre-assembly host-mapping subtraction using a short-read mapping tool, a k-mer frequency based filter and a low complexity filter, has been validated for viral discovery with Illumina data derived from naturally infected liver tissue and simulated data. Assembled contig numbers were significantly reduced (up to 99.97%) by the application of these pre-assembly filtering methods. This approach provides a validated method for maximizing viral contig size as well as reducing the total number of assembled contigs that require down-stream analysis as putative viral nucleic acids.

19.

Natural Selection and Functional Potentials of Human Noncoding Elements Revealed by Analysis of Next Generation Sequencing Data

Pankaj Jha Dongsheng Lu Shuhua Xu 《PloS one》2015,10(6)

Noncoding DNA sequences (NCS) have attracted much attention recently due to their functional potentials. Here we attempted to reveal the functional roles of noncoding sequences from the point of view of natural selection that typically indicates the functional potentials of certain genomic elements. We analyzed nearly 37 million single nucleotide polymorphisms (SNPs) of Phase I data of the 1000 Genomes Project. We estimated a series of key parameters of population genetics and molecular evolution to characterize sequence variations of the noncoding genome within and between populations, and identified the natural selection footprints in NCS in worldwide human populations. Our results showed that purifying selection is prevalent and there is substantial constraint of variations in NCS, while positive selection is more likely to be specific to some particular genomic regions and regional populations. Intriguingly, we observed a larger fraction of non-conserved NCS variants with lower derived allele frequency in the genome, indicating possible functional gain of non-conserved NCS. Notably, NCS elements are enriched for potentially functional markers such as eQTLs, TF motifs, and DNase I footprints in the genome. More interestingly, some NCS variants associated with diseases such as Alzheimer's disease, Type 1 diabetes, and immune-related bowel disorder (IBD) showed signatures of positive selection, although the majority of NCS variants, reported as risk alleles by genome-wide association studies, showed signatures of negative selection. Our analyses provided compelling evidence of natural selection forces on noncoding sequences in the human genome and advanced our understanding of their functional potentials that play important roles in disease etiology and human evolution.

20.

Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow

Amit Kawalia Susanne Motameny Stephan Wonczak Holger Thiele Lech Nieroda Kamel Jabbari Stefan Borowski Vishal Sinha Wilfried Gunia Ulrich Lang Viktor Achter Peter Nürnberg 《PloS one》2015,10(5)

Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files.