20 similar documents found (search time: 0 ms)
1.
The resampling-based test, which often relies on permutation or bootstrap procedures, has been widely used for statistical hypothesis testing when the asymptotic distribution of the test statistic is unavailable or unreliable. It requires repeated calculations of the test statistic on a large number of simulated data sets for its significance level assessment, and thus it could become very computationally intensive. Here, we propose an efficient p-value evaluation procedure by adapting the stochastic approximation Markov chain Monte Carlo algorithm. The new procedure can be used easily for estimating the p-value for any resampling-based test. We show through numerical simulations that the proposed procedure can be 100 to 500,000 times as efficient (in terms of computing time) as the standard resampling-based procedure when evaluating a test statistic with a small p-value (e.g. less than 10^-6). With its computational burden reduced by this proposed procedure, the versatile resampling-based test would become computationally feasible for a much wider range of applications. We demonstrate the application of the new method by applying it to a large-scale genetic association study of prostate cancer.
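For context on the computational burden this abstract addresses, here is a minimal sketch of the standard resampling p-value estimate (not the authors' stochastic approximation MCMC procedure); the test statistic and toy data are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def test_statistic(y, x):
    # placeholder statistic: absolute difference in group means
    return abs(y[x == 1].mean() - y[x == 0].mean())

def permutation_pvalue(y, x, n_perm=10_000):
    """Standard resampling p-value: the fraction of permuted data sets whose
    statistic is at least as extreme as the observed one."""
    observed = test_statistic(y, x)
    hits = 0
    for _ in range(n_perm):
        hits += test_statistic(y, rng.permutation(x)) >= observed
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# toy data: 50 controls and 50 cases with a small mean shift
x = np.repeat([0, 1], 50)
y = rng.normal(size=100) + 0.4 * x
print(permutation_pvalue(y, x))
```

Resolving a p-value near 10^-6 with this estimator requires on the order of 10^7 or more permutations, which is the burden the stochastic approximation MCMC adaptation is designed to remove.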
2.
Semir Zeki 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2015,370(1668)
Whether the visual brain uses a parallel or a serial, hierarchical, strategy to process visual signals, the end result appears to be that different attributes of the visual scene are perceived asynchronously, with colour leading form (orientation) by 40 ms and direction of motion by about 80 ms. Whatever the neural root of this asynchrony, it creates a problem that has not been properly addressed, namely how visual attributes that are perceived asynchronously over brief time windows after stimulus onset are bound together in the longer term to give us a unified experience of the visual world, in which all attributes are apparently seen in perfect registration. In this review, I suggest that there is no central neural clock in the (visual) brain that synchronizes the activity of different processing systems. More likely, activity in each of the parallel processing-perceptual systems of the visual brain is reset independently, making of the brain a massively asynchronous organ, just like the new generation of more efficient computers promises to be. Given the asynchronous operations of the brain, it is likely that the results of activities in the different processing-perceptual systems are not bound by physiological interactions between cells in the specialized visual areas, but post-perceptually, outside the visual brain.
3.
Andrey Alexeyenko Björn Nystedt Francesco Vezzi Ellen Sherwood Rosa Ye Bjarne Knudsen Martin Simonsen Benjamin Turner Pieter de Jong Cheng-Cang Wu Joakim Lundeberg 《BMC genomics》2014,15(1)
Background
Sampling genomes with Fosmid vectors and sequencing of pooled Fosmid libraries on the Illumina platform for massive parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality.
Results
In order to sequence the genome of Norway spruce, which is of great size and complexity, we developed and applied a new technology based on the massive production, sequencing, and assembly of Fosmid pools (FP). The spruce chromosomes were sampled with ~40,000 bp Fosmid inserts to obtain around two-fold genome coverage, in parallel with traditional whole genome shotgun sequencing (WGS) of haploid and diploid genomes. Compared to the WGS results, the contiguity and quality of the FP assemblies were high, and they allowed us to fill WGS gaps resulting from repeats, low coverage, and allelic differences. The FP contig sets were further merged with WGS data using a novel software package, GAM-NGS.
Conclusions
By exploiting FP technology, the first published assembly of a conifer genome was sequenced entirely with massively parallel sequencing. Here we provide a comprehensive report on the different features of the approach and the optimization of the process. We have made public the input data (FASTQ format) for the set of pools used in this study: ftp://congenie.org/congenie/Nystedt_2013/Assembly/ProcessedData/FosmidPools/ (alternatively accessible via http://congenie.org/downloads). The software used for running the assembly process is available at http://research.scilifelab.se/andrej_alexeyenko/downloads/fpools/.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-439) contains supplementary material, which is available to authorized users.
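As a back-of-the-envelope check of the Fosmid sampling described above, assuming a haploid spruce genome size of roughly 20 Gb (an approximate figure, not taken from this abstract):

```python
genome_size = 20e9      # assumed haploid genome size in bp (approximate)
insert_size = 40_000    # Fosmid insert length from the abstract
coverage = 2            # target genome coverage

clones_needed = coverage * genome_size / insert_size
print(f"~{clones_needed:,.0f} Fosmid clones for {coverage}x coverage")
# -> ~1,000,000 clones
```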
4.
Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356,958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5 M (USD).
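A minimal sketch of how di- and trinucleotide microsatellites can be flagged in raw reads with a regular expression; the repeat-count threshold and the example read are illustrative, not the authors' pipeline.

```python
import re

# a di- or trinucleotide unit repeated at least 6 times (the threshold is arbitrary here)
MICROSAT = re.compile(r"((?:[ACGT]{2}|[ACGT]{3}))\1{5,}")

def find_microsatellites(read):
    """Return (repeat unit, repeat tract) tuples found in a single read."""
    return [(m.group(1), m.group(0)) for m in MICROSAT.finditer(read)]

read = "TTGACACACACACACACAGGTAGCAGCAGCAGCAGCAGCTT"
print(find_microsatellites(read))
# -> [('AC', 'ACACACACACACAC'), ('AGC', 'AGCAGCAGCAGCAGCAGC')]
```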
5.
Since the development of technologies that can determine the base-pair sequence of DNA, the ability to sequence genes has contributed much to science and medicine. However, it has remained a relatively costly and laborious process, hindering its use as a routine biomedical tool. Recent times are seeing rapid developments in this field, both in the availability of novel sequencing platforms, as well as supporting technologies involved in processes such as targeting and data analysis. This is leading to significant reductions in the cost of sequencing a human genome and the potential for its use as a routine biomedical tool. This review is a snapshot of this rapidly moving field examining the current state of the art, forthcoming developments and some of the issues still to be resolved prior to the use of new sequencing technologies in routine clinical diagnosis.
6.
Accuracy and quality of massively parallel DNA pyrosequencing
Background
Massively parallel pyrosequencing systems have increased the efficiency of DNA sequencing, although the published per-base accuracy of a Roche GS20 is only 96%. In genome projects, highly redundant consensus assemblies can compensate for sequencing errors. In contrast, studies of microbial diversity that catalogue differences between PCR amplicons of ribosomal RNA genes (rDNA) or other conserved gene families cannot take advantage of consensus assemblies to detect and minimize incorrect base calls.
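To illustrate why consensus assemblies can compensate for a roughly 4% per-base error rate while single-pass amplicon reads cannot, here is a small calculation of the chance that a simple majority vote at one position is wrong, assuming independent errors that all vote for the same wrong base (a worst-case simplification, not a model from the paper).

```python
from math import comb

def majority_error(per_base_error, depth):
    """P(majority of `depth` independent reads is wrong at one position)."""
    p = per_base_error
    return sum(comb(depth, k) * p**k * (1 - p)**(depth - k)
               for k in range(depth // 2 + 1, depth + 1))

for depth in (1, 3, 5, 9):   # odd depths avoid ties
    print(depth, f"{majority_error(0.04, depth):.2e}")
# error drops from 4e-2 at 1x to ~6e-4 at 5x and ~1e-5 at 9x coverage
```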
7.
A method is described for finding all occurrences of a sequence pattern within a database of molecular sequences. Implementation of this on a massively parallel computer allows the user to perform very fast database searches using complex patterns. In particular, the software supports approximate pattern matching with score thresholds for either the entire pattern or specified elements thereof. Matches to individual elements can be linked by variable-length gaps within user-specified limits.
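A rough illustration of the kind of pattern the abstract describes, i.e. elements linked by variable-length gaps, using Python regular expressions rather than the parallel implementation discussed above; this only shows exact matching of the elements, so the score thresholds for approximate matching are not modeled here.

```python
import re

# two sequence elements separated by a gap of 3 to 10 arbitrary residues
pattern = re.compile(r"GAATTC.{3,10}GGATCC")

database = {
    "seq1": "TTTGAATTCAAACCCGGATCCTT",   # gap of 6 residues -> matches
    "seq2": "GAATTCGGATCC",              # gap of 0 residues -> no match
}
for name, seq in database.items():
    m = pattern.search(seq)
    print(name, m.group(0) if m else "no match")
```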
8.
Mark T. Gregory Jessica A. Bertout Nolan G. Ericson Sean D. Taylor Rithun Mukherjee Harlan S. Robins Charles W. Drescher Jason H. Bielas 《Nucleic acids research》2016,44(3):e22
Next-generation sequencing (NGS) technologies have transformed genomic research and have the potential to revolutionize clinical medicine. However, the background error rates of sequencing instruments and limitations in targeted read coverage have precluded the detection of rare DNA sequence variants by NGS. Here we describe a method, termed CypherSeq, which combines double-stranded barcoding error correction and rolling circle amplification (RCA)-based target enrichment to vastly improve NGS-based rare variant detection. The CypherSeq methodology involves the ligation of sample DNA into circular vectors, which contain double-stranded barcodes for computational error correction and adapters for library preparation and sequencing. CypherSeq is capable of detecting rare mutations genome-wide as well as those within specific target genes via RCA-based enrichment. We demonstrate that CypherSeq is capable of correcting errors incurred during library preparation and sequencing to reproducibly detect mutations down to a frequency of 2.4 × 10^-7 per base pair, and report the frequency and spectra of spontaneous and ethyl methanesulfonate-induced mutations across the Saccharomyces cerevisiae genome.
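A simplified, single-strand sketch of the barcode-family consensus idea behind such error correction: reads sharing a barcode are assumed to derive from the same original molecule, so a per-position consensus suppresses errors introduced during library preparation and sequencing. The barcode extraction, 8-bp barcode length, and family-size cutoff are illustrative, and the published method additionally exploits both strands of each double-stranded barcode.

```python
from collections import Counter, defaultdict

BARCODE_LEN = 8  # illustrative length, not the published design

def consensus(reads):
    """Per-position majority call across reads from one barcode family."""
    return "".join(Counter(bases).most_common(1)[0][0] for bases in zip(*reads))

def correct_by_barcode(raw_reads):
    families = defaultdict(list)
    for read in raw_reads:
        barcode, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        families[barcode].append(insert)
    # keep only families with enough members to form a reliable consensus
    return {bc: consensus(ins) for bc, ins in families.items() if len(ins) >= 3}

raw = [
    "AACCGGTT" + "ACGTACGTAC",
    "AACCGGTT" + "ACGTACGTAC",
    "AACCGGTT" + "ACGAACGTAC",   # one sequencing error, outvoted by the family
]
print(correct_by_barcode(raw))   # {'AACCGGTT': 'ACGTACGTAC'}
```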
9.
Julia Sandberg Patrik L. Ståhl Afshin Ahmadian Magnus K. Bjursell Joakim Lundeberg 《Nucleic acids research》2009,37(8):e63
Massively parallel DNA sequencing is revolutionizing genomics research throughout the life sciences. However, the reagent costs and labor requirements in current sequencing protocols are still substantial, although improvements are continuously being made. Here, we demonstrate an effective alternative to existing sample titration protocols for the Roche/454 system using Fluorescence Activated Cell Sorting (FACS) technology to determine the optimal DNA-to-bead ratio prior to large-scale sequencing. Our method, which eliminates the need for the costly pilot sequencing of samples during titration is capable of rapidly providing accurate DNA-to-bead ratios that are not biased by the quantification and sedimentation steps included in current protocols. Moreover, we demonstrate that FACS sorting can be readily used to highly enrich fractions of beads carrying template DNA, with near total elimination of empty beads and no downstream sacrifice of DNA sequencing quality. Automated enrichment by FACS is a simple approach to obtain pure samples for bead-based sequencing systems, and offers an efficient, low-cost alternative to current enrichment protocols.
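The DNA-to-bead titration problem can be framed with a Poisson loading model: at a mean of lambda template molecules per bead, a fraction exp(-lambda) of beads stays empty and the remainder beyond the single-template fraction carries mixed templates. This is a standard emulsion-PCR approximation offered for orientation, not a calculation from the paper.

```python
from math import exp

def bead_fractions(lam):
    """Poisson loading: fractions of empty, single-template and multi-template beads."""
    empty = exp(-lam)
    single = lam * exp(-lam)
    multi = 1 - empty - single
    return empty, single, multi

for lam in (0.1, 0.5, 1.0, 2.0):
    e, s, m = bead_fractions(lam)
    print(f"lambda={lam}: empty={e:.2f} single={s:.2f} multi={m:.2f}")
```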
10.
11.
12.
Allison J. DePaul Erik J. Thompson Sarav S. Patel Kristin Haldeman Eric J. Sorin 《Nucleic acids research》2010,38(14):4856-4867
Conformational equilibrium within the ubiquitous GNRA tetraloop motif was simulated at the ensemble level, including 10 000 independent all-atom molecular dynamics trajectories totaling over 110 µs of simulation time. This robust sampling reveals a highly dynamic structure comprised of 15 conformational microstates. We assemble a Markov model that includes transitions ranging from the nanosecond to microsecond timescales and is dominated by six key loop conformations that contribute to fluctuations around the native state. Mining of the Protein Data Bank provides an abundance of structures in which GNRA tetraloops participate in tertiary contact formation. Most predominantly observed in the experimental data are interactions of the native loop structure within the minor groove of adjacent helical regions. Additionally, a second trend is observed in which the tetraloop assumes non-native conformations while participating in multiple tertiary contacts, in some cases involving multiple possible loop conformations. This tetraloop flexibility can act to counterbalance the energetic penalty associated with assuming non-native loop structures in forming tertiary contacts. The GNRA motif has thus evolved not only to readily participate in simple tertiary interactions involving native loop structure, but also to easily adapt tetraloop secondary conformation in order to participate in larger, more complex tertiary interactions.
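A minimal sketch of how a Markov (state) model of the kind described above is assembled from discretized trajectories: count transitions at a chosen lag time and row-normalize. The toy trajectory and three-state labeling are placeholders for the 15 microstates in the study, and production Markov state models additionally pool many trajectories and enforce detailed balance.

```python
import numpy as np

def transition_matrix(traj, n_states, lag=1):
    """Row-stochastic transition matrix estimated from one discrete trajectory."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(traj[:-lag], traj[lag:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# toy discretized trajectory over 3 conformational microstates
traj = [0, 0, 1, 1, 1, 2, 1, 0, 0, 2, 2, 1, 1, 0]
T = transition_matrix(traj, n_states=3, lag=1)
print(np.round(T, 2))
```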
13.
S Gombar HJ Jung F Dong B Calder G Atzmon N Barzilai XL Tian J Pothof JH Hoeijmakers J Campisi J Vijg Y Suh 《BMC genomics》2012,13(1):353
ABSTRACT: BACKGROUND: MicroRNAs (miRNAs) are small, non-coding RNAs that regulate gene expression and play a critical role in development, homeostasis, and disease. Despite their demonstrated roles in age-associated pathologies, little is known about the role of miRNAs in human aging and longevity. RESULTS: We employed massively parallel sequencing technology to identify miRNAs expressed in B-cells from Ashkenazi Jewish centenarians, i.e. individuals who have lived to 100 years and who serve as a human model of exceptional longevity, and from younger controls without a family history of longevity. With data from 26.7 million reads comprising 9.4 × 10^8 bp from 3 centenarian and 3 control individuals, we discovered a total of 276 known miRNAs and 8 unknown miRNAs ranging several orders of magnitude in expression levels, a typical characteristic of saturated miRNA sequencing. A total of 22 miRNAs were found to be significantly upregulated, with only 2 miRNAs downregulated, in centenarians as compared to controls. Gene Ontology analysis of the predicted and validated targets of the 24 differentially expressed miRNAs indicated enrichment of functional pathways involved in cell metabolism, cell cycle, cell signaling, and cell differentiation. A cross-sectional expression analysis of the differentially expressed miRNAs in B-cells from Ashkenazi Jewish individuals between the 50th and 100th years of age indicated that expression levels of miR-363* declined significantly with age. Centenarians, however, maintained the youthful expression level. This result suggests that miR-363* may be a candidate longevity-associated miRNA. CONCLUSION: Our comprehensive miRNA data provide a resource for further studies to identify genetic pathways associated with aging and longevity in humans.
14.
Insights into nucleic acid conformational dynamics from massively parallel stochastic simulations
The helical hairpin is one of the most ubiquitous and elementary secondary structural motifs in nucleic acids, capable of serving functional roles and participating in long-range tertiary contacts. Yet the self-assembly of these structures has not been well-characterized at the atomic level. With this in mind, the dynamics of nucleic acid hairpin formation and disruption have been studied using a novel computational tool: large-scale, parallel, atomistic molecular dynamics simulation employing an inhomogeneous distributed computer consisting of more than 40,000 processors. Using multiple methodologies, over 500 µs of atomistic simulation time has been collected for a large ensemble of hairpins (sequence 5'-GGGC[GCAA]GCCU-3'), allowing characterization of rare events not previously observable in simulation. From uncoupled ensemble dynamics simulations in unperturbed folding conditions, we report on (1) competing pathways between the folded and unfolded regions of the conformational space; (2) observed nonnative stacking and basepairing traps; and (3) a helix unwinding-rewinding mode that is differentiated from the unfolding and folding dynamics. A heterogeneous transition state ensemble is characterized structurally through calculations of conformer-specific folding probabilities, and a multiplexed replica exchange stochastic dynamics algorithm is used to derive an approximate folding landscape. A comparison between the observed folding mechanism and that of a peptide beta-hairpin analog suggests that although native topology defines the character of the folding landscape, the statistical weighting of potential folding pathways is determined by the chemical nature of the polymer.
15.
Sánchez-León N Arteaga-Vázquez M Alvarez-Mejía C Mendiola-Soto J Durán-Figueroa N Rodríguez-Leal D Rodríguez-Arévalo I García-Campayo V García-Aguilar M Olmedo-Monfil V Arteaga-Sánchez M de la Vega OM Nobuta K Vemaraju K Meyers BC Vielle-Calzada JP 《Journal of experimental botany》2012,63(10):3829-3842
16.
Dematté L 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2012,9(3):655-667
Space is a very important aspect in the simulation of biochemical systems; recently, the need for simulation algorithms able to cope with space is becoming more and more compelling. Complex and detailed models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transportation phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation could be time consuming, especially if we want to capture the system's behavior in a reliable way using stochastic methods in conjunction with a high spatial resolution. In order to deliver on the promise made by systems biology to understand a system as a whole, we need to scale up the size of the models we are able to simulate, moving from sequential to parallel simulation algorithms. In this paper, we analyze Smoldyn, a widely used algorithm for stochastic simulation of chemical reactions with spatial resolution and single-molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of Graphics Processing Units (GPUs). The implementation executes the most computationally demanding steps (computation of diffusion, unimolecular and bimolecular reactions, as well as the most common cases of molecule-surface interaction) on the GPU, computing them in parallel for each molecule of the system. The implementation offers good speed-ups and real-time, high-quality graphics output.
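The per-molecule diffusion update that such simulators offload to the GPU amounts to an independent Gaussian displacement for every particle at each time step. A vectorized NumPy stand-in (not the CUDA implementation described in the paper) conveys the data-parallel structure:

```python
import numpy as np

rng = np.random.default_rng(1)

def diffuse(positions, D, dt):
    """Brownian step for every molecule at once: displacement ~ N(0, 2*D*dt) per axis.
    On a GPU each molecule would be handled by its own thread; here NumPy vectorizes it."""
    sigma = np.sqrt(2.0 * D * dt)
    return positions + rng.normal(0.0, sigma, size=positions.shape)

n_molecules = 1_000_000
positions = rng.uniform(0.0, 10.0, size=(n_molecules, 3))   # toy box, coordinates in µm
positions = diffuse(positions, D=10.0, dt=1e-4)              # D in µm^2/s, dt in s
print(positions[:2])
```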
17.
Massively Parallel Sequencing (MPS) allows sequencing of entire exomes and genomes to now be done at reasonable cost, and its utility for identifying genes responsible for rare Mendelian disorders has been demonstrated. However, for a complex disease, study designs need to accommodate substantial degrees of locus, allelic, and phenotypic heterogeneity, as well as complex relationships between genotype and phenotype. Such considerations include careful selection of samples for sequencing and a well-developed strategy for identifying the few "true" disease susceptibility genes from among the many irrelevant genes that will be found to harbor rare variants. To examine these issues we have performed simulation-based analyses in order to compare several strategies for MPS sequencing in complex disease. Factors examined include genetic architecture, sample size, number and relationship of individuals selected for sequencing, and a variety of filters based on variant type, multiple observations of genes and concordance of genetic variants within pedigrees. A two-stage design was assumed where genes from the MPS analysis of high-risk families are evaluated in a secondary screening phase of a larger set of probands with more modest family histories. Designs were evaluated using a cost function that assumes the cost of sequencing the whole exome is 400 times that of sequencing a single candidate gene. Results indicate that while requiring variants to be identified in multiple pedigrees and/or in multiple individuals in the same pedigree are effective strategies for reducing false positives, there is a danger of over-filtering so that most true susceptibility genes are missed. In most cases, sequencing more than two individuals per pedigree results in reduced power without any benefit in terms of reduced overall cost. Further, our results suggest that although no single strategy is optimal, simulations can provide important guidelines for study design.
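A sketch of the kind of two-stage cost function described, under the stated assumption that sequencing a whole exome costs 400 times as much as sequencing one candidate gene; the sample sizes below are placeholders, not values from the paper.

```python
GENE_COST = 1.0               # cost unit: one candidate gene in one sample
EXOME_COST = 400 * GENE_COST  # assumption stated in the abstract

def two_stage_cost(n_family_exomes, n_candidate_genes, n_stage2_probands):
    """Stage 1: exome sequencing in high-risk families.
    Stage 2: targeted sequencing of surviving candidate genes in a larger proband set."""
    stage1 = n_family_exomes * EXOME_COST
    stage2 = n_candidate_genes * n_stage2_probands * GENE_COST
    return stage1 + stage2

# e.g. 2 sequenced members in each of 20 pedigrees, 50 candidate genes, 1000 probands
print(two_stage_cost(n_family_exomes=40, n_candidate_genes=50, n_stage2_probands=1000))
```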
18.
YM Lo 《Open biology》2012,2(6):120086
The presence of foetal DNA in the plasma of pregnant women has opened up new possibilities for non-invasive prenatal diagnosis. The use of circulating foetal DNA for the non-invasive prenatal detection of foetal chromosomal aneuploidies is challenging as foetal DNA represents a minor fraction of maternal plasma DNA. In 2007, it was shown that single molecule counting methods would allow the detection of the presence of a trisomic foetus, as long as enough molecules were counted. With the advent of massively parallel sequencing, millions or billions of DNA molecules can be readily counted. Using massively parallel sequencing, foetal trisomies 21, 13 and 18 have been detected from maternal plasma. Recently, large-scale clinical studies have validated the robustness of this approach for the prenatal detection of foetal chromosomal aneuploidies. A proof-of-concept study has also shown that a genome-wide genetic and mutational map of a foetus can be constructed from the maternal plasma DNA sequencing data. These developments suggest that the analysis of foetal DNA in maternal plasma would play an increasingly important role in future obstetrics practice. It is thus a priority that the ethical, social and legal issues regarding this technology be systematically studied.
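The molecule-counting logic can be illustrated with a z-score on the fraction of maternal plasma reads mapping to chromosome 21, compared against a euploid reference distribution; the read counts and reference values below are made up for illustration and are not from the studies cited above.

```python
def chr21_zscore(chr21_reads, total_reads, ref_mean_fraction, ref_sd_fraction):
    """z-score of the chr21 read fraction relative to euploid reference pregnancies."""
    fraction = chr21_reads / total_reads
    return (fraction - ref_mean_fraction) / ref_sd_fraction

# illustrative values: a 10% foetal DNA fraction with a trisomic foetus shifts the
# chr21 proportion upward by about 5% relative to the euploid mean
z = chr21_zscore(chr21_reads=136_500, total_reads=10_000_000,
                 ref_mean_fraction=0.0130, ref_sd_fraction=0.0001)
print(f"z = {z:.1f}")   # large positive values are flagged as suggestive of trisomy 21
```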
19.
20.
We present an implementation of McCaskill's algorithm for computing the base pair probabilities of an RNA molecule for massively parallel message passing architectures. The program can be used to routinely fold RNA sequences of more than 10,000 nucleotides. Applications to complete viral genomes are discussed.
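For a short sequence, the quantity McCaskill's algorithm computes, the equilibrium probability of each base pair, can be reproduced by brute-force enumeration of all non-crossing structures under a toy energy model (one Boltzmann weight per pair, no loop terms, not the Turner parameters used in practice); McCaskill's dynamic programming obtains the same probabilities in polynomial time, which is what makes the parallel folding of very long sequences feasible. The sequence used below is the hairpin from entry 14, chosen only as a convenient example.

```python
import math

RT = 0.616  # kcal/mol near 37 °C
PAIR_E = {("A", "U"): -2.0, ("U", "A"): -2.0, ("G", "C"): -3.0,
          ("C", "G"): -3.0, ("G", "U"): -1.0, ("U", "G"): -1.0}  # toy energies
MIN_LOOP = 3  # minimum number of unpaired bases enclosed by a hairpin-closing pair

def structures(seq, i, j):
    """Yield (pairs, Boltzmann weight) for all non-crossing structures of seq[i..j]."""
    if i > j:
        yield frozenset(), 1.0
        return
    # case 1: base i is unpaired
    for pairs, w in structures(seq, i + 1, j):
        yield pairs, w
    # case 2: base i pairs with some k
    for k in range(i + MIN_LOOP + 1, j + 1):
        e = PAIR_E.get((seq[i], seq[k]))
        if e is None:
            continue
        for p1, w1 in structures(seq, i + 1, k - 1):
            for p2, w2 in structures(seq, k + 1, j):
                yield p1 | p2 | {(i, k)}, w1 * w2 * math.exp(-e / RT)

def base_pair_probabilities(seq):
    Z = 0.0
    weight_of_pair = {}
    for pairs, w in structures(seq, 0, len(seq) - 1):
        Z += w
        for p in pairs:
            weight_of_pair[p] = weight_of_pair.get(p, 0.0) + w
    return {p: w / Z for p, w in weight_of_pair.items()}

for pair, prob in sorted(base_pair_probabilities("GGGCGCAAGCCU").items()):
    print(pair, round(prob, 3))
```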