Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
A massively parallel Genetic Algorithm (GA) has been applied to RNA sequence folding on three different computer architectures. The GA, an evolution-like algorithm that is applied to a large population of RNA structures based on a pool of helical stems derived from an RNA sequence, evolves this population in parallel. The algorithm was originally designed and developed for a 16,384-processor SIMD (Single Instruction Multiple Data) MasPar MP-2. More recently it has been adapted to a 64-processor MIMD (Multiple Instruction Multiple Data) SGI ORIGIN 2000 and a 512-processor MIMD CRAY T3E. The MIMD version of the algorithm raises issues concerning RNA structure data layout and processor communication. In addition, the effects of population variation on the predicted results are discussed. Also presented are the scaling properties of the algorithm from the perspective of the number of physical processors utilized and the number of virtual processors (RNA structures) operated upon.

2.

Background

Next-generation sequencing technology provides a means to study genetic exchange at a higher resolution than was possible using earlier technologies. However, this improvement presents challenges, as the alignments of next-generation sequence data to a reference genome cannot be directly used as input to existing detection algorithms, which instead typically use multiple sequence alignments as input. We therefore designed a software suite called REDHORSE that uses genomic alignments, extracts genetic markers, and generates multiple sequence alignments that can be used as input to existing recombination detection algorithms. In addition, REDHORSE implements a custom recombination detection algorithm that makes use of sequence information and genomic positions to accurately detect crossovers. REDHORSE is a portable and platform-independent suite that provides efficient analysis of genetic crosses based on next-generation sequencing data.

Results

We demonstrated the utility of REDHORSE using simulated data and real next-generation sequencing (NGS) data. The simulated dataset mimicked recombination between two known haploid parental strains and allowed comparison of detected break points against known true break points to assess the performance of recombination detection algorithms. A newly generated NGS dataset from a genetic cross of Toxoplasma gondii allowed us to demonstrate our pipeline. REDHORSE successfully extracted the relevant genetic markers and was able to transform the alignments of NGS reads to the genome into multiple sequence alignments. The recombination detection algorithm in REDHORSE was able to detect conventional crossovers and the double crossovers typically associated with gene conversions, whilst filtering out artifacts that might have been introduced during sequencing or alignment. REDHORSE outperformed other commonly used recombination detection algorithms in finding conventional crossovers. In addition, REDHORSE was the only algorithm that was able to detect double crossovers.

Conclusion

REDHORSE is an efficient analytical pipeline that serves as a bridge between genomic alignments and existing recombination detection algorithms. Moreover, REDHORSE is equipped with a recombination detection algorithm specifically designed for next-generation sequencing data. REDHORSE is a portable, platform-independent, Java-based utility that provides efficient analysis of genetic crosses based on next-generation sequencing data. REDHORSE is available at http://redhorse.sourceforge.net/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1309-7) contains supplementary material, which is available to authorized users.

3.
The maximum likelihood (ML) method of phylogenetic tree construction is not as widely used as other tree construction methods (e.g., parsimony, neighbor-joining) because of the prohibitive amount of time required to find the ML tree when the number of sequences under consideration is large. To overcome this difficulty, we propose a stochastic search strategy for estimation of the ML tree that is based on a simulated annealing algorithm. The algorithm works by moving through tree space by way of a "local rearrangement" strategy so that topologies that improve the likelihood are always accepted, whereas those that decrease the likelihood are accepted with a probability that is related to the proportionate decrease in likelihood. Besides greatly reducing the time required to estimate the ML tree, the stochastic search strategy is less likely to become trapped in local optima than are existing algorithms for ML tree estimation. We demonstrate the success of the modified simulated annealing algorithm by comparing it with two existing algorithms (Swofford's PAUP* and Felsenstein's DNAMLK) for several theoretical and real data examples.
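The acceptance rule described in this abstract can be illustrated with a minimal sketch. The abstract only says that worse topologies are accepted with a probability "related to the proportionate decrease in likelihood"; the exponential form, the `temperature` parameter, and the function name `accept_move` below are assumptions made for illustration, not the authors' published formula.

```python
import math
import random


def accept_move(current_loglik, proposed_loglik, temperature, rng=random):
    """Decide whether to accept a proposed tree rearrangement.

    Improvements are always accepted. A worse proposal is accepted with
    probability exp(-d / temperature), where d is the proportionate drop
    in log-likelihood (an assumed form, not taken from the paper).
    Assumes current_loglik is nonzero, as it is for real sequence data.
    """
    if proposed_loglik >= current_loglik:
        return True
    # Log-likelihoods are negative, so scale by the magnitude of the current value.
    drop = (current_loglik - proposed_loglik) / abs(current_loglik)
    return rng.random() < math.exp(-drop / temperature)
```

In a full search, the temperature would typically be lowered over time so that worse moves become progressively rarer; the cooling schedule is not given in the abstract and is left out here.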

4.
McVean G  Awadalla P  Fearnhead P 《Genetics》2002,160(3):1231-1241
Determining the amount of recombination in the genealogical history of a sample of genes is important to both evolutionary biology and medical population genetics. However, recurrent mutation can produce patterns of genetic diversity similar to those generated by recombination and can bias estimates of the population recombination rate. Hudson (2001) suggested an approximate-likelihood method based on coalescent theory to estimate the population recombination rate, 4N_e r, under an infinite-sites model of sequence evolution. Here we extend the method to the estimation of the recombination rate in genomes, such as those of many viruses and bacteria, where the rate of recurrent mutation is high. In addition, we develop a permutation-based method for detecting recombination that is both more powerful than other permutation-based methods and robust to misspecification of the model of sequence evolution. We apply the method to sequence data from viruses, bacteria, and human mitochondrial DNA. The extremely high level of recombination detected in both HIV-1 and HIV-2 sequences demonstrates that recombination cannot be ignored in the analysis of viral population genetic data.

5.
W Banzhaf 《Bio Systems》1989,22(2):163-172
We present a model of optimization of cost functions by a population of parallel processors and argue that diploid recombination of gene strings in particular is a promising recipe for optimization, one which nature proliferates. Based on a simulated evolutionary search strategy, diploidy is introduced as a means of maintaining variability in computational problems with large numbers of local extrema. A differentiation into genotypes and phenotypes is performed. The applied strategy is compared to some traditional algorithms simulating evolution on the basis of two sample cost functions.

6.
Paul JS  Steinrücken M  Song YS 《Genetics》2011,187(4):1115-1128
The sequentially Markov coalescent is a simplified genealogical process that aims to capture the essential features of the full coalescent model with recombination, while being scalable in the number of loci. In this article, the sequentially Markov framework is applied to the conditional sampling distribution (CSD), which is at the core of many statistical tools for population genetic analyses. Briefly, the CSD describes the probability that an additionally sampled DNA sequence is of a certain type, given that a collection of sequences has already been observed. A hidden Markov model (HMM) formulation of the sequentially Markov CSD is developed here, yielding an algorithm with time complexity linear in both the number of loci and the number of haplotypes. This work provides a highly accurate, practical approximation to a recently introduced CSD derived from the diffusion process associated with the coalescent with recombination. It is empirically demonstrated that the improvement in accuracy of the new CSD over previously proposed HMM-based CSDs increases substantially with the number of loci. The framework presented here can be adopted in a wide range of applications in population genetics, including imputing missing sequence data, estimating recombination rates, and inferring human colonization history.

7.
MOTIVATION: While genetic properties such as linkage disequilibrium (LD) and population structure are closely related under a common inheritance process, the statistical methodologies developed so far mostly deal with LD analysis and structural inference separately, using specialized models that do not capture their statistical and genetic relationships. Also, most of these approaches ignore the inherent uncertainty in the genetic complexity of the data and rely on inflexible models built on a closed genetic space. These limitations may make it difficult to infer detailed and consistent structural information from rich genomic data such as population single nucleotide polymorphism (SNP) profiles. RESULTS: We propose a new model-based approach to address these issues through joint inference of population structure and recombination events under a non-parametric Bayesian framework; we present Spectrum, an efficient implementation based on our new model. We validated Spectrum on simulated data and applied it to two real SNP datasets: the single-population Daly data and the four-population HapMap data. Our method performs well relative to LDhat 2.0 in estimating the recombination rates and hotspots on these datasets. More interestingly, it generates an ancestral spectrum for representing population structures, which not only displays sub-structure based on population founders but also reveals details of the genetic diversity of each individual. It offers an alternative view of the population structures to that offered by Structure 2.1, which ignores chromosome-level mutation and recombination with respect to founders.

8.
The variation of the recombination rate along chromosomal DNA is one of the important determinants of the patterns of linkage disequilibrium. A number of inferential methods have been developed which estimate the recombination rate and its variation from population genetic data. The majority of these methods are based on modelling the genealogical process underlying a sample of DNA sequences and thus explicitly include a model of the demographic process. Here we propose a different inferential procedure based on a previously introduced framework where recombination is modelled as a point process along a DNA sequence. The approach infers regions containing putative hotspots based on the inferred minimum number of recombination events; it thus depends only indirectly on the underlying population demography. A Poisson point process model with local rates is then used to infer patterns of recombination rate estimation in a fully Bayesian framework. We illustrate this new approach by applying it to several population genetic datasets, including a region with an experimentally confirmed recombination hotspot.

9.

Background

Coalescent simulation is pivotal for understanding population evolutionary models and demographic histories, as well as for developing novel analytical methods for genetic association studies of DNA sequence data. A plethora of coalescent simulators have been developed, but selecting the most appropriate program remains challenging.

Results

We extensively compared the performance of five widely used coalescent simulators (Hudson’s ms, msHOT, MaCS, Simcoal2, and fastsimcoal) to provide a practical guide considering three crucial factors: (1) speed, (2) scalability, and (3) accuracy of recombination hotspot position and intensity. Although ms is a popular standard coalescent simulator, it lacks the ability to simulate sequences with recombination hotspots. An extended program, msHOT, compensates for this deficiency of ms by incorporating recombination hotspots and gene conversion events at arbitrarily chosen locations and intensities, but remains limited in simulating long stretches of DNA sequence. Simcoal2, based on a discrete generation-by-generation approach, can simulate more complex demographic scenarios, but runs comparatively slowly. MaCS and fastsimcoal, both built on fast, modified sequential Markov coalescent algorithms that approximate the standard coalescent, are much more efficient whilst keeping the salient features of msHOT and Simcoal2, respectively. Our simulations demonstrate that they have advantages over the other programs for a spectrum of evolutionary models. To validate recombination hotspots, the LDhat 2.2 rhomap package, sequenceLDhot and Haploview were compared for hotspot detection, and sequenceLDhot exhibited the best performance on both real and simulated data.

Conclusions

While ms remains an excellent choice for general coalescent simulations of DNA sequences, MaCS and fastsimcoal are much more scalable and flexible in simulating a variety of demographic events under different recombination hotspot models. Furthermore, sequenceLDhot appears to give the best performance in detecting and validating cross-over hotspots.

10.
The same evolutionary forces that cause diversification in sexual eukaryotes are expected to cause diversification in bacteria. However, in bacteria, the wider variety of mechanisms for gene exchange (or lack thereof) increases the range of expected diversity patterns compared to those of sexual organisms. Two parallel concepts for bacterial speciation have developed, based on ecological divergence or barriers to recombination in turn. Recent evidence from DNA sequence data shows that both processes can generate independently evolving groups that are equivalent to sexual species and that represent separate arenas within which recombination (when it occurs), selection and drift occur. It remains unclear, however, how often different processes act in concert to generate simple units of diversity, or whether a more complex model of diversity is required, specifying hierarchical levels at which different cohesive processes operate. We advocate an integrative approach that evaluates the effects of multiple evolutionary forces on diversity patterns. There is also great potential for laboratory studies of bacterial evolution that test evolutionary mechanisms inferred from population genetic analyses of multi-locus and genome sequence data.

11.
MOTIVATION: Analysis of large biological data sets using a variety of parallel processor computer architectures is a common task in bioinformatics. The efficiency of the analysis can be significantly improved by properly handling redundancy present in these data combined with taking advantage of the unique features of these compute architectures. RESULTS: We describe a generalized approach to this analysis, but present specific results using the program CEPAR, an efficient implementation of the Combinatorial Extension algorithm in a massively parallel (PAR) mode for finding pairwise protein structure similarities and aligning protein structures from the Protein Data Bank. CEPAR design and implementation are described and results provided for the efficiency of the algorithm when run on a large number of processors. AVAILABILITY: Source code is available by contacting one of the authors.

12.
We present a web engine boosted fluorescence in-situ hybridization (webFISH) algorithm using a genome-wide sequence similarity search to design target-specific single-copy and repetitive DNA FISH probes. The webFISH algorithm, featuring a user-friendly interface (http://www.webfish2.org/), maximizes the coverage of the examined sequences with FISH probes by considering locally repetitive sequences absent from the remainder of the genome. The highly repetitive human immunoglobulin heavy chain sequence was analyzed using webFISH to design three sets of FISH probes. These allowed direct simultaneous detection of class switch recombination in both immunoglobulin heavy chain alleles in single cells from a population of cultured primary B cells. It directly demonstrated asynchrony of the class switch recombination in the two alleles in structurally preserved nuclei while permitting parallel readout of protein expression by immunofluorescence staining. This novel technique offers the possibility of gaining unprecedented insight into the molecular mechanisms involved in class switch recombination.

13.
A Load Balancing Tool for Distributed Parallel Loops   Total citations: 1, self-citations: 0, citations by others: 1
Large scale applications typically contain parallel loops with many iterates. The iterates of a parallel loop may have variable execution times, which translate into performance degradation of an application due to load imbalance. This paper describes a tool for load balancing parallel loops on distributed-memory systems. The tool assumes that the data for a parallel loop to be executed is already partitioned among the participating processors. The tool utilizes the MPI library for interprocessor coordination, and determines processor workloads by loop scheduling techniques. The tool was designed independently of any application; hence, it must be supplied with a routine that encapsulates the computations for a chunk of loop iterates, as well as the routines to transfer data and results between processors. Performance evaluation on a Linux cluster indicates that the tool reduces the cost of executing a simulated irregular loop without load balancing by up to 81%. The tool is useful for parallelizing sequential applications with parallel loops, or as an alternate load balancing routine for existing parallel applications.
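As a rough illustration of what a "loop scheduling technique" computes, the sketch below generates chunk sizes under guided self-scheduling, a well-known scheme in which each request receives the remaining iterates divided by the number of workers. The abstract does not say which techniques the tool implements, so the choice of guided self-scheduling and the name `guided_chunks` are assumptions made purely for illustration.

```python
import math


def guided_chunks(total_iterations, num_workers):
    """Return chunk sizes under guided self-scheduling.

    Each request takes ceil(remaining / num_workers) iterations, so chunks
    start large and shrink, which helps even out the tail of the loop.
    """
    chunks = []
    remaining = total_iterations
    while remaining > 0:
        size = math.ceil(remaining / num_workers)
        chunks.append(size)
        remaining -= size
    return chunks
```

For 100 iterates and 4 workers the first chunk is 25 iterates and later chunks shrink toward 1. Early chunks keep scheduling overhead low, while small late chunks let a worker that finishes early pick up leftover work.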

14.
The rapid southward range expansion of the periwinkle Littorina littorea from the Canadian maritimes has fueled a long-running debate over whether this species was introduced to North America by human activity. A reappraisal of the mitochondrial DNA sequence evidence finds considerable endemic allelic diversity in the American population. The degree of endemic genetic diversity is higher than expected from human-mediated colonization, but not so high as to suggest that the population survived the last glacial maximum in America. Coalescent estimates of population divergence agree that colonization of America preceded European contact. A reappraisal of the ITS nuclear sequence data finds extensive recombination. Taking this recombination into account strengthens the genetic case against human-mediated introduction. Finally, a reappraisal of conflicting allozyme studies from the 1970s supports a claim of limited divergence between American and European populations. This is consistent with post-glacial colonization, but the allozyme data cannot distinguish between natural and human-mediated colonization. Taken as a whole, the DNA sequence data support the many sub-fossil reports of an American L. littorea population in the Canadian maritimes that preceded even the first visits by the Vikings.

15.
Comparison of the performance and accuracy of different inference methods, such as maximum likelihood (ML) and Bayesian inference, is difficult because the inference methods are implemented in different programs, often written by different authors. Both methods were implemented in the program MIGRATE, which estimates population genetic parameters, such as population sizes and migration rates, using coalescence theory. Both inference methods use the same Markov chain Monte Carlo algorithm and differ from each other in only two aspects: parameter proposal distribution and maximization of the likelihood function. On simulated datasets, the Bayesian method generally fares better than the ML approach in accuracy and coverage, although for some values the two approaches are equal in performance. MOTIVATION: The Markov chain Monte Carlo-based ML framework can fail on sparse data and can deliver non-conservative support intervals. A Bayesian framework with an appropriate prior distribution is able to remedy some of these problems. RESULTS: The program MIGRATE was extended to allow not only for ML estimation of population genetics parameters but also for the use of a Bayesian framework. Comparisons between the Bayesian approach and the ML approach are facilitated because both modes estimate the same parameters under the same population model and assumptions.

16.
Diffusion limited aggregation (DLA) has proved very successful in modelling systems which display fractal characteristics, like viscous fingering. However, by nature, such simulations are very processor intensive, requiring large amounts of processor time even for relatively small models. We have performed simulations of viscous fingering on the NCUBE parallel computer which has hypercube architecture. We find that, as long as the number of processors used is much less than both the total number of walkers released and the overall dimensions of the model, the fractal dimensions obtained using serial and parallel algorithms give similar results whilst achieving a considerable speed-up in the parallel implementation. An average fractal dimension of 1.71 was obtained along with a speed-up of 106 (in the best case) and 83% efficiency using 128 processors.
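The speed-up and efficiency figures quoted in this abstract are consistent with the usual definition of parallel efficiency as speed-up divided by processor count. The short check below uses only the numbers in the abstract; the function name is a hypothetical helper, and the assumption is that the authors used this standard definition.

```python
def parallel_efficiency(speedup, num_processors):
    """Efficiency = speed-up divided by the number of processors used."""
    return speedup / num_processors


# Figures quoted in the abstract: speed-up of 106 on 128 processors.
print(f"{parallel_efficiency(106, 128):.0%}")  # prints 83%
```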

17.
Linear regression analysis is considered the least computationally demanding method for mapping quantitative trait loci (QTL). However, simultaneous search for multiple QTL, the use of permutations to obtain empirical significance thresholds, and larger experimental studies significantly increase the computational demand. This report describes an easily implemented parallel algorithm, which significantly reduces the computing time in both QTL mapping and permutation testing. In the example provided, the use of 18 processors decreased the analysis time to less than 15% of that on a single-processor system. We indicate how the efficiency of the analysis could be improved by distributing the computations more evenly to the processors and how other ways of distributing the data facilitate the use of more processors. The use of parallel computing in QTL mapping makes it possible to routinely use permutations to obtain empirical significance thresholds for multiple traits and multiple QTL models. It could also be of use to improve the computational efficiency of the more computationally demanding QTL analysis methods.
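Permutation testing parallelizes naturally because each permutation is independent of the others. The sketch below shows one way the work could be split into chunks and pooled into an empirical threshold. It is not the algorithm from the report: the regression statistic, the chunking scheme, the per-chunk seeds, and all function names are assumptions for illustration, and every marker is assumed to be polymorphic (otherwise the slope is undefined).

```python
import random


def slope_stat(x, y):
    """Absolute least-squares slope of y on x (a stand-in for a QTL test statistic)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return abs(sxy / sxx)


def permutation_chunk(genotypes, phenotype, n_perm, seed):
    """Run n_perm permutations; return the max statistic over markers for each."""
    rng = random.Random(seed)  # one seed per chunk keeps chunks reproducible
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(slope_stat(g, shuffled) for g in genotypes))
    return maxima


def empirical_threshold(genotypes, phenotype, n_chunks=4, perms_per_chunk=250, alpha=0.05):
    """Pool the chunk results and take roughly the (1 - alpha) quantile."""
    # Chunks are independent, so this loop could be handed to a process pool
    # or to separate processors; it runs serially here to keep the sketch simple.
    results = [permutation_chunk(genotypes, phenotype, perms_per_chunk, seed)
               for seed in range(n_chunks)]
    pooled = sorted(m for chunk in results for m in chunk)
    index = min(len(pooled) - 1, int((1 - alpha) * len(pooled)))
    return pooled[index]
```

With equal chunk sizes the load is balanced only if every permutation costs about the same, which is the kind of distribution issue the abstract mentions.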

18.
While the maximum-likelihood (ML) method of tree reconstruction is statistically rigorous, it is extremely time-consuming for reconstructing large trees. We previously developed a hybrid method (NJML) that combines the neighbor-joining (NJ) and ML methods and thus is much faster than the ML method and improves the performance of NJ. However, we considered only nucleotide sequence data, so NJML is not suitable for handling amino acid sequence data, which requires even more computer time. NJML+ is an implementation of a further improved method for practical data analyses (including protein sequence data). Our extensive simulations using nucleotide and amino acid sequences showed that NJML+ gave good results in tree reconstruction. Indeed, NJML+ showed substantial improvements over existing methods in terms of both computational times and efficiencies, especially for amino acid sequence data. We also developed a "user-friendly" interface for the NJML+ program, including a simple tree viewer.

19.
TT virus (TTV) has remarkable genetic heterogeneity. To study TTV evolution, phylogenetic analyses were performed on 739 DNA sequences mapping to the N22 region of ORF1. Analysis of neighbor-joining consensus trees shows significant differences between DNA and protein phylogeny. Phylogenetic clustering with median-joining networks indicates that DNA sequence analysis is biased by homoplasy (i.e., genetic variability not originated by descent), indicative of either hypermutation or recombination. Statistical analysis shows that the significant excess of homoplasy is due to frequent recombination among closely related strains. Recombination events imply that the transmission of TTV is not clonal and provide the necessary basis to explain (i) the high degree of genetic divergence between TTV isolates, (ii) the lack of population structure on a world scale, and (iii) the number of highly divergent strains that seems typical of this virus. We show that recombination phenomena can be detected by phylogenetic analyses in very short sequences when a sufficiently large data set is available.

20.
We asked if single-stranded vector DNA molecules could be used to reintroduce cloned DNA sequences into a eukaryotic cell and cause genetic transformation typical of that observed using double-stranded DNA vectors. DNA was presented to Saccharomyces cerevisiae following a standard transformation protocol, genetic transformants were isolated, and the physical state of the transforming DNA sequence was determined. We found that single-stranded DNA molecules transformed yeast cells 10- to 30-fold more efficiently than double-stranded molecules of identical sequence. More cells were competent for transformation by the single-stranded molecules. Single-stranded circular (ssc) DNA molecules carrying the yeast 2 μ plasmid-replicator sequence were converted to autonomously replicating double-stranded circular (dsc) molecules, suggesting their efficient utilization as templates for DNA synthesis in the cell. Single-stranded DNA molecules carrying 2 μ plasmid non-replicator sequences recombined with the endogenous multicopy 2 μ plasmid DNA. This recombination yielded either the simple molecular adduct expected from homologous recombination (40% of the transformants examined) or aberrant recombination products carrying incomplete transforming DNA sequences, endogenous 2 μ plasmid DNA sequences, or both (60% of the transformants examined). These aberrant recombination products suggest the frequent use of a recombination pathway that trims one or both of the substrate DNA molecules. Similar aberrant recombination products were detected in 30% of the transformants in cotransformation experiments employing single-stranded and double-stranded DNA molecules, one carrying the 2 μ plasmid replicator sequence and the other the selectable genetic marker. We conclude that single-stranded DNA molecules are useful vectors for the genetic transformation of a eukaryotic cell. 
They offer the advantage of high transformation efficiency, and yield the same intracellular DNA species obtained upon transformation with double-stranded DNA molecules. In addition, single-stranded DNA molecules can participate in a recombination pathway that trims one or both DNA recombination substrates, a pathway not detected, at least at the same frequency, when transforming with double-stranded DNA molecules.
