首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
Ring networks are enjoying renewed interest as Storage Area Networks (SANs), i.e., networks for interconnecting storage devices (e.g., disk, disk arrays and tape drives) and storage data clients. This paper addresses the problem of fairness in ring networks with spatial reuse operating under dynamic traffic scenarios. To this end, in the first part of the paper the Max-Min fairness definition is extended to dynamic traffic scenarios and an algorithm for computing Max-Min fair rates in a dynamic environment is introduced. In the second part of the paper the extended Max-Min fairness definition is used as a measure to compare the performance in dynamic conditions of three fairness algorithms proposed for ring-based SANs. These algorithms are characterized by different fairness cycle sizes (number of links involved in each instance of the fairness algorithm), i.e., different complexity. The results show that the performance increases as the fairness cycle size decreases. In particular, the Global-cycle algorithm (implemented in the Serial Storage Architecture - SSA), whose cycle size is equal to the number N of links in the ring, exhibits the lowest performance, while the One-cycle algorithm, so called because of its cycle size equal to 1, has the best performance. The Variable-cycle algorithm, whose cycle size changes between 1 and N links, performs in between and provides the best tradeoff between performance and complexity. This revised version was published online in July 2006 with corrections to the Cover Date.  相似文献   

2.
Position weight matrices are an important method for modeling signals or motifs in biological sequences, both in DNA and protein contexts. In this paper, we present fast algorithms for the problem of finding significant matches of such matrices. Our algorithms are of the online type, and they generalize classical multipattern matching, filtering, and superalphabet techniques of combinatorial string matching to the problem of weight matrix matching. Several variants of the algorithms are developed, including multiple matrix extensions that perform the search for several matrices in one scan through the sequence database. Experimental performance evaluation is provided to compare the new techniques against each other as well as against some other online and index-based algorithms proposed in the literature. Compared to the brute-force O(mn) approach, our solutions can be faster by a factor that is proportional to the matrix length m. Our multiple-matrix filtration algorithm had the best performance in the experiments. On a current PC, this algorithm finds significant matches (p = 0.0001) of the 123 JASPAR matrices in the human genome in about 18 minutes.  相似文献   

3.

Background

Next-generation sequencing technology provides a means to study genetic exchange at a higher resolution than was possible using earlier technologies. However, this improvement presents challenges as the alignments of next generation sequence data to a reference genome cannot be directly used as input to existing detection algorithms, which instead typically use multiple sequence alignments as input. We therefore designed a software suite called REDHORSE that uses genomic alignments, extracts genetic markers, and generates multiple sequence alignments that can be used as input to existing recombination detection algorithms. In addition, REDHORSE implements a custom recombination detection algorithm that makes use of sequence information and genomic positions to accurately detect crossovers. REDHORSE is a portable and platform independent suite that provides efficient analysis of genetic crosses based on Next-generation sequencing data.

Results

We demonstrated the utility of REDHORSE using simulated data and real Next-generation sequencing data. The simulated dataset mimicked recombination between two known haploid parental strains and allowed comparison of detected break points against known true break points to assess performance of recombination detection algorithms. A newly generated NGS dataset from a genetic cross of Toxoplasma gondii allowed us to demonstrate our pipeline. REDHORSE successfully extracted the relevant genetic markers and was able to transform the read alignments from NGS to the genome to generate multiple sequence alignments. Recombination detection algorithm in REDHORSE was able to detect conventional crossovers and double crossovers typically associated with gene conversions whilst filtering out artifacts that might have been introduced during sequencing or alignment. REDHORSE outperformed other commonly used recombination detection algorithms in finding conventional crossovers. In addition, REDHORSE was the only algorithm that was able to detect double crossovers.

Conclusion

REDHORSE is an efficient analytical pipeline that serves as a bridge between genomic alignments and existing recombination detection algorithms. Moreover, REDHORSE is equipped with a recombination detection algorithm specifically designed for Next-generation sequencing data. REDHORSE is portable, platform independent Java based utility that provides efficient analysis of genetic crosses based on Next-generation sequencing data. REDHORSE is available at http://redhorse.sourceforge.net/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1309-7) contains supplementary material, which is available to authorized users.  相似文献   

4.
"Protein Side-chain Packing" has an ever-increasing application in the field of bio-informatics, dating from the early methods of homology modeling to protein design and to the protein docking. However, this problem is computationally known to be NP-hard. In this regard, we have developed a novel approach to solve this problem using the notion of a maximum edge-weight clique. Our approach is based on efficient reduction of protein side-chain packing problem to a graph and then solving the reduced graph to find the maximum clique by applying an efficient clique finding algorithm developed by our co-authors. Since our approach is based on deterministic algorithms in contrast to the various existing algorithms based on heuristic approaches, our algorithm guarantees of finding an optimal solution. We have tested this approach to predict the side-chain conformations of a set of proteins and have compared the results with other existing methods. We have found that our results are favorably comparable or better than the results produced by the existing methods. As our test set contains a protein of 494 residues, we have obtained considerable improvement in terms of size of the proteins and in terms of the efficiency and the accuracy of prediction.  相似文献   

5.
The label switching problem occurs as a result of the nonidentifiability of posterior distribution over various permutations of component labels when using Bayesian approach to estimate parameters in mixture models. In the cases where the number of components is fixed and known, we propose a relabelling algorithm, an allocation variable-based (denoted by AVP) probabilistic relabelling approach, to deal with label switching problem. We establish a model for the posterior distribution of allocation variables with label switching phenomenon. The AVP algorithm stochastically relabel the posterior samples according to the posterior probabilities of the established model. Some existing deterministic and other probabilistic algorithms are compared with AVP algorithm in simulation studies, and the success of the proposed approach is demonstrated in simulation studies and a real dataset.  相似文献   

6.
BLAST is the most popular bioinformatics tool and is used to run millions of queries each day. However, evaluating such queries is slow, taking typically minutes on modern workstations. Therefore, continuing evolution of BLAST--by improving its algorithms and optimizations--is essential to improve search times in the face of exponentially increasing collection sizes. We present an optimization to the first stage of the BLAST algorithm specifically designed for protein search. It produces the same results as NCBI-BLAST but in around 59% of the time on Intel-based platforms; we also present results for other popular architectures. Overall, this is a saving of around 15% of the total typical BLAST search time. Our approach uses a deterministic finite automaton (DFA), inspired by the original scheme used in the 1990 BLAST algorithm. The techniques are optimized for modern hardware, making careful use of cache-conscious approaches to improve speed. Our optimized DFA approach has been integrated into a new version of BLAST that is freely available for download at http://www.fsa-blast.org/.  相似文献   

7.
MOTIVATION: Deciphering the location of gene duplications and multiple gene duplication episodes on the Tree of Life is fundamental to understanding the way gene families and genomes evolve. The multiple gene duplication problem provides a framework for placing gene duplication events onto nodes of a given species tree, and detecting episodes of multiple gene duplication. One version of the multiple gene duplication problem was defined by Guigó et al. in 1996. Several heuristic solutions have since been proposed for this problem, but no exact algorithms were known. RESULTS: In this article we solve this longstanding open problem by providing the first exact and efficient solution. We also demonstrate the improvement offered by our algorithm over the best heuristic approaches, by applying it to several simulated as well as empirical datasets.  相似文献   

8.
ABSTRACT: BACKGROUND: The estimation of parameter values for mathematical models of biological systems is an optimization problem that is particularly challenging due to the nonlinearities involved. One major difficulty is the existence of multiple minima in which standard optimization methods may fall during the search. Deterministic global optimization methods overcome this limitation, ensuring convergence to the global optimum within a desired tolerance. Global optimization techniques are typically classified into stochastic and deterministic. The former typically lead to lower CPU times but offer no guarantee of convergence to the global minimum in a finite number of iterations. In contrast, deterministic methods provide solutions of a given quality (i.e., optimality gap), but tend to lead to large computational burdens. RESULTS: This work presents a deterministic outer approximation-based algorithm for the global optimization of dynamic problems arising in the parameter estimation of models of biological systems. Our approach, which offers a theoretical guarantee of convergence to the global minimum, reformulating the set of ordinary differential equations into an equivalent set of algebraic equations through the use of orthogonal collocation methods, giving rise to a nonconvex nonlinear programming (NLP) problem. This nonconvex NLP is decomposed into two hierarchical levels: a master mixed-integer linear programming problem (MILP) that provides a rigorous lower bound on the optimal solution, and a reduced-space slave NLP that yields an upper bound. The algorithm iterates between these two levels until a termination criterion is satisfied. CONCLUSION: The capabilities of our approach were tested in two benchmark problems, in which the performance of our algorithm was compared with that of the commercial global optimization package BARON. The proposed strategy produced near optimal solutions (i.e., within a desired tolerance) in a fraction of the CPU time required by BARON.  相似文献   

9.
There are typically multiple heterogeneous servers providing various services in cloud computing. High power consumption of these servers increases the cost of running a data center. Thus, there is a problem of reducing the power cost with tolerable performance degradation. In this paper, we optimize the performance and power consumption tradeoff for multiple heterogeneous servers. We consider the following problems: (1) optimal job scheduling with fixed service rates; (2) joint optimal service speed scaling and job scheduling. For problem (1), we present the Karush-Kuhn-Tucker (KKT) conditions and provide a closed-form solution. For problem (2), both continuous speed scaling and discrete speed scaling are considered. In discrete speed scaling, the feasible service rates are discrete and bounded. We formulate the problem as an MINLP problem and propose a distributed algorithm by online value iteration, which has lower complexity than a centralized algorithm. Our approach provides an analytical way to manage the tradeoff between performance and power consumption. The simulation results show the gain of using speed scaling, and also prove the effectiveness and efficiency of the proposed algorithms.  相似文献   

10.
Evaluation of a particle swarm algorithm for biomechanical optimization   总被引:1,自引:0,他引:1  
Optimization is frequently employed in biomechanics research to solve system identification problems, predict human movement, or estimate muscle or other internal forces that cannot be measured directly. Unfortunately, biomechanical optimization problems often possess multiple local minima, making it difficult to find the best solution. Furthermore, convergence in gradient-based algorithms can be affected by scaling to account for design variables with different length scales or units. In this study we evaluate a recently-developed version of the particle swarm optimization (PSO) algorithm to address these problems. The algorithm's global search capabilities were investigated using a suite of difficult analytical test problems, while its scale-independent nature was proven mathematically and verified using a biomechanical test problem. For comparison, all test problems were also solved with three off-the-shelf optimization algorithms--a global genetic algorithm (GA) and multistart gradient-based sequential quadratic programming (SQP) and quasi-Newton (BFGS) algorithms. For the analytical test problems, only the PSO algorithm was successful on the majority of the problems. When compared to previously published results for the same problems, PSO was more robust than a global simulated annealing algorithm but less robust than a different, more complex genetic algorithm. For the biomechanical test problem, only the PSO algorithm was insensitive to design variable scaling, with the GA algorithm being mildly sensitive and the SQP and BFGS algorithms being highly sensitive. The proposed PSO algorithm provides a new off-the-shelf global optimization option for difficult biomechanical problems, especially those utilizing design variables with different length scales or units.  相似文献   

11.
Data striping and reliability aspects in distributed video servers   总被引:2,自引:0,他引:2  
To provide the required amount of storage and bandwidth, a video server typically comprises a large number of disks. As the total number of disks increases, the influence of the striping algorithm that determines how video data are distributed across the disks becomes decisive in terms of overall server cost and performance. Also introducing fault-tolerance against disk failures becomes a must. In this paper, we first evaluate different striping algorithms in terms of throughput, buffer requirement, and start-up latency for a non-fault-tolerant server. We then examine the impact of data striping on a fault-tolerant server and show that the striping policy and the optimal technique to assure fault-tolerance are related: Depending on the technique used to assure fault-tolerance (mirroring or parity), different striping techniques perform best. This revised version was published online in July 2006 with corrections to the Cover Date.  相似文献   

12.
Rasmussen TK  Krink T 《Bio Systems》2003,72(1-2):5-17
Multiple sequence alignment (MSA) is one of the basic problems in computational biology. Realistic problem instances of MSA are computationally intractable for exact algorithms. One way to tackle MSA is to use Hidden Markov Models (HMMs), which are known to be very powerful in the related problem domain of speech recognition. However, the training of HMMs is computationally hard and there is no known exact method that can guarantee optimal training within reasonable computing time. Perhaps the most powerful training method is the Baum-Welch algorithm, which is fast, but bears the problem of stagnation at local optima. In the study reported in this paper, we used a hybrid algorithm combining particle swarm optimization with evolutionary algorithms to train HMMs for the alignment of protein sequences. Our experiments show that our approach yields better alignments for a set of benchmark protein sequences than the most commonly applied HMM training methods, such as Baum-Welch and Simulated Annealing.  相似文献   

13.
14.
MOTIVATION: A major problem for current peak detection algorithms is that noise in mass spectrometry (MS) spectra gives rise to a high rate of false positives. The false positive rate is especially problematic in detecting peaks with low amplitudes. Usually, various baseline correction algorithms and smoothing methods are applied before attempting peak detection. This approach is very sensitive to the amount of smoothing and aggressiveness of the baseline correction, which contribute to making peak detection results inconsistent between runs, instrumentation and analysis methods. RESULTS: Most peak detection algorithms simply identify peaks based on amplitude, ignoring the additional information present in the shape of the peaks in a spectrum. In our experience, 'true' peaks have characteristic shapes, and providing a shape-matching function that provides a 'goodness of fit' coefficient should provide a more robust peak identification method. Based on these observations, a continuous wavelet transform (CWT)-based peak detection algorithm has been devised that identifies peaks with different scales and amplitudes. By transforming the spectrum into wavelet space, the pattern-matching problem is simplified and in addition provides a powerful technique for identifying and separating the signal from the spike noise and colored noise. This transformation, with the additional information provided by the 2D CWT coefficients can greatly enhance the effective signal-to-noise ratio. Furthermore, with this technique no baseline removal or peak smoothing preprocessing steps are required before peak detection, and this improves the robustness of peak detection under a variety of conditions. The algorithm was evaluated with SELDI-TOF spectra with known polypeptide positions. Comparisons with two other popular algorithms were performed. The results show the CWT-based algorithm can identify both strong and weak peaks while keeping false positive rate low. AVAILABILITY: The algorithm is implemented in R and will be included as an open source module in the Bioconductor project.  相似文献   

15.
An efficient rank based approach for closest string and closest substring   总被引:1,自引:0,他引:1  
Dinu LP  Ionescu R 《PloS one》2012,7(6):e37576
This paper aims to present a new genetic approach that uses rank distance for solving two known NP-hard problems, and to compare rank distance with other distance measures for strings. The two NP-hard problems we are trying to solve are closest string and closest substring. For each problem we build a genetic algorithm and we describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance have the best results.  相似文献   

16.
Protein NMR peak assignment refers to the process of assigning a group of "spin systems" obtained experimentally to a protein sequence of amino acids. The automation of this process is still an unsolved and challenging problem in NMR protein structure determination. Recently, protein NMR peak assignment has been formulated as an interval scheduling problem (ISP), where a protein sequence P of amino acids is viewed as a discrete time interval I (the amino acids on P one-to-one correspond to the time units of I), each subset S of spin systems that are known to originate from consecutive amino acids from P is viewed as a "job" j(s), the preference of assigning S to a subsequence P of consecutive amino acids on P is viewed as the profit of executing job j(s) in the subinterval of I corresponding to P, and the goal is to maximize the total profit of executing the jobs (on a single machine) during I. The interval scheduling problem is max SNP-hard in general; but in the real practice of protein NMR peak assignment, each job j(s) usually requires at most 10 consecutive time units, and typically the jobs that require one or two consecutive time units are the most difficult to assign/schedule. In order to solve these most difficult assignments, we present an efficient 13/7-approximation algorithm for the special case of the interval scheduling problem where each job takes one or two consecutive time units. Combining this algorithm with a greedy filtering strategy for handling long jobs (i.e., jobs that need more than two consecutive time units), we obtain a new efficient heuristic for protein NMR peak assignment. Our experimental study shows that the new heuristic produces the best peak assignment in most of the cases, compared with the NMR peak assignment algorithms in the recent literature. The above algorithm is also the first approximation algorithm for a nontrivial case of the well-known interval scheduling problem that breaks the ratio 2 barrier.  相似文献   

17.
18.
Networks of workstations offer large amounts of unused processing time. Resource management systems are able to exploit this computing capacity by assigning compute-intensive tasks to idle workstations. To avoid interferences between multiple, concurrently running applications, such resource management systems have to schedule application jobs carefully. Continuously arriving jobs and dynamically changing amounts of available CPU capacity make traditional scheduling algorithms difficult to apply in workstation networks. Online scheduling algorithms promise better results by adapting schedules to changing situations. This paper compares six online scheduling algorithms by simulating several workload scenarios. Based on the insights gained by simulation, the three online scheduling algorithms performing best were implemented in the Winner resource management system. Experiments conducted with Winner in a real workstation network confirm the simulation results obtained. This revised version was published online in July 2006 with corrections to the Cover Date.  相似文献   

19.
The PSO family: deduction, stochastic analysis and comparison   总被引:2,自引:0,他引:2  
The PSO algorithm can be physically interpreted as a stochastic damped mass-spring system: the so-called PSO continuous model. Furthermore, PSO corresponds to a particular discretization of the PSO continuous model. In this paper, we introduce a delayed version of the PSO continuous model, where the center of attraction might be delayed with respect to the particle trajectories. Based on this mechanical analogy, we derive a family of PSO versions. For each member of this family, we deduce the first and second order stability regions and the corresponding spectral radius. As expected, the PSO-family algorithms approach the PSO continuous model (damped-mass-spring system) as the size of the time step goes to zero. All the family members are linearly isomorphic when they are derived using the same delay parameter. If the delay parameter is different, the algorithms have corresponding stability regions of any order, but they differ in their respective force terms. All the PSO versions perform fairly well in a very broad area of the parameter space (inertia weight and local and global accelerations) that is close to the border of the second order stability regions and also to the median lines of the first order stability regions where no temporal correlation between trajectories exists. In addition, these regions are fairly similar for different benchmark functions. When the number of parameters of the cost function increases, these regions move towards higher inertia weight values (w=1) and lower total mean accelerations where the temporal covariance between trajectories is positive. Finally, the comparison between different PSO versions results in the conclusion that the centered version (CC-PSO) and PSO have the best convergence rates. Conversely, the centered-progressive version (CP-PSO) has the greatest exploratory capabilities. These features have to do with the way these algorithms update the velocities and positions of particles in the swarm. Knowledge of their respective dynamics can be used to propose a family of simple and stable algorithms, including hybrid versions.  相似文献   

20.
Optimizing amino acid conformation and identity is a central problem in computational protein design. Protein design algorithms must allow realistic protein flexibility to occur during this optimization, or they may fail to find the best sequence with the lowest energy. Most design algorithms implement side-chain flexibility by allowing the side chains to move between a small set of discrete, low-energy states, which we call rigid rotamers. In this work we show that allowing continuous side-chain flexibility (which we call continuous rotamers) greatly improves protein flexibility modeling. We present a large-scale study that compares the sequences and best energy conformations in 69 protein-core redesigns using a rigid-rotamer model versus a continuous-rotamer model. We show that in nearly all of our redesigns the sequence found by the continuous-rotamer model is different and has a lower energy than the one found by the rigid-rotamer model. Moreover, the sequences found by the continuous-rotamer model are more similar to the native sequences. We then show that the seemingly easy solution of sampling more rigid rotamers within the continuous region is not a practical alternative to a continuous-rotamer model: at computationally feasible resolutions, using more rigid rotamers was never better than a continuous-rotamer model and almost always resulted in higher energies. Finally, we present a new protein design algorithm based on the dead-end elimination (DEE) algorithm, which we call iMinDEE, that makes the use of continuous rotamers feasible in larger systems. iMinDEE guarantees finding the optimal answer while pruning the search space with close to the same efficiency of DEE. Availability: Software is available under the Lesser GNU Public License v3. Contact the authors for source code.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号