Similar Documents
20 similar documents found (search time: 15 ms)
1.
In recent years the genetic algorithm (GA) has been used successfully to solve many optimization problems. One of the most difficult questions in applying a GA to a particular problem is the choice of coding. In this paper a scheme is derived to optimize one aspect of the coding automatically, by using a high-cardinality alphabet and optimizing the meaning of the letters. The scheme is especially well suited to cases where a number of similar problems need to be solved. Its use is demonstrated on such a family of problems: the simplified task of navigating a 'robot' in a 'room.' For this sample problem family the proposed algorithm is shown to be superior to the canonical GA. Received: 26 August 1994 / Accepted in revised form: 13 January 1995
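
For reference, a minimal sketch of the canonical GA baseline mentioned above, maximizing a toy one-max bitstring fitness; the population size, operators, and rates are illustrative assumptions, not the paper's settings.

```python
import random

def canonical_ga(fitness, genome_len, pop_size=50, generations=100,
                 crossover_rate=0.9, mutation_rate=0.01):
    """Minimal canonical GA over fixed-length binary strings (illustrative)."""
    pop = [[random.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        next_pop = scored[:2]                      # elitism: keep the two best
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = max(random.sample(pop, 3), key=fitness)
            p2 = max(random.sample(pop, 3), key=fitness)
            child = p1[:]
            if random.random() < crossover_rate:   # one-point crossover
                cut = random.randrange(1, genome_len)
                child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [b ^ (random.random() < mutation_rate) for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

# Toy usage: maximize the number of 1-bits ("one-max").
best = canonical_ga(sum, genome_len=32)
print(sum(best))
```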

2.
MOTIVATION: STS-content data for genomic mapping contain numerous errors and anomalies resulting in cross-links among distant regions of the genome. Identification of contigs within the data is an important and difficult problem. RESULTS: This paper introduces a graph algorithm that creates a simplified view of STS-content data. The shape of the resulting structure graph provides a quality check: coherent data produce a straight line, while anomalous data produce branches and loops. In the latter case, it is sometimes possible to disentangle the various paths into subsets of the data covering contiguous regions of the genome, i.e. contigs. These straight subgraphs can then be analyzed in standard ways to construct a physical map. A theoretical basis for the method is presented along with examples of its application to current STS data from human genome centers. AVAILABILITY: Freely available on request.
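
As a hedged illustration of the quality check described (the paper's actual graph construction is richer), the sketch below classifies an undirected marker-adjacency graph: a vertex of degree greater than 2 signals a branch, a cycle signals a loop, and otherwise every connected component is a straight path, i.e. a contig candidate. The adjacency construction from STS data is an assumption for illustration.

```python
from collections import defaultdict

def classify_structure_graph(edges):
    """Classify an undirected graph as 'path' (coherent), 'branched',
    or 'looped' (anomalous data).

    edges: iterable of (u, v) pairs, e.g. STS markers linked by shared clones.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    if any(len(nbrs) > 2 for nbrs in adj.values()):
        return "branched"   # degree >2 signals cross-linked regions
    if m >= n:              # as many edges as vertices => at least one cycle
        return "looped"
    # All degrees <= 2 and acyclic: each component is a straight path (contig).
    return "path"

print(classify_structure_graph([("a", "b"), ("b", "c"), ("c", "d")]))  # path
```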

3.
MOTIVATION: Dynamic programming is probably the most popular programming method in bioinformatics. Sequence comparison, gene recognition, RNA structure prediction and hundreds of other problems are solved by ever-new variants of dynamic programming. Currently, the development of a successful dynamic programming algorithm is a matter of experience, talent and luck. The typical matrix recurrence relations that make up a dynamic programming algorithm are intricate to construct and difficult to implement reliably. No general, problem-independent guidance is available. RESULTS: This article introduces a systematic method for constructing dynamic programming solutions to problems in biosequence analysis. By a conceptual splitting of the algorithm into a recognition and an evaluation phase, algorithm development is simplified considerably, and correct recurrences can be derived systematically. Without additional effort, the method produces an early, executable prototype expressed in a functional programming language. The method is quite generally applicable, and, while programming effort decreases, no overhead in terms of ultimate program efficiency is incurred.
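
A hedged sketch of the recognition/evaluation split on the smallest classical example, unit-cost edit distance: the recurrence (recognition of all match/insert/delete decompositions) is written once, and the evaluation phase is supplied as an interchangeable "algebra", so the same recurrence scores alignments or counts edit scripts. The algebra interface is an illustrative simplification of the article's method, not its formalism.

```python
from functools import lru_cache

def edit_dp(x, y, algebra):
    """Edit-distance-style DP with the evaluation phase factored out:
    the recurrence (recognition of all match/insert/delete decompositions)
    is fixed; `algebra` decides what is computed."""
    match, gap, choice = algebra["match"], algebra["gap"], algebra["choice"]

    @lru_cache(maxsize=None)
    def d(i, j):
        if i == 0 and j == 0:
            return algebra["empty"]
        cands = []
        if i > 0 and j > 0:
            cands.append(match(d(i - 1, j - 1), x[i - 1], y[j - 1]))
        if i > 0:
            cands.append(gap(d(i - 1, j)))
        if j > 0:
            cands.append(gap(d(i, j - 1)))
        return choice(cands)

    return d(len(x), len(y))

# Scoring algebra: classical unit-cost edit distance.
score = dict(empty=0, match=lambda s, a, b: s + (a != b),
             gap=lambda s: s + 1, choice=min)
# Counting algebra: number of distinct edit scripts, same recurrence.
count = dict(empty=1, match=lambda s, a, b: s, gap=lambda s: s, choice=sum)

print(edit_dp("GATTACA", "GCATGCU", score),
      edit_dp("GATTACA", "GCATGCU", count))
```

Swapping `score` for `count` changes what is computed without touching the recurrence, which is the point of the split.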

4.
MOTIVATION: A major problem for current peak detection algorithms is that noise in mass spectrometry (MS) spectra gives rise to a high rate of false positives. The false positive rate is especially problematic in detecting peaks with low amplitudes. Usually, various baseline correction algorithms and smoothing methods are applied before attempting peak detection. This approach is very sensitive to the amount of smoothing and the aggressiveness of the baseline correction, which contribute to making peak detection results inconsistent between runs, instrumentation and analysis methods. RESULTS: Most peak detection algorithms simply identify peaks based on amplitude, ignoring the additional information present in the shape of the peaks in a spectrum. In our experience, 'true' peaks have characteristic shapes, and a shape-matching function that provides a 'goodness of fit' coefficient should yield a more robust peak identification method. Based on these observations, a continuous wavelet transform (CWT)-based peak detection algorithm has been devised that identifies peaks with different scales and amplitudes. Transforming the spectrum into wavelet space simplifies the pattern-matching problem and additionally provides a powerful technique for identifying and separating the signal from spike noise and colored noise. This transformation, together with the additional information provided by the 2D CWT coefficients, can greatly enhance the effective signal-to-noise ratio. Furthermore, with this technique no baseline removal or peak smoothing preprocessing steps are required before peak detection, which improves the robustness of peak detection under a variety of conditions. The algorithm was evaluated on SELDI-TOF spectra with known polypeptide positions. Comparisons with two other popular algorithms were performed. The results show the CWT-based algorithm can identify both strong and weak peaks while keeping the false positive rate low. AVAILABILITY: The algorithm is implemented in R and will be included as an open source module in the Bioconductor project.
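
The authors' implementation is in R; as a hedged illustration of CWT-based peak picking, SciPy exposes a ridge-line detector in the same spirit (scipy.signal.find_peaks_cwt). The synthetic spectrum, noise, baseline, and width range below are illustrative.

```python
import numpy as np
from scipy.signal import find_peaks_cwt

# Synthetic "spectrum": two Gaussian peaks plus noise and a drifting baseline.
x = np.linspace(0, 100, 2000)
spectrum = (np.exp(-((x - 30) ** 2) / 2.0)
            + 0.4 * np.exp(-((x - 70) ** 2) / 8.0)
            + 0.02 * x
            + 0.05 * np.random.default_rng(0).normal(size=x.size))

# CWT-based detection: peaks are ridge lines persisting across scales,
# so no explicit baseline removal or smoothing step is applied first.
peak_idx = find_peaks_cwt(spectrum, widths=np.arange(5, 80))
print(x[peak_idx])   # should land near x = 30 and x = 70
```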

5.
MOTIVATION: Protein-protein complexes are known to play key roles in many cellular processes. However, they are often not accessible to experimental study because of their low stability and the difficulty of producing the proteins and assembling them in their native conformation. Thus, docking algorithms have been developed to provide an in silico approach to the problem. A protein-protein docking procedure traditionally consists of two successive tasks: a search algorithm generates a large number of candidate solutions, and then a scoring function is used to rank them. RESULTS: To address the second step, we developed a scoring function based on a Voronoi tessellation of the protein three-dimensional structure. We showed that the Voronoi representation may be used to describe, in a simplified but useful manner, the geometric and physico-chemical complementarities of two molecular surfaces. We measured a set of parameters on native protein-protein complexes and on decoys, and used them as attributes in several statistical learning procedures: a logistic function, Support Vector Machines (SVM), and a genetic algorithm. For the latter, we used ROGER, a genetic algorithm designed to optimize the area under the receiver operating characteristics curve. To further test the scores derived with ROGER, we ranked models generated by two different docking algorithms on targets of a blind prediction experiment, improving in almost all cases the rank of native-like solutions. AVAILABILITY: http://genomics.eu.org/spip/-Bioinformatics-tools-
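
A hedged sketch of the scoring step: fit a logistic function on per-candidate attributes (here random placeholders standing in for Voronoi-derived interface measurements) and rank models by predicted probability of being native-like. scikit-learn's logistic regression stands in for the paper's learning procedures; ROGER's AUC-optimizing GA is not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder attributes per candidate conformation (hypothetical): e.g.
# Voronoi cell volumes, interface areas, residue-pair frequencies.
X_native = rng.normal(loc=1.0, size=(50, 4))    # native-like decoys
X_decoy = rng.normal(loc=0.0, size=(500, 4))    # incorrect decoys
X = np.vstack([X_native, X_decoy])
y = np.array([1] * len(X_native) + [0] * len(X_decoy))

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Rank candidate models by predicted probability of being native-like.
candidates = rng.normal(size=(10, 4))
ranking = np.argsort(-clf.predict_proba(candidates)[:, 1])
print(ranking)
```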

6.
We present LaTcOm, a new web tool, which offers several alternative methods for 'rare codon cluster' (RCC) identification from a single and simple graphical user interface. In the current version, three RCC detection schemes are implemented: the recently described %MinMax algorithm and a simplified sliding-window approach, along with a novel modification of a linear-time algorithm for the detection of maximally scoring subsequences tailored to the RCC detection problem. Among a number of user-tunable parameters, several codon-based scales relevant for RCC detection are available, including tRNA abundance values from Escherichia coli and several codon usage tables from a selection of genomes. Furthermore, useful scale transformations may be performed upon user request (e.g. linear, sigmoid). Users may choose to visualize RCC positions within the submitted sequences either with graphical representations or in textual form for further processing. AVAILABILITY: LaTcOm is freely available online at the URL http://troodos.biol.ucy.ac.cy/latcom.html.
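
A hedged sketch of the simplest of the three schemes, the sliding-window approach: average a per-codon score within a fixed window and report windows falling below a rarity threshold. The scale values, window size, and threshold are illustrative, not LaTcOm's defaults.

```python
def sliding_window_rcc(codons, scale, window=10, threshold=0.25):
    """Report windows whose mean codon score falls below `threshold`.

    codons: list of codon strings; scale: dict codon -> abundance-like
    score (low = rare). Illustrative stand-in for one of LaTcOm's schemes.
    """
    hits = []
    for i in range(len(codons) - window + 1):
        mean = sum(scale[c] for c in codons[i:i + window]) / window
        if mean < threshold:
            hits.append((i, mean))   # window start (codon index) and score
    return hits

# Toy usage with a fake two-codon scale (hypothetical values).
toy_scale = {"AAA": 0.9, "AGG": 0.1}
seq = ["AAA"] * 20 + ["AGG"] * 12 + ["AAA"] * 20
print(sliding_window_rcc(seq, toy_scale))
```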

7.
Alignment of RNA base pairing probability matrices   Cited by: 6 (self-citations: 0, cited by others: 6)
MOTIVATION: Many classes of functional RNA molecules are characterized by highly conserved secondary structures but little detectable sequence similarity. Reliable multiple alignments can therefore be constructed only when the shared structural features are taken into account. Since multiple alignments are used as input for many subsequent methods of data analysis, structure-based alignments are indispensable in RNA bioinformatics. RESULTS: We present here a method to compute pairwise and progressive multiple alignments from the direct comparison of base pairing probability matrices. Instead of attempting to solve the folding and alignment problems simultaneously, as in the classical Sankoff algorithm, we use McCaskill's approach to compute base pairing probability matrices that effectively incorporate information on the energetics of each sequence. A novel, simplified variant of the Sankoff algorithm can then be employed to extract the maximum-weight common secondary structure and an associated alignment. AVAILABILITY: The programs pmcomp and pmmulti described in this contribution are implemented in Perl and can be downloaded together with the example datasets from http://www.tbi.univie.ac.at/RNA/PMcomp/. A web server is available at http://rna.tbi.univie.ac.at/cgi-bin/pmcgi.pl
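
A hedged, drastic simplification for intuition: pmcomp aligns the probability matrices themselves with a Sankoff-like recursion, whereas the sketch below only aligns per-position pairedness profiles derived from such matrices with plain Needleman-Wunsch. The scores and toy matrices are illustrative.

```python
def pairedness_profile(bpp, n):
    """Per-position probability of being paired, from a base pairing
    probability matrix given as a dict {(i, j): p} with i < j (0-based)."""
    prof = [0.0] * n
    for (i, j), p in bpp.items():
        prof[i] += p
        prof[j] += p
    return prof

def align_profiles(a, b, gap=-0.5):
    """Needleman-Wunsch on pairedness profiles: the match score rewards
    positions with similar pairing propensity. Returns alignment score."""
    m, n = len(a), len(b)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = dp[i - 1][j - 1] + (1.0 - abs(a[i - 1] - b[j - 1]))
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[m][n]

# Toy matrices: both "sequences" pair positions (0, 4) and (1, 3).
p1 = pairedness_profile({(0, 4): 0.9, (1, 3): 0.6}, 5)
p2 = pairedness_profile({(0, 4): 0.8, (1, 3): 0.7}, 5)
print(align_profiles(p1, p2))
```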

8.

Rationale

Identification and characterization of asthma phenotypes are challenging due to disease complexity and heterogeneity. The Severe Asthma Research Program (SARP) used unsupervised cluster analysis to define 5 phenotypically distinct asthma clusters that they replicated using 3 variables in a simplified algorithm. We evaluated whether this simplified SARP algorithm could be used in a separate and diverse urban asthma population to recreate these 5 phenotypic clusters.

Methods

The SARP simplified algorithm was applied to adults with asthma recruited to the New York University/Bellevue Asthma Registry (NYUBAR) to classify patients into five groups. The clinical phenotypes were summarized and compared.

Results

Asthma subjects in NYUBAR (n = 471) were predominantly women (70%) and Hispanic (57%), demographically different from the SARP population. The clinical phenotypes of the five groups generated by the simplified SARP algorithm were distinct across groups and distributed similarly to those described for the SARP population. Groups 1 and 2 (6 and 63%, respectively) had predominantly childhood-onset atopic asthma. Groups 4 and 5 (20%) were older, with the longest duration of asthma, increased symptoms and exacerbations. Group 4 subjects were the most atopic and had the highest peripheral eosinophil counts. Group 3 (10%) had the least atopy, but included older obese women with adult-onset asthma and increased exacerbations.

Conclusions

Application of the simplified SARP algorithm to the NYUBAR yielded groups that were phenotypically distinct and useful for characterizing disease heterogeneity. Differences across the NYUBAR groups support phenotypic variation and the use of the simplified SARP algorithm for classifying asthma phenotypes in future prospective studies investigating treatment and outcome differences between these distinct groups.

Trial Registration

Clinicaltrials.gov NCT00212537

9.
MOTIVATION: Protein and DNA are generally represented by sequences of letters. In a number of circumstances, simplified alphabets (where two or more letters are represented by the same symbol) have proved their potential utility in several fields of bioinformatics, including searching for patterns occurring at an unexpected rate, studying protein folding and finding consensus sequences in multiple alignments. The main issue addressed in this paper is the possibility of finding a general approach that allows an exhaustive analysis of all possible simplified alphabets, using substitution matrices such as PAM and BLOSUM as the scoring measure. RESULTS: The computational approach presented in this paper has led to a computer program called AlphaSimp (Alphabet Simplifier) that performs an exhaustive analysis of the possible simplified amino acid alphabets, using a branch-and-bound algorithm together with standard or user-defined substitution matrices. The program returns a ranked list of the highest-scoring simplified alphabets. When the extent of the simplification is limited and the simplified alphabets retain more than ten symbols, the program is able to complete the analysis in minutes or even seconds on a personal computer. For highly simplified alphabets, however, the analysis can take up to several hours. AVAILABILITY: AlphaSimp and other accessory programs are available at http://bioinformatics.cribi.unipd.it/alphasimp
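
A hedged sketch of one plausible scoring for a candidate simplified alphabet: sum substitution-matrix values over all letter pairs merged into the same symbol, so conservative mergers score high. It assumes Biopython for the BLOSUM62 matrix, and the objective is an illustrative stand-in, not AlphaSimp's exact measure; a real exhaustive search would wrap this in branch and bound.

```python
from itertools import combinations
from Bio.Align import substitution_matrices

blosum62 = substitution_matrices.load("BLOSUM62")

def alphabet_score(groups):
    """Score a simplified alphabet: sum of BLOSUM62 values over all pairs
    of letters merged into the same symbol (one plausible objective; the
    paper's exact measure may differ)."""
    total = 0.0
    for group in groups:
        for a, b in combinations(group, 2):
            total += blosum62[a, b]
    return total

# Two candidate simplifications of a toy 7-letter alphabet.
print(alphabet_score(["ILV", "FY", "KR"]))   # hydrophobic / aromatic / basic
print(alphabet_score(["IKY", "FLR", "V"]))   # an arbitrary alternative
```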

10.
MOTIVATION: A new heuristic algorithm is needed for solving the DNA sequencing by hybridization (SBH) problem in the presence of both positive and negative errors. RESULTS: A heuristic algorithm based on the tabu search method is presented that provides better solutions than algorithms known from the literature.
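
For flavor, a generic tabu search skeleton (not the authors' algorithm; the neighborhood, move encoding, and tabu tenure are illustrative): recently used moves are forbidden for a fixed number of iterations so the search can escape local optima.

```python
def tabu_search(initial, neighbors, cost, iters=1000, tabu_len=20):
    """Generic tabu-search skeleton.

    neighbors(s) yields (move, new_state) pairs; recently applied moves
    are kept on a fixed-length tabu list and skipped.
    """
    best = current = initial
    tabu = []
    for _ in range(iters):
        candidates = [(m, s) for m, s in neighbors(current) if m not in tabu]
        if not candidates:
            break
        move, current = min(candidates, key=lambda ms: cost(ms[1]))
        tabu.append(move)
        if len(tabu) > tabu_len:
            tabu.pop(0)                 # forget the oldest tabu move
        if cost(current) < cost(best):
            best = current
    return best

# Toy usage: minimize (x - 7)^2 with unit moves.
print(tabu_search(0, lambda x: [("+1", x + 1), ("-1", x - 1)],
                  lambda x: (x - 7) ** 2, iters=50))
```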

11.
Translation initiation start prediction in human cDNAs with high accuracy   Cited by: 3 (self-citations: 0, cited by others: 3)
MOTIVATION: Correct identification of the Translation Initiation Start (TIS) in cDNA sequences is an important issue for genome annotation. The aim of this work is to improve upon current methods and provide a performance-guaranteed prediction. METHODS: This is achieved by using two modules, one sensitive to the conserved motif and the other sensitive to the coding/non-coding potential around the start codon. Both modules are based on Artificial Neural Networks (ANNs). Applying a simplified form of the ribosome scanning model, the algorithm performs a linear search from the beginning of the coding ORF and stops once the combination of the two modules yields a positive score. RESULTS: On the test set, 94% of the TIS were correctly predicted. A confident decision is obtained through the use of the Las Vegas algorithm idea; its incorporation leads to highly accurate recognition of the TIS in human cDNAs in 60% of the cases. AVAILABILITY: The program is available upon request from the author.
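
A hedged sketch of the scanning logic just described: walk candidate start codons 5' to 3' and accept the first whose combined module score is positive. The two toy scorers below (a Kozak-like motif check and a crude GC-based coding signal) are hypothetical stand-ins for the paper's ANN modules.

```python
def predict_tis(seq, motif_score, coding_score, threshold=0.0):
    """Linear scan over candidate start codons, in the spirit of the
    ribosome scanning model: report the first ATG whose combined module
    score exceeds the threshold (illustrative sketch)."""
    for i in range(len(seq) - 2):
        if seq[i:i + 3] == "ATG":
            if motif_score(seq, i) + coding_score(seq, i) > threshold:
                return i
    return None

# Toy scorers (hypothetical): reward a purine at -3 and G at +4
# (Kozak-like), and a GC-rich downstream window as a crude coding signal.
kozak = lambda s, i: (i >= 3 and s[i - 3] in "AG") + (s[i + 3:i + 4] == "G") - 1.0
gc = lambda s, i: (s[i:i + 60].count("G") + s[i:i + 60].count("C")) / 30.0 - 1.0

print(predict_tis("CCCACCATGGCGGCGGCGTAG", kozak, gc))  # -> 6
```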

12.
A key problem in molecular biology is to infer regulatory relationships between genes from expression data. This paper studies a simplified model of such inference problems in which one or more Boolean variables, modeling, for example, the expression levels of genes, each depend deterministically on a small but unknown subset of a large number of Boolean input variables. Our model assumes that the expression data comprises a time series, in which successive samples may be correlated. We provide bounds on the expected amount of data needed to infer the correct relationships between output and input variables. These bounds improve and generalize previous results for Boolean network inference and continuous-time switching network inference. Although the computational problem is intractable in general, we describe a fixed-parameter tractable algorithm that is guaranteed to provide at least a partial solution to the problem. Most interestingly, both the sample complexity and computational complexity of the problem depend on the strength of correlations between successive samples in the time series but in opposing ways. Uncorrelated samples minimize the total number of samples needed while maximizing computational complexity; a strong correlation between successive samples has the opposite effect. This observation has implications for the design of experiments for measuring gene expression.
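
A hedged sketch of the inference task in its simplest form: for one output variable, search input subsets of bounded size for one that determines the output consistently across all samples, reflecting the fixed-parameter flavor of the problem (the time-series correlation effects discussed above are beyond this toy).

```python
from itertools import combinations

def infer_inputs(samples, k_max=3):
    """Find a smallest input subset that determines the output consistently.

    samples: list of (input_tuple, output_bit) observations, e.g. successive
    time-series states. Exhaustive over subsets up to size k_max.
    """
    n = len(samples[0][0])
    for k in range(k_max + 1):
        for subset in combinations(range(n), k):
            table = {}
            ok = True
            for inputs, out in samples:
                key = tuple(inputs[i] for i in subset)
                if table.setdefault(key, out) != out:
                    ok = False      # same projected inputs, different output
                    break
            if ok:
                return subset
    return None

# Toy: the output is x0 AND x2 among 4 candidate inputs.
data = [((a, b, c, d), a & c)
        for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1)]
print(infer_inputs(data))  # -> (0, 2)
```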

13.
J. M. Gibson, Bio Systems, 1989, 23(2-3): 219-28; discussion 229
A highly simplified evolving system was investigated by computer simulation. The genetic complement of each simulated organism in the population was represented by a single chromosome that consisted of a string of symbols. Individual fitness was measured as the number of symbols that corresponded to a specified rule. Reproduction was simulated with a non-breeding algorithm and two variants of a breeding algorithm, and was subject to random point mutations. In each generation, selection was effected by replacing the less fit members of the population with offspring of the more fit. The size of the population and the fraction replaced, though under experimental control, were constant for each simulation run. It was found that even such a simplified system is able to mimic a variety of properties observed in natural systems. In addition, the effect of the simulation parameters on the course of fitness increase provides a basis for using a genetic algorithm as an optimization technique.

14.
Zhang Li, You Yisha, Fu Yongqi, Xu Zongwei, Fang Fengzhou, Plasmonics (Norwell, Mass.), 2018, 13(1): 265-268
A simplified and cost-effective optical absorber is designed by optimization based on the finite-difference time-domain (FDTD) algorithm. It is suitable for fabrication...
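
For orientation, a minimal 1D FDTD (Yee) update loop in vacuum and normalized units; a generic illustration of the method such absorber designs build on, not the paper's 3D simulation.

```python
import numpy as np

# Minimal 1D FDTD (Yee scheme) in vacuum, normalized units (Courant = 0.5).
nx, steps = 200, 400
ez = np.zeros(nx)          # electric field on integer grid points
hy = np.zeros(nx - 1)      # magnetic field, staggered half a cell
for t in range(steps):
    hy += 0.5 * np.diff(ez)            # update H from the curl of E
    ez[1:-1] += 0.5 * np.diff(hy)      # update E from the curl of H
    ez[nx // 2] += np.exp(-((t - 30) ** 2) / 100.0)  # soft Gaussian source
print(float(ez.max()))
```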

15.
Finding control strategies for cells is a challenging and important problem in the post-genomic era. This paper considers theoretical aspects of the control problem using the Boolean network (BN), a simplified model of genetic networks. It is shown that finding a control strategy leading to the desired global state is computationally intractable (NP-hard) in general. Furthermore, this hardness result is extended to BNs with considerably restricted network structures. These results justify existing exponential-time algorithms for finding control strategies for probabilistic Boolean networks (PBNs). On the other hand, this paper shows that the control problem can be solved in polynomial time if the network has a tree structure. This algorithm is then extended to the case where the network has a few loops and the number of time steps is small. Though this paper focuses on theoretical aspects, biological implications of the theoretical results are also discussed.
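
A hedged sketch of why the general control problem blows up: the naive approach below enumerates all control sequences, which is exponential in the number of control nodes times the horizon; the paper's contribution is avoiding this when the network is a tree. The toy network is illustrative.

```python
from itertools import product

def find_control(step, x0, target, m, T):
    """Brute-force search for a control sequence driving a Boolean network
    from state x0 to `target` in T steps; step(x, u) applies one update
    with a control vector u of m bits. Exponential in m*T (illustrative)."""
    for seq in product([0, 1], repeat=m * T):
        x = x0
        for t in range(T):
            x = step(x, seq[t * m:(t + 1) * m])
        if x == target:
            return [seq[t * m:(t + 1) * m] for t in range(T)]
    return None

# Toy 2-node network with controls u: x1' = u1, x2' = x1 AND u2.
step = lambda x, u: (u[0], x[0] & u[1])
print(find_control(step, (0, 0), (1, 1), m=2, T=2))
```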

16.
Sorting permutations by transpositions is an important problem in genome rearrangements. A transposition is a rearrangement operation in which a segment is cut out of the permutation and pasted in a different location. The complexity of this problem is still open, and improving on the best known 1.5-approximation algorithm had remained an open problem for a decade. In this paper, we provide a 1.375-approximation algorithm for sorting by transpositions. The algorithm is based on a new upper bound on the diameter of 3-permutations. In addition, we present some new results regarding the transposition diameter: we improve the lower bound for the transposition diameter of the symmetric group and determine the exact transposition diameter of simple permutations.
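
A hedged sketch of the basic objects: applying a transposition and counting breakpoints, which gives the simple ceil(b/3) lower bound on transposition distance (the 1.375-approximation itself rests on a finer, cycle-based analysis of the breakpoint graph).

```python
import math

def apply_transposition(perm, i, j, k):
    """Cut perm[i:j] and paste it before position k (i < j <= k), 0-based."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]

def breakpoints(perm):
    """Count adjacencies broken relative to the identity permutation.

    Each transposition can remove at most 3 breakpoints, so ceil(b/3)
    lower-bounds the transposition distance.
    """
    ext = [0] + list(perm) + [len(perm) + 1]    # frame with 0 and n+1
    return sum(1 for a, b in zip(ext, ext[1:]) if b - a != 1)

p = (3, 1, 2, 4)
print(apply_transposition(list(p), 0, 1, 3))        # [1, 2, 3, 4]
print(breakpoints(p), math.ceil(breakpoints(p) / 3))  # 3 breakpoints, bound 1
```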

17.
A generalization of the Stern theory is derived to treat the simultaneous adsorption of monovalent cations and divalent cations by single-component phospholipid membranes, where the ion:phospholipid binding stoichiometries are 1:1 for the monovalent cations and 1:1 and/or 1:2 for the divalent cations. This study treats both the situation in which the monovalent and divalent cations compete for membrane binding sites and that in which they do not compete. The general formalism of the screening/binding problem is reviewed, and it is shown how the adsorption problem can be isolated from the electrostatics. The statistical mechanics of mixed 1:1- and 1:2-stoichiometric adsorption (the monomer-dimer problem) is treated, and the problem of simultaneous 1:1 and 1:2 binding is solved. A simple expression for this solution, given in the Bethe approximation, is combined with the electrostatics to yield an adsorption isotherm encompassing both 1:1 monovalent-cation, and 1:1 and 1:2 divalent-cation, binding to charged membranes. A comparison with the simplified treatment of previous authors is made and the significance of their assumptions clarified in light of the present result. The present and previous treatments are plotted for a representative case of Na+ and Ca++ binding to a phosphatidylserine membrane. Criteria are established to permit unambiguous experimental testing of the present vs. previous treatments.
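
For orientation, a hedged sketch of the simplest competitive case that such treatments generalize: the classical Stern-Langmuir isotherm for a monovalent ion M+ and a divalent ion D2+ competing 1:1 for the same sites. The notation is illustrative; the paper's result additionally handles 1:2 stoichiometry via the Bethe approximation.

```latex
% Boltzmann-weighted surface concentrations in the surface potential \psi_0:
[\mathrm{M}]_s = [\mathrm{M}]\, e^{-e\psi_0/kT}, \qquad
[\mathrm{D}]_s = [\mathrm{D}]\, e^{-2e\psi_0/kT}
% Competitive 1:1 Langmuir (Stern) occupancies of the binding sites:
\theta_\mathrm{M} = \frac{K_\mathrm{M}[\mathrm{M}]_s}
                         {1 + K_\mathrm{M}[\mathrm{M}]_s + K_\mathrm{D}[\mathrm{D}]_s},
\qquad
\theta_\mathrm{D} = \frac{K_\mathrm{D}[\mathrm{D}]_s}
                         {1 + K_\mathrm{M}[\mathrm{M}]_s + K_\mathrm{D}[\mathrm{D}]_s}
```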

18.
Identification of the elastic symmetry of bone and other materials   Cited by: 1 (self-citations: 0, cited by others: 1)
A simplified classification scheme for the elastic symmetries of a solid is applied to the identification of the elastic symmetry of a material by three different methods: visual, stereological, and a numerical algorithm. Each method is illustrated with an application to bone tissues, but the methods apply to all materials.

19.
We study the Simplified Partial Digest Problem (SPDP), a mathematical model for a new simplified partial digest method of genome mapping. This method is easy to implement in the laboratory and robust with respect to experimental errors. SPDP is NP-hard in the strong sense. We present an O(n·2^n) time enumerative algorithm and an O(n^(2q)) time dynamic programming algorithm for the error-free SPDP, where n is the number of restriction sites and q is the number of distinct intersite distances. We also give examples of the problem in which there are 2^((n+2)/3)-1 non-congruent solutions. These examples partially answer a question recently posed in the literature about the number of solutions of SPDP. We adapt our enumerative algorithm to handle SPDP with imprecise input data. Finally, we describe and discuss the results of computer experiments with our algorithms.
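
For flavor, a compact backtracking solver for the classical partial digest (turnpike) problem, of which SPDP is a restricted variant; this is the textbook algorithm, not the authors' enumerative or dynamic programming method. Note that it can return mirror-image solutions, which is why counting non-congruent solutions, as above, requires care.

```python
from collections import Counter

def turnpike(distances):
    """Backtracking for the classical partial digest (turnpike) problem:
    recover point positions from the multiset of pairwise distances."""
    D = Counter(distances)
    width = max(D)
    points = {0, width}
    D[width] -= 1
    solutions = []

    def place():
        if sum(D.values()) == 0:
            solutions.append(sorted(points))
            return
        # The largest remaining distance must reach one of the two ends.
        y = max(d for d, c in D.items() if c > 0)
        for cand in {y, width - y}:
            deltas = Counter(abs(cand - p) for p in points)
            if all(D[d] >= c for d, c in deltas.items()):
                for d, c in deltas.items():
                    D[d] -= c
                points.add(cand)
                place()
                points.discard(cand)
                for d, c in deltas.items():
                    D[d] += c

    place()
    return solutions

# Distances of {0, 2, 4, 7, 10}; mirror solutions may appear as well.
print(turnpike([2, 2, 3, 3, 4, 5, 6, 7, 8, 10]))
```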

20.
This paper investigates the problems related to kinematic redundancy in both task and joint space for arm prehension movements. After a detailed analysis of the kinematic redundancy of the arm, it is shown that the redundancy problem is ill-posed only for the control of hand orientation. An experiment was then designed to investigate the influence of hand orientation on the control of arm movements. Since movements must be made within the limits of the joints, the influence of these limits was also analyzed quantitatively. The results of the experiment confirm that the increase in movement time caused by a change of object orientation is due to the deceleration phase lengthening disproportionately to the rest of the movement. The variation of hand path due to the change of object orientation was surprisingly small for some subjects, despite the large range of object orientations, implying that hand path and hand orientation could be controlled separately, thus simplifying the computational problem of inverse kinematics. Moreover, the observations from the present experiment strongly suggest that a functional segmentation of the proximal and distal joints exists and that the control of wrist motion is dissociated from the rest of the joint motions. The contribution of each joint to the control of arm movements could be determined through the principles of minimum energy and minimum discomfort under the constraints of the joint limits. A simplified inverse kinematics model was tested, showing that these hypotheses can be easily implemented in a geometric algorithm and used to predict arm prehension postures reasonably well under the constraints of joint limits. Received: 6 August 1998 / Accepted in revised form: 16 December 1998
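
As a hedged, minimal illustration of a geometric inverse kinematics computation (far simpler than the arm model above): the closed-form solution for a planar two-link arm, where a joint-limit test would then select between the elbow-up and elbow-down branches.

```python
import math

def two_link_ik(x, y, l1, l2, elbow_up=True):
    """Closed-form geometric IK for a planar two-link arm.

    Returns (shoulder, elbow) angles reaching wrist position (x, y),
    or None if the target is out of reach.
    """
    d2 = x * x + y * y
    c2 = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(c2) > 1:
        return None                 # outside the reachable annulus
    t2 = math.acos(c2) * (1 if elbow_up else -1)
    t1 = math.atan2(y, x) - math.atan2(l2 * math.sin(t2),
                                       l1 + l2 * math.cos(t2))
    return t1, t2

print(two_link_ik(1.2, 0.5, 1.0, 1.0))
```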
