Similar Articles
20 similar articles found (search time: 31 ms)
1.
In many phylogenetic problems, assuming that species have evolved from a common ancestor by a simple branching process is unrealistic. Reticulate phylogenetic models, however, have been largely neglected because the concept of reticulate evolution has not been supported by appropriate analytical tools and software. The reticulate model can adequately describe such complicated mechanisms as hybridization between species or lateral gene transfer in bacteria. In this paper, we describe a new algorithm for inferring reticulate phylogenies from evolutionary distances among species. The algorithm is capable of detecting contradictory signals encompassed in a phylogenetic tree and identifying possible reticulate events that may have occurred during evolution. The algorithm produces a reticulate phylogeny by gradually improving upon the initial solution provided by a phylogenetic tree model. The new algorithm is compared to the popular SplitsGraph method in a reanalysis of the evolution of photosynthetic organisms. A computer program to construct and visualize reticulate phylogenies, called T-Rex (Tree and Reticulogram Reconstruction), is available to researchers at the following URL: www.fas.umontreal.ca/biol/casgrain/en/labo/t-rex.

2.
Motivation: Finding a good network null model for protein–protein interaction (PPI) networks is a fundamental issue. Such a model would provide insights into the interplay between network structure and biological function as well as into evolution. Also, network (graph) models are used to guide biological experiments and discover new biological features. It has been proposed that geometric random graphs are a good model for PPI networks. In a geometric random graph, nodes correspond to uniformly randomly distributed points in a metric space, and edges (links) exist between pairs of nodes for which the corresponding points in the metric space are close enough according to some distance norm. Computational experiments have revealed close matches between key topological properties of PPI networks and geometric random graph models. In this work, we push the comparison further by exploiting the fact that the geometric property can be tested for directly. To this end, we develop an algorithm that takes PPI interaction data and embeds proteins into a low-dimensional Euclidean space, under the premise that connectivity information corresponds to Euclidean proximity, as in geometric random graphs. We judge the sensitivity and specificity of the fit by computing the area under the Receiver Operating Characteristic (ROC) curve. The network embedding algorithm is based on multi-dimensional scaling, with the square root of the path length in the network playing the role of the Euclidean distance in the Euclidean space. The algorithm exploits sparsity for computational efficiency, and requires only a few sparse matrix multiplications, giving a complexity of O(N²), where N is the number of proteins. Results: The algorithm has been verified in the sense that it successfully rediscovers the geometric structure in artificially constructed geometric networks, even when noise is added by re-wiring some links.
Applying the algorithm to 19 publicly available PPI networks of various organisms indicated that: (a) geometric effects are present and (b) two-dimensional Euclidean space is generally as effective as higher-dimensional Euclidean space for explaining the connectivity. Testing on a high-confidence yeast data set produced a very strong indication of geometric structure (area under the ROC curve of 0.89), with this network being essentially indistinguishable from a noisy geometric network. Overall, the results add support to the hypothesis that PPI networks have a geometric structure. Availability: MATLAB code implementing the algorithm is available upon request. Contact: natasha{at}ics.uci.edu Associate Editor: Olga Troyanskaya
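The core of the embedding step can be sketched in a few lines: hop counts from breadth-first search stand in for path lengths, their square roots become target Euclidean distances, and a plain gradient descent on the squared stress stands in for the paper's sparse multi-dimensional scaling. This is an illustrative toy under those substitutions — the function names and the optimizer are ours, not the authors' MATLAB code:

```python
import math
import random
from collections import deque

def shortest_path_lengths(n, edges):
    """All-pairs hop counts via BFS on an undirected graph."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[math.inf] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if dist[s][w] == math.inf:
                    dist[s][w] = dist[s][u] + 1
                    q.append(w)
    return dist

def embed_2d(n, edges, iters=300, lr=0.05, seed=1):
    """Place nodes in the plane so that Euclidean distance approximates
    sqrt(path length); plain gradient descent on the squared stress."""
    d = shortest_path_lengths(n, edges)
    target = [[math.sqrt(d[i][j]) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(iters):
        for i in range(n):
            gx = gy = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                dij = math.hypot(dx, dy) or 1e-9  # avoid divide-by-zero
                coef = 2.0 * (dij - target[i][j]) / dij
                gx += coef * dx
                gy += coef * dy
            pos[i][0] -= lr * gx
            pos[i][1] -= lr * gy
    return pos, target

def stress(pos, target):
    """Sum of squared differences between embedded and target distances."""
    total, n = 0.0, len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            dij = math.hypot(pos[i][0] - pos[j][0], pos[i][1] - pos[j][1])
            total += (dij - target[i][j]) ** 2
    return total
```

Classical MDS via an eigendecomposition, as in the paper, scales far better; the gradient loop here just keeps the sketch dependency-free.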

3.
We describe an algorithm to position a rigid surface so as to make its cross-section by a given plane match a given curve in that plane, a problem relevant to model-based medical imaging. After building an atlas of cross-sections of the surface and searching it for a best position to start from, each iteration of the algorithm (1) determines a vector field along the intersection curve that will improve its matching with the target curve, and (2) computes and applies a small displacement of the surface whose effect on the intersection will best approximate the required vector field. Computations use least-squares techniques, an exponential formula for Lie groups of transformations, and generic properties of cross-sections. Experiments with an implementation are reported, and theoretical tools for justifying and improving the algorithm, some of them based on Catastrophe Theory, are outlined.

4.
This paper proposes a new method to reverse engineer gene regulatory networks from experimental data. The modeling framework used is time-discrete deterministic dynamical systems with a finite set of states for each variable. The simplest examples of such models are Boolean networks, in which variables have only two possible states. Using a larger number of possible states allows a finer discretization of experimental data and more than one possible mode of action for each variable, depending on threshold values. Furthermore, with a suitable choice of state set, one can employ the powerful tools from computational algebra that underlie the reverse-engineering algorithm, avoiding costly enumeration strategies. To perform well, the algorithm requires wild-type as well as perturbation time courses. This makes it suitable for small- to meso-scale networks rather than networks on a genome-wide scale. An analysis of the complexity of the algorithm is performed. The algorithm is validated on a recently published Boolean network model of segment polarity development in Drosophila melanogaster.
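The modeling framework itself — a time-discrete dynamical system over a finite state set — is easy to state in code. The sketch below is a generic simulator with a made-up two-variable system over states {0, 1, 2}; it illustrates the framework only, not the paper's reverse-engineering algorithm:

```python
def simulate(update_fns, state, steps):
    """Iterate a time-discrete dynamical system: each variable's next
    value is computed from the full current state vector."""
    trajectory = [tuple(state)]
    for _ in range(steps):
        state = tuple(f(state) for f in update_fns)
        trajectory.append(state)
    return trajectory

# Hypothetical example over states {0, 1, 2}: the first variable copies
# the second; the second thresholds the first (a second "mode of action").
example_fns = [lambda s: s[1], lambda s: 2 if s[0] >= 1 else 0]
```

Running `simulate(example_fns, (1, 0), 3)` traces the system into its 2-cycle between (0, 2) and (2, 0).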

5.
Alignment of protein sequences is a key step in most computational methods for prediction of protein function and homology-based modeling of three-dimensional (3D) structure. We investigated the correspondence between "gold standard" alignments of 3D protein structures and the sequence alignments produced by the Smith-Waterman algorithm, currently the most sensitive method for pair-wise alignment of sequences. The results of this analysis enabled development of a novel method to align a pair of protein sequences. The comparison of the Smith-Waterman and structure alignments focused on their inner structure, especially on the continuous ungapped alignment segments, the "islands" between gaps. Approximately one third of the islands in the gold standard alignments have negative or low positive scores, and their recognition is below the sensitivity limit of the Smith-Waterman algorithm. From the alignment-accuracy perspective, the time the algorithm spends in these unalignable regions is wasted. We considered the features of the standard similarity scoring function responsible for this phenomenon and suggest an alternative hierarchical algorithm that explicitly addresses high-scoring regions. This algorithm is considerably faster than the Smith-Waterman algorithm, while the resulting alignments are on average of the same quality with respect to the gold standard. This finding shows that a decrease in alignment accuracy is not a necessary price for computational efficiency.
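For reference, the Smith-Waterman recurrence that the study takes as its baseline can be written compactly. This is a standard score-only version with linear gap penalties; the parameter values are arbitrary placeholders, not those used in the paper:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b
    (score-only Smith-Waterman with linear gap penalties)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are clamped at zero, so islands of
            # negative score are cut off rather than extended.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

The zero-clamp in the recurrence is exactly why low-scoring "islands" fall below the algorithm's sensitivity limit, as discussed above.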

6.
Important desired properties of an algorithm that constructs a supertree (species tree) by reconciling input trees are low complexity and applicability to large biological data sets. In its common statement the problem is proved to be NP-hard, i.e. to require, in practice, exponential time. We propose a reformulation of the supertree building problem that allows a computationally effective solution. We introduce the biologically natural requirement that the supertree sought must not contain clades incompatible with those existing in the input trees. The algorithm was tested with simulated and biological trees and was shown to possess nearly quadratic complexity even if horizontal gene transfers (HGTs) are allowed. If HGTs are not assumed, the algorithm is mathematically correct and has a worst-case running time of n³·|V₀|³, where n is the number of input trees and |V₀| is the total number of species. The authors are unaware of analogous solutions in the published literature. The corresponding inference program, usage examples and a manual are freely available at http://lab6.iitp.ru/en/super3gl. The available program does not implement HGTs; the generalized case is described in the publication "A tree nearest in average to a set of trees" (Problems of Information Transmission, 2011).
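On a common taxon set, the clade-compatibility requirement at the heart of this reformulation reduces to a classical pairwise test: two clades can coexist in one tree exactly when they are nested or disjoint. A minimal sketch of that test (our own naming, not taken from the super3gl code):

```python
def compatible(clade_a, clade_b):
    """On a common taxon set, two clades can coexist in one tree
    iff one contains the other or they share no taxa."""
    a, b = set(clade_a), set(clade_b)
    return a <= b or b <= a or not (a & b)
```

A candidate supertree clade would be rejected whenever this test fails against any clade of any input tree.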

7.
In biological systems, instead of actual encoders at different joints, proprioception signals are acquired through distributed receptive fields. In robotics, a single accurate sensor output per link (encoder) is commonly used to track position and velocity. Interfacing bio-inspired control systems with spiking neural networks emulating the cerebellum on conventional robots is not a straightforward task. It is therefore necessary to adapt this one-dimensional measure (the encoder output) into a multidimensional space (the inputs for a spiking neural network) to connect, for instance, the spiking cerebellar architecture; i.e. to translate from an analog space into a distributed population coding in terms of spikes. This paper analyzes how evolved receptive fields (optimized towards information transmission) can efficiently generate a sensorimotor representation that facilitates its discrimination from other "sensorimotor states". This can be seen as an abstraction of the functionality of the Cuneate Nucleus (CN) in a robot-arm scenario. We model the CN as a spiking neuron population coding in time according to the response of mechanoreceptors during a multi-joint movement in a robot joint space. An encoding scheme is proposed that takes into account the relative spiking times of the signals propagating from peripheral nerve fibers to second-order somatosensory neurons. Because of the enormous number of possible encodings, we applied an evolutionary algorithm to evolve the sensory receptive field representation from a random to an optimized encoding. Following the nature-inspired analogy, evolved configurations have been shown to outperform simple hand-tuned configurations and other homogenized configurations based on the solution provided by the optimization engine (evolutionary algorithm). We used artificial evolutionary engines as the optimization tool to circumvent nonlinear responses in receptive fields.

8.
We describe an algorithm to design primary structures for peptides that must bind as strongly as possible to a given molecular surface. This problem cannot be solved by direct combinatorial search because of the enormous number of possible primary and spatial structures. Our approach is to describe the state of each residue by two variables, (i) the amino acid type and (ii) the 3-D coordinate, and to minimize the binding energy over all these variables simultaneously. For short chains that have no long-range interactions within themselves, this minimization can be done easily and efficiently by dynamic programming. We also discuss how to estimate the specificity of binding and how to deduce a sequence with maximal specificity for a given surface. We show that such a sequence can be deduced by the same algorithm after some modification of the energetic parameters.
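When interactions are limited to nearest neighbours along the chain, the simultaneous minimization reduces to a Viterbi-style dynamic program. The toy below keeps only the residue-type variable and a pairwise adjacency energy table; the energy values and function names are invented for illustration, not the paper's parameterization:

```python
def min_energy_sequence(site_energy, pair_energy):
    """Pick one residue per position minimizing the sum of per-site
    energies plus nearest-neighbour pair energies (dynamic programming).

    site_energy: list of dicts {residue: energy}, one per position.
    pair_energy: dict {(res_left, res_right): energy} for adjacent sites.
    """
    # best[r] = (cost, sequence) of the cheapest chain ending in residue r
    best = {r: (e, [r]) for r, e in site_energy[0].items()}
    for i in range(1, len(site_energy)):
        new = {}
        for r, e in site_energy[i].items():
            cost, seq = min(
                (c + pair_energy.get((p, r), 0.0), s)
                for p, (c, s) in best.items()
            )
            new[r] = (cost + e, seq + [r])
        best = new
    cost, seq = min(best.values())
    return cost, "".join(seq)
```

Each position is processed once against the previous position's table, so the cost is linear in chain length rather than exponential, which is the point of the dynamic-programming formulation.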

9.

Background  

Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Previously, we have developed the iterative Bayesian Model Averaging (BMA) algorithm for use in classification. Here, we extend the iterative BMA algorithm for application to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis) using a small number of selected genes. Our multivariate procedure combines the effectiveness of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes.
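The averaging step itself is simple: each contending model's prediction is weighted by its (approximate) posterior model probability. A minimal sketch with made-up models and weights, not the iterative BMA procedure itself:

```python
def bma_predict(models, weights, x):
    """Bayesian model averaging for a point prediction: the weighted
    average of each model's output, weights being the (unnormalized)
    posterior model probabilities."""
    total = sum(weights)
    return sum((w / total) * m(x) for m, w in zip(models, weights))
```

The full algorithm additionally selects which models (gene subsets) enter the average and estimates their posterior weights iteratively.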

10.
As knowledge of protein signal peptides can be used to reprogram cells in a desired way for gene therapy, signal peptides have become a crucial tool for researchers designing new drugs that target a particular organelle to correct a specific defect. To use such a technique effectively, however, we must develop automated methods for fast and accurate prediction of signal peptides and their cleavage sites, particularly in the post-genomic era, when the number of protein sequences is increasing explosively. The first important step is to discriminate secretory proteins from non-secretory proteins. On the basis of the Needleman-Wunsch algorithm, we propose a new alignment kernel function. The novel approach can be effectively used to extract the statistical properties of protein sequences for machine learning, leading to a higher prediction success rate.

11.
Emerging evidence places small proteins (≤50 amino acids) more centrally in physiological processes. Yet their functional identification and the systematic genome annotation of their cognate small open reading frames (smORFs) remain challenging both experimentally and computationally. Ribosome profiling, or Ribo-Seq (deep sequencing of ribosome-protected fragments), enables detection of actively translated open reading frames (ORFs) and empirical annotation of coding sequences (CDSs) using the in-register translation pattern that is characteristic of genuinely translating ribosomes. Multiple ORF identifiers that use the 3-nt periodicity in Ribo-Seq data sets have been successful in eukaryotic smORF annotation, but they have difficulty evaluating prokaryotic genomes because of their distinctive architecture (e.g. polycistronic messages, overlapping ORFs, leaderless translation and non-canonical initiation). Here, we present a new algorithm, smORFer, which detects putative smORFs in prokaryotic organisms with high accuracy. The unique feature of smORFer is its integrated approach: it considers structural features of the genetic sequence along with in-frame translation and uses the Fourier transform to convert these parameters into a measurable score for faithfully selecting smORFs. The algorithm is executed in a modular way, and depending on the data available for a particular organism, different modules can be selected for the smORF search.
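One way to turn 3-nt periodicity into "a measurable score" via the Fourier transform, as the abstract describes, is to read off the magnitude of the discrete Fourier component at period 3 of a per-nucleotide read-count profile. The sketch below conveys the flavour of such a score; it is our guess at the idea, not smORFer's actual implementation:

```python
import cmath

def periodicity_score(counts, period=3):
    """Normalized magnitude of the DFT component at frequency 1/period
    for a read-count profile whose length is divisible by `period`.
    Close to 1 for a perfectly in-frame signal, close to 0 for flat noise."""
    n = len(counts)
    total = sum(counts) or 1  # guard against an all-zero profile
    k = n // period  # frequency bin corresponding to the period
    comp = sum(c * cmath.exp(-2j * cmath.pi * k * t / n)
               for t, c in enumerate(counts))
    return abs(comp) / total
```

A strongly framed profile like [3, 0, 0, 3, 0, 0, …] scores near 1, while a flat profile scores near 0, which is the separation a translating-ORF detector needs.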

12.
We introduce the software tool NTRFinder to search for a complex repetitive structure in DNA we call a nested tandem repeat (NTR). An NTR is a recurrence of two or more distinct tandem motifs interspersed with each other. We propose that NTRs can be used as phylogenetic and population markers. We have tested our algorithm on both real and simulated data, and present some real NTRs of interest. NTRFinder can be downloaded from http://www.maths.otago.ac.nz/~aamatroud/.

13.
Depth is a term frequently applied to the shape and surface of macromolecules, describing for example the grooves in DNA, the shape of an enzyme active site, or the binding site for a small molecule in a protein. Yet depth is a difficult property to define rigorously in a macromolecule, and few computational tools exist to quantify this notion, visualize it, or analyze the results. We present our notion of travel depth: simply put, the physical distance a solvent molecule would have to travel from a surface point to a suitably defined reference surface. To define the reference surface, we use the limiting form of the molecular surface with increasing probe size: the convex hull. We then present a fast, robust approximation algorithm to compute the travel depth of every surface point. Travel depth is useful because it works for pockets of any size and complexity. It also works for two interesting special cases. First, it works on the grooves in DNA, which are unbounded in one direction. Second, it works on tunnels, that is, pockets with no "bottom" that pass through the entire macromolecule. Our algorithm makes it straightforward to quantify discussions of depth when analyzing structures. It also enables high-throughput analysis of macromolecule depth, demonstrated here by analyzing a database of protein-small molecule binding pockets and the distribution of bound magnesium ions in RNA structures. These analyses show significant but subtle effects of depth on ligand-binding localization and strength.
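Once the reference surface is fixed, travel depth is a shortest-path computation over surface points. A minimal breadth-first sketch on an abstract neighbour graph, with uniform step cost (the real algorithm works on a 3-D grid with Euclidean step lengths, so this only illustrates the propagation idea):

```python
from collections import deque

def travel_depth(surface, reference, neighbors):
    """BFS distance (in steps) from the reference surface to every
    reachable surface point, moving only through surface points."""
    depth = {p: 0 for p in reference}
    q = deque(reference)
    while q:
        p = q.popleft()
        for nb in neighbors(p):
            if nb in surface and nb not in depth:
                depth[nb] = depth[p] + 1
                q.append(nb)
    return depth
```

Because the front propagates only through solvent-accessible points, depths naturally follow the winding path into a pocket rather than the straight-line distance, which is the property that makes travel depth work for tunnels and unbounded grooves.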

14.
Many diseases show dichotomous phenotypic variation but do not follow a simple Mendelian pattern of inheritance. Variation in these binary traits is presumably controlled by multiple loci and environmental factors. A least-squares method has been developed for mapping such complex disease loci by treating the binary phenotypes (0 and 1) as if they were continuous. However, the least-squares method is not recommended because of its ad hoc nature. Maximum likelihood (ML) and Bayesian methods have also been developed for binary disease mapping by incorporating the discrete nature of the phenotypic distribution. In the ML analysis, the likelihood function is usually maximized using complicated maximization algorithms (e.g. the Newton-Raphson or the simplex algorithm). Under the threshold model of binary disease, we develop an expectation-maximization (EM) algorithm to solve for the maximum likelihood estimates (MLEs). The new EM algorithm is derived by treating both the unobserved genotype and the disease liability as missing values. As a result, the EM iteration equations have the same form as the normal equations in linear regression. The EM algorithm is further modified to take into account sexual dimorphism in the linkage maps. Applying the EM-implemented ML method to a four-way-cross mouse family, we detected two regions on the fourth chromosome with evidence of QTLs controlling the segregation of fibrosarcoma, a form of connective-tissue cancer. The two QTLs explain 50-60% of the variance in the disease liability. We also applied a previously developed Bayesian method (modified to take sex-specific maps into account) to this data set and detected one additional QTL on chromosome 13 that explains another 26% of the variance of the disease liability. All the QTLs detected primarily show dominance effects.

15.
MOTIVATION: Inclusion body formation has been a major deterrent for overexpression studies, since a large number of proteins form insoluble inclusion bodies when overexpressed in Escherichia coli. The formation of inclusion bodies is known to be an outcome of improper protein folding; thus the composition and arrangement of amino acids in a protein would be a major factor in its aggregation propensity. There is a significant need for a prediction algorithm that would enable rational identification both of mutants and of ideal protein candidates for mutations that confer higher solubility on overexpression, instead of the trial-and-error procedures presently used. RESULTS: Six physicochemical properties together with residue and dipeptide compositions have been used to develop a support vector machine-based classifier to predict overexpression status in E. coli. The prediction accuracy is approximately 72%, suggesting that it performs reasonably well in predicting the propensity of a protein to be soluble or to form inclusion bodies. The algorithm could also correctly predict the change in solubility for most of the point mutations reported in the literature. This algorithm can be a useful tool for screening protein libraries to identify soluble variants of proteins.

16.
Finite element (FE) analysis is a cornerstone of orthopaedic biomechanics research. Three-dimensional medical imaging provides sufficient resolution for subject-specific FE models to be generated from these data sets. FE model development requires discretisation of a three-dimensional domain, which can be the most time-consuming component of an FE study. Hexahedral meshing tools based on the multiblock method currently rely on the manual placement of building blocks for mesh generation. We hypothesise that angular analysis of the geometric centreline of a three-dimensional surface can be used to automatically generate building-block structures for multiblock hexahedral mesh generation. Our algorithm uses a set of user-defined points and parameters to automatically generate a multiblock structure based on a surface's geometric centreline. This significantly reduces the time required for model development. We have applied this algorithm to 47 bones of varying geometries and successfully generated an FE mesh in all cases. This work represents a significant advancement in automatically generating multiblock structures for a wide range of geometries.

18.
Many different methods exist for pattern detection in gene expression data. In contrast to classical methods, biclustering has the ability to cluster a group of genes together with a group of conditions (replicates, sets of patients or drug compounds). However, since the problem is NP-hard, most algorithms use heuristic search functions and may therefore converge towards local maxima. By using the results of biclustering on discrete data as a starting point for a local search function on continuous data, our algorithm avoids the problem of heuristic initialization. Similar to OPSM, our algorithm aims to detect biclusters whose rows and columns can be ordered such that row values grow across the bicluster's columns and vice versa. Results have been generated on the yeast genome (Saccharomyces cerevisiae), a human cancer dataset and random data. Results on the yeast genome showed that 89% of the one hundred largest non-overlapping biclusters were enriched with Gene Ontology annotations. A comparison with OPSM and ISA demonstrated better efficiency when using gene and condition orders. We present results on random and real datasets that show the ability of our algorithm to capture statistically significant and biologically relevant biclusters.
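The target pattern — rows whose values grow across an ordering of the bicluster's columns — is easy to verify once candidate rows and a column order are given. A small illustrative checker (finding good candidates is the hard, NP-hard part; this only tests one):

```python
def is_order_preserving(matrix, rows, col_order):
    """Check the OPSM-style property: for every selected row, values are
    strictly increasing when the selected columns are read in col_order."""
    return all(
        all(matrix[r][col_order[k]] < matrix[r][col_order[k + 1]]
            for k in range(len(col_order) - 1))
        for r in rows
    )
```

A search procedure such as the one described above would score candidate (rows, column-order) pairs with a check like this, then refine them on the continuous data.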

19.
Rationale: Lower respiratory tract illness (LRTI) frequently causes adult hospitalization and antibiotic overuse. Procalcitonin (PCT) treatment algorithms have been used successfully in Europe to safely reduce antibiotic use for LRTI but have not been adopted in the United States. We recently performed a feasibility study for a randomized clinical trial (RCT) of PCT and viral testing to guide therapy for non-pneumonic LRTI.

Objective: The primary objective of the current study was to understand factors influencing PCT algorithm adherence during the RCT and to evaluate factors influencing provider antibiotic prescribing practices for LRTI.

Results: Diagnosis of pneumonia on admission was the only variable significantly associated with non-adherence [7% (adherence) vs. 26% (non-adherence), p = 0.01]. Surveys confirmed possible infiltrate on chest radiograph as important for provider decisions, as were severity of illness, positive sputum culture, abnormal CBC and fever. However, age, patient expectations and medical-legal concerns were also at least somewhat important to prescribing practices. Physician agreement with the importance of viral and PCT testing increased from 42% to 64% (p = 0.007) and 49% to 74% (p = 0.001), respectively, after the study.

Conclusions: Optimal algorithm adherence will be important for definitive PCT intervention trials in the US to determine whether PCT-guided algorithms result in better outcomes than reliance on traditional clinical variables. Factors influencing treatment decisions, such as patient age, presence of fever, patient expectations and medical-legal concerns, may be amenable to education to improve PCT algorithm compliance for LRTI.

20.
DNA fingerprinting analyses such as amplified ribosomal DNA restriction analysis (ARDRA), repetitive extragenic palindromic PCR (rep-PCR), ribosomal intergenic spacer analysis (RISA), and denaturing gradient gel electrophoresis (DGGE) are frequently used in various fields of microbiology. The major difficulty in DNA fingerprinting data analysis is the alignment of multiple peak sets. We report here an R program implementing a clustering-based peak alignment algorithm, and its application to various DNA fingerprinting data, such as ARDRA, rep-PCR, RISA, and DGGE data. The results obtained by our clustering algorithm and by the BioNumerics software showed high similarity. Since several R packages have been established for the statistical analysis of various biological data, the distance matrix obtained by our R program can be used for subsequent statistical analyses, some of which have not previously been performed but are useful in DNA fingerprinting studies.
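The peak-alignment step can be sketched with simple gap-based binning: pool every peak position, sort, and cut a new bin wherever consecutive peaks are farther apart than a tolerance. This is a rough Python stand-in for the clustering used by the R program; the tolerance parameter and all names are ours:

```python
def align_peaks(samples, tol):
    """Pool all peak positions across samples, sort them, and start a new
    bin whenever the gap to the previous peak exceeds tol (a
    single-linkage flavour of clustering). Returns the bin centres and a
    per-sample presence/absence matrix over the common bins."""
    pooled = sorted({p for s in samples for p in s})
    bins = []
    for p in pooled:
        if bins and p - bins[-1][-1] <= tol:
            bins[-1].append(p)  # close enough: extend the current bin
        else:
            bins.append([p])    # gap too large: open a new bin
    member = {p: j for j, b in enumerate(bins) for p in b}
    table = [[0] * len(bins) for _ in samples]
    for i, s in enumerate(samples):
        for p in s:
            table[i][member[p]] = 1
    centres = [sum(b) / len(b) for b in bins]
    return centres, table
```

The resulting presence/absence matrix is the kind of common peak table from which a distance matrix for downstream statistics can be computed.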
