Similar Articles
20 similar articles found (search time: 15 ms)
1.
2.
The observation that disease-associated proteins often interact with each other has fueled the development of network-based approaches to elucidate the molecular mechanisms of human disease. Such approaches build on the assumption that protein interaction networks can be viewed as maps in which diseases can be identified with localized perturbations within a certain neighborhood. The identification of these neighborhoods, or disease modules, is therefore a prerequisite for a detailed investigation of a particular pathophenotype. While numerous heuristic methods exist that successfully pinpoint disease-associated modules, the basic underlying connectivity patterns remain largely unexplored. In this work we aim to fill this gap by analyzing the network properties of a comprehensive corpus of 70 complex diseases. We find that disease-associated proteins do not reside within locally dense communities, and we instead identify connectivity significance as the most predictive quantity. This quantity inspires the design of a novel Disease Module Detection (DIAMOnD) algorithm that identifies the full disease module around a set of known disease proteins. We study the performance of the algorithm using well-controlled synthetic data and systematically validate the identified neighborhoods for a large corpus of diseases.
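A minimal sketch of the connectivity-significance idea described above: a candidate protein of degree k, of which ks links touch the current module of size s in a network of N proteins, is scored by a hypergeometric tail probability, and the most significant candidate is absorbed at each step. The adjacency-dict format and the greedy loop are illustrative assumptions, not the published implementation.

```python
from math import comb

def connectivity_pvalue(k, ks, s, N):
    """Hypergeometric tail: probability that a degree-k node has at
    least ks of its links inside a module of size s, in a network
    of N nodes (the 'connectivity significance' of the abstract)."""
    return sum(comb(s, i) * comb(N - s, k - i)
               for i in range(ks, k + 1)) / comb(N, k)

def diamond_expand(adj, seed_proteins, n_added):
    """Greedy DIAMOnD-style expansion: repeatedly absorb the candidate
    with the most significant connectivity to the growing module.
    adj maps each node to the set of its neighbors (assumed format)."""
    module = set(seed_proteins)
    N = len(adj)
    ranked = []
    for _ in range(n_added):
        candidates = {v for u in module for v in adj[u]} - module
        if not candidates:
            break
        best = min(candidates,
                   key=lambda v: connectivity_pvalue(
                       len(adj[v]), len(adj[v] & module), len(module), N))
        module.add(best)
        ranked.append(best)
    return ranked
```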

3.

Background  

Many of the most popular pre-processing methods for Affymetrix expression arrays, such as RMA, gcRMA, and PLIER, simultaneously analyze data across a set of predetermined arrays to improve the precision of the final expression measures. One problem with these algorithms is that expression measurements for a particular sample depend strongly on the set of samples used for normalization, so results obtained by normalizing against a different set may not be comparable. A related problem is that an organization producing and/or storing large amounts of data in a sequential fashion must either re-run the pre-processing algorithm every time an array is added or store arrays in batches that are pre-processed together. Furthermore, pre-processing large numbers of arrays requires loading all the feature-level data into memory, which is difficult even on modern computers. We utilize a scheme in which all the information necessary for pre-processing is derived once from a very large training set and can then be used to summarize samples outside of the training set; all subsequent pre-processing tasks can be done on an individual-array basis. We demonstrate the utility of this approach by defining a new version of the Robust Multi-chip Averaging (RMA) algorithm, which we refer to as refRMA.
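A minimal sketch of the frozen-reference idea (not the authors' refRMA code): the quantile distribution and per-probe effects are learned once from the training set, after which any new array can be normalized and summarized on its own. Function names are illustrative assumptions, and full RMA's median-polish summarization across probe sets is omitted.

```python
import numpy as np

def build_reference(train, eps=1.0):
    """Learn reusable summaries from a large training matrix
    (rows = probes, columns = arrays): the mean quantile
    distribution and a per-probe effect on the log2 scale."""
    ref_quantiles = np.sort(train, axis=0).mean(axis=1)
    ranks = np.argsort(np.argsort(train, axis=0), axis=0)
    normed = ref_quantiles[ranks]          # quantile-normalized training data
    probe_effects = np.median(np.log2(normed + eps), axis=1)
    return ref_quantiles, probe_effects

def preprocess_one(raw, ref_quantiles, probe_effects, eps=1.0):
    """Pre-process a single new array without revisiting the training
    set: map its ranks onto the stored quantiles, then subtract the
    stored probe effects."""
    ranks = np.argsort(np.argsort(raw))
    return np.log2(ref_quantiles[ranks] + eps) - probe_effects
```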

4.
Marini F, Camilloni C, Provasi D, Broglia RA, Tiana G. Gene, 2008, 422(1-2):37-40
Metadynamics is a powerful computational tool for obtaining the free-energy landscape of complex systems. The Monte Carlo algorithm, in turn, has proven useful for calculating thermodynamic quantities associated with simplified models of proteins, and thus for gaining an ever-deeper understanding of the general principles underlying the mechanism of protein folding. We show that it is possible to couple metadynamics and Monte Carlo algorithms to obtain the free energy of model proteins in a way that is computationally very economical.
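A minimal sketch of such a coupling, under assumptions the abstract does not spell out: a one-dimensional collective variable, Gaussians of fixed height and width deposited at a fixed interval, and Metropolis acceptance evaluated on the biased energy E + V_bias.

```python
import math, random

def metadynamics_mc(x0, energy, cv, propose, n_steps,
                    T=1.0, hill_h=0.1, hill_w=0.2, deposit_every=100):
    """Metropolis Monte Carlo on energy(x) + V_bias(cv(x)), where the
    bias is a growing sum of Gaussians centered on previously visited
    values of the collective variable cv."""
    hills = []

    def v_bias(s):
        return sum(hill_h * math.exp(-(s - c) ** 2 / (2 * hill_w ** 2))
                   for c in hills)

    x = x0
    for step in range(1, n_steps + 1):
        xn = propose(x)
        dU = (energy(xn) + v_bias(cv(xn))) - (energy(x) + v_bias(cv(x)))
        if dU <= 0 or random.random() < math.exp(-dU / T):
            x = xn
        if step % deposit_every == 0:
            hills.append(cv(x))   # the history-dependent bias grows here
    # -v_bias(s), up to a constant, estimates the free energy along cv
    return hills, v_bias
```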

5.
We investigate a simple model that generates random partitions of the leaf set of a tree. Of particular interest is the reconstruction question: what number k of independent samples (partitions) is required to correctly reconstruct the underlying tree (with high probability)? We demonstrate a phase transition for k as a function of the mutation rate, from logarithmic to polynomial dependence on the size of the tree. We also describe a simple polynomial-time tree reconstruction algorithm that applies in the logarithmic region. This model and the associated reconstruction questions are motivated by a Markov model for genomic evolution in molecular biology.
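One reading of the sampling model (a random-cluster-style assumption, since the abstract gives no details): each edge of the tree is "cut" independently with a probability tied to the mutation rate, and a sample is the partition of the leaves induced by the surviving connected components.

```python
import random
import networkx as nx

def sample_leaf_partition(tree, leaves, p_cut):
    """Draw one random partition of the leaf set: delete each edge of
    the (networkx) tree independently with probability p_cut, then
    group leaves by the connected component they fall into."""
    kept = nx.Graph([(u, v) for u, v in tree.edges
                     if random.random() > p_cut])
    kept.add_nodes_from(tree.nodes)
    leaf_set = set(leaves)
    blocks = [sorted(c & leaf_set) for c in nx.connected_components(kept)]
    return [b for b in blocks if b]   # drop components with no leaves
```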

6.
Although metastasis is the principal cause of death for colorectal cancer (CRC) patients, the molecular mechanisms underlying CRC metastasis are still not fully understood. In an attempt to identify metastasis-related genes in CRC, we obtained gene expression profiles of 55 early stage primary CRCs, 56 late stage primary CRCs, and 34 metastatic CRCs from the Expression Project for Oncology (http://www.intgen.org/expo/). We developed a novel gene selection algorithm (SVM-T-RFE), which extends the support vector machine recursive feature elimination (SVM-RFE) algorithm by incorporating the T-statistic. We achieved the highest classification accuracy (100%) with small gene subsets (10 and 6 genes, respectively) when classifying between early and late stage primary CRCs, as well as between metastatic CRCs and late stage primary CRCs. We also compared the performance of the SVM-T-RFE and SVM-RFE gene selection algorithms on another large-scale CRC dataset and on five public microarray datasets. SVM-T-RFE outperformed the SVM-RFE algorithm in identifying more differentially expressed genes, and achieved the highest prediction accuracy using an equal or smaller number of selected genes. A fraction of the selected genes have previously been reported to be associated with CRC development or metastasis.
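A sketch of the ranking idea, not the authors' exact criterion: at each elimination round the linear-SVM weight magnitudes and the absolute t-statistics are normalized and blended (the alpha-weighted sum below is an illustrative assumption), and the lowest-scoring gene is dropped.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.svm import SVC

def svm_t_rfe(X, y, n_keep, alpha=0.5):
    """Recursive feature elimination ranking genes by a blend of the
    linear-SVM weight magnitude and the two-sample t-statistic.
    X: samples x genes; y: binary class labels."""
    features = list(range(X.shape[1]))
    classes = np.unique(y)
    while len(features) > n_keep:
        clf = SVC(kernel="linear").fit(X[:, features], y)
        w = np.abs(clf.coef_[0])
        t = np.abs(ttest_ind(X[y == classes[0]][:, features],
                             X[y == classes[1]][:, features]).statistic)
        score = alpha * w / w.max() + (1 - alpha) * t / t.max()
        features.pop(int(np.argmin(score)))   # discard the weakest gene
    return features
```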

7.
Protein domain decomposition using a graph-theoretic approach
MOTIVATION: Automatic decomposition of a multi-domain protein into individual domains represents a highly interesting and unsolved problem. As the number of protein structures in the PDB grows at an exponential rate, there is a clear need for more reliable and efficient methods for protein domain decomposition, simply to keep the domain databases up to date. RESULTS: We present a new algorithm for solving the domain decomposition problem using a graph-theoretic approach. We have formulated the problem as a network flow problem, in which each residue of a protein is represented as a node of the network and each residue-residue contact is represented as an edge with a particular capacity, depending on the type of the contact. A two-domain decomposition problem is solved by finding a bottleneck (or minimum cut) of the network, which minimizes the total cross-edge capacity, using the classical Ford-Fulkerson algorithm. A multi-domain decomposition problem is solved by repeatedly solving a series of two-domain problems. The algorithm has been implemented as a computer program, called DomainParser. We have tested the program on a commonly used test set consisting of 55 proteins. The decomposition results agree with the literature in 78.2% of cases on both the number of decomposed domains and the assignment of residues to each domain, which compares favorably with existing programs. On the subset of two-domain proteins (20 in number), the program assigned 96.7% of residues correctly when the number of decomposed domains was required to be two.
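A minimal sketch of the two-domain step as a minimum s-t cut, using networkx's max-flow/min-cut machinery rather than a hand-rolled Ford-Fulkerson; the capacity values and the choice of source and sink residues are assumptions, which the real DomainParser derives from contact types and a search over seed pairs.

```python
import networkx as nx

def two_domain_split(contacts, n_residues, source, sink):
    """Split a protein into two domains by a minimum cut of its
    residue contact network. contacts: iterable of
    (residue_i, residue_j, capacity) triples."""
    G = nx.DiGraph()
    G.add_nodes_from(range(n_residues))
    for i, j, cap in contacts:
        G.add_edge(i, j, capacity=cap)   # contacts act in both directions,
        G.add_edge(j, i, capacity=cap)   # so add each arc twice
    cut_value, (domain_a, domain_b) = nx.minimum_cut(G, source, sink)
    return domain_a, domain_b
```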

8.
MOTIVATION: As the number of fully sequenced prokaryotic genomes continues to grow rapidly, computational methods for reliably detecting protein-coding regions become even more important. Audic and Claverie (1998, Proc. Natl Acad. Sci. USA, 95, 10026-10031) proposed a clustering algorithm for protein-coding regions in microbial genomes. The algorithm is based on three Markov models of order k associated with subsequences extracted from a given genome. The parameters of the three Markov models are recursively updated by the algorithm, which, in simulations, always appears to converge to a unique stable partition of the genome. The partition corresponds to three kinds of regions: (1) coding on the direct strand, (2) coding on the complementary strand, and (3) non-coding. RESULTS: Here we provide an explanation for the convergence of the algorithm by observing that it is essentially a form of the expectation maximization (EM) algorithm applied to the corresponding mixture model. We also provide a partial justification for the uniqueness of the partition based on identifiability. Other possible variations and improvements are briefly discussed.
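A sketch of the EM reading of the clustering algorithm, with simplifying assumptions: order k=1 Markov chains, pre-extracted subsequences as the units being classified, and initial-state terms ignored.

```python
import numpy as np

def em_markov_mixture(seqs, n_comp=3, n_iter=50, alphabet="ACGT"):
    """EM for a mixture of first-order Markov models (the paper's
    models have order k; k=1 here for brevity). E-step: posterior
    membership of each subsequence under each model; M-step:
    posterior-weighted refit of the transition tables."""
    idx = {c: i for i, c in enumerate(alphabet)}
    X = [np.array([idx[c] for c in s]) for s in seqs]
    rng = np.random.default_rng(0)
    trans = rng.dirichlet(np.ones(4), size=(n_comp, 4))  # (comp, from, to)
    mix = np.full(n_comp, 1.0 / n_comp)
    for _ in range(n_iter):
        # E-step: log-likelihood of every subsequence under every model
        ll = np.array([[np.log(trans[z][s[:-1], s[1:]]).sum()
                        for z in range(n_comp)] for s in X]) + np.log(mix)
        post = np.exp(ll - ll.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
        # M-step: refit transition tables and mixing weights
        counts = np.full((n_comp, 4, 4), 1e-6)
        for s, p in zip(X, post):
            for z in range(n_comp):
                np.add.at(counts[z], (s[:-1], s[1:]), p[z])
        trans = counts / counts.sum(axis=2, keepdims=True)
        mix = post.mean(axis=0)
    return trans, mix, post
```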

9.
We propose a two-layer neural network for computing an approximate convex hull of a set of points or a set of circles/ellipses of different sizes. The algorithm is based on a very elegant concept: the shrinking of a rubber band surrounding the set of planar objects. Logically, a set of neurons is placed on a circle (the rubber band) surrounding the objects. Each neuron has a parameter vector associated with it, which may be viewed as the neuron's current position. The given set of points/objects exerts a force of attraction on every neuron, which determines how its current position is updated (as if the force determined the direction of movement of the neuron lying on the rubber band). As the network evolves, the neurons (parameter vectors) approximate the convex hull more and more accurately. The scheme can be applied to find the convex hull of a planar set of circles or ellipses or a mixture of the two. Some properties related to the evolution of the algorithm are also presented.
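Rather than simulating the attraction dynamics, the sketch below jumps straight to a boundary the shrinking band cannot pass: the support function of the point set, evaluated at each neuron's outward direction. Treat it as a geometric stand-in for where the neurons settle, not the paper's update rule; the polygon it returns approximates the convex hull increasingly well as the number of neurons grows.

```python
import numpy as np

def rubber_band_hull(points, n_neurons=90):
    """Place n_neurons directions around the centroid and push each
    neuron out to the support line of the point set in its direction,
    yielding an approximate convex hull of the points."""
    pts = np.asarray(points, dtype=float)
    c = pts.mean(axis=0)
    theta = np.linspace(0.0, 2 * np.pi, n_neurons, endpoint=False)
    u = np.column_stack([np.cos(theta), np.sin(theta)])  # outward unit vectors
    h = ((pts - c) @ u.T).max(axis=0)                    # support function h(u)
    return c + h[:, None] * u                            # neuron positions
```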

10.
The potential for obtaining a true mass spectrometric protein identification result depends on the choice of algorithm as well as on experimental factors that influence the information content in the mass spectrometric data. Current methods can never prove definitively that a result is true, but an appropriate choice of algorithm can provide a measure of the statistical risk that a result is false, i.e., the statistical significance. We recently demonstrated an algorithm, Probity, which assigns a statistical significance to each result. For any choice of algorithm, the difficulty of obtaining statistically significant results depends on the number of protein sequences in the sequence collection searched. Through simulations of random protein identifications using the Probity algorithm, we here demonstrate explicitly how the statistical significance depends on the number of sequences searched. We also provide an example of how the practitioner's choice of taxonomic constraints influences the statistical significance.
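The dependence on collection size follows the usual multiple-testing arithmetic. A tiny sketch (assuming independent sequences, an assumption the paper's simulations do not require): the chance of at least one false identification grows with N, so a fixed significance level demands a roughly N-fold smaller per-sequence p-value.

```python
def familywise_risk(p_single, n_sequences):
    """Probability that at least one of n_sequences random-match
    tests reaches p_single by chance (independence assumed)."""
    return 1.0 - (1.0 - p_single) ** n_sequences

# the same per-match p-value becomes far less convincing as the
# searched collection grows
for n in (10_000, 100_000, 1_000_000):
    print(f"N={n:>9,}  risk={familywise_risk(1e-6, n):.4f}")
```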

11.
Accurate prediction of pseudoknotted nucleic acid secondary structure is an important computational challenge. Prediction algorithms based on dynamic programming aim to find a structure with minimum free energy according to some thermodynamic ("sum of loop energies") model that is implicit in the recurrences of the algorithm. However, a clear definition of what exactly the loops in pseudoknotted structures are, and what their associated energies are, has been lacking. In this work, we present a complete classification of loops in pseudoknotted nucleic acid secondary structures, and describe the Rivas and Eddy and other energy models as sum-of-loops energy models. We give a linear-time algorithm for parsing a pseudoknotted secondary structure into its component loops. We give two applications of our parsing algorithm. The first is a linear-time algorithm to calculate the free energy of a pseudoknotted secondary structure; this is useful for heuristic prediction algorithms, which are widely used since (pseudoknotted) RNA secondary structure prediction is NP-hard. The second is a linear-time algorithm to test the generality of Akutsu's dynamic programming algorithm for secondary structure prediction. Together with previous work, we use this algorithm to compare the generality of state-of-the-art algorithms on real biological structures.

12.
The problem of reconstructing a species supertree from a given set of protein, gene, and regulatory-site trees is the subject of this study. Under the traditional formulation, this problem is proven to be NP-hard. We propose a reformulation: to seek a supertree most of whose clades contribute to the original protein trees. In this variant, the problem seems biologically natural, and a fast algorithm can be developed for its solution. The algorithm was tested on artificial and biological sets of protein trees, and it proved to be efficient even under the assumption of horizontal gene transfer. When horizontal transfer is not allowed, the correctness of the algorithm is proved mathematically; the running time of the algorithm is bounded and, in the worst case, is of the order n^3 · |V_0|^3, where n is the number of gene trees and |V_0| is the number of species in the trees. Our software for supertree construction, examples of computations, and instructions can be freely accessed at . Events associated with horizontal gene transfer are not included either in this study or in any variant of the software. A general case is described in the authors' report (Problems of Information Transmission, 2011).

13.
MOTIVATION: Alternative splicing allows a single gene to generate multiple mRNAs, which can be translated into functionally and structurally diverse proteins. One gene can have multiple variants coexisting at different concentrations. Estimating the relative abundance of each variant is important for the study of the underlying biological function. Microarrays are standard tools for measuring gene expression, but most designs and analyses have not accounted for splice variants. Splice-variant-specific chip designs and analysis algorithms are therefore needed for accurate gene expression profiling. RESULTS: Inspired by Li and Wong (2001), we developed a gene-structure-based algorithm to determine the relative abundance of known splice variants. Probe intensities are modeled across multiple experiments using gene structures as constraints, and model parameters are obtained through a maximum likelihood estimation (MLE) framework. The algorithm produces the relative concentration of each variant, as well as an affinity term associated with each probe. The algorithm is validated by a set of controlled spike experiments as well as endogenous tissue samples using a human splice variant array.
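A sketch of the structure-constrained model, using alternating non-negative least squares as a stand-in for the paper's maximum-likelihood fit: each probe intensity is modeled as a probe affinity times the summed concentrations of the variants that probe interrogates, with a 0/1 gene-structure matrix S acting as the constraint.

```python
import numpy as np
from scipy.optimize import nnls

def variant_abundance(Y, S, n_iter=20):
    """Fit Y[p, j] ≈ affinity[p] * (S @ C)[p, j], where S[p, v] = 1
    if probe p interrogates variant v (numpy array) and C[v, j] is
    the concentration of variant v in sample j."""
    P, V = S.shape
    J = Y.shape[1]
    a = np.ones(P)
    C = np.ones((V, J))
    for _ in range(n_iter):
        # solve for concentrations with probe affinities fixed
        A = a[:, None] * S                    # (P, V) design matrix
        C = np.stack([nnls(A, Y[:, j])[0] for j in range(J)], axis=1)
        # solve for affinities with concentrations fixed
        M = S @ C                             # predicted signal sans affinity
        a = (Y * M).sum(axis=1) / np.maximum((M * M).sum(axis=1), 1e-12)
    return a, C
```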

14.
Codon adaptation index as a measure of dominating codon bias
We propose a simple algorithm to detect dominating synonymous codon usage bias in genomes. The algorithm is based on a precise mathematical formulation of the problem that led us to use the Codon Adaptation Index (CAI) as a 'universal' measure of codon bias; this measure had previously been employed in the specific context of translational bias. With the set of coding sequences as its sole source of biological information, the algorithm provides a reference set of genes that is highly representative of the bias. This set can be used to compute the CAI of genes in prokaryotic and eukaryotic organisms, including those whose functional annotation is not yet available. An important application concerns the detection of a reference set characterizing translational bias, which is known to correlate with expression levels; in this case, the algorithm becomes a key tool for predicting gene expression levels, guiding regulatory circuit reconstruction, and comparing species. The algorithm also detects leading/lagging-strand bias, GC-content bias, GC3 bias, and horizontal gene transfer. The approach is validated on 12 slow-growing and fast-growing bacteria, Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster. AVAILABILITY: http://www.ihes.fr/~materials.
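The CAI itself is simple to state and compute: each codon receives a relative adaptiveness w = f / f_max among its synonyms, estimated from a reference gene set, and a gene's CAI is the geometric mean of the w values over its codons. A minimal sketch follows; the 0.5 pseudo-count for unobserved codons is a common convention assumed here, and `genetic_code` (amino acid to synonymous codons) is a caller-supplied mapping.

```python
from math import exp, log

def codon_weights(reference_genes, genetic_code):
    """Relative adaptiveness w(c) = f(c) / max f(c') over the
    synonymous codons c', estimated from the reference gene set."""
    counts = {}
    for gene in reference_genes:
        for i in range(0, len(gene) - 2, 3):
            c = gene[i:i + 3]
            counts[c] = counts.get(c, 0) + 1
    w = {}
    for aa, codons in genetic_code.items():
        fmax = max(counts.get(c, 0) for c in codons) or 1
        for c in codons:
            # 0.5 pseudo-count keeps unobserved codons off zero
            w[c] = max(counts.get(c, 0), 0.5) / fmax
    return w

def cai(gene, w):
    """CAI = geometric mean of the weights over the gene's codons
    (every codon of the gene is assumed to be present in w)."""
    codons = [gene[i:i + 3] for i in range(0, len(gene) - 2, 3)]
    return exp(sum(log(w[c]) for c in codons) / len(codons))
```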

15.
Genomic copy number change is one of the important phenomena observed in cancer and other genetic disorders. Recently, oligonucleotide microarrays have been used to analyze changes in copy number. Although high-density microarrays provide useful genome-wide data on copy number, they are often associated with a substantial amount of experimental noise that can affect the performance of the analyses. In our experiments we used high-density oligonucleotide genotyping microarrays, which use a redundant probe-tiling approach for individual SNPs. We found that the noise in the genotyping microarray data is associated with several experimental steps during target preparation, and we devised an algorithm that takes those experimental parameters into account. Additionally, defective probes that do not hybridize well to the target, and whose signals therefore cannot be corrected, are detected and omitted automatically by the algorithm. When we applied the algorithm to actual datasets, we could reduce the noise substantially without compressing the dynamic range. Moreover, combinatorial use of our noise reduction algorithm and a conventional breakpoint detection algorithm successfully detected a microamplification of c-myc that had been overlooked in the raw data. The algorithm described here is freely available with the software upon request to all non-profit researchers.

16.
We present a new multiscale model for complex fluids based on three scales: microscopic, kinetic and continuum. We choose as the microscopic level Kramers' bead-rod model for polymers, which we describe as a system of stochastic differential equations with an implicit constraint formulation. The associated Fokker-Planck equation is then derived, and adiabatic elimination removes the fast momentum coordinates. Approached in this way, the kinetic level reduces to a dispersive drift equation. The continuum level is modelled with a finite-volume Godunov-projection algorithm. We demonstrate the computation of viscoelastic stress divergence using this multiscale approach.

17.
Gait reaction reconstruction and a heel strike algorithm
A mathematical model of gait ground loading is presented. The model allows the ground reactions produced by any particular single- or multiple-footfall pattern to be constructed, given a sufficient variety of other measured ground reactions. An algorithm is then presented that uses only center-of-vertical-pressure data to determine the instants of successive heel strikes on a large force plate. Experiments show the high accuracy of the heel strike algorithm and show that reconstructions of the vertical component of ground reactions are typically within 3% of the corresponding measured reactions. The techniques presented allow certain problems associated with small force plates, and other problems associated with large force plates, to be largely overcome.
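A minimal stand-in for the heel-strike detection (the published algorithm is more careful): at each new footfall the center of vertical pressure jumps abruptly toward the fresh contact point, so thresholding the frame-to-frame displacement of the CoP trace flags candidate heel-strike instants.

```python
import numpy as np

def heel_strike_frames(cop_xy, jump_threshold):
    """cop_xy: (n_frames, 2) center-of-pressure trajectory on the
    plate. Returns the frame indices where the CoP jumps farther
    than jump_threshold between consecutive frames."""
    steps = np.linalg.norm(np.diff(cop_xy, axis=0), axis=1)
    return np.flatnonzero(steps > jump_threshold) + 1
```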

18.
Copy number variation (CNV) is a form of structural alteration in the mammalian DNA sequence that is associated with many complex neurological diseases as well as cancer. The development of next-generation sequencing (NGS) technology provides a new way to detect genomic locations with copy number variation. Here we develop an algorithm for detecting CNVs based on depth-of-coverage data generated by NGS technology. In this work, we use a novel representation of the read count data as two-dimensional geometrical points. A key aspect of detecting regions with CNVs is devising a proper segmentation algorithm that distinguishes genomic locations with significant differences in read count data. We have designed a new segmentation approach in this context, applying a convex hull algorithm to the geometrical representation of the read count data. To our knowledge, most algorithms have used a single distribution model for read count data; in our approach, we instead allow the read count data to follow two different distribution models independently, which adds to the robustness of CNV detection. In addition, our algorithm calls CNVs based on a multiple-sample analysis approach, resulting in a low false discovery rate with high precision.

19.
Braille reading is a complex process involving intricate finger-motion patterns and finger-rubbing actions across Braille letters to stimulate the appropriate nerves. Although Braille reading is performed by smoothly moving the finger from left to right, research shows that even fluent reading requires right-to-left movements of the finger, known as "reversals". Reversals are crucial, as they not only enhance the stimulation of nerves for correctly reading the letters but also allow one to re-read letters that were missed in the first pass. Moreover, it is known that reversals can occur as often as every sentence and can start at any location in a sentence. Here, we report experimental results on the feasibility of an algorithm that enables a machine to adapt automatically to the reversal gestures of one's finger. The algorithm was tested through Braille-reading-analogous tasks with thirty sighted subjects who volunteered for the study. We find that the finger-motion adaptive algorithm (FMAA) is useful in achieving cooperation between the human finger and the machine. In the presence of FMAA, subjects' performance metrics on the tasks improved significantly, as supported by statistical analysis. In light of these encouraging results, preliminary experiments were carried out with five blind subjects to put the algorithm to the test. Results from carefully designed experiments showed that subjects' Braille reading accuracy was more favorable with FMAA than with FMAA turned off. The use of FMAA in future-generation Braille reading devices thus holds strong promise.

20.
Rifamycin B is an important polyketide antibiotic used in the treatment of tuberculosis and leprosy. We present results on medium optimization for Rifamycin B production by a barbital-insensitive mutant strain of Amycolatopsis mediterranei S699. Machine-learning approaches, namely genetic algorithms (GA), neighborhood analysis (NA) and decision trees (DT), were explored for optimizing the medium composition. The genetic algorithm was applied as a global search algorithm, while NA was used for guided local search and to develop medium predictors. The fermentation medium for Rifamycin B consisted of nine components, and varying the concentration of each component yields a very large combinatorial search space of distinct medium compositions. Optimization was achieved within five generations by both GA and NA; these five generations comprised 178 shake-flask experiments, a small fraction of the search space. We detected multiple optima in the form of 11 distinct medium combinations, which provided over 600% improvement in Rifamycin B productivity. The genetic algorithm performed better than NA at optimizing the fermentation medium. The decision tree technique qualitatively revealed interactions between medium components, in the form of sets of rules for medium compositions that give high as well as low productivity.
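A toy sketch of the GA layer; the population size, elitism, and operators below are illustrative assumptions, and in the study the fitness of a recipe was the measured Rifamycin B productivity of a shake-flask experiment rather than a function call.

```python
import random

def ga_medium(fitness, levels, n_comp=9, pop=20, gens=5,
              p_mut=0.1, elite=4):
    """Genetic algorithm over medium recipes: each chromosome holds
    one concentration level per component; levels[i] lists the
    allowed concentrations of component i."""
    def rand_recipe():
        return [random.choice(levels[i]) for i in range(n_comp)]

    popn = [rand_recipe() for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=fitness, reverse=True)
        parents = popn[:elite]                  # elitist selection
        children = []
        while len(children) < pop - elite:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_comp)
            child = a[:cut] + b[cut:]           # one-point crossover
            for i in range(n_comp):             # per-component mutation
                if random.random() < p_mut:
                    child[i] = random.choice(levels[i])
            children.append(child)
        popn = parents + children
    return max(popn, key=fitness)
```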
