Similar Articles
20 similar articles found.
1.
Discovery of prognostic and diagnostic biomarker gene signatures for diseases, such as cancer, is seen as a major step toward a better personalized medicine. During the last decade various methods have been proposed for that purpose. However, one important obstacle for making gene signatures a standard tool in clinical diagnosis is the typically low reproducibility of these signatures combined with the difficulty of achieving a clear biological interpretation. For that reason, in recent years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. Most of these methods focus on classification problems, that is, learning a model from data that discriminates patients into distinct clinical groups. Far less has been published on approaches that predict a patient's event risk. In this paper, we investigate eight methods that integrate network information into multivariable Cox proportional hazard models for risk prediction in breast cancer. We compare the prediction performance of our tested algorithms via cross-validation as well as across different datasets. In addition, we highlight the stability and interpretability of the obtained gene signatures. In conclusion, we find GeneRank-based filtering to be a simple, computationally cheap and highly predictive technique for integrating network information into event time prediction models. Signatures derived via this method are highly reproducible.
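The GeneRank-based filtering idea can be illustrated with a short sketch: a PageRank-like propagation of per-gene scores over the gene-gene network ranks the genes, and only the top-ranked genes enter the Cox model as covariates. This is a minimal sketch under stated assumptions (the `adjacency` matrix and `gene_scores` inputs are hypothetical), not the authors' implementation.

```python
# Minimal GeneRank-style filtering sketch (hypothetical inputs, not the paper's code).
import numpy as np

def gene_rank(adjacency, gene_scores, d=0.85, n_iter=200, tol=1e-8):
    """Iteratively solve r = (1 - d) * s + d * A^T D^{-1} r (PageRank-like)."""
    s = gene_scores / gene_scores.sum()        # normalise per-gene scores
    degree = adjacency.sum(axis=1)
    degree[degree == 0] = 1.0                  # avoid division by zero
    r = s.copy()
    for _ in range(n_iter):
        r_new = (1 - d) * s + d * adjacency.T.dot(r / degree)
        if np.abs(r_new - r).max() < tol:
            break
        r = r_new
    return r

# Toy example: 4 genes, a small undirected network, expression-derived scores.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
scores = np.array([0.9, 0.2, 0.5, 0.1])
ranks = gene_rank(A, scores)
top_genes = np.argsort(ranks)[::-1][:2]        # keep the top-k genes as
print(top_genes)                               # covariates for a Cox model
```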

2.
Systems biology aims to develop mathematical models of biological systems by integrating experimental and theoretical techniques. During the last decade, many systems-biological approaches based on genome-wide data have been developed to unravel the complexity of gene regulation. This review deals with the reconstruction of gene regulatory networks (GRNs) from experimental data through computational methods. Standard GRN inference methods primarily use gene expression data derived from microarrays. However, the incorporation of additional information from heterogeneous data sources, e.g. genome sequence and protein–DNA interaction data, clearly supports the network inference process. This review focuses on promising modelling approaches that use such diverse types of molecular biological information. In particular, approaches are discussed that enable the modelling of the dynamics of gene regulatory systems. The review provides an overview of common modelling schemes and learning algorithms and outlines current challenges in GRN modelling.

3.
Particle Swarm Optimization (PSO) is a stochastic optimization approach that originated from simulations of bird flocking, and that has been successfully used in many applications as an optimization tool. Estimation of distribution algorithms (EDAs) are a class of evolutionary algorithms which perform a two-step process: building a probabilistic model from which good solutions may be generated and then using this model to generate new individuals. Two distinct research trends that emerged in the past few years are the hybridization of PSO and EDA algorithms and the parallelization of EDAs to exploit the idea of exchanging the probabilistic model information. In this work, we propose the use of a cooperative PSO/EDA algorithm based on the exchange of heterogeneous probabilistic models. The model is heterogeneous because the cooperating PSO/EDA algorithms use different methods to sample the search space. Three different exchange approaches are tested and compared in this work. In all these approaches, the amount of information exchanged is adapted based on the performance of the two cooperating swarms. The performance of the cooperative model is compared to the existing state-of-the-art PSO cooperative approaches using a suite of well-known benchmark optimization functions.
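The cooperation scheme can be sketched in a few lines: a PSO swarm and a Gaussian EDA evolve separately on the same objective and periodically exchange individuals, with the better-performing swarm donating to the other. The snippet below is an assumption-laden toy (fixed PSO coefficients, a univariate-Gaussian EDA, and a simple one-individual exchange), not the algorithm proposed in the paper.

```python
# Toy cooperative PSO/EDA loop on a benchmark function (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)
def sphere(x):                              # benchmark objective to minimise
    return np.sum(x ** 2, axis=-1)

dim, n = 10, 30
pso_x = rng.uniform(-5, 5, (n, dim)); pso_v = np.zeros((n, dim))
pso_best = pso_x.copy(); eda_x = rng.uniform(-5, 5, (n, dim))

for it in range(100):
    # --- PSO step: inertia + cognitive + social terms ---
    gbest = pso_best[np.argmin(sphere(pso_best))]
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    pso_v = 0.7 * pso_v + 1.5 * r1 * (pso_best - pso_x) + 1.5 * r2 * (gbest - pso_x)
    pso_x = pso_x + pso_v
    improved = sphere(pso_x) < sphere(pso_best)
    pso_best[improved] = pso_x[improved]

    # --- EDA step: fit a univariate Gaussian to the best half, resample ---
    elite = eda_x[np.argsort(sphere(eda_x))[: n // 2]]
    eda_x = rng.normal(elite.mean(axis=0), elite.std(axis=0) + 1e-12, (n, dim))

    # --- Cooperation: the better swarm sends its best point to the other;
    #     in the paper the amount exchanged adapts to relative performance. ---
    if sphere(gbest) < sphere(eda_x).min():
        eda_x[np.argmax(sphere(eda_x))] = gbest
    else:
        pso_x[np.argmax(sphere(pso_x))] = eda_x[np.argmin(sphere(eda_x))]

print(min(sphere(pso_best).min(), sphere(eda_x).min()))
```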

4.

Background

Position-specific priors (PSPs) have been used with success to boost EM- and Gibbs-sampler-based motif discovery algorithms. PSP information has been computed from different sources, including orthologous conservation, DNA duplex stability, and nucleosome positioning. Prior information has not yet been used in the context of combinatorial algorithms. Moreover, priors have been used only independently, and the gain of combining priors from different sources has not yet been studied.

Results

We extend RISOTTO, a combinatorial algorithm for motif discovery, by post-processing its output with a greedy procedure that uses prior information. PSPs from different sources are combined into a scoring criterion that guides the greedy search procedure. The resulting method, called GRISOTTO, was evaluated over 156 yeast TF ChIP-chip sequence-sets commonly used to benchmark prior-based motif discovery algorithms. Results show that GRISOTTO is at least as accurate as twelve other state-of-the-art approaches for the same task, even without combining priors. Furthermore, by considering combined priors, GRISOTTO is considerably more accurate than the state-of-the-art approaches for the same task. We also show that PSPs improve GRISOTTO's ability to retrieve motifs from mouse ChIP-seq data, indicating that the proposed algorithm can be applied to data from a different technology and for a higher eukaryote.

Conclusions

The conclusions of this work are twofold. First, post-processing the output of combinatorial algorithms by incorporating prior information leads to a very efficient and effective motif discovery method. Second, combining priors from different sources is even more beneficial than considering them separately.

5.
Many studies on the integration of process planning and production scheduling have been carried out during the last decade. While various integration approaches and algorithms have been proposed, the implementation of these approaches is still a difficult issue. To achieve successful implementation, it is important to examine and evaluate integration approaches or algorithms beforehand. Based on an object-oriented integration testbed, a simulation study that compares different integration algorithms is presented in this paper. Both a separated planning method and integrated planning methods are examined. Situations of both fixed and variable processing times are also simulated, and useful results have been observed. The successful simulation with the object-oriented integration testbed will eventually be extended to include other new planning algorithms for examining their effectiveness and implementation feasibility.

6.
In the last decade, directed evolution has become a routine approach for engineering proteins with novel or altered properties. Concurrently, a trend away from purely 'blind' randomization strategies and towards more 'semi-rational' approaches has also become apparent. In this review, we discuss ways in which structural information and predictive computational tools are playing an increasingly important role in guiding the design of randomized libraries: web servers such as ConSurf-HSSP and SCHEMA allow the prediction of sites to target for producing functional variants, while algorithms such as GLUE, PEDEL and DRIVeR are useful for estimating library completeness and diversity. In addition, we review recent methodological developments that facilitate the construction of unbiased libraries, which are inherently more diverse than biased libraries and therefore more likely to yield improved variants.

7.
Several approaches for estimation of fractional zinc absorption (FZA) by calculating the ratio of oral to intravenous stable isotopic tracer concentrations (at an appropriate time) in urine or plasma after their simultaneous administration have been proposed in the last decade. These simple-to-implement approaches, often referred to as the double isotopic tracer ratio (DITR) method, are more attractive than the classical "deconvolution" method and the more commonly used single-tracer methods based on fecal monitoring and indicator dilution, after oral or intravenous tracer administration, respectively. However, the domain of validity of DITR for measuring FZA has recently been questioned. In this paper, we provide a theoretical justification of the validity of four different "approximate" formulations of the DITR technique by demonstrating mathematically that their accuracy is a consequence of the particular properties of zinc kinetics.
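For orientation, one common approximate DITR formulation (a representative textbook form, not necessarily one of the four formulations analysed in the paper) estimates FZA from the ratio of the urinary (or plasma) enrichments of the two tracers, scaled by the administered doses:

\[ \mathrm{FZA} \;\approx\; \frac{E_{\text{oral}}(t)}{E_{\text{iv}}(t)} \times \frac{D_{\text{iv}}}{D_{\text{oral}}} \]

where \(E_{\text{oral}}(t)\) and \(E_{\text{iv}}(t)\) are the enrichments of the orally and intravenously administered tracers measured in the same sample at time \(t\), and \(D_{\text{oral}}\), \(D_{\text{iv}}\) are the corresponding administered doses.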

8.

Background  

Protein subcellular localization is an important determinant of protein function and hence, reliable methods for prediction of localization are needed. A number of prediction algorithms have been developed based on amino acid compositions or on the N-terminal characteristics (signal peptides) of proteins. However, such approaches lead to a loss of contextual information. Moreover, where information about the physicochemical properties of amino acids has been used, the methods employed to exploit that information are less than optimal and could use it more effectively.

9.
For almost a decade, in vitro selection experiments have been used to isolate novel nucleic acids, peptides and proteins according to their function. Selection experiments have altered our perception of molecular mimicry and catalysis, and they appear to be more facile than rational design at generating biopolymers with desired properties. New methods that have been developed improve the power of functional strategies in ways that nature has already discovered - by expanding library size and facilitating the recombination of positive mutations. Recent structural information on a number of selected and evolved molecules highlights future challenges for design via rational approaches.

10.
The potential effectiveness of statistical haplotype inference has made it an area of active exploration over the last decade. There are several complications of statistical inference, including: the same algorithm can produce different solutions for the same data set, which reflects the internal algorithm variability; different algorithms can give different solutions for the same data set, reflecting the discordance among algorithms; and the algorithms per se are unable to evaluate the reliability of the solutions even if they are unique, this being a general limitation of all inference methods. With the aim of increasing the confidence of statistical inference results, a consensus strategy appears to be an effective means of dealing with these problems. Several authors have explored this with different emphases. Here we discuss two recent studies examining the internal algorithm variability and among-algorithm discordance, respectively, and evaluate the different outcomes of these analyses, in light of Orzack's (2009) comment. Until other, better methods are developed, a combination of these two approaches should provide a practical way to increase the confidence of statistical haplotyping results.
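As a concrete illustration of the consensus idea, the sketch below (an assumption-based toy, not one of the reviewed methods) takes several inference runs, treats each individual's inferred haplotype pair as a vote, and reports the majority pair together with the fraction of runs that agree, which can serve as a simple confidence measure.

```python
# Toy consensus over repeated haplotype-inference runs (illustrative only).
from collections import Counter

def consensus_haplotypes(runs):
    """runs: list of solutions; each solution maps individual -> (hap1, hap2)."""
    consensus, support = {}, {}
    for ind in runs[0].keys():
        # order within a pair is arbitrary, so vote on unordered pairs
        votes = Counter(frozenset(run[ind]) for run in runs)
        best, count = votes.most_common(1)[0]
        consensus[ind] = tuple(sorted(best))
        support[ind] = count / len(runs)        # fraction of runs that agree
    return consensus, support

# Three hypothetical runs (e.g. different algorithms or random restarts)
runs = [
    {"ind1": ("ACG", "ATG"), "ind2": ("CCG", "CTG")},
    {"ind1": ("ATG", "ACG"), "ind2": ("CCG", "CCG")},
    {"ind1": ("ACG", "ATG"), "ind2": ("CCG", "CTG")},
]
print(consensus_haplotypes(runs))
```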

11.
Least-squares methods for blind source separation based on nonlinear PCA
In standard blind source separation, one tries to extract unknown source signals from their instantaneous linear mixtures by using a minimum of a priori information. We have recently shown that certain nonlinear extensions of principal component type neural algorithms can be successfully applied to this problem. In this paper, we show that a nonlinear PCA criterion can be minimized using least-squares approaches, leading to computationally efficient and fast converging algorithms. Several versions of this approach are developed and studied, some of which can be regarded as neural learning algorithms. A connection to the nonlinear PCA subspace rule is also shown. Experimental results are given, showing that the least-squares methods usually converge clearly faster than stochastic gradient algorithms in blind separation problems.
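The nonlinear PCA criterion being minimised is J(W) = E{||x − W g(Wᵀx)||²} for whitened mixtures x and an odd nonlinearity g. The sketch below shows the simple stochastic-gradient (subspace-rule) form of this minimisation on a toy two-source mixture; the mixing matrix, sources and step size are illustrative assumptions, and the paper's point is precisely that least-squares minimisation of the same criterion converges faster than such gradient updates.

```python
# Nonlinear PCA subspace-rule sketch for blind separation of a toy 2x2 mixture.
import numpy as np

rng = np.random.default_rng(1)
T = 5000
s = np.vstack([np.sign(rng.standard_normal(T)),     # two independent,
               rng.uniform(-1, 1, T)])              # sub-Gaussian sources
x = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s          # instantaneous mixtures

# whiten the observations
x -= x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
x = (E / np.sqrt(d)) @ E.T @ x

g = np.tanh                                          # odd nonlinearity
W = np.linalg.qr(rng.standard_normal((2, 2)))[0]     # orthonormal initialisation
mu = 0.01
for epoch in range(3):
    for t in range(T):
        xt = x[:, t:t + 1]
        y = W.T @ xt
        W += mu * (xt - W @ g(y)) @ g(y).T           # nonlinear PCA update

y = W.T @ x                                          # separated signals, up to
print(np.corrcoef(y, s)[:2, 2:])                     # permutation and sign
```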

12.
A new dynamical layout algorithm for complex biochemical reaction networks

Background  

To study complex biochemical reaction networks in living cells, researchers rely more and more on databases and computational methods. In order to facilitate computational approaches, visualisation techniques are highly important. Biochemical reaction networks, e.g. metabolic pathways, are often depicted as graphs, and these graphs should be drawn dynamically to provide flexibility in the context of different data. Conventional layout algorithms are not sufficient for every kind of pathway in biochemical research. This is mainly due to certain conventions to which biochemists/biologists are used and which are not in accordance with conventional layout algorithms. A number of approaches have been developed to improve this situation. Some of these are used in the context of biochemical databases and make more or less use of the information in these databases to aid the layout process. However, visualisation is also becoming more and more important in modelling and simulation tools, which mostly do not offer additional connections to databases. Therefore, layout algorithms used in these tools have to work independently of any databases. In addition, all of the existing algorithms face some limitations with respect to the number of edge crossings when it comes to larger biochemical systems, due to the high interconnectivity of these systems. Last but not least, in some cases biochemical conventions are not met properly.

13.
Each diploid organism has two alleles at every gene locus. In sexual organisms such as most plants, animals and fungi, the two alleles in an individual may be genetically very different from each other. DNA sequence data from individual alleles (called a haplotype) can provide powerful information to address a variety of biological questions and guide many practical applications. The advancement of molecular technology and computational tools in the last decade has made obtaining large-scale haplotypes feasible. This review summarizes the two basic approaches for obtaining haplotypes and discusses the associated techniques and methods. The first approach is to experimentally obtain diploid sequence information and then use computer algorithms to infer haplotypes. The second approach is to obtain haplotype sequences directly through experimentation. The advantages and disadvantages of each approach are discussed. I then discuss a specific example of how the direct approach was used to obtain haplotype information to address several fundamental biological questions about a pathogenic yeast. With increasing sophistication in both bioinformatics tools and high-throughput molecular techniques, haplotype analysis is becoming an integrated component of biomedical research.

14.
Background: The reconstruction of clonal haplotypes and their evolutionary history in evolving populations is a common problem in both microbial evolutionary biology and cancer biology. The clonal theory of evolution provides a theoretical framework for modeling the evolution of clones.

Results: In this paper, we review the theoretical framework and assumptions under which the clonal reconstruction problem is formulated. We formally define the problem and then discuss its complexity and solution space. Various methods have been proposed to find the phylogeny that best explains the observed data. We categorize these methods based on the type of input data that they use (space-resolved or time-resolved), and also based on their computational formulation as either combinatorial or probabilistic. It is crucial to understand the different types of input data because each provides essential but distinct information for drastically reducing the solution space of the clonal reconstruction problem. Complementary information provided by single-cell sequencing or by whole-genome sequencing of randomly isolated clones can also improve the accuracy of clonal reconstruction. We briefly review the existing algorithms and their relationships. Finally, we summarize the tools that have been developed either to solve the clonal reconstruction problem directly or to address a related computational problem.

Conclusions: In this review, we discuss the various formulations of the problem of inferring the clonal evolutionary history from allele frequency data, review existing algorithms and categorize them according to their problem formulation and solution approaches. We note that most of the available clonal inference algorithms were developed for elucidating tumor evolution, whereas clonal reconstruction for unicellular genomes is less addressed. We conclude the review by discussing open problems such as the lack of benchmark datasets and the comparison of performance between available tools.

15.
Rapid identification of proteins by peptide-mass fingerprinting
BACKGROUND: Developments in 'soft' ionisation techniques have revolutionized mass-spectrometric approaches for the analysis of protein structure. For more than a decade, such techniques have been used, in conjunction with digestion by specific proteases, to produce accurate peptide molecular weight 'fingerprints' of proteins. These fingerprints have commonly been used to screen known proteins, in order to detect errors of translation, to characterize post-translational modifications and to assign disulphide bonds. However, the extent to which peptide-mass information can be used alone to identify unknown sample proteins, independent of other analytical methods such as protein sequence analysis, has remained largely unexplored. RESULTS: We report here on the development of the molecular weight search (MOWSE) peptide-mass database at the SERC Daresbury Laboratory. Practical experience has shown that sample proteins can be uniquely identified from as few as three or four experimentally determined peptide masses when these are screened against a fragment database derived from over 50 000 proteins. Experimental errors of a few Daltons are tolerated by the scoring algorithms, thus permitting the use of inexpensive time-of-flight mass spectrometers. As with other types of physical data, such as amino-acid composition or linear sequence, peptide masses provide a set of determinants that are sufficiently discriminating to identify or match unknown sample proteins. CONCLUSION: Peptide-mass fingerprints can prove as discriminating as linear peptide sequences, but can be obtained in a fraction of the time using less protein. In many cases, this allows for rapid identification of a sample protein before committing it to protein sequence analysis. Fragment masses also provide information, at the protein level, that is complementary to the information provided by large-scale DNA sequencing or mapping projects.
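The core matching step behind peptide-mass fingerprinting can be illustrated with a toy sketch: each observed peptide mass is compared against the in-silico fragment masses of every database protein within a mass tolerance, and proteins are ranked by the number of matched fragments. The database, masses and scoring below are illustrative assumptions; the actual MOWSE scoring is more elaborate.

```python
# Toy peptide-mass fingerprint matching (simplified scoring, hypothetical data).
def match_fingerprint(observed_masses, protein_db, tolerance=1.0):
    """Count how many observed peptide masses each database protein explains."""
    scores = {}
    for protein, fragment_masses in protein_db.items():
        hits = sum(
            any(abs(m - f) <= tolerance for f in fragment_masses)
            for m in observed_masses
        )
        scores[protein] = hits
    # rank proteins by the number of matched fragments
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical tryptic-fragment database (masses in Da) and an observed spectrum
protein_db = {
    "P1": [512.3, 804.4, 1021.5, 1533.7],
    "P2": [498.2, 804.4, 1999.9],
}
observed = [512.8, 1021.1, 1534.2]
print(match_fingerprint(observed, protein_db))   # P1 should rank first
```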

16.

Background

Ring artifacts are concentric rings superimposed on tomographic images, often caused by defective and insufficiently calibrated detector elements as well as by damaged scintillator crystals of the flat-panel detector. They may also be generated by objects that attenuate X-rays very differently in different projection directions. Ring artifact reduction techniques reported in the literature so far can be broadly classified into two groups. One category of approaches is based on sinogram processing, also known as pre-processing techniques, and the other category performs processing on the 2-D reconstructed images, recognized as post-processing techniques in the literature. The strengths and weaknesses of these categories of approaches are yet to be explored from a common platform.

Method

In this paper, a comparative study of the two categories of ring artifact reduction techniques, designed primarily for multi-slice CT instruments, is presented from a common platform. For comparison, two representative algorithms from each of the two categories are selected from the published literature. A very recently reported state-of-the-art sinogram-domain ring artifact correction method, which classifies the ring artifacts according to their strength and then corrects the artifacts using class-adaptive correction schemes, is also included in this comparative study. The first sinogram-domain correction method uses a wavelet-based technique to detect the corrupted pixels and then estimates the responses of the bad pixels using a simple linear interpolation technique. The second sinogram-based correction method performs all the filtering operations in the transform domain, i.e., in the wavelet and Fourier domains. On the other hand, the two post-processing-based correction techniques operate on the polar transform of the reconstructed CT images. The first method extracts the ring artifact template vector using a homogeneity test and then corrects the CT images by subtracting the artifact template vector from the uncorrected images. The second post-processing-based correction technique performs median and mean filtering on the reconstructed images to produce the corrected images.

Results

The performance of the compared algorithms has been tested using both quantitative and perceptual measures. For quantitative analysis, two different numerical performance indices are chosen. On the other hand, different types of artifact patterns, e.g., single/band rings, artifacts from defective and mis-calibrated detector elements, rings in highly structured objects and also in hard objects, and rings from different flat-panel detectors, are analyzed to perceptually investigate the strengths and weaknesses of the five methods. An investigation has also been carried out to compare the efficacy of these algorithms in correcting the volume images from a cone-beam CT with the parameters determined from one particular slice. Finally, the capability of each correction technique to accurately retain the image information (e.g., a small object at the iso-center) in the corrected CT image has also been tested.

Conclusions

The results show that the performance of the algorithms is limited and none is fully suitable for correcting different types of ring artifacts without introducing processing distortion to the image structure. To achieve diagnostic quality in the corrected slices, a combination of the two approaches (sinogram- and post-processing-based) can be used. The compared methods are also not suitable for correcting the volume images from a cone-beam flat-panel-detector-based CT.
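To make the post-processing idea concrete, the sketch below (a schematic toy, not any of the compared implementations) works in polar coordinates, where rings become straight stripes at fixed radius: a per-radius artifact template is estimated from the angular mean profile and subtracted from the image.

```python
# Schematic polar-domain ring correction (toy data, hypothetical layout).
import numpy as np

def correct_rings_polar(polar_img, smooth=9):
    """polar_img: rows = angles, columns = radii (assumed layout)."""
    # mean intensity profile along the angular direction, one value per radius
    profile = polar_img.mean(axis=0)
    # the low-frequency part of the profile reflects true structure; the
    # residual high-frequency component is taken as the ring template
    kernel = np.ones(smooth) / smooth
    background = np.convolve(profile, kernel, mode="same")
    template = profile - background
    return polar_img - template[None, :]          # subtract template per radius

# Toy polar image: smooth object plus a sharp ring at radius index 40
rng = np.random.default_rng(0)
polar = np.linspace(1, 0, 128)[None, :] * np.ones((360, 1)) \
        + 0.01 * rng.standard_normal((360, 128))
polar[:, 40] += 0.5                               # simulated ring artifact
corrected = correct_rings_polar(polar)
print(polar[:, 40].mean(), corrected[:, 40].mean())   # ring intensity is reduced
```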

17.
MOTIVATION: Recently, several information extraction systems have been developed to retrieve relevant information from biomedical text. However, these methods represent individual efforts. In this paper, we show that by combining different algorithms and their outcomes, the results improve significantly. For this reason, CONAN has been created, a system which combines different programs and their outcomes. Its methods include tagging of gene/protein names, finding interaction and mutation data, tagging of biological concepts and linking to MeSH and Gene Ontology terms. RESULTS: In this paper, we present data showing that combining different text-mining algorithms significantly improves the results. Not only is CONAN a full-scale approach that will ultimately cover all of PubMed/MEDLINE, but we also show that this universality has no effect on quality: our system performs as well as or better than existing systems. AVAILABILITY: The LDD corpus presented is available by request to the author. The system will be available shortly. For information and updates on CONAN please visit http://www.cs.uu.nl/people/rainer/conan.html.

18.
Several approaches have been developed over the past decade to study the complex interactions that occur in biological systems. The ability to carry out a comprehensive genetic analysis of an organism becomes more limited and difficult as the complexity of the organism increases, because complex organisms are likely to have not only more genes than simple organisms but also more elaborate networks of interactions among those genes. The development of technologies to systematically disrupt protein networks at the genomic scale would greatly accelerate a comprehensive understanding of the cell as molecular machinery. Intracellular antibodies (intrabodies) can be targeted to different intracellular compartments to specifically interfere with the function of selected intracellular gene products in mammalian cells. This technique should prove important for studies of mammalian cells, where genetic approaches are more difficult. In the context of large-scale protein interaction mapping projects, intracellular antibodies (ICAbs) promise to be an important tool for knocking out protein function inside the cell. In this context, however, the need for speed and high throughput requires the development of simple and robust methods to derive antibodies that function within cells, without the need to optimize each individual ICAb. The successful inhibition of biological processes by intrabodies has been demonstrated in a number of different cells. The performance of intracellularly expressed antibodies is, however, somewhat unpredictable, because the reducing environment of the cell cytoplasm in which they are forced to work prevents some antibodies, but not others, from folding properly. For this reason, we have developed an in vivo selection procedure named Intracellular Antibody Capture Technology (IACT) that allows the isolation of functional intrabodies. The IAC technology has been used for the rapid identification of antigen-antibody pairs in intracellular compartments and for the in vivo identification of epitopes recognized by the selected intracellular antibodies. Several optimizations of the IAC technology for protein knock-out have been developed so far. This system offers a powerful and versatile proteomic tool to dissect diverse functional properties of cellular proteins in different cell lines.

19.
MOTIVATION: Reconstructing evolutionary trees is an important problem in biology. A response to the computational intractability of most of the traditional criteria for inferring evolutionary trees has been a focus on new criteria, particularly quartet-based methods that seek to merge trees derived on subsets of four species from a given species-set into a tree for that entire set. Unfortunately, most of these methods are very sensitive to errors in the reconstruction of the trees for individual quartets of species. A recently developed technique called quartet cleaning can alleviate this difficulty in certain cases by using redundant information in the complete set of quartet topologies for a given species-set to correct such errors. RESULTS: In this paper, we describe two new local vertex quartet cleaning algorithms which have optimal time complexity and error-correction bound, respectively. These are the first known local vertex quartet cleaning algorithms that are optimal with respect to either of these attributes.

20.
Biclustering algorithms for biological data analysis: a survey
A large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the results from the application of standard clustering methods to genes are limited. This limitation is imposed by the existence of a number of experimental conditions where the activity of genes is uncorrelated. A similar limitation exists when clustering of conditions is performed. For this reason, a number of algorithms that perform simultaneous clustering on the row and column dimensions of the data matrix have been proposed. The goal is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this paper, we refer to this class of algorithms as biclustering. Biclustering is also referred to in the literature as coclustering and direct clustering, among other names, and has also been used in fields such as information retrieval and data mining. In this comprehensive survey, we analyze a large number of existing approaches to biclustering and classify them in accordance with the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.
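A small worked example of the kind of coherence score such algorithms optimise is the mean squared residue used in Cheng-and-Church-style biclustering: it is low when the selected genes rise and fall together across the selected conditions. The code below is an illustrative sketch with toy data, not taken from the survey.

```python
# Mean squared residue of a candidate bicluster (Cheng & Church style), toy data.
import numpy as np

def mean_squared_residue(data, rows, cols):
    """data: expression matrix; rows/cols: indices defining a candidate bicluster."""
    sub = data[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)
    col_means = sub.mean(axis=0, keepdims=True)
    overall = sub.mean()
    residue = sub - row_means - col_means + overall
    return float((residue ** 2).mean())

# Toy matrix: genes 0-1 shift coherently over conditions 0-2, gene 2 does not
data = np.array([[1.0, 2.0, 3.0, 9.0],
                 [2.0, 3.0, 4.0, 1.0],
                 [5.0, 1.0, 7.0, 2.0]])
print(mean_squared_residue(data, [0, 1], [0, 1, 2]))   # ~0: coherent bicluster
print(mean_squared_residue(data, [0, 2], [0, 1, 2]))   # larger: incoherent
```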
