20 similar documents found; search took 7 ms.
1.
Isabel A Nepomuceno-Chamorro Jesus S Aguilar-Ruiz Jose C Riquelme 《BMC bioinformatics》2010,11(1):517
Background
Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities.
2.
Wiuf C 《Journal of mathematical biology》2003,46(3):241-264
Inference about population history from DNA sequence data has become increasingly popular. For human populations, questions about whether a population has been expanding and when expansion began are often the focus of attention. For viral populations, questions about the epidemiological history of a virus, e.g., HIV-1 and Hepatitis C, are often of interest. In this paper I address the following question: Can population history be accurately inferred from single locus DNA data? An idealised world is considered in which the tree relating a sample of n non-recombining and selectively neutral DNA sequences is observed, rather than just the sequences themselves. This approach provides an upper limit to the information that possibly can be extracted from a sample. It is shown, based on Kingman's (1982a) coalescent process, that consistent estimation of parameters describing population history (e.g., a growth rate) cannot be achieved for increasing sample size, n. This is worse than often found for estimators of genetic parameters, e.g., the mutation rate typically converges at rate \(\) under the assumption that all historical mutations can be observed in the sample. In addition, various results for the distribution of maximum likelihood estimators are presented.
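As a rough illustration of the coalescent process this abstract relies on, the sketch below samples the waiting times between mergers in Kingman's standard (constant-size) coalescent. The function names and the choice of coalescent time units are assumptions made for this example, not taken from the paper, and the growth models studied there are not covered.

```python
import random


def coalescent_waiting_times(n, rng):
    """Sample the waiting times T_n, ..., T_2 of Kingman's coalescent.

    While k lineages remain, the time to the next merger is exponentially
    distributed with rate k * (k - 1) / 2 (time in coalescent units).
    """
    return [rng.expovariate(k * (k - 1) / 2) for k in range(n, 1, -1)]


def expected_tmrca(n):
    """Expected tree height: sum over k of 2 / (k * (k - 1)) = 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


# Hypothetical usage: compare a simulated mean tree height with its expectation.
rng = random.Random(0)
heights = [sum(coalescent_waiting_times(20, rng)) for _ in range(2000)]
mean_height = sum(heights) / len(heights)
```

The expected height approaches 2 as n grows, so adding more sequences adds little information about the deep part of the tree.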
3.
Background
Machine-learning tools have gained considerable attention during the last few years for analyzing biological networks for protein function prediction. Kernel methods are suitable for learning from graph-based data such as biological networks, as they only require the abstraction of the similarities between objects into the kernel matrix. One key issue in kernel methods is the selection of a good kernel function. Diffusion kernels, the discretization of the familiar Gaussian kernel of Euclidean space, are commonly used for graph-based data.
4.
Leventhal GE Kouyos R Stadler T Wyl Vv Yerly S Böni J Cellerai C Klimkait T Günthard HF Bonhoeffer S 《PLoS computational biology》2012,8(3):e1002413
Contact structure is believed to have a large impact on epidemic spreading and consequently using networks to model such contact structure continues to gain interest in epidemiology. However, detailed knowledge of the exact contact structure underlying real epidemics is limited. Here we address the question whether the structure of the contact network leaves a detectable genetic fingerprint in the pathogen population. To this end we compare phylogenies generated by disease outbreaks in simulated populations with different types of contact networks. We find that the shape of these phylogenies strongly depends on contact structure. In particular, measures of tree imbalance allow us to quantify to what extent the contact structure underlying an epidemic deviates from a null model contact network and illustrate this in the case of random mixing. Using a phylogeny from the Swiss HIV epidemic, we show that this epidemic has a significantly more unbalanced tree than would be expected from random mixing.
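The abstract does not say which imbalance measures were used. As a minimal sketch of the idea, the example below computes the Sackin index (the sum of leaf depths), one commonly used imbalance statistic. The nested-tuple tree encoding and the function name are assumptions made for illustration.

```python
def sackin_index(tree, depth=0):
    """Sum of leaf depths for a rooted tree given as nested tuples.

    Any value that is not a tuple is treated as a leaf.
    Larger values indicate a more unbalanced tree for a fixed number of leaves.
    """
    if not isinstance(tree, tuple):
        return depth
    return sum(sackin_index(child, depth + 1) for child in tree)


# Two hypothetical four-leaf trees: fully balanced versus caterpillar-shaped.
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")

print(sackin_index(balanced))     # 8
print(sackin_index(caterpillar))  # 9
```

In practice the raw index would be compared against its distribution under a null model, such as trees simulated under random mixing.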
5.
Martin Lott Andreas Spillner Katharina T Huber Anna Petri Bengt Oxelman Vincent Moulton 《BMC evolutionary biology》2009,9(1):216
Background
Gene trees that arise in the context of reconstructing the evolutionary history of polyploid species are often multiply-labeled, that is, the same leaf label can occur several times in a single tree. This property considerably complicates the task of forming a consensus of a collection of such trees compared to usual phylogenetic trees.
6.
Marcussen T Jakobsen KS Danihelka J Ballard HE Blaxland K Brysting AK Oxelman B 《Systematic biology》2012,61(1):107-126
The phylogenies of allopolyploids take the shape of networks and cannot be adequately represented as bifurcating trees. Especially for high polyploids (i.e., organisms with more than six sets of nuclear chromosomes), the signatures of gene homoeolog loss, deep coalescence, and polyploidy may become confounded, with the result that gene trees may be congruent with more than one species network. Herein, we obtained the most parsimonious species network by objective comparison of competing scenarios involving polyploidization and homoeolog loss in a high-polyploid lineage of violets (Viola, Violaceae) mostly or entirely restricted to North America, Central America, or Hawaii. We amplified homoeologs of the low-copy nuclear gene, glucose-6-phosphate isomerase (GPI), by single-molecule polymerase chain reaction (PCR) and the chloroplast trnL-F region by conventional PCR for 51 species and subspecies. Topological incongruence among GPI homoeolog subclades, owing to deep coalescence and two instances of putative loss (or lack of detection) of homoeologs, were reconciled by applying the maximum tree topology for each subclade. The most parsimonious species network and the fossil-based calibration of the homoeolog tree favored monophyly of the high polyploids, which has resulted from allodecaploidization 9-14 Ma, involving sympatric ancestors from the extant Viola sections Chamaemelanium (diploid), Plagiostigma (paleotetraploid), and Viola (paleotetraploid). Although two of the high-polyploid lineages (Boreali-Americanae, Pedatae) remained decaploid, recurrent polyploidization with tetraploids of section Plagiostigma within the last 5 Ma has resulted in two 14-ploid lineages (Mexicanae, Nosphinium) and one 18-ploid lineage (Langsdorffianae). 
This implies a more complex phylogenetic and biogeographic origin of the Hawaiian violets (Nosphinium) than that previously inferred from rDNA data and illustrates the necessity of considering polyploidy in phylogenetic and biogeographic reconstruction.
7.
Inferring species membership using DNA sequences with back-propagation neural networks (cited 4 times in total: 0 self-citations, 4 by others)
DNA barcoding as a method for species identification is rapidly increasing in popularity. However, there are still relatively few rigorous methodological tests of DNA barcoding. Current distance-based methods are frequently criticized for treating the nearest neighbor as the closest relative via a raw similarity score, lacking an objective set of criteria to delineate taxa, or for being incongruent with classical character-based taxonomy. Here, we propose an artificial intelligence-based approach - inferring species membership via DNA barcoding with back-propagation neural networks (named BP-based species identification) - as a new advance to the spectrum of available methods. We demonstrate the value of this approach with simulated data sets representing different levels of sequence variation under coalescent simulations with various evolutionary models, as well as with two empirical data sets of COI sequences from East Asian ground beetles (Carabidae) and Costa Rican skipper butterflies. With a 630- to 690-bp fragment of the COI gene, we identified 97.50% of 80 unknown sequences of ground beetles, and 95.63%, 96.10%, and 100% of 275, 205, and 9 unknown sequences of the neotropical skipper butterfly, respectively, to their correct species. Our simulation studies indicate that the success rates of species identification depend on the divergence of sequences, the length of sequences, and the number of reference sequences. Particularly in cases involving incomplete lineage sorting, this new BP-based method appears to be superior to commonly used methods for DNA-based species identification.
8.
Background
Modern approaches to treating genetic disorders, cancers and even epidemics rely on a detailed understanding of the underlying gene signaling network. Previous work has used time series microarray data to infer gene signaling networks given a large number of accurate time series samples. Microarray data available for many biological experiments is limited to a small number of arrays with little or no time series guarantees. When several samples are averaged to examine differences in mean value between a diseased and normal state, information from individual samples that could indicate a gene relationship can be lost.
Results
Asynchronous Inference of Regulatory Networks (AIRnet) provides gene signaling network inference using more practical assumptions about the microarray data. By learning correlation patterns for the changes in microarray values from all pairs of samples, accurate network reconstructions can be performed with data that is normally available in microarray experiments.
Conclusions
By focussing on the changes between microarray samples, instead of absolute values, increased information can be gleaned from expression data.
9.
Korbinian Strimmer Andrew Rambaut 《Proceedings. Biological sciences / The Royal Society》2002,269(1487):137-142
The problem of inferring confidence sets of gene trees is discussed without assuming that the substitution model or the branching pattern of any of the investigated trees is correct. In this case, widely used methods to compare genealogies can give highly contradicting results. Here, three methods to infer confidence sets that are robust against model misspecification are compared, including a new approach based on estimating the confidence in a specific tree using expected-likelihood weights. The power of the investigated methods is studied by analysing HIV-1 and mtDNA sequence data as well as simulated sequences. Finally, guidelines for choosing an appropriate method to compare multiple gene trees are provided.
10.
A major goal in the study of complex traits is to decipher the causal interrelationships among correlated phenotypes. Current methods mostly yield undirected networks that connect phenotypes without causal orientation. Some of these connections may be spurious due to partial correlation that is not causal. We show how to build causal direction into an undirected network of phenotypes by including causal QTL for each phenotype. We evaluate causal direction for each edge connecting two phenotypes, using a LOD score. This new approach can be applied to many different population structures, including inbred and outbred crosses as well as natural populations, and can accommodate feedback loops. We assess its performance in simulation studies and show that our method recovers network edges and infers causal direction correctly at a high rate. Finally, we illustrate our method with an example involving gene expression and metabolite traits from experimental crosses.
11.
12.
13.
An approach is presented for computing meaningful pathways in the network of small molecule metabolism comprising the chemical reactions characterized in all organisms. The metabolic network is described as a weighted graph in which all the compounds are included, but each compound is assigned a weight equal to the number of reactions in which it participates. Path finding is performed in this graph by searching for one or more paths with lowest weight. Performance is evaluated systematically by computing paths between the first and last reactions in annotated metabolic pathways, and comparing the intermediate reactions in the computed pathways to those in the annotated ones. For the sake of comparison, paths are computed also in the un-weighted raw (all compounds and reactions) and filtered (highly connected pool metabolites removed) metabolic graphs, respectively. The correspondence between the computed and annotated pathways is very poor (<30%) in the raw graph; increasing to approximately 65% in the filtered graph; reaching approximately 85% in the weighted graph. Considering the best-matching path among the five lightest paths increases the correspondence to 92%, on average. We then show that the average distance between pairs of metabolites is significantly larger in the weighted graph than in the raw unfiltered graph, suggesting that the small-world properties previously reported for metabolic networks probably result from irrelevant shortcuts through pool metabolites. In addition, we provide evidence that the length of the shortest path in the weighted graph represents a valid measure of the "metabolic distance" between enzymes. We suggest that the success of our simplistic approach is rooted in the high degree of specificity of the reactions in metabolic pathways, presumably reflecting thermodynamic constraints operating in these pathways. We expect our approach to find useful applications in inferring metabolic pathways in newly sequenced genomes.
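The weighting scheme described above can be sketched with a standard Dijkstra search over a compound/reaction graph. This is only an illustration under stated assumptions: the toy reactions, the directed substrate-to-product edges, the zero weight on reaction nodes, and the names `build_graph` and `lightest_path` are all invented for this example and are not the authors' implementation.

```python
import heapq
from collections import defaultdict


def build_graph(reactions):
    """Build a directed graph substrate -> reaction -> product.

    reactions maps a reaction name to (substrates, products).
    Each compound's weight is the number of reactions it participates in.
    """
    adjacency = defaultdict(set)
    weight = defaultdict(int)
    for rxn, (substrates, products) in reactions.items():
        for compound in set(substrates) | set(products):
            weight[compound] += 1
        for s in substrates:
            adjacency[s].add(rxn)
        for p in products:
            adjacency[rxn].add(p)
    return adjacency, weight


def lightest_path(reactions, source, target):
    """Return (cost, path) minimising the summed weights of compounds entered.

    Reaction nodes are given weight 0 (an assumption of this sketch).
    Returns None if target cannot be reached from source.
    """
    adjacency, weight = build_graph(reactions)
    best = {source: 0}
    heap = [(0, source, [source])]
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt in adjacency.get(node, ()):
            new_cost = cost + weight.get(nxt, 0)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, path + [nxt]))
    return None


# Hypothetical toy network; ATP and ADP act as highly connected pool metabolites.
reactions = {
    "R1": (["A", "ATP"], ["B", "ADP"]),
    "R2": (["B"], ["C"]),
    "R3": (["C", "ATP"], ["D", "ADP"]),
    "R4": (["ADP"], ["ATP"]),
}
print(lightest_path(reactions, "R1", "R3"))
```

In this toy network the route through B and C (each in two reactions) is lighter than the route through ADP and ATP (each in three reactions), which is the effect the weighting is meant to produce.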
15.
Noman N Iba H 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(4):634-647
We present a memetic algorithm for evolving the structure of biomolecular interactions and inferring the effective kinetic parameters from the time series data of gene expression using the decoupled S-system formalism. We propose an Information Criteria based fitness evaluation for gene network model selection instead of the conventional Mean Squared Error (MSE) based fitness evaluation. A hill-climbing local-search method has been incorporated in our evolutionary algorithm for efficiently attaining the skeletal architecture which is most frequently observed in biological networks. The suitability of the method is tested in gene circuit reconstruction experiments, varying the network dimension and/or characteristics, the amount of gene expression data used for inference and the noise level present in expression profiles. The reconstruction method inferred the network topology and the regulatory parameters with high accuracy. Nevertheless, the performance is limited by the amount of expression data used and the noise level present in the data. The proposed fitness function has been found more suitable for identifying correct network topology and for estimating the accurate parameter values compared to the existing ones. Finally, we applied the methodology for analyzing the cell-cycle gene expression data of budding yeast and reconstructed the network of some key regulators.
16.
Recently a state-space model with time delays for inferring gene regulatory networks was proposed. It was assumed that each regulation between two internal state variables had multiple time delays. This assumption caused underestimation of the model with many current gene expression datasets. In biological reality, one regulatory relationship may have just a single time delay, and not multiple time delays. This study employs Boolean variables to capture the existence of the time-delayed regulatory relationships in gene regulatory networks in terms of the state-space model. As the solution space of time delayed relationships is too large for an exhaustive search, a genetic algorithm (GA) is proposed to determine the optimal Boolean variables (the optimal time-delayed regulatory relationships). Coupled with the proposed GA, Bayesian information criterion (BIC) and probabilistic principal component analysis (PPCA) are employed to infer gene regulatory networks with time delays. Computational experiments are performed on two real gene expression datasets. The results show that the GA is effective at finding time-delayed regulatory relationships. Moreover, the inferred gene regulatory networks with time delays from the datasets improve the prediction accuracy and possess more of the expected properties of a real network, compared to a gene regulatory network without time delays.
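For readers unfamiliar with the criterion named above, the sketch below shows the standard BIC formula and how it can rank candidate models. How the paper obtains the log-likelihood (via PPCA and the state-space model) is not reproduced here; the candidate names and numbers are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_model(candidates, n_obs):
    """Pick the candidate with the lowest BIC.

    candidates maps a model name to (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_obs))


# Hypothetical candidates: the larger model fits slightly better but pays a
# heavier complexity penalty.
candidates = {
    "single_delay": (-100.0, 5),
    "multiple_delays": (-99.0, 10),
}
chosen = select_model(candidates, n_obs=50)
```

A search procedure such as a genetic algorithm could use a score like this as its fitness function, so that extra regulatory links are kept only when they improve the fit enough to offset the penalty.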
17.
18.
MOTIVATION: Microarray gene expression data has increasingly become the common data source that can provide insights into biological processes at a system-wide level. One of the major problems with microarrays is that a dataset consists of relatively few time points with respect to a large number of genes, which makes the problem of inferring gene regulatory networks an ill-posed one. On the other hand, gene expression data generated by different groups worldwide are increasingly accumulated on many species and can be accessed from public databases or individual websites, although each experiment has only a limited number of time-points. RESULTS: This paper proposes a novel method to combine multiple time-course microarray datasets from different conditions for inferring gene regulatory networks. The proposed method is called GNR (Gene Network Reconstruction tool) which is based on linear programming and a decomposition procedure. The method theoretically ensures the derivation of the most consistent network structure with respect to all of the datasets, thereby not only significantly alleviating the problem of data scarcity but also remarkably improving the prediction reliability. We tested GNR using both simulated data and experimental data in yeast and Arabidopsis. The result demonstrates the effectiveness of GNR in terms of predicting new gene regulatory relationships in yeast and Arabidopsis. AVAILABILITY: The software is available from http://zhangorup.aporc.org/bioinfo/grninfer/, http://digbio.missouri.edu/grninfer/ and http://intelligent.eic.osaka-sandai.ac.jp or upon request from the authors.
19.
Robert G. Beiko 《Biology & philosophy》2010,25(4):659-673
Frequent lateral genetic transfer undermines the existence of a unique “tree of life” that relates all organisms. Vertical
inheritance is nonetheless of vital interest in the study of microbial evolution, and knowing the “tree of cells” can yield
insights into ecological continuity, the rates of change of different cellular characters, and the evolutionary plasticity
of genomes. Notwithstanding within-species recombination, the relationships most frequently recovered from genomic data at
shallow to moderate taxonomic depths are likely to reflect cellular inheritance. At the same time, it is clear that several
types of ‘average signals’ from whole genomes can be highly misleading, and the existence of a central tendency must not be
taken as prima facie evidence of vertical descent. Phylogenetic networks offer an attractive solution, since they can be formulated in ways that
mitigate the misleading aspects of hybrid evolutionary signals in genomes. But the connections in a network typically show
genetic relatedness without distinguishing between vertical and lateral inheritance of genetic material. The solution may
lie in a compromise between strict tree-thinking and network paradigms: build a phylogenetic network, but identify the set
of connections in the network that are potentially due to vertical descent. Even if a single tree cannot be unambiguously
identified, choosing a subnetwork of putative vertical connections can still lead to drastic reductions in the set of candidate
vertical hypotheses.