1.
Anup Parikh Eryong Huang Christopher Dinh Blaz Zupan Adam Kuspa Devika Subramanian Gad Shaulsky 《BMC bioinformatics》2010,11(1):163
Background
Identifying candidate genes in genetic networks is important for understanding regulation and biological function. Large gene expression datasets contain relevant information about genetic networks, but mining the data is not a trivial task. Algorithms that infer Bayesian networks from expression data are powerful tools for learning complex genetic networks, since they can incorporate prior knowledge and uncover higher-order dependencies among genes. However, these algorithms are computationally demanding, so novel techniques that allow targeted exploration for discovering new members of known pathways are essential.
2.
3.
Background
Accurate inference of genetic discontinuities between populations is an essential component of intraspecific biodiversity and evolution studies, as well as associative genetics. The most widely used methods to infer population structure are model-based, Bayesian MCMC procedures that minimize Hardy-Weinberg and linkage disequilibrium within subpopulations. These methods are useful, but suffer from large computational requirements and a dependence on modeling assumptions that may not be met in real data sets. Here we describe the development of a new approach, PCO-MC, which couples principal coordinate analysis to a clustering procedure for the inference of population structure from multilocus genotype data.
Methodology/Principal Findings
PCO-MC uses data from all principal coordinate axes simultaneously to calculate a multidimensional “density landscape”, from which the number of subpopulations, and the membership within subpopulations, are determined using a valley-seeking algorithm. Using extensive simulations, we show that this approach outperforms a Bayesian MCMC procedure when many loci (e.g. 100) are sampled, but that the Bayesian procedure is marginally superior with few loci (e.g. 10). When presented with sufficient data, PCO-MC accurately delineated subpopulations with population Fst values as low as 0.03 (G''st>0.2), whereas the limit of resolution of the Bayesian approach was Fst = 0.05 (G''st>0.35).
Conclusions/Significance
We draw a distinction between population structure inference for describing biodiversity as opposed to Type I error control in associative genetics. We suggest that discrete assignments, like those produced by PCO-MC, are appropriate for circumscribing units of biodiversity whereas expression of population structure as a continuous variable is more useful for case-control correction in structured association studies.
4.
Inference of gene pathways using mixture Bayesian networks (total citations: 1, self-citations: 0, citations by others: 1)
Background
Inference of gene networks typically relies on measurements across a wide range of conditions or treatments. Although one network structure is predicted, the relationship between genes could vary across conditions. A comprehensive approach to infer general and condition-dependent gene networks was evaluated. This approach integrated Bayesian network and Gaussian mixture models to describe continuous microarray gene expression measurements, and three gene networks were predicted.
5.
Background
The inference of the hidden structure of a population is an essential issue in population genetics. Recently, several methods have been proposed to infer population structure in population genetics.
Methods
In this study, a new method to infer the number of clusters and to assign individuals to the inferred populations is proposed. This approach does not make any assumption on Hardy-Weinberg and linkage equilibrium. The implemented criterion is the maximisation (via a simulated annealing algorithm) of the averaged genetic distance between a predefined number of clusters. The performance of this method is compared with two Bayesian approaches: STRUCTURE and BAPS, using simulated data and also a real human data set.
Results
The simulations show that with a reduced number of markers, BAPS overestimates the number of clusters and presents a reduced proportion of correct groupings. The accuracy of the new method is approximately the same as for STRUCTURE. Also, in Hardy-Weinberg and linkage disequilibrium cases, BAPS performs incorrectly. In these situations, STRUCTURE and the new method show an equivalent behaviour with respect to the number of inferred clusters, although the proportion of correct groupings is slightly better with the new method. Re-establishing equilibrium with the randomisation procedures improves the precision of the Bayesian approaches. All methods have a good precision for FST ≥ 0.03, but only STRUCTURE estimates the correct number of clusters for FST as low as 0.01. In situations with a high number of clusters or a more complex population structure, MGD performs better than STRUCTURE and BAPS. The results for a human data set analysed with the new method are congruent with the geographical regions previously found.
Conclusion
This new method for inferring the hidden structure in a population, based on the maximisation of the genetic distance and making no assumption about Hardy-Weinberg and linkage equilibrium, performs well under different simulated scenarios and with real data. Therefore, it could be a useful tool to determine genetically homogeneous groups, especially in those situations where the number of clusters is high, the population structure is complex, and Hardy-Weinberg and/or linkage disequilibrium are present.
6.
Background
One of the main aims of molecular biology is to learn how molecular components interact with each other and to understand how gene function is regulated. Microarray technology makes it possible to measure thousands of genes in a single analysis step, giving a picture of the cell's gene expression. Several methods have been developed to infer gene networks from steady-state data, but much less has been published on time-course data, so developing algorithms to infer gene networks from time-series measurements is a current challenge in bioinformatics research. To detect dependencies between genes at different time delays, we propose an approach to infer gene regulatory networks from time-series measurements, starting from a well-known algorithm based on information theory.
7.
Background
The statistical modeling of biomedical corpora could yield integrated, coarse-to-fine views of biological phenomena that complement discoveries made from analysis of molecular sequence and profiling data. Here, the potential of such modeling is demonstrated by examining the 5,225 free-text items in the Caenorhabditis Genetic Center (CGC) Bibliography using techniques from statistical information retrieval. Items in the CGC biomedical text corpus were modeled using the Latent Dirichlet Allocation (LDA) model. LDA is a hierarchical Bayesian model which represents a document as a random mixture over latent topics; each topic is characterized by a distribution over words.
8.
Fernando Alda Tania Gaitero Mónica Suárez Tomás Merchán Gregorio Rocha Ignacio Doadrio 《BMC evolutionary biology》2010,10(1):347
Background
Rabbit haemorrhagic disease virus (RHDV) is a highly virulent calicivirus, first described in domestic rabbits in China in 1984. RHDV appears to be a mutant form of a benign virus that existed in Europe long before the first outbreak. In the Iberian Peninsula, the first epidemic in 1988 severely reduced the populations of autochthonous European wild rabbit. To examine the evolutionary history of RHDV in the Iberian Peninsula, we collected virus samples from wild rabbits and sequenced a fragment of the capsid protein gene VP60. These data, together with available sequences from other Western European countries, were analyzed using Bayesian Markov chain Monte Carlo methods to infer their phylogenetic relationships, evolutionary rates and demographic history.
9.
Background
Evolutionary biologists have so far largely treated the testis as a black box with a certain size, a matching resource demand and a resulting sperm output. A better understanding of the way that the testis responds to selection may come from recent developments in theoretical biology aimed at understanding the factors that influence the evolution of tissue architecture (i.e. the logical organisation of a tissue). Here we perform a comparative analysis of aspects of testicular architecture of the fruit fly family Drosophilidae. Specifically, we collect published information on the number of first (or primary) spermatocytes in spermatogenesis, which allows us to infer an important aspect of testicular architecture.
10.
11.
Background
MTML-msBayes uses hierarchical approximate Bayesian computation (HABC) under a coalescent model to infer temporal patterns of divergence and gene flow across codistributed taxon-pairs. Under a model of multiple codistributed taxa that diverge into taxon-pairs with subsequent gene flow or isolation, one can estimate hyper-parameters that quantify the mean and variability in divergence times or test models of migration and isolation. The software uses multi-locus DNA sequence data collected from multiple taxon-pairs and allows variation across taxa in demographic parameters as well as heterogeneity in DNA mutation rates across loci. The method also allows a flexible sampling scheme: different numbers of loci of varying length can be sampled from different taxon-pairs.
12.
Background
Genetic disease studies investigate relationships between changes in chromosomes and genetic diseases. Single haplotypes provide useful information for these studies but extracting single haplotypes directly by biochemical methods is expensive. A computational method to infer haplotypes from genotype data is therefore important. We investigate the problem of computing the minimum number of recombination events for general pedigrees with a small number of sites for all members.
13.
Background
Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Previously, we have developed the iterative Bayesian Model Averaging (BMA) algorithm for use in classification. Here, we extend the iterative BMA algorithm for application to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis) using a small number of selected genes. Our multivariate procedure combines the effectiveness of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes.
14.
Molecular evolutionary rates predict both extinction and speciation in temperate angiosperm lineages
Lesley T Lancaster 《BMC evolutionary biology》2010,10(1):162
Background
A positive relationship between diversification (i.e., speciation) and nucleotide substitution rates is commonly reported for angiosperm clades. However, the underlying cause of this relationship is often unknown because multiple intrinsic and extrinsic factors can affect the relationship, and these have confounded previous attempts to infer causation. Determining which factor drives this oft-reported correlation can lend insight into the macroevolutionary process.
15.
Background
Protein domains can be viewed as portable units of biological function that define the functional properties of proteins. Therefore, if a protein is associated with a disease, protein domains might also be associated and define disease endophenotypes. However, knowledge about such domain-disease relationships is rarely available. Thus, identification of domains associated with human diseases would greatly improve our understanding of the mechanism of human complex diseases and further improve the prevention, diagnosis and treatment of these diseases.
Methods
Based on phenotypic similarities among diseases, we first group diseases into overlapping modules. We then develop a framework to infer associations between domains and diseases through known relationships between diseases and modules, domains and proteins, as well as proteins and disease modules. Different methods including Association, Maximum likelihood estimation (MLE), Domain-disease pair exclusion analysis (DPEA), Bayesian, and Parsimonious explanation (PE) approaches are developed to predict domain-disease associations.
Results
We demonstrate the effectiveness of all five approaches via a series of validation experiments, and show the robustness of the MLE, Bayesian and PE approaches to the parameters involved. We also study the effects of disease modularization in inferring novel domain-disease associations. Through validation, the AUC (area under the receiver operating characteristic curve) scores for the Bayesian, MLE, DPEA, PE, and Association approaches are 0.86, 0.84, 0.83, 0.83 and 0.79, respectively, indicating the usefulness of these approaches for predicting domain-disease relationships. Finally, we choose the Bayesian approach to infer domains associated with two common diseases, Crohn’s disease and type 2 diabetes.
Conclusions
The Bayesian approach has the best performance for the inference of domain-disease relationships. The predicted landscape between domains and diseases provides a more detailed view about the disease mechanisms.
16.
Background
The primary objective of this study is to reconstruct the phylogeny of the hentzi species group and sister species in the North American tarantula genus, Aphonopelma, using a set of mitochondrial DNA markers that include the animal “barcoding gene”. An mtDNA genealogy is used to consider questions regarding species boundary delimitation and to evaluate timing of divergence to infer historical biogeographic events that played a role in shaping the present-day diversity and distribution. We aimed to identify potential refugial locations, directionality of range expansion, and test whether A. hentzi post-glacial expansion fit a predicted time frame.
Methods and Findings
A Bayesian phylogenetic approach was used to analyze a 2051 base pair (bp) mtDNA data matrix comprising aligned fragments of the gene regions CO1 (1165 bp) and ND1-16S (886 bp). Multiple species delimitation techniques (DNA tree-based methods, a “barcode gap” using percent of pairwise sequence divergence (uncorrected p-distances), and the GMYC method) consistently recognized a number of divergent and genealogically exclusive groups.
Conclusions
The use of numerous species delimitation methods in concert provides an effective approach to dissecting species boundaries in this spider group; these methods also seem to provide strong evidence for a number of nominal, previously undiscovered, and cryptic species. Our data also indicate that Pleistocene habitat fragmentation and subsequent range expansion events may have shaped contemporary phylogeographic patterns of Aphonopelma diversity in the southwestern United States, particularly for the A. hentzi species group. These findings indicate that future species delimitation approaches need to be analyzed in the context of a number of factors, such as the sampling distribution, loci used, biogeographic history, breadth of morphological variation, ecological factors, and behavioral data, to make truly integrative decisions about what constitutes an evolutionary lineage recognized as a “species”.
17.
18.
19.