Landscape genetics, an emerging field integrating landscape ecology and population genetics, has great potential to influence our understanding of habitat connectivity and distribution of organisms. Whereas typical population genetics studies summarize gene flow as pairwise measures between sampling localities, landscape characteristics that influence population genetic connectivity are often continuously distributed in space. Thus, there are currently gaps in both the ability to analyze genotypic data in a continuous spatial context and our knowledge of expected landscape genetic structure under varying conditions. We present a framework for generating continuous “genetic surfaces”, evaluate their statistical properties, and quantify statistical behavior of landscape genetic structure in a simple landscape. We simulated microsatellite genotypes under varying parameters (time since vicariance, migration, effective population size) and used ancestry (q) values from STRUCTURE to interpolate a genetic surface. Using a spatially adjusted Pearson's correlation coefficient to test the significance of landscape variable(s) on genetic structure, we were able to detect landscape genetic structure on a contemporary time scale (≥5 generations post vicariance, migration probability ≤0.10) even when population differentiation was minimal (FST≥0.00015). We show that genetic variation can be significantly correlated with geographic distance even when genetic structure is due to landscape variable(s), demonstrating the importance of testing landscape influence on genetic structure. Finally, we apply genetic surfacing to analyze an empirical dataset of black bears from northern Idaho, USA. We find black bear genetic variation is a function of distance (autocorrelation) and habitat patch (spatial dependency), consistent with previous results indicating genetic variation was influenced by landscape resistance. These results suggest genetic surfaces can be used to test competing hypotheses of the influence of landscape characteristics on genetic structure without delineation of categorical groups.



Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acids are available in public databases. Finding correlations in and between these data to answer critical research questions is extremely challenging. This problem needs to be approached from several directions: information science to organize and search the data; information visualization to assist in recognizing correlations; mathematics to formulate statistical inferences; and biology to analyze chemical and physical properties in terms of sequence and structure changes.

Phylogeny reconstruction is a difficult computational problem, because the number of possible solutions increases with the number of included taxa. For example, for only 14 taxa, there are more than seven trillion possible unrooted phylogenetic trees. For this reason, phylogenetic inference methods commonly use clustering algorithms (e.g., the neighbor-joining method) or heuristic search strategies to minimize the amount of time spent evaluating nonoptimal trees. Even heuristic searches can be painfully slow, especially when computationally intensive optimality criteria such as maximum likelihood are used. I describe here a different approach to heuristic searching (using a genetic algorithm) that can tremendously reduce the time required for maximum-likelihood phylogenetic inference, especially for data sets involving large numbers of taxa. Genetic algorithms are simulations of natural selection in which individuals are encoded solutions to the problem of interest. Here, labeled phylogenetic trees are the individuals, and differential reproduction is effected by allowing the number of offspring produced by each individual to be proportional to that individual's rank likelihood score. Natural selection increases the average likelihood in the evolving population of phylogenetic trees, and the genetic algorithm is allowed to proceed until the likelihood of the best individual ceases to improve over time. An example is presented involving rbcL sequence data for 55 taxa of green plants. The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.
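The growth in the number of candidate trees can be illustrated with the standard double-factorial counts for strictly bifurcating, labeled trees: (2n-5)!! unrooted and (2n-3)!! rooted topologies for n taxa. This is a minimal sketch that is not taken from the article, and the function names are hypothetical. Under these formulas, 14 taxa give about 3.2 × 10^11 unrooted and about 7.9 × 10^12 rooted topologies.

```python
def odd_double_factorial(k: int) -> int:
    """Return 1 * 3 * 5 * ... * k for odd k (1 when k <= 1)."""
    result = 1
    for value in range(3, k + 1, 2):
        result *= value
    return result


def count_unrooted_trees(n_taxa: int) -> int:
    """Number of strictly bifurcating unrooted labeled trees: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    return odd_double_factorial(2 * n_taxa - 5)


def count_rooted_trees(n_taxa: int) -> int:
    """Number of strictly bifurcating rooted labeled trees: (2n - 3)!!."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa for a rooted tree")
    return odd_double_factorial(2 * n_taxa - 3)


if __name__ == "__main__":
    for n in (4, 10, 14, 55):
        print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

Because Python integers have arbitrary precision, the same functions also work for the 55-taxon case mentioned above, where exhaustive evaluation is clearly out of reach.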

Mester D, Ronin Y, Minkov D, Nevo E, Korol A. Genetics 2003, 165(4): 2269-2282
This article is devoted to the problem of ordering in linkage groups with many dozens or even hundreds of markers. The ordering problem belongs to the field of discrete optimization on a set of all possible orders, amounting to n!/2 for n loci; hence it is considered an NP-hard problem. Several authors attempted to employ the methods developed in the well-known traveling salesman problem (TSP) for multilocus ordering, using the assumption that for a set of linked loci the true order will be the one that minimizes the total length of the linkage group. A novel, fast, and reliable algorithm developed for the TSP and based on evolution-strategy discrete optimization was applied in this study for multilocus ordering on the basis of pairwise recombination frequencies. The quality of derived maps under various complications (dominant vs. codominant markers, marker misclassification, negative and positive interference, and missing data) was analyzed using simulated data with approximately 50-400 markers. High performance of the employed algorithm allows systematic treatment of the problem of verification of the obtained multilocus orders on the basis of computing-intensive bootstrap and/or jackknife approaches for detecting and removing questionable marker scores, thereby stabilizing the resulting maps. Parallel calculation technology can easily be adopted for further acceleration of the proposed algorithm. Real data analysis (on maize chromosome 1 with 230 markers) is provided to illustrate the proposed methodology.
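The ordering criterion described above (choose the order that minimizes the total length of the linkage group) can be sketched with an exhaustive search over the n!/2 distinct orders of a very small marker set. This is only an illustration of the objective, not the evolution-strategy algorithm of the article. The marker positions, the noise-free Haldane map function used to generate pairwise recombination fractions, and all function names are assumptions made for the example.

```python
import math
from itertools import permutations

# Hypothetical true positions (in cM) of five markers, labeled 0..4.
POSITIONS_CM = [30.0, 0.0, 55.0, 12.0, 41.0]


def haldane_rf(distance_cm: float) -> float:
    """Recombination fraction under Haldane's map function (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def pairwise_rf(positions):
    """Matrix of pairwise recombination fractions (noise-free, for illustration)."""
    return [[haldane_rf(abs(a - b)) for b in positions] for a in positions]


def map_length(order, rf):
    """Sum of recombination fractions between adjacent markers in the order."""
    return sum(rf[a][b] for a, b in zip(order, order[1:]))


def best_order(rf):
    """Exhaustively search all n!/2 orders; an order and its mirror are one map."""
    best_len, best, evaluated = None, None, 0
    for order in permutations(range(len(rf))):
        if order[0] > order[-1]:
            continue  # skip the mirror image of an order already considered
        evaluated += 1
        length = map_length(order, rf)
        if best_len is None or length < best_len:
            best_len, best = length, order
    return best, best_len, evaluated


if __name__ == "__main__":
    order, length, evaluated = best_order(pairwise_rf(POSITIONS_CM))
    print(order, round(length, 4), evaluated)
```

With five markers the search evaluates 5!/2 = 60 orders; the factorial growth of this count is what makes heuristic optimizers necessary for maps with hundreds of markers.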


Quantitative dynamical models facilitate the understanding of biological processes and the prediction of their dynamics. These models usually comprise unknown parameters, which have to be inferred from experimental data. For quantitative experimental data, there are several methods and software tools available. However, for qualitative data the available approaches are limited and computationally demanding. Here, we consider the optimal scaling method, which has been developed in statistics for categorical data and has been applied to dynamical systems. This approach turns qualitative variables into quantitative ones, accounting for constraints on their relation. We derive a reduced formulation for the optimization problem defining the optimal scaling. The reduced formulation possesses the same optimal points as the established formulation but requires fewer degrees of freedom. Parameter estimation for dynamical models of cellular pathways revealed that the reduced formulation improves the robustness and convergence of optimizers. This resulted in substantially reduced computation times. We implemented the proposed approach in the open-source Python Parameter EStimation TOolbox (pyPESTO) to facilitate reuse and extension. The proposed approach enables efficient parameterization of quantitative dynamical models using qualitative data.


In a previous paper (Klotz et al., 1979) we described a method for determining evolutionary trees from sequence data when rates of evolution of the sequences might differ greatly. It was shown theoretically that the method always gave the correct topology and root when the exact number of mutation differences between sequences and from their common ancestor was known. However, the method is impractical to use in most situations because it requires some knowledge of the ancestor. In the present paper we describe another method, related to the previous one, in which a present-day sequence can serve temporarily as an ancestor for purposes of determining the evolutionary tree regardless of the rates of evolution of the sequences involved. This new method can be carried out with high precision without the aid of a computer, and, unlike other methods, it does not increase rapidly in difficulty as the number of sequences involved in the study increases.

proseq is an integrated, user‐friendly, Windows-based program for convenient sequence editing and evolutionary analysis. It is designed to simplify preparation and analysis of DNA sequence data sets in population genetic, phylogenetic and molecular ecology studies. Sequence editor features include editing of chromatogram files, contig assembly, sequence alignment, translation and other utilities. Analysis features include calculation of genetic diversity, divergence, population subdivision and gene flow with permutation‐based significance testing and various tests of neutrality. A tool for coalescent simulations implements models with intragenic recombination, population subdivision and population growth.

Johnson DS, Hoeting JA. Biometrics 2003, 59(2): 341-350
In this article, we incorporate an autoregressive time-series framework into models for animal survival using capture-recapture data. Researchers modeling animal survival probabilities as the realization of a random process have typically considered survival to be independent from one time period to the next. This may not be realistic for some populations. Using a Gibbs sampling approach, we can estimate covariate coefficients and autoregressive parameters for survival models. The procedure is illustrated with a waterfowl band recovery dataset for northern pintails (Anas acuta). The analysis shows that the second lag autoregressive coefficient is significantly less than 0, suggesting that there is a triennial relationship between survival probabilities and emphasizing that modeling survival rates as independent random variables may be unrealistic in some cases. Software to implement the methodology is available at no charge on the Internet.

SUMMARY: Inferring genetic network architecture from time series data generated from high-throughput experimental technologies, such as cDNA microarray, can help us to understand the system behavior of living organisms. We have developed an interactive tool, GeneNetwork, which provides four reverse engineering models and three data interpolation approaches to infer relationships between genes. GeneNetwork enables a user to readily reconstruct genetic networks based on microarray data without having intimate knowledge of the mathematical models. A simple graphical user interface enables rapid, intuitive mapping and analysis of the reconstructed network, allowing biologists to explore gene relationships at the system level. AVAILABILITY: Download from http://genenetwork.sbl.bc.sinica.edu.tw/. SUPPLEMENTARY INFORMATION: Supplementary documentation of algorithms for the four approaches is downloadable at the above location.

Cytokinins are ubiquitous plant hormones; their signal is perceived by sensor histidine kinases, the cytokinin receptors. This review focuses on recent advances in understanding cytokinin receptor structure, in particular the sensing module and adjacent domains, which play an important role in hormone recognition, signal transduction and receptor subcellular localization. Principles of cytokinin binding site organization and point mutations affecting signaling are discussed. To date, more than 100 putative cytokinin receptor genes from different plant species have been identified through whole-genome sequencing. This allowed us to employ evolutionary and bioinformatics approaches to clarify some new aspects of receptor structure and function. Non-transmembrane areas adjacent to the ligand-binding CHASE domain were characterized in detail and new conserved protein motifs were recovered. Putative mechanisms for cytokinin-triggered receptor activation were suggested.

Debate exists over how to incorporate information from multipartite sequence data in phylogenetic analyses. Strict combined-data approaches argue for concatenation of all partitions and estimation of one evolutionary history, maximizing the explanatory power of the data. Consensus/independence approaches endorse a two-step procedure where partitions are analyzed independently and then a consensus is determined from the multiple results. Mixtures across the model space of a strict combined-data approach and a priori independent parameters are popular methods to integrate these methods. We propose an alternative middle ground by constructing a Bayesian hierarchical phylogenetic model. Our hierarchical framework enables researchers to pool information across data partitions to improve estimate precision in individual partitions while permitting estimation and testing of tendencies in across-partition quantities. Such across-partition quantities include the distribution from which individual topologies relating the sequences within a partition are drawn. We propose standard hierarchical priors on continuous evolutionary parameters across partitions, while the structure on topologies varies depending on the research problem. We illustrate our model with three examples. We first explore the evolutionary history of the guinea pig (Cavia porcellus) using alignments of 13 mitochondrial genes. The hierarchical model returns substantially more precise continuous parameter estimates than an independent parameter approach without losing the salient features of the data. Second, we analyze the frequency of horizontal gene transfer using 50 prokaryotic genes. We assume an unknown species-level topology and allow individual gene topologies to differ from this with a small estimable probability. Simultaneously inferring the species and individual gene topologies returns a transfer frequency of 17%. We also examine HIV sequences longitudinally sampled from HIV+ patients. 
We ask whether posttreatment development of CCR5 coreceptor virus represents concerted evolution from middisease CXCR4 virus or reemergence of initial infecting CCR5 virus. The hierarchical model pools partitions from multiple unrelated patients by assuming that the topology for each patient is drawn from a multinomial distribution with unknown probabilities. Preliminary results suggest evolution and not reemergence.

One important aim within systems biology is to integrate disparate pieces of information, leading to discovery of higher-level knowledge about important functionality within living organisms. This makes standards for representation of data and technology for exchange and integration of data key points for development within the area. In this article, we focus on the recent developments within the field. We compare the recent updates to the three standard representations for exchange of data: SBML, PSI MI and BioPAX. In addition, we give an overview of available tools for these three standards and discuss how these developments support possibilities for data exchange and integration.

As the field of genomics matures, more complex genotypes and phenotypes are being studied. Fanconi anemia (FA), for example, is an inherited chromosome instability syndrome with a complex array of variable disease phenotypes including congenital malformations, hematological manifestations, and cancer. To better understand specific aspects of the genetic etiology of FA and other rare diseases with complex phenotypes, it is often necessary to reduce the dimensions of the disease phenotype information. Towards this end, we extend a novel non-parametric approach to include information about a hierarchical structure among disease phenotypes. The proposed extension increases the information content of the phenotype scores obtained and, thereby, the power of genotype-phenotype relationship studies.

MOTIVATION: Analysis of oligonucleotide array data, especially to select genes of interest, is a highly challenging task because of the large volume of information and various experimental factors. Moreover, interaction effects (i.e. expression changes that depend on probe effects) complicate the analysis because current methods often use an additive model to analyze data. We propose an approach to address these issues with the aim of producing a more reliable selection of differentially expressed genes. The approach uses the rank for normalization, employs the percentile-range to measure expression variation, and applies various filters to monitor expression changes. RESULTS: We compare our approach with MAS and Dchip models. A data set from an angiogenesis study is used for illustration. Results show that our approach performs better than other methods either in identification of the positive control gene or in PCR confirmatory tests. In addition, the invariant set of genes in our approach provides an efficient way for normalization.

Evolutionary game dynamics have been proposed as a mathematical framework for the cultural evolution of language and more specifically the evolution of vocabulary. This article discusses a model that is mutually exclusive in its underlying principles with some previously suggested models. The model describes how individuals in a population culturally acquire a vocabulary by actively participating in the acquisition process instead of passively observing, and by communicating through peer-to-peer interactions instead of vertical parent-offspring relations. Concretely, a notion of social/cultural learning called the naming game is first abstracted using learning theory. This abstraction defines the required cultural transmission mechanism for an evolutionary process. Second, the derived transmission system is expressed in terms of the well-known selection-mutation model defined in the context of evolutionary dynamics. In this way, the analogy between social learning and evolution at the level of meaning-word associations is made explicit. Although only horizontal and oblique transmission structures will be considered, extensions to vertical structures over different genetic generations can easily be incorporated. We provide a number of simplified experiments to clarify our reasoning.

In genetics, many evolutionary pathways can be modeled by the ordered accumulation of permanent changes. Mixture models of mutagenetic trees have been used to describe disease progression in cancer and in HIV. In cancer, progression is modeled by the accumulation of chromosomal gains and losses in tumor cells; in HIV, the accumulation of drug resistance-associated mutations in the viral genome is known to be associated with disease progression. From such evolutionary models, genetic progression scores can be derived that assign measures for the disease state to single patients. Rtreemix is an R package for estimating mixture models of evolutionary pathways from observed cross-sectional data and for estimating associated genetic progression scores. The package also provides extended functionality for estimating confidence intervals for estimated model parameters and for evaluating the stability of the estimated evolutionary mixture models.



Clustering techniques are routinely used in gene expression data analysis to organize the massive data. Clustering techniques arrange a large number of genes or assays into a few clusters while maximizing the intra-cluster similarity and inter-cluster separation. While clustering of genes facilitates learning the functions of uncharacterized genes using their association with known genes, clustering of assays reveals the disease stages and subtypes. Many clustering algorithms require the user to specify the number of clusters a priori. A wrong specification of the number of clusters generally leads to either failure to detect novel clusters (disease subtypes) or unnecessary splitting of natural clusters.

Models of nucleotide substitution were constructed for combined analyses of heterogeneous sequence data (such as those of multiple genes) from the same set of species. The models account for different aspects of the heterogeneity in the evolutionary process of different genes, such as differences in nucleotide frequencies, in substitution rate bias (for example, the transition/transversion rate bias), and in the extent of rate variation across sites. Model parameters were estimated by maximum likelihood and the likelihood ratio test was used to test hypotheses concerning sequence evolution, such as rate constancy among lineages (the assumption of a molecular clock) and proportionality of branch lengths for different genes. The example data from a segment of the mitochondrial genome of six hominoid species (human, common and pygmy chimpanzees, gorilla, orangutan, and siamang) were analyzed. Nucleotides at the three codon positions in the protein-coding regions and from the tRNA-coding regions were considered heterogeneous data sets. Statistical tests showed that the amount of evolution in the sequence data reflected in the estimated branch lengths can be explained by the codon-position effect and lineage effect of substitution rates. The assumption of a molecular clock could not be rejected when the data were analyzed separately or when the rate variation among sites was ignored. However, significant differences in substitution rate among lineages were found when the data sets were combined and when the rate variation among sites was accounted for in the models. Under the assumption that the orangutan and African apes diverged 13 million years ago, the combined analysis of the sequence data estimated the times for the human-chimpanzee separation and for the separation of the gorilla as 4.3 and 6.8 million years ago, respectively.
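The likelihood ratio test of a molecular clock mentioned above can be sketched as follows. This is a generic illustration rather than the article's code: the log-likelihood values are hypothetical, the function names are made up for the example, and the sketch assumes the usual asymptotic chi-square approximation with n - 2 degrees of freedom for n species (4 for six species). To stay within the standard library, the chi-square tail probability uses the closed form that holds only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable; valid for even df only.

    For df = 2m the survival function is exp(-x/2) * sum_{k<m} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return the LRT statistic 2 * (lnL_alt - lnL_null) and its p-value."""
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return statistic, chi2_sf_even_df(statistic, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods: clock-constrained vs. unconstrained model.
    stat, p_value = likelihood_ratio_test(lnl_null=-2000.0, lnl_alt=-1994.0, df=4)
    print(f"2*delta_lnL = {stat:.2f}, p = {p_value:.4f}")
```

With these made-up values the clock would be rejected at the 5% level; real analyses would take both log-likelihoods from the fitted substitution models.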

In conservation and management of species it is important to make inferences about gene flow, dispersal and population structure. In this study, we used 613 georeferenced tissue samples from hazel grouse (Bonasa bonasia), where each individual was genotyped at 12 microsatellite loci, to make inferences about population genetic structure, gene flow and dispersal in northern Sweden. Observed levels of genetic diversity suggest that Swedish hazel grouse do not suffer loss of genetic diversity compared with other grouse species. We found significant F(IS) (deviation from Hardy-Weinberg expectations) over the entire sample using jack-knifed estimators over loci, which is most likely explained by a Wahlund effect. With the use of spatial autocorrelation methods, we detected significant isolation by distance among individuals. Neighbourhood size was estimated to be on the order of 62-158 individuals, corresponding to a dispersal distance of 950-1500 m. Using a spatial statistical model for landscape genetics to infer the number of populations and the spatial location of genetic discontinuities between these populations, we found indications that Swedish hazel grouse are divided into a northern and a southern population. We could not find a sharp border between these two populations and none of the observed borders appeared to coincide with any potential geographical barriers. These results imply that gene flow appears somewhat unrestricted in the boreal taiga forests of northern Sweden and that the two populations of hazel grouse in Sweden may be explained by the post-glacial reinvasion history of the Scandinavian Peninsula.
