Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Disease incidence or mortality data are typically available as rates or counts for specified regions, collected over time. We propose Bayesian nonparametric spatial modeling approaches to analyze such data. We develop a hierarchical specification using spatial random effects modeled with a Dirichlet process prior. The Dirichlet process is centered around a multivariate normal distribution. This latter distribution arises from a log-Gaussian process model that provides a latent incidence rate surface, followed by block averaging to the areal units determined by the regions in the study. With regard to the resulting posterior predictive inference, the modeling approach is shown to be equivalent to an approach based on block averaging of a spatial Dirichlet process to obtain a prior probability model for the finite dimensional distribution of the spatial random effects. We introduce a dynamic formulation for the spatial random effects to extend the model to spatio-temporal settings. Posterior inference is implemented through Gibbs sampling. We illustrate the methodology with simulated data as well as with a data set on lung cancer incidences for all 88 counties in the state of Ohio over an observation period of 21 years.

2.
Extinction and quasi-stationarity in the Verhulst logistic model.   Total citations: 7 (self-citations: 0, citations by others: 7)
We formulate and analyse a stochastic version of the Verhulst deterministic model for density-dependent growth of a single population. Three parameter regions with qualitatively different behaviours are identified. Explicit approximations of the quasi-stationary distribution and of the expected time to extinction are presented in each of these regions. The quasi-stationary distribution is approximately normal, and the time to extinction is long, in one of these regions. Another region has a short time to extinction and a quasi-stationary distribution that is approximately truncated geometric. A third region is a transition region between these two. Here the time to extinction is moderately long and the quasi-stationary distribution has a more complicated behaviour. Numerical illustrations are given.
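
To make the idea of a stochastic logistic model and its time to extinction concrete, here is a minimal simulation sketch. It is not the formulation analysed in the article: the rate functions, the parameter names (`birth`, `death`, `capacity`) and the function `simulate_extinction_time` are assumptions chosen for illustration, with density dependence placed only in the birth rate.

```python
import random


def simulate_extinction_time(n0, birth, death, capacity, rng, t_max=1e6):
    """Gillespie-style simulation of a logistic birth-death process.

    Assumed (illustrative) rates in state n:
      birth rate: birth * n * (1 - n / capacity), floored at zero
      death rate: death * n
    Returns (time, final_population); final_population is 0 on extinction.
    """
    n, t = n0, 0.0
    while n > 0 and t < t_max:
        b = max(birth * n * (1 - n / capacity), 0.0)
        d = death * n
        total = b + d
        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        # Choose a birth or a death in proportion to the two rates.
        if rng.random() < b / total:
            n += 1
        else:
            n -= 1
    return t, n


rng = random.Random(0)
t_ext, n_final = simulate_extinction_time(
    n0=5, birth=0.5, death=1.0, capacity=20, rng=rng
)
```

With the death rate above the birth rate, as in this call, extinction is fast; raising `birth` well above `death` would be expected to give much longer extinction times, which is why the loop is capped by `t_max`.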

3.
We developed a time-integrated thermogeographic model to demonstrate conditions under which benthic marine algal assemblages evolve biogeographic patterns in their distribution and abundance. The graphical model applies to rocky marine sublittoral zones in which seasonal temperatures, coastline area, isolation, and evolutionary time are primary factors. Time is treated by using the temperature/area/distributions for the present (interglacial period) integrated with that of 18,000 years before present (glacial period). These two alternate states characterize the global marine realm since the late Pliocene to Pleistocene time during which many extant species have evolved. The resulting abiotic "thermogeographic" model defines 20 regions that correspond with the cores of 24 recognized biogeographic regions and/or provinces determined by published distributions of organisms. Modern biogeographic regions conform closely with thermogeographic regions where temperature, area, and time are integrated. We also propose that biogeographic patterns should be determined by the abundance of species assemblages rather than presence and absence or percent endemism as is commonly done. We test the efficacy of thermogeographic regions with abundance-weighted patterns in the biogeography of crustose coralline red algae (Rhodophyta/Corallinales) in the colder part of the northern hemisphere. Based on abundance, rather than presence/absence, coralline red algal biogeographic regions correspond closely with the model's thermogeographic regions.

4.
Bacterial biodiversity at the species level, in terms of gene acquisition or loss, is so immense that it raises the question of how essential chromosomal regions are spared from uncontrolled rearrangements. Protection of the genome likely depends on specific DNA motifs that impose limits on the regions that undergo recombination. Although most such motifs remain unidentified, they are theoretically predictable based on their genomic distribution properties. We examined the distribution of the “crossover hotspot instigator,” or Chi, in Escherichia coli, and found that its exceptional distribution is restricted to the core genome common to three strains. We then formulated a set of criteria that were incorporated in a statistical model to search core genomes for motifs potentially involved in genome stability in other species. Our strategy led us to identify and biologically validate two distinct heptamers that possess Chi properties, one in Staphylococcus aureus, and the other in several streptococci. This strategy paves the way for wide-scale discovery of other important functional noncoding motifs that distinguish core genomes from the strain-variable regions.

5.
The inference of antigen selection on Ig genes   Total citations: 18 (self-citations: 0, citations by others: 18)
Analysis of somatic mutations in V regions of Ig genes is important for understanding various biological processes. It is customary to estimate Ag selection on Ig genes by assessment of replacement (R) as opposed to silent (S) mutations in the complementarity-determining regions and S as opposed to R mutations in the framework regions. In the past such an evaluation was performed using a binomial distribution model equation, which is inappropriate for Ig genes in which mutations have four different distribution possibilities (R and S mutations in the complementarity-determining region and/or framework regions of the gene). In the present work, we propose a multinomial distribution model for assessment of Ag selection. Side-by-side application of multinomial and binomial models on 86 previously established Ig sequences disclosed 8 discrepancies, leading to opposite statistical conclusions about Ag selection. We suggest the use of the multinomial model for all future analysis of Ag selection.
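
The four-category setup described in this abstract (R and S mutations in the complementarity-determining and framework regions) can be illustrated with a plain multinomial probability. The sketch below only evaluates the multinomial probability mass function; the function name `multinomial_pmf`, the counts and the category probabilities are hypothetical, and it does not reproduce the article's actual test.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` under a multinomial with `probs`."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to 1")
    # Multinomial coefficient n! / (c1! * c2! * ... * ck!), kept as an integer.
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    return coeff * prod(p ** c for p, c in zip(probs, counts))


# Hypothetical example: categories ordered as R-CDR, S-CDR, R-FWR, S-FWR.
observed = [6, 1, 4, 5]
expected_probs = [0.2, 0.1, 0.5, 0.2]  # illustrative values only
p = multinomial_pmf(observed, expected_probs)
```

With two categories the same function reduces to the binomial case, which is one way to see why a two-category model cannot represent all four outcomes at once.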

6.
Overlapping genes are a common phenomenon. Among sequenced prokaryotes, more than 29% of all annotated genes overlap at least 1 of their 2 flanking genes. We present a unified model for the creation and repair of overlaps among adjacent genes where the 3' ends either overlap or nearly overlap. Our model, derived from a comprehensive analysis of complete prokaryotic genomes in GenBank, explains the nonuniform distribution of the lengths of such overlap regions far more simply than previously proposed models. Specifically, we explain the distribution of overlap lengths based on random extensions of genes to the next occurring downstream stop codon. Our model also provides an explanation for a newly observed (here) pattern in the distribution of the separation distances of closely spaced nonoverlapping genes. We provide evidence that the newly described biased distribution of separation distances is driven by the same phenomenon that creates the uneven distribution of overlap lengths. This suggests a dynamic picture of continual overlap creation and elimination.

7.
Predictive habitat distribution models are normally assumed to sacrifice generality for precision and reality. Nevertheless, such models are often applied to predict the distribution of a species outside the area for which the model has been calibrated.
We investigated how the geographic extent of the data used for calibration influenced the performance of habitat distribution models applied on independent data. We took a multi-scale logistic regression approach by varying the grain size to develop six habitat models for capercaillie Tetrao urogallus in Switzerland: three regional models, for the northern Pre-Alps, eastern Central Alps and Jura mountains, respectively, and three pooled models, each using data from two of the three regions. The six models were validated with data from the region(s) not used for model building. We used Cohen's Kappa and the area under the receiver operating characteristics curve as accuracy measures. The regional models performed well in the region where they had been calibrated, but poorly to moderately well in the other regions. The pooled models classified almost as well in their calibration regions as the corresponding regional models, but generally better when validated on data from the independent region. Hence, models built with data from single regions provide less certain predictions of species' distributions in other regions. We recommend building more general models using data pooled from several regions, when the aim is to predict species' distributions in independent regions.
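
The two accuracy measures named in this abstract, Cohen's Kappa and the area under the ROC curve, can be sketched in a few lines. This is a generic textbook-style illustration on made-up presence/absence data; the function names and the example values are assumptions and are not taken from the study.

```python
def cohens_kappa(y_true, y_pred):
    """Agreement between two label lists, corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    labels = set(y_true) | set(y_pred)
    # Chance agreement from the marginal label frequencies of both lists.
    expected = sum(
        (y_true.count(k) / n) * (y_pred.count(k) / n) for k in labels
    )
    return (observed - expected) / (1 - expected)


def roc_auc(y_true, scores):
    """AUC as the fraction of (presence, absence) pairs ranked correctly.

    Ties between a presence score and an absence score count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical validation data: 1 = species present, 0 = absent.
presence = [1, 1, 0, 0]
predicted_class = [1, 0, 0, 0]
predicted_prob = [0.9, 0.3, 0.35, 0.1]

kappa = cohens_kappa(presence, predicted_class)
auc = roc_auc(presence, predicted_prob)
```

Kappa depends on the threshold used to turn probabilities into classes, whereas the AUC is threshold-independent, which is one reason the two are often reported together. Note that `cohens_kappa` as written is undefined (division by zero) when both lists contain a single identical label.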

8.
The post‐glacial migration of European beech Fagus sylvatica has been addressed by many studies using either genetic or fossil data or a combination of both. In contrast to this, only little is known about the migration history of beech forest understorey species. In a review of phytosociological literature, we identified 110 plant species which are closely associated with beech forest. We divided the distribution range of European beech forests into 40 geographical regions, and the presence or absence of each species was recorded for each region. We compared overall species numbers per region and numbers of narrow‐range species (species present in <10 regions). A multiple regression model was used to test for the explanatory value of three potential diversity controls: range in elevation, soil type diversity, and distance to the nearest potential refuge area. A hierarchical cluster analysis of the narrow‐range species was performed. The frequency of range sizes shows a U‐shaped distribution, with 42 species occurring in <10 regions. The highest number of beech forest species is found in the southern Alps and adjacent regions, and species numbers decrease with increasing distance from these regions. With only narrow‐range species taken into consideration, secondary maxima are found in Spain, the southern Apennines, the Carpathians, and Greece. Distance to the nearest potential refuge area is the strongest predictor of beech forest species richness, while altitudinal range and soil type diversity had little or no predictive value. The clusters of narrow‐range species are in good concordance with the glacial refuge areas of beech and other temperate tree species as estimated in recent studies. These findings support the hypothesis that the distribution of many beech forest species is limited by post‐glacial dispersal rather than by their environmental requirements.

9.
Duret L, Marais G, Biémont C. Genetics, 2000, 156(4): 1661-1669
We analyzed the distribution of transposable elements (TEs: transposons, LTR retrotransposons, and non-LTR retrotransposons) in the chromosomes of the nematode Caenorhabditis elegans. The density of transposons (DNA-based elements) along the chromosomes was found to be positively correlated with recombination rate, but this relationship was not observed for LTR or non-LTR retrotransposons (RNA-based elements). Gene (coding region) density is higher in regions of low recombination rate. However, the lower TE density in these regions is not due to the counterselection of TE insertions within exons since the same positive correlation between TE density and recombination rate was found in noncoding regions (both in introns and intergenic DNA). These data are not compatible with a global model of selection acting against TE insertions, for which an accumulation of elements in regions of reduced recombination is expected. We also found no evidence for a stronger selection against TE insertions on the X chromosome compared to the autosomes. The difference in distribution of the DNA and RNA-based elements along the chromosomes in relation to recombination rate can be explained by differences in the transposition processes.

10.
The evolution of isochores: evidence from SNP frequency distributions   Total citations: 4 (self-citations: 0, citations by others: 4)
Lercher MJ, Smith NG, Eyre-Walker A, Hurst LD. Genetics, 2002, 162(4): 1805-1810
The large-scale systematic variation in nucleotide composition along mammalian and avian genomes has been a focus of the debate between neutralist and selectionist views of molecular evolution. Here we test whether the compositional variation is due to mutation bias using two new tests, which do not assume compositional equilibrium. In the first test we assume a standard population genetics model, but in the second we make no assumptions about the underlying population genetics. We apply the tests to single-nucleotide polymorphism data from noncoding regions of the human genome. Both models of neutral mutation bias fit the frequency distributions of SNPs segregating in low- and medium-GC-content regions of the genome adequately, although both suggest compositional nonequilibrium. However, neither model fits the frequency distribution of SNPs from the high-GC-content regions. In contrast, a simple population genetics model that incorporates selection or biased gene conversion cannot be rejected. The results suggest that mutation biases are not solely responsible for the compositional biases found in noncoding regions.

11.
Keleş S. Biometrics, 2007, 63(1): 10-21
Chromatin immunoprecipitation followed by DNA microarray analysis (ChIP-chip methodology) is an efficient way of mapping genome-wide protein-DNA interactions. Data from tiling arrays encompass DNA-protein interaction measurements on thousands or millions of short oligonucleotides (probes) tiling a whole chromosome or genome. We propose a new model-based method for analyzing ChIP-chip data. The proposed model is motivated by the widely used two-component multinomial mixture model of de novo motif finding. It utilizes a hierarchical gamma mixture model of binding intensities while incorporating inherent spatial structure of the data. In this model, genomic regions belong to either one of the following two general groups: regions with a local protein-DNA interaction (peak) and regions lacking this interaction. Individual probes within a genomic region are allowed to have different localization rates accommodating different binding affinities. A novel feature of this model is the incorporation of a distribution for the peak size derived from the experimental design and parameters. This leads to the relaxation of the fixed peak size assumption that is commonly employed when computing a test statistic for these types of spatial data. Simulation studies and a real data application demonstrate good operating characteristics of the method including high sensitivity with small sample sizes when compared to available alternative methods.

12.
Mitochondrial mismatch analysis is insensitive to the mutational process   Total citations: 13 (self-citations: 4, citations by others: 9)
Mismatch distributions are histograms showing the pattern of nucleotide (or restriction) site differences between pairs of individuals in a sample. They can be used to test hypotheses about the history of population size and subdivision (if selective neutrality is assumed) or about selection (if a constant population size is assumed). Previous work has assumed that mutations never strike the same site twice, an assumption that is called the model of infinite sites. Fortunately, the results are surprisingly robust even when this assumption is violated. We show here that (1) confidence regions inferred using the infinite-sites model differ little from those inferred using a model of finite sites with uniform site-specific mutation rates, and (2) even when site-specific mutation rates follow a gamma distribution, confidence regions are little changed until the gamma shape parameter falls well below its plausible range, to roughly 0.01. In addition, we evaluate and reject the proposition that mismatch waves are produced by pooling data from several subdivisions of a structured population.
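
The definition of a mismatch distribution given in this abstract (a histogram of site differences over all pairs of individuals) translates directly into code. The sketch below assumes aligned sequences of equal length; the function name `mismatch_distribution` and the toy sequences are illustrative and not from the article.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Histogram of pairwise site differences among aligned sequences.

    Returns a dict mapping number of differing sites -> number of pairs.
    """
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    )
    return dict(sorted(counts.items()))


# Toy sample of four aligned sequences (hypothetical data).
sample = ["AAAA", "AAAT", "AATT", "TTTT"]
histogram = mismatch_distribution(sample)
# histogram == {1: 2, 2: 2, 3: 1, 4: 1}
```

A sample of n sequences yields n(n-1)/2 pairs, so the histogram counts always sum to that number.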

13.
Risk mapping in epidemiology enables areas with a low or high risk of disease contamination to be localized and provides a measure of risk differences between these regions. Risk mapping models for pooled data currently used by epidemiologists focus on the estimated risk for each geographical unit. They are based on a Poisson log-linear mixed model with a latent intrinsic continuous hidden Markov random field (HMRF) generally corresponding to a Gaussian autoregressive spatial smoothing. Risk classification, which is necessary to draw clearly delimited risk zones (in which protection measures may be applied), generally must be performed separately. We propose a method for direct classified risk mapping based on a Poisson log-linear mixed model with a latent discrete HMRF. The discrete hidden field (HF) corresponds to the assignment of each spatial unit to a risk class. The risk values attached to the classes are parameters and are estimated. When mapping risk using HMRFs, the conditional distribution of the observed field is modeled with a Poisson rather than a Gaussian distribution as in image segmentation. Moreover, abrupt changes in risk levels are rare in disease maps. The spatial hidden model should favor smoothed out risks, but conventional discrete Markov random fields (e.g. the Potts model) do not impose this. We therefore propose new potential functions for the HF that take into account class ordering. We use a Monte Carlo version of the expectation-maximization algorithm to estimate parameters and determine risk classes. We illustrate the method's behavior on simulated and real data sets. Our method appears particularly well adapted to localize high-risk regions and estimate the corresponding risk levels.

14.
Bacteria of the genus Rhizobium and related genera establish nitrogen-fixing symbioses with the roots of leguminous plants. The genetic elements that participate in the symbiotic process are usually compartmentalized in the genome, either as independent replicons (symbiotic plasmids) or as symbiotic regions or islands in the chromosome. The complete nucleotide sequence of the symbiotic plasmid of Rhizobium etli model strain CFN42, symbiont of the common bean plant, has been reported. To better understand the basis of DNA sequence diversification of this symbiotic compartment, we analyzed the distribution of single-nucleotide polymorphisms in homologous regions from different Rhizobium etli strains. The distribution of polymorphisms is highly asymmetric in each of the different strains, alternating regions containing very few changes with regions harboring an elevated number of substitutions. The regions showing high polymorphism do not correspond with discrete genetic elements and are not the same in the different strains, indicating that they are not hypervariable regions of functional genes. Most interesting, some highly polymorphic regions share exactly the same nucleotide substitutions in more than one strain. Furthermore, in different regions of the symbiotic compartment, different sets of strains share the same substitutions. The data indicate that the majority of nucleotide substitutions are spread in the population by recombination and that the contribution of new mutations to polymorphism is relatively low. We propose that the horizontal transfer of homologous DNA segments among closely related organisms is a major source of genomic diversification.

15.
Aim: Predicting species distribution is of fundamental importance for ecology and conservation. However, distribution models are usually established for only one region and it is unknown whether they can be transferred to other geographical regions. We studied the distribution of six amphibian species in five regions to address the question of whether the effect of landscape variables varied among regions. We analysed the effect of 10 variables extracted in six concentric buffers (from 100 m to 3 km) describing landscape composition around breeding ponds at different spatial scales. We used data on the occurrence of amphibian species in a total of 655 breeding ponds. We accounted for proximity to neighbouring populations by including a connectivity index in our models. We used logistic regression and information‐theoretic model selection to evaluate candidate models for each species. Location: Switzerland. Results: The explained deviance of each species’ best models varied between 5% and 32%. Models that included interactions between a region and a landscape variable were always included in the most parsimonious models. For all species, models including region‐by‐landscape interactions had similar support (Akaike weights) to models that did not include interaction terms. The spatial scale at which landscape variables affected species distribution varied from 100 m to 1000 m, which was in agreement with several recent studies suggesting that land use far away from the ponds can affect pond occupancy. Main conclusions: Different species are affected by different landscape variables at different spatial scales and these effects may vary geographically, resulting in a generally low transferability of distribution models across regions. We also found that connectivity seems generally more important than landscape variables. This suggests that metapopulation processes may play a more important role in species distribution than habitat characteristics.

16.
Crowley EM. Biopolymers, 2001, 58(2): 165-174
A goal of the human genome project is to determine the entire sequence of DNA (3 × 10^9 base pairs) found in chromosomes. The massive amounts of data produced by this project require interpretation. A Bayesian model is developed for locating regulatory regions in a DNA sequence. Regulatory regions are areas of DNA to which specific proteins bind and control whether or not a gene is transcribed to produce templates for protein synthesis. Each human cell contains the same DNA sequence. Thus the particular function of different cells is determined by the genes that are transcribed in that cell. A Hidden Markov chain is used to model whether a small interval of the DNA is in a regulatory region or not. This can be regarded as a changepoint problem where the changepoints are the start of a regulatory or nonregulatory region. The data consist of protein-binding elements, which are short subsequences, or "words," in the DNA sequence. Although these words can occur anywhere in the sequence, a larger number are expected in regulatory regions. Therefore, regulatory regions are detected by locating clusters of words. For a particular DNA sequence, the model automatically selects those words that best predict regions of interest. Markov chain Monte Carlo methods are used to explore the posterior distribution of the Hidden Markov chain. The model is tested by means of simulations, and applied to several DNA sequences.

17.
Recent empirical studies have suggested that the patch-size distribution of vegetation can be fitted by a power law, truncated power law, or lognormal model to provide explanatory mechanisms for vegetation pattern formation in arid and semiarid regions. However, contradictory results have been reported. Therefore, additional empirical studies are necessary to test the patch-size distribution of vegetation over several regions before it can be considered as an indicator for assessing the discontinuous transition of ecosystems and understanding the mechanisms of vegetation pattern formation. Analogous to arid and semiarid regions of the world, vegetation patterns are characterized by a two-phase mosaic composed of dense vegetation patches interspersed with areas of bare soil, referred to as quasi-circular vegetation patches (QVPs), in the Yellow River Delta (YRD), China. However, research on the patch-size distribution of the QVPs reflecting vegetation patterns and ecosystem functioning is lacking. To fill this gap, for the first time, we examined the patch-size distribution of the QVPs using the fused IKONOS high-spatial-resolution image and evaluated the statistical distributions that better fit the patch size data of the QVPs in the YRD. We found that a power law, truncated power law, or lognormal distribution was not supported in the study area, whereas a gamma distribution reasonably fits the size data of QVPs, implying that micro-depressions, combined with the water-limited and salinization environments, had considerable effects on vegetation pattern formation. Our results provide helpful insights and suggest that further studies are needed to classify different types of QVPs. Additionally, more efficient approaches need to be used to fit the statistical distributions for elucidating the spatial vegetation patterns in the YRD.
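
As an illustration of what fitting a power law to patch sizes can involve, the sketch below computes the standard maximum-likelihood estimate of the exponent of a continuous power law above a chosen lower cutoff. The abstract does not state which fitting procedure was used, so this is an assumption; the function name `power_law_exponent_mle`, the cutoff `x_min` and the patch areas are hypothetical.

```python
import math


def power_law_exponent_mle(sizes, x_min):
    """MLE of alpha for a continuous power law p(x) ~ x**(-alpha), x >= x_min.

    Uses alpha_hat = 1 + n / sum(ln(x_i / x_min)) over the n values >= x_min.
    """
    tail = [x for x in sizes if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum


# Hypothetical patch areas in square metres.
patch_areas = [12.0, 15.5, 20.1, 33.0, 47.2, 80.9, 150.3]
alpha_hat = power_law_exponent_mle(patch_areas, x_min=10.0)
```

An exponent estimate alone does not show that a power law is the right model; comparing it against alternatives such as lognormal or gamma fits, as the study reports doing, requires a separate goodness-of-fit or model-selection step.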

18.
Landscape change may reduce the connectivity of landscapes and impact the movement of animals. If movement processes have been influenced by landscape connectivity, we hypothesize that animals may distribute themselves in larger connected regions of the landscape in order to minimize the movement costs associated with obtaining required resources and avoiding predators. We adopt the term functional grain to describe a set of functionally connected regions. In this spatial pattern, each region describes a contiguous area of the landscape within which an animal may move freely below a threshold amount of movement cost. We used telemetry data from woodland caribou Rangifer tarandus caribou to test hypothetical functional grains where connectivity was determined by the spatial configuration of resource patches (patch only), by the resistance to movement presented by landscape features (resistance only), and by a combination of the two (patch + resistance). To identify these functional grains, we used a grains of connectivity approach, and introduced a novel lattice‐based variant of this method to build the resistance only model. We developed a measure of fit that describes caribou distribution with respect to larger functionally connected regions in the grain, and used this to ask: 1) are seasonal caribou locations consistent with a random functional grain, implying that landscape connectivity has not shaped their distribution? 2) Given a functional grain model, are seasonal caribou locations distributed in larger functionally connected regions than random points, implying a response to the shape, size, and location of the connected regions. We found support for landscape connectivity influencing animal distribution using grains based on a landscape resistance model, and that support varied between behaviourally defined seasons. 
We also discuss how our novel lattice approach may be valuable for highly mobile mammals and other species where the identification of resource patches is a limitation.

19.
We present a simple model of genetic regulatory networks in which regulatory connections among genes are mediated by a limited number of signaling molecules. Each gene in our model produces (publishes) a single gene product, which regulates the expression of other genes by binding to regulatory regions that correspond (subscribe) to that product. We explore the consequences of this publish-subscribe model of regulation for the properties of single networks and for the evolution of populations of networks. Degree distributions of randomly constructed networks, particularly multimodal in-degree distributions, which depend on the length of the regulatory sequences and the number of possible gene products, differed from simpler Boolean NK models. In simulated evolution of populations of networks, single mutations in regulatory or coding regions resulted in multiple changes in regulatory connections among genes, or alternatively in neutral change that had no effect on phenotype. This resulted in remarkable evolvability in both number and length of attractors, leading to evolved networks far beyond the expectation of these measures based on random distributions. Surprisingly, this rapid evolution was not accompanied by changes in degree distribution; degree distribution in the evolved networks was not substantially different from that of randomly generated networks. The publish-subscribe model also allows exogenous gene products to create an environment, which may be noisy or stable, in which dynamic behavior occurs. In simulations, networks were able to evolve moderate levels of both mutational and environmental robustness.

20.

Background

Ancestral reconstructions of mammalian genomes have revealed that evolutionary breakpoint regions are clustered in regions that are more prone to break and reorganize. What is still unclear to evolutionary biologists is whether these regions are physically unstable due solely to sequence composition and/or genome organization, or whether they represent genomic areas where the selection against breakpoints is minimal.

Methodology and Principal Findings

Here we present a comprehensive study of the distribution of tandem repeats in great apes. We analyzed the distribution of tandem repeats in relation to the localization of evolutionary breakpoint regions in the human, chimpanzee, orangutan and macaque genomes. We observed an accumulation of tandem repeats in the genomic regions implicated in chromosomal reorganizations. In the case of the human genome, our analyses revealed that evolutionary breakpoint regions contained more base pairs implicated in tandem repeats than synteny blocks did, with the AAAT motif being the one most frequently involved in evolutionary regions. We found that those AAAT repeats located in evolutionary regions were preferentially associated with Alu elements.

Significance

Our observations provide evidence for the role of tandem repeats in shaping mammalian genome architecture. We hypothesize that an accumulation of specific tandem repeats in evolutionary regions can promote genome instability by altering the state of the chromatin conformation or by promoting the insertion of transposable elements.
