Similar articles
 20 similar articles found
1.

Background  

To interpret microarray experiments, several ontological analysis tools have been developed. However, current tools are limited to specific organisms.

2.
Current models of codon substitution are formulated at the level of nucleotide substitution and do not explicitly consider the separate effects of mutation and selection. They are thus incapable of inferring whether mutation or selection is responsible for evolution at silent sites. Here we implement a few population genetics models of codon substitution that explicitly consider mutation bias and natural selection at the DNA level. Selection on codon usage is modeled by introducing codon-fitness parameters, which, together with mutation-bias parameters, predict optimal codon frequencies for the gene. The selective pressure may be for translational efficiency and accuracy or for fine-tuning translational kinetics to produce correct protein folding. We apply the models to compare mitochondrial and nuclear genes from several mammalian species. Model assumptions concerning codon usage are found to affect the estimation of sequence distances (such as the synonymous rate d(S), the nonsynonymous rate d(N), and the rate at the 4-fold degenerate sites d(4)), as found in previous studies, but the new models produced very similar estimates to some old ones. We also develop a likelihood ratio test to examine the null hypothesis that codon usage is due to mutation bias alone, not influenced by natural selection. Application of the test to the mammalian data led to rejection of the null hypothesis in most genes, suggesting that natural selection may be a driving force in the evolution of synonymous codon usage in mammals. Estimates of selection coefficients nevertheless suggest that selection on codon usage is weak and most mutations are nearly neutral. The sensitivity of the analysis to the assumed mutation model is discussed.
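The likelihood ratio test mentioned in this abstract compares a null model (mutation bias alone) with an alternative that adds codon-fitness parameters. The generic calculation can be sketched in Python as follows, assuming the usual chi-square approximation for nested models. The function names, the log-likelihood values, and the degrees of freedom in the usage line are hypothetical placeholders, not values or code from the paper.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite correction series.
    chi = math.sqrt(x)
    total = math.erfc(chi / math.sqrt(2))
    scale = 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term = chi  # chi^(2r-1) / (1*3*...*(2r-1)) for r = 1
    for r in range(1, (df - 1) // 2 + 1):
        total += scale * term
        term *= x / (2 * r + 1)
    return total

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (statistic, p-value) for two nested models.

    lnl_null, lnl_alt: maximized log-likelihoods (hypothetical inputs).
    df: number of extra free parameters in the alternative model.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods, for illustration only.
stat, p = likelihood_ratio_test(lnl_null=-1234.5, lnl_alt=-1230.0, df=1)
print(f"2*delta lnL = {stat:.2f}, p = {p:.4f}")
```

A small p-value would lead to rejecting the null model in favour of the one with selection on codon usage. The appropriate degrees of freedom depend on how many fitness parameters the alternative model adds, which the abstract does not state.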

3.
Model-based clustering is a popular tool for summarizing high-dimensional data. With the number of high-throughput large-scale gene expression studies still on the rise, the need for effective data-summarizing tools has never been greater. By grouping genes according to a common experimental expression profile, we may gain new insight into the biological pathways that steer biological processes of interest. Clustering of gene profiles can also assist in assigning functions to genes that have not yet been functionally annotated. In this paper, we propose two model selection procedures for model-based clustering. Model selection in model-based clustering has to date focused on the identification of data dimensions that are relevant for clustering. However, in more complex data structures, with multiple experimental factors, such an approach does not provide easily interpreted clustering outcomes. We propose a mixture model with multiple levels that provides sparse representations both "within" and "between" cluster profiles. We explore various flexible "within-cluster" parameterizations and discuss how efficient parameterizations can greatly enhance the objective interpretability of the generated clusters. Moreover, we allow for a sparse "between-cluster" representation with a different number of clusters at different levels of an experimental factor of interest. This enhances interpretability of clusters generated in multiple-factor contexts. Interpretable cluster profiles can assist in detecting biologically relevant groups of genes that may be missed with less efficient parameterizations. We use our multilevel mixture model to mine a proliferating cell line expression data set for annotational context and regulatory motifs. We also investigate the performance of the multilevel clustering approach on several simulated data sets.

4.
Horizontal gene transfer, genome innovation and evolution   Total citations: 10, self-citations: 0, citations by others: 10
To what extent is the tree of life the best representation of the evolutionary history of microorganisms? Recent work has shown that, among sets of prokaryotic genomes in which most homologous genes show extremely low sequence divergence, gene content can vary enormously, implying that those genes that are variably present or absent are frequently horizontally transferred. Traditionally, successful horizontal gene transfer was assumed to provide a selective advantage to either the host or the gene itself, but could horizontally transferred genes be neutral or nearly neutral? We suggest that for many prokaryotes, the boundaries between species are fuzzy, and therefore the principles of population genetics must be broadened so that they can be applied to higher taxonomic categories.

5.
Historical biogeography and comparative phylogeography have much in common. Both seek to discover common historical patterns in the elements of biotas, although typically at different tiers of evolutionary history. Comparative phylogeography is based on phylogeographic analyses of multiple taxa, usually widespread species. By comparing the phylogeographic structures of numerous widespread sympatric species, one can infer whether the current fauna has been historically stable, as evidenced by the relative frequency of geographically congruent reciprocally monophyletic groups. Alternatively, if species distributions are ephemeral over evolutionary time, a mixture of phylogeographic structures is expected. Coalescence analyses contribute information about history irrespective of whether haplotype phylogenies are structured or not. In the aridlands of North America, several isolating events are evident in the phylogeographic patterns of birds, mammals and herps. A mid-peninsular seaway in Baja California, dated at ca. one million years before present, had a pervasive effect, with 13 of 16 assayed species showing a concordant split. Hence, this community appears to have been a stable assemblage of species over the past one million years. In contrast, the avifauna of the Sonoran-Chihuahuan deserts consists of two species with a concordant split and three other species that are undifferentiated across both deserts. Hence, the species in this area have had different histories. The Baja biota appears to resemble its ancestral configuration to a greater degree than the Sonoran-Chihuahuan one. A deeper evolutionary event separated taxa in Baja California from the eastern deserts, showing that the aridlands fauna was affected by events at different times resulting in overlain tiers of history.

6.
Nonlinear multilevel models, with an application to discrete response data   Total citations: 11, self-citations: 0, citations by others: 11
Goldstein, Harvey. Biometrika, 1991, 78(1):45-51

7.
Transposable element contributions to plant gene and genome evolution   Total citations: 34, self-citations: 0, citations by others: 34
Transposable elements were first discovered in plants because they can have tremendous effects on genome structure and gene function. Although only a few or no elements may be active within a genome at any time in any individual, the genomic alterations they cause can have major outcomes for a species. All major element types appear to be present in all plant species, but their quantitative and qualitative contributions are enormously variable even between closely related lineages. In some large-genome plants, mobile DNAs make up the majority of the nuclear genome. They can rearrange genomes and alter individual gene structure and regulation through any of the activities they promote: transposition, insertion, excision, chromosome breakage, and ectopic recombination. Many genes may have been assembled or amplified through the action of transposable elements, and it is likely that most plant genes contain legacies of multiple transposable element insertions into promoters. Because chromosomal rearrangements can lead to speciating infertility in heterozygous progeny, transposable elements may be responsible for the rate at which such incompatibility is generated in separated populations. For these reasons, understanding plant gene and genome evolution is only possible if we comprehend the contributions of transposable elements.

8.
Protandry models and their application to salmon   Total citations: 1, self-citations: 0, citations by others: 1
Mating systems characterized by restricted breeding seasons, male polygamy, and female monogamy are common among animals. In such systems (e.g., butterflies), the earlier emergence of males than females to breeding areas (protandry) is a typical phenological pattern. Protandry likely results from a timing strategy that maximizes mating opportunities by males. In Pacific salmon (Oncorhynchus spp.), males typically arrive at the spawning grounds in advance of females. Using arrival-timing models, I found that under the mate-opportunity hypothesis, the mating system of salmon favors protandry. Protandry is predicted under a range of competitive scenarios, and the degree of protandry is especially sensitive to the duration of male spawning activity. Greater protandry is expected with increasing population sex ratio (i.e., more males) when there is mate guarding, but lower protandry is expected with increasing population sex ratio when interference competition among males reduces male longevity. The timing of unequal competitors is expected to be similar, but among years, protandry may be less variable in the better competitor.

9.
Abiotic stress and plant genome evolution. Search for new models   Total citations: 23, self-citations: 0, citations by others: 23

10.
If one has the amino acid sequences of a set of homologous proteins as well as their phylogenetic relationships, one can easily determine the minimum number of mutations (nucleotide replacements) which must have been fixed in each codon since their common ancestor. It is found that for 29 species of cytochrome c the data fit the assumption that there is a group of approximately 32 invariant codons and that the remainder compose two Poisson-distributed groups of size 65 and 16 codons, the latter smaller group fixing mutations at about 3.2 times the rate of the larger. It is further found that the size of the invariant group increases as the range of species is narrowed. Extrapolation suggests that less than 10% of the codons in a given mammalian cytochrome c gene are capable of accepting a mutation. This is consistent with the view that at any one point in time only a very restricted number of positions can fix mutations but that as mutations are fixed the positions capable of accepting mutations also change so that examination of a wide range of species reveals a wide range of altered positions. We define this restricted group as the concomitantly variable codons. Given this restriction, the fixation rates for mutations in concomitantly variable codons in cytochrome c and fibrinopeptide A are not very different, a result which should be the case if most of these mutations are in fact selectively neutral as Kimura suggests. Paper number 1382 from the Laboratory of Genetics. Work performed in part at the University of Iowa, Department of Preventive Medicine and Environmental Health and Department of Statistics, Iowa City, Iowa. Computing supported by the Graduate College, University of Iowa.

11.
Classical genetic studies show that gene conversion can favour some alleles over others. Molecular experiments suggest that gene conversion could favour GC over AT basepairs, leading to the concept of biased gene conversion towards GC (BGC(GC)). The expected consequence of such a process is the GC-enrichment of DNA sequences under gene conversion. Recent genomic work suggests that BGC(GC) affects the base composition of yeast, invertebrate and mammalian genomes. Hypotheses for the mechanisms and evolutionary origin of such a strange phenomenon have been proposed. Most BGC(GC) events probably occur during meiosis, which has implications for our understanding of the evolution of sex and recombination.

12.
Chemoperception plays a key role in adaptation and speciation in animals, and the senses of olfaction and gustation are mediated by gene families which show large variation in repertoire size among species. In Drosophila, there are around 60 loci of each type and it is thought that ecological specialization influences repertoire size, with increased pseudogenization of loci. Here, we analyse the size of the gustatory and olfactory repertoires among the genomes of 12 species of Drosophila. We find that repertoire size varies substantially and the loci are evolving by duplication and pseudogenization, with striking examples of lineage-specific duplication. Selection analyses imply that the majority of loci are subject to purifying selection, but this is less strong in gustatory loci and in loci prone to duplication. In contrast to some other studies, we find that few loci show statistically significant evidence of positive selection. Overall genome size is strongly correlated with the proportion of duplicated chemoreceptor loci, but genome size, specialization and endemism may be interrelated in their influence on repertoire size.

13.
Identifying changepoints is an important problem in molecular genetics. Our motivating example is from cancer genetics where interest focuses on identifying areas of a chromosome with an increased likelihood of a tumor suppressor gene. Loss of heterozygosity (LOH) is a binary measure of allelic loss in which abrupt changes in LOH frequency along the chromosome may identify boundaries indicative of a region containing a tumor suppressor gene. Our interest was in testing for the presence of multiple changepoints in order to identify regions of increased LOH frequency. A complicating factor is the substantial heterogeneity in LOH frequency across patients, where some patients have a very high LOH frequency while others have a low frequency. We develop a procedure for identifying multiple changepoints in heterogeneous binary data. We propose both approximate and full maximum-likelihood approaches and compare these two approaches with a naive approach in which we ignore the heterogeneity in the binary data. The methodology is used to estimate the pattern in LOH frequency on chromosome 13 in esophageal cancer patients and to isolate an area of inflated LOH frequency on chromosome 13 which may contain a tumor suppressor gene. Using simulations, we show that our approach works well and that it is robust to departures from some key modeling assumptions.

14.
Yang Y, Ott J. Human Heredity, 2002, 53(4):227-236
In genome-wide screens of genetic marker loci, non-mendelian inheritance of a marker is taken to indicate its vicinity to a disease locus. Heritable complex traits are thought to be under the influence of multiple, possibly interacting susceptibility loci, yet the most frequently used methods of linkage and association analysis focus on one susceptibility locus at a time. Here we introduce log-linear models for the joint analysis of multiple marker loci and interaction effects between them. Our approach focuses on affected sib pair data and identical by descent (IBD) allele sharing values observed on them. For each heterozygous parent, the IBD values at linked markers represent a sequence of dependent binary variables. We develop log-linear models for the joint distribution of these IBD values. An independence log-linear model is proposed to model the marginal means and the neighboring interaction model is advocated to account for associations between adjacent markers. Under the assumption of conditional independence, likelihood methods are applied to simulated data containing one or two susceptibility loci. It is shown that the neighboring interaction log-linear model is more efficient than the independence model, and incorporating interaction in the two-locus analysis provides increased power and accuracy for mapping of the trait loci.

15.

Background

The abundance of new genomic data provides the opportunity to map the location of gene duplication and loss events on a species phylogeny. The first methods for mapping gene duplications and losses were based on a parsimony criterion, finding the mapping that minimizes the number of duplication and loss events. Probabilistic modeling of gene duplication and loss is relatively new and has largely focused on birth-death processes.

Results

We introduce a new maximum likelihood model that estimates the speciation and gene duplication and loss events in a gene tree within a species tree with branch lengths. We also provide an algorithm, efficient in practice, that computes optimal evolutionary scenarios for this model. We implemented the algorithm in the program DrML and verified its performance with empirical and simulated data.

Conclusions

In test data sets, DrML finds optimal gene duplication and loss scenarios within minutes, even when the gene trees contain sequences from several hundred species. In many cases, these optimal scenarios differ from the lca-mapping that results from a parsimony gene tree reconciliation. Thus, DrML provides a new, practical statistical framework on which to study gene duplication.
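For orientation, the parsimony lca-mapping that this abstract uses as its point of comparison can be sketched in a few lines of Python. This is not the DrML maximum likelihood algorithm, only the standard baseline reconciliation. The tree encodings (a child-to-parent dictionary for the species tree, nested tuples for the gene tree) and all names are assumptions made for illustration.

```python
def lca(parent, a, b):
    """Lowest common ancestor of species-tree nodes a and b."""
    ancestors = set()
    while a is not None:
        ancestors.add(a)
        a = parent.get(a)  # the root has no entry, so this ends at None
    while b not in ancestors:
        b = parent[b]
    return b

def reconcile(gene_tree, parent, duplications):
    """Map a gene-tree node to a species-tree node via lca-mapping.

    gene_tree: a leaf is a species name (str); an internal node is a
    (left, right) tuple. Inferred duplication nodes are appended to
    the `duplications` list.
    """
    if isinstance(gene_tree, str):
        return gene_tree
    left = reconcile(gene_tree[0], parent, duplications)
    right = reconcile(gene_tree[1], parent, duplications)
    here = lca(parent, left, right)
    # A node mapping to the same species node as one of its children
    # is interpreted as a duplication under the parsimony criterion.
    if here in (left, right):
        duplications.append(here)
    return here

# Hypothetical species tree ((A,B)AB,C)ABC as a child -> parent map.
species_parent = {"A": "AB", "B": "AB", "AB": "ABC", "C": "ABC"}

dups = []
root = reconcile((("A", "B"), ("A", "C")), species_parent, dups)
print(root, dups)
```

In this toy gene tree, the second copy of A forces the root to map to the same species node as its right child, so one duplication is inferred at the species-tree root. Counting losses would require an additional pass over the species-tree edges that are skipped by the mapping.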

16.
Afrotheria is the clade of placental mammals that, together with Xenarthra, Euarchontoglires and Laurasiatheria, represents 1 of the 4 main recognized supraordinal eutherian clades. It reunites 6 orders of African origin: Proboscidea, Sirenia, Hyracoidea, Macroscelidea, Afrosoricida and Tubulidentata. The apparently unlikely relationship among such morphologically disparate taxa and their possible position at the base of the eutherian phylogenetic tree led to a great deal of attention and research on the group. The use of biomolecular data was pivotal in Afrotheria studies, as they were the basis for the recognition of this clade. Although morphological evidence is still scarce, a plethora of molecular data firmly attests to the phylogenetic relationship among these mammals of African origin. Modern cytogenetic techniques also made a significant contribution to the study of Afrotheria, revealing chromosome signatures for the group as a whole, as well as for some of its internal relationships. The associations of human chromosomes HSA1/19 and 5/21 were found to be chromosome signatures for the group and provided further support for Afrotheria. Additional chromosome synapomorphies were also identified linking elephants and manatees in Tethytheria (the associations HSA2/3, 3/13, 8/22, 18/19 and the lack of HSA4/8) and elephant shrews with the aardvark (HSA2/8, 3/20 and 10/17). Herein, we review the current knowledge on Afrotheria chromosomes and genome evolution. The already available data on the group suggest that further work on this apparently bizarre assemblage of mammals will provide important data for a better understanding of mammalian genome evolution.

17.
Piganeau G, Moreau H. Gene, 2007, 406(1-2):184-190
The Sargasso Sea water shotgun sequencing unveiled an unprecedented glimpse of marine prokaryotic diversity and gene content. The sequence data was gathered from 0.8 µm filtered surface water extracts, and revealed picoeukaryotic (cell size < 2 µm) sequences alongside the prokaryotic data. We used the available genome sequence of the picoeukaryote Ostreococcus tauri (Prasinophyceae, Chlorophyta) as a benchmark for the eukaryotic sequence content of the Sargasso Sea metagenome. Sequence data from at least two new Ostreococcus strains were identified and analyzed, and showed a bias towards higher coverage of the AT-rich organellar genomes. The Ostreococcus nuclear sequence data retrieved from the Sargasso metagenome is divided into 731 scaffolds of average size 3917 bp, and covers 23% of the complete nuclear genome and 14% of the total number of protein coding genes in O. tauri. We used this environmental Ostreococcus sequence data to estimate the level of constraint on intronic and intergenic sequences in this compact genome.

18.
The nonsynonymous to synonymous substitution rate ratio (omega = d(N)/d(S)) provides a sensitive measure of selective pressure at the protein level, with omega values <1, =1, and >1 indicating purifying selection, neutral evolution, and diversifying selection, respectively. Maximum likelihood models of codon substitution developed recently account for variable selective pressures among amino acid sites by employing a statistical distribution for the omega ratio among sites. Those models, called random-sites models, are suitable when we do not know a priori which sites are under what kind of selective pressure. Sometimes prior information (such as the tertiary structure of the protein) might be available to partition sites in the protein into different classes, which are expected to be under different selective pressures. It is then sensible to use such information in the model. In this paper, we implement maximum likelihood models for prepartitioned data sets, which account for the heterogeneity among site partitions by using different omega parameters for the partitions. The models, referred to as fixed-sites models, are also useful for combined analysis of multiple genes from the same set of species. We apply the models to data sets of the major histocompatibility complex (MHC) class I alleles from human populations and of the abalone sperm lysin genes. Structural information is used to partition sites in MHC into two classes: those in the antigen recognition site (ARS) and those outside. Positive selection is detected in the ARS by the fixed-sites models. Similarly, sites in lysin are classified into the buried and solvent-exposed classes according to the tertiary structure, and positive selection was detected at the solvent-exposed sites. The random-sites models identified a number of sites under positive selection in each data set, confirming and elaborating the results of the fixed-sites models. The analysis demonstrates the utility of the fixed-sites models, as well as the power of previous random-sites models, which do not use the prior information to partition sites.
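The interpretation of omega given at the start of this abstract can be illustrated with a short Python sketch that labels each site partition by its rate ratio. The d(N) and d(S) values, the partition names in the usage example, and the tolerance around 1 are hypothetical choices for illustration. A real analysis would estimate omega by maximum likelihood and judge departure from 1 with a statistical test rather than a fixed tolerance.

```python
def classify_omega(d_n, d_s, tol=0.05):
    """Return (omega, label) for nonsynonymous and synonymous rates.

    tol is an arbitrary illustrative band around omega = 1; it is an
    assumption of this sketch, not part of any published method.
    """
    if d_s <= 0:
        raise ValueError("d_S must be positive to form the ratio")
    omega = d_n / d_s
    if omega < 1 - tol:
        label = "purifying selection"
    elif omega > 1 + tol:
        label = "diversifying selection"
    else:
        label = "neutral evolution"
    return omega, label

# Hypothetical per-partition estimates (d_N, d_S), for illustration only.
partitions = {"ARS": (0.30, 0.10), "non-ARS": (0.02, 0.10)}
for name, (d_n, d_s) in partitions.items():
    omega, label = classify_omega(d_n, d_s)
    print(f"{name}: omega = {omega:.2f} ({label})")
```

Using one omega per predefined partition mirrors the fixed-sites idea described above, where structural information decides in advance which sites share a parameter.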

19.
Models of codon evolution are useful for investigating the strength and direction of natural selection via a parameter for the nonsynonymous/synonymous rate ratio (omega = d(N)/d(S)). Different codon models are available to account for diversity of the evolutionary patterns among sites. Codon models that specify data partitions as fixed effects allow the most evolutionary diversity among sites but require that site partitions are a priori identifiable. Models that use a parametric distribution to express the variability in the omega ratio across sites do not require a priori partitioning of sites, but they permit less among-site diversity in the evolutionary process. Simulation studies presented in this paper indicate that differences among sites in estimates of omega under an overly simplistic analytical model can reflect more than just natural selection pressure. We also find that the classic likelihood ratio tests for positive selection have a high false-positive rate in some situations. In this paper, we developed a new method for assigning codon sites into groups where each group has a different model, and the likelihood over all sites is maximized. The method, called likelihood-based clustering (LiBaC), can be viewed as a generalization of the family of model-based clustering approaches to models of codon evolution. We report the performance of several LiBaC-based methods, and selected alternative methods, over a wide variety of scenarios. We find that LiBaC, under an appropriate model, can provide reliable parameter estimates when the process of evolution is very heterogeneous among groups of sites. Certain types of proteins, such as transmembrane proteins, are expected to exhibit such heterogeneity. A survey of genes encoding transmembrane proteins suggests that overly simplistic models could be leading to a false signal for positive selection among such genes. In these cases, LiBaC-based methods offer an important addition to a "toolbox" of methods, thereby helping to uncover robust evidence for the action of positive selection.

20.

Background

The generation of multiple sequence alignments (MSAs) is a crucial step for many bioinformatic analyses. Thus improving MSA accuracy and identifying potential errors in MSAs is important for a wide range of post-genomic research. We present a novel method called MergeAlign which constructs consensus MSAs from multiple independent MSAs and assigns an alignment precision score to each column.

Results

Using conventional benchmark tests we demonstrate that on average MergeAlign MSAs are more accurate than MSAs generated using any single matrix of sequence substitution. We show that MergeAlign column scores are related to alignment precision and hence provide an ab initio method of estimating alignment precision in the absence of curated reference MSAs. Using two novel and independent alignment performance tests that utilise a large set of orthologous gene families we demonstrate that increasing MSA performance leads to an increase in the performance of downstream phylogenetic analyses.

Conclusion

Using multiple tests of alignment performance we demonstrate that this novel method has broad general application in biological research.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号