Similar Documents
20 similar documents found.
2.
It is now widely accepted that sites in a protein do not undergo independent evolutionary processes. The underlying assumption is that proteins are composed of conserved and variable linear domains, and thus rates at neighboring sites are correlated. In this paper, we comprehensively examine the performance of an autocorrelation model of evolutionary rates in protein sequences. We further develop a model in which the level of correlation between rates at adjacent sites is not equal at all sites of the protein. High correlation is expected, for example, in linear functional domains. On the other hand, when we consider nonlinear functional regions (e.g., active sites), low correlation is expected because the interaction between distant sites imposes independence of rates in the linear sequence. Our model is based on a hidden Markov model, which accounts for autocorrelation at certain regions of the protein and rate independence at others. We study the differences between the novel model and models which assume either independence or a fixed level of dependence throughout the protein. Using a diverse set of protein data sets we show that the novel model better fits most data sets. We further analyze the potassium-channel protein family and illustrate the relationship between the dependence of rates at adjacent sites and the tertiary structure of the protein.
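
The idea of region-dependent rate correlation can be sketched as a transition matrix. This is a minimal illustration, assuming a common "auto-discrete" parameterization: a site keeps its neighbor's rate category with probability rho, and otherwise its category is drawn from the stationary distribution pi. A hidden regime layer (for example one correlated and one independent regime) then selects rho site by site. The function names, the two-regime setup and the rule that the entered regime sets rho are hypothetical choices. They are not the paper's exact model.

```python
from itertools import product

def rate_transition(pi, rho):
    """Transition matrix between rate categories at adjacent sites.

    With probability rho the next site keeps the current category;
    otherwise its category is drawn afresh from the stationary
    distribution pi (rho = 0 gives independent rates).
    """
    k = len(pi)
    return [[rho * (i == j) + (1.0 - rho) * pi[j] for j in range(k)]
            for i in range(k)]

def joint_transition(pi, rhos, switch):
    """Transition matrix over hidden (regime, rate category) pairs.

    Hypothetical construction: rhos[r] is the correlation level of
    regime r, and switch[r][s] is the probability of moving from
    regime r to regime s between adjacent sites. The correlation
    applied at each step is that of the regime being entered.
    """
    k = len(pi)
    states = list(product(range(len(rhos)), range(k)))
    blocks = [rate_transition(pi, rho) for rho in rhos]
    matrix = [[switch[r][s] * blocks[s][i][j] for (s, j) in states]
              for (r, i) in states]
    return states, matrix
```

For example, `joint_transition([0.25] * 4, [0.9, 0.0], [[0.95, 0.05], [0.1, 0.9]])` couples a strongly autocorrelated regime with an independent one over four rate categories. Each row still sums to 1 because every block row does.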

3.
J. H. Nadeau, D. Sankoff. Genetics, 1997, 147(3): 1259-1266
Duplicated genes are an important source of new protein functions and novel developmental and physiological pathways. Whereas most models for the fate of duplicated genes show that they tend to be rapidly lost, models for pathway evolution suggest that many duplicated genes rapidly acquire novel functions. Little empirical evidence is available, however, for the relative rates of gene loss vs. divergence to help resolve these contradictory expectations. Gene families resulting from genome duplications provide an opportunity to address this apparent contradiction. With genome duplication, the number of duplicated genes in a gene family is at most 2^n, where n is the number of duplications. The size of each gene family, e.g., 1, 2, 3, ..., 2^n, reflects the patterns of gene loss vs. functional divergence after duplication. We focused on gene families in humans and mice that arose from genome duplications in early vertebrate evolution and we analyzed the frequency distribution of gene family size, i.e., the number of families with two, three or four members. All the models that we evaluated showed that duplicated genes are almost as likely to acquire a new and essential function as to be lost through acquisition of mutations that compromise protein function. An explanation for the unexpectedly high rate of functional divergence is that duplication allows genes to accumulate more neutral than disadvantageous mutations, thereby providing more opportunities to acquire diversified functions and pathways.
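
The 2^n bound and the idea that family sizes record retention versus loss can be illustrated with a small exact calculation. This sketch assumes a hypothetical retention model: at each genome duplication every gene in the family is copied, and each new copy survives independently with probability p_retain. It is not one of the specific models evaluated in the paper.

```python
from math import comb

def family_size_distribution(n_rounds, p_retain):
    """Exact distribution of gene-family size after n rounds of genome duplication.

    Hypothetical model: each round duplicates every gene in the family,
    and each new copy is independently retained with probability
    p_retain (otherwise it is lost), so sizes range from 1 to 2**n_rounds.
    """
    dist = {1: 1.0}
    for _ in range(n_rounds):
        new = {}
        for size, prob in dist.items():
            # Number of retained new copies is Binomial(size, p_retain).
            for kept in range(size + 1):
                w = comb(size, kept) * p_retain**kept * (1 - p_retain)**(size - kept)
                new[size + kept] = new.get(size + kept, 0.0) + prob * w
        dist = new
    return dist
```

With two rounds (as in the early-vertebrate scenario) and p_retain = 0.5, families of size 1, 2, 3 and 4 occur with probabilities 0.25, 0.375, 0.25 and 0.125. The shape of the observed size distribution therefore constrains the retention probability.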

6.
MOTIVATION: The distributions of many genome-associated quantities, including the membership of paralogous gene families, can be approximated with power laws. We are interested in developing mathematical models of genome evolution that adequately account for the shape of these distributions and describe the evolutionary dynamics of their formation. RESULTS: We show that simple stochastic models of genome evolution lead to power-law asymptotics of protein domain family size distribution. These models, called Birth, Death and Innovation Models (BDIM), represent a special class of balanced birth-and-death processes, in which domain duplication and deletion rates are asymptotically equal up to the second order. The simplest, linear BDIM shows an excellent fit to the observed distributions of domain family size in diverse prokaryotic and eukaryotic genomes. However, the stochastic version of the linear BDIM explored here predicts that the actual size of large paralogous families is reached on an unrealistically long timescale. We show that introduction of non-linearity, which might be interpreted as interaction of a particular order between individual family members, allows the model to achieve genome evolution rates that are much more compatible with the current estimates of the rates of individual duplication/loss events.
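
The origin of the power law in a linear BDIM can be sketched with a deterministic flux-balance calculation. This assumes a family of size i gains a domain at rate lam*(i + a) and loses one at rate delta*(i + b), with innovation creating new size-1 families. The parameter names are illustrative. The calculation is not the stochastic analysis in the paper. At equilibrium the flux between adjacent size classes balances, so f_{i+1} / f_i = lam*(i + a) / (delta*(i + 1 + b)). When lam = delta this gives f_i decaying like i^-(1 + b - a).

```python
def bdim_equilibrium(lam, delta, a, b, max_size):
    """Relative equilibrium frequencies f_1..f_max_size for a linear BDIM sketch.

    Assumes per-family birth rate lam*(i + a) and death rate delta*(i + b)
    for a family of size i, with innovation feeding new size-1 families.
    Balancing the flux between sizes i and i+1 gives
    f_{i+1} / f_i = lam*(i + a) / (delta*(i + 1 + b)).
    """
    freqs = [1.0]
    for i in range(1, max_size):
        freqs.append(freqs[-1] * lam * (i + a) / (delta * (i + 1 + b)))
    total = sum(freqs)
    # Normalize over the truncated range 1..max_size.
    return [f / total for f in freqs]
```

With lam = delta, a = 0 and b = 1 the frequencies are proportional to 1/(i(i + 1)), which is an i^-2 tail. This matches the exponent 1 + b - a.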

7.
According to the neutral theory of evolution, mutation and genetic drift are the only forces that shape unconstrained, neutral, gene evolution. Thus, pseudogenes (which often evolve neutrally) provide opportunities to obtain direct estimates of mutation rates that are not biased by selection, and gene families comprising functional and pseudogene members provide useful material for both estimating neutral mutation rates and identifying sites that appear to be under positive or negative selection pressures. Conifers could be very useful for such analyses since they have large and complex genomes. There is evidence that pseudogenes make significant contributions to the size and complexity of gene families in pines, although few studies have examined the composition and evolution of gene families in conifers. In this work, I examine the complexity and rates of mutation of the phytochrome gene family in Pinus sylvestris and show that it includes not only functional genes but also pseudogenes. As expected, the functional PHYO does not appear to have evolved neutrally, while phytochrome pseudogenes show signs of unconstrained evolution.

8.
The rapid accumulation of genomic sequences in public databases will finally allow large scale studies of gene family evolution, including evaluation of the role of positive Darwinian selection following a duplication event. This will be possible because recent statistical methods of comparing synonymous and nonsynonymous substitution rates permit reliable detection of positive selection at individual amino acid sites and along evolutionary lineages. Here, we summarize maximum-likelihood based methods, and present a framework for their application to analysis of gene families. Using these methods, we investigated the role of positive Darwinian selection in the ECP-EDN gene family of primates and the Troponin C gene family of vertebrates. We also comment on the limitations of these methods and discuss directions for further improvements.
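
Maximum-likelihood tests of this kind are commonly run as likelihood ratio tests between nested codon models, for example a null model without a positive-selection class against an alternative that has one. The sketch below assumes the two log-likelihoods were already obtained from fitting software such as PAML's codeml, which is not shown. It compares twice their difference to a chi-square distribution. The helper names are hypothetical, and the chi-square tail is computed in closed form only for df = 1 or even df.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or even df only)."""
    if df == 1:
        # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        half = x / 2.0
        return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))
    raise ValueError("this sketch only handles df = 1 or even df")

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for nested models differing by df parameters."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(max(stat, 0.0), df)
```

For instance, log-likelihoods of -100.0 (null) and -95.0 (alternative) with two extra parameters give a statistic of 10 and p = exp(-5), about 0.0067.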

9.
The completion of the human and mouse genomes has identified at least 20 connexin isomers in this family of intercellular channel proteins. However, there are no specific gap junction blockers or channel-blocking mimetic peptides available for the study of specific connexins. We designed antisense oligodeoxynucleotides that functionally reduce targeted connexin protein expression and can be used to reveal the biological function of individual connexins in vivo. Connexin mRNA was first exposed in vitro to deoxyribozymes complementing the sense coding sequence. Those that cleaved the target connexin mRNA in defined regions were used as the basis to design oligodeoxynucleotides to the accessible sites, thus taking into account tertiary mRNA configurations rather than relying on computed predictions. Antisense oligodeoxynucleotides designed to bind to accessible mRNA sites selectively reduced connexin26 and -43 mRNA expression in a corneal epithelium ex vivo model. Connexin43 protein levels were reduced, correlating with the knockdown in mRNA and the protein's rapid turnover; protein levels of connexin26 did not change, consistent with the lower turnover rate reported for that protein. We show, for the first time, an inexpensive and empirical approach to the preparation of specific and functional antisense oligodeoxynucleotides against known gene targets in the post-genomic era.
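
Once an accessible site has been mapped, the antisense oligodeoxynucleotide is the reverse complement of that stretch of mRNA, written in DNA letters. The helper below is a hypothetical illustration of that step only. It assumes the accessible window (start, length) is already known from cleavage mapping and ignores any other design criteria the authors may have applied.

```python
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def antisense_oligo(mrna, start, length):
    """DNA oligo (5'->3') complementary to mrna[start:start + length].

    Hypothetical helper: mrna is given 5'->3' in RNA letters, and start
    is the 0-based index of an accessible site (e.g. one mapped by
    deoxyribozyme cleavage).
    """
    target = mrna.upper()[start:start + length]
    if len(target) != length:
        raise ValueError("target window runs past the end of the mRNA")
    # Reverse the target and complement each base so the oligo pairs antiparallel.
    return "".join(COMPLEMENT[base] for base in reversed(target))
```

For example, the mRNA window 5'-AUGGCC-3' yields the antisense DNA oligo 5'-GGCCAT-3'.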

10.
The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene classifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.

11.
Cai CZ, Han LY, Ji ZL, Chen X, Chen YZ. Nucleic Acids Research, 2003, 31(13): 3692-3697
Prediction of protein function is of significance in studying biological processes. One approach for function prediction is to classify a protein into a functional family. Support vector machine (SVM) is a useful method for such classification, which may involve proteins with diverse sequence distribution. We have developed web-based software, SVMProt, for SVM classification of a protein into a functional family from its primary sequence. The SVMProt classification system is trained from representative proteins of a number of functional families and seed proteins of Pfam curated protein families. It currently covers 54 functional families and additional families will be added in the near future. The computed accuracy for protein family classification is found to be in the range of 69.1-99.6%. SVMProt shows a certain degree of capability for the classification of distantly related proteins and homologous proteins of different function and thus may be used as a protein function prediction tool that complements sequence alignment methods. SVMProt can be accessed at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.

13.
It is generally assumed that conservation and divergence of DNA signify function (selection) and no function (drift), respectively. This assumption is based on the view that a mutation is a unique event on a single chromosome, the fate of which depends on selection or drift. Knowledge of the rates, units and biases of widespread mechanisms of non-reciprocal DNA exchange, in particular within multigene families, provides alternative explanations for conservation and divergence, irrespective of biological function. Such mechanisms of DNA turnover cause continual fluctuations in the copy-number of variant genes in an individual and, hence, promote the gradual and cohesive spread of a variant gene throughout a family (homogenization) and throughout a population (fixation). The dual processes (molecular drive) of homogenization and fixation are inextricably linked. Data are presented of the expected stages of transition in the spread of variant repeats by molecular drive in some non-genic families of DNA, seemingly not under the influence of selection. When a molecularly driven change in a given gene family is accompanied by the coevolution (mediated by selection) of other DNA, RNA or protein molecules that interact with the gene family, then biological function is observed to be maintained despite sequence divergence. Conversely, the mechanics of DNA turnover and a turnover bias in favour of ancestral sequences can dramatically retard the rate of sequence change, in the absence of function. Examples of the maintenance of function by molecular coevolution and of conservation of sequences in the absence of function are drawn mainly from the rDNA multigene family.

14.
This article deals with the theoretical size distribution of gene and protein families in complete genomes. A simple evolutionary model for the development of such families, in which genes in a family are formed or selected against independently and at random, and in which new families are formed by the random splitting of existing families, is used to derive the resulting size distribution. Mathematically, this turns out to be the distribution of the state of a homogeneous birth-and-death process after an exponentially distributed time, which, as shown here, exhibits under certain conditions the power-law behaviour observed for gene and protein family sizes.
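
The simplest special case shows how observing a birth process after an exponential time produces a power law. It drops deaths entirely (a pure birth, or Yule, process), so it is a simplification of the birth-and-death model in the abstract. Suppose a family starts from one gene and each gene duplicates at rate lam. The size at time t is then geometric, P(n | t) = e^(-lam t) (1 - e^(-lam t))^(n-1). Averaging over an Exponential(mu) observation time and writing rho = mu / lam gives the Yule-Simon law P(n) = rho B(rho + 1, n), whose tail decays like n^-(rho + 1).

```python
import math

def yule_simon_pmf(n, rho):
    """P(size = n) for a pure-birth family observed after an exponential time.

    Assumes each gene duplicates at rate lam, no losses, and the family is
    observed after an Exponential(mu) time; rho = mu / lam.
    P(n) = rho * B(rho + 1, n), which decays like n ** -(rho + 1).
    """
    # Beta function via log-gamma for numerical stability at large n.
    log_beta = math.lgamma(rho + 1) + math.lgamma(n) - math.lgamma(rho + 1 + n)
    return rho * math.exp(log_beta)
```

With rho = 1 this reduces to P(n) = 1/(n(n + 1)), so half of all families are singletons and the tail falls off as n^-2.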

15.
Wood density is thought to be an important indicator of plant life history because it is coupled to many aspects of whole-plant form and function. We used a hierarchical Bayesian approach to explain variation in mortality rates with wood density, drawing on data for 765,500 trees from 1639 species at 10 sites located across the Old and New World tropics. Mortality rates declined with increasing wood density at five of 10 sites. Similar negative trends were detected at four additional sites, while one site showed no relationship. Our model explained 40% of variation in mortality on average. Both wood density and mortality rates show a high degree of phylogenetic conservatism. Grouping species by family across sites in a second analysis, we found considerable variation in the relationship between wood density and mortality, with 10 of 27 families demonstrating a strong negative relationship. Our results highlight the importance of wood density as a functional trait in tropical forests, as it is strongly linked to variation in survival. However, the relationship varied among families, plots, and even census intervals within sites, indicating that the factors responsible for the relationship between wood density and mortality vary spatially, taxonomically and temporally.

16.
Montgomery Slatkin. Genetics, 1985, 110(1): 145-158
A model is developed to predict the extent of genetic differentiation in a family of transposable elements under the combined effects of genetic drift, transposition, mutation and unbiased gene conversion. The model is based on simplifying assumptions that are valid when transposition is always to new sites and copy number per site is low. In the absence of gene conversion, the degree of differentiation as measured by the probability of identity of different elements is the same as at a single locus with the same mutation rate but in a population of effective size Nc/2, where N is the population size and c is the number of copies per individual. The inclusion of unbiased gene conversion does not significantly change this result. If, as seems to be the case, families of transposable elements are relatively homogeneous, then the model implies either that mutation rates for transposable elements are much lower than at comparable single-copy loci or that some other force, such as natural selection or biased gene conversion, is at work. Transposition is a very ineffective force for homogenizing a family of transposable elements.
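
The effective-size result can be made concrete with the standard infinite-alleles equilibrium for a diploid locus, F = 1 / (1 + 4 Ne u). Here F is the probability that two copies are identical. This formula is an assumption brought in for illustration, not taken from the abstract. Substituting Ne = Nc/2, as the abstract describes, gives F = 1 / (1 + 2Ncu) for a family of transposable elements without gene conversion.

```python
def identity_probability(n, c, u):
    """Equilibrium probability that two element copies are identical.

    Assumes the infinite-alleles result F = 1 / (1 + 4 * Ne * u) for a
    diploid locus, with Ne replaced by N*c/2 for a transposable-element
    family of c copies per individual in a population of size N.
    """
    effective_size = n * c / 2.0
    return 1.0 / (1.0 + 4.0 * effective_size * u)
```

With N = 100, c = 10 and u = 1e-4, F is about 0.83. Because F falls as N*c grows, a homogeneous family in a large population needs u well below 1/(Nc). This is the tension the abstract points to.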

17.
The nonsynonymous (amino acid-altering) to synonymous (silent) substitution rate ratio (omega = d(N)/d(S)) provides a measure of natural selection at the protein level, with omega = 1, >1, and <1 indicating neutral evolution, positive selection, and purifying selection, respectively. Previous studies that used this measure to detect positive selection have often taken an approach of pairwise comparison, estimating substitution rates by averaging over all sites in the protein. As most amino acids in a functional protein are under structural and functional constraints and adaptive evolution probably affects only a few sites at a few time points, this approach of averaging rates over sites and over time has little power. Previously, we developed codon-based substitution models that allow the omega ratio to vary either among lineages or among sites. In this paper we extend previous models to allow the omega ratio to vary both among sites and among lineages and implement the new models in the likelihood framework. These models may be useful for identifying positive selection along prespecified lineages that affects only a few sites in the protein. We apply those branch-site models as well as previous branch- and site-specific models to three data sets: the lysozyme genes from primates, the tumor suppressor BRCA1 genes from primates, and the phytochrome (PHY) gene family in angiosperms. Positive selection is detected in the lysozyme and BRCA genes by both the new and the old models. However, only the new models detected positive selection acting on lineages after gene duplication in the PHY gene family. Additional tests on several data sets suggest that the new models may be useful in detecting positive selection after gene duplication in gene family evolution.
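
The pairwise, whole-gene approach that the abstract contrasts with its site- and branch-specific models can be sketched in a Nei-Gojobori style. The sketch assumes the proportions of differing nonsynonymous and synonymous sites (p_n, p_s) have already been counted, a step not shown here. It applies the Jukes-Cantor correction to each proportion and takes the ratio. Because this averages over all sites and over the whole time span, a few positively selected sites can easily be masked by widespread purifying selection.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor corrected distance from a proportion of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the JC correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def pairwise_omega(p_n, p_s):
    """Whole-gene omega = dN/dS from nonsynonymous and synonymous p-distances.

    Assumes p_s > 0; the classification mirrors the usual reading of omega.
    """
    d_n, d_s = jukes_cantor(p_n), jukes_cantor(p_s)
    omega = d_n / d_s
    if omega > 1:
        label = "positive selection"
    elif omega < 1:
        label = "purifying selection"
    else:
        label = "neutral"
    return omega, label
```

For example, p_n = 0.05 and p_s = 0.3 give omega of roughly 0.135. The whole gene therefore reads as under purifying selection, even if a handful of sites were adaptive.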

18.
The gene encoding the human tumor marker carcinoembryonic antigen (CEA) belongs to a gene family which can be subdivided into the CEA and the pregnancy-specific glycoprotein subgroups. The corresponding proteins are members of the immunoglobulin superfamily, characterized by the presence of one IgV-like domain and a varying number of IgC-like domains. Since the function of the CEA family is not well understood, we decided to establish an animal model in the rat to study its tissue-specific and developmental stage-dependent expression. To this end, we have screened an 18-day rat placenta cDNA library with a recently isolated fragment of a rat CEA-related gene. Two overlapping clones containing the complete coding region for a putative 709 amino acid protein (rnCGM1; Mr = 78,310) have been characterized. In contrast to all members of the human CEA family, this rat CEA-related protein consists of five IgV-like domains and only one IgC-like domain. This novel structure, which has been confirmed at the genomic level, might have important functional implications. Due to the rapid evolutionary divergence of the rat and human CEA gene families, it is not possible to assign rnCGM1 to its human counterpart. However, the predominant expression of the rnCGM1 gene in the placenta suggests that it could be analogous to one of the human pregnancy-specific glycoprotein genes.

19.
We have isolated and characterized a third nonallelic tandemly arrayed histone cluster (LpE) from the sea urchin Lytechinus pictus. Although this tandem array is not intermingled with the other two early histone gene families also found in the L. pictus genome, the order and polarity of the five histone coding sequences in this family are the same as every other well characterized sea urchin early histone gene family. Heteroduplex analysis and restriction endonuclease mapping experiments indicate that the LpE family is more closely related to the B-C than the A-D family of early histone genes. Examination of several individual sperm DNA samples has revealed considerable polymorphism in each of the three tandem repeat families. Within an individual, however, each family is remarkably homogeneous. Thus, our results indicate that rapid fixation of variants acts to homogenize the members of a single tandem array at a considerably faster rate within a family than between families. However, at least some exchange of sequences between families is evident based on the conservation of many restriction endonuclease recognition sites and from analysis of a cosmid clone in which the A-D and E tandem repeats are found adjacent to one another. These differences in the rate of fixation of variants within and between these families are likely to be responsible for the maintenance of diversity between the different families.

20.
Natural history and functional divergence of protein tyrosine kinases (cited 3 times: 0 self-citations, 3 by others)
Gu J, Gu X. Gene, 2003, 317(1-2): 49-57
Cellular signaling is important for many biological processes including growth, differentiation, adhesion, motility and apoptosis. The protein tyrosine kinase (PTK) supergene family is the key mediator in cellular signaling in metazoans, directly associated with a variety of human diseases. All PTKs contain a highly conserved catalytic kinase domain, in spite of variable multi-domain structures. Within each PTK gene family, members exhibit functional divergence in substrate-specificity or temporal/tissue-specific expression, although their primary function is conserved. After conducting phylogenetic analysis on major PTK gene families, we found that the expansion of each PTK family was likely caused by gene or genome duplication event(s) that occurred before the emergence of teleosts but after the vertebrate-amphioxus split. We further investigated the evolutionary pattern of functional divergence after gene duplication in those gene families. Our results show that site-specific shifted evolutionary rate (altered functional constraint) is a common pattern in PTK gene family evolution.
