Similar articles
20 similar articles found (search time: 15 ms)
1.
The genes encoding non-specific lipid transfer proteins (nsLTPs), members of a small multigene family, show a complex pattern of expressional regulation, suggesting that some diversification may have resulted from changes in their expression after duplication. In this study, the evolution of nsLTP genes within the Poaceae family was characterized via a survey of the pseudogenes and unigenes encoding the nsLTP in rice pseudomolecules and the NCBI unigene database. nsLTP-rich regions were detected in the distal portions of rice chromosomes 11 and 12; these may have resulted from the most recent large segmental duplication in the rice genome. Two independent tandem duplications were shown to occur within the nsLTP-rich regions of rice. The genomic distribution of the nsLTP genes in the rice genome differs from that in wheat. This may be attributed to gene migration, chromosomal rearrangement, and/or differential gene loss. The genomic distribution pattern of nsLTP genes in the Poaceae family points to the existence of some differences among cereal nsLTP genes, all of which diverged from an ancient gene. The unigenes encoding nsLTPs in each cereal species are clustered into five groups. The somewhat different distribution of nsLTP-encoding EST clones between the groups across cereal species implies that independent duplication(s) followed by subfunctionalization (and/or neofunctionalization) of the nsLTP gene family in each species occurred during speciation.

2.
Tandemly arrayed genes (TAGs) play an important functional and physiological role in the genome. Most previous studies have focused on individual TAG families in a few species, yet a broad characterization of TAGs is not available. Here we identified all TAGs in the genomes of human, mouse, and rat and performed a comprehensive analysis of TAG distribution, TAG sizes, TAG orientations and intergenic distances, and TAG functions. TAGs account for about 14-17% of all genes in the genome and nearly one-third of all duplicated genes, highlighting the predominant role that tandem duplication plays in gene duplication. For all species, TAG distribution is highly heterogeneous along chromosomes and some chromosomes are enriched with TAG forests, whereas others are enriched with TAG deserts. The majority of TAGs are of size 2 for all genomes, similar to the previous findings in Caenorhabditis elegans, Arabidopsis thaliana, and Oryza sativa, suggesting that it is a rather general phenomenon in eukaryotes. The comparison with the genome patterns shows that TAG members have a significantly higher proportion of parallel gene orientation in all species, corroborating Graham's claim that parallel orientation is the preferred form of orientation in TAGs. Moreover, TAG members with parallel orientation tend to be closer to each other than all neighboring genes in the genome with parallel orientation. The analyses of Gene Ontology function indicate that genes with receptor or binding activities are significantly overrepresented among TAGs. Computer simulation reveals that random gene rearrangements have little effect on the statistics of TAGs for all genomes. Finally, the average proportion of TAGs tends to increase with family size, although the correlation between TAG proportions in individual families and family sizes is not significant.

3.
We have tried to approach the nature of the last common ancestor to Haemophilus influenzae and Escherichia coli and to determine how each bacterium could have diverged from this putative organism. The approach used was exhaustive analysis of the homologous proteins coded by genes present in these bacteria, using as criteria for sequence relatedness an alignment of at least 80 amino acid residues and a PAM distance (number of accepted point mutations per 100 residues separating two sequences) below 250. Evolutionarily significant similarities were found between 1,345 H. influenzae proteins (85% of the total genome) and 3,058 E. coli proteins (75% of the total genome), many of them belonging to families of various sizes (from 666 doublets to 35 large groups of more than 10 members). Nearly all the genes found by this approach to be duplicated in both bacteria were already duplicated in their last common ancestor. This was deduced from (1) the comparison of the respective distributions of evolutionary distances between orthologs (genes separated only by speciation events) and paralogs (genes duplicated in the same genome) and (2) the analysis of the phylogenetic trees reconstructed for each family of paralogs containing at least two members belonging to each bacterium. The distributions of the different categories of homologs show a significant loss of paralogous genes in H. influenzae (reduction proportional to the genome size), of many sequences which are still present in one copy in E. coli, and of some entire gene families. Phylogenetic trees also confirmed this recent loss of paralogous genes in H. influenzae. Thus, the genome size of the last common ancestor of these two bacteria would have been close to that of present-day E. coli, and the evolution of H. influenzae toward a parasitic life led to an important decrease in its genome size by some mechanism of streamlining.
During this recent evolution, the memory of the gene order present in the last common ancestor has been blurred, but a few short conserved chromosomal fragments can still be detected in present-day E. coli and H. influenzae.
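The relatedness criteria quoted in this abstract (an alignment of at least 80 residues and a PAM distance below 250) can be read as a simple filter over pairwise alignments. The following is a minimal sketch under that reading; the `Alignment` record, its field names, and the function names are hypothetical and are not taken from the study, which does not describe its software here.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract; everything else is illustrative.
MIN_ALIGNED_RESIDUES = 80
MAX_PAM_DISTANCE = 250.0


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one pairwise protein alignment."""
    query: str
    subject: str
    aligned_residues: int
    pam_distance: float  # accepted point mutations per 100 residues


def is_significant(aln: Alignment) -> bool:
    """Apply both criteria: length >= 80 residues and PAM distance < 250."""
    return (aln.aligned_residues >= MIN_ALIGNED_RESIDUES
            and aln.pam_distance < MAX_PAM_DISTANCE)


def filter_homologs(alignments):
    """Keep only the alignments that pass both thresholds."""
    return [aln for aln in alignments if is_significant(aln)]
```

Note that the length bound is inclusive ("at least 80") while the distance bound is strict ("below 250"), which is how the abstract words them.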

4.
MOTIVATION: Gene duplications and losses (GDLs) are important events in genome evolution. They result in expansion or contraction of gene families, with a likely role in phenotypic evolution. As more genomes become available and their annotations are improved, software programs capable of rapidly and accurately identifying the content of ancestral genomes and the timings of GDLs become necessary to understand the unique evolution of each lineage. RESULTS: We report EvolMAP, a new algorithm and software that utilizes a species tree-based gene clustering method to join all-to-all symmetrical similarity comparisons of multiple gene sets in order to infer the gene composition of multiple ancestral genomes. The algorithm further uses Dollo parsimony-based comparison of the inferred ancestral genes to pinpoint the timings of GDLs onto evolutionary intervals marked by speciation events. Using EvolMAP, first we analyzed the expansion of four families of G-protein coupled receptors (GPCRs) within animal lineages. In addition to demonstrating the unique expansion tree for each family, results also show that the ancestral eumetazoan genome contained many fewer GPCRs than modern animals, and these families expanded through concurrent lineage-specific duplications. Second, we analyzed the history of GDLs in mammalian genomes by comparing seven proteomes. In agreement with previous studies, we report that the mammalian gene family sizes have changed drastically through their evolution. Interestingly, although we identified a potential source of duplication for 75% of the gained genes, the remaining 25% did not have clear-cut sources, revealing thousands of genes that have likely gained their distinct sequence identities within the descent of mammals. AVAILABILITY: Query server, source code and executable are available at http://kosik-web.mcdb.ucsb.edu/evolmap/index.htm .

5.
J. H. Nadeau, D. Sankoff. Genetics, 1997, 147(3): 1259-1266
Duplicated genes are an important source of new protein functions and novel developmental and physiological pathways. Whereas most models for fate of duplicated genes show that they tend to be rapidly lost, models for pathway evolution suggest that many duplicated genes rapidly acquire novel functions. Little empirical evidence is available, however, for the relative rates of gene loss vs. divergence to help resolve these contradictory expectations. Gene families resulting from genome duplications provide an opportunity to address this apparent contradiction. With genome duplication, the number of duplicated genes in a gene family is at most 2^n, where n is the number of duplications. The size of each gene family, e.g., 1, 2, 3, ..., 2^n, reflects the patterns of gene loss vs. functional divergence after duplication. We focused on gene families in humans and mice that arose from genome duplications in early vertebrate evolution and we analyzed the frequency distribution of gene family size, i.e., the number of families with two, three or four members. All the models that we evaluated showed that duplicated genes are almost as likely to acquire a new and essential function as to be lost through acquisition of mutations that compromise protein function. An explanation for the unexpectedly high rate of functional divergence is that duplication allows genes to accumulate more neutral than disadvantageous mutations, thereby providing more opportunities to acquire diversified functions and pathways.
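As a worked illustration of the bound stated in this abstract: if every gene is doubled at each of n genome duplications and copies can only be lost afterwards, a family descended from one ancestral gene has a size between 1 and 2 to the power n. The symbol s for family size is introduced here and is not in the abstract, and n = 2 is an assumed example, chosen because it matches the two-, three-, and four-member families the authors count.

```latex
\[
  1 \le s \le 2^{n},
  \qquad\text{e.g. } n = 2 \;\Rightarrow\; s \in \{1, 2, 3, 4\}.
\]
```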

6.
Many genes exist in the form of families; however, little is known about their size variation, evolution and biology. Here, we present the size variation and evolution of the nucleotide-binding site (NBS)-encoding gene family and receptor-like kinase (RLK) gene family in Oryza, Glycine and Gossypium. The sizes of both families vary severalfold, not only among species but, surprisingly, also within a species. The size variations of the gene families are shown to correlate with each other, indicating that they interact, and to be driven by natural selection, artificial selection and genome size variation, but likely not by polyploidization. The numbers of genes in the families in a polyploid species are similar to those of one of its diploid donors, suggesting that polyploidization plays little role in the expansion of the gene families and that organisms tend not to maintain their ‘surplus’ genes in the course of evolution. Furthermore, it is found that the size variations of both gene families are associated with organisms’ phylogeny, suggesting their roles in speciation and evolution. Since both selection and speciation act on organism’s morphological, physiological and biological variation, our results indicate that the variation of gene family size provides a source of genetic variation and evolution.

7.
Several isolates of the marine cyanobacterial genus Prochlorococcus have smaller genome sizes than those of the closely related genus Synechococcus. In order to test whether loss of protein-coding genes has contributed to genome size reduction in Prochlorococcus, we reconstructed events of gene family evolution over a strongly supported phylogeny of 12 Prochlorococcus genomes and 9 Synechococcus genomes. Significantly more events, both of loss of paralogs within gene families and of loss of entire gene families, occurred in Prochlorococcus than in Synechococcus. The number of nonancestral gene families in genomes of both genera was positively correlated with the extent of genomic islands (GIs), consistent with the hypothesis that horizontal gene transfer (HGT) is associated with GIs. However, even when only isolates with comparable extents of GIs were compared, significantly more events of gene family loss and of paralog loss were seen in Prochlorococcus than in Synechococcus, implying that HGT is not the primary reason for the genome size difference between the two genera.

8.
Summary: Various rodent and primate DNAs exhibit a stronger intra- than interspecies cross-hybridization with probes derived from the N-terminal domain exons of human and rat carcinoembryonic antigen (CEA)-like genes. Southern analyses also reveal that the human and rat CEA gene families are of similar complexity. We counted at least 10 different genes per human haploid genome. In the rat, approximately seven to nine different N-terminal domain exons that presumably represent different genes appear to be present. We were able to assign the corresponding genomic restriction endonuclease fragments to already isolated CEA gene family members of both human and rat. Highly similar subgroups, as found within the human CEA gene family, seem to be absent from the rat genome. Hybridization with an intron probe from the human nonspecific cross-reacting antigen (NCA) gene and analysis of DNA sequence data indicate the conservation of noncoding regions among CEA-like genes within primates, implying that whole gene units may have been duplicated. With the help of a computer program and by calculating the rate of synonymous substitutions, evolutionary trees have been derived. From this, we propose that an independent parallel evolution, leading to different CEA gene families, must have taken place in at least the primate and rodent orders.

9.
The genetic architecture of resistance (total citations: 13; self-citations: 0; citations by others: 13)
Plant resistance genes (R genes), especially the nucleotide binding site leucine-rich repeat (NBS-LRR) family of sequences, have been extensively studied in terms of structural organization, sequence evolution and genome distribution. These studies indicate that NBS-LRR sequences can be split into two related groups that have distinct amino-acid motif organizations, evolutionary histories and signal transduction pathways. One NBS-LRR group, characterized by the presence of a Toll/interleukin receptor domain at the amino-terminal end, seems to be absent from the Poaceae. Phylogenetic analysis suggests that a small number of NBS-LRR sequences existed among ancient Angiosperms and that these ancestral sequences diversified after the separation into distinct taxonomic families. There are probably hundreds, perhaps thousands, of NBS-LRR sequences and other types of R gene-like sequences within a typical plant genome. These sequences frequently reside in 'mega-clusters' consisting of smaller clusters with several members each, all localized within a few million base pairs of one another. The organization of R-gene clusters highlights a tension between diversifying and conservative selection that may be relevant to gene families that are unrelated to disease resistance.

10.
The likelihood of duplicate gene retention following polyploidy varies by functional properties (e.g. gene ontologies or protein family domains), but little is known about the effects of whole-genome duplication on gene networks related by a common physiological process. Here, we examined the effects of both polyploid and nonpolyploid duplications on genes encoding the major functional groups of photosynthesis (photosystem I, photosystem II, the light-harvesting complex, and the Calvin cycle) in the cultivated soybean (Glycine max), which has experienced two rounds of whole-genome duplication. Photosystem gene families exhibit retention patterns consistent with dosage sensitivity (preferential retention of polyploid duplicates and elimination of nonpolyploid duplicates), whereas Calvin cycle and light-harvesting complex gene families do not. We observed similar patterns in barrel medic (Medicago truncatula), which shared the older genome duplication with soybean but has evolved independently for approximately 50 million years, and in Arabidopsis (Arabidopsis thaliana), which experienced two nested polyploidy events independent from the legume duplications. In both soybean and Arabidopsis, Calvin cycle gene duplicates exhibit a greater capacity for functional differentiation than do duplicates within the photosystems, which likely explains the greater retention of ancient, nonpolyploid duplicates and larger average gene family size for the Calvin cycle relative to the photosystems.

11.
12.
We have isolated and characterized a third nonallelic tandemly arrayed histone cluster (LpE) from the sea urchin Lytechinus pictus. Although this tandem array is not intermingled with the other two early histone gene families also found in the L. pictus genome, the order and polarity of the five histone coding sequences in this family are the same as every other well characterized sea urchin early histone gene family. Heteroduplex analysis and restriction endonuclease mapping experiments indicate that the LpE family is more closely related to the B-C than the A-D family of early histone genes. Examination of several individual sperm DNA samples has revealed considerable polymorphism in each of the three tandem repeat families. Within an individual, however, each family is remarkably homogeneous. Thus, our results indicate that rapid fixation of variants acts to homogenize the members of a single tandem array at a considerably faster rate within a family than between families. However, at least some exchange of sequences between families is evident based on the conservation of many restriction endonuclease recognition sites and from analysis of a cosmid clone in which the A-D and E tandem repeats are found adjacent to one another. These differences in the rate of fixation of variants within and between these families are likely to be responsible for the maintenance of diversity between the different families.

13.
MOTIVATION: The distributions of many genome-associated quantities, including the membership of paralogous gene families, can be approximated with power laws. We are interested in developing mathematical models of genome evolution that adequately account for the shape of these distributions and describe the evolutionary dynamics of their formation. RESULTS: We show that simple stochastic models of genome evolution lead to power-law asymptotics of protein domain family size distribution. These models, called Birth, Death and Innovation Models (BDIM), represent a special class of balanced birth-and-death processes, in which domain duplication and deletion rates are asymptotically equal up to the second order. The simplest, linear BDIM shows an excellent fit to the observed distributions of domain family size in diverse prokaryotic and eukaryotic genomes. However, the stochastic version of the linear BDIM explored here predicts that the actual size of large paralogous families is reached on an unrealistically long timescale. We show that introduction of non-linearity, which might be interpreted as interaction of a particular order between individual family members, allows the model to achieve genome evolution rates that are much more compatible with the current estimates of the rates of individual duplication/loss events.
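To make the power-law claim concrete, the sketch below computes the equilibrium size distribution of a generic birth-and-death chain from the standard detailed-balance recursion f(i+1) = f(i) * birth(i) / death(i+1). It is not the authors' code. The linear rate forms birth(i) = r(i + a) and death(i) = r(i + b), and the values a = 1, b = 2, are assumptions chosen for illustration; the abstract does not give the parameterization. The innovation term, which would set the overall scale, is not modeled, and the chain is truncated at a maximum size.

```python
import math


def stationary_family_sizes(birth, death, max_size):
    """Normalized equilibrium frequencies f_1..f_max_size of a birth-death chain.

    birth(i): rate at which a family of size i gains a member (i -> i + 1)
    death(i): rate at which a family of size i loses a member (i -> i - 1)
    Detailed balance gives f_{i+1} = f_i * birth(i) / death(i + 1).
    """
    freqs = [1.0]  # unnormalized f_1
    for i in range(1, max_size):
        freqs.append(freqs[-1] * birth(i) / death(i + 1))
    total = sum(freqs)
    return [f / total for f in freqs]


# Hypothetical linear rates with equal duplication and deletion constants.
a, b, r = 1.0, 2.0, 1.0
freqs = stationary_family_sizes(
    birth=lambda i: r * (i + a),
    death=lambda i: r * (i + b),
    max_size=1000,
)

# Log-log slope between family sizes 100 and 200 (list is 0-indexed).
slope = math.log(freqs[199] / freqs[99]) / math.log(200 / 100)
print(f"approximate power-law exponent: {slope:.3f}")
```

With these assumed values the frequencies are proportional to 1 / ((i + 1)(i + 2)), so the printed slope is close to -2, a power-law tail.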

14.
Intrinsically disordered regions in eukaryotic proteomes contain key signaling and regulatory modules and mediate interactions with many proteins. Many viral proteomes encode disordered proteins and modulate host factors through the use of short linear motifs (SLiMs) embedded within disordered regions. However, the degree of viral protein disorder across different viruses is not well understood, so we set out to establish the constraints acting on viruses, in terms of their use of disordered protein regions. We surveyed predicted disorder across 2,278 available viral genomes in 41 families, and correlated the extent of disorder with genome size and other factors. Protein disorder varies strikingly between viral families (from 2.9% to 23.1% of residues), and also within families. However, this substantial variation did not follow the established trend among their hosts, with increasing disorder seen across eubacteria, archaebacteria, protists, and multicellular eukaryotes. For example, among large mammalian viruses, poxviruses and herpesviruses showed markedly differing disorder (5.6% and 17.9%, respectively). Viral families with smaller genome sizes have more disorder within each of five main viral types (ssDNA, dsDNA, ssRNA+, dsRNA, retroviruses), except for negative single-stranded RNA viruses, where disorder increased with genome size. However, surveying over all viruses, which compares tiny and enormous viruses over a much bigger range of genome sizes, there is no strong association of genome size with protein disorder. We conclude that there is extensive variation in the disorder content of viral proteomes. While a proportion of this may relate to base composition, to extent of gene overlap, and to genome size within viral types, there remain important additional family and virus-specific effects.
Differing disorder strategies are likely to impact on how different viruses modulate host factors, and on how rapidly viruses can evolve novel instances of SLiMs subverting host functions, such as innate and acquired immunity.

15.
Molecular evolution of the rice miR395 gene family (total citations: 6; self-citations: 1; citations by others: 5)

16.
17.
This article deals with the theoretical size distribution of gene and protein families in complete genomes. A simple evolutionary model for the development of such families in which genes in a family are formed or selected against independently and at random, and in which new families are formed by the random splitting of existing families, is used to derive the resulting size distribution. Mathematically this turns out to be the distribution of the state of a homogeneous birth-and-death process after an exponentially distributed time, which it is shown will under certain conditions exhibit the power-law behaviour observed for gene and protein family sizes.

18.
A prototype family of seven genes encoding the variable surface lipoproteins (Vlps) of Mycoplasma hyorhinis is characterized in the pathogenic SK76 strain, using long-range PCR to amplify and analyze the single chromosomal region containing expressed genes vlpA to -G, each of which is subject to phase and size variation. Smaller families of vlp genes in subclones of SK76 or in another strain of M. hyorhinis, GDL, can be attributed to deletions of specific vlp genes from the prototype array described here. Two genes, vlpA and the newly revealed vlpG, contain repeat motifs in their 3' coding regions that differ from the short tandem repeats in other vlp genes yet retain structural features common to all vlp gene products. SK76 and GDL vlp gene families are similarly organized and show sequence similarity between corresponding individual vlp genes. In light of the extensive potential for diversity within the vlp gene system, such conservation provides a provisional basis to hypothesize that vlp genes may exist in specific arrays that endow selected functions while retaining common structural features required during phase-variable expression of this set of gene products.

19.
Multigene sequence data have great potential for elucidating important and interesting evolutionary processes, but statistical methods for extracting information from such data remain limited. Although various biological processes may cause different genes to have different genealogical histories (and hence different tree topologies), we also may expect that the number of distinct topologies among a set of genes is relatively small compared with the number of possible topologies. Therefore evidence about the tree topology for one gene should influence our inferences of the tree topology on a different gene, but to what extent? In this paper, we present a new approach for modeling and estimating concordance among a set of gene trees given aligned molecular sequence data. Our approach introduces a one-parameter probability distribution to describe the prior distribution of concordance among gene trees. We describe a novel 2-stage Markov chain Monte Carlo (MCMC) method that first obtains independent Bayesian posterior probability distributions for individual genes using standard methods. These posterior distributions are then used as input for a second MCMC procedure that estimates a posterior distribution of gene-to-tree maps (GTMs). The posterior distribution of GTMs can then be summarized to provide revised posterior probability distributions for each gene (taking account of concordance) and to allow estimation of the proportion of the sampled genes for which any given clade is true (the sample-wide concordance factor). Further, under the assumption that the sampled genes are drawn randomly from a genome of known size, we show how one can obtain an estimate, with credibility intervals, on the proportion of the entire genome for which a clade is true (the genome-wide concordance factor). We demonstrate the method on a set of 106 genes from 8 yeast species.

20.
We introduce and analyse a simple probabilistic model of genome evolution. It is based on three fundamental evolutionary events: gene loss, duplication and accumulated change. This is motivated by previous works which consisted in fitting the available genomic data to what are called paralog distributions. This formalism is described by an infinite system of linear equations. We show that this system generates a semigroup of linear operators on the space l^1. We prove that the size distribution of paralogous gene families in a genome converges to the equilibrium as time goes to infinity. Moreover we show that when probabilities of gene removal and duplication are close to each other, then the resulting distribution is close to a logarithmic distribution. Some empirical results for yeast genomes are presented.
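For readers unfamiliar with the logarithmic distribution named in this abstract, here is a minimal sketch of its textbook probability mass function, p(k) = -theta^k / (k ln(1 - theta)) for k = 1, 2, ... and 0 < theta < 1. How theta relates to the model's removal and duplication probabilities is not stated in the abstract, so theta is treated as a free parameter, and the function name is illustrative only.

```python
import math


def log_series_pmf(k: int, theta: float) -> float:
    """Probability that a family has exactly k members under a
    logarithmic (log-series) distribution with parameter theta."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    # log1p(-theta) equals ln(1 - theta), which is negative, so the result is positive.
    return -(theta ** k) / (k * math.log1p(-theta))


# Example: probabilities of the five smallest family sizes for an assumed theta.
example = [log_series_pmf(k, 0.9) for k in range(1, 6)]
```

The probabilities decrease monotonically with k, so single-member families are always the most common class under this distribution.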
