Similar articles
 20 similar articles found.
1.
The proliferation of gene data from multiple loci of large multigene families has been greatly facilitated by considerable recent advances in sequence generation. The evolution of such gene families, which often undergo complex histories and different rates of change, combined with increases in sequence data, poses complex problems for traditional phylogenetic analyses, and in particular, those that aim to successfully recover species relationships from gene trees. Here, we implement gene tree parsimony analyses on multicopy gene family data sets of snake venom proteins for two separate groups of taxa, incorporating Bayesian posterior distributions as a rigorous strategy to account for the uncertainty present in gene trees. Gene tree parsimony largely failed to infer species trees congruent with each other or with species phylogenies derived from mitochondrial and single-copy nuclear sequences. Analysis of four toxin gene families from a large expressed sequence tag data set from the viper genus Echis failed to produce a consistent topology, and reanalysis of a previously published gene tree parsimony data set, from the family Elapidae, suggested that species tree topologies were predominantly unsupported. We suggest that gene tree parsimony failure in the family Elapidae is likely the result of unequal and/or incomplete sampling of paralogous genes and demonstrate that multiple parallel gene losses are likely responsible for the significant species tree conflict observed in the genus Echis. These results highlight the potential for gene tree parsimony analyses to be undermined by rapidly evolving multilocus gene families under strong natural selection.

2.
3.
Bayesian phylogenetic inference via Markov chain Monte Carlo methods   Cited by: 27 (self-citations: 0; other citations: 27)
Mau B, Newton MA, Larget B. Biometrics, 1999, 55(1): 1-12
We derive a Markov chain to sample from the posterior distribution for a phylogenetic tree given sequence information from the corresponding set of organisms, a stochastic model for these data, and a prior distribution on the space of trees. A transformation of the tree into a canonical cophenetic matrix form suggests a simple and effective proposal distribution for selecting candidate trees close to the current tree in the chain. We illustrate the algorithm with restriction site data on 9 plant species, then extend to DNA sequences from 32 species of fish. The algorithm mixes well in both examples from random starting trees, generating reproducible estimates and credible sets for the path of evolution.
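
The sampling idea in this abstract can be illustrated with a generic Metropolis sampler. The sketch below is only a toy under stated assumptions: the state space is five states on a ring rather than a space of trees, the symmetric neighbour proposal stands in for the cophenetic-matrix proposal described in the abstract, and the names `metropolis_hastings` and `weights` are hypothetical. It is not the authors' algorithm.

```python
import random
from collections import Counter


def metropolis_hastings(weights, n_steps, seed=0):
    """Sample state indices with probability proportional to `weights`.

    Toy stand-in for a posterior over trees: states are 0..n-1 on a ring.
    """
    rng = random.Random(seed)
    n = len(weights)
    state = rng.randrange(n)  # random starting state
    samples = []
    for _ in range(n_steps):
        # Symmetric proposal: step to one of the two neighbouring states.
        candidate = (state + rng.choice((-1, 1))) % n
        # Accept with probability min(1, pi(candidate) / pi(state)).
        if rng.random() < weights[candidate] / weights[state]:
            state = candidate
        samples.append(state)
    return samples


# Unnormalised target distribution (an assumption made for illustration).
weights = [1.0, 2.0, 3.0, 2.0, 1.0]
samples = metropolis_hastings(weights, n_steps=100_000)
freqs = Counter(samples)
estimate = [freqs[i] / len(samples) for i in range(len(weights))]
```

Because only ratios of `weights` enter the acceptance step, the normalising constant of the target never has to be computed, which is what makes this family of methods practical for posteriors over trees.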

4.
MicroRNAs (miRNAs) and the mRNA targets of miRNAs were identified by sequence complementarity within a DNA sequence database for species of the Triticeae. Data screening identified 28 miRNA precursor sequences from 15 miRNA families that contained conserved mature miRNA sequences within predicted stem-loop structures. In addition, the identification of 337 target sequences among Triticeae genes provided further evidence of the existence of 26 miRNA families in the cereals. MicroRNA targets included genes that are homologous to known targets in diverse model species as well as novel targets. MicroRNA precursors and targets were identified in 10 related species, though the great majority of them were identified in bread wheat, Triticum aestivum, and barley, Hordeum vulgare, the two species with the largest EST data sets among the Triticeae.

5.
We introduce a new method for identifying optimal incomplete data sets from large sequence databases based on the graph theoretic concept of alpha-quasi-bicliques. The quasi-biclique method searches large sequence databases to identify useful phylogenetic data sets with a specified amount of missing data while maintaining the necessary amount of overlap among genes and taxa. The utility of the quasi-biclique method is demonstrated on large simulated sequence databases and on a data set of green plant sequences from GenBank. The quasi-biclique method greatly increases the taxon and gene sampling in the data sets while adding only a limited amount of missing data. Furthermore, under the conditions of the simulation, data sets with a limited amount of missing data often produce topologies nearly as accurate as those built from complete data sets. The quasi-biclique method will be an effective tool for exploiting sequence databases for phylogenetic information and also may help identify critical sequences needed to build large phylogenetic data sets.
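
The abstract does not spell out the alpha-quasi-biclique condition, so the following sketch assumes one common formulation: in a taxon-by-gene bipartite graph, every selected taxon must have sequences for at least (1 - alpha) of the selected genes, and every selected gene must be sampled in at least (1 - alpha) of the selected taxa. The function name, the data layout, and the example data are hypothetical. The code only verifies a candidate data set and does not search for an optimal one.

```python
def is_alpha_quasi_biclique(present, taxa, genes, alpha):
    """Check the assumed alpha-quasi-biclique condition.

    `present` is a set of (taxon, gene) pairs for which a sequence exists.
    """
    min_genes = (1 - alpha) * len(genes)  # coverage required per taxon
    min_taxa = (1 - alpha) * len(taxa)    # coverage required per gene
    taxa_ok = all(
        sum((t, g) in present for g in genes) >= min_genes for t in taxa
    )
    genes_ok = all(
        sum((t, g) in present for t in taxa) >= min_taxa for g in genes
    )
    return taxa_ok and genes_ok


taxa = ["A", "B", "C", "D"]
genes = ["g1", "g2", "g3", "g4"]
# Complete matrix except for one missing sequence (taxon D, gene g4).
present = {(t, g) for t in taxa for g in genes} - {("D", "g4")}

complete = is_alpha_quasi_biclique(present, taxa, genes, alpha=0.0)
tolerant = is_alpha_quasi_biclique(present, taxa, genes, alpha=0.25)
```

With `alpha=0.0` the check reduces to an ordinary biclique (no missing data allowed), so `complete` is false for this matrix, while a tolerance of 25% missing data per row and column makes `tolerant` true.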

6.
Conserved genes have found their way into the mainstream of molecular systematics. Many of these genes are members of multigene families. A difficulty with using single genes of multigene families for phylogenetic inference is that genes from one species may be paralogous to those from another taxon. We focus attention on this problem using heat shock 70 (HSP70) genes. Using polymerase chain reaction techniques with genomic DNA, we isolated and sequenced 123 distinct sequences from 12 species of sharks. Phylogenetic analysis indicated that the sequences cluster with constitutively expressed cytoplasmic heat shock-like genes. Three highly divergent gene clades were sampled. A number of similar sequences were sampled from each species within each distinct gene clade. Comparison of published species trees with an HSP70 gene tree inferred using Bayesian phylogenetic analysis revealed several cases of gene duplication and differential sorting of gene lineages within this group of sharks. Gene tree parsimony based on the objective criteria of duplication and losses showed that previously published hypotheses of species relationships and two novel hypotheses based on Bayesian phylogenetics were concordant with the history of HSP70 gene duplication and loss. By contrast, two published hypotheses based on morphological data were not significantly different from the null hypothesis of a random association between species relatedness and the HSP70 gene tree. These results suggest that gene tree parsimony using data from multigene families can be used for inferring species relationships or testing published alternative hypotheses. More importantly, the results suggest that systematic studies relying on phylogenetic inferences from HSP70 genes may be plagued by unrecognized paralogy of sampled genes. 
Our results underscore the distinction between gene and species trees and highlight an underappreciated source of discordance between gene trees and organismal phylogeny, i.e., unrecognized paralogy of sampled genes.
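
The duplication criterion behind gene tree parsimony can be sketched with the standard least-common-ancestor (LCA) reconciliation: each gene tree node is mapped to the LCA of the species below it, and a node counts as a duplication when it maps to the same species tree node as one of its children. The code below is a minimal illustration under assumptions that are not taken from the abstract: trees are binary nested tuples, gene tree leaves are labelled directly with species names, losses are not counted, and the function name is hypothetical.

```python
def count_duplications(gene_tree, species_tree):
    """Count duplication nodes in `gene_tree` by LCA reconciliation."""
    parent, depth = {}, {}

    def index(node, par, d):
        # Record parent and depth of every species tree node.
        parent[node], depth[node] = par, d
        if isinstance(node, tuple):
            for child in node:
                index(child, node, d + 1)

    index(species_tree, None, 0)

    def lca(a, b):
        # Walk the deeper node upwards until both nodes coincide.
        while a != b:
            if depth[a] >= depth[b]:
                a = parent[a]
            else:
                b = parent[b]
        return a

    duplications = 0

    def mapping(node):
        nonlocal duplications
        if not isinstance(node, tuple):
            return node  # leaf: a gene sampled from this species
        left, right = (mapping(child) for child in node)
        here = lca(left, right)
        if here in (left, right):
            duplications += 1  # node maps to the same place as a child
        return here

    mapping(gene_tree)
    return duplications


species_tree = (("A", "B"), "C")
congruent = count_duplications((("A", "B"), "C"), species_tree)
conflicting = count_duplications((("A", "C"), "B"), species_tree)
two_copies = count_duplications(
    ((("A", "B"), "C"), (("A", "B"), "C")), species_tree
)
```

A gene tree parsimony search would score many candidate species trees this way, summing the counts over all gene trees and preferring the species tree with the lowest total cost. Unrecognized paralogy shows up here as gene trees that can only be reconciled by invoking extra duplications.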

7.
Using degenerate PCR primers that target evolutionarily conserved sequences in pal genes, we show that in the gymnosperm, Pinus banksiana, phenylalanine ammonia-lyase (PAL) is encoded by a multigene family of at least eight to ten loci. Five classes of pal sequence were easily distinguished among 28 clones sequenced from the products of PCR amplification of haploid genomic DNA. The dominant sequence from each class was named, yielding pal1 to pal5 loci. These genes shared 68.8% to 94.0% nucleotide identity over the 366 bp region compared. All of pal1 to pal5 were expressed in cell suspension cultures treated with a fungal elicitor and all but pal3 were expressed in differentiating xylem tissue of a mature tree. Only pal1 was expressed in unelicited cell cultures. While these P. banksiana genes are quite divergent, they are still more similar to each other than to any angiosperm pal gene cloned to date. For its roles in development and defense, PAL production in P. banksiana is coordinated from a large, diverse multigene family. We discuss evidence suggesting that other pines have similar pal gene family structures.

8.
MOTIVATION: Genome projects have produced large amounts of data on the sequences of new genes whose functions are as yet unknown. The functions of new genes are usually inferred by comparing their sequences with those of known genes, but evaluation of the sequence homology of individual genes does not make the most of the available sequence information. Therefore, new methods and tools for extracting more biological information from homology searches would be advantageous. RESULTS: We have developed a computational tool, ORI-GENE, to analyze the results of sequence homology searches from the perspective of the evolution of selected sets of new genes. ORI-GENE has a graphical interface and accomplishes two important tasks: first, based on the output of homology searches, it identifies species with similar genes and displays their pattern of distribution on the phylogenetic tree. This function enables one to infer the way in which a given gene may have propagated among species over time. Second, from the distribution patterns, it predicts the point at which a given gene may have been first acquired (i.e. its 'origin'), then classifies the gene on that basis. Because it makes use of available evolutionary information to show the way in which genes cluster among species, ORI-GENE should be an effective tool for the screening and classification of new genes revealed by genome analysis. AVAILABILITY: ORI-GENE is retrievable via the Internet at: http://www.rtc.riken.go.jp/jouhou/ORI-GENE.

9.
In the context of exponentially growing molecular databases, it becomes increasingly easy to assemble large multigene data sets for phylogenomic studies. The expected increase of resolution due to the reduction of the sampling (stochastic) error is becoming a reality. However, the impact of systematic biases will also become more apparent or even dominant. We have chosen to study the case of the long-branch attraction artefact (LBA) using real instead of simulated sequences. Two fast-evolving eukaryotic lineages, whose evolutionary positions are well established, microsporidia and the nucleomorph of cryptophytes, were chosen as model species. A large data set was assembled (44 species, 133 genes, and 24,294 amino acid positions) and the resulting rooted eukaryotic phylogeny (using a distant archaeal outgroup) is positively misled by an LBA artefact despite the use of a maximum likelihood-based tree reconstruction method with a complex model of sequence evolution. When the fastest evolving proteins from the fast lineages are progressively removed (up to 90%), the bootstrap support for the apparently artefactual basal placement decreases to virtually 0%, and conversely only the expected placement, among all the possible locations of the fast-evolving species, receives increasing support that eventually converges to 100%. The percentage of removal of the fastest evolving proteins constitutes a reliable estimate of the sensitivity of phylogenetic inference to LBA. This protocol confirms that both a rich species sampling (especially the presence of a species that is closely related to the fast-evolving lineage) and a probabilistic method with a complex model are important to overcome the LBA artefact. Finally, we observed that phylogenetic inference methods perform strikingly better with simulated as opposed to real data, and suggest that testing the reliability of phylogenetic inference methods with simulated data leads to overconfidence in their performance. 
Although phylogenomic studies can be affected by systematic biases, the possibility of discarding a large amount of data containing most of the nonphylogenetic signal allows recovering a phylogeny that is less affected by systematic biases, while maintaining a high statistical support.

10.
Human ferritin H and L sequences lie on ten different chromosomes   Cited by: 5 (self-citations: 2; other citations: 3)
Summary: In humans, the H (heavy) and L (light) chains of the iron-storage protein ferritin are derived from multigene families. We have examined the chromosomal distribution of these H and L sequences by Southern analysis of hybrid cell DNA and by chromosomal in situ hybridization. Our results show that human ferritin H genes and related sequences are found on at least seven different chromosomes while L genes and related sequences are on at least three different chromosomes. Further, we have mapped the chromosomal location of expressed genes for human H and L ferritin chains and have found an H sequence which may be a useful marker for idiopathic hemochromatosis.

11.
Partial cDNA sequencing to obtain expressed sequence tags (ESTs) has led to the identification of tags to about 8000 of the estimated 20 000 genes in Arabidopsis thaliana. This figure represents four to five times the number of complete coding sequences from this organism available in international databases. In contrast to mammals, many proteins are encoded by multigene families in A. thaliana. Using ribosomal protein gene families as an example, it is possible to construct relatively long sequences from overlapping ESTs which are of sufficiently high quality to be able to unambiguously identify tags to individual members of multigene families, even when the sequences are highly conserved. A total of 106 genes encoding 50 different cytoplasmic ribosomal protein types have been identified, most proteins being encoded by at least two and up to four genes. Coding sequences of members of individual gene families are almost always very highly conserved and derived amino acid sequences are almost, if not completely, identical in the vast majority of cases. Sequence divergence is observed in untranslated regions which allows the definition of gene-specific probes. The method can be used to construct high-quality tags to any protein.

12.
Higher-level relationships within, and the root of, Placentalia remain contentious issues. Resolution of the placental tree is important to the choice of mammalian genome projects and model organisms, as well as for understanding the biogeography of the eutherian radiation. We present phylogenetic analyses of 63 species representing all extant eutherian mammal orders for a new molecular phylogenetic marker, a 1.3 kb portion of exon 26 of the apolipoprotein B (APOB) gene. In addition, we analyzed a multigene concatenation that included APOB sequences and a previously published data set (Murphy et al., 2001b) of three mitochondrial and 19 nuclear genes, resulting in an alignment of over 17 kb for 42 placentals and two marsupials. Due to computational difficulties, previous maximum likelihood analyses of large, multigene concatenations for placental mammals have used quartet puzzling, less complex models of sequence evolution, or phylogenetic constraints to approximate a full maximum likelihood bootstrap. Here, we utilize a Unix load sharing facility to perform maximum likelihood bootstrap analyses for both the APOB and concatenated data sets with a GTR+Gamma+I model of sequence evolution, tree-bisection and reconnection branch-swapping, and no phylogenetic constraints. Maximum likelihood and Bayesian analyses of both data sets provide support for the superordinal clades Boreoeutheria, Euarchontoglires, Laurasiatheria, Xenarthra, Afrotheria, and Ostentoria (pangolins+carnivores), as well as for the monophyly of the orders Eulipotyphla, Primates, and Rodentia, all of which have recently been questioned. Both data sets recovered an association of Hippopotamidae and Cetacea within Cetartiodactyla, as well as hedgehog and shrew within Eulipotyphla. APOB showed strong support for an association of tarsier and Anthropoidea within Primates. Parsimony, maximum likelihood and Bayesian analyses with both data sets placed Afrotheria at the base of the placental radiation. 
Statistical tests that employed APOB to examine a priori hypotheses for the root of the placental tree rejected rooting on myomorphs and hedgehog, but did not discriminate between rooting at the base of Afrotheria, at the base of Xenarthra, or between Atlantogenata (Xenarthra+Afrotheria) and Boreoeutheria. An orthologous deletion of 363 bp in the aligned APOB sequences proved phylogenetically informative for the grouping of the order Carnivora with the order Pholidota into the superordinal clade Ostentoria. A smaller deletion of 237-246 bp was diagnostic of the superordinal clade Afrotheria.

13.
An investigation of mushroom phylogeny using the largest subunit of RNA polymerase II gene sequences (RPB1) was conducted in comparison with nuclear ribosomal large subunit RNA gene sequences (nLSU) for the same set of taxa in the genus Inocybe (Agaricales, Basidiomycota). The two data sets, though not significantly incongruent, conflict in the placement of two taxa that exhibit long branches in the nLSU data set. In contrast, RPB1 terminal branch lengths are rather uniform. Bootstrap support is increased for clades in RPB1. Combined data sets increase the degree of confidence for several relationships. Overall, nLSU data do not yield a robust phylogeny when independently assessed by RPB1 sequences. This multigene study indicates that Inocybe is a monophyletic group composed of at least four distinct lineages: subgenus Mallocybe, section Cervicolores, section Rimosae, and subgenus Inocybe sensu Kühner, Kuyper, non Singer. Within subgenus Inocybe, two additional lineages, one composed of species with smooth basidiospores (clade I) and a second characterized by nodulose-spored species (clade II), are recovered by RPB1 and combined data. The nLSU data recover only clade I. The genera Astrosporina and Inocybella cannot be recognized phylogenetically. "Supersections" Cortinatae and Marginatae are not monophyletic groups.

14.
Exploring the plant transcriptome through phylogenetic profiling   Cited by: 5 (self-citations: 0; other citations: 5)
Publicly available protein sequences represent only a small fraction of the full catalog of genes encoded by the genomes of different plants, such as green algae, mosses, gymnosperms, and angiosperms. By contrast, an enormous number of expressed sequence tags (ESTs) exists for a wide variety of plant species, representing a substantial part of all transcribed plant genes. Integrating protein and EST sequences in comparative and evolutionary analyses is not straightforward because of the heterogeneous nature of both types of sequence data. By combining information from publicly available EST and protein sequences for 32 different plant species, we identified more than 250,000 plant proteins organized in more than 12,000 gene families. Approximately 60% of the proteins are absent from current sequence databases but provide important new information about plant gene families. Analysis of the distribution of gene families over different plant species through phylogenetic profiling reveals interesting insights into plant gene evolution, and identifies species- and lineage-specific gene families, orphan genes, and conserved core genes across the green plant lineage. We counted a similar number of approximately 9,500 gene families in monocotyledonous and eudicotyledonous plants and found strong evidence for the existence of at least 33,700 genes in rice (Oryza sativa). Interestingly, the larger number of genes in rice compared to Arabidopsis (Arabidopsis thaliana) can partially be explained by a larger number of species-specific single-copy genes and species-specific gene families. In addition, a majority of large gene families, typically containing more than 50 genes, are bigger in rice than Arabidopsis, whereas the opposite seems true for small gene families.

15.
Odorant receptors (ORs) located in the nasal epithelium, at the ciliated surface of olfactory sensory neurons, represent the initial step of a transduction cascade that leads to odor detection. ORs form the largest and most diverse family of G-protein-coupled receptors (GPCRs). They are encoded by a multigene family that has been partially characterized in cyclostomes, teleosts, amphibia, birds and mammals, as well as in Drosophila melanogaster and the nematode Caenorhabditis elegans. As new sequence data emerge, it is increasingly clear that OR primary structure can vary dramatically across phyla. Some chemoreceptors are encoded by genes with little sequence similarity to the prototypical ORs originally isolated in mammals. A large number of sequences are now available allowing a detailed study of the evolutionary implications of OR diversity across species. This review discusses the evolutionary implications of the divergent primary structures of chemoreceptors with identical functions.

16.

Background  

Olea europaea L. is a traditional tree crop of the Mediterranean basin with a high economic impact worldwide. Unlike other fruit tree species, little is known about the physiological and molecular basis of olive fruit development, and few sequences of genes and gene products are available for olive in public databases. This study deals with the identification of large sets of differentially expressed genes in developing olive fruits and their subsequent computational annotation using several software tools.

17.
Phylogenomic Analysis of the PEBP Gene Family in Cereals   Cited by: 1 (self-citations: 0; other citations: 1)
The TFL1 and FT genes, which are key genes in the control of flowering time in Arabidopsis thaliana, belong to a small multigene family characterized by a specific phosphatidylethanolamine-binding protein domain, termed the PEBP gene family. Several PEBP genes are found in dicots and monocots, and act on the control of flowering time. We investigated the evolution of the PEBP gene family in cereals. First, taking advantage of the complete rice genome sequence and EST databases, we found 19 PEBP genes in this species, 6 of which were not previously described. Ten genes correspond to five pairs of paralogs mapped on known duplicated regions of the rice genome. Phylogenetic analysis of Arabidopsis and rice genes indicates that the PEBP gene family consists of three main homology classes (the so-called TFL1-LIKE, MFT-LIKE, and FT-LIKE subfamilies), in which gene duplication and/or loss occurred independently in Arabidopsis and rice. Second, phylogenetic analyses of genomic and EST sequences from five cereal species indicate that the three subfamilies of PEBP genes have been conserved in cereals. The tree structure suggests that the ancestral grass genome had at least two MFT-like genes, two TFL1-like genes, and eight FT-like genes. A phylogenomic approach leads to some hypotheses about conservation of gene function within the subfamilies. [Reviewing Editor: Dr. Yves Van de Peer]

18.
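Sixteen individuals of the Chinese rufous horseshoe bat (Rhinolophus sinicus) sampled from a single population in Yunnan were used to study the molecular evolution and polymorphism of the DRB gene. Genomic DNA was extracted from wing-membrane tissue and analyzed by PCR, cloning, and sequencing. Two sequence types differing in length by 3 bp were obtained: type A (263 bp), with 15 alleles in the study population, and type B (260 bp), with 8 alleles. Twelve positively selected sites were detected among the 74 variable amino acid sites analyzed. The most frequent allele was detected in 9 individuals, whereas several alleles were each found in only one individual. A single individual carried at most 6 alleles. Genetic polymorphism analysis indicated that the DRB gene of the Chinese rufous horseshoe bat is highly polymorphic and that at least 3 duplicated DRB loci may exist in this species. A phylogenetic tree constructed from published chiropteran DRB exon 2 sequences showed that the MHC class II DRB genes of the Chinese rufous horseshoe bat form an independent clade.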

19.
20.
We used RT-PCR to sequence approximately 3 kb of the gene coding for the largest subunit of RNA polymerase II (rpb1) from nine land plants. Our results show that plant rpb1 genes all have a similar GC-content and that their amino acid sequences evolve at a similar rate in most species we examined, except for the Arabidopsis thaliana and rice sequences which evolve faster. This gene also exists as a single copy in most species and contains enough phylogenetically informative sites to resolve the evolutionary relationships among seed plants. Protein maximum parsimony, as well as neighbor-joining and maximum likelihood analyses of DNA and protein sequences, all generated identical tree topologies with similar strong support values at each node. The angiosperms are a clade comprising Amborella as a sister group to all other angiosperms, followed by Nymphaea, Magnolia, Arabidopsis, and a monocot clade containing maize and rice. The gymnosperms also form a monophyletic clade with Welwitschia and pine grouped together and sister to a Cycas and Zamia clade. These findings concur with recent studies that refute the Anthophyte Hypothesis and place Amborella at the base of the angiosperm tree. These rpb1 sequences also give a more consistent picture of seed plant relationships than similar analyses performed on data sets made of 18S rDNA, atpB, and rbcL sequences from the same species. These sequences therefore show great promise to help further resolve the phylogenetic relationships of seed plants.
