Similar articles
 20 similar articles found.
1.

Background

The abundance of new genomic data provides the opportunity to map the location of gene duplication and loss events on a species phylogeny. The first methods for mapping gene duplications and losses were based on a parsimony criterion, finding the mapping that minimizes the number of duplication and loss events. Probabilistic modeling of gene duplication and loss is relatively new and has largely focused on birth-death processes.

Results

We introduce a new maximum likelihood model that estimates the speciation and gene duplication and loss events in a gene tree within a species tree with branch lengths. We also provide an algorithm that is efficient in practice and computes optimal evolutionary scenarios for this model. We implemented the algorithm in the program DrML and verified its performance with empirical and simulated data.

Conclusions

In test data sets, DrML finds optimal gene duplication and loss scenarios within minutes, even when the gene trees contain sequences from several hundred species. In many cases, these optimal scenarios differ from the lca-mapping that results from a parsimony gene tree reconciliation. Thus, DrML provides a new, practical statistical framework on which to study gene duplication.
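
The LCA mapping used in parsimony reconciliation can be illustrated with a minimal Python sketch. It is not DrML and does not implement the maximum likelihood model; the nested-tuple tree encoding, the function names, and the toy trees are assumptions made for illustration. Each gene tree node is mapped to the smallest species clade containing its species, and a node is counted as a duplication when it maps to the same clade as one of its children.

```python
def leaf_set(tree, out):
    """Return the frozenset of leaves under `tree`; append every clade to `out`."""
    if isinstance(tree, str):
        clade = frozenset([tree])
    else:
        clade = frozenset().union(*(leaf_set(child, out) for child in tree))
    out.append(clade)
    return clade


def lca(clades, species):
    """Smallest species-tree clade that contains every member of `species`."""
    return min((c for c in clades if species <= c), key=len)


def count_duplications(gene_tree, clades):
    """Return (mapped clade, duplication count) for a binary gene tree.

    Gene tree leaves are labelled with the species they were sampled from.
    """
    if isinstance(gene_tree, str):
        return lca(clades, frozenset([gene_tree])), 0
    left, right = gene_tree
    m_left, d_left = count_duplications(left, clades)
    m_right, d_right = count_duplications(right, clades)
    mapped = lca(clades, m_left | m_right)
    # Duplication: the node maps to the same species clade as a child.
    is_dup = mapped == m_left or mapped == m_right
    return mapped, d_left + d_right + int(is_dup)


# Hypothetical toy data: three species, one gene family with an extra copy.
species_tree = (("A", "B"), "C")
gene_tree = ((("A", "B"), "C"), ("A", "C"))

all_clades = []
leaf_set(species_tree, all_clades)
root_clade, duplications = count_duplications(gene_tree, all_clades)
print(duplications)  # prints 1 (the root of the gene tree is a duplication)
```

A likelihood-based method may prefer a scenario with more events than this parsimony count when branch lengths make that scenario more probable, which is one way the optimal scenarios can differ from the LCA mapping.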

2.

Background

PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing. PCR amplification introduces redundant reads in the sequence data and estimating the PCR duplication rate is important to assess the frequency of such reads. Existing computational methods do not distinguish PCR duplicates from “natural” read duplicates that represent independent DNA fragments and therefore overestimate the PCR duplication rate for DNA-seq and RNA-seq experiments.

Results

In this paper, we present a computational method to estimate the average PCR duplication rate of high-throughput sequence datasets that accounts for natural read duplicates by leveraging heterozygous variants in an individual genome. Analysis of simulated data and exome sequence data from the 1000 Genomes project demonstrated that our method can accurately estimate the PCR duplication rate on paired-end as well as single-end read datasets which contain a high proportion of natural read duplicates. Further, analysis of exome datasets prepared using the Nextera library preparation method indicated that 45–50% of read duplicates correspond to natural read duplicates likely due to fragmentation bias. Finally, analysis of RNA-seq datasets from individuals in the 1000 Genomes project demonstrated that 70–95% of read duplicates observed in such datasets correspond to natural duplicates sampled from genes with high expression and identified outlier samples with a 2-fold greater PCR duplication rate than other samples.
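
The intuition behind using heterozygous variants can be sketched in a few lines of Python. This is a simplified illustration, not the estimator from the paper or from the PCRduplicates tool: it assumes duplicate reads come in pairs, that both alleles are sampled equally often, and that there are no sequencing errors. Under those assumptions, PCR copies of one fragment always carry the same allele, while two independent fragments disagree about half the time, so the natural fraction is roughly twice the observed discordant fraction. The function name and the toy data are hypothetical.

```python
def estimate_pcr_fraction(allele_pairs):
    """Estimate the fraction of duplicate pairs that are PCR duplicates.

    `allele_pairs` holds one (allele_1, allele_2) tuple per duplicate read
    pair that overlaps a heterozygous site.
    """
    if not allele_pairs:
        raise ValueError("need at least one duplicate pair over a het site")
    discordant = sum(1 for a, b in allele_pairs if a != b)
    # Natural duplicates disagree with probability 1/2, PCR duplicates never.
    natural_fraction = min(1.0, 2 * discordant / len(allele_pairs))
    return 1.0 - natural_fraction


# Hypothetical toy data: 10 duplicate pairs, 2 of them with different alleles.
pairs = [("A", "A")] * 5 + [("G", "G")] * 3 + [("A", "G")] * 2
pcr_fraction = estimate_pcr_fraction(pairs)
print(round(pcr_fraction, 2))
```

With these toy counts, about 40% of the duplicate pairs would be attributed to natural duplication and the remainder to PCR.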

Conclusions

The method described here is a useful tool for estimating the PCR duplication rate of high-throughput sequence datasets and for assessing the fraction of read duplicates that correspond to natural read duplicates. An implementation of the method is available at https://github.com/vibansal/PCRduplicates.

3.

Background

Why do some groups of physically linked genes stay linked over long evolutionary periods? Although several factors are associated with the formation of gene clusters in eukaryotic genomes, the particular contribution of each feature to clustering maintenance remains unclear.

Results

We quantify the strength of the proposed factors in a yeast lineage. First, we estimate how strongly each variable determines linkage conservation, using several comparator species at different distances from Saccharomyces cerevisiae. For adjacent gene pairs, in line with null simulations, intergenic distance acts as the strongest covariate. Which of the other covariates appear important depends on the comparator, although high co-expression is commonly related to synteny conservation, especially in the more distant comparisons, which are expected to reveal strong but relatively rare selection. We also analyze those pairs that are immediate neighbors through all the lineages considered. Current intergene distance is again the best predictor, followed by the local density of essential genes and co-regulation, with co-expression and recombination rate being the weakest predictors. The genome duplication seen in yeast leaves some mark on linkage conservation, as adjacent pairs resolved as single copy in all post-whole genome duplication species are more often found as adjacent in pre-duplication species.

Conclusion

Current intergene distance is consistently the strongest predictor of synteny conservation as expected under a simple null model. Other variables are of lesser importance and their relevance depends both on the species comparison in question and the fate of the duplicates following genome duplication.

4.

Background

Osteoglycin (OGN, a.k.a. mimecan) belongs to cluster III of the small leucine-rich proteoglycans (SLRP) of the extracellular matrix (ECM). In vertebrates OGN is a characteristic ECM protein of bone. In the present study we explore the evolution of SLRP III and OGN in teleosts that have a skeleton adapted to an aquatic environment.

Results

The SLRP gene family has been conserved since the separation of chondrichthyes and osteichthyes. Few gene duplicates of the SLRP III family exist even in the teleosts that experienced a specific whole genome duplication. One exception is ogn, for which duplicate copies were identified in fish genomes. The ogn promoter sequence and in vitro mesenchymal stem cell (MSC) cultures suggest the duplicate ogn genes acquired divergent functions. In gilthead sea bream (Sparus aurata) ogn1 was up-regulated during osteoblast and myocyte differentiation in vitro, while ogn2 was severely down-regulated during the differentiation of bone-derived MSCs into adipocytes in vitro.

Conclusions

Overall, the phylogenetic analysis indicates that the SLRP III family in vertebrates has been under conservative evolutionary pressure. The retention of the ogn gene duplicates in teleosts was linked with the acquisition of different functions. The acquisition by OGN of functions other than that of a bone ECM protein occurred early in the vertebrate lineage.

5.
6.

Introduction

While the evolutionary adaptation of enzymes to their own substrates is a well-studied and well-rationalized field, how molecules were originally selected to initiate and assemble convenient metabolic pathways is a fascinating but still debated topic.

Objectives

The aim of the present study is to give a rationale for the preferential selection of specific molecules to generate metabolic pathways.

Methods

The comparison of structural features of molecules, through an inductive methodological approach, offers a key for cautiously proposing a determining factor for their metabolic recruitment.

Results

Starting with some common conventions in the structural representation of relevant carbohydrates, such as glucose, fructose and ribose, arguments are presented for associating stable structural determinants of these molecules with their peculiar occurrence in metabolic pathways.

Conclusions

Among other possible factors, the reliability of the structural asset of a molecule may be relevant for its selection among structurally and, a priori, functionally similar molecules.

7.

Background

Heme-protein interactions are essential for various biological processes such as electron transfer, catalysis, signal transduction and the control of gene expression. Knowledge of heme binding residues can provide crucial clues for understanding these activities and can aid functional annotation; however, insufficient work has been done on predicting heme binding residues from protein sequence information.

Methods

We propose a sequence-based approach for accurate prediction of heme binding residues by a novel integrative sequence profile coupling position specific scoring matrices with heme specific physicochemical properties. In order to select the informative physicochemical properties, we design an intuitive feature selection scheme by combining a greedy strategy with correlation analysis.

Results

Our integrative sequence profile approach for prediction of heme binding residues outperforms the conventional methods using amino acid and evolutionary information in both 5-fold cross-validation and independent tests.

Conclusions

The novel feature of an integrative sequence profile achieves good performance using a reduced set of feature vector elements.

8.

Background

Most genes in Arabidopsis thaliana are members of gene families. How do the members of gene families arise, and how are gene family copy numbers maintained? Some gene families may evolve primarily through tandem duplication and high rates of birth and death in clusters, and others through infrequent polyploidy or large-scale segmental duplications and subsequent losses.

Results

Our approach to understanding the mechanisms of gene family evolution was to construct phylogenies for 50 large gene families in Arabidopsis thaliana, identify large internal segmental duplications in Arabidopsis, map gene duplications onto the segmental duplications, and use this information to identify which nodes in each phylogeny arose due to segmental or tandem duplication. Examples of six gene families exemplifying characteristic modes are described. Distributions of gene family sizes and patterns of duplication by genomic distance are also described in order to characterize patterns of local duplication and copy number for large gene families. Both gene family size and duplication by distance closely follow power-law distributions.
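
One rough way to check whether a distribution such as gene family size follows a power law is to fit a straight line on log-log axes. The sketch below is an assumption about how such a check could look, not the analysis used in the study; the function name and the data are made up, and a log-log regression is only a crude diagnostic compared with dedicated power-law fitting methods.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(c) + k * log(x); returns (c, k)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    k = sxy / sxx                         # slope = power-law exponent
    c = math.exp(mean_y - k * mean_x)     # intercept = scale factor
    return c, k


# Made-up data: number of families observed at each family size,
# generated exactly from y = 1000 * x^-2.
sizes = [1, 2, 4, 8, 16]
counts = [1000 * s ** -2 for s in sizes]

c, k = fit_power_law(sizes, counts)
print(round(c, 3), round(k, 3))
```

On real counts the points scatter around the line, and the quality of the fit (for example the residuals) indicates how closely the data follow a power law.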

Conclusions

Combining information about genomic segmental duplications, gene family phylogenies, and gene positions provides a method to evaluate contributions of tandem duplication and segmental genome duplication in the generation and maintenance of gene families. These differences appear to correspond meaningfully to differences in functional roles of the members of the gene families.

9.

Background

Lateral gene transfer can introduce genes with novel functions into genomes or replace genes with functionally similar orthologs or paralogs. Here we present a study of the occurrence of the latter gene replacement phenomenon in the four gene families encoding different classes of glutamate dehydrogenase (GDH), to evaluate and compare the patterns and rates of lateral gene transfer (LGT) in prokaryotes and eukaryotes.

Results

We extend the taxon sampling of gdh genes with nine new eukaryotic sequences and examine the phylogenetic distribution pattern of the various GDH classes in combination with maximum likelihood phylogenetic analyses. The distribution pattern analyses indicate that LGT has played a significant role in the evolution of the four gdh gene families. Indeed, a number of gene transfer events are identified by phylogenetic analyses, including numerous prokaryotic intra-domain transfers, some prokaryotic inter-domain transfers and several inter-domain transfers between prokaryotes and microbial eukaryotes (protists).

Conclusion

LGT has apparently affected eukaryotes and prokaryotes to a similar extent within the gdh gene families. In the absence of indications that the evolution of the gdh gene families is radically different from other families, these results suggest that gene transfer might be an important evolutionary mechanism in microbial eukaryote genome evolution.

10.

Background

How acceptance of evolution relates to understanding of evolution remains controversial despite decades of research. It even remains unclear whether cultural/attitudinal factors or cognitive factors have a greater impact on student ability to learn evolutionary biology. This study examined the influence of cultural/attitudinal factors (religiosity, acceptance of evolution, and parents’ attitudes towards evolution) and cognitive factors (teleological reasoning and prior understanding of natural selection) on students’ learning of natural selection over a semester-long undergraduate course in evolutionary medicine.

Method

Pre-post course surveys measured cognitive factors, including teleological reasoning and prior understanding of natural selection, and also cultural/attitudinal factors, including acceptance of evolution, parent attitudes towards evolution, and religiosity. We analyzed how these measures influenced increased understanding of natural selection over the semester.

Results

After controlling for other related variables, parent attitude towards evolution and religiosity predicted students’ acceptance of evolution, but did not predict students’ learning gains of natural selection over the semester. Conversely, lower levels of teleological reasoning predicted learning gains in understanding natural selection over the course, but did not predict students’ acceptance of evolution.

Conclusions

Acceptance of evolution did not predict students’ ability to learn natural selection over a semester in an evolutionary medicine course. However, teleological reasoning did impact students’ ability to learn natural selection.

11.

Background

In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Those genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas those copies of the gene that are responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families.

Aim

We intended, building on the attribution of gene sequences to the clusters of orthologous groups of proteins (COGs), to provide a method for visualization and comparative analysis of genomic neighborhoods of evolutionarily related genes, as well as a respective web server.

Results

Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database, as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex that is formed by paralog(s) of one of the membrane subunits of the NADH:quinone oxidoreductase of type 1 (COG1009) and a cytoplasmic protein of unknown function (COG3002).

Reviewers

This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.

12.

Background

In order to find correlated pairs of positions between proteins, which are useful in predicting interactions, it is necessary to concatenate two large multiple sequence alignments such that the sequences that are joined together are those that interact in their species of origin. When each protein is unique, the species name is sufficient to guide this match; however, when there are multiple related sequences (paralogs) in each species, the pairing is more difficult. In bacteria a good guide can be gained from genome co-location, as interacting proteins tend to be in a common operon, but in eukaryotes this simple principle is not sufficient.

Results

The methods developed in this paper take sets of paralogs for different proteins found in the same species and make a pairing based on their evolutionary distance relative to a set of other proteins that are unique and so have a known relationship (singletons). The former constitute a set of unlabelled nodes in a graph while the latter are labelled. Two variants were tested, one based on a phylogenetic tree of the sequences (the topology-based method) and a simpler, faster variant based only on the inter-sequence distances (the distance-based method). Over a set of test proteins, both gave good results, with the topology method performing slightly better.

Conclusions

The methods developed here still need refinement and augmentation from constraints other than the sequence data alone, such as known interactions from annotation and databases, or non-trivial relationships in genome location. With the ever-growing numbers of eukaryotic genomes, it is hoped that the methods described here will open a route to the use of these data equal to the current success attained with bacterial sequences.

13.

Background

Bacterial genomes develop new mechanisms to tide them over the imposing conditions they encounter during the course of their evolution. Acquisition of new genes by lateral gene transfer may be one of the dominant ways of adaptation in bacterial genome evolution. Lateral gene transfer provides the bacterial genome with a new set of genes that help it to explore and adapt to new ecological niches.

Methods

A maximum likelihood analysis was done on the five sequenced corynebacterial genomes to model the rates of gene insertions/deletions at various depths of the phylogeny.

Results

The study shows that most of the laterally acquired genes are transient and the inferred rates of gene movement are higher on the external branches of the phylogeny and decrease as the phylogenetic depth increases. The newly acquired genes are under relaxed selection and evolve faster than their older counterparts. Analysis of some of the functionally characterised LGTs in each species has indicated that they may have a possible adaptive role.

Conclusion

The five corynebacterial genomes sequenced to date have evolved by acquiring between 8 and 14% of their genomes by LGT, and some of these genes may have a role in adaptation.

14.
15.

Background

Most studies inferring species phylogenies use sequences from single copy genes or sets of orthologs culled from gene families. For taxa such as plants, with very high levels of gene duplication in their nuclear genomes, this has limited the exploitation of nuclear sequences for phylogenetic studies, such as those available in large EST libraries. One rarely used method of inference, gene tree parsimony, can infer species trees from gene families undergoing duplication and loss, but its performance has not been evaluated at a phylogenomic scale for EST data in plants.

Results

A gene tree parsimony analysis based on EST data was undertaken for six angiosperm model species and Pinus, an outgroup. Although a large fraction of the tentative consensus sequences obtained from the TIGR database of ESTs was assembled into homologous clusters too small to be phylogenetically informative, some 557 clusters contained promising levels of information. Based on maximum likelihood estimates of the gene trees obtained from these clusters, gene tree parsimony correctly inferred the accepted species tree with strong statistical support. A slight variant of this species tree was obtained when maximum parsimony was used to infer the individual gene trees instead.

Conclusion

Despite the complexity of the EST data and the relatively small fraction eventually used in inferring a species tree, the gene tree parsimony method performed well in the face of very high apparent rates of duplication.

16.

Background

Large collections of expressed sequence tags (ESTs) are a fundamental resource for analysis of gene expression and annotation of genome sequences. We generated 116,899 ESTs from 17 normalized and two non-normalized cDNA libraries representing 16 tissues from tilapia, a cichlid fish widely used in aquaculture and biological research.

Results

The ESTs were assembled into 20,190 contigs and 36,028 singletons for a total of 56,218 unique sequences and a total assembled length of 35,168,415 bp. Over the whole project, a unique sequence was discovered for every 2.079 sequence reads. 17,722 (31.5%) of these unique sequences had significant BLAST hits (e-value < 10^-10) to the UniProt database.
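
The summary figures in this paragraph follow from the raw counts, as the short Python check below shows. The input numbers are taken from the abstract (the read count of 116,899 is from the Background); only the variable names are introduced here.

```python
# Counts reported in the abstract.
reads = 116_899        # ESTs generated
contigs = 20_190
singletons = 36_028
blast_hits = 17_722    # unique sequences with significant BLAST hits

unique_sequences = contigs + singletons
reads_per_unique = reads / unique_sequences
hit_percent = 100 * blast_hits / unique_sequences

print(unique_sequences)              # prints 56218
print(round(reads_per_unique, 3))    # prints 2.079
print(round(hit_percent, 1))         # prints 31.5
```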

Conclusion

Normalization of the cDNA pools with double-stranded nuclease allowed us to efficiently sequence a large collection of ESTs. These sequences are an important resource for studies of gene expression, comparative mapping and annotation of the forthcoming tilapia genome sequence.

17.

Background

The Clusters of Orthologous Groups (COGs) of proteins systematize evolutionarily related proteins into specific groups with similar functions. However, the available databases do not provide means to assess the extent of similarity between the COGs.

Aim

We intended to provide a method for identification and visualization of evolutionary relationships between the COGs, as well as a respective web server.

Results

Here we introduce the COGcollator, a web tool for identification of evolutionarily related COGs and their further analysis. We demonstrate the utility of this tool by identifying the COGs that contain distant homologs of (i) the catalytic subunit of bacterial rotary membrane ATP synthases and (ii) the DNA/RNA helicases of the superfamily 1.

Reviewers

This article was reviewed by Drs. Igor N. Berezovsky, Igor Zhulin and Yuri Wolf.

18.

Background

Horizontal gene transfer (HGT), a process of acquisition and fixation of foreign genetic material, is an important biological phenomenon. Several approaches to HGT inference have been proposed. However, most of them rely either on approximate, non-phylogenetic methods or on tree reconciliation, which is computationally intensive and sensitive to parameter values.

Results

We investigate the locus tree inference problem as a possible alternative that combines the advantages of both approaches. We present several algorithms to solve the problem in the parsimony framework. We introduce a novel tree mapping, which allows us to obtain a heuristic solution to the problems of locus tree inference and duplication classification.

Conclusions

Our approach allows for faster comparisons of gene and species trees and improves known algorithms for duplication inference in the presence of polytomies in the species trees. We have implemented our algorithms in a software tool available at https://github.com/mciach/LocusTreeInference.

19.

Background

The gene duplication (GD) problem seeks a species tree that implies the fewest gene duplication events across a given collection of gene trees. Solving this problem makes it possible to use large gene families with complex histories of duplication and loss to infer phylogenetic trees. However, the GD problem is NP-hard, and therefore, most analyses use heuristics that lack any performance guarantee.

Results

We describe the first integer linear programming (ILP) formulation to solve instances of the gene duplication problem exactly. With simulations, we demonstrate that the ILP solution can solve problem instances with up to 14 taxa. Furthermore, we apply the new ILP solution to solve the gene duplication problem for the seed plant phylogeny using a 12-taxon, 6,084-gene data set. The unique, optimal solution, which places Gnetales sister to the conifers, represents a new, large-scale genomic perspective on one of the most puzzling questions in plant systematics.

Conclusions

Although the GD problem is NP-hard, our novel ILP solution for it can solve instances with data sets consisting of as many as 14 taxa and 1,000 genes in a few hours. These are the largest instances that have been solved to optimality to date. Thus, this work can provide large-scale genomic perspectives on phylogenetic questions that previously could only be addressed by heuristic estimates.

20.

Background

An influenza H3N2 epidemic occurred throughout Southern China in 2012.

Methods

We analyzed the hemagglutinin (HA) and neuraminidase (NA) genes of influenza H3N2 strains isolated between 2011 and 2012 from Guangdong. Mutation sites, evolutionary selection, antigenic sites, and N-glycosylation within these strains were analyzed.

Results

The 2011–2012 Guangdong strains contained the HA-A214S, HA-V239I, HA-N328S, NA-L81P, and NA-D93G mutations, similar to those seen in the A/Perth/16/2009 influenza strain. The HA-NSS061–063 and NNS160–162 glycosylation sites were prevalent among the 2011–2012 Guangdong strains but the NA-NRS402–404 site was deleted. Antigenically, there was a four-fold difference between A/Perth/16/2009-like strains and the 2011–2012 Guangdong strains.
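
Site names such as NSS, NNS and NRS follow the N-linked glycosylation sequon N-X-S/T, where X is any residue except proline. A minimal Python sketch for locating such sequons in a protein sequence is given below. It is an illustration only, not the analysis pipeline used in the study; the function name and the toy sequence are hypothetical, and real HA or NA numbering would depend on the reference sequence used.

```python
import re

# N, then any residue except P, then S or T. The lookahead lets
# overlapping sequons (e.g. "NNSS") be reported separately.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein):
    """Return (1-based position, motif) for every N-X-S/T sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein)]


# Hypothetical toy sequence, not a real HA or NA fragment.
toy = "MKNSSAANPTQNNSG"
sites = find_sequons(toy)
print(sites)  # prints [(3, 'NSS'), (12, 'NNS')]
```

In the toy sequence, NPT at position 8 is skipped because proline occupies the middle position. Comparing the site lists of two strains would show which sequons were gained or lost.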

Conclusion

Antigenic drift of the H3N2 subtype contributed to the occurrence of the Southern China influenza epidemic of 2012.
