Similar articles
Found 20 similar articles (search time: 62 ms)
1.

Background

The Clusters of Orthologous Groups (COGs) of proteins systematize evolutionarily related proteins into specific groups with similar functions. However, the available databases do not provide means to assess the extent of similarity between the COGs.

Aim

We intended to provide a method for identification and visualization of evolutionary relationships between the COGs, as well as a respective web server.

Results

Here we introduce the COGcollator, a web tool for identification of evolutionarily related COGs and their further analysis. We demonstrate the utility of this tool by identifying the COGs that contain distant homologs of (i) the catalytic subunit of bacterial rotary membrane ATP synthases and (ii) the DNA/RNA helicases of the superfamily 1.

Reviewers

This article was reviewed by Drs. Igor N. Berezovsky, Igor Zhulin and Yuri Wolf.

2.

Background

Bacterial carbohydrate metabolism is extremely diverse, since carbohydrates serve as a major energy source and are involved in a variety of cellular processes. Bacterial genes belonging to the same metabolic pathway are often co-localized in the chromosome, but this is not a strict rule. Gene co-localization is linked to co-evolution and co-regulation. This study focuses on a large-scale analysis of bacterial genomic loci related to carbohydrate metabolism.

Results

We demonstrate that only 53% of 148,000 studied genes from over six hundred bacterial genomes are co-localized in bacterial genomes with other carbohydrate metabolism genes, which points to a significant role of singleton genes. Co-localized genes form cassettes, ranging in size from two to fifteen genes. Two major factors influencing the cassette-forming tendency are gene function and bacterial phylogeny. We have obtained a comprehensive picture of co-localization preferences of genes for nineteen major carbohydrate metabolism functional classes, over two hundred gene orthologous clusters, and thirty bacterial classes, and characterized the cassette variety in size and content among different species, highlighting a significant role of short cassettes. The preference towards co-localization of carbohydrate metabolism genes varies between 40 and 76% for bacterial taxa. Analysis of frequently co-localized genes yielded forty-five significant pairwise links between genes belonging to different functional classes. The number of such links per class ranges from zero to eight, demonstrating varying preferences of the respective genes towards a specific chromosomal neighborhood. Genes from eleven functional classes tend to co-localize with genes from the same class, indicating an important role of clustering of genes with similar functions. Moreover, in most cases such co-localization does not originate from local duplication events.
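The notion of a cassette described above can be illustrated with a minimal sketch. The abstract does not state the exact co-localization criterion, so the rule used here (genes of interest separated by at most `max_gap` other genes in chromosomal order) and the function names `find_cassettes` and `colocalized_fraction` are assumptions made for illustration only, not the authors' method.

```python
from typing import List


def find_cassettes(is_target: List[bool], max_gap: int = 0) -> List[List[int]]:
    """Group indices of target genes into cassettes.

    is_target[i] is True when gene i (in chromosomal order) belongs to the
    gene set of interest. Two consecutive target genes share a cassette when
    at most `max_gap` non-target genes lie between them (assumed criterion).
    """
    cassettes: List[List[int]] = []
    current: List[int] = []
    for idx, flag in enumerate(is_target):
        if not flag:
            continue
        # Close the running cassette if the gap to the previous target gene is too wide.
        if current and idx - current[-1] - 1 > max_gap:
            cassettes.append(current)
            current = []
        current.append(idx)
    if current:
        cassettes.append(current)
    return cassettes


def colocalized_fraction(cassettes: List[List[int]]) -> float:
    """Fraction of target genes that sit in cassettes of two or more genes."""
    total = sum(len(c) for c in cassettes)
    if total == 0:
        return 0.0
    return sum(len(c) for c in cassettes if len(c) >= 2) / total


if __name__ == "__main__":
    flags = [True, True, False, True, False, False, True, True, True]
    groups = find_cassettes(flags)
    print(groups, colocalized_fraction(groups))
```

In this toy input, groups of size one play the role of singleton genes, and the co-localized fraction corresponds to the kind of percentage reported in the abstract.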

Conclusions

Overall, we describe a complex web formed by evolutionary relationships of bacterial carbohydrate metabolism genes, manifested as co-localization patterns.

Reviewers

This article was reviewed by Daria V. Dibrova (A.N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia), nominated by Armen Mulkidjanian (University of Osnabrück, Germany), Igor Rogozin (NCBI, NLM, NIH, USA) and Yuri Wolf (NCBI, NLM, NIH, USA).

3.
Xu M, Zhu M, Zhang L. BMC Genomics, 2008, 9(Z2): S18

Background

Microarray technology is often used to identify the genes that are differentially expressed between two biological conditions. On the other hand, since microarray datasets contain a small number of samples and a large number of genes, it is usually desirable to identify small gene subsets with distinct patterns between sample classes. Such gene subsets are highly discriminative in phenotype classification because of their tightly coupled features. Unfortunately, classifiers identified in this way tend to generalize poorly to test samples because of overfitting.

Results

We propose a novel approach that combines supervised and unsupervised learning techniques to generate increasingly discriminative gene clusters in an iterative manner. Our experiments on both simulated and real datasets show that our method can produce a series of robust gene clusters with good classification performance compared with existing approaches.

Conclusion

This backward approach for refining a series of highly discriminative gene clusters for classification purposes proves to be very consistent and stable when applied to various types of training samples.

4.
5.

Background

New technologies for acquisition of genomic data, while offering unprecedented opportunities for genetic discovery, also impose severe burdens of interpretation and penalties for multiple testing.

Methods

The Pathway-based Analyses Group of the Genetic Analysis Workshop 19 (GAW19) sought reduction of multiple-testing burden through various approaches to aggregation of high-dimensional data in pathways informed by prior biological knowledge.

Results

Experimental methods tested included the use of "synthetic pathways" (random sets of genes) to estimate power and false-positive error rate of methods applied to simulated data; data reduction via independent components analysis, single-nucleotide polymorphism (SNP)-SNP interaction, and use of gene sets to estimate genetic similarity; and general assessment of the efficacy of prior biological knowledge to reduce the dimensionality of complex genomic data.
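The "synthetic pathway" idea mentioned above can be sketched as follows: random gene sets carry no biological signal, so the share of them that a method calls significant gives an empirical false-positive rate. The function names, the set sizes, and the significance level are hypothetical choices for illustration; the abstract does not describe the group's actual procedure in this detail.

```python
import random
from typing import List, Sequence


def synthetic_pathways(genes: Sequence[str], n_sets: int, set_size: int,
                       seed: int = 0) -> List[List[str]]:
    """Draw `n_sets` random gene sets of `set_size` distinct genes each."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [rng.sample(list(genes), set_size) for _ in range(n_sets)]


def false_positive_rate(p_values: Sequence[float], alpha: float = 0.05) -> float:
    """Share of null (synthetic) pathways with a p-value below `alpha`."""
    if not p_values:
        raise ValueError("p_values must not be empty")
    return sum(1 for p in p_values if p < alpha) / len(p_values)


if __name__ == "__main__":
    gene_ids = [f"gene{i}" for i in range(50)]  # placeholder identifiers
    null_sets = synthetic_pathways(gene_ids, n_sets=10, set_size=5)
    print(null_sets[0])
    # p-values would come from the pathway test under study; these are made up.
    print(false_positive_rate([0.01, 0.20, 0.04, 0.50]))
```

A well-calibrated test should yield a false-positive rate close to `alpha` on such random sets.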

Conclusions

The work of this group explored several promising approaches to managing high-dimensional data, with the caveat that these methods are necessarily constrained by the quality of external bioinformatic annotation.

6.
7.

Background

Human cancers are complex ecosystems composed of cells with distinct molecular signatures. Such intratumoral heterogeneity poses a major challenge to cancer diagnosis and treatment. Recent advancements of single-cell techniques such as scRNA-seq have brought unprecedented insights into cellular heterogeneity. Subsequently, a challenging computational problem is to cluster high dimensional noisy datasets with substantially fewer cells than the number of genes.

Methods

In this paper, we introduce a consensus clustering framework, conCluster, for cancer subtype identification from single-cell RNA-seq data. Using an ensemble strategy, conCluster fuses multiple basic partitions into consensus clusters.
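One common way to fuse several basic partitions into a consensus is a co-association matrix, sketched below. This is a generic ensemble-clustering illustration and not the conCluster algorithm, whose details the abstract does not give; the majority threshold of 0.5 and the function names are assumptions.

```python
from typing import List


def coassociation(partitions: List[List[int]]) -> List[List[float]]:
    """Fraction of partitions in which each pair of cells shares a cluster."""
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    m = len(partitions)
    return [[value / m for value in row] for row in counts]


def consensus_clusters(partitions: List[List[int]],
                       threshold: float = 0.5) -> List[int]:
    """Link cells co-clustered in more than `threshold` of the partitions
    and return the connected components as consensus labels."""
    matrix = coassociation(partitions)
    n = len(matrix)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first traversal of the thresholded graph
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


if __name__ == "__main__":
    base = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]
    print(consensus_clusters(base))
```

Each inner list holds the cluster labels that one basic clustering assigns to the same five cells; label values need not agree across partitions, because only co-membership is counted.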

Results

Applied to real cancer scRNA-seq datasets, conCluster can more accurately detect cancer subtypes than the widely used scRNA-seq clustering methods. Further, we conducted co-expression network analysis for the identified melanoma subtypes.

Conclusions

Our analysis demonstrates that these subtypes exhibit distinct gene co-expression networks and significant gene sets with different functional enrichment.

8.

Background

Most genes in Arabidopsis thaliana are members of gene families. How do the members of gene families arise, and how are gene family copy numbers maintained? Some gene families may evolve primarily through tandem duplication and high rates of birth and death in clusters, and others through infrequent polyploidy or large-scale segmental duplications and subsequent losses.

Results

Our approach to understanding the mechanisms of gene family evolution was to construct phylogenies for 50 large gene families in Arabidopsis thaliana, identify large internal segmental duplications in Arabidopsis, map gene duplications onto the segmental duplications, and use this information to identify which nodes in each phylogeny arose due to segmental or tandem duplication. Examples of six gene families exemplifying characteristic modes are described. Distributions of gene family sizes and patterns of duplication by genomic distance are also described in order to characterize patterns of local duplication and copy number for large gene families. Both gene family size and duplication by distance closely follow power-law distributions.
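A power-law relationship of the kind reported above, y = c · x^(-alpha), appears as a straight line on log-log axes, which the sketch below uses to estimate the exponent. The abstract does not say how the authors fitted their distributions; least squares on log-transformed values is a simple illustration only (maximum-likelihood estimators are generally considered more reliable), and `fit_power_law` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = c * x**(-alpha) by least squares on log-log values.

    Returns (alpha, c). All xs and ys must be strictly positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    slope = sxy / sxx                     # slope of the log-log line equals -alpha
    intercept = mean_y - slope * mean_x   # intercept equals log(c)
    return -slope, math.exp(intercept)


if __name__ == "__main__":
    # Made-up counts following y = 100 * x**-2 exactly.
    sizes = [1, 2, 4, 8]
    counts = [100.0, 25.0, 6.25, 1.5625]
    print(fit_power_law(sizes, counts))
```

Here `xs` could stand for gene family sizes (or genomic distances) and `ys` for the number of families (or duplications) observed at each value.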

Conclusions

Combining information about genomic segmental duplications, gene family phylogenies, and gene positions provides a method to evaluate contributions of tandem duplication and segmental genome duplication in the generation and maintenance of gene families. These differences appear to correspond meaningfully to differences in functional roles of the members of the gene families.

9.

Background

Existing clustering approaches for microarray data do not adequately differentiate between subsets of co-expressed genes. We devised a novel approach that integrates expression and sequence data in order to generate functionally coherent and biologically meaningful subclusters of genes. Specifically, the approach clusters co-expressed genes on the basis of similar content and distributions of predicted statistically significant sequence motifs in their upstream regions.

Results

We applied our method to several sets of co-expressed genes and were able to define subsets with enrichment in particular biological processes and specific upstream regulatory motifs.

Conclusions

These results show the potential of our technique for functional prediction and regulatory motif identification from microarray data.

10.

Background

Bacterial genomes develop new mechanisms to tide them over the imposing conditions they encounter during the course of their evolution. Acquisition of new genes by lateral gene transfer may be one of the dominant ways of adaptation in bacterial genome evolution. Lateral gene transfer provides the bacterial genome with a new set of genes that help it to explore and adapt to new ecological niches.

Methods

A maximum likelihood analysis was done on the five sequenced corynebacterial genomes to model the rates of gene insertions/deletions at various depths of the phylogeny.

Results

The study shows that most of the laterally acquired genes are transient and the inferred rates of gene movement are higher on the external branches of the phylogeny and decrease as the phylogenetic depth increases. The newly acquired genes are under relaxed selection and evolve faster than their older counterparts. Analysis of some of the functionally characterised LGTs in each species has indicated that they may have a possible adaptive role.

Conclusion

The five corynebacterial genomes sequenced to date have evolved by acquiring between 8 and 14% of their genomes by LGT, and some of these genes may have a role in adaptation.

11.
Comparison of the canine and human olfactory receptor gene repertoires (total citations: 2; self-citations: 1; citations by others: 1)

Background

Olfactory receptors (ORs), the first dedicated molecules with which odorants physically interact to arouse an olfactory sensation, constitute the largest gene family in vertebrates, including around 900 genes in human and 1,500 in the mouse. Whereas dogs, like many other mammals, have a much keener olfactory potential than humans, only 21 canine OR genes have been described to date.

Results

In this study, 817 novel canine OR sequences were identified, and 640 have been characterized. Of the 661 characterized OR sequences, representing half of the canine repertoire, 18% are predicted to be pseudogenes, compared with 63% in human and 20% in mouse. Phylogenetic analysis of 403 canine OR sequences identified 51 families, and radiation-hybrid mapping of 562 showed that they are distributed on 24 dog chromosomes, in 37 distinct regions. Most of these regions constitute clusters of 2 to 124 closely linked genes. The two largest clusters (124 and 109 OR genes) are located on canine chromosomes 18 and 21. They are orthologous to human clusters located on human chromosomes 11q11-q13 and HSA11p15, containing 174 and 115 ORs respectively.

Conclusions

This study shows a strongly conserved genomic distribution of OR genes between dog and human, suggesting that OR genes evolved from a common mammalian ancestral repertoire by successive duplications. In addition, the dog repertoire appears to have expanded relative to that of humans, leading to the emergence of specific canine OR genes.

12.
13.

Background

It is widely accepted that the last eukaryotic common ancestor and early eukaryotes were intron-rich and that intron loss dominated subsequent evolution; the presence of only a few introns in some modern eukaryotes must therefore be the consequence of massive loss. It is striking, however, that few eukaryotes have been found to have lost their introns completely. Despite extensive research, the causes of massive intron loss remain elusive. The reverse question, how the few remaining introns can be retained under the evolutionary selection pressure of intron loss, is equally significant but has rarely been studied, beyond the conjecture that the essential functions of some introns prevent their loss. The relatively simple genome of Giardia lamblia, which contains extremely few (eight) spliceosome-mediated cis-spliced introns, provides an excellent opportunity to explore this question.

Results

Our investigation found three types of distribution patterns of the few introns in the intron-containing genes: ancient intron in ancient gene, later-evolved intron in ancient gene, and later-evolved intron in later-evolved gene, which can reflect to some extent the dynamic evolution of introns in Giardia. Without finding any special features or functional importance of these introns responsible for their retention, we noticed and experimentally verified that some intron-containing genes form sense-antisense gene pairs with transcribable genes on their complementary strands, and that the introns just reside in the overlapping regions.

Conclusions

In Giardia’s evolution, despite constant evolutionary selection pressure of intron loss, intron gain can still occur in both ancient and later-evolved genes, but only a few introns are retained. The retention of at least some of these introns might be due not to functional constraint on the introns themselves but to causes outside the introns, such as constraints imposed by other genomic functional elements that overlap with them. These findings not only provide clues for finding new genomic functional elements in the regions overlapping with introns, but also suggest that “functional constraint” of introns may not necessarily be directly associated with intron loss and gain, and that the real functions probably still lie outside our current knowledge.

Reviewers

This article was reviewed by Mikhail Gelfand, Michael Gray, and Igor Rogozin.

14.
15.

Background

The reconstruction of ancestral genomes must deal with the problem of resolution, necessarily involving a trade-off between trying to identify genomic details and being overwhelmed by noise at higher resolutions.

Results

We use the median reconstruction, at the synteny block level, of the ancestral genome of the order Gentianales, based on coffee, Rhazya stricta and grape, to exemplify the effects of resolution (granularity) on comparative genomic analyses.

Conclusions

We show how decreased resolution blurs the differences between evolving genomes, with respect to rate, mutational process and other characteristics.

16.

Background

Hox genes are key elements in patterning animal development. They are renowned for their often clustered organisation in the genome, with supposed mechanistic links between the organisation of the genes and their expression. The widespread distribution and comparable functions of Hox genes across animals have made them a major study system for comparing the molecular bases for construction and divergence of animal morphologies. Echinoderms (including sea urchins, sea stars, sea cucumbers, feather stars and brittle stars) possess one of the most unusual body plans in the animal kingdom, with pronounced pentameral symmetry in the adults. Consequently, much interest has focused on their development, evolution and the role of the Hox genes in these processes. In this context, the organisation of echinoderm Hox gene clusters is distinctive. Within the classificatory system of Duboule, echinoderms constitute one of the clearest examples of Disorganized (D) clusters (i.e. intact clusters but with a gene order or orientation rearranged relative to the ancestral state).

Results

Here we describe two Hox genes (Hox11/13d and e) that have been overlooked in most previous work and have not been considered in reconstructions of echinoderm Hox complements and cluster organisation. The two genes are related to Posterior Hox genes and are present in all classes of echinoderm. Importantly, they do not reside in the Hox cluster of any species for which genomic linkage data is available.

Conclusion

Incorporating the two neglected Posterior Hox genes into assessments of echinoderm Hox gene complements and organisation shows that these animals in fact have Split (S) Hox clusters rather than simply Disorganized (D) clusters within the Duboule classification scheme. This then has implications for how these genes are likely regulated, with them no longer covered by any potential long-range Hox cluster-wide, or multigenic sub-cluster, regulatory mechanisms.

17.
18.
19.
20.

Background

Most studies inferring species phylogenies use sequences from single copy genes or sets of orthologs culled from gene families. For taxa such as plants, with very high levels of gene duplication in their nuclear genomes, this has limited the exploitation of nuclear sequences for phylogenetic studies, such as those available in large EST libraries. One rarely used method of inference, gene tree parsimony, can infer species trees from gene families undergoing duplication and loss, but its performance has not been evaluated at a phylogenomic scale for EST data in plants.

Results

A gene tree parsimony analysis based on EST data was undertaken for six angiosperm model species and Pinus, an outgroup. Although a large fraction of the tentative consensus sequences obtained from the TIGR database of ESTs was assembled into homologous clusters too small to be phylogenetically informative, some 557 clusters contained promising levels of information. Based on maximum likelihood estimates of the gene trees obtained from these clusters, gene tree parsimony correctly inferred the accepted species tree with strong statistical support. A slight variant of this species tree was obtained when maximum parsimony was used to infer the individual gene trees instead.

Conclusion

Despite the complexity of the EST data and the relatively small fraction eventually used in inferring a species tree, the gene tree parsimony method performed well in the face of very high apparent rates of duplication.
