Similar literature
1.
Statistical validation of gene clusters is imperative for many important applications in comparative genomics which depend on the identification of genomic regions that are historically and/or functionally related. We develop the first rigorous statistical treatment of max-gap clusters, a cluster definition frequently used in empirical studies. We present exact expressions for the probability of observing an individual cluster of a set of marked genes in one genome, as well as upper and lower bounds on the probability of observing a cluster of h homologs in a pairwise whole-genome comparison. We demonstrate the utility of our approach by applying it to a whole-genome comparison of E. coli and B. subtilis. Code for statistical tests is available at.
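
As an illustration of the cluster definition this abstract refers to, here is a minimal sketch. It assumes the common max-gap convention in which two consecutive marked genes belong to the same cluster when at most `g` unmarked genes lie between them. The function name `max_gap_clusters` and the example positions are hypothetical and are not taken from the authors' code.

```python
def max_gap_clusters(marked_positions, g):
    """Group marked gene indices into max-gap clusters.

    Assumption: positions are gene indices along one chromosome, and two
    consecutive marked genes share a cluster when at most g unmarked genes
    separate them (that is, their index difference is at most g + 1).
    """
    clusters = []
    for pos in sorted(set(marked_positions)):
        if clusters and pos - clusters[-1][-1] <= g + 1:
            # Close enough to the previous marked gene: extend that cluster.
            clusters[-1].append(pos)
        else:
            # Gap too large (or first gene): start a new cluster.
            clusters.append([pos])
    return clusters


# Hypothetical marked genes at these indices, with a maximum gap of 2.
example = max_gap_clusters([3, 4, 7, 20, 22, 40], g=2)
print(example)
```

The sketch only enumerates clusters in a single genome; it does not attempt the probability calculations described in the abstract.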

2.
Santi DV, Siani MA, Julien B, Kupfer D, Roe B. Gene 2000, 247(1-2): 97-102
An approach is described for obtaining 'perfect probes' for type I modular polyketide synthase (PKS) gene clusters that in turn enables the identification of all such gene clusters in a genome. The approach involves sequencing small fragments of a random genomic DNA library containing one or more modular PKS gene clusters, and identifying which fragments emanate from PKS genes. Knowing the approximate sizes of the genome and the target gene cluster, one can predict the frequency with which a PKS gene fragment will be present in the library sequenced. Computer simulations of the approach were applied to the known PKS and non-ribosomal peptide synthetase (NRPS) gene clusters in the Bacillus subtilis genome. The approach was then used to identify PKS gene fragments in a strain of Sorangium cellulosum that produces epothilone. In addition to identifying fragments of the epothilone gene cluster, we obtained 11 unique fragments from other PKS gene clusters; the results suggest that there may be six to eight PKS gene clusters in this organism. In addition, we identified four unique fragments of NRPS genes, demonstrating that the approach is also applicable for identification of these modular gene clusters.
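
The frequency prediction mentioned in this abstract can be sketched as simple arithmetic. The sketch assumes fragments are drawn uniformly at random from the genome, so the chance that a fragment comes from the cluster is roughly the cluster size divided by the genome size, ignoring edge effects. The function names and all numbers below are hypothetical placeholders, not values from the paper.

```python
def expected_cluster_fragments(genome_size_bp, cluster_size_bp, n_sequenced):
    """Expected number of sequenced fragments that fall inside the cluster.

    Assumption: fragments are sampled uniformly along the genome.
    """
    fraction = cluster_size_bp / genome_size_bp
    return fraction * n_sequenced


def prob_at_least_one(genome_size_bp, cluster_size_bp, n_sequenced):
    """Probability that at least one sequenced fragment hits the cluster."""
    fraction = cluster_size_bp / genome_size_bp
    return 1.0 - (1.0 - fraction) ** n_sequenced


# Hypothetical sizes: a 10 Mb genome, a 50 kb cluster, 1000 sequenced fragments.
print(expected_cluster_fragments(10_000_000, 50_000, 1000))
print(prob_at_least_one(10_000_000, 50_000, 1000))
```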

3.

Revealing patterns of genetic diversity and barriers to gene flow is key to successful conservation of endangered species. Methods based on molecular markers are also often used to delineate conservation units such as evolutionarily significant units and management units. Here we combine phylogeographic analyses (based on mtDNA) with population and landscape genetic analyses (based on microsatellites) for the endangered yellow-bellied toad Bombina variegata over a wide distribution range in Germany. Our analyses show that two genetic clusters are present in the study area, a northern and a southern/central one, but that these clusters are not deeply divergent. The genetic data suggest high fragmentation among toad occurrences and consequently low genetic diversity. Genetic diversity and genetic connectivity showed a negative relationship with road densities and urban areas surrounding toad occurrences, indicating that these landscape features act as barriers to gene flow. To preserve a maximum of genetic diversity, we recommend considering both genetic clusters as management units and increasing gene flow among toad occurrences with the aim of restoring and protecting functional meta-populations within each of the clusters. Several isolated populations with especially low genetic diversity and signs of inbreeding need particular short-term conservation attention to avoid extinction. We also recommend allowing natural gene flow between the two clusters but not using individuals from one cluster for translocation or reintroduction into the other. Our results underscore the utility of molecular tools for species conservation, highlight the effects of habitat fragmentation on the genetic structure of an endangered amphibian, and reveal particularly threatened populations in need of urgent conservation efforts.


6.
MOTIVATION: Hierarchical clustering is a common approach to study protein and gene expression data. This unsupervised technique is used to find clusters of genes or proteins which are expressed in a coordinated manner across a set of conditions. Because of both biological and technical variability, experimental repetitions are generally performed. In this work, we propose an approach to evaluate the stability of clusters derived from hierarchical clustering by taking repeated measurements into account. RESULTS: The method is based on the bootstrap technique, which is used to obtain pseudo-hierarchies of genes from resampled datasets. Based on a fast dynamic programming algorithm, we compare the original hierarchy to the pseudo-hierarchies and assess the stability of the original gene clusters. Then a shuffling procedure can be used to assess the significance of the cluster stabilities. Our approach is illustrated on simulated data and on two microarray datasets. Compared to the standard hierarchical clustering methodology, it makes it possible to distinguish dubious clusters from stable ones, and thus avoids misleading interpretations. AVAILABILITY: The programs were developed in the C and R languages.

7.
The genomic database for a marsupial, the opossum Monodelphis domestica, is highly advanced. This allowed a complete analysis of the keratin I and keratin II gene clusters, with some 30 genes in each cluster, as well as a comparison with the human keratin clusters. Human and marsupial keratin gene clusters have an astonishingly similar organization. As placental mammals and marsupials are sister groups, a corresponding organization is also expected for the archetype mammal. Since hair is a mammalian acquisition, the following features of the cluster refer to its origin. In both clusters hair keratin genes arose at an interior position. While we do not know from which epithelial keratin genes the first type-I and type-II hair keratin genes evolved, subsequent gene duplications gave rise to a subdomain of the clusters with many neighboring hair keratin genes. A second subdomain accounts in both clusters for 4 neighboring genes encoding the inner root sheath (irs) keratins. Finally, the hair keratin gene subdomain in the type-I gene cluster is interrupted after the second gene by a region encoding numerous genes for the high/ultrahigh sulfur hair keratin-associated proteins (KAPs). We also propose a tentative synteny relation of opossum and human genes based on maximal sequence conservation of the encoded keratins. The keratin gene clusters of the opossum seem to lack pseudogenes and display a slightly increased number of genes. Opossum keratin genes are usually longer than their human counterparts and also show longer intergenic distances.

8.
The mycobactin siderophore system is present in many Mycobacterium species, including M. tuberculosis and other clinically relevant mycobacteria. This siderophore system is believed to be utilized by pathogenic and nonpathogenic mycobacteria for iron acquisition in in vivo and ex vivo iron-limiting environments, respectively. Several M. tuberculosis genes located in a so-called mbt gene cluster have been predicted to be required for the biosynthesis of the core scaffold of mycobactin based on sequence analysis. A systematic and controlled mutational analysis probing the hypothesized essential nature of each of these genes for mycobactin production has been lacking. The degree of conservation of mbt gene cluster orthologs remains to be investigated as well. In this study, we sought to conclusively establish whether each of nine mbt genes was required for mycobactin production and to examine the conservation of gene clusters orthologous to the M. tuberculosis mbt gene cluster in other bacteria. We report a systematic mutational analysis of the mbt gene cluster ortholog found in Mycobacterium smegmatis. This mutational analysis demonstrates that eight of the nine mbt genes investigated are essential for mycobactin production. Our genome mining and phylogenetic analyses reveal the presence of orthologous mbt gene clusters in several bacterial species. These gene clusters display significant organizational differences originating from an intricate evolutionary path that might have included horizontal gene transfers. Altogether, the findings reported herein advance our understanding of the genetic requirements for the biosynthesis of an important mycobacterial secondary metabolite with relevance to virulence.

9.
Cancer is a complex genetic disease, resulting from defects of multiple genes. Development of microarray techniques makes it possible to survey the whole genome and detect genes that have influential impacts on the progression of cancer. Statistical analysis of cancer microarray data is challenging because of the high dimensionality and clustered nature of gene expressions. Here, clusters are composed of genes with coordinated pathological functions and/or correlated expressions. In this article, we consider cancer studies where a censored survival endpoint is measured along with microarray gene expressions. We propose a hybrid clustering approach, which uses both pathological pathway information retrieved from KEGG and statistical correlations of gene expressions, to construct gene clusters. Cancer survival time is modeled as a linear function of gene expressions. We adopt the clustering threshold gradient directed regularization (CTGDR) method for simultaneous gene cluster selection, within-cluster gene selection, and predictive model building. Analysis of two lymphoma studies shows that the proposed approach - which is composed of the hybrid gene clustering, linear regression model for survival, and clustering regularized estimation with CTGDR - can effectively identify gene clusters and genes within selected clusters that have satisfactory predictive power for censored cancer survival outcomes.

11.

Background

A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take the cluster structure into consideration.

Results

We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data.

Conclusion

We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods.

12.
Various membrane functional units such as receptors, transporters, and channels, whose action necessarily involves capturing diffusing molecules, are often organized into multimeric complexes forming clusters on the cell and organelle membranes. These functional units themselves are usually oligomers of several integral proteins, which have their own symmetry. Depending on the symmetry, they form clusters on different packing lattices. Moreover, local membrane inhomogeneities, e.g., the so-called membrane domains, rafts, stalks, etc., lead to different patterns even within the structures on the same packing lattice. Units in the cluster compete for diffusing molecules and screen each other. Here we propose a general approach that allows one to quantify the screening effects. The approach is used to derive simple approximate formulas giving the trapping rates of diffusing molecules by clusters of absorbers on lattices of different packing symmetries. The obtained results describe smooth variation of the trapping rate from the sum of the rates of individual absorbers forming the cluster to the effective collective rate. The latter shows how the trapping efficiency of an individual absorber decreases as the number of absorbers in the cluster increases and/or the inter-absorber distance decreases. Numerical tests demonstrate good agreement between the rates predicted by the theory and obtained from Brownian dynamics simulations for clusters of different shapes and sizes.

14.
Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model), genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a large population size, and in the case of the selfish operon model, a high horizontal transfer rate. The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters.
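
For readers unfamiliar with the Moran process named in this abstract, the following is a minimal sketch of the textbook two-type version with selection. It is not the authors' model: it has no genome inversions or gene order, and the "clustered" type is simply assumed to have relative fitness `1 + s`. All function names and parameter values are hypothetical.

```python
import random


def moran_fixation(n, start_mutants, s, rng):
    """Run one two-type Moran process until the mutant type fixes or is lost.

    Each step, one individual reproduces (chosen in proportion to fitness)
    and one individual dies (chosen uniformly), so population size stays n.
    Returns True if the mutant type reaches fixation.
    """
    mutants = start_mutants
    while 0 < mutants < n:
        mutant_weight = mutants * (1.0 + s)
        total_weight = mutant_weight + (n - mutants)
        birth_is_mutant = rng.random() < mutant_weight / total_weight
        death_is_mutant = rng.random() < mutants / n
        mutants += int(birth_is_mutant) - int(death_is_mutant)
    return mutants == n


def estimate_fixation_probability(n, s, trials, seed=0):
    """Monte Carlo estimate of the fixation probability of a single mutant."""
    rng = random.Random(seed)
    fixed = sum(moran_fixation(n, 1, s, rng) for _ in range(trials))
    return fixed / trials


# Hypothetical settings: population of 20, 10% fitness advantage, 2000 runs.
print(estimate_fixation_probability(n=20, s=0.1, trials=2000))
```

Under these assumptions, raising `s` should raise the estimated fixation probability; the sketch makes no claim about how population size affects cluster evolution in the models the abstract describes.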

15.
Zheng Y, Roberts RJ, Kasif S. Genome Biology 2002, 3(11): research0060.1-research0060.9

Background  

The current speed of sequencing already exceeds the capability of annotation, creating a potential bottleneck. A large proportion of the genes in microbial genomes remains uncharacterized. Here we propose a new method for functional annotation using the conservation patterns of gene clusters. If several gene clusters show the same coevolution pattern across different genomes it is reasonable to infer they are functionally related. The gene cluster phylogenetic profile integrates chromosomal proximity information and phylogenetic profile information and allows us to infer functional dependences between the gene clusters even at great distance on the chromosome.
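
The idea of comparing conservation patterns across genomes can be sketched as follows. The sketch assumes each gene cluster is summarised by a presence/absence vector over a fixed list of genomes and uses Jaccard similarity to compare two vectors. The abstract does not state which similarity measure the authors use, so Jaccard is an illustrative choice, and the cluster names and profiles are made up.

```python
def profile_similarity(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles.

    Each profile is a sequence of 0/1 flags, one per genome, in the same
    genome order. Returns 0.0 when neither cluster occurs in any genome.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical profiles over six genomes.
profiles = {
    "cluster_A": [1, 1, 0, 1, 0, 1],
    "cluster_B": [1, 1, 0, 1, 0, 0],
    "cluster_C": [0, 0, 1, 0, 1, 0],
}

print(profile_similarity(profiles["cluster_A"], profiles["cluster_B"]))
print(profile_similarity(profiles["cluster_A"], profiles["cluster_C"]))
```

In this toy example, clusters A and B co-occur in most genomes and would be candidates for a functional link, while A and C never co-occur.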

16.
Evolutionarily conserved non-coding genomic sequences represent a potentially rich source for the discovery of gene regulatory regions. Since these elements are subject to stabilizing selection they evolve much more slowly than adjacent non-functional DNA. These so-called phylogenetic footprints can be detected by comparison of the sequences surrounding orthologous genes in different species. Therefore the loss of phylogenetic footprints as well as the acquisition of conserved non-coding sequences in some lineages, but not in others, can provide evidence for the evolutionary modification of cis-regulatory elements. We introduce here a statistical model of footprint evolution that allows us to estimate the loss of sequence conservation that can be attributed to gene loss and other structural reasons. This approach to studying the pattern of cis-regulatory element evolution, however, requires the comparison of relatively long sequences from many species. We have therefore developed an efficient software tool for the identification of corresponding footprints in long sequences from multiple species. We apply this novel method to the published sequences of HoxA clusters of shark, human, and the duplicated zebrafish and Takifugu clusters as well as the published HoxB cluster sequences. We find that there is a massive loss of sequence conservation in the intergenic region of the HoxA clusters, consistent with the finding in [Chiu et al., PNAS 99 (2002) 5492]. The loss of conservation after cluster duplication is more extensive than expected from structural reasons. This suggests that binding site turnover and/or adaptive modification may also contribute to the loss of sequence conservation.

17.
Given the accelerated rate of environmental degradation and climate change, there is an urgent need to protect biodiversity, especially endemic species with restricted ranges. However, which areas should be prioritized for protection remains a critical issue. A common approach to prioritizing conservation is to rank areas using species-level metrics. Nevertheless, biodiversity and threat patterns can become complex as the amount of data increases. Here, we analyzed the distribution of 1570 Argentinean endemic plants using clustering of spatially associated species to disentangle distribution and threat patterns. We explored vulnerability levels in each cluster using mean values of species-level metrics of vulnerability, relating the values obtained to the regions and environments they occupy. For each cluster we also identified its hotspots and evaluated the effectiveness of the current protected area network for their conservation. Results yielded nine main clusters, mostly differentiated by their geographic distribution and by the ecoregions they occupy. Metrics revealed disparity in vulnerability levels among clusters, with the highest values recorded for clusters related to the Central Puna in northwestern Argentina, to the Espinal, Humid Pampas, and Low Monte in the east of the country, and to the Patagonian steppe in the south. Likewise, coverage by protected areas was low for most hotspots, with the lowest values recorded for the Patagonian cluster. In particular, analyses showed that the hotspot of this cluster, located along the Patagonian steppe in southern Chubut and northeastern Santa Cruz provinces, has both high levels of vulnerability and low levels of protection, giving it the highest conservation priority of the entire pool analyzed. Our findings identify gaps in the current protected area network and highlight key areas in need of conservation policies and strategies, both in situ and ex situ, to protect the endemic plants of Argentina.

19.
In silico database searches allowed the identification in the S. flavogriseus ATCC 33331 genome of a carbapenem gene cluster highly related to the thienamycin cluster of S. cattleya. This is the second cluster found for a complex highly substituted carbapenem. Comparative analysis revealed that both gene clusters display a high degree of synteny in gene organization and in protein conservation. Although the cluster appears to be silent under our laboratory conditions, the putative metabolic product was predicted from bioinformatics analyses using sequence comparison tools. These data, together with previous reports concerning epithienamycin production by S. flavogriseus strains, suggest that the cluster metabolic product might be a thienamycin-like carbapenem, possibly the epimeric epithienamycin. This finding might help in understanding the biosynthetic pathway to thienamycin and other highly substituted carbapenems. It also provides another example of genome mining in sequenced Streptomyces genomes as a powerful approach for novel antibiotic discovery.
