Similar literature
 20 similar articles found; search took 31 ms
1.
We previously reported two graph algorithms for analysis of genomic information: a graph comparison algorithm to detect locally similar regions called correlated clusters and an algorithm to find a graph feature called P-quasi complete linkage. Based on these algorithms we have developed an automatic procedure to detect conserved gene clusters and align orthologous gene orders in multiple genomes. In the first step, the graph comparison is applied to pairwise genome comparisons, where the genome is considered as a one-dimensionally connected graph with genes as its nodes, and correlated clusters of genes that share sequence similarities are identified. In the next step, the P-quasi complete linkage analysis is applied to grouping of related clusters, and conserved gene clusters in multiple genomes are identified. In the last step, orthologous relations of genes are established within each conserved cluster. We analyzed 17 completely sequenced microbial genomes and obtained 2313 clusters when the completeness parameter P was 40%. About one quarter contained at least two genes that appeared in the metabolic and regulatory pathways in the KEGG database. This collection of conserved gene clusters is used to refine and augment ortholog group tables in KEGG and also to define ortholog identifiers as an extension of EC numbers.
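The completeness parameter P in the abstract above can be illustrated with a small membership check. This is only a sketch under an assumed reading of "P-quasi complete linkage": every member of a cluster must be linked to at least a fraction P of the other members. The function name, the input format, and the choice of "other members" as the denominator are assumptions for illustration; the sketch checks a given cluster and is not the authors' search procedure.

```python
def is_p_quasi_complete(members, links, p):
    """Return True if every member is linked to at least a fraction p
    of the other members (assumed reading of P-quasi complete linkage)."""
    members = set(members)
    if len(members) < 2:
        return True
    # Keep only links between two distinct members, as unordered pairs.
    pairs = {frozenset(e) for e in links
             if len(set(e)) == 2 and set(e) <= members}
    for m in members:
        degree = sum(1 for other in members
                     if other != m and frozenset((m, other)) in pairs)
        if degree < p * (len(members) - 1):
            return False
    return True


# Four hypothetical gene clusters linked in a ring: each has 2 of 3 possible links.
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(is_p_quasi_complete("abcd", ring, 0.4))  # passes at P = 40%
print(is_p_quasi_complete("abcd", ring, 0.7))  # fails at P = 70%
```

With P = 40%, as used in the abstract, a ring of four would qualify under this reading, whereas a stricter P would reject it.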

2.
3.
The quaternary structures impart structural and functional credibility to proteins. In a multi-subunit protein, it is important to understand the factors that drive the association or dissociation of the subunits. It is well known that both hydrophobic and charged interactions contribute to the stability of the protein interface. The interface residues are also known to be highly conserved. Though they are buried in the oligomer, these residues are either exposed or partially exposed in the monomer. A systematic and objective method of identifying and analyzing interface clusters can significantly contribute to the identification of a residue or a collection of residues important for oligomerization. Recently, we have applied graph-spectral methods to a variety of problems related to protein structure and folding. A major advantage of this methodology is that the problem is viewed from a global protein topology point of view rather than from localized regions of the protein structure. In the present investigation, we have applied the methods of graph-spectral analysis to identify side chain clusters at the interface and the centers of these clusters in a set of homodimeric proteins. These clusters are analyzed in terms of properties such as amino acid composition, accessibility to solvent and conservation of residues. Interesting results such as participation of charged and aromatic residues like arginine, glutamic acid, histidine, phenylalanine and tyrosine, consistent with earlier investigations, have emerged from these analyses. Important additional information is that the residues involved are a part of a cluster(s) and that they are sequentially distant residues which have come closer to each other in the three-dimensional structure of the protein. These residues can easily be detected using our graph-spectral algorithm.
This method has also been used to identify important residues ('hot spots') in dimerization and also to detect dimerization sites on the monomer. The residues predicted using the present algorithm have correlated well with the experiments, indicating the efficacy of this method in predicting residues involved in dimer stability.

4.
5.
6.
Genes that are clustered on multiple genomes and are likely to functionally interact tend to be gained or lost together during genome evolution. Here, we demonstrate that exceptions to this pattern indicate relatively distant functional interactions between the encoded proteins. Hence, this can be used to divide predicted clusters of functionally interacting proteins into sub-clusters, and as such, to refine the prediction of their function and functional interactions.

7.
Identifying clusters of functionally related genes in genomes   Total citations: 4 (self-citations: 0; citations by others: 4)
MOTIVATION: An increasing body of literature shows that genomes of eukaryotes can contain clusters of functionally related genes. Most approaches to identify gene clusters utilize microarray data or metabolic pathway databases to find groups of genes on chromosomes that are linked by common attributes. A generalized method that can find gene clusters regardless of the mechanism of origin would provide researchers with an unbiased method for finding clusters and studying the evolutionary forces that give rise to them. RESULTS: We present an algorithm to identify gene clusters in eukaryotic genomes that utilizes functional categories defined in graph-based vocabularies such as the Gene Ontology (GO). Clusters identified in this manner need only have a common function and are not constrained by gene expression or other properties. We tested the algorithm by analyzing genomes of a representative set of species. We identified species-specific variation in percentage of clustered genes as well as in properties of gene clusters including size distribution and functional annotation. These properties may be diagnostic of the evolutionary forces that lead to the formation of gene clusters. AVAILABILITY: A software implementation of the algorithm and example output files are available at http://fcg.tamu.edu/C_Hunter/.

8.
Hydrogen/deuterium exchange, which depends on solvent accessibility, can be probed by mass spectrometry (MS) to get information on protein conformation or protein–ligand interaction. In this work, the conformational properties of the cyanobacterium Anabaena wild-type ferredoxin as well as of two single-site mutants (Phe 65 Ala and Arg 42 Ala) were studied. After incubation of the wild type and mutant proteins in deuterated water and quenching of the exchange at low pH, the proteins were rapidly digested at high enzyme-to-substrate ratio using immobilized pepsin, and the resulting peptides were characterized using ESI-MS. We have identified specific regions for which the H-bonding or solvent accessibility properties were perturbed by the mutations. These results show that this approach can provide local information on the influence of mutations, even for a highly structured protein like ferredoxin, and sometimes in regions distant from the mutation point.

9.
A method to detect DNA-binding sites on the surface of a protein structure is important for functional annotation. This work describes the analysis of residue patches on the surface of DNA-binding proteins and the development of a method of predicting DNA-binding sites using a single feature of these surface patches. Surface patches and the DNA-binding sites were initially analysed for accessibility, electrostatic potential, residue propensity, hydrophobicity and residue conservation. From this, it was observed that the DNA-binding sites were, in general, amongst the top 10% of patches with the largest positive electrostatic scores. This knowledge led to the development of a prediction method in which patches of surface residues were selected such that they excluded residues with negative electrostatic scores. This method was used to make predictions for a data set of 56 non-homologous DNA-binding proteins. Correct predictions were made for 68% of the data set.
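The observation that binding sites fall among the top 10% of patches by electrostatic score suggests a simple ranking step, sketched below. The patch names, the score values, and the function itself are hypothetical; the abstract does not say how scores are computed or how ties are handled, so this only illustrates the ranking idea.

```python
def top_scoring_patches(patch_scores, top_percent=10):
    """Return the names of the patches in the top `top_percent` percent
    by electrostatic score, highest score first (at least one if any exist)."""
    ranked = sorted(patch_scores, key=patch_scores.get, reverse=True)
    keep = max(1, len(ranked) * top_percent // 100)
    return ranked[:keep]


# Twenty hypothetical surface patches with made-up scores 0.0 .. 19.0.
scores = {f"patch{i}": float(i) for i in range(20)}
print(top_scoring_patches(scores))
```

Integer arithmetic is used for the cutoff so that the number of retained patches does not depend on floating-point rounding.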

10.
Fodor AA, Aldrich RW. Proteins, 2004, 56(2): 211-221
It has long been argued that algorithms that find correlated mutations in multiple sequence alignments can be used to find structurally or functionally important residues in proteins. We examined the properties of four different methods for detecting these correlated mutations. On both simple, artificial alignments and real alignments from the Pfam database, we found a surprising lack of agreement between the four correlated mutation methods. We argue that these differences are caused in part by differing sensitivities to background conservation. Correlated mutation algorithms can be envisioned as "filters" of background conservation with each algorithm searching for correlated mutations that occur at a different background conservation frequency.
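As an illustration of what a correlated-mutation score looks like, the sketch below computes the mutual information between two alignment columns. Mutual information is one commonly used score of this kind; the abstract does not name the four methods it compares, so this is an assumed example rather than a reproduction of any of them. The toy alignment is invented.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equally long alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * log2((c * n) / (count_a[a] * count_b[b]))
    return mi


# Toy alignment: columns 0 and 1 co-vary perfectly, column 2 is fully conserved.
alignment = ["ACW", "ACW", "GTW", "GTW"]
columns = list(zip(*alignment))
print(column_mutual_information(columns[0], columns[1]))  # co-varying pair
print(column_mutual_information(columns[2], columns[1]))  # conserved vs variable
```

A fully conserved column carries no mutual information with any other column, which is one concrete way a score can depend on background conservation.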

11.
Backbone cluster identification in proteins by a graph theoretical method   Total citations: 4 (self-citations: 0; citations by others: 4)
A graph theoretical algorithm has been developed to identify backbone clusters of residues in proteins. The identified clusters show protein sites with the highest degree of interactions. An adjacency matrix is constructed from the non-bonded connectivity information in proteins. The diagonalization of such a matrix yields eigenvalues and eigenvectors, which contain the information on clusters. In graph theory, distinct clusters can be obtained from the second lowest eigenvector components of the matrix. However, in an interconnected graph, all the points appear as one single cluster. We have developed a method of identifying highly interacting centers (clusters) in proteins by truncating the vector components of high eigenvalues. This paper presents in detail the method adopted for identifying backbone clusters and the application of the algorithm to families of proteins like RNase-A and globin. The objective of this study was to show the efficiency of the algorithm as well as to detect conserved or similar backbone packing regions in a particular protein family. Three clusters in topologically similar regions in the case of the RNase-A family and three clusters around the porphyrin ring in the globin family were observed. The predicted clusters are consistent with the features of the family of proteins such as the topology and packing density. The method can be applied to problems such as identification of domains and recognition of structural similarities in proteins.
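The remark that clusters can be read from the components of the second lowest eigenvector can be illustrated with the textbook construction: split a graph by the signs of the eigenvector belonging to the second smallest eigenvalue of its Laplacian. The sketch below is not the authors' truncation method. It assumes a connected, unweighted graph with a unique second smallest eigenvalue, and it uses a shifted power iteration (a choice made here to avoid external libraries) instead of a full diagonalization.

```python
def fiedler_bipartition(n, edges, iterations=500):
    """Split nodes 0..n-1 by the sign of the second lowest Laplacian eigenvector."""
    # Graph Laplacian L = D - A as a dense list of lists.
    lap = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    # Twice the maximum degree bounds the largest eigenvalue of L.
    shift = 2.0 * max(lap[i][i] for i in range(n))
    v = [float(i) for i in range(n)]
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant (eigenvalue 0) component
        # Multiply by (shift * I - L): the second lowest eigenvector of L dominates.
        v = [shift * v[i] - sum(lap[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    negative = [i for i in range(n) if v[i] < 0]
    non_negative = [i for i in range(n) if v[i] >= 0]
    return negative, non_negative


# Two triangles (0-1-2 and 3-4-5) joined by a single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(fiedler_bipartition(6, edges))
```

For the toy graph, the two triangles end up on opposite sides of the split, which matches the intuition that densely interacting groups share the same sign.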

12.
Amino acid residues can be divided into similar groups by the frequencies of their mutual replacements during evolution and by their tendencies to form spatial contacts in the tertiary structures of globular proteins. Each residue was compared with the cluster of its spatial surroundings, that is, the set of residues brought close together in space. 5210 clusters in 32 non-homologous proteins of known tertiary structure and 6447 clusters formed only by variable amino acid residues were analysed. Spatial contacts among residues were studied as a function of the secondary structure and the number of residues in a cluster. It was assumed that functionally admissible mutations may be defined, first of all, by the degree of proximity of amino acid residues in the spatial surroundings.

13.
The conformational dynamics of cytochrome P450 enzymes are critical to their catalytic activity. In this study, the correlated motion between residues in a 200 ns molecular dynamics trajectory of the thermophilic CYP119 was analyzed to parse out conformational relationships. Residues that are structurally related, for example residues within a helix, generally have highly correlated motion. In addition, clusters of non-adjacent residues that show correlated motion (“hot spots”) are seen in various regions, including at the base of the F and G helices that make up the most dynamic region of the enzyme. A modified k-means algorithm that clusters residues based on their correlated motion indicates that functionally related residues are in the same cluster (e.g., the catalytic threonines and the heme). Tightly coupled clusters form a solvent-exposed “shell” around the enzyme, whereas less coupling between clusters is seen in regions that are critical to ligand interactions, redox partner interactions, and catalysis. Most notably, we find that residues in the active site move independently from the rest of the enzyme, effectively insulating the catalytic machinery from other regions of the protein.
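Correlated motion between two residues is often quantified with a normalized cross-correlation of their displacements from the mean position. The sketch below shows that calculation for two short, invented coordinate series. Whether the study used exactly this definition is an assumption, and the modified k-means step mentioned in the abstract is not reproduced here.

```python
def cross_correlation(traj_i, traj_j):
    """Normalized cross-correlation of two residues' displacements.

    Each trajectory is a list of (x, y, z) positions, one per frame.
    Returns a value between -1 (anti-correlated) and +1 (correlated).
    """
    def displacements(traj):
        n = len(traj)
        mean = [sum(p[k] for p in traj) / n for k in range(3)]
        return [[p[k] - mean[k] for k in range(3)] for p in traj]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    di, dj = displacements(traj_i), displacements(traj_j)
    numerator = sum(dot(a, b) for a, b in zip(di, dj))
    denominator = (sum(dot(a, a) for a in di)
                   * sum(dot(b, b) for b in dj)) ** 0.5
    return numerator / denominator


# Invented three-frame trajectories for three residues.
res_a = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
res_b = [(5, 5, 5), (6, 5, 5), (7, 5, 5)]   # moves together with res_a
res_c = [(2, 0, 0), (1, 0, 0), (0, 0, 0)]   # moves opposite to res_a
print(cross_correlation(res_a, res_b))
print(cross_correlation(res_a, res_c))
```

A matrix of such values over all residue pairs could then serve as the input to a clustering step.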

14.
In eukaryotes, neighboring genes can be packaged together in specific chromatin structures that ensure their coordinated expression. Examples of such multi-gene chromatin domains are well-documented, but a global view of the chromatin organization of eukaryotic genomes is lacking. To systematically identify multi-gene chromatin domains, we constructed a compendium of genome-scale binding maps for a broad panel of chromatin-associated proteins in Drosophila melanogaster. Next, we computationally analyzed this compendium for evidence of multi-gene chromatin domains using a novel statistical segmentation algorithm. We find that at least 50% of all fly genes are organized into chromatin domains, which often consist of dozens of genes. The domains are characterized by various known and novel combinations of chromatin proteins. The genes in many of the domains are coregulated during development and tend to have similar biological functions. Furthermore, during evolution fewer chromosomal rearrangements occur inside chromatin domains than outside domains. Our results indicate that a substantial portion of the Drosophila genome is packaged into functionally coherent, multi-gene chromatin domains. This has broad mechanistic implications for gene regulation and genome evolution.

15.
Deleterious mutations inevitably emerge in any evolutionary process and are speculated to decisively influence the structure of the genome. Meiosis, which is thought to play a major role in handling mutations on the population level, recombines chromosomes via non-randomly distributed hot spots for meiotic recombination. In many genomes, various types of genetic elements are distributed in patterns that are currently not well understood. In particular, important (essential) genes are arranged in clusters, which often cannot be explained by a functional relationship of the involved genes. Here we show by computer simulation that essential gene (EG) clustering provides a fitness benefit in handling deleterious mutations in sexual populations with variable levels of inbreeding and outbreeding. We find that recessive lethal mutations enforce a selective pressure towards clustered genome architectures. Our simulations correctly predict (i) the evolution of non-random distributions of meiotic crossovers, (ii) the genome-wide anti-correlation of meiotic crossovers and EG clustering, (iii) the evolution of EG enrichment in pericentromeric regions and (iv) the associated absence of meiotic crossovers (cold centromeres). Our results furthermore predict optimal crossover rates for yeast chromosomes, which match the experimentally determined rates. Using a Saccharomyces cerevisiae conditional mutator strain, we show that haploid lethal phenotypes result predominantly from mutation of single loci and generally do not impair mating, which leads to an accumulation of mutational load following meiosis and mating. We hypothesize that purging of deleterious mutations in essential genes constitutes an important factor driving meiotic crossover. Therefore, the increased robustness of populations to deleterious mutations, which arises from clustered genome architectures, may provide a significant selective force shaping crossover distribution. 
Our analysis reveals a new aspect of the evolution of genome architectures that complements insights about molecular constraints, such as the interference of pericentromeric crossovers with chromosome segregation.

16.
The formation of clustered DNA damage sites is a unique feature of ionizing radiation. Recent studies have shown that the repair of lesions within clusters may be compromised, but little is understood about the mutagenic consequences of such damage sites. Using a plasmid-based method, damaged DNA containing uracil positioned at 1–5 bp separations from 8-oxo-7,8-dihydroguanine on the complementary strand was transfected into wild-type Escherichia coli or into strains lacking the DNA glycosylases Fpg and MutY. Mutation frequencies were found to be significantly higher for clustered damage sites than for single lesions. The loss of MutY gave a large relative increase in mutation frequency and a strain lacking both Fpg and MutY showed even higher mutation frequencies, up to nearly 40% of rescued plasmid. In these strains, the mutation frequency decreases with increasing spacing of the uracil from the 8-oxo-7,8-dihydroguanine site. Sequencing of plasmid DNA carrying clustered damage, following rescue from bacteria, showed that almost all of the mutations are GC→TA transversions. The data suggest that at clustered damage sites, depending on lesion spacing, the action of Fpg is compromised and post-replication processing of lesions by MutY is the most important mechanism for protection against mutagenesis.

17.
Substitutions of individual amino acids in proteins may be under very different evolutionary restraints depending on their structural and functional roles. The Environment Specific Substitution Table (ESST) describes the pattern of substitutions in terms of amino acid location within elements of secondary structure, solvent accessibility, and the existence of hydrogen bonds between side chains and neighbouring amino acid residues. Clearly amino acids that have very different local environments in their functional state compared to those in the protein analysed will give rise to inconsistencies in the calculation of amino acid substitution tables. Here, we describe how the calculation of ESSTs can be improved by discarding the functional residues from the calculation of substitution tables. Four categories of functions are examined in this study: protein–protein interactions, protein–nucleic acid interactions, protein–ligand interactions, and catalytic activity of enzymes. Their contributions to residue conservation are measured and investigated. We test our new ESSTs using the program CRESCENDO, designed to predict functional residues by exploiting knowledge of amino acid substitutions, and compare the benchmark results with proteins whose functions have been defined experimentally. The new methodology increases the Z-score by 98% at the active site residues and finds 16% more active sites compared with the old ESST. We also find that discarding amino acids responsible for protein–protein interactions helps in the prediction of those residues although they are not as conserved as the residues of active sites. Our methodology can make the substitution tables better reflect and describe the substitution patterns of amino acids that are under structural restraints only.

18.
Human-Specific Integrations of the HERV-K Endogenous Retrovirus Family   Total citations: 13 (self-citations: 5; citations by others: 8)
Several distinct families of endogenous retrovirus-like sequences (HERVs) exist in the genomes of humans and other primates. One of these families, the HERV-K group, contains members that encode functional proteins and that have been implicated in the etiology of insulin-dependent diabetes mellitus (IDDM). Because of potential functional and disease relevance, it is important to determine if there are HERV-K-associated genetic differences between individuals. In this study, we have investigated the divergence and evolutionary age of HERV-K long terminal repeats (LTRs). Thirty-seven LTRs, taken primarily from random human clones in GenBank, were aligned and grouped into nine clusters with decreasing sequence divergence. Cluster 1 sequences are 8.6% divergent, on average, whereas cluster 9 LTRs, represented by the LTRs of the fully sequenced HERV-K10 clone, show an average of only 1.1% divergence from each other. The evolutionary age of 18 LTRs from different clusters was then investigated by genomic PCR to determine presence or absence of the retroviral element in different primate species. LTRs from clusters of higher divergence were detected in monkeys and apes, whereas LTRs in clusters with lower divergence were acquired later in evolution. Notably, LTRs of cluster 9 were found only in humans at all nine loci examined. Genomic Southern analysis with an oligonucleotide probe specific for cluster 9 LTRs suggests that HERV-K elements with this type of LTR expanded independently in the genomes of humans and the great apes. This is the first report of endogenous retroviral integrations that are specific to humans and indicates that some HERVs have amplified much later than previously thought. These elements may still be actively transposing and may therefore represent a source of genetic variation linked to disease development.

19.
The L1 cell adhesion molecule has six domains homologous to members of the immunoglobulin superfamily and five homologous to fibronectin type III domains. We determined the outline structure of the L1 domains by showing that they have, at the key sites that determine conformation, residues similar to those in proteins of known structure. The outline structure describes the relative positions of residues, the major secondary structures and residue solvent accessibility. We use the outline structure to investigate the likely effects of 22 mutations that cause neurological diseases. The mutations are not randomly distributed but cluster in a few regions of the structure. They can be divided into those that act mainly by changing conformation or denaturing their domain and those that alter its surface properties.

20.
The availability of computerized knowledge on biochemical pathways in the KEGG database opens new opportunities for developing computational methods to characterize and understand higher level functions of complete genomes. Our approach is based on the concept of graphs; for example, the genome is a graph with genes as nodes and the pathway is another graph with gene products as nodes. We have developed a simple method for graph comparison to identify local similarities, termed correlated clusters, between two graphs, which allows gaps and mismatches of nodes and edges and is especially suitable for detecting biological features. The method was applied to a comparison of the complete genomes of 10 microorganisms and the KEGG metabolic pathways, which revealed, not surprisingly, a tendency for formation of correlated clusters called FRECs (functionally related enzyme clusters). However, this tendency varied considerably depending on the organism. The relative number of enzymes in FRECs was close to 50% for Bacillus subtilis and Escherichia coli, but was <10% for Synechocystis and Saccharomyces cerevisiae. The FRECs collection is reorganized into a collection of ortholog group tables in KEGG, which represents conserved pathway motifs with the information about gene clusters in all the completely sequenced genomes.
