Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Connected gene neighborhoods in prokaryotic genomes   Total citations: 12, self-citations: 1, citations by others: 11
A computational method was developed for delineating connected gene neighborhoods in bacterial and archaeal genomes. These gene neighborhoods are not typically present, in their entirety, in any single genome, but are held together by overlapping, partially conserved gene arrays. The procedure was applied to comparing the orders of orthologous genes, which were extracted from the database of Clusters of Orthologous Groups of proteins (COGs), in 31 prokaryotic genomes and resulted in the identification of 188 clusters of gene arrays, which included 1001 of 2890 COGs. These clusters were projected onto actual genomes to produce extended neighborhoods including additional genes, which are adjacent to the genes from the clusters and are transcribed in the same direction, which resulted in a total of 2387 COGs being included in the neighborhoods. Most of the neighborhoods consist predominantly of genes united by a coherent functional theme, but also include a minority of genes without an obvious functional connection to the main theme. We hypothesize that although some of the latter genes might have unsuspected roles, others are maintained within gene arrays because of the advantage of expression at a level that is typical of the given neighborhood. We designate this phenomenon ‘genomic hitchhiking’. The largest neighborhood includes 79 genes (COGs) and consists of overlapping, rearranged ribosomal protein superoperons; apparent genome hitchhiking is particularly typical of this neighborhood and other neighborhoods that consist of genes coding for translation machinery components. Several neighborhoods involve previously undetected connections between genes, allowing new functional predictions. Gene neighborhoods appear to evolve via complex rearrangement, with different combinations of genes from a neighborhood fixed in different lineages.

2.
3.
Conservation of proximity of a pair of genes across multiple genomes generally indicates that their functions could be linked. Here, we present a systematic evaluation using 42 complete microbial genomes from 25 phylogenetic groups to test the reliability of this observation in predicting function for genes. We find a relationship between the number of phylogenetic groups in which a gene pair is proximate and the probability that the pair belongs to a common pathway. Our method produces 1586 links between ortholog families substantiated by observed proximity in genomes representing at least three phylogenetic groups. Of the pairs annotated in the KEGG database, 80% are in the same biological pathway in KEGG.
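The filtering step described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the input layout (a mapping from an ortholog-family pair to the set of phylogenetic groups in which the pair was observed in proximity) and the name `proximity_links` are assumptions. Only the threshold of three phylogenetic groups comes from the abstract.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def proximity_links(
    observations: Dict[FrozenSet[str], Set[str]],
    min_groups: int = 3,
) -> List[Tuple[Tuple[str, str], int]]:
    """Keep ortholog-family pairs seen in proximity in >= min_groups groups.

    `observations` is a hypothetical input: pair of family names -> set of
    phylogenetic groups in which the two families are adjacent on a genome.
    """
    links = []
    for pair, groups in observations.items():
        if len(pair) == 2 and len(groups) >= min_groups:
            links.append((tuple(sorted(pair)), len(groups)))
    # Strongest support first; ties are broken alphabetically.
    links.sort(key=lambda item: (-item[1], item[0]))
    return links
```

Counting distinct phylogenetic groups rather than individual genomes keeps closely related genomes from inflating the support for a link, which is the point the abstract makes about reliability.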

4.
Knowledge of the contributions of arterial and venous transit time dispersion to the pulmonary vascular transit time distribution is important for understanding lung function and for interpreting various kinds of data containing information about pulmonary function. Thus, to determine the dispersion of blood transit times occurring within the pulmonary arterial and venous trees, images of a bolus of contrast medium passing through the vasculature of pump-perfused dog lung lobes were acquired by using an X-ray microfocal angiography system. Time-absorbance curves from the lobar artery and vein and from selected locations within the intrapulmonary arterial tree were measured from the images. Overall dispersion within the lung lobe was determined from the difference in the first and second moments (mean transit time and variance, respectively) of the inlet arterial and outlet venous time-absorbance curves. Moments at selected locations within the arterial tree were also calculated and compared with those of the lobar artery curve. Transit times for the arterial pathways upstream from the smallest measured arteries (200-µm diameter) were less than ~20% of the total lung lobe mean transit time. Transit time variance among these arterial pathways (interpathway dispersion) was less than ~5% of the total variance imparted on the bolus as it passed through the lung lobe. On average, the dispersion that occurred along a given pathway (intrapathway dispersion) was negligible. Similar results were obtained for the venous tree. Taken together, the results suggest that most of the variation in transit time in the intrapulmonary vasculature occurs within the pulmonary capillary bed rather than in conducting arteries or veins.
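The moment calculation mentioned in this abstract can be illustrated with a short sketch. It assumes uniformly sampled curves and treats each time-absorbance curve as an unnormalized distribution of transit times; the function names `curve_moments` and `lobe_dispersion` are hypothetical, and the study's actual processing (baseline correction, recirculation handling and so on) is not described in the abstract.

```python
from typing import Sequence, Tuple


def curve_moments(times: Sequence[float],
                  absorbance: Sequence[float]) -> Tuple[float, float]:
    """Return (mean transit time, variance) of one time-absorbance curve.

    Assumes uniform sampling, so absorbance values act as weights.
    """
    total = sum(absorbance)
    if total <= 0:
        raise ValueError("curve has no positive area")
    mean = sum(t * a for t, a in zip(times, absorbance)) / total
    var = sum(a * (t - mean) ** 2 for t, a in zip(times, absorbance)) / total
    return mean, var


def lobe_dispersion(times: Sequence[float],
                    inlet: Sequence[float],
                    outlet: Sequence[float]) -> Tuple[float, float]:
    """Outlet minus inlet moments: (added mean transit time, added variance)."""
    mean_in, var_in = curve_moments(times, inlet)
    mean_out, var_out = curve_moments(times, outlet)
    return mean_out - mean_in, var_out - var_in
```

For example, an inlet curve concentrated at t = 1 and an outlet curve spread over t = 2 to 4 give an added mean transit time of 2 and an added variance of 0.5 in the same time units.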


5.
6.
Wang J, Zhang Y, Shen X, Zhu J, Zhang L, Zou J, Guo Z. Molecular BioSystems, 2011, 7(4): 1158-1166
Finding candidate cancer genes playing causal roles in carcinogenesis is an important task in cancer research. The non-randomness of the co-mutation of genes in cancer samples can provide statistical evidence for these genes' involvement in carcinogenesis. It can also provide important information on the functional cooperation of gene mutations in cancer. However, due to the relatively small sample sizes used in current high-throughput somatic mutation screening studies and the extraordinarily large number of hypothesis tests, the statistical power of finding co-mutated gene pairs based on high-throughput somatic mutation data of cancer genomes is very low. Thus, we proposed a stratified FDR (False Discovery Rate) control approach for identifying significantly co-mutated gene pairs according to the mutation frequency of genes. We then compared the co-mutated gene pairs identified by pre-selecting genes with higher mutation frequencies with those identified by the stratified FDR control approach. Finally, we searched for pairs of pathways annotated with significantly more between-pathway co-mutated gene pairs to evaluate the functional roles of the identified co-mutated gene pairs. Based on two datasets of somatic mutations in cancer genomes, we demonstrated that, at a given FDR level, the power of finding co-mutated gene pairs could be increased by pre-selecting genes with higher mutation frequencies. However, many true co-mutations between genes with lower mutation rates will still be missed. By the stratified FDR control approach, many more co-mutated gene pairs could be found. Finally, the identified pathway pairs significantly overrepresented with between-pathway co-mutated gene pairs suggested that their co-dysregulations may play causal roles in carcinogenesis.
The stratified FDR control strategy is efficient in identifying co-mutated gene pairs, and the genes in the identified co-mutated gene pairs can be considered candidate cancer genes because their non-random co-mutations in cancer genomes are highly unlikely to be attributable to chance.
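The idea of controlling the FDR separately within strata can be sketched as below. The abstract does not say which FDR procedure is applied within each stratum or how strata are delimited, so this sketch assumes the Benjamini-Hochberg step-up procedure and strata labels supplied by the caller (for instance, bins of mutation frequency). The names `benjamini_hochberg` and `stratified_fdr` are illustrative only.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float], alpha: float) -> List[bool]:
    """Benjamini-Hochberg step-up: flag hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Largest rank k with p_(k) <= (k / m) * alpha.
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


def stratified_fdr(tests: List[Tuple[str, str, float]],
                   alpha: float = 0.05) -> List[str]:
    """tests holds (pair_id, stratum, p-value); BH is run within each stratum."""
    by_stratum: Dict[str, List[Tuple[str, float]]] = {}
    for pair_id, stratum, p in tests:
        by_stratum.setdefault(stratum, []).append((pair_id, p))
    significant: List[str] = []
    for items in by_stratum.values():
        flags = benjamini_hochberg([p for _, p in items], alpha)
        significant.extend(pid for (pid, _), keep in zip(items, flags) if keep)
    return sorted(significant)
```

Running the procedure per stratum means a small p-value is compared against a threshold set by the number of tests in its own stratum, not by the full number of gene pairs, which is one way the power gain reported in the abstract could arise.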

7.
Comparative genomics is an essential tool to unravel how genomes change over evolutionary time and to gain clues on the links between functional genomics and evolution. In prokaryotes, the large, good-quality genome sequences available in public databases and the recently developed large-scale computational methods offer an unprecedented view on the ecology and evolution of microorganisms through comparative genomics. In this work, we examined the links between genome structure (i.e., the sequential distribution of nucleotides itself by detrended fluctuation analysis, DFA) and genomic diversity (i.e., gene functionality by Clusters of Orthologous Genes, COGs) in 828 fully sequenced prokaryotic genomes from 548 different bacterial and archaeal species. The DFA scaling exponent α indicated persistent long-range correlations (fractality) in each genome analyzed. Higher resolution power was found when considering the sequential succession of purine (AG) vs. pyrimidine (CT) bases than either keto (GT) vs. amino (AC) forms or strongly (GC) vs. weakly (AT) bonded nucleotides. Interestingly, the phyla Aquificae, Fusobacteria, Dictyoglomi, Nitrospirae, and Thermotogae were closer to archaea than to their bacterial counterparts. A strong significant correlation was found between the scaling exponent α and the COG distribution, and we consistently observed that the larger α, the more heterogeneous the gene distribution within each functional category, suggesting a close relationship between primary nucleotide sequence structure and functional gene composition.

8.
MOTIVATION: One popular method for analyzing functional connectivity between genes is to cluster genes with similar expression profiles. The most popular metrics measuring the similarity (or dissimilarity) among genes include Pearson's correlation, linear regression coefficient and Euclidean distance. Because these metrics yield only constant values, they can only depict a stationary connectivity between genes. However, the functional connectivity between genes usually changes with time. Here, we introduce a new way of characterizing the relationship between genes and model it with variable parameter regression and Kalman filtering. RESULTS: We applied our algorithm to some simulated data and two pairs of real gene expression data. The changes of connectivity recovered from the simulated data closely match the truth, and the results for the two pairs of gene expression data show that our method successfully captures the dynamic connectivity between genes. CONTACT: jiangtz@nlpr.ia.ac.cn.
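A time-varying regression coefficient tracked by a Kalman filter can be sketched as follows. The abstract names the technique but gives no equations, so this sketch assumes the simplest state-space form: expression of gene Y is modeled as y_t = b_t * x_t + noise, and the coefficient follows a random walk b_t = b_(t-1) + noise. The variances, initial values and the name `track_coefficient` are assumptions chosen for illustration.

```python
from typing import List, Sequence


def track_coefficient(x: Sequence[float],
                      y: Sequence[float],
                      process_var: float = 0.1,
                      obs_var: float = 0.1,
                      b0: float = 0.0,
                      p0: float = 1.0) -> List[float]:
    """Scalar Kalman filter for y_t = b_t * x_t + v_t with random-walk b_t.

    Returns the filtered estimate of b_t at every time point.
    """
    b, p = b0, p0
    estimates: List[float] = []
    for x_t, y_t in zip(x, y):
        # Predict: a random walk keeps the mean and inflates the variance.
        p_pred = p + process_var
        # Update with the new observation.
        innovation = y_t - b * x_t
        innovation_var = x_t * p_pred * x_t + obs_var
        gain = p_pred * x_t / innovation_var
        b = b + gain * innovation
        p = (1.0 - gain * x_t) * p_pred
        estimates.append(b)
    return estimates
```

Unlike a single Pearson correlation or regression coefficient computed over the whole series, the returned list changes over time, so a shift in the relationship between two genes shows up as a shift in the estimates.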

9.
MOTIVATION: Gaussian graphical models (GGMs) are a popular tool for representing gene association structures. We propose using estimated partial correlations from these models to attach lengths to the edges of the GGM, where the length of an edge is inversely related to the partial correlation between the gene pair. Graphical lasso is used to fit the GGMs and obtain partial correlations. The shortest paths between pairs of genes are found. Where terminal genes have the same biological function, intermediate genes on the path are classified as having the same function. We validate the method on genes of known function using the Rosetta Compendium of yeast (Saccharomyces cerevisiae) gene expression profiles. We also compare our results with those obtained using a graph constructed using correlations. RESULTS: Using a partial correlation graph, we are able to classify approximately twice as many genes to the same level of accuracy as when using a correlation graph. More importantly, when both methods are tuned to classify a similar number of genes, the partial correlation approach can increase the accuracy of the classifications.
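The shortest-path classification step can be sketched as below, taking the partial correlations as already estimated (the graphical lasso fit itself is not shown). The abstract only says that edge length is inversely related to partial correlation; the specific choice length = 1/|partial correlation| and the function names are assumptions made for this illustration.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(partial_corr: Dict[Tuple[str, str], float]) -> Graph:
    """Undirected graph with edge length 1/|r| (an assumed inverse relation)."""
    graph: Graph = {}
    for (a, b), r in partial_corr.items():
        if r == 0:
            continue  # zero partial correlation means no edge in the GGM
        length = 1.0 / abs(r)
        graph.setdefault(a, {})[b] = length
        graph.setdefault(b, {})[a] = length
    return graph


def shortest_path(graph: Graph, source: str, target: str) -> Optional[List[str]]:
    """Dijkstra's algorithm; returns the node list or None if unreachable."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nbr, length in graph.get(node, {}).items():
            nd = d + length
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def classify_intermediates(graph: Graph, known: Dict[str, str],
                           source: str, target: str) -> Dict[str, str]:
    """If both end genes share a known function, assign it to genes between."""
    function = known.get(source)
    if function is None or function != known.get(target):
        return {}
    path = shortest_path(graph, source, target)
    if path is None:
        return {}
    return {gene: function for gene in path[1:-1]}
```

With these lengths, a chain of two strong partial correlations is shorter than one weak direct edge, so the path passes through the intermediate gene and that gene inherits the shared annotation.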

10.
11.
Operon prediction in Pyrococcus furiosus   Total citations: 1, self-citations: 0, citations by others: 1
Identification of operons in the hyperthermophilic archaeon Pyrococcus furiosus represents an important step toward understanding the regulatory mechanisms that enable the organism to adapt and thrive in extreme environments. We have predicted operons in P. furiosus by combining the results from three existing algorithms using a neural network (NN). These algorithms use intergenic distances, phylogenetic profiles, functional categories and gene-order conservation in their operon prediction. Our method takes as inputs the confidence scores of the three programs, and outputs a prediction of whether adjacent genes on the same strand belong to the same operon. In addition, we have applied Gene Ontology (GO) and KEGG pathway information to improve the accuracy of our algorithm. The parameters of this NN predictor are trained on a subset of all experimentally verified operon gene pairs of Bacillus subtilis. It subsequently achieved 86.5% prediction accuracy when applied to a subset of gene pairs for Escherichia coli, which is substantially better than any of the three prediction programs. Using this new algorithm, we predicted 470 operons in the P. furiosus genome. Of these, 349 were validated using DNA microarray data.

12.
Tao X, Chen X, Yang X, Tian J. PLoS ONE, 2012, 7(4): e35704
Fingerprint recognition with identical twins is a challenging task because identical twins share the closest possible genetic relationship. Several earlier studies have analyzed the similarity between twins' fingerprints. In this work we continue to investigate the similarity of identical twin fingerprints. Our study used a large identical twin fingerprint database that contains 83 twin pairs, 4 fingers per individual and six impressions per finger: 3984 (83*2*4*6) images. Compared to the previous work, our contributions are summarized as follows: (1) Two state-of-the-art fingerprint identification methods, P071 and VeriFinger 6.1, were used, rather than the single fingerprint identification method used in previous studies. (2) Six impressions per finger were captured, rather than just one impression, which makes the genuine distribution of matching scores more realistic. (3) A larger sample (83 pairs) was collected. (4) A novel statistical analysis was conducted to show the probability distribution of fingerprint types for the corresponding fingers of identical twins that have the same fingerprint type. (5) A novel analysis was conducted to show which finger of identical twins has the higher probability of having the same fingerprint type. Our results showed that: (a) A state-of-the-art automatic fingerprint verification system can distinguish identical twins without drastic degradation in performance. (b) The chance that the fingerprints of identical twins have the same type is 0.7440, compared to 0.3215 for non-identical twins. (c) For the corresponding fingers of identical twins that have the same fingerprint type, the probability distribution of the five major fingerprint types is similar to the probability distribution of fingerprint types over all fingers. (d) For each of the four fingers of identical twins, the probability of having the same fingerprint type is similar.

13.
14.
15.
A comparative analysis has been made of the DNA sequences of the isofunctional genes encoding N-acetylglutamate synthase of the arginine biosynthetic pathway of the bacterial species Pseudomonas aeruginosa and Pseudomonas putida. Overall homologies of 81% and 84% at the nucleotide and deduced amino acid sequence levels, respectively, were observed. This high homology was also reflected in the strikingly similar hydropathy profiles of the encoded proteins; patterns of codon usage, including rare codon usage; and amino acid composition of the proteins. This high level of homology at the DNA sequence level is consistent with the location of these genes in the genetically conserved chromosomal region (called auxotrophic-rich region) of the respective Pseudomonas species. Despite chromosomal rearrangements identified in this region, the conservation observed at the chromosomal level between these Pseudomonas species is also maintained at the level of the DNA sequence, and in the deduced amino acid sequence, of the genes reported here and of six other pairs of genes of the tryptophan biosynthetic pathway, reported by others, which are also located within this chromosomal region.

16.
17.

Background  

An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes.

18.
Cancer evolves through the accumulation of mutations, but the order in which mutations occur is poorly understood. Inference of a temporal ordering on the level of genes is challenging because clinically and histologically identical tumors often have few mutated genes in common. This heterogeneity may at least in part be due to mutations in different genes having similar phenotypic effects by acting in the same functional pathway. We estimate the constraints on the order in which alterations accumulate during cancer progression from cross-sectional mutation data using a probabilistic graphical model termed Hidden Conjunctive Bayesian Network (H-CBN). The possible orders are analyzed on the level of genes and, after mapping genes to functional pathways, also on the pathway level. We find stronger evidence for pathway order constraints than for gene order constraints, indicating that temporal ordering results from selective pressure acting at the pathway level. The accumulation of changes in core pathways differs among cancer types, yet a common feature is that progression appears to begin with mutations in genes that regulate apoptosis pathways and to conclude with mutations in genes involved in invasion pathways. H-CBN models provide a quantitative and intuitive model of tumorigenesis showing that the genetic events can be linked to the phenotypic progression on the level of pathways.

19.
20.
We present a strategy for generating and analyzing comprehensive genetic-interaction maps, termed E-MAPs (epistatic miniarray profiles), comprising quantitative measures of aggravating or alleviating interactions between gene pairs. Crucial to the interpretation of E-MAPs is their high-density nature made possible by focusing on logically connected gene subsets and including essential genes. Described here is the analysis of an E-MAP of genes acting in the yeast early secretory pathway. Hierarchical clustering, together with novel analytical strategies and experimental verification, revealed or clarified the role of many proteins involved in extensively studied processes such as sphingolipid metabolism and retention of HDEL proteins. At a broader level, analysis of the E-MAP delineated pathway organization and components of physical complexes and illustrated the interconnection between the various secretory processes. Extension of this strategy to other logically connected gene subsets in yeast and higher eukaryotes should provide critical insights into the functional/organizational principles of biological systems.
