Similar literature
 Found 20 similar articles (search time: 15 ms)
1.

Background  

Pathogenicity islands (PAIs), distinct genomic segments of pathogens encoding virulence factors, represent a subgroup of genomic islands (GIs) that have been acquired by horizontal gene transfer events. Up to now, computational approaches for identifying PAIs have focused on the detection of genomic regions that differ from the rest of the genome only in their base composition and codon usage. These approaches often lead to the identification of genomic islands, rather than PAIs.

2.
3.
Laboratories working with draft phase genomes have specific software needs, such as the unattended processing of hundreds of single scaffolds and subsequent sequence annotation. In addition, it is critical to follow the "movement" and the manual annotation of single open reading frames (ORFs) within the successive sequence updates. Even with finished genomes, regular database updates can lead to significant changes in the annotation of single ORFs. In functional genomics it is important to mine data and identify new genetic targets rapidly and easily. Often there is no need for sophisticated relational databases (RDB) that greatly reduce the system-independent access of the results. Another aspect is the internet dependency of most software packages. If users are working with confidential data, this dependency poses a security issue. GAMOLA was designed to handle the numerous scaffolds and changing contents of draft phase genomes in an automated process, and it stores the results for each predicted ORF in flatfile databases. In addition, annotation transfers, ORF designation tracking, Blast comparisons, and primer design for whole genome microarrays have been implemented. The software is available under the license of North Carolina State University. A website and a downloadable example are accessible at http://fsweb2.schaub.ncsu.edu/TRKwebsite/index.htm.

4.

Background  

Selenocysteine and pyrrolysine are the 21st and 22nd amino acids, which are genetically encoded by stop codons. Since a number of microbial genomes have been completely sequenced to date, it is tempting to ask whether a 23rd amino acid remains undiscovered in these genomes. Recently, a computational study addressed this question and reported that no tRNA gene for an unknown amino acid was found in the available genome sequences. However, the performance of the tRNA prediction program on an unknown tRNA family, which may have atypical sequence and structure, is unclear, rendering the result inconclusive. A protein-level study will provide independent insight into the novel amino acid.

5.
Most of the gene prediction algorithms for prokaryotes are based on Hidden Markov Models or similar machine-learning approaches, which imply the optimization of a high number of parameters. The present paper presents a novel method for the classification of coding and non-coding regions in prokaryotic genomes, based on a suitably defined compression index of a DNA sequence. The main features of this new method are the non-parametric logic and the construction of a dictionary of words extracted from the sequences. These dictionaries can be very useful to perform further analyses on the genomic sequences themselves. The proposed approach has been applied to some prokaryotic complete genomes, obtaining optimal scores of correctly recognized coding and non-coding regions. Several false-positive and false-negative cases have been investigated in detail, which have revealed that this approach can fail in the presence of highly structured coding regions (e.g., genes coding for modular proteins) or quasi-random non-coding regions (e.g., regions hosting non-functional fragments of copies of functional genes; regions hosting promoters or other protein-binding sequences). We perform an overall comparison with other gene-finder software, since at this step we are not interested in building another gene-finder system, but only in exploring the possibility of the suggested approach.

6.
7.
Zhang SH, Wang L. Genomics, 2011, 97(5): 330-331
It has been reported that there is a majority triplet profile among genomes, which was considered a reflection of general mechanisms of genome evolution (Albrecht-Buehler, 2007). However, according to our further analysis, there are actually two common triplet profiles, at least among prokaryotic genomes: one from low-GC content genomes and the other from high-GC content genomes. Both common profiles would be direct reflections of GC content variations and strand symmetry of genomic sequences.

8.
Insertion sequences (ISs) are small DNA segments that are often capable of moving neighbouring genes. Over 1500 different ISs have been identified to date. They can have large and spectacular effects in shaping and reshuffling the bacterial genome. Recent studies have provided dramatic examples of such IS activity, including massive IS expansion during the emergence of some pathogenic bacterial species and the intimate involvement of ISs in assembling genes into complex plasmid structures. However, a global understanding of their impact on bacterial genomes requires detailed knowledge of their distribution across the eubacterial and archaeal kingdoms, understanding their partition between chromosomes and extra-chromosomal elements (e.g. plasmids and viruses) and the factors which influence this, and appreciation of the different transposition mechanisms in action, the target preferences and the host factors that influence transposition. In addition, defective (non-autonomous) elements, which can be complemented by related active elements in the same cell, are often overlooked in genome annotations but also contribute to the evolution of genome organisation.

9.
Detecting uber-operons in prokaryotic genomes   Total citations: 3, self-citations: 1, citations by others: 3
Che D, Li G, Mao F, Wu H, Xu Y. Nucleic Acids Research, 2006, 34(8): 2418-2427

10.
Prokaryotic genomics is shifting towards comparative approaches to unravel how and why genomes change over time. Both phylogenetic and population genetics approaches are required to dissect the relative roles of selection and drift under these conditions. Lineages evolve adaptively by selection of changes in extant genomes, and the way this occurs is being explored from a systemic and evolutionary perspective to understand how mutations relate to gene repertoire changes and how both are contextualized in cellular networks. Through an increased appreciation of genome dynamics in given ecological contexts, a more detailed picture of the genetic basis of prokaryotic evolution is emerging.

11.
Connected gene neighborhoods in prokaryotic genomes   Total citations: 11, self-citations: 1, citations by others: 11
A computational method was developed for delineating connected gene neighborhoods in bacterial and archaeal genomes. These gene neighborhoods are not typically present, in their entirety, in any single genome, but are held together by overlapping, partially conserved gene arrays. The procedure was applied to comparing the orders of orthologous genes, which were extracted from the database of Clusters of Orthologous Groups of proteins (COGs), in 31 prokaryotic genomes and resulted in the identification of 188 clusters of gene arrays, which included 1001 of 2890 COGs. These clusters were projected onto actual genomes to produce extended neighborhoods including additional genes that are adjacent to the genes from the clusters and are transcribed in the same direction, which resulted in a total of 2387 COGs being included in the neighborhoods. Most of the neighborhoods consist predominantly of genes united by a coherent functional theme, but also include a minority of genes without an obvious functional connection to the main theme. We hypothesize that although some of the latter genes might have unsuspected roles, others are maintained within gene arrays because of the advantage of expression at a level that is typical of the given neighborhood. We designate this phenomenon 'genomic hitchhiking'. The largest neighborhood includes 79 genes (COGs) and consists of overlapping, rearranged ribosomal protein superoperons; apparent genomic hitchhiking is particularly typical of this neighborhood and other neighborhoods that consist of genes coding for translation machinery components. Several neighborhoods involve previously undetected connections between genes, allowing new functional predictions. Gene neighborhoods appear to evolve via complex rearrangement, with different combinations of genes from a neighborhood fixed in different lineages.

12.
Prokaryotic restriction-modification (R-M) systems defend the host cell from invasion by foreign DNA. They comprise two enzymatic activities: specific DNA cleavage activity and DNA methylation activity preventing cleavage. Typically, these activities are provided by two separate enzymes: a DNA methyltransferase (MTase) and a restriction endonuclease (RE). In the absence of a corresponding MTase, an RE of a Type II R-M system is highly toxic for the cell. Genes of the R-M system are linked in the genome in the vast majority of annotated cases. There are only a few reported cases in which the genes of MTase and RE from one R-M system are not linked. Nevertheless, a few hundred solitary RE genes are present in the Restriction Enzyme Database (http://rebase.neb.com) annotations. Using the comparative genomic approach, we analysed 272 solitary RE genes. For 57 solitary RE genes we predicted corresponding MTase genes located distantly in a genome. Of the 272 solitary RE genes, 99 are likely to be fragments of RE genes. Various explanations for the existence of the remaining 116 solitary RE genes are also discussed.

13.
In the pelagic environment, iron is a scarce but essential micronutrient. The iron acquisition capabilities of selected marine bacteria have been investigated, but the recent proliferation of marine prokaryotic genomes and metagenomes offers a more comprehensive picture of microbial iron uptake pathways in the ocean. Searching these data sets, we were able to identify uptake mechanisms for Fe(3+), Fe(2+) and iron chelates (e.g. siderophore and haem iron complexes). Transport of iron chelates is accomplished by TonB-dependent transporters (TBDTs). After clustering the TBDTs from marine prokaryotic genomes, we identified TBDT clusters for the transport of hydroxamate and catecholate siderophore iron complexes and haem using gene neighbourhood analysis and co-clustering of TBDTs of known function. The genomes also contained two classes of siderophore biosynthesis genes: NRPS (non-ribosomal peptide synthase) genes and NIS (NRPS Independent Siderophore) genes. The most common iron transporters, in both the genomes and metagenomes, were Fe(3+) ABC transporters. Iron uptake-related TBDTs and siderophore biosynthesis genes were less common in pelagic marine metagenomes relative to the genomic data set, in part because Pelagibacter ubique and Prochlorococcus species, which almost entirely lacked these Fe uptake systems, dominate the metagenomes. Our results are largely consistent with current knowledge of iron speciation in the ocean, but suggest that in certain niches the ability to acquire siderophores and/or haem iron chelates is beneficial.

14.

Background  

Accurate annotation of translation initiation sites (TISs) is essential for understanding the translation initiation mechanism. However, the reliability of TIS annotation in widely used databases such as RefSeq is uncertain due to the lack of experimental benchmarks.

15.
Minimal absent words have been computed in genomes of organisms from all domains of life. Here, we explore different sets of minimal absent words in the genomes of 22 organisms (one archaeon, thirteen bacteria and eight eukaryotes). We investigate whether the mutational biases that may explain the deficit of the shortest absent words in vertebrates are also pervasive in other absent words, namely in minimal absent words, and in other organisms. We find that the compositional biases observed for the shortest absent words in vertebrates are not uniform throughout different sets of minimal absent words. We further investigate the hypothesis of the inheritance of minimal absent words through common ancestry from the similarity in dinucleotide relative abundances of different sets of minimal absent words, and find that this inheritance may be exclusive to vertebrates.
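
As an illustration of the term used in this abstract: a word is a minimal absent word of a sequence if it does not occur in the sequence while both its longest proper prefix and its longest proper suffix do. The sketch below enumerates such words of a fixed length k by brute force. The function name, the default alphabet, and the single-strand treatment are assumptions made for illustration; the abstract does not say how the authors computed their sets.

```python
def minimal_absent_words(seq, k, alphabet="ACGT"):
    """Return the sorted minimal absent words of length k (k >= 2) in seq.

    A word w is reported when w is absent from seq but both w[:-1]
    and w[1:] occur in seq. Only the given strand is examined
    (an assumption; studies often also scan the reverse complement).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    seq = seq.upper()
    # All substrings of length k-1 and of length k that occur in seq.
    shorter = {seq[i:i + k - 1] for i in range(len(seq) - k + 2)}
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    found = set()
    for prefix in shorter:           # the prefix w[:-1] occurs by construction
        for letter in alphabet:
            word = prefix + letter
            if word not in present and word[1:] in shorter:
                found.add(word)
    return sorted(found)


if __name__ == "__main__":
    print(minimal_absent_words("AAC", 2, alphabet="AC"))
    print(minimal_absent_words("AAC", 3, alphabet="AC"))
```

This brute-force version is only practical for short sequences and small k; genome-scale analyses normally rely on suffix-based index structures.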

16.
The availability of hundreds of complete bacterial genomes has created new challenges and, simultaneously, new opportunities for bioinformatics. In the area of statistical analysis of genomic sequences, the studies of nucleotide compositional bias and gene bias between strands and replichores paved the way for the development of tools for prediction of bacterial replication origins. Only a few (about 20) origin regions for eubacteria and archaea have been proven experimentally. One reason for that may be that this is now considered an essentially bioinformatics problem, where predictions are sufficiently reliable not to run labor-intensive experiments, unless specifically needed. Here we describe the main existing approaches to the identification of replication origin (oriC) and termination (terC) loci in prokaryotic chromosomes and characterize a number of computational tools based on various skew types and other types of evidence. We also classify the eubacterial and archaeal chromosomes by predictability of their replication origins using skew plots. Finally, we discuss possible combined approaches to the identification of the oriC sites that may be used to improve the prediction tools, in particular, the analysis of DnaA binding sites using the comparative genomic methods.
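
One of the skew types mentioned in this abstract can be sketched as a cumulative GC skew: the quantity (G - C) / (G + C) is computed in consecutive windows and summed along the chromosome, and the minimum of the running sum is commonly taken as a rough estimate of the replication origin (the maximum as the terminus). The function names and the window size below are assumptions for illustration, and this is not the implementation of any specific tool reviewed in the paper.

```python
def cumulative_gc_skew(seq, window=1000):
    """Return the running sum of per-window GC skew, (G - C) / (G + C)."""
    seq = seq.upper()
    running = 0.0
    values = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        running += skew
        values.append(running)
    return values


def predict_origin_position(seq, window=1000):
    """Estimate the origin as the end of the window where the
    cumulative skew reaches its minimum (0-based coordinate)."""
    values = cumulative_gc_skew(seq, window)
    if not values:
        raise ValueError("sequence is empty")
    best = min(range(len(values)), key=values.__getitem__)
    return min((best + 1) * window, len(seq))


if __name__ == "__main__":
    # Toy sequence: C-rich first half, G-rich second half.
    toy = "C" * 3000 + "G" * 3000
    print(predict_origin_position(toy, window=1000))
```

On real chromosomes the curve is noisy, which is one reason skew plots are usually combined with other evidence such as DnaA binding sites.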

17.
MOTIVATION: Arrays allow measurements of the expression levels of thousands of mRNAs to be made simultaneously. The resulting data sets are information rich but require extensive mining to enhance their usefulness. Information theoretic methods are capable of assessing similarities and dissimilarities between data distributions and may be suited to the analysis of gene expression experiments. The purpose of this study was to investigate information theoretic data mining approaches to discover temporal patterns of gene expression from array-derived gene expression data. RESULTS: The Kullback-Leibler divergence, an information-theoretic distance that measures the relative dissimilarity between two data distribution profiles, was used in conjunction with an unsupervised self-organizing map algorithm. Two published, array-derived gene expression data sets were analyzed. The patterns obtained with the KL clustering method were found to be superior to those obtained with the hierarchical clustering algorithm using the Pearson correlation distance measure. The biological significance of the results was also examined. AVAILABILITY: Software code is available by request from the authors. All programs were written in ANSI C and Matlab (Mathworks Inc., Natick, MA).
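
For readers unfamiliar with the measure named in this abstract, the Kullback-Leibler divergence between two discrete distributions P and Q is the sum over i of p_i * log(p_i / q_i). The sketch below normalizes two non-negative profiles so that each sums to 1 and then evaluates the divergence in bits. The function names, the normalization step, and the small floor applied to zero entries of Q are assumptions made for illustration; the abstract does not specify how the authors turned expression profiles into distributions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(P || Q) in bits.

    p and q are equal-length sequences of non-negative numbers;
    each is normalized to sum to 1 before the comparison.
    """
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("profiles must have a positive sum")
    divergence = 0.0
    for a, b in zip(p, q):
        a /= total_p
        b /= total_q
        if a > 0:
            # eps guards against division by zero when q has empty bins.
            divergence += a * math.log2(a / max(b, eps))
    return divergence


def symmetric_kl(p, q):
    """Symmetrized form, D(P || Q) + D(Q || P)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


if __name__ == "__main__":
    print(kl_divergence([0.5, 0.5], [0.25, 0.75]))
```

The divergence is not symmetric, so a symmetrized form such as the one above is a common choice when a distance-like quantity is needed for clustering.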

18.

Background  

The introduction of next generation sequencing approaches has caused a rapid increase in the number of completely sequenced genomes. As one result of this development, it is now feasible to analyze large groups of related genomes in a comparative approach. A main task in comparative genomics is the identification of orthologous genes in different genomes and the classification of genes as core genes or singletons.

19.
In this article, we propose a method for analyzing the spatial variations in the range expansion of the pine processionary moth (PPM), an invasive species in France. Based on binary measurements - the presence or absence of PPM nests - the proposed method allows us to infer the local effect of the environment on PPM population expansion. This effect is estimated at each position x using a parameter F(x) that corresponds to the local PPM fitness. The data type and the two-stage PPM life cycle make estimating this parameter difficult. To overcome these difficulties we adopt a mechanistic-statistical approach that combines a statistical model for the observation process with a hierarchical, reaction-diffusion based mechanistic model for the expansion process. Bayesian inference of the parameter F(x) reveals that PPM fitness is spatially heterogeneous and highlights the existence of large regions associated with lower fitness. The factors underlying this lower fitness are yet to be determined.

20.
Insertion sequences (ISs) are the smallest and most frequent transposable elements in prokaryotes where they play an important evolutionary role by promoting gene inactivation and genome plasticity. Their genomic abundance varies by several orders of magnitude for reasons largely unknown and widely speculated. The current availability of hundreds of genomes renders testable many of these hypotheses, notably that IS abundance correlates positively with the frequency of horizontal gene transfer (HGT), genome size, pathogenicity, nonobligatory ecological associations, and human association. We thus reannotated ISs in 262 prokaryotic genomes and tested these hypotheses, showing that when using appropriate controls, there is no empirical basis for IS family specificity, pathogenicity, or human association to influence IS abundance or density. HGT seems necessary for the presence of ISs, but cannot alone explain the absence of ISs in more than 20% of the organisms, some of which show high rates of HGT. Gene transfer is also not a significant determinant of the abundance of IS elements in genomes, suggesting that IS abundance is controlled at the level of transposition and ensuing natural selection and not at the level of infection. Prokaryotes engaging in obligatory associations have fewer ISs when controlled for genome size, but this may be caused by some being sexually isolated. Surprisingly, genome size is the only significant predictor of IS numbers and density. Alone, it explains over 40% of the variance of IS abundance. Because we find that genome size and IS abundance correlate negatively with minimal doubling times, we conclude that selection for rapid replication cannot account for the few ISs found in small genomes. Instead, we show evidence that IS numbers are controlled by the frequency of highly deleterious insertion targets. Indeed, IS abundance increases quickly with genome size, which is the exact inverse trend found for the density of genes under strong selection such as essential genes. Hence, for ISs, the bigger the genome the better.
