Found 20 similar articles; search took 9 ms
1.
Mridul K Kalita, Gowthaman Ramasamy, Sekhar Duraisamy, Virander S Chauhan, Dinesh Gupta. BMC bioinformatics 2006, 7(1):336
Background
Genome-wide and cross-species comparison of amino acid repeats is an intriguing problem in biology, mainly due to the highly polymorphic nature and diverse functions of amino acid repeats. Innate protein repeats constitute vital functional and structural regions in proteins. Repeats are of great consequence in the evolution of proteins, as is evident from analysis of repeats in different organisms. In the post-genomic era, the availability of protein sequences encoded in different genomes provides a unique opportunity to perform large-scale comparative studies of amino acid repeats. ProtRepeatsDB is a relational database of perfect and mismatch repeats, access to which is designed as a resource and collection of tools for detection and cross-species comparison of different types of amino acid repeats.
2.
Minimal absent words have been computed in genomes of organisms from all domains of life. Here, we explore different sets of minimal absent words in the genomes of 22 organisms (one archaeota, thirteen bacteria and eight eukaryotes). We investigate whether the mutational biases that may explain the deficit of the shortest absent words in vertebrates are also pervasive in other absent words, namely in minimal absent words, and in other organisms. We find that the compositional biases observed for the shortest absent words in vertebrates are not uniform throughout different sets of minimal absent words. We further investigate the hypothesis of the inheritance of minimal absent words through common ancestry from the similarity in dinucleotide relative abundances of different sets of minimal absent words, and find that this inheritance may be exclusive to vertebrates.
3.
Genomes contain various types of repetitive sequences. They may be used as probes for seeking genome rearrangements because they are largely free from natural selection if they are located in intergenic regions. In this study, we searched for tandem repeats (TRs) in 44 prokaryotic genomes by the color-coding method and sought signs of genome rearrangements by detailed analysis of the detected TRs. In total we found 13,542 tandem repeats in the 44 prokaryotic genomes, ranging from several tens to one thousand per genome. The results of statistical analysis show that TRs tend to lie in regions of high base composition bias in some genomes. Moreover, we recognized characteristic distribution patterns of equivalent TR-pairs in 12 genomes, which are expected to indicate the occurrence of whole-genome duplication (WGD) in these genomes. This demonstrates that TRs can indeed be used for seeking genome rearrangements. Although it has not yet been made clear whether WGD has occurred in prokaryotic genomes, the results of the analyses of equivalent TR-pairs in this study are thought to be evidence of WGD in these genomes.
4.
Todd J. Treangen, Anne-Laure Abraham, Marie Touchon & Eduardo P.C. Rocha. FEMS microbiology reviews 2009, 33(3):539-571
DNA repeats are causes and consequences of genome plasticity. Repeats are created by intrachromosomal recombination or horizontal transfer. They are targeted by recombination processes leading to amplifications, deletions and rearrangements of genetic material. The identification and analysis of repeats in nearly 700 genomes of bacteria and archaea is facilitated by the existence of sequence data and adequate bioinformatic tools. These have revealed the immense diversity of repeats in genomes, from those created by selfish elements to the ones used for protection against selfish elements, from those arising from transient gene amplifications to the ones leading to stable duplications. Experimental works have shown that some repeats do not carry any adaptive value, while others allow functional diversification and increased expression. All repeats carry some potential to disorganize and destabilize genomes. Because recombination and selection for repeats vary between genomes, the number and types of repeats are also quite diverse and in line with ecological variables, such as host-dependent associations or population sizes, and with genetic variables, such as the recombination machinery. From an evolutionary point of view, repeats represent both opportunities and problems. We describe how repeats are created and how they can be found in genomes. We then focus on the functional and genomic consequences of repeats that dictate their fate.
5.
Background
Data for a number of completely sequenced eukaryotic genomes are available in the public domain. Eukaryotic genes are either 'intron containing' or 'intronless'. Eukaryotic 'intronless' genes are interesting datasets for comparative genomics and evolutionary studies. The SEGE database, containing a collection of eukaryotic single exon genes, is available. However, SEGE is derived using GenBank. The redundant, incomplete and heterogeneous qualities of GenBank data are a bottleneck for biological investigation in comparative genomics and evolutionary studies. Such studies often require representative gene sets from each genome, and this is possible only by deriving specific datasets from completely sequenced genome data. Thus Genome SEGE, a database for 'intronless' genes in completely sequenced eukaryotic genomes, has been constructed.
6.
The Horizontal Gene Transfer DataBase (HGT-DB) is a genomic database that includes statistical parameters such as G+C content, codon and amino-acid usage, as well as information about which genes deviate in these parameters for prokaryotic complete genomes. Under the hypothesis that genes from distantly related species have different nucleotide compositions, these deviating genes may have been acquired by horizontal gene transfer. The current version of the database contains 88 bacterial and archaeal complete genomes, including multiple chromosomes and strains. For each genome, the database provides statistical parameters for all the genes, as well as averages and standard deviations of G+C content, codon usage, relative synonymous codon usage and amino-acid content. It also provides information about correspondence analyses of the codon usage, plus lists of extraneous groups of genes in terms of G+C content and lists of putatively acquired genes. With this information, researchers can explore the G+C content and codon usage of a gene when they find incongruities in sequence-based phylogenetic trees. A search engine that allows searches for gene names or keywords for a specific organism is also available. HGT-DB is freely accessible at http://www.fut.es/~debb/HGT.
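The compositional screen described in this abstract can be illustrated with a minimal Python sketch. It computes the G+C content of each gene, then flags genes that lie far from the genome-wide mean. The function names, the input format (a dict of gene name to sequence) and the 1.5 standard deviation cutoff are assumptions made for illustration; the abstract does not state the threshold or implementation used by HGT-DB.

```python
from statistics import mean, pstdev


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_outliers(genes: dict[str, str], n_sd: float = 1.5) -> list[str]:
    """Names of genes whose G+C content lies more than n_sd population
    standard deviations from the mean over all genes.

    n_sd = 1.5 is an illustrative assumption, not a value from the abstract.
    """
    gc = {name: gc_content(seq) for name, seq in genes.items()}
    mu = mean(gc.values())
    sd = pstdev(gc.values())
    if sd == 0:
        return []  # all genes share the same composition
    return [name for name, value in gc.items() if abs(value - mu) > n_sd * sd]
```

A gene flagged this way is only a candidate: deviating composition is consistent with horizontal acquisition but does not prove it, which is why the database pairs G+C content with codon usage statistics.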
7.
We used a power spectrum method to identify periodic patterns in nucleotide sequence, and characterized nucleotide sequences that confer periodicities to prokaryotic and eukaryotic genomes. A 10-bp periodicity was prevalent in hyperthermophilic bacteria and archaebacteria, and an 11-bp periodicity was prevalent in eubacteria. The 10-bp periodicity was also prevalent in eukaryotes such as the worm Caenorhabditis elegans. Additionally, in the worm genome, a 68-bp periodicity in chromosome I, a 59-bp periodicity in chromosome II, and a 94-bp periodicity in chromosome III were found. In human chromosomes 21 and 22, approximately 167- or 84-bp periodicity was detected along the entire length of these chromosomes. Because 167 bp is identical to the length of DNA that forms two complete helical turns in nucleosome organization, we speculated that the respective sequences may correspond to arrays of a special compact form of nucleosomes clustered in specific regions of the human chromosomes. This periodic element contained a high frequency of TGG. TGG-rich sequences are known to form a specific subset of folded DNA structures, and therefore the sequences might have the potential to form specific higher order structures related to the clustered occurrence of a specific form of the speculated nucleosomes.
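One common way to compute a power spectrum for a nucleotide sequence is to turn each base into a 0/1 indicator sequence and sum the squared Fourier magnitudes over the four bases. The Python sketch below follows that generic approach and is only an assumption about the kind of calculation involved; the abstract does not specify the authors' exact method, and the function names are hypothetical.

```python
import cmath


def periodicity_power(seq: str, period: float) -> float:
    """Summed spectral power of the four base-indicator sequences at one
    period (in bp), normalised by sequence length."""
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        # Fourier coefficient of the 0/1 indicator sequence for this base,
        # evaluated at frequency 1/period.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * n / period)
            for n, b in enumerate(seq)
            if b == base
        )
        total += abs(coeff) ** 2
    return total / len(seq)


def strongest_period(seq: str, periods) -> float:
    """Candidate period with the highest summed spectral power."""
    return max(periods, key=lambda p: periodicity_power(seq, p))
```

For example, `strongest_period(seq, range(2, 21))` scans integer periods from 2 to 20 bp. This direct evaluation costs time proportional to sequence length per candidate period, so a chromosome-scale analysis would need an FFT over windows instead.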
8.
Inverted repeats are unstable motifs in a genome, having a causal relation to fragment rearrangements and recombination events. We have investigated long inverted repeats (LIRs) of > 30 bp in length in eukaryotic genomes to assess their contribution to genome stability. An algorithm was first designed to search for LIRs with < 2 kb internal spacers and > 85% identity (degree of homology between the repeat copies of a LIR). There are far fewer LIRs in yeast, fruitfly, pufferfish and chicken than in Caenorhabditis elegans, zebrafish, frog and human. However, high LIR frequencies do not necessarily imply high genome instability, because internal spacers, stem lengths and identities vary. From the collection of identified LIRs, we selected recombinogenic LIRs, which have a short internal spacer and a high copy identity and are prone to induce high instability. We found that a relatively high proportion (5-9.8%) of the LIRs in C. elegans, zebrafish and frog were recombinogenic LIRs. In contrast, the proportions in human and mouse LIRs were quite low (0.4-1.1%), mainly because of long internal spacers. We suggest that the C. elegans, zebrafish and frog genomes are unstable in terms of LIR frequency and the proportion of recombinogenic LIRs. For the other genomes, LIRs most likely have a minor impact.
9.
Statistical evidence for the correlation of DNA deletions in prokaryotic genomes with direct repeats
In the present work a computer analysis of deletion localization in the sequence of the E. coli lacI gene has been carried out by the statistical weight method. A reliable statistical correlation has been shown between the deletion sites and the arrangement of the most perfect direct repeats with the shortest distance between repeated fragments. At the same time, statistical analysis did not reveal a reliable connection between the regions of deletion localization and the expected sites of gyrase recognition, sites and other recombination sites. It is concluded that the mechanism of deletion emergence on the basis of repeats appears to be predominant.
10.
To study the roles of translational accuracy, translational efficiency, and the Hill-Robertson effect in codon usage bias, we studied the intragenic spatial distribution of synonymous codon usage bias in four prokaryotic (Escherichia coli, Bacillus subtilis, Sulfolobus tokodaii, and Thermotoga maritima) and two eukaryotic (Saccharomyces cerevisiae and Drosophila melanogaster) genomes. We generated supersequences at each codon position across genes in a genome and computed the overall bias at each codon position. By quantitatively evaluating the trend of spatial patterns using isotonic regression, we show that in yeast and prokaryotic genomes, codon usage bias increases along translational direction, which is consistent with purifying selection against nonsense errors. Fruit fly genes show a nearly symmetric M-shaped spatial pattern of codon usage bias, with less bias in the middle and both ends. The low codon usage bias in the middle region is best explained by interference (the Hill-Robertson effect) between selections at different codon positions. In both yeast and fruit fly, spatial patterns of codon usage bias are characteristically different from patterns of GC-content variations. Effect of expression level on the strength of codon usage bias is more conspicuous than its effect on the shape of the spatial distribution.
11.
RepeatAround: a software tool for finding and visualizing repeats in circular genomes and its application to a human mtDNA database (cited by: 2; self-citations: 0; citations by others: 2)
RepeatAround is a Windows-based software tool designed to find "direct repeats", "inverted repeats", "mirror repeats" and "complementary repeats", from 3 to 64 bp in length, in circular genomes. It processes input files extracted directly from the GenBank database and visualises the location of the repeats in the genomic structure, so that, for instance, in most mtDNAs the user can check whether the repeats are located in a coding or non-coding region (and in the first case in which gene), and how far apart the repeat pair(s) are. Besides the visual tool, it provides other outputs in a spreadsheet containing information on the number and location of the repeats, facilitating graphic analyses. Several genomes can be input simultaneously for phylogenetic comparison. Other capabilities of the software are the generation of random circular genomes, for statistical comparison of observed repeat distributions with their shuffled counterparts, and the search for specific motifs, allowing easy confirmation of repeats flanking a newly detected rearrangement. As an example of the programme's applications we analysed the Direct Repeats distribution in a large human mtDNA database. Results showed that Direct Repeats, even the larger ones, are evenly distributed among the human mtDNA haplogroups, enabling us to state that, based only on the repetitive motifs, no haplogroup is particularly more or less prone to mtDNA macrodeletions.
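The four repeat classes named in this abstract can be illustrated with a short Python sketch for a circular sequence. It assumes the usual definitions: a direct repeat is the same word again, a mirror repeat is the word reversed, a complementary repeat is the base-wise complement, and an inverted repeat is the reverse complement. This is not RepeatAround's implementation. The function names are hypothetical, and the sketch handles one fixed word length `k` rather than the full 3 to 64 bp range.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Each repeat class maps a word to the partner word that must occur elsewhere.
TRANSFORMS = {
    "direct": lambda w: w,
    "mirror": lambda w: w[::-1],
    "complementary": lambda w: w.translate(COMPLEMENT),
    "inverted": lambda w: w.translate(COMPLEMENT)[::-1],
}


def circular_kmers(genome: str, k: int) -> dict[str, list[int]]:
    """Index every length-k word of a circular genome by its start positions."""
    n = len(genome)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the genome length")
    genome = genome.upper()
    doubled = genome + genome[: k - 1]  # lets words wrap around the origin
    index = defaultdict(list)
    for i in range(n):
        index[doubled[i : i + k]].append(i)
    return index


def find_repeats(genome: str, k: int, kind: str) -> list[tuple[int, int]]:
    """Sorted (i, j) start positions, i < j, where the word at one position
    equals the chosen transform of the word at the other.
    Overlapping occurrences are reported too."""
    index = circular_kmers(genome, k)
    transform = TRANSFORMS[kind]
    pairs = set()
    for word, starts in index.items():
        partners = index.get(transform(word), [])
        for i in starts:
            for j in partners:
                if i != j:
                    pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)
```

Appending the first `k - 1` bases to the end of the sequence is what makes the search circular: a word that starts near the end and continues past the origin is indexed like any other.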
12.
MIPS: a database for genomes and protein sequences (cited by: 15; self-citations: 0; citations by others: 15)
H. W. Mewes, D. Frishman, U. Güldener, G. Mannhaupt, K. Mayer, M. Mokrejs, B. Morgenstern, M. Münsterkötter, S. Rudd, B. Weil. Nucleic acids research 2002, 30(1):31-34
13.
Mewes HW, Frishman D, Gruber C, Geier B, Haase D, Kaps A, Lemcke K, Mannhaupt G, Pfeiffer F, Schüller C, Stocker S, Weil B. Nucleic acids research 2000, 28(1):37-40
The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried, near Munich, Germany, continues its longstanding tradition of developing and maintaining high quality curated genome databases. In addition, efforts have been intensified to cover the wealth of complete genome sequences in a systematic, comprehensive form. Bioinformatics, supporting national as well as European sequencing and functional analysis projects, has resulted in several up-to-date genome-oriented databases. This report describes growing databases reflecting the progress of sequencing the Arabidopsis thaliana (MATDB) and Neurospora crassa genomes (MNCDB), the yeast genome database (MYGD) extended by functional analysis data, the database of annotated human EST-clusters (HIB) and the database of the complete cDNA sequences from the DHGP (German Human Genome Project). It also contains information on the up-to-date database of complete genomes (PEDANT), the classification of protein sequences (ProtFam) and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database. These databases can be accessed through the MIPS WWW server (http://www.mips.biochem.mpg.de).
14.
We describe FrameD, a program that predicts coding regions in prokaryotic and matured eukaryotic sequences. Initially targeted at gene prediction in GC-rich bacterial genomes, the gene model used in FrameD also allows genes to be predicted in the presence of frameshifts and partially undetermined sequences, which makes it very suitable for gene prediction and frameshift correction in unfinished sequences such as EST and EST cluster sequences. Like recent eukaryotic gene prediction programs, FrameD can also take protein similarity information into account, both in its prediction and in its graphical output. Its performance is evaluated on different bacterial genomes. The web site (http://genopole.toulouse.inra.fr/bioinfo/FrameD/FD) allows direct prediction, sequence correction and translation, and the ability to learn new models for new organisms.
15.
MOTIVATION: Tandemly organized repetitive sequences (satellite DNA) are widespread in complex eukaryotic genomes. In plants, satellite repeats often represent a substantial part of nuclear DNA but only a little is known about the molecular mechanisms of their amplification and their possible role(s) in genome evolution and function. Unfortunately, addressing these questions via characterization of general sequence properties of known satellite repeats has been hindered by a difficulty in obtaining a complete and unbiased set of sequence data for this analysis. This is mainly due to the presence of multiple entries of homologous sequences and of single entries that contain more than one repeated unit (monomer) in the public databases. RESULTS: We have established a computer database specialized for plant satellite repeats (PlantSat) that integrates sequence data available from various resources with supplementary information including repeat consensus sequences, abundances, and chromosomal localizations. The sequences are stored as individual repeat monomers grouped into families, which simplifies their computer analysis and makes it more accurate. Using this feature, we have performed a basic sequence analysis of the whole set of plant satellite repeats with respect to their monomer length and nucleotide composition. The analysis revealed several preferred length ranges of the monomers (approximately 165 bp and its multiples) and an over-representation of the AA/TT dinucleotide in the repeats. We have also detected an enrichment of satellite DNA sequences for the motif CAAAA that is supposed to be involved in breakage-reunion of repeated sequences.
16.
Lima-Mendez G, Van Helden J, Toussaint A, Leplae R. Bioinformatics (Oxford, England) 2008, 24(6):863-865
Prophinder is a prophage prediction tool coupled with a prediction database, a web server and a web service. Predicted prophages will help to fill the gaps in the current sparse phage sequence space, which should cover an estimated 100 million species. Systematic and reliable predictions will enable further studies of the contribution of prophages to the bacteriophage gene pool and a better understanding of gene shuffling between prophages and phages infecting the same host. AVAILABILITY: Software is available at http://aclame.ulb.ac.be/prophinder
17.
Philippe Le Flèche, Yolande Hauck, Lucie Onteniente, Agnès Prieur, France Denoeud, Vincent Ramisse, Patricia Sylvestre, Gary Benson, Françoise Ramisse, Gilles Vergnaud. BMC microbiology 2001, 1(1):2-14
Background
Some pathogenic bacteria are genetically very homogeneous, making strain discrimination difficult. In the last few years, tandem repeats have been increasingly recognized as markers of choice for genotyping a number of pathogens. The rapid evolution of these structures appears to contribute to the phenotypic flexibility of pathogens. The availability of whole-genome sequences has opened the way to the systematic evaluation of tandem repeats diversity and application to epidemiological studies.
18.
Masashi Fujita, Hisaaki Mihara, Susumu Goto, Nobuyoshi Esaki, Minoru Kanehisa. BMC bioinformatics 2007, 8(1):225
Background
Selenocysteine and pyrrolysine are the 21st and 22nd amino acids, which are genetically encoded by stop codons. Since a number of microbial genomes have been completely sequenced to date, it is tempting to ask whether a 23rd amino acid remains undiscovered in these genomes. Recently, a computational study addressed this question and reported that no tRNA gene for an unknown amino acid was found in the genome sequences available. However, the performance of the tRNA prediction program on an unknown tRNA family, which may have atypical sequence and structure, is unclear, rendering that result inconclusive. A protein-level study will provide independent insight into the novel amino acid.
19.
Different mechanistic requirements for prokaryotic and eukaryotic chaperonins: a lattice study (cited by: 1; self-citations: 0; citations by others: 1)
MOTIVATION: The folding of many proteins in vivo and in vitro is assisted by molecular chaperones. A well-characterized molecular chaperone system is the chaperonin GroEL/GroES from Escherichia coli which has a homolog found in the eukaryotic cytosol called CCT. All chaperonins have a ring structure with a cavity in which the substrate protein folds. An interesting difference between prokaryotic and eukaryotic chaperonins is in the nature of the ATP-mediated conformational changes that their ring structures undergo during their reaction cycle. Prokaryotic chaperonins are known to exhibit a highly cooperative concerted change of their cavity surface while in eukaryotic chaperonins the change is sequential. Approximately 70% of proteins in eukaryotic cells are multi-domain whereas in prokaryotes single-domain proteins are more common. Thus, it was suggested that the different modes of action of prokaryotic and eukaryotic chaperonins can be explained by the need of eukaryotic chaperonins to facilitate folding of multi-domain proteins. RESULTS: Using a 2D square lattice model, we generated two large populations of single-domain and double-domain substrate proteins. Chaperonins were modeled as static structures with a cavity wall with which the substrate protein interacts. We simulated both concerted and sequential changes of the cavity surfaces and demonstrated that folding of single-domain proteins benefits from concerted but not sequential changes whereas double-domain proteins benefit also from sequential changes. Thus, our results support the suggestion that the different modes of allosteric switching of prokaryotic and eukaryotic chaperonin rings have functional implications as it enables eukaryotic chaperonins to better assist multi-domain protein folding.
20.
The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de) MIPS provides access to a variety of generic databases, including a database of protein families as well as data generated automatically by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information were also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled.