20 similar articles found (search time: 15 ms)
1.
Simple sequence repeats (SSRs) are ubiquitous short tandem repeats that are associated with various regulatory mechanisms and have been found in viral genomes. Herein, we develop MfSAT (Multi-functional SSRs Analytical Tool), a powerful new tool that rapidly identifies SSRs in multiple short viral genomes and then automatically calculates the numbers and proportions of the various SSR types (mono-, di-, tri-, tetra-, penta- and hexanucleotide repeats). It can also detect codon repeats and report the corresponding amino acids.
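The workflow this abstract describes (find perfect SSRs, then tally counts and proportions per repeat type) can be illustrated with a minimal Python sketch. This is not MfSAT's implementation: the names `find_ssrs`, `type_proportions` and `is_primitive` are hypothetical, and the minimum repeat counts in `MIN_REPEATS` are assumed thresholds chosen only for illustration.

```python
import re
from collections import Counter

# Assumed minimum repeat counts per motif length (illustrative, not from MfSAT).
MIN_REPEATS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}
TYPE_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for size, min_rep in min_repeats.items():
        # A motif of `size` bases followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' runs reported as dinucleotides
                hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)


def type_proportions(hits):
    """Map each SSR type found to (count, proportion of all SSRs)."""
    counts = Counter(TYPE_NAMES[len(motif)] for _, motif, _ in hits)
    total = sum(counts.values())
    return {name: (n, n / total) for name, n in counts.items()}
```

Calling `find_ssrs` on each genome and passing the result to `type_proportions` gives the per-type tallies. Imperfect or compound repeats and the codon-to-amino-acid step are deliberately left out of this sketch.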
2.
Mrázek J 《Molecular biology and evolution》2006,23(7):1370-1385
Simple sequence repeats (SSRs) composed of extensive tandem iterations of a single nucleotide or a short oligonucleotide are rare in most bacterial genomes, but they are common among Mycoplasma. Some of these repeats act as contingency loci in association with families of surface antigens. By contraction or expansion during replication, these SSRs increase genetic variance of the population and facilitate avoidance of the immune response of the host. Occurrence and distribution of SSRs are analyzed in complete genomes of 11 Mycoplasma and 3 related Mollicutes in order to gain insights into functional and evolutionary diversity of the SSRs in Mycoplasma. The results revealed an unexpected variety of SSRs with respect to their distribution and composition and suggest that it is unlikely that all SSRs function as contingency loci or recombination hot spots. Various types of SSRs are most abundant in Mycoplasma hyopneumoniae, whereas Mycoplasma penetrans, Mycoplasma mobile, and Mycoplasma synoviae do not contain unusually long SSRs. Mycoplasma hyopneumoniae and Mycoplasma pulmonis feature abundant short adenine and thymine runs periodically spaced at 11 and 12 bp, respectively, which likely affect the supercoiling propensities of the DNA molecule. Physiological roles of long adenine and thymine runs in M. hyopneumoniae appear independent of location upstream or downstream of genes, unlike contingency loci that are typically located in protein-coding regions or upstream regulatory regions. Comparisons among 3 M. hyopneumoniae strains suggest that the adenine and thymine runs are rarely involved in genome rearrangements. The results indicate that the SSRs in the Mycoplasma genomes play diverse roles, including modulating gene expression as contingency loci, facilitating genome rearrangements via recombination, affecting protein structure and possibly protein-protein interactions, and contributing to the organization of the DNA molecule in the cell.
3.
Survey of simple sequence repeats in completed fungal genomes (cited by: 7; self-citations: 0; citations by others: 7)
The use of simple sequence repeats or microsatellites as genetic markers has become very popular because of their abundance and length variation between different individuals. SSRs are tandem repeat units of 1 to 6 base pairs that are found abundantly in many prokaryotic and eukaryotic genomes. This is the first study examining and comparing SSRs in completely sequenced fungal genomes. We analyzed and compared the occurrences, relative abundance, relative density, most common, and longest SSRs in nine taxonomically different fungal species: Aspergillus nidulans, Cryptococcus neoformans, Encephalitozoon cuniculi, Fusarium graminearum, Magnaporthe grisea, Neurospora crassa, Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Ustilago maydis. Our analysis revealed that, in all of the genomes studied, the occurrence, abundance, and relative density of SSRs varied and were not influenced by the genome sizes. No correlation between relative abundance and the genome sizes was observed, but it was shown that N. crassa, the largest genome analyzed, had the highest relative abundance of SSRs. In most genomes, mononucleotide, dinucleotide, and trinucleotide repeats were more abundant than the longer repeated SSRs. Generally, in each organism, the occurrence, relative abundance, and relative density of SSRs decreased as the repeat unit increased. Furthermore, each organism had its own common and longest SSRs. Our analysis showed that the relative abundance of SSRs in fungi is low compared with the human genome and that longer SSRs in fungi are rare. In addition to providing new information concerning the abundance of SSRs for each of these fungi, the results provide a general source of molecular markers that could be useful for a variety of applications such as population genetics and strain identification of fungal organisms.
4.
Sahil Mahfooz Pallavi Singh Deepak K Maurya Mahesh C Yadav Azram Tahoor Harmesh Sahay Arpita Srivastava Anil Prakash 《Bioinformation》2012,8(23):1171-1175
The frequency and distribution of microsatellites were analyzed in the 19 mitogenomes of phytopathogenic fungi covering five phyla. Our analysis revealed that, in all the mitogenomes studied, the frequency and relative abundance varied and were influenced neither by genome size nor by GC content. SSRs were found to be differentially distributed in genic and intergenic regions. An average of 5.14 (23.6%) SSRs were present in genic sequences and 21.7 (76.4%) SSRs were located in the intergenic sequences. Relative abundance of SSRs in mitogenomes was highest in Aspergillus tubingensis and lowest in Phaeosphaeria nodorum, the average being 0.45. Trinucleotide repeats were the most abundant motifs in the genic and intergenic regions of the mitogenomes of the phytopathogenic fungi. Among the genes, cox1 harbors the most SSRs, whereas cox3 and nad7 contain the fewest. Based on the presence of SSRs in a particular gene, genetic relationships among individual organisms were also established.
5.
We studied the occurrence of mammalian interspersed repeats (MIRs) in DNA and RNA of vertebrates, invertebrates, and bacteria using the data from GenBank. A special algorithm based on a weight position matrix with optimal alignment using dynamic programming was developed to search for the traces of MIR dissemination. This allowed us to search for highly divergent MIRs carrying deletions and insertions. MIRs were detected in genomes of various fishes, including Latimeria. This suggests that the origin of MIRs dates back more than 400 million years. The method to search for similarity between highly divergent sequences may be used to find the genome fragments from various ancient repeat families and from various gene families.
6.
Complete archaeal genomes were probed for the presence of long (≥ 25 bp) oligonucleotide repeats (words). We detected the presence of many words distributed in tandem with narrow ranges of periodicity (i.e., spacer length between repeats). Similar words were not identified in genomes of non-archaeal species, namely Escherichia coli, Bacillus subtilis, Haemophilus influenzae, Mycoplasma genitalium and Mycoplasma pneumoniae. BLAST similarity searches against the GenBank nucleotide sequence database revealed that these words were archaeal species-specific, indicating that they are of a signature character. Sequence analysis and genome viewing tools showed these repeats to be restricted to non-coding regions. Thus, archaea appear to possess a non-coding genomic signature that is absent in bacterial species. The identification of a species-specific genomic signature would be of great value to archaeal genome mapping, evolutionary studies and analyses of genome complexity.
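As a rough illustration of what probing for long repeated words and their spacer lengths could look like, here is a minimal exact-match sketch in Python. The abstract does not describe the authors' procedure, so everything here is an assumption: the function names `repeated_words` and `spacer_lengths` are hypothetical, only exact copies of a fixed word length `k` are considered, and all k-mers are held in memory, which is only practical for small genomes.

```python
from collections import defaultdict


def repeated_words(seq, k=25):
    """Map every length-k word that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {word: starts for word, starts in positions.items() if len(starts) > 1}


def spacer_lengths(starts, k=25):
    """Spacer length between consecutive copies: next start minus previous end."""
    return [b - (a + k) for a, b in zip(starts, starts[1:])]
```

A narrow spread in the values returned by `spacer_lengths` for a given word would correspond to the "narrow ranges of periodicity" mentioned in the abstract.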
7.
The contribution of short repeats of low sequence complexity to large conifer genomes (cited by: 6; self-citations: 0; citations by others: 6)
A. Schmidt R. L. Doudrick J. S. Heslop-Harrison T. Schmidt 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2000,101(1-2):7-14
The abundance and genomic organization of six simple sequence repeats, consisting of di-, tri-, and tetranucleotide sequence motifs, and a minisatellite repeat have been analyzed in different gymnosperms by Southern hybridization. Within the gymnosperm genomes investigated, the abundance and genomic organization of micro- and minisatellite repeats largely follows taxonomic groupings. We found that only particular simple sequence repeat motifs are amplified in gymnosperm genomes, while others such as (CAC)5 and (GACA)4 are present in only low copy numbers. The variation in abundance of simple sequence motifs reflects a similar situation to that found in angiosperms. Species of the two- and three-needle pine section Pinus are relatively conserved and can be distinguished from Pinus strobus which belongs to the five-needle pine section Strobus. The hybridization patterns of Picea species, bald cypress and gingko were different from the patterns detected in the Pinus species. Furthermore, sequences with homology to the plant telomeric repeat (TTTAGGG)n have been analyzed in the same set of gymnosperms. Telomere-like repeats are highly amplified within two- and three-needle pine genomes, such as slash pine (Pinus elliottii Engelm. var. elliottii), compared to P. strobus, Picea species, bald cypress and gingko. P. elliottii var. elliottii was used as a representative species to investigate the chromosomal organization of telomere-like sequences by fluorescence in situ hybridization (FISH). The telomere-like sequences are not restricted to the ends of chromosomes; they form large intercalary and pericentric blocks showing that they are a repeated component of the slash pine genome. Conifers have genomes larger than 20000 Mbp, and our results clearly demonstrate that repeats of low sequence complexity, such as (CA)8, (GA)8, (GGAT)4 and (GATA)4, and minisatellite- and telomere-like sequences represent a large fraction of the repetitive DNA of these species. The striking differences in abundance and genome organization of the various repeat motifs suggest that these repetitive sequences evolved differently in the gymnosperm genomes investigated. Received: 1 October 1999 / Accepted: 3 November 1999
8.
Repeated motifs of amino acids within proteins are an abundant feature of eukaryotic sequences and may catalyze the rapid production of genetic and even phenotypic variation among organisms. The completion of the genome sequencing projects of 12 distinct Drosophila species provides a unique dataset to study these intriguing sequence features on a phylogeny with a variety of timescales. We show that there is a higher percentage of proteins containing repeats within the Drosophila genus than most other eukaryotes, including non-Drosophila insects, which makes this collection of species particularly useful for the study of protein repeats. We also find that proteins containing repeats are overrepresented in functional categories involving developmental processes, signaling, and gene regulation. Using the set of 1-to-1 ortholog alignments for the 12 Drosophila species, we test the ability of repeats to act as reliable phylogenetic signals and find that they resolve the generally accepted phylogeny despite the noise caused by their accelerated rate of evolution. We also determine that in general the position of repeats within a protein sequence is non-random, with repeats more often being absent from the middle regions of sequences. Finally we find evidence to suggest that the presence of repeats is associated with an increase in evolutionary rate upon the entire sequence in which they are embedded. With additional evidence to suggest a corresponding elevation in positive selection we propose that some repeats may be inducing compensatory substitutions in their surrounding sequence.
9.
Senthilkumar R Sabarinathan R Hameed BS Banerjee N Chidambarathanu N Karthik R Sekar K 《Bioinformation》2010,4(7):271-275
An Internet computing server has been developed to identify all the occurrences of internal sequence repeats in protein and DNA sequences. Further, an option is provided for users to check the occurrence(s) of the resultant sequence repeats in other sequence and structure (Protein Data Bank) databases. The databases deployed in the proposed computing engine are up-to-date, so users will get the latest information available in the respective databases. The server is freely accessible over the World Wide Web (WWW). AVAILABILITY: http://bioserver1.physics.iisc.ernet.in/fair/
10.
The increasing availability of prokaryotic genome sequences has shown that simple sequence repeats (SSRs) are widespread in prokaryotes and that there is extensive variation in their length, number and distribution. Considering their potential importance in generating genomic diversity, we determined the distribution of a specific group of SSRs, mononucleotide repeats of size between 5 and 13 nt, in 157 sequenced prokaryotic genomes. The data obtained in the present study show that (i) a large number of mononucleotide SSRs is present in all prokaryotic genomes investigated, (ii) shorter repeats are much more abundant than longer repeats, and (iii) in the majority of the genomes, longer mononucleotide SSRs are excluded from coding regions although we identified several organisms where mononucleotide SSRs are not excluded from the coding regions. We also observed that some genomes contain more mononucleotide SSRs than expected, while others contain significantly less. Bacterial genomes that contain much less mononucleotide SSRs than expected are generally larger and more GC-rich, while bacterial genomes that contain much more mononucleotide SSRs than expected are in general smaller and more AT-rich. Finally, we also noted that genomes that contain a high fraction of horizontally transferred genes have a lower mononucleotide SSR density and that A and T are generally overrepresented in mononucleotide SSRs.
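Counting mononucleotide repeats in the 5 to 13 nt window used in this study can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline: the function names `mononucleotide_runs` and `runs_per_kb` are hypothetical, runs are taken to be maximal (a 14-nt run is not counted as a 13-nt one), and the per-kilobase density is an assumed normalization. The expected-versus-observed comparison mentioned in the abstract is not attempted.

```python
import re
from collections import Counter


def mononucleotide_runs(seq, min_len=5, max_len=13):
    """Count maximal single-base runs whose length lies in [min_len, max_len].

    Returns a Counter keyed by (base, run_length).
    """
    counts = Counter()
    for m in re.finditer(r"A+|C+|G+|T+", seq.upper()):
        run = m.group(0)
        if min_len <= len(run) <= max_len:
            counts[(run[0], len(run))] += 1
    return counts


def runs_per_kb(seq, min_len=5, max_len=13):
    """Number of qualifying runs per 1000 bases of sequence (assumed density measure)."""
    if not seq:
        return 0.0
    total = sum(mononucleotide_runs(seq, min_len, max_len).values())
    return 1000.0 * total / len(seq)
```

Summing the counter by base would show whether A and T runs dominate, and comparing `runs_per_kb` between coding and non-coding segments would be one way to look at the exclusion from coding regions that the abstract reports.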
11.
12.
Nirjhar Banerjee Rangarajan Sarani Chellamuthu Vasuki Ranjani Govindaraj Sowmiya Daliah Michael Narayanasamy Balakrishnan Kanagaraj Sekar 《Bioinformation》2008,3(1):28-32
Distant repeats in protein sequences play an important role in various aspects of protein analysis. A careful analysis of distant repeats would make it possible to establish a firm relationship between the repeats and their function and three-dimensional structure over the course of evolution. It would also shed light on the diversity of duplication during evolution. To this end, an algorithm has been developed to find all distant repeats in a protein sequence. Scores from the Point Accepted Mutation (PAM) matrix are used to identify amino acid substitutions while detecting the distant repeats. Given the biological importance of distant repeats, the proposed algorithm will be useful to structural biologists, molecular biologists, biochemists and researchers involved in phylogenetic and evolutionary studies.
13.
Frequency, type, distribution and annotation of simple sequence repeats in Rosaceae ESTs (cited by: 13; self-citations: 0; citations by others: 13)
Genomic resources for peach, a model species for Rosaceae, are being developed to accelerate gene discovery in other Rosaceae species by comparative mapping. Simple sequence repeats (SSRs) are an important tool for comparative mapping because of their high polymorphism and transportability. To accelerate the development of SSR markers, we analyzed publicly available Rosaceae expressed sequence tags (ESTs) for SSRs. A total of 17,284 ESTs from almond, peach and rose were assembled into putatively non-redundant EST sets. For comparison, 179,099 ESTs from Arabidopsis were also used in the analysis. About 4% of the assembled ESTs contained SSRs in Rosaceae, which was higher than the 2.4% found in Arabidopsis. About half of the SSRs were found in the putative UTR, and the estimated average distance between SSRs in the UTR was 5.5 kb in rose, 5.1 kb in almond, 7 kb in peach and 13 kb in Arabidopsis. In the putative coding region, the estimated average distance was two to four times longer than in the UTR. Rosaceae ESTs containing SSRs were functionally annotated using the GenBank nr database and further classified using the gene ontology terms associated with the matching sequences in the SwissProt database. The detailed data including the sequences and annotation results are available from .
14.
15.
Abundant repetitive DNA sequences are an enigmatic part of the human genome. Despite increasing evidence on the functionality of DNA repeats, their biologic role is still elusive and under frequent debate. Macrosatellites are the largest of the tandem DNA repeats, located on one or multiple chromosomes. The contribution of macrosatellites to genome regulation and human health was demonstrated for the D4Z4 macrosatellite repeat array on chromosome 4q35. Reduced copy number of D4Z4 repeats is associated with local euchromatinization and the onset of facioscapulohumeral muscular dystrophy. Although the role other macrosatellite families may play remains rather obscure, their diverse functionalities within the genome are being gradually revealed. In this review, we will outline structural and functional features of coding and noncoding macrosatellite repeats, and highlight recent findings that bring these sequences into the spotlight of genome organization and disease development.
16.
Simple sequence repeats (SSRs) have been surveyed extensively in eukaryotes and prokaryotes, but such surveys are still rare in viruses. We therefore undertook a survey of SSRs in Human Immunodeficiency Virus Type 1 (HIV-1), which is an excellent system for studying the evolution and roles of SSRs in viruses. The distribution of SSRs was examined in 81 complete HIV-1 genome sequences from 34 different countries or districts across 6 continents. Although relative abundance and relative density are highly similar across the surveyed sequences, some sequences show different preferences for the most common and the longest SSRs. Our results suggest that the proportion of the various repeat types might be related to genome stability.
17.
JAMES W. BORRONE J. STEVEN BROWN DAVID N. KUHN JUAN C. MOTAMAYOR RAYMOND J. SCHNELL 《Molecular ecology resources》2007,7(2):236-239
Theobroma cacao L. expressed sequence tags (ESTs) were converted into useful genetic markers for fingerprinting individuals and genetic linkage mapping. Primers were designed to microsatellite‐containing ESTs. Twenty‐two T. cacao accessions, parents of various mapping populations segregating for disease resistance and crop yield characteristics, were tested. Twenty‐seven informative loci were discovered with 26 primer pairs. The number of detected alleles ranged from two to 11 and averaged 4.4 per locus. All 27 markers could be mapped into at least one of the existing F1 or F2 populations segregating for agronomically important traits.
18.
Background
Microsatellite loci have high mutation rates and thus are indicative of mutational processes within the genome. By concentrating on the symbiotic and aposymbiotic cnidarians, we investigated if microsatellite abundances follow a phylogenetic or ecological pattern. Individuals from eight species were shotgun sequenced using 454 GS-FLX Titanium technology. Sequences from the three available cnidarian genomes (Nematostella vectensis, Hydra magnipapillata and Acropora digitifera) were added to the analysis for a total of eleven species representing two classes, three subclasses and eight orders within the phylum Cnidaria.
Results
Trinucleotide and tetranucleotide repeats were the most abundant motifs, followed by hexa- and dinucleotides. Pentanucleotides were the least abundant motif in the data set. Hierarchical clustering and log likelihood ratio tests revealed a weak relationship between phylogeny and microsatellite content. Further, comparisons between cnidaria harboring intracellular dinoflagellates and those that do not, show microsatellite coverage is higher in the latter group.
Conclusions
Our results support previous studies that found tri- and tetranucleotides to be the most abundant motifs in invertebrates. Differences in microsatellite coverage and composition between symbiotic and non-symbiotic cnidaria suggest the presence/absence of dinoflagellates might place restrictions on the host genome.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-939) contains supplementary material, which is available to authorized users.
19.
An automated algorithm is presented that delineates protein sequence fragments which display similarity. The method incorporates a selection of a number of local nonoverlapping sequence alignments with the highest similarity scores and a graph-theoretical approach to elucidate the consistent start and end points of the fragments comprising one or more ensembles of related subsequences. The procedure allows the simultaneous identification of different types of repeats within one sequence. A multiple alignment of the resulting fragments is performed and a consensus sequence derived from the ensemble(s). Finally, a profile is constructed from the multiple alignment to detect possible and more distant members within the sequence. The method tolerates mutations in the repeats as well as insertions and deletions. The sequence spans between the various repeats or repeat clusters may be of different lengths. The technique has been applied to a number of proteins where the repeating fragments have been derived from information additional to the protein sequences. © 1993 Wiley-Liss, Inc.
20.