Similar documents
20 similar documents found
1.
MOTIVATION: Tandem repeats are associated with disease genes, play an important role in evolution and are important in genomic organization and function. Although much research has been done on short perfect repeat patterns, imperfect repeats have received less attention. Thus, there is an acute need for a tandem repeats database that provides reliable and up-to-date information on both perfect and imperfect tandem repeats in the human genome and relates these to disease genes. RESULTS: This paper presents a web-accessible relational tandem repeats database that relates tandem repeats to gene locations and disease genes of the human genome. In contrast to other available databases, this database identifies both perfect and imperfect repeats with unit lengths of 1-2000 bp. The utility of this database is illustrated by analysing the distribution and frequencies of these repeats across chromosomes and genomic locations, and between protein-coding and non-coding regions. The applicability of this database to identifying diseases associated with previously uncharacterized tandem repeats is demonstrated.

2.
All the protein sequences from the SWISS-PROT database were analyzed for occurrences of single amino acid repeats, tandem oligopeptide repeats, and periodically conserved amino acids. Single amino acid repeats of glutamine, serine, glutamic acid, glycine, and alanine seem to be tolerated to a considerable extent in many proteins. Tandem oligopeptide repeats of different types with varying levels of conservation were detected in several proteins and were conspicuous, particularly in structural and cell surface proteins. Repeated sequence patterns may be a mechanism that provides regular arrays of spatial and functional groups, useful for structural packing or for one-to-one interactions with target molecules. To facilitate further exploration, a database of Tandem Repeats in Protein Sequences (TRIPS) has been developed and is available at URL: http://www.ncl-india.org/trips.
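Single amino acid repeats such as the glutamine or glycine runs mentioned above are homopolymer stretches in a protein sequence. As a minimal sketch (not the TRIPS method; the function name and the `min_len` threshold of 5 are illustrative assumptions), such runs can be located with a regular-expression backreference:

```python
import re


def single_aa_repeats(seq: str, min_len: int = 5) -> list[tuple[int, str, int]]:
    """Return (0-based start, residue, run length) for each homopolymer run of length >= min_len."""
    # "(.)" captures one residue; "\1{n,}" requires at least n further copies of that same residue.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0))) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    print(single_aa_repeats("MSQQQQQQAGGGGGKLAAAA"))
```

On the toy sequence this prints `[(2, 'Q', 6), (9, 'G', 5)]`; the trailing run of four alanines falls below the threshold. Real surveys would also need to handle near-pure runs interrupted by occasional other residues, which this exact-match pattern ignores.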

3.
The bioinformatics analysis of proteins containing tandem repeats requires special computer programs and databases, since the conventional approaches, developed predominantly for globular domains, have had limited success. Here, I survey bioinformatics tools that have been developed recently for the identification and proteome-wide analysis of protein repeats. The last few years have also been marked by the emergence of new 3D structures of these proteins. Appraisal of the known structures and their classification uncovers a straightforward relationship between their architecture and the length of the repetitive units. This relationship and the repetitive character of the structural folds suggest rules for better prediction of the 3D structures of such proteins. Furthermore, bioinformatics approaches combined with low-resolution structural data from biophysical techniques, especially the recently emerged cryo-electron microscopy, lead to reliable prediction of protein repeat structures and of their mode of binding with partners within molecular complexes. This hybrid approach can be actively used for the structural and functional annotation of proteomes.

4.
The non-coding fraction of the human genome, approximately 98%, is mainly constituted by repeats. Transpositions, expansions and deletions of these repeat elements contribute to a number of diseases. None of the available databases consolidates information on both tandem and interspersed repeats with the flexibility of FASTA-based homology search with reference to disease genes. The Repeats in Diseases database (RiDs db) is a web-accessible relational database that aids the analysis of repeats associated with Mendelian disorders. It is a repository of disease genes, which can be searched with the FASTA program or by limited or free-text keywords. Unlike other databases, RiDs db contains the sequences of these genes with access to corresponding information on both the interspersed and tandem repeats contained within them, on a unified platform. Comparative analysis of novel or patient sequences against the reference sequences in RiDs db using FASTA search will indicate any change in repeat structure associated with a particular disorder. This database also provides links to orthologs in model organisms such as zebrafish, mouse and Drosophila. AVAILABILITY: The database is available for free at http://115.111.90.196/ridsdb/index.php.

5.
In recent years, a number of new protein structures that possess tandem repeats have emerged. Many of these proteins are composed of tandem arrays of β-hairpins. Today, the amount and variety of data on these β-hairpin repeat (BHR) structures have reached a level that requires detailed analysis and further classification. In this paper, we classified the BHR proteins and compared their structures, repeat-motif sequences, functions and distribution across the major taxonomic kingdoms of life and within organisms. As a result, we identified six different BHR folds in tandem repeat proteins of Class III (elongated structures) and one BHR fold (up-and-down β-barrel) in Class IV ("closed" structures). Our survey reveals a high incidence of BHR proteins among bacteria and viruses and their possible relationship to the structures of amyloid fibrils. It indicates that BHR folds will be an attractive target for future structural studies, especially in the context of age-related amyloidosis and emerging infectious diseases. This work allowed us to update the RepeatsDB database, which contains annotated tandem repeat protein structures, and to construct sequence profiles based on BHR structural alignments.

6.

Background

Polymorphic tandem repeat typing is a new generic technology that has proved very efficient for bacterial pathogens such as B. anthracis, M. tuberculosis, P. aeruginosa, L. pneumophila and Y. pestis. The previously developed tandem repeats database takes advantage of the release of genome sequence data for a growing number of bacteria to facilitate the identification of tandem repeats. The development of an assay then requires evaluating tandem repeat polymorphism on well-selected sets of isolates. In the case of major human pathogens such as S. aureus, more than one strain is being sequenced, so the tandem repeats most likely to be polymorphic can now be selected in silico based on genome sequence comparison.

Results

In addition to the previously described general Tandem Repeats Database, we have developed a tool that automatically identifies tandem repeats that differ in length between the genome sequences of two (or more) closely related bacterial strains. Genome comparisons are pre-computed. The results of the comparisons are parsed into a database, which can be conveniently queried over the internet according to criteria of practical value, including repeat unit length, predicted size difference, etc. Comparisons are available for 16 bacterial species and for the orthopoxviruses, including the variola virus and three of its close neighbors.

Conclusions

We present an internet-based resource to help develop and perform tandem-repeat-based bacterial strain typing. The tools accessible at http://minisatellites.u-psud.fr now comprise four parts. The Tandem Repeats Database enables the identification of tandem repeats across entire genomes. The Strain Comparison Page identifies tandem repeats that differ between genome sequences from the same species. The "Blast in the Tandem Repeats Database" facilitates the search for a known tandem repeat and the prediction of amplification product sizes. The "Bacterial Genotyping Page" is a service for strain identification at the subspecies level.
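The "predicted size difference" criterion follows from simple arithmetic: if the flanking sequence is unchanged, two alleles' amplicons differ by the copy-number difference times the repeat unit length. The sketch below illustrates such a query; the loci, copy numbers, class and function names, and the default `max_unit` of 2000 bp are all hypothetical, and it assumes whole-copy differences with identical flanks (real comparisons must also handle partial copies and indels).

```python
from dataclasses import dataclass


@dataclass
class TandemRepeatLocus:
    name: str         # hypothetical locus identifier
    unit_length: int  # repeat unit length in bp
    copies_a: int     # copy number in strain A
    copies_b: int     # copy number in strain B

    def size_difference(self) -> int:
        # Amplicons differ by |copy-number difference| x unit length,
        # assuming whole-copy changes and identical flanking sequence.
        return abs(self.copies_a - self.copies_b) * self.unit_length


def query(loci, min_unit=1, max_unit=2000, min_diff=1):
    """Keep loci whose unit length is in range and whose predicted size difference is >= min_diff bp."""
    return [locus for locus in loci
            if min_unit <= locus.unit_length <= max_unit
            and locus.size_difference() >= min_diff]


loci = [
    TandemRepeatLocus("locus1", 9, 4, 6),
    TandemRepeatLocus("locus2", 57, 3, 3),
    TandemRepeatLocus("locus3", 100, 2, 5),
]

if __name__ == "__main__":
    for locus in query(loci, min_unit=10, min_diff=50):
        print(locus.name, locus.size_difference())
```

With these made-up values only `locus3` passes (predicted difference 300 bp); `locus1` differs by just 18 bp and `locus2` is not polymorphic between the two strains.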

7.
The current pace of the generation of sequence data requires the development of software tools that can rapidly provide full annotation of the data. We have developed a new method for rapid sequence comparison using the exact match algorithm without repeat masking. As a demonstration, we have identified all perfect simple tandem repeats (STR) within the draft sequence of the human genome. The STR elements (chromosome, position, length and repeat subunit) have been placed into a relational database. Repeat flanking sequence is also publicly accessible at http://grid.abcc.ncifcrf.gov. To illustrate the utility of this complete set of STR elements, we documented the increased density of potentially polymorphic markers throughout the genome. The new STR markers may be useful in disease association studies because so many STR elements manifest multiallelic polymorphism. Also, because triplet repeat expansions are important for human disease etiology, we identified trinucleotide repeats that exist within exons of known genes. This resulted in a list that includes all 14 genes known to undergo polynucleotide expansion, and 48 additional candidates. Several of these are non-polyglutamine triplet repeats. Other examinations of the STR database demonstrated repeats spanning splice junctions and identified SNPs within repeat elements.
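To make "all perfect simple tandem repeats" concrete, here is a minimal greedy scanner. It is a sketch only, not the exact-match comparison method described above; the function name and the thresholds (`max_period=6`, `min_copies=3`) are illustrative assumptions. At each position it tries every period, keeps the one whose whole copies cover the most bases (ties go to the shorter period), and then jumps past the reported copies.

```python
def find_perfect_strs(seq: str, max_period: int = 6, min_copies: int = 3):
    """Greedy scan for perfect (exact) tandem repeats.

    Returns (0-based start, period, whole copies, unit) tuples.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        best = None
        for p in range(1, max_period + 1):
            j = i + p
            # Extend while each base equals the base one period earlier (exact match only).
            while j < n and seq[j] == seq[j - p]:
                j += 1
            copies = (j - i) // p
            # Prefer the period covering the most bases; a strict ">" keeps the shorter period on ties.
            if copies >= min_copies and (best is None or copies * p > best[1] * best[2]):
                best = (i, p, copies, seq[i:i + p])
        if best:
            hits.append(best)
            i += best[1] * best[2]  # skip past the whole copies just reported
        else:
            i += 1
    return hits


if __name__ == "__main__":
    print(find_perfect_strs("TTCAGCAGCAGTTACACACACAG"))
```

For the toy sequence this reports a `CAG` triplet repeat at position 2 (3 copies) and an `AC` dinucleotide repeat at position 13 (4 copies). Choosing the longest covered span keeps a compound unit such as `AAAT` from being truncated to its leading `AAA` run.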

8.
A novel concept of the mechanisms of evolution of genes and genomes is suggested: sequences evolve largely by local events of triplet expansion and subsequent mutational changes in the repeats. The immediate memory of the earlier expansion events still resides in the sequences, in the form of frequently occurring segments of tandemly repeating codons. Other predicted fossils of the original repeats are: (I) the expanding triplets should be accompanied by their point-mutation derivatives, and (II) the remaining excess of codons that formerly belonged to the tandem repeats should be reflected in overall codon usage biases. Both predictions are confirmed by analysis of the largest available database of non-redundant protein-coding sequences, with a total size of ~5 × 10^9 codons. One important conclusion also follows from the results: life, which presumably started with the replication of expanding triplets and their subsequent mutational changes, continues to emerge within genes and genomes in the form of new triplet expansion events.
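The "segments of tandemly repeating codons" can be counted by reading a coding sequence in frame and collapsing consecutive identical codons. This is a minimal sketch, not the authors' pipeline; the function name, the use of reading frame 0, and the `min_copies=3` threshold are illustrative assumptions.

```python
def tandem_codon_runs(cds: str, min_copies: int = 3):
    """Return (nucleotide offset, codon, copies) for in-frame runs of one codon repeated >= min_copies times."""
    cds = cds.upper()
    # Split into frame-0 codons, dropping any incomplete trailing codon.
    codons = [cds[k:k + 3] for k in range(0, len(cds) - len(cds) % 3, 3)]
    runs = []
    start = 0
    for k in range(1, len(codons) + 1):
        # A run ends at the last codon or wherever the codon changes.
        if k == len(codons) or codons[k] != codons[start]:
            if k - start >= min_copies:
                runs.append((start * 3, codons[start], k - start))
            start = k
    return runs


if __name__ == "__main__":
    print(tandem_codon_runs("ATG" + "CAG" * 4 + "GCT" + "GCA" * 2 + "TAA"))
```

Here the four consecutive `CAG` codons starting at nucleotide 3 are reported, while the pair of `GCA` codons is below the threshold. Testing prediction (I) would additionally require counting codons one point mutation away from each run's codon, which this sketch does not attempt.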

9.
MOTIVATION: Complex genomes contain numerous repeated sequences, and genomic duplication is believed to be a main evolutionary mechanism to obtain new functions. Several tools are available for de novo repeat sequence identification, and many approaches exist for clustering homologous protein sequences. We present an efficient new approach to identify and cluster homologous DNA sequences with high accuracy at the level of whole genomes, excluding low-complexity repeats, tandem repeats and annotated interspersed repeats. We also determine the boundaries of each group member so that it closely represents a biological unit, e.g. a complete gene, or a partial gene coding a protein domain. RESULTS: We developed a program called HomologMiner to identify homologous groups applicable to genome sequences that have been properly marked for low-complexity repeats and annotated interspersed repeats. We applied it to the whole genomes of human (hg17), macaque (rheMac2) and mouse (mm8). Groups obtained include gene families (e.g. olfactory receptor gene family, zinc finger families), unannotated interspersed repeats and additional homologous groups that resulted from recent segmental duplications. Our program incorporates several new methods: a new abstract definition of consistent duplicate units, a new criterion to remove moderately frequent tandem repeats, and new algorithmic techniques. We also provide preliminary analysis of the output on the three genomes mentioned above, and show several applications including identifying boundaries of tandem gene clusters and novel interspersed repeat families. AVAILABILITY: All programs and datasets are downloadable from www.bx.psu.edu/miller_lab.

10.
MOTIVATION: A tandem repeat in DNA is a sequence of two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats occur in the genomes of both eukaryotic and prokaryotic organisms. They are important in numerous fields including disease diagnosis, mapping studies, human identity testing (DNA fingerprinting), sequence homology and population studies. Although tandem repeats have been used by biologists for many years, there are few tools available for performing an exhaustive search for all tandem repeats in a given sequence. RESULTS: In this paper we describe an efficient algorithm for finding all tandem repeats within a sequence, under the edit distance measure. The contributions of this paper are two-fold: theoretical and practical. We present a precise definition for tandem repeats over the edit distance and an efficient, deterministic algorithm for finding these repeats. AVAILABILITY: The algorithm has been implemented in C++, and the software is available upon request and can be used at http://www.sci.brooklyn.cuny.edu/~sokol/trepeats. The use of this tool will assist biologists in discovering new ways that tandem repeats affect both the structure and function of DNA and protein molecules.
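To illustrate what "approximate copies under the edit distance" means, the sketch below brute-forces the simplest case: two adjacent, equal-length halves `u` and `v` whose Levenshtein distance is at most `k`. It is deliberately naive (roughly cubic in the half length per position) and is not the efficient, deterministic algorithm of the paper; the function names, the equal-length restriction, and the default parameters are assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost substitutions, insertions, deletions), row-by-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def approximate_squares(seq: str, min_half: int = 3, max_half: int = 10, k: int = 1):
    """Brute force: report (start, half length, distance) for substrings u+v with |u| == |v| and ed(u, v) <= k."""
    hits = []
    for half in range(min_half, max_half + 1):
        for i in range(len(seq) - 2 * half + 1):
            u, v = seq[i:i + half], seq[i + half:i + 2 * half]
            d = edit_distance(u, v)
            if d <= k:
                hits.append((i, half, d))
    return hits


if __name__ == "__main__":
    print(approximate_squares("ACGTTACGAT", min_half=5, max_half=5, k=1))
```

The example reports `(0, 5, 1)`: `ACGTT` and `ACGAT` differ by one substitution. A practical tool must also allow copies of unequal length, more than two copies, and merging of the many overlapping hits this naive scan produces.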

11.
ABSTRACT: BACKGROUND: Dystrophin is a large essential protein of skeletal and heart muscle. It is a filamentous scaffolding protein with numerous binding domains. Mutations in the DMD gene, which encodes dystrophin, mostly result in the deletion of one or several exons and cause Duchenne (DMD) and Becker (BMD) muscular dystrophies. The most common DMD mutations are frameshift mutations resulting in an absence of dystrophin from tissues. In-frame DMD mutations are less frequent and result in a protein with partial wild-type dystrophin function. The aim of this study was to highlight structural and functional modifications of dystrophin caused by in-frame mutations. METHODS AND RESULTS: We developed a dedicated database for dystrophin, the eDystrophin database. It contains 209 different non-frameshifting mutations found in 945 patients from a French cohort and previous studies. Bioinformatics tools provide models of the three-dimensional structure of the protein at deletion sites, making it possible to determine whether the mutated protein retains the typical filamentous structure of dystrophin. An analysis of the structure of mutated dystrophin molecules showed that hybrid repeats were reconstituted at the deletion site in some cases. These hybrid repeats harbored the typical triple coiled-coil structure of native repeats, which may be correlated with better function in muscle cells. CONCLUSION: This new database focuses on the dystrophin protein and its modification due to in-frame deletions in BMD patients. The observation of hybrid repeat reconstitution in some cases provides insight into phenotype-genotype correlations in dystrophin diseases and possible strategies for gene therapy. The eDystrophin database is freely available: http://edystrophin.genouest.org/.

12.

Background

Tandem repeat variation in protein-coding regions will alter protein length and may introduce frameshifts. Tandem repeat variants are associated with variation in pathogenicity in bacteria and with human disease. We characterized tandem repeat polymorphism in human proteins, using the UniGene database, and tested whether these were associated with host defense roles.

Results

Protein-coding tandem repeat copy-number polymorphisms were detected in 249 tandem repeats found in 218 UniGene clusters; observed length differences ranged from 2 to 144 nucleotides, with unit copy lengths ranging from 2 to 57. This corresponded to 1.59% (218/13,749) of proteins investigated carrying detectable polymorphisms in the copy-number of protein-coding tandem repeats. We found no evidence that tandem repeat copy-number polymorphism was significantly elevated in defense-response proteins (p = 0.882). An association with the Gene Ontology term 'protein-binding' remained significant after covariate adjustment and correction for multiple testing. Combining this analysis with previous experimental evaluations of tandem repeat polymorphism, we estimate the approximate mean frequency of tandem repeat polymorphisms in human proteins to be 6%. Because 13.9% of the polymorphisms were not a multiple of three nucleotides, up to 1% of proteins may contain frameshifting tandem repeat polymorphisms.

Conclusion

Around 1 in 20 human proteins are likely to contain tandem repeat copy-number polymorphisms within coding regions. Such polymorphisms are not more frequent among defense-response proteins; their prevalence among protein-binding proteins may reflect lower selective constraints on their structural modification. The impact of frameshifting and longer copy-number variants on protein function and disease merits further investigation.
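The percentages in this abstract can be checked directly. A copy-number change preserves the reading frame only when it adds or removes a multiple of 3 nucleotides. Multiplying the estimated 6% polymorphism frequency by the 13.9% non-triplet share gives about 0.83%, consistent with "up to 1%"; that this product is how the bound was derived is our assumption, not something the abstract states. The variable and function names below are illustrative.

```python
def is_frameshifting(length_difference_nt: int) -> bool:
    # The reading frame is preserved only by changes that are a multiple of 3 nt.
    return length_difference_nt % 3 != 0


detected_fraction = 218 / 13_749  # proteins with a detected coding TR copy-number polymorphism
estimated_fraction = 0.06         # estimated mean frequency across human proteins
non_triplet_share = 0.139         # polymorphisms whose length change is not a multiple of 3 nt
frameshift_fraction = estimated_fraction * non_triplet_share  # assumed derivation of "up to 1%"

if __name__ == "__main__":
    print(f"{detected_fraction:.2%}")
    print(f"{frameshift_fraction:.2%}")
    print(is_frameshifting(2), is_frameshifting(144))
```

This prints `1.59%`, `0.83%`, and `True False`: the smallest observed length difference (2 nt) shifts the frame, while the largest (144 nt, that is, 48 codons) does not.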

13.
We have developed a publicly accessible database (ALFRED, the ALlele FREquency Database) that catalogues allele frequency data for a wide range of population samples and DNA polymorphisms. This database is web-accessible through our laboratory (Kidd Lab) Web site: http://info.med.yale.edu/genetics/kkidd. ALFRED currently contains data on 60 populations and 156 genetic systems including single nucleotide polymorphisms (SNPs), short tandem repeat polymorphisms (STRPs), variable number of tandem repeats (VNTRs) and insertion-deletion polymorphisms. While data are not available for all population-DNA polymorphism combinations, over 2000 allele frequency tables have been entered. Our database is designed (i) to address our specific research requirements as well as broader scientific objectives; (ii) to allow researchers and interested educators to easily navigate and retrieve data of interest to them; and (iii) to integrate links to other related public databases such as dbSNP, GenBank and PubMed.

14.
The circumsporozoite gene of the Plasmodium cynomolgi complex (total citations: 14; self-citations: 0; citations by others: 14)
An analysis of the circumsporozoite (CS) genes of six closely related plasmodia is presented. Like other plasmodial antigens, the CS protein contains tandem repeats flanked by conventional nonrepeated sequences. Our analysis shows that the repeats, which encode the immunodominant epitope of the CS protein, diverge more rapidly than the remainder of the gene, and that the maintenance and evolution of the repeats cannot be explained as the result of selection at the protein level. We argue that a mechanism acts directly on the DNA sequence to constrain the internal divergence of the repeats, and as a result promotes their rapid divergence between taxa.

15.
Proteins that share even low sequence similarity are known to adopt similar folds. The beta-propeller structural motif is one such example. Identifying sequences that adopt a beta-propeller fold is useful for annotating protein structure and function. Tandem sequence repeats often provide the necessary signal for identifying beta-propellers in proteins. In our recent analysis to identify cell surface proteins in archaeal and bacterial genomes, we identified some proteins that contain the novel tandem repeats "LVIVD", "RIVW" and "LGxL". In this work, based on protein fold predictions and three-dimensional comparative modeling methods, we predicted that these repeat types fold as beta-propellers. Further, evolutionary trace analysis of all proteins containing amino acid sequence repeats in beta-propellers suggests that the novel repeats have diverged from a common ancestor.

16.
MOTIVATION: Tandem repeats (TRs) are associated with human disease, play a role in evolution and are important in regulatory processes. Despite their importance, locating and characterizing these patterns within anonymous DNA sequences remains a challenge. In part, the difficulty is due to imperfect conservation of patterns and complex pattern structures. We study recognition algorithms for two complex pattern structures: variable length tandem repeats (VLTRs) and multi-period tandem repeats (MPTRs). RESULTS: We extend previous algorithmic research to a class of regular tandem repeats (RegTRs). We formally define RegTRs, as well as two important subclasses: VLTRs and MPTRs. We present algorithms for identification of TRs in these classes. Furthermore, our algorithms identify degenerate VLTRs and MPTRs: repeats containing substitutions, insertions and deletions. To illustrate our work, we present results of our analysis for two difficult regions in cattle and human data which reflect practical occurrences of these subclasses in GenBank sequence data. In addition, we show the applicability of our algorithmic techniques for identifying Alu sequences, gene clusters and other distant regions of similarity. We illustrate this with an example from yeast chromosome I.

17.
Laskin A. A., Korotkov E. V., Chaley M. B., Kudryashov N. A. Molecular Biology, 2003, 37(4): 561-570
A program package has been developed to search protein sequence databases for hidden tandem repeats of any specified type. The applied algorithm of locally optimal cyclic alignment is able to find subsequences possessing a certain profile-based periodicity type even when no appreciable homology between periods is observed, as well as in the presence of arbitrary insertions/deletions. The profile can be adjusted to search for structurally and functionally important periodicity types. The Swiss-Prot database has been analyzed to reveal previously undetectable periodicities caused by the secondary and super-secondary structure regularities of NAD-binding sites. In particular, a significant periodicity of 24 aa was found to be characteristic of the absolute majority of domains possessing the Rossmann (or Rossmann-like) fold and displaying apparent regularity in their secondary structures that is not obvious at the primary-structure level.
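For intuition about scoring a sequence for a given period, the sketch below uses a much simpler technique than the locally optimal cyclic alignment described above: the fraction of positions whose residue is identical to the residue one period later. This lag-identity score is a swapped-in illustration (function names and the toy heptad sequence are hypothetical); unlike the profile-based method, it cannot tolerate insertions/deletions and cannot detect periodicity without residue identity between periods.

```python
def lag_identity(seq: str, period: int) -> float:
    """Fraction of positions i with seq[i] == seq[i + period] (0.0 if there are no such pairs)."""
    pairs = len(seq) - period
    if pairs <= 0:
        return 0.0
    return sum(seq[i] == seq[i + period] for i in range(pairs)) / pairs


def best_period(seq: str, periods) -> int:
    """Period with the highest lag-identity score (first one wins on ties)."""
    return max(periods, key=lambda p: lag_identity(seq, p))


if __name__ == "__main__":
    toy = "LKELEKK" * 4  # hypothetical heptad repeat
    print(best_period(toy, range(2, 11)), lag_identity(toy, 7))
```

For the perfectly repeated toy heptad this prints `7 1.0`. Real hidden repeats, such as the 24-aa periodicity reported for Rossmann-fold domains, are far too degenerate for a plain identity score, which is why a profile and gapped alignment are needed.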

19.
Bordetella pertussis establishes infection by attaching to epithelial cells of the respiratory tract. One of its adhesins is filamentous haemagglutinin (FHA), a 500-Å-long secreted protein that is rich in beta-structure and contains two regions, R1 and R2, of tandem 19-residue repeats. Two models have been proposed in which the central shaft is (i) a hairpin made up of a pairing of two long antiparallel beta-sheets; or (ii) a beta-helix in which the polypeptide chain is coiled to form three long parallel beta-sheets. We have analysed a truncated variant of FHA by electron microscopy (negative staining, shadowing and scanning transmission electron microscopy of unstained specimens); these observations support the latter model. Further support comes from detailed sequence analysis and molecular modelling studies. We applied a profile search method to the sequences adjacent to and between R1 and R2 and found additional "covert" copies of the same motifs that are recognizable in overt form in the R1 and R2 sequence repeats. Their total number is sufficient to support the tenet of the beta-helix model that the shaft domain (a 350-Å rod) should consist of a continuous run of these motifs, apart from loop inserts. The N-terminus, which does not contain such repeats, was found to be weakly homologous to cyclodextrin transferase, a protein of known immunoglobulin-like structure. Drawing on crystal structures of known beta-helical proteins, we developed structural models of the coil motifs putatively formed by the R1 and R2 repeats. Finally, we applied the same profile search method to the sequence database and found several other proteins, all large secreted proteins of bacterial provenance, that have similar repeats and probably also similar structures.

20.
Complete eukaryote chromosomes were investigated for intrachromosomal duplications of nucleotide sequences. The analysis was performed by looking for nonexact repeats in two complete genomes, Saccharomyces cerevisiae and Caenorhabditis elegans, and four partial ones, Drosophila melanogaster, Plasmodium falciparum, Arabidopsis thaliana, and Homo sapiens. Through this analysis, we show that all eukaryote chromosomes exhibit similar characteristics for their intrachromosomal repeats, suggesting similar dynamics: many direct repeats have their two copies physically close together, and these close direct repeats are more similar and shorter than the other repeats. In contrast, there are almost no close inverted repeats. These results support a model for the dynamics of duplication. This model is based on a continuous genesis of tandem repeats and implies that most of the distant and inverted repeats originate from these tandem repeats through further chromosomal rearrangements (insertions, inversions, and deletions). Remnants of these predicted rearrangements have been revealed through fine analysis of the chromosome sequences. Despite these shared dynamics, each genome exhibits its own style of intrachromosomal duplication: the density of repeated elements is similar in all chromosomes of the same genome but differs between species. This density was further related to the relative rates of duplication, deletion, and mutation specific to each species. Notably, the density of repeats in the X chromosome of C. elegans is much lower than in the autosomes of that organism, suggesting that exchange between homologous chromosomes is important in the duplication process.
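The distinction between direct and inverted repeats, and between close and distant copies, can be stated precisely: a direct repeat's second copy reads the same on the same strand, while an inverted repeat's second copy is the reverse complement of the first. The sketch below classifies two exact, equal-length segments and reports the spacer between them (0 means the copies are adjacent, i.e. tandem). It is an illustration only, with hypothetical function names; the study itself searched for nonexact repeats, which this exact comparison does not handle, and it assumes the second segment starts after the first ends.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    return s.upper().translate(_COMPLEMENT)[::-1]


def classify_pair(seq: str, i: int, j: int, length: int):
    """Return ('direct' | 'inverted' | None, spacer) for exact segments seq[i:i+length] and seq[j:j+length].

    Assumes j >= i + length; spacer is the number of bases between the two copies.
    """
    a, b = seq[i:i + length].upper(), seq[j:j + length].upper()
    spacer = j - (i + length)
    if a == b:
        return "direct", spacer
    if a == reverse_complement(b):
        return "inverted", spacer
    return None, spacer


if __name__ == "__main__":
    print(classify_pair("ACGTTGACGTTG", 0, 6, 6))
    print(classify_pair("AACCGTTCGGTT", 0, 7, 5))
```

The first call finds an adjacent (tandem) direct repeat, `('direct', 0)`; the second finds `CGGTT`, the reverse complement of `AACCG`, two bases downstream, `('inverted', 2)`.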


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司), ICP license 京ICP备09084417号