Found 20 similar articles (search time: 15 ms)
1.
An approximate nested tandem repeat (NTR) in a string T is a complex repetitive structure consisting of many approximate copies of two substrings x and X ("motifs") interspersed with one another. NTRs fall into a class of repetitive structures broadly known as subrepeats. NTRs have been found in real DNA sequences and are expected to be important in evolutionary biology, both in understanding evolution of the ribosomal DNA (where NTRs can occur), and as a potential marker in population genetic and phylogenetic studies. This article describes an alignment algorithm for the verification phase of the software tool NTRFinder developed for database searches for NTRs. When the search algorithm has located a subsequence containing a possible NTR, with motifs X and x, a verification step aligns this subsequence against an exact NTR built from the templates X and x, to determine whether the subsequence contains an approximate NTR and its extent. This article describes an algorithm to solve this alignment problem in O(|T|(|X| + |x|)) space and time. The algorithm is based on Fischetti et al.'s wrap-around dynamic programming.
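The wrap-around dynamic programming idea named in this abstract can be illustrated with a minimal sketch. It assumes a much simpler setting than the article: a single motif rather than the two interspersed motifs of an NTR, unit costs for mismatches and gaps, and a global alignment of the whole text against some whole number of motif copies. The function name `wraparound_distance` is hypothetical and is not taken from NTRFinder.

```python
def wraparound_distance(text: str, motif: str) -> int:
    """Minimum edit distance between `text` and motif repeated k times (any k >= 0).

    State j (0 <= j < m) means "j motif characters consumed, modulo m".
    Unit costs are an assumption made for this sketch.
    """
    m = len(motif)
    if m == 0:
        raise ValueError("motif must be non-empty")

    # Row 0: reaching state j without reading any text means skipping j motif characters.
    prev = list(range(m))

    for ch in text:
        cur = [0] * m
        # Pass 1: moves that consume the text character `ch`.
        for j in range(m):
            # Diagonal: align `ch` with motif[j - 1]; for j == 0 the negative
            # index wraps to the last motif character (the "wrap-around").
            diag = prev[j - 1] + (ch != motif[j - 1])
            # Vertical: `ch` is an extra character not aligned to the motif.
            up = prev[j] + 1
            cur[j] = min(diag, up)
        # Pass 2: moves that skip a motif character within the same row.
        # The states form a cycle, so the row is relaxed twice; going around
        # the cycle more than once can never be cheaper.
        for _ in range(2):
            for j in range(m):
                cur[j] = min(cur[j], cur[j - 1] + 1)
        prev = cur

    # State 0 corresponds to a whole number of motif copies.
    return prev[0]


if __name__ == "__main__":
    print(wraparound_distance("ACGACGACG", "ACG"))  # exact repeat
    print(wraparound_distance("ACGACTACG", "ACG"))  # one substitution
```

This sketch uses time proportional to |T||X| and keeps only one row, so its space is proportional to |X|. Handling two motifs, as the article does, would need additional states and is not attempted here.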
2.
We study the problem of approximate non-tandem repeat extraction. Given a long subject string S of length N over a finite alphabet Σ and a threshold D, we would like to find all short substrings of S of length P that repeat with at most D differences, i.e., insertions, deletions, and mismatches. We give a careful theoretical characterization of the set of seeds (i.e., some maximal exact repeats) required by the algorithm, and prove a sublinear bound on their expected numbers. Using this result, we present a sub-quadratic algorithm for finding all short (i.e., of length O(log N)) approximate repeats. The running time of our algorithm is $O(D N^{3\,\mathrm{pow}(\epsilon)-1} \log N)$, where $\epsilon = D/P$ and $\mathrm{pow}(\epsilon)$ is an increasing, concave function that is 0 when $\epsilon = 0$ and about 0.9 for DNA and protein sequences.
3.
Finding approximate tandem repeats in genomic sequences. Total citations: 1, self-citations: 0, citations by others: 1
Ydo Wexler Zohar Yakhini Yechezkel Kashi Dan Geiger 《Journal of computational biology》2005,12(7):928-942
An efficient algorithm is presented for detecting approximate tandem repeats in genomic sequences. The algorithm is based on a flexible statistical model which allows a wide range of definitions of approximate tandem repeats. The ideas and methods underlying the algorithm are described and its effectiveness on genomic data is demonstrated.
4.
An efficient algorithm for detecting approximate tandem repeats in genomic sequences is presented. The algorithm is based on innovative statistical criteria to detect candidate regions which may include tandem repeats; these regions are subsequently verified by alignments based on dynamic programming. No prior information about the period size or pattern is needed. Also, the algorithm is virtually capable of detecting repeats with any period. An implementation of the algorithm is compared with the two state-of-the-art tandem repeats detection tools to demonstrate its effectiveness both on natural and synthetic data. The algorithm is available at www.cs.brown.edu/people/domanic/tandem/.
5.
In vivo glycosylation of mucin tandem repeats. Total citations: 4, self-citations: 0, citations by others: 4
H S Silverman S Parry M Sutton-Smith M D Burdick K McDermott C J Reid S K Batra H R Morris M A Hollingsworth A Dell A Harris 《Glycobiology》2001,11(6):459-471
The biochemical and biophysical properties of mucins are largely determined by extensive O-glycosylation of serine- and threonine-rich tandem repeat (TR) domains. In a number of human diseases aberrant O-glycosylation is associated with variations in the properties of the cell surface-associated and secreted mucins. To evaluate in vivo the O-glycosylation of mucin TR domains, we generated recombinant chimeric mucins with TR sequences from MUC2, MUC4, MUC5AC, or MUC5B, which were substituted for the native TRs of epitope-tagged MUC1 protein (MUC1F). These hybrid mucins were extensively O-glycosylated and showed the expected association with the cell surface and release into culture media. The presence of different TR domains within the chimeric mucins appears to have limited influence on their posttranslational processing. Alterations in glycosylation were detailed by fast atom bombardment mass spectrometry and reactivity with antibodies against particular blood-group and tumor-associated carbohydrate antigens. Future applications of these chimeras will include investigations of mucin posttranslational modification in the context of disease.
6.
7.
Background
Biological sequence repeats arranged in tandem patterns are widespread in DNA and proteins. While many software tools have been designed to detect DNA tandem repeats (TRs), useful algorithms for identifying protein TRs with varied levels of degeneracy are still needed.
8.
M S Wehnert R S Matson J B Rampal P J Coassin C T Caskey 《Nucleic acids research》1994,22(9):1701-1704
Oligonucleotides representing 60 trinucleotide (21mers) and four dinucleotide (20mers) tandem repeats were directly synthesized and arrayed onto an aminated polypropylene substrate. DNA samples of different complexities (a CAG-containing 21mer oligonucleotide, PCR fragments of 200 to 3,000 bp, and cosmids with 31 to 35 kb inserts) were radiolabelled and hybridized to the oligonucleotide array at various temperatures. When compared to sequence data available from the test DNAs, the reverse blot system specifically identified various tri- and dinucleotide short tandem repeats (STRs) in every case. Moreover, there was no random or cross hybridization to nonspecific sequences. It was possible to detect as few as three repeated units in a particular location, as shown for (CCT)n, (GCC)n and (CAC)n triplets in cosmid DNA. Varying the hybridization stringency can enhance the detection of STRs. This single-step reverse blot system therefore allows the rapid, specific and sensitive identification of various STRs in DNA sources of different complexity.
9.
10.
X Y Zhu A Testori S Oh J D Skinner L A Burgoyne 《Biochemical and biophysical research communications》1992,182(2):447-451
Three highly repetitive avian tandem repeats were identified and examined. These repeats had similar unit lengths (about 42 bp) but completely different sequences, each containing particular protein binding sites. Each of these repeats was found within only one of the five closely related genera studied.
11.
Protein domains constructed from tandem α-helical repeats have until recently been primarily associated with protein scaffolds or RNA recognition. Recent crystal structures of human mitochondrial termination factor MTERF1 and Bacillus cereus alkylpurine DNA glycosylase AlkD bound to DNA revealed two new superhelical tandem repeat architectures capable of wrapping around the double helix in unique ways. Unlike DNA sequence recognition motifs that rely mainly on major groove read-out, MTERF and ALK motifs locate target sequences and aberrant nucleotides within DNA by resculpting the double-helix through extensive backbone contacts. Comparisons between MTERF and ALK repeats, together with recent advances in ssRNA recognition by Pumilio/FBF (PUF) domains, provide new insights into the fundamental principles of protein-nucleic acid recognition.
12.
Parry S Sutton-Smith M Heal P Leir SH Palmai-Pallag T Morris HR Hollingsworth MA Dell A Harris A 《Biochimica et biophysica acta》2005,1722(1):77-83
The MUC6 mucin was originally isolated from stomach mucus and is one of the major secreted mucins of the digestive tract. A full-length cDNA has not been isolated for this large molecule (greater than 15 kb) and it remains poorly studied. To circumvent the lack of reagents for investigating MUC6, we isolated a cDNA clone from a human fetal pancreatic duct cDNA library that encodes 282 amino acids of the MUC6 tandem repeat. A blast search with the sequence of this cDNA clone showed 90% homology with the original MUC6 (L07517) derived from a human stomach cDNA library and 95% homology both with AK096772, a MUC6-related protein isolated from a human prostate cDNA library and the human genome project clone AC083984. The MUC6 partial cDNA clone isolated from fetal pancreas was inserted into an epitope-tagged MUC1 mucin molecule in place of the native tandem repeat. This chimeric mucin was expressed in human pancreatic (Panc1) and colon (Caco2) carcinoma cell lines and purified for analysis of O-glycosylation by fast atom bombardment mass spectrometry (FAB-MS). The FAB-MS spectra showed O-glycans that had been detected previously on chimeric mucins carrying different tandem repeats, though the spectra for MUC1F/6TR mucins expressed in the Panc1 and Caco2 cells were very different. There was a paucity of O-glycosylation in Panc1 cells in comparison to Caco2 cells where many more structures were evident, and the most abundant glycans in Panc1 cells were sialylated.
13.
Background
Tandem repetition of structural motifs in proteins is frequently observed across all forms of life. The topology of the repeating unit and its frequency of occurrence are associated with a wide range of structural and functional roles in diverse proteins, and defects in repeat proteins have been associated with a number of diseases. It is thus desirable to accurately identify the specific repeat type and its copy number. Weak evolutionary constraints on repeat units and insertions/deletions between them make their identification difficult at the sequence level, so structure-based approaches are desired. The proposed graph spectral approach is based on protein structure represented as a graph and detects one of the most frequently observed structural repeats, the Ankyrin repeat.
Results
It has been shown in a large number of studies that the 3-dimensional topology of a protein structure is well captured by a graph, making it possible to analyze a complex protein structure as a mathematical entity. In this study we show that the eigen spectra profile of a protein structure graph exhibits a unique repetitive profile for contiguous repeating units, enabling the detection of the repeat region and the repeat type. The proposed approach uses a non-redundant set of 58 Ankyrin proteins to define rules for the detection of Ankyrin repeat motifs. It is evaluated on a set of 370 proteins comprising 125 known Ankyrin proteins and remaining non-solenoid proteins, and the predictions are compared with UniProt annotation, a sequence-based approach (RADAR), and a structure-based approach (ConSole). To show the efficacy of the approach, we analyzed the complete PDB structural database and identified 641 previously unrecognized Ankyrin repeat proteins. We observe a unique eigen spectra profile for different repeat types and show that the method can be easily extended to detect other repeat types. It is implemented as a web server, AnkPred. It is freely available at ‘bioinf.iiit.ac.in/AnkPred’.
Conclusions
AnkPred provides an elegant and computationally efficient graph-based approach for detecting Ankyrin structural repeats in proteins. By analyzing the eigen spectra of the protein structure graph and secondary structure information, characteristic features of a known repeat family are identified. This method is especially useful in correctly identifying new members of a repeat family.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-014-0440-9) contains supplementary material, which is available to authorized users.
14.
MOTIVATION: S-attributed grammars (a generalization of classical context-free grammars) provide a versatile formalism for sequence analysis which allows long-range constraints to be expressed: the RNA folding problem is a typical example of application. Efficient algorithms have been developed to solve problems expressed with these tools, which generally compute the optimal attribute of the sequence w.r.t. the grammar. However, it is often more meaningful and/or interesting from the biological point of view to consider almost optimal attributes as well as approximate sequences; we thus need more flexible and powerful algorithms able to perform these generalized analyses. RESULTS: In this paper we present a basic algorithm which, given a grammar G and a sequence $\omega$, computes the optimal attribute for all (approximate) strings $\omega'$ in L(G) such that $d(\omega, \omega') \le M$, and whose complexity is $O(n^{r+1})$ in time and $O(n^{2})$ in space (r is the maximal length of the right-hand side of any production of G). We will also give some extensions and possible improvements of this algorithm.
15.
The total number of microsatellite loci is considered to be at least 10-fold lower in avian species than in mammalian species. Therefore, efficient large-scale cloning of chicken microsatellites, as required for the construction of a high-resolution linkage map, is facilitated by the construction of libraries using an enrichment strategy. In this study, a plasmid library enriched for tandem repeats was constructed from chicken genomic DNA by hybridization selection. Using this technique the proportion of recombinant clones that cross-hybridized to probes containing simple tandem repeats was raised to 16%, compared with <0.1% in a non-enriched library. Primers were designed from 121 different sequences. Polymerase chain reaction (PCR) analysis of two chicken reference pedigrees enabled 72 loci to be localized within the collaborative chicken genetic map, and at least 30 of the remaining loci have been shown to be informative in these or other crosses.
16.
17.
18.
《Cell cycle (Georgetown, Tex.)》2013,12(23):4605-4606
Comment on: Law MJ, et al. Cell 2010; 143:367-78.
19.
Emma Joy Dodson Vered Fishbain-Yoskovitz Shahar Rotem-Bamberger Ora Schueler-Furman 《Experimental biology and medicine (Maywood, N.J.)》2015,240(3):351-360
Interactions mediated by short linear motifs in proteins play major roles in regulation of cellular homeostasis since their transient nature allows for easy modulation. We are still far from a full understanding and appreciation of the complex regulation patterns that can be, and are, achieved by this type of interaction. The fact that many linear-motif-binding domains occur in tandem repeats in proteins indicates that their mutual communication is used extensively to obtain complex integration of information toward regulatory decisions. This review is an attempt to overview, and classify, different ways by which two and more tandem repeats cooperate in binding to their targets, in the well-characterized family of WW domains and their corresponding polyproline ligands.
20.
The taxonomy of thermophilic, endospore-forming bacteria has attracted great interest over the past few years. Although a number of taxonomic markers were previously evaluated, their sequences in Geobacillus were too conserved, and more variable markers need to be identified. Repetitive DNA is one of the promising variable targets in the development of taxon-specific genotyping and identification schemes in bacteria. The aim of our study was to evaluate the possibility of using repetitive DNA in the taxonomy of Geobacillus. In this paper, we report the analysis of perfect tandem repeats of geobacilli. We focused on the long repeats (with a motif length of ≥20 nucleotides). This choice was based on the assumption that these motifs can be used for the construction of oligonucleotides (primers and probes). Thirty-three Geobacillus genus-specific motifs were identified in our work; fifteen of them were species-specific and fifteen were species cluster-specific. Three of them were genus-specific, but not species- or species cluster-specific. Some of the motifs were used for the construction of the primer pairs. The primers were validated by PCR. Out of 12 designed primer pairs, 11 were genus-specific and 4 were species-specific. Species-specific primers were successfully constructed for the phylogenetically defined species Geobacillus thermodenitrificans and Geobacillus toebii.
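As a rough illustration of what searching for perfect tandem repeats with a minimum motif length involves, the following minimal sketch scans a sequence for exact periodic runs. It is not the procedure used in the study: the function name `perfect_tandem_repeats`, its parameters, and the brute-force scan over periods are all assumptions made for this example.

```python
def perfect_tandem_repeats(seq, min_motif_len, max_motif_len, min_copies=2):
    """Return (start, motif_length, copies, motif) for exact tandem repeats.

    For each candidate period p, a maximal run of positions k with
    seq[k] == seq[k + p] marks a region that is exactly periodic with period p.
    Motifs that are themselves repeats of a shorter unit are not filtered out.
    """
    hits = []
    n = len(seq)
    for p in range(min_motif_len, max_motif_len + 1):
        i = 0
        while i + p < n:
            if seq[i] != seq[i + p]:
                i += 1
                continue
            start = i
            # Extend the run while the sequence keeps matching itself p positions later.
            while i + p < n and seq[i] == seq[i + p]:
                i += 1
            run = i - start
            # A run of length `run` spans run + p characters with period p.
            copies = (run + p) // p
            if copies >= min_copies:
                hits.append((start, p, copies, seq[start:start + p]))
    return hits


if __name__ == "__main__":
    # Toy sequence with a short motif; the study itself considered motifs of >= 20 nt.
    print(perfect_tandem_repeats("GGACGTACGTACGTCC", 4, 4))
```

For motifs of at least 20 nucleotides, as in the abstract, `min_motif_len` would be set to 20. The scan costs time proportional to the sequence length for each period tried, so a genome-scale search would likely call for a more efficient method.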