Similar literature
20 similar documents found.
1.
Protein domains constructed from tandem α-helical repeats have until recently been primarily associated with protein scaffolds or RNA recognition. Recent crystal structures of human mitochondrial termination factor MTERF1 and Bacillus cereus alkylpurine DNA glycosylase AlkD bound to DNA revealed two new superhelical tandem repeat architectures capable of wrapping around the double helix in unique ways. Unlike DNA sequence recognition motifs that rely mainly on major groove read-out, MTERF and ALK motifs locate target sequences and aberrant nucleotides within DNA by resculpting the double-helix through extensive backbone contacts. Comparisons between MTERF and ALK repeats, together with recent advances in ssRNA recognition by Pumilio/FBF (PUF) domains, provide new insights into the fundamental principles of protein-nucleic acid recognition.

2.
An exhaustive search of the crystal structure of beta-chitin was carried out by simultaneously optimizing all the structural parameters based on published X-ray diffraction data and stereochemical criteria. The most probable structure was characterized by a parallel-up chain polarity, a gg orientation of hydroxymethyl groups and an intermolecular hydrogen bond along the a-axis, which essentially reproduced the original structure proposed by Gardner and Blackwell. The proposed crystal structure was subsequently subjected to crystal modeling using the AMBER force field. The probable orientations of the hydroxyl groups and their motional behaviors are proposed based on calculations for the crystal models identified. Solvated crystal models exhibited a slightly deformed structure with the formation of appreciable numbers of hydrogen bonds along the b-axis.

3.
An algorithm for approximate tandem repeats.
A perfect single tandem repeat is defined as a nonempty string that can be divided into two identical substrings, e.g., abcabc. An approximate single tandem repeat is one in which the substrings are similar, but not identical, e.g., abcdaacd. In this paper we consider two criteria of similarity: the Hamming distance (k mismatches) and the edit distance (k differences). For a string S of length n and an integer k our algorithm reports all locally optimal approximate repeats, r = ūû, for which the Hamming distance of ū and û is at most k, in O(nk log (n/k)) time, or all those for which the edit distance of ū and û is at most k, in O(nk log k log (n/k)) time. This paper concentrates on a more general type of repeat called multiple tandem repeats. A multiple tandem repeat in a sequence S is a (periodic) substring r of S of the form r = u^a u', where u is a prefix of r and u' is a prefix of u. An approximate multiple tandem repeat is a multiple repeat with errors; the repeated subsequences are similar but not identical. We precisely define approximate multiple repeats, and present an algorithm that finds all repeats that concur with our definition. The time complexity of the algorithm, when searching for repeats with up to k errors in a string S of length n, is O(nka log (n/k)), where a is the maximum number of periods in any reported repeat. We present some experimental results concerning the performance and sensitivity of our algorithm. The problem of finding repeats within a string is a computational problem with important applications in the field of molecular biology. Both exact and inexact repeats occur frequently in the genome, and certain repeats occurring in the genome are known to be related to human diseases.
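The definition of an approximate single tandem repeat under the Hamming distance can be illustrated with a brute-force sketch. This is not the algorithm of the abstract, which achieves O(nk log (n/k)) time; it simply checks every candidate substring, and the function names and the `min_period` parameter are hypothetical choices made for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def approx_tandem_repeats(s: str, k: int, min_period: int = 1):
    """Brute-force search for approximate single tandem repeats.

    Returns a list of (start, period, distance) tuples, one for every
    substring s[start:start + 2*period] whose two halves differ in at
    most k positions. Illustrative only: the running time is cubic in
    len(s), unlike the algorithm described in the abstract.
    """
    n = len(s)
    hits = []
    for period in range(min_period, n // 2 + 1):
        for start in range(0, n - 2 * period + 1):
            left = s[start:start + period]
            right = s[start + period:start + 2 * period]
            d = hamming(left, right)
            if d <= k:
                hits.append((start, period, d))
    return hits
```

With k = 0 this reduces to finding perfect repeats such as abcabc; with k = 1 and `min_period=4` the example abcdaacd from the abstract is reported as a single repeat whose halves differ in one position.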

4.
The MUC6 mucin was originally isolated from stomach mucus and is one of the major secreted mucins of the digestive tract. A full-length cDNA has not been isolated for this large molecule (greater than 15 kb) and it remains poorly studied. To circumvent the lack of reagents for investigating MUC6, we isolated a cDNA clone from a human fetal pancreatic duct cDNA library that encodes 282 amino acids of the MUC6 tandem repeat. A BLAST search with the sequence of this cDNA clone showed 90% homology with the original MUC6 (L07517) derived from a human stomach cDNA library and 95% homology with both AK096772, a MUC6-related protein isolated from a human prostate cDNA library, and the human genome project clone AC083984. The MUC6 partial cDNA clone isolated from fetal pancreas was inserted into an epitope-tagged MUC1 mucin molecule in place of the native tandem repeat. This chimeric mucin was expressed in human pancreatic (Panc1) and colon (Caco2) carcinoma cell lines and purified for analysis of O-glycosylation by fast atom bombardment mass spectrometry (FAB-MS). The FAB-MS spectra showed O-glycans that had been detected previously on chimeric mucins carrying different tandem repeats, though the spectra for MUC1F/6TR mucins expressed in the Panc1 and Caco2 cells were very different. There was a paucity of O-glycosylation in Panc1 cells in comparison to Caco2 cells where many more structures were evident, and the most abundant glycans in Panc1 cells were sialylated.

5.

Background

Tandem repetition of structural motifs in proteins is frequently observed across all forms of life. The topology of the repeating unit and its frequency of occurrence are associated with a wide range of structural and functional roles in diverse proteins, and defects in repeat proteins have been associated with a number of diseases. It is thus desirable to accurately identify the specific repeat type and its copy number. Weak evolutionary constraints on repeat units and insertions/deletions between them make their identification difficult at the sequence level, and structure-based approaches are desired. The proposed graph spectral approach represents the protein structure as a graph in order to detect one of the most frequently observed structural repeats, the Ankyrin repeat.

Results

It has been shown in a large number of studies that the 3-dimensional topology of a protein structure is well captured by a graph, making it possible to analyze a complex protein structure as a mathematical entity. In this study we show that the eigen spectra profile of a protein structure graph exhibits a unique repetitive profile for contiguous repeating units, enabling the detection of the repeat region and the repeat type. The proposed approach uses a non-redundant set of 58 Ankyrin proteins to define rules for the detection of Ankyrin repeat motifs. It is evaluated on a set of 370 proteins comprising 125 known Ankyrin proteins and the remaining non-solenoid proteins, and the predictions are compared with the UniProt annotation, a sequence-based approach (RADAR), and a structure-based approach (ConSole). To show the efficacy of the approach, we analyzed the complete PDB structural database and identified 641 previously unrecognized Ankyrin repeat proteins. We observe a unique eigen spectra profile for different repeat types and show that the method can be easily extended to detect other repeat types. It is implemented as a web server, AnkPred. It is freely available at ‘bioinf.iiit.ac.in/AnkPred’.

Conclusions

AnkPred provides an elegant and computationally efficient graph-based approach for detecting Ankyrin structural repeats in proteins. By analyzing the eigen spectra of the protein structure graph and secondary structure information, characteristic features of a known repeat family are identified. This method is especially useful in correctly identifying new members of a repeat family.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0440-9) contains supplementary material, which is available to authorized users.

6.
In vivo glycosylation of mucin tandem repeats.
The biochemical and biophysical properties of mucins are largely determined by extensive O-glycosylation of serine- and threonine-rich tandem repeat (TR) domains. In a number of human diseases aberrant O-glycosylation is associated with variations in the properties of the cell surface-associated and secreted mucins. To evaluate in vivo the O-glycosylation of mucin TR domains, we generated recombinant chimeric mucins with TR sequences from MUC2, MUC4, MUC5AC, or MUC5B, which were substituted for the native TRs of epitope-tagged MUC1 protein (MUC1F). These hybrid mucins were extensively O-glycosylated and showed the expected association with the cell surface and release into culture media. The presence of different TR domains within the chimeric mucins appears to have limited influence on their posttranslational processing. Alterations in glycosylation were detailed by fast atom bombardment mass spectrometry and reactivity with antibodies against particular blood-group and tumor-associated carbohydrate antigens. Future applications of these chimeras will include investigations of mucin posttranslational modification in the context of disease.

7.
8.
Comment on: Law MJ, et al. Cell 2010; 143:367-78.

9.
Force-driven conformational changes provide a broad basis for protein extensibility, and multidomain proteins broaden the possibilities further by allowing for a multiplicity of forcibly extended states. Red cell spectrin is prototypical in being an extensible, multidomain protein widely recognized for its contribution to erythrocyte flexibility. Atomic force microscopy has already shown that single repeats of various spectrin family proteins can be forced to unfold reversibly under extension. Recent structural data indicate, however, that the linker between triple-helical spectrin repeats is often a contiguous helix, thus raising questions as to what the linker contributes and what defines a domain mechanically. We have examined the extensible unfolding of red cell spectrins as monomeric constructs of just two, three, or four repeats from the actin-binding ends of both alpha- and beta-chains, i.e., alpha(18-21) and beta(1-4) or their subfragments. In addition to single repeat unfolding evident in sawtooth patterns peaked at relatively low forces (<50 pN at 1 nm/ms extension rates), tandem repeat unfolding is also demonstrated in ensemble-scale analyses of thousands of atomic force microscopy contacts. Evidence for extending two chains and loops is provided by force versus length scatterplots which also indicate that tandem repeat unfolding occurs at a significant frequency relative to single repeat unfolding. Cooperativity in forced unfolding of spectrin is also clearly demonstrated by a common force scale for the unfolding of both single and tandem repeats.

10.
Interactions mediated by short linear motifs in proteins play major roles in regulation of cellular homeostasis since their transient nature allows for easy modulation. We are still far from a full understanding and appreciation of the complex regulation patterns that can be, and are, achieved by this type of interaction. The fact that many linear-motif-binding domains occur in tandem repeats in proteins indicates that their mutual communication is used extensively to obtain complex integration of information toward regulatory decisions. This review attempts to survey and classify the different ways in which two or more tandem repeats cooperate in binding to their targets, in the well-characterized family of WW domains and their corresponding polyproline ligands.

11.
Finding approximate tandem repeats in genomic sequences.
An efficient algorithm is presented for detecting approximate tandem repeats in genomic sequences. The algorithm is based on a flexible statistical model which allows a wide range of definitions of approximate tandem repeats. The ideas and methods underlying the algorithm are described and its effectiveness on genomic data is demonstrated.

12.
Joh Yoonsung, Lee Kangbae, Kim Hyunwoo, Park Heejin. BMC Bioinformatics 2023, 24(1):1-21
A cell exhibits a variety of responses to internal and external cues. These responses are possible, in part, due to the presence of an elaborate gene regulatory network (GRN) in every single cell. In the past 20 years, many groups worked on reconstructing the topological structure of GRNs from large-scale gene expression data using a variety of inference algorithms. Insights gained about participating players in GRNs may ultimately lead to therapeutic benefits. Mutual information (MI) is a widely used metric within this inference/reconstruction pipeline as it can detect any correlation (linear and non-linear) between any number of variables (n-dimensions). However, the use of MI with continuous data (for example, normalized fluorescence intensity measurement of gene expression levels) is sensitive to data size, correlation strength and underlying distributions, and often requires laborious and, at times, ad hoc optimization. In this work, we first show that estimating MI of a bi- and tri-variate Gaussian distribution using k-nearest neighbor (kNN) MI estimation results in significant error reduction as compared to commonly used methods based on fixed binning. Second, we demonstrate that implementing the MI-based kNN Kraskov–Stögbauer–Grassberger (KSG) algorithm leads to a significant improvement in GRN reconstruction for popular inference algorithms, such as Context Likelihood of Relatedness (CLR). Finally, through extensive in-silico benchmarking we show that a new inference algorithm CMIA (Conditional Mutual Information Augmentation), inspired by CLR, in combination with the KSG-MI estimator, outperforms commonly used methods. Using three canonical datasets containing 15 synthetic networks, the newly developed method for GRN reconstruction, which combines CMIA and the KSG-MI estimator, achieves an improvement of 20–35% in precision-recall measures over the current gold standard in the field. This new method will enable researchers to discover new gene interactions or better choose gene candidates for experimental validations.
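The kNN-based MI estimation mentioned above can be sketched for the bivariate case. The code below is a minimal pure-Python version of the first Kraskov–Stögbauer–Grassberger estimator, assuming continuous data without ties; it is not the authors' implementation, and the names `ksg_mi`, `digamma_int` and `gaussian_mi` are hypothetical. The closed-form MI of a bivariate Gaussian, -0.5 ln(1 - rho^2), is included only as a reference value for comparison.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """Digamma function at a positive integer: psi(n) = -gamma + H_(n-1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def gaussian_mi(rho: float) -> float:
    """Exact MI (in nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * math.log(1.0 - rho * rho)


def ksg_mi(xs, ys, k: int = 3) -> float:
    """KSG estimator (variant 1) of mutual information in nats.

    O(N^2 log N) sketch: for each point, find the max-norm distance to
    its k-th nearest neighbour in the joint space, then count how many
    points lie strictly within that distance along x and along y.
    """
    n = len(xs)
    if n != len(ys) or not 1 <= k < n:
        raise ValueError("need equal-length samples and 1 <= k < N")
    total = 0.0
    for i in range(n):
        joint = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
            for j in range(n) if j != i
        )
        eps = joint[k - 1]  # distance to the k-th nearest neighbour
        nx = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - total / n
```

Unlike fixed binning, this estimator has no bin width to tune; the only free parameter is k, and small values such as 3 are a common default rather than anything prescribed by the abstract.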

13.
We consider simple lattice models for short peptide chains whose states can be exhaustively enumerated to find the lowest energy conformation. Using these exact results and numerical simulations, we compute the distributions of the mean time t_N required to find the global minimum energy state by simulated annealing (SA), as a function of N, the number of units in the chain. On the basis of scaling arguments, the time t_N to find the global minimum energy of longer chains, beyond the range covered by exhaustive enumeration, can be estimated. Given the observed exponential increase in folding time of the standard SA algorithms, it is imperative that better algorithms be found for minimizing longer chains. © 1993 John Wiley & Sons, Inc.
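Exhaustive enumeration of a short lattice chain can be sketched as follows. The abstract does not specify the lattice or the energy function, so this example assumes a two-dimensional square lattice and an HP-style energy in which each pair of non-consecutive H units on adjacent sites contributes -1; the function name `min_energy_conformation` is hypothetical.

```python
def min_energy_conformation(seq: str):
    """Enumerate all self-avoiding walks of len(seq) units on a square
    lattice and return (lowest_energy, one_conformation_achieving_it).

    Assumed energy model: -1 for every pair of 'H' units that occupy
    neighbouring lattice sites but are not consecutive along the chain.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    best = {"energy": None, "path": None}

    def energy(path):
        index_at = {site: i for i, site in enumerate(path)}
        e = 0
        for i, (x, y) in enumerate(path):
            if seq[i] != "H":
                continue
            # Look only right and up so each lattice contact is counted once.
            for dx, dy in ((1, 0), (0, 1)):
                j = index_at.get((x + dx, y + dy))
                if j is not None and seq[j] == "H" and abs(i - j) > 1:
                    e -= 1
        return e

    def extend(path, occupied):
        if len(path) == len(seq):
            e = energy(path)
            if best["energy"] is None or e < best["energy"]:
                best["energy"], best["path"] = e, list(path)
            return
        x, y = path[-1]
        for dx, dy in moves:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    extend([(0, 0)], {(0, 0)})
    return best["energy"], best["path"]
```

The number of self-avoiding walks grows exponentially with chain length, which is why enumeration of this kind is limited to short chains and stochastic searches such as simulated annealing are used beyond that range.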

14.
Genes containing multiple coding mini- and microsatellite repeats are highly dynamic components of genomes. Frequent recombination events within these tandem repeats lead to changes in repeat numbers, which in turn alter the amino acid sequence of the corresponding protein. In bacteria and yeasts, the expansion of such coding repeats in cell wall proteins is associated with alterations in immunogenicity, adhesion, and pathogenesis. We hypothesized that identification of repeat-containing putative cell wall proteins in the human pathogen Aspergillus fumigatus may reveal novel pathogenesis-related elements. Here, we report that the genome of A. fumigatus contains as many as 292 genes with internal repeats. Fourteen of 30 selected genes showed size variation of their repeat-containing regions among 11 clinical A. fumigatus isolates. Four of these genes, Afu3g08990, Afu2g05150 (MP-2), Afu4g09600, and Afu6g14090, encode putative cell wall proteins containing a leader sequence and a glycosylphosphatidylinositol anchor motif. All four genes are expressed and produce variable-size mRNA encoding a discrete number of repeat amino acid units. Their expression was altered during development and in response to cell wall-disrupting agents. Deletion of one of these genes, Afu3g08990, resulted in a phenotype characterized by rapid conidial germination and reduced adherence to extracellular matrix suggestive of an alteration in cell wall characteristics. The Afu3g08990 protein was localized to the cell walls of dormant and germinating conidia. Our findings suggest that a subset of the A. fumigatus cell surface proteins may be hypervariable due to recombination events in their internal tandem repeats. This variation may provide the functional diversity in cell surface antigens which allows rapid adaptation to the environment and/or elusion of the host immune system.

15.
Genomes contain various types of repetitive sequences. They may be used as probes for detecting genome rearrangements because they are largely free from natural selection when they are located in intergenic regions. In this study, we searched for tandem repeats (TRs) in 44 prokaryotic genomes by the color-coding method and sought signs of genome rearrangements by detailed analysis of the detected TRs. We found 13,542 tandem repeats in total across the 44 prokaryotic genomes, ranging from several tens to one thousand per genome. The results of statistical analysis show that TRs tend to lie in regions of high base composition bias in some genomes. Moreover, we recognized characteristic distribution patterns of equivalent TR-pairs in 12 genomes, which are expected to indicate the occurrence of whole-genome duplication (WGD) in those genomes. It is demonstrated that TRs can indeed be used for detecting genome rearrangements. Although it is not yet clear whether WGD has occurred in prokaryotic genomes, the results of the analyses of equivalent TR-pairs in this study are thought to be evidence of WGD in these genomes.

16.
Fungal Biology Reviews 2008, 22(3-4):85-96
Coding tandem repeats are adjacent sequences that are directly repeated. The repeated units can be identical or partially degenerate. They are completely contained within a coding sequence and are composed of repeated units in which copy number does not disrupt the reading frame. They have been observed in viruses, prokaryotes and eukaryotes. The benefits offered by repeats include the modular construction of new proteins and introduction of rapidly evolving protein sequences which allow faster adaptation to new environments. Here we review the subject of tandem repeats and their relevance in fungi. Emphasis is given to repeat-containing fungal cell wall proteins and their role in generating diversity, adaptation to the environment, immunogenicity, adhesion, and pathogenesis. We describe in detail the recent studies analyzing coding tandem repeats in the model yeast Saccharomyces cerevisiae and the important human pathogens Candida albicans and Aspergillus fumigatus. Numerous unanswered questions are highlighted, providing a rich hunting ground for future research.

17.
A search for all sequences containing the centromeric (CEN) minor satellite (MiSat) or the pericentromeric (periCEN) mouse major satellite (MaSat) was conducted in the whole genome shotgun (WGS) database. The sequences were checked for the presence of known dispersed repeats using the Censor software. The presence of tandem repeats was tested using Tandem Repeat Finder (TRF). Monotonous MiSat and MaSat arrays and MaSat-to-MiSat array transitions were detected. Moreover, two other types of contacts were revealed: (1) MiSat transitions to fragments of the retroelements LINE and IAP (ERV family, intracisternal A-type particles), mainly to elements containing ORF2 and the 5′-LTR; (2) MaSat transitions to two tandem repeats with monomers 21 bp and 31 bp in size. The presence of the MiSat/IAP transition could be checked experimentally. A DNA motif common to the IAP fragments close to MiSat was isolated. IAP-specific primers were constructed, and the fragments obtained by PCR with IAP and MiSat primers were used to build a plasmid vector library. Clone n51, with the maximum length of the possible insertion (~800 bp), was selected from the library. FISH on extended chromatin fibers (fiberFISH) carried out with the n51 clone demonstrated that the main signal definitely belonged to CEN. However, signals were also detected on the chromosome arms, which could be due to the partial homology of n51 to the dispersed repeats. FiberFISH with both MiSat and n51 made it possible to measure the distances between the fragments. The previously obtained MS3 sequence has some homology to IAP and localizes to CEN. Accordingly, the regular association of MiSat with IAP retroelements was shown in silico and in situ. Together with the published data, the present findings suggest that retroelements or their fragments may be essential components of the normal centromere of higher eukaryotes.

18.
Genomic DNA contains a wide variety of repetitive sequences. In Escherichia coli, there have been several classes of repetitive sequences reported, some of which cluster as tandem repeats. We propose a novel method for analyzing symbolic sequences by two-dimensional pattern formation with color-coding. We applied this method for searching tandem repeats in the E. coli genome and found approximately 50 repeats with periods longer than 30 bases. The longest repeat has a period of 1267 bases.

19.
Stupar RM, Song J, Tek AL, Cheng Z, Dong F, Jiang J. Genetics 2002, 162(3):1435-1444
The heterochromatin in eukaryotic genomes represents gene-poor regions and contains highly repetitive DNA sequences. The origin and evolution of DNA sequences in the heterochromatic regions are poorly understood. Here we report a unique class of pericentromeric heterochromatin consisting of DNA sequences highly homologous to the intergenic spacer (IGS) of the 18S.25S ribosomal RNA genes in potato. A 5.9-kb tandem repeat, named 2D8, was isolated from a diploid potato species Solanum bulbocastanum. Sequence analysis indicates that the 2D8 repeat is related to the IGS of potato rDNA. This repeat is associated with highly condensed pericentromeric heterochromatin at several hemizygous loci. The 2D8 repeat is highly variable in structure and copy number throughout the Solanum genus, suggesting that it is evolutionarily dynamic. Additional IGS-related repetitive DNA elements were also identified in the potato genome. The possible mechanism of the origin and evolution of the IGS-related repeats is discussed. We demonstrate that potato serves as an interesting model for studying repetitive DNA families because it is propagated vegetatively, thus minimizing the meiotic mechanisms that can remove novel DNA repeats.

20.
The ability to generate tandem repeats of a DNA sequence has proven important for a large variety of studies of DNA structure and function. The most commonly used method to produce tandem repeats involves cloning of an oligomerized monomer sequence that contains asymmetric overlapping ends, but, in practice, this approach is inefficient because of the circularization of oligomers before they ligate into vector. Described here is a method that circumvents this problem by the use of two separate oligomerization reactions, each containing an initiator fragment onto which monomer polymerizes without circularization. Subsequent mixing of the two reactions permits circularization, generating a viable plasmid containing the sum of the added repeats from each reaction. A variation of this method is also demonstrated that permits the synthesis of constructs with a defined number of repeats.
