Similar literature
1.

Background

Chaos Game Representation (CGR) is an iterated function that bijectively maps discrete sequences into a continuous domain. As a result, discrete sequences can be the object of statistical and topological analyses otherwise reserved for numerical systems. Characteristically, CGR coordinates of substrings sharing an L-long suffix will be located within a distance of 2^(-L) of each other. In the two decades since its original proposal, CGR has been generalized beyond its original focus on genomic sequences and has been successfully applied to a wide range of problems in bioinformatics. This report explores the possibility that it can be further extended to approach algorithms that rely on discrete, graph-based representations.
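The suffix property stated above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the common DNA corner assignment A=(0,0), C=(0,1), G=(1,1), T=(1,0), a start point at the centre of the unit square, and the maximum coordinate difference as the distance measure. The names `CORNERS`, `cgr_point` and `chebyshev` are illustrative.

```python
# Assumed corner assignment for the four nucleotides (one common convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_point(seq, start=(0.5, 0.5)):
    """Return the CGR coordinate reached after reading the whole sequence."""
    x, y = start
    for base in seq:
        cx, cy = CORNERS[base]
        # Each step moves halfway towards the corner of the current symbol.
        x = (x + cx) / 2.0
        y = (y + cy) / 2.0
    return x, y


def chebyshev(p, q):
    """Largest coordinate-wise difference between two points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


# Two sequences that share the 5-long suffix "TTGCA".
a = "ACGTTGCA"
b = "TTTTTGCA"
distance = chebyshev(cgr_point(a), cgr_point(b))
print(distance <= 2 ** -5)
```

Because every step halves the gap between two trajectories that read the same symbol, L shared trailing symbols shrink any initial gap (at most 1 per coordinate) by a factor of 2^L.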

Results

The exploratory analysis described here consisted of selecting foundational string problems and refactoring them using CGR-based algorithms. We found that CGR can take the role of suffix trees and emulate sophisticated string algorithms, efficiently solving exact and approximate string matching problems such as finding all palindromes and tandem repeats, and matching with mismatches. The common feature of these problems is that they use longest common extension (LCE) queries as subtasks of their procedures, which we show to have a constant time solution with CGR. Additionally, we show that CGR can be used as a rolling hash function within the Rabin-Karp algorithm.
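One way to picture the rolling-hash idea is sketched below. It is an illustration under stated assumptions rather than the procedure from the paper: each CGR coordinate is truncated to m bits, which makes it depend only on the last m symbols, so sliding the window by one symbol is a shift plus one new leading bit. The bit assignment in `BITS` and the function name `cgr_find_all` are hypothetical.

```python
# Assumed (x, y) corner bits per nucleotide; any bijective assignment would do.
BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_find_all(text, pattern):
    """Return start positions of pattern in text using an m-bit CGR window."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    top = m - 1  # position of the most recent symbol's bit

    def push(state, base):
        ix, iy = state
        bx, by = BITS[base]
        # Halving the coordinate is a right shift; the new symbol enters at
        # the top, and the symbol that left the window falls off the bottom.
        return (ix >> 1) | (bx << top), (iy >> 1) | (by << top)

    target = (0, 0)
    for base in pattern:
        target = push(target, base)

    hits = []
    window = (0, 0)
    for i, base in enumerate(text):
        window = push(window, base)
        # Only compare once the window holds m real symbols.
        if i >= top and window == target:
            hits.append(i - top)
    return hits


print(cgr_find_all("ACGTACGTAC", "ACG"))
```

Since the m-bit coordinate pair is a bijective encoding of the m-symbol window over a four-letter alphabet, equal hashes imply equal windows in this sketch, and no verification pass is needed.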

Conclusions

The analysis of biological sequences relies on algorithmic foundations facing mounting challenges, both logistic (performance) and analytical (lack of unifying mathematical framework). CGR is found to provide the latter and to promise the former: graph-based data structures for sequence analysis operations are entailed by numerical-based data structures produced by CGR maps, providing a unifying analytical framework for a diversity of pattern matching problems.

2.
Pectins are critical polysaccharides of the cell wall that are involved in key aspects of a plant's life, including cell‐wall stiffness, cell‐to‐cell adhesion, and mechanical strength. Pectins undergo methylesterification, which affects their cellular roles. Pectin methyltransferases are believed to methylesterify pectins in the Golgi, but little is known about their identity. To date, there is only circumstantial evidence to support a role for QUASIMODO2 (QUA2)‐like proteins and an unrelated plant‐specific protein, cotton Golgi‐related 3 (CGR3), in pectin methylesterification. To add to the knowledge of pectin biosynthesis, here we characterized a close homolog of CGR3, named CGR2, and evaluated the effect of loss‐of‐function mutants and over‐expression lines of CGR2 and CGR3 in planta. Our results show that, similar to CGR3, CGR2 is a Golgi protein whose enzyme active site is located in the Golgi lumen where pectin methylesterification occurs. Through phenotypical analyses, we also established that simultaneous loss of CGR2 and CGR3 causes severe defects in plant growth and development, supporting critical but overlapping functional roles of these proteins. Qualitative and quantitative cell‐wall analytical assays of the double knockout mutant demonstrated reduced levels of pectin methylesterification, coupled with decreased microsomal pectin methyltransferase activity. Conversely, CGR2 and CGR3 over‐expression lines have markedly opposite phenotypes to the double knockout mutant, with increased cell‐wall methylesterification levels and microsomal pectin methyltransferase activity. Based on these findings, we propose that CGR2 and CGR3 are critical proteins in plant growth and development that act redundantly in pectin methylesterification in the Golgi apparatus.

3.
A new method to determine entropic profiles in DNA sequences is presented. It is based on the chaos-game representation (CGR) of gene structure, a technique which produces a fractal-like picture of DNA sequences. First, the CGR image was divided into squares 4^(-m) in size (m being the desired resolution), and the point density counted. Second, appropriate intervals were adjusted, and then a histogram of densities was prepared. Third, Shannon's formula was applied to the probability-distribution histogram, thus obtaining a new entropic estimate for DNA sequences, the histogram entropy, a measurement that goes with the level of constraints on the DNA sequence. Lastly, the entropic profile for the sequence was drawn, by considering the entropies at each resolution level, thus providing a way to summarize the complexity of large genomic regions or even entire genomes at different resolution levels. The application of the method to DNA sequences reveals that entropic profiles obtained in this way, as opposed to previously published ones, clearly discriminate between random and natural DNA sequences. Entropic profiles also show a different degree of variability within and between genomes. The results of these analyses are discussed in relation both to the genome compartmentalization in vertebrates and to the differential action of compositional and/or functional constraints on DNA sequences.
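The four steps described in this abstract can be sketched roughly as follows. This is a minimal illustration under explicit assumptions, not the published procedure: the corner assignment is one common convention, the grid has 2^m cells per side (so each cell has area 4^(-m)), and every distinct point count is treated as its own histogram interval, whereas the abstract only says that "appropriate intervals were adjusted". All function names are hypothetical.

```python
import math
from collections import Counter

# Assumed corner assignment for the four nucleotides.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return the CGR point reached after each prefix of the sequence."""
    x, y = start
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def histogram_entropy(seq, m):
    """Shannon entropy (bits) of the histogram of per-cell point counts."""
    n = 2 ** m  # cells per side, so each cell has area 4**-m
    cells = Counter()
    for x, y in cgr_points(seq):
        # Clamp guards against a coordinate rounding up to exactly 1.0.
        cells[(min(int(x * n), n - 1), min(int(y * n), n - 1))] += 1
    # Histogram of densities: how many cells hold each point count.
    densities = Counter(cells.values())
    densities[0] += n * n - len(cells)  # empty cells count as density 0
    total = n * n
    return -sum((k / total) * math.log2(k / total)
                for k in densities.values() if k > 0)


def entropic_profile(seq, max_m):
    """Histogram entropy at each resolution level 1..max_m."""
    return [histogram_entropy(seq, m) for m in range(1, max_m + 1)]


print(entropic_profile("ACGTTGCAACGGTTAACCGT", 3))
```

A sequence whose points spread evenly over the grid gives a single density class and therefore zero histogram entropy at that resolution, while uneven occupancy produces several density classes and a higher value.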

4.
Fourier transform spectroscopy in the mid-infrared (400–5,000 cm−1) (FT-IR) is being recognized as a powerful tool for analyzing the chemical composition of food, particularly the molecular architecture of food proteins. Unlike other spectroscopic techniques, it provides high-quality spectra with very small amounts of protein, in various environments, irrespective of the molecular mass. The fraction of peptide bonds in α-helical, β-pleated sheet, turns and aperiodic conformations can be accurately estimated by analysis of the amide I band (1,600–1,700 cm−1) in the mid-IR region. In addition, FT-IR measurement of secondary structure highlights the mechanism of protein aggregation and stability, making this technique of strategic importance in the food proteomic field. Examples of applications of FT-IR spectroscopy in the study of structural features of food proteins critical for nutritional and technological performance are discussed.

5.
A field experiment was carried out in order to evaluate genetic diversity of 41 rice genotypes using physiological traits and molecular markers. All the genotypes unveiled variations for crop growth rate (CGR), relative growth rate (RGR), net assimilation rate (NAR), yield per hill (Y hill−1), total dry matter (TDM), harvest index (HI), photosynthetic rate (PR), leaf area index (LAI), chlorophyll‐a and chlorophyll‐b at maximum tillering stage. The CGR values varied from 0.23 to 0.76 g cm−2 day−1. The Y hill−1 ranged from 15.91 to 92.26 g, while TDM value was in the range of 7.49 to 20.45 g hill−1. PR was found to vary from 9.40 to 22.34 µmol m−2 s−1. PR expressed positive relation with Y hill−1. Significant positive relation was found between CGR and TDM (r = 0.61**), NAR and CGR (r = 0.62**) and between TDM and NAR (r = 0.31**). High heritability was found in RGR and Y hill−1. Cluster analysis based on the traits grouped 41 rice genotypes into seven clusters. A total of 310 polymorphic loci were detected across the 20 inter‐simple sequence repeats (ISSR) markers. The UPGMA dendrogram grouped 41 rice genotypes into 11 clusters including several sub‐clusters. The Mantel test revealed positive correlation between quantitative traits and molecular markers (r = 0.41). On the basis of quantitative traits and molecular marker analyses, the parental genotypes IRBB54 with MR84, IRBB60 with MR84, Purbachi with MR263, IRBB65 with BR29, IRBB65 with Pulut Siding and MRQ74 with Purbachi could be hybridized for future breeding programs.

6.
Multiheme proteins play major roles in various biological systems. Structural information on these systems in solution is crucial to understand their functional mechanisms. However, the presence of numerous proton-containing groups in the heme cofactors and the magnetic properties of the heme iron, in particular in the oxidised state, significantly complicate the assignment of the NMR signals. Consequently, the multiheme protein superfamily is extremely under-represented in structural databases, which constitutes a severe bottleneck in the elucidation of their structural-functional relationships. In this work, we present a strategy that simplifies the assignment of the NMR signals in multiheme proteins and, concomitantly, their solution structure determination, using the triheme cytochrome PpcA from the bacterium Geobacter sulfurreducens as a model. Cost-effective isotopic labeling was used to double label (13C/15N) the protein in its polypeptide chain, with the correct folding and heme post-translational modifications. The combined analysis of 1H-13C HSQC NMR spectra obtained for labeled and unlabeled samples of PpcA allowed a straightforward discrimination between the heme cofactors and the polypeptide chain signals and their confident assignment. The results presented here will be the foundations to assist solution structure determination of multiheme proteins, which are still very scarce in the literature.

7.
Disordered or unstructured regions of proteins, while often very important biologically, can pose significant challenges for resonance assignment and three‐dimensional structure determination of the ordered regions of proteins by NMR methods. In this article, we demonstrate the application of 1H/2H exchange mass spectrometry (DXMS) for the rapid identification of disordered segments of proteins and design of protein constructs that are more suitable for structural analysis by NMR. In this benchmark study, DXMS is applied to five NMR protein targets chosen from the Northeast Structural Genomics project. These data were then used to design optimized constructs for three partially disordered proteins. Truncated proteins obtained by deletion of disordered N‐ and C‐terminal tails were evaluated using 1H‐15N HSQC and 1H‐15N heteronuclear NOE NMR experiments to assess their structural integrity. These constructs provide significantly improved NMR spectra, with minimal structural perturbations to the ordered regions of the protein structure. As a representative example, we compare the solution structures of the full length and DXMS‐based truncated construct for a 77‐residue partially disordered DUF896 family protein YnzC from Bacillus subtilis, where deletion of the disordered residues (ca. 40% of the protein) does not affect the native structure. In addition, we demonstrate that throughput of the DXMS process can be increased by analyzing mixtures of up to four proteins without reducing the sequence coverage for each protein. Our results demonstrate that DXMS can serve as a central component of a process for optimizing protein constructs for NMR structure determination. Proteins 2009. © 2009 Wiley‐Liss, Inc.

8.
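Effects of excessive nitrogen application on winter wheat productivity
Five nitrogen application levels (0, 70, 140, 210 and 280 kg N·hm−2) were established, and four levels of productivity in winter wheat (Triticum aestivum) were observed comprehensively: flag-leaf photosynthetic rate (Aleaf), canopy photosynthetic rate (Acanopy), crop growth rate (CGR) and grain yield (GY). Within the range 0-210 kg N·hm−2, Aleaf, Acanopy, CGR and GY all increased with increasing N application. When N application rose from 210 to 280 kg N·hm−2, GY did not change significantly, whereas Aleaf at the grain-filling stage, Acanopy at the flowering and grain-filling stages, and CGR from flowering to maturity decreased significantly. The overall analysis indicates that: 1) excessive N application (280 kg N·hm−2) significantly reduces Aleaf, Acanopy and CGR of winter wheat at the grain-filling stage, and thereby suppresses GY; 2) the inhibitory effect of excessive N on the photosynthetic productivity of winter wheat occurs mainly at the grain-filling stage; 3) among the four productivity indicators Aleaf, Acanopy, CGR and GY, Acanopy responds most sensitively to excessive N application.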

9.
Comprehensive knowledge of the thermophilic mechanisms of organisms whose optimum growth temperature (OGT) ranges from 50 to 80 °C plays a major role in helping to design stable proteins. How to predict whether proteins of unknown function are thermophilic is a long-standing problem that has not been satisfactorily resolved. Chaos game representation (CGR) can investigate hidden patterns in protein sequences, and can also visually reveal their previously unknown structures. In this paper, using the general form of pseudo amino acid composition to represent protein samples, we propose a novel method for mapping a protein sequence to a CGR picture using the CGR algorithm. A 24-dimensional vector extracted from these CGR segments and the first two PCA features are used to classify thermophilic and mesophilic proteins with a Support Vector Machine (SVM). Our method is evaluated by the jackknife test. For the 24-dimensional vector, the accuracy is 0.8792 and the Matthews Correlation Coefficient (MCC) is 0.7587. The 26-dimensional vector obtained by hybridizing with the PCA components performs very satisfactorily, with an accuracy of 0.9944 and an MCC of 0.9888. The results show the effectiveness of the new hybrid method.
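For reference, the two scores quoted in this abstract have standard definitions for a binary classifier. The abstract does not give the confusion-matrix counts, so the symbols below (TP, TN, FP, FN for true/false positives and negatives) are generic notation rather than values from the study.

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},
\qquad
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}
{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
```

MCC ranges from −1 to 1, with 0 corresponding to chance-level prediction, which is why it is usually reported alongside accuracy when the two classes may be unbalanced.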

10.
The identification of circulating biomarkers holds great potential for non-invasive approaches in early diagnosis and prognosis, as well as for the monitoring of therapeutic efficiency [1-3]. The circulating low molecular weight proteome (LMWP), composed of small proteins shed from tissues and cells or peptide fragments derived from the proteolytic degradation of larger proteins, has been associated with the pathological condition in patients and likely reflects the state of disease [4,5]. Despite these potential clinical applications, the use of Mass Spectrometry (MS) to profile the LMWP from biological fluids has proven to be very challenging due to the large dynamic range of protein and peptide concentrations in serum [6]. Without sample pre-treatment, some of the more highly abundant proteins obscure the detection of low-abundance species in serum/plasma. Current proteomic-based approaches, such as two-dimensional polyacrylamide gel-electrophoresis (2D-PAGE) and shotgun proteomics methods, are labor-intensive, low throughput and offer limited suitability for clinical applications [7-9]. Therefore, a more effective strategy is needed to isolate LMWP from blood and allow the high throughput screening of clinical samples. Here, we present a fast, efficient and reliable multi-fractionation system based on mesoporous silica chips to specifically target and enrich LMWP [10,11]. Mesoporous silica (MPS) thin films with tunable features at the nanoscale were fabricated using the triblock copolymer template pathway. Using different polymer templates and polymer concentrations in the precursor solution, various pore size distributions, pore structures, connectivity and surface properties were determined and applied for selective recovery of low mass proteins. The selective parsing of the enriched peptides into different subclasses according to their physicochemical properties will enhance the efficiency of recovery and detection of low abundance species.
In combination with mass spectrometry and statistical analysis, we demonstrated the correlation between the nanophase characteristics of the mesoporous silica thin films and the specificity and efficacy of low mass proteome harvesting. The results presented herein reveal the potential of this nanotechnology-based approach to provide a powerful alternative to conventional methods for LMWP harvesting from complex biological fluids. Because of the ability to tune the material properties, the capability for low-cost production, the simplicity and rapidity of sample collection, and the greatly reduced sample requirements for analysis, this novel nanotechnology will substantially impact the field of proteomic biomarker research and clinical proteomic assessment.

11.
Saccharomyces cerevisiae CGR1 encodes a 120-amino acid protein with a predominant nucleolar localization. In this study we report the identification and cloning of the ortholog, cgrA, from Aspergillus nidulans. The cgrA gene is comprised of three exons on A. nidulans Chromosome 7. The cDNA contains a single open reading frame (ORF) that would encode a protein of 114 amino acids with 44% sequence identity to yeast Cgr1p. A plasmid expressing cgrA complemented the impaired growth phenotype of a yeast strain that can be inducibly depleted of CGR1, and a green fluorescent protein (GFP)-tagged CgrA protein had the same nucleolar localization as the corresponding yeast protein. These results identify cgrA as the A. nidulans ortholog of yeast CGR1 and suggest evolutionary conservation of nucleolar localization mechanisms used by these proteins. Received: 14 September 2000 / Accepted: 13 November 2000

12.
An experiment is presented to determine 3J(HN,Hα) coupling constants, with significant advantages for applications to unfolded proteins. The determination of coupling constants for the peptide chain using 1D 1H, or 2D and 3D 1H-15N correlation spectroscopy is often hampered by extensive resonance overlap when dealing with flexible, disordered proteins. In the experiment detailed here, the overlap problem is largely circumvented by recording 1H-13C′ correlation spectra, which demonstrate superior resolution for unfolded proteins. J-coupling constants are extracted from the peak intensities in a pair of 2D spin-echo difference experiments, affording rapid acquisition of the coupling data. In an application to the cytoplasmic domain of human neuroligin-3 (hNlg3cyt) data were obtained for 78 residues, compared to 54 coupling constants obtained from a 3D HNHA experiment. The coupling constants suggest that hNlg3cyt is intrinsically disordered, with little propensity for structure.

13.
Community structure is expected to be affected by spatial heterogeneity in a landscape. We examined the spatial-scale-dependent effects of windthrow caused by a large typhoon on a forest bird community. Typhoon events of this magnitude are rare in Hokkaido, Japan, occurring only once or twice a century. To assess the “functional spatial scale” at which bird groups (community, species, body-size class, and foraging guild) specifically responded to landscape heterogeneity, the canopy gap rate (CGR, gap percentage) was evaluated at different spatial scales by varying the radius of a circular landscape sector from 100 to 500 m stepwise by 10 m. We then analysed bird community responses, in terms of species richness and abundance, to CGR. Bird species richness did not significantly depend on CGR. In contrast, abundance was significantly dependent on CGR in many groups (species, body-size class, and foraging guild). The guild-level response was clearer than the species-level response, which suggests that the integration and filtration of species traits by guild can reveal a clear response of bird abundance to the extent of canopy gaps. For example, the scale dependence of responses to disturbance clearly varied among body-size classes, where larger birds had larger functional spatial scales. These results reveal that different groups of organisms have different functional spatial scales at which they respond to habitat heterogeneity. Our results also suggest that monitoring only a small number of species could be misleading for conserving biodiversity at the landscape level.

14.
Ge L, Liu J, Zhang Y, Dehmer M. Journal of Mathematical Biology, 2019, 78(1-2): 441-463

We generalize chaos game representation (CGR) to higher dimensional spaces while maintaining its bijection, keeping the method sufficiently representative and mathematically rigorous compared with previous attempts. We first state and prove the asymptotic property of CGR and our generalized chaos game representation (GCGR) method. The prediction follows that the dissimilarity of sequences which possess identical subsequences at distinct positions is lowered exponentially with the length of the identical subsequence; this effect had been taking place unbeknownst to researchers. By shining a spotlight on it now, we show that the effect fundamentally supports (G)CGR as a similarity measure or feature extraction technique. We develop two feature extraction techniques: GCGR-Centroid and GCGR-Variance. We use GCGR-Centroid to analyze the similarity between protein sequences using the datasets of 9 ND5, 24 TF and 50 beta-globin proteins. We obtain results consistent with previous studies, which supports the significance of the method. Finally, using support vector machines, we train an anticancer peptide prediction model with both GCGR-Centroid and GCGR-Variance, and achieve a significantly higher prediction performance on 3 well-studied anticancer peptide datasets.

15.
Visualizing the structure and dynamics of proteins, supramolecular assemblies, and cellular components is often key to our understanding of biological function. Here, we focus on the major approaches in imaging, analyzing, and processing biomedical data ranging from the atomic to the macro scale. Relevant biomedical applications at different length scales are chosen to illustrate and discuss the various aspects of data acquisition using multiple modalities including electron microscopy and scanning force microscopy. Moreover, powerful scientific software is presented for processing, analyzing, and visualizing heterogeneous data. Examples of using this software in the context of visualizing biological nano-machines are presented and discussed.

16.
Two experiments are presented that yield amino acid type identification of individual residues in a protein by editing the 1H-15N correlations into four different 2D subspectra, each corresponding to a different amino acid type class, and that can be applied to deuterated proteins. One experiment provides information on the amino acid type of the residue preceding the detected amide 1H-15N correlation, while the other gives information on the type of its own residue. Versions for protonated proteins are also presented, and in this case it is possible to classify the residues into six different classes. Both sequential and intraresidue experiments provide highly complementary information, greatly facilitating the assignment of protein resonances. The experiments will also assist in transferring the assignment of a protein to the spectra obtained under different experimental conditions (e.g. temperature, pH, presence of ligands, cofactors, etc.).

17.
Over the last few years we have developed an empirical potential function that solves the protein structure recognition problem: given the sequence for an n-residue globular protein and a collection of plausible protein conformations, including the native conformation for that sequence, identify the correct, native conformation. Having determined this potential on the basis of only some 6500 native/nonnative pairs of structures for 58 proteins, we find it recognizes the native conformation for essentially all compact, soluble, globular proteins having known native conformations in comparisons with 10^4 to 10^6 reasonable alternative conformations apiece. In this sense, the potential encodes nearly all the essential features of globular protein conformational preference. In addition it “knows” about many additional factors in protein folding, such as the stabilization of multimeric proteins, quaternary structure, the role of disulfide bridges and ligands, proproteins vs. processed proteins, and minimal strand lengths in globular proteins. Comparisons are made with other sorts of protein folding problems, and applications in protein conformational determination and prediction are discussed. © 1994 Wiley-Liss, Inc.

18.
Alignment free methods based on Chaos Game Representation (CGR), also known as sequence signature approaches, have proven of great interest for DNA sequence analysis. Indeed, they have been successfully applied for sequence comparison, phylogeny, detection of horizontal transfers or extraction of representative motifs in regulation sequences. Transposing such methods to proteins poses several fundamental questions related to representation space dimensionality. Several studies have tackled these points, but none has, so far, brought the application of CGRs to proteins to their fully expected potential. Yet, several studies have shown that techniques based on n-peptide frequencies can be relevant for proteins. Here, we investigate the effectiveness of a strategy based on the CGR approach using a fixed reverse encoding of amino acids into nucleic sequences. We first explore its relevance to protein classification into functional families. We then attempt to apply it to the prediction of protein structural classes. Our results suggest that the reverse encoding approach could be relevant in both cases. We show that it is able to classify functional families of proteins by extracting signatures close to the ProSite patterns. Applied to structural classification, the approach reaches scores of correct classification close to 84%, i.e. close to the scores of related methods in the field. Various optimizations of the approach are still possible, which open the door for future applications.

19.
20.
Experimental residual dipolar couplings (RDCs) in combination with structural models have the potential for accelerating the protein backbone resonance assignment process because RDCs can be measured accurately and interpreted quantitatively. However, this application has been limited due to the need for very high-resolution structural templates. Here, we introduce a new approach to resonance assignment based on optimal agreement between the experimental and calculated RDCs from a structural template that contains all assignable residues. To overcome the inherent computational complexity of such a global search, we have adopted an efficient two-stage search algorithm and included connectivity data from conventional assignment experiments. In the first stage, a list of strings of resonances (CA-links) is generated via exhaustive searches for short segments of sequentially connected residues in a protein (local templates), and then ranked by the agreement of the experimental 13Cα chemical shifts and 15N-1H RDCs to the predicted values for each local template. In the second stage, the top CA-links for different local templates in stage I are combinatorially connected to produce CA-links for all assignable residues. The resulting CA-links are ranked for resonance assignment according to their measured RDCs and predicted values from a tertiary structure. Since the final RDC ranking of CA-links includes all assignable residues and the assignment is derived from a “global minimum”, our approach is far less reliant on the quality of experimental data and structural templates. The present approach is validated with the assignments of several proteins, including a 42 kDa maltose binding protein (MBP) using RDCs and structural templates of varying quality. 
Since backbone resonance assignment is an essential first step for most of biomolecular NMR applications and is often a bottleneck for large systems, we expect that this new approach will improve the efficiency of the assignment process for small and medium size proteins and will extend the size limits assignable by current methods for proteins with structural models.
