Similar literature
Found 20 similar articles (search time: 31 ms)
1.
Overlapping redundant short oligomers in DNA sequences of retroviruses and papovaviruses have been identified. For each sequence, a search procedure determines the 5% of short oligomers of the same length with the highest ratios of observed to expected occurrences, where the expectation is based on the singlet composition of the sequence. These short oligomers are referred to as compositionally-assessed redundant sequence elements (COARSEs). A pair of COARSEs overlapping by at least one base is considered to be a COARSE overlap. Most COARSE overlaps of the 7th order (overlapping septuplets) are found in long terminal repeats of retroviruses and in the regulatory control regions of papovaviruses SV40, BK and JC. Many of the 7th order COARSE overlaps in HIV-1 and SV40 are identical with regulatory elements determined experimentally. In contrast, very few of the most frequently occurring oligomer overlaps, which are defined differently from COARSE overlaps, are present in the regulatory regions of retroviruses and papovaviruses. Examining DNA sequences of other genomes by the COARSE overlap method may identify putative regulatory regions.
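
The observed-to-expected ranking described in this abstract can be illustrated with a minimal sketch. It assumes the expected count of an oligomer is the number of windows multiplied by the product of its single-base frequencies; the function name `top_overrepresented` and the `top_fraction` parameter are hypothetical and not taken from the paper, and the detection of overlaps between the selected oligomers is not shown.

```python
from collections import Counter
from math import prod


def top_overrepresented(seq, k, top_fraction=0.05):
    """Rank k-mers by observed/expected count and keep the top fraction.

    Assumption: the expected count comes from single-base (singlet)
    frequencies only, i.e. windows * product of base frequencies.
    """
    seq = seq.upper()
    n = len(seq)
    base_freq = {b: c / n for b, c in Counter(seq).items()}
    windows = n - k + 1
    # Count every overlapping window of length k.
    observed = Counter(seq[i:i + k] for i in range(windows))
    ratios = {}
    for kmer, obs in observed.items():
        expected = windows * prod(base_freq[b] for b in kmer)
        ratios[kmer] = obs / expected
    # Sort by descending ratio; ties are broken alphabetically.
    ranked = sorted(ratios, key=lambda m: (-ratios[m], m))
    keep = max(1, round(top_fraction * len(ranked)))
    return [(m, ratios[m]) for m in ranked[:keep]]


if __name__ == "__main__":
    print(top_overrepresented("ACACACACGT", k=2))
```

Only k-mers that actually occur are ranked here; whether the original procedure ranks against all 4^k possible oligomers is not stated in the abstract.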

2.
Systematic Evolution of Ligands by EXponential enrichment (SELEX) is an experimental procedure that allows extraction, from an initially random pool of oligonucleotides, of the oligomers with a desired binding affinity for a given molecular target. The procedure can be used to infer the strongest binders for a given DNA or RNA binding protein, and the highest affinity binding sequences isolated through SELEX can have numerous research, diagnostic and therapeutic applications. Recently, important new modifications of the SELEX protocol have been proposed. In particular, a modification of the standard SELEX procedure allows generating a dataset from which protein-DNA interaction parameters can be determined with unprecedented accuracy. Another variant of SELEX allows investigating interactions of a protein with nucleic-acid fragments derived from the entire genome of an organism. We review here different SELEX-based methods, with particular emphasis on the experimental design and on the applications aimed at inferring protein-DNA interactions. In addition to the experimental issues, we also review relevant methods of data analysis, as well as theoretical modeling of SELEX.  相似文献   

3.
4.
The gap between the number of known protein sequences and structures continues to widen, particularly as a result of sequencing projects for entire genomes. Recently there have been many attempts to generate structural assignments to all genes on sets of completed genomes using fold-recognition methods. We developed a method that detects false positives made by these genome-wide structural assignment experiments by identifying isolated occurrences. The method was tested using two sets of assignments, generated by SUPERFAMILY and PSI-BLAST, on 150 completed genomes. A phylogeny of these genomes was built and a parsimony algorithm was used to identify isolated occurrences by detecting occurrences that cause a gain at leaf level. Isolated occurrences tend to have high e-values, and in both sets of assignments, a sudden increase in isolated occurrences is observed for e-values >10^-8 for SUPERFAMILY and >10^-4 for PSI-BLAST. Conditions to predict false positives are based on these results. Independent tests confirm that the predicted false positives are indeed more likely to be incorrectly assigned. Evaluation of the predicted false positives also showed that the accuracy of profile-based fold-recognition methods might depend on secondary structure content and sequence length. We show that false positives generated by fold-recognition methods can be identified by considering structural occurrence patterns on completed genomes; occurrences that are isolated within the phylogeny tend to be less reliable. The method provides a new independent way to examine the quality of fold assignments and may be used to improve the output of any genome-wide fold assignment method.

5.
This paper reports a novel symbol-to-signal mapping for DNA sequences, based on the concept of categorical periodograms. A categorical periodogram is a numeric sequence with the n-th element of the sequence indicating the number of occurrences of cycles with period n in it. The period of the cycle is defined as the number of intervening events plus one. Spectral analysis studies have been conducted on Cumulative Categorical Periodogram (CCP) of 10 genes from the data set of Burset and Guigo. It is observed that the spectral signatures in CCP are functionally equivalent to the established N/3 peak in the spectrum of indicator sequences of genomes. Being a single sequence compared to four sequences in the case of indicator sequence representation, the method is claimed to be functionally equivalent, but computationally better for identification of gene coding regions in sequences.  相似文献   
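
The definition of a categorical periodogram given in this abstract can be sketched as follows, for one symbol at a time. The sketch assumes that a "cycle" is the stretch between two consecutive occurrences of the same symbol, so its period equals the difference in their positions; the function name and the `max_period` cutoff are hypothetical. How the per-symbol periodograms are combined into the cumulative version (CCP) is not specified in the abstract and is not attempted here.

```python
from collections import Counter


def categorical_periodogram(seq, symbol, max_period):
    """Return a list whose n-th element (1-based) counts cycles of period n.

    Assumption: a cycle runs between two consecutive occurrences of
    `symbol`; its period is the number of intervening symbols plus one,
    which equals the difference of the two positions.
    """
    positions = [i for i, s in enumerate(seq) if s == symbol]
    counts = Counter(b - a for a, b in zip(positions, positions[1:]))
    return [counts.get(p, 0) for p in range(1, max_period + 1)]


if __name__ == "__main__":
    # 'A' occurs at positions 0, 2, 3, 5, giving periods 2, 1, 2.
    print(categorical_periodogram("ATAATA", "A", max_period=3))
```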

6.
Frenkel S, Kirzhner V, Korol A. PLoS One 2012, 7(2): e32076
Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter--GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.  相似文献   

7.
8.
A general phosphate analysis (GPA) is developed which assays the concentration of nucleic acid oligomers and their analogs based on stoichiometric phosphorus in the sequence. The method involves complete digestion of the oligomer sample to orthophosphate using acid at high temperature and subsequent colorimetric analysis by phosphomolybdate complex formation. GPA is applied to oligomers having phosphodiester, methylphosphonate, and phosphorothioate backbone linkages. Given the absorption spectra of oligomers having these backbones, extinction coefficients are obtained and compared to other quantitative and predictive methods. In addition to sequences having the usual nucleoside residues found in naturally occurring nucleic acids, oligomers having base analog residues can be readily quantified by GPA.  相似文献   

9.
Eighteen kinds of oligodeoxyribonucleic acids have been examined by Raman spectroscopy to reveal their structures in aqueous solutions at different ionic strengths. The structures in solution were found to be highly polymorphic, depending on their sequences as well as on the salt concentrations. Under low-salt conditions a DNA oligomer assumes a unique B form within a B family, for example the Ba, Bh, B', or Bn form. Among these DNA oligomers, d(CGCG)2 showed a salt-induced Ba-Z transition, while d(GGGGCCCC)2 showed a salt-induced Bh-A transition. DNA oligomers with AA/TT sequences were found to prefer the B' form even under high-salt conditions. Comparing the structures of DNA oligomers in solution with their crystal structures suggests that the crystal structure of a DNA oligomer is very similar to its structure in high-salt solution.

10.
Subunit composition of oligomeric human von Willebrand factor   Total citations: 10 (self-citations: 0, citations by others: 10)
The oligomerization of human endothelial cell-synthesized von Willebrand factor (vWf) has been studied by gel chromatography in columns of Sephacryl S-500 and by discontinuous agarose gel electrophoresis. A quantitative recovery of high Mr vWf oligomers has been obtained after binding to a monoclonal anti-vWf-Sepharose adduct. This reagent has been used to analyze gel filtration chromatographic elution profiles of [35S]methionine-labeled culture medium and cell lysate. It was determined that high Mr oligomers are present in endothelial cell lysates as well as in the medium overlying these cells and are composed of Mr 225,000 subunits. When vWf oligomers were analyzed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis in the presence of a reducing agent, the Mr 240,000 subunit (provWf) was not observed to oligomerize beyond the dimer stage to a significant degree. Therefore, vWf oligomerization appears to be facilitated by conversion of provWf subunits to mature vWf subunits, most likely by proteolytic removal of sequences unique to the intracellular precursor.  相似文献   

11.
The occurrence frequencies of K-tuples (overlapping sequences of defined length, K) were computed from the known human genome sequences. The significance of these frequencies for the whole human genome was tested by polymerase chain reaction (PCR). A computer program based on these results was written to choose primers to amplify DNA target sequences, either of human genes or of human infectious agents. The software also gave nested primer sequences, which were used to synthesize nonradioactive probes by PCR. We applied these two methods, primer selection and nonradioactive probes, to set up very efficient PCR sets easily and quickly for work in the human genome context.

12.
We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k = 9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display the sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence alignment and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. 
This method also correctly finds the mtDNA sequences most closely related to that of the anatomically modern human (the Neanderthal, the Denisovan, and the chimp), and shows that the sequence most different from it in this dataset belongs to a cucumber.
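
A minimal sketch of the Chaos Game Representation (CGR) mentioned in this abstract is given below. It assumes a common corner convention (A bottom-left, C top-left, G top-right, T bottom-right) and bins the CGR points into a 2^k by 2^k grid, so that each cell counts one oligomer of length k; the corner layout, the function name `cgr_matrix`, and the handling of ambiguous bases are assumptions rather than details from the paper. The DSSIM image distance and the multi-dimensional scaling step are not shown.

```python
def cgr_matrix(seq, k):
    """Bin Chaos Game Representation points into a 2**k x 2**k count grid.

    Assumptions: corners A=(0,0), C=(0,1), G=(1,1), T=(1,0); any other
    character (e.g. N) restarts the walk from the centre of the square.
    """
    corners = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5
    run = 0  # number of consecutive unambiguous bases seen so far
    for base in seq.upper():
        if base not in corners:
            x, y, run = 0.5, 0.5, 0
            continue
        cx, cy = corners[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        run += 1
        if run >= k:
            # After k valid steps the cell is determined by the last k bases.
            col = min(int(x * size), size - 1)
            row = min(int(y * size), size - 1)
            grid[row][col] += 1
    return grid


if __name__ == "__main__":
    for row in cgr_matrix("ACGTACGGTTAC", k=2):
        print(row)
```

Two sequences can then be compared through their grids with any image or vector distance; which distance is appropriate is a design choice outside this sketch.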

13.
The resolution of parent-offspring conflict (POC) might sway in favour of the offspring if the parent relies on offspring-supplied information about need. Here, three hypotheses from a resolution model of POC were tested using data on sickness histories and mother-infant interactions from 24 Karo Batak women and their young children from two rural villages in North Sumatra, Indonesia. First, as predicted, offspring with greater need (measured as age and propensity to illness) tended to fuss more often. Second, as expected, observed fussiness predicted the number of suckling occurrences observed during sampling periods. Third, contrary to the prediction, the duration of fussing observed after breast-feeding occurrences was longer than the duration of the breast-feeding occurrences themselves. Parental decisions were made based on offspring-supplied information about need, but offspring failed to garner resources in excess of the parental optimum. This suggests that a POC interpretation is unnecessary to account for these results.  相似文献   

14.
Recent studies of homooligomer tracts suggest that their characteristics differ from those of random-sequence DNA. (dA).(dT) and (dG).(dC) tracts are frequent in upstream regions and in some cases have been shown to be essential for regulation. Here we examine homooligomer occurrences in non-coding and coding eukaryotic sequences, focusing on the context in which the homooligomers occur. This analysis of sequences in the junction areas yields distinct and consistent characteristics. In particular, the nucleotide interrupting a run is most frequently complementary to the run. The base next to it is most frequently identical to the one constituting the run. For A or T runs the least frequent nearest and next-to-nearest neighbors are G or C. For G or C tracts the least frequent are A or T. Complementary oligomers behave similarly. These and additional trends are strongest for run lengths greater than or equal to 3. The computations are carried out on the whole eukaryotic database of more than 4 × 10^6 nucleotides, separately for coding and non-coding regions. The same trends are evident for both groups, but are somewhat stronger for the non-coding regions. The context in which the homooligomers occur may yield some clues to DNA conformation and its biological implications.
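
The junction analysis described in this abstract can be sketched as a count of which nucleotide interrupts each homooligomer run. The sketch looks only at the 3' side of each run and uses a hypothetical function name, `run_contexts`; the next-to-nearest neighbour and the 5' side would be tallied in the same way.

```python
import re
from collections import Counter


def run_contexts(seq, min_len=3):
    """Count (run base, interrupting base) pairs on the 3' side of each run.

    A run is a maximal stretch of at least `min_len` identical bases.
    Runs that end at the end of the sequence have no interrupting base
    and are skipped.
    """
    seq = seq.upper()
    interrupting = Counter()
    # ([ACGT])\1{m,} matches a base followed by at least m more copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    for match in pattern.finditer(seq):
        base, end = match.group(1), match.end()
        if end < len(seq):
            interrupting[(base, seq[end])] += 1
    return interrupting


if __name__ == "__main__":
    print(run_contexts("AAATCCCGGGGA"))
```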

15.
In Stenico et al. (1996) we reported unusually high levels of mitochondrial diversity in the Alps. In particular, two communities of Ladin speakers appeared the most extreme European mitochondrial outliers at that time. Recently, it has been observed that some rare nucleotide substitutions occur repeatedly among those sequences, raising the possibility of systematic sequencing errors. No biological material was left from the previous study, and hence we had to sample new individuals from the same communities. Here, we present the HVSI sequence variation, along with haplogroup assignment based on restriction fragment length polymorphism (RFLP), in 20 Ladin speakers of Colle Santa Lucia. None of the new sequences displays substitutions at the sites viewed as problematic. However, Ladins still show high levels of mtDNA diversity, both within their community and with respect to other Europeans, and they can still be considered one of the main European mitochondrial outliers.  相似文献   

16.
Filizola M, Weinstein H. The FEBS Journal 2005, 272(12): 2926-2938
To achieve a structural context for the analysis of G-protein coupled receptor (GPCR) oligomers, molecular modeling must be used to predict the corresponding interaction interfaces. The task is complicated by the paucity of detailed structural data at atomic resolution, and the large number of possible modes in which the bundles of seven transmembrane (TM) segments of the interacting GPCR monomers can be packed together into dimers and/or higher-order oligomers. Approaches and tools offered by bioinformatics can be used to reduce the complexity of this task and, combined with computational modeling, can serve to yield testable predictions for the structural properties of oligomers. Most of the bioinformatics methods take advantage of the evolutionary relation that exists among GPCRs, as expressed in their sequences and measurable in the common elements of their structural and functional features. These common elements are responsible for the presence of detectable patterns of motifs and correlated mutations evident from the alignment of the sequences of these complex biological systems. The decoding of these patterns in terms of structural and functional determinants can provide indications about the most likely interfaces of dimerization/oligomerization of GPCRs. We review here the main approaches from bioinformatics, enhanced by computational molecular modeling, that have been used to predict likely interfaces of dimerization/oligomerization of GPCRs, and compare results from their application to rhodopsin-like GPCRs. A compilation of the most frequently predicted GPCR oligomerization interfaces points to specific regions of TMs 4-6.  相似文献   

17.
The G-protein coupled receptor (GPCR) superfamily fulfils various metabolic functions and interacts with a diverse range of ligands. There is a lack of sequence similarity between the six classes that comprise the GPCR superfamily. Moreover, most novel GPCRs found have low sequence similarity to other family members which makes it difficult to infer properties from related receptors. Many different approaches have been taken towards developing efficient and accurate methods for GPCR classification, ranging from motif-based systems to machine learning as well as a variety of alignment-free techniques based on the physiochemical properties of their amino acid sequences. This review describes the inherent difficulties in developing a GPCR classification algorithm and includes techniques previously employed in this area.  相似文献   

18.
The apparent translational diffusion coefficients of four 20 base pair (bp) DNA oligonucleotides with different sequences have been measured by capillary electrophoresis, using the stopped migration method. The diffusion coefficients of the four oligomers were equal within experimental error, and averaged (120 ± 10) × 10^-8 cm^2 s^-1 in 40 mM Tris-acetate-EDTA buffer at 25 °C. Since this value is nearly identical to the translational diffusion coefficient determined for a different 20-bp oligomer using other methods, the stopped migration method can accurately measure the diffusion coefficients of small DNA oligomers. The apparent diffusion coefficient of a 118-bp DNA restriction fragment was also measured by the stopped migration method. However, the observed value was approximately 25% larger than expected from other measurements, possibly because the diffusion coefficients of larger DNA molecules are somewhat dependent on the ionic strength of the solution.

19.
The emergence of thousands of crystal structures of noncoding RNA molecules indicates their structural and functional diversity. RNA function is based upon a large variety of structural elements which are specifically assembled in the folded molecules. Along with the canonical Watson-Crick base pairs, different orientations of the bases that form hydrogen-bonded non-canonical base pairs have also been observed in the available RNA structures. The frequencies of occurrence of different non-canonical base pairs in RNA indicate their important role in maintaining the overall structure and functions of RNA. There are several reports on the geometry and energetic stabilities of these non-canonical base pairs. However, their stacking geometry and stacking stability with the neighboring base pairs are not well studied. Among the different non-canonical base pairs, the G:U wobble base pair (G:U W:WC) is most frequently observed in RNA double helices. Using quantum chemical methods and available experimental data sets, we have studied the stacking geometry of dinucleotide sequences containing the G:U W:WC base pair in roll-slide parameter hyperspace for different values of twist. This study indicates that the G:U W:WC base pair can stack well with the canonical base pairs, giving rise to large interaction energy. The overall preferred stacking geometry in terms of roll, twist and slide for the eleven possible dinucleotide sequences is seen to be quite dependent on their sequences. © 2015 Wiley Periodicals, Inc. Biopolymers 103: 328–338, 2015.

20.
We analyse for each of the 20 amino acids X the statistics of spacings between consecutive occurrences of X within the well-characterized Saccharomyces cerevisiae genome. The occurrences of amino acids may exhibit near random, clustered or smoothed out behaviour, like one-dimensional stochastic processes along the protein chain. If amino acids are distributed randomly within a sequence, then they follow a Poisson process, and a histogram of the number of observations of each gap size would asymptotically follow a negative exponential distribution. The novelty of the present approach lies in the use of differential geometric methods to quantify information on sequencing of amino acids and groups of amino acids, via the sequences of intervals between their occurrences. The differential geometry arises from an information-theoretic distance function on the two-dimensional space of stochastic processes subordinate to gamma distributions, which include the random process as a special case. We find that maximum-likelihood estimates of parametric statistics show that all 20 amino acids tend to cluster, some substantially. In other words, the frequencies of short gap lengths tend to be higher and the variance of the gap lengths is greater than expected by chance. This may be because localizing amino acids with the same properties may favour secondary structure formation or transmembrane domains. Gap sizes of 1 or 2 are generally disfavoured, 1 strongly so. The only exceptions to this are Gln and Ser, as a result of poly(Gln) or poly(Ser) sequences. There are preferences for gaps of 4 and 7 that can be attributed to alpha-helices. In particular, a favoured gap of 7 for Leu is found in coiled coils. Our method contributes to the characterization of whole sequences by extracting and quantifying stable stochastic features.
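
The gap statistics discussed in this abstract can be illustrated with a rough sketch. It measures the gap between consecutive occurrences as the difference of their positions (an assumption; the paper may count gaps differently) and estimates the gamma shape parameter by the method of moments, mean squared divided by variance, instead of the maximum-likelihood fit the authors describe. A shape near 1 corresponds to the random (exponential) case, and a shape below 1 to more variance than expected by chance, that is, clustering. The function names are hypothetical.

```python
from statistics import mean, pvariance


def gaps(seq, residue):
    """Distances between consecutive occurrences of `residue` in `seq`.

    Assumption: a gap is the difference of the two positions, so
    adjacent occurrences have a gap of 1.
    """
    pos = [i for i, r in enumerate(seq) if r == residue]
    return [b - a for a, b in zip(pos, pos[1:])]


def gamma_shape_moments(gap_lengths):
    """Method-of-moments estimate of the gamma shape: mean**2 / variance.

    Note: this is a simple substitute for a maximum-likelihood fit.
    """
    if len(gap_lengths) < 2:
        raise ValueError("need at least two gaps")
    m = mean(gap_lengths)
    v = pvariance(gap_lengths)
    if v == 0:
        return float("inf")  # perfectly regular spacing
    return m * m / v


if __name__ == "__main__":
    g = gaps("LAALLAAAL", "L")
    print(g, gamma_shape_moments(g))
```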
