Similar Articles
20 similar articles found
1.
Directed graphs of DNA sequences and their numerical characterization   Total citations: 1, self-citations: 0, citations by others: 1
In this paper we (1) introduce a directed graphical representation of DNA primary sequences; (2) describe a scheme that transforms the directed graph of a DNA sequence into an upper triangular matrix; and (3) investigate whether the existing matrix-based invariants of DNA sequences are compatible with the upper triangular matrix representation. The utility of our method is illustrated by an examination of the similarity between humans and seven other species.

2.
3.
Elucidating the molecular basis of cell lineage is a major activity both of developmental biologists and of those studying renewing systems in the adult. Given this priority, I was surprised to find how little theory has been developed for the main object of this work, the lineage diagram. Even simple questions, such as how many different diagrams are plausible, do not appear to have been addressed, so I decided it would be timely to try to do so here. The results are applied to the intestinal epithelium as an example, with special emphasis on the interpretation of recent mutation-based lineage studies.

4.
We analyze the known long natural nucleotide sequences, looking for possible homonucleotide clustering. We find strong evidence for adenine clustering starting at the A3 level. Single, isolated As and, to a lesser extent, AAs occur less frequently than expected from the base composition of the given sequence alone. A3, A4 and longer A-runs occur much more frequently than expected. The effect is quite universal and occurs with equal strength in prokaryotes and eukaryotes. Analogous thymine clustering is observed in a single genome only. Cytosines and/or guanines mildly display the reverse trend.
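A minimal sketch of how A-run clustering could be tabulated, assuming a simple null model in which bases are drawn independently with the sequence's own composition (our assumption; the abstract does not state the authors' exact statistic). Under that model a maximal run of exactly k As needs a non-A on each side, giving roughly n(1 - p)^2 p^k such runs, where p is the A frequency and boundary effects are ignored. The function names are hypothetical.

```python
from collections import Counter


def run_length_counts(seq: str, base: str = "A") -> Counter:
    """Count maximal runs of `base` by run length (e.g. 'CAAC' has one run of length 2)."""
    counts: Counter = Counter()
    run = 0
    for ch in seq.upper():
        if ch == base:
            run += 1
        else:
            if run:
                counts[run] += 1
            run = 0
    if run:  # a run that reaches the end of the sequence
        counts[run] += 1
    return counts


def expected_run_count(seq: str, k: int, base: str = "A") -> float:
    """Expected number of maximal runs of exactly length k under an independent-base
    null model using the sequence's own composition (boundary effects ignored)."""
    n = len(seq)
    p = seq.upper().count(base) / n
    return n * (1 - p) ** 2 * p ** k
```

Comparing `run_length_counts(seq)[k]` with `expected_run_count(seq, k)` for k = 1, 2, 3, ... gives the observed-versus-expected contrast the abstract describes (fewer single As than expected, more A3+ runs).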

5.
Nucleic acids are molecules of choice for both established and emerging nanoscale technologies. These technologies benefit from large functional densities of ‘DNA processing elements’ that can be readily manufactured. To achieve the desired functionality, polynucleotide sequences are currently designed by a laborious process of filtering potential candidates against a series of requirements and parameters. Here, we present a completely novel methodology for the rapid rational design of large sets of DNA sequences. This method allows very complex and detailed requirements for the generated sequences to be implemented directly, thus avoiding ‘brute force’ filtering. At the same time, the generated sequences have narrow distributions of melting temperatures. The molecular part of the design process can be done without computer assistance, using an efficient ‘human engineering’ approach: drawing a single blueprint graph that represents all generated sequences. Moreover, the method eliminates the need for extensive thermodynamic calculations; the melting temperature need be calculated only once (or not at all). In addition, the isostability of the sequences is independent of the choice of a particular set of thermodynamic parameters. Applications are presented for DNA sequence designs for microarrays, universal microarray zip sequences and electron transfer experiments.

6.
In this paper, we propose a nongraphical representation for protein secondary structures. By counting the frequency of occurrence of all possible four-tuples (i.e., four-letter words) of a protein secondary structure sequence, we construct a set of 3×3 matrices for that sequence. The leading eigenvalues of these matrices are then computed and used as invariants of the protein secondary structure sequence. To illustrate the utility of our approach, we apply it to a set of real data to distinguish protein structural classes. The results indicate that the approach can be used to complement the classification of protein secondary structures.
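A sketch of the four-tuple idea under stated assumptions: a three-letter secondary-structure alphabet H/E/C (assumed), and a hypothetical grouping of the 3^4 = 81 four-letter words into nine 3×3 matrices, one per two-letter prefix, with rows and columns indexed by the third and fourth letters. The abstract does not say how the authors arrange the counts, so treat this layout as illustrative only. The leading (Perron) eigenvalue of each nonnegative matrix is found by power iteration on M + I; the shift makes the Perron root the unique dominant eigenvalue, so the iteration cannot oscillate.

```python
from itertools import product

ALPHABET = "HEC"  # assumed alphabet: helix, strand, coil


def tuple_matrices(ss: str) -> dict:
    """Hypothetical layout: map each two-letter prefix to a 3x3 matrix counting
    the (third, fourth) letters of every four-letter word with that prefix."""
    idx = {c: i for i, c in enumerate(ALPHABET)}
    mats = {a + b: [[0] * 3 for _ in range(3)] for a, b in product(ALPHABET, repeat=2)}
    for i in range(len(ss) - 3):
        word = ss[i:i + 4]
        if all(c in idx for c in word):  # skip words with unexpected symbols
            mats[word[:2]][idx[word[2]]][idx[word[3]]] += 1
    return mats


def leading_eigenvalue(m, iters: int = 1000) -> float:
    """Perron root of a nonnegative square matrix via power iteration on M + I."""
    n = len(m)
    shifted = [[m[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    v = [1.0] * n  # positive start vector keeps every iterate positive
    est = 1.0
    for _ in range(iters):
        w = [sum(shifted[i][j] * v[j] for j in range(n)) for i in range(n)]
        est = max(w)  # converges to the dominant eigenvalue of M + I
        v = [x / est for x in w]
    return est - 1.0  # undo the identity shift
```

The nine leading eigenvalues from `leading_eigenvalue(mats[p])` would then form a numeric descriptor of the sequence.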

7.
The usefulness of information-theoretic measures of the Shannon-Weaver type, when applied to molecular biological systems such as DNA or protein sequences, has been critically evaluated. It is shown that entropy can be re-expressed in dimensionless terms, thereby making it commensurate with information. Further, we have identified processes in which entropy S and information H change in opposite directions. These processes, with opposite signs for ΔS and ΔH, demonstrate that while the Second Law of Thermodynamics mandates that entropy always increases, it places no such restriction on changes in information. Additionally, we have developed equations permitting information calculations, incorporating conditional occurrence probabilities, on DNA and protein sequences. When the results of such calculations are compared for sequences of various general types, no patterns in informational content emerge. We conclude that information-theoretic calculations at the present level of sophistication do not provide any useful insights into molecular biological sequences.
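For readers unfamiliar with the quantities involved, here is a minimal sketch of the standard plug-in estimates: Shannon entropy H = -Σ p_i log2 p_i of symbol frequencies, and a first-order conditional entropy H(X_{i+1} | X_i) = H(pairs) - H(first symbols), both in bits. These are textbook definitions, not the specific equations developed in the paper, which the abstract does not give.

```python
import math
from collections import Counter


def shannon_entropy(symbols) -> float:
    """Plug-in Shannon entropy, in bits per symbol, of any sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def conditional_entropy(seq: str) -> float:
    """H(next symbol | current symbol), in bits, estimated from adjacent-pair counts
    via the chain rule H(X, Y) - H(X)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return shannon_entropy(pairs) - shannon_entropy([p[0] for p in pairs])
```

A uniform four-letter sequence such as "ACGT" has the maximal 2 bits per base, while a strictly alternating sequence such as "ACACACAC" has zero conditional entropy, since each base fully determines the next.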

8.
Hoff PD. Biometrics, 2005, 61(4): 1027-1036
This article develops a model-based approach to clustering multivariate binary data, in which the attributes that distinguish a cluster from the rest of the population may depend on the cluster being considered. The clustering approach is based on a multivariate Dirichlet process mixture model, which allows for the estimation of the number of clusters, the cluster memberships, and the cluster-specific parameters in a unified way. Such a clustering approach has applications in the analysis of genomic abnormality data, in which the development of different types of tumors may depend on the presence of certain abnormalities at subsets of locations along the genome. Additionally, such a mixture model provides a nonparametric estimation scheme for dependent sequences of binary data.
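The reason a Dirichlet process mixture can estimate the number of clusters is that its prior over partitions, the Chinese restaurant process, never fixes that number in advance. Below is a minimal sketch of drawing cluster labels from that prior only; the concentration parameter `alpha` and the function name are illustrative, and the paper's cluster-specific binary likelihood and posterior sampler are not shown.

```python
import random


def crp_assignments(n: int, alpha: float, seed: int = 0) -> list[int]:
    """Draw cluster labels for n items from a Chinese restaurant process prior."""
    rng = random.Random(seed)
    labels: list[int] = []
    sizes: list[int] = []  # current size of each cluster
    for i in range(n):
        # Item i joins existing cluster k with probability sizes[k] / (i + alpha)
        # or opens a new cluster with probability alpha / (i + alpha).
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)  # new cluster
        else:
            sizes[k] += 1
        labels.append(k)
    return labels
```

Larger `alpha` tends to produce more clusters; in a full model these prior draws would be combined with each cluster's likelihood for the observed binary vectors.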

9.
Journal of Mathematical Biology - In this paper we consider Susceptible → Infectious → Recovered (SIR) epidemics on random graphs with clustering. To incorporate group...

10.
Summary: We report the isolation of 50 independent unique sequences from a human chromosome 21 library (identification code LA21 NSO1). These sequences were individually assigned to chromosome 21 using a mouse-human somatic hybrid cell line, WAVR 4d-F94a. When these unique clones were used together as a mixture of probes for in situ hybridization to human metaphase chromosomes, they produced strong signals on chromosome 21. These unique DNA sequences should provide useful tools for structural and functional analysis of human chromosome 21. The use of these sequences for the detection of Down syndrome is discussed.

11.
12.
Alignment ambiguity is a widespread problem in molecular evolutionary studies that has received insufficient attention. Most studies ignore such regions by deleting them before analysis, even though alignment-ambiguous regions can contain useful phylogenetic and evolutionary information. The alignment ambiguity might affect only one taxon, with the region being readily alignable and phylogenetically informative across all other taxa. Alternatively, all possible alignments may consistently imply certain relationships. Because they are usually the most rapidly evolving regions, alignment-ambiguous regions might be those best able to resolve closely spaced divergences and to contribute to estimates of branch lengths, evolutionary rates and divergence times. Three methods to incorporate such regions into phylogenetic and evolutionary analyses have been devised. The multiple analysis method evaluates each plausible alignment separately and seeks areas of congruence among the resulting trees, whereas the elision method combines all plausible alignments into a single analysis. Fragment-level alignment (= fixed states, INAASE) treats the entire unalignable section as a single but highly complex multistate character. Although these methods still need refining, they are preferable to discarding large portions of hard-earned and potentially informative sequence data.
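As we understand the elision method, "combining all plausible alignments into a single analysis" is done by concatenating the alternative alignments of the same taxa end to end, so that stable sites are effectively up-weighted and unstable ones down-weighted. A minimal sketch, assuming each alignment is a hypothetical mapping from taxon name to aligned sequence:

```python
def elide(alignments: list[dict[str, str]]) -> dict[str, str]:
    """Concatenate alternative alignments of the same taxa into one data matrix."""
    taxa = alignments[0].keys()
    for aln in alignments:
        if aln.keys() != taxa:
            raise ValueError("every alignment must contain the same taxa")
        if len({len(s) for s in aln.values()}) != 1:
            raise ValueError("sequences within one alignment must have equal length")
    # Append each alternative alignment's row for a taxon, in input order.
    return {t: "".join(aln[t] for aln in alignments) for t in taxa}
```

The resulting matrix can then be passed to any standard tree-building program as one analysis.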

13.
Ray tracing is a powerful but highly computer-intensive means of generating high-quality molecular displays. A variety of simple yet effective optimization strategies are described that allow large molecular models to be ray traced on microcomputers and low-cost desktop workstations. In particular, the method of fractal clustering provides a time- and space-efficient means of spatially subdividing the molecular scene into a hierarchy of spherical bounding volumes, permitting ray-atom intersections to be determined by a form of binary search. An implementation of the algorithms, MolRay, is described, which demonstrates that large structures can be ray traced in a reasonable time on a PC or small Unix workstation. Images generated by the PC and Unix versions of MolRay are shown.
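The payoff of a bounding-sphere hierarchy is that one failed ray-sphere test discards every atom inside that sphere. The sketch below shows a generic version of this idea: a standard ray-sphere intersection plus a pruned descent of a tree whose leaves are atoms. It is not MolRay's fractal clustering, whose construction the abstract does not describe, and the `Node` type is hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    center: tuple  # (x, y, z)
    radius: float
    children: list = field(default_factory=list)  # empty list => leaf (an atom)


def ray_hits_sphere(origin, direction, center, radius):
    """Nearest t >= 0 with origin + t*direction on the sphere, else None.
    `direction` must be unit length, so the quadratic is t^2 + 2bt + c = 0."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None  # ray line misses the sphere
    s = math.sqrt(disc)
    for t in (-b - s, -b + s):  # nearer root first
        if t >= 0:
            return t
    return None  # sphere lies behind the ray origin


def nearest_atom_hit(node, origin, direction):
    """Return (t, atom) for the closest atom hit, pruning missed subtrees."""
    if ray_hits_sphere(origin, direction, node.center, node.radius) is None:
        return None
    if not node.children:
        return ray_hits_sphere(origin, direction, node.center, node.radius), node
    best = None
    for child in node.children:
        hit = nearest_atom_hit(child, origin, direction)
        if hit is not None and (best is None or hit[0] < best[0]):
            best = hit
    return best
```

With a balanced hierarchy the number of sphere tests per ray grows roughly logarithmically with the atom count, which is the sense in which the search resembles binary search.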

14.
Consensus functions and patterns in molecular sequences   Total citations: 1, self-citations: 0, citations by others: 1
In recent years, methods of consensus developed for solving problems in the social sciences have become widely used in molecular biology. We study a method of consensus originally due to Waterman et al. (Waterman, Galas and Arratia. 1984. Pattern recognition in several sequences: consensus and alignment. Bull. Math. Biol. 46, 515–527), which is used to identify patterns or features in a molecular sequence where a pattern can vary in position within a given window. We show that some well-known consensus methods of the social sciences, the median and the mean, are special cases of this method for certain choices of its parameters, and we give a precise account of the parameters for which these special cases arise. We also show that the specific parameters used in the method of Waterman et al. make their method equivalent to the median procedure, which is widely used in the social sciences.
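To make the notion of a median consensus concrete, here is a minimal sketch for the simplest case: equal-length, ungapped sequences compared by Hamming distance. There, a string minimizing the total distance to all inputs is obtained by taking the most common symbol in each column. This deliberately ignores the positional window that the method of Waterman et al. allows, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def median_consensus(seqs: list[str]) -> str:
    """Column-wise plurality string; under Hamming distance it minimizes the summed
    distance to all inputs. Ties go to the symbol seen first (Counter.most_common)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have equal length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def total_hamming(candidate: str, seqs: list[str]) -> int:
    """Sum of Hamming distances from candidate to every sequence."""
    return sum(sum(a != b for a, b in zip(candidate, s)) for s in seqs)


def brute_force_median(seqs: list[str], alphabet: str = "ACGT") -> str:
    """Exhaustive search over all strings; only feasible for tiny inputs."""
    words = ("".join(w) for w in product(alphabet, repeat=len(seqs[0])))
    return min(words, key=lambda w: total_hamming(w, seqs))
```

For example, the consensus of "ACGT", "ACGA" and "TCGT" is "ACGT", and the exhaustive search confirms that no string has a smaller total distance.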

15.
A clustering method for repeat analysis in DNA sequences   Total citations: 1, self-citations: 0, citations by others: 1
Volfovsky N, Haas BJ, Salzberg SL. Genome Biology, 2001, 2(8): research0027.1-research0027.11

Background

A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data structure has been used previously for efficient computation of exact and degenerate repeats.

Results

The resulting software tool collects all repeat classes and outputs summary statistics as well as a file containing multiple sequences (multi-FASTA) that can be used as the target of searches. Its use is demonstrated here on several complete microbial genomes, the entire Arabidopsis thaliana genome, and a large collection of rice bacterial artificial chromosome end sequences.

Conclusions

We propose a new clustering method for analysis of the repeat data captured in suffix trees. This method has been incorporated into a system that can find repeats in individual genome sequences or sets of sequences and organize those repeats into classes. It quickly and accurately creates repeat databases from small and large genomes. The associated software (RepeatFinder) should prove helpful in the analysis of repeat structure for both complete and partial genome sequences.
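The core operation behind suffix-tree repeat finding is that a repeat is a prefix shared by two suffixes of the sequence. The sketch below swaps in a simpler structure, a naively built suffix array, in which lexicographically adjacent suffixes expose those shared prefixes. It is not RepeatFinder's suffix-tree implementation, and it omits the clustering of repeats into classes; `min_len` is an illustrative parameter.

```python
def repeated_substrings(seq: str, min_len: int) -> dict[str, list[int]]:
    """Exact repeats of length >= min_len found via adjacent suffixes in a naive
    suffix array. Each repeat is reported at its full longest-common-prefix length,
    together with the sorted start positions where it was observed."""
    sa = sorted(range(len(seq)), key=lambda i: seq[i:])  # O(n^2 log n): demo only
    repeats: dict[str, set[int]] = {}
    for a, b in zip(sa, sa[1:]):
        # Longest common prefix of two lexicographically adjacent suffixes.
        k = 0
        while a + k < len(seq) and b + k < len(seq) and seq[a + k] == seq[b + k]:
            k += 1
        if k >= min_len:
            repeats.setdefault(seq[a:a + k], set()).update((a, b))
    return {r: sorted(pos) for r, pos in repeats.items()}
```

For genome-scale input, a linear-time suffix tree or suffix array construction would replace the naive `sorted` call.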

16.

Background  

The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization.

17.
18.
Spatio-temporal patterns of spikes have the advantage of representing information through their spike composition, much as words do in language. First we review models of neuronal coding; then we discuss technical aspects of detecting spatio-temporal spike patterns. Presenting data from rat hippocampus, we argue that spike trains recorded simultaneously from multiple pyramidal cells are not independent. Their hidden dependency structure can be revealed by spike 'sequences', defined as sets of neurons that fire in a specific temporal order with a certain delay between successive spikes. The only way to prove their existence in vivo is to show that they recur more often than expected by chance. We observed that 'sequences' possess 'compositional' features and that a given spike composition is time-scale invariant. We illustrate that the same neuron can be part of different 'sequences', and that 'sequences' recur in a temporally compressed fashion during slow-wave sleep. The statistical significance of 'sequences' is testable. Their biological significance is suggested by experiments in which the recurrence rates of sequences during different behavioral sessions were compared. Consistent with the 'replay hypothesis' of memory consolidation, new sequences generated during the wake state persist during the subsequent sleep. Thus, information acquired during the wake state and represented by spatio-temporal patterns of spikes may be transferred to the neocortex during sleep. Our results suggest that 'sequences' reflect the activation of specific but configurable circuits during exploratory behavior, followed by spontaneous reactivation of the same circuits during sleep. Whether the delay structure of spikes, taken as a combination, is an effective input to single downstream neurons, or whether 'sequence' components are processed in parallel pathways and evaluated independently, remains an open question.
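Showing that a 'sequence' recurs "more often than expected by chance" requires both a counting rule and a null model. The sketch below assumes a hypothetical counting rule (neurons fire in the given order, each spike within `max_gap` of the previous matched spike). It also uses one simple surrogate among many: neuron labels are shuffled across spike times, which preserves each neuron's spike count and the overall timing. Published analyses typically use more careful surrogates, such as spike-time jittering.

```python
import random


def count_sequence(spikes, order, max_gap):
    """Count occurrences of neurons in `order` (length >= 2) firing in that order,
    each matched spike within max_gap of the previous one.
    `spikes` is a list of (time, neuron) pairs."""
    spikes = sorted(spikes)
    count = 0
    for i, (t0, n0) in enumerate(spikes):
        if n0 != order[0]:
            continue
        t_prev, step = t0, 1
        for t, n in spikes[i + 1:]:
            if t - t_prev > max_gap:
                break  # spikes are sorted, so no later spike can match
            if n == order[step]:
                t_prev, step = t, step + 1
                if step == len(order):
                    count += 1
                    break
    return count


def shuffle_p_value(spikes, order, max_gap, n_shuffles=1000, seed=0):
    """Observed count and a permutation p-value from label-shuffled surrogates."""
    rng = random.Random(seed)
    observed = count_sequence(spikes, order, max_gap)
    times = [t for t, _ in spikes]
    labels = [n for _, n in spikes]
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(labels)  # keep spike times, randomize which neuron fired
        if count_sequence(list(zip(times, labels)), order, max_gap) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_shuffles + 1)
```

A small p-value indicates that the ordered triplet recurs more often than the label-shuffled surrogates allow.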

19.

Background

Polypeptides are composed of amino acids covalently linked by peptide bonds. The majority of peptide bonds in proteins are found in the trans conformation. Despite their infrequent occurrence, cis peptide bonds play a key role in protein structure and function, as well as in many significant biological processes.

Results

We perform a systematic analysis of regions in protein sequences that contain a proline cis peptide bond in order to discover non-random associations between the primary sequence and the nature of proline cis/trans isomerization. For this purpose, we employ an efficient pattern discovery algorithm that finds regular-expression-type patterns that are overrepresented (i.e., appear frequently repeated) in a set of sequences. Four types of pattern discovery are performed: (i) exact pattern discovery, (ii) pattern discovery using a chemical equivalency set, (iii) pattern discovery using a structural equivalency set and (iv) pattern discovery using certain physicochemical properties of amino acids. The extracted patterns are carefully validated using a specially implemented scoring function and a significance measure (i.e., a log-probability estimate) indicative of their specificity. The score threshold is 0.90 for the first three types of pattern discovery and 0.80 for the last. For the significance measure, all patterns yielded values in the range [-9, -31], which makes it highly unlikely that the derived patterns emerged by chance. Most of the highest-scoring patterns are consistent with previous investigations of the neighborhood of cis proline peptide bonds, and many new ones are identified. Finally, the extracted patterns are systematically compared against the PROSITE database in order to gain insight into the functional implications of cis prolyl bonds.

Conclusion

Cis patterns with matches in the PROSITE database fell mostly into two main functional clusters: family signatures and protein signatures. However, considerable propensity was also observed for targeting signals, active sites and phosphorylation sites, as well as domain signatures.
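A regular-expression-type pattern with an equivalency set lets one pattern position stand for any residue in a group. The sketch below shows how such patterns might be turned into Python regular expressions and how their support (the number of sequences matched) could be counted. The class letters and their residue groups are hypothetical and are not the chemical or structural equivalency sets used in the study; the paper's scoring function is not reproduced.

```python
import re

# Hypothetical equivalency classes, for illustration only.
CLASSES = {"a": "FWY", "h": "AILMV", "p": "KRH"}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern such as 'aPx' into a regex:
    uppercase = exact residue, lowercase = equivalency class, 'x' = any residue."""
    parts = []
    for ch in pattern:
        if ch == "x":
            parts.append(".")
        elif ch.islower():
            parts.append("[" + CLASSES[ch] + "]")
        else:
            parts.append(ch)
    return re.compile("".join(parts))


def support(pattern: str, sequences: list[str]) -> int:
    """Number of sequences containing at least one match of the pattern."""
    rx = pattern_to_regex(pattern)
    return sum(1 for s in sequences if rx.search(s))
```

For instance, "aPx" (an aromatic residue, then proline, then any residue) compiles to `[FWY]P.` and matches two of the fragments "GYPG", "AAPW", "WPA" and "FP".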

20.
Mustelidae is the largest and most diverse family in the order Carnivora. The phylogenetic relationships among its subfamilies have long been a particular focus of study. Here we are among the first to employ two new introns (4 and 7) of the nuclear beta-fibrinogen gene to clarify these enigmatic problems. In addition, two previously available nuclear data sets (IRBP exon 1 and TTR intron 1) and one mitochondrial data set (ND2) were combined and analyzed simultaneously with the newly obtained sequence data. Detailed characterization of the two intronic regions not only reveals remarkable short interspersed element (SINE) insertion events, providing a new example supporting the attractive hypothesis that attrition of an earlier retroposition may offer a suitable environment for successive retropositions by forming a "dimer-like" structure, but also demonstrates the utility of these introns in resolving mustelid phylogeny. All of our analyses confidently confirm the grouping of Mustelinae, Lutrinae, and Melinae; moreover, two clades within Mustelinae were clearly recognized, i.e., the genera Mustela and Martes. Notably, the genus Martes of Mustelinae was found to branch off first, followed by Melinae and then a clade containing Lutrinae and the genus Mustela of Mustelinae, indicating that Mustelinae is paraphyletic. In addition, Mephitinae diverges before the other mustelids and the monophyletic Procyonidae in all cases, supporting its elevation to a separate family. Additional independent genetic markers are still needed to resolve the trichotomy among Mephitinae and the other two carnivoran clades, Ailuridae and Procyonidae/non-mephitine Mustelidae.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417