Similar literature
1.
Summary: Analysis of the sequence data available today, comprising more than 500,000 bases, confirms the previously observed phenomenon that there are distinct dinucleotide preferences in DNA sequences. Consistent behaviour is observed in the major sequence groups analysed here in prokaryotes, eukaryotes and mitochondria. Some doublet preferences are common to all groups and are found in most sequences of the Los Alamos Library. The patterns seen in such large data sets are very significant statistically and biologically. Since they are present in numerous and diverse nucleotide sequences, one may conclude that they confer evolutionary advantages on the organism. In eukaryotes RR and YY dinucleotides are preferred over YR and RY (where R is a purine and Y a pyrimidine). Since opposite-chain nearest-neighbour purine clashes are major determinants of DNA structure, it appears that the tight packaging of DNA in nucleosomes disfavours, in general, such (YR and RY) steric repulsion.
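
A dinucleotide preference of the kind described in this abstract is commonly expressed as an observed/expected ratio, f(xy) / (f(x) f(y)). The sketch below is only an illustration under that assumption; the function names `dinucleotide_odds` and `step_classes` are hypothetical and the abstract does not state which statistic the authors used.

```python
from collections import Counter

PURINES = set("AG")  # R = purine (A, G); everything else is treated as Y


def dinucleotide_odds(seq):
    """Observed/expected ratio f(xy) / (f(x) * f(y)) for each dinucleotide present."""
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    return {
        pair: (count / (n - 1)) / ((mono[pair[0]] / n) * (mono[pair[1]] / n))
        for pair, count in di.items()
    }


def step_classes(seq):
    """Count RR, YY, RY and YR steps (assumes the sequence holds only A, C, G, T)."""
    seq = seq.upper()

    def cls(base):
        return "R" if base in PURINES else "Y"

    return Counter(cls(a) + cls(b) for a, b in zip(seq, seq[1:]))
```

A ratio above 1 marks a doublet that occurs more often than its base composition predicts; a ratio below 1 marks an avoided doublet.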

2.
Fast algorithms for analysing sequence data are presented. An algorithm for strict homologies finds all common subsequences of length greater than or equal to 6 in two given sequences. With it, nucleic acid pieces five thousand nucleotides long can be compared in five seconds on a CDC 6600. Secondary structure algorithms generate the N most stable secondary structures of an RNA molecule, taking into account all loop contributions, and the formation of all possible base-pairs in stems, including odd pairs (G·G, C·U, etc.). They allow a typical 100-nucleotide sequence to be analysed in 10 seconds. The homology and secondary structure programs are respectively illustrated with a comparison of two phage genomes, and a discussion of Drosophila melanogaster 5S RNA folding.
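
The abstract does not describe how the strict-homology search works. One simple way to find every exact match of length 6 between two sequences is to index the 6-mers of one sequence in a hash table and scan the other. The sketch below uses that swapped-in technique; `shared_kmers` is a hypothetical name, and longer common stretches show up as runs of consecutive hits on the same diagonal.

```python
from collections import defaultdict


def shared_kmers(a, b, k=6):
    """Return sorted (i, j) pairs such that a[i:i+k] == b[j:j+k]."""
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)  # every position of each k-mer in a

    hits = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):  # .get avoids creating empty entries
            hits.append((i, j))
    return sorted(hits)
```

Building the index and scanning are both linear in the sequence lengths, plus the number of hits reported.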

3.
The number of segregating sites provides an indicator of the degree of DNA sequence variation that is present in a sample, and has been of great interest to the biological, pharmaceutical and medical professions. In this paper, we first provide linear- and expected-sublinear-time algorithms for finding all the segregating sites of a given set of DNA sequences. We also describe a data structure for tracking segregating sites in a set of sequences, such that every time the set is updated with the insertion of a new sequence or removal of an existing one, the segregating sites are updated accordingly without the need to re-scan the entire set of sequences.
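
A site is segregating when the aligned sequences carry more than one distinct base at that position. The sketch below keeps one counter per alignment column, so an insertion or removal touches only the columns of the sequence being updated. This is an assumed, deliberately simple structure (the class name `SegregatingSites` is hypothetical) and is not claimed to be the data structure from the paper.

```python
from collections import Counter


class SegregatingSites:
    """Track segregating sites of equal-length aligned sequences under updates."""

    def __init__(self, length):
        self.columns = [Counter() for _ in range(length)]

    def _check(self, seq):
        if len(seq) != len(self.columns):
            raise ValueError("sequence length does not match the alignment")

    def add(self, seq):
        self._check(seq)
        for column, base in zip(self.columns, seq):
            column[base] += 1

    def remove(self, seq):
        """Remove a sequence that was previously added."""
        self._check(seq)
        for column, base in zip(self.columns, seq):
            column[base] -= 1
            if column[base] == 0:
                del column[base]  # keep only bases that are still present

    def sites(self):
        """Indices of columns that currently hold more than one distinct base."""
        return [i for i, column in enumerate(self.columns) if len(column) > 1]
```

Each update costs time proportional to the sequence length, independent of how many sequences are already in the set.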

4.
This paper presents a simple program for interactive searching for nucleotide sequences that may code for the helix-turn-helix, zinc finger or leucine zipper motifs in proteins. The helix-turn-helix motifs are predicted using the recently published method of Dodd and Egan, while zinc fingers and leucine zippers are searched for by our original methods. DNABIND is shown to detect all four known helix-turn-helix motifs in bacteriophage lambda genes and both zinc fingers of the adr1 gene of yeast.

5.
Summary: The data from a genomic library can be sorted into the frequencies of every possible tetranucleotide in the sequence. This tabulation, a short sequence distribution, contains the frequency of occurrence of the 256 tetranucleotides and thus seems to serve as a vehicle for averaging sequence information. Two such distributions can be readily compared by correlation. Reported here are correlations (Spearman's r_s) of the distributions from all of the genomic libraries in GenBank 44.0 with sizes equal to or larger than that of Salmonella typhimurium, except for the data for mouse and humans. All of the organisms examined showed highly significant correlations between the two DNA strands (not the complementarity expected from base pairing). Of 155 comparisons between libraries, 132 showed significant correlations at the 99% confidence level. Application of the correlation coefficients as a similarity matrix clustered most organisms in a phenogram in a pattern consistent with other hypotheses. This suggests a highly conserved pattern underlying all other genetic information in cellular DNA and affecting both DNA strands, perhaps caused by interaction with conserved factors necessary for DNA packaging.
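
The comparison described above can be sketched as follows: tabulate the 256 tetranucleotide counts for each sequence, then compute Spearman's rank correlation between the two tables. This is a minimal standard-library illustration; the function names are hypothetical, ties are given average ranks (an assumption, since the abstract does not say how ties were handled), and windows containing characters other than A, C, G, T are skipped.

```python
from itertools import product

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # 256 words
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def tetra_counts(seq):
    """Counts of all 256 tetranucleotides, in TETRAMERS order."""
    counts = dict.fromkeys(TETRAMERS, 0)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # skip windows with ambiguous bases
            counts[word] += 1
    return [counts[t] for t in TETRAMERS]


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the ranks; undefined if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Comparing `tetra_counts(seq)` with `tetra_counts(reverse_complement(seq))` gives a strand-to-strand correlation of the kind the abstract reports.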

6.
7.
Hasan MS  Liu Q  Wang H  Fazekas J  Chen B  Che D 《Bioinformation》2012,8(4):203-205
Genomic Islands (GIs) are genomic regions that originally came from other organisms through a process known as Horizontal Gene Transfer (HGT). Detection of GIs plays a significant role in biomedical research, since such alien genomic regions usually contain important features, such as pathogenic genes. We have developed a user-friendly graphical user interface, Genomic Island Suite of Tools (GIST), which is a platform for scientific users to predict GIs. This software package includes five commonly used tools: AlienHunter, IslandPath, Colombo SIGI-HMM, INDeGenIUS and Pai-Ida. It also includes an optimization program, EGID, that ensembles the results of existing tools for more accurate prediction. The tools in GIST can be used either separately or sequentially. GIST also includes a download feature that facilitates collecting the input genomes automatically from the FTP server of the National Center for Biotechnology Information (NCBI). GIST was implemented in Java, and was compiled and executed on Linux/Unix operating systems. AVAILABILITY: The database is available for free at http://www5.esu.edu/cpsc/bioinfo/software/GIST.

8.
Estimation of demographic history from nucleotide sequences represents an important component of many studies in molecular ecology. For example, knowledge of a population's history can allow us to test hypotheses about the impact of climatic and anthropogenic factors. In the past, demographic analysis was typically limited to relatively simple population models, such as exponential or logistic growth. More flexible approaches are now available, including skyline-plot methods that are able to reconstruct changes in population sizes through time. This technical review focuses on these skyline-plot methods. We describe some general principles relating to sampling design and data collection. We then provide an outline of the methodological framework, which is based on coalescent theory, before tracing the development of the various skyline-plot methods and describing their key features. The performance and properties of the methods are illustrated using two simulated data sets.
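
As a rough illustration of the simplest member of this family, the classic skyline plot estimates one population size per coalescent interval: with k lineages present during an interval of length w, the estimate is w · k(k − 1)/2. The sketch below assumes all samples are taken at the same time and that the interval lengths have already been read off a genealogy. The name `classic_skyline` is hypothetical, and the estimates are in whatever units the interval lengths are scaled in.

```python
def classic_skyline(intervals):
    """Per-interval size estimates from coalescent waiting times.

    `intervals` lists the waiting times from the tips towards the root,
    so the first interval has n = len(intervals) + 1 lineages.
    """
    n = len(intervals) + 1
    estimates = []
    for index, waiting_time in enumerate(intervals):
        k = n - index  # lineages present during this interval
        estimates.append(waiting_time * (k * (k - 1) / 2))
    return estimates
```

Later skyline variants smooth these noisy per-interval values, for example by grouping adjacent intervals.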

9.
U Brinckmann  G Darai  R M Flügel 《Gene》1983,24(1):131-135
The termini of the tupaia (tree shrew) adenovirus (TAV) DNA have been sequenced. The inverted terminal repetitions (ITR) are 166 bp long containing the A + T-rich, highly conserved sequence present in all adenovirus DNAs so far analysed. An unusual feature within the TAV ITR is the presence of four sets of a conserved sequence TGACCG which occur at or near the ends of many adenovirus ITR.

10.
Summary: Complete nucleotide sequences were compared between papova viruses BKV and SV40, and the degrees of sequence divergence were compared in detail between structurally and/or functionally different segments or genes. It was shown that the rate of synonymous substitution is not only very high but also approximately uniform among different genes in these viruses, as in the eukaryotic genes examined to date. All the non-coding regions, including the intron, showed marked sequence preservation, in sharp contrast with the case of eukaryotic genes, where the large bulk of non-coding regions evolve at a rate as rapid as that of synonymous substitution. It is remarkable that a long continuous stretch of sequence including the putative VPX gene and the 5′ half of the VP2 gene showed strong homology between BKV and SV40. A close examination of the pattern of base substitutions revealed that this unusual homology was derived by recombination between the two viruses during their evolution. On the basis of the pattern of base substitutions and the bias in code word utilization, we also showed that the putative VPX gene actually could code for a functional polypeptide. In papova viruses, the 3′ terminal sequence of the VP2/3 gene overlaps with the 5′ terminal sequence of the VP1 gene. The pattern of base substitutions in the overlapping segment was examined in detail in comparison with those in the non-overlapping portions of the VP2/3 and VP1 genes. It was shown that the evolutionary mode of the overlapping genes is in good agreement with our previous prediction.

11.
12.
Some properties of exact tests for unit roots
BHARGAVA  ALOK 《Biometrika》1996,83(4):944-949

13.
14.
Computational prediction of the origin of replication is a challenging problem and of immense interest to biologists. Several methods have been proposed for identifying the replicon site for various classes of organisms. However, these methods have limited applicability, since the replication mechanism differs between organisms. We propose a correlation measure and show that it correctly predicts the origin of replication in most bacterial genomes. When applied to Methanocaldococcus jannaschii, Plasmodium falciparum apicoplast and Nicotiana tabacum plastid, this correlation-based method correctly predicts the origin of replication, whereas the generally used GC skew measure fails. Thus, this correlation-based measure is a novel and promising tool for predicting the origin of replication in a wide class of organisms. This could have important implications not only for gaining a deeper understanding of the replication machinery in higher organisms, but also for drug discovery.
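
The abstract does not define its correlation measure, so the sketch below shows only the baseline it is compared against: GC skew, (G − C)/(G + C) per window, accumulated along the genome. In many bacterial chromosomes the minimum of the cumulative curve lies near the origin. The function names and the default window size are assumptions made for illustration.

```python
def cumulative_gc_skew(seq, window=1000):
    """Cumulative sum of (G - C) / (G + C) over consecutive full windows."""
    total = 0.0
    curve = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        total += (g - c) / (g + c) if g + c else 0.0
        curve.append(total)
    return curve


def predicted_origin_window(seq, window=1000):
    """Index of the window where the cumulative skew reaches its minimum.

    The predicted origin lies near the end of that window. Raises ValueError
    if the sequence is shorter than one window.
    """
    curve = cumulative_gc_skew(seq, window)
    return min(range(len(curve)), key=curve.__getitem__)
```

This treats the sequence as linear; for a circular genome the result depends on where the sequence record starts.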

15.
16.
One can generate trajectories to simulate a system of chemical reactions using either Gillespie's direct method or Gibson and Bruck's next reaction method. Because one usually needs many trajectories to understand the dynamics of a system, performance is important. In this paper, we present new formulations of these methods that improve the computational complexity of the algorithms. We present optimized implementations, available from http://cain.sourceforge.net/, that offer better performance than previous work. There is no single method that is best for all problems. Simple formulations often work best for systems with a small number of reactions, while some sophisticated methods offer the best performance for large problems and scale well asymptotically. We investigate the performance of each formulation on simple biological systems using a wide range of problem sizes. We also consider the numerical accuracy of the direct and the next reaction method. We have found that special precautions must be taken in order to ensure that randomness is not discarded during the course of a simulation.
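
For orientation, a textbook form of Gillespie's direct method is sketched below: draw an exponential waiting time from the total propensity, then pick a reaction with probability proportional to its propensity using a linear search. This is the plain formulation, not one of the optimized variants the paper presents, and the interface (`gillespie_direct`, reactions given as propensity function plus state change) is hypothetical.

```python
import random


def gillespie_direct(state, reactions, t_end, rng):
    """Simulate one trajectory.

    state:     dict mapping species name to count
    reactions: list of (propensity_fn(state) -> float, {species: change})
    rng:       a random.Random instance
    Returns a list of (time, state snapshot) pairs.
    """
    state = dict(state)
    t = 0.0
    trajectory = [(t, dict(state))]
    while True:
        propensities = [fn(state) for fn, _ in reactions]
        total = sum(propensities)
        if total <= 0:
            break  # nothing can fire any more
        t += rng.expovariate(total)  # time to the next reaction
        if t > t_end:
            break
        threshold = rng.random() * total
        running = 0.0
        for propensity, (_, change) in zip(propensities, reactions):
            running += propensity
            if threshold < running:  # linear search for the firing reaction
                for species, delta in change.items():
                    state[species] += delta
                break
        trajectory.append((t, dict(state)))
    return trajectory
```

A usage example for simple decay, A → ∅ with rate constant 0.5 per molecule: `gillespie_direct({"A": 10}, [(lambda s: 0.5 * s["A"], {"A": -1})], 100.0, random.Random(1))`.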

17.
18.
A simple method for searching for amphipathic helices, based on estimating the correlation between the hydrophobicity distribution and a periodic function, is proposed. The method was examined in a series of proteins with known T-cell epitopes, which are mostly amphipathic helices. The predictive power of the method is discussed.
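
One way to score such a correlation is to project the mean-centred hydrophobicity profile of a window onto a sine and cosine with the α-helical period of about 100° per residue. The sketch below takes that approach as an assumption; the abstract does not give the authors' exact formula, and the Kyte-Doolittle scale, the window width and the function names are all choices made here for illustration.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def periodicity_score(window, angle_deg=100.0):
    """Amplitude of the hydrophobicity profile at the given angle per residue."""
    values = [KYTE_DOOLITTLE[residue] for residue in window]
    mean = sum(values) / len(values)
    delta = math.radians(angle_deg)
    cos_part = sum((v - mean) * math.cos(k * delta) for k, v in enumerate(values))
    sin_part = sum((v - mean) * math.sin(k * delta) for k, v in enumerate(values))
    return math.hypot(cos_part, sin_part) / len(values)


def best_window(seq, width=11):
    """Return (score, start) of the highest-scoring window of the given width."""
    return max(
        (periodicity_score(seq[i:i + width]), i)
        for i in range(len(seq) - width + 1)
    )
```

A high score means that hydrophobic and hydrophilic residues alternate with the helical period, which is the signature of an amphipathic helix.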

19.
A method of informational decomposition has been developed that allows one to reveal hidden periodicity in any symbol sequence. The informational decomposition is calculated without converting the symbol sequence into a numerical one, which facilitates finding periodicities in a symbol sequence. The method permits introducing an analog of the autocorrelation function of a symbol sequence. The method has been applied to reveal hidden periodicities in nucleotide and amino acid sequences, as well as in different poetical texts. Hidden periodicity has been detected in various genes, testifying to their quantum structure. The functional and structural role of hidden periodicity is discussed.

20.
Goal, Scope, and Background: As Life Cycle Assessment (LCA) and Input-Output Analysis (IOA) systems increase in size, computation times and memory usage can increase rapidly. The use of efficient methods of solution allows the use of a wide range of analysis techniques. Some techniques, such as Monte-Carlo analysis, may be limited if computation is too slow. Discussion of Methods: In this article, I describe algorithms that substantially reduce computation times and memory usage for solving LCA and IOA systems and performing Monte-Carlo analysis. The algorithms are based on well-established iterative methods of solving linear systems and exploit the power series expansion of the Leontief inverse. The algorithms are further enhanced by using sparse matrix algebra. Results and Discussion: The algorithms presented in this article reduce computational time and memory usage by orders of magnitude, while still retaining a high degree of accuracy. For a 3225×3225 LCA system, the algorithm reduced computation time from 70 s to 0.06 s while retaining an accuracy of 10^-3%. Storage was reduced from 166 megabytes to 1.8 megabytes. The algorithm was used to perform a Monte-Carlo analysis on the same system with 1,000 samples in 90 s. I also discuss various issues of power series convergence for general LCA and IOA systems and show that convergence will generally hold due to the mathematical structure of LCA and IOA systems. Conclusions: By exploiting the mathematical structure of LCA and IOA, iterative techniques substantially reduced the computational times required for solving LCA and IOA systems and for performing Monte-Carlo simulations. This allows more widespread implementation of analysis techniques, such as Monte-Carlo analysis, in LCA and IOA. Recommendations and Perspectives: It is suggested that algorithms such as the ones described in this article should be implemented in LCA packages. Various checks can be used to verify that computational errors are kept to a minimum.
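
The power series expansion mentioned above is (I − A)⁻¹ y = y + A y + A² y + …, which can be accumulated with repeated sparse matrix-vector products instead of an explicit inverse. The sketch below is a minimal standard-library illustration: the dict-of-rows sparse format, the function name `leontief_solve` and the stopping rule are assumptions, and the series converges only when the spectral radius of A is below 1.

```python
def leontief_solve(A, y, tol=1e-12, max_iter=10_000):
    """Approximate x = (I - A)^-1 y by summing the series y + A y + A^2 y + ...

    A: sparse matrix as {row: {col: value}}; missing entries are zero.
    y: list of floats (final demand / functional unit).
    """
    n = len(y)
    x = list(y)
    term = list(y)  # current series term, A^k y
    for _ in range(max_iter):
        term = [
            sum(value * term[col] for col, value in A.get(row, {}).items())
            for row in range(n)
        ]
        x = [total + t for total, t in zip(x, term)]
        if max(abs(t) for t in term) < tol:
            return x
    raise RuntimeError("power series did not converge within max_iter terms")
```

Only the non-zero entries of A are stored and visited, so each term costs time proportional to the number of non-zeros rather than n².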
