Similar Articles
20 similar articles found (search time: 281 ms)
1.
Discovering simple DNA sequences by the algorithmic significance method   Total citations: 6, self-citations: 1, citations by others: 5
A new method, ‘algorithmic significance’, is proposed as a tool for discovery of patterns in DNA sequences. The main idea is that patterns can be discovered by finding ways to encode the observed data concisely. In this sense, the method can be viewed as a formal version of the Occam's Razor principle. In this paper the method is applied to discover significantly simple DNA sequences. We define DNA sequences to be simple if they contain repeated occurrences of certain ‘words’ and thus can be encoded in a small number of bits. Such a definition includes minisatellites and microsatellites. A standard dynamic programming algorithm for data compression is applied to compute the minimal encoding lengths of sequences in linear time. An electronic mail server for identification of simple sequences based on the proposed method has been installed at the Internet address pythia@anl.gov.
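The link between "simple" and "encodable in few bits" can be made concrete with a toy dynamic program. The sketch below is only an illustration. It uses a hypothetical cost model: a literal base costs 1 flag bit plus 2 bits, and a copy of an earlier occurrence (overlap allowed, as in LZ77) costs 1 flag bit plus log2(n) bits each for position and length. It is also roughly cubic in the worst case, so it is not the linear-time algorithm described above. The names `min_encoding_bits` and `saving_bits` are our own.

```python
import math


def min_encoding_bits(seq: str, min_copy: int = 4) -> float:
    """Shortest encoding (in bits) of a DNA string under a toy cost model.

    Hypothetical cost model, for illustration only:
      * literal base: 1 flag bit + 2 bits for the nucleotide;
      * copy of a piece that occurred earlier (overlap allowed, as in LZ77):
        1 flag bit + log2(n) bits for the source position + log2(n) for the length.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    literal_cost = 1 + 2
    copy_cost = 1 + 2 * math.log2(max(n, 2))
    # best[i] = cheapest known encoding of the prefix seq[:i]
    best = [0.0] + [math.inf] * n
    for i in range(n):
        # Option 1: emit seq[i] as a literal.
        best[i + 1] = min(best[i + 1], best[i] + literal_cost)
        # Option 2: emit seq[i:j] as a copy of an earlier occurrence.
        for j in range(i + min_copy, n + 1):
            # Any match lying inside seq[:j - 1] necessarily starts before i.
            if seq.find(seq[i:j], 0, j - 1) == -1:
                break  # longer pieces cannot occur earlier either
            best[j] = min(best[j], best[i] + copy_cost)
    return best[n]


def saving_bits(seq: str) -> float:
    """Bits saved relative to the plain 2-bits-per-base code."""
    return 2 * len(seq) - min_encoding_bits(seq)
```

A microsatellite-like string such as `"CA" * 20` saves more than 60 bits. A short non-repetitive string such as `"ACGT"` has a negative saving because of the flag bits. As we understand the algorithmic-significance framework, a saving of d bits corresponds to a significance level of about 2^-d.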

2.
This paper presents a method for the multiple alignment of a sequence set. The MASH algorithm uses a non-redundant database of common motifs and an ‘alignment priority’ criterion that depends on the length and the occurrence frequency of the patterns in the set of sequences. This user-defined criterion determines the series of patterns to be aligned. The program is applied to a fragment of the envelope gene env gp120 for 20 isolates of the immunodeficiency virus. The multiplicity of alignments obtained by modifying the criterion parameters reveals different aspects of similarity between the sequences. Received on June 4, 1990; accepted on December 14, 1990
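The abstract does not give the exact formula for the ‘alignment priority’ criterion, only that it depends on motif length and occurrence frequency. The sketch below therefore assumes a hypothetical user-weighted product, length^a × (fraction of sequences containing the motif)^b. It ranks candidate motifs by that score. It skips MASH's non-redundancy step, and all function names are illustrative.

```python
def motif_support(seqs, min_len, max_len):
    """Map each word of length min_len..max_len to the number of sequences containing it."""
    support = {}
    for length in range(min_len, max_len + 1):
        for s in seqs:
            # A set, so each word counts at most once per sequence.
            for word in {s[i:i + length] for i in range(len(s) - length + 1)}:
                support[word] = support.get(word, 0) + 1
    return support


def alignment_priority(word, n_containing, n_seqs, length_weight=1.0, freq_weight=1.0):
    """Hypothetical priority: grows with motif length and occurrence frequency."""
    return len(word) ** length_weight * (n_containing / n_seqs) ** freq_weight


def rank_motifs(seqs, min_len=3, max_len=6, length_weight=1.0, freq_weight=1.0, min_support=2):
    """Motifs shared by at least min_support sequences, highest priority first."""
    support = motif_support(seqs, min_len, max_len)
    scored = [
        (alignment_priority(w, c, len(seqs), length_weight, freq_weight), w)
        for w, c in support.items()
        if c >= min_support
    ]
    scored.sort(reverse=True)
    return [w for _, w in scored]
```

Changing `length_weight` and `freq_weight` changes the order in which motifs would be used as alignment anchors. This mirrors, in a simplified way, how varying the criterion parameters yields different alignments.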

3.
When searching for local similarities in long sequences, the need for a ‘rapid’ similarity search becomes acute. The quadratic complexity of dynamic programming algorithms forces the use of filtration methods that eliminate sequences with a low level of similarity. The paper is devoted to the theoretical substantiation of the filtration method based on the statistical distance between texts. The notion of filtration efficiency is introduced and the efficiency of several filters is estimated. It is shown that the efficiency of statistical l-tuple filtration in DNA database searches is associated with a potential extension of the original four-letter alphabet and grows exponentially with increasing l. A formula for estimating the filtration parameters is presented.
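Here is a minimal sketch of l-tuple filtration. Each sequence is summarised by its counts of overlapping words of length l. Only database entries whose profile lies close to the query's are passed on to full dynamic programming. The L1 distance between count vectors is our stand-in for the paper's statistical distance, and `max_distance` is a hypothetical threshold.

```python
from collections import Counter


def ltuple_profile(seq: str, l: int) -> Counter:
    """Counts of every overlapping l-tuple (word of length l) in seq."""
    return Counter(seq[i:i + l] for i in range(len(seq) - l + 1))


def ltuple_distance(a: str, b: str, l: int) -> int:
    """L1 distance between l-tuple count vectors (a stand-in statistic)."""
    pa, pb = ltuple_profile(a, l), ltuple_profile(b, l)
    return sum(abs(pa[w] - pb[w]) for w in pa.keys() | pb.keys())


def filter_database(query: str, database: dict, l: int = 4, max_distance: int = 10) -> list:
    """Names of entries close enough to the query to deserve a full DP alignment."""
    return [name for name, seq in database.items()
            if ltuple_distance(query, seq, l) <= max_distance]
```

There are 4**l possible l-tuples. Unrelated sequences therefore share fewer of them as l grows, which offers one intuitive reading of why filter selectivity improves with l. The paper's own efficiency measure and parameter formula are not reproduced here.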

4.
Pairwise alignment is one of the most fundamental tools of bioinformatics and underpins a variety of other, more sophisticated methods of annotation. Pairwise alignment in its most rigorous form uses a method called ‘dynamic programming’, which is highly accurate but also very costly to compute. In order to align anything other than an exact alphabetic match, the algorithm has to know what it is looking for and how it can evaluate the worth of what it finds. To this end, ‘comparison matrices’ have been created which define a score for every possible match, an effective tally of how well the computational alignment is doing. The software searches for the highest score available. The final score is meaningful only together with its resulting alignment and cannot be used outside this context. In the case of DNA, comparison values
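To make the role of a comparison matrix concrete, here is a minimal global alignment scorer in the Needleman-Wunsch style. It keeps one row of the dynamic programming table at a time. The matrix values (+5 for identity, −4 otherwise) and the linear gap penalty of −6 are hypothetical choices, not values taken from the text. Time is proportional to the product of the sequence lengths, which is the cost the paragraph refers to.

```python
# Hypothetical DNA comparison matrix: +5 for identity, -4 otherwise.
BASES = "ACGT"
DNA_MATRIX = {(a, b): (5 if a == b else -4) for a in BASES for b in BASES}


def global_alignment_score(x: str, y: str, matrix=DNA_MATRIX, gap: int = -6) -> int:
    """Best global alignment score of x and y (linear gap penalty)."""
    # prev[j] = best score aligning the first i-1 letters of x with y[:j]
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * gap] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = max(
                prev[j - 1] + matrix[(x[i - 1], y[j - 1])],  # match or mismatch
                prev[j] + gap,                               # gap in y
                cur[j - 1] + gap,                            # gap in x
            )
        prev = cur
    return prev[-1]
```

For example, `global_alignment_score("ACGT", "AGT")` gives 9: three identities (+15) and one gap (−6). That number means something only in the context of this matrix and this alignment.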

5.
Here we present a performance test of a Kohonen feature map applied to the fast extraction of uncommon sequences from the coding region of the human insulin receptor gene. We used a network with 30 neurons and a variable input window. The program was aimed at detecting unique or uncommon DNA regions present in crude sequence data and was able to automatically detect the signal peptide coding regions of a set of human insulin receptor gene data. Testing this program with the HSIRPR cDNA release (EMBL data bank) indicated the presence of unique features in the signal peptide coding region. On the basis of our results, this program can automatically detect ‘singularity’ in crude sequencing data and does not require prior knowledge of the features to be found. Received on August 27, 1990; accepted on March 14, 1991

6.
We present a new algorithm for the display of RNA secondary structure. The principle of the algorithm is entirely different from those currently in use, in that our algorithm is ‘object oriented’ while current algorithms are ‘procedural’. The circular RNA molecule of chrysanthemum stunt viroid was used as input data to demonstrate the operation of the program. The main interest of this method lies in its potential use in simulation graphics of RNA folding processes. Received on October 9, 1986; accepted on February 17, 1987

7.
‘The GenBank’* nucleic acid sequence database is a computer-based collection of all published DNA and RNA sequences; it contains over five million bases in close to six thousand sequence entries drawn from four thousand five hundred published articles. Each sequence is accompanied by relevant biological annotation. The database is available on magnetic tape, on floppy diskettes, on-line, or in hardcopy form. We discuss the structure of the database, the extent of the data and the implications of the database for research on nucleic acids.

8.
A program, BIOSITE, providing for the interactive visual comparison of aligned homologous amino-acid sequences is presented, including an example of its application. The program allows two types of comparison sequence to be generated: an ‘identity’ sequence and a ‘difference’ sequence. These may be used on subsets of sequences and in further comparisons to identify candidate sites involved in a distinct functional property. The program should prove a useful tool for biologists seeking to understand sequence–function relationships.
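The abstract names an ‘identity’ and a ‘difference’ sequence without defining them, so the sketch below uses assumed definitions. The identity sequence keeps a residue wherever every aligned sequence agrees. The difference sequence marks positions where two subsets are each internally conserved but carry different residues. Both definitions and the function names are hypothetical, not BIOSITE's.

```python
def identity_sequence(aligned, blank="."):
    """Residue where all aligned sequences agree, `blank` elsewhere (assumed definition)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(col[0] if all(c == col[0] for c in col) else blank
                   for col in zip(*aligned))


def difference_sequence(group_a, group_b, blank="."):
    """Group A's residue where both groups are conserved but disagree (assumed definition)."""
    ia = identity_sequence(group_a, blank)
    ib = identity_sequence(group_b, blank)
    return "".join(a if a != blank and b != blank and a != b else blank
                   for a, b in zip(ia, ib))
```

Applied to a functionally distinct subset against the rest, a difference sequence of this kind highlights candidate sites. This matches the intended use described above.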

9.
A system for pattern matching applications on biosequences   Total citations: 5, self-citations: 0, citations by others: 5
ANREP is a system for finding matches to patterns composed of (i) spacing constraints called ‘spacers’, and (ii) approximate matches to ‘motifs’ that are, recursively, patterns composed of ‘atomic’ symbols. A user specifies such patterns via a declarative, free-format and strongly typed language called A that is presented here in a tutorial style through a series of progressively more complex examples. The sample patterns are for protein and DNA sequences, the application domain for which ANREP was specifically created. ANREP provides a unified framework for almost all previously proposed biosequence patterns and extends them by providing approximate matching, a feature heretofore unavailable except for the limited case of individual sequences. The performance of ANREP is discussed, and an appendix gives a concise specification of syntax and semantics. A portable C software package implementing ANREP is available via anonymous remote file transfer.
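The "motif, spacer, motif" structure can be sketched in a few lines. The example below is not ANREP or its language A. It is a hypothetical Python illustration that matches two motifs with at most k substitutions each (ANREP's approximate matching may also allow insertions and deletions), separated by a spacer whose length lies in a given range.

```python
def approx_positions(text, motif, max_mismatches):
    """Start positions where motif matches text with at most max_mismatches substitutions."""
    m = len(motif)
    return [i for i in range(len(text) - m + 1)
            if sum(a != b for a, b in zip(text[i:i + m], motif)) <= max_mismatches]


def match_pattern(text, motif1, spacer, motif2, k1=0, k2=0):
    """(start, end) spans of motif1, then a spacer of spacer[0]..spacer[1] letters, then motif2."""
    lo, hi = spacer
    second = set(approx_positions(text, motif2, k2))
    hits = []
    for i in approx_positions(text, motif1, k1):
        for gap in range(lo, hi + 1):
            j = i + len(motif1) + gap
            if j in second:
                hits.append((i, j + len(motif2)))
    return hits
```

With a promoter-like pattern such as `match_pattern(text, "TTGACA", (15, 19), "TATAAT", k1=1)`, one substitution is tolerated in the first motif and the spacer may be 15 to 19 letters long.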

10.
We present a fast algorithm to produce a graphic matrix representation of sequence homology. The algorithm is based on lexicographical ordering of fragments. It preserves most of the options of a simple naive algorithm with a significant increase in speed. This algorithm was the basis for a program, called DNAMAT, that has been extensively tested during the last three years at the Weizmann Institute of Science and has proven to be very useful. In addition we suggest a way to extend our approach to analyse a series of related DNA or RNA sequences, in order to determine certain common structural features. The analysis is done by ‘summing’ a set of dot-matrices to produce an overall matrix that displays structural elements common to most of the sequences. We give an example of this procedure by analysing tRNA sequences. Received on June 26, 1986; accepted on September 28, 1986
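A minimal sketch of the two ideas in this abstract follows. First, all k-letter fragments of both sequences are sorted lexicographically so that identical fragments become adjacent and matching (i, j) cells can be read off each run. Second, several dot matrices are "summed" by counting how many of them place a dot at each cell. The choice of which sequence pairs to compare (for instance, a sequence against its complement to expose stems) is left open. The tRNA analysis is not reproduced, and the function names are our own.

```python
from collections import Counter
from itertools import groupby


def dot_matrix_sorted(a, b, k):
    """Set of (i, j) with a[i:i+k] == b[j:j+k], found by sorting fragments."""
    fragments = [(a[i:i + k], 0, i) for i in range(len(a) - k + 1)]
    fragments += [(b[j:j + k], 1, j) for j in range(len(b) - k + 1)]
    fragments.sort()  # identical fragments become adjacent
    dots = set()
    for _, run in groupby(fragments, key=lambda f: f[0]):
        run = list(run)
        rows = [pos for _, src, pos in run if src == 0]
        cols = [pos for _, src, pos in run if src == 1]
        dots.update((i, j) for i in rows for j in cols)
    return dots


def summed_matrix(pairs, k):
    """'Sum' several dot matrices: how many pairs put a dot at each (i, j)."""
    total = Counter()
    for a, b in pairs:
        total.update(dot_matrix_sorted(a, b, k))
    return total
```

Cells with high counts in the summed matrix correspond to features shared by most of the compared pairs.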

11.
To many biologists, geneticists and bioinformaticians, the excitement of genomics comes from systematic analyses of large amounts of information, such as complete end-to-end DNA sequences, densely packed genetic markers on chromosomes, and sometimes comprehensive population genetics histories in places like Iceland and Finland, where extensive genealogical data are available. Also, realizing the importance of sample sizes in mapping susceptibility genes in complex and common diseases, various national and international consortia were established, and meta-analyses of pooled data were frequently used. Almost everybody agrees that ‘the more (information), the better’. The word ‘more’ is used here in two senses: one concerns the search space, and the other concerns the sample size. It is easy to understand why one would like to see

12.
Genome inhomogeneity is determined mainly by WW and SS dinucleotides   Total citations: 1, self-citations: 0, citations by others: 1
According to the hypothesis of the modular structure of DNA, genomes consist of modules of various nature which may differ in statistical characteristics. Statistical analysis helps in revealing these differences and predicting the modular structure. This raises the question of the contribution of each word of length l (l-tuple) to the inhomogeneity of genetic text. The notion of stationary (i.e. relatively evenly distributed over a genome) versus non-stationary l-tuples has been introduced previously. In this paper, the dinucleotide distributions for all long sequences from GenBank were analyzed, and it was shown that non-stationary dinucleotides are closely associated with polyW and polyS tracts (W denotes the ‘weak’ nucleotides A or T, while S stands for the ‘strong’ nucleotides G or C). Thus, genome inhomogeneity is shown to be determined mainly by the AA, TT, GG, CC, AT, TA, GC and CG dinucleotides. It has been demonstrated that neither ‘codon usage’ nor the ‘isochore model’ can account for this phenomenon.
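The eight dinucleotides listed above are exactly those built only from W letters (A, T) or only from S letters (G, C). The sketch below derives that set from the W/S definitions. It then computes the fraction of WW and SS dinucleotides in successive windows, a simple way to see polyW and polyS tracts along a sequence. The window and step sizes are hypothetical parameters, not the paper's stationarity test.

```python
from itertools import product

WEAK, STRONG = set("AT"), set("GC")


def dinucleotide_class(din):
    """'WW', 'SS' or 'mixed' for a two-letter DNA word."""
    if din[0] in WEAK and din[1] in WEAK:
        return "WW"
    if din[0] in STRONG and din[1] in STRONG:
        return "SS"
    return "mixed"


# The eight dinucleotides made only of W letters or only of S letters.
PURE_DINUCLEOTIDES = sorted(
    a + b for a, b in product("ACGT", repeat=2) if dinucleotide_class(a + b) != "mixed"
)


def window_profile(seq, window, step):
    """(start, WW fraction, SS fraction) for successive windows of seq."""
    if window < 2:
        raise ValueError("window must hold at least one dinucleotide")
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        classes = [dinucleotide_class(w[i:i + 2]) for i in range(len(w) - 1)]
        profile.append((start,
                        classes.count("WW") / len(classes),
                        classes.count("SS") / len(classes)))
    return profile
```

Windows where one of the two fractions jumps sharply point to the polyW or polyS tracts that the abstract associates with non-stationary dinucleotides.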

13.
The effects of the foliar application of phytocidal concentrations of 2-methyl-4-chlorophenoxyacetic acid (MCPA) on changes in total dry weight, and in ‘available carbohydrate’ (starch, ‘total’ and ‘reducing’ sugars), total nitrogen, phosphorus, potassium, calcium, and magnesium of ‘tops’ and roots of tomato plants have been followed over a period of 14 days following spraying. There were two main treatments, ‘nutrient’ (nutrient supply to roots continued after spraying) and ‘water’ (distilled water only supplied to roots after spraying), the sub-treatments consisting of ‘MCPA’ versus ‘no-MCPA’ for each of the main treatments. Twelve different times of sampling were used. In analysing the present data, the quantity ‘residual dry weight’ (total dry weight less ‘available carbohydrate’), which was originally introduced by Mason and Maskell as a basis of reference for analyses of plant organs in short-period experiments not involving appreciable growth, has been used as an estimate of the permanent structure of plant growth. This new use of the ‘residual dry weight’ basis has brought out important features which were obscured when the data were left in their primary form (as percentages of total dry weight or amounts per plant). Growth, as measured by increase in ‘residual dry weight’, was greatly inhibited by 2-methyl-4-chlorophenoxyacetic acid shortly after spraying, in both the presence and the absence of nutrient. In the presence of 2-methyl-4-chlorophenoxyacetic acid, net assimilation rate (estimated as rate of increase in total dry weight per gram ‘residual dry weight’ of the ‘tops’) was greatly diminished, while uptake of total nitrogen and of P₂O₅ (estimated as increase in total nitrogen or in P₂O₅ of the whole plant per day per 1 g ‘residual dry weight’ of the roots) appeared to undergo a similar but much smaller diminution.
It seemed probable, however, that in the presence of MCPA a larger proportion of the carbohydrate actually formed was utilized for synthesis of amino acids and protein. In the plant as a whole there was no evidence of actual depletion of ‘available carbohydrate’ as a result of MCPA treatment, this fraction showing a steady increase in all treatments throughout the experiment. The rate of increase was, however, much reduced by MCPA treatment. The ‘tops’ presented much the same picture as the whole plant, but for the roots the situation was quite different. While the roots of the ‘no-MCPA’ plants and also of the ‘MCPA-water’ plants showed a steady increase in available carbohydrate, those of the ‘MCPA-nutrient’ plants rose only very slightly (from the initial value of 8 mg per plant to about 10 mg) during the first 2 days, then in the next 2 days declined to a value (about 6 mg) below the initial value and remained at this low level for the rest of the experiment. It is suggested that the phytocidal effect of 2-methyl-4-chlorophenoxyacetic acid in the presence of nutrient may be due to depletion of the ‘available carbohydrate’ supplies in the roots, which is shown to be brought about partly by reduced transport from the tops and partly by the relatively greater utilization of the carbohydrate present. These results offer an explanation for the facts that plants showing vigorous growth are more easily killed by MCPA and that perennial plants, particularly those with storage tissues in their roots, are more resistant. Further, they suggest the useful practical application that MCPA treatment should be given when the carbohydrate reserves of the roots are at a minimum. For perennial plants, conditions might be expected to be optimal for the application of MCPA in late spring, at a time when the first ‘flush’ of growth is slowing down and before any appreciable new reserves of carbohydrate have been accumulated.
It was also shown that 2-methyl-4-chlorophenoxyacetic acid prevented the net synthesis of starch but still permitted an appreciable net formation of sucrose. 2-Methyl-4-chlorophenoxyacetic acid appeared to have no effect on the uptake of potassium, calcium, or magnesium. The lack of effect on potassium contrasts with the previous observation by Rhodes, Templeman, and Thruston (1950) that sub-lethal concentrations of MCPA, applied over a relatively long period to the roots of tomato plants, specifically depressed the uptake of potassium.

14.
Recent molecular systematic and developmental genetic findings have drawn attention to plant morphology as a discipline dealing with the phenotypic appearance of plant forms. However, since different terms and conceptual frameworks have evolved over a period of more than 200 years, it is reasonable to survey the history of plant morphology; this is the first of two papers with this aim. The present paper deals with the historic concepts of Troll, Zimmermann and Arber, which are based on Goethe's morphology. Included are contrasting views of ‘unity and diversity’, ‘position and process’, and ‘morphology and phylogeny’, which, in part, are basic views of current plant morphology, phylogenetic systematics and developmental genetics. Wilhelm Troll established the ‘type concept’ and the ‘principle of variable proportions’. He has provided the most comprehensive overview of the positional relations of plant forms. Agnes Arber started from the universal dynamics of life and attempted to describe all structures as processes. She paid attention to ‘repetitive branching’, ‘differential growth’, and ‘parallelism’. As a result she has recently been rediscovered by developmental botanists. Walter Zimmermann rejected any metaphysical influence on plant form and instead called for objective procedures. He was mainly interested in phylogenetic ‘character transformation’ and the ‘reconstruction of genealogical lines’. Guided by the example of flower-like inflorescences, a future paper will deal with functional and developmental constraints influencing plant forms. Recent morphological concepts (‘trialectical’, ‘continuum’/‘fuzzy’, ‘process morphology’) will be discussed and related to current morphological and developmental genetic research. Copyright 2001 Annals of Botany Company. Key words: Plant form, plant morphology, natural philosophy, homology, phylogeny, Goethe, Troll, Arber, Zimmermann, typology, character transformation, differential growth, complementarity

15.
An improved sequence handling package that runs on the Apple Macintosh   Total citations: 4, self-citations: 0, citations by others: 4
We report improvements to our sequence analysis package and its adaptation to run on the Apple Macintosh range of machines. The ‘standard’ version of the programs, which run on a VAX, has been given a new user interface that makes the programs very much easier to work with and has facilitated the move to the Macintosh. The reorganization of the code should simplify moves to other systems that offer WIMP user interfaces. In addition to a large number of small but useful extra features, some important new analytical functions have been devised. These include sequence and contig editors; optimal alignment and comparison methods; and a new method for comparing the observed and expected frequencies of selected oligonucleotides. Received on February 12, 1990; accepted on April 19, 1990
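The abstract does not describe the package's new observed-versus-expected method. As a point of reference only, the sketch below uses a common baseline: it counts overlapping occurrences of an oligonucleotide and compares the count with the number expected if bases occurred independently at the sequence's own composition. The function name and the independence model are our assumptions.

```python
def observed_expected(seq, oligo):
    """Observed (overlapping) count of oligo, and its expected count when bases
    occur independently with the sequence's own base composition."""
    n, k = len(seq), len(oligo)
    if k == 0 or n < k:
        raise ValueError("oligo must be non-empty and no longer than seq")
    positions = n - k + 1
    observed = sum(seq[i:i + k] == oligo for i in range(positions))
    composition = {base: seq.count(base) / n for base in set(seq)}
    p = 1.0
    for base in oligo:
        p *= composition.get(base, 0.0)
    return observed, p * positions
```

For `"ACGTACGT"` and `"CG"`, the observed count is 2. The expected count is 7 × (1/4)² = 0.4375. A ratio well above or below 1 flags an over- or under-represented oligonucleotide under this baseline.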

16.
HARDWICK R. C. Annals of Botany, 1987, 60(4): 439-446
The ‘core-skin’ hypothesis postulates that secondarily thickened plants behave energetically as an inert ‘core’ covered by an active ‘skin’, the ‘skin’ being two-dimensional and the ‘core’ three-dimensional. This would explain the ‘self-thinning’ or ‘−3/2’ rule of plant ecology, that is, the tendency for log (dry weight per plant) and log (number of plants per unit area) to progress along a straight-line relationship with slope −3/2. The hypothesis was tested as follows. Plant nitrogen content was used as an estimate of the mass of ‘skin’ per plant, and dry weight as an estimate of the mass of the ‘core’. As plants mature, the slope of the relationship between y = log (mass of nitrogen per plant) and x = log (mass of dry matter per plant) is expected to decline from an initial value of 1.0 towards a final value of 0.66. The intercept of the relationship is expected to reflect the intrinsic content of ‘skin’ per unit of ‘core’. Genotypic variation in this parameter should cause genotypic differences in the maximum attainable yield of biomass per unit area. The expectations were investigated by fitting the function y = p + qx + r exp(−x) to 30 sets of data on plant nitrogen content, plant weight and time in 18 different vegetables. Simple linear regressions of y on x were fitted to more limited sets of data on weights and nitrogen contents of mature trees. The expectations were, with some minor exceptions, confirmed. Key words: Nitrogen, yield, plant competition, self-thinning
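The fitted function y = p + qx + r·exp(−x) is nonlinear in x but linear in its parameters p, q and r, so ordinary least squares suffices. The sketch below solves the 3 × 3 normal equations with a small Gaussian elimination. Its local slope, dy/dx = q − r·exp(−x), tends to q as x grows. Under the hypothesis, q should therefore lie near 0.66, which is presumably how the fitted parameters relate to the expectations above. Natural logarithms and all function names are our assumptions.

```python
import math


def solve_linear(m, v):
    """Solve m @ x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(m)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_core_skin(xs, ys):
    """Least-squares (p, q, r) for y = p + q*x + r*exp(-x)."""
    basis = [(1.0, x, math.exp(-x)) for x in xs]
    ata = [[sum(b[i] * b[j] for b in basis) for j in range(3)] for i in range(3)]
    aty = [sum(b[i] * y for b, y in zip(basis, ys)) for i in range(3)]
    return solve_linear(ata, aty)


def local_slope(q, r, x):
    """dy/dx of the fitted curve; approaches q for large x."""
    return q - r * math.exp(-x)
```

On noise-free synthetic data generated with p = 0.2, q = 0.66 and r = −0.34, the fit recovers those values to within rounding error. Real nitrogen and dry-weight data would of course scatter around the curve.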

17.
18.
Fiskeby V soya bean was grown from seed germination to seed maturation with two contrasting patterns of nitrogen metabolism: either wholly dependent on dinitrogen fixation, or with an abundant supply of nitrate nitrogen but lacking root nodules. The carbon and nitrogen economies of the plants were assessed at frequent intervals by measurements of photosynthesis, shoot and root respiration, and organic and inorganic nitrogen contents. Plants fixing atmospheric nitrogen assimilated only 25–30 per cent as much nitrogen as equivalent plants given nitrate nitrogen; c. 40 per cent of the nitrogen of ‘nitrate’ plants was assimilated after dinitrogen fixation had ceased in ‘nodulated’ plants. The rates of photosynthesis and respiration of the shoots of soya bean were not markedly affected by the source of nitrogen; in contrast, the roots of ‘nodulated’ plants respired twice as rapidly during intense dinitrogen fixation as those of ‘nitrate’ plants. The magnitude of this respiratory burden was calculated to increase the daily whole-plant respiratory loss of assimilate by 10–15 per cent over that of plants receiving abundant nitrate. It is concluded that ‘nodulated’ plants grew more slowly than ‘nitrate’ plants in these experiments for at least two reasons: firstly, the symbiotic association fixed insufficient nitrogen for optimum growth and, secondly, the assimilation of the nitrogen which was fixed in the root nodules was more energy-demanding in terms of assimilate than that of plants which assimilated nitrogen by reducing nitrate in their leaves.

19.
A new analytical method has been used to examine the set of 40 exon/intron boundaries within the rat embryonic myosin heavy chain (MHCemb) gene. It has also been applied to an additional set of 850 splice sequences selected from GenBank. Strong evidence is obtained for the involvement of the 3' ends, but not the 5' ends, of exon sequences in splice site recognition. Signal sequences at the 5' ends of introns are found to concentrate near the splice borders, while those at the 3' ends of introns are diffusely distributed. The possibility of re-interpreting some known features in terms of the absence of certain elements, rather than the presence of elements forming sequence determinants, is discussed. The analysis enabled us to work out a more detailed set of recognition sequence requirements for the splicing of nuclear pre-mRNA. In addition to requirements which have already been established, we suggest the following: ‘AG-absence’ in the immediate 3' terminal intron sequences; and a minimal match between a particular sequence and the known exon/intron consensus sequence of 5' splice junctions. Received on March 22, 1988; accepted on November 19, 1988
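The proposed ‘AG-absence’ requirement can be stated as a simple check. It tests that no AG dinucleotide occurs in a short stretch of intron just upstream of the terminal AG of the 3' splice site. The window length (12 nt here) is a hypothetical parameter, since the abstract does not give one. The 5' consensus-match requirement is not sketched.

```python
def ag_absent_upstream(intron: str, window: int = 12) -> bool:
    """True if no 'AG' occurs in the `window` nt just upstream of the
    intron's terminal AG (the 3' splice site). Window size is hypothetical."""
    if not intron.endswith("AG"):
        raise ValueError("intron is expected to end with the canonical AG")
    region = intron[-(window + 2):-2]
    return "AG" not in region
```

An intron ending in a pyrimidine-rich stretch such as `...TTTTTTTTCCCCAG` passes the check. One with an extra AG shortly before the terminal AG fails it.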

20.
A program was written in GFA-BASIC for the Atari ST microcomputer to draw two-dimensional homology ‘dotplot’ patterns for two protein or DNA sequences. The program, built around a machine-code subroutine, communicates interactively with the user by means of a multi-button dialogue panel and mouse-directed input. A 1000 × 1000 sequence comparison with a 14:21 stringency window takes 12 s.
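We read a "14:21 stringency window" as placing a dot wherever at least 14 of the 21 aligned positions along a diagonal are identical. The sketch below implements that reading. It does not recompute each window from scratch: it slides the window along every diagonal and updates the match count by adding the entering position and dropping the leaving one. The function name and this interpretation are our assumptions, not a description of the GFA-BASIC program.

```python
def stringency_dotplot(a, b, window=21, min_matches=14):
    """Sorted (i, j) where a[i:i+window] and b[j:j+window] agree at
    >= min_matches positions, using a running count along each diagonal."""
    n, m = len(a), len(b)
    dots = []
    for d in range(-(m - 1), n):  # diagonal offset d = i - j
        i0, j0 = max(d, 0), max(-d, 0)
        length = min(n - i0, m - j0)
        if length < window:
            continue
        same = [a[i0 + t] == b[j0 + t] for t in range(length)]
        count = sum(same[:window])
        for t in range(length - window + 1):
            if t > 0:
                # Slide the window: add the entering position, drop the leaving one.
                count += same[t + window - 1] - same[t - 1]
            if count >= min_matches:
                dots.append((i0 + t, j0 + t))
    return sorted(dots)
```

The running count makes the cost proportional to the number of cells rather than cells times window length. This is one plausible way to keep a 1000 × 1000 comparison fast.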


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417