Similar articles
 20 similar articles found
1.
MOTIVATION: Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. RESULTS: We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal general alignment in time proportional to the product of sequence lengths and in space proportional to the sum of sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. AVAILABILITY: The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/align.html.

2.
Pattern matching of biological sequences with limited storage   (Cited by: 1; self-citations: 0; citations by others: 1)
Existing methods for getting the locally best matched alignments between a pair of biological sequences require O(N²) computational steps and O(N²) storage, where N is the average sequence length. An improved method is presented with which the storage requirement is greatly reduced, while the computational steps remain O(N²). Only a small number of additional steps are required to display any common subsequences with similarity scores greater than a given threshold. The alignments found by the algorithm are optimal in the sense that their scores are locally maximal, where each score is a sum of weights given to individual matches/replacements, insertions and deletions involved in the alignment. The algorithm was implemented in the C programming language on a personal computer. A data area of 64 kbytes of random access memory and a few hundred kbytes on a disk is sufficient for comparing two protein or nucleic acid sequences of 2500 residues. The programs are particularly valuable when used in combination with fast sequence search programs. Received on July 25, 1986; accepted on October 27, 1986
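The storage saving described in this abstract can be illustrated with a minimal sketch. It is not the authors' program: it assumes a plain Smith-Waterman recurrence with a linear gap penalty, and the function name and the scoring weights (`match`, `mismatch`, `gap`) are illustrative choices. Only two rows of the score matrix are kept, so memory grows with the length of one sequence while the number of steps stays proportional to the product of the lengths.

```python
def best_local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, (end_i, end_j)) of the best local alignment of a and b.

    Hypothetical helper: keeps only the previous and current DP rows,
    so storage is O(len(b)) instead of O(len(a) * len(b)).
    """
    best, best_end = 0, (0, 0)
    prev = [0] * (len(b) + 1)
    for i, ca in enumerate(a, start=1):
        curr = [0]
        for j, cb in enumerate(b, start=1):
            score = max(
                0,                                              # start a new local alignment
                prev[j - 1] + (match if ca == cb else mismatch),  # match or replacement
                prev[j] + gap,                                  # deletion
                curr[j - 1] + gap,                              # insertion
            )
            curr.append(score)
            if score > best:
                best, best_end = score, (i, j)
        prev = curr
    return best, best_end
```

The end coordinates are 1-based positions in `a` and `b`. Recovering the aligned residues themselves, or every region above a threshold, needs extra bookkeeping that this sketch leaves out.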

3.
A new approach to search for common patterns in many sequences is presented. The idea is that one sequence from the set of sequences to be compared is considered as a ‘basic’ one and all its similarities with other sequences are found. Multiple similarities are then reconstructed using these data. This approach allows one to search for similar segments which can differ in both substitutions and deletions/insertions. These segments can be situated at different positions in various sequences. No regions of complete or strong similarity within the segments are required. The other parts of the sequences can have no similarity at all. The only requirement is that the similar segments can be found in all the sequences (or in the majority of them, given the common segments are present in the basic sequence). The running time of the algorithm presented is proportional to n·L² when n sequences of length L are analyzed. The algorithm proposed is implemented as programs for the IBM-PC and IBM/370. Its applications to the analysis of biopolymer primary structures as well as the dependence of the results on the choice of basic sequence are discussed.

4.
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page: http://www.lirmm.fr/w3ifa/MAAS/.

5.
A suite of tests to evaluate the statistical significance of protein sequence similarities is developed for use in data bank searches. The tests are based on the Wilbur-Lipman word-search algorithm, and take into account the sequence lengths and compositions, and optionally the weighting of amino acid matches. The method is extended to allow for the existence of a sequence insertion/deletion within the region of similarity. The accuracy of statistical distributions underlying the tests is validated using randomly generated sequences and real sequences selected at random from the data banks. A computer program to perform the tests is briefly described.

6.
Except at high numerical densities (>10⁶ m⁻³), nearest-neighbor distances in monospecific aggregations of copepods are much greater than known perception distances (usually <2–4 body lengths). Most copepod aggregations are several orders of magnitude less dense, with nearest-neighbor distances much greater than three body lengths. Biological mechanisms for the formation and maintenance of aggregation require recognition of, and appropriate behavioral response to, conspecifics. While it appears obvious that such mechanisms are responsible, invoking them to explain most aggregations has little support from either laboratory/field evidence or theory. More research is needed on zooplankton behavior, sensory modalities and learning to reconcile the difference in scales between known perception distances and spacing in aggregations.

7.
Abstract

A heptanucleotide sequence d(TATCACC)2 from the OR3 region of bacteriophage λ is considered sufficient for the recognition of Cro protein. We present here results of molecular dynamics simulations on this sequence for 100 ps at 0.02 ps intervals. The simulations are done using the computer program GROMOS. The conformational results are averaged over each ps. The IUPAC torsional parameters for 100 conformations are illustrated using wheel and dial systems. Several other stereochemical parameters, such as H-bonding lengths and angles, sugar puckers, helix twist and roll angles, as well as distances between opposite-strand phosphorus atoms, are depicted graphically. We find that there is rupture of terminal H-bonds. The bases are tilted and shifted away from the helix axis, giving rise to bifurcated H-bonds. H-bonds are seen even between different base pairs. The role of these dynamic structural changes in the recognition of the OR3 operator by Cro protein is discussed in the paper.

8.
《Mathematical biosciences》1987,83(2):157-165
Frequently there is a need to determine lengths associated with each edge of a phylogenetic tree, as these are often used as an indication of relative time intervals. Where this tree has been constructed from sequence data of r characters for n taxa, using the maximum parsimony model, an edge length can be determined from the differences between the inferred sequences of the end vertices of that edge. These inferred sequences are often not uniquely defined; a range of sequences is possible at a given internal vertex. In this paper we introduce an efficient [O(r×n)] algorithm which calculates the range of lengths on any edge over all the minimal labelings and significantly reduces the number of potential cases to be considered to obtain an objective measure of edge length.

9.
Existing algorithms for finding restriction endonuclease recognition sites use brute-force algorithms which run in time O(NM), where N is the number of nucleotides in the sequence under analysis and M is the total number of nucleotides in all the different sites being searched for. This paper presents a deterministic finite state machine algorithm which runs in time O(N). Memory use can be as high as O(M⁴), but a slight modification to the basic algorithm can impose a theoretical upper bound of O(M) at the cost of some added complexity in the execution of the state machine. The algorithm can operate with a single pass through the sequence under analysis, with no need to back up or (for non-circular sequences) store more than a single input character at a time. This type of algorithm can be adapted to many pattern-matching tasks and is simple enough to implement in hardware that it could, for example, be built into a disk controller as part of a specialized database machine. Received on April 14, 1988; accepted on June 16, 1988
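A single-pass, multi-pattern state machine of this general kind can be sketched with the Aho-Corasick construction. This is a swapped-in, well-known technique and is not claimed to be the construction used in the paper. The function names and the example site table are assumptions for illustration, and degenerate (ambiguous) bases are not handled.

```python
from collections import deque


def build_automaton(sites):
    """Build goto/fail/output tables from a {name: recognition sequence} dict."""
    goto, fail, out = [{}], [0], [[]]
    for name, pattern in sites.items():
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append((name, len(pattern)))
    # Breadth-first pass: each state's failure link points at the longest
    # proper suffix of its path that is also a prefix of some site.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def find_sites(sequence, sites):
    """Scan the sequence once, left to right; return (start, name) for each hit."""
    goto, fail, out = build_automaton(sites)
    state, hits = 0, []
    for pos, ch in enumerate(sequence):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for name, length in out[state]:
            hits.append((pos - length + 1, name))
    return hits
```

For example, `find_sites("TTGAATTCGGATCCA", {"EcoRI": "GAATTC", "BamHI": "GGATCC"})` reports one hit for each enzyme. The scanner reads one character at a time and never backs up in the input, which is the property the abstract emphasizes. Table size here grows with the total pattern length M.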

10.
E. Betran  J. Rozas  A. Navarro    A. Barbadilla 《Genetics》1997,146(1):89-99
DNA sequence variation studies report the transfer of small segments of DNA among different sequences caused by gene conversion events. Here, we provide an algorithm to detect gene conversion tracts and a statistical model to estimate the number and the length distribution of conversion tracts for population DNA sequence data. Two length distributions are defined in the model: (1) that of the observed tract lengths and (2) that of the true tract lengths. If the latter follows a geometric distribution, the relationship between both distributions depends on two basic parameters: ψ, which measures the probability of detecting a converted site, and the parameter of the geometric distribution, from which the average true tract length, the reciprocal of one minus that parameter, can be estimated. Expressions are provided for estimating the geometric parameter by the method of moments and by maximum likelihood. The robustness of the model is examined by computer simulation. The present methods have been applied to the published rp49 sequences of Drosophila subobscura. The maximum likelihood estimate of the geometric parameter for this data set is 0.9918, which represents an average conversion tract length of 122 bp. Only a small percentage of extant conversion events is detected.
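The arithmetic linking the reported estimate to the reported tract length can be checked in a few lines. The name `p` is a placeholder for the geometric-distribution parameter, and the function is only a sketch of the mean formula stated in the abstract, not of the estimation procedure itself.

```python
def mean_tract_length(p):
    """Mean of a geometric tract-length distribution with parameter p: 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return 1.0 / (1.0 - p)


# Estimate quoted in the abstract for the rp49 data set.
print(round(mean_tract_length(0.9918)))  # 122
```

With p = 0.9918 the mean is 1 / 0.0082, about 121.95, which rounds to the 122 bp quoted above.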

11.
Mapping the order of DNA restriction fragments   (Cited by: 3; self-citations: 0; citations by others: 3)
W M Fitch  T F Smith  W W Ralph 《Gene》1983,22(1):19-29
A straightforward method was designed for mapping the order of DNA restriction fragments obtained by a double and two single digestions, without the necessity of using a computer or a radioactive label. All possible solutions compatible with a pre-set level of error in the determination of sequence lengths are obtained. The primary assumptions are given, and the appropriate modifications of the algorithm are presented as a function of any assumptions one is unable (or unwilling) to make. Use of the method in connection with end-labeled fragments is also described.

12.
Abstract

An algorithm is described for generation of a long sequence written in a four-letter alphabet from the constituent k-tuple words in the minimal number of separate, randomly defined fragments of the starting sequence. It is primarily intended for use in the sequencing by hybridization (SBH) process, a potential method for sequencing human genome DNA (Drmanac et al., Genomics 4, pp. 114–128, 1989). The algorithm is based on the formerly defined rules and informative entities of the linear sequence.

The algorithm requires neither knowledge of the number of appearances of a given k-tuple in sequence fragments, nor information on which k-tuple words are at the ends of a fragment. It operates with mixed content of k-tuples of various lengths. The concept of the algorithm enables operations with k-tuple sets containing false positive and false negative k-tuples. The content of false k-tuples primarily affects the completeness of the generated sequence, and its correctness in specific cases only. The algorithm can be used for the optimization of SBH parameters in simulation experiments, as well as for sequence generation in real SBH experiments on genomic DNA.
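The basic idea of rebuilding sequence fragments from overlapping k-tuples can be sketched as follows. This is a deliberately simplified illustration, not the published algorithm: it assumes all words have the same length k, contains no handling of false positive or false negative k-tuples, and joins two words only when their (k-1)-letter overlap is unambiguous in both directions. The function name is hypothetical.

```python
from collections import defaultdict


def assemble_fragments(ktuples):
    """Chain k-tuples along unambiguous (k-1)-overlaps and return the fragments."""
    words = sorted(set(ktuples))
    by_prefix = defaultdict(list)   # (k-1)-prefix -> words starting with it
    by_suffix = defaultdict(list)   # (k-1)-suffix -> words ending with it
    for w in words:
        by_prefix[w[:-1]].append(w)
        by_suffix[w[1:]].append(w)

    def unique_next(w):
        # Extend only if w has exactly one successor and is its only predecessor.
        successors = by_prefix.get(w[1:], [])
        if len(successors) == 1 and len(by_suffix[w[1:]]) == 1:
            return successors[0]
        return None

    def has_unique_prev(w):
        return len(by_suffix.get(w[:-1], [])) == 1 and len(by_prefix[w[:-1]]) == 1

    fragments, used = [], set()
    starts = [w for w in words if not has_unique_prev(w)]
    for start in starts + words:    # the second pass picks up leftover cycles
        if start in used:
            continue
        fragment, current = start, start
        used.add(current)
        while True:
            nxt = unique_next(current)
            if nxt is None or nxt in used:
                break
            fragment += nxt[-1]     # each overlap adds one new letter
            used.add(nxt)
            current = nxt
        fragments.append(fragment)
    return fragments
```

When every overlap is unique the words collapse into a single fragment; a branch point, such as a word with two possible successors, splits the output into separate fragments, which loosely mirrors the abstract's notion of reconstructing the sequence in as few fragments as the data allow.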

13.
14.
Cardiac myofibrils isolated from trout heart have been demonstrated to have a higher sensitivity for Ca²⁺ than mammalian cardiac myofibrils. Using cardiac troponin C (cTnC) cloned from trout and mammalian hearts, we have previously demonstrated that this comparatively high Ca²⁺ sensitivity is due, in part, to trout cTnC (ScTnC) having twice the Ca²⁺ affinity of mammalian cTnC (McTnC) over a broad range of temperatures. The amino acid sequence of ScTnC is 92% identical to McTnC. To determine the residues responsible for the high Ca²⁺ affinity, the function of a number of ScTnC and McTnC mutants was characterized by monitoring an intrinsic fluorescent reporter that monitors Ca²⁺ binding to site II (F27W). The removal of the COOH terminus (amino acids 90-161) from ScTnC and McTnC maintained the difference in Ca²⁺ affinity between the truncated cTnC isoforms (ScNTnC and McNTnC). The replacement of Gln29 and Asp30 in ScNTnC with the corresponding residues from McNTnC, Leu and Gly, respectively, reduced Ca²⁺ affinity to that of McNTnC. These results demonstrate that Gln29 and Asp30 in ScTnC are required for the high Ca²⁺ affinity of site II.

15.
The mechanisms underlying the observed acceleration of monooxygenation reactions in a two-tank accelerator/aerator suspended growth system are evaluated in detail. The accelerator tank is characterized by a very high electron flow through reduced nicotinamide adenine dinucleotide (NADH + H⁺), particularly when the retention-time ratio is small. Only a small fraction of the electron flow was diverted to oxygenation reactions, and the major sinks of NADH + H⁺ were respiration and biomass synthesis. The main producer of NADH + H⁺ is oxidation of acetate, a rapidly degraded electron-donor substrate. The half-maximum-rate concentration for oxygen used in respiration was 0.03 mg/L, while the half-maximum-rate concentration for oxygen used as a cosubstrate in monooxygenation was 0.18 mg/L. Thus, monooxygenations were more sensitive to oxygen limitation than was respiration. The NADH + H⁺ concentration had a direct effect on the monooxygenation kinetics. The rate coefficients for both monooxygenation reactions were directly proportional to the specific growth rate in the accelerator, which supports the view that the accelerator tank caused an up-regulation of the monooxygenase content. Because the rate coefficients in the aerator tank were much larger than in the one-tank system, even though the specific growth rates were nearly the same, monooxygenases may have carried over from the accelerator tank to the aerator tank. Its higher concentration of 2,4-dichlorophenol (2,4-DCP) and the higher specific growth rate were the main reasons why the accelerator had faster kinetics for 2,4-DCP utilization than did the aerator tank. The apparently higher levels of monooxygenase in both tanks of the two-tank system also appear to be a primary reason why its performance was substantially superior to that of the one-tank system in terms of 2,4-DCP removal.

16.
MOTIVATION: Recently, the concept of the constrained sequence alignment was proposed to incorporate the knowledge of biologists about structures/functionalities/consensuses of their datasets into sequence alignment such that the user-specified residues/nucleotides are aligned together in the computed alignment. The currently developed programs use the so-called progressive approach to efficiently obtain a constrained alignment of several sequences. However, the kernels of these programs, the dynamic programming algorithms for computing an optimal constrained alignment between two sequences, run in O(γn²) memory, where γ is the number of the constraints and n is the maximum of the lengths of sequences. As a result, such a high memory requirement limits the overall programs to aligning short sequences only. RESULTS: We adopt the divide-and-conquer approach to design a memory-efficient algorithm for computing an optimal constrained alignment between two sequences, which greatly reduces the memory requirement of the dynamic programming approaches at the expense of a small constant factor in CPU time. This new algorithm consumes only O(αn) space, where α is the sum of the lengths of constraints and usually α < n in practical applications. Based on this algorithm, we have developed a memory-efficient tool for multiple sequence alignment with constraints. AVAILABILITY: http://genome.life.nctu.edu.tw/MUSICME.

17.
The Gulf of Carpentaria is a large (ca. 3.7 × 10⁵ km²) shallow (<70 m) embayment in tropical northern Australia lying between 11 and 17.5°S latitude. Although it contains a multi-species penaeid prawn fishery which is Australia's largest and most valuable fishery, its hydrology and planktology are largely unknown. As a background to a study of the larval ecology of penaeid stocks, ten Gulf-wide survey cruises, sampling the plankton and hydrography, were undertaken over a twenty month period from August 1975 to May 1977. Though comparisons with other studies are difficult because of variations in sampling techniques and biomass estimation methods, the plankton biomass in the Gulf of Carpentaria appears to be high by comparison with other areas around Australia. The mean estimate over all stations and all cruises of 77 mg/m³ dry weight (1880 mg/m²) compares with the very high abundances found only in seasonal upwelling areas south of Java and off the northwest shelf of Australia. Further, the Gulf of Carpentaria standing stocks of plankton compare with other coastal areas supporting important fisheries off the west coast of North America, the eastern North Atlantic Ocean and some European waters. Because of its depth, relatively high temperature and primary production rates, secondary production rates are assumed to be high as well but as yet are unmeasured. *Microfiche of station list available upon request. CSIRO Marine Laboratories Reprint No. 1280

18.
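Laboratory toxicity tests of mixtures of the organophosphate insecticide chlorpyrifos and the biogenic pesticide abamectin against the leafminer Liriomyza sativae gave co-toxicity coefficients of 165 to 234, clearly within the synergistic range. On this basis the best and second-best mixing ratios were determined, and the synergistic mixture was formulated as two 30% penetrating wettable powders, wettable powder-1 and wettable powder-2. Field trials in Shandong against L. sativae larvae showed excellent efficacy. At a formulation dose of 50 g/667 m², the corrected control efficacies of the two wettable powders 3, 7 and 11 days after application were 90.43%–91.71% and 87.09%–90.53%, respectively. Wettable powder-1 at 25 g/667 m² gave a control efficacy of 85.96%–88.28%, and wettable powder-2 at 37.5 g/667 m² gave a corresponding corrected control efficacy of 84.01%–85.38%. Both synergistic mixtures showed good rapid action and good persistence against leafminers at a somewhat lower cost, and also gave good control of L. huidobrensis. Compared with emulsifiable concentrates, the use of wettable powders can reduce the quantity of chemicals released into the environment.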

19.
By applying a vacuum to stem segments of poplar, simultaneous determinations of the flow velocity were made using a convected heat-pulse and a radioisotope (³²P). A regression equation v = 7.225 × 10⁻² u − 4.329 × 10⁻⁵ u² (heat-pulse velocity v on radioisotope velocity u) was fitted. This gave an average ratio u/v of c. 20 over the heat-pulse velocity range 0–20 cm h⁻¹ with incremental ratios, du/dv, of 14 and 49 for values of v of 0 and 600 cm h⁻¹. Using Marshall's theoretical relationship between u and v, and taking into account the percentage of vessels involved in the flow, it was possible to derive the value of the ratio u/v over the range 0–20 cm h⁻¹ for v, and thus verify the theory. Increasing values of u/v over 20 cm h⁻¹ are attributed to lack of thermal homogeneity. Attention is drawn to the necessity to distinguish between the total lumen area and the percentage of vessels involved in the flow.

20.
¹¹CO₂ was offered to leaves of sunflower, corn and ryegrass and ¹³N₂ to root nodules of alfalfa and alder. Movement of the tracers out of the feed region was monitored along stems or petioles using geiger tubes. Fluctuations in radioactivity were not detected as statistically significant from random tracer decay in the ‘background’ section of the time-activity profiles before mass-flow commenced, but became highly significant in the mass-flow sections. These pulses of radioactivity could be followed from one detector to the next over 1–3 cm and were analysed for periodicity by cross-correlation and auto-correlation computer programmes. Periodicity was only rarely detected in ¹¹C runs, but was evident in many ¹³N experiments. Speeds of pulse movement (microfronts) were measured, both visually and by computer cross-correlations, and compared with rates found by the ‘moving intercept’ of mass flow. Microfront speeds were faster. Speeds of ¹¹C movement were comparable with those reported for phloem, but ¹³N movements were often much higher, suggesting xylem movements. Fine structure pulses indicate that movements of ¹¹C photosynthate or ¹³N compounds are rapid, erratic and far more complex than expected by a simple Münch pressure flow mechanism. Key words: ¹¹C, ¹³N, plant stems, radioactive pulse
