Similar literature
1.
It has been observed that in homology search gapped seeds have better sensitivity than ungapped ones for the same cost (weight). In this paper, we propose a probability leakage model (a dissipative Markov system) to elucidate the mechanism that confers power to spaced seeds. Based on this model, we identify desirable features of gapped search seeds and formulate an extremely efficient procedure for seed design: it samples from the set of spaced seeds exhibiting those features, evaluates their sensitivity, and then selects the best. The sensitivity of the constructed seeds is negligibly less than that of the corresponding known optimal seeds. While the challenging mathematical question of characterizing optimal search seeds remains open, we believe that our eminently efficient and effective approach represents a satisfactory solution from a practitioner's viewpoint.
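The sample, evaluate, select loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not state which seed features are desirable, so the hypothetical `random_seed` below simply draws seeds uniformly with fixed end positions, and `sensitivity` assumes a plain Bernoulli model in which each position of a length-`n` homologous region matches independently with probability `p`. All function names and parameters are illustrative assumptions.

```python
import random


def seed_hits(seed: str, window) -> bool:
    """True if every '1' (must-match) position of the seed sees a match (1)."""
    return all(bit == 1 for s, bit in zip(seed, window) if s == "1")


def sensitivity(seed: str, n: int, p: float) -> float:
    """Probability that the seed hits at least once in a region of length n.

    Assumed model: positions match independently with probability p.
    The DP tracks the probability mass of 'no hit so far', keyed by the
    last span-1 match/mismatch bits, so it needs up to 2**(span-1) states.
    """
    span = len(seed)
    if n < span:
        return 0.0
    states = {(): 1.0}  # recent bits -> probability with no hit yet
    for _ in range(n):
        nxt = {}
        for hist, prob in states.items():
            for bit, weight in ((1, p), (0, 1.0 - p)):
                window = hist + (bit,)
                if len(window) == span and seed_hits(seed, window):
                    continue  # a hit occurred: this mass leaves the no-hit set
                new_hist = window[-(span - 1):] if span > 1 else ()
                nxt[new_hist] = nxt.get(new_hist, 0.0) + prob * weight
        states = nxt
    return 1.0 - sum(states.values())


def random_seed(weight: int, span: int, rng: random.Random) -> str:
    """Hypothetical sampler: uniform seed of given weight, both ends fixed to '1'."""
    interior = rng.sample(range(1, span - 1), weight - 2)
    chosen = {0, span - 1, *interior}
    return "".join("1" if i in chosen else "0" for i in range(span))


def best_of_sample(weight, span, n, p, trials, rng):
    """Sample candidate seeds, evaluate each, and keep the most sensitive one."""
    candidates = sorted({random_seed(weight, span, rng) for _ in range(trials)})
    return max(candidates, key=lambda s: sensitivity(s, n, p))


if __name__ == "__main__":
    rng = random.Random(0)
    seed = best_of_sample(weight=6, span=10, n=32, p=0.7, trials=30, rng=rng)
    print(seed, sensitivity(seed, 32, 0.7))
```

The exact dynamic program is exponential in the seed span, so this sketch only suits short seeds; a practical tool would need a more compact state representation.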

2.
We are interested in detecting homologous genomic DNA sequences with the goal of locating approximate inverted, interspersed, and tandem repeats. Standard search techniques start by detecting small matching parts, called seeds, between a query sequence and database sequences. Contiguous seed models have existed for many years. Recently, spaced seeds were shown to be more sensitive than contiguous seeds without increasing the random hit rate. To determine the superiority of one seed model over another, a model of homologous sequence alignment must be chosen. Previous studies evaluating spaced and contiguous seeds have assumed that matches and mismatches occur within these alignments, but not insertions and deletions (indels). This is perhaps appropriate when searching for protein coding sequences (<5% of the human genome), but is inappropriate when looking for repeats in the majority of genomic sequence where indels are common. In this paper, we assume a model of homologous sequence alignment which includes indels and we describe a new seed model, called indel seeds, which explicitly allows indels. We present a waiting time formula for computing the sensitivity of an indel seed and show that indel seeds significantly outperform contiguous and spaced seeds when homologies include indels. We discuss the practical aspect of using indel seeds and finally we present results from a search for inverted repeats in the dog genome using both indel and spaced seeds.

3.
The Goulden-Jackson cluster method is a powerful method to calculate the probability of occurrences of a pattern or set of patterns in a sequence. If the patterns contain wildcard characters, however, the size of the connector matrix grows exponentially with the number of wildcards. Here we show that average correlation c(z) is a good predictor of hitting probability q(n), and the generalized correlation function ?(z) can be used to approximate c(z) efficiently. We apply the method to the problem of optimal multiple spaced seed selection for homology search. We reexamine the concept of optimal sensitivity of spaced seeds and show that it is better to select optimal seeds based on some average properties, such as c(1), which is the expectation of the first hitting length. Higher order approximations can also be constructed easily. Tests on arbitrary large genomic data with multiple seeds show that the optimal multiple seeds selected by the methods are indeed more sensitive. The methods provide a theoretical background on which various empirical observations can be unified and further heuristic search methods can be developed.

4.
As the demand for accurately aligning gene sequences to the genome of a related species grows with the sequencing of new genomes, spaced seeds emerge as a promising vehicle for increasing alignment sensitivity. We extend the existing {0, 1} match-mismatch models for sensitivity evaluation to take into account the compositional structure of coding sequences and ultimately produce seeds better suited to this particular application. Designing seeds for alignment programs, however, needs to balance sensitivity and specificity. We assess the effects of seed variations on both sensitivity and specificity in an extended model that incorporates transitions and differentiates among the three codon positions, and show that spaced seeds with transitions offer a better sensitivity-specificity tradeoff. Furthermore, we propose a theoretical formulation for rigorously assessing seed specificity, starting from Bernoulli and Markov models of the mRNA and genomic sequences. Within this framework, we perform the first comprehensive analysis of seeds to serve as a blueprint for selecting sensitive and specific seeds for practical applications. Our analyses show that specificity is relatively constant for seeds of a given weight, while sensitivity varies widely, with the highest values attained by seeds allowing a small (2-6) number of transitions. A strategy for designing seeds, therefore, is to first select the weight of the seed by identifying the desired sensitivity-specificity tradeoff, then choose the most sensitive seed(s) within that weight group. We illustrate our methods with the alignment of chicken coding sequences against the human genome assembly version HG17.

5.
Pods and seeds of field-collected Baptisia lanceolata plants were analyzed to partition seed weight and seed packaging trait variance among and within plants and to detect relationships between these traits. Packaging traits studied were: pod weight, seed weight per pod, number of seeds per pod, mean weight of seeds per pod, proportion seed weight of total pod weight, and pod weight per seed. Significant among-plant variation was found for seed weight and all packaging traits. Within plants, positive correlations were found between number of seeds per pod and pod dry weight and between the proportion seed weight of total pod weight and number of seeds per pod. Pod weight per seed was negatively correlated with number of seeds per pod. Most plants had a negative correlation between mean seed weight and number of seeds per pod. When compared with an equality of slopes test, slopes of regressions of the above pairs of traits were found to differ among plants. Among plants, the same relationships were found, except for the latter two traits, which were not correlated. These within-plant patterns may represent constraints on seed weight variance imposed by the seed package. This view is supported by a positive correlation between packaging trait variance and seed weight variance. Packaging-related constraints could have an effect on seed weight in this and other species. If these phenotypic constraints have a genetic basis, then selection on seed packaging could change seed weight in a way different from that which might be predicted by considering seed weight alone.

6.
MOTIVATION: Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. The introduction of optimal spaced seeds in PatternHunter has increased both the sensitivity and the speed of homology search, and it has been adopted by many alignment programs such as BLAST. With the further improvement provided by multiple spaced seeds in PatternHunterII, Smith-Waterman sensitivity is approached at BLASTn speed. However, computing optimal multiple spaced seeds was proved to be NP-hard and current heuristic algorithms are all very slow (exponential). RESULTS: We give a simple algorithm which computes good multiple seeds in polynomial time. Due to a completely different approach, the difference with respect to the previous methods is dramatic. The multiple spaced seed of PatternHunterII, with 16 weight 11 seeds, was computed in 12 days. It takes us 17 s to find a better one. Our approach changes the way of looking at multiple spaced seeds.

7.
Acta Oecologica, 1999, 20(1): 61-66
Variation in seed size is common both within and among plant species. This study examined within-species variation in seed weight, and its implications for some components of fitness in the clonal herb Convallaria majalis. This species produces berries containing 4.1 seeds on average. The average seed weight was 16.5 mg, with a coefficient of variation of 32.7%. Seed packaging in fruits was on average 12.5%, and showed a slight tendency to increase with fruit weight. A trade-off was found between seed weight and seed number both within fruits and within ramets. The probability and timing of germination was not influenced by seed size. A field experiment and indirect evidence suggested that post-dispersal seed predation was not related to seed size. Increasing seed weight conferred an advantage to developing seedlings. This advantage was enhanced if a seedling was growing in the close vicinity of a seedling of another species. It is suggested that seed size variation in C. majalis primarily is the result of resource variation during fruit development. A conflict between parents and offspring may however contribute to increase seed size variation.

8.
Optimal spaced seeds were developed as a method to increase sensitivity of local alignment programs similar to BLASTN. Such seeds have been used before in the program PatternHunter, and have given improved sensitivity and running time relative to BLASTN in genome-genome comparison. We study the problem of computing optimal spaced seeds for detecting homologous coding regions in unannotated genomic sequences. By using well-chosen seeds, we are able to improve the sensitivity of coding sequence alignment over that of TBLASTX, while keeping runtime comparable to BLASTN. We identify good seeds by first giving effective hidden Markov models of conservation in alignments of homologous coding regions. We give an efficient algorithm to compute the optimal spaced seed when conservation patterns are generated by these models. Our results offer the hope of improved gene finding due to fewer missed exons in DNA/DNA comparison, and more effective homology search in general, and may have applications outside of bioinformatics.

9.
The notion of the probability of back mutation is introduced and the method of probability generating functions is used in order to simplify and unify Holmquist's (1972) investigation of the effect of multiple hits on nucleotide differences between homologous DNAs. We obtain explicit expressions for the distribution of the number of hit nucleotide sites, the number of altered sites, and the number of differences between two homologous DNAs, as functions of the total number of hits. For the case where the hit rate is a known function of time, we derive a formula for extending these results as functions of time.

10.
SUMMARY: To annotate newly sequenced organisms, cross-species sequence comparison algorithms can be applied to align gene sequences to the genome of a related species. To improve the accuracy of alignment, spaced seeds must be optimized for each comparison. As the number and diversity of genomes increase, an efficient alternative is to cluster pairwise comparisons into groups and identify seeds for groups instead of individual comparisons. Here we investigate a measure of comparison closeness and identify classes of comparisons that show similar seed behavior and therefore can employ the same seed. AVAILABILITY: Source code is freely available at http://dna.cs.gwu.edu and from Bioinformatics online.

11.
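Seed ecological characteristics of the endangered plant Handeliodendron bodinieri   Total citations: 14, self-citations: 1, citations by others: 14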
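This study focuses on the seed ecological characteristics of mother trees of Handeliodendron bodinieri, a national first-class protected tree species, in the karst forest of the Maolan National Nature Reserve, Guizhou. One-way analysis of variance showed no significant differences in fruit number among infructescences of the same mother tree, or in seed yield per fruit among mother trees, whereas the number of fruits per infructescence and seed weight differed highly significantly among mother trees. This indicates that sexual reproduction in the mother trees is largely synchronous and that the fruits are at similar stages of development. The thousand-seed weight of H. bodinieri is 201.66 g and the mean seed weight is 0.20 g, which is 61% of the average for tree species. The proportion of empty seeds is as high as 8.1%, and 78% of the full seeds are viable, reflecting a relatively high abortion rate. The mean seed moisture content is 13.56%, the germination rate of naturally dispersed seeds is only 1%, and under natural conditions up to 98.25% of seeds are damaged by animals such as squirrels, rats, and scarab beetles. Under the same storage conditions, the sowing environment has a significant effect on germination: loose, well-aerated soil or burned ground favors germination, with germination rates reaching 84.98%.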

12.
Seed number per pod in pea is variable. After fertilization, any number of ovules may abort in the pod. This paper describes an analysis of the relationship between pod growth and seed abortion rate. Pod dry weight was highly correlated to pod length before the final stage in seed abortion. By measuring pod length during the period of seed formation and counting the seeds in the same pods at the end of this period, we show that seed number per pod was correlated with early pod elongation. From these data we propose and test a model for predicting seed number per pod from early pod elongation rates. Key words: Pea, pod length, pod growth, seed number, modelling

13.
The tropical tree, Lonchocarpus pentaphyllus (Poir.) DC. (Leguminosae-Papilionoideae), matures indehiscent wind-dispersed fruits containing 0–4 seeds. Most fruits are one-seeded (82%) while less than 2% are three-seeded. An increase in seed number per fruit correlates with increases in four characteristics expected to affect dispersal distance under field conditions: fruit weight, fruit area, square root of wing-loading, and rate of descent in still air. The dry weight of a seed decreases with an increase in seed number per fruit. Under field conditions nearly 40% of the mature fruits fall within the radius of the tree crown. Fruits with more intact seeds are dispersed shorter distances; fruits with no developed seeds travel the farthest. Among one-seeded fruits dispersed beyond the crown radius, dispersal distance is inversely proportional to the square root of wing-loading. The weight of seed in these one-seeded fruits, however, is independent of dispersal distance. Fruits with more seeds have a higher proportion of underdeveloped seeds. However, a greater proportion of two- and three-seeded fruits have at least one intact mature seed than do one-seeded fruits. This comparative study illustrates that changes in fruit morphology and weight associated with different numbers of seeds per fruit affect dispersal properties as well. A decrease in seed number per fruit increases both seed weight and dispersal distance, but it decreases the probability that a given dispersal event results in movement of an intact seed.

14.
Timothy G. Laman. Oecologia, 1996, 107(3): 347-355
Due to their copious seed production and numerous dispersers, rain forest fig trees have been assumed to produce extensive and dense seed shadows. To test this idea, patterns of seed dispersal of two species of large hemiepiphytic fig tree were measured in a Bornean rain forest. The sample included four Ficus stupenda and three F. subtecta trees with crop sizes ranging from 2,000 to 40,000 figs (400,000 to 13,000,000 seeds). Seed rain out to a distance of 60 m from each study tree was quantified using arrays of seed traps deployed in the understory. These trees showed a strongly leptokurtic pattern of dispersal, as expected, but all individuals had measurable seed rain at 60 m, ranging from 0.2 to 5.0 seeds/m². A regression of ln-transformed seed rain density against distance gave a significant fit to all seven trees' dispersal patterns, indicating that the data could be fitted to the negative exponential distribution most commonly fitted to seed shadows. However, for six of seven trees, an improved fit was obtained for regressions in which distance was also ln-transformed. This transformation corresponds to an inverse power distribution, indicating that for vertebrate-dispersed Ficus seeds, the tail of the seed rain distribution does not drop off as rapidly as in the exponential distribution typically associated with wind dispersed seed shadows. Over 50% of the seed crop was estimated to fall below each fig tree's crown. Up to 22% of the seed crop was dispersed beyond the crown edge, but within 60 m of the tree. Estimates of the maximum numbers of seeds which could have been transported beyond 60 m were 45% for the two largest crops of figs, but were under 24% for the trees with smaller crops.
Seed traps positioned where they had an upper canopy layer above them were associated with higher probabilities of being hit by seeds, suggesting that vertebrate dispersal agents are likely to perch or travel through forest layers at the same level as the fig crown and could concentrate seeds in such areas to some degree. The probability of a safe site at 60 m from the fig tree being hit by seeds is calculated to be on the order of 0.01 per fruiting episode. Fig trees do not appear to saturate safe sites with seeds despite their large seed crops. If we in addition consider the rarity of quality establishment sites and post-dispersal factors reducing successful seedling establishment, hemiepiphytic fig trees appear to face severe obstacles to seedling recruitment.
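The two regressions compared in this abstract can be sketched as follows. Regressing the natural log of seed rain density on distance corresponds to a negative exponential seed shadow, while regressing it on the natural log of distance corresponds to an inverse power law. This is only an illustration under stated assumptions: ordinary least squares with R² as the measure of fit, hypothetical function names, and made-up numbers in the usage lines that are not data from the study.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def compare_seed_shadow_models(distances, densities):
    """Fit both candidate seed-shadow models to trap data (all values must be > 0)."""
    log_density = [math.log(d) for d in densities]
    log_distance = [math.log(x) for x in distances]
    return {
        # ln(density) = a + b * distance      -> negative exponential
        "exponential": linear_fit(distances, log_density),
        # ln(density) = a + b * ln(distance)  -> inverse power law
        "inverse_power": linear_fit(log_distance, log_density),
    }


if __name__ == "__main__":
    # Synthetic illustration only: densities generated from a power law.
    distances = [5.0, 10.0, 20.0, 40.0, 60.0]
    densities = [100.0 * d ** -1.5 for d in distances]
    for name, (slope, intercept, r2) in compare_seed_shadow_models(distances, densities).items():
        print(f"{name}: slope={slope:.3f} intercept={intercept:.3f} r2={r2:.4f}")
```

Comparing the two R² values mirrors the comparison in the abstract; a fuller analysis would also have to handle traps with zero seeds, which cannot be log-transformed directly.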

15.
We propose a general approach to computing seed sensitivity that can be applied to different definitions of seeds. It treats separately three components of the seed sensitivity problem--a set of target alignments, an associated probability distribution, and a seed model--that are specified by distinct finite automata. The approach is then applied to a new concept of subset seeds for which we propose an efficient automaton construction. Experimental results confirm that sensitive subset seeds can be efficiently designed using our approach, and can then be used in similarity search producing better results than ordinary spaced seeds.

16.
Seed weight is known to have a marked impact on emergence and post-emergence productivity in wild radish (Raphanus raphanistrum). In this paper, I describe several levels of seed weight variation in plants taken from a natural population in Hamden, Connecticut. Six maternal plants from the 1981 season were analyzed in detail: the weights and positions of all seeds within a fruit were recorded, and some of these seeds were used the following summer for competition studies and progeny analysis. Within a plant, average seed weight decreased as the number of seeds within a fruit increased, suggesting that developing embryos compete for maternal resources. Seed weight also varied significantly among the six maternal plants used in the study. Comparison of the average weights of seeds produced by offspring of those six plants with the average weights of seeds borne by the maternal plant revealed a significant genetic component to seed weight variation. Seed weight varied up to six-fold within single fruits of R. raphanistrum; large seeds tend to occur near the pedicel or in the middle positions. Seed size variation seen within single fruits is of sufficient magnitude to result in differential reproductive output among closely related seeds under competitive field conditions.

17.
18.
The average number of ovules produced per individual of Lupinus texensis is much greater than the average number of seeds per plant. Each plant produces approximately 2,000 ovules but only 2.5% develop into seeds. One fourth of the seeds is lost due to abortion and 0.3% is lost due to predation on the plant. Mature seeds from this population exhibit a five-fold range in weight, from 10 to 56 mg. The distribution of seed weights in the field population is skewed and leptokurtic. Seed wt is positively correlated with both seed germination and seedling survivorship. Heritability of seed wt is 0.09. There is no correlation between average seed wt per plant and total number of seeds per plant, seeds per pod, or legumes per plant.

19.
20.
Summary: Models of the evolution of seed dormancy reveal that dormancy is favoured either when opportunities for establishment vary over time and when there is wide variation in the probability of success, or when the probability of success is limited by frequency dependence. Empirical evidence supporting the temporal heterogeneity hypothesis exists, but there is scant evidence for dormancy being favoured by frequency dependent competition among seedlings. We test the hypothesis that the intensity of between-sib competition should favour a positive relationship between maternal fecundity and seed dormancy. This hypothesis is supported for the rare, vernal pool annual, Pogogyne abramsii: the proportion of dormant offspring was significantly higher among high fecundity mothers than among other mothers. Dormancy in P. abramsii is controlled by the seed coat, a maternal tissue, so delaying germination favours the inclusive fitness of mothers by reducing the potential for competition among siblings. Seed weight and time to first germination varied significantly among P. abramsii plants and mean seed weight increased linearly with plant biomass. Seed weight and seed number are independently regulated by plant size. Overall, seed weight varied 10-fold and variability in seed weight within mothers was not explained by plant biomass, seed yield or mean seed weight. Germinable P. abramsii seeds were significantly heavier than dormant seeds, and germinable seeds heavier than 0.20 mg germinated more rapidly than those smaller than 0.20 mg.
