Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Optimal reconstruction of a sequence from its probes. (Total citations: 4; self-citations: 0; citations by others: 4)
An important combinatorial problem, motivated by DNA sequencing in molecular biology, is the reconstruction of a sequence over a small finite alphabet from the collection of its probes (the sequence spectrum), obtained by sliding a fixed sampling pattern over the sequence. Such reconstruction is required for Sequencing-by-Hybridization (SBH), a novel DNA sequencing technique based on an array (SBH chip) of short nucleotide sequences (probes). Once the sequence spectrum is biochemically obtained, a combinatorial method is used to reconstruct the DNA sequence from its spectrum. Since technology limits the number of probes on the SBH chip, a challenging combinatorial question is the design of a smallest set of probes that can sequence an arbitrary DNA string of a given length. We present in this work a novel probe design, crucially based on the use of universal bases [bases that bind to any nucleotide (Loakes and Brown, 1994)] that drastically improves the performance of the SBH process and asymptotically approaches the information-theoretic bound up to a constant factor. Furthermore, the sequencing algorithm we propose is substantially simpler than the Eulerian path method used in previous solutions of this problem.
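
The idea of a spectrum obtained by sliding a fixed sampling pattern over a sequence can be sketched in a few lines. This is only an illustration, not the probe design of the paper: the function name, the '1'/'0' encoding of the pattern and the '*' symbol for a universal base are all assumptions made for this example.

```python
def gapped_spectrum(sequence, pattern):
    """Return the set of probes seen when `pattern` slides over `sequence`.

    `pattern` is a string of '1' (sampled position, natural base) and
    '0' (unsampled position, universal base). Both the encoding and the
    '*' placeholder in the returned probes are hypothetical conventions.
    """
    width = len(pattern)
    probes = set()
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Keep the base at sampled positions, mask it at gap positions.
        probe = "".join(
            base if flag == "1" else "*" for base, flag in zip(window, pattern)
        )
        probes.add(probe)
    return probes


if __name__ == "__main__":
    print(sorted(gapped_spectrum("ACGTAC", "1101")))
```

With an all-'1' pattern the function reduces to the ordinary spectrum of contiguous probes.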

2.
In a recent paper (Preparata et al., 1999) we introduced a novel probing scheme for DNA sequencing by hybridization (SBH). The new gapped-probe scheme combines natural and universal bases in a well-defined periodic pattern. It has been shown (Preparata et al., 1999) that the performance of the gapped-probe scheme (in terms of the length of a sequence that can be uniquely reconstructed using a given size library of probes) is significantly better than the standard scheme based on oligomer probes. In this paper we present and analyze a new, more powerful, sequencing algorithm for the gapped-probe scheme. We prove that the new algorithm exploits the full potential of the SBH technology with high-confidence performance that comes within a small constant factor (about 2) of the information-theory bound. Moreover, this performance is achieved while maintaining running time linear in the target sequence length.

3.
DNA sequencing by hybridization using semi-degenerate bases. (Total citations: 1; self-citations: 0; citations by others: 1)
One way to enhance the performance of hybridization microarrays for DNA de novo sequencing is the use of probing patterns with gaps of unsampled positions. Ideally, such gaps could be realized by the inclusion into microarray oligos (probes) of wild-card compounds, referred to as universal bases (which bind nonspecifically to natural bases). The suggested alternative is to deploy in the gap positions degenerate bases, i.e., uniform mixtures of the four natural bases, with ensuing deterioration of the hybridization signal. In this paper, we show that such signal loss is a minor shortcoming, compared with the fact that degenerate bases cannot be treated as universal. Indeed, the substantial spread of hybridization energies at any microarray feature is such that an overwhelming number of mismatches bind more strongly than legal matches. We observed, however, that much narrower energy spreads are exhibited by pairs of bases in the same strength class (A-T and C-G). We call semi-degenerate a gap position realized with bases in the same energy class and show that well-known sequence reconstruction algorithms can be modified to achieve substantial improvements in sequencing effectiveness. For example, with a 4^9-feature microarray and an acceptable weakening of the hybridization signal, one may achieve lengths of about 4,000 bases (compared with < 250 of the standard uniform method). Our approach also incorporates the use of a spectrum expressed in terms of observed feature melting temperatures (analog spectrum), rather than binary decisions made directly at the biochemical level (digital spectrum). While universal bases represent the ultimate goal of sequencing by hybridization, semi-degenerate natural bases are the most effective known substitute.

4.
DNA sequencing by hybridization, potentially a powerful alternative to standard wet lab techniques, has received renewed interest after a novel probing scheme has been recently proposed whose performance for the first time asymptotically meets the information theory bound. After settlement of the question of asymptotic performance, there remains the issue of algorithmic fine tunings aimed at improving the performance "constants," with substantial practical implications. In this paper, we show that a probing scheme based on the joint use of direct and reverse spectra (tandem spectra) for a given gapped probing pattern achieves a performance improvement per unit of microarray area of about 5/4 and does not appear to be susceptible to further improvement by increasing the number of cooperating spectra. In other words, tandem-spectrum reconstruction is the best known technique for sequencing by hybridization.

5.
Sequencing by hybridization (SBH) is a DNA sequencing technique, in which the sequence is reconstructed using its k-mer content. This content, which is called the spectrum of the sequence, is obtained by hybridization to a universal DNA array. Standard universal arrays contain all k-mers for some fixed k, typically 8 to 10. Currently, in spite of its promise and elegance, SBH is not competitive with standard gel-based sequencing methods. This is due to two main reasons: lack of tools to handle realistic levels of hybridization errors and an inherent limitation on the length of uniquely reconstructible sequence by standard universal arrays. In this paper, we deal with both problems. We introduce a simple polynomial reconstruction algorithm which can be applied to spectra from standard arrays and has provable performance in the presence of both false negative and false positive errors. We also propose a novel design of chips containing universal bases that differs from the one proposed by Preparata et al. (1999). We give a simple algorithm that uses spectra from such chips to reconstruct with high probability random sequences of length lower only by a squared log factor compared to the information theoretic bound. Our algorithm is very robust to errors and has a provable performance even if there are both false negative and false positive errors. Simulations indicate that its sensitivity to errors is also very small in practice.
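
To make the notions of a k-mer spectrum and of a uniquely reconstructible sequence concrete, here is a minimal sketch for the error-free case. It is not the reconstruction algorithm of the paper; the function names, the known starting k-mer (`seed`) and the known target length are assumptions introduced for illustration, and k is assumed to be at least 2.

```python
def kmer_spectrum(sequence, k):
    """Set of all length-k substrings of `sequence` (error-free spectrum)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def extend_uniquely(spectrum, seed, length):
    """Extend `seed` base by base while exactly one extension is possible.

    Returns the reconstructed string of the requested length, or None as
    soon as zero or several k-mers in the spectrum fit the current suffix.
    Assumes len(seed) == k >= 2 (hypothetical interface).
    """
    k = len(seed)
    result = seed
    while len(result) < length:
        suffix = result[-(k - 1):]
        candidates = [base for base in "ACGT" if suffix + base in spectrum]
        if len(candidates) != 1:
            return None  # stuck or ambiguous: a repeated (k-1)-mer was hit
        result += candidates[0]
    return result


if __name__ == "__main__":
    target = "ACGTTGCA"
    print(extend_uniquely(kmer_spectrum(target, 3), target[:3], len(target)))
```

A repeated (k-1)-mer such as "AC" in "ACGACT" makes two extensions possible at the same step, which is the kind of ambiguity that limits the reconstructible length for standard arrays.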

6.
Sequencing by Hybridization (SBH) reconstructs an n-long target DNA sequence from its biochemically determined l-long subsequences. In the standard approach, the length of a uniformly random sequence that can be unambiguously reconstructed is limited to n = O(2^l) due to repetitive subsequences causing reconstruction degeneracies. We present a modified sequencing method that overcomes this limitation without the need for different types of biochemical assays and is robust to error.

7.
We consider the following problem: Given a set of binary sequences, determine lower bounds on the minimum number of recombinations required to explain the history of the sample, under the infinite-sites model of mutation. The problem has implications for finding recombination hotspots and for the Ancestral Recombination Graph reconstruction problem. Hudson and Kaplan gave a lower bound based on the four-gamete test. In practice, their bound R_m often greatly underestimates the minimum number of recombinations. The problem was recently revisited by Myers and Griffiths, who introduced two new lower bounds R_h and R_s which are provably better, and also yield good bounds in practice. However, the worst-case complexities of their procedures for computing R_h and R_s are exponential and super-exponential, respectively. In this paper, we show that the number of nontrivial connected components, R_c, in the conflict graph for a given set of sequences, computable in time O(nm^2), is also a lower bound on the minimum number of recombination events. We show that in many cases, R_c is a better bound than R_h. The conflict graph was used by Gusfield et al. to obtain a polynomial time algorithm for the galled tree problem, which is a special case of the Ancestral Recombination Graph (ARG) reconstruction problem. Our results also offer some insight into the structural properties of this graph and are of interest for the general Ancestral Recombination Graph reconstruction problem.
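
The quantity R_c can be illustrated with a short sketch. It assumes that two sites (columns) conflict when they fail the four-gamete test, that is, when all of 00, 01, 10 and 11 occur, and that a component is nontrivial when it contains at least one conflict edge. The abstract does not spell out these definitions, so they are assumptions here, as are the function names. Checking all column pairs over n sequences and m sites matches the stated O(nm^2) order.

```python
from itertools import combinations


def conflicts(col_a, col_b):
    """Four-gamete test: True if 00, 01, 10 and 11 all occur in the two columns."""
    return len(set(zip(col_a, col_b))) == 4


def nontrivial_components(sequences):
    """Count conflict-graph components that contain at least one edge.

    `sequences` is a list of equal-length strings over '0' and '1'.
    """
    m = len(sequences[0])
    columns = [tuple(seq[j] for seq in sequences) for j in range(m)]
    parent = list(range(m))  # union-find forest over the m sites

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    in_conflict = set()
    for a, b in combinations(range(m), 2):
        if conflicts(columns[a], columns[b]):
            parent[find(a)] = find(b)
            in_conflict.update((a, b))
    # Only sites with at least one conflict belong to a nontrivial component.
    return len({find(x) for x in in_conflict})


if __name__ == "__main__":
    print(nontrivial_components(["00", "01", "10", "11"]))
```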

8.
On the complexity of positional sequencing by hybridization. (Total citations: 2; self-citations: 0; citations by others: 2)
In sequencing by hybridization (SBH), one has to reconstruct a sequence from its l-long substrings. SBH was proposed as an alternative to gel-based DNA sequencing approaches, but in its original form the method is not competitive. Positional SBH (PSBH) is a recently proposed enhancement of SBH in which one has additional information about the possible positions of each substring along the target sequence. We give a linear time algorithm for solving PSBH when each substring has at most two possible positions. On the other hand, we prove that the problem is NP-complete if each substring has at most three possible positions. We also show that PSBH is NP-complete if the set of allowed positions for each substring is an interval of length k and provide a fast algorithm for the latter problem when k is bounded.

9.
We consider the estimation of success rate and harvest under post survey stratification at the sub‐domain (county) level. Often in this situation, the population size for the sub‐domain is unknown and the random mechanism that dictates the sample size for sub‐domains is ignored. Finding good estimators of success rate and harvest is very important for wildlife abundance. A Bayesian hierarchical model is developed to estimate both success rate and harvest simultaneously. The model includes a random sub‐domain sample size correlated with the number of successes in the sub‐domain, fixed week effects, random geographic effects, and spatial correlations between neighboring sub‐domains. The computation is done by Gibbs sampling and adaptive rejection sampling techniques. The method developed is illustrated using data from the Missouri Turkey Hunting Survey. The estimation of success rate is improved by treating the sub‐domain sample size as a random variable instead of a fixed constant. The Bayesian model yields a reasonable harvest estimation. The spatial pattern of the estimated harvest matches the pattern of the check station data.

10.
Noller HF 《Biochimie》2006,88(8):935-941
Prior to the emergence of crystal structures of the ribosome, different ribosomal functions were identified with specific regions of ribosomal RNA by biochemical and genetic approaches. In particular, three universally conserved bases of 16S rRNA, G530, A1492 and A1493, were implicated in the interaction of the incoming aminoacyl-tRNA with the 30S subunit and mRNA. The conserved region surrounding A1492 and A1493 was called the "decoding site", based on the results of chemical probing experiments and antibiotic resistance mutations. Crystallographic studies from the Ramakrishnan laboratory have now shown that the G530 loop, A1492 and A1493 undergo localized conformational changes to form an RNA structure that positions these three bases to inspect the accuracy of the codon-anticodon match with high stereochemical precision, using A-minor interactions. Some results from the pre-X-ray era may provide clues to further aspects of the decoding process.

11.
Summary: DNA amplification fingerprinting (DAF) is the enzymatic amplification of arbitrary stretches of DNA, directed by very short oligonucleotide primers of arbitrary sequence, to generate complex but characteristic DNA fingerprints. To determine the contribution of primer sequence and length to the fingerprint pattern and the effect of primer-template mismatches, DNA was amplified from several sources using sequence-related primers. Primers of varying length, constructed by removing nucleotides from the 5′ terminus, produced unique patterns only when primers were 8 nucleotides or fewer in length. Larger primers produced either identical or related fingerprints, depending on the sequence. Single base changes within this first 8-nucleotide region of the primer significantly altered the spectrum of amplification products, especially at the 3′ terminus. Increasing annealing temperatures from 15° to 70 °C during amplification did not shift the boundary of the 8-nucleotide region, but reduced the amplification ability of shorter primers. Our observations define a 3′-terminal oligonucleotide domain that is at least 8 bases in length and largely conditions amplification, but that is modulated by sequences beyond it. Our results indicate that only a fraction of template annealing sites are efficiently amplified during DAF. A model is proposed in which a single primer preferentially amplifies certain products due to competition for annealing sites between primer and terminal hairpin loop structures of the template.

12.
We present here the use of fluorescent methodologies for structural and functional studies of RNA in place of radioactivity. The methods are highly sensitive and quantitative with the use of an infrared fluorescence imaging system. IRD-700 and IRD-800 labels are used for fluorescence detection. Chemical probing methods are largely used for mapping RNA secondary structure and to monitor ligand interactions and conformational changes involving individual bases of RNA. The new fluorescent primer extension methodology allows simple and fast chemical probing of RNA with high sensitivity. IRD-700 and IRD-800 labeled primers can also be used to monitor protein-RNA interactions by fluorescent mobility shift assays. The speed and ease of these approaches are advantages over prior methods that used hazardous radioisotopes. Structural and biochemical investigations of RNA should benefit from the use of these fluorescent methodologies.

13.
A fast restriction site search algorithm using a quadruplet look-ahead feature has been written in 6502 assembly language code. The search time, tested on the sequence of pBR322, is 4.1 s/kilobase using a restriction site library including 112 specificities corresponding to a total site length of over 700 bases. The search for a short sequence (less than 36 bases) within a longer one (up to 9999 bases) with a given number of mismatches or gaps allowed has also been written in assembly language. Typical run times for the search of a 12 base sequence with 1, 2 or 3 gaps allowed are 6.2, 9.4 or 13.6 s/kilobase, respectively. The dot matrix analysis needs 7.5 minutes per square kilobase when using a stringency of 15 matched bases out of 25. A 7/21 matrix of two 500 amino acid proteins is obtained in 3 minutes. These three routines are included in DPSA, a general package of programs allowing manipulation and analysis of DNA and protein sequences.
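
The search for a short sequence within a longer one with a bounded number of mismatches can be sketched as a plain sliding-window comparison. This is not the DPSA assembly routine and it handles mismatches only, not gaps; the function name and return format are assumptions for illustration.

```python
def find_with_mismatches(text, pattern, max_mismatches):
    """Return (start, mismatches) for every window of `text` that matches
    `pattern` with at most `max_mismatches` substitutions (no gaps)."""
    hits = []
    width = len(pattern)
    for start in range(len(text) - width + 1):
        mismatches = 0
        for a, b in zip(text[start:start + width], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences, abandon this window
        else:
            # Loop finished without break: the window is an acceptable hit.
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    # GAATTC is the EcoRI recognition site; the text is a made-up example.
    print(find_with_mismatches("GAATTCGATTTC", "GAATTC", 1))
```

The early break keeps the cost per window close to the number of allowed mismatches for most non-matching positions.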

14.
I L Cartwright  S E Kelly 《BioTechniques》1991,11(2):188-90, 192-4, 196 passim

15.
Summary: The effects of short- and long-term exposure to a range in concentration of sea salts on the kinetics of NH4+ uptake by Spartina alterniflora were examined in a laboratory culture experiment. Long-term exposure to increasing salinity up to 50 g/L resulted in a progressive increase in the apparent Km but did not significantly affect Vmax (mean Vmax = 4.23±1.97 mole·g^-1·h^-1). The apparent Km increased in a nonlinear fashion from a mean of 2.66±1.10 mole/L at a salinity of 5 g/L to a mean of 17.56±4.10 mole/L at a salinity of 50 g/L. These results suggest that the long-term effect of exposure to total salt concentrations within the range 5–50 g/L was a competitive inhibition of NH4+ uptake in S. alterniflora. No significant NH4+ uptake was observed in S. alterniflora exposed to 65 g/L sea salts. Short-term exposure to rapid changes in salinity significantly affected both Vmax and Km. Reduction of solution salinity from 35 to 5 g/L did not change Vmax but reduced Km by 71%. However, exposing plants grown at 5 g/L salinity to 35 g/L resulted in a decrease in Vmax of approximately 50%. Exposure of plants grown at 35 g/L to a total sea salt concentration of 50 g/L for 48 h completely inhibited uptake of NH4+. For both experiments, increasing salinity led to an increase in the apparent Km similar to that found in response to long-term exposure. Our data are consistent with a conceptual model of changes in the productivity of S. alterniflora in the salt marsh as a function of environmental modification of NH4+ uptake kinetics.
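
The reasoning behind the competitive-inhibition interpretation (apparent Km rises while Vmax stays the same) follows from the standard Michaelis-Menten rate law with a competitive inhibitor. The sketch below assumes that textbook form; the inhibitor concentration and the constant `ki` are hypothetical parameters that do not appear in the abstract, and only the numerical values 4.23 and 2.66 are taken from it.

```python
def uptake_rate(substrate, vmax, km, inhibitor=0.0, ki=1.0):
    """Michaelis-Menten uptake rate with optional competitive inhibition.

    A competitive inhibitor scales Km by (1 + inhibitor / ki) and leaves
    Vmax unchanged. `inhibitor` and `ki` are illustrative assumptions.
    """
    apparent_km = km * (1.0 + inhibitor / ki)
    return vmax * substrate / (apparent_km + substrate)


if __name__ == "__main__":
    vmax, km = 4.23, 2.66  # mean values reported for a salinity of 5 g/L
    for s in (1.0, 10.0, 100.0, 1000.0):
        free = uptake_rate(s, vmax, km)
        inhibited = uptake_rate(s, vmax, km, inhibitor=5.0, ki=1.0)
        print(f"S={s:7.1f}  uninhibited={free:.3f}  inhibited={inhibited:.3f}")
```

At low substrate concentration the inhibited rate is much lower, while at high concentration both curves approach the same Vmax, which is the signature the abstract describes.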

16.
Sequence elements, at all levels (DNA, RNA and protein), play a central role in mediating molecular recognition and thereby molecular regulation and signaling. Studies that focus on measuring and investigating sequence-based recognition make use of statistical and computational tools, including approaches to searching sequence motifs. State-of-the-art motif searching tools are limited in their coverage and ability to address large motif spaces. We develop and present statistical and algorithmic approaches that take as input ranked lists of sequences and return significant motifs. The efficiency of our approach, based on suffix trees, allows searches over motif spaces that are not covered by existing tools. This includes searching variable gap motifs (two half sites with a flexible length gap in between) and searching long motifs over large alphabets. We used our approach to analyze several high-throughput measurement data sets and report some validation results as well as novel suggested motifs and motif refinements. We suggest a refinement of the known estrogen receptor 1 motif in humans, where we observe gaps other than three nucleotides that also serve as significant recognition sites, as well as a variable length motif related to potential tyrosine phosphorylation.

17.
M S Livshits 《Biofizika》1975,20(5):920-924
A model is proposed in an attempt to identify the physical basis of object perception during echolocation. It is shown that echolocational perception can be achieved by correlational processing of the corresponding signals. The character of an object is determined by comparing the echo of the probing signal received in a given cycle with typical distortions memorized in the course of the animal's individual experience. These distortions arise when the probing impulse is reflected from particular objects. "Binding" of objects to distance may be carried out by using the chosen typical distortions to correct the copy of the probing impulse that serves as the reference signal of a distance correlometer. The response of the correlometer to the echo from a correctly perceived target increases. A block scheme of such correlational perception during echolocation is given. Some experiments that would allow the model to be checked and refined are considered.

18.
Korn R 《Planta》2006,224(4):915-923
Tracheid analysis was carried out on the veinlets and minor veins of the coleus (Solenostemon scutellarioides [L.] Codd) leaf. Third- to fifth-order, or minor, veins average 3.4 tracheids in tandem and they bipartition islets when these enclosed islets reach a critical size; both these features of vein length and islet size contribute to a self-similar process of vein pattern generation. An areole was calculated to be initially comprised of about ten cells, so that the patterning event for vein formation requires only a few cells. An algorithmic model developed here for minor vein formation includes five production rules, and this computer model explains the 3–4 tracheids per minor vein, presence of isolated tracheids, the structure of veinlets, and the elaborate branching patterns of veinlets in coleus and other plants.

19.
Within this paper we investigate the Bernoulli model for random secondary structures of ribonucleic acid (RNA) molecules. Assuming that two random bases can form a hydrogen bond with probability p we prove asymptotic equivalents for the averaged number of hairpins and bulges, the averaged loop length, the expected order, the expected number of secondary structures of size n and order k and further parameters all depending on p. In this way we get an insight into the change of shape of a random structure during the process . Afterwards we compare the computed parameters for random structures in the Bernoulli model to the corresponding quantities for real existing secondary structures of large subunit rRNA molecules found in the database of Wuyts et al. That is how it becomes possible to identify those parameters which behave (almost) randomly and those which do not and thus should be considered as interesting, e.g., with respect to the biological functions or the algorithmic prediction of RNA secondary structures.

20.
Plant and microbial metabolic engineering is commonly used in the production of functional foods and quality trait improvement. Computational model-based approaches have been used in this important endeavour. However, to date, fish metabolic models have only been scarcely and partially developed, in marked contrast to their prominent success in metabolic engineering. In this study we present the reconstruction of fully compartmentalised models of the Danio rerio (zebrafish) on a global scale. This reconstruction involves extraction of known biochemical reactions in D. rerio for both primary and secondary metabolism and the implementation of methods for determining subcellular localisation and assignment of enzymes. The reconstructed model (ZebraGEM) is amenable for constraint-based modelling analysis, and accounts for 4,988 genes coding for 2,406 gene-associated reactions and only 418 non-gene-associated reactions. A set of computational validations (i.e., simulations of known metabolic functionalities and experimental data) strongly testifies to the predictive ability of the model. Overall, the reconstructed model is expected to lay down the foundations for computational-based rational design of fish metabolic engineering in aquaculture.
