Similar Articles (20 results)
1.
MOTIVATION: It is widely recognized that the hybridization process is prone to errors and that the future of DNA sequencing by hybridization (SBH) is predicated on the ability to cope successfully with such errors. However, hybridization errors make the reconstruction of the DNA sequence computationally difficult: the reconstruction problem for SBH with errors is strongly NP-hard, and so far it has not been solved well. RESULTS: In this paper, a new approach is presented to solve the reconstruction problem of DNA sequencing by hybridization, realizing the computational part of the SBH experiment. The proposed algorithm accepts both negative and positive errors. Computational experiments show that the algorithm behaves satisfactorily, especially in cases with k-tuple repetitions and positive errors.

2.
Sequencing by hybridization (SBH) is a DNA sequencing technique, in which the sequence is reconstructed using its k-mer content. This content, which is called the spectrum of the sequence, is obtained by hybridization to a universal DNA array. Standard universal arrays contain all k-mers for some fixed k, typically 8 to 10. Currently, in spite of its promise and elegance, SBH is not competitive with standard gel-based sequencing methods. This is due to two main reasons: lack of tools to handle realistic levels of hybridization errors and an inherent limitation on the length of uniquely reconstructible sequence by standard universal arrays. In this paper, we deal with both problems. We introduce a simple polynomial reconstruction algorithm which can be applied to spectra from standard arrays and has provable performance in the presence of both false negative and false positive errors. We also propose a novel design of chips containing universal bases that differs from the one proposed by Preparata et al. (1999). We give a simple algorithm that uses spectra from such chips to reconstruct with high probability random sequences of length lower only by a squared log factor compared to the information theoretic bound. Our algorithm is very robust to errors and has a provable performance even if there are both false negative and false positive errors. Simulations indicate that its sensitivity to errors is also very small in practice.
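To make the notion of a spectrum and of ambiguity-free reconstruction concrete, here is a minimal, error-free sketch (not the authors' algorithm): the spectrum is the set of k-mers of a sequence, and a greedy walk extends the sequence one base at a time as long as the spectrum offers a unique successor. The function names and the toy sequence are illustrative only.

```python
def spectrum(seq, k):
    """Return the set of k-mers (the 'spectrum') of a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def reconstruct(spec, k, length, start):
    """Greedy reconstruction: extend the start k-mer one base at a time,
    following unique spectrum successors. Returns None on ambiguity or a
    dead end -- the situation that k-mer repeats and hybridization errors
    create in practice."""
    seq = start
    while len(seq) < length:
        nexts = [b for b in "ACGT" if seq[-(k - 1):] + b in spec]
        if len(nexts) != 1:          # ambiguous branch or dead end
            return None
        seq += nexts[0]
    return seq

target = "ACGTTGCA"
spec = spectrum(target, 3)
print(sorted(spec))
print(reconstruct(spec, 3, len(target), target[:3]))   # → ACGTTGCA
```

A false negative (a k-mer missing from `spec`) makes the walk hit a dead end, and a false positive can create a spurious branch; handling both is exactly what the algorithms surveyed here add on top of this naive scheme.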

3.
MOTIVATION: A new heuristic algorithm for solving the DNA sequencing-by-hybridization problem with positive and negative errors. RESULTS: A heuristic algorithm, based on the tabu search method, that provides better solutions than algorithms known from the literature.

4.
A new algorithm for the construction of physical maps from hybridization fingerprints of short oligonucleotide probes has been developed. Extensive simulations in high-noise scenarios show that the algorithm produces an essentially completely correct map in over 95% of trials. Tests for the influence of specific experimental parameters demonstrate that the algorithm is robust to both false positive and false negative experimental errors. The algorithm was also tested in simulations using real DNA sequences of C. elegans, E. coli, S. cerevisiae, and H. sapiens. To overcome the non-randomness of probe frequencies in these sequences, probes were preselected based on sequence statistics and a screening process of the hybridization data was developed. With these modifications, the algorithm produced very encouraging results.

5.
MOTIVATION: A realistic approach to sequencing by hybridization must deal with realistic sequencing errors. The results of such a method can surely be applied to similar sequencing tasks. RESULTS: We provide the first algorithms for interactive sequencing by hybridization which are robust in the presence of hybridization errors. Under a strong error model allowing both positive and negative hybridization errors without repeated queries, we demonstrate accurate and efficient reconstruction with error rates up to 7%. Under the weaker traditional error model of Shamir and Tsur (Proceedings of the Fifth International Conference on Computational Molecular Biology (RECOMB-01), pp. 269-277, 2000), we obtain accurate reconstructions with up to 20% false negative hybridization errors. Finally, we establish theoretical bounds on the performance of the sequential probing algorithm of Skiena and Sundaram (J. Comput. Biol., 2, 333-353, 1995) under the strong error model. AVAILABILITY: Freely available upon request. CONTACT: skiena@cs.sunysb.edu.

6.
An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards.
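The "classical discrete-time TD algorithm" the abstract compares against can be summarized by the TD(0) value update driven by the reward prediction error δ = r + γV(s') − V(s), the quantity the phasic dopamine signal is hypothesized to resemble. A minimal tabular sketch (state names and episode layout are illustrative, not from the paper):

```python
def td0_values(episodes, alpha=0.1, gamma=0.9):
    """Tabular TD(0): update state values with the reward prediction error
    delta = r + gamma * V(s') - V(s); s_next = None marks a terminal step."""
    V = {}
    for episode in episodes:
        for s, r, s_next in episode:
            v_next = 0.0 if s_next is None else V.get(s_next, 0.0)
            delta = r + gamma * v_next - V.get(s, 0.0)   # TD error
            V[s] = V.get(s, 0.0) + alpha * delta
    return V

# A two-state chain: s0 -> s1 -> terminal reward of 1.0.
episodes = [[("s0", 0.0, "s1"), ("s1", 1.0, None)]] * 200
V = td0_values(episodes)
print(round(V["s0"], 3), round(V["s1"], 3))   # → 0.9 1.0
```

The values converge to V(s1) = 1 and V(s0) = γ·V(s1) = 0.9; the paper's point is that a biologically realistic, asymmetric dopamine-like δ still supports this kind of learning for positive rewards but not for negative ones.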

7.
Considering nanofabrication errors, real fabricated metallic nanowires may have irregular cross-sectional shapes. In this work, a metallic nanowire array with arbitrary cross-sectional shapes for negative refraction in the visible regime was studied theoretically. To fully understand the evolution of negative refraction in metallic wires with irregular cross-sectional shapes, the effective refractive index, effective mass, and effective radius of the wires were put forth and studied. The nanowire array with arbitrary cross-sectional shapes and different geometrical parameters was investigated in detail by numerical calculation based on the finite-difference time-domain algorithm. The influence of the geometrical parameters of the nanowires on negative refraction was systematically analyzed. The calculated results indicate that the irregular shape can play a positive role in negative refraction-based imaging applications.

8.
Oligonucleotide fingerprinting is a powerful DNA array-based method to characterize cDNA and ribosomal RNA gene (rDNA) libraries and has many applications including gene expression profiling and DNA clone classification. We are especially interested in the latter application. A key step in the method is the cluster analysis of fingerprint data obtained from DNA array hybridization experiments. Most of the existing approaches to clustering use (normalized) real intensity values and thus do not treat positive and negative hybridization signals equally (positive signals are much more emphasized). In this paper, we consider a discrete approach. Fingerprint data are first normalized and binarized using control DNA clones. Because there may exist unresolved (or missing) values in this binarization process, we formulate the clustering of (binary) oligonucleotide fingerprints as a combinatorial optimization problem that attempts to identify clusters and resolve the missing values in the fingerprints simultaneously. We study the computational complexity of this clustering problem and a natural parameterized version and present an efficient greedy algorithm based on MINIMUM CLIQUE PARTITION on graphs. The algorithm takes advantage of some unique properties of the graphs considered here, which allow us to efficiently find the maximum cliques as well as some special maximal cliques. Our preliminary experimental results on simulated and real data demonstrate that the algorithm runs faster and performs better than some popular hierarchical and graph-based clustering methods. The results on real data from DNA clone classification also suggest that this discrete approach is more accurate than clustering methods based on real intensity values in terms of separating clones that have different characteristics with respect to the given oligonucleotide probes.
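The core idea of clustering binary fingerprints while resolving missing values can be illustrated with a toy greedy scheme (a simplification, not the paper's MINIMUM CLIQUE PARTITION algorithm): fingerprints over {0, 1, N} are compatible if they agree wherever both are resolved, and merging a fingerprint into a cluster fills in its N positions. All names and data here are hypothetical.

```python
def compatible(a, b):
    """Two binary fingerprints (with 'N' = unresolved value) are compatible
    if they agree at every position where both are resolved."""
    return all(x == y or x == "N" or y == "N" for x, y in zip(a, b))

def merge(a, b):
    """Resolve missing values in the consensus using the new fingerprint."""
    return "".join(y if x == "N" else x for x, y in zip(a, b))

def greedy_cluster(fingerprints):
    """Put each fingerprint into the first cluster whose consensus it is
    compatible with, updating that consensus; otherwise open a new cluster."""
    clusters = []                      # list of [consensus, members]
    for fp in fingerprints:
        for cluster in clusters:
            if compatible(cluster[0], fp):
                cluster[0] = merge(cluster[0], fp)
                cluster[1].append(fp)
                break
        else:
            clusters.append([fp, [fp]])
    return clusters

data = ["10N1", "1011", "0N00", "0100"]
for consensus, members in greedy_cluster(data):
    print(consensus, members)
```

Unlike intensity-based clustering, this discrete view treats 0 and 1 signals symmetrically, which is the motivation the abstract gives for the combinatorial formulation.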

9.
Bhandarkar SM, Machaka SA, Shete SS, Kota RN. Genetics 2001, 157(3): 1021-1043
Reconstructing a physical map of a chromosome from a genomic library presents a central computational problem in genetics. Physical map reconstruction in the presence of errors is a problem of high computational complexity that provides the motivation for parallel computing. Parallelization strategies for a maximum-likelihood estimation-based approach to physical map reconstruction are presented. The estimation procedure entails a gradient descent search for determining the optimal spacings between probes for a given probe ordering. The optimal probe ordering is determined using a stochastic optimization algorithm such as simulated annealing or microcanonical annealing. A two-level parallelization strategy is proposed wherein the gradient descent search is parallelized at the lower level and the stochastic optimization algorithm is simultaneously parallelized at the higher level. Implementation and experimental results on a distributed-memory multiprocessor cluster running the parallel virtual machine (PVM) environment are presented using simulated and real hybridization data.

10.
Sequencing by hybridization is a method for reconstructing a DNA sequence based on its k-mer content. This content, called the spectrum of the sequence, can be obtained from hybridization with a universal DNA chip. However, even with a sequencing chip containing all 4^9 9-mers and assuming no hybridization errors, only about 400-bases-long sequences can be reconstructed unambiguously. Drmanac et al. (1989) suggested sequencing long DNA targets by obtaining spectra of many short overlapping fragments of the target, inferring their relative positions along the target, and then computing spectra of subfragments that are short enough to be uniquely recoverable. Drmanac et al. do not treat the realistic case of errors in the hybridization process. In this paper, we study the effect of such errors. We show that the probability of ambiguous reconstruction in the presence of (false negative) errors is close to the probability in the errorless case. More precisely, the ratio between these probabilities is 1 + O(p/(1 - p)^4 * 1/d), where d is the average length of subfragments and p is the probability of a false negative. We also obtain lower and upper bounds for the probability of unambiguous reconstruction based on an errorless spectrum. For realistic chip sizes, these bounds are tighter than those given by Arratia et al. (1996). Finally, we report results on simulations with real DNA sequences, showing that even in the presence of 50% false negative errors, a target of cosmid length can be recovered with less than 0.1% miscalled bases.

11.
Artificial neural networks (ANNs) are powerful computational tools that are designed to replicate the human brain and adopted to solve a variety of problems in many different fields. Fault tolerance (FT), an important property of ANNs, ensures their reliability when significant portions of a network are lost. In this paper, a fault/noise injection-based (FIB) genetic algorithm (GA) is proposed to construct fault-tolerant ANNs. The FT performance of an FIB-GA was compared with that of a common genetic algorithm, the back-propagation algorithm, and the modification of weights algorithm. The FIB-GA showed a slower fitting speed when solving the exclusive OR (XOR) problem and the overlapping classification problem, but it significantly reduced the errors in cases of single or multiple faults in ANN weights or nodes. Further analysis revealed that the fit weights showed no correlation with the fitting errors in the ANNs constructed with the FIB-GA, suggesting a relatively even distribution of the various fitting parameters. In contrast, the output weights in the training of ANNs implemented with the use of the other three algorithms demonstrated a positive correlation with the errors. Our findings therefore indicate that a combination of the fault/noise injection-based method and a GA is capable of introducing FT to ANNs and imply that the distributed ANNs demonstrate superior FT performance.

12.
The concept of the reward prediction error—the difference between reward obtained and reward predicted—continues to be a focal point for much theoretical and experimental work in psychology, cognitive science, and neuroscience. Models that rely on reward prediction errors typically assume a single learning rate for positive and negative prediction errors. However, behavioral data indicate that better-than-expected and worse-than-expected outcomes often do not have symmetric impacts on learning and decision-making. Furthermore, distinct circuits within cortico-striatal loops appear to support learning from positive and negative prediction errors, respectively. Such differential learning rates would be expected to lead to biased reward predictions and therefore suboptimal choice performance. Contrary to this intuition, we show that on static “bandit” choice tasks, differential learning rates can be adaptive. This occurs because asymmetric learning enables a better separation of learned reward probabilities. We show analytically how the optimal learning rate asymmetry depends on the reward distribution and implement a biologically plausible algorithm that adapts the balance of positive and negative learning rates from experience. These results suggest specific adaptive advantages for separate, differential learning rates in simple reinforcement learning settings and provide a novel, normative perspective on the interpretation of associated neural data.
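The asymmetric-learning-rate idea is easy to state as code: the value update uses one rate when the prediction error is positive and another when it is negative. The following is a minimal, hypothetical bandit sketch (not the authors' adaptive algorithm); the epsilon-greedy policy, initial values, and arm probabilities are illustrative choices.

```python
import random

def run_bandit(probs, alpha_pos, alpha_neg, trials=5000, seed=0):
    """Value learning on a static Bernoulli bandit with separate learning
    rates for positive and negative reward prediction errors."""
    rng = random.Random(seed)
    Q = [0.5] * len(probs)                           # initial value estimates
    for _ in range(trials):
        # epsilon-greedy action selection (epsilon = 0.1)
        if rng.random() < 0.1:
            arm = rng.randrange(len(probs))
        else:
            arm = max(range(len(probs)), key=Q.__getitem__)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        delta = reward - Q[arm]                      # prediction error
        alpha = alpha_pos if delta > 0 else alpha_neg
        Q[arm] += alpha * delta
    return Q

# With alpha_pos != alpha_neg the fixed points of the update are biased away
# from the true probabilities, which -- as the abstract argues -- can push the
# learned values of similar arms further apart.
print(run_bandit([0.1, 0.2], alpha_pos=0.1, alpha_neg=0.02))
```

With equal rates each Q converges near its arm's true reward probability; with unequal rates the fixed point shifts, which is the separation effect the paper analyzes.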

13.
Large variations are generally reported in the locations of centers of rotation (CR) for each of various joints in the human body. Some of these reports present conflicting results. This paper shows that this may be due in part to suboptimal experimental design as well as the phenomenon of error magnification. An algorithm is presented for computing the coordinates of the CR and the angle of rotation from the x, y coordinate measurements of two point markers on a moving body in two different positions. Error analysis is performed using a mathematical model that introduces systematically a positive or a negative error into each of the 8 x, y coordinates in all possible combinations, resulting in 256 CR locations. CR error zones are computed and graphed. Parametric analysis of the experimental set-up leads to optimization of the set-up. A typical case is analyzed and its errors computed. It is shown that small errors present in the measurements of the x, y coordinates of the markers are magnified to relatively large errors in the CR coordinates. In a suboptimal case, this magnification may be 30–50 times or more. The results show that, besides the magnitude of x, y coordinate errors, other factors responsible for determining the magnitude of errors in the location of the CR are: the magnitude of angle of rotation, the orientation of the markers with respect to the CR and their distances from the CR. In conjunction with the CR, the angle of rotation is also analyzed. Guidelines for optimal experimental set-up for minimizing the output errors are presented.
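The underlying geometry can be sketched directly: from two markers observed in two positions, the rotation angle follows from the change in direction of the inter-marker vector, and the CR is the fixed point of the rigid transform, i.e. the solution of (I − R)C = q1 − R·p1. This is a generic planar-kinematics sketch, not necessarily the paper's exact algorithm.

```python
import math

def center_of_rotation(p1, p2, q1, q2):
    """Given x,y coordinates of two markers on a rigid body in two positions
    (p1, p2 -> q1, q2), recover the rotation angle theta and the center of
    rotation C satisfying q = R(p - C) + C. Singular when theta = 0
    (pure translation has no finite CR)."""
    # Rotation angle from the change in direction of the inter-marker vector
    ux, uy = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = q2[0] - q1[0], q2[1] - q1[1]
    theta = math.atan2(vy, vx) - math.atan2(uy, ux)
    c, s = math.cos(theta), math.sin(theta)
    # Solve (I - R) C = q1 - R p1 for C by inverting the 2x2 matrix
    rx = q1[0] - (c * p1[0] - s * p1[1])
    ry = q1[1] - (s * p1[0] + c * p1[1])
    det = (1 - c) ** 2 + s ** 2
    cx = ((1 - c) * rx - s * ry) / det
    cy = (s * rx + (1 - c) * ry) / det
    return theta, (cx, cy)

# Markers rotated 90 degrees about (1, 1)
theta, center = center_of_rotation((2, 1), (3, 1), (1, 2), (1, 3))
print(round(math.degrees(theta), 6), (round(center[0], 6), round(center[1], 6)))
```

The error magnification the paper describes is visible here: det shrinks as theta approaches zero, so small coordinate errors in rx, ry are divided by a small number and blow up in the CR estimate.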

14.
Hall D, Bhandarkar SM, Wang J. Genetics 2001, 157(3): 1045-1056
A contig map is a physical map that shows the native order of a library of overlapping genomic clones. One common method for creating such maps involves using hybridization to detect clone overlaps. False-positive and false-negative hybridization errors, the presence of chimeric clones, and gaps in library coverage lead to ambiguity and error in the clone order. Genomes with good genetic maps, such as Neurospora crassa, provide a means for reducing ambiguities and errors when constructing contig maps if clones can be anchored with genetic markers to the genetic map. A software application called ODS2 for creating contig maps based on clone-clone hybridization data is presented. This application is also designed to exploit partial ordering information provided by anchorage of clones to a genetic map. This information, along with clone-clone hybridization data, is used by a clone ordering algorithm and is represented graphically, allowing users to interactively align physical and genetic maps. ODS2 has a graphical user interface and is implemented entirely in Java, so it runs on multiple platforms. Other features include the flexibility of storing data in a local file or relational database and the ability to create full or minimum tiling contig maps.

15.
16.

Background

DNA sequence comparison is a well-studied problem, in which two DNA sequences are compared using a weighted edit distance. Recent DNA sequencing technologies however observe an encoded form of the sequence, rather than each DNA base individually. The encoded DNA sequence may contain technical errors, and therefore encoded sequencing errors must be incorporated when comparing an encoded DNA sequence to a reference DNA sequence.

Results

Although two-base encoding is currently used in practice, many other encoding schemes are possible, whereby two or more bases are encoded at a time. A generalized k-base encoding scheme is presented, whereby feasible higher order encodings are better able to differentiate errors in the encoded sequence from true DNA sequence variants. A generalized version of the previous two-base encoding DNA sequence comparison algorithm is used to compare a k-base encoded sequence to a DNA reference sequence. Finally, simulations are performed to evaluate the power, the false positive and false negative SNP discovery rates, and the performance time of k-base encoding compared to previous methods as well as to the standard DNA sequence comparison algorithm.

Conclusions

The novel generalized k-base encoding scheme and resulting local alignment algorithm permit the development of higher fidelity ligation-based next generation sequencing technology. This bioinformatic solution affords greater robustness to errors, as well as lower false SNP discovery rates, only at the cost of computational time.
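For the k = 2 case mentioned above, SOLiD-style two-base encoding can be sketched compactly: with bases mapped to two-bit codes, the color of each adjacent base pair is the XOR of the two codes. This is a minimal illustration of the encoding itself, not the paper's generalized comparison algorithm.

```python
# Two-bit codes for the bases; in two-base (color-space) encoding the color
# of each adjacent base pair is the XOR of the two base codes.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode2(seq):
    """Two-base encoding: one color per adjacent base pair."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]

def decode2(first_base, colors):
    """Decoding needs the first base; each color then determines the next
    base. A single wrong color corrupts the rest of the decoded read,
    whereas a true SNP changes two adjacent colors -- the property used to
    separate sequencing errors from real variants."""
    inv = {v: k for k, v in CODE.items()}
    seq = [first_base]
    for color in colors:
        seq.append(inv[CODE[seq[-1]] ^ color])
    return "".join(seq)

s = "ATGGCA"
colors = encode2(s)
print(colors)                        # → [3, 1, 0, 3, 1]
print(decode2(s[0], colors) == s)    # → True
```

The generalized k-base schemes studied in the paper extend this pairwise rule to windows of k bases, trading computation for an even sharper error/variant signature.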

17.
MOTIVATION: To develop a new method of assembling small sequences based on sequencing by hybridization with many positive and negative faults. First, the problem is interpreted as a generic traveling salesman problem (i.e. finding the shortest route for visiting many cities) and solved using genetic algorithms. Second, positive errors are excluded before assembly by a sanitization process. RESULTS: The present method outperforms those described in previous studies, in terms of both time and accuracy. AVAILABILITY: http://kamit.med.u-tokai.ac.jp/~takaho/sbh/index.html

18.
Alignment of protein sequences is a key step in most computational methods for prediction of protein function and homology-based modeling of three-dimensional (3D) structure. We investigated correspondence between "gold standard" alignments of 3D protein structures and the sequence alignments produced by the Smith-Waterman algorithm, currently the most sensitive method for pair-wise alignment of sequences. The results of this analysis enabled development of a novel method to align a pair of protein sequences. The comparison of the Smith-Waterman and structure alignments focused on their inner structure and especially on the continuous ungapped alignment segments, "islands" between gaps. Approximately one third of the islands in the gold standard alignments have negative or low positive score, and their recognition is below the sensitivity limit of the Smith-Waterman algorithm. From the alignment accuracy perspective, the time spent by the algorithm while working in these unalignable regions is unnecessary. We considered features of the standard similarity scoring function responsible for this phenomenon and suggested an alternative hierarchical algorithm, which explicitly addresses high scoring regions. This algorithm is considerably faster than the Smith-Waterman algorithm, whereas the resulting alignments are on average of the same quality with respect to the gold standard. This finding shows that a decrease in alignment accuracy is not necessarily the price of computational efficiency.
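For reference, the baseline the authors compare against is the textbook Smith-Waterman local-alignment recurrence, where each DP cell holds the best score of a local alignment ending at that cell, floored at zero. A minimal score-only sketch on DNA-style strings (the simple match/mismatch scoring stands in for a substitution matrix):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score. H[i][j] is the best score of
    any local alignment ending at a[i-1], b[j-1]; the max(0, ...) floor is
    what makes the alignment local."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # match/mismatch
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACGTT", "ACGAT"))   # → 7
```

The zero floor is exactly the sensitivity limit the abstract refers to: an island whose running score dips negative is reset and never recovered, which motivates the authors' hierarchical alternative built around high-scoring regions.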

19.
An algorithm is described for generating a long sequence written in a four-letter alphabet from the constituent k-tuple words in a minimal number of separate, randomly defined fragments of the starting sequence. It is primarily intended for use in the sequencing by hybridization (SBH) process, a potential method for sequencing human genome DNA (Drmanac et al., Genomics 4, pp. 114-128, 1989). The algorithm is based on the formerly defined rules and informative entities of the linear sequence. The algorithm requires neither knowledge of the number of appearances of a given k-tuple in sequence fragments, nor information on which k-tuple words are on the ends of a fragment. It operates with a mixed content of k-tuples of various lengths. The concept of the algorithm enables operation with k-tuple sets containing false positive and false negative k-tuples. The content of false k-tuples primarily affects the completeness of the generated sequence, and its correctness in specific cases only. The algorithm can be used for the optimization of SBH parameters in simulation experiments, as well as for sequence generation in real SBH experiments on genomic DNA.

20.
BACKGROUND: Multiplex or multicolor fluorescence in situ hybridization (M-FISH) is a recently developed cytogenetic technique for cancer diagnosis and research on genetic disorders. By simultaneously viewing the multiply labeled specimens in different color channels, M-FISH facilitates the detection of subtle chromosomal aberrations. The success of this technique largely depends on the accuracy of pixel classification (color karyotyping). Improvements in classifier performance would allow the elucidation of more complex and more subtle chromosomal rearrangements. Normalization of M-FISH images has a significant effect on the accuracy of classification. In particular, misalignment or misregistration across multiple channels seriously affects classification accuracy. Image normalization, including automated registration, must be done before pixel classification. METHODS AND RESULTS: We studied several image normalization approaches that affect image classification. In particular, we developed an automated registration technique to correct misalignment across the different fluor images (caused by chromatic aberration and other factors). This new registration algorithm is based on wavelets and spline approximations that have computational advantages and improved accuracy. To evaluate the performance improvement brought about by these data normalization approaches, we used the downstream pixel classification accuracy as a measurement. A Bayesian classifier assumed that each of 24 chromosome classes had a normal probability distribution. The effects that this registration and other normalization steps have on subsequent classification accuracy were evaluated on a comprehensive M-FISH database established by Advanced Digital Imaging Research (http://www.adires.com/05/Project/MFISH_DB/MFISH_DB.shtml). CONCLUSIONS: Pixel misclassification errors result from different factors. These include uneven hybridization, spectral overlap among fluors, and image misregistration. Effective preprocessing of M-FISH images can decrease the effects of those factors and thereby increase pixel classification accuracy. The data normalization steps described in this report, such as image registration and background flattening, can significantly improve subsequent classification accuracy. An improved classifier in turn would allow subtle DNA rearrangements to be identified in genetic diagnosis and cancer research.
