Similar articles
 20 similar articles found; search time 0 ms
1.
MOTIVATION: Genomic mutations and variations provide insightful information about the functionality of sequence elements and their association with human diseases. Traditionally, variations are identified through analysis of short DNA sequences, usually shorter than 1000 bp per fragment. Optical maps provide both faster and more cost-efficient means for detecting such differences, because a single map can span over 1 million bp. Optical maps are assembled to cover the whole genome, and the accuracy of assembly is critical. RESULTS: We present a computationally efficient model-based method for improving the quality of such assemblies. Our method provides very high accuracy even with moderate coverage (<20×). We utilize a hidden Markov model to represent the consensus map and use the expectation-maximization algorithm to drive the refinement process. We also provide quality scores to assess the quality of the finished map. AVAILABILITY: Code is available from www.cmb.usc.edu/people/valouev/

2.
M W Enkin 《CMAJ》1996,154(11):1621-1622

3.
Constans P 《Proteins》2004,55(3):646-655
Electron density protein alignments are analyzed in terms of their underlying similarity measure, the density overlap. These alignments are conceptually unrelated to biochemical structural elements and, therefore, are appropriate in structure-only similarity studies. The analysis is focused on the low sequence similarity subset of protein domains. A remarkable association is found between simple density overlap measures and the expert-designed Structural Classification of Proteins (SCOP), for which functional and evolutionary analogies prevail. The association found validates the functional significance of electron density alignments.

4.
MOTIVATION: Pairwise local sequence alignment is commonly used to search databases for sequences related to some query sequence. Alignments are obtained using a scoring matrix that takes into account the different frequencies of occurrence of the various types of amino acid substitutions. Software like BLAST provides the user with a set of scoring matrices to choose from, and in the literature it is sometimes recommended to try several scoring matrices on the sequences of interest. The significance of an alignment is usually assessed by looking at E-values and p-values. While sequence lengths and database sizes enter the standard calculations of significance, it is much less common to take the use of several scoring matrices on the same sequences into account. Altschul proposed corrections of the p-value that account for the simultaneous use of an infinite number of PAM matrices. Here we consider the more realistic situation where the user may choose from a finite set of popular PAM and BLOSUM matrices, in particular the ones available in BLAST. It turns out that the significance of a result can be considerably overestimated if a set of substitution matrices is used in an alignment problem and the most significant alignment is then quoted. RESULTS: Based on extensive simulations, we study the multiple testing problem that occurs when several scoring matrices for local sequence alignment are used. We consider a simple Bonferroni correction of the p-values and investigate its accuracy. Finally, we propose a more accurate correction based on extreme value distributions fitted to the maximum of the normalized scores obtained from different scoring matrices. For various sets of matrices we provide correction factors which can be easily applied to adjust p- and E-values reported by software packages.
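The simple Bonferroni correction mentioned in this abstract can be illustrated with a minimal Python sketch. It covers only the textbook Bonferroni step, not the authors' more accurate extreme-value correction. The function name `bonferroni_adjust`, the matrix names used as keys, and all p-values below are illustrative assumptions, not data from the paper.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-value for the best of several tests.

    If m scoring matrices are tried and the smallest p-value is quoted,
    the adjusted p-value is min(1, m * min(p_values)).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    m = len(p_values)
    return min(1.0, m * min(p_values))


# Hypothetical p-values for one query/subject pair under four matrices.
p_by_matrix = {
    "BLOSUM45": 0.004,
    "BLOSUM62": 0.001,
    "BLOSUM80": 0.003,
    "PAM250": 0.02,
}

# Quoting only the best result (0.001) overstates significance;
# the adjusted value accounts for having tried four matrices.
adjusted = bonferroni_adjust(list(p_by_matrix.values()))
```

Because scores obtained from different matrices on the same pair of sequences are correlated, Bonferroni is generally conservative in this setting, which is consistent with the abstract's motivation for a more accurate correction.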

5.
We present a general method for assessing threading score significance. The threading score of a protein sequence, threaded onto a given structure, should be compared with the threading score distribution of a random amino-acid sequence of the same length, threaded onto the same structure; small p-values point to significantly high scores. We claim that, due to general protein contact map properties, this reference distribution is a Weibull extreme value distribution whose parameters depend on the threading method, the structure, the length of the query and the random sequence simulation model used. These parameters can be estimated off-line with simulated sequence samples, for different sequence lengths. They can further be interpolated at the exact length of a query, enabling the quick computation of the p-value.

6.
7.

Background  

PCR has the potential to detect and precisely quantify specific DNA sequences, but it is not yet often used as a fully quantitative method. A number of data collection and processing strategies have been described for the implementation of quantitative PCR. However, they can be experimentally cumbersome, their relative performances have not been evaluated systematically, and they often remain poorly validated statistically and/or experimentally. In this study, we evaluated the performance of known methods, and compared them with newly developed data processing strategies in terms of resolution, precision and robustness.

8.
Given a family of related sequences, one can first determine alignments between various pairs of those sequences, then construct a simultaneous alignment of all the sequences that is determined in a natural manner by the set of pairwise alignments. This approach is sometimes effective for exposing the existence and locations of conserved regions, which can then be aligned by more sensitive multiple-alignment methods. This paper presents an efficient algorithm for constructing a multiple alignment from a set of pairwise alignments.

9.
10.
SUMMARY: As was shown in Nagarajan et al. (2005), commonly used approximations for assessing the significance of multiple alignments can be very inaccurate. To address this, we present here the FAST package, an open-source collection of programs and libraries for efficiently and reliably computing the significance of ungapped local alignments. We also describe other potential applications in bioinformatics where these programs can be adapted for significance testing. AVAILABILITY: The FAST package includes C++ implementations of various algorithms that can be used as stand-alone programs or as a library of subroutines. The package and a web-server for some of the programs are available at www.cs.cornell.edu/~keich/FAST.

11.

Background  

Profile-based analysis of multiple sequence alignments (MSA) allows for accurate comparison of protein families. Here, we address the problems of detecting statistically confident dissimilarities between (1) an MSA position and a set of predicted residue frequencies, and (2) two MSA positions. These problems are important for (i) evaluation and optimization of methods predicting residue occurrence at protein positions; (ii) detection of potentially misaligned regions in automatically produced alignments and their further refinement; and (iii) detection of sites that determine functional or structural specificity in two related families.

12.
Homology-derived secondary structure of proteins (HSSP) is a well-known database of multiple sequence alignments (MSAs) which merges information on protein sequences and their three-dimensional structures. It is available for all proteins whose structure is deposited in the PDB. It is also used by STING and (Java)Protein Dossier to calculate and present relative entropy as a measure of the degree of conservation for each residue of proteins whose structure has been solved and deposited in the PDB. However, if STING and (Java)Protein Dossier are to provide support for analysis of protein structures modeled in computers or being experimentally solved but not yet deposited in the PDB, then we need a new method for building alignments having a flavor of HSSP alignments (myMSAr). The present study describes a new method and its corresponding databank (SH2QS--database of sequences homologue to the query [structure-having] sequence). Our main interest in making myMSAr was to measure the degree of residue conservation for a given query sequence, regardless of whether it has a corresponding structure deposited in the PDB. In this study, we compare the measurement of residue conservation provided by corresponding alignments produced by HSSP and SH2QS. As a case study, we also present two biologically relevant examples, the first one highlighting the equivalence of analysis of the degree of residue conservation by using HSSP or SH2QS alignments, and the second one presenting the degree of residue conservation for a structure modeled in a computer, which, as a consequence, does not have an alignment reported by HSSP.
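Relative entropy as a per-column conservation measure can be sketched as follows. This is a generic Kullback-Leibler formulation, not necessarily the exact calculation used by STING or (Java)Protein Dossier. The uniform background distribution, the function name `relative_entropy`, and the example columns are assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed background: uniform over the 20 amino acids (real tools
# typically use observed database frequencies instead).
UNIFORM_BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def relative_entropy(column, background=UNIFORM_BACKGROUND):
    """Relative entropy (in bits) of one alignment column.

    `column` is a string of residues observed at one MSA position;
    gap characters are assumed to have been removed already.
    Higher values indicate stronger conservation.
    """
    counts = Counter(column)
    total = sum(counts.values())
    score = 0.0
    for residue, n in counts.items():
        p = n / total
        q = background[residue]
        score += p * math.log2(p / q)
    return score


conserved = relative_entropy("AAAAAAAA")   # one residue type only
variable = relative_entropy("ACDEFGHI")    # eight different residues
```

With a uniform background, a fully conserved column reaches the maximum of log2(20) bits, and the score falls as more residue types appear in the column.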

13.
Egomotion and relative depth map from optical flow (cited 2 times: 0 self-citations, 2 citations by others)
When an observer moves in a 3D world, optical flow fields are generated on his retina. We argue that such an observer can in principle compute the parameters of his egomotion, and following this, the relative depth map of the stationary environment solely from the instantaneous positional velocity fields (IPVF). Moreover, we argue that in the stationary world, this analysis can be done locally, and is not dependent on global properties of the optical flow under the imposed constraints (smoothness of the egomotion path, rigidity of objects, temporal continuity of perception). To investigate the method, and to analyze its performance, a computer model has been constructed which simulates an observer moving through a 3D world of stationary rectangular planes at different depths and orientations. The results suggest that the method offers a reasonable and computationally feasible means of extracting information about egomotion and surface layout from optical flows, under certain circumstances. We discuss some issues related to extending the analysis to the case of a rigid world of moving objects, and some issues related to the status of information extractable from optical flows with respect to other sources of information.

14.
15.
This paper is concerned with the statistical analysis of data from comparative experiments in applied animal behaviour research. It is emphasized that the statistical analysis of experimental results should be in accordance with the design of the experiment. An example is given to illustrate this.

16.
Consider the scenario of common gene clusters of closely related species where the cluster sizes could be as large as 400 from an alphabet of 25,000 genes. This paper addresses the problem of computing the statistical significance of such large clusters, whose individual elements occur with very low frequency (of the order of the number of species in this case) and the alphabet set of the elements is relatively large. We present a model where we study the structure of the clusters in terms of smaller nested (or otherwise) sub-clusters contained within the cluster. We give a probability estimation based on the expected cluster structure for such clusters (rather than some form of the product of individual probabilities of the elements). We also give an exact probability computation based on a dynamic programming algorithm, which runs in polynomial time.

17.
A total of 11,581 grape (Vitis L.) EST-SSRs were produced and characterized from 381,609 grape ESTs. Among the EST-SSRs, the tri-repeat (5,560, 45.4%) represented the most abundant class of microsatellites in grape ESTs. Most grape EST-SSR motifs fall within 18-24 bp in length. EST-SSR tri-repeats occurred at a higher percentage in the 5′-end (59.3%) than in the 3′-end (48.3%), and they had abundant codon repeats for putative amino acid runs such as proline and arginine. To better utilize these markers, 142 newly developed and validated EST-SSR loci as well as 223 linkage map SSR loci were aligned in silico and mapped in the grape genome. The orders of these SSR loci in the chromosomal physical locations and in the linkage groups were compared, and about twenty linkage map loci positions were switched or rearranged in the grape genome. The EST-SSR markers extended the linkage map in the grape genome. The in silico mapping method reported in this study provides an initial collection of grape mapping resources. This approach offers great opportunities to understand genetic variation in terms of nucleotide sequence differences in the physical map and genetic recombination in linkage maps, and it benefits marker enrichment in a specific grape genome region for fine mapping or QTL mapping.

18.
Protein structure comparison is a fundamental problem for structural genomics, with applications to drug design, fold prediction, protein clustering, and evolutionary studies. Despite its importance, there are very few rigorous methods and widely accepted similarity measures known for this problem. In this paper we describe the last few years of developments on the study of an emerging measure, the contact map overlap (CMO), for protein structure comparison. A contact map is a list of pairs of residues which lie in three-dimensional proximity in the protein's native fold. Although this measure is in principle computationally hard to optimize, we show how it can in fact be computed with great accuracy for related proteins by integer linear programming techniques. These methods have the advantage of providing certificates of near-optimality by means of upper bounds to the optimal alignment value. We also illustrate effective heuristics, such as local search and genetic algorithms. We were able to obtain for the first time optimal alignments for large similar proteins (about 1,000 residues and 2,000 contacts) and used the CMO measure to cluster proteins in families. The clusters obtained were compared to SCOP classification in order to validate the measure. Extensive computational experiments showed that alignments which are off by at most 10% from the optimal value can be computed in a short time. Further experiments showed how this measure reacts to the choice of the threshold defining a contact and how to choose this threshold in a sensible way.
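To make the contact map definition concrete, the following Python sketch builds a contact map from residue coordinates and counts the contacts shared under a given alignment. It only evaluates the overlap for a fixed alignment. It does not perform the hard optimization over alignments that the paper addresses with integer linear programming. The coordinates, the distance threshold, and the function names are illustrative assumptions.

```python
import math
from itertools import combinations


def contact_map(coords, threshold):
    """Set of residue index pairs (i, j), i < j, closer than `threshold`."""
    contacts = set()
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= threshold:
            contacts.add((i, j))
    return contacts


def overlap(contacts_a, contacts_b, alignment):
    """Count contacts of A that map onto contacts of B.

    `alignment` maps residue indices of protein A to residue indices
    of protein B; unaligned residues are simply absent from the dict.
    """
    shared = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            u, v = alignment[i], alignment[j]
            if (min(u, v), max(u, v)) in contacts_b:
                shared += 1
    return shared


# Toy one-residue-per-point coordinates (hypothetical, arbitrary units).
coords_a = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0)]
coords_b = [(0, 0, 0), (1, 0, 0), (5, 0, 0), (6, 0, 0)]

map_a = contact_map(coords_a, threshold=1.5)   # {(0, 1), (1, 2)}
map_b = contact_map(coords_b, threshold=1.5)   # {(0, 1), (2, 3)}

identity_alignment = {0: 0, 1: 1, 2: 2, 3: 3}
shared_contacts = overlap(map_a, map_b, identity_alignment)
```

The `threshold` argument corresponds to the contact-defining threshold whose choice the abstract discusses; changing it changes both maps and therefore the overlap value.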

19.
Statistical analyses of genome sequence‐derived protein sequence data can identify amino acid residues that interact between proteins or between domains of a protein. These statistical methods are based on evolution‐directed amino acid variation responding to structural and functional constraints in proteins. The identified residues form a basis for determining structure and folding of proteins as well as inferring mechanisms of protein function. When applied to two‐component systems, several research groups have shown they can be used to identify the amino acid interactions between response regulators and histidine kinases and the specificity therein. Recently, statistical studies between the HisKA and HATPase‐ATP‐binding domains of histidine kinases identified amino acid interactions for both the inactive and the active catalytic states of such kinases. The identified interactions generated a model structure for the domain conformation of the active state. This conformation requires an unwinding of a portion of the C‐terminal helix of the HisKA domain that destroys the inactive state residue contacts and suggests how signal‐binding determines the equilibrium between the inactive and active states of histidine kinases. The rapidly accumulating protein sequence databases from genome, metagenome and microbiome studies are an important resource for functional and structural understanding of proteins and protein complexes in microbes.

20.
MOTIVATION: Different automatic methods of sequence alignment are routinely used as a starting point for homology searches and function inference. Confidence in an alignment probability is one of the major fundamentals of massive automatic genome-scale pairwise comparisons, for clustering of putative orthologs and paralogs, sequenced genome annotation or multiple-genomic tree constructions. Extreme value distributions based on the Karlin-Altschul model, usually advised for large-scale comparisons, are not always valid, particularly in the case of comparisons of non-biased with nucleotide-biased genomes (such as that of Plasmodium falciparum). Z-value estimates based on Monte Carlo techniques can be calculated experimentally for any alignment output, whatever the method used. Empirically, a Z-value higher than approximately 8 is assumed to indicate that an alignment score is significant, but this arbitrary figure was never theoretically justified. RESULTS: In this paper, we used the Bienaymé-Chebyshev inequality to demonstrate a theorem of the upper limit of an alignment score probability (or P-value). This theorem implies that a computed Z-value is a statistical test and a single-linkage clustering criterion, and that 1/Z-value² is an upper limit to the probability of an alignment score whatever the actual probability law is. Therefore, this study provides the missing theoretical link between a Z-value cut-off used for an automatic clustering of putative orthologs and/or paralogs, and the corresponding statistical risk in such genome-scale comparisons (using non-biased or biased genomes).
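The bound stated in this abstract can be illustrated with a short Python sketch: a Z-value is computed from the scores of shuffled sequences, and 1/Z² is then taken as the distribution-free upper limit on the P-value. The function names and the toy scores below are hypothetical. A real analysis would use many Monte Carlo shuffles rather than three.

```python
import statistics


def z_value(score, shuffled_scores):
    """Z-value of an alignment score against scores of shuffled sequences."""
    mu = statistics.mean(shuffled_scores)
    sigma = statistics.stdev(shuffled_scores)
    if sigma == 0:
        raise ValueError("shuffled scores have zero variance")
    return (score - mu) / sigma


def p_value_upper_bound(z):
    """Bienaymé-Chebyshev upper limit on the P-value: 1 / Z^2.

    The bound is only informative for Z > 1; otherwise it is capped at 1.
    """
    if z <= 0:
        return 1.0
    return min(1.0, 1.0 / (z * z))


# Toy numbers (hypothetical): mean 12 and sample standard deviation 2.
shuffled = [10, 12, 14]
z = z_value(28, shuffled)
bound = p_value_upper_bound(z)
```

For the empirical cut-off of Z ≈ 8 mentioned in the abstract, the bound evaluates to 1/64, or about 0.016, regardless of the underlying score distribution.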
