Similar Articles
20 similar articles found.
1.
A simple statistical method is presented which tests for non-randomness in a cyclic sequence. The distribution of the statistic is tabulated for short sequences, and a large-sample approximation is derived. Critical values of the statistic are given for sequences of length up to 52. Easy-to-use computer programs to carry out the calculations are also presented.

2.
Closely related proteins show an obvious kinship by having numerous matching amino acids in their aligned sequences. Kinship between anciently separated proteins requires a statistical evaluation to rule out fortuitous similarities. A simple statistic is developed which assumes equal probability for all codon pairs, and a table of critical values for amino acid sequence alignments of length 200 or less is presented. Applying this statistic to V and C regions of immunoglobulin chains, aligned on the basis of shared features of three-dimensional structure, provides evidence that the V and C sequences descended from a common ancestor. Similarly, the distant evolutionary relationship of dehydrogenases, flavodoxin, and subtilisin suggested by structural alignments is verified. On the other hand, the statistic does not verify a common evolutionary origin for the heme-binding pocket in globins and cytochrome b5. Empirical evidence from the distribution of MMD values of amino acid pairs in comparisons of misaligned polypeptide chains, and from Monte Carlo trials of sequences aligned with arbitrary gaps, supports the validity of the statistic.

3.
A new statistic for detecting genetic differentiation (cited 20 times: 0 self-citations, 20 by others)
Hudson RR. Genetics, 2000, 155(4): 2011-2014
A new statistic for detecting genetic differentiation of subpopulations is described. The statistic can be calculated when genetic data are collected on individuals sampled from two or more localities. It is assumed that haplotypic data are obtained, either in the form of DNA sequences or as data on many tightly linked markers. Using a symmetric island model, and assuming an infinite-sites model of mutation, it is found that the new statistic is as powerful as or more powerful than previously proposed statistics for a wide range of parameter values.
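The statistic introduced in this paper is usually referred to as Snn, the nearest-neighbor statistic. For each sequence, it records the fraction of that sequence's nearest neighbors (by number of differences) that come from the same locality, then averages over all sequences. The Python sketch below follows that definition under our own assumptions: equal-length aligned sequences, Hamming distance, and ties split evenly. The function names are hypothetical. Significance is usually assessed by permuting locality labels, and that step is not shown.

```python
from typing import Hashable, Sequence


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def snn(seqs: Sequence[str], localities: Sequence[Hashable]) -> float:
    """Nearest-neighbor statistic: mean fraction of same-locality nearest neighbors."""
    n = len(seqs)
    total = 0.0
    for i in range(n):
        dists = [(hamming(seqs[i], seqs[j]), j) for j in range(n) if j != i]
        d_min = min(d for d, _ in dists)
        nearest = [j for d, j in dists if d == d_min]
        # with ties, each tied nearest neighbor contributes equally
        same = sum(localities[j] == localities[i] for j in nearest)
        total += same / len(nearest)
    return total / n
```

Values close to 1 suggest that sequences cluster strongly by locality.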

4.
C A Beam, H S Wieand. Biometrics, 1991, 47(3): 907-919
In this paper we study a statistic that is suitable for comparing a discrete diagnostic marker to one or more continuous diagnostic markers. Test procedures and confidence intervals are based on asymptotic normality. The statistic is applicable for correlated data in which all the markers are obtained for each subject. The statistic was studied for use in comparing two markers for rectal bleeding. Examples for this application and two more general applications are presented.

5.
The D2 statistic, defined as the number of matches of words of some pre-specified length k, is a computationally fast, alignment-free measure of biological sequence similarity. However, there is some debate about its suitability for this purpose, as the variability in D2 may be dominated by terms that reflect the noise in each of the single sequences only. We examine the extent of the problem and the effectiveness of overcoming it by using two mean-centred variants of this statistic, D2* and D2c. We conclude that all three statistics are potentially useful measures of sequence similarity, for which reasonably accurate p-values can be estimated under a null hypothesis of sequences composed of independently and identically distributed letters. We show that D2 and D2c, and to a somewhat lesser extent D2*, perform well in tests to classify moderate-length query sequences as putative cis-regulatory modules.
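By the definition above, D2 counts matching pairs of k-words. This equals the sum over all words w of X_w * Y_w, where X_w and Y_w are the occurrence counts of w in the two sequences. The minimal Python sketch below counts overlapping words and exact matches only. The centred variants D2* and D2c are not implemented, and the function names are our own.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Occurrence counts of every overlapping word of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a: str, b: str, k: int) -> int:
    """D2 = number of matching k-word pairs = sum over words w of X_w * Y_w."""
    x, y = kmer_counts(a, k), kmer_counts(b, k)
    return sum(count * y[w] for w, count in x.items())
```

For example, `d2("ACGTACGT", "CGTACG", 3)` is 6. ACG and CGT each occur twice in the first sequence and once in the second. GTA and TAC each occur once in both.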

6.
Modifications for improving prediction quality in a previously described adaptive algorithm of automated annotation (A4) were considered. First, direct use of the basis statistic η gives higher prediction quality than the previously proposed statistic γ. Second, quality improves when only some of the similar sequences found, rather than all of them, are used for prediction, since this reduces noise in the data.

7.
Use of runs statistics for pattern recognition in genomic DNA sequences (cited 2 times: 0 self-citations, 2 by others)
In this article, the use of the finite Markov chain imbedding (FMCI) technique to study patterns in DNA under a hidden Markov model (HMM) is introduced. With the aim of studying multiple runs-related statistics simultaneously under an HMM through the FMCI technique, this work investigates a bivariate runs statistic under a binary HMM for DNA pattern recognition. An FMCI-based recursive algorithm is derived and implemented to determine the exact distribution of this bivariate runs statistic under an independent and identically distributed (IID) framework, a Markov chain (MC) framework, and a binary HMM framework. With this algorithm, we have studied the distributions of the bivariate runs statistic under different binary HMM parameter sets; probabilistic profiles of runs are created and shown to be useful for trapping HMM maximum likelihood estimates (MLEs). This MLE-trapping scheme offers good initial estimates to jump-start the expectation-maximization (EM) algorithm in HMM parameter estimation and helps prevent the EM estimates from landing on a local maximum or a saddle point. Applications of the bivariate runs statistic and the probabilistic profiles, in conjunction with binary HMMs, for pattern recognition in genomic DNA sequences are illustrated via case studies of DNA bendability signals using human DNA data.
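Finite Markov chain imbedding tracks a statistic with an auxiliary Markov chain whose state holds everything needed to update it. The sketch below is a much simpler illustration than the paper's bivariate statistic under a binary HMM. It is our own and assumes i.i.d. Bernoulli(p) trials. It imbeds the current run length of 1s in states 0..r-1, with an absorbing state r. It then returns the exact probability that a sequence of length n contains no run of r or more consecutive 1s. The function name is hypothetical.

```python
def prob_no_run(n: int, r: int, p: float) -> float:
    """P(no run of >= r consecutive 1s in n i.i.d. Bernoulli(p) trials), via FMCI.

    States 0..r-1 hold the length of the current run of 1s; state r is absorbing
    ("a run of length r has already occurred").
    """
    dist = [1.0] + [0.0] * r  # start in state 0 with probability 1
    for _ in range(n):
        new = [0.0] * (r + 1)
        for s in range(r):
            new[0] += dist[s] * (1 - p)  # a 0 resets the current run
            new[s + 1] += dist[s] * p  # a 1 extends the current run
        new[r] += dist[r]  # once absorbed, stay absorbed
        dist = new
    return 1.0 - dist[r]
```

For example, `prob_no_run(3, 2, 0.5)` gives 0.625, because 5 of the 8 binary strings of length 3 contain no "11". The probability that the longest run is at least r is one minus this value.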

8.
Some statistical properties of samples of DNA sequences are studied under an infinite-site neutral model with recombination. The two quantities of interest are R, the number of recombination events in the history of a sample of sequences, and RM, the number of recombination events that can be parsimoniously inferred from a sample of sequences. Formulas are derived for the mean and variance of R. In contrast to R, RM can be determined from the sample. Since no formulas are known for the mean and variance of RM, they are estimated with Monte Carlo simulations. RM is often found to be much smaller than R; therefore, the number of recombination events may be greatly underestimated in a parsimonious reconstruction of the history of a sample. The statistic RM can be used to estimate the product of the recombination rate and the population size or, if the recombination rate is known, to estimate the population size. To illustrate this, DNA sequences from the Adh region of Drosophila melanogaster are used to estimate the effective population size of this species.
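RM is usually computed with the four-gamete test. If all four two-site haplotypes (00, 01, 10, 11) occur for a pair of sites, at least one recombination event must lie between them. RM is then the largest number of such incompatible intervals that do not overlap. The Python sketch below assumes biallelic sites coded as 0/1 characters and counts disjoint intervals greedily by right endpoint. It is an illustration, not the authors' code, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List


def incompatible(col_i: List[str], col_j: List[str]) -> bool:
    """Four-gamete test: True if all four two-site haplotypes are present."""
    return len(set(zip(col_i, col_j))) == 4


def min_recombinations_rm(haplotypes: List[str]) -> int:
    """RM from equal-length 0/1 haplotype strings (one character per site)."""
    n_sites = len(haplotypes[0])
    cols = [[h[s] for h in haplotypes] for s in range(n_sites)]
    intervals = [(i, j) for i, j in combinations(range(n_sites), 2)
                 if incompatible(cols[i], cols[j])]
    # maximum set of non-overlapping intervals, chosen greedily by right endpoint;
    # (a, b) and (b, c) count as disjoint because the events lie in different gaps
    intervals.sort(key=lambda iv: iv[1])
    rm, last_right = 0, -1
    for left, right in intervals:
        if left >= last_right:
            rm += 1
            last_right = right
    return rm
```

For example, the haplotypes 000, 011, 101 and 110 fail the test for both adjacent site pairs, so RM = 2.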

9.
Strand M. Biometrics, 2000, 56(4): 1222-1226
Treatment means in factorial experiments are lattice ordered when the mean response increases as the level of any factor is increased while the other factors are held fixed. Such means occur naturally in many experiments. A nonparametric test for lattice-ordered means involving a Kendall-type statistic is summarized for k-factor factorial experiments. Specifically, the form of the test statistic and its variance under the null hypothesis are presented. In addition, a normalized version of the test statistic is discussed and applied to relevant data.
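One natural Kendall-type statistic for a lattice order works over pairs of cells where a <= b in every factor and a != b. It sums sign(y_b - y_a) over all pairs of observations drawn from such cells. Large positive values favor lattice-ordered means. The sketch below is our own illustration of that idea. It does not reproduce the paper's exact statistic, its null variance, or its normalized version, and the function name is hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def lattice_kendall_s(cells: Dict[Tuple[int, ...], List[float]]) -> int:
    """Sum of sign(y_b - y_a) over observation pairs from cells a < b in the lattice order.

    cells maps a tuple of factor levels (one integer per factor) to the responses observed there.
    """
    s = 0
    for a, b in product(cells, repeat=2):
        # a precedes b if every factor level in a is <= the one in b, and a != b
        if a != b and all(ai <= bi for ai, bi in zip(a, b)):
            s += sum(sign(y - x) for x in cells[a] for y in cells[b])
    return s
```

In a 2 x 2 layout, the cells (0, 1) and (1, 0) are not comparable, so only five of the six cell pairs contribute.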

10.
With whole-genome sequences being completed at an increasing rate, it is important to develop and assess tools to analyze them. Following annotation of the protein content of a genome, one can compare sequences with previously characterized homologous genes to detect novel functions within specific proteins in the evolution of the newly sequenced genome. One common statistical method to detect such changes is to compare the ratio of nonsynonymous (Ka) to synonymous (Ks) nucleotide substitution rates. Here, the effects of several parameters that can influence this calculation (sequence reconstruction method, phylogenetic tree branch length weighting, GC content, and codon bias) are examined. Also, two new alternative measures of adaptive evolution are presented: the point accepted mutations (PAM)/neutral evolutionary distance (NED) ratio and the sequence space assessment (SSA) statistic. All of these methods are compared using two sequence families: the recently diverged leptin orthologs in primates and the more anciently diverged deoxyribonucleoside kinase family. Examining these and other measures to detect changes of gene function along branches of a phylogenetic tree will become increasingly important in the postgenomic era.
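Once the nonsynonymous and synonymous sites and differences have been counted for a pair of coding sequences, the ratio is a short calculation. The counting itself (for example with the Nei-Gojobori method) is not shown. This sketch assumes a Jukes-Cantor correction for multiple hits, d = -3/4 * ln(1 - 4p/3), applied separately to the proportions of nonsynonymous and synonymous differences. It ignores the other factors the abstract examines (tree weighting, GC content, codon bias). The function names are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Ka/Ks ratio from counted differences and sites, with Jukes-Cantor correction."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ValueError("no synonymous differences; Ka/Ks undefined")
    return ka / ks
```

By convention, Ka/Ks > 1 is read as a signal of positive selection and Ka/Ks < 1 as purifying selection.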

11.
Microarray technologies allow simultaneous measurement of DNA copy number at thousands of positions in a genome. Gains and losses of DNA sequences reveal themselves through characteristic patterns of hybridization intensity. To identify change points along the chromosomes, we develop a marker clustering method consisting of two parts. First, a "circular clustering tree test statistic" assigns each marker a statistic measuring the likelihood that it is a change point. Second, outlier detection approaches are applied to these marker statistics. The method provides a new way to build a binary tree that can accurately capture change-point signals and is easy to perform. A simulation study shows good performance in change-point detection, and cancer cell line data are used to illustrate performance when regions of true copy number changes are known.

12.
One of the key problems in the study of ancient DNA is that of authenticating sequences obtained from PCR amplifications of highly degraded samples. Contamination of ancient samples and postmortem damage to endogenous DNA templates are the major obstacles facing researchers in this task. In particular, the authentication of sequences obtained from ancient human remains is thought by many to be rather challenging. We propose a novel approach, based on the c statistic, that can be employed to help identify the sequence motif of an endogenous template, based on a sample of sequences that reflect the nucleotide composition of individual template molecules obtained from ancient tissues (such as cloned products from a PCR amplification). The c statistic exploits as information the most common form of postmortem damage observed among clone sequences in ancient DNA studies, namely, lesion-induced substitutions caused by cytosine deamination events. Analyses of simulated sets of templates with miscoding lesions and real sets of clone sequences from the literature indicate that the c-based approach is highly effective in identifying endogenous sequence motifs, even when they are not present among the sampled clones. The proposed approach is likely to be of general use to researchers working with DNA from ancient tissues, particularly from human remains, where authentication of results has been most challenging. [Reviewing Editor: Dr. Magnus Nordborg]

13.
J J Chen, R L Kodell. Biometrics, 1987, 43(3): 499-509
This paper proposes a method for analyzing tumor data from chronic studies when the experimental design includes combinations of two factors, for example, sex and dose. Both main effects and combined-effect (interaction) hypotheses are considered. A stratified log-rank statistic is presented for tests of no column or row (main) effects. The paper shows that when the numbers of animals in the cells are unequal and disproportional, the unstratified log-rank statistic does not have a chi-square null distribution. Two simple models, additive and multiplicative, for representing the combined effect of row and column are considered under the proportional hazards model. A simple conservative statistic is proposed for testing the additivity of the row and column effects. A simulation experiment is reported that examines the null distribution of the combined-effect test statistic under the additive model and the power of the test against the multiplicative model. The procedure is illustrated by analyzing mammary tumors induced by 7,12-dimethylbenz[a]anthracene (DMBA) in yellow and agouti F1 female mice from a laboratory experiment.
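A stratified log-rank test for one factor treats the levels of the other factor as strata. Within each stratum it sums, for one group, the observed-minus-expected event counts and their hypergeometric variances. It then compares (sum of O - E)^2 / (sum of V) with a chi-square distribution on one degree of freedom. Below is a minimal Python sketch for two groups coded 0/1, assuming right-censored data with event indicators (1 = tumor observed). The paper's combined-effect (additivity) test is not implemented, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Tuple


def logrank_components(times, events, groups) -> Tuple[float, float]:
    """Observed minus expected events in group 1, and its variance, for one stratum."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(1 for g in at_risk if g == 1)
        died = [g for ti, e, g in zip(times, events, groups) if ti == t and e]
        d, d1 = len(died), sum(1 for g in died if g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1 given the margins at time t
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e, var


def stratified_logrank(times, events, groups, strata) -> Tuple[float, float]:
    """Return (chi-square statistic, p-value) for the stratified two-group log-rank test."""
    by_stratum = defaultdict(lambda: ([], [], []))
    for t, e, g, s in zip(times, events, groups, strata):
        ts, es, gs = by_stratum[s]
        ts.append(t)
        es.append(e)
        gs.append(g)
    total_oe, total_var = 0.0, 0.0
    for ts, es, gs in by_stratum.values():
        oe, v = logrank_components(ts, es, gs)
        total_oe += oe
        total_var += v
    if total_var == 0:
        raise ValueError("no informative events; statistic undefined")
    chi2 = total_oe ** 2 / total_var
    # upper tail of chi-square(1): P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Summing within strata before squaring keeps the group comparison inside each level of the other factor. This is the motivation for stratifying when cell sizes are unequal.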

14.
A nonparametric test to detect a pulse in monthly data is presented. This test is a maximum rank-sum test. The test statistic can be computed from frequencies or rates. The exact null distribution of the test statistic is tabulated for pulses lasting 3, 4, 5, or 6 months. Simulation estimates of the test's type I error rate and power are presented. The statistical modeling of the data is discussed. Several examples are given to illustrate the application of the test and the modeling procedures. Practical matters such as the treatment of tied observations, the effect of unequal month lengths, sample-size calculation, and post-test power analysis are discussed and illustrated with examples.
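Here is a minimal reading of the maximum rank-sum statistic. Rank the monthly values, sum the ranks over every window of `width` consecutive months, and take the maximum. The sketch below makes several assumptions of our own:

- windows wrap from December to January;
- tied values receive average ranks;
- the p-value is estimated by Monte Carlo permutation rather than from the exact tabulated null distribution.

Names and defaults are hypothetical, and unequal month lengths are ignored.

```python
import random
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def max_rank_sum(values: Sequence[float], width: int) -> float:
    """Largest sum of ranks over `width` consecutive months, wrapping Dec -> Jan."""
    r = average_ranks(values)
    n = len(r)
    return max(sum(r[(start + k) % n] for k in range(width)) for start in range(n))


def pulse_test(values: Sequence[float], width: int,
               n_perm: int = 10000, seed: int = 0) -> Tuple[float, float]:
    """Observed statistic and Monte Carlo permutation p-value."""
    rng = random.Random(seed)
    observed = max_rank_sum(values, width)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_rank_sum(shuffled, width) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

For 12 strictly increasing monthly values and a 3-month pulse, the observed statistic is 10 + 11 + 12 = 33.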

15.
OBJECTIVES: The association of a candidate gene with disease can be evaluated by a case-control study in which the genotype distribution is compared for diseased cases and unaffected controls. Usually, the data are analyzed with Armitage's test using the asymptotic null distribution of the test statistic. Since this test does not generally guarantee a type I error rate less than or equal to the significance level alpha, tests based on exact null distributions have been investigated. METHODS: An algorithm to generate the exact null distribution for both Armitage's test statistic and a recently proposed modification of the Baumgartner-Weiss-Schindler statistic is presented. I have compared the tests in a simulation study. RESULTS: The asymptotic Armitage test is slightly anticonservative whereas the exact tests control the type I error rate. The exact Armitage test is very conservative, but the exact test based on the modification of the Baumgartner-Weiss-Schindler statistic has a type I error rate close to alpha. The exact Armitage test is the least powerful test; the difference in power between the other two tests is often small and the comparison does not show a clear winner. CONCLUSION: Simulation results indicate that an exact test based on the modification of the Baumgartner-Weiss-Schindler statistic is preferable for the analysis of case-control studies of genetic markers.
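For reference, the asymptotic Armitage (Cochran-Armitage) trend test compared in this study can be computed directly from a 2 x 3 genotype table. Define:

- r_i: case counts;
- s_i: control counts;
- n_i: column totals;
- t_i: scores, assumed here to be 0, 1, 2 (additive model);
- R, S, N: numbers of cases, controls and subjects.

The statistic chi^2 = N (N sum t_i r_i - R sum t_i n_i)^2 / (R S [N sum t_i^2 n_i - (sum t_i n_i)^2]) is referred to a chi-square distribution with one degree of freedom. This sketch does not implement the exact null distributions or the modified Baumgartner-Weiss-Schindler statistic. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def armitage_trend(cases: Sequence[int], controls: Sequence[int],
                   scores: Sequence[float] = (0, 1, 2)) -> Tuple[float, float]:
    """Asymptotic Cochran-Armitage trend test; returns (chi-square with 1 df, p-value)."""
    n = [r + s for r, s in zip(cases, controls)]  # genotype column totals n_i
    R, S = sum(cases), sum(controls)
    N = R + S
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * ni for t, ni in zip(scores, n))
    sum_t2n = sum(t * t * ni for t, ni in zip(scores, n))
    chi2 = N * (N * sum_tr - R * sum_tn) ** 2 / (R * S * (N * sum_t2n - sum_tn ** 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
    return chi2, p_value
```

For example, 10/20/30 cases against 30/20/10 controls give chi^2 = 20.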

17.
C T Le. Biometrics, 1988, 44(1): 299-303
This paper is concerned with the issue of testing for trend with correlated binary data. We consider the problem where one has either one or two ears (or eyes) available for analysis at baseline and one wishes to look at changes over time in a dichotomous outcome taking into account the correlation between responses from two ears. A reparameterization of Rosner's (1982, Biometrics 38, 105-114) correlated binary data model is presented and applied to a test for trend where the stratifying variable is age (or any other subject-specific variable). Observed and expected values are calculated for the trend statistic separately for both unilateral and bilateral cases and are then summed to obtain an overall summary statistic. The proposed method is illustrated by a reanalysis of data presented in a published study of the efficacy of antibiotics for the treatment of otitis media.

18.
A number of experimental methods have been reported for estimating the number of genes in a genome, or the closely related coding density of a genome, defined as the fraction of base pairs in codons. Recently, DNA sequence data representative of the genome as a whole have become available for several organisms, making the problem of estimating coding density amenable to sequence analytic methods. Estimates of coding density for a single genome vary widely, so that methods with characterized error bounds have become increasingly desirable. We present a method to estimate the protein coding density in a corpus of DNA sequence data, in which a 'coding statistic' is calculated for a large number of windows of the sequence under study, and the distribution of the statistic is decomposed into two normal distributions, assumed to be the distributions of the coding statistic in the coding and noncoding fractions of the sequence windows. The accuracy of the method is evaluated using known data and application is made to the yeast chromosome III sequence and to C. elegans cosmid sequences. It can also be applied to fragmentary data, for example a collection of short sequences determined in the course of STS mapping.
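Decomposing the distribution of a per-window statistic into two normal components is a standard two-component Gaussian mixture fit, which can be done with the EM algorithm. In this sketch, which is our own and may differ from the paper's fitting procedure, the component with the higher mean is assumed to correspond to coding windows. Its mixing weight then serves as the estimated fraction of coding windows. The coding statistic itself is not implemented; any per-window score can be passed in. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_normals(xs: Sequence[float], iters: int = 100) -> Tuple[float, List[float], List[float]]:
    """EM fit of a two-component normal mixture.

    Returns (weight of the higher-mean component, [mu_low, mu_high], [sd_low, sd_high]).
    """
    n = len(xs)
    ordered = sorted(xs)
    lo, hi = ordered[: n // 2], ordered[n // 2:]  # initialize by splitting at the median
    mu = [sum(lo) / len(lo), sum(hi) / len(hi)]
    sd = [max(math.sqrt(sum((x - mu[0]) ** 2 for x in lo) / len(lo)), 1e-6),
          max(math.sqrt(sum((x - mu[1]) ** 2 for x in hi) / len(hi)), 1e-6)]
    w = 0.5
    for _ in range(iters):
        # E-step: posterior probability that each window belongs to the high component
        resp = []
        for x in xs:
            a = (1 - w) * normal_pdf(x, mu[0], sd[0])
            b = w * normal_pdf(x, mu[1], sd[1])
            resp.append(b / (a + b) if a + b > 0 else 0.5)
        # M-step: update the weight, then the means, then the standard deviations
        n1 = sum(resp)
        n0 = n - n1
        w = n1 / n
        mu = [sum((1 - r) * x for r, x in zip(resp, xs)) / n0,
              sum(r * x for r, x in zip(resp, xs)) / n1]
        sd = [max(math.sqrt(sum((1 - r) * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0), 1e-6),
              max(math.sqrt(sum(r * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1), 1e-6)]
    return w, mu, sd
```

The weight estimates the fraction of coding windows. Translating that into a fraction of base pairs in codons is a further approximation.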

19.
SciRoKo is a user-friendly software tool for identifying microsatellites in genomic sequences. The combination of an extremely fast search algorithm with a built-in summary statistics tool makes SciRoKo well suited to full-genome analysis. Compared with existing tools, SciRoKo additionally allows the analysis of compound microsatellites. AVAILABILITY: free for use: www.kofler.or.at/Bioinformatics. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
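The sketch below is a much simpler illustration than SciRoKo's search algorithm, which is not described here. It finds perfect microsatellites with a regular-expression backreference: a unit of 1 to 6 bases followed by at least min_repeats - 1 further copies of itself. Units that are themselves repeats of a shorter unit (e.g. AA inside a poly-A run) are skipped. Imperfect and compound microsatellites are not handled, and parameter names and defaults are assumptions.

```python
import re
from typing import List, Tuple


def _is_primitive(unit: str) -> bool:
    """False if unit is itself a repeat of a shorter unit (e.g. 'ATAT' = 'AT' * 2)."""
    n = len(unit)
    return not any(n % d == 0 and unit == unit[:d] * (n // d) for d in range(1, n))


def find_microsatellites(seq: str, max_unit: int = 6,
                         min_repeats: int = 3) -> List[Tuple[int, str, int]]:
    """Return sorted (start, repeat unit, copy number) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit_len in range(1, max_unit + 1):
        # a unit of unit_len bases followed by at least min_repeats - 1 further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        pos = 0
        while pos < len(seq):
            m = pattern.match(seq, pos)
            if m and _is_primitive(m.group(1)):
                hits.append((pos, m.group(1), len(m.group(0)) // unit_len))
                pos = m.end()  # skip past the reported repeat
            else:
                pos += 1  # try the next start position
    return sorted(hits)
```

The code tries each start position with `Pattern.match` rather than using `finditer`. This way, a skipped non-primitive match cannot consume the start of a neighbouring primitive repeat.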

20.
A randomization test algorithm and its Matlab implementation (cited 1 time: 0 self-citations, 1 by others)
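Provided that the data satisfy specific assumptions, standard statistical techniques can be used to test for differences between samples. Under natural conditions, however, these assumptions often do not hold, or it is unknown whether they hold. This study develops a randomization test algorithm that assesses the significance of the difference between two samples of real-valued data, and gives standard Matlab source code for it. The algorithm is broadly applicable: given the number of randomization simulations and the significance level, it computes the test value P and whether the difference is significant. The algorithm can use various difference functions, such as the Euclidean distance. Using this algorithm, differences among several sets of measured sample data were analyzed with different difference functions.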
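Below is a Python sketch of the general procedure described above; it is not the authors' Matlab program. It pools the two samples and reshuffles them n_sim times. Each time, it checks whether the difference function applied to the reshuffled split is at least as large as the observed difference. Our assumptions are the default difference function (absolute difference of means), the (hits + 1)/(n_sim + 1) p-value, and the parameter names. Any callable diff(a, b) can be substituted, such as a Euclidean distance between mean vectors for multivariate data.

```python
import random
from statistics import mean
from typing import Callable, Optional, Sequence, Tuple


def abs_mean_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Default difference function: absolute difference between sample means."""
    return abs(mean(a) - mean(b))


def randomization_test(a: Sequence[float], b: Sequence[float],
                       diff: Callable[[Sequence[float], Sequence[float]], float] = abs_mean_diff,
                       n_sim: int = 9999, alpha: float = 0.05,
                       seed: Optional[int] = None) -> Tuple[float, bool]:
    """Return (P, significant?) for the difference between samples a and b."""
    rng = random.Random(seed)
    observed = diff(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)  # random reassignment of values to the two groups
        if diff(pooled[: len(a)], pooled[len(a):]) >= observed:
            hits += 1
    p_value = (hits + 1) / (n_sim + 1)
    return p_value, p_value <= alpha
```

Counting the observed arrangement as one of the simulations (the "+ 1" terms) keeps P strictly positive.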

