Similar articles
 20 similar articles found (search time: 15 ms)
1.
Hümbelin M  Thomas A  Lin J  Li J  Jore J  Berry A 《Gene》2002,300(1-2):129-139
Three statistical/mathematical analyses are carried out on isochore sequences: spectral analysis, analysis of variance, and segmentation analysis. Spectral analysis shows that there are GC content fluctuations at different length scales in isochore sequences. The analysis of variance shows that the null hypothesis (the mean value of a group of GC contents remains the same along the sequence) may or may not be rejected for an isochore sequence, depending on the subwindow sizes at which GC contents are sampled, and the window size within which group members are defined. The segmentation analysis shows that there are stronger indications of GC content changes at isochore borders than within an isochore. These analyses support the notion of isochore sequences, but reject the assumption that isochore sequences are homogeneous at the base level. An isochore sequence may pass a homogeneity test when GC content fluctuations at smaller length scales are ignored or averaged out.

2.
Many authors apply statistical tests to sets of relevés obtained using non-random methods to investigate phytosociological and ecological relationships. Frequently applied tests include the t-test, ANOVA, the Mann-Whitney test, the Kruskal-Wallis test, the chi-square test (of independence, goodness-of-fit, and homogeneity), the Kolmogorov-Smirnov test, concentration analysis, tests of linear correlation and of the Spearman rank correlation coefficient, computer-intensive methods (such as randomization and re-sampling), and others. I examined how reliable the results of such tests are when applied to non-random data by checking the tests' requirements against statistical theory. I conclude that, when used for such data, the statistical tests do not provide reliable support for the inferences made, because non-randomness of the samples violates the requirement that observations be independent, and different parts of the investigated communities did not have an equal chance of being represented in the sample. Additional requirements, e.g. normality and homoscedasticity, were also neglected in several cases. The importance of data satisfying the basic requirements set by statistical tests is stressed.

3.
4.
Widely used in testing statistical hypotheses, the Bonferroni multiple test has rather low power, which entails a high risk of falsely accepting the overall null hypothesis and therefore of failing to detect effects that really exist. We suggest that when the partial test statistics are statistically independent, it is possible to reduce this risk by using binomial modifications of the Bonferroni test. Instead of rejecting the null hypothesis when at least one of n partial null hypotheses is rejected at a very stringent level of significance (say, 0.005 in the case of n = 10), as prescribed by the Bonferroni test, the binomial tests recommend rejecting the null hypothesis when at least k partial null hypotheses (say, k = [n/2]) are rejected at a much less stringent level (up to 30-50%). We show that the power of such binomial tests is substantially higher than that of the original Bonferroni test and of some modified Bonferroni tests. In addition, such an approach allows us to combine tests for which the results are known only for a fixed significance level. The paper contains tables and a computer program for determining (by table lookup or by computation) the necessary binomial test parameters, i.e. either the partial significance level (when k is fixed) or the value of k (when the partial significance level is fixed).
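
The rule described in this abstract can be illustrated with a short calculation. The following is a minimal sketch, not the program mentioned in the abstract: it assumes that, under the overall null hypothesis, the n partial tests are independent and each rejects with probability equal to the partial significance level, so that the number of rejections is binomially distributed. The function names `binomial_tail` and `partial_level` are hypothetical.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def partial_level(n: int, k: int, alpha: float, tol: float = 1e-12) -> float:
    """Find the partial significance level for the rule
    'reject the overall null when at least k of n partial tests reject',
    so that the overall level equals alpha (assumes independent tests)."""
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    lo, hi = 0.0, 1.0
    # binomial_tail is increasing in p for k >= 1, so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binomial_tail(n, k, mid) > alpha:
            hi = mid
        else:
            lo = mid
    return lo


if __name__ == "__main__":
    for k in (1, 5):
        print(k, partial_level(n=10, k=k, alpha=0.05))
```

With n = 10, k = 1 and an overall level of 0.05, the computed partial level is close to the 0.005 quoted in the abstract; raising k to 5 permits a much less stringent partial level for the same overall level.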

5.
Isochore structures in the mouse genome   Total citations: 2, self-citations: 0, citations by others: 2
Zhang CT  Zhang R 《Genomics》2004,83(3):384-394
The distribution of the G+C content in the mouse genome has been studied using a windowless technique. We have found that: (i) abrupt variations of the G+C content from a GC-rich region to a GC-poor region, and vice versa, occur frequently at some sites along the sequence of the mouse genome; (ii) long domains with relatively homogeneous G+C content (isochores) exist, which usually have sharp boundaries. Consequently, 28 isochores longer than 1 Mb have been identified in the mouse genome. A homogeneity index was used to quantify the variations of the G+C content within isochores. The precise boundaries, sizes, and G+C contents of these isochores have been determined. The windowless technique for the G+C content computation was also used to analyze the DNA sequence containing the mouse MHC region, which has a GC-poor isochore. This isochore is located at the central part of the sequence with boundaries at 468459 and 812716 bp, where the sequence is extended from the centromeric end to the telomeric end. In addition, the analysis of a segment of the rat genome shows that the rat genome also has clear isochore structures.

6.
Haiminen N  Mannila H 《Gene》2007,394(1-2):53-60
The isochore structure of a genome is observable by variation in the G+C (guanine and cytosine) content within and between the chromosomes. Describing the isochore structure of vertebrate genomes is a challenging task, and many computational methods have been developed and applied to it. Here we apply a well-known least-squares optimal segmentation algorithm to isochore discovery. The algorithm finds the best division of the sequence into k pieces, such that the segments are internally as homogeneous as possible. We show how this simple segmentation method can be applied to isochore discovery using as input the G+C content of sliding windows on the sequence. To evaluate the performance of this segmentation technique on isochore detection, we present results from segmenting previously studied isochore regions of the human genome. Detailed results on the MHC locus, on parts of chromosomes 21 and 22, and on a 100 Mb region from chromosome 1 are similar to previously suggested isochore structures. We also give results on segmenting all 22 autosomal human chromosomes. An advantage of this technique is that oversegmentation of G+C rich regions can generally be avoided. This is because the technique concentrates on greater global, instead of smaller local, differences in the sequence composition. The effect is further emphasized by a log-transformation of the data that lowers the high variance that is observed in G+C rich regions. We conclude that the least-squares optimal segmentation method is computationally efficient and yields results close to previous biologically motivated isochore structures.
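
The idea of dividing a sequence into k internally homogeneous pieces can be sketched with a textbook dynamic program. This is an illustrative sketch only, not the authors' implementation: it assumes the input is a list of per-window G+C values, measures homogeneity as the sum of squared deviations from each segment mean, and omits the log-transformation mentioned in the abstract. The function name `segment_least_squares` is hypothetical.

```python
def segment_least_squares(values, k):
    """Split `values` into k contiguous segments minimising the total
    squared deviation of each value from its segment mean.
    Returns (segments, cost), where segments are half-open (start, end) pairs."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")

    # Prefix sums allow the cost of any segment to be computed in O(1).
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        """Sum of squared deviations from the mean for values[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting values[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i

    # Walk the back-pointers from the end to recover the segments.
    segments, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        segments.append((i, j))
        j = i
    segments.reverse()
    return segments, best[k][n]


if __name__ == "__main__":
    gc_percent = [40.0] * 5 + [55.0] * 4 + [38.0] * 6  # made-up window values
    print(segment_least_squares(gc_percent, 3))
```

On the made-up input above, the three recovered segments are (0, 5), (5, 9) and (9, 15), which match the three plateaus. The triple loop costs O(k·n²) time, which is adequate for a sketch but may need refinement for whole-chromosome inputs.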

7.
Hirsch R. P. 1979. Distribution of Polymorphus minutus among its intermediate hosts. International Journal for Parasitology 10: 243–248. In 1971, Crofton investigated patterns of distribution of Polymorphus minutus in the intermediate host, Gammarus pulex. Among his conclusions were: (1) P. minutus populations occur in patterns similar to negative binomial distributions, and (2) parasite-induced host mortality results in patterns similar to truncated (high end) negative binomial distributions. Those conclusions, however, were not tested by statistical analyses. To test Crofton's observations, chi-square goodness-of-fit tests were applied to the data used by Crofton and to an additional two stations sampled by Hynes & Nicholas in 1963. Analyses were expanded to include five theoretical distributions, four patterns of host mortality and various rates of host mortality. Truncated forms of negative binomial, positive binomial and Poisson distributions were also investigated where nontruncated distributions failed to fit observed distributions. It was found that negative binomial distributions most frequently describe patterns of P. minutus distribution, with the exception of one population described by a Poisson distribution and another by a positive binomial distribution. Crofton's assumption that truncated distributions result from parasite-induced host mortality seems unlikely in light of those analyses.

8.
When the results of biological experiments are tested statistically for a possible difference between a treatment and its control, the statistical inferences are valid only if the statistical procedure is derived from a model that fits the experimental results satisfactorily. In this paper it is shown that a beta-binomial distribution provided a better fit than a binomial distribution when the data used were based on a large number of counts of dead and total implants in dominant-lethal tests on mice. This suggests that the probability P that an implant will die is not constant over the experimental units. Tests derived from the beta-binomial distribution have been used and their results compared with those of the tests based on the binomial or constant P assumption. The tests based on the binomial model are erroneously too severe when P is not constant in a group. The problem of knowing whether the males or the females should be considered as the experimental units is considered. In this paper, calculations are carried out for the two situations. This problem will be further studied by computer simulation and the result will be presented in a subsequent paper. It is also shown that a negative binomial distribution could be fitted to the dead implant counts. No test based on this model was used because it ignores the total implants. No familiar distribution could be fitted to the total implant counts.

9.
Statistical tests for differential expression in cDNA microarray experiments   Total citations: 13, self-citations: 0, citations by others: 13
Extracting biological information from microarray data requires appropriate statistical methods. The simplest statistical method for detecting differential expression is the t test, which can be used to compare two conditions when there is replication of samples. With more than two conditions, analysis of variance (ANOVA) can be used, and the mixed ANOVA model is a general and powerful approach for microarray experiments with multiple factors and/or several sources of variation.

10.
The human genome is composed of large sequence segments with fairly homogeneous GC content, namely isochores, which have been linked to many important functions; biological implications of most isochore boundaries, however, remain elusive, partly due to the difficulty in determining these boundaries at high resolution. Using the segmentation algorithm based on the quadratic divergence, we re-determined all 79 boundaries of previously identified human isochores at single-nucleotide resolution, and then compared the boundary coordinates with other genome features. We found that 55.7% of isochore boundaries coincide with termini of repeat elements; 45.6% of isochore boundaries coincide with termini of highly conserved sequences based on alignment of 17 vertebrate genomes, i.e., the highly conserved genome sequence switches to a less or non-conserved one at the isochore boundary; some isochore boundaries coincide with abrupt change of CpG island distribution (note that one boundary can associate with more than one genome feature). In addition, sequences around isochore boundaries are highly conserved. It seems reasonable to deduce that the boundaries of all the isochores studied here would be replication timing sites in the human genome. These results suggest possible key roles of the isochore boundaries and may further our understanding of the human genome organization.

11.
Symbolic logic, as used in the formal theory of scientific explanation proposed by Hempel and Oppenheim, has been suggested as the basis for automated medical diagnosis. In human autopsy pathology the determination of cause-and-effect relationships is a major area subject to logical analysis. We propose a modification of the Hempel-Oppenheim schema in which the logical relationships must only be satisfied “much” of the time, as determined by binomial significance tests. The analysis employs “certainty levels” logic with a more limited consistency requirement than classical logic. The analysis is applied to a series of 181 autopsied patients with leukemia in an attempt to determine a possible role of chemotherapeutic agents in the etiology of pulmonary edema. Among 51 patients who had received cytosine arabinoside (Ara-C) within 30 days of death, there was significantly more unexplained moderate or massive pulmonary edema than among patients with no or remote therapy (p<0.001). The results suggest that a symbolic logical analysis combined with a binomial significance test can elucidate cause-and-effect relationships observed at autopsy, especially when there are multiple possible explanations for the same effect.

12.
Configural Frequency Analysis (CFA) is being increasingly used by psychologists and other researchers to test for the presence of combinations of categorical variables which occur more frequently or less frequently than expected under a particular model of chance. Configurations which occur more frequently than chance are known as “Types”; configurations which are conspicuous by their absence or rarity are known as “Antitypes”. Most configural frequency test theory consists of binomial tests applied to the cells of a cross-tabulation table. The wide variety of statistical tests described in papers and books on CFA are approximations to the binomial test, adopted because of the computational intensity associated with performing binomial tests directly (VON EYE, 1990b). This paper advocates direct computation of binomial probabilities instead of the usual approximations used in CFA. Mathematical relationships of the binomial distribution with the F and incomplete beta distributions are described which enable the researcher to efficiently compute binomial probabilities using functions available in common statistical software. The classical inference approach adopted by traditional CFA makes it difficult to draw conclusions regarding the likely prevalence rates of types or antitypes in the reference population. It is also not possible to exploit additional information about the sample which, while not known precisely, is known with a degree of confidence and can aid in the identification of types and antitypes. A Bayesian conjugate distributions approach based on the incomplete beta distribution is proposed. Bayesian extensions of this model to both classical CFA and a sequential CFA analysis advanced by KIESER and VICTOR (1991) are described.
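
For orientation, the relationships between the binomial, incomplete beta and F distributions referred to in this abstract are presumably the standard identities below; the abstract does not state them, so the notation is an assumption. Here X is binomial with n trials and success probability p, I_p(a, b) is the regularized incomplete beta function, and F_{d1, d2} denotes an F-distributed variable with d1 and d2 degrees of freedom.

```latex
% Upper binomial tail expressed through the regularized incomplete beta function
% (valid for 1 <= k <= n):
\[
  P(X \ge k) \;=\; \sum_{i=k}^{n} \binom{n}{i}\, p^{i} (1-p)^{\,n-i}
  \;=\; I_{p}\bigl(k,\; n-k+1\bigr).
\]
% Equivalent form through the F distribution, using the fact that
% Y ~ Beta(a, b) implies bY / (a(1-Y)) ~ F_{2a, 2b}:
\[
  P(X \ge k) \;=\;
  P\!\left( F_{2k,\; 2(n-k+1)} \;\le\; \frac{(n-k+1)\, p}{k\,(1-p)} \right).
\]
```

Either form lets an upper-tail binomial probability be read off from a beta or F cumulative distribution function, which is what makes direct computation practical in common statistical software.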

13.
In this study, we present a new method for evaluating animal evolutionary relationships. We used the GC% levels of genome-wide genes to determine the correlation between the GC% content and evolutionary relationship. The correlation coefficients of the GC% content of the orthologous genes of the paired animal species were calculated for a total of 21 species, and the evolutionary branching dates of these 21 species were derived from fossil records. The correlation coefficient of the GC% content of the orthologous genes of the species pair under study served as an indicator of their evolutionary relationship. Moreover, there was a decreasing linear relationship between the correlation coefficient and evolutionary branching date (R² = 0.930).

14.
The mammalian genome is not a random sequence but shows a specific, evolutionarily conserved structure that becomes manifest in its isochore pattern. Isochores, i.e. stretches of DNA with a distinct sequence composition and thus a specific GC content, cause the chromosomal banding pattern. This fundamental level of genome organization is related to several functional features like the replication timing of a DNA sequence. GC richness of genomic regions generally corresponds to an early replication time during S phase. Recently, we demonstrated this interdependency on a molecular level for an abrupt transition from a GC-poor isochore to a GC-rich one in the NF1 gene region; this isochore boundary also separates late from early replicating chromatin. Now, we have analyzed another genomic region containing four isochores separated by three sharp isochore transitions. Again, the GC-rich isochores were found to be replicating early, the GC-poor isochores late in S phase; one of the replication time zones was discovered to consist of one single replicon. At the boundaries between isochores, none of which shows special sequence elements, the replication machinery stopped for several hours. Thus, our results emphasize the importance of isochores as functional genomic units, and of isochore transitions as genomic landmarks with a key function for chromosome organization and basic biological properties.

15.
Analytical DNA ultracentrifugation revealed that eukaryotic genomes are mosaics of isochores: long DNA segments (>300 kb on average) relatively homogeneous in G+C. Important genome features are dependent on this isochore structure, e.g. genes are found predominantly in the GC-richest isochore classes. However, no reliable method is available to rigorously partition the genome sequence into relatively homogeneous regions of different composition, thereby revealing the isochore structure of chromosomes at the sequence level. Homogeneous regions are currently ascertained by plain statistics on moving windows of arbitrary length, or simply by eye on G+C plots. On the contrary, the entropic segmentation method is able to divide a DNA sequence into relatively homogeneous, statistically significant domains. An early version of this algorithm only produced domains having an average length far below the typical isochore size. Here we show that an improved segmentation method, specifically intended to determine the most statistically significant partition of the sequence at each scale, is able to identify the boundaries between long homogeneous genome regions displaying the typical features of isochores. The algorithm precisely locates classes II and III of the human major histocompatibility complex region, two well-characterized isochores at the sequence level, the boundary between them being the first isochore boundary experimentally characterized at the sequence level. The analysis is then extended to a collection of human large contigs. The relatively homogeneous regions we find show many of the features (G+C range, relative proportion of isochore classes, size distribution, and relationship with gene density) of the isochores identified through DNA centrifugation. Isochore chromosome maps, with many potential applications in genomics, are then drawn for all the completely sequenced eukaryotic genomes available.

16.
Evolution of isochores in rodents   Total citations: 4, self-citations: 1, citations by others: 3
The most deviant isochore pattern within mammals was found in rat and mouse; most other mammals possess a different kind of isochore organization called the "general pattern." However, isochore patterns remain largely unknown in rodents other than mouse and rat. To investigate the taxonomic distribution of isochore patterns in rodents, we sequenced the nuclear gene LCAT (lecithin:cholesterol acyltransferase) from 17 rodent species (bringing the total of LCAT sequences in rodents to 19) and compared their GC contents at third codon positions and in introns. We also analyzed an extensive sequence database from rodents other than rat and mouse. All murid LCAT sequences are much poorer in GC than all nonrodent LCAT sequences, and the hamster sequence database shows exactly the same isochore pattern as rat and mouse. Thus, all murids share the same special isochore pattern: GC homogenization. LCAT sequences are GC-poor in hystricomorphs too, but the guinea pig sequence database indicates that large changes in GC content occur without an overall modification of the isochore pattern. This novel mode of isochore evolution is called GC reordering. LCAT sequences also show that the evolution of isochores in sciurids and glirids is nonconservative in comparison with that in nonrodents. Thus, at least two novel patterns of isochore evolution were found. No rodent investigated to date shared the general mammalian pattern.

17.
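Analysis of common statistical problems in ecological research   Total citations: 6, self-citations: 0, citations by others: 6
Statistical methods are increasingly widely used in contemporary ecological research and have played a positive role in advancing ecological science and raising the standard of research. It should not be overlooked, however, that several problems arise when statistical methods are applied in ecological research, mainly: 1) Problems in regression analysis. For linear regression equations, the correlation index R² is used to describe the significance of the regression; for curvilinear regression equations, the correlation coefficient r is often used to express significance; for multiple linear regression equations, only the equation as a whole is tested for significance, without testing each individual regression coefficient. 2) Problems in analysis of variance. When there are more than two treatments, the t-test is inappropriately used to compare the significance of differences between means. This paper analyzes the causes of these problems and proposes countermeasures for improvement.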

18.
The human genome is composed of large sequence segments with fairly homogeneous GC content, namely isochores, which have been linked to many important functions; biological implications of most isochore boundaries, however, remain elusive, partly due to the difficulty in determining these boundaries at high resolution. Using the segmentation algorithm based on the quadratic divergence, we re-determined all 79 boundaries of previously identified human isochores at single-nucleotide resolution, and then compared the boundary coordinates with other genome features. We found that 55.7% of isochore boundaries coincide with termini of repeat elements; 45.6% of isochore boundaries coincide with termini of highly conserved sequences based on alignment of 17 vertebrate genomes, i.e., the highly conserved genome sequence switches to a less or non-conserved one at the isochore boundary; some isochore boundaries coincide with abrupt change of CpG island distribution (note that one boundary can associate with more than one genome feature). In addition, sequences around isochore boundaries are highly conserved. It seems reasonable to deduce that the boundaries of all the isochores studied here would be replication timing sites in the human genome. These results suggest possible key roles of the isochore boundaries and may further our understanding of the human genome organization.

19.
One of the fascinating properties of the DNA sequences of prokaryotic and eukaryotic chromosomes is that they possess long-range order. Computational methods like spectral analysis, mutual information and DNA random walks have been used to probe long-range order via long-range correlations. This work attempts to show the advantage of using the information-theoretic measure of mutual information for this purpose. A number Mu is found which indicates the existence of long-range order. Mu is the ratio of the value of the mutual information function between two nucleotides of a DNA sequence separated by a large distance of 100 kilobases to the value expected from a randomized sequence of the same DNA. It is found that in spite of the constant shuffling of nucleotides due to insertion, deletion, inversion and recombination that occur during evolution, the chromosomal structure of prokaryotes is not always mosaic. While all archaeal chromosomes show mosaic structure and lack long-range order, a sizable fraction of the bacterial chromosomes do possess long-range order. A statistical multivariate analysis has been done to find which of the physical variables like genome size or GC% affects the organization of the chromosome or correlates with the long-range order. The existence of long-range order in a bacterial chromosome could be directly correlated with the degree of gene strand bias it shows. Firmicutes, which have low GC content, also have pronounced strand bias and show long-range correlations. It is observed that the occurrence of long-range order in bacteria is independent of genome size, but depends on GC content and gene strand bias.
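
The ratio Mu described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: mutual information is computed in bits from the observed frequencies of nucleotide pairs a fixed distance k apart, and the randomized baseline is a single seeded shuffle of the same sequence. The function names `mutual_information` and `mu_ratio`, the `seed` parameter and the toy sequence are all hypothetical; the abstract uses a separation of 100 kilobases on whole chromosomes.

```python
import random
from collections import Counter
from math import log2


def mutual_information(seq: str, k: int) -> float:
    """Mutual information (bits) between symbols k positions apart in seq."""
    pairs = Counter(zip(seq, seq[k:]))
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("k must be smaller than the sequence length")
    # Marginal counts are taken from the same pairs, so the result is >= 0.
    left, right = Counter(), Counter()
    for (a, b), count in pairs.items():
        left[a] += count
        right[b] += count
    return sum(
        (count / total) * log2(count * total / (left[a] * right[b]))
        for (a, b), count in pairs.items()
    )


def mu_ratio(seq: str, k: int, seed: int = 0) -> float:
    """Ratio of the observed mutual information at distance k to that of
    one shuffled copy of the same sequence (a single-shuffle baseline)."""
    symbols = list(seq)
    random.Random(seed).shuffle(symbols)
    baseline = mutual_information("".join(symbols), k)
    observed = mutual_information(seq, k)
    return observed / baseline if baseline > 0 else float("inf")


if __name__ == "__main__":
    toy = "ACGT" * 50  # strongly periodic toy sequence, not real data
    print(mutual_information(toy, 4), mu_ratio(toy, 4))
```

In practice the baseline would probably be averaged over many shuffles rather than taken from one, since the mutual information of a finite shuffled sequence is small but noisy.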

20.
An isochore map of the human genome based on the Z curve method   Total citations: 4, self-citations: 0, citations by others: 4
Zhang CT  Zhang R 《Gene》2003,317(1-2):127-135
The distribution of the G+C content in the human genome has been studied by using a windowless technique derived from the Z curve method. The most important findings presented in this paper are twofold. First, abrupt variations of the G+C content along human chromosome sequences are the main variation patterns of G+C content. It is found that at some sites, the G+C content changes abruptly from a G+C-rich region to a G+C-poor region, and vice versa. Second, it is shown that long domains with relatively homogeneous G+C content along each chromosome do exist. These domains are thought to be isochores, which usually have sharp boundaries. Consequently, 56 isochores longer than 3 Mb have been identified in chromosomes 1-22, X and Y. Boundaries, size and G+C content of each isochore identified are listed in detail. As an example to demonstrate the power of the method, the boundary between the Classes III and II isochores of the MHC sequence has been determined and found to be at 2,477,936, which is in good agreement with the experimental evidence. A homogeneity index is introduced to measure the homogeneity of G+C content in isochores. We emphasize that the homogeneity of G+C content is relative. Isochores in which the G+C content remains absolutely constant do not exist. Isochore structures appear to be a basic organization of the human genome. Due to the relevance to many important biological functions, the clarification of isochore structures will provide much insight into the understanding of the human genome.
