Similar literature
 20 similar articles found; search time 15 ms
1.
Summary: Methods for performing multiple tests of paired proportions are described. A broadly applicable method using McNemar's exact test and the exact distributions of all test statistics is developed; the method controls the familywise error rate in the strong sense under minimal assumptions. A closed-form (not simulation-based) algorithm for carrying out the method is provided. A bootstrap alternative is developed to account for correlation structures. Operating characteristics of these and other methods are evaluated via a simulation study. Applications to multiple comparisons of predictive models for disease classification and to postmarket surveillance of adverse events are given.
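
As a rough illustration of the ingredients named in this abstract, the sketch below computes the two-sided exact McNemar p-value from the discordant-pair counts and then applies a Holm step-down adjustment across several tests. Holm's procedure is a stand-in chosen here for brevity; it is not the paper's method, which uses the exact distributions of all test statistics. The function names and the counts in the usage note are hypothetical.

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from discordant-pair counts b and c.

    Under the null, the smaller count follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence against the null
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

def holm_adjust(pvalues):
    """Holm step-down adjusted p-values (assumed stand-in, not the paper's method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the rank-th smallest p-value by (m - rank), keep monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted
```

For example, `holm_adjust([mcnemar_exact_p(b, c) for b, c in [(1, 9), (5, 5), (2, 12)]])` would return one adjusted p-value per paired comparison, in the input order.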

2.
The molecular clock theory has greatly enlightened our understanding of macroevolutionary events. Maximum likelihood (ML) estimation of divergence times involves the adoption of fixed calibration points, and the confidence intervals associated with the estimates are generally very narrow. The credibility intervals are inferred assuming that the estimates are normally distributed, which may not be the case. Moreover, calculation of standard errors is usually carried out by the curvature method and is complicated by the difficulty in approximating second derivatives of the likelihood function. In this study, a standard primate phylogeny was used to examine the standard errors of ML estimates via the bootstrap method. Confidence intervals were also assessed from the posterior distribution of divergence times inferred via Bayesian Markov Chain Monte Carlo. For the primate topology under evaluation, no significant differences were found between the bootstrap and the curvature methods. Also, Bayesian confidence intervals were always wider than those obtained by ML.  相似文献   

3.
Assessment of the reliability of a given phylogenetic hypothesis is an important step in phylogenetic analysis. Historically, the nonparametric bootstrap procedure has been the most frequently used method for assessing the support for specific phylogenetic relationships. The recent employment of Bayesian methods for phylogenetic inference problems has resulted in clade support being expressed in terms of posterior probabilities. We used simulated data and the four-taxon case to explore the relationship between nonparametric bootstrap values (as inferred by maximum likelihood) and posterior probabilities (as inferred by Bayesian analysis). The results suggest a complex association between the two measures. Three general regions of tree space can be identified: (1) the neutral zone, where differences between mean bootstrap and mean posterior probability values are not significant, (2) near the two-branch corner, and (3) deep in the two-branch corner. In the last two regions, significant differences occur between mean bootstrap and mean posterior probability values. Whether bootstrap or posterior probability values are higher depends on the data in support of alternative topologies. Examination of star topologies revealed that both bootstrap and posterior probability values differ significantly from theoretical expectations; in particular, there are more posterior probability values in the range 0.85-1 than expected by theory. Therefore, our results corroborate the findings of others that posterior probability values are excessively high. Our results also suggest that extrapolations from single topology branch-length studies are unlikely to provide any general conclusions regarding the relationship between bootstrap and posterior probability values.  相似文献   

4.
To test whether there are differences between living lineages of domestic guinea pigs (Cavia porcellus), we studied 118 specimens from six breeds collected across six Andean countries, as well as 15 from the wild cavy species (Cavia tschudii). The mean weight and body length of 15 adult wild cavies (295±31 g, 242±8.3 mm) were significantly smaller than those of 25 creole guinea pigs from Bolivia and Chile (639±157 g, 287±23.7 mm, respectively). Eighteen laboratory/pet guinea pigs (including the English Pirbright breed) were also smaller (900±173 g, 308±21 mm) than 25 improved ones from Peru (Tamborada breed, 1241±75.4 g, 317±12 mm) and Ecuador (Auqui breed, 1138±65.5 g, 307±8 mm). Similar size increases appeared in the first axis of a principal component analysis of six skeletal measurements, recovering 84% of total variation. Phylogenetic and haplotype analyses of complete cytochrome b gene sequences consistently joined all 22 domestic individuals (13 shared unambiguous substitutions, 100% bootstrap in 1000 replicates), probably from a single first ancient domestication in the western Andes. Six laboratory/pet sequences were also joined within a common branch (six shared substitutions, 96% bootstrap), probably from a documented European second phase. By contrast, those from improved Auqui joined a northern creole subgroup (one shared substitution, 84% bootstrap), and those from Nativa and improved Tamborada clustered together and with a southern creole subgroup (four shared substitutions, 86% bootstrap); this suggests at least two independent modern events during a more complex third phase, producing two improved guinea pigs selected for size and meat. Cavia tschudii sequences showed some unexpected geographic variation.

5.
Diversity indices might be used to assess the impact of treatments on the relative abundance patterns in species communities. When several treatments are to be compared, simultaneous confidence intervals for the differences of diversity indices between treatments may be used. The simultaneous confidence interval methods described until now are either constructed or validated under the assumption of the multinomial distribution for the abundance counts. Motivated by four example data sets with background in agricultural and marine ecology, we focus on the situation when available replications show that the count data exhibit extra-multinomial variability. Based on simulated overdispersed count data, we compare previously proposed methods assuming multinomial distribution, a method assuming normal distribution for the replicated observations of the diversity indices, and three different bootstrap methods to construct simultaneous confidence intervals for multiple differences of Simpson and Shannon diversity indices. The focus of the simulation study is on comparisons to a control group. The severe failure of asymptotic multinomial methods in overdispersed settings is illustrated. Among the bootstrap methods, the widely known Westfall–Young method performs best for the Simpson index, while for the Shannon index, two methods based on stratified bootstrap and summed count data are preferable. The application of the methods is illustrated with an example.
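
To make the quantities in this abstract concrete, the following sketch computes Simpson and Shannon indices from count data summed over replicates, then builds a percentile bootstrap interval for the treatment-minus-control difference by resampling whole replicates within each group. This is a single-comparison illustration under assumed definitions (Simpson as 1 - Σp², Shannon with natural logarithms); it is not simultaneous and is not the Westfall–Young procedure evaluated in the paper. All function names are hypothetical.

```python
import math
import random

def pooled_proportions(replicates):
    """Sum species counts over replicate samples and convert to proportions."""
    totals = [sum(col) for col in zip(*replicates)]
    n = sum(totals)  # assumed > 0
    return [t / n for t in totals]

def simpson(p):
    """Simpson diversity, taken here as 1 - sum(p_i^2)."""
    return 1.0 - sum(q * q for q in p)

def shannon(p):
    """Shannon diversity with natural logarithms; zero proportions are skipped."""
    return -sum(q * math.log(q) for q in p if q > 0)

def bootstrap_diff_ci(treat, control, index, n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval for index(treat) - index(control).

    Whole replicates (rows of counts) are resampled with replacement
    within each group, so between-replicate overdispersion is retained.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        t = [rng.choice(treat) for _ in treat]
        c = [rng.choice(control) for _ in control]
        diffs.append(index(pooled_proportions(t)) - index(pooled_proportions(c)))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Each group is passed as a list of replicates, each replicate being a list of species counts in a fixed species order, for example `bootstrap_diff_ci(treat, control, shannon)`.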

6.
A 1230-bp region of the cytochrome c oxidase subunit I (COI) gene of mitochondrial DNA of each of 16 brachiopod species, representing all five living orders, was amplified by polymerase chain reaction and sequenced. Pairwise comparisons of sequence differences plotted against divergence times estimated from the brachiopod fossil record revealed that, although there are considerable variations in the expected substitution rate among different lineages, amino acid substitutions of the COI sequences may largely become saturated in 100 Ma, due mostly to multiple substitutions at the same site. Coinciding with this result, phylogenetic analysis indicated low bootstrap values for nodes corresponding to divergence events that occurred before 100 Ma, suggesting that COI sequences are suitable only for inference of phylogenetic events subsequent to the Mesozoic. Examination of brachiopod codons corresponding to invariant amino acids in the COI of various other animals suggests the nonuniversal codon relationships UGA = Trp, AUA = Met, AAA/G = Lys, and AGA/G = Ser. These are identical to those in mollusks, annelids, and arthropods, consistent with the conclusion that brachiopods are protostomes, as indicated by previous molecular analyses.

7.
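The age-stage, two-sex life table (two-sex life table for short) is an important theoretical and analytical tool widely used in population ecology and pest management. TWOSEX-MSChart, a user-friendly program designed around two-sex life table theory, has in recent years been used by a growing number of researchers in China and abroad to analyze data from insect population studies. The program's analyses are supported by many statistical techniques and computer simulation methods, among which the bootstrap is one of the most important. This paper describes in detail the basic principles, methods, advantages, and disadvantages of the bootstrap and its application in two-sex life table analysis, and introduces the application of its theoretical basis, the multinomial theorem, in life table research. Compared with commonly used statistical methods, the bootstrap can describe and draw inferences about the distributional properties of a population without assuming a data distribution. In two-sex life table analysis, the bootstrap can not only estimate the variances and standard errors of population parameters or general statistics; with the paired bootstrap test it can also compare differences between treatments and accurately reveal population variability. Using the same bootstrap samples makes it possible to correctly calculate the hatch rate of insects and the contribution of different reproductive types to population parameters, and...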

8.
In experiments involving many variables, investigators typically use multiple comparisons procedures to determine differences that are unlikely to be the result of chance. However, investigators rarely consider how the magnitude of the greatest observed effect sizes may have been subject to bias resulting from multiple testing. These questions of bias become important to the extent investigators focus on the magnitude of the observed effects. As an example, such bias can lead to problems in attempting to validate results, if a biased effect size is used to power a follow-up study. An associated important consequence is that confidence intervals constructed using standard distributions may be badly biased. A bootstrap approach is used to estimate and adjust for the bias in the effect sizes of those variables showing strongest differences. This bias is not always present; some principles showing what factors may lead to greater bias are given and a proof of the convergence of the bootstrap distribution is provided.  相似文献   

9.
Investigators have linked rare copy number variation (CNVs) to neuropsychiatric diseases, such as schizophrenia. One hypothesis is that CNV events cause disease by affecting genes with specific brain functions. Under these circumstances, we expect that CNV events in cases should impact brain-function genes more frequently than those events in controls. Previous publications have applied “pathway” analyses to genes within neuropsychiatric case CNVs to show enrichment for brain-functions. While such analyses have been suggestive, they often have not rigorously compared the rates of CNVs impacting genes with brain function in cases to controls, and therefore do not address important confounders such as the large size of brain genes and overall differences in rates and sizes of CNVs. To demonstrate the potential impact of confounders, we genotyped rare CNV events in 2,415 unaffected controls with Affymetrix 6.0; we then applied standard pathway analyses using four sets of brain-function genes and observed an apparently highly significant enrichment for each set. The enrichment is simply driven by the large size of brain-function genes. Instead, we propose a case-control statistical test, cnv-enrichment-test, to compare the rate of CNVs impacting specific gene sets in cases versus controls. With simulations, we demonstrate that cnv-enrichment-test is robust to case-control differences in CNV size, CNV rate, and systematic differences in gene size. Finally, we apply cnv-enrichment-test to rare CNV events published by the International Schizophrenia Consortium (ISC). This approach reveals nominal evidence of case-association in neuronal-activity and the learning gene sets, but not the other two examined gene sets. The neuronal-activity genes have been associated in a separate set of schizophrenia cases and controls; however, testing in independent samples is necessary to definitively confirm this association. Our method is implemented in the PLINK software package.  

10.
ABSTRACT: Large-scale sequencing of genomes has enabled the inference of phylogenies based on the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Many evolutionary models and associated algorithms have been designed over the last few years and have found use in comparative genomics and phylogenetic inference. However, the assessment of phylogenies built from such data has not been properly addressed to date. The standard method used in sequence-based phylogenetic inference is the bootstrap, but it relies on a large number of homologous characters that can be resampled; yet in the case of rearrangements, the entire genome is a single character. Alternatives such as the jackknife suffer from the same problem, while likelihood tests cannot be applied in the absence of well established probabilistic models. We present a new approach to the assessment of distance-based phylogenetic inference from whole-genome data; our approach combines features of the jackknife and the bootstrap and remains nonparametric. For each feature of our method, we give an equivalent feature in the sequence-based framework; we also present the results of extensive experimental testing, in both sequence-based and genome-based frameworks. Through the feature-by-feature comparison and the experimental results, we show that our bootstrapping approach is on par with the classic phylogenetic bootstrap used in sequence-based reconstruction, and we establish the clear superiority of the classic bootstrap for sequence data and of our corresponding new approach for rearrangement data over proposed variants. Finally, we test our approach on a small dataset of mammalian genomes, verifying that the support values match current thinking about the respective branches. Our method is the first to provide a standard of assessment to match that of the classic phylogenetic bootstrap for aligned sequences. 
Its support values follow a similar scale and its receiver-operating characteristics are nearly identical, indicating that it provides similar levels of sensitivity and specificity. Thus our assessment method makes it possible to conduct phylogenetic analyses on whole genomes with the same degree of confidence as for analyses on aligned sequences. Extensions to search-based inference methods such as maximum parsimony and maximum likelihood are possible, but remain to be thoroughly tested.  相似文献   

11.
A potential limitation of data from microarray experiments exists when improper control samples are used. In cancer research, comparing tumour expression profiles to those from normal samples is challenging due to tissue heterogeneity (mixed cell populations). A specific example exists in a published colon cancer dataset, in which tissue heterogeneity was reported among the normal samples. In this paper, we show how to overcome or avoid the problem of using normal samples that do not derive from the same tissue of origin as the tumour. We advocate an exploratory unsupervised bootstrap analysis that can reveal unexpected and undesired, but strongly supported, clusters of samples that reflect tissue differences instead of tumour versus normal differences. All of the algorithms used in the analysis, including the maximum difference subset algorithm, unsupervised bootstrap analysis, pooled variance t-test for finding differentially expressed genes and the jackknife to reduce false positives, are incorporated into our online Gene Expression Data Analyzer (http://bioinformatics.upmc.edu/GE2/GEDA.html).

12.
Randomized clinical trials with time-to-event endpoints are frequently stopped after a prespecified number of events has been observed. This practice leads to dependent data and nonrandom censoring, which can in general not be solved by conditioning on the underlying baseline information. In case of staggered study entry, matters are complicated substantially. The present paper demonstrates that the study design at hand entails general independent censoring in the counting process sense, provided that the analysis is based on study time information only. To illustrate that the filtrations must not use abundant information, we simulated data of event-driven trials and evaluated them by means of Cox regression models with covariates for the calendar times. The Breslow curves of the cumulative baseline hazard showed considerable deviations, which implies that the analysis is disturbed by conditioning on the calendar time variables. A second simulation study further revealed that Efron's classical bootstrap, unlike the (martingale-based) wild bootstrap, may lead to biased results in the given setting, as the assumption of random censoring is violated. This is exemplified by an analysis of data on immunotherapy in patients with advanced, previously treated nonsmall cell lung cancer.  相似文献   

13.
Heterochrony, differences in the timing of developmental events between descendent species and their ancestors, is a pervasive evolutionary pattern. However, the origins of such timing changes are still not resolved. Here we show, using sequence analysis, that exposure to predator cues altered the timing of onset of several developmental events in embryos of two closely related gastropod species: Radix balthica and Radix auricularia. These timing alterations were limited to certain events and were species-specific. Compared with controls, over half (62%) of exposed R. auricularia embryos had a later onset of body flexing and an earlier occurrence of the eyes and the heart; in R. balthica, 67 per cent of exposed embryos showed a later occurrence of mantle muscle flexing and an earlier attachment to, and crawling on, the egg capsule wall. The resultant developmental sequences in treated embryos converged, and were more similar to one another than were the sequences of the controls for both species. We conclude that biotic agents can elicit altered event timing in developing gastropod embryos. These changes were species-specific, but did not occur in all individuals. Such developmental plasticity in the timing of developmental events could be an important step in generating interspecific heterochrony.  相似文献   

14.
A method is presented for removing recent homoplastic events from a phylogenetic tree. This “topiary pruning” method produces a series of progressively modified duplicates of the original set of data, from which more and more of the most recent substitutions have been removed. The edited sets of data have increased amounts of information per remaining taxon, while similar but randomized data sets subjected to topiary pruning do not. The ability of topiary pruning to “unscramble” artificial data sets that have high levels of homoplasy is demonstrated, and is shown to be similar in its effects to the weighting method of Kluge and Farris (1969), although with the additional advantage of reducing the number of taxa to the point where bootstrapping is feasible. Pruning and weighting used together produce closer approximations to the “true” tree than either method used separately. It is further shown that in these artificial data sets midpoint rooting is more likely to be accurate than outgroup rooting. When pruning and weighting are applied to the extensive sets of mitochondrial DNA data of Cann et al. (1987) and Vigilant et al. (1991), trees result that have deep branch points, some of which lead to entirely African branches. In the case of the Vigilant et al. data, the three African branches have bootstrap values between 0.94 and 1.0, and the consensus and bootstrap midpoint roots also have high bootstrap values and occur on these African branches near their junction. An African origin of the human mitochondrial tree is not proved by this approach, particularly since sequences from non-African groups are underrepresented in current data sets, but it is rendered more likely.  相似文献   

15.
We have analyzed what phylogenetic signal can be derived by small subunit rRNA comparison for bacteria of different but closely related genera (enterobacteria) and for different species or strains within a single genus (Escherichia or Salmonella), and finally how similar are the ribosomal operons within a single organism (Escherichia coli). These sequences have been analyzed by neighbor-joining, maximum likelihood, and parsimony. The robustness of each topology was assessed by bootstrap. Sequences were obtained for the seven rrn operons of E. coli strain PK3. These data demonstrated differences located in three highly variable domains. Their nature and localization suggest that since the divergence of E. coli and Salmonella typhimurium, most point mutations that occurred within each gene have been propagated among the gene family by conversions involving short domains, and that homogenization by conversions may not have affected the entire sequence of each gene. We show that the differences that exist between the different operons are ignored when sequences are obtained either after cloning of a single operon or directly from polymerase chain reaction (PCR) products. Direct sequencing of PCR products produces a mean sequence in which mutations present in the most variable domains become hidden. Cloning a single operon results in a sequence that differs from that of the other operons and of the mean sequence by several point mutations. For identification of unknown bacteria at the species level or below, a mean sequence or the sequence of a single nonidentified operon should therefore be avoided. Taking into account the seven operons and therefore mutations that accumulate in the most variable domains would perhaps increase tree resolution. 
However, if gene conversions that homogenize the rRNA multigene family are rare events, some nodes in phylogenetic trees will reflect these recombination events and these trees may therefore be gene trees rather than organismal trees.   相似文献   

16.
Two mitochondrial genes, Cytochrome b (Cytb) and Cytochrome c oxidase subunit I (COI), have been used as phylogenetic markers in Chironomids. The nucleotide sequences of 685 bp from Cytb and 596 bp from COI have been determined for 36 Chironomus species from the Palearctic, or Holarctic, and Australasia. The concatenated sequence of 1281 bp from both genes was used to investigate the phylogenetic relationships among these species. The nucleotide sequence alignments were used for construction of phylogenetic trees based on maximum-parsimony and neighbor-joining methods. Both techniques produced similar phylogenies. Monophyly of the genus Chironomus is supported by a bootstrap value of 100% at the basal branch. Six clusters of species have been revealed with high bootstrap values supporting both monophyly of each cluster and the validity of the branching order within each cluster. Four species, C. circumdatus, C. nepeanensis, C. dorsalis, and C. crassiforceps, cannot be placed into any cluster. Cytological phylogenies were constructed using the same set of species, except for C. biwaprimus. These trees showed many similarities to that obtained from the mitochondrial (mt) sequence analysis, but also a number of significant differences. When compared with the tree constructed from the sequence of 23 species available for one of the globin genes, globin 2b (gb2b), there was better support for the mt tree than for the cytological trees. An intron, which varies in its occurrence and position in gb2b, was also investigated and the distribution of the introns supports the phylogenetic history of the genus Chironomus obtained with mt data. The differences observed in the cytological trees seem to be attributable more to the retention of the same chromosome banding sequence across several species, rather than convergent evolutionary events. 
An important question is the determination of the position of the subgenus Camptochironomus in relation to the representatives of the nominal subgenus Chironomus, since it has been suggested that this is a separate genus. The Camptochironomus species are internal to the trees and have arisen more recently than some of the species of the subgenus Chironomus, indicating that they are not sufficiently differentiated to be considered more than a subgenus.  相似文献   

17.
Ongoing hybridization and retained ancestral polymorphism in rapidly radiating lineages could mask recent cladogenetic events. This presents a challenge for the application of molecular phylogenetic methods to resolve differences between closely related taxa. We reanalyzed published genotyping‐by‐sequencing (GBS) data to infer the phylogeny of four species within the Ophrys sphegodes complex, a recently radiated clade of orchids. We used different data filtering approaches to detect different signals contained in the dataset generated by GBS and estimated their effects on maximum likelihood trees, global FST and bootstrap support values. We obtained a maximum likelihood tree with high bootstrap support, separating the species by using a large dataset based on loci shared by at least 30% of accessions. Bootstrap and FST values progressively decreased when filtering for loci shared by a higher number of accessions. However, when filtering more stringently to retain homozygous and organellar loci, we identified two main clades. These clades group individuals independently from their a priori species assignment, but were associated with two organellar haplotype clusters. We infer that a less stringent filtering preferentially selects for rapidly evolving lineage‐specific loci, which might better delimit lineages. In contrast, when using homozygous/organellar DNA loci the signature of a putative hybridization event in the lineage prevails over the most recent phylogenetic signal. These results show that using differing filtering strategies on GBS data could dissect the organellar and nuclear DNA phylogenetic signal and yield novel insights into relationships between closely related species.  相似文献   

18.
In microarray studies it is common that the number of replications (i.e. the sample size) is small and that the distribution of expression values differs from normality. In this situation, permutation and bootstrap tests may be appropriate for the identification of differentially expressed genes. However, unlike bootstrap tests, permutation tests are not suitable for very small sample sizes, such as three per group. A variety of different bootstrap tests exists. For example, it is possible to adjust the data to have a common mean before the bootstrap samples are drawn. For small significance levels, which can occur when a large number of genes is investigated, the original bootstrap test, as well as a bootstrap test suggested for the Behrens-Fisher problem, have no power in cases of very small sample sizes. In contrast, the modified test based on adjusted data is powerful. Using a Monte Carlo simulation study, we demonstrate that the difference in power can be huge. In addition, the different tests are illustrated using microarray data.  相似文献   
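
The idea of adjusting the data to a common mean before drawing bootstrap samples can be sketched as follows. Each group is shifted so that both share the pooled mean, which imposes the null hypothesis of equal means; groups are then resampled separately. The test statistic used here, the absolute difference in group means, is an assumption made for simplicity, and the paper's tests may use a studentized statistic instead. The function name is hypothetical.

```python
import random
from statistics import mean

def shifted_bootstrap_test(x, y, n_boot=5000, seed=0):
    """Two-sample bootstrap test on data shifted to a common mean.

    Returns a p-value for the null hypothesis of equal means,
    using |mean(x) - mean(y)| as the (assumed) test statistic.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    grand = mean(list(x) + list(y))
    # Shift each group so both have the pooled mean (null imposed).
    x0 = [v - mean(x) + grand for v in x]
    y0 = [v - mean(y) + grand for v in y]
    hits = 0
    for _ in range(n_boot):
        xb = [rng.choice(x0) for _ in x0]
        yb = [rng.choice(y0) for _ in y0]
        if abs(mean(xb) - mean(yb)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_boot + 1)
```

With three observations per group, each group has only 27 ordered resamples, so the attainable p-values are coarse; this is one reason the choice of bootstrap variant matters at small significance levels.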

19.
The bootstrap is an important tool for estimating the confidence interval of monophyletic groups within phylogenies. Although bootstrap analyses are used in most evolutionary studies, there is no clear consensus as to how best to interpret bootstrap probability values. To study the bootstrap method further, nine small subunit ribosomal DNA (SSU rDNA) data sets were submitted to bootstrapped maximum parsimony (MP) analyses using unweighted and weighted sequence positions. Analyses of the lengths (i.e., parsimony steps) of the bootstrap trees show that the shape and mean of the bootstrap tree distribution may provide important insights into the evolutionary signal within the sequence data. With complex phylogenies containing nodes defined by short internal branches (multifurcations), the mean of the bootstrap tree distribution may differ by 2 standard deviations from the length of the best tree found from the original data set. Weighting sequence positions significantly increases the bootstrap values at internal nodes. There may, however, be strong bootstrap support for conflicting species groupings among different data sets. This phenomenon appears to result from a correlation between the topology of the tree used to create the weights and the topology of the bootstrap consensus tree inferred from the MP analysis of these weighted data. The analyses also show that characteristics of the bootstrap tree distribution (e.g., skewness) may be used to choose between alternative weighting schemes for phylogenetic analyses.
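
The resampling step behind a phylogenetic bootstrap can be sketched as below: alignment columns (sites) are drawn with replacement to form pseudo-alignments of the original length. Tree inference on each replicate (for example by maximum parsimony) and the tallying of clade frequencies are deliberately left out. The alignment representation, a dict mapping taxon names to equal-length strings, and the function names are assumptions for illustration.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns (sites) with replacement.

    `alignment` maps taxon name -> aligned sequence; all sequences
    are assumed to have the same length.
    """
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same column indices are applied to every taxon, so each
    # resampled site is an intact column of the original alignment.
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}

def bootstrap_replicates(alignment, n_reps=100, seed=0):
    """Return n_reps pseudo-alignments, reproducible for a given seed."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_reps)]
```

A bootstrap value for a clade would then be the fraction of replicate trees that contain it.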

20.
Introduction: Self-reported household pesticide use has been associated with higher risk of childhood leukemia in a number of case–control studies. The aim of this study is to assess the reliability of self-reported household use of pesticides and potential differences in reliability by case–control status, and by socio-demographic characteristics. Methods: Analyses are based on a subset of the Northern California Childhood Leukemia Study population. Eligible households included those with children less than 8 years old who lived in the same residence since diagnosis (reference date for controls). The reliability was based on two repeated in-person interviews. Kappa and percent positive and negative agreement were used to assess reliability of responses to ever/never use of six pesticide categories. Results: Kappa statistics ranged from 0.31 to 0.61 (fair to substantial agreement), with 9 out of the 12 tests indicating moderate agreement. The percent positive agreement ranged from 46 to 80% and the percent negative agreement from 54 to 95%. Reliability for all pesticide types as assessed by the three reliability measures did not differ significantly for cases and controls as confirmed by bootstrap analysis. For most pesticide types, Kappa and percent positive agreement were higher for non-Hispanics than Hispanics and for households with higher income vs. lower income. Conclusions: Reproducibility of maternal-reported pesticide use was moderate to high and was similar among cases and controls, suggesting that differential recall is not likely to be a major source of bias.
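
The three reliability measures named in this abstract can be computed from a 2×2 table of repeated ever/never responses. The sketch below uses the standard definitions of Cohen's kappa and of proportions of positive and negative agreement; the cell labels (a = yes at both interviews, b and c = discordant, d = no at both) and the function name are assumptions, and the study's exact computation is not given in the abstract.

```python
def agreement_measures(a, b, c, d):
    """Kappa, positive agreement and negative agreement for a 2x2 table.

    a: yes/yes, b: yes/no, c: no/yes, d: no/no across the two interviews.
    Assumes the table is not degenerate (chance agreement < 1).
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Agreement expected by chance from the marginal totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    positive = 2 * a / (2 * a + b + c)
    negative = 2 * d / (2 * d + b + c)
    return kappa, positive, negative
```

For a table with 40 yes/yes, 10 yes/no, 10 no/yes and 40 no/no responses, observed agreement is 0.8 and chance agreement is 0.5, giving kappa of 0.6 and positive and negative agreement of 0.8 each.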
