Similar articles
1.
Large bioinformatics data sets make cluster identification time-consuming, which motivates a parallel algorithm for identifying dense clusters in a noisy background. Our algorithm works on a graph representation of the data set to be analyzed. It identifies clusters by finding densely intraconnected subgraphs. We employ a minimum spanning tree (MST) representation of the graph and solve the cluster identification problem using this representation. The computational bottleneck of our algorithm is the construction of an MST of a graph, for which a parallel algorithm is employed. Our high-level strategy for the parallel MST construction algorithm is to first partition the graph, then construct MSTs for the partitioned subgraphs and for auxiliary bipartite graphs based on the subgraphs, and finally merge these MSTs to derive an MST of the original graph. The computational results indicate that when running on 150 CPUs, our algorithm can solve a cluster identification problem on a data set with 1,000,000 data points almost 100 times faster than on a single CPU, indicating that the program can handle very large data clustering problems efficiently. We have implemented the clustering algorithm as the software CLUMP.
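The partition-and-merge strategy described in this abstract can be illustrated with a small serial sketch. It is not the CLUMP implementation: the round-robin partitioning, the Euclidean distance, and the names `kruskal` and `partitioned_mst` are assumptions made for illustration, and the per-part loops that a parallel version would distribute across CPUs are run one after another here. The sketch relies on the cycle property of MSTs: an edge left out of the MST of a subgraph cannot be needed in the MST of the full graph, so the union of the partial MSTs is enough to recover it.

```python
import itertools
import math
import random


def kruskal(num_nodes, edges):
    """Return the edges (weight, u, v) of a minimum spanning forest (Kruskal)."""
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def partitioned_mst(points, num_parts):
    """Build an MST of the complete Euclidean graph by partition and merge."""
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Hypothetical partitioning: assign points to parts round-robin.
    parts = [list(range(k, n, num_parts)) for k in range(num_parts)]
    candidates = set()
    # Step 1: MST of each partitioned subgraph (independent tasks).
    for part in parts:
        edges = [(dist(i, j), i, j) for i, j in itertools.combinations(part, 2)]
        candidates.update(kruskal(n, edges))
    # Step 2: MST of each bipartite graph between two parts (independent tasks).
    for a, b in itertools.combinations(parts, 2):
        edges = [(dist(i, j), i, j) for i in a for j in b]
        candidates.update(kruskal(n, edges))
    # Step 3: merge by running Kruskal on the much smaller candidate edge set.
    return kruskal(n, list(candidates))
```

Steps 1 and 2 touch disjoint edge sets, which is what makes them natural units of parallel work; only the final merge sees edges from every part.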

2.
The paper proposes a general model for analyzing two-period change-over designs with binary data. The model includes tests for carry-over effects and for treatment and period effects, in analogy to the well-known ANOVA model for continuous data. Minimum modified χ² statistics are derived, and formulas for desk calculators are given.

3.
In the context of experiments involving visual inspection of random dot patterns, the problem of testing the null hypothesis of independence of binary responses is considered. A flexible model for dependence between binary responses is proposed. Two tests, optimal under different versions of the model, are derived. These two tests turn out to involve the same computations as the Wilcoxon two-sample test and the runs test, respectively.

4.
Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. There have been many studies of aggregating binary classifiers to construct a multiclass classifier based on one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, as well as some comparison studies between them. However, the studies found that the best coding depends on each situation. Therefore, a new problem, which we call the “optimal coding problem,” has arisen: how can we determine which coding is the optimal one in each situation? To approach this optimal coding problem, we propose a novel framework for constructing a multiclass classifier, in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can be a consistent answer to the problem. We apply this method to various classification problems including a synthesized data set and some cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method can improve classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.
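The idea of giving each binary classifier a weight can be sketched as a weighted decoding step. This is an illustrative assumption rather than the formulation of the paper: the function name `decode`, the code matrix layout (rows are classes, columns are binary classifiers, entries +1, -1, or 0), and the example weights are all hypothetical, and the data-driven tuning of the weights is not shown. With all weights equal to 1, the rule reduces to a simple voting heuristic.

```python
def decode(code_matrix, outputs, weights=None):
    """Return (predicted class index, per-class scores).

    code_matrix[k][b] is +1 if binary classifier b should fire for class k,
    -1 if it should not, and 0 if classifier b ignores class k.
    outputs[b] is the real-valued output of classifier b.
    """
    if weights is None:
        weights = [1.0] * len(outputs)  # unweighted voting
    scores = [
        sum(w * c * o for w, c, o in zip(weights, row, outputs))
        for row in code_matrix
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Three classes; columns 0-2 are one-versus-the-rest classifiers and
# columns 3-5 are one-versus-one classifiers (0 vs 1, 0 vs 2, 1 vs 2).
code = [
    [+1, -1, -1, +1, +1, 0],
    [-1, +1, -1, -1, 0, +1],
    [-1, -1, +1, 0, -1, -1],
]
outputs = [0.2, 0.1, -0.9, -0.3, 0.8, 0.7]

unweighted_class, _ = decode(code, outputs)
# Hypothetical weights that down-weight two of the one-versus-one classifiers.
weighted_class, _ = decode(code, outputs, weights=[1, 1, 1, 0.2, 1, 0.2])
```

In this toy example the two calls select different classes, which shows how the choice of weights, rather than the choice of a single coding, can determine the final prediction.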

5.
6.
7.
In this paper, a statistical model for clinical trials is presented for the special situation in which a varying and unstructured number of binary responses is obtained from each subject. The assumptions of the model are the following: (1) For each subject there is a (constant) individual Bernoulli parameter determining the distribution of the binary responses of this subject. (2) The Bernoulli parameters associated with the subjects are realizations of independent random variables with distributions P_g in treatment group g (g = 1, 2, …, G). (3) Given the value of the Bernoulli parameter, the observations are stochastically independent within each subject. Under these assumptions, a test statistic is derived to test the hypothesis H0: E(P_1) = E(P_2) = … = E(P_G). It is proven, and demonstrated by simulations, that the test statistic asymptotically (i.e., for a large number of subjects) follows the χ² distribution.

8.
Science Activities, 2013, 50(3): 113-114
This article presents a learning cycle with the aim of helping students understand the evidentiary basis of scientific claims. Students consider data and interpretations as used to support contradictory views in the debate surrounding the causal relationship between human immunodeficiency virus (HIV) and acquired immunodeficiency syndrome (AIDS). Students apply their understandings of the role of data to explorations of other scientific controversies.

9.
10.
Monozygotic and dizygotic twin studies investigating the relative roles of host genetics and environmental factors in shaping gut microbiota composition have produced conflicting results. In this study, we investigated the gut microbiota composition of a healthy dichorionic triplet set. The dichorionic triplet set contained a pair of monozygotic twins and a fraternal sibling, with similar pre- and post-natal environmental conditions including feeding regime. V4 16S rRNA and rpoB amplicon pyrosequencing was employed to investigate microbiota composition, and the species and strain diversity of the culturable bifidobacterial population was also examined. At month 1, the monozygotic pair shared a similar microbiota distinct from that of the fraternal sibling. By month 12, however, the profile was more uniform among the three infants. Principal coordinate analysis (PCoA) of the microbiota composition revealed strong clustering of the monozygotic pair at month 1 and a separation of the fraternal infant. At months 2 and 3, the phylogenetic distance between the monozygotic pair and the fraternal sibling had greatly reduced, and by month 12 the monozygotic pair no longer clustered separately from the fraternal infant. Pulsed-field gel electrophoresis (PFGE) analysis of the bifidobacterial population revealed a lack of strain diversity, with identical strains identified in all three infants at months 1 and 12. The microbiota of two antibiotic-treated dichorionic triplet sets was also investigated. Not surprisingly, in both triplet sets early-life antibiotic administration appeared to be a major determinant of microbiota composition at month 1, irrespective of zygosity. By month 12, early antibiotic administration no longer appeared to exert such a strong influence on gut microbiota composition. We hypothesize that host genetics initially play a significant role in the composition of an individual's gut microbiota, unless an antibiotic intervention is given, but that by month 12 environmental factors are the major determinant.

11.
12.
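Objective: To evaluate, using multiple indicators, the role of virtual bronchoscopy simulation in the skills training of clinicians. Subjects: Respiratory physicians and physicians from departments that use fiberoptic bronchoscopy, all new to bronchoscopy (all held a master's degree or higher and a medical practitioner's license). Methods: Thirty-six clinicians were randomly selected and assessed on the theory and operation of virtual bronchoscopy before training, after 2 hours of training, and after 5 hours of training, in a before-and-after self-controlled study. Results: All 36 physicians completed the virtual bronchoscopy training. After training, the trainees' results on multiple evaluation indicators, including procedure time (seconds), number of wall contacts, suction time (seconds), red-out time (seconds), and the proportion of the subsegmental bronchial tree observed, differed significantly from their pre-training results. Conclusion: After training, respiratory physicians and physicians from bronchoscopy-related departments who were new to bronchoscopy improved significantly in operating accuracy, dexterity, and speed, and were able to perform clinical examinations independently.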

13.
Clustering is a popular technique for explorative analysis of data, as it can reveal subgroupings and similarities between data in an unsupervised manner. While clustering is routinely applied to gene expression data, there is a lack of appropriate general methodology for clustering of sequence-level genomic and epigenomic data, e.g. ChIP-based data. We here introduce a general methodology for clustering data sets of coordinates relative to a genome assembly, i.e. genomic tracks. By defining appropriate feature extraction approaches and similarity measures, we allow biologically meaningful clustering to be performed for genomic tracks using standard clustering algorithms. An implementation of the methodology is provided through a tool, ClusTrack, which allows fine-tuned clustering analyses to be specified through a web-based interface. We apply our methods to the clustering of occupancy of the H3K4me1 histone modification in samples from a range of different cell types. The majority of samples form meaningful subclusters, confirming that the definitions of features and similarity capture biological, rather than technical, variation between the genomic tracks. Input data and results are available, and can be reproduced, through a Galaxy Pages document at http://hyperbrowser.uio.no/hb/u/hb-superuser/p/clustrack. The clustering functionality is available as a Galaxy tool, under the menu option "Specialized analyzis of tracks", and the submenu option "Cluster tracks based on genome level similarity", at the Genomic HyperBrowser server: http://hyperbrowser.uio.no/hb/.

14.
Edwards's test for seasonality is extended to multiple peaks and troughs. It is shown how the extended statistic may be adjusted for the population at risk and for unequal lengths of time intervals in the cycle of seasons. A simulation study shows that the extended test statistic is, for sample sizes N ≥ 100, very specific in detecting the number of peaks and troughs for which it is intended. The associated method of parameter estimation is also assessed; for N ≤ 100, the amplitude of a possibly adequate simple harmonic model is estimated well, but the initial value of the phase angle is not. Estimated percentage points for the extended test statistic are tabulated, and some recommendations are offered regarding usage of this method.

15.
16.
Marginal regression via generalized estimating equations is widely used in biostatistics to model longitudinal data from subjects whose outcomes and covariates are observed at several time points. In this paper we consider two issues that have been raised in the literature concerning the marginal regression approach. The first is that even though the past history may be predictive of outcome, the marginal approach does not use this history. Although marginal regression has the flexibility of allowing between-subject variations in the observation times, it may lose substantial prediction power in comparison with the transitional modeling approach that relates the responses to the covariate and outcome histories. We address this issue by using the concept of “information sets” for prediction to generalize the “partly conditional mean” approach of Pepe and Couper (J. Am. Stat. Assoc. 92:991–998, 1997). This modeling approach strikes a balance between the flexibility of the marginal approach and the predictive power of transitional modeling. Another issue is the problem of excess zeros in the outcomes over what the underlying model for marginal regression implies. We show how our predictive modeling approach based on information sets can be readily modified to handle the excess zeros in the longitudinal time series. By synthesizing the marginal, transitional, and mixed effects modeling approaches in a predictive framework, we also discuss how their respective advantages can be retained while their limitations can be circumvented for modeling longitudinal data.

17.
We determined the complete nucleotide sequences (16403 and 16572 base pairs, respectively) of the mitochondrial genomes of the South American lungfish, Lepidosiren paradoxa, and the Australian lungfish, Neoceratodus forsteri (Sarcopterygii, Dipnoi). The mitochondrial DNA sequences were established in an effort to resolve the debated evolutionary positions of the lungfish and the coelacanth relative to land vertebrates. Previous molecular phylogenetic studies based on complete mtDNA sequences, which included only the sequence of the African lungfish, Protopterus dolloi, were able to strongly reject the traditional textbook hypothesis that coelacanths are the closest relatives of land vertebrates. However, these studies were unable to distinguish with statistical significance between the two remaining scenarios: lungfish as the closest relatives of land vertebrates, and lungfish and coelacanths jointly as their sister group (Cao et al. 1998; Zardoya et al. 1998; Zardoya and Meyer 1997a). Lungfish, coelacanths, and the fish ancestors of the tetrapod lineage all originated within a short time window of about 20 million years, back in the early Devonian (about 380 to 400 million years ago). This short divergence time makes the determination of the phylogenetic relationships among these three lineages difficult. In this study, we attempted to break the long evolutionary branch of lungfish, in an effort to better resolve the phylogenetic relationships among the three extant sarcopterygian lineages. The gene order of the mitochondrial genomes of the South American and Australian lungfish conforms to the consensus gene order among gnathostome vertebrates. The phylogenetic analyses of the complete set of mitochondrial proteins (without ND6) suggest that the lungfish are the closest relatives of the tetrapods, although the support in favor of this scenario is not statistically significant. The two other smaller data sets (tRNA and rRNA genes) give inconsistent results depending on the reconstruction method applied and cannot significantly rule out any of the three alternative hypotheses. Nuclear protein-coding genes, which might be better phylogenetic markers for this question, support the lungfish–tetrapod sister-group relationship (Brinkmann et al. 2004). This article contains online supplementary material. Reviewing Editor: Dr. Rafael Zardoya

18.
19.
In otological and ophthalmological functional diagnostics, it is typical to record several similar biopotentials for an individual under different conditions of stimulation and/or to collect various potentials for the same individual. Since the combination of these different data collected from a patient is of interest, coupling of the classification results obtained for each single data set seems desirable. Based on sample classifiers, obtained by linear discriminant analysis, of acoustically or visually evoked potentials from selected patients, an algorithm is proposed for their coupling, which logically links the various rules of allocation and is based on the principle of assignment by highest likelihood. Furthermore, the procedure may be modified by inclusion of corresponding confidence intervals for the determined allocation frequencies. The results obtained for the groups of patients investigated show a considerable improvement of classification, as judged from the error rates being distinctly reduced after the coupling.

20.
This paper presents new biostatistical methods for the analysis of microbiome data based on a fully parametric approach using all the data. The Dirichlet-multinomial distribution allows the analyst to calculate power and sample sizes for experimental design, perform tests of hypotheses (e.g., compare microbiomes across groups), and estimate parameters describing microbiome properties. The use of a fully parametric model for these data has the benefit over alternative non-parametric approaches, such as bootstrapping and permutation testing, that the model is able to retain more information contained in the data. This paper details the statistical approaches for several tests of hypotheses and power/sample size calculations, and applies them for illustration to taxonomic abundance distribution and rank abundance distribution data using HMP Jumpstart data on 24 subjects for saliva, subgingival, and supragingival samples. Software for running these analyses is available.
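As a concrete reference for the distribution named in this abstract, the following minimal sketch evaluates the standard Dirichlet-multinomial log-probability of one sample of taxon counts. It assumes the common parameterization by a vector of positive concentration parameters `alpha`; the paper may use a different but equivalent parameterization, and the function name `dirichlet_multinomial_logpmf` is hypothetical rather than taken from the authors' software.

```python
import math


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial.

    counts: non-negative integer counts, one per taxon.
    alpha:  positive concentration parameters, one per taxon (assumed
            parameterization; not necessarily the one used in the paper).
    """
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient n! / prod(x_j!), on the log scale.
    log_p = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Gamma(A) / Gamma(n + A), with A the sum of the alphas.
    log_p += math.lgamma(a) - math.lgamma(n + a)
    # prod Gamma(x_j + alpha_j) / Gamma(alpha_j).
    log_p += sum(
        math.lgamma(x + al) - math.lgamma(al) for x, al in zip(counts, alpha)
    )
    return log_p
```

Summing this quantity over the samples in a group gives a log-likelihood that can be maximized over `alpha` or compared between groups, which is the kind of building block that parametric tests and power calculations of this sort rest on.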
