Similar literature
20 similar articles found (search time: 15 ms)
1.

Background  

In the clinical context, samples assayed by microarray are often classified by cell line or tumour type and it is of interest to discover a set of genes that can be used as class predictors. The leukemia dataset of Golub et al. [1] and the NCI60 dataset of Ross et al. [2] present multiclass classification problems where three tumour types and nine cell lines respectively must be identified. We apply an evolutionary algorithm to identify the near-optimal set of predictive genes that classify the data. We also examine the initial gene selection step whereby the most informative genes are selected from the genes assayed.

2.
Numerical classification of species of Vibrio and related genera   Total citations: 10, self-citations: 0, citations by others: 10
Data from 1091 strains of the family Vibrionaceae collected in five different studies have been merged into a single data matrix and analysed in a taxonomic study. A set of 142 characters was selected to compare these data. Seventy-nine characters were common to all studies, but data for the other 63 characters were incomplete. Cultures of 90 strains, examined in more than one of the original studies, were used to estimate test error and inter-study variability. The data from these replicate strains also allowed the problem of merging data from different studies to be assessed. Taxonomic resemblance was estimated on the basis of 111 characters using the SSM coefficient and UPGMA clustering. A taxonomic analysis based on 999 strains, which included most of the major species of the family Vibrionaceae, gave 59 clusters and 44 unclustered strains. A table of properties of these phenons was produced. The results showed that data obtained from studies carried out at different times and in different locations, but using standard techniques, could be combined and used to provide useful taxonomic information.
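
The combination of a matching coefficient with UPGMA clustering mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that "SSM" denotes the simple matching coefficient, that characters are scored 0/1, and that characters missing in either strain (here `None`) are simply skipped. The strain names and scores are invented for the example.

```python
from itertools import combinations


def simple_matching(a, b):
    """Simple matching coefficient: share of characters scored in both
    strains on which the two strains agree (missing scores are skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no characters scored in both strains")
    return sum(x == y for x, y in pairs) / len(pairs)


def upgma(strains):
    """Average-linkage (UPGMA) clustering on pairwise similarities.

    Returns the merges in order as (cluster_a, cluster_b, similarity),
    where each cluster is a tuple of strain names.
    """
    names = list(strains)
    sim = {
        frozenset((p, q)): simple_matching(strains[p], strains[q])
        for p, q in combinations(names, 2)
    }

    def average_similarity(c1, c2):
        # UPGMA: mean of all pairwise similarities between the two clusters.
        total = sum(sim[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        # Merge the pair of clusters with the highest average similarity.
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: average_similarity(*pair))
        merges.append((c1, c2, average_similarity(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical 0/1 character scores; None marks a test that was not performed.
example_strains = {
    "A": [1, 1, 0, 0, 1, None],
    "B": [1, 1, 0, 0, 0, 1],
    "C": [0, 0, 1, 1, 0, 1],
    "D": [0, 0, 1, 1, 1, 1],
}

if __name__ == "__main__":
    for left, right, similarity in upgma(example_strains):
        print(left, right, round(similarity, 3))
```

Skipping unscored characters pair by pair is only one way to handle the incomplete characters the abstract describes; restricting the analysis to characters common to all studies is another.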

3.
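Data analysis based on the BIOLOG method in microbial ecology research   Total citations: 21, self-citations: 0, citations by others: 21
The BIOLOG microplate method, a convenient and rapid technique for microbial testing, is widely used in environmental microbial monitoring and microbial ecology research, and plays an increasingly important role. The method yields large amounts of data on the carbon-source utilization capacity of microbial communities, reflecting rich information about microbial activity. The volume of data, however, also poses challenges for interpretation and analysis. This paper analyses the statistical methods applied to BIOLOG data, examining in depth the commonly used approaches (AWCD calculation, diversity indices, principal component analysis (PCA), cluster analysis, correlation and regression) and describing their functions, limitations and the problems that tend to arise in their application. Less common methods, such as non-parametric multivariate analysis (non-parametric or permutation versions of MANOVA), kinetic parameter analysis, multivariate regression trees and canonical correspondence analysis, are also discussed. By analysing the aims and principles of the different methods, the paper sets out their respective strengths and weaknesses and provides a reference for choosing data-analysis methods in BIOLOG-based microbial research.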
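
As a rough illustration of two of the quantities named in this abstract, the sketch below computes the average well colour development (AWCD) and a Shannon diversity index from one plate reading. The abstract does not give formulas, so the definitions used here are assumptions based on common practice: each well's absorbance is corrected by subtracting the control well, negative values are set to zero, AWCD is the mean of the corrected values, and the Shannon index uses each well's share of the total corrected absorbance. The function names and readings are invented for the example.

```python
import math


def corrected_absorbances(wells, control):
    """Subtract the control-well absorbance; negative results are set to 0."""
    return [max(value - control, 0.0) for value in wells]


def awcd(wells, control):
    """Average well colour development: mean control-corrected absorbance."""
    corrected = corrected_absorbances(wells, control)
    return sum(corrected) / len(corrected)


def shannon_index(wells, control):
    """Shannon diversity H' = -sum(p_i * ln p_i), with p_i the share of the
    total corrected absorbance found in well i (wells at zero are skipped)."""
    corrected = corrected_absorbances(wells, control)
    total = sum(corrected)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in corrected if c > 0)


if __name__ == "__main__":
    readings = [0.5, 0.9, 0.1, 0.3]  # hypothetical absorbances, one per carbon source
    print(awcd(readings, control=0.1))
    print(shannon_index(readings, control=0.1))
```

A real plate has many more wells and is usually read at several time points, so the same functions would be applied once per reading.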

4.
5.
In this paper, we compare the performance of two iterative clustering methods when applied to an extensive data set describing strains of the bacterial family Enterobacteriaceae. In both methods, the classification (i.e. the number of classes and the partitioning) is determined by minimizing stochastic complexity. The first method performs the minimization by repeated application of the generalized Lloyd algorithm (GLA). The second method uses an optimization technique known as local search (LS). The method modifies the current solution by making global changes to the class structure and then performs local fine-tuning to find a local optimum. It is observed that if we fix the number of classes, the LS finds a classification with a lower stochastic complexity value than GLA. In addition, the variance of the solutions is much smaller for the LS due to its more systematic method of searching. Overall, the two algorithms produce similar classifications but they merge certain natural classes with microbiological relevance in different ways.

6.
Probability plotting methods for the analysis of data   Total citations: 1, self-citations: 0, citations by others: 1

7.
1. Britain is unusual in the quantity and quality of species and habitat data available, at both national and regional scales. This paper reviews the sources, coverage and quality of these data. 2. Habitat and species data are used by conservation agencies in England, Scotland and Wales for site selection and for monitoring habitat quality. The paper argues, however, that neither habitat data nor species distribution data on their own are sufficient to locate and monitor habitats for nature conservation purposes effectively. 3. Differences in sampling methodologies between habitat and species surveys present methodological difficulties for the development of an integrated monitoring system that uses both types of data. These problems need to be overcome if habitat and species data are to be used more effectively for nature conservation in the wider countryside. 4. A more integrated system based on the concept of biotope occupancy is proposed and discussed. The implementation of the system would assist with understanding those factors that explain observed patterns in species distribution and diversity, thereby helping to improve the effectiveness of policies for nature conservation.

8.
Ghosh D 《Biometrics》2003,59(4):992-1000
Due to the advent of high-throughput microarray technology, it has become possible to develop molecular classification systems for various types of cancer. In this article, we propose a methodology using regularized regression models for the classification of tumors in microarray experiments. The performances of principal components, partial least squares, and ridge regression models are studied; these regression procedures are adapted to the classification setting using the optimal scoring algorithm. We also develop a procedure for ranking genes based on the fitted regression models. The proposed methodologies are applied to two microarray studies in cancer.

9.
MOTIVATION: A serious limitation in microarray analysis is the unreliability of the data generated from low signal intensities. Such data may produce erroneous gene expression ratios and cause unnecessary validation or post-analysis follow-up tasks. Therefore, the elimination of unreliable signal intensities will enhance reproducibility and reliability of gene expression ratios produced from microarray data. In this study, we applied fuzzy c-means (FCM) and normal mixture modeling (NMM) based classification methods to separate microarray data into reliable and unreliable signal intensity populations. RESULTS: We compared the results of FCM classification with those of classification based on NMM. Both approaches were validated against reference sets of biological data consisting of only true positives and true negatives. We observed that both methods performed equally well in terms of sensitivity and specificity. Although a comparison of the computation times indicated that the fuzzy approach is computationally more efficient, other considerations support the use of NMM for the reliability analysis of microarray data. AVAILABILITY: The classification approaches described in this paper and sample microarray data are available as Matlab(TM) (The MathWorks Inc., Natick, MA) programs (mfiles) and text files, respectively, at http://rc.kfshrc.edu.sa/bssc/staff/MusaAsyali/Downloads.asp. The programs can be run/tested on many different computer platforms where Matlab is available. CONTACT: asyali@kfshrc.edu.sa.
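
The fuzzy c-means step described in this abstract can be illustrated with a small one-dimensional sketch that splits signal intensities into a low and a high population. It is not the authors' Matlab code: the function name, the fuzzifier `m = 2`, the initial centres and the sample intensities are all assumptions made for the example, and only the standard FCM update rules are used.

```python
def fuzzy_c_means_1d(values, centers, m=2.0, iterations=100, tol=1e-9):
    """Standard fuzzy c-means on scalar data.

    values  : list of numbers to cluster
    centers : initial cluster centres (one per cluster)
    m       : fuzzifier (> 1); larger values give softer memberships
    Returns (centers, memberships); memberships[i][j] is the degree to
    which values[i] belongs to cluster j, and each row sums to 1.
    """
    centers = list(centers)
    memberships = []
    for _ in range(iterations):
        # Membership update: u_ij is proportional to d_ij ** (-2 / (m - 1)).
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # The point sits exactly on a centre: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                inverse = [d ** (-2.0 / (m - 1.0)) for d in dists]
                row = [v / sum(inverse) for v in inverse]
            memberships.append(row)

        # Centre update: mean of the data weighted by u_ij ** m.
        new_centers = []
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            weighted_sum = sum(w * x for w, x in zip(weights, values))
            new_centers.append(weighted_sum / sum(weights))

        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


if __name__ == "__main__":
    # Hypothetical log-intensities: four low (unreliable) and four high spots.
    intensities = [1.0, 1.2, 0.9, 1.1, 8.0, 8.3, 7.9, 8.1]
    start = [min(intensities), max(intensities)]
    final_centers, u = fuzzy_c_means_1d(intensities, start)
    print(final_centers)
    print([round(row[0], 3) for row in u])
```

A spot could then be flagged as unreliable when its membership in the low-intensity cluster exceeds a chosen threshold; the threshold is a design choice that the abstract does not specify.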

10.
Westfall  R. H.  Theron  G. K.  Rooyen  N. 《Plant Ecology》1997,132(2):137-154
A program package is described in which vegetation data can be objectively classified and analysed. Classification is based on minimum entropy. Results show that in a comparison with TWINSPAN, improvements to the relevé sequence, in terms of community variation, can be obtained. Furthermore, TWINSPAN classifications are shown to be dependent on a particular relevé input sequence.

11.
Coleman  Annette W. 《Hydrobiologia》1996,321(1):29-34
Analysis of DNA can help to distinguish those morphological characters indicative of species difference from those representing retained traits or parallel evolution. This can be of great value in detecting recent invaders. The choice of which DNA characters to examine not only dictates the methodology to be used but must also be appropriate for the detection level sought. Restriction endonuclease fragment comparisons of plastid DNA have been used to assess Codium species; the results show C. fragile subsp. tomentosoides from east and west coast North America to be identical while sympatric endemic Codium species each display their own unique set of fragments. For species of other algae, plastid DNA fragment patterns are not necessarily identical across a morphological species, e.g. Pandorina morum. Such repetitive element probes as M13 and the use of RAPDs are more appropriate for analysis of populations within species. DNA base sequence comparisons of nuclear rDNA genes often yield too few variant bases between closely related species for reliable identifications. Analysis of the more variable Internal Transcribed Spacer (ITS) region, lying between the small and large ribosomal subunit genes in nuclear DNA, yields more extensive base pair variation between species and relatively little within species; it may be an alternative choice for endonuclease restriction fragment analysis or for sequencing.

12.
13.

Background

The identification of new diagnostic or prognostic biomarkers is one of the main aims of clinical cancer research. Technologies like mass spectrometry are commonly being used in proteomic research. Mass spectrometry signals show the proteomic profiles of the individuals under study at a given time. These profiles correspond to the recording of a large number of proteins, much larger than the number of individuals. These variables come in addition to or to complete classical clinical variables. The objective of this study is to evaluate and compare the predictive ability of new and existing models combining mass spectrometry data and classical clinical variables. This study was conducted in the context of binary prediction.

Results

To achieve this goal, simulated data as well as a real dataset dedicated to the selection of proteomic markers of steatosis were used to evaluate the methods. The proposed methods meet the challenge of high-dimensional data and the selection of predictive markers by using penalization methods (Ridge, Lasso) and dimension reduction techniques (PLS), as well as a combination of both strategies through sparse PLS in the context of a binary class prediction. The methods were compared in terms of mean classification rate and their ability to select the true predictive values. These comparisons were done on clinical-only models, mass-spectrometry-only models and combined models.

Conclusions

It was shown that models which combine both types of data can be more efficient than models that use only clinical or mass spectrometry data when the sample size of the dataset is large enough.

14.
15.
Species occurrences gathered from the literature, from atlases or from field surveys are currently used to analyze multispecific patterns, such as species richness or species geographic ranges. Such occurrences result from the independent recognitions of specimens by several botanists in particular places and at particular occasions. Thereby, the analysis of the resulting occasional relevés involves the assignment of the species occurrences to spatial units such as a grid of quadrats. As a result, the distribution of occurrences among quadrats is controlled while their distribution among species is observed. In this paper we show how non-symmetric correspondence analysis (NSCA) enables the investigation of data structure by taking into account this fundamental asymmetry. We apply this new ordination technique to a list of endemic tree species occurrences in the Western Ghats (South India). We explore the interesting properties of NSCA as an ordination technique and demonstrate the usefulness of the method as a tool in biogeography. Regarding the Western Ghats, NSCA brings out the preponderance of deforestation over biogeographic history in explaining the observed multispecific patterns.

16.
Expression arrays facilitate the monitoring of changes in the expression patterns of large collections of genes. The analysis of expression array data has become a computationally-intensive task that requires the development of bioinformatics technology for a number of key stages in the process, such as image analysis, database storage, gene clustering and information extraction. Here, we review the current trends in each of these areas, with particular emphasis on the development of the related technology being carried out within our groups.

17.
García-Dorado A  Gallego A 《Genetics》2003,164(2):807-819
We simulated single-generation data for a fitness trait in mutation-accumulation (MA) experiments, and we compared three methods of analysis. Bateman-Mukai (BM) and maximum likelihood (ML) need information on both the MA lines and control lines, while minimum distance (MD) can be applied with or without the control. Both MD and ML assume gamma-distributed mutational effects. ML estimates of the rate of deleterious mutation had larger mean square error (MSE) than MD or BM had due to large outliers. MD estimates obtained by ignoring the mean decline observed from comparison to a control are often better than those obtained using that information. When effects are simulated using the gamma distribution, reducing the precision with which the trait is assayed increases the probability of obtaining no ML or MD estimates but causes no appreciable increase of the MSE. When the residual errors for the means of the simulated lines are sampled from the empirical distribution in a MA experiment, instead of from a normal one, the MSEs of BM, ML, and MD are practically unaffected. When the simulated gamma distribution accounts for a high rate of mild deleterious mutation, BM detects only approximately 30% of the true deleterious mutation rate, while MD or ML detects substantially larger fractions. To test the robustness of the methods, we also added a high rate of common contaminant mutations with constant mild deleterious effect to a low rate of mutations with gamma-distributed deleterious effects and moderate average. In that case, BM detects roughly the same fraction as before, regardless of the precision of the assay, while ML fails to provide estimates. However, MD estimates are obtained by ignoring the control information, detecting approximately 70% of the total mutation rate when the mean of the lines is assayed with good precision, but only 15% for low-precision assays. 
Contaminant mutations with only tiny deleterious effects could not be detected with acceptable accuracy by any of the above methods.

18.
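Classification of species endangerment categories and species conservation   Total citations: 7, self-citations: 0, citations by others: 7
The latest international and domestic standards for endangered-species categories are introduced. Problems with the criteria used to assign endangerment categories, and the priority order for species conservation, are discussed. Two different viewpoints on how to determine conservation priorities for species are presented.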

19.

Background  

The rapid identification of Bacillus spores and bacterial identification are paramount because of their implications in food poisoning, pathogenesis and their use as potential biowarfare agents. Many automated analytical techniques such as Curie-point pyrolysis mass spectrometry (Py-MS) have been used to identify bacterial spores, giving rise to large amounts of analytical data. This high number of features makes interpretation of the data extremely difficult. We analysed Py-MS data from 36 different strains of aerobic endospore-forming bacteria encompassing seven different species. These bacteria were grown axenically on nutrient agar and vegetative biomass and spores were analyzed by Curie-point Py-MS.

20.
Wheeler (2012) stated that minimization of ad hoc hypotheses as emphasized by Farris (1983) always leads to a preference for trivial optimizations when analysing unaligned sequence data, leaving no basis for tree choice. That is not correct. Farris's framework can be expressed as maximization of homology, a formulation that has been used to overcome the problems with inapplicables (it leads to the notion of subcharacters as a quantity to be co‐minimized in parsimony analysis) and that is known not to lead to a preference for trivial optimizations when analysing unaligned sequence data. Maximization of homology, in turn, can be formulated as a minimization of ad hoc hypotheses of homoplasy in the sense of Farris, as shown here. These issues are not just theoretical but have empirical relevance. It is therefore also discussed how maximization of homology can be approximated under various weighting schemes in heuristic tree alignment programs, such as POY, that do not take into account subcharacters. Empirical analyses that use the so‐called 3221 cost set (gap opening cost three, transversion and transition costs two, and gap extension cost one), the cost set that is known to be an optimal approximation under equally weighted homology in POY, are briefly reviewed. From a theoretical point of view, maximization of homology provides the general framework to understand such cost sets in terms that are biologically relevant and meaningful. Whether or not embedded in a sensitivity analysis, this is not the case for minimization of a cost that is defined in operational terms only. Neither is it the case for minimization of equally weighted transformations, a known problem that is not addressed by Kluge and Grant's (2006) proposal to invoke the anti‐superfluity principle as a rationale for this minimization.
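
To make the 3221 cost set concrete, the sketch below scores a fixed pairwise alignment with gap opening cost three, substitution cost two (transitions and transversions alike) and gap extension cost one. It rests on an assumption that the abstract does not spell out: the first position of a run of gaps is charged the opening cost and every further position in the same run the extension cost. How POY itself combines opening and extension costs may differ, so this should be read as an illustration of the weighting idea only; the function name and sequences are invented.

```python
def alignment_cost(row_a, row_b, gap_open=3, substitution=2, gap_extend=1):
    """Cost of a given pairwise alignment under a 3221-style cost set.

    row_a, row_b : aligned sequences of equal length, with '-' for gaps
    Assumption: the first gap position in a run costs gap_open and each
    further position in the same run (same row) costs gap_extend.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    cost = 0
    previous_gap_row = None  # which row held a gap in the previous column
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain gaps in both rows")
        if a == "-" or b == "-":
            gap_row = "a" if a == "-" else "b"
            cost += gap_extend if previous_gap_row == gap_row else gap_open
            previous_gap_row = gap_row
        else:
            if a != b:
                cost += substitution  # transition or transversion, same cost
            previous_gap_row = None
    return cost


if __name__ == "__main__":
    print(alignment_cost("ACGT", "AGGT"))    # one substitution
    print(alignment_cost("AC--T", "ACGGT"))  # one gap of length two
```

Under this reading a single two-base gap costs 4, the same as two substitutions, whereas two separate one-base gaps cost 6.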
