1.

Using several pair-wise informant sequences for de novo prediction of alternatively spliced transcripts

Paul Flicek Michael R Brent 《Genome biology》2006,7(Z1):S8

2.

A Mixed Methods and Triangulation Model for Increasing the Accuracy of Adherence and Sexual Behaviour Data: The Microbicides Development Programme

Robert Pool Catherine M. Montgomery Neetha S. Morar Oliver Mweemba Agnes Ssali Mitzy Gafos Shelley Lees Jonathan Stadler Angela Crook Andrew Nunn Richard Hayes Sheena McCormack 《PloS one》2010,5(7)

Background

The collection of accurate data on adherence and sexual behaviour is crucial in microbicide (and other HIV-related) research. In the absence of a “gold standard” the collection of such data relies largely on participant self-reporting. After reviewing available methods, this paper describes a mixed method/triangulation model for generating more accurate data on adherence and sexual behaviour in a multi-centre vaginal microbicide clinical trial. In a companion paper some of the results from this model are presented.

Methodology/Principal Findings

Data were collected from a random subsample of 725 women (7.7% of the trial population) using structured interviews, coital diaries, in-depth interviews, counting returned gel applicators, focus group discussions, and ethnography. The core of the model was a customised, semi-structured in-depth interview. There were two levels of triangulation: first, discrepancies between data from the questionnaires, diaries, in-depth interviews and applicator returns were identified, discussed with participants and, to a large extent, resolved; second, results from individual participants were related to more general data emerging from the focus group discussions and ethnography. A democratic and equitable collaboration between clinical trialists and qualitative social scientists facilitated the success of the model, as did the preparatory studies preceding the trial. The process revealed some of the underlying assumptions and routinised practices in “clinical trial culture” that are potentially detrimental to the collection of accurate data, as well as some of the shortcomings of large qualitative studies, and pointed to some potential solutions.

Conclusions/Significance

The integration of qualitative social science and the use of mixed methods and triangulation in clinical trials are feasible, and can reveal (and resolve) inaccuracies in data on adherence and sensitive behaviours, as well as illuminating aspects of “trial culture” that may also affect data accuracy.

3.

Novel Statistical Approaches for Non-Normal Censored Immunological Data: Analysis of Cytokine and Gene Expression Data

Nikolaus Ballenberger Anna Lluis Erika von Mutius Sabina Illi Bianca Schaub 《PloS one》2012,7(10)

Background

For several immune-mediated diseases, immunological analysis will become more complex in the future with datasets in which cytokine and gene expression data play a major role. These data have certain characteristics that require sophisticated statistical analysis such as strategies for non-normal distribution and censoring. Additionally, complex and multiple immunological relationships need to be adjusted for potential confounding and interaction effects.

Objective

We aimed to introduce and apply different methods for statistical analysis of non-normal censored cytokine and gene expression data. Furthermore, we assessed the performance and accuracy of a novel regression approach in order to allow adjusting for covariates and potential confounding.

Methods

For non-normally distributed censored data traditional means such as the Kaplan-Meier method or the generalized Wilcoxon test are described. In order to adjust for covariates the novel approach named Tobit regression on ranks was introduced. Its performance and accuracy for analysis of non-normal censored cytokine/gene expression data was evaluated by a simulation study and a statistical experiment applying permutation and bootstrapping.
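The Kaplan-Meier method named above can be sketched in a few lines. This is a minimal product-limit estimator for right-censored data, not the authors' code; the function name and the toy values are hypothetical. Immunological measurements are often left-censored at a detection limit, and one common workaround is to flip the measurement scale before applying a right-censored estimator. That step is assumed here and not shown.

```python
def kaplan_meier(times, observed):
    """Product-limit estimate of the survival function.

    times    -- one value per subject (event or censoring time)
    observed -- True if the event was seen, False if right-censored
    Returns a list of (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        removed = 0
        # Group all subjects that share the same time value.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if events:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t.
        at_risk -= removed
    return curve


# Hypothetical toy data: five subjects, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
```

Censored subjects never lower the curve directly; they only shrink the risk set used at later event times.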

Results

If adjustment for covariates is not necessary traditional statistical methods are adequate for non-normal censored data. Comparable with these and appropriate if additional adjustment is required, Tobit regression on ranks is a valid method. Its power, type-I error rate and accuracy were comparable to the classical Tobit regression.

Conclusion

Non-normally distributed censored immunological data require appropriate statistical methods. Tobit regression on ranks meets these requirements and can be used for adjustment for covariates and potential confounding in large and complex immunological datasets.

4.

Cognitive Structure and Informant Accuracy (cited by: 5; self-citations: 0; citations by others: 5)

Linton C. Freeman A. Kimball Romney Sue C. Freeman 《American anthropologist》1987,89(2):310-325

The problem of informant accuracy is examined in light of principles of memory organization from cognitive psychology. These principles turn out to be powerful, not only in explaining overall patterns of informant error, but in predicting details about the types of errors made. Predictions are made in terms both of different kinds of informants and different kinds of objects. All the predictions are strongly supported by the data. Finally, in the light of these results, two strategies are developed. The "best" informants, it seems, can be used to reveal long-range stable patterns of events, and the "worst" informants can be used to reveal the details of a particular event of special interest.

5.

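Multivariate Nonlinear Analysis Methods for the Population Genetic Structure of Allelic Polymorphisms (cited by: 4; self-citations: 0; citations by others: 4)

Xue Fuzhong Wang Jiezhen Guo Yishou Hu Ping Wu Xuesen 《Acta Genetica Sinica》2004,31(2):202-211

For a long time, multivariate statistical analyses of multidimensional gene polymorphism data, such as the cluster analysis used to compute genetic distances and the principal component analysis, factor analysis, and canonical correlation analysis used to study population genetic structure, have relied on classical multivariate linear methods designed for unconstrained data, without attention to the problems caused by the "closure effect" of gene polymorphism data. Starting from the distribution and structural characteristics of gene polymorphism data, this paper points out that gene polymorphism distributions have the properties of "closed data" and analyzes the difficulties that the closure effect creates when classical multivariate linear methods are applied to population genetic structure. Based on the theory and methods of compositional data analysis, a basic multivariate nonlinear approach to analyzing the population genetic structure of gene polymorphisms is proposed. Taking principal component analysis as an example, the results of classical linear principal component analysis and "log-ratio" nonlinear principal component analysis are compared on real data, showing that log-ratio nonlinear principal component analysis is a good method for studying the population genetic structure of gene polymorphisms: it is specific and sensitive, and its results are consistent with the principles of population genetics.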
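To make the closure effect concrete: allele frequencies at a locus sum to 1, so they cannot vary independently. The abstract does not say which log-ratio transform the authors use; the sketch below assumes the centred log-ratio (CLR) transform from compositional data analysis, and the frequencies are invented for illustration. After such a transform, ordinary principal component analysis can be applied to the transformed rows.

```python
import math


def clr(composition):
    """Centred log-ratio transform of one composition (all parts must be > 0)."""
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is the same as dividing by the geometric mean.
    return [value - mean_log for value in logs]


# Hypothetical allele frequencies at one locus in three populations.
populations = [
    [0.60, 0.30, 0.10],
    [0.50, 0.25, 0.25],
    [0.20, 0.50, 0.30],
]
transformed = [clr(row) for row in populations]
```

Each transformed row sums to zero, and rescaling a composition leaves its CLR coordinates unchanged, so only the ratios between allele frequencies carry information.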

6.

Qualitative Analysis of the Interdisciplinary Interaction between Data Analysis Specialists and Novice Clinical Researchers

Guilherme Roberto Zammar Jatin Shah Ana Paula Bonilauri Ferreira Luciana Cofiel Kenneth W. Lyles Ricardo Pietrobon 《PloS one》2010,5(2)

Background

The inherent complexity of statistical methods and clinical phenomena compels researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them has complete knowledge of their counterpart's field. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Although communication has a central role in interdisciplinary collaboration and miscommunication can have a negative impact on research processes, to the best of our knowledge no study has yet explored how data analysis specialists and clinical researchers communicate over time.

Methods/Principal Findings

We conducted qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of “what if” situations that helped clarify how the method or information from the other field would behave, if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal.

Conclusion/Significance

The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.

7.

Molecular Data and the Dynamic Nature of Polyploidy

D. E. Soltis P. S. Soltis Dr. Loren H. Rieseberg 《Critical Reviews in Plant Sciences》1993,12(3):243-273

During the past decade, molecular techniques have provided a wealth of data that have facilitated the resolution of several controversial questions in polyploid evolution. Herein we have focused on several of these issues: (1) the frequency of recurrent formation of polyploid species; (2) the genetic consequences of multiple polyploidizations within a species; (3) the prevalence and genetic attributes of autopolyploids; and (4) the genetic changes that occur in polyploid genomes following their formation.

Molecular data provide a more dynamic picture of polyploid evolution than has been traditionally espoused. Numerous studies have demonstrated multiple origins of both allopolyploids and autopolyploids. In several polyploid species studied in detail, multiple origins were found to be frequent on a local geographic scale, as well as during a short span of time. Molecular data strongly suggest that recurrent formation of polyploid species is the rule, rather than the exception. In addition, molecular data indicate that recurrent formation of polyploids has important genetic consequences, introducing considerable genetic variation from diploid progenitors into polyploid derivatives.

Molecular data also suggest a much more important role for natural autopolyploids than has been historically envisioned. In contrast to the longstanding view of autopolyploidy as being rare, molecular data continue to reveal steadily increasing numbers of well-documented autopolyploids having tetrasomic or higher-level polysomic inheritance. Although autopolyploidy undoubtedly occurs much less frequently than allopolyploidy in natural populations, it nonetheless has been a significant evolutionary mechanism. Molecular data also provide compelling genetic evidence that contradicts the traditional view of autopolyploidy as being maladaptive. Electrophoretic studies have revealed three important attributes of autopolyploids compared to their diploid progenitors: (1) enzyme multiplicity, (2) increased heterozygosity, and (3) increased allelic diversity. Genetic variability is, in fact, typically substantially higher in autopolyploids than in their diploid progenitors. These genetic attributes of autopolyploids are due to polysomic inheritance and provide strong genetic arguments for the potential success of autopolyploids in nature.

In addition to providing numerous important insights into the formation of polyploids and the immediate genetic consequences of polyploidy, molecular data also have been used to study the subsequent evolution of polyploid genomes. Common hypotheses on the subsequent evolution of polyploid genomes include (1) gene silencing, eventually leading to extensively diploidized polyploid genomes; (2) gene diversification, resulting in regulatory or functional divergence of duplicate genes; and (3) genome diversification, resulting in chromosomal repatterning. Compelling, but limited, genetic evidence for all of these factors has been obtained in molecular analyses of polyploid species. The occurrence of these processes in polyploid genomes indicates that polyploid genomes are plastic and susceptible to evolutionary change.

In summary, molecular data continue to demonstrate that polyploidization and the subsequent evolution of polyploid genomes are very dynamic processes.

8.

A Mechanistic Model of PCR for Accurate Quantification of Quantitative PCR Data

Gregory J. Boggy Peter J. Woolf 《PloS one》2010,5(8)

Background

Quantitative PCR (qPCR) is a workhorse laboratory technique for measuring the concentration of a target DNA sequence with high accuracy over a wide dynamic range. The gold standard method for estimating DNA concentrations via qPCR is quantification cycle (Cq) standard curve quantification, which requires the time- and labor-intensive construction of a standard curve. In theory, the shape of a qPCR data curve can be used to directly quantify DNA concentration by fitting a model to data; however, current empirical model-based quantification methods are not as reliable as standard curve quantification.
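For orientation, standard curve quantification amounts to a straight-line fit of the quantification cycle against the logarithm of known concentrations, which is then inverted for unknown samples. The sketch below illustrates that baseline method only, not the MAK2 model; the function names and the dilution series values are hypothetical.

```python
import math


def fit_standard_curve(concentrations, cq_values):
    """Least-squares fit of Cq = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(cq, slope, intercept):
    """Invert the standard curve for a sample with a measured Cq."""
    return 10 ** ((cq - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical ten-fold dilution series (copies per reaction) and measured Cq.
known = [1e5, 1e4, 1e3, 1e2]
measured_cq = [15.00, 18.32, 21.64, 24.96]
slope, intercept = fit_standard_curve(known, measured_cq)
unknown = estimate_concentration(20.0, slope, intercept)
```

The cost the abstract refers to is visible here: every assay needs a set of reference dilutions measured before any unknown can be quantified.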

Principal Findings

We have developed a two-parameter mass action kinetic model of PCR (MAK2) that can be fitted to qPCR data in order to quantify target concentration from a single qPCR assay. To compare the accuracy of MAK2-fitting to other qPCR quantification methods, we have applied quantification methods to qPCR dilution series data generated in three independent laboratories using different target sequences. Quantification accuracy was assessed by analyzing the reliability of concentration predictions for targets at known concentrations. Our results indicate that quantification by MAK2-fitting is as reliable as standard curve quantification for a variety of DNA targets and a wide range of concentrations.

Significance

We anticipate that MAK2 quantification will have a profound effect on the way qPCR experiments are designed and analyzed. In particular, MAK2 enables accurate quantification of portable qPCR assays with limited sample throughput, where construction of a standard curve is impractical.

9.

Cognitive and Geographic Maps: Study of Individual Variation Among Tojolabal Mayans

Louanna Furbee and Robert A. Benfer Associate Professor 《American anthropologist》1983,85(2):305-334

Disease and geography are related domains for Tojolabal-Maya. Using multidimensional methods, we compare two domains: (1) individual cognitive "maps" from disease terms and (2) hand-drawn maps, both with one another and with an official topographic map. Multivariate study of individual informant data demonstrates correspondence of the axes of maps. Least squares fitting of dimensional representations using a method specifically modified for ethnosemantic data allows meaningful comparisons both among and within informants, and with an aggregate from a related survey of 33 informants as well. These multivariate operations help integrate individual data, sampled simultaneously for several domains, tasks, and occasions, with aggregate data. For semantic domains, we achieved rapprochement between psychological and anthropological approaches. [disease, folk theories, ethnosemantics, cognition, multivariate, Tojolabal-Maya]

10.

Land Transitions in Northwest Vietnam: An Integrated Analysis of Biophysical and Socio-Cultural Factors

Vu Kim Chi Anton Van Rompaey Gerard Govers Veerle Vanacker Birgit Schmook Nguyen Hieu 《Human ecology: an interdisciplinary journal》2013,41(1):37-50

This paper discusses transitions in land use, as evidenced by the case of the Suoi Muoi catchment in the mountains of Northwest Vietnam. Land use transitions were detected from LANDSAT and SPOT satellite images taken over the last 40 years. The maps showing changes in land use were linked with biophysical properties of the land, such as slope gradient, elevation and soil type, and with cultural characteristics of various ethnic groups by means of a logistic regression model. The combination of research methods and instruments from several disciplines, including statistical spatial analysis such as multiple logistic regression (MLR) models and multiple correspondence analysis (MCA) on household interview data, together with key informant interviews, allowed us to identify and validate a number of factors that drive land cover and land use changes in Northwest Vietnam.

11.

Predicting the Phenotypic Values of Physiological Traits Using SNP Genotype and Gene Expression Data in Mice

Yu Takagi Hirokazu Matsuda Yukio Taniguchi Hiroaki Iwaisaki 《PloS one》2014,9(12)

Predicting phenotypes using genome-wide genetic variation and gene expression data is useful in several fields, such as human biology and medicine, as well as in crop and livestock breeding. However, for phenotype prediction using gene expression data for mammals, studies remain scarce, as the available data on gene expression profiling are currently limited. By integrating a few sources of relevant data that are available in mice, this study investigated the accuracy of phenotype prediction for several physiological traits. Gene expression data from two tissues as well as single nucleotide polymorphisms (SNPs) were used. For the studied traits, the variance of the effects of the expression levels was more likely to differ among the genes than were the effects of SNPs. For the glucose concentration, the total cholesterol amount, and the total tidal volume, the accuracy by cross validation tended to be higher when the gene expression data rather than the SNP genotype data were used, and a statistically significant increase in the accuracy was obtained when the gene expression data from the liver were used alone or jointly with the SNP genotype data. For these traits, there were no additional gains in accuracy from using the gene expression data of both the liver and lung compared to that of individual use. The accuracy of prediction using genes that were selected differently was examined; the use of genes with a higher tissue specificity tended to result in an accuracy that was similar to or greater than that associated with the use of all of the available genes for traits such as the glucose concentration and total cholesterol amount. Although relatively few animals were evaluated, the current results suggest that gene expression levels could be used as explanatory variables. However, further studies are essential to confirm our findings using additional animal samples.

12.

Fishery-Independent Data Reveal Negative Effect of Human Population Density on Caribbean Predatory Fish Communities

Christopher D. Stallings 《PloS one》2009,4(5)

Background

Understanding the current status of predatory fish communities, and the effects fishing has on them, is vitally important information for management. However, data are often insufficient at region-wide scales to assess the effects of extraction in coral reef ecosystems of developing nations.

Methodology/Principal Findings

Here, I overcome this difficulty by using a publicly accessible, fisheries-independent database to provide a broad scale, comprehensive analysis of human impacts on predatory reef fish communities across the greater Caribbean region. Specifically, this study analyzed presence and diversity of predatory reef fishes over a gradient of human population density. Across the region, as human population density increases, presence of large-bodied fishes declines, and fish communities become dominated by a few smaller-bodied species.

Conclusions/Significance

Complete disappearance of several large-bodied fishes indicates ecological and local extinctions have occurred in some densely populated areas. These findings fill a fundamentally important gap in our knowledge of the ecosystem effects of artisanal fisheries in developing nations, and provide support for multiple approaches to data collection where they are commonly unavailable.

13.

Economic feasibility of a new method to estimate mortality in crisis-affected and resource-poor settings

Roberts B Morgan OW Sultani MG Nyasulu P Rwebangila S Sondorp E Chandramohan D Checchi F 《PloS one》2011,6(9):e25175

Introduction

Mortality data provide essential evidence on the health status of populations in crisis-affected and resource-poor settings and to guide and assess relief operations. Retrospective surveys are commonly used to collect mortality data in such populations, but require substantial resources and have important methodological limitations. We evaluated the feasibility of an alternative method for rapidly quantifying mortality (the informant method). The study objective was to assess the economic feasibility of the informant method.

Methods

The informant method captures deaths through an exhaustive search for all deaths occurring in a population over a defined and recent recall period, using key community informants and next-of-kin of decedents. Between July and October 2008, we implemented and evaluated the informant method in: Kabul, Afghanistan; Mae La camp for Karen refugees, Thai-Burma border; Chiradzulu District, Malawi; and Lugufu and Mtabila refugee camps, Tanzania. We documented the time and cost inputs for the informant method in each site, and compared these with projections for hypothetical retrospective mortality surveys implemented in the same site with a 6 month recall period and with a 30 day recall period.

Findings

The informant method was estimated to require an average of 29% less time and 33% less money across all four study sites when compared with retrospective surveys with a 6 month recall period, and 88% less time and 86% less money when compared with retrospective surveys with a 1 month recall period. Verbal autopsy questionnaires were feasible and efficient, constituting only 4% of total person-time for the informant method's implementation in Chiradzulu District.

Conclusions

The informant method requires fewer resources and incurs less respondent burden. The method's generally impressive feasibility and the near real-time mortality data it provides warrant further work to develop the method given the importance of mortality measurement in such settings.

14.

Utility of Health Facility-based Malaria Data for Malaria Surveillance

Yaw A. Afrane Guofa Zhou Andrew K. Githeko Guiyun Yan 《PloS one》2013,8(2)

15.

ClusPro FMFT-SAXS: Ultra-fast Filtering Using Small-Angle X-ray Scattering Data in Protein Docking

Mikhail Ignatov Andrey Kazennov Dima Kozakov 《Journal of molecular biology》2018,430(15):2249-2255

We have recently demonstrated that incorporation of small-angle X-ray scattering (SAXS)-based filtering in our heavily used docking server ClusPro improves docking results. However, the filtering step is time consuming, since ≈ 10⁵ conformations have to be sequentially processed. At the same time, we have demonstrated the possibility of ultra-fast systematic energy evaluation for all rigid body orientations of two proteins, by sampling using Fast Manifold Fourier Transform (FMFT), if energies are represented as a combination of convolution-like expressions. Here we present a novel FMFT-based algorithm FMFT-SAXS for massive SAXS computation on multiple conformations of a protein complex. This algorithm exploits the convolutional form of SAXS calculation function. FMFT-SAXS allows computation of SAXS profiles for millions of conformations in a matter of minutes, providing an opportunity to explore the whole conformational space of two interacting proteins. We demonstrate the application of the new FMFT-SAXS approach to significantly speed up SAXS filtering step in our current docking protocol (1 to 2 orders of magnitude faster, running in several minutes on a modern 16-core CPU) without loss of accuracy. This is demonstrated on the benchmark set as well as on the experimental data. The new approach is available as a part of ClusPro server (https://beta.cluspro.org) and as an open source C library (https://bitbucket.org/abc-group/libfmftsaxs).

16.

The useful plants of Tambopata,Peru: II. Additional hypothesis testing in quantitative ethnobotany

Oliver Phillips Alwyn H. Gentry 《Economic botany》1993,47(1):33-43

We present results of applying a simple technique to statistically test several hypotheses in ethnobotany, using plant use data from non-indigenous people in southeast Peru. Hypotheses tested concern: (1) the power of eight different variables as predictors of a plant’s use value; (2) comparisons of ethnobotanical knowledge among informants; and (3) the relationship between informant age and knowledge of plant uses. Each class of hypothesis is evaluated with respect to all uses, and classes (1) and (3) are evaluated for each of the following subsidiary use categories: construction, edible, commerce, medicine, and technology. We found that the family to which a plant belongs explains a large part of the variance in species’ use values. Each of the other factors analyzed (growth-form, density, frequency, mean and maximum diameter, mean and maximum growth rate) is also significantly predictive of use values. Age significantly predicts informant knowledge of (1) all uses and (2) medicinal uses. Plant medicinal lore is particularly vulnerable to acculturation.

17.

Mutationmapper: A Tool to Aid the Mapping of Protein Mutation Data

Shabana Vohra Philip C. Biggin 《PloS one》2013,8(8)

There has been a rapid increase in the amount of mutational data due to, amongst other things, an increase in single nucleotide polymorphism (SNP) data and the use of site-directed mutagenesis as a tool to help dissect out functional properties of proteins. Many manually curated databases have been developed to index point mutations but they are not sustainable with the ever-increasing volume of scientific literature. There have been considerable efforts in the automatic extraction of mutation specific information from raw text involving use of various text-mining approaches. However, one of the key problems is to link these mutations with their associated proteins and to present these data in such a way that researchers can immediately contextualize them within a structurally related family of proteins. To aid this process, we have developed an application called MutationMapper. Point mutations are extracted from abstracts and are validated against protein sequences in Uniprot as far as possible. Our methodology differs in a fundamental way from the usual text-mining approach. Rather than start with abstracts, we start with protein sequences, which facilitates greatly the process of validating a potential point mutation identified in an abstract. The results are displayed as mutations mapped on to the protein sequence or a multiple sequence alignment. The latter enables one to readily pick up mutations performed at equivalent positions in related proteins. We demonstrate the use of MutationMapper against several examples including a single sequence and multiple sequence alignments. The application is available as a web-service at http://mutationmapper.bioch.ox.ac.uk.
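The validation step described above can be illustrated with a small sketch: find point mutations written in one-letter notation (for example K2A) and keep only those whose wild-type residue matches the protein sequence at that position. This is not MutationMapper's code; the regular expression, the function name, and the toy sequence are assumptions made for illustration.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Wild-type residue, 1-based position, mutant residue, e.g. "K2A".
MUTATION = re.compile(rf"\b([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])\b")


def find_valid_mutations(text, sequence):
    """Return mutations mentioned in `text` that are consistent with `sequence`."""
    valid = []
    for wild, pos, mutant in MUTATION.findall(text):
        index = int(pos) - 1  # protein positions are numbered from 1
        in_range = 0 <= index < len(sequence)
        if in_range and sequence[index] == wild and wild != mutant:
            valid.append(f"{wild}{pos}{mutant}")
    return valid


# Hypothetical protein fragment and abstract sentence.
sequence = "MKTAYIAKQR"
abstract = "The K2A and T3S mutants reduced binding, whereas G5D had no effect."
hits = find_valid_mutations(abstract, sequence)
```

Here G5D is rejected because position 5 of the toy sequence is Y rather than G, which is the kind of mismatch that starting from the sequence makes cheap to detect.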

18.

A "holistic" kinesin phylogeny reveals new kinesin families and predicts protein functions

Wickstead B Gull K 《Molecular biology of the cell》2006,17(4):1734-1743

Kinesin superfamily proteins are ubiquitous to all eukaryotes and essential for several key cellular processes. With the establishment of genome sequence data for a substantial number of eukaryotes, it is now possible for the first time to analyze the complete kinesin repertoires of a diversity of organisms from most eukaryotic kingdoms. Such a "holistic" approach using 486 kinesin-like sequences from 19 eukaryotes and analyzed by Bayesian techniques, identifies three new kinesin families, two new phylum-specific groups, and unites two previously identified families. The paralogue distribution suggests that the eukaryotic cenancestor possessed nearly all kinesin families. However, multiple losses in individual lineages mean that no family is ubiquitous to all organisms and that the present day distribution reflects common biology more than it does common ancestry. In particular, the distribution of four families--Kinesin-2, -9, and the proposed new families Kinesin-16 and -17--correlates with the possession of cilia/flagella, and this can be used to predict a flagellar function for two new kinesin families. Finally, we present a set of hidden Markov models that can reliably place most new kinesin sequences into families, even when from an organism at a great evolutionary distance from those in the analysis.

19.

A multigene family encoding several "finger" structures is present and differentially active in mammalian genomes (cited by: 51; self-citations: 0; citations by others: 51)

K Chowdhury U Deutsch P Gruss 《Cell》1987,48(5):771-778

Mouse genomic DNA contains multiple copies of sequences homologous to the Drosophila "Krüppel," a member of the "gap" class of developmental control genes of the fruit fly. The most interesting aspect of the homologous region is that, like Xenopus TFIIIA, it contains multiple finger-like folded domains capable of binding to nucleic acids. We have isolated six individual phages from a mouse genomic library on the basis of their DNA homology to Krüppel finger-coding probes, and describe here the DNA sequence and expression of two such clones containing finger-like structures. Upon differentiation of mouse teratocarcinoma cell line F9 with retinoic acid and cAMP, the expression of both genes was drastically reduced, and in one instance was undetectable. Each of the several other eukaryotic DNAs analyzed contained multiple copies of homologous genes with putative finger structures, indicating the presence of a finger-containing multigene family in higher organisms.

20.

Feature Selection and Classification of MAQC-II Breast Cancer and Multiple Myeloma Microarray Gene Expression Data

Qingzhong Liu Andrew H. Sung Zhongxue Chen Jianzhong Liu Xudong Huang Youping Deng 《PloS one》2009,4(12)

Microarray data has a high dimension of variables but available datasets usually have only a small number of samples, thereby making the study of such datasets interesting and challenging. In the task of analyzing microarray data for the purpose of, e.g., predicting gene-disease association, feature selection is very important because it provides a way to handle the high dimensionality by exploiting information redundancy induced by associations among genetic markers. Judicious feature selection in microarray data analysis can result in significant reduction of cost while maintaining or improving the classification or prediction accuracy of learning machines that are employed to sort out the datasets. In this paper, we propose a gene selection method called Recursive Feature Addition (RFA), which combines supervised learning and statistical similarity measures. We compare our method with the following gene selection methods:

Support Vector Machine Recursive Feature Elimination (SVMRFE)
Leave-One-Out Calculation Sequential Forward Selection (LOOCSFS)
Gradient based Leave-one-out Gene Selection (GLGS)

To evaluate the performance of these gene selection methods, we employ several popular learning classifiers on the MicroArray Quality Control phase II on predictive modeling (MAQC-II) breast cancer dataset and the MAQC-II multiple myeloma dataset. Experimental results show that gene selection is strictly paired with learning classifier. Overall, our approach outperforms other compared methods. The biological functional analysis based on the MAQC-II breast cancer dataset convinced us to apply our method for phenotype prediction. Additionally, learning classifiers also play important roles in the classification of microarray data and our experimental results indicate that the Nearest Mean Scale Classifier (NMSC) is a good choice due to its prediction reliability and its stability across the three performance measurements: Testing accuracy, MCC values, and AUC errors.
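Two of the performance measurements named above, testing accuracy and the Matthews correlation coefficient (MCC), are simple to compute from binary predictions. The sketch below uses the standard definitions; the function names and the example labels are hypothetical and are not taken from the MAQC-II data.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Conventional value when any row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical labels for six test samples.
truth = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 1]
```

Unlike accuracy, MCC stays near zero for a classifier that always predicts the majority class, which matters for the small, unbalanced sample sets typical of microarray studies.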