Similar articles
 20 similar articles found
1.
Random forest is an ensemble classification algorithm. It performs well when most predictive variables are noisy and can be used when the number of variables is much larger than the number of observations. The use of bootstrap samples and restricted subsets of attributes makes it more powerful than simple ensembles of trees. The main advantage of a random forest classifier is its explanatory power: it measures variable importance, that is, the impact of each factor on a predicted class label. These characteristics make the algorithm ideal for microarray data. It was shown to build models with high accuracy when tested on high-dimensional microarray datasets. Current implementations of random forest in the machine learning and statistics community, however, limit its usability for mining large datasets, as they require that the entire dataset remain permanently in memory. We propose a new framework, an optimized implementation of a random forest classifier, which addresses specific properties of microarray data, takes the computational complexity of a decision tree algorithm into consideration, and shows excellent computing performance while preserving predictive accuracy. The implementation is based on reducing overlapping computations and eliminating dependency on the size of main memory. The implementation's excellent computational performance makes the algorithm useful for interactive data analyses and data mining.
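The two sources of randomness named in the abstract above, bootstrap samples and restricted subsets of attributes, can be illustrated with a minimal sketch. It is not the authors' optimized implementation. The function names, the seed, and the subset size of 31 (roughly the square root of 1000 features, a common default for classification) are assumptions made for illustration, and in a real random forest the feature subset is redrawn at every split rather than once per tree.

```python
import random


def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def feature_subset(n_features, subset_size, rng):
    """Pick a restricted subset of distinct feature indices."""
    return sorted(rng.sample(range(n_features), subset_size))


def out_of_bag(n_rows, sample):
    """Rows never drawn into the bootstrap sample (usable for error estimates)."""
    drawn = set(sample)
    return [i for i in range(n_rows) if i not in drawn]


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
rows = bootstrap_indices(10, rng)         # 10 observations (hypothetical)
features = feature_subset(1000, 31, rng)  # 1000 genes, 31 candidates (hypothetical)
oob = out_of_bag(10, rows)
```

Each tree in the ensemble would be grown on its own `rows`, which is why the trees differ from one another even though they come from the same dataset.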

2.
Kabir MR, Kumar A. Bioresource Technology, 2011, 102(19): 8972-8985
This study investigates the energy and environmental aspects of producing biohydrogen for bitumen upgrading from a life cycle perspective. Three technologies are studied for biohydrogen production; these include the Battelle Columbus Laboratory (BCL) gasifier, the Gas Technology Institute (GTI) gasifier, and fast pyrolysis. Three different biomass feedstocks are considered including forest residue (FR), whole forest (WF), and agricultural residue (AR). The fast pyrolysis pathway includes two cases: truck transport of bio-oil and pipeline transport of bio-oil. The net energy ratios (NERs) for nine biohydrogen pathways lie in the range of 1.3-9.3. The maximum NER (9.3) is for the FR-based pathway using GTI technology. The GHG emissions lie in the range of 1.20-8.1 kg CO₂ eq/kg H₂. The lowest limit corresponds to the FR-based biohydrogen production pathway using GTI technology. This study also analyzes the intensities for acid rain precursor and ground level ozone precursor.

3.
RNA interference (RNAi) screening is extensively used in the field of reverse genetics. RNAi libraries constructed using random oligonucleotides have made this technology affordable. However, the new methodology requires exploration of the RNAi target gene information after screening because the RNAi library includes non-natural sequences that are not found in genes. Here, we developed a web-based tool to support RNAi screening. The system performs short hairpin RNA (shRNA) target prediction that is informed by comprehensive enquiry (SPICE). SPICE automates several tasks that are laborious but indispensable to evaluate the shRNAs obtained by RNAi screening. SPICE has four main functions: (i) sequence identification of shRNA in the input sequence (the sequence might be obtained by sequencing clones in the RNAi library), (ii) searching the target genes in the database, (iii) demonstrating biological information obtained from the database, and (iv) preparation of search result files that can be utilized in a local personal computer (PC). Using this system, we demonstrated that genes targeted by random oligonucleotide-derived shRNAs were not different from those targeted by organism-specific shRNA. The system facilitates RNAi screening, which requires sequence analysis after screening. The SPICE web application is available at http://www.spice.sugysun.org/.

4.
In contracting muscle, individual myosin molecules function as part of a large ensemble, hydrolyzing ATP to power the relative sliding of actin filaments. The technological advances that have enabled direct observation and manipulation of single molecules, including recent experiments that have explored myosin's force-dependent properties, provide detailed insight into the kinetics of myosin's mechanochemical interaction with actin. However, it has been difficult to reconcile these single-molecule observations with the behavior of myosin in an ensemble. Here, using a combination of simulations and theory, we show that the kinetic mechanism derived from single-molecule experiments describes ensemble behavior, but the connection between single molecule and ensemble is complex. In particular, even in the absence of external force, internal forces generated between myosin molecules in a large ensemble accelerate ADP release and increase how far actin moves during a single myosin attachment. These myosin-induced changes in strong binding lifetime and attachment distance cause measurable properties, such as actin speed in the motility assay, to vary depending on the number of myosin molecules interacting with an actin filament. This ensemble-size effect challenges the simple detachment-limited model of motility, because even when motility speed is limited by ADP release, increasing attachment rate can increase motility speed.

5.
We develop a new technique to analyse microarray data which uses a combination of principal components analysis and consensus ensemble k-clustering to find robust clusters and gene markers in the data. We apply our method to a public microarray breast cancer dataset which has expression levels of genes in normal samples as well as in three pathological stages of disease, namely atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC). Our method averages over clustering techniques and data perturbation to find stable, robust clusters and gene markers. We identify the clusters and their pathways with distinct subtypes of breast cancer (Luminal, Basal and Her2+). We confirm that the cancer phenotype develops early (in the early hyperplasia or ADH stage) and find from our analysis that each subtype progresses from ADH to DCIS to IDC along its own specific pathway, as if each were a distinct disease.
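The idea of averaging over several clustering runs, as described in the abstract above, is often expressed as a consensus (co-association) matrix. The sketch below is a generic illustration and not the authors' method. The function name and the toy label lists are hypothetical, and it assumes every run assigns a label to every sample, which data perturbation by subsampling would not guarantee.

```python
def consensus_matrix(labelings):
    """Fraction of clustering runs in which each pair of samples shares a cluster.

    `labelings` is a list of runs; each run is a list of cluster labels,
    one per sample. Label values need not match across runs, because only
    co-membership within a run is compared.
    """
    n_runs = len(labelings)
    n_items = len(labelings[0])
    counts = [[0] * n_items for _ in range(n_items)]
    for labels in labelings:
        for i in range(n_items):
            for j in range(n_items):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_runs for c in row] for row in counts]


# Three hypothetical clustering runs over four samples.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first run, labels swapped
    [0, 0, 0, 1],
]
consensus = consensus_matrix(runs)
```

Pairs with a consensus value near 1 are grouped together regardless of the run, which is what makes a cluster "robust" in this sense; here samples 0 and 1 always co-cluster, while samples 2 and 3 do so in two of the three runs.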

6.
MOTIVATION: Microarray experiments are expected to contribute significantly to the progress in cancer treatment by enabling a precise and early diagnosis. They create a need for class prediction tools, which can deal with a large number of highly correlated input variables, perform feature selection and provide class probability estimates that serve as a quantification of the predictive uncertainty. A very promising solution is to combine the two ensemble schemes bagging and boosting into a novel algorithm called BagBoosting. RESULTS: When bagging is used as a module in boosting, the resulting classifier consistently improves the predictive performance and the probability estimates of both bagging and boosting on real and simulated gene expression data. This quasi-guaranteed improvement can be obtained by simply making a bigger computing effort. The advantageous predictive potential is also confirmed by comparing BagBoosting to several established class prediction tools for microarray data. AVAILABILITY: Software for the modified boosting algorithms, for benchmark studies and for the simulation of microarray data is available as an R package under GNU public license at http://stat.ethz.ch/~dettling/bagboost.html.

7.
Gene selection and classification of microarray data using random forest   (cited 9 times in total: 0 self-citations, 9 citations by others)

Background  

Selection of relevant genes for sample classification is a common task in most gene expression studies, where researchers try to identify the smallest possible set of genes that can still achieve good predictive performance (for instance, for future use with diagnostic purposes in clinical practice). Many gene selection approaches use univariate (gene-by-gene) rankings of gene relevance and arbitrary thresholds to select the number of genes, can only be applied to two-class problems, and use gene selection ranking criteria unrelated to the classification algorithm. In contrast, random forest is a classification algorithm well suited for microarray data: it shows excellent performance even when most predictive variables are noise, can be used when the number of variables is much larger than the number of observations and in problems involving more than two classes, and returns measures of variable importance. Thus, it is important to understand the performance of random forest with microarray data and its possible use for gene selection.

8.
The flagellum of Trypanosoma brucei is an essential and multifunctional organelle that is receiving increasing attention as a potential drug target and as a system for studying flagellum biology. RNA interference (RNAi) knockdown is widely used to test the requirement for a protein in flagellar motility and has suggested that normal flagellar motility is essential for viability in bloodstream-form trypanosomes. However, RNAi knockdown alone provides limited functional information because the consequence is often loss of a multiprotein complex. We therefore developed an inducible system that allows functional analysis of point mutations in flagellar proteins in T. brucei. Using this system, we identified point mutations in the outer dynein light chain 1 (LC1) that allow stable assembly of outer dynein motors but do not support propulsive motility. In procyclic-form trypanosomes, the phenotype of LC1 mutants with point mutations differs from the motility and structural defects of LC1 knockdowns, which lack the outer-arm dynein motor. Thus, our results distinguish LC1-specific functions from broader functions of outer-arm dynein. In bloodstream-form trypanosomes, LC1 knockdown blocks cell division and is lethal. In contrast, LC1 point mutations cause severe motility defects without affecting viability, indicating that the lethal phenotype of LC1 RNAi knockdown is not due to defective motility. Our results demonstrate for the first time that normal motility is not essential in bloodstream-form T. brucei and that the presumed connection between motility and viability is more complex than might be interpreted from knockdown studies alone. These findings open new avenues for dissecting mechanisms of flagellar protein function and provide an important step in efforts to exploit the potential of the flagellum as a therapeutic target in African sleeping sickness.

9.
An ensemble classifier approach for microRNA precursor (pre-miRNA) classification was proposed, based on combining a set of heterogeneous algorithms including support vector machine (SVM), k-nearest neighbors (kNN) and random forest (RF), then aggregating their predictions through a voting system. Additionally, in the proposed algorithm, the classification performance was improved using discriminative features, self-containment and its derivatives, which capture the unique structural robustness characteristics of pre-miRNAs. These are applicable across different species. By applying preprocessing methods (both a correlation-based feature selection (CFS) with genetic algorithm (GA) search method and a modified Synthetic Minority Oversampling Technique (SMOTE) bagging rebalancing method), improvement in the performance of this ensemble was observed. The overall prediction accuracy obtained via 10 runs of 5-fold cross validation (CV) was 96.54%, with sensitivity of 94.8% and specificity of 98.3%; this is a better trade-off between sensitivity and specificity than those of other state-of-the-art methods. The ensemble model was applied to animal, plant and virus pre-miRNA and achieved high accuracy, >93%. Exploiting the discriminative set of selected features also suggests that pre-miRNAs possess high intrinsic structural robustness as compared with other stem loops. Our heterogeneous ensemble method gave a relatively more reliable prediction than those using single classifiers. Our program is available at http://ncrna-pred.com/premiRNA.html.
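Aggregating the predictions of several classifiers through voting can be sketched as follows. The abstract above does not say whether the vote is a plain majority or weighted, so plain majority voting is assumed here. The prediction lists stand in for the outputs of SVM, kNN and RF models and are made-up values, not results from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers for one sample.

    Ties resolve to the tied label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical per-sample predictions (1 = pre-miRNA, 0 = other stem loop).
svm_pred = [1, 0, 1, 1]
knn_pred = [1, 1, 0, 1]
rf_pred = [0, 0, 1, 1]

ensemble_pred = [
    majority_vote(votes) for votes in zip(svm_pred, knn_pred, rf_pred)
]
```

With three voters and two classes a tie cannot occur, so the tie rule only matters if more classifiers or classes are added.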

10.
MOTIVATION: Microarrays have been widely used to discover novel disease-related genes. Some types of microarray, such as cDNA arrays, usually contain a considerable portion of missing values. When missing value imputation and gene prioritization are sequentially conducted, it is necessary to consider the distribution space of prioritization scores due to the existence of missing values. We propose an ensemble approach to address this issue. A bootstrap procedure enables us to generate a resampled multivariate distribution of the prioritization scores and then to obtain the expected prioritization scores. RESULTS: We used a published microarray two-sample data set to illustrate our approach. We focused on the following issues after missing value imputation: (i) concordance of gene prioritization and (ii) control of true and false positives. We compared our approach with the traditional non-ensemble approach to missing value imputation. We also evaluated the performance of the non-imputation approach when the theoretical test distribution was available. The results showed that the ensemble imputation approach provided clearly improved performance in the concordance of gene prioritization and the control of true/false positives, especially when sample sizes were about 5-10 per group and missing rates were about 10-20%, which was a common situation for cDNA microarray studies. AVAILABILITY: The Matlab codes are freely available at http://home.gwu.edu/~ylai/research/Missing.

11.
HMGN1 is a nuclear protein that binds to nucleosomes and alters the accessibility of regulatory factors to their chromatin targets. To elucidate its biological function and identify specific HMGN1 target genes, we generated Hmgn1-/- mice. DNA microarray analysis of Hmgn1+/+ and Hmgn1-/- embryonic fibroblasts identified N-cadherin as a potential HMGN1 gene target. RT-PCR and western blot analysis confirmed a linkage between HMGN1 expression and N-cadherin levels. In both transformed and primary mouse embryonic fibroblasts (MEFs), HMGN1 acted as a negative regulator of N-cadherin expression. Likewise, the N-cadherin levels in early embryos of Hmgn1-/- mice were higher than those of their Hmgn1+/+ littermates. Loss of HMGN1 increased the adhesiveness, motility and aggregation potential of Hmgn1-/- MEFs, a phenotype consistent with increased levels of N-cadherin protein. Re-expression of wild-type HMGN1, but not of the mutant HMGN1 protein that does not bind to chromatin, in Hmgn1-/- MEFs, decreased the levels of N-cadherin and restored the Hmgn1+/+ phenotype. These studies demonstrate a role for HMGN1 in the regulation of specific gene expression. We suggest that in MEFs, and during early mouse development, the interaction of HMGN1 with chromatin down-regulates the expression of N-cadherin.

12.
Tran LM, Rizk ML, Liao JC. Biophysical Journal, 2008, 95(12): 5606-5617
Complete modeling of metabolic networks is desirable, but it is difficult to accomplish because of the lack of kinetics. As a step toward this goal, we have developed an approach to build an ensemble of dynamic models that reach the same steady state. The models in the ensemble are based on the same mechanistic framework at the elementary reaction level, including known regulations, and span the space of all kinetics allowable by thermodynamics. This ensemble allows for the examination of possible phenotypes of the network upon perturbations, such as changes in enzyme expression levels. The size of the ensemble is reduced by acquiring data for such perturbation phenotypes. If the mechanistic framework is approximately accurate, the ensemble converges to a smaller set of models and becomes more predictive. This approach bypasses the need for detailed characterization of kinetic parameters and arrives at a set of models that describes relevant phenotypes upon enzyme perturbations.

13.
The human pathogen Mycoplasma genitalium is known to mediate cell adhesion to target cells by the attachment organelle, a complex structure also implicated in gliding motility. The gliding mechanism of M. genitalium cells is completely unknown, but recent studies have begun to elucidate the components of the gliding machinery. We report the study of MG312, a cytadherence-related protein containing in the N terminus a box enriched in aromatic and glycine residues (EAGR), which is also exclusively found in MG200 and MG386 gliding motility proteins. Characterization of an MG_312 deletion mutant obtained by homologous recombination has revealed that the MG312 protein is required for the assembly of the M. genitalium terminal organelle. This finding is consistent with the intermediate-cytadherence phenotype and the complete absence of gliding motility exhibited by this mutant. Reintroduction of several MG_312 deletion derivatives into the MG_312 null mutant allowed us to identify two separate functional domains: an N-terminal domain implicated in gliding motility and a C-terminal domain involved in cytadherence and terminal organelle assembly functions. In addition, our results also provide evidence that the EAGR box has a specific contribution to mycoplasma cell motion. Finally, the presence of a conserved ATP binding site known as a Walker A box in the MG312 N-terminal region suggests that this structural protein could also play an active function in the gliding mechanism.

14.
Myxococcus xanthus moves on solid surfaces by using two gliding motility systems, A motility for individual-cell movement and S motility for coordinated group movements. The frz genes encode chemotaxis homologues that control the cellular reversal frequency of both motility systems. One of the components of the core Frz signal transduction pathway, FrzE, is homologous to both CheA and CheY from the enteric bacteria and is therefore a novel CheA-CheY fusion protein. In this study, we investigated the role of this fusion protein, in particular, the CheY domain (FrzECheY). FrzECheY retains all of the highly conserved residues of the CheY superfamily of response regulators, including Asp709, analogous to phosphoaccepting Asp57 of Escherichia coli CheY. While in-frame deletion of the entire frzE gene caused both motility systems to show a hyporeversal phenotype, in-frame deletion of the FrzECheY domain resulted in divergent phenotypes for the two motility systems: hyperreversals of the A-motility system and hyporeversals of the S-motility system. To further investigate the role of FrzECheY in A and S motility, point mutations were constructed such that the putative phosphoaccepting residue, Asp709, was changed from D to A (and was therefore never subject to phosphorylation) or E (possibly mimicking constitutive phosphorylation). The D709A mutant showed hyperreversals for both motilities, while the D709E mutant showed hyperreversals for A motility and hyporeversal for S motility. These results show that the FrzECheY domain plays a critical signaling role in coordinating A and S motility. On the basis of the phenotypic analyses of the frzE mutants generated in this study, a model is proposed for the divergent signal transduction through FrzE in controlling and coordinating A and S motility in M. xanthus.

15.
Previous studies of the effects of fur trapping on marten populations have not considered habitat variation and how trappers use available habitat. We investigated the behavior of fur trappers with respect to roads, waterways, and the forest habitats on trap lines, using registered trap lines in northern Ontario as a study system. The objectives of this study were to 1) develop models for predicting trap location based on access and habitat features, 2) determine whether trappers target the same habitat preferred by American marten, and 3) investigate effects of spatial resolution on predictive models, using a geographic information system (GIS) for coarse resolution variables and direct forest mensuration for fine resolution variables. Distance to roads and water were by far the most influential factors in logistic models for predicting trap presence, accounting for 51.2–61.7% of the observed deviance. At a coarse spatial resolution, trappers selected sites that were close to vehicular access, and in older mixed wood forest stands. Similarly, at a coarse resolution, marten selected old stands, but dominated by coniferous trees. At a finer spatial resolution, trappers selected sites with high basal area of trees, pronounced proportion of black spruce, high canopy cover, and high density of coarse woody debris, consistent with previous studies on marten habitat selection at a fine resolution. Although coarse resolution models are easily applicable because of the wide availability of GIS land cover data, fine resolution models had greater predictive power when considering habitat variables. By quantifying trapper behaviors, these results suggest that the effectiveness of marten sanctuaries used in forest management depends not only on the age and species composition of forest stands left unlogged, but also on the degree to which they are accessible to trappers. © 2012 The Wildlife Society.

16.
This paper describes the formation of single polar bundles of pili on Azospirillum brasilense cells, the twitching motility of cell aggregates, and a new type of social behavior, the dispersal of bacterial cells in semiliquid agar associated with the formation of granular inclusions (the so-called Gri+ phenotype), which is an alternative to swarming (the Swa+ phenotype). The wild-type A. brasilense cells occurring in a semiliquid agar may show either the Swa+Gri-, or Swa-Gri-, or Swa-Gri+ phenotype. The formation of single polar flagella (Fla) or polar bundles of pili may reflect two alternative states of A. brasilense cells. The components of the Fla system may be involved in the regulation of the phenotypic variation of azospirilla.

17.
HGF/SF-Met signaling in tumor progression   (cited 11 times in total: 0 self-citations, 11 citations by others)
Tumor progression is a multi-step process that requires a sequential selection of specific malignant phenotypes. Met activation may induce different phenotypes depending on tumor stage: inducing proliferation and angiogenesis in primary tumors, stimulating motility to form micrometastases, and regaining the proliferation phenotype to form overt metastases. Studying how HGF/SF-induced proliferative phenotypes switch to the invasive phenotype is important for understanding the mechanism of tumor progression and will provide an attractive target for cancer intervention and therapy.

18.
DNA-binding proteins (DBPs) have a wide range of roles such as those in DNA repair, recombination, and gene expression. Recently, a microarray-based method has been developed for the high-throughput analysis of DNA-protein interactions. However, to maximize the advantages of this method, the detection process should be improved so that the method can be applied to many proteins without the use of antibody or sample labeling. Previously, we presented a primary report on the detection of DBP, which is applicable to the microarray format. The system consists of three steps: first, the target DBP in the sample solution is incubated with a probe DNA; second, the probe is digested with Exo (Exonuclease) III; finally, the probe is extended with Taq DNA polymerase using fluorescent dye-labeled dUTP as a substrate. The bound DBP protects the probe from digestion by Exo III. Therefore, only the DBP-bound probe allows the subsequent extension. In this study, the simultaneous detection of multiple DBPs was examined, and then the DBPs were analyzed using a crude extract of the cultured cells to demonstrate the general applicability of the method. Our method can be applied to many DBPs using the same procedure and components, whereas in the antibody-based method, the same number of antibodies as DBPs is needed to detect target DBPs in ELISA (enzyme-linked immunosorbent assay). These results suggest that our method is useful for the high-throughput detection of DBPs in the microarray format.

19.
In breast cancer, inactivation of the RB tumor suppressor gene is believed to occur via multiple mechanisms to facilitate tumorigenesis. However, the prognostic and predictive value of RB status in disease-specific clinical outcomes has remained uncertain. We investigated RB pathway deregulation in the context of both ER-positive and ER-negative disease using combined microarray datasets encompassing over 900 breast cancer patient samples. Disease-specific characteristics of RB pathway deregulation were investigated in this dataset by evaluating correlation among pathway genes as well as differential expression across patient tumor populations defined by ER status. Survival analysis among these breast cancer samples demonstrates that the RB-loss signature is associated with poor disease outcome within several independent cohorts. Within the ER-negative subpopulation, the RB-loss signature is associated with improved response to chemotherapy and longer relapse-free survival. Additionally, while individual genes in the RB target signature closely reproduce its prognostic value, they also serve to predict and monitor response to therapeutic compounds, such as the cytostatic agent PD-0332991. These results indicate that the RB-loss signature expression is associated with poor outcome in breast cancer, but predicts improved response to chemotherapy based on data in ER-negative populations. While the RB-loss signature, as a whole, demonstrates prognostic and predictive utility, a small subset of markers could be sufficient to stratify patients based on RB function and inform the selection of appropriate therapeutic regimens.
Key words: RB, breast cancer, microarray, proliferation, cytostatics

20.
Escherichia coli K-12 has the ability to migrate on semisolid media by means of swarming motility. A systematic and comprehensive collection of gene-disrupted E. coli K-12 mutants (the Keio collection) was used to identify the genes involved in the swarming motility of this bacterium. Of the 3,985 nonessential gene mutants, 294 were found to exhibit a strongly repressed-swarming phenotype. Further, 216 of the 294 mutants displayed no significant defects in swimming motility; therefore, the 216 genes were considered to be specifically associated with the swarming phenotype. The swarming-associated genes were classified into various functional categories, indicating that swarming is a specialized form of motility that requires a wide variety of cellular activities. These genes include genes for tricarboxylic acid cycle and glucose metabolism, iron acquisition, chaperones and protein-folding catalysts, signal transduction, and biosynthesis of cell surface components, such as lipopolysaccharide, the enterobacterial common antigen, and type 1 fimbriae. Lipopolysaccharide and the enterobacterial common antigen may be important surface-acting components that contribute to the reduction of surface tension, thereby facilitating the swarm migration in the E. coli K-12 strain.
