20 similar documents found.
1.
Gene selection: a Bayesian variable selection approach (total citations: 13; self-citations: 0; citations by others: 13)
Selection of significant genes via expression patterns is an important problem in microarray experiments. Owing to the small sample size and the large number of variables (genes), the selection process can be unstable. This paper proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables to specialize the model to a regression setting and use a Bayesian mixture prior to perform the variable selection. We control the size of the model by assigning a prior distribution over the dimension (number of significant genes) of the model. The posterior distributions of the parameters are not available in explicit form, so we use a combination of truncated sampling and Markov chain Monte Carlo (MCMC) computation techniques to simulate the parameters from the posteriors. The Bayesian model is flexible enough both to identify significant genes and to make future predictions. The method is applied to cancer classification via cDNA microarrays, where the genes BRCA1 and BRCA2 are associated with a hereditary predisposition to breast cancer, and is used to identify a set of significant genes. The method is also applied successfully to the leukemia data. SUPPLEMENTARY INFORMATION: http://stat.tamu.edu/people/faculty/bmallick.html.
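The abstract mentions latent variables and truncated sampling without giving details. A common construction, assumed here rather than taken from the paper, is a probit link: each binary label y_i is tied to a latent z_i ~ N(x_i·β, 1), with z_i > 0 exactly when y_i = 1, and one Gibbs step draws each z_i from a truncated normal. The sketch below shows only that single step using the standard library; the function name `sample_latent` and the example predictors are hypothetical.

```python
import random
from statistics import NormalDist

def sample_latent(mean, y, rng):
    """Draw z ~ N(mean, 1) truncated to z >= 0 when y == 1, or z <= 0 when y == 0.

    Inverse-CDF sampling; adequate for moderate |mean| (roughly below 7),
    where the allowed half-line keeps non-negligible probability mass.
    """
    dist = NormalDist(mu=mean, sigma=1.0)
    p0 = dist.cdf(0.0)  # P(z <= 0) under the untruncated normal
    # Uniform draw restricted to the CDF range of the allowed half-line.
    u = rng.uniform(p0, 1.0) if y == 1 else rng.uniform(0.0, p0)
    # inv_cdf is undefined at exactly 0 or 1, so keep u strictly inside (0, 1).
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)

# One sweep over the latent variables, given current linear predictors x_i . beta.
rng = random.Random(0)
linear_predictors = [0.4, -1.2, 2.0, -0.3]
labels = [1, 0, 1, 0]
z = [sample_latent(m, y, rng) for m, y in zip(linear_predictors, labels)]
print(z)
```

In a full sampler, β and the gene-inclusion indicators would then be updated given z; that part depends on the paper's mixture prior and dimension prior and is not sketched here.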
2.
Based on nearly complete genome sequences from a variety of organisms, data on naturally occurring genetic variation on the scale of hundreds of loci to entire genomes have been collected in recent years. In parallel, new statistical tests have been developed to infer evidence of recent positive selection from these data and to localize the target regions of selection in the genome. These methods have now been successfully applied to Drosophila melanogaster, humans, mice and a few plant species. In genomic regions of normal recombination rates, the targets of positive selection have been mapped down to the level of individual genes.
3.
Summary. Variable selection can be challenging, particularly in situations with a large number of predictors with possibly high correlations, such as gene expression data. In this article, a new method called OSCAR (octagonal shrinkage and clustering algorithm for regression) is proposed to simultaneously select variables and group them into predictive clusters. In addition to improving prediction accuracy and interpretation, the resulting groups can then be investigated further to discover what causes the members of a group to behave similarly. The technique is based on penalized least squares with a geometrically intuitive penalty function that shrinks some coefficients to exactly zero. Additionally, this penalty yields exact equality of some coefficients, encouraging correlated predictors that have a similar effect on the response to form predictive clusters represented by a single coefficient. The proposed procedure is shown to compare favorably with existing shrinkage and variable selection techniques in terms of both prediction error and model complexity, while yielding the additional grouping information.
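The abstract does not state the penalty. In the form usually quoted for OSCAR (treat it as an assumption to verify against the paper), the estimator minimizes ½‖y − Xβ‖² subject to Σ_j |β_j| + c Σ_{j<k} max(|β_j|, |β_k|) ≤ t, with c ≥ 0; the pairwise max term is what pulls correlated coefficients to exactly equal magnitudes. The sketch evaluates that penalty two ways: directly, and via the equivalent sorted form in which the j-th smallest |β| (0-based) is weighted by 1 + c·j, because it is the larger member of exactly j pairs. Function names are illustrative.

```python
from itertools import combinations

def oscar_penalty(beta, c):
    """Direct form: sum_j |b_j| + c * sum_{j<k} max(|b_j|, |b_k|)."""
    a = [abs(b) for b in beta]
    return sum(a) + c * sum(max(p, q) for p, q in combinations(a, 2))

def oscar_penalty_sorted(beta, c):
    """Equivalent O(p log p) form: sort |b| ascending and weight the j-th by 1 + c*j."""
    a = sorted(abs(b) for b in beta)
    return sum((1.0 + c * j) * v for j, v in enumerate(a))

beta = [1.0, -2.0, 3.0]
print(oscar_penalty(beta, 0.5), oscar_penalty_sorted(beta, 0.5))  # 10.0 10.0
```

With c = 0 both functions reduce to the lasso (L1) penalty; the sorted form is the one that scales to many predictors.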
4.
MOTIVATION: Anchoring of proteins to the extracytosolic leaflet of membranes via C-terminal attachment of glycosylphosphatidylinositol (GPI) is ubiquitous and essential in eukaryotes. The signal for GPI-anchoring is confined to the C-terminus of the target protein. In order to identify anchoring signals in silico, we have trained neural networks on known GPI-anchored proteins, systematically optimizing input parameters. RESULTS: A Kohonen self-organizing map, GPI-SOM, was developed that predicts GPI-anchored proteins with high accuracy. In combination with SignalP, GPI-SOM was used in genome-wide surveys for GPI-anchored proteins in diverse eukaryotes. Apart from specialized parasites, a general trend towards higher percentages of GPI-anchored proteins in larger proteomes was observed. AVAILABILITY: GPI-SOM is accessible on-line at http://gpi.unibe.ch. The source code (written in C) is available on the same website. SUPPLEMENTARY INFORMATION: Positive training set, performance test sets and lists of predicted GPI-anchored proteins from different eukaryotes in fasta format.
5.
Giuliano F.; Arrigo P.; Scalia F.; Cardo P. P.; Damiani G. Bioinformatics (Oxford, England), 1993, 9(6):687-693
Computer recognition of short functional sites on DNA, such as promoter regions or intron-exon boundaries, has recently attracted much interest. In this paper we focus on the automatic recognition of relevant features of human nucleic acid sequences by means of an unsupervised artificial neural network model. Sixty messenger RNA and 31 genomic DNA sequences were analysed. The results showed that in mRNA the minimal-similarity 60-base pattern was guanine- and cytosine-rich and, in most sequences, was located within 250 bases of either the midpoint of the signal peptide coding region or the start of the coding region. In DNA sequences, a region defined by a cluster of minimal-similarity patterns was present in many of the analysed genes. This zone may be related to alternative splicing and DNA methylation.
6.
Teuvo Kohonen. Biological cybernetics, 1996, 75(4):281-291
A new self-organizing map (SOM) architecture called the ASSOM (adaptive-subspace SOM) is shown to create sets of translation-invariant filters when randomly displaced or moving input patterns are used as training data. No analytical functional forms are postulated for these filters. Different kinds of filters are formed by the ASSOM when pictures are rotated or zoomed during learning. The ASSOM can thus act as a learning feature-extraction stage for pattern recognizers, able to adapt to many sensory environments and to many different transformation groups of patterns.
Received: 14 September 1995 / Accepted in revised form: 8 May 1996
7.
Wetlands are nutrient-rich and biodiverse ecosystems that provide habitats for various animals and plants and protect against flooding. Classification of wetlands provides information to conservation planners and resource managers for ecosystem service determination. Many ecological case studies have shown the self-organizing map (SOM) to be a robust and powerful data classification and visualization tool. In this study, we use the SOM to analyze the habitat characteristics of inland wetlands in South Korea. We surveyed the plants, benthic macroinvertebrates, and bird species inhabiting 530 wetlands nationwide for four years, from 2016 to 2019. Nine environmental features (the proportions of urban area, farmland, grassland, and forest within a 1 km buffer zone; the distances from the river and from the nearest wetland; and the area, perimeter, and average slope of the wetland polygons) were used to train the SOM and examine the habitat characteristics of the surveyed living components. A 10 × 11 map was used for SOM training, and the output data were classified into eight clusters. Based on the occurrence frequency of the surveyed species groups, most species were distributed across all clusters, whereas some dominated specific clusters. We believe that our study contributes significantly to the literature because it highlights the value of the SOM approach for clustering wetlands with dependent habitats and provides ecological information for building sustainable wetland conservation policies.
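The SOM training that this study (and several other entries in this list) relies on can be sketched as the textbook online Kohonen update; this is not the authors' software. Each sample pulls its best-matching unit (BMU) and that unit's grid neighbors toward it, with a learning rate and a Gaussian neighborhood radius that shrink over time. The linear schedules, defaults and function names are assumptions; the study's 10 × 11 map with nine features would correspond to `rows=10, cols=11` and nine-element input vectors, ideally standardized first.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return (row, col) of the unit whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (i, j), d
    return best

def train_som(data, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Online SOM training with linearly decaying learning rate and radius."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialize every unit with a copy of a randomly chosen sample.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.5)
        x = rng.choice(data)
        bi, bj = best_matching_unit(weights, x)
        for i in range(rows):
            for j in range(cols):
                # Gaussian neighborhood on the grid, centered on the BMU.
                h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2.0 * sigma ** 2))
                w = weights[i][j]
                for k in range(len(x)):
                    w[k] += lr * h * (x[k] - w[k])
    return weights

# Two well-separated groups of 2-D points; they should map to different units.
rng = random.Random(1)
data = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(30)]
data += [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(30)]
som = train_som(data, rows=3, cols=3)
print(best_matching_unit(som, [0, 0]), best_matching_unit(som, [5, 5]))
```

The trained units can then be grouped (for example with hierarchical clustering or k-means) to obtain a small number of clusters, such as the eight reported here; the abstract does not say which grouping method the authors used.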
8.
SUMMARY: INteractive Codon usage Analysis (INCA) provides an array of features useful in the analysis of synonymous codon usage in whole genomes. In addition to computing codon frequencies and several usage indices, such as 'codon bias', effective Nc and CAI, INCA's primary strength lies in its numerous options for the interactive graphical display of calculated values, allowing visual detection of various trends in codon usage. Finally, INCA includes a specific unsupervised neural network algorithm, the self-organizing map, used for clustering genes according to their preferred codon usage. AVAILABILITY: INCA is available for the Win32 platform and is free of charge for academic use. For details, visit the web page http://www.bioinfo-hr.org/inca or contact the author directly. SUPPLEMENTARY INFORMATION: The software is accompanied by a user manual and a short tutorial.
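Of the indices INCA reports, CAI (Sharp and Li's codon adaptation index) has a compact definition: each codon gets a relative adaptiveness w = (its count in a reference set of highly expressed genes) / (count of the most used synonymous codon), and a gene's CAI is the geometric mean of w over its codons. The sketch below is not INCA's code. It uses a deliberately tiny, hypothetical two-amino-acid codon table and gives codons absent from the reference set a 0.5 pseudocount (an assumption; conventions vary).

```python
import math

# Hypothetical toy table: a real analysis needs the full genetic code and
# excludes single-codon amino acids (Met, Trp) and stop codons.
SYNONYMS = {
    "Phe": ["TTT", "TTC"],
    "Lys": ["AAA", "AAG"],
}

def relative_adaptiveness(reference_counts, pseudocount=0.5):
    """w(codon) = count / max count among its synonymous codons."""
    w = {}
    for codons in SYNONYMS.values():
        counts = {c: reference_counts.get(c, 0) or pseudocount for c in codons}
        top = max(counts.values())
        for c in codons:
            w[c] = counts[c] / top
    return w

def cai(gene, w):
    """Geometric mean of w over the gene's codons that appear in the table."""
    logs = [math.log(w[gene[i:i + 3]])
            for i in range(0, len(gene) - 2, 3)
            if gene[i:i + 3] in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")

w = relative_adaptiveness({"TTT": 10, "TTC": 30, "AAA": 20, "AAG": 20})
print(round(cai("TTTTTCAAA", w), 4))  # 0.6934, i.e. (1/3) ** (1/3)
```

Here TTT has w = 1/3 while TTC and AAA have w = 1, so the three-codon gene scores the cube root of 1/3.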
9.
Mahony S, Hendrix D, Golden A, Smith TJ, Rokhsar DS. Bioinformatics (Oxford, England), 2005, 21(9):1807-1814
MOTIVATION: The automatic identification of over-represented motifs present in a collection of sequences continues to be a challenging problem in computational biology. In this paper, we propose a self-organizing map of position weight matrices as an alternative method for motif discovery. The advantage of this approach is that it can be used to simultaneously characterize every feature present in the dataset, thus lessening the chance that weaker signals will be missed. Features identified are ranked in terms of over-representation relative to a background model. RESULTS: We present an implementation of this approach, named SOMBRERO (self-organizing map for biological regulatory element recognition and ordering), which is capable of discovering multiple distinct motifs present in a single dataset. We demonstrate the advantages of our approach on various datasets, as well as SOMBRERO's improved performance over two popular motif-finding programs, MEME and AlignACE. AVAILABILITY: SOMBRERO is available free of charge from http://bioinf.nuigalway.ie/sombrero SUPPLEMENTARY INFORMATION: http://bioinf.nuigalway.ie/sombrero/additional.
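SOMBRERO's map nodes are position weight matrices (PWMs). As an illustration of that building block only, not of SOMBRERO's SOM training or its over-representation scoring, the sketch below builds a log2-odds PWM from aligned sites against an assumed uniform background, with an assumed pseudocount of 0.5, and scans a sequence for its best-scoring window. The function names and example sites are hypothetical.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=None):
    """Log2-odds PWM from equal-length aligned sites (one dict per position)."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background[b]) for b in BASES})
    return pwm

def score_window(pwm, window):
    """Sum of per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    return max((score_window(pwm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))

pwm = build_pwm(["TATA", "TATA", "TATT"])
print(best_hit(pwm, "GGGTATAGGG"))  # (score, 3): the TATA window wins
```

Positive scores mean a window looks more like the sites than like the background; the pseudocount keeps unseen bases from producing log(0).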
10.
Monte Carlo feature selection for supervised classification (total citations: 4; self-citations: 0; citations by others: 4)
Draminski M, Rada-Iglesias A, Enroth S, Wadelius C, Koronacki J, Komorowski J. Bioinformatics (Oxford, England), 2008, 24(1):110-117
MOTIVATION: Pre-selection of informative features for supervised classification is a crucial, albeit delicate, task. Feature selection should ideally yield the features that contribute most to the classification task per se, and that should therefore be used by any classifier subsequently employed to produce classification rules. In this article, a conceptually simple but computationally intensive approach to this task is proposed. The reliability of the approach rests on repeatedly constructing tree classifiers for many training sets randomly drawn from the original sample set, where the samples in each training set consist of only a fraction of all observed features. RESULTS: The resulting ranking of features may then be used to advantage for classification via a classifier of any type. The approach was validated using the Golub et al. leukemia data and the Alizadeh et al. lymphoma data. Not surprisingly, we obtained a list of genes significantly different from those of other methods. Biological interpretation of the genes selected by our method showed that several of them are involved in precursors to different types of leukemia and lymphoma, rather than being genes common to several forms of cancer, as is the case for the other methods. AVAILABILITY: Prototype available upon request.
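The core loop described in this abstract (many random feature subsets, many random train/test splits, credit to features that classify well) can be sketched with the standard library. This is a simplified stand-in, not the published procedure: it replaces full decision trees with single-feature threshold stumps and credits only the chosen feature with its test accuracy, whereas the published method uses full trees and a more elaborate relative-importance score. All names, defaults and the synthetic data are hypothetical.

```python
import random

def fit_stump(X, y, f):
    """Best threshold rule on feature f: returns (train_accuracy, threshold, polarity)."""
    best = (-1.0, 0.0, 1)
    for thr in sorted({row[f] for row in X}):
        for pol in (1, -1):
            preds = [1 if pol * (row[f] - thr) > 0 else 0 for row in X]
            acc = sum(p == t for p, t in zip(preds, y)) / len(y)
            if acc > best[0]:
                best = (acc, thr, pol)
    return best

def monte_carlo_ranking(X, y, n_subsets=50, subset_size=2, n_splits=5,
                        train_frac=0.66, seed=0):
    """Rank features by accumulated test accuracy over random subsets and splits."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    credit = [0.0] * d
    for _ in range(n_subsets):
        feats = rng.sample(range(d), subset_size)
        for _ in range(n_splits):
            idx = list(range(n))
            rng.shuffle(idx)
            cut = int(train_frac * n)
            tr, te = idx[:cut], idx[cut:]
            X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
            # Keep the feature in this subset whose stump fits the training split best.
            fits = {f: fit_stump(X_tr, y_tr, f) for f in feats}
            f_best = max(feats, key=lambda f: fits[f][0])
            _, thr, pol = fits[f_best]
            correct = sum((1 if pol * (X[i][f_best] - thr) > 0 else 0) == y[i] for i in te)
            credit[f_best] += correct / len(te)
    ranking = sorted(range(d), key=lambda f: credit[f], reverse=True)
    return ranking, credit

# Synthetic data: feature 0 separates the classes, features 1-4 are noise.
rng = random.Random(42)
y = [i % 2 for i in range(40)]
X = [[label + 0.1 * rng.random()] + [rng.random() for _ in range(4)] for label in y]
ranking, credit = monte_carlo_ranking(X, y)
print(ranking)
```

As in the abstract, the output is a ranking that could then feed any downstream classifier; the stump is only a cheap placeholder for the tree learner.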
11.
12.
Wheat gliadin and other cereal prolamins have been reported to be involved in the pathogenic damage of the small intestine in celiac disease via apoptosis of epithelial cells. In the present work we investigated the mechanisms underlying the pro-apoptotic activity exerted by gliadin-derived peptides in Caco-2 intestinal cells, a cell line that retains many morphological and enzymatic features typical of normal human enterocytes. We found that digested peptides from wheat gliadins (i) induce apoptosis via the CD95/Fas apoptotic pathway, (ii) increase Fas and FasL mRNA levels, and (iii) increase FasL release into the medium; and (iv) that gliadin digest-induced apoptosis can be blocked by Fas cascade-blocking agents, i.e. targeted neutralizing antibodies. This favors the hypothesis that gliadin could activate an autocrine/paracrine Fas-mediated cell death pathway. Finally, we found that (v) a small peptide (1157 Da) from durum wheat, previously proposed for clinical practice, exerted a powerful protective activity against gliadin digest cytotoxicity.
13.
Kyung-Hee Choi, Jung-Su Kim, Young Shin Kim, Mi-Ae Yoo, Tae-Soo Chon. Ecological Informatics, 2006, 1(3):219
The two-dimensional movement tracks of the STAT92E06346 mutant and two control strains (Oregon red (OR) and TM3) of Drosophila melanogaster were continuously observed with image processors. A self-organizing map (SOM) was then used to pattern the response behaviors of the tested specimens. Movement behaviors were thereby characterized by strain and sex. The SOM showed differences in the degree of behavioral grouping among genotypes. Visualization through the SOM further characterized the clusters of specimens using variables describing activity and spatial information. The study demonstrated that data-mining techniques based on artificial neural networks could be a useful tool for analyzing complex behaviors induced by changes in genetic information.
14.
This paper presents an approach to the well-known Travelling Salesman Problem (TSP) using Self-Organizing Maps (SOM). The configuration of a SOM's neurons in Cartesian space carries useful topological information, which can be exploited to solve optimization problems. Aspects of initialization, parameter adaptation, and complexity analysis of the proposed SOM-based algorithm are discussed. The results show an average deviation of 3.7% from the optimal tour length over a set of 12 TSP instances.
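A common way to apply a SOM to the TSP is a closed ring of neurons. The sketch below follows that idea under our own assumptions; it is not the paper's algorithm, whose initialization and parameter-adaptation rules the abstract does not give. Each presented city pulls its nearest neuron and that neuron's ring neighbors toward it, and the final tour visits cities in the order of their nearest neurons along the ring. The learning-rate and radius schedules, the three-neurons-per-city ratio and the function names are arbitrary illustrative choices.

```python
import math
import random

def som_tsp(cities, n_iter=5000, neurons_per_city=3, seed=0):
    """Return a visiting order (list of city indices) from a ring-shaped SOM."""
    rng = random.Random(seed)
    n = neurons_per_city * len(cities)
    cx = sum(x for x, _ in cities) / len(cities)
    cy = sum(y for _, y in cities) / len(cities)
    # Start the ring as a small circle around the centroid of the cities.
    ring = [[cx + 0.1 * math.cos(2 * math.pi * k / n),
             cy + 0.1 * math.sin(2 * math.pi * k / n)] for k in range(n)]

    def nearest(p):
        return min(range(n), key=lambda k: (ring[k][0] - p[0]) ** 2 + (ring[k][1] - p[1]) ** 2)

    radius0 = n / 4.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = 0.8 * (1.0 - frac) + 0.01
        radius = max(radius0 * (1.0 - frac), 1.0)
        city = rng.choice(cities)
        win = nearest(city)
        for k in range(n):
            d = abs(k - win)
            d = min(d, n - d)  # distance measured around the ring
            h = math.exp(-d * d / (2.0 * radius * radius))
            ring[k][0] += lr * h * (city[0] - ring[k][0])
            ring[k][1] += lr * h * (city[1] - ring[k][1])
    # Visit cities in the order of their nearest neuron along the ring.
    return sorted(range(len(cities)), key=lambda i: nearest(cities[i]))

def tour_length(cities, order):
    """Length of the closed tour visiting cities in the given order."""
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

rng = random.Random(3)
cities = [(rng.random(), rng.random()) for _ in range(12)]
order = som_tsp(cities)
print(order, round(tour_length(cities, order), 3))
```

When two cities share a nearest neuron their relative order is arbitrary, so a local improvement step such as 2-opt could be applied afterwards.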
15.
N. V. Swindale, H.-U. Bauer. Proceedings. Biological sciences / The Royal Society, 1998, 265(1398):827-838
Cortical maps of orientation preference in cats, ferrets and monkeys contain numerous half-rotation point singularities. Experimental data have shown that direction preference also has a smooth representation in these maps, with preferences being for the most part orthogonal to the axis of preferred orientation. As a result, the orientation singularities induce an extensive set of linear fractures in the direction map. These fractures run between and connect nearby point orientation singularities. Their existence appears to pose a puzzle for theories that postulate that cortical maps maximize continuity of representation, because the fractures could be avoided if the orientation map contained full-rotation singularities. Here we show that a dimension-reduction model of cortical map formation, which implements principles of continuity and completeness, produces an arrangement of linear direction fractures connecting point orientation singularities which is similar to that observed experimentally. We analyse the behaviour of this model and suggest reasons why the model maps contain half-rotation rather than full-rotation orientation singularities.
16.
17.
Jolanta J. Adamczyk. Ecological Research, 2011, 26(3):547-554
Macrofungal communities were investigated in four associations of xerothermic swards (Festucetum pallentis, Origano-Brachypodietum, Adonido-Brachypodietum pinnati and Diantho-Armerietum elongatae) in a Jurassic area of the Częstochowa Upland (southern Poland). A total of 47 species were recorded. The self-organising map (SOM), an unsupervised artificial neural network algorithm, was used to recognise patterns in the macrofungal communities of the diverse xerothermic swards. Only two associations were mycologically similar: Origano-Brachypodietum and Adonido-Brachypodietum pinnati. Species with high and significant IndVal (species indicator value) for each investigated phytocoenosis are presented. As documented by the IndVal analysis, the presence of macrofungal species and the share of indicator species were linked to the habitat factors of the plant associations. In the least fertile phytocoenoses, macrofungal communities were poor, with few indicator species. The more fertile phytocoenoses had richer and more varied communities of macrofungi with higher numbers of indicator species. The ordering methods applied in this study were very effective for analyzing the macrofungal communities existing in plant associations.
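In Dufrêne and Legendre's formulation, IndVal for a species in a group of sites is 100 × A × B, where specificity A is the species' mean abundance in that group divided by the sum of its mean abundances over all groups, and fidelity B is the fraction of the group's sites where the species occurs. A minimal sketch on a hypothetical abundance table follows; the significance test (usually a permutation test) that "significant IndVal" refers to is omitted, and the function name is illustrative.

```python
def indval(abundance, labels, species, group):
    """IndVal (0-100) of one species for one group of sites.

    abundance: list of sites, each a list of per-species abundances
    labels:    group label of each site (same order as abundance)
    """
    means = {}
    for g in sorted(set(labels)):
        vals = [row[species] for row, lab in zip(abundance, labels) if lab == g]
        means[g] = sum(vals) / len(vals)
    total = sum(means.values())
    specificity = means[group] / total if total > 0 else 0.0
    in_group = [row[species] for row, lab in zip(abundance, labels) if lab == group]
    fidelity = sum(1 for v in in_group if v > 0) / len(in_group)
    return 100.0 * specificity * fidelity

# Hypothetical table: 4 sites x 2 species, sites grouped into "A" and "B".
abundance = [[5, 1], [3, 0], [0, 2], [0, 2]]
labels = ["A", "A", "B", "B"]
print(indval(abundance, labels, 0, "A"))  # 100.0
print(indval(abundance, labels, 1, "B"))  # 80.0
```

Species 0 occurs only in group A and at every A site, so it is a perfect indicator; species 1 is mostly, but not exclusively, a group-B species.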
18.
Rowland JJ. Bio Systems, 2003, 72(1-2):187-196
The expressive power, powerful search capability, and the explicit nature of the resulting models make evolutionary methods very attractive for supervised learning applications in bioinformatics. However, their characteristics also make them highly susceptible to overtraining or to discovering chance relationships in the data. Identification of appropriate criteria for terminating evolution and for selecting an appropriately validated model is vital. Some approaches that are commonly applied to other modelling methods are not necessarily applicable in a straightforward manner to evolutionary methods. An approach to model selection is presented that is not unduly computationally intensive. To illustrate the issues and the technique two bioinformatic datasets are used, one relating to metabolite determination and the other to disease prediction from gene expression data.
19.
20.
Understanding the geographical patterns and divisions of communities is a fundamental step in achieving the sustainable management of ecosystems, especially in deteriorating global and local environments. The idea of geographical division has been applied to all continents except Antarctica, but it has never been rigorously tested for stream ecosystems in China, leaving a gap in knowledge for many basic and applied research questions regarding, for example, diversity patterns, conservation issues or climate change effects. To fill this gap, we aimed (1) to evaluate the geographical divisions of the macroinvertebrate communities in Chinese streams using the self-organizing map (SOM) method and (2) to characterize the distribution patterns in relation to different environmental variables. Macroinvertebrates were collected from 57 relatively clean stream sites covering a south-north gradient along the boundary of the geographic ladder (or altitudinal divide) in China. The SOM was used to analyze large-scale biogeographical divisions of the macroinvertebrate communities. The sampling sites were divided into six clusters, distinguishing the samples from northern, central, and southern China. This pattern was also reflected by biotic metrics (abundance, biomass, taxon richness, summed Ephemeroptera, Plecoptera, and Trichoptera (EPT) richness, and diversity). The gradients of environmental variables, particularly water quality variables, were similar among the clusters, with the exceptions of two clusters from southwestern China for altitude and one cluster from northern China for conductivity and total nitrogen (TN). The SOM clusters were associated with indicator species, with clean-water-adapted species dominating in southwestern China and pollution-tolerant species in northern China. However, there were no significant correlations between environmental variables and biotic metrics.
The overall combination of environmental variables and organism data suggests that spatial variation was the main predictor of macroinvertebrate community composition at a large scale, and the trained SOM appeared to be efficient at classifying streams on a broad geographic scale.