Similar articles
20 similar articles found.
1.
MOTIVATION: High-dimensional data such as microarrays have created new challenges for traditional statistical methods. One such example is class prediction with high-dimensional, low-sample-size data. Because the sample size is small, the sample mean estimates are usually unreliable. As a consequence, the performance of class prediction methods that use the sample mean may also be unsatisfactory. To obtain more accurate parameter estimates, statistical methods such as regularization through shrinkage are often desirable. RESULTS: In this article, we investigate the family of shrinkage estimators for the mean value under the quadratic loss function. The optimal shrinkage parameter is proposed for the scenario in which the sample size is fixed and the dimension is large. We then construct a shrinkage-based diagonal discriminant rule by replacing the sample mean with the proposed shrinkage mean. Finally, we demonstrate via simulation studies and real data analysis that the proposed shrinkage-based rule outperforms its original competitor in a wide range of settings.
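A minimal sketch of a shrinkage-based diagonal discriminant rule, assuming Python with only the standard library. The shrinkage step is an assumption: it uses a positive-part James-Stein (Efron-Morris) estimator that pulls each class's gene means toward their across-gene average. It stands in for, and is not, the optimal shrinkage parameter derived in the article. All function names are hypothetical.

```python
# Hypothetical sketch: diagonal discriminant rule with shrunken class means.
# Assumption: positive-part James-Stein (Efron-Morris) shrinkage toward the
# across-gene average, NOT the article's optimal shrinkage parameter.


def pooled_variances(classes):
    """Gene-wise pooled within-class variances s_j^2 (divisor N - K)."""
    first = next(iter(classes.values()))
    p = len(first[0])
    total_n = sum(len(samples) for samples in classes.values())
    ss = [0.0] * p
    for samples in classes.values():
        for j in range(p):
            col = [s[j] for s in samples]
            mean = sum(col) / len(col)
            ss[j] += sum((v - mean) ** 2 for v in col)
    df = total_n - len(classes)
    return [v / df for v in ss]


def shrink_means(means, noise_var):
    """Shrink a vector of gene means toward their average.

    noise_var is treated as a common sampling variance of every mean
    (an approximation when gene variances differ)."""
    p = len(means)
    grand = sum(means) / p
    spread = sum((m - grand) ** 2 for m in means)
    if p <= 3 or spread == 0.0:
        return list(means)  # James-Stein shrinkage toward the mean needs p >= 4
    factor = max(0.0, 1.0 - (p - 3) * noise_var / spread)
    return [grand + factor * (m - grand) for m in means]


def fit_shrinkage_dlda(classes):
    """classes: label -> list of samples (each a list of p gene values)."""
    s2 = pooled_variances(classes)
    p = len(s2)
    centroids = {}
    for label, samples in classes.items():
        n = len(samples)
        means = [sum(s[j] for s in samples) / n for j in range(p)]
        noise_var = sum(s2) / (p * n)  # average of s_j^2 / n_k over genes
        centroids[label] = shrink_means(means, noise_var)
    return centroids, s2


def predict(x, centroids, s2):
    """Assign x to the class with the smallest diagonal discriminant score
    (equal class priors assumed)."""
    def score(mu):
        return sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, mu, s2))
    return min(centroids, key=lambda label: score(centroids[label]))
```

After `centroids, s2 = fit_shrinkage_dlda(classes)`, a new sample `x` is classified with `predict(x, centroids, s2)`. When the sampling noise is large relative to the spread of the gene means, the factor drops toward zero and the centroid is pulled strongly toward its average; with little noise it is left almost unchanged.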

2.

Background  

DNA microarrays open up a new horizon for studying the genetic determinants of disease. The high throughput nature of these arrays creates an enormous wealth of information, but also poses a challenge to data analysis. Inferential problems become even more pronounced as experimental designs used to collect data become more complex. An important example is multigroup data collected over different experimental groups, such as data collected from distinct stages of a disease process. We have developed a method specifically addressing these issues termed Bayesian ANOVA for microarrays (BAM). The BAM approach uses a special inferential regularization known as spike-and-slab shrinkage that provides an optimal balance between total false detections and total false non-detections. This translates into more reproducible differential calls. Spike and slab shrinkage is a form of regularization achieved by using information across all genes and groups simultaneously.
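The basic mechanics of spike-and-slab shrinkage can be illustrated for a single estimated effect, assuming a Gaussian likelihood for the estimate and a two-component Gaussian mixture prior: a narrow "spike" variance v0, a wide "slab" variance v1, and slab weight w. This is the textbook conjugate calculation, not the BAM implementation, which borrows strength across all genes and groups; the parameter values and names are illustrative assumptions.

```python
import math


def log_normal_pdf(x, var):
    """Log density of N(0, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def spike_slab_posterior(beta_hat, s2, v0=1e-4, v1=1.0, w=0.1):
    """Posterior for one effect under beta_hat ~ N(beta, s2) and the prior
    beta ~ (1 - w) N(0, v0) + w N(0, v1), with v0 much smaller than v1.

    Returns (posterior probability of the slab, posterior mean of beta)."""
    log_slab = math.log(w) + log_normal_pdf(beta_hat, v1 + s2)
    log_spike = math.log1p(-w) + log_normal_pdf(beta_hat, v0 + s2)
    d = log_slab - log_spike
    # Numerically stable logistic transform of the log-odds d.
    if d >= 0:
        p_slab = 1.0 / (1.0 + math.exp(-d))
    else:
        e = math.exp(d)
        p_slab = e / (1.0 + e)
    # Each mixture component shrinks beta_hat by its own normal-normal factor.
    mean = beta_hat * (p_slab * v1 / (v1 + s2) + (1.0 - p_slab) * v0 / (v0 + s2))
    return p_slab, mean
```

Small estimates are attributed mostly to the spike and shrunk almost to zero, while large estimates are attributed to the slab and barely shrunk; this selective shrinkage is what trades false detections against false non-detections.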

3.
MOTIVATION: Discriminant analysis for high-dimensional and low-sample-sized data has become a hot research topic in bioinformatics, mainly motivated by its importance and challenge in applications to tumor classification for high-dimensional microarray data. Two of the popular methods are the nearest shrunken centroids, also called predictive analysis of microarray (PAM), and shrunken centroids regularized discriminant analysis (SCRDA). Both methods are modifications of the classic linear discriminant analysis (LDA) in two aspects tailored to high-dimensional and low-sample-sized data: one is the regularization of the covariance matrix, and the other is variable selection through shrinkage. In spite of their usefulness, there are potential limitations with each method. The main concern is that both PAM and SCRDA are possibly too extreme: the covariance matrix in the former is restricted to be diagonal, while in the latter there is barely any restriction. Based on the biology of gene functions and given the features of the data, it may be beneficial to estimate the covariance matrix as an intermediate between the two; furthermore, more effective shrinkage schemes may be possible. RESULTS: We propose modified LDA methods to integrate biological knowledge of gene functions (or variable groups) into classification of microarray data. Instead of simply treating all the genes independently or imposing no restriction on the correlations among the genes, we group the genes according to their biological functions extracted from existing biological knowledge or data, and propose regularized covariance estimators that encourage between-group gene independence and within-group gene correlations while maintaining the flexibility of any general covariance structure. Furthermore, we propose a shrinkage scheme on groups of genes that tends to retain or remove a whole group of the genes altogether, in contrast to the standard shrinkage on individual genes.
We show that one of the proposed methods performed better than PAM and SCRDA in a simulation study and several real data examples.
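The contrast between shrinking individual genes and shrinking whole gene groups can be illustrated with soft-thresholding. The sketch below applies PAM-style soft-thresholding to one standardized centroid difference and, for comparison, a group soft-threshold (the proximal operator used with group-lasso-type penalties) that keeps or zeroes an entire group at once. The threshold, group labels, and function names are illustrative assumptions; the article's exact shrinkage scheme may differ.

```python
import math


def soft_threshold(d, delta):
    """Shrink one standardized difference toward zero by delta (PAM-style)."""
    return math.copysign(max(abs(d) - delta, 0.0), d)


def group_soft_threshold(values, delta):
    """Shrink a group's vector by delta in Euclidean norm; zero the whole
    group if its norm is at most delta."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm <= delta:
        return [0.0] * len(values)
    scale = 1.0 - delta / norm
    return [scale * v for v in values]


def shrink_by_group(diffs, groups, delta):
    """diffs: gene -> standardized difference; groups: group name -> genes."""
    out = {}
    for genes in groups.values():
        shrunk = group_soft_threshold([diffs[g] for g in genes], delta)
        out.update(zip(genes, shrunk))
    return out
```

With a threshold of 1.0, a gene with difference 0.5 is removed by individual soft-thresholding, but it survives group shrinkage when it shares a group with a strongly differential gene; a group of uniformly weak genes is dropped as a unit.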

4.
The landscape of human phosphorylation networks has not been systematically explored, representing vast, uncharted territories within cellular signaling networks. Although a large number of in vivo phosphorylated residues have been identified by mass spectrometry (MS)‐based approaches, assigning the upstream kinases to these residues requires biochemical analysis of kinase‐substrate relationships (KSRs). Here, we developed a new strategy, called CEASAR, based on functional protein microarrays and bioinformatics to experimentally identify substrates for 289 unique kinases, resulting in 3656 high‐quality KSRs. We then generated consensus phosphorylation motifs for each of the kinases and integrated this information, along with information about in vivo phosphorylation sites determined by MS, to construct a high‐resolution map of phosphorylation networks that connects 230 kinases to 2591 in vivo phosphorylation sites in 652 substrates. The value of this data set is demonstrated through the discovery of a new role for PKA downstream of Btk (Bruton's tyrosine kinase) during B‐cell receptor signaling. Overall, these studies provide global insights into kinase‐mediated signaling pathways and promise to advance our understanding of cellular signaling processes in humans.

5.
Summary Diagonal discriminant rules have been successfully used for high‐dimensional classification problems, but suffer from the serious drawback of biased discriminant scores. In this article, we propose improved diagonal discriminant rules with bias‐corrected discriminant scores for high‐dimensional classification. We show that the proposed discriminant scores dominate the standard ones under the quadratic loss function. Analytical results on why the bias‐corrected rules can potentially improve the prediction accuracy are also provided. Finally, we demonstrate the improvement of the proposed rules over the original ones through extensive simulation studies and real case studies.
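To see where a bias correction can come from, consider one gene under normality, with a pooled variance estimate s^2 on ν degrees of freedom that is independent of the class mean x̄_k computed from n_k samples. Since E[(x − x̄_k)^2] = (x − μ_k)^2 + σ^2/n_k and E[1/s^2] = ν/((ν − 2)σ^2) for ν > 2, the quantity ((ν − 2)/ν)(x − x̄_k)^2/s^2 − 1/n_k is unbiased for the target (x − μ_k)^2/σ^2. The sketch below sums this per-gene correction; it is one derivation under these stated assumptions and may not match the article's exact estimator. Function names are hypothetical.

```python
def standard_score(x, centroid, s2):
    """Plug-in diagonal discriminant score: sum_j (x_j - xbar_kj)^2 / s_j^2."""
    return sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, centroid, s2))


def bias_corrected_score(x, centroid, s2, n_k, df):
    """Sum over genes of an unbiased estimate of (x_j - mu_kj)^2 / sigma_j^2.

    Assumes normal data and pooled variances s2 on df degrees of freedom that
    are independent of the class means, with df > 2."""
    if df <= 2:
        raise ValueError("need more than 2 degrees of freedom")
    factor = (df - 2) / df  # corrects E[1/s^2] = df / ((df - 2) * sigma^2)
    return sum(factor * (xj - mj) ** 2 / vj - 1.0 / n_k  # -1/n_k removes centroid noise
               for xj, mj, vj in zip(x, centroid, s2))
```

The −1/n_k term differs between classes of unequal size, so with many genes it can change which class attains the smallest score; that is one way the correction can affect prediction rather than just rescaling the scores.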

7.
L. Finos, A. Farcomeni. Biometrics, 2011, 67(1): 174-181
Summary We present a novel approach for k‐FWER control that requires no multiplicity correction: the hypotheses are simply tested along a (possibly data‐driven) order until a suitable number of p‐values above the uncorrected α level is found. The p‐values can arise from any linear model in a parametric or nonparametric setting. The approach is not only very simple and computationally undemanding; the data‐driven order also enhances power when the sample size is small (and also when k and/or the number of tests is large). We illustrate the method on an original study of gene discovery in multiple sclerosis involving a small number of twin pairs discordant for the disease. The methods are implemented in an R package (someKfwer), freely available on CRAN.
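A minimal sketch of the ordering-based idea, assuming the stopping rule is: walk through the hypotheses in the chosen order, reject each one whose p-value is at most the uncorrected α, and stop at the k-th p-value that exceeds α. The article and the someKfwer package should be consulted for the exact rule and the conditions under which it controls the k-FWER; the function below is hypothetical and is not part of that package.

```python
def sequential_kfwer(pvalues, order, alpha=0.05, k=1):
    """Return the indices rejected when testing in `order` at uncorrected alpha,
    stopping once k p-values above alpha have been seen (assumed rule)."""
    rejected = []
    accepted = 0
    for i in order:
        if pvalues[i] <= alpha:
            rejected.append(i)
        else:
            accepted += 1
            if accepted == k:  # the k-th non-rejection ends the procedure
                break
    return rejected
```

With k = 1 this reduces to the classical fixed-sequence procedure; larger k lets the walk continue past a few non-significant hypotheses, which is where a good (possibly data-driven) order pays off.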

8.
The relative influence of Neogene geomorphological events and Quaternary climatic changes as causal mechanisms on Neotropical diversification remains largely speculative, as most divergence timing inferences are based on a single locus and have limited taxonomic or geographic sampling. To investigate these influences, we use a multilocus (two mitochondrial and 11 nuclear genes) range‐wide sampling of Phyllopezus pollicaris, a gecko complex widely distributed across the poorly studied South American ‘dry diagonal’ biomes. Our approach couples traditional and model‐based phylogeography with geospatial methods, and demonstrates Miocene diversification and limited influence of Pleistocene climatic fluctuations on P. pollicaris. Phylogeographic structure and distribution models highlight that persistence across multiple isolated regions shaped the diversification of this species complex. Approximate Bayesian computation supports hypotheses of allopatric and ecological/sympatric speciation between lineages that largely coincide with genetic clusters associated with Chaco, Cerrado, and Caatinga, reflecting complex diversification among the ‘dry diagonal’ biomes. We recover extremely high genetic diversity and suggest that eight well‐supported clades may be valid species, with direct implications for taxonomy and conservation assessments. These patterns exemplify how low‐vagility species complexes, characterized by strong genetic structure and pre‐Pleistocene divergence histories, represent ideal radiations to investigate broad biogeographic histories of associated biomes.

9.
Sheets, H.D., Mitchell, C.E., Izard, Z.T., Willis, J.M., Melchin, M.J. & Holmden, C. 2012: Horizon annealing: a collection‐based approach to automated sequencing of the fossil record. Lethaia, Vol. 45, pp. 532–547. A number of different approaches to quantitative biochronology have been proposed and used to construct high‐resolution time‐scales for a range of uses. We present a new approach, horizon annealing, which uses simulated annealing to optimize the sequencing of collection horizons. Temporal sequences of events produced by this method are compared with those produced by graphic correlation, CONOP and RASC for a series of previously studied exemplar data sets. Horizon annealing produces results similar to other methods, but it does have properties (the ordination of collections and the avoidance of some local minima) that make it useful for high‐resolution studies, particularly those based on capture‐mark‐recapture methods requiring detailed presence–absence data for individual collections and taxa. Keywords: Chronostratigraphy, graphic correlation, graptolite, rate of evolution, CONOP9.
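The optimization step behind horizon annealing can be sketched on a toy presence-absence data set. The misfit used here is an assumption for illustration: for each taxon it counts the horizons lying between the taxon's first and last occurrence in which the taxon is absent, so a good sequence keeps each taxon's occurrences contiguous. Horizon annealing uses its own penalty and move set, and the function names are hypothetical.

```python
import math
import random


def misfit(order, occurrences):
    """occurrences[h] is the set of taxa found in horizon h; order is a
    permutation of horizon indices. Counts absences inside each taxon's range."""
    taxa = set().union(*occurrences)
    total = 0
    for t in taxa:
        positions = [pos for pos, h in enumerate(order) if t in occurrences[h]]
        span = positions[-1] - positions[0] + 1
        total += span - len(positions)
    return total


def anneal(occurrences, steps=5000, t0=2.0, cooling=0.999, seed=0):
    """Simulated annealing over horizon orderings using random pairwise swaps.
    Requires at least two horizons."""
    rng = random.Random(seed)
    order = list(range(len(occurrences)))
    rng.shuffle(order)
    cost = misfit(order, occurrences)
    best, best_cost = order[:], cost
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        new_cost = misfit(order, occurrences)
        # Always accept improvements; accept worse moves with Metropolis probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = order[:], cost
        else:
            order[i], order[j] = order[j], order[i]  # undo the rejected swap
        temp *= cooling
    return best, best_cost
```

Accepting occasional uphill moves while the temperature is high is what lets annealing escape some local minima, one of the properties the abstract highlights.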

10.
The classification of cancer subtypes, which is critical for successful treatment, has been studied extensively using gene expression profiles from oligonucleotide chips or cDNA microarrays. Various pattern recognition methods have been successfully applied to gene expression data. However, these methods are not optimal; rather, they are high-performance classifiers that emphasize only classification accuracy. In this paper, we propose an approach for constructing the optimal linear classifier from gene expression data. Two linear classification methods, linear discriminant analysis (LDA) and discriminant partial least-squares (DPLS), are applied to distinguish acute leukemia subtypes. These methods are shown to give satisfactory accuracy. Moreover, we determined the optimal number of genes participating in the classification (a remarkably small number compared with previous results) on the basis of a statistical significance test. Thus, the proposed method constructs an optimal classifier that uses a small predictor set and provides high accuracy.

11.
Pectic homogalacturonan (HG) is one of the main constituents of plant cell walls. When processed to low degrees of esterification, HG can form complexes with divalent calcium ions. These macromolecular structures (also called egg boxes) play an important role in determining the biomechanics of cell walls and in mediating cell‐to‐cell adhesion. Current immunological methods enable only steady‐state detection of egg box formation in situ. Here we present a tool for efficient real‐time visualisation of available sites for HG crosslinking within cell wall microdomains. Our approach is based on calcium‐mediated binding of fluorescently tagged long oligogalacturonides (OGs) with endogenous de‐esterified HG. We established that more than seven galacturonic acid residues in the HG chain are required to form a stable complex with endogenous HG through calcium complexation in situ, confirming a recently suggested thermodynamic model. Using defined carbohydrate microarrays, we show that the long OG probe binds exclusively to HG that has a very low degree of esterification and in the presence of divalent ions. We used this probe to study real‐time dynamics of HG during elongation of Arabidopsis pollen tubes and root hairs. Our results suggest a different spatial organisation of incorporation and processing of HG in the cell walls of these two tip‐growing structures.

13.
Dabney AR, Storey JD. PLoS ONE, 2007, 2(10): e1002
Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration of how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest-centroid classifiers that is instead based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest-centroid classifiers.

14.
In the period from January 1981 to December 2010, 1068 small‐molecule new chemical entities (NCEs) were introduced, of which ca. 34% are either a natural product or a close analogue. While this metric reflects the impact natural products have had in delivering new chemical starting points (leads) for the pharmaceutical industry, it does not capture the decline this approach has suffered over the last 20 years as the high‐throughput screening (HTS) of pure compound libraries has become more popular. An impediment to natural‐product drug discovery in the HTS paradigm is the lack of a clear strategy that enables front‐loading of an extract or fraction's chemical constituents so that they are compliant with lead‐ and drug‐like chemical space. To address this imbalance, an approach based on lipophilicity, as measured by clog P, has been developed that, together with advances being made in isolation and structural elucidation, can afford natural product leads in timelines compatible with pure compound screening.

15.
The organopromoter 2‐aminoethanesulfonic acid was used, for the first time, to catalyze the synthesis of a series of structurally intriguing new hybrids: thiazolyl acridine‐1,8(2H,5H)‐diones and dihydropyrido[2,3‐d : 6,5‐d′]dipyrimidine‐2,4,6,8(1H,3H,5H,7H)‐tetraones. 2‐Aminoethanesulfonic acid is a biobased organopromoter, used here to generate four new bonds in the synthesis of new coupled thiazole‐based decahydroacridine‐1,8‐diones. Superior green credentials, operational simplicity, easy work‐up and recyclability of the catalyst are the key strengths of this method. The broad substrate scope, mild reaction conditions, short reaction time, cost effectiveness, high atom economy and good to excellent yields make the present method a distinct improvement over existing methods. Spectral (IR, 1H‐NMR, 13C‐NMR, Mass) data and elemental analyses confirmed the structures of the title products. A series of thiazolyl acridine‐1,8(2H,5H)‐diones and dihydropyrido[2,3‐d : 6,5‐d′]dipyrimidine‐2,4,6,8(1H,3H,5H,7H)‐tetraones were screened for their antimicrobial activity against four bacterial and three fungal strains.

17.
Maria Masotti, Bin Guo, Baolin Wu. Biometrics, 2019, 75(4): 1076-1085
Genetic variants associated with disease outcomes can be used to develop personalized treatment. To reach this precision medicine goal, hundreds of large‐scale genome‐wide association studies (GWAS) have been conducted in the past decade to search for promising genetic variants associated with various traits. They have successfully identified tens of thousands of disease‐related variants. However, in total these identified variants explain only part of the variation for most complex traits. Many genetic variants with small effect sizes remain to be discovered, which calls for the development of (a) GWAS with more samples and more comprehensively genotyped variants (for example, the NHLBI Trans‐Omics for Precision Medicine (TOPMed) Program is planning to conduct whole genome sequencing on over 100 000 individuals) and (b) novel and more powerful statistical analysis methods. The currently dominant GWAS analysis approach is the “single trait” association test, even though many GWAS are conducted in deeply phenotyped cohorts with many correlated and well‐characterized outcomes. If properly analyzed, these outcomes can help improve the power to detect novel variants, as suggested by increasing evidence that pleiotropy, where a genetic variant affects multiple traits, is the norm in genome‐phenome associations. We aim to develop powerful, pleiotropy‐informed association tests across multiple traits for GWAS. Because it is generally very hard to access individual‐level phenotype and genotype data for existing GWAS, due to privacy concerns and various logistical considerations, we develop rigorous pleiotropy‐informed adaptive multitrait association tests that need only the summary association statistics publicly available from most GWAS. We first develop a pleiotropy test, which is powerful for truly pleiotropic variants but is sensitive to the pleiotropy assumption.
We then develop a pleiotropy‐informed adaptive test that has robust and powerful performance under various genetic models. We develop accurate and efficient numerical algorithms to compute the analytical P‐value for the proposed adaptive test without the need for resampling or permutation. We illustrate the performance of the proposed methods through a joint association test of GWAS meta‐analysis summary data for several glycemic traits. Our proposed adaptive test identified several novel loci missed by individual‐trait‐based GWAS meta‐analysis. All the proposed methods are implemented in a publicly available R package.
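Summary-statistics multitrait tests start from the per-trait Z-scores z of one variant and the correlation matrix R of those Z-scores under the null (often estimated from genome-wide null variants). As a minimal, hypothetical sketch, the function below computes one classic building block: an equal-weight sum test whose statistic is the sum of z divided by the square root of the sum of all entries of R, which is standard normal under the null and powerful when effects share a direction across traits. The article's pleiotropy test and its adaptive combination are more elaborate and are not reproduced here.

```python
import math


def sum_test(z, R):
    """Equal-weight combination of per-trait Z-scores from summary statistics.

    z: list of K Z-scores; R: K x K null correlation matrix of the Z-scores.
    Returns (standardized statistic, two-sided p-value)."""
    k = len(z)
    var = sum(R[i][j] for i in range(k) for j in range(k))  # Var(sum z) = 1' R 1
    stat = sum(z) / math.sqrt(var)
    return stat, math.erfc(abs(stat) / math.sqrt(2.0))  # two-sided normal tail
```

Accounting for R matters: two perfectly correlated traits with Z = 2 each carry no more evidence than one, and the statistic correctly stays at 2 rather than growing to 2.83 as it would for independent traits.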

18.
Recent advances in high‐throughput sequencing technologies provide opportunities to gain novel insights into the genetic basis of phenotypic trait variation. Yet to date, progress in our understanding of genotype–phenotype associations in nonmodel organisms in general and natural vertebrate populations in particular has been hampered by small sample sizes typically available for wildlife populations and a resulting lack of statistical power, as well as a limited ability to control for false‐positive signals. Here we propose to combine a genome‐wide association study (GWAS) and FST‐based approach with population‐level replication to partly overcome these limitations. We present a case study in which we used this approach in combination with genotyping‐by‐sequencing (GBS) single nucleotide polymorphism (SNP) data to identify genomic regions associated with Borrelia afzelii resistance or susceptibility in the natural rodent host of this Lyme disease‐causing spirochete, the bank vole (Myodes glareolus). Using this combined approach we identified four consensus SNPs located in exonic regions of the genes Slc26a4, Tns3, Wscd1 and Espnl, which were significantly associated with the voles’ Borrelia infection status within and across populations. Functional links between host responses to bacterial infections and most of these genes have previously been demonstrated in other rodent systems, making them promising new candidates for the study of evolutionary host responses to Borrelia emergence. Our approach is applicable to other systems and may facilitate the identification of genetic variants underlying disease resistance or susceptibility, as well as other ecologically relevant traits, in wildlife populations.
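The FST ingredient of such a scan can be sketched for one SNP, assuming a biallelic locus, equally weighted groups (for example, infected versus uninfected animals, or populations), and Nei's heterozygosity-based definition FST = (HT − HS)/HT. Estimators used in practice, such as Weir and Cockerham's, also correct for sample size, so this is illustrative only; the function name is hypothetical.

```python
def nei_fst(freqs):
    """FST = (HT - HS) / HT for one biallelic SNP.

    freqs: frequency of one allele in each group (equal weights assumed)."""
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_t = 2.0 * p_bar * (1.0 - p_bar)                  # heterozygosity of the pooled sample
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / k  # mean within-group heterozygosity
    if h_t == 0.0:
        return 0.0  # monomorphic SNP: no differentiation to measure
    return (h_t - h_s) / h_t
```

Identical frequencies give 0 and fixed differences give 1; SNPs in the upper tail of this distribution would be the candidates to cross-check against the GWAS signal and replicate across populations.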

19.
Summary Ye, Lin, and Taylor (2008, Biometrics 64, 1238–1246) proposed a joint model for longitudinal measurements and time‐to‐event data in which the longitudinal measurements are modeled with a semiparametric mixed model to allow for the complex patterns in longitudinal biomarker data. They proposed a two‐stage regression calibration approach that is simpler to implement than a joint modeling approach. In the first stage of their approach, the mixed model is fit without regard to the time‐to‐event data. In the second stage, the posterior expectation of an individual's random effects from the mixed model is included as a covariate in a Cox model. Although Ye et al. (2008) acknowledged that their regression calibration approach may cause a bias due to the problem of informative dropout and measurement error, they argued that the bias is small relative to alternative methods. In this article, we show that this bias may be substantial. We show how to alleviate much of this bias with an alternative regression calibration approach that can be applied for both discrete and continuous time‐to‐event data. Through simulations, the proposed approach is shown to have substantially less bias than the regression calibration approach proposed by Ye et al. (2008). As with the methodology proposed by Ye et al. (2008), an advantage of our proposed approach over joint modeling is that it can be implemented with standard statistical software and does not require complex estimation techniques.
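The first stage of regression calibration replaces each subject's random effects with their posterior expectation. For the simplest case, a random-intercept model y_ij = μ + b_i + e_ij with b_i ~ N(0, σ_b²) and e_ij ~ N(0, σ_e²), that expectation has the closed form n_i σ_b² / (n_i σ_b² + σ_e²) × (ȳ_i − μ). The sketch assumes μ, σ_b² and σ_e² are known (in practice they come from the mixed-model fit); the semiparametric model of Ye et al. (2008) is richer, and the function name is hypothetical.

```python
def posterior_random_intercepts(data, mu, var_b, var_e):
    """Posterior mean of each subject's random intercept b_i.

    data: subject id -> list of that subject's longitudinal measurements.
    mu, var_b, var_e: fixed intercept and variance components (assumed known)."""
    out = {}
    for subject, ys in data.items():
        n = len(ys)
        shrink = n * var_b / (n * var_b + var_e)  # in (0, 1); grows with n
        out[subject] = shrink * (sum(ys) / n - mu)
    return out
```

In a second stage these values would enter a Cox model as covariates. Subjects with fewer measurements, for instance those who drop out early, are shrunk more strongly toward zero.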

20.
DNA microarrays have been acknowledged as a promising approach for the detection of viral pathogens. However, the probes designed for current arrays can cover only some of the known viral variants, which can result in false-negative or ambiguous data. Covering all the variants would require more probes, leading to a much higher spot density and thus a higher cost for the arrays. Here we have developed a new strategy for oligonucleotide probe design. Using the type 1 human immunodeficiency virus (HIV-1) tat gene as an example, we designed the array probes and validated the optimized parameters in silico. The results show that the number of oligos is significantly reduced compared with existing methods, while specificity and hybridization efficiency remain intact. Adopting this method to reduce the number of oligos could increase the detection capacity of DNA microarrays and would significantly lower the cost of manufacturing array chips. These authors contributed equally to this work.
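Choosing as few probes as possible while still covering every variant can be phrased as a set-cover problem. A minimal sketch, assuming candidate probes are exact-match k-mers and that a probe "covers" a variant if it occurs in that variant's sequence, is the classic greedy heuristic below. It ignores the hybridization-efficiency and specificity parameters the article optimizes, is not the authors' strategy, and uses hypothetical names and toy sequences.

```python
def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_probe_set(variants, k):
    """Greedily pick k-mer probes until every variant contains a chosen probe."""
    coverage = {}  # probe -> indices of variants containing it
    for idx, seq in enumerate(variants):
        for kmer in kmers(seq, k):
            coverage.setdefault(kmer, set()).add(idx)
    uncovered = set(range(len(variants)))
    chosen = []
    while uncovered:
        # sorted() makes ties deterministic: the alphabetically first probe wins.
        best = max(sorted(coverage), key=lambda p: len(coverage[p] & uncovered), default=None)
        gained = coverage[best] & uncovered if best is not None else set()
        if not gained:
            raise ValueError("some variant is shorter than k and cannot be covered")
        chosen.append(best)
        uncovered -= gained
    return chosen
```

A probe shared by several variants is picked first, which is where the saving comes from; for example, with variants "ACGTAC", "TTACGT" and "GGGGCC" and k = 4, the shared probe "ACGT" covers the first two variants, so two probes suffice for three variants.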


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP No. 09084417