Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Guan Y, Yan J, Sinha R. Biometrics, 2011, 67(3): 711-718
This article is concerned with variance estimation for statistics that are computed from single recurrent event processes. Such statistics are important in diagnosis for each individual recurrent event process. The proposed method only assumes a semiparametric form for the first-order structure of the processes but not for the second-order (i.e., dependence) structure. The new variance estimator is shown to be consistent for the target parameter under very mild conditions. The estimator can be used in many applications in semiparametric rate regression analysis of recurrent event data such as outlier detection, residual diagnosis, as well as robust regression. A simulation study and application to two real data examples are used to demonstrate the use of the proposed method.

2.
The spatial signature of microevolutionary processes structuring genetic variation may play an important role in the detection of loci under selection. However, the spatial location of samples has not yet been used to quantify this. Here, we present a new two‐step method of spatial outlier detection at the individual and deme levels using the power spectrum of Moran eigenvector maps (MEM). The MEM power spectrum quantifies how the variation in a variable, such as the frequency of an allele at a SNP locus, is distributed across a range of spatial scales defined by MEM spatial eigenvectors. The first step (Moran spectral outlier detection: MSOD) uses genetic and spatial information to identify outlier loci by their unusual power spectrum. The second step uses Moran spectral randomization (MSR) to test the association between outlier loci and environmental predictors, accounting for spatial autocorrelation. Using simulated data from two published papers, we tested this two‐step method in different scenarios of landscape configuration, selection strength, dispersal capacity and sampling design. Under scenarios that included spatial structure, MSOD alone was sufficient to detect outlier loci at the individual and deme levels without the need for incorporating environmental predictors. Follow‐up with MSR generally reduced (already low) false‐positive rates, though in some cases led to a reduction in power. The results were surprisingly robust to differences in sample size and sampling design. Our method represents a new tool for detecting potential loci under selection with individual‐based and population‐based sampling by leveraging spatial information that has hitherto been neglected.

3.
We study statistical methods to detect cancer genes that are over- or down-expressed in some but not all samples in a disease group. This has proven useful in cancer studies where oncogenes are activated only in a small subset of samples. We propose the outlier robust t-statistic (ORT), which is intuitively motivated from the t-statistic, the most commonly used differential gene expression detection method. Using real and simulation studies, we compare the ORT to the recently proposed cancer outlier profile analysis (Tomlins and others, 2005) and the outlier sum statistic of Tibshirani and Hastie (2006). The proposed method often has more detection power and smaller false discovery rates. Supplementary information can be found at http://www.biostat.umn.edu/~baolin/research/ort.html.

4.
We propose a new statistic for the detection of differentially expressed genes when the genes are activated only in a subset of the samples. Statistics designed for this unconventional circumstance have proved to be valuable for most cancer studies, where oncogenes are activated for a small number of disease samples. Previous efforts made in this direction include cancer outlier profile analysis (Tomlins and others, 2005), outlier sum (Tibshirani and Hastie, 2007), and outlier robust t-statistics (Wu, 2007). We propose a new statistic called maximum ordered subset t-statistics (MOST), which seems to be natural when the number of activated samples is unknown. We compare MOST to other statistics and find that the proposed method often has more power than its competitors.

5.
The discrete data structure and large sequencing depth of RNA sequencing (RNA-seq) experiments can often generate outlier read counts in one or more RNA samples within a homogeneous group. Thus, how to identify and manage outlier observations in RNA-seq data is an emerging topic of interest. One of the main objectives in these research efforts is to develop statistical methodology that effectively balances the impact of outlier observations and achieves maximal power for statistical testing. To reach that goal, strengthening the accuracy of outlier detection is an important precursor. Current outlier detection algorithms for RNA-seq data are executed within a testing framework and may be sensitive to sparse data and heavy-tailed distributions. Therefore, we propose a univariate algorithm that utilizes a probabilistic approach to measure the deviation between an observation and the distribution generating the remaining data and implement it within an iterative leave-one-out design strategy. Analyses of real and simulated RNA-seq data show that the proposed methodology has higher outlier detection rates for both non-normalized and normalized negative binomial distributed data.

6.
Outlier detection and environmental association analysis are common methods to search for loci or genomic regions exhibiting signals of adaptation to environmental factors. However, a validation of outlier loci and corresponding allele distribution models through functional molecular biology or transplant/common garden experiments is rarely carried out. Here, we employ another method for validation, namely testing outlier loci in specifically designed, independent data sets. Previously, an outlier locus associated with three different habitat types had been detected in Arabis alpina. For the independent validation data set, we sampled 30 populations occurring in these three habitat types across five biogeographic regions of the Swiss Alps. The allele distribution model found in the original study could not be validated in the independent test data set: The outlier locus was no longer indicative of habitat‐mediated selection. We propose several potential causes of this failure of validation, of which unaccounted genetic structure and technical issues in the original data set used to detect the outlier locus were most probable. Thus, our study shows that validating outlier loci and allele distribution models in independent data sets is a helpful tool in ecological genomics which, in the case of positive validation, adds confidence to outlier loci and their association with environmental factors or, in the case of failure of validation, helps to explain inconsistencies.

7.

Background

Identifying genes that are differentially expressed (DE) between two or more conditions with multiple patterns of expression is one of the primary objectives of gene expression data analysis. Several statistical approaches, including one-way analysis of variance (ANOVA), are used to identify DE genes. However, most of these methods provide misleading results for two or more conditions with multiple patterns of expression in the presence of outlying genes. In this paper, an attempt is made to develop a hybrid one-way ANOVA approach that unifies the robustness and efficiency of estimation using the minimum β-divergence method to overcome some problems that arise in the existing robust methods for both small- and large-sample cases with multiple patterns of expression.

Results

The proposed method relies on a β-weight function, which produces values between 0 and 1. The β-weight function with β = 0.2 is used as a measure of outlier detection. It assigns smaller weights (≥ 0) to outlying expressions and larger weights (≤ 1) to typical expressions. The distribution of the β-weights is used to calculate the cut-off point, which is compared to the observed β-weight of an expression to determine whether that gene expression is an outlier. This weight function plays a key role in unifying the robustness and efficiency of estimation in one-way ANOVA.

Conclusion

Analyses of simulated gene expression profiles revealed that all eight methods (ANOVA, SAM, LIMMA, EBarrays, eLNN, KW, robust BetaEB and proposed) perform almost identically for m = 2 conditions in the absence of outliers. However, the robust BetaEB method and the proposed method exhibited considerably better performance than the other six methods in the presence of outliers. In this case, the BetaEB method exhibited slightly better performance than the proposed method for the small-sample cases, but the proposed method exhibited much better performance than the BetaEB method for both the small- and large-sample cases in the presence of more than 50% outlying genes. The proposed method also exhibited better performance than the other methods for m > 2 conditions with multiple patterns of expression, where the BetaEB was not extended for this condition. Therefore, the proposed approach would be more suitable and reliable on average for the identification of DE genes between two or more conditions with multiple patterns of expression.
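
The β-weight idea described in the Results of item 7 can be illustrated with a minimal sketch. The abstract does not give the formula, so the Gaussian-type form exp(−β/2 · z²), the function names, the fixed location and scale values, and the cut-off of 0.5 are all assumptions made for illustration; this is not the authors' implementation, which derives the cut-off from the distribution of the weights.

```python
import math

def beta_weight(x, mu, sigma, beta=0.2):
    """Assumed Gaussian-type beta-weight: 1 at the centre, approaching 0 far away."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * beta * z * z)

def flag_by_weight(values, mu, sigma, cutoff, beta=0.2):
    """Return indices whose weight falls below a user-supplied (hypothetical) cut-off."""
    return [i for i, x in enumerate(values)
            if beta_weight(x, mu, sigma, beta) < cutoff]

# Hypothetical expression values; the last one sits far from the bulk.
expressions = [5.0, 5.2, 4.9, 5.1, 9.0]
flagged = flag_by_weight(expressions, mu=5.0, sigma=1.0, cutoff=0.5)
```

With these made-up numbers only the last value receives a weight below the cut-off; typical values keep weights close to 1, which matches the qualitative behaviour the abstract describes.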

8.
High voltage electron microscopy of intact cells prepared by the critical point drying (CPD) procedure has become an important tool in the study of three-dimensional relationships between cytoplasmic organelles. It has been claimed that critical point-dried specimens reveal a structure that is not visible in sections of plastic-embedded material; it has also been claimed that this structure, in association with known cytoplasmic filaments, forms a meshwork of tapering threads ("microtrabecular lattice"). Alternatively, this structure might be a surface tension artifact produced during CPD. To test possible sources of artifacts during CPD, model fiber systems of known structure were used. It was found that traces of water or ethanol in the CO2 caused distortions and fusion of fibers in pure muscle actin, fibrin, collagen, chromatin, and microtubules that produce a structure very similar to the proposed "microtrabecular lattice." These structures were, however, well preserved if water and ethanol were totally excluded from the CO2. The same results were obtained with whole mounts of cultured cells. A "microtrabecular lattice" was obtained if some water or ethanol was present in the pressure chamber. On the other hand, when water or ethanol were totally excluded from the CO2 during CPD, cytoplasmic filaments were uniform in thickness similar to their appearance in sections of plastic-embedded cells. It is concluded that the "microtrabecular lattice" is a distorted image of the cytoplasmic filament network produced during CPD by traces of water or ethanol in the CO2.

9.
Two important and not yet solved problems in bacterial genome research are the identification of horizontally transferred genes and the prediction of gene expression levels. Both problems can be addressed by multivariate analysis of codon usage data. In particular, dimensionality reduction methods for visualization of multivariate data have been shown to be effective tools for codon usage analysis. We here propose a multidimensional scaling approach using a novel similarity measure for codon usage tables. Our probabilistic similarity measure is based on P-values derived from the well-known chi-square test for comparison of two distributions. Experimental results on four microbial genomes indicate that the new method is well-suited for the analysis of horizontal gene transfer and translational selection. As compared with the widely-used correspondence analysis, our method did not suffer from outlier sensitivity and showed a better clustering of putative alien genes in most cases.

10.
Due to the high sensitivity of diffusion tensor imaging (DTI) to physiological motion, clinical DTI scans often suffer from a significant number of artifacts. Tensor-fitting-based, post-processing outlier rejection is often used to reduce the influence of motion artifacts. Although it is an effective approach, when there are multiple corrupted data, this method may no longer correctly identify and reject the corrupted data. In this paper, we introduce a new criterion called “corrected Inter-Slice Intensity Discontinuity” (cISID) to detect motion-induced artifacts. We compared the performance of algorithms using cISID and other existing methods with regard to artifact detection. The experimental results show that the integration of cISID into fitting-based methods significantly improves the retrospective detection performance at post-processing analysis. The performance of the cISID criterion, if used alone, was inferior to the fitting-based methods, but cISID could effectively identify severely corrupted images with a rapid calculation time. In the second part of this paper, an outlier rejection scheme was implemented on a scanner for real-time monitoring of image quality and reacquisition of the corrupted data. The real-time monitoring, based on cISID and followed by post-processing, fitting-based outlier rejection, could provide a robust environment for routine DTI studies.

11.
In this work, we study whether calcium phosphate deposition (CPD) during vascular calcification is a passive or a cell-mediated mechanism. Passive CPD was studied in fixed vascular smooth muscle cells (VSMC), which calcify faster than live cells in the presence of 1.8 mM Ca²⁺ and 2 mM Pi. CPD seems to be a cell-independent process that depends on the concentration of calcium, phosphate, and hydroxyl ions, but not on Ca × Pi concentration products, given that deposition is obtained with 2 × 2 and 4 × 1 Ca × Pi mM² but not with 2 × 1 or 1 × 4 Ca × Pi mM². Incubation with 4 mM Pi without CPD (i.e., plus 1 mM Ca) does not induce osteogene expression. Increased expression of bone markers such as Bmp2 and Cbfa1 is only observed concomitantly with CPD. Hydroxyapatite is the only crystalline phase in both lysed and live cells. Lysed cell deposits are highly crystalline, whereas live cell deposits still contain large amounts of amorphous calcium. High-resolution transmission electron microscopy revealed a nanostructure of rounded crystallites of 5-10 nm oriented at random in lysed cells, which is compatible with spontaneous precipitation. The nanostructure in live cells consisted of long fiber crystals, 10-nm thick, embedded in an amorphous matrix. This structure indicates an active role of cells in the process of hydroxyapatite crystallization. In conclusion, our data suggest that CPD is a passive phenomenon, which triggers the osteogenic changes that are involved in the formation of a well organized, calcified crystalline structure.
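
The argument in item 11 that the Ca × Pi concentration product does not by itself determine deposition can be checked with a few lines of arithmetic. The four conditions and their outcomes are taken from the abstract; the variable and function names are illustrative only.

```python
# (Ca in mM, Pi in mM, deposition observed), as reported in the abstract
conditions = [
    (2, 2, True),
    (4, 1, True),
    (2, 1, False),
    (1, 4, False),
]

def outcomes_by_product(rows):
    """Group the observed outcomes by the Ca x Pi concentration product (mM^2)."""
    grouped = {}
    for ca, pi, deposited in rows:
        grouped.setdefault(ca * pi, set()).add(deposited)
    return grouped

by_product = outcomes_by_product(conditions)
# A product that maps to both True and False cannot, on its own, predict deposition.
ambiguous = sorted(p for p, seen in by_product.items() if len(seen) > 1)
```

The product 4 mM² appears with and without deposition (4 × 1 versus 1 × 4), which is the observation the abstract uses to rule out the product as the controlling quantity.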

12.
Vibrio cholerae secretes a large virulence-associated multifunctional autoprocessing RTX toxin (MARTX(Vc)). Autoprocessing of this toxin by an embedded cysteine protease domain (CPD) is essential for this toxin to induce actin depolymerization in a broad range of cell types. A homologous CPD is also present in the large clostridial toxin TcdB and recent studies showed that inositol hexakisphosphate (Ins(1,2,3,4,5,6)P6 or InsP6) stimulated the autoprocessing of TcdB dependent upon the CPD (Egerer, M., Giesemann, T., Jank, T., Satchell, K. J., and Aktories, K. (2007) J. Biol. Chem. 282, 25314-25321). In this work, the autoprocessing activity of the CPD within MARTX(Vc) is similarly found to be inducible by InsP6. The CPD is shown to bind InsP6 (Kd, 0.6 μM), and InsP6 is shown to stimulate intramolecular autoprocessing at both physiological concentrations and as low as 0.01 μM. Processed CPD did not bind InsP6, indicating that, subsequent to cleavage, the activated CPD may shift to an inactive conformation. To further pursue the mechanism of autoprocessing, conserved residues among 24 identified CPDs were mutagenized. In addition to cysteine and histidine residues that form the catalytic site, 2 lysine residues essential for InsP6 binding and 5 lysine and arginine residues resulting in loss of activity at low InsP6 concentrations were identified. Overall, our data support a model in which basic residues located across the CPD structure form an InsP6 binding pocket and that the binding of InsP6 stimulates processing by altering the CPD to an activated conformation. After processing, InsP6 is shown to be recycled, while the cleaved CPD becomes incapable of further binding of InsP6.

13.
Clegg LX, Cai J, Sen PK. Biometrics, 1999, 55(3): 805-812
In multivariate failure time data analysis, a marginal regression modeling approach is often preferred to avoid assumptions on the dependence structure among correlated failure times. In this paper, a marginal mixed baseline hazards model is introduced. Estimating equations are proposed for the estimation of the marginal hazard ratio parameters. The proposed estimators are shown to be consistent and asymptotically Gaussian with a robust covariance matrix that can be consistently estimated. Simulation studies indicate the adequacy of the proposed methodology for practical sample sizes. The methodology is illustrated with a data set from the Framingham Heart Study.

14.
We present a probabilistic registration algorithm that robustly solves the problem of rigid-body alignment between two shapes with high accuracy, by aptly modeling measurement noise in each shape, whether isotropic or anisotropic. For point-cloud shapes, the probabilistic framework additionally enables modeling locally-linear surface regions in the vicinity of each point to further improve registration accuracy. The proposed Iterative Most-Likely Point (IMLP) algorithm is formed as a variant of the popular Iterative Closest Point (ICP) algorithm, which iterates between point-correspondence and point-registration steps. IMLP’s probabilistic framework is used to incorporate a generalized noise model into both the correspondence and the registration phases of the algorithm, hence its name as a most-likely point method rather than a closest-point method. To efficiently compute the most-likely correspondences, we devise a novel search strategy based on a principal direction (PD)-tree search. We also propose a new approach to solve the generalized total-least-squares (GTLS) sub-problem of the registration phase, wherein the point correspondences are registered under a generalized noise model. Our GTLS approach has improved accuracy, efficiency, and stability compared to prior methods presented for this problem and offers a straightforward implementation using standard least squares. We evaluate the performance of IMLP relative to a large number of prior algorithms including ICP, a robust variant on ICP, Generalized ICP (GICP), and Coherent Point Drift (CPD), as well as drawing close comparison with the prior anisotropic registration methods of GTLS-ICP and A-ICP. The performance of IMLP is shown to be superior with respect to these algorithms over a wide range of noise conditions, outliers, and misalignments using both mesh and point-cloud representations of various shapes.
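
The two-step loop that item 14 attributes to ICP (a point-correspondence step followed by a point-registration step) can be sketched in 2D with the standard library alone. This is plain closest-point ICP with a closed-form least-squares rigid fit, not IMLP: it has no noise model, no PD-tree search, and no GTLS solver. All names and the test shape are hypothetical.

```python
import math

def nearest(point, targets):
    """Correspondence step: closest target point by Euclidean distance."""
    return min(targets, key=lambda t: math.dist(point, t))

def fit_rigid_2d(src, dst):
    """Registration step: least-squares rotation angle and translation for paired points."""
    n = len(src)
    scx, scy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dcx, dcy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    cross = dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - scx, py - scy, qx - dcx, qy - dcy
        cross += px * qy - py * qx
        dot += px * qx + py * qy
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    return theta, dcx - (c * scx - s * scy), dcy - (s * scx + c * scy)

def apply_rigid(points, theta, tx, ty):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_2d(source, target, iterations=20):
    """Alternate correspondence and registration for a fixed number of iterations."""
    current = list(source)
    for _ in range(iterations):
        matches = [nearest(p, target) for p in current]
        theta, tx, ty = fit_rigid_2d(current, matches)
        current = apply_rigid(current, theta, tx, ty)
    return current

# Hypothetical asymmetric shape, slightly rotated and shifted to create the source.
target = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 6.0), (-2.0, 2.0)]
source = apply_rigid(target, math.radians(-5), -0.3, 0.2)
aligned = icp_2d(source, target)
```

The example works because the initial misalignment is small relative to the spacing of the points, so the closest-point matches are already correct; larger misalignments, noise, or outliers are the conditions under which the abstract reports that variants such as IMLP are needed.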

15.
The surface water temperature is a vital ecological and climate variable, and its monitoring is critical. An extensive sensor network measures the ocean, but outliers pervade the monitoring data due to the sudden change in the water surface level. No single algorithm can identify the outlier efficiently. Hence, this work aims to propose and evaluate the performance of three statistical-based outlier detection algorithms for the water surface temperature: 1) the Standard Z-Score method, 2) the Modified Z-Score coupled with decomposition, and 3) the Exponential Moving Average with the Coupled Modified Z-Score and decomposition. A threshold was set to flag the outlier values. The models' performance was evaluated using the F-score method. Results showed that an increase in outlier detection might reduce the precision of identifying the actual outlier. Based on the results, the Exponential Moving Average with the Modified Z-Score gave the highest F-score value (= 0.83) compared to the other two individual methods. Therefore, this proposed algorithm is recommended to detect outliers efficiently in large surface water temperature datasets.
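
A minimal sketch of the first two scoring rules named in item 15, plus the F-score used to evaluate them, might look as follows. The abstract gives no formulas or thresholds, so the conventional definitions are assumed here: the 0.6745 scaling constant for the modified Z-score and the thresholds 3.0 and 3.5 are common textbook choices, not values from the paper. The decomposition and exponential-moving-average stages are omitted, and the temperature readings are made up.

```python
import statistics

def z_scores(values):
    """Standard Z-score: distance from the mean in population standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def modified_z_scores(values):
    """Modified Z-score: distance from the median scaled by the MAD (assumed constant 0.6745)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [0.6745 * (v - med) / mad for v in values]

def flag_outliers(scores, threshold):
    """Indices whose absolute score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if abs(s) > threshold]

def f_score(flagged, true_outliers):
    """Harmonic mean of precision and recall (F1)."""
    tp = len(set(flagged) & set(true_outliers))
    if tp == 0:
        return 0.0
    precision = tp / len(flagged)
    recall = tp / len(true_outliers)
    return 2 * precision * recall / (precision + recall)

# Hypothetical readings in degrees Celsius; index 4 is the injected outlier.
temps = [20.1, 20.3, 19.9, 20.2, 35.0, 20.0, 20.4]
by_z = flag_outliers(z_scores(temps), 3.0)
by_modified_z = flag_outliers(modified_z_scores(temps), 3.5)
```

In this small made-up series the single extreme reading inflates the mean and standard deviation enough that the standard Z-score stays below 3, while the median-based score flags it; this illustrates one reason a robust variant may be preferred, without reproducing the paper's results.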

16.
Small area estimation with M‐quantile models was proposed by Chambers and Tzavidis (2006). The key target of this approach to small area estimation is to obtain reliable and outlier robust estimates avoiding at the same time the need for strong parametric assumptions. This approach, however, does not allow for the use of unit level survey weights, making questionable the design consistency of the estimators unless the sampling design is self‐weighting within small areas. In this paper, we adopt a model‐assisted approach and construct design consistent small area estimators that are based on the M‐quantile small area model. Analytic and bootstrap estimators of the design‐based variance are discussed. The proposed estimators are empirically evaluated in the presence of complex sampling designs.

17.
Phenotypic divergences between modern human populations have developed as a result of genetic adaptation to local environments over the past 100,000 years. To identify genes involved in population-specific phenotypes, it is necessary to detect signatures of recent positive selection in the human genome. Although detection of elongated linkage disequilibrium (LD) has been a powerful tool in the field of evolutionary genetics, current LD-based approaches are not applicable to already fixed loci. Here, we report a method of scanning for population-specific strong selective sweeps that have reached fixation. In this method, genome-wide SNP data is used to analyze differences in the haplotype frequency, nucleotide diversity, and LD between populations, using the ratio of haplotype homozygosity between populations. To estimate the detection power of the statistics used in this study, we performed computer simulations and found that these tests are relatively robust against the density of typed SNPs and demographic parameters if the advantageous allele has reached fixation. Therefore, we could determine the threshold for maintaining high detection power, regardless of SNP density and demographic history. When this method was applied to the HapMap data, it was able to identify the candidates of population-specific strong selective sweeps more efficiently than the outlier approach that depends on the empirical distribution. This study, confirming strong positive selection on genes previously reported to be associated with specific phenotypes, also identifies other candidates that are likely to contribute to phenotypic differences between human populations.
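
The ratio of haplotype homozygosity between populations mentioned in item 17 can be sketched under a simple assumption: homozygosity within a window is taken as the sum of squared haplotype frequencies. The abstract does not define the statistic, so this definition, the function names, and the toy haplotypes are illustrative assumptions rather than the authors' method.

```python
from collections import Counter

def haplotype_homozygosity(haplotypes):
    """Sum of squared haplotype frequencies within one population sample."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())

def homozygosity_ratio(pop_a, pop_b):
    """Ratio of haplotype homozygosity in population A to that in population B."""
    return haplotype_homozygosity(pop_a) / haplotype_homozygosity(pop_b)

# Toy window: population A carries a single fixed haplotype, population B is diverse.
pop_a = ["ACGT", "ACGT", "ACGT", "ACGT"]
pop_b = ["ACGT", "ACGA", "TCGT", "TCGA"]
ratio = homozygosity_ratio(pop_a, pop_b)
```

A ratio well above 1 in a window indicates that one population has much less haplotype diversity there than the other, which is the kind of contrast a completed, population-specific sweep would be expected to leave.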

18.
The cyclobutane pyrimidine dimer (CPD) is a major type of DNA damage induced by ultraviolet B (UVB) radiation. CPD photolyase, which absorbs blue/UVA light as an energy source to monomerize dimers, is a crucial factor for determining the sensitivity of rice (Oryza sativa) to UVB radiation. Here, we purified native class II CPD photolyase from rice leaves. As the final purification step, CPD photolyase was bound to CPD-containing DNA conjugated to magnetic beads and then released by blue-light irradiation. The final purified fraction contained 54- and 56-kD proteins, whereas rice CPD photolyase expressed from Escherichia coli was a single 55-kD protein. Western-blot analysis using anti-rice CPD photolyase antiserum suggested that both the 54- and 56-kD proteins were the CPD photolyase. Treatment with protein phosphatase revealed that the 56-kD native rice CPD photolyase was phosphorylated, whereas the E. coli-expressed rice CPD photolyase was not. The purified native rice CPD photolyase also had significantly higher CPD photorepair activity than the E. coli-expressed CPD photolyase. According to the absorption, emission, and excitation spectra, the purified native rice CPD photolyase possesses both a pterin-like chromophore and an FAD chromophore. The binding activity of the native rice CPD photolyase to thymine dimers was higher than that of the E. coli-expressed CPD photolyase. These results suggest that the structure of the native rice CPD photolyase differs significantly from that of the E. coli-expressed rice CPD photolyase, and the structural modification of the native CPD photolyase leads to higher activity in rice.

19.
UV exposure of DNA molecules induces serious DNA lesions. The cyclobutane pyrimidine dimer (CPD) photolyase repairs CPD-type lesions by using the energy of visible light. Two chromophores for different roles have been found in this enzyme family; one catalyzes the CPD repair reaction and the other works as an antenna pigment that harvests photon energy. The catalytic cofactor of all known photolyases is FAD, whereas several light-harvesting cofactors are found. Currently, 5,10-methenyltetrahydrofolate (MTHF), 8-hydroxy-5-deaza-riboflavin (8-HDF) and FMN are the known light-harvesting cofactors, and some photolyases lack the chromophore. Three crystal structures of photolyases from Escherichia coli (Ec-photolyase), Anacystis nidulans (An-photolyase), and Thermus thermophilus (Tt-photolyase) have been determined; however, no archaeal photolyase structure is available. A similarity search of archaeal genomic data indicated the presence of a homologous gene, ST0889, on Sulfolobus tokodaii strain 7. An enzymatic assay reveals that ST0889 encodes photolyase from S. tokodaii (St-photolyase). We have determined the crystal structure of the St-photolyase protein to confirm its structural features and to investigate the mechanism of the archaeal DNA repair system with light energy. The crystal structure of the St-photolyase is superimposed very well on the three known photolyases including the catalytic cofactor FAD. Surprisingly, another FAD molecule is found at the position of the light-harvesting cofactor. This second FAD molecule is well accommodated in the crystal structure, suggesting that FAD works as a novel light-harvesting cofactor of photolyase. In addition, two of the four CPD recognition residues in the crystal structure of An-photolyase are not found in St-photolyase, which might utilize a different mechanism to recognize the CPD from that of An-photolyase.

20.
Ordered categorical data can be analysed using correspondence analysis with the ordered categories taken into consideration. Such an analysis was proposed by Beh (1997) and uses orthogonal polynomials which require the input of a scoring scheme to reflect the ordered structure of the categories. This method of correspondence analysis visualises the relationship between the categories, in terms of the location, dispersion and higher order components. The impact of the scoring method on the orthogonal polynomials, and hence upon the correspondence plot and other output of the analysis should therefore be considered. This paper aims at identifying this impact by considering four scoring schemes: integer valued (natural) scores, midrank scores, Nishisato scores and singular vectors from the classical correspondence analysis of the data. It is shown that while the latter two maximise the location component, generally there is little difference when comparing them with the output of the former two scoring schemes. A simple comparative study of profile co-ordinates using different scoring schemes is also discussed.
