1.

Investigations into resting-state connectivity using independent component analysis

Beckmann CF DeLuca M Devlin JT Smith SM 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2005,360(1457):1001-1013

Inferring resting-state connectivity patterns from functional magnetic resonance imaging (fMRI) data is a challenging task for any analytical technique. In this paper, we review a probabilistic independent component analysis (PICA) approach, optimized for the analysis of fMRI data, and discuss the role which this exploratory technique can take in scientific investigations into the structure of these effects. We apply PICA to fMRI data acquired at rest, in order to characterize the spatio-temporal structure of such data, and demonstrate that this is an effective and robust tool for the identification of low-frequency resting-state patterns from data acquired at various different spatial and temporal resolutions. We show that these networks exhibit high spatial consistency across subjects and closely resemble discrete cortical functional networks such as visual cortical areas or sensory-motor cortex.

2.

An integrated-likelihood method for estimating genetic differentiation between populations

Kitakado T Kitada S Kishino H Skaug HJ 《Genetics》2006,173(4):2073-2082

The aim of this article is to develop an integrated-likelihood (IL) approach to estimate the genetic differentiation between populations. The conventional maximum-likelihood (ML) and pseudolikelihood (PL) methods that use sample counts of alleles may cause severe underestimations of FST, which means overestimations of theta=4Nm, when the number of sampling localities is small. To reduce such bias in the estimation of genetic differentiation, we propose an IL method in which the mean allele frequencies over populations are regarded as nuisance parameters and are eliminated by integration. To maximize the IL function, we have developed two algorithms, a Monte Carlo EM algorithm and a Laplace approximation. Our simulation studies show that the method proposed here outperforms the conventional ML and PL methods in terms of unbiasedness and precision. The IL method was applied to real data for Pacific herring and African elephants.
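
The link between the two biases mentioned in this abstract can be illustrated with a short sketch. It assumes the standard island-model approximation FST ≈ 1/(1 + theta) with theta = 4Nm, which the abstract itself does not spell out; the function names are hypothetical and are not taken from the paper.

```python
def fst_from_theta(theta):
    """Island-model approximation (assumed here): FST = 1 / (1 + theta)."""
    return 1.0 / (1.0 + theta)


def theta_from_fst(fst):
    """Inverse of the approximation above: theta = 1 / FST - 1."""
    return 1.0 / fst - 1.0


# Because theta decreases as FST increases, an estimate of FST that is
# too small translates into an estimate of theta = 4Nm that is too large.
true_theta = theta_from_fst(0.20)      # illustrative "true" FST
biased_theta = theta_from_fst(0.10)    # illustrative underestimated FST
print(true_theta, biased_theta)
```

Under this assumed relation, halving the FST estimate from 0.20 to 0.10 more than doubles the implied theta, which is why a downward bias in FST matters for inferring migration.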

3.

Efficient approximations for learning phylogenetic HMM models from data

Jojic V Jojic N Meek C Geiger D Siepel A Haussler D Heckerman D 《Bioinformatics (Oxford, England)》2004,20(Z1):i161-i168

MOTIVATION: We consider models useful for learning an evolutionary or phylogenetic tree from data consisting of DNA sequences corresponding to the leaves of the tree. In particular, we consider a general probabilistic model described in Siepel and Haussler that we call the phylogenetic-HMM model, which generalizes the classical probabilistic models of Neyman and Felsenstein. Unfortunately, computing the likelihood of phylogenetic-HMM models is intractable. We consider several approximations for computing the likelihood of such models, including an approximation introduced in Siepel and Haussler, loopy belief propagation and several variational methods. RESULTS: We demonstrate that, unlike the other approximations, variational methods are accurate and are guaranteed to lower bound the likelihood. In addition, we identify a particular variational approximation to be best: one in which the posterior distribution is variationally approximated using the classic Neyman-Felsenstein model. The application of our best approximation to data from the cystic fibrosis transmembrane conductance regulator gene region across nine eutherian mammals reveals a CpG effect.

4.

Robust Data Driven Model Order Estimation for Independent Component Analysis of fMRI Data with Low Contrast to Noise

Waqas Majeed Malcolm J. Avison 《PloS one》2014,9(4)

Independent component analysis (ICA) has been successfully utilized for analysis of functional MRI (fMRI) data for task related as well as resting state studies. Although it holds the promise of becoming an unbiased data-driven analysis technique, a few choices have to be made prior to performing ICA, selection of a method for determining the number of independent components (nIC) being one of them. Choice of nIC has been shown to influence the ICA maps, and various approaches (mostly relying on information theoretic criteria) have been proposed and implemented in commonly used ICA analysis packages, such as MELODIC and GIFT. However, there has been no consensus on the optimal method for nIC selection, and many studies utilize arbitrarily chosen values for nIC. Accurate and reliable determination of true nIC is especially important in the setting where the signals of interest contribute only a small fraction of the total variance, i.e. very low contrast-to-noise ratio (CNR), and/or very focal response. In this study, we evaluate the performance of different model order selection criteria and demonstrate that the model order selected based upon bootstrap stability of principal components yields more reliable and accurate estimates of model order. We then demonstrate the utility of this fully data-driven approach to detect weak and focal stimulus-driven responses in real data. Finally, we compare the performance of different multi-run ICA approaches using pseudo-real data.

5.

Evolutionary Triplet Models of Structured RNA

Robert K. Bradley Ian Holmes 《PLoS computational biology》2009,5(8)

The reconstruction and synthesis of ancestral RNAs is a feasible goal for paleogenetics. This will require new bioinformatics methods, including a robust statistical framework for reconstructing histories of substitutions, indels and structural changes. We describe a “transducer composition” algorithm for extending pairwise probabilistic models of RNA structural evolution to models of multiple sequences related by a phylogenetic tree. This algorithm draws on formal models of computational linguistics as well as the 1985 protosequence algorithm of David Sankoff. The output of the composition algorithm is a multiple-sequence stochastic context-free grammar. We describe dynamic programming algorithms, which are robust to null cycles and empty bifurcations, for parsing this grammar. Example applications include structural alignment of non-coding RNAs, propagation of structural information from an experimentally-characterized sequence to its homologs, and inference of the ancestral structure of a set of diverged RNAs. We implemented the above algorithms for a simple model of pairwise RNA structural evolution; in particular, the algorithms for maximum likelihood (ML) alignment of three known RNA structures and a known phylogeny and inference of the common ancestral structure. We compared this ML algorithm to a variety of related, but simpler, techniques, including ML alignment algorithms for simpler models that omitted various aspects of the full model and also a posterior-decoding alignment algorithm for one of the simpler models. In our tests, incorporation of basepair structure was the most important factor for accurate alignment inference; appropriate use of posterior-decoding was next; and fine details of the model were least important. Posterior-decoding heuristics can be substantially faster than exact phylogenetic inference, so this motivates the use of sum-over-pairs heuristics where possible (and approximate sum-over-pairs). For more exact probabilistic inference, we discuss the use of transducer composition for ML (or MCMC) inference on phylogenies, including possible ways to make the core operations tractable.

6.

Reproducibility assessment of independent component analysis of expression ratios from DNA microarrays

Kreil DP MacKay DJ 《Comparative and Functional Genomics》2003,4(3):300-317

7.

Local linear independent component analysis based on clustering

Karhunen J Mălăroiu S Ilmoniemi M 《International journal of neural systems》2000,10(6):439-451

8.

Bivariate Mixed Effects Analysis of Clustered Data with Large Cluster Sizes

Daowen Zhang Jie Lena Sun Karen Pieper 《Statistics in biosciences》2016,8(2):220-233

Linear mixed effects models are widely used to analyze a clustered response variable. Motivated by a recent study to examine and compare the hospital length of stay (LOS) between patients undertaking percutaneous coronary intervention (PCI) and coronary artery bypass graft (CABG) from several international clinical trials, we proposed a bivariate linear mixed effects model for the joint modeling of clustered PCI and CABG LOSs where each clinical trial is considered a cluster. Due to the large number of patients in some trials, commonly used commercial statistical software for fitting (bivariate) linear mixed models failed to run since it could not allocate enough memory to invert large dimensional matrices during the optimization process. We consider ways to circumvent the computational problem in the maximum likelihood (ML) inference and restricted maximum likelihood (REML) inference. In particular, we developed an expectation-maximization (EM) algorithm for the REML inference and presented an ML implementation using existing software. The new REML EM algorithm is easy to implement and computationally stable and efficient. With this REML EM algorithm, we could analyze the LOS data and obtain meaningful results.

9.

An approximate maximum likelihood approach, applied to phylogenetic trees.

Henrik Jönsson Bo Söderberg 《Journal of computational biology》2003,10(5):737-749

A novel type of approximation scheme to the maximum likelihood (ML) approach is presented and discussed in the context of phylogenetic tree reconstruction from aligned DNA sequences. It is based on a parameterized approximation to the conditional distribution of hidden variables (related, e.g., to the sequences of unobserved branch point ancestors) given the observed data. A modified likelihood, based on the extended data, is then maximized with respect to the parameters of the model as well as to those involved in the approximation. With a suitable form of the approximation, the proposed method allows for simpler updating of the parameters, at the cost of an increased parameter count and a slight decrease in performance. The method is tested on phylogenetic tree reconstruction from artificially generated sequences, and its performance is compared to that of ML, showing that the approach is competitive for reasonably similar sequences. The method is also applied to real DNA sequences from primates, yielding a result consistent with those obtained by other standard algorithms.

10.

A Hybrid EM and Monte Carlo EM Algorithm and Its Application to Analysis of Transmission of Infectious Diseases

Yang Yang Ira M. Longini Jr. M. Elizabeth Halloran Valerie Obenchain 《Biometrics》2012,68(4):1238-1249

Summary: In epidemics of infectious diseases such as influenza, an individual may have one of four possible final states: prior immune, escaped from infection, infected with symptoms, and infected asymptomatically. The exact state is often not observed. In addition, the unobserved transmission times of asymptomatic infections further complicate analysis. Under the assumption of missing at random, data-augmentation techniques can be used to integrate out such uncertainties. We adapt an importance-sampling-based Monte Carlo Expectation-Maximization (MCEM) algorithm to the setting of an infectious disease transmitted in close contact groups. Assuming the independence between close contact groups, we propose a hybrid EM-MCEM algorithm that applies the MCEM or the traditional EM algorithms to each close contact group depending on the dimension of missing data in that group, and discuss the variance estimation for this practice. In addition, we propose a bootstrap approach to assess the total Monte Carlo error and factor that error into the variance estimation. The proposed methods are evaluated using simulation studies. We use the hybrid EM-MCEM algorithm to analyze two influenza epidemics in the late 1970s to assess the effects of age and preseason antibody levels on the transmissibility and pathogenicity of the viruses.

11.

Learning Rates and States from Biophysical Time Series: A Bayesian Approach to Model Selection and Single-Molecule FRET Data

Jonathan E. Bronson Jake M. Hofman Chris H. Wiggins 《Biophysical journal》2009,97(12):3196-3205

Time series data provided by single-molecule Förster resonance energy transfer (smFRET) experiments offer the opportunity to infer not only model parameters describing molecular complexes, e.g., rate constants, but also information about the model itself, e.g., the number of conformational states. Resolving whether such states exist or how many of them exist requires a careful approach to the problem of model selection, here meaning discrimination among models with differing numbers of states. The most straightforward approach to model selection generalizes the common idea of maximum likelihood (selecting the most likely parameter values) to maximum evidence: selecting the most likely model. In either case, such an inference presents a tremendous computational challenge, which we here address by exploiting an approximation technique termed variational Bayesian expectation maximization. We demonstrate how this technique can be applied to temporal data such as smFRET time series; show superior statistical consistency relative to the maximum likelihood approach; compare its performance on smFRET data generated from experiments on the ribosome; and illustrate how model selection in such probabilistic or generative modeling can facilitate analysis of closely related temporal data currently prevalent in biophysics. Source code used in this analysis, including a graphical user interface, is available open source via http://vbFRET.sourceforge.net.

12.

Use of runs statistics for pattern recognition in genomic DNA sequences.

Leo Wang-Kit Cheung 《Journal of computational biology》2004,11(1):107-124

In this article, the use of the finite Markov chain imbedding (FMCI) technique to study patterns in DNA under a hidden Markov model (HMM) is introduced. With a vision of studying multiple runs-related statistics simultaneously under an HMM through the FMCI technique, this work establishes an investigation of a bivariate runs statistic under a binary HMM for DNA pattern recognition. An FMCI-based recursive algorithm is derived and implemented for the determination of the exact distribution of this bivariate runs statistic under an independent identically distributed (IID) framework, a Markov chain (MC) framework, and a binary HMM framework. With this algorithm, we have studied the distributions of the bivariate runs statistic under different binary HMM parameter sets; probabilistic profiles of runs are created and shown to be useful for trapping HMM maximum likelihood estimates (MLEs). This MLE-trapping scheme offers good initial estimates to jump-start the expectation-maximization (EM) algorithm in HMM parameter estimation and helps prevent the EM estimates from landing on a local maximum or a saddle point. Applications of the bivariate runs statistic and the probabilistic profiles in conjunction with binary HMMs for pattern recognition in genomic DNA sequences are illustrated via case studies on DNA bendability signals using human DNA data.
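
The idea behind finite Markov chain imbedding can be shown on a much simpler problem than the one in this abstract. The sketch below is a deliberate simplification and not the paper's algorithm: it handles a single (univariate) runs statistic, the longest success run, under the IID framework only, and the function name is hypothetical. The chain's state is the length of the current trailing run, with one absorbing state reached as soon as a run of length k appears.

```python
def prob_longest_run_at_least(n, k, p):
    """Exact P(longest success run >= k) in n IID Bernoulli(p) trials.

    state[j] is the probability that the trailing success run has length j
    (j < k) and that no run of length k has been seen so far; `absorbed`
    accumulates the probability of having already seen such a run.
    """
    state = [1.0] + [0.0] * (k - 1)
    absorbed = 0.0
    for _ in range(n):
        new_state = [0.0] * k
        for run_length, mass in enumerate(state):
            new_state[0] += mass * (1.0 - p)          # failure resets the run
            if run_length + 1 == k:
                absorbed += mass * p                  # run reaches k: absorb
            else:
                new_state[run_length + 1] += mass * p
        state = new_state
    return absorbed


# Three fair coin flips: SSF, FSS and SSS contain a run of two successes.
print(prob_longest_run_at_least(3, 2, 0.5))
```

The recursion costs on the order of n times k operations and gives the exact distribution without enumerating all 2^n sequences. Extending it to a Markov chain or HMM, as the abstract describes, would require enlarging the state space, which this sketch does not attempt.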

13.

Temporally and Spatially Constrained ICA of fMRI Data Analysis

Zhi Wang Maogeng Xia Zhen Jin Li Yao Zhiying Long 《PloS one》2014,9(4)

Constrained independent component analysis (CICA) is capable of eliminating the order ambiguity that is found in the standard ICA and extracting the desired independent components by incorporating prior information into the ICA contrast function. However, the current CICA method produces constraints that are based on only one type of prior information (temporal/spatial), which may increase the dependency of CICA on the accuracy of the prior information. To improve the robustness of CICA and to reduce the impact of the accuracy of prior information on CICA, we proposed a temporally and spatially constrained ICA (TSCICA) method that incorporated two types of prior information, both temporal and spatial, as constraints in the ICA. The proposed approach was tested using simulated fMRI data and was applied to a real fMRI experiment using 13 subjects who performed a movement task. Additionally, the performance of TSCICA was compared with the ICA method, the temporally CICA (TCICA) method and the spatially CICA (SCICA) method. The results from the simulation and from the real fMRI data demonstrated that TSCICA outperformed TCICA, SCICA and ICA in terms of robustness to noise. Moreover, the TSCICA method displayed better robustness to prior temporal/spatial information than the TCICA/SCICA method.

14.

Independent Component Analysis for Brain fMRI Does Indeed Select for Maximal Independence

Vince D. Calhoun Vamsi K. Potluru Ronald Phlypo Rogers F. Silva Barak A. Pearlmutter Arvind Caprihan Sergey M. Plis Tülay Adalı 《PloS one》2013,8(8)

A recent paper by Daubechies et al. claims that two independent component analysis (ICA) algorithms, Infomax and FastICA, which are widely used for functional magnetic resonance imaging (fMRI) analysis, select for sparsity rather than independence. The argument was supported by a series of experiments on synthetic data. We show that these experiments fall short of proving this claim and that the ICA algorithms are indeed doing what they are designed to do: identify maximally independent sources.

15.

Maximum likelihood of evolutionary trees: hardness and approximation

Chor B Tuller T 《Bioinformatics (Oxford, England)》2005,21(Z1):i97-106

MOTIVATION: Maximum likelihood (ML) is an increasingly popular optimality criterion for selecting evolutionary trees. Yet the computational complexity of ML was open for over 20 years, and only recently resolved by the authors for the Jukes-Cantor model of substitution and its generalizations. It was proved that reconstructing the ML tree is computationally intractable (NP-hard). In this work we explore three directions, which extend that result. RESULTS: (1) We show that ML under the assumption of molecular clock is still computationally intractable (NP-hard). (2) We show that not only is it computationally intractable to find the exact ML tree, even approximating the logarithm of the ML for any multiplicative factor smaller than 1.00175 is computationally intractable. (3) We develop an algorithm for approximating log-likelihood under the condition that the input sequences are sparse. It employs any approximation algorithm for parsimony, and asymptotically achieves the same approximation ratio. We note that ML reconstruction for sparse inputs is still hard under this condition, and furthermore many real datasets satisfy it.

16.

REML estimation of variance parameters in nonlinear mixed effects models using the SAEM algorithm

Meza C Jaffrézic F Foulley JL 《Biometrical journal. Biometrische Zeitschrift》2007,49(6):876-888

Nonlinear mixed effects models are now widely used in biometrical studies, especially in pharmacokinetic research or for the analysis of growth traits for agricultural and laboratory species. Most of these studies, however, are often based on ML estimation procedures, which are known to be biased downwards. A few REML extensions have been proposed, but only for approximated methods. The aim of this paper is to present a REML implementation for nonlinear mixed effects models within an exact estimation scheme, based on an integration of the fixed effects and a stochastic estimation procedure. This method was implemented via a stochastic EM, namely the SAEM algorithm. The simulation study showed that the proposed REML estimation procedure considerably reduced the bias observed with the ML estimation, as well as the residual mean squared error of the variance parameter estimations, especially in the unbalanced cases. ML and REML based estimators of fixed effects were also compared via simulation. Although the two kinds of estimates were very close in terms of bias and mean square error, predictions of individual profiles were clearly improved when using REML vs. ML. An application of this estimation procedure is presented for the modelling of growth in lines of chicken.
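
The downward bias of ML variance estimates mentioned in this abstract is easiest to see in the simplest possible case. The sketch below is only an illustration under strong assumptions: an IID normal sample with one fixed effect (the mean), rather than the nonlinear mixed model and SAEM algorithm of the paper. In that special case the ML estimator divides the sum of squares by n, while the REML estimator divides by n - 1 to account for the estimated mean. The function name and the data are hypothetical.

```python
def variance_estimates(values):
    """Return (ML, REML) variance estimates for an IID normal sample."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((v - mean) ** 2 for v in values)
    ml = sum_of_squares / n            # ignores that the mean was estimated
    reml = sum_of_squares / (n - 1)    # corrects for one fixed effect
    return ml, reml


ml, reml = variance_estimates([2, 4, 4, 4, 5, 5, 7, 9])
print(ml, reml)
```

The ML value is always the smaller of the two, and the gap shrinks as n grows relative to the number of fixed effects, which is consistent with the abstract's remark that the improvement is most visible in small or unbalanced designs.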

17.

An empirical comparison of information-theoretic criteria in estimating the number of independent components of fMRI data

Hui M Li J Wen X Yao L Long Z 《PloS one》2011,6(12):e29274

Background

Independent Component Analysis (ICA) has been widely applied to the analysis of fMRI data. Accurate estimation of the number of independent components of fMRI data is critical to reduce over/under fitting. Although various methods based on Information Theoretic Criteria (ITC) have been used to estimate the intrinsic dimension of fMRI data, the relative performance of different ITC in the context of the ICA model has not been fully investigated, especially considering the properties of fMRI data. The present study explores and evaluates the performance of various ITC for fMRI data with varied white noise levels, colored noise levels, temporal data sizes and spatial smoothness degrees.

Methodology

Both simulated data and real fMRI data with varied Gaussian white noise levels, first-order auto-regressive (AR(1)) noise levels, temporal data sizes and spatial smoothness degrees were used to explore and evaluate the performance of different traditional ITC.

Principal Findings

Results indicate that the performance of ITC depends on the noise level, temporal data size and spatial smoothness of fMRI data. 1) High white noise levels may lead to underestimation by all criteria, and MDL/BIC shows the most severe underestimation at the higher Gaussian white noise level. 2) Colored noise may result in overestimation, which is intensified by an increase in the AR(1) coefficient rather than in the SD of the AR(1) noise; MDL/BIC shows the least overestimation. 3) A larger temporal data size improves estimation under the white noise model but tends to cause more severe overestimation under the AR(1) noise model. 4) Spatial smoothing results in overestimation under both noise models.

Conclusions

1) No ITC is perfect for all fMRI data, owing to the complicated noise structure of such data. 2) If the data contain only white noise, AIC is preferred when the noise level is high; otherwise, the Laplace approximation is a better choice. 3) When colored noise is present, MDL/BIC outperforms the other criteria.
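
For readers unfamiliar with how such criteria choose a model order, the following is a minimal sketch and not the procedure evaluated in the paper. It assumes the classical eigenvalue-based AIC and MDL formulas of Wax and Kailath, white noise, and a penalty term k(2p - k); fMRI toolboxes typically use modified versions. The function name and the example eigenvalues are hypothetical.

```python
import math


def itc_model_order(eigenvalues, n_samples):
    """Return the order k minimizing AIC and MDL (Wax-Kailath form, assumed).

    For each candidate k, the p - k smallest covariance eigenvalues are
    treated as noise; the fit term compares their geometric and arithmetic
    means, which coincide when the noise eigenvalues are all equal.
    """
    eig = sorted(eigenvalues, reverse=True)
    p = len(eig)
    scores = {"AIC": [], "MDL": []}
    for k in range(p):
        tail = eig[k:]
        m = len(tail)
        log_geo_mean = sum(math.log(v) for v in tail) / m
        log_arith_mean = math.log(sum(tail) / m)
        neg_log_lik = -n_samples * m * (log_geo_mean - log_arith_mean)
        n_free = k * (2 * p - k)                     # assumed penalty count
        scores["AIC"].append(2 * neg_log_lik + 2 * n_free)
        scores["MDL"].append(neg_log_lik + 0.5 * n_free * math.log(n_samples))
    return {name: min(range(p), key=vals.__getitem__)
            for name, vals in scores.items()}


# Two dominant eigenvalues on top of a flat noise floor.
print(itc_model_order([10.0, 8.0, 1.0, 1.0, 1.0, 1.0], n_samples=100))
```

In this idealized white-noise example both criteria select two components. The flat noise floor is exactly the assumption that colored noise and spatial smoothing violate, which is one way to read the over- and underestimation effects reported above.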

18.

Independent Component Analysis of the Effect of L-dopa on fMRI of Language Processing

Namhee Kim Prem K. Goel Madalina E. Tivarus Ashleigh Hillier David Q. Beversdorf 《PloS one》2010,5(8)

L-dopa, which is a precursor for dopamine, acts to amplify strong signals, and dampen weak signals as suggested by previous studies. The effect of L-dopa has been demonstrated in language studies, suggesting restriction of the semantic network. In this study, we aimed to examine the effect of L-dopa on language processing with fMRI using Independent Component Analysis (ICA). Two types of language tasks (phonological and semantic categorization tasks) were tested under two drug conditions (placebo and L-dopa) in 16 healthy subjects. Probabilistic ICA (PICA), part of FSL, was implemented to generate Independent Components (IC) for each subject for the four conditions and the ICs were classified into task-relevant source groups by a correlation threshold criterion. Our key findings include: (i) The highly task-relevant brain regions including the Left Inferior Frontal Gyrus (LIFG), Left Fusiform Gyrus (LFUS), Left Parietal lobe (LPAR) and Superior Temporal Gyrus (STG) were activated with both L-dopa and placebo for both tasks, and (ii) as compared to placebo, L-dopa was associated with increased activity in posterior regions, including the superior temporal area (BA 22), and decreased activity in the thalamus (pulvinar) and inferior frontal gyrus (BA 11/47) for both tasks. These results raise the possibility that L-dopa may exert an indirect effect on posterior regions mediated by the thalamus (pulvinar).

19.

A supervised visual model for finding regions of interest in basal cell carcinoma images

Ricardo Gutiérrez Francisco Gómez Lucía Roa-Peña Eduardo Romero 《Diagnostic pathology》2011,6(1):1-14

Background

Expectation maximization (EM) is one of the common approaches for image segmentation.

Methods

An improvement of the EM algorithm is proposed and its effectiveness for MRI brain image segmentation is investigated. In order to improve EM performance, the proposed algorithm incorporates neighbourhood information into the clustering process. First, an average image is obtained as neighbourhood information, and it is then incorporated into the clustering process. As an option, user interaction can also be used to improve segmentation results. Simulated and real MR volumes are used to compare the efficiency of the proposed improvement with the existing neighbourhood-based extensions for EM and FCM.

Results

The findings show that the proposed algorithm produces a higher similarity index.

Conclusions

Experiments demonstrate the effectiveness of the proposed algorithm in comparison to other existing algorithms at various noise levels.

20.

STEME: efficient EM to find motifs in large data sets

Reid JE Wernisch L 《Nucleic acids research》2011,39(18):e126

MEME and many other popular motif finders use the expectation-maximization (EM) algorithm to optimize their parameters. Unfortunately, the running time of EM is linear in the length of the input sequences. This can prohibit its application to data sets of the size commonly generated by high-throughput biological techniques. A suffix tree is a data structure that can efficiently index a set of sequences. We describe an algorithm, Suffix Tree EM for Motif Elicitation (STEME), that approximates EM using suffix trees. To the best of our knowledge, this is the first application of suffix trees to EM. We provide an analysis of the expected running time of the algorithm and demonstrate that STEME runs an order of magnitude more quickly than the implementation of EM used by MEME. We give theoretical bounds for the quality of the approximation and show that, in practice, the approximation has a negligible effect on the outcome. We provide an open source implementation of the algorithm that we hope will be used to speed up existing and future motif search algorithms.
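
To make the linear running time concrete, here is a minimal sketch of a single E-step for one sequence. It is neither MEME's nor STEME's implementation: it assumes a one-occurrence-per-sequence model, a uniform prior over start positions and a fixed background distribution, and every name in it is hypothetical. The posterior that the motif starts at position i is proportional to the likelihood ratio between the motif model and the background over that window.

```python
def position_posteriors(seq, pwm, background):
    """P(motif starts at i | seq) for every valid start position i.

    `pwm` is a list of per-column dicts mapping base -> probability and
    `background` maps base -> probability; both are assumed inputs.
    """
    width = len(pwm)
    ratios = []
    for start in range(len(seq) - width + 1):   # every window is visited
        ratio = 1.0
        for offset in range(width):
            base = seq[start + offset]
            ratio *= pwm[offset][base] / background[base]
        ratios.append(ratio)
    total = sum(ratios)
    return [r / total for r in ratios]


def column(favoured):
    """Hypothetical helper: a motif column that strongly prefers one base."""
    return {b: (0.85 if b == favoured else 0.05) for b in "ACGT"}


uniform = {b: 0.25 for b in "ACGT"}
posteriors = position_posteriors("TTACGTT", [column(c) for c in "ACG"], uniform)
print(posteriors.index(max(posteriors)))   # start position of the best window
```

Each E-step visits every window of every sequence, so the cost grows linearly with total sequence length. According to the abstract, STEME avoids much of this work by indexing the sequences in a suffix tree, which this sketch does not attempt.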