Similar Documents
20 similar documents found (search time: 31 ms)
1.
We introduce a model-based analysis technique for extracting and characterizing rhythmic expression profiles from genome-wide DNA microarray hybridization data. These patterns are clues to discovering rhythmic genes implicated in cell-cycle, circadian, or other biological processes. The algorithm, implemented in a program called RAGE (Rhythmic Analysis of Gene Expression), decouples the problems of estimating a pattern's wavelength and phase. Our algorithm is linear-time in frequency and phase resolution, an improvement over previous quadratic-time approaches. Unlike previous approaches, RAGE uses a true distance metric for measuring expression profile similarity, based on the Hausdorff distance. This results in better clustering of expression profiles for rhythmic analysis. The confidence of each frequency estimate is computed using Z-scores. We demonstrate that RAGE is superior to other techniques on synthetic and actual DNA microarray hybridization data. We also show how to replace the discretized phase search in our method with an exact (combinatorially precise) phase search, resulting in a faster algorithm with no complexity dependence on phase resolution.
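As a rough illustration of the discretized frequency/phase scan described above (not the RAGE implementation itself, which uses a Hausdorff-based metric and decouples wavelength from phase estimation), the sketch below scores a candidate period by the best correlation with a cosine over a grid of phases; the function name `rhythm_score` and the grid size are invented for this example:

```python
import math

def rhythm_score(profile, period, n_phases=16):
    """Best correlation between an expression profile and a cosine of the
    given period, scanned over a discretized grid of phases."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [x - mean for x in profile]
    norm = math.sqrt(sum(x * x for x in centered)) or 1.0
    best = -1.0
    for k in range(n_phases):
        phase = 2 * math.pi * k / n_phases
        wave = [math.cos(2 * math.pi * t / period - phase) for t in range(n)]
        wnorm = math.sqrt(sum(w * w for w in wave))
        corr = sum(c * w for c, w in zip(centered, wave)) / (norm * wnorm)
        best = max(best, corr)
    return best

# a profile that truly oscillates with period 8 scores near 1
rhythmic = [math.cos(2 * math.pi * t / 8) for t in range(24)]
```

A non-rhythmic profile (e.g. a monotone ramp) scores much lower, which is the basis for ranking genes by rhythmicity before assessing confidence with Z-scores.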

2.
The problem of recovering the DNA distribution from cytofluorometric experimental data was investigated. Theoretical analysis led to a convenient formulation of the problem and to uniqueness results for its solution. A minimization algorithm was implemented to obtain optimal estimates of the G1, S, G2, and M phase percentages, and was tested on several experimental cases.
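The minimization idea can be sketched as a grid search over phase fractions against a simple three-component histogram model (G1 and G2+M as Gaussians at one and two units of DNA content, S as a uniform bridge between them). The model shape and the helper `fit_phase_fractions` are illustrative assumptions, not the authors' formulation:

```python
import math

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_phase_fractions(hist, channels, mu_g1, sigma, step=0.01):
    """Grid-search minimization of squared error between a measured
    DNA-content histogram and a three-component model:
    G1 ~ N(mu_g1, sigma), G2+M ~ N(2*mu_g1, sigma), S ~ uniform in between.
    Returns (G1, S, G2+M) fractions."""
    g1 = [gauss(c, mu_g1, sigma) for c in channels]
    g2 = [gauss(c, 2 * mu_g1, sigma) for c in channels]
    s = [1.0 / mu_g1 if mu_g1 < c < 2 * mu_g1 else 0.0 for c in channels]
    best, best_err = None, float("inf")
    k = int(round(1 / step))
    for i in range(k + 1):
        for j in range(k + 1 - i):
            p1, p2 = i * step, j * step
            ps = 1 - p1 - p2
            err = sum((h - (p1 * a + p2 * b + ps * u)) ** 2
                      for h, a, b, u in zip(hist, g1, g2, s))
            if err < best_err:
                best, best_err = (p1, ps, p2), err
    return best

# synthetic histogram with known fractions (0.6, 0.25, 0.15)
channels = list(range(0, 130, 2))
mu, sigma = 50.0, 5.0
true = (0.6, 0.25, 0.15)
hist = [true[0] * gauss(c, mu, sigma)
        + true[1] * (1.0 / mu if mu < c < 2 * mu else 0.0)
        + true[2] * gauss(c, 2 * mu, sigma) for c in channels]
```

A real implementation would use a proper optimizer and account for measurement broadening, but the grid search shows the structure of the estimation problem.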

3.

Background  

Increasingly researchers are turning to the use of haplotype analysis as a tool in population studies, the investigation of linkage disequilibrium, and candidate gene analysis. When the phase of the data is unknown, computational methods, in particular those employing the Expectation-Maximisation (EM) algorithm, are frequently used for estimating the phase and frequency of the underlying haplotypes. These methods have proved very successful, predicting the phase-known frequencies from data for which the phase is unknown with a high degree of accuracy. Recently there has been much speculation as to the effect of unknown, or missing allelic data – a common phenomenon even with modern automated DNA analysis techniques – on the performance of EM-based methods. To this end an EM-based program, modified to accommodate missing data, has been developed, incorporating non-parametric bootstrapping for the calculation of accurate confidence intervals.

5.
6.
This paper presents a phase detection algorithm for four-dimensional (4D) cardiac computed tomography (CT) analysis. The algorithm detects a phase, i.e. a specific three-dimensional (3D) image out of several time-distributed 3D images, with high contrast in the left ventricle and low contrast in the right ventricle. The purpose is to use the automatically detected phase in an existing algorithm that automatically aligns the images along the heart axis. Decision making is based on the contrast agent distribution over time. It was implemented in KardioPerfusion, a software framework currently being developed for 4D CT myocardial perfusion analysis. Agreement of the phase detection algorithm with two reference readers was 97% (95% CI: 82–100%). Mean duration for detection was 0.020 s (95% CI: 0.018–0.022 s), many times less than the readers needed. Thus this algorithm is an accurate and fast tool that can improve the workflow of clinical examinations.
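A minimal stand-in for the contrast-based decision rule, assuming per-time-point left- and right-ventricle contrast measurements have already been extracted from the 3D images; the scoring rule (LV minus RV contrast) is an illustrative assumption, not KardioPerfusion's actual criterion:

```python
def detect_phase(lv_contrast, rv_contrast):
    """Return the index of the time point with high left-ventricle contrast
    and low right-ventricle contrast (score = LV - RV)."""
    scores = [lv - rv for lv, rv in zip(lv_contrast, rv_contrast)]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, mean Hounsfield-unit contrast series sampled over the cardiac cycle would be passed in, and the returned index selects the 3D volume handed to the heart-axis alignment step.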

7.
A genetic algorithm-based computational method for the ab initio phasing of diffraction data from crystals of symmetric macromolecular structures, such as icosahedral viruses, has been implemented and applied to authentic data from the P1/Mahoney strain of poliovirus. Using only single-wavelength native diffraction data, the method is shown to be able to generate correct phases, and thus electron density, to 3.0 Å resolution. Beginning with no advance knowledge of the shape of the virus and only approximate knowledge of its size, the method uses a genetic algorithm to determine coarse, low-resolution (here, 20.5 Å) models of the virus that obey the known non-crystallographic symmetry (NCS) constraints. The best scoring of these models are subjected to refinement and NCS-averaging, with subsequent phase extension to high resolution (3.0 Å). Initial difficulties in phase extension were overcome by measuring and including all low-resolution terms in the transform. With the low-resolution data included, the method was successful in generating essentially correct phases and electron density to 6.0 Å in every one of ten trials from different models identified by the genetic algorithm. Retrospective analysis revealed that these correct high-resolution solutions converged from a range of significantly different low-resolution phase sets (average differences of 59.7 degrees below 24 Å). This method represents an efficient way to determine phases for icosahedral viruses, and has the advantage of producing phases free from model bias. It is expected that the method can be extended to other protein systems with high NCS.
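A generic genetic-algorithm skeleton of the kind described (tournament selection, one-point crossover, bit-flip mutation), with the crystallographic scoring replaced by a toy fitness; none of this reproduces the paper's actual model encoding or NCS-constrained search:

```python
import random

def genetic_search(fitness, n_bits, pop_size=40, generations=60, p_mut=0.02, seed=1):
    """Maximize `fitness` over bit strings with a simple GA:
    tournament selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def pick():  # size-2 tournament
            a, b = rng.choice(pop), rng.choice(pop)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p, q = pick(), pick()
            cut = rng.randrange(1, n_bits)
            child = p[:cut] + q[cut:]                       # one-point crossover
            child = [b ^ (rng.random() < p_mut) for b in child]  # mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# toy fitness: number of 1-bits (OneMax); the paper scores coarse virus
# models against the diffraction data instead
best = genetic_search(sum, 20)
```

In the paper's setting the "bits" encode coarse low-resolution virus models and the fitness measures agreement with the observed diffraction amplitudes under the NCS constraints.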

8.
Time-resolved admittance measurements provide the basis for studies showing that membrane fusion occurs through the formation and widening of an initially small pore, linking two previously separated aqueous compartments. Here we introduce modifications to this method that correct the cell-pipette (source) admittance for attenuation and phase shifts produced by electrophysiological equipment. Two new approaches for setting the right phase angle are discussed. The first uses the displacement of a patch-clamp amplifier C-slow potentiometer for the calculation of phase. This calculation is based on amplitudes of observed and expected (theoretical) changes in the source admittance. The second approach automates the original phase adjustment, the validity of which we prove analytically for certain conditions. The multiple sine wave approach is modified to allow the calculation of target cell membrane parameters and the conductance of the fusion pore. We also show how this technique can be extended for measurements of the resting potential of the first (voltage-clamped) membrane. We introduce an algorithm for calculation of fusion pore conductance despite a concurrent change in the resistance of the clamped membrane. The sensitivity of the capacitance restoration algorithm to phase shift errors is analyzed, and experimental data are used to demonstrate the results of this analysis. Finally, we show how the phase offset can be corrected "off-line" by restoring the shape of the capacitance increment.

9.
10.
Electron density profiles of disk membranes isolated from bovine retinal rod outer segments have been determined to 12 Å resolution by analysis of the X-ray diffraction from oriented multilayers, in the absence of lipid phase separation. Data were collected on both film and a two-dimensional TV-detector; both detectors yielded identical patterns consisting of relatively sharp lamellar reflections of small mosaic spread. The unit cell repeat was reversibly varied over the range of 143 to 183 Å. The diffraction patterns changed dramatically at 150 Å; consequently, the low (less than 150 Å) and high (greater than 150 Å) periodicity data were independently analyzed via a swelling algorithm. The high periodicity data yielded two statistically equivalent phase choices corresponding to two symmetric, but different membrane profiles. The low periodicity data yielded essentially one, characteristically asymmetric profile. These profiles have been modeled with regard to the separate profiles of rhodopsin, lipid and water, subject to the known composition of the isolated disk membranes.

11.

Background

Knowing the phase of marker genotype data can be useful in genome-wide association studies, because it makes it possible to use analysis frameworks that account for identity by descent or parent of origin of alleles and it can lead to a large increase in data quantities via genotype or sequence imputation. Long-range phasing and haplotype library imputation constitute a fast and accurate method to impute phase for SNP data.

Methods

A long-range phasing and haplotype library imputation algorithm was developed. It combines information from surrogate parents and long haplotypes to resolve phase in a manner that is not dependent on the family structure of a dataset or on the presence of pedigree information.

Results

The algorithm performed well in both simulated and real livestock and human datasets, in terms of both phasing accuracy and computational efficiency. The percentage of alleles that could be phased in both simulated and real datasets of varying size generally exceeded 98%, while the percentage of alleles incorrectly phased in simulated data was generally less than 0.5%. The accuracy of phasing was affected by dataset size, with lower accuracy for dataset sizes below 1000, but was not affected by effective population size, family data structure, presence or absence of pedigree information, or SNP density. The method was computationally fast. In comparison to a commonly used statistical method (fastPHASE), the current method made about 8% fewer phasing mistakes and ran about 26 times faster for a small dataset. For larger datasets, the differences in computational time are expected to be even greater. A computer program implementing these methods has been made available.

Conclusions

The algorithm and software developed in this study make feasible the routine phasing of high-density SNP chips in large datasets.

12.
A multi-clustering fusion method is presented based on combining several runs of a clustering algorithm resulting in a common partition. More specifically, the results of several independent runs of the same clustering algorithm are appropriately combined to obtain a distinct partition of the data which is not affected by initialization and overcomes the instabilities of clustering methods. Subsequently, a fusion procedure is applied to the clusters generated during the previous phase to determine the optimal number of clusters in the data set according to some predefined criteria.
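One common way to realize such a fusion is evidence accumulation over a co-association matrix (which may differ from the paper's exact criterion): items that fall in the same cluster in most runs are linked, and connected components of the resulting graph form the consensus partition.

```python
def fuse_partitions(partitions, threshold=0.5):
    """Consensus partition from several clusterings of the same n items.
    Items co-clustered in more than `threshold` of the runs are linked;
    connected components of that graph are the fused clusters."""
    n = len(partitions[0])
    runs = len(partitions)
    link = [[sum(p[i] == p[j] for p in partitions) / runs > threshold
             for j in range(n)] for i in range(n)]
    labels, current = [-1] * n, 0
    for i in range(n):
        if labels[i] == -1:          # start a new component
            stack = [i]
            while stack:
                u = stack.pop()
                if labels[u] == -1:
                    labels[u] = current
                    stack.extend(v for v in range(n)
                                 if link[u][v] and labels[v] == -1)
            current += 1
    return labels

# three runs that agree on {0,1,2} vs {3,4} despite different label names
runs = [[0, 0, 0, 1, 1], [2, 2, 2, 5, 5], [1, 1, 0, 2, 2]]
```

Note that cluster label values need not match across runs; only co-membership matters, which is what makes the fusion insensitive to initialization.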

13.
This article presents methodology for the construction of a linkage map in an autotetraploid species, using either codominant or dominant molecular markers scored on two parents and their full-sib progeny. The steps of the analysis are as follows: identification of parental genotypes from the parental and offspring phenotypes; testing for independent segregation of markers; partition of markers into linkage groups using cluster analysis; maximum-likelihood estimation of the phase, recombination frequency, and LOD score for all pairs of markers in the same linkage group using the EM algorithm; ordering the markers and estimating distances between them; and reconstructing their linkage phases. The information from different marker configurations about the recombination frequency is examined and found to vary considerably, depending on the number of different alleles, the number of alleles shared by the parents, and the phase of the markers. The methods are applied to a simulated data set and to a small set of SSR and AFLP markers scored in a full-sib population of tetraploid potato.
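The pairwise recombination-frequency and LOD-score step can be illustrated for the simplest case, a diploid backcross with known phase (the article's tetraploid EM estimation is considerably more involved):

```python
import math

def recombination_mle(n_rec, n_tot):
    """Maximum-likelihood recombination fraction r for a backcross design
    and the LOD score against the null of free recombination (r = 0.5)."""
    r = n_rec / n_tot
    def loglik10(rr):
        ll = 0.0
        if n_rec:                      # 0*log(0) treated as 0
            ll += n_rec * math.log10(rr)
        if n_tot - n_rec:
            ll += (n_tot - n_rec) * math.log10(1 - rr)
        return ll
    lod = loglik10(r) - loglik10(0.5)
    return r, lod
```

With 10 recombinants among 100 offspring, r = 0.1 and the LOD score is about 16, far above the conventional linkage threshold of 3.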

15.
We recently described a method for linkage disequilibrium (LD) mapping, using cladistic analysis of phased single-nucleotide polymorphism (SNP) haplotypes in a logistic regression framework. However, haplotypes are often not available and cannot be deduced with certainty from the unphased genotypes. One possible two-stage approach is to infer the phase of multilocus genotype data and analyze the resulting haplotypes as if known. Here, haplotypes are inferred using the expectation-maximization (EM) algorithm and the best-guess phase assignment for each individual analyzed. However, inferring haplotypes from phase-unknown data is prone to error and this should be taken into account in the subsequent analysis. An alternative approach is to analyze the phase-unknown multilocus genotypes themselves. Here we present a generalization of the method for phase-known haplotype data to the case of unphased SNP genotypes. Our approach is designed for high-density SNP data, so we opted to analyze the simulated dataset. The marker spacing in the initial screen was too large for our method to be effective, so we used the answers provided to request further data in regions around the disease loci and in null regions. Power to detect the disease loci, accuracy in localizing the true site of the locus, and false-positive error rates are reported for the inferred-haplotype and unphased genotype methods. For this data, analyzing inferred haplotypes outperforms analysis of genotypes. As expected, our results suggest that when there is little or no LD between a disease locus and the flanking region, there will be no chance of detecting it unless the disease variant itself is genotyped.

16.
In a social network, users hold and express positive and negative attitudes (e.g. support/opposition) towards other users. Those attitudes exhibit a kind of binary relationship among the users, which plays an important role in social network analysis. However, some of those binary relationships are likely to be latent as the scale of the social network increases. The problem of predicting latent binary relationships has recently begun to draw researchers' attention. In this paper, we propose a machine learning algorithm for predicting positive and negative relationships in social networks, inspired by structural balance theory and social status theory. More specifically, we show that when two users in the network have fewer common neighbors, the prediction accuracy of the relationship between them deteriorates. Accordingly, in the training phase, we propose a segment-based training framework to divide the training data into two subsets according to the number of common neighbors between users, and build a prediction model for each subset based on support vector machines (SVM). Moreover, to deal with large-scale social network data, we employ a sampling strategy that selects a small amount of training data while maintaining high prediction accuracy. We compare our algorithm with traditional algorithms and with adaptive boosting of them. Experimental results on typical data sets show that our algorithm can deal with large social networks and consistently outperforms other methods.
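The segment-based training idea can be sketched as follows; a tiny perceptron stands in for the SVMs used in the paper, and the feature vectors, threshold, and toy edge data are invented for illustration:

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Tiny linear classifier standing in for an SVM; y in {-1, +1}."""
    w = [0.0] * (len(samples[0][0]) + 1)   # feature weights + bias (last slot)
    for _ in range(epochs):
        for x, y in samples:
            score = w[-1] + sum(wi * xi for wi, xi in zip(w, x))
            if (1 if score > 0 else -1) != y:
                for i, xi in enumerate(x):
                    w[i] += lr * y * xi
                w[-1] += lr * y
    return w

def predict(w, x):
    return 1 if w[-1] + sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

def segment_train(edges, threshold=2):
    """Segment-based training: edges whose endpoints share few common
    neighbors are harder to predict, so they get their own model."""
    low = [(x, y) for cn, x, y in edges if cn < threshold]
    high = [(x, y) for cn, x, y in edges if cn >= threshold]
    return {"low": train_perceptron(low), "high": train_perceptron(high)}

# (common-neighbor count, feature vector, sign) -- invented toy data
edges = [(0, [1, 0], 1), (1, [0, 1], -1), (0, [2, 0], 1), (1, [0, 2], -1),
         (3, [1, 1], 1), (4, [-1, -1], -1), (5, [-2, -2], -1), (3, [2, 2], 1)]
models = segment_train(edges)
```

At prediction time the common-neighbor count of the query edge selects which of the two models to apply.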

17.
Bayesian adaptive Markov chain Monte Carlo estimation of genetic parameters
Accurate and fast estimation of genetic parameters that underlie quantitative traits using mixed linear models with additive and dominance effects is of great importance in both natural and breeding populations. Here, we propose a new fast adaptive Markov chain Monte Carlo (MCMC) sampling algorithm for the estimation of genetic parameters in the linear mixed model with several random effects. In the learning phase of our algorithm, we use the hybrid Gibbs sampler to learn the covariance structure of the variance components. In the second phase of the algorithm, we use this covariance structure to formulate an effective proposal distribution for a Metropolis-Hastings algorithm, which uses a likelihood function in which the random effects have been integrated out. Compared with the hybrid Gibbs sampler, the new algorithm had better mixing properties and was approximately twice as fast to run. Our new algorithm was able to detect different modes in the posterior distribution. In addition, the posterior mode estimates from the adaptive MCMC method were close to the REML (residual maximum likelihood) estimates. Moreover, our exponential prior for inverse variance components was vague and enabled the estimated mode of the posterior variance to be practically zero, which was in agreement with the support from the likelihood (in the case of no dominance). The method performance is illustrated using simulated data sets with replicates and field data in barley.
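The two-phase scheme (a learning phase estimates the posterior spread, which then shapes the Metropolis-Hastings proposal) can be sketched on a toy target. The paper learns a full covariance via a hybrid Gibbs sampler over variance components; this sketch only adapts per-coordinate proposal scales on a simple Gaussian posterior:

```python
import math
import random

def adaptive_mh(logpost, x0, n_learn=2000, n_run=4000, seed=7):
    """Phase 1: fixed-width random-walk MH to estimate posterior spread.
    Phase 2: MH with proposal scales set from the learning phase
    (the classic ~2.4/sqrt(d) scaling of the estimated std)."""
    rng = random.Random(seed)
    d = len(x0)
    def mh(x, steps, scales):
        chain, lp = [], logpost(x)
        for _ in range(steps):
            prop = [xi + rng.gauss(0, s) for xi, s in zip(x, scales)]
            lp_prop = logpost(prop)
            if math.log(rng.random()) < lp_prop - lp:   # accept/reject
                x, lp = prop, lp_prop
            chain.append(list(x))
        return chain
    learn = mh(list(x0), n_learn, [1.0] * d)
    scales = []
    for i in range(d):
        col = [s[i] for s in learn]
        m = sum(col) / len(col)
        sd = math.sqrt(sum((c - m) ** 2 for c in col) / len(col))
        scales.append(2.4 / math.sqrt(d) * sd)
    return mh(learn[-1], n_run, scales)

# toy posterior: independent normals with means (3, -1), sd (1, 0.5)
def logpost(x):
    return -0.5 * ((x[0] - 3) ** 2 + ((x[1] + 1) / 0.5) ** 2)

chain = adaptive_mh(logpost, [0.0, 0.0])
```

In the paper's setting `logpost` would be the marginal likelihood with random effects integrated out, and the learned covariance would be a full matrix rather than per-coordinate scales.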

18.
High-throughput molecular-profiling technologies provide rapid, efficient and systematic approaches to search for biomarkers. Supervised learning algorithms are naturally suited to analysing the large amounts of data generated by these technologies in biomarker discovery efforts. This study demonstrates, with two examples, a data-driven approach to analysing large, complex datasets collected with high-throughput technologies in the context of biomarker discovery. The approach consists of two analytic steps: an initial unsupervised analysis to obtain accurate knowledge about sample clustering, followed by a second, supervised analysis to identify a small set of putative biomarkers for further experimental characterization. By comparing the most widely applied clustering algorithms on a leukaemia DNA microarray dataset, it was established that principal component analysis (PCA)-assisted projection of samples from a high-dimensional molecular feature space into a few low-dimensional subspaces provides a more effective and accurate way to visually explore and identify data structures that confirm intended experimental effects based on expected group membership. A supervised method, the shrunken centroid algorithm, was chosen to take the knowledge of sample clustering gained or confirmed in the first step and identify a small set of molecules as candidate biomarkers for further experimentation. The approach was applied to two molecular-profiling studies. In the first, PCA-assisted analysis of DNA microarray data revealed that discrete data structures exist in rat liver gene expression and correlate with blood clinical chemistry and liver pathological damage in response to the chemical toxicant diethylhexylphthalate, a peroxisome-proliferator-activated receptor agonist. Sixteen genes were then identified by the shrunken centroid algorithm as the best candidate biomarkers for liver damage. Functional annotation of these genes revealed roles in the acute phase response and in lipid and fatty acid metabolism, functionally relevant to the observed toxicities. In the second study, 26 urine ions identified from a GC/MS spectrum, two of which were glucose fragment ions included as positive controls, showed robust changes with the development of diabetes in Zucker diabetic fatty rats. Further experiments are needed to define their chemical identities and establish their functional relevance to disease development.
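The shrunken-centroid step can be sketched as soft-thresholding of class centroids toward the overall centroid: features whose class means all collapse onto the overall mean carry no class information and drop out, leaving a small candidate-biomarker set. This is a bare-bones illustration, not the full standardized shrunken-centroid procedure used in the study:

```python
def shrunken_centroids(X, y, delta):
    """Move each class centroid toward the overall centroid by `delta`
    per feature (soft-thresholding); features where every class centroid
    collapses to the overall mean are effectively deselected."""
    classes = sorted(set(y))
    d = len(X[0])
    overall = [sum(row[j] for row in X) / len(X) for j in range(d)]
    cents = {}
    for c in classes:
        rows = [x for x, yy in zip(X, y) if yy == c]
        cent = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
        shrunk = []
        for j in range(d):
            diff = cent[j] - overall[j]
            mag = max(abs(diff) - delta, 0.0)
            shrunk.append(overall[j] + mag * (1 if diff > 0 else -1))
        cents[c] = shrunk
    return cents

def classify(cents, x):
    """Assign x to the class with the nearest shrunken centroid."""
    return min(cents, key=lambda c: sum((xi - ci) ** 2
                                        for xi, ci in zip(x, cents[c])))

# feature 0 separates the classes; feature 1 is pure noise
X = [[0.0, 1.0], [0.2, 1.0], [4.0, 1.0], [3.8, 1.0]]
y = ["A", "A", "B", "B"]
cents = shrunken_centroids(X, y, delta=0.5)
```

After shrinkage the noise feature has identical centroids in both classes, so only the informative feature influences classification.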

19.
This paper proposes solutions for monitoring and balancing the load of a cloud data center. The proposed solutions work in two phases, and graph-theoretical concepts are applied in both. In the first phase, the cloud data center is modeled as a network graph. This network graph is augmented with the minimum dominating set concept from graph theory for monitoring its load. For constructing the minimum dominating set, this paper proposes a new variant of the minimum dominating set algorithm (V-MDS), which is compared with the existing construction algorithms of Rooji and Fomin. The V-MDS approach to querying cloud data center load information is compared with a central-monitor approach. The second phase focuses on system- and network-aware live virtual machine migration for load balancing the cloud data center. For this, a new system- and traffic-aware live VM migration for load balancing (ST-LVM-LB) algorithm is proposed and compared with the existing benchmarked algorithms, the dynamic management algorithm (DMA) and Sandpiper. To study the performance of the proposed algorithms, the CloudSim3.0.3 simulator is used. The experimental results show that the V-MDS algorithm has quadratic time complexity, whereas the Rooji and Fomin algorithms take exponential time. The V-MDS approach for querying cloud data center load information reduces the number of message updates by half compared with the central-monitor approach. On load balancing, the developed ST-LVM-LB algorithm triggers fewer virtual machine migrations and takes less time and migration cost, with minimum network overhead. Thus the proposed algorithms improve the service-delivery performance of the cloud data center by incorporating graph-theoretical solutions for monitoring and balancing the load.
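The dominating-set monitoring idea can be illustrated with the standard greedy approximation (the paper's V-MDS variant and its quadratic-time construction are not reproduced here): every node is then adjacent to some monitor, so querying only the monitors covers the whole data-center graph.

```python
def greedy_dominating_set(adj):
    """Greedy approximation of a minimum dominating set: repeatedly add
    the node that dominates the most not-yet-dominated nodes.
    `adj` maps each node to its list of neighbours."""
    undominated = set(adj)
    dom = set()
    while undominated:
        best = max(adj, key=lambda v: len(({v} | set(adj[v])) & undominated))
        dom.add(best)
        undominated -= {best} | set(adj[best])
    return dom
```

On a star graph the greedy picks just the hub; on a path of five nodes it picks the two interior nodes, and the selected monitors then aggregate load reports from their neighbourhoods.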

20.
Molecular techniques allow the survey of a large number of linked polymorphic loci in random samples from diploid populations. However, the gametic phase of haplotypes is usually unknown when diploid individuals are heterozygous at more than one locus. To overcome this difficulty, we implement an expectation-maximization (EM) algorithm leading to maximum-likelihood estimates of molecular haplotype frequencies under the assumption of Hardy-Weinberg proportions. The performance of the algorithm is evaluated for simulated data representing both DNA sequences and highly polymorphic loci with different levels of recombination. As expected, the EM algorithm is found to perform best for large samples, regardless of recombination rates among loci. To ensure finding the global maximum likelihood estimate, the EM algorithm should be started from several initial conditions. The present approach appears to be useful for the analysis of nuclear DNA sequences or highly variable loci. Although the algorithm, in principle, can accommodate an arbitrary number of loci, there are practical limitations because the computing time grows exponentially with the number of polymorphic loci.
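For the simplest non-trivial case, two biallelic loci, the EM iteration described above reduces to reapportioning the double heterozygotes between the two possible phases; a minimal sketch under Hardy-Weinberg proportions (the genotype-count encoding is an assumption of this example):

```python
def em_haplotypes(counts, n_iter=100):
    """EM estimate of two-locus haplotype frequencies under Hardy-Weinberg.
    counts[(i, j)] = number of individuals with i copies of allele 'a' at
    locus 1 and j copies of allele 'b' at locus 2. Only the double
    heterozygote (1, 1) is phase-ambiguous: AB/ab (cis) or Ab/aB (trans)."""
    g = counts.get
    n = 2 * sum(counts.values())            # total haplotypes
    base = {                                # unambiguous haplotype counts
        "AB": 2 * g((0, 0), 0) + g((0, 1), 0) + g((1, 0), 0),
        "Ab": 2 * g((0, 2), 0) + g((0, 1), 0) + g((1, 2), 0),
        "aB": 2 * g((2, 0), 0) + g((1, 0), 0) + g((2, 1), 0),
        "ab": 2 * g((2, 2), 0) + g((2, 1), 0) + g((1, 2), 0),
    }
    dh = g((1, 1), 0)                       # double heterozygotes
    p = {h: 0.25 for h in base}
    for _ in range(n_iter):
        # E-step: expected fraction of double hets in the cis phase
        cis, trans = p["AB"] * p["ab"], p["Ab"] * p["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: update haplotype frequencies
        p = {"AB": (base["AB"] + dh * w) / n,
             "ab": (base["ab"] + dh * w) / n,
             "Ab": (base["Ab"] + dh * (1 - w)) / n,
             "aB": (base["aB"] + dh * (1 - w)) / n}
    return p
```

As the abstract notes, with more loci the number of phase configurations (and hence the computing time) grows exponentially, which is why this toy stays at two loci.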


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)