Similar articles
 20 similar articles found
1.

Background  

The Golden Spike data set has been used to validate a number of methods for summarizing Affymetrix data sets, sometimes with seemingly contradictory results. Much less use has been made of this data set to evaluate differential expression methods. It has been suggested that this data set should not be used for method comparison due to a number of inherent flaws.

2.
A comparison has been made between two methods of measuring body sway during quiet standing. In the first method, a Wright ataxiameter was used to measure trunk movement in the anteroposterior direction; in the second, a Kistler force platform was used to monitor the locus of the resultant ground reaction force. The good correlation between the two sets of data has yielded a regression equation that converts one set into its equivalent in the other. This equation should be useful when comparing sway data from various research centres.
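The abstract does not give the regression equation itself. As a minimal sketch of how such a conversion could be derived, the snippet below fits an ordinary least-squares line to paired readings. The function names `fit_line` and `convert` and all numbers are illustrative assumptions, not values from the study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def convert(value, slope, intercept):
    """Map a reading from the first instrument onto the second instrument's scale."""
    return slope * value + intercept


# Made-up paired readings (hypothetical units), one pair per subject.
ataxiameter = [1.0, 2.0, 3.0, 4.0]
platform = [2.5, 4.5, 6.5, 8.5]

slope, intercept = fit_line(ataxiameter, platform)
predicted = convert(5.0, slope, intercept)
```

With real data the fit would also need a check of the residuals before the equation is used across research centres.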

3.
The recently described FINGAR genetic algorithm method for NMR refinement [D.A. Pearlman (1996) J. Biomol. NMR, 8, 67–76] has been extended so that it can be used to detect problem restraints in an NMR-derived set of data. A problem restraint is defined as a restraint in a generally well-behaved set where the associated target value is in error, due to inaccuracies in the data, misassignment, etc. The method described here, FINGAR.RWF, locates problem restraints by finding those restraints that, if removed from the data set, result in a disproportionate improvement in the scoring function. The method is applied to several test cases of simulated data, as well as to real data for the FK506 macrocycle, with excellent results.
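The leave-one-out idea behind this kind of restraint filtering can be illustrated with a toy example. This is not FINGAR.RWF: the genetic-algorithm refinement is replaced here by a trivial fit (the mean of the target values), and the scoring function, the `factor` cutoff, and the data are all assumptions chosen only to show the "disproportionate improvement" test.

```python
from statistics import median


def score(targets):
    """Toy scoring function: squared deviations from the best-fit value (the mean)."""
    if not targets:
        return 0.0
    fit = sum(targets) / len(targets)
    return sum((t - fit) ** 2 for t in targets)


def flag_problem_restraints(targets, factor=5.0):
    """Return indices whose removal improves the score far more than is typical."""
    full = score(targets)
    improvements = []
    for i in range(len(targets)):
        reduced = targets[:i] + targets[i + 1:]  # data set without restraint i
        improvements.append(full - score(reduced))
    typical = median(improvements)
    return [i for i, gain in enumerate(improvements)
            if gain > 0 and gain > factor * typical]


# Hypothetical restraint targets; the last one is deliberately in error.
targets = [3.0, 3.1, 2.9, 3.0, 6.0]
flagged = flag_problem_restraints(targets)
```

Comparing each improvement with the median keeps the test relative, so a generally noisy data set does not flag every restraint.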

4.
We have developed a new method for the analysis of voids in proteins (defined as empty cavities not accessible to solvent). This method combines analysis of individual discrete voids with analysis of packing quality. While these are different aspects of the same effect, they have traditionally been analysed using different approaches. The method has been applied to the calculation of total void volume and maximum void size in a non-redundant set of protein domains and has been used to examine correlations between thermal stability and void size. The tumour-suppressor protein p53 has then been compared with the non-redundant data set to determine whether its low thermal stability results from poor packing. We found that p53 has average packing, but the detrimental effects of some previously unexplained mutations to p53 observed in cancer can be explained by the creation of unusually large voids.

5.
6.

Background

With the rapid accumulation of genomic data, annotating and interpreting these data has become a challenging issue. Gene set enrichment analysis, a representative approach, has been widely used to interpret large molecular datasets generated by biological experiments. The result of gene set enrichment analysis relies heavily on the quality and integrity of gene set annotations. Although several methods have been developed to annotate gene sets, high-quality annotation methods are still lacking. Here, we propose a novel method to improve annotation accuracy by combining the GO structure and gene expression data.

Results

We propose a novel approach for optimizing gene set annotations to obtain more accurate annotation results. The proposed method filters inconsistent annotations using GO structure information and probabilistic gene set clusters calculated over a range of cluster sizes and multiple bootstrap-resampled datasets. The proposed method is employed to analyze p53 cell line, colon cancer and breast cancer gene expression data. The experimental results show that the proposed method can filter out a number of annotations unrelated to the experimental data, increase gene set enrichment power and decrease the inconsistency of annotations.

Conclusions

A novel gene set annotation optimization approach is proposed to improve the quality of gene annotations. Experimental results indicate that the proposed method effectively improves gene set annotation quality based on the GO structure and gene expression data.

7.
Interactive semisupervised learning for microarray analysis   Cited by: 3 (self-citations: 0, citations by others: 3)
Microarray technology has generated vast amounts of gene expression data with distinct patterns. Based on the premise that genes of correlated functions tend to exhibit similar expression patterns, various machine learning methods have been applied to capture these specific patterns in microarray data. However, the discrepancy between the rich expression profiles and the limited knowledge of gene functions has been a major hurdle to the understanding of cellular networks. To bridge this gap so as to properly comprehend and interpret expression data, we introduce relevance feedback to microarray analysis and propose an interactive learning framework to incorporate expert knowledge into the decision module. In order to find a good learning method and solve two intrinsic problems in microarray data, high dimensionality and small sample size, we also propose a semisupervised learning algorithm: kernel discriminant-EM (KDEM). This algorithm efficiently utilizes a large set of unlabeled data to compensate for the insufficiency of a small set of labeled data, and it extends the linear algorithm in discriminant-EM (DEM) to a kernel algorithm to handle nonlinearly separable data in a lower dimensional space. The relevance feedback technique and KDEM together construct an efficient and effective interactive semisupervised learning framework for microarray analysis. Extensive experiments on the yeast cell cycle regulation data set and the Plasmodium falciparum red blood cell cycle data set show the promise of this approach.

8.
A new method for the analysis of NMR data in terms of the solution structure of proteins has been developed. The method consists of two steps: first a systematic search of the conformational space to define the region allowed by the initial set of experimental constraints, and second, the narrowing of this region by the introduction of additional constraints and optional refinement procedures. The search of the conformational space is guided by heuristics to make it computationally feasible. The method is therefore called the heuristic refinement method and is coded in an expert system called PROTEAN. The paper describes the validation of the first step of the method using an artificial NMR data set generated from the known crystal structure of sperm whale carbon monoxymyoglobin. It is shown that the initial search procedure yields a low-resolution structure of the myoglobin molecule, accurately reproducing its main topological features, and that the precision of the structure depends on the quality of the initial data set.

9.
In this work, the application of a multivariate curve resolution procedure based on alternating least squares optimization (MCR-ALS) for the analysis of data from DNA microarrays is proposed. For this purpose, simulated and publicly available experimental data sets have been analyzed. Application of MCR-ALS, a method that operates without the use of any training set, has enabled the resolution of the relevant information about the classification of different cancer lines using a small set of components, each of which is defined by a sample profile and a pure gene expression profile. From the resolved sample profiles, a classification of samples according to their origin is proposed. From the resolved pure gene expression profiles, a set of over- or underexpressed genes that could be related to the development of cancer diseases has been selected. Advantages of the MCR-ALS procedure in relation to other previously proposed procedures such as principal component analysis are discussed.
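To make the alternating least squares idea concrete, here is a deliberately simplified sketch for a single component with a non-negativity constraint. It is not the authors' MCR-ALS implementation: real analyses resolve several components and apply further constraints. The function name `rank_one_als` and the data matrix (rows as samples, columns as genes) are made up for illustration.

```python
def rank_one_als(data, iterations=50):
    """Approximate data[i][j] by sample_profile[i] * gene_profile[j].

    Each half-step solves a least-squares problem for one factor while the
    other is held fixed, then clips negative values to zero.
    """
    n_rows, n_cols = len(data), len(data[0])
    sample_profile = [1.0] * n_rows  # arbitrary non-negative starting guess
    gene_profile = [0.0] * n_cols
    for _ in range(iterations):
        ss = sum(s * s for s in sample_profile)
        if ss == 0:
            break
        gene_profile = [
            max(0.0, sum(data[i][j] * sample_profile[i] for i in range(n_rows)) / ss)
            for j in range(n_cols)
        ]
        gg = sum(g * g for g in gene_profile)
        if gg == 0:
            break
        sample_profile = [
            max(0.0, sum(data[i][j] * gene_profile[j] for j in range(n_cols)) / gg)
            for i in range(n_rows)
        ]
    return sample_profile, gene_profile


# Hypothetical expression matrix generated by exactly one component.
data = [[2.0, 0.0, 1.0],
        [4.0, 0.0, 2.0],
        [6.0, 0.0, 3.0]]

sample_profile, gene_profile = rank_one_als(data)
reconstruction = [[s * g for g in gene_profile] for s in sample_profile]
```

No training labels enter the computation, which is the property the abstract highlights; the scale of the two profiles is arbitrary, so only their product is compared with the data.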

10.
The inference of population divergence times and branching patterns is of fundamental importance in many population genetic analyses. Many methods have been developed for estimating population divergence times, and recently, there has been particular attention towards genome-wide single-nucleotide polymorphisms (SNP) data. However, most SNP data have been affected by an ascertainment bias caused by the SNP selection and discovery protocols. Here, we present a modification of an existing maximum likelihood method that will allow approximately unbiased inferences when ascertainment is based on a set of outgroup populations. We also present a method for estimating trees from the asymmetric dissimilarity measures arising from pairwise divergence time estimation in population genetics. We evaluate the methods by simulations and by applying them to a large SNP data set of seven East Asian populations.

11.
12.
13.
The female gametophyte is an essential structure for angiosperm reproduction, and female sterility has been reported in a number of crops. In this paper, a maximum-likelihood method is presented for estimating the position and effect of a female partial-sterile locus in a backcross population using the observed data of dominant or codominant markers. The ML solutions are obtained via Bailey's method. The procedures for estimating the recombination fractions and the viabilities of female gametes are described, and the variances of the parameter estimates are also presented. Application of the method is demonstrated using a set of simulated data. This method circumvents the problems of the traditional mapping methods for female sterile genes, which were based on data from seed set or embryo-sac morphology and anatomy.

14.
We present a computer-aided approach for identifying and aligning consensus secondary structure within a set of functionally related oligonucleotide sequences aligned by sequence. The method relies on visualization of secondary structure using a generalization of the dot matrix representation appropriate for consensus sequence data sets. An interactive computer program implementing such a visualization of consensus structure has been developed. The program allows for alignment editing, data and display filtering and various modes of base pair representation, including co-variation. The utility of this approach is demonstrated with four sample data sets derived from in vitro selection experiments and one data set comprising tRNA sequences.

15.
Reliable assignment of an unknown query sequence to its correct species remains a methodological problem for the growing field of DNA barcoding. While great advances have been achieved recently, species identification from barcodes can still be unreliable if the relevant biodiversity has been insufficiently sampled. We here propose a new notion of species membership for DNA barcoding, fuzzy membership (based on fuzzy set theory), and illustrate its successful application to four real data sets (bats, fishes, butterflies and flies) with more than 5000 random simulations. Two of the data sets comprise especially dense species/population-level samples. In comparison with current DNA barcoding methods, the newly proposed minimum distance (MD) plus fuzzy set approach, and another computationally simple method, 'best close match', outperform two computationally sophisticated Bayesian and BootstrapNJ methods. The new method proposed here has great power in reducing false-positive species identification compared with other methods when conspecifics of the query are absent from the reference database.
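The abstract does not define the fuzzy membership function, so no attempt is made to reproduce it here. The sketch below illustrates only the simpler 'best close match' idea as it is commonly described: assign the query to the species of its nearest reference barcode, but only if that distance falls within a threshold. The uncorrected p-distance, the threshold values, and the sequences are illustrative assumptions.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def best_close_match(query, references, threshold):
    """references is a list of (species, sequence) pairs."""
    if not references:
        return "unidentified"
    distances = [(p_distance(query, seq), species) for species, seq in references]
    best = min(d for d, _ in distances)
    if best > threshold:
        return "unidentified"  # nearest barcode is too distant to trust
    species_at_best = {s for d, s in distances if d == best}
    if len(species_at_best) > 1:
        return "ambiguous"  # equally close barcodes from different species
    return species_at_best.pop()


# Hypothetical reference library and query.
references = [("sp_A", "ACGTACGTAC"), ("sp_B", "ACGTTTTTAC")]
query = "ACGTACGTAA"

assigned = best_close_match(query, references, threshold=0.15)
rejected = best_close_match(query, references, threshold=0.05)
```

The threshold is what limits false positives when the query's own species is missing from the reference database: a distant nearest neighbour is reported as unidentified instead of being forced into a species.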

16.
A method has been developed to determine the cell cycle kinetics for a quiescent population of cells which are stimulated to undergo a single transit of the division cycle. The method, known as the cohort of fraction labelled mitoses (COFLM), requires no knowledge of the proliferative fraction. The probability statements of the model were formulated and then compared by an iterative fitting procedure to experimental data to obtain estimates of the model parameters. Best fit model responses show good agreement with a set of experimental data.

17.
Secondary structures of proteins have been predicted from their Fourier transform infrared spectra using neural networks. To improve the generalization ability of the neural networks, the training data set has been artificially enlarged by linear interpolation. The leave-one-out approach has been used to demonstrate the applicability of the method. Bayesian regularization has been used to train the neural networks, and the predictions have been further improved by the maximum-likelihood estimation method. The networks have been tested, and standard errors of prediction (SEP) of 4.19% for alpha helix, 3.49% for beta sheet, and 3.15% for turns have been achieved. The results indicate that there is a significant decrease in the SEP for each type of structure parameter compared to previous works.
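One plausible reading of "artificially enlarged by linear interpolation" is sketched below: new training examples are placed on the straight line between pairs of existing ones. The abstract does not say which pairs were interpolated or whether the structure fractions were interpolated together with the spectra, so both choices here, along with the names `interpolate` and `augment` and the toy data, are assumptions.

```python
from itertools import combinations


def interpolate(a, b, t):
    """Point at fraction t on the straight line from vector a to vector b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def augment(samples, steps=1):
    """samples is a list of (spectrum, structure_fractions) pairs.

    Returns the originals plus `steps` evenly spaced interpolated examples
    for every pair of original samples.
    """
    augmented = list(samples)
    for (xa, ya), (xb, yb) in combinations(samples, 2):
        for k in range(1, steps + 1):
            t = k / (steps + 1)
            augmented.append((interpolate(xa, xb, t), interpolate(ya, yb, t)))
    return augmented


# Hypothetical two-point "spectra" with a single target fraction each.
samples = [([0.0, 2.0], [1.0]), ([2.0, 4.0], [0.0])]
augmented = augmented_set = augment(samples, steps=1)
```

Any evaluation such as leave-one-out would need to hold out the original sample together with every interpolated example derived from it, otherwise the test sample leaks into training.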

18.
Foll M, Gaggiotti O. Genetics, 2008, 180(2): 977-993
Identifying loci under natural selection from genomic surveys is of great interest in different research areas. Commonly used methods to separate neutral effects from adaptive effects are based on locus-specific population differentiation coefficients to identify outliers. Here we extend such an approach to estimate directly the probability that each locus is subject to selection using a Bayesian method. We also extend it to allow the use of dominant markers like AFLPs. It has been shown that this model is robust to complex demographic scenarios for neutral genetic differentiation. Here we show that the inclusion of isolated populations that underwent a strong bottleneck can lead to a high rate of false positives. Nevertheless, we demonstrate that it is possible to avoid them by carefully choosing the populations that should be included in the analysis. We analyze two previously published data sets: a human data set of codominant markers and a Littorina saxatilis data set of dominant markers. We also perform a detailed sensitivity study to compare the power of the method using amplified fragment length polymorphism (AFLP), SNP, and microsatellite markers. The method has been implemented in new software available at our website (http://www-leca.ujf-grenoble.fr/logiciels.htm).

19.
Multilayer feedforward neural networks trained with the backpropagation algorithm have been used successfully in many applications. However, the level of generalization is heavily dependent on the quality of the training data. That is, some of the training patterns can be redundant or irrelevant. It has been shown that with careful dynamic selection of training patterns, better generalization performance may be obtained. Nevertheless, generalization is carried out independently of the novel patterns to be approximated. In this paper, we present a learning method that automatically selects the training patterns most appropriate to the new sample to be predicted. This training method follows a lazy learning strategy, in the sense that it builds approximations centered around the novel sample. The proposed method has been applied to three different domains: two artificial approximation problems and a real time series prediction problem. Results have been compared to standard backpropagation using the complete training data set, and the new method shows better generalization abilities.
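The lazy strategy described above defers model building until a query arrives and then uses only the training patterns closest to it. The sketch below shows that selection step, but swaps the neural network for a much simpler local model (an inverse-distance weighted average) to stay short. The selection rule (k nearest by Euclidean distance), the value of `k`, and the data are assumptions, not the paper's procedure.

```python
import math


def select_patterns(train, query, k):
    """Return the k training patterns whose inputs lie closest to the query."""
    return sorted(train, key=lambda pattern: math.dist(pattern[0], query))[:k]


def lazy_predict(train, query, k=3):
    """Build a local approximation around the query from the selected patterns only."""
    nearest = select_patterns(train, query, k)
    # Small constant avoids division by zero when the query matches a pattern.
    weights = [1.0 / (math.dist(x, query) + 1e-9) for x, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


# Hypothetical (input, target) patterns; the last one is far from the query.
train = [([0.0], 0.0), ([1.0], 1.0), ([2.0], 4.0), ([10.0], 100.0)]
query = [1.0]

selected = select_patterns(train, query, k=3)
prediction = lazy_predict(train, query, k=3)
```

Because the selection is repeated for each new sample, distant patterns such as the last one never influence the local approximation, at the cost of redoing the work at prediction time.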

20.
Cluster analysis has proven to be a useful tool for investigating the association structure among genes in a microarray data set. There is a rich literature on cluster analysis and various techniques have been developed. Such analyses depend heavily on an appropriate (dis)similarity measure. In this paper, we introduce a general clustering approach based on the confidence interval inferential methodology, which is applied to gene expression data of microarray experiments. Emphasis is placed on data with low replication (three or five replicates). The proposed method makes more efficient use of the measured data and avoids the subjective choice of a dissimilarity measure. This new methodology, when applied to real data, provides an easy-to-use bioinformatics solution for the cluster analysis of microarray experiments with replicates (see the Appendix). Even though the method is presented under the framework of microarray experiments, it is a general algorithm that can be used to identify clusters in any situation. The method's performance is evaluated using simulated and publicly available data sets. Our results also clearly show that our method is not an extension of the conventional clustering method based on correlation or Euclidean distance.
