Similar Literature
20 similar articles found.
1.
Terminal restriction fragment length polymorphism (T-RFLP) is increasingly being used to examine microbial community structure, and accordingly a range of approaches have been used to analyze data sets. A number of published reports have included data and results that were statistically flawed or lacked rigorous statistical testing. A range of simple yet powerful techniques are available to examine community data; however, their use is seldom, if ever, discussed in the microbial literature. We describe an approach that overcomes some of the problems associated with analyzing community datasets and makes data interpretation simple and effective. The Bray-Curtis coefficient is suggested as an ideal coefficient for the construction of similarity matrices. Its strengths include its ability to deal with data sets containing multiple blocks of zeros in a meaningful manner. Non-metric multi-dimensional scaling is described as a powerful yet easily interpreted method to examine community patterns based on T-RFLP data. Importantly, we describe the use of significance testing of data sets to allow quantitative assessment of similarity, removing subjectivity in comparing complex data sets. Finally, we introduce a quantitative measure of sample dispersion and suggest its usefulness in describing site heterogeneity. This revised version was published online in June 2006 with corrections to the Cover Date.
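The Bray-Curtis coefficient recommended here is simple to compute; a minimal sketch (function name and abundance profiles are illustrative, not from the paper) shows why it handles shared absences well: T-RFs absent from both samples contribute nothing to either sum.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two T-RF abundance profiles.
    Similarity, as used for similarity matrices, is 1 minus this value.
    Peaks absent from both profiles (paired zeros) add nothing to the
    numerator or the denominator, so blocks of zeros are handled sensibly."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0

site_a = [10, 0, 5, 0]   # relative peak areas for four T-RFs (invented)
site_b = [10, 0, 5, 0]   # identical community
site_c = [0, 4, 0, 6]    # no shared T-RFs
```

Identical profiles give a dissimilarity of 0 and completely disjoint profiles give 1; the resulting similarity matrix would then feed the non-metric multi-dimensional scaling step.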

2.
When a ground and vegetation cover factor related to soil erosion is mapped with the aid of remotely sensed data, a cost-efficient sample design to collect ground data and to obtain an accurate map is required. However, the supports used to collect ground data are often smaller than the desirable pixels used for mapping, which leads to complexity in developing procedures for sample design and mapping. For these purposes, a sampling and mapping method was developed by integrating stratification and an up-scaling method in geostatistics — block cokriging with Landsat Thematic Mapper imagery. This method is based on spatial correlation and stratified sampling. It scales up not only the ground sample data but also the uncertainties associated with the data aggregation from smaller supports to larger pixels or blocks. This method uses the advantages of both stratification and block cokriging variance-based sample design, which leads to sample designs with variable grid spacing, and thus significantly increases the unit cost-efficiency of sample data in sampling and mapping. This outcome was verified by the results of this study.

3.
In scientific research, integration and synthesis require a common understanding of where data come from, how much they can be trusted, and what they may be used for. To make such an understanding computer-accessible requires standards for exchanging richly annotated data. The challenges of conveying reusable data are particularly acute in regard to evolutionary comparative analysis, which comprises an ever-expanding list of data types, methods, research aims, and subdisciplines. To facilitate interoperability in evolutionary comparative analysis, we present NeXML, an XML standard (inspired by the current standard, NEXUS) that supports exchange of richly annotated comparative data. NeXML defines syntax for operational taxonomic units, character-state matrices, and phylogenetic trees and networks. Documents can be validated unambiguously. Importantly, any data element can be annotated, to an arbitrary degree of richness, using a system that is both flexible and rigorous. We describe how the use of NeXML by the TreeBASE and Phenoscape projects satisfies user needs that cannot be satisfied with other available file formats. By relying on XML Schema Definition, the design of NeXML facilitates the development and deployment of software for processing, transforming, and querying documents. The adoption of NeXML for practical use is facilitated by the availability of (1) an online manual with code samples and a reference to all defined elements and attributes, (2) programming toolkits in most of the languages used commonly in evolutionary informatics, and (3) input-output support in several widely used software applications. An active, open, community-based development process enables future revision and expansion of NeXML.

4.
Ewing G, Nicholls G, Rodrigo A. Genetics 2004, 168(4):2407-2420
We present a Bayesian statistical inference approach for simultaneously estimating mutation rate, population sizes, and migration rates in an island-structured population, using temporal and spatial sequence data. Markov chain Monte Carlo is used to collect samples from the posterior probability distribution. We demonstrate that this chain implementation successfully reaches equilibrium and recovers truth for simulated data. A real HIV DNA sequence data set with two demes, semen and blood, is used as an example to demonstrate the method by fitting asymmetric migration rates and different population sizes. This data set exhibits a bimodal joint posterior distribution, with modes favoring different preferred migration directions. This full data set was subsequently split temporally for further analysis. Qualitative behavior of one subset was similar to the bimodal distribution observed with the full data set. The temporally split data showed significant differences in the posterior distributions and estimates of parameter values over time.
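The Markov chain Monte Carlo machinery the authors use can be illustrated with a bare random-walk Metropolis sampler; the toy target below (a normal likelihood with a flat prior, data and names invented) stands in for their far richer migration-rate posterior.

```python
import math
import random

def metropolis(log_post, init, steps, scale=0.5, seed=1):
    """Random-walk Metropolis: propose a Gaussian step, accept with
    probability min(1, posterior ratio); the collected samples
    approximate the posterior distribution."""
    rng = random.Random(seed)
    x, lx = init, log_post(init)
    samples = []
    for _ in range(steps):
        prop = x + rng.gauss(0, scale)
        lp = log_post(prop)
        if math.log(rng.random()) < lp - lx:
            x, lx = prop, lp
        samples.append(x)
    return samples

data = [1.8, 2.2, 2.0, 1.9, 2.1]  # invented observations

def log_post(mu):
    # N(mu, 1) log-likelihood with a flat prior on mu
    return -0.5 * sum((d - mu) ** 2 for d in data)

chain = metropolis(log_post, init=0.0, steps=5000)
burned = chain[1000:]              # discard burn-in
posterior_mean = sum(burned) / len(burned)
```

Checking that such a chain has reached equilibrium, and coping with multimodal targets like the bimodal migration posterior reported here, are exactly the practical concerns this machinery raises.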

5.
6.
Modeling and real-time prediction of classical swine fever epidemics
We propose a new method to analyze outbreak data of an infectious disease such as classical swine fever. The underlying model is a two-type branching process. It is used to deduce information concerning the epidemic from detected cases. In particular, the method leads to prediction of the future course of the epidemic and hence can be used as a basis for control policy decisions. We test the model with data from the large 1997-1998 classical swine fever epidemic in The Netherlands. It turns out that our results are in good agreement with the data.
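The underlying model is a two-type branching process; the sketch below collapses it to a single type with Poisson offspring (all parameters invented) just to show how a branching process turns a reproduction number into outbreak-size predictions.

```python
import math
import random

def poisson(lam, rng):
    """Stdlib-only Poisson draw (Knuth's multiplication method)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def outbreak_size(r0, rng, cap=10_000):
    """Total cases when each case infects Poisson(r0) new herds;
    subcritical epidemics (r0 < 1) die out almost surely."""
    active, total = 1, 1
    while active and total < cap:
        new = sum(poisson(r0, rng) for _ in range(active))
        active, total = new, total + new
    return total

rng = random.Random(42)
sizes = [outbreak_size(0.8, rng) for _ in range(200)]
mean_size = sum(sizes) / len(sizes)  # theory: 1 / (1 - r0) = 5 cases
```

Running such simulations forward from the currently detected cases is one way a branching model yields real-time predictions of an epidemic's future course.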

7.
DNA microarrays have been used in applications ranging from the assignment of gene function to analytical uses in prognostics. However, the detection sensitivity, cross-hybridization, and reproducibility of these arrays can affect experimental design and data interpretation. Moreover, several technologies are available for the fabrication of oligonucleotide microarrays. We review these technologies and their performance attributes and, with data sets generated from human brain RNA, present statistical tools and methods to analyze data quality and to mine and visualize the data. Our data show high reproducibility and should allow an investigator to discern biological and regional variability from differential expression. Although we have used brain RNA as a model system to illustrate some of these points, the oligonucleotide arrays and methods employed in this study can be used with cell lines, tissue sections, blood, and other fluids. To further demonstrate this point, we provide data generated from total RNA sample sizes of 200 ng.

8.
Ho SY, Hsieh CH, Chen HM, Huang HL. Bio Systems 2006, 85(3):165-176
An accurate classifier with linguistic interpretability using a small number of relevant genes is beneficial to microarray data analysis and the development of inexpensive diagnostic tests. Several frequently used techniques for designing classifiers of microarray data, such as support vector machines, neural networks, k-nearest neighbor, and logistic regression models, suffer from low interpretability. This paper proposes an interpretable gene expression classifier (named iGEC) with an accurate and compact fuzzy rule base for microarray data analysis. The design of iGEC has three objectives to be simultaneously optimized: maximal classification accuracy, minimal number of rules, and minimal number of used genes. An "intelligent" genetic algorithm (IGA) is used to efficiently solve the design problem with a large number of tuning parameters. The performance of iGEC is evaluated using eight commonly used data sets. It is shown that iGEC has an accurate, concise, and interpretable rule base, with an average test classification accuracy of 87.9%, 3.9 rules (1.1 rules per class), and 5.0 used genes. Moreover, iGEC not only has better performance than the existing fuzzy rule-based classifier in terms of the above-mentioned objectives, but is also more accurate than some existing non-rule-based classifiers.

9.
Mass-spectrometry-based bottom-up proteomics is the main method for comprehensive proteome analysis, and the rapid evolution of instrumentation and data analysis has made the technology widely available. Data visualization is an integral part of the analysis process and is crucial for the communication of results, a major challenge given the immense complexity of MS data. In this review, we provide an overview of commonly used visualizations, starting with raw data from traditional and novel MS technologies, then basic peptide- and protein-level analyses, and finally visualization of highly complex datasets and networks. We specifically provide guidance on how to critically interpret and discuss the multitude of different proteomics data visualizations. Furthermore, we highlight Python-based libraries and other open science tools that can be applied for independent and transparent generation of customized visualizations. To further encourage programmatic data visualization, we provide the Python code used to generate all data figures in this review on GitHub (https://github.com/MannLabs/ProteomicsVisualization).
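One of the basic protein-level visualizations in this space is the volcano plot; its coordinates can be computed with nothing but the standard library (function name and proteins invented; actual plotting would use matplotlib or a similar library, not shown here).

```python
import math

def volcano_points(fold_changes, p_values):
    """(log2 fold change, -log10 p) coordinates for a volcano plot,
    a standard view of differential protein abundance: effect size on
    the x-axis, statistical significance on the y-axis."""
    return [(math.log2(fc), -math.log10(p))
            for fc, p in zip(fold_changes, p_values)]

# Three hypothetical proteins: up-regulated, down-regulated, unchanged.
points = volcano_points([2.0, 0.5, 1.0], [0.001, 0.01, 0.9])
```

Proteins far from the origin in both directions (large |x|, large y) are the candidates a reader's eye should be drawn to, which is exactly the communicative purpose the review emphasizes.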

10.
The scattering cross-section of atoms in biological macromolecules for both elastically and inelastically scattered electrons is approximately 100,000 times larger than that for x-rays. Therefore, much smaller (<1 μm) and thinner (<0.01 μm) protein crystals than those used for x-ray crystallography can be used to analyze molecular structures by electron crystallography. Inelastic scattering, however, is a serious problem. We examined electron diffraction data from thin three-dimensional (3-D) crystals (600-750 Å thick) and two-dimensional (2-D) crystals (approximately 60 Å thick), both at 93 K, with an energy-filtering electron microscope operated at an accelerating voltage of 200 kV. Removal of inelastically scattered electrons significantly improved intensity data statistics and the R(Friedel) factor in every resolution range up to 3-Å resolution. The effect of energy filtering was more prominent for thicker crystals but was significant even for thin crystals. These filtered data sets showed better intensity statistics even in comparison with data sets collected at 4 K and an accelerating voltage of 300 kV without energy filtering. Thus, the energy filter will be an effective and important tool in the structure analysis of thin 3-D and 2-D crystals, particularly when data are collected at high tilt angles.

11.
We propose an integrative approach that combines structural magnetic resonance imaging (MRI) data, diffusion tensor imaging (DTI) data, neuropsychological data, and genetic data to predict early-onset obsessive compulsive disorder (OCD) severity. From a cohort of 87 patients, 56 with complete information were used in the present analysis. First, we performed a multivariate genetic association analysis of OCD severity with 266 genetic polymorphisms; this analysis was used to select and prioritize the SNPs included in the model. Second, we split the sample into a training set (N = 38) and a validation set (N = 18). Third, entropy-based measures of information gain were used for feature selection with the training subset. Fourth, the selected features were fed into two supervised methods of class prediction based on machine learning, using the leave-one-out procedure with the training set. Finally, the resulting model was validated with the validation set. Nine variables were used for the creation of the OCD severity predictor, including six genetic polymorphisms and three variables from the neuropsychological data. The developed model classified child and adolescent patients with OCD by disease severity with an accuracy of 0.90 in the testing set and 0.70 in the validation sample. Beyond its clinical applicability, the combination of particular neuropsychological, neuroimaging, and genetic characteristics could enhance our understanding of the neurobiological basis of the disorder.
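The entropy-based information-gain criterion used for feature selection can be written out directly; the SNP and severity labels below are invented stand-ins for the study's variables.

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((labels.count(v) / n) * math.log2(labels.count(v) / n)
                for v in set(labels))

def information_gain(feature, labels):
    """Entropy reduction in the labels from splitting on a discrete feature:
    H(labels) minus the size-weighted entropies of the subsets."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

severity = ['high', 'low', 'high', 'low']  # invented class labels
snp      = [1, 0, 1, 0]                    # perfectly predictive genotype
noise    = [1, 1, 0, 0]                    # uninformative genotype
```

A perfectly predictive feature recovers the full label entropy as gain, while an uninformative one scores zero; ranking features by this gain is the selection step described above.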

12.
Life cycle inventory (LCI) is becoming an established environmental management tool that quantifies all resource usage and waste generation associated with providing specific goods or services to society. LCIs are increasingly used by industry as well as policy makers to provide a holistic ‘macro’ overview of the environmental profile of a good or service. This information, effectively combined with relevant information obtained from other environmental management tools, is very useful in guiding strategic environmental decision making. LCIs are very data-intensive, and there is a risk that they imply a level of accuracy that does not exist. This is especially true today, because the availability of accurate LCI data is limited. Also, it is not easy for LCI users, decision-makers and other interested parties to differentiate between ‘good quality’ and ‘poor quality’ LCI data. Several data quality requirements for ‘good’ LCI data can be defined only in relation to the specific study in which they are used. In this paper we show how and why the use of a common LCI database for some of the more commonly used LCI data, together with increased documentation and harmonisation of the data quality features of all LCI data, is key to the further development of LCI as a useful and pragmatic environmental management tool. Initiatives already underway to make this happen are also described.

13.
Tree-ring width (TRW) and maximum (MXD), mean (MED), and minimum (MID) wood density were investigated in samples from the vicinity of the Tuchola Forest Biosphere Reserve (northern Poland) in an attempt to distinguish the relative importance of climate and insect attack on the growth of Norway spruce. Selected climate parameters were used in a multiple regression to predict tree-ring width during insect outbreaks, with AICc used for model selection. Additionally, k-means clustering was used to group the yearly data of TRW, MXD, and MID together with the data on insect outbreaks. The respective climate data and data on insect outbreaks during the years 1962–1996 revealed a strong influence of May precipitation on TRW and of insect outbreaks on MID. Missing or narrow tree rings and lower MXD together with higher MID might indicate increased insect activity.
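The AICc comparison behind the model selection step can be sketched in a few lines; the precipitation and ring-width numbers below are invented, and k counts the estimated parameters including the error variance.

```python
import math

def fit_line(xs, ys):
    """One-predictor ordinary least squares; returns the residual sum of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def aicc(rss, n, k):
    """Corrected AIC for a least-squares fit with k estimated parameters;
    the correction term guards against overfitting small samples."""
    return n * math.log(rss / n) + 2 * k + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical May precipitation (mm) vs. ring width (mm), with a clear trend.
precip = [30, 45, 50, 62, 70, 81, 90, 100, 115, 130]
width = [0.31, 0.44, 0.52, 0.60, 0.73, 0.79, 0.92, 0.99, 1.16, 1.28]
n = len(precip)
rss_null = sum((w - sum(width) / n) ** 2 for w in width)  # intercept-only
rss_trend = fit_line(precip, width)
# Lower AICc wins; the trend model spends one extra parameter on the slope.
prefer_trend = aicc(rss_trend, n, 3) < aicc(rss_null, n, 2)
```

With a genuine precipitation signal, the slope model's drop in residual error outweighs its parameter penalty, so AICc selects it over the intercept-only null.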

14.
A variety of methods have been used to make evolutionary inferences based on the spatial distribution of biological data, including reconstructing population history and detecting the geographic pattern of natural selection. This article provides an examination of geostatistical analysis, a method used widely in geology but not often applied in biological anthropology. Geostatistical analysis begins with the examination of a variogram, a plot of a biological distance measure against the geographic distance between data points, which provides information on the extent and pattern of spatial correlation. The results of variogram analysis are used for interpolating values at unknown data points in order to construct a contour map, a process known as kriging. The methods of geostatistical analysis, along with a discussion of potential problems, are applied to a large data set of anthropometric measures for 197 populations in Ireland. The geostatistical analysis reveals two major sources of spatial variation. One pattern, seen for overall body and craniofacial size, shows an east-west cline most likely reflecting the combined effects of past population dispersal and settlement. The second pattern is seen for craniofacial height and shows an isolation-by-distance pattern reflecting rapid spatial changes in the midlands region of Ireland, perhaps attributable to the genetic impact of the Vikings. The correspondence of these results with other analyses of these data, and the additional insights generated from variogram analysis and kriging, illustrate the potential utility of geostatistical analysis in biological anthropology.
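The variogram at the heart of this method is easy to compute on a toy transect; the sketch below uses an invented 1-D cline, whereas real use bins pairs by distance in two dimensions.

```python
def empirical_variogram(coords, values, lags):
    """Classical (Matheron) semivariance estimator: for each lag h, half
    the mean squared difference between all pairs of points separated by
    exactly h (1-D transect sketch; 2-D use bins pairs by distance)."""
    gamma = {}
    for h in lags:
        sq = [(values[i] - values[j]) ** 2
              for i in range(len(coords)) for j in range(i + 1, len(coords))
              if abs(coords[i] - coords[j]) == h]
        gamma[h] = sum(sq) / (2 * len(sq))
    return gamma

# A smooth east-west cline: semivariance should rise with separation.
coords = list(range(10))
stature = [160 + 0.8 * x for x in coords]  # hypothetical anthropometric cline
gamma = empirical_variogram(coords, stature, lags=[1, 2, 4])
```

A semivariance that climbs steadily with lag, as here, is the variogram signature of a cline; a fitted variogram model would then supply the weights for kriging the contour map.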

15.
Optimization of fermentation processes is a difficult task that relies on an understanding of the complex effects of processing inputs on productivity and quality outputs. Because of the complexity of these biological systems, traditional optimization methods utilizing mathematical models and statistically designed experiments are less effective, especially on a production scale. At the same time, information is being collected on a regular basis during the course of normal manufacturing and process development that is rarely fully utilized. We are developing an optimization method in which historical process data is used to train an artificial neural network for correlation of processing inputs and outputs. Subsequently, an optimization routine is used in conjunction with the trained neural network to find optimal processing conditions given the desired product characteristics and any constraints on inputs. Wine processing is being used as a case study for this work. Using data from wine produced in our pilot winery over the past 3 years, we have demonstrated that trained neural networks can be used successfully to predict the yeast-fermentation kinetics, as well as chemical and sensory properties of the finished wine, based solely on the properties of the grapes and the intended processing. To accomplish this, a hybrid neural network training method, Stop Training with Validation (STV), has been developed to find the most desirable neural network architecture and training level. As industrial historical data will not be evenly spaced over the entire possible search space, we have also investigated the ability of the trained neural networks to interpolate and extrapolate with data not used during training. Because a company will utilize its own existing process data for this method, the result of this work will be a general fermentation optimization method that can be applied to fermentation processes to improve quality and productivity.
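The idea behind Stop Training with Validation, halting where held-out error bottoms out rather than training to convergence, can be reduced to a one-parameter sketch (the data pairs are invented; the paper trains full neural networks).

```python
def train_with_validation(train, val, lr=0.01, epochs=200):
    """Gradient descent on the toy model y = w * x, keeping the weight with
    the lowest validation error: a bare-bones version of the 'Stop Training
    with Validation' idea of not training past the held-out optimum."""
    w, best_w, best_err = 0.0, 0.0, float('inf')
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        w -= lr * grad
        val_err = sum((w * x - y) ** 2 for x, y in val) / len(val)
        if val_err < best_err:
            best_err, best_w = val_err, w
    return best_w

# Hypothetical input/output pairs (e.g. a grape property vs. a wine property).
train_data = [(1, 2.1), (2, 3.9), (3, 6.2)]
val_data = [(1.5, 3.0), (2.5, 5.0)]
w = train_with_validation(train_data, val_data)
```

Here the training set pulls the weight slightly past the value the validation set prefers; keeping the validation-optimal weight is what guards against overfitting noisy historical data.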

16.
We discuss the problem of modelling survival/mortality and growth data that are skewed with excess zeros, a common occurrence in biological and environmental studies. The method presented here allows us to utilize both the survival/mortality and growth data when both data sets contain a large proportion of zeros. The method consists of four stages. Firstly, the original data are divided into two sets: one contains all the surviving organisms and the other all of the mortalities. Secondly, we calculate the actual growth of the surviving organisms and of the mortalities. Thirdly, we count the number of surviving organisms for which growth has occurred and the number where no growth occurred, and the same counting procedure is carried out on the mortalities. Fourthly, we model the survival/mortality data and growth/no-growth data using logistic regression and separately model the growth data using an ordinary regression; the three models are then combined to estimate the expected growth for a specific set of values of the explanatory variables. If we used another statistical method that did not involve the dead mussels or those with no growth, some of the information provided by these mussels would be lost. Using the method we propose, however, all of the data collected are used to achieve an optimal estimation of mussel growth. A case study of survival and growth of blue mussels (Mytilus galloprovincialis) and ribbed mussels (Aulacomya atra maoriana) translocated from their natural distribution to different depths and sites along the axis of Doubtful Sound, New Zealand, is used for illustration.
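Stripped of covariates and the separate logistic and ordinary regression fits, the final combination step amounts to a hurdle-style estimate (the growth numbers below are invented).

```python
def hurdle_estimate(growth):
    """Two-part estimate for zero-inflated growth data: the probability of
    any growth times the mean of the positive growth values. The paper fits
    these two parts with logistic and ordinary regressions on explanatory
    variables; this strips the idea down to the covariate-free case."""
    positive = [g for g in growth if g > 0]
    if not positive:
        return 0.0
    return (len(positive) / len(growth)) * (sum(positive) / len(positive))

# Hypothetical mussel growth increments (mm): half the records are zeros.
growth = [0.0, 0.0, 1.2, 0.8, 0.0, 1.0]
expected = hurdle_estimate(growth)
```

Averaging the raw data would give the same number here, but only the two-part decomposition lets each component be modelled on its own scale with its own predictors, which is the point of the four-stage method.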

17.
Restriction endonuclease analysis of mtDNA was used to examine the genetic relatedness of several geographically separated isolines of the Drosophila mercatorum subgroup. In addition, we examined the temporal and spatial distribution of two mtDNA restriction site polymorphisms produced by the enzymes BstEII and BstNI at a single locality: Kamuela, Hawaii. Due to small sample sizes of some collections and the undesirable dependence of the variance of a polymorphism-frequency estimate on the frequency itself, an arcsine square root transformation of the frequency data was used. We also used an Fst estimator of the transformed frequencies to demonstrate considerable spatial and temporal differentiation within the Kamuela population. In contrast, isozyme data from the same population reveal no pattern of differentiation. The temporal and geographic heterogeneity and population subdivision detected with the mtDNA analysis are also consistent with the known dispersal behavior and ecological constraints of this species. The mtDNA data in conjunction with the isozyme data show that the population structure of the Kamuela D. mercatorum is close to the boundary line separating panmixia from subdivision, a conclusion that could not be made from isozyme data alone.
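The arcsine square root transformation applied to the frequency data is a one-liner:

```python
import math

def arcsin_sqrt(p):
    """Variance-stabilizing transform for a polymorphism frequency p in
    [0, 1]: on the transformed scale the sampling variance is roughly
    1/(4n) regardless of p, removing the frequency-variance dependence
    that motivates its use here."""
    return math.asin(math.sqrt(p))

transformed = [arcsin_sqrt(p) for p in (0.0, 0.5, 1.0)]
```

The transform maps [0, 1] onto [0, π/2] and stretches the scale near the boundaries, which is where the variance of a raw frequency estimate shrinks.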

18.
Wang J, Fan HC, Behr B, Quake SR. Cell 2012, 150(2):402-412
Meiotic recombination and de novo mutation are the two main contributions toward gamete genome diversity, and many questions remain about how an individual human's genome is edited by these two processes. Here, we describe a high-throughput method for single-cell whole-genome analysis that was used to measure the genomic diversity in one individual's gamete genomes. A microfluidic system was used for highly parallel sample processing and to minimize nonspecific amplification. High-density genotyping results from 91 single cells were used to create a personal recombination map, which was consistent with population-wide data at low resolution but revealed significant differences from pedigree data at higher resolution. We used the data to test for meiotic drive and found evidence for gene conversion. High-throughput sequencing on 31 single cells was used to measure the frequency of large-scale genome instability, and deeper sequencing of eight single cells revealed de novo mutation rates with distinct characteristics.

19.
Cover-abundance estimates are commonly employed in phytosociological investigations to record the performance of species. Because the coded values are on an ordinal scale of measure, various authors have suggested that some transformation is necessary before such values can be used for classification and ordination. However, it is not clear that transformation is a sufficient treatment, and it would seem preferable to use ordinal data directly. In this paper we examine such direct use of partial rankings and show that several dissimilarity measures can be defined for this case without invoking any transformations. They include dissimilarity measures associated with various rank correlation measures and with distances between strings; all the measures are variant forms of Hausdorff's interset distance. Certain other kinds of data, such as those employing dominant and subdominant species and the dry-weight-rank estimation of biomass, are also on an ordinal scale and could be analysed using similar techniques. To illustrate the approach, a string dissimilarity measure is used to analyse a set of data from Slovakian grasslands which appears to reflect a simple gradient. The original data were recorded with 10 classes of performance and are analysed using hierarchical and nondeterministic, overlapping classifications.
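One concrete instance of a dissimilarity measure that uses the ordinal codes directly, with no transformation, is a Kendall-style discordance count over species pairs (the cover codes below are invented; this is an illustration, not one of the paper's specific measures).

```python
def rank_discordance(a, b):
    """Fraction of species pairs whose cover-abundance codes are ordered
    oppositely in two releves. Only the ordering of the codes matters,
    never their arithmetic values; tied pairs are not counted as
    discordant, so partial rankings are handled directly."""
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    discordant = sum(1 for i, j in pairs
                     if (a[i] - a[j]) * (b[i] - b[j]) < 0)
    return discordant / len(pairs)

# Ordinal cover-abundance codes for three species in three releves.
r1, r2, r3 = [1, 3, 5], [2, 4, 5], [5, 3, 1]
```

Identical orderings score 0, fully reversed orderings score 1, and the measure is unchanged by any monotone recoding of the classes, which is precisely the invariance an ordinal-scale measure should have.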

20.
Palm-Pitviper (Bothriechis) Phylogeny, mtDNA, and Consilience
The phylogeny of the neotropical palm-pitviper genus Bothriechis has been previously inferred from morphology and allozymes. These nuclear-based data sets were found to be congruent and also consilient with the geologic history of the region. We present mtDNA sequence data as an additional data set in the inference of Bothriechis phylogeny and analyze it separately and combined with previous data. The mtDNA phylogeny is incongruent with the nuclear data sets. Based on a number of factors, we hypothesize that the incongruence is due to both mtDNA introgression and lineage sorting. We argue that mtDNA represents extrinsic data and as such should be used as a consilient data set.
