Similar Documents
20 similar documents found (search time: 31 ms)
1.
2.
Most ecologists use statistical methods as their main analytical tools when analyzing data to identify relationships between a response and a set of predictors; thus, they treat all analyses as hypothesis tests or exercises in parameter estimation. However, little or no prior knowledge about a system can lead to creation of a statistical model or models that do not accurately describe major sources of variation in the response variable. We suggest that under such circumstances data mining is more appropriate for analysis. In this paper we 1) present the distinctions between data-mining (usually exploratory) analyses and parametric statistical (confirmatory) analyses, 2) illustrate 3 strengths of data-mining tools for generating hypotheses from data, and 3) suggest useful ways in which data mining and statistical analyses can be integrated into a thorough analysis of data to facilitate rapid creation of accurate models and to guide further research.
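To make the exploratory/confirmatory distinction concrete, here is a minimal sketch (not from the paper; the data and variables are simulated) contrasting a pre-specified linear model with a regression tree used for hypothesis generation:

```python
# Hypothetical sketch: exploratory tree vs. confirmatory linear model.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Simulated ecological data: response depends nonlinearly on one predictor.
X = rng.uniform(0, 10, size=(200, 3))          # e.g. rainfall, temperature, cover
y = np.where(X[:, 0] > 5, 2.0, 0.5) + rng.normal(0, 0.2, 200)  # threshold effect

# Confirmatory analysis: a pre-specified linear model misses the threshold.
linear = LinearRegression().fit(X, y)
print("linear R^2:", linear.score(X, y))

# Exploratory analysis: a shallow tree reveals the split, suggesting a
# hypothesis (a threshold near X0 = 5) that can then be tested formally.
tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print("tree R^2:  ", tree.score(X, y))
```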

3.
DNA microarray data are affected by variations from a number of sources. Before these data can be used to infer biological information, the extent of these variations must be assessed. Here we describe an open source software package, lcDNA, that provides tools for filtering, normalizing, and assessing the statistical significance of cDNA microarray data. The program employs a hierarchical Bayesian model and Markov Chain Monte Carlo simulation to estimate gene-specific confidence intervals for each gene in a cDNA microarray data set. This program is designed to perform these primary analytical operations on data from two-channel spotted, or in situ synthesized, DNA microarrays.
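As an illustration of the general approach — not the actual lcDNA model — the following sketch uses a random-walk Metropolis sampler to estimate a 95% credible interval for one gene's mean log-ratio, assuming Gaussian replicates with a fixed noise level:

```python
# Illustrative MCMC sketch (not the lcDNA model): credible interval for
# one gene's mean log-ratio from a few hypothetical replicate spots.
import numpy as np

rng = np.random.default_rng(1)
log_ratios = np.array([0.8, 1.1, 0.9, 1.3])   # hypothetical replicates
sigma = 0.3                                    # assumed measurement noise

def log_post(mu):
    # Flat prior on mu; Gaussian likelihood for the replicates.
    return -np.sum((log_ratios - mu) ** 2) / (2 * sigma ** 2)

samples, mu = [], 0.0
for _ in range(20000):
    prop = mu + rng.normal(0, 0.2)            # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
        mu = prop
    samples.append(mu)

lo, hi = np.percentile(samples[2000:], [2.5, 97.5])  # drop burn-in
print(f"95% credible interval for mean log-ratio: [{lo:.2f}, {hi:.2f}]")
```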

4.
Many sequenced genes are mainly annotated through automatic transfer of annotation from similar sequences. Manual comparison of results or intermediate results from different tools can help avoid wrong annotations and give hints to the function of a gene even if none of the automated tools could return any result. AFAWE simplifies the task of manual functional annotation by running different tools and workflows for automatic function prediction and displaying the results in a way that facilitates comparison. Because all programs are executed as web services, AFAWE is easily extensible and can directly query primary databases, thereby always using the most up-to-date data sources. Visual filters help to distinguish trustworthy results from non-significant results. Furthermore, an interface to add detailed manual annotation to each gene is provided, which can be displayed to other users.
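A toy sketch of the kind of side-by-side comparison such a tool enables; the tool names and outputs below are invented for illustration:

```python
# Hypothetical comparison table across annotation tools for one gene.
predictions = {
    "BLAST best hit":   "putative kinase",
    "InterPro domains": "protein kinase domain",
    "Orthology-based":  None,                 # tool returned no result
}

for tool, annotation in predictions.items():
    flag = "ok" if annotation else "no result"
    print(f"{tool:18s} | {annotation or '-':24s} | {flag}")
```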

5.
6.
Acta Hydrobiologica Sinica (《水生生物学报》), 2015, 39(6): 1076–1084
To investigate the biological effects of the traditional Chinese medicine Yupingfeng powder (玉屏风散) in aquaculture, experimental diets were prepared by supplementing a basal diet with 2.5%, 5%, or 10% Yupingfeng powder or with 1% Qishen powder (芪参散), and fed to experimental fish for two weeks. The effects of the two herbal preparations on growth, non-specific immunity, and immune-related gene expression in tilapia (body weight ca. 220 g), and on the immune protection rate in grass carp (body weight ca. 20 g), were then examined. The results showed that all herbal groups increased the weight gain rate of tilapia, with the 10% Yupingfeng group showing the most pronounced effect. The hepatosomatic, renosomatic, and splenosomatic indices and the haematological parameters of tilapia in all experimental groups did not differ significantly from controls. Aspartate aminotransferase (AST) and alanine aminotransferase (ALT) levels in all herbal groups were no higher than in the blank control. Alkaline phosphatase (AKP) levels in the 2.5% and 10% Yupingfeng groups and the 1% Qishen group were higher than in controls (P<0.05). Lysozyme activity was higher in all treated groups than in controls, with the 2.5% Yupingfeng group differing significantly (P<0.05), and respiratory burst activity was higher in all Yupingfeng groups than in controls (P<0.05). Hepatic heat shock protein 70 (HSP70) expression was up-regulated in all experimental groups, renal HSP70 was up-regulated in the 2.5% and 5% Yupingfeng groups, and splenic HSP70 was down-regulated in all experimental groups. Hepatic transforming growth factor receptor III (TGFR III) expression was up-regulated in the 2.5% Yupingfeng group, renal TGFR III was up-regulated in the 2.5% Yupingfeng and 1% Qishen groups, and splenic TGFR III was up-regulated in all experimental groups. All Yupingfeng doses improved the resistance of grass carp to Aeromonas hydrophila, reflected in reduced mortality and fewer bacteria entering the blood, with the 5% Yupingfeng group performing best. The experiment indicates that Yupingfeng powder effectively promotes tilapia growth, enhances non-specific immune indices and immune-related gene expression to varying degrees, and effectively improves the immune protection rate of grass carp.

7.
8.
This review uses microarray data from a clonal osteoblast cell model to demonstrate how various current and emerging bioinformatic tools can be used to understand, at a more global and comprehensible level, how cells grow and differentiate. In this example, BMP2 was used to stimulate growth and differentiation of osteoblasts to a mineralized matrix. Methods for clustering gene expression data, statistical evaluation of the data, and new tools for deriving deeper insight into a particular biological problem are discussed, along with how these tools can be obtained. Also included are tools that let biologists compare their datasets with others, as well as examples of future bioinformatic tools for building gene networks and pathways from a given set of data.
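As one concrete instance of the clustering methods discussed, the following sketch (with a simulated expression matrix) performs average-linkage hierarchical clustering on correlation distances using SciPy:

```python
# Minimal sketch (not from the review): hierarchical clustering of a
# gene-expression matrix to group genes with similar profiles.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
# Hypothetical matrix: 50 genes x 6 time points after BMP2 stimulation.
expr = np.vstack([rng.normal(0, 1, (25, 6)),
                  rng.normal(3, 1, (25, 6))])

Z = linkage(expr, method="average", metric="correlation")
clusters = fcluster(Z, t=2, criterion="maxclust")
print("genes per cluster:", np.bincount(clusters)[1:])
```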

9.
The Bioinformatics Resource Manager (BRM) is a software environment that provides the user with data management, retrieval and integration capabilities. Designed in collaboration with biologists, BRM simplifies mundane analysis tasks of merging microarray and proteomic data across platforms, facilitates integration of users' data with functional annotation and interaction data from public sources and provides connectivity to visual analytic tools through reformatting of the data for easy import or dynamic launching capability. BRM is developed using Java and other open-source technologies for free distribution. AVAILABILITY: BRM, sample data sets and a user manual can be downloaded from http://www.sysbio.org/dataresources/brm.stm.
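The merge step that BRM automates can be pictured with a toy pandas example; the identifiers and column names are hypothetical:

```python
# Toy sketch: joining microarray and proteomics measurements on a shared
# gene identifier, then attaching functional annotation.
import pandas as pd

microarray = pd.DataFrame({"gene_id": ["g1", "g2", "g3"],
                           "mrna_log2fc": [1.2, -0.4, 0.8]})
proteomics = pd.DataFrame({"gene_id": ["g2", "g3", "g4"],
                           "protein_log2fc": [-0.1, 0.9, 1.5]})
annotation = pd.DataFrame({"gene_id": ["g1", "g2", "g3", "g4"],
                           "go_term": ["kinase", "tf", "kinase", "unknown"]})

merged = (microarray.merge(proteomics, on="gene_id", how="outer")
                    .merge(annotation, on="gene_id", how="left"))
print(merged)
```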

10.
Expanding digital data sources, including social media, online news articles and blogs, provide an opportunity to understand better the context and intensity of human-nature interactions, such as wildlife exploitation. However, online searches encompassing large taxonomic groups can generate vast datasets, which can be overwhelming to filter for relevant content without the use of automated tools. The variety of machine learning models available to researchers, and the need for manually labelled training data with an even balance of labels, can make applying these tools challenging. Here, we implement and evaluate a hierarchical text classification pipeline which brings together three binary classification tasks with increasingly specific relevancy criteria. Crucially, the hierarchical approach facilitates the filtering and structuring of a large dataset, of which relevant sources make up a small proportion. Using this pipeline, we also investigate how the accuracy with which text classifiers identify relevant and irrelevant texts is influenced by the use of different models, training datasets, and the classification task. To evaluate our methods, we collected data from Facebook, Twitter, Google and Bing search engines, with the aim of identifying sources documenting the hunting and persecution of bats (Chiroptera). Overall, the ‘state-of-the-art’ transformer-based models were able to identify relevant texts with an average accuracy of 90%, with some classifiers achieving accuracy of >95%. Whilst this demonstrates that application of more advanced models can lead to improved accuracy, comparable performance was achieved by simpler models when applied to longer documents and less ambiguous classification tasks. Hence, the benefits from using more computationally expensive models are dependent on the classification context. We also found that stratification of training data, according to the presence of key search terms, improved classification accuracy for less frequent topics within datasets, and therefore improves the applicability of classifiers to future data collection. Overall, whilst our findings reinforce the usefulness of automated tools for facilitating online analyses in conservation and ecology, they also highlight that the effectiveness and appropriateness of such tools is determined by the nature and volume of data collected, the complexity of the classification task, and the computational resources available to researchers.
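A much-simplified sketch of one binary stage of such a pipeline, using a TF-IDF bag-of-words model in place of a transformer; the texts and labels are invented:

```python
# Simplified single-stage relevance classifier (one of the binary tasks
# a hierarchical pipeline would chain together).
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["bat hunted for bushmeat in village",
         "cricket bat sale this weekend",
         "fruit bats persecuted after virus rumour",
         "baseball bats discounted online"]
labels = [1, 0, 1, 0]                     # 1 = relevant to bat exploitation

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["bats killed and sold at market"]))  # expect [1]
```

In a hierarchical pipeline, texts accepted by this stage would be passed to the next classifier with stricter relevancy criteria.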

11.
Species distribution models are popular and widely applied ecological tools. Recent increases in data availability have led to opportunities and challenges for species distribution modelling. Each data source has different qualities, determined by how it was collected. As several data sources can inform on a single species, ecologists have often analysed just one of the data sources, but this loses information, as some data sources are discarded. Integrated distribution models (IDMs) were developed to enable inclusion of multiple datasets in a single model, whilst accounting for different data collection protocols. This is advantageous because it allows efficient use of all data available, can improve estimation and account for biases in data collection. What is not yet known is when integrating different data sources does not bring advantages. Here, for the first time, we explore the potential limits of IDMs using a simulation study integrating a spatially biased, opportunistic, presence-only dataset with a structured, presence–absence dataset. We explore four scenarios based on real ecological problems: small sample sizes, low levels of detection probability, correlations between covariates and a lack of knowledge of the drivers of bias in data collection. For each scenario we ask: do we see improvements in parameter estimation or the accuracy of spatial pattern prediction in the IDM versus modelling either data source alone? We found integration alone was unable to correct for spatial bias in presence-only data. Including a covariate to explain bias or adding a flexible spatial term improved IDM performance beyond single dataset models, with the models including a flexible spatial term producing the most accurate and robust estimates. Increasing the sample size of presence–absence data and having no correlated covariates also improved estimation. These results demonstrate under which conditions integrated models provide benefits over modelling single data sources.
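A stylised sketch of the joint likelihood behind an IDM (not the paper's code): both datasets share an intensity surface, and the presence-only counts receive an extra bias term on a known covariate:

```python
# Stylised IDM joint likelihood: shared log-intensity b0 + b1*x, with a
# bias term d*z affecting only the presence-only data. All data simulated.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 400                                   # grid cells
x = rng.normal(size=n)                    # environmental covariate
z = rng.normal(size=n)                    # sampling-bias covariate
lam = np.exp(-1.0 + 1.5 * x)              # true intensity per cell

y_pa = rng.binomial(1, 1 - np.exp(-lam))              # presence-absence
y_po = rng.poisson(lam * np.exp(0.8 * z))             # biased presence-only

def nll(theta):
    b0, b1, d = theta
    lam_hat = np.exp(b0 + b1 * x)
    p = 1 - np.exp(-lam_hat)                          # cloglog link
    ll_pa = y_pa * np.log(p + 1e-12) + (1 - y_pa) * np.log(1 - p + 1e-12)
    mu_po = lam_hat * np.exp(d * z)
    ll_po = y_po * np.log(mu_po) - mu_po              # Poisson kernel
    return -(ll_pa.sum() + ll_po.sum())

fit = minimize(nll, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")
print("estimated (b0, b1, bias):", np.round(fit.x, 2))
```

Dropping the bias term `d` from the model reproduces, in miniature, the paper's finding that integration alone cannot correct for spatial bias in presence-only data.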

12.
13.
Despite substantial research activity on bioreactor design and experiments, there are very few reports of modelling tools that can be used to generate predictive models describing how bioreactor parameters affect performance. New developments in mathematics, such as sparse Bayesian feature selection methods and nonlinear model-free regression methods, offer considerable promise for modelling diverse types of data. The utility of these mathematical tools in stem cell biology is demonstrated by analysis of a large set of bioreactor data derived from the literature. In spite of the diversity of the data sources, and the inherent difficulty in representing bioreactor variables, these modelling methods were able to develop robust, quantitative, predictive models. These models relate bioreactor operational parameters to the degree of expansion of haematopoietic stem cells or their progenitors, and also identify the bioreactor variables that are most likely to affect performance across many experiments. These methods show substantial promise in assisting the design and optimisation of stem cell bioreactors.
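For the sparse Bayesian feature-selection step, scikit-learn's ARDRegression offers a readily available analogue (not necessarily the method used in the paper); the bioreactor variables below are invented stand-ins:

```python
# Hedged sketch: sparse Bayesian (ARD) regression to identify which
# operational parameters drive expansion. Data are simulated.
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(4)
# Hypothetical design matrix: 60 runs x 5 operational parameters.
X = rng.normal(size=(60, 5))              # e.g. O2, pH, feed rate, ...
fold_expansion = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(0, 0.3, 60)

model = ARDRegression().fit(X, fold_expansion)
# Coefficients near zero are pruned automatically; the survivors point to
# the parameters most likely to affect performance.
print(np.round(model.coef_, 2))
```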

14.

Background

A large number of gene prediction programs for the human genome exist. These annotation tools use a variety of methods and data sources. In the recent ENCODE genome annotation assessment project (EGASP), some of the most commonly used and recently developed gene-prediction programs were systematically evaluated and compared on test data from the human genome. AUGUSTUS was among the tools that were tested in this project.

Results

AUGUSTUS can be used as an ab initio program, that is, as a program that uses only a single genomic sequence as input. In addition, it is able to combine information from the genomic sequence under study with external hints from various sources of information. For EGASP, we used genomic sequence alignments as well as alignments to expressed sequence tags (ESTs) and protein sequences as additional sources of information. Within the category of ab initio programs, AUGUSTUS predicted significantly more genes correctly than any other ab initio program. At the same time, it predicted the smallest number of false positive genes and the smallest number of false positive exons among all ab initio programs. The accuracy of AUGUSTUS could be further improved when additional extrinsic data, such as alignments to EST, protein and/or genomic sequences, were taken into account.

Conclusion

AUGUSTUS turned out to be the most accurate ab initio gene finder among the tested tools. Moreover, it is very flexible, because it can take information from several sources into consideration simultaneously.
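The class of model behind ab initio gene finders such as AUGUSTUS (which uses a far richer generalised HMM) can be illustrated with a toy two-state HMM and Viterbi decoding; all probabilities here are invented:

```python
# Toy two-state HMM (intergenic vs. coding) decoded with Viterbi.
import numpy as np

states = ["intergenic", "coding"]
trans = np.log(np.array([[0.9, 0.1],      # P(next state | current state)
                         [0.2, 0.8]]))
emit = np.log(np.array([[0.3, 0.2, 0.2, 0.3],   # intergenic: A C G T
                        [0.2, 0.3, 0.3, 0.2]])) # coding is GC-richer
seq = ["ACGT".index(c) for c in "ATATGCGCGGCATAT"]

v = np.full((len(seq), 2), -np.inf)       # best log-probability per state
ptr = np.zeros((len(seq), 2), dtype=int)  # backpointers
v[0] = np.log(0.5) + emit[:, seq[0]]
for t in range(1, len(seq)):
    for s in range(2):
        scores = v[t - 1] + trans[:, s]
        ptr[t, s] = np.argmax(scores)
        v[t, s] = scores[ptr[t, s]] + emit[s, seq[t]]

path = [int(np.argmax(v[-1]))]            # backtrack the best path
for t in range(len(seq) - 1, 0, -1):
    path.append(ptr[t, path[-1]])
print([states[s] for s in reversed(path)])
```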

15.
16.
Plant genome databases play an important role in the archiving and dissemination of data arising from the international genome projects. Recent developments in bioinformatics, such as new software tools, programming languages and standards, have produced better access across the Internet to the data held within them. An increasing emphasis is placed on data analysis, and many resources now provide tools allied to the databases to aid in the analysis and interpretation of the data. However, a considerable wealth of information lies untapped when the databases are considered as single entities, and will only be exploited by linking them with a wide range of data sources. Data from research programs such as comparative mapping and germplasm studies may be used as tools to gain additional knowledge without additional experimentation. To date, the current plant genome databases are not comprehensively linked with each other or with these additional resources, although they are clearly moving toward this. Here, the current wealth of public plant genome databases is reviewed, together with an overview of initiatives underway to bind them into a single plant genome infrastructure.

17.
High-throughput technologies produce massive amounts of data. However, individual methods yield data specific to the technique used and biological setup. The integration of such diverse data is necessary for the qualitative analysis of information relevant to hypotheses or discoveries. It is often useful to integrate these datasets using pathways and protein interaction networks to get a broader view of the experiment. The resulting network needs to be able to focus on either the large-scale picture or on the more detailed small-scale subsets, depending on the research question and goals. In this tutorial, we illustrate a workflow useful to integrate, analyze, and visualize data from different sources, and highlight important features of tools to support such analyses.
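A minimal sketch of the integration step — projecting expression changes onto a protein-interaction network with networkx; the genes, edges and values are hypothetical:

```python
# Overlay expression data on an interaction network, then zoom in on the
# responsive subnetwork (the small-scale view the tutorial describes).
import networkx as nx

g = nx.Graph()
g.add_edges_from([("TP53", "MDM2"), ("MDM2", "AKT1"), ("AKT1", "MTOR")])

log2fc = {"TP53": 1.8, "MDM2": -0.7, "AKT1": 0.9, "MTOR": 0.1}
nx.set_node_attributes(g, log2fc, "log2fc")

# Focus the network on the responsive subset (|log2fc| > 0.5).
hot = [n for n, d in g.nodes(data=True) if abs(d["log2fc"]) > 0.5]
print(list(g.subgraph(hot).edges()))
```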

18.
We have created Bluejay, a new Java-based integrated computational environment for the exploration of genomic data. The system is capable of using almost any XML file related to genomic data; non-XML data sources can be accessed via a proxy server. Bluejay has several features new to bioinformatics, including an unlimited semantic zoom capability coupled with Scalable Vector Graphics (SVG) output; an implementation of the XLink standard, which provides access to MAGPIE Genecards as well as any BioMOBY service accessible over the Internet; and the integration of gene chip analysis tools with the functional assignments. The system can be used as a signed web applet, via Web Start, or as a local stand-alone application, with or without connection to the Internet. It is available free of charge and as open source via http://bluejay.ucalgary.ca.
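The XML-in, SVG-out idea can be pictured with a bare-bones sketch (in Python rather than Java); the input schema below is invented, not a real genome-annotation format:

```python
# Parse a toy genomic XML file and emit SVG rectangles for each gene.
import xml.etree.ElementTree as ET

xml = """<genome length="5000">
            <gene name="abc" start="400" end="1200"/>
            <gene name="xyz" start="2000" end="3100"/>
         </genome>"""

root = ET.fromstring(xml)
scale = 800 / int(root.get("length"))     # map base pairs to pixels
rects = [
    f'<rect x="{int(g.get("start")) * scale:.0f}" y="20" '
    f'width="{(int(g.get("end")) - int(g.get("start"))) * scale:.0f}" '
    f'height="10"><title>{g.get("name")}</title></rect>'
    for g in root.iter("gene")
]
svg = '<svg xmlns="http://www.w3.org/2000/svg">' + "".join(rects) + "</svg>"
print(svg)
```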

19.
Production of official statistics frequently requires expert judgement to evaluate and reconcile data of unknown and varying quality from multiple and potentially conflicting sources. Moreover, exceptional events may be difficult to incorporate in modelled estimates. Computational logic provides a methodology and tools for incorporating an analyst's judgement, integrating multiple data sources and modelling methods, ensuring transparency and replicability, and making documentation computationally accessible. Representations using computational logic can be implemented in a variety of computer-based languages for automated production. Computational logic complements standard mathematical and statistical techniques and extends the flexibility of mathematical and statistical modelling. A basic overview of computational logic is presented and its application to official statistics is illustrated with the WHO & UNICEF estimates of national immunization coverage.
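A toy illustration, in Python rather than a logic-programming language, of encoding reconciliation rules explicitly so that each estimate carries its justification; the rules and figures are invented:

```python
# Explicit, inspectable reconciliation rules for conflicting sources.
def reconcile(admin, survey, stockout=False):
    """Pick a coverage estimate from conflicting sources via explicit rules."""
    if stockout:                       # analyst judgement: exceptional event
        return min(admin, survey), "stockout rule: take the lower figure"
    if abs(admin - survey) <= 5:
        return (admin + survey) / 2, "sources agree: average them"
    return survey, "large gap: prefer the survey"

value, reason = reconcile(admin=95, survey=78)
print(f"estimate = {value}%  ({reason})")
```

Returning the rule alongside the value mirrors the transparency and replicability point: every published figure can be traced back to the judgement that produced it.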

20.
Modeling and simulation: tools for metabolic engineering.
Mathematical modeling is one of the key methodologies of metabolic engineering. Based on a given metabolic model, different computational tools for the simulation, data evaluation, systems analysis, prediction, design and optimization of metabolic systems have been developed. The currently used metabolic modeling approaches can be subdivided into structural models, stoichiometric models, carbon flux models, stationary and nonstationary mechanistic models and models with gene regulation. However, the power of a model strongly depends on its basic modeling assumptions, the simplifications made and the data sources used. Model validation turns out to be particularly difficult for metabolic systems. The different modeling approaches are critically reviewed with respect to their potential and benefits for the metabolic engineering cycle. Several tools that have emerged from the different modeling approaches, including structural pathway synthesis, stoichiometric pathway analysis, metabolic flux analysis, metabolic control analysis, optimization of regulatory architectures and the evaluation of rapid sampling experiments, are discussed.
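Flux-balance analysis, one of the stoichiometric tools mentioned, reduces to a linear program; a minimal sketch on an invented three-reaction network:

```python
# Toy flux-balance analysis: maximise output flux v3 subject to steady
# state S·v = 0 and capacity bounds. Real models have thousands of reactions.
import numpy as np
from scipy.optimize import linprog

# Rows = metabolites A, B; columns = reactions v1 (->A), v2 (A->B), v3 (B->).
S = np.array([[ 1, -1,  0],
              [ 0,  1, -1]])
bounds = [(0, 10), (0, 8), (0, None)]     # uptake limited to 10, v2 to 8

res = linprog(c=[0, 0, -1],               # maximise v3 == minimise -v3
              A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
print("optimal fluxes:", res.x)           # expect [8, 8, 8]
```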
