Similar Articles

20 similar articles found.
1.
Parallel file systems have been developed in recent years to ease the I/O bottleneck of high-end computing systems. These advanced file systems offer several data layout strategies to meet the performance goals of specific I/O workloads. However, a layout policy that performs well on one I/O workload may not perform as well on another; because data access patterns are complex and application dependent, peak I/O performance is rarely achieved. In this study, we propose a cost-intelligent data access strategy based on the principle of application-specific optimization, which improves the I/O performance of parallel file systems. We first present examples that illustrate how performance differs across data layouts. We then develop a cost model that estimates the completion time of data accesses under various layouts, so that the layout can be matched to the application: static layout optimization for applications with a dominant access pattern, and dynamic layout selection with hybrid replication for applications with complex I/O patterns. Theoretical analysis and experimental testing verify the proposed cost-intelligent layout approach. Analytical and experimental results show that the cost model is effective and that the application-specific data layout approach can provide up to a 74% performance improvement for data-intensive applications.
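As a toy illustration of the kind of cost-model-driven choice this abstract describes (the paper's actual model is richer), one can estimate a request's completion time under each candidate stripe width and pick the cheapest. The linear latency/bandwidth formula and all parameter values below are illustrative assumptions, not the paper's model:

```python
# Minimal sketch of cost-driven layout selection.
# Assumption: each contacted server adds a fixed startup latency,
# and the payload is streamed in parallel across servers.

def access_cost(request_bytes, servers, latency_s=0.001, bandwidth_bps=100e6):
    """Estimated completion time of one request striped over `servers`."""
    return servers * latency_s + (request_bytes / servers) / bandwidth_bps

def choose_layout(request_bytes, candidate_widths, **kw):
    """Return the stripe width with the lowest estimated completion time."""
    return min(candidate_widths, key=lambda n: access_cost(request_bytes, n, **kw))
```

Under these assumptions, small requests favor narrow striping (per-server startup cost dominates) while large requests favor wide striping (transfer time dominates), which mirrors why no single static layout suits every workload.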

2.
Biclustering is an important tool in microarray analysis when only a subset of genes co-regulates under a subset of conditions. Unlike standard clustering, biclustering classifies simultaneously along both the gene and condition directions of a microarray data matrix. However, the biclustering problem is inherently intractable and computationally complex. In this paper, we present a new biclustering algorithm based on a geometrical view of coherent gene expression profiles: patterns are identified with the Hough transform in a column-pair space. The algorithm is especially suitable for biclustering analysis of large-scale microarray data. Our studies show that the approach discovers significant biclusters even as noise levels and regulatory complexity increase. Furthermore, we test the ability of our method to locate biologically verifiable biclusters within an annotated set of genes.
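The geometric idea can be sketched as follows: rows belonging to an additive bicluster over columns i and j lie on the line y = x + c in column-pair space, so a one-dimensional Hough-style vote over quantized intercepts finds them. This simplified sketch handles a single column pair; the bin width and row threshold are arbitrary choices, not the paper's:

```python
from collections import defaultdict

def additive_bicluster_rows(matrix, i, j, tol=0.1, min_rows=3):
    """Vote on the quantized intercept c = col_j - col_i of each row;
    the largest bin collects the rows with (near-)constant difference,
    i.e. an additively coherent pattern over columns i and j."""
    votes = defaultdict(list)
    for r, row in enumerate(matrix):
        c_bin = round((row[j] - row[i]) / tol)
        votes[c_bin].append(r)
    best = max(votes.values(), key=len)
    return best if len(best) >= min_rows else []
```

A full method would accumulate evidence over many column pairs and noise-robust bins; this shows only the core voting step.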

3.
4.

Background

The tools and techniques used in morphometrics have always aimed to transform the physical shape of an object into a concise set of numerical data for mathematical analysis. The advent of landmark-based morphometrics opened new avenues of research, but these methods are not without drawbacks. The time investment required of trained individuals to accurately landmark a data set is significant, and the reliance on readily-identifiable physical features can hamper research efforts. This is especially true of those investigating smooth or featureless surfaces.

Methods

In this paper, we present a new method to perform this transformation for data obtained from high-resolution scanning technology. This method uses surface scans, instead of landmarks, to calculate a shape-difference metric analogous to Procrustes distance and to perform superimposition, building upon and extending the Iterative Closest Point algorithm. We also explore new ways these data can be used; for example, we can calculate an averaged surface directly and visualize point-wise shape information over it. Finally, we briefly demonstrate the method on a set of primate skulls and compare the results of the new methodology with traditional geometric morphometric analysis.
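A heavily simplified, hypothetical sketch of the superimposition step: classic ICP alternates nearest-point matching with a rigid alignment of the matched pairs. For brevity, only the translation component is estimated here; a full implementation would also recover rotation (e.g., via an SVD of the matched-point covariance):

```python
def nearest(p, pts):
    """Nearest neighbor of point p among pts (2-D, brute force)."""
    return min(pts, key=lambda q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def icp_translation(source, target, iters=10):
    """Translation-only ICP sketch: match each source point to its
    nearest target point, shift by the mean residual, and repeat."""
    src = [list(p) for p in source]
    for _ in range(iters):
        pairs = [(p, nearest(p, target)) for p in src]
        dx = sum(q[0] - p[0] for p, q in pairs) / len(pairs)
        dy = sum(q[1] - p[1] for p, q in pairs) / len(pairs)
        for p in src:
            p[0] += dx
            p[1] += dy
    return src
```

With surface scans the same loop runs on dense 3-D point sets, and the converged residual distances provide the point-wise shape information mentioned above.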

5.
Combined analysis of multiple, large datasets is a common objective in the health and biosciences. Existing methods tend to require researchers either to physically bring data together in one place or to follow a shared analysis plan and exchange results. Developed over the last 10 years, the DataSHIELD platform is a collection of R packages that reduces the challenges of these methods: the ethico-legal constraints that limit researchers' ability to pool data physically, and the analytical inflexibility of conventional approaches to sharing results. The key feature of DataSHIELD is that data from research studies stay on a server at each of the institutions responsible for the data, and each institution controls who can access its data. The platform allows an analyst to pass commands to each server and receive results that do not disclose the individual-level data of any study participant. DataSHIELD uses Opal, a data integration system used by epidemiological studies and developed by the OBiBa open-source bioinformatics project. Until now, however, the analysis of big data with DataSHIELD has been limited by the storage formats available in Opal and the analysis capabilities of the DataSHIELD R packages. We present a new architecture ("resources") for DataSHIELD and Opal that allows large, complex datasets to be used at their original location, in their original format, and with external computing facilities. We provide real big-data analysis examples from genomics and geospatial projects. For genomic data analyses, we also illustrate how to extend the resources concept to address specific big-data infrastructures such as GA4GH or EGA, and to make use of shell commands. This new infrastructure will help researchers perform data analyses in a privacy-protected way on existing data-sharing initiatives and projects. To help researchers use the framework, we describe selected packages and present an online book (https://isglobal-brge.github.io/resource_bookdown).
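The non-disclosive pattern DataSHIELD relies on can be caricatured in a few lines: each site returns only aggregates, guarded by a minimum record count, and the analyst combines them without ever seeing individual-level data. This is an illustrative sketch, not DataSHIELD's API (which is R-based); the `min_n` threshold is an assumed disclosure-control parameter:

```python
def site_summary(values, min_n=5):
    """Run at each data-holding site: return only non-disclosive
    aggregates, refusing groups small enough to identify individuals."""
    if len(values) < min_n:
        raise ValueError("disclosure risk: too few records")
    return {"sum": sum(values), "n": len(values)}

def federated_mean(summaries):
    """Run by the analyst: combine per-site aggregates into one estimate."""
    total = sum(s["sum"] for s in summaries)
    n = sum(s["n"] for s in summaries)
    return total / n
```

The "resources" architecture described above keeps this pattern but lets each site's server read its data from external formats and compute facilities rather than only from Opal tables.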

6.
The hierarchical metaregression (HMR) approach is a multiparameter Bayesian approach to meta-analysis that generalizes standard mixed-effects models by explicitly modeling the data collection process. HMR makes it possible to investigate the potential external validity of experimental results as well as to assess the internal validity of the studies included in a systematic review. It automatically identifies studies presenting conflicting evidence and downweights their influence in the meta-analysis. In addition, HMR supports cross-evidence synthesis, which combines aggregated results from randomized controlled trials to predict effectiveness in a single-arm observational study with individual participant data (IPD). In this paper, we evaluate the HMR approach using simulated data examples. We present a new real case study in diabetes research, along with a new R package called jarbes (just a rather Bayesian evidence synthesis), which automates the complex computations involved in HMR.
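For orientation on what HMR generalizes, here is a minimal classical baseline, the DerSimonian-Laird random-effects pooled estimate. This is the conventional model, not HMR itself, and it lacks HMR's bias modeling and automatic downweighting of conflicting studies:

```python
def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate: method-of-moments estimate of
    between-study variance tau^2, then inverse-variance pooling."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_re = [1.0 / (v + tau2) for v in variances]
    return sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
```

With equal within-study variances the pooled estimate reduces to the simple mean of the study effects, a symmetry HMR deliberately breaks when a study's evidence conflicts with the rest.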

7.
Many characteristics of sensorimotor control can be explained by models based on optimization and optimal control theories. However, most previous models assume that the central nervous system has precise knowledge of the sensorimotor system and its interacting environment, a viewpoint that is difficult to justify theoretically and has not been convincingly validated by experiments. To address this problem, this paper presents a new computational mechanism for sensorimotor control from the perspective of adaptive dynamic programming (ADP), which shares some features with reinforcement learning. The ADP-based model suggests that a command signal for human movement is derived directly from real-time sensory data, without the need to identify the system dynamics. An iterative learning scheme based on the proposed ADP theory is developed, along with a rigorous convergence analysis. Interestingly, the computational model reproduces the motor learning behavior observed in experiments where a divergent force field or velocity-dependent force field was present. In addition, this modeling strategy provides a clear way to perform stability analysis of the overall system. We therefore conjecture that human sensorimotor systems use an ADP-type mechanism to control movements and to adapt successfully to uncertainties in the environment.
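For intuition about the iterative learning scheme, here is a model-based caricature for a scalar linear system: policy iteration on a discrete-time LQR problem, whose fixed point satisfies the Riccati equation. The ADP formulation in the paper replaces the model terms with quantities estimated from real-time sensory data; the dynamics and costs below are arbitrary illustrative values:

```python
def adp_policy_iteration(a, b, q=1.0, r=1.0, iters=50):
    """For x+ = a*x + b*u with stage cost q*x^2 + r*u^2, alternate
    evaluation of the linear policy u = -k*x with policy improvement.
    Converges to the gain/value satisfying the discrete Riccati equation."""
    k, p = 0.0, q
    for _ in range(iters):
        acl = a - b * k                      # closed-loop dynamics
        p = (q + r * k * k) / (1 - acl * acl)  # evaluate current policy
        k = b * p * a / (r + b * b * p)        # improve the policy
    return k, p
```

The data-driven (ADP) variant performs the same two alternating steps but estimates the evaluation and improvement quantities from measured state and input trajectories instead of from (a, b).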

8.
Ye Y, Zhong X, Zhang H. BMC Genetics 2005, 6(Z1):S135
Genetic mechanisms underlying alcoholism are complex. Understanding the etiology of alcohol dependence and its comorbid conditions, such as smoking, is important because of the significant health concerns involved. In this report, we describe a method based on classification trees and deterministic forests for association studies, and use it to perform a genome-wide joint association analysis of alcoholism and smoking. The approach is applied to the single-nucleotide polymorphism data from the Collaborative Study on the Genetics of Alcoholism in Genetic Analysis Workshop 14. Our analysis reaffirmed the importance of sex differences in alcoholism, identified genes reported in other studies of alcoholism, and identified new genes and single-nucleotide polymorphisms that can be useful candidates for future studies.
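The root split of a classification tree applied to SNP data can be sketched as picking the marker whose binary split most reduces Gini impurity. This toy version (binary genotypes, binary phenotype, a single split, no forests or covariates) is only illustrative of the criterion such trees use:

```python
def gini(labels):
    """Gini impurity of a 0/1 label list."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_snp(genotypes, phenotype):
    """Return the index of the marker whose split on genotype 0 vs
    non-0 gives the largest impurity reduction at the tree root."""
    base = gini(phenotype)
    best, best_gain = None, 0.0
    n = len(phenotype)
    for j in range(len(genotypes[0])):
        left = [y for row, y in zip(genotypes, phenotype) if row[j] == 0]
        right = [y for row, y in zip(genotypes, phenotype) if row[j] != 0]
        if not left or not right:
            continue
        gain = base - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)
        if gain > best_gain:
            best, best_gain = j, gain
    return best
```

A deterministic forest, as used in the study, would grow many such trees over resampled data and aggregate which markers are repeatedly selected.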

9.
This paper presents a method for model-free, LOD-score-based linkage analysis of quantitative traits, implemented in the QMFLINK program. The method is used to perform a genome screen on the Framingham Heart Study data. A number of markers that show some support for linkage in our study coincide substantially with those implicated in other linkage studies of hypertension. Although the new method needs further testing on additional real and simulated data sets, we can already say that it is straightforward to apply and may offer a useful complementary approach to previously available methods for the linkage analysis of quantitative traits.

10.

Background  

Genome-wide association studies have identified a great many genomic regions that are likely to harbor disease genes. Thorough interrogation of these specific regions is the logical next step, including regional haplotype studies to identify the risk haplotypes on which the underlying critical variants lie. Pedigrees ascertained for disease can be powerful for genetic analysis because their cases are enriched for genetic disease. Here we present a Monte Carlo based method to perform haplotype association analysis. Our method, hapMC, allows the analysis of full-length haplotypes and sub-haplotypes, including imputation of missing data, in resources of nuclear families, general pedigrees, case-control data, or mixtures thereof. Both traditional association statistics and transmission/disequilibrium statistics can be computed. The method includes a phasing algorithm that can be used in large pedigrees, and optional use of pseudocontrols.

11.
Rapid advances in molecular genetics push the need for efficient data analysis, and advanced algorithms are necessary for extracting all possible information from large experimental data sets. We present a general linear algebra framework for quantitative trait loci (QTL) mapping, using both linear regression and maximum likelihood estimation. The formulation simplifies future comparisons between, and theoretical analyses of, the methods. We show how the common structure of QTL analysis models can be used to improve the kernel algorithms, drastically reducing the computational effort while retaining the original analysis results. We evaluated our new algorithms on data sets originating from two large F2 populations of domestic animals. Using an updating approach, we show that a reduction of one to three orders of magnitude in computational demand can be achieved for matrix factorizations. For interval-mapping and composite-interval-mapping settings with a maximum likelihood model, we also show how to use the original EM algorithm instead of the ECM approximation, significantly improving convergence and further reducing computation time. These algorithmic improvements make it feasible to perform analyses previously deemed impractical or even impossible; for example, it becomes reasonable to perform permutation testing using exhaustive search on populations of 200 individuals with an epistatic two-QTL model.
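The permutation testing made feasible by such speedups can be sketched with a cheap stand-in statistic: squared marker-trait correlation rather than a true LOD score. The empirical p-value shuffles the trait against the marker; the permutation count and seed below are arbitrary illustrative choices:

```python
import random

def lod_like(marker, trait):
    """Squared correlation as a cheap stand-in for a LOD statistic."""
    n = len(trait)
    mm, mt = sum(marker) / n, sum(trait) / n
    cov = sum((x - mm) * (y - mt) for x, y in zip(marker, trait))
    vx = sum((x - mm) ** 2 for x in marker)
    vy = sum((y - mt) ** 2 for y in trait)
    return cov * cov / (vx * vy) if vx and vy else 0.0

def permutation_pvalue(marker, trait, n_perm=200, seed=0):
    """Empirical p-value: fraction of trait permutations whose
    statistic reaches the observed one (with +1 correction)."""
    rng = random.Random(seed)
    obs = lod_like(marker, trait)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if lod_like(marker, shuffled) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

In a real genome scan the inner statistic is recomputed at every marker and permutation, which is exactly where the factorization-updating tricks described above pay off.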

12.
Gliomas are the most common and malignant intracranial tumors in adults. Recent studies have revealed the significance of functional genomics for glioma pathophysiological studies and treatments. However, access to comprehensive genomic data and analytical platforms is often limited. Here, we developed the Chinese Glioma Genome Atlas (CGGA), a user-friendly data portal for the storage and interactive exploration of cross-omics data, including nearly 2000 primary and recurrent glioma samples from Chinese cohorts. Currently, open access is provided to whole-exome sequencing data (286 samples), mRNA sequencing (1018 samples) and microarray data (301 samples), DNA methylation microarray data (159 samples), and microRNA microarray data (198 samples), as well as to detailed clinical information (age, gender, chemoradiotherapy status, WHO grade, histological type, critical molecular-pathological information, and survival data). In addition, we have developed several tools for users to analyze mutation profiles, mRNA/microRNA expression, and DNA methylation profiles, and to perform survival and gene-correlation analyses of specific glioma subtypes. This database removes barriers for researchers, providing rapid and convenient access to high-quality functional genomic data resources for biological studies and clinical applications. CGGA is available at http://www.cgga.org.cn.
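Survival analyses of the kind offered by such portals typically rest on standard estimators. As a generic illustration (not CGGA's implementation), a minimal Kaplan-Meier estimator, assuming distinct, untied event times:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve: at each event time, multiply the
    running survival probability by (at_risk - 1) / at_risk.
    `events[i]` is 1 for an observed event, 0 for censoring.
    Assumes all times are distinct (no ties)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    s = 1.0
    curve = []
    for i in order:
        if events[i]:
            s *= (at_risk - 1) / at_risk
            curve.append((times[i], s))
        at_risk -= 1
    return curve
```

Comparing such curves between molecular subtypes (e.g., by WHO grade or methylation class) is the kind of interactive analysis the portal exposes.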

13.
Single-nucleotide polymorphisms (SNPs) are increasingly used as genetic markers. Although a large number of SNP-genotyping techniques have been described, most still have low throughput or require major investment. For laboratories with access to an automated sequencer, a single-base extension (SBE) assay can be implemented using the ABI SNaPshot™ kit. Here we present a modified protocol comprising multiplex template generation, multiplex SBE reaction, and multiplex sample analysis on a gel-based sequencer such as the ABI 377. These sequencers run on a Macintosh platform, on which the software available for analyzing ABI 377 data has limitations. First, analysis of the size standard included with the kit is not facilitated; therefore a new size standard was designed. Second, using Genotyper (ABI), analysis of the data is very tedious and time consuming. To enable automated batch analysis of 96 samples, with 10 SNPs each, we developed SNPtyper, a spreadsheet-based tool that uses the data from Genotyper and offers the user a convenient interface to set the parameters required for correct allele calling. In conclusion, the method described will enable any lab with access to an ABI sequencer to genotype up to 1000 SNPs per day for a single experimenter, without investing in new equipment.

14.
The impact factor of scientific journals, calculated by the Institute for Scientific Information (ISI), is increasingly used to evaluate the performance of scientists and programmes. Bibliometric indicators, originally designed for purposes other than individual evaluation, are very useful tools provided their interpretation is not extrapolated beyond their limits of validity. Here we present a critical analysis, based on case studies, of appropriate uses and misuses of bibliometric data. We also outline the anticipated consequences of new information technologies, such as electronic journals and open-access schemes, for the mode of science production, evaluation, and dissemination in the biomedical sciences.

15.
affy--analysis of Affymetrix GeneChip data at the probe level
MOTIVATION: The processing of Affymetrix GeneChip data has been a recent focus for data analysts. Alternatives to the original procedure have been proposed, and some of these new methods are widely used. RESULTS: The affy package is an R package of functions and classes for the analysis of oligonucleotide arrays manufactured by Affymetrix. The package, currently in its second release, gives the user great flexibility in carrying out an analysis and makes it possible to access and manipulate probe intensity data. In this paper, we present the main classes and functions in the package and demonstrate how they can be used to process probe-level data. We also demonstrate the importance of probe-level analysis when using the Affymetrix GeneChip platform.

16.
Single-nucleotide polymorphisms (SNPs) are increasingly used as genetic markers. Although a large number of SNP-genotyping techniques have been described, most still have low throughput or require major investment. For laboratories with access to an automated sequencer, a single-base extension (SBE) assay can be implemented using the ABI SNaPshot™ kit. Here we present a modified protocol comprising multiplex template generation, multiplex SBE reaction, and multiplex sample analysis on a gel-based sequencer such as the ABI 377. These sequencers run on a Macintosh platform, on which the software available for analyzing ABI 377 data has limitations. First, analysis of the size standard included with the kit is not facilitated; therefore a new size standard was designed. Second, using Genotyper (ABI), analysis of the data is very tedious and time consuming. To enable automated batch analysis of 96 samples, with 10 SNPs each, we developed SNPtyper, a spreadsheet-based tool that uses the data from Genotyper and offers the user a convenient interface to set the parameters required for correct allele calling. In conclusion, the method described will enable any lab with access to an ABI sequencer to genotype up to 1000 SNPs per day for a single experimenter, without investing in new equipment.

17.
The SAS system provides biologists with a flexible, easy to use software package for data analysis. Through a combination of data management tools, a wide variety of pre-programmed procedures for sorting, graphing, and statistical analysis, and a sophisticated programming language, SAS software can perform all analytical needs for most problems. The recent availability of SAS software on mainframes other than IBM, and more recently on the microcomputer, means that most scientists can have access to the software. In this review we discuss the structure of the SAS language and demonstrate its power in the analysis of biological problems. Although to a lesser extent now than originally, the SAS system is statistically oriented, and a working knowledge of statistics is recommended before using its statistical capabilities. However, all biologists will find its data management and summarization capabilities very useful. Received on September 9, 1987; accepted on September 17, 1987

18.
19.
The key benefits of Lab-on-a-Chip technology are substantial time savings through the automation of lab processes and a reduction in the sample and reagent volumes required to perform analysis. In this article we present a new implementation of cell assays on disposable microfluidic chips. The applications are based on the controlled movement of cells by pressure-driven flow in microfluidic channels and on two-color fluorescence detection of single cells. This new technology allows simple flow cytometric studies of cells in a microfluidic chip-based system. In addition, we developed staining procedures that work "on-chip," eliminating time-consuming washing steps: cells and staining reagents are loaded directly onto the microfluidic chip, and analysis can start after a short incubation. These procedures require only a fraction of the staining reagents generally needed for flow cytometry and only 30,000 cells per sample, demonstrating the advantages of microfluidic technology. The time, cells, and reagents saved by on-chip staining are of great importance when working with limited numbers of cells, e.g., primary cells, or when performing routine tests of cell cultures as a quality-control step. Applications of this technology include antibody staining of proteins and determination of cell transfection efficiency by GFP expression. Results obtained with microfluidic chips, using standard cell lines and primary cells, show good correlation with data obtained using a conventional flow cytometer.

20.
The arrival of High-Throughput Sequencing (HTS) in environmental microbiology brings unique opportunities and challenges. HTS now allows high-resolution exploration of the vast taxonomic and metabolic diversity of the microbial world, which can provide exceptional insight into global ecosystem functioning, ecological processes, and evolution. This exploration also has economic potential, as it gives access to the evolutionary innovation present in microbial metabolisms, which could be used for biotechnological development. HTS also challenges the research community, with the current bottleneck on the data-analysis side: researchers face a sequence-data deluge, with sequencing throughput advancing faster than the computing power needed for analysis. However, new tools and approaches are constantly being developed, and the whole process can be depicted as a fast co-evolution between sequencing technology, informatics, and microbiologists. In this work, we examine the most popular and recently commercialized HTS platforms as well as the bioinformatics methods for data handling and analysis used in microbial metagenomics. This non-exhaustive review is intended to serve as a broad state-of-the-art guide for researchers expanding into this rapidly evolving field.
