20 similar articles found (search time: 15 ms)
1.
2.
Principal component analysis (PCA) is routinely used to analyze genome-wide single-nucleotide polymorphism (SNP) data, for detecting population structure and potential outliers. However, the size of SNP datasets has increased immensely in recent years and PCA of large datasets has become a time-consuming task. We have developed flashpca, a highly efficient PCA implementation based on randomized algorithms, which delivers identical accuracy in extracting the top principal components compared with existing tools, in substantially less time. We demonstrate the utility of flashpca on both HapMap3 and on a large Immunochip dataset. For the latter, flashpca performed PCA of 15,000 individuals up to 125 times faster than existing tools, with identical results, and PCA of 150,000 individuals using flashpca completed in 4 hours. The increasing size of SNP datasets will make tools such as flashpca essential as traditional approaches will not adequately scale. This approach will also help to scale other applications that leverage PCA or eigen-decomposition to substantially larger datasets.
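To make the phrase "PCA based on randomized algorithms" concrete, here is a minimal sketch of a generic randomized range-finder followed by a small exact SVD. It is not flashpca's implementation: the function name `randomized_pca` and the parameters `oversample`, `n_power_iter`, and `seed` are assumptions chosen for illustration, and the input is assumed to be a dense samples-by-SNPs NumPy array.

```python
import numpy as np

def randomized_pca(X, k, oversample=10, n_power_iter=2, seed=0):
    """Approximate the top-k principal components of X (samples x features).

    Illustrative sketch only; parameter names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)  # centre each feature (e.g. SNP) column

    # Random projection: sample the column space with slightly more than k vectors.
    omega = rng.standard_normal((Xc.shape[1], k + oversample))
    Q, _ = np.linalg.qr(Xc @ omega)

    # Power iterations (re-orthonormalised each time) sharpen the separation
    # between the leading and trailing singular values.
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Q)

    # Exact SVD of the small projected matrix, then map back to sample space.
    Ub, s, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    U = Q @ Ub
    scores = U[:, :k] * s[:k]  # principal component scores per sample
    return scores, s[:k], Vt[:k]
```

The expensive decomposition is performed on a matrix with only `k + oversample` rows, which is where the time saving of randomized approaches comes from. On data that is close to low rank, the leading singular values should closely match those of a full SVD.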
3.
We propose that a comparative approach to well-being could be the key to understanding ‘the good life.’ Inspired by current theories of human well-being and animal welfare, we designed a novel test of exploration behavior. Environmentally and socially enriched Long-Evans female rats (N = 60) were trained in four simultaneously presented arms of an eight-arm radial-maze. They learned to expect successes in two arms and failures in the other two. After training, 20 animals remained in enriched housing (enrichment-maintenance) while 40 animals were re-housed in standard, isolated conditions (enrichment-removal). Two weeks later, all animals were re-tested in the maze, initially with access to the four familiar arms only. In the final minute, they also had access to the unfamiliar ambiguous-arms. Though both groups showed significant interest in the ambiguous-arms (P<.0001), the enrichment-maintenance group showed a significantly greater exploratory tendency (P<.01) despite having equivalent levels of activity (P>.3). Thus, we show not only that rats will abandon known rewards and incur risk in order to explore, indicating that exploration is valuable in its own right, but also that individuals with (vs. without) enriched housing conditions are more likely to engage in such exploratory behavior. This novel test contributes to the body of knowledge examining the importance of exploration in humans and other animals; implications for animal welfare and human well-being are discussed.
4.
5.
Currently there is great interest in detecting associations between complex traits and rare variants. In this report, we describe Variant Association Tools (VAT) and the VAT pipeline, which implements best practices for rare-variant association studies. Highlights of VAT include variant-site and call-level quality control (QC), summary statistics, phenotype- and genotype-based sample selection, variant annotation, selection of variants for association analysis, and a collection of rare-variant association methods for analyzing qualitative and quantitative traits. The association testing framework for VAT is regression based, which readily allows for flexible construction of association models with multiple covariates and weighting themes based on allele frequencies or predicted functionality. Additionally, pathway analyses, conditional analyses, and analyses of gene-gene and gene-environment interactions can be performed. VAT is capable of rapidly scanning through data by using multi-process computation, adaptive permutation, and simultaneously conducting association analysis via multiple methods. Results are available in text or graphic file formats and additionally can be output to relational databases for further annotation and filtering. An interface to the R language also facilitates user implementation of novel association methods. VAT's data QC and association-analysis pipeline can be applied to sequence, imputed, and genotyping array, e.g., “exome chip,” data, providing a reliable and reproducible computational environment in which to analyze small- to large-scale studies with data from the latest genotyping and sequencing technologies. Application of the VAT pipeline is demonstrated through analysis of data from the 1000 Genomes project.
6.
7.
Minchao Wang Wu Zhang Wang Ding Dongbo Dai Huiran Zhang Hao Xie Luonan Chen Yike Guo Jiang Xie 《PloS one》2014,9(4)
Background
The recent explosion of biological data poses a great challenge for traditional clustering algorithms. As data sets grow, cluster identification requires much more memory and longer runtimes. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a major bottleneck when handling large-scale data sets. Moreover, because the algorithm clusters data based on the similarities between data pairs, a similarity matrix, which itself takes a long time to construct, is required before the algorithm can run.
Methods
Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix constructing procedure and the affinity propagation algorithm. A shared-memory architecture is used to construct the similarity matrix, and a distributed system is used for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate way of data partition and reduction is designed in our method, in order to minimize the global communication cost among processes.
Results
A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation algorithm also performs well when clustering large-scale gene (microarray) data and detecting families in large protein superfamilies.
8.
Daniel H. Huson Sina Beier Isabell Flade Anna Górska Mohamed El-Hadidi Suparna Mitra Hans-Joachim Ruscheweyh Rewati Tappu 《PLoS computational biology》2016,12(6)
There is increasing interest in employing shotgun sequencing, rather than amplicon sequencing, to analyze microbiome samples. Typical projects may involve hundreds of samples and billions of sequencing reads. The comparison of such samples against a protein reference database generates billions of alignments and the analysis of such data is computationally challenging. To address this, we have substantially rewritten and extended our widely-used microbiome analysis tool MEGAN so as to facilitate the interactive analysis of the taxonomic and functional content of very large microbiome datasets. Other new features include a functional classifier called InterPro2GO, gene-centric read assembly, principal coordinate analysis of taxonomy and function, and support for metadata. The new program is called MEGAN Community Edition (CE) and is open source. By integrating MEGAN CE with our high-throughput DNA-to-protein alignment tool DIAMOND and by providing a new program MeganServer that allows access to metagenome analysis files hosted on a server, we provide a straightforward, yet powerful and complete pipeline for the analysis of metagenome shotgun sequences. We illustrate how to perform a full-scale computational analysis of a metagenomic sequencing project, involving 12 samples and 800 million reads, in less than three days on a single server. All source code is available here: https://github.com/danielhuson/megan-ce
9.
Biomedical research is becoming increasingly interdisciplinary and collaborative in nature. Researchers need to efficiently and effectively collaborate and make decisions by meaningfully assembling, mining and analyzing available large-scale volumes of complex multi-faceted data residing in different sources. In line with related research directives revealing that, in spite of the recent advances in data mining and computational analysis, humans can easily detect patterns which computer algorithms may have difficulty in finding, this paper reports on the practical use of an innovative web-based collaboration support platform in a biomedical research context. Arguing that dealing with data-intensive and cognitively complex settings is not a technical problem alone, the proposed platform adopts a hybrid approach that builds on the synergy between machine and human intelligence to facilitate the underlying sense-making and decision making processes. User experience shows that the platform enables more informed and quicker decisions, by displaying the aggregated information according to their needs, while also exploiting the associated human intelligence.
10.
Discriminant procedures are often used to classify data based on observed characteristics of the response variables. This paper discusses validation techniques for the discriminant function approach. A numerical example is used to illustrate its application.
11.
Perchlorate Chemistry: Implications for Analysis and Remediation (cited by: 16; self-citations: 0; citations by others: 16)
Edward T. Urbansky 《Bioremediation Journal》1998,2(2):81-95
Since the discovery of perchlorate in the ground and surface waters of several western states, there has been increasing interest in the health effects resulting from chronic exposure to low (parts per billion [ppb]) levels. With this concern has come a need to investigate technologies that might be used to remediate contaminated sites or to treat contaminated water to make it safe for drinking. Possible technologies include physical separation (precipitation, anion exchange, reverse osmosis, and electrodialysis), chemical and electrochemical reduction, and biological or biochemical reduction. A fairly unique combination of chemical and physical properties of perchlorate poses challenges to its analysis and reduction in the environment or in drinking water. The implications of these properties are discussed in terms of remediative or treatment strategies. Recent developments are also covered.
12.
Antoni Margalida Martina Carrete Daniel Hegglin David Serrano Rafael Arenas José A. Donázar 《PloS one》2013,8(6)
After the quasi-extinction of much of the European vertebrate megafauna during the last few centuries, many reintroduction projects seek to restore decimated populations. However, the future of numerous species depends on the management scenarios of metapopulations where the flow of individuals can be critical to ensure their viability. This is the case of the bearded vulture Gypaetus barbatus, an Old World, large body-sized and long-lived scavenger living in mountain ranges. Although persecution in Western Europe restrained it to the Pyrenees, the species is nowadays present in other mountains thanks to reintroduction projects. We examined the movement patterns of pre-adult non-breeding individuals born in the wild population of the Pyrenees (n = 9) and in the reintroduced populations of the Alps (n = 24) and Andalusia (n = 13). Most birds were equipped with GPS-GSM radio transmitters, which allowed accurate determination of individual dispersal patterns. Two estimators were considered: i) step length (i.e., the distance travelled per day by each individual, calculated considering only successive days); and ii) total dispersal distance (i.e., the distance travelled between each mean daily location and the point of release). Both dispersal estimators showed a positive relationship with age but were also highly dependent on the source population, birds in Andalusia and Alps moving farther than in Pyrenees. Future research should confirm if differences in dispersal distances are the rule, in which case the dynamics of future populations would be strongly influenced. In summary, our findings highlight that inter-population differences can affect the flow of individuals among patches (a key aspect to ensure the viability of the European metapopulation of the endangered bearded vulture), and thus should be taken into account when planning reintroduction programs. 
This result also raises questions about whether similar scenarios may occur in other restoration projects of European megafauna.
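The two dispersal estimators defined in the abstract above (step length between successive days, and distance from the release point) can be sketched as follows. The abstract does not state how distances were computed, so the great-circle (haversine) formula on a spherical Earth is an assumption here, as are the function names and the `(day_number, lat, lon)` input layout.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical model is an assumption

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def step_lengths(daily):
    """Distance between mean daily locations, using successive days only.

    daily: list of (day_number, lat, lon) tuples sorted by day_number.
    Pairs separated by a gap of more than one day are skipped.
    """
    return [
        haversine_km(a[1], a[2], b[1], b[2])
        for a, b in zip(daily, daily[1:])
        if b[0] - a[0] == 1
    ]

def dispersal_distances(daily, release):
    """Distance from the release point (lat, lon) to each mean daily location."""
    return [haversine_km(release[0], release[1], lat, lon) for _, lat, lon in daily]
```

Skipping non-consecutive days keeps a gap in transmitter coverage from being counted as a single long daily movement.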
13.
14.
15.
Genetic variation is usually estimated empirically from statistics based on population gene frequencies, but alternative statistics based on allelic diversity (number of allelic types) can provide complementary information. There is a lack of knowledge, however, on the evolutionary implications attached to allelic-diversity measures, particularly in structured populations. In this article we simulated multiple scenarios of single and structured populations in which a quantitative trait subject to stabilizing selection is adapted to different fitness optima. By forcing a global change in the optima we evaluated which diversity variables are more strongly correlated with both short- and long-term adaptation to the new optima. We found that quantitative genetic variance components for the trait and gene-frequency-diversity measures are generally more strongly correlated with short-term response to selection, whereas allelic-diversity measures are more correlated with long-term and total response to selection. Thus, allelic-diversity variables are better predictors of long-term adaptation than gene-frequency variables. This observation is also extended to unlinked neutral markers as a result of the information they convey on the demographic population history. Diffusion approximations for the allelic-diversity measures in a finite island model under the infinite-allele neutral mutation model are also provided.
16.
Mei Guo Mary A. Rupe Jo Ann Dieter Jijun Zou Daniel Spielbauer Keith E. Duncan Richard J. Howard Zhenglin Hou Carl R. Simmons 《The Plant cell》2010,22(4):1057-1073
Genes involved in cell number regulation may affect plant growth and organ size and, ultimately, crop yield. The tomato (genus Solanum) fruit weight gene fw2.2, for instance, governs a quantitative trait locus that accounts for 30% of fruit size variation, with increased fruit size chiefly due to increased carpel ovary cell number. To expand investigation of how related genes may impact other crop plant or organ sizes, we identified the maize (Zea mays) gene family of putative fw2.2 orthologs, naming them Cell Number Regulator (CNR) genes. This family represents an ancient eukaryotic family of Cys-rich proteins containing the PLAC8 or DUF614 conserved motif. We focused on native expression and transgene analysis of the two maize members closest to Le-fw2.2, namely, CNR1 and CNR2. We show that CNR1 reduced overall plant size when ectopically overexpressed and that plant and organ size increased when its expression was cosuppressed or silenced. Leaf epidermal cell counts showed that the increased or decreased transgenic plant and organ size was due to changes in cell number, not cell size. CNR2 expression was found to be negatively correlated with tissue growth activity and hybrid seedling vigor. The effects of CNR1 on plant size and cell number are reminiscent of heterosis, which also increases plant size primarily through increased cell number. Regardless of whether CNRs and other cell number–influencing genes directly contribute to, or merely mimic, heterosis, they may aid generation of more vigorous and productive crop plants.
17.
18.
In many bacterial genomes, the leading and lagging strands have different skews in base composition; for example, an excess of guanosine compared to cytosine on the leading strand. We find that Chlamydia genes that have switched their orientation relative to the direction of replication, for example by inversion, acquire the skew of their new "host" strand. In contrast to most evolutionary processes, which have unpredictable effects on the sequence of a gene, replication-related skews reflect a directional evolutionary force that causes predictable changes in the base composition of switched genes, resulting in increased DNA and amino acid sequence divergence. Received: 27 April 2000 / Accepted: 1 August 2000
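The strand skew described above can be illustrated with a short calculation. The abstract gives no formula, so the widely used definition (G − C)/(G + C) is assumed here, and the function names are hypothetical. The sketch shows only the arithmetic starting point: reverse-complementing a segment, as happens in an inversion, flips the sign of its GC skew.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def gc_skew(seq):
    """Return (G - C) / (G + C) for a DNA string; 0.0 if it contains no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0

def reverse_complement(seq):
    """Return the reverse complement of a DNA string made of A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def skew_after_inversion(seq):
    """GC skew of the same segment once it has been inverted in place."""
    return gc_skew(reverse_complement(seq))
```

Immediately after an inversion, the segment therefore carries a skew opposite to that of its new strand. The gradual shift toward the new strand's skew reported in the abstract would appear as this value drifting back over evolutionary time.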
19.
The goal of the current study was to examine the pattern of anatomical connectivity of the human frontal pole so as to inform theories of function of the frontal pole, perhaps one of the least understood regions of the human brain. Rather than simply parcellating the frontal pole into subregions, we focused on examining the brain regions to which the frontal pole is anatomically and functionally connected. While the current findings provided support for previous work suggesting the frontal pole is connected to higher-order sensory association cortex, we found novel evidence suggesting that the frontal pole in humans is connected to posterior visual cortex. Furthermore, we propose a functional framework that incorporates these anatomical connections with existing cognitive theories of the functional organization of the frontal pole. In addition to a previously discussed medial-lateral distinction, we propose a dorsal-ventral gradient based on the information the frontal pole uses to guide behavior. We propose that dorsal regions are connected to other prefrontal regions that process goals and action plans, medial regions are connected to other brain regions that monitor action outcomes and motivate behaviors, and ventral regions connect to regions that process information about stimuli, values, and emotion. By incorporating information across these different levels of information, the frontal pole can effectively guide goal-directed behavior.
20.
In biomedical studies, patients are often evaluated numerous times and a large number of variables are recorded at each time-point. Data entry and manipulation of longitudinal data can be performed using spreadsheet programs, which usually include some data plotting and analysis capabilities and are straightforward to use, but are not designed for the analysis of complex longitudinal data. Specialized statistical software offers more flexibility and capabilities, but first-time users with a biomedical background often find it difficult to use. We developed medplot, an interactive web application that simplifies the exploration and analysis of longitudinal data. The application can be used to summarize, visualize and analyze data by researchers who are not familiar with statistical programs and whose knowledge of statistics is limited. The summary tools produce publication-ready tables and graphs. The analysis tools include features that are seldom available in spreadsheet software, such as correction for multiple testing, repeated measurement analyses and flexible non-linear modeling of the association of the numerical variables with the outcome. medplot is freely available and open source; it has an intuitive graphical user interface (GUI), is accessible via the Internet, and can be used within a web browser without the need to install and maintain programs locally on the user’s computer. This paper describes the application and gives detailed examples describing how to use the application on real data from a clinical study including patients with early Lyme borreliosis.