Similar Articles

20 similar articles found (search time: 15 ms)
2.
Principal component analysis (PCA) is routinely used to analyze genome-wide single-nucleotide polymorphism (SNP) data, for detecting population structure and potential outliers. However, the size of SNP datasets has increased immensely in recent years and PCA of large datasets has become a time-consuming task. We have developed flashpca, a highly efficient PCA implementation based on randomized algorithms, which delivers identical accuracy in extracting the top principal components compared with existing tools, in substantially less time. We demonstrate the utility of flashpca on both HapMap3 and a large Immunochip dataset. For the latter, flashpca performed PCA of 15,000 individuals up to 125 times faster than existing tools, with identical results, and PCA of 150,000 individuals using flashpca completed in 4 hours. The increasing size of SNP datasets will make tools such as flashpca essential, as traditional approaches will not adequately scale. This approach will also help to scale other applications that leverage PCA or eigen-decomposition to substantially larger datasets.
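flashpca itself is a C++ tool and its code is not reproduced here; the following is a minimal sketch of the randomized-SVD idea such tools build on (random range finding plus power iterations), run on simulated, genotype-like data. All variable names and parameters are illustrative assumptions.

```python
import numpy as np

def randomized_pca(X, k, n_oversample=10, n_iter=4, seed=0):
    """Top-k PCA scores via randomized SVD; X is samples x SNPs,
    assumed already centered (and optionally standardized)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # A random projection captures the dominant column space of X.
    Q = X @ rng.standard_normal((p, k + n_oversample))
    # Power iterations sharpen accuracy when eigenvalues decay slowly.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Q)
        Q, _ = np.linalg.qr(X @ (X.T @ Q))
    Q, _ = np.linalg.qr(Q)
    # An exact SVD of the small projected matrix recovers the top components.
    U_small, S, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)
    U = Q @ U_small
    return U[:, :k] * S[:k]  # principal component scores

# Toy example: 200 individuals, 1000 SNPs, two artificial populations.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 1000))
X[:100] += 0.3            # mean shift creates population structure
X -= X.mean(axis=0)       # center columns
scores = randomized_pca(X, k=2)
```

The cost is dominated by a few matrix products against tall-thin matrices rather than a full eigendecomposition, which is what makes this approach scale to biobank-sized SNP matrices.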

4.
We propose that a comparative approach to well-being could be the key to understanding ‘the good life.’ Inspired by current theories of human well-being and animal welfare, we designed a novel test of exploration behavior. Environmentally and socially enriched Long-Evans female rats (N = 60) were trained in four simultaneously presented arms of an eight-arm radial maze. They learned to expect successes in two arms and failures in the other two. After training, 20 animals remained in enriched housing (enrichment-maintenance) while 40 animals were re-housed in standard, isolated conditions (enrichment-removal). Two weeks later, all animals were re-tested in the maze, initially with access to the four familiar arms only. In the final minute, they also had access to the unfamiliar ambiguous arms. Though both groups showed significant interest in the ambiguous arms (P<.0001), the enrichment-maintenance group showed a significantly greater exploratory tendency (P<.01) despite having equivalent levels of activity (P>.3). Thus, we show not only that rats will abandon known rewards and incur risk in order to explore, indicating that exploration is valuable in its own right, but also that individuals with (vs. without) enriched housing conditions are more likely to engage in such exploratory behavior. This novel test contributes to the body of knowledge examining the importance of exploration in humans and other animals; implications for animal welfare and human well-being are discussed.

7.
Currently there is great interest in detecting associations between complex traits and rare variants. In this report, we describe Variant Association Tools (VAT) and the VAT pipeline, which implements best practices for rare-variant association studies. Highlights of VAT include variant-site and call-level quality control (QC), summary statistics, phenotype- and genotype-based sample selection, variant annotation, selection of variants for association analysis, and a collection of rare-variant association methods for analyzing qualitative and quantitative traits. The association testing framework for VAT is regression based, which readily allows for flexible construction of association models with multiple covariates and weighting schemes based on allele frequencies or predicted functionality. Additionally, pathway analyses, conditional analyses, and analyses of gene-gene and gene-environment interactions can be performed. VAT is capable of rapidly scanning through data by using multi-process computation and adaptive permutation, and by simultaneously conducting association analysis via multiple methods. Results are available in text or graphic file formats and can additionally be output to relational databases for further annotation and filtering. An interface to the R language also facilitates user implementation of novel association methods. VAT's data-QC and association-analysis pipeline can be applied to sequence, imputed, and genotyping-array (e.g., “exome chip”) data, providing a reliable and reproducible computational environment in which to analyze small- to large-scale studies with data from the latest genotyping and sequencing technologies. Application of the VAT pipeline is demonstrated through analysis of data from the 1000 Genomes project.
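VAT's own methods are not reproduced here; as a minimal sketch of what one regression-based rare-variant test with frequency-based weights looks like, the following collapses minor-allele counts into a weighted burden score (Madsen-Browning-style weights) and tests it in an ordinary least-squares model with covariates. All data and names are hypothetical.

```python
import numpy as np

def burden_test(genotypes, phenotype, covariates):
    """Weighted burden test for a quantitative trait.
    genotypes: (n, m) minor-allele counts for m rare variants;
    covariates: (n, c). Returns (beta, z) for the burden term."""
    n, m = genotypes.shape
    maf = genotypes.mean(axis=0) / 2
    # Frequency-based weights up-weight the rarest variants.
    w = 1.0 / np.sqrt(maf * (1 - maf))
    burden = genotypes @ w
    # Design matrix: intercept, covariates, burden score (last column).
    X = np.column_stack([np.ones(n), covariates, burden])
    beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    resid = phenotype - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    z = beta[-1] / np.sqrt(cov[-1, -1])
    return beta[-1], z

# Hypothetical data: 2000 samples, 20 rare variants with a true effect.
rng = np.random.default_rng(0)
n, m = 2000, 20
maf = rng.uniform(0.005, 0.01, m)
G = rng.binomial(2, maf, size=(n, m)).astype(float)
age = rng.normal(50, 10, n)
y = 0.5 * G.sum(axis=1) + 0.02 * age + rng.normal(0, 1, n)
beta, z = burden_test(G, y, age[:, None])
```

Because the framework is a plain regression, swapping the weighting scheme or adding covariates only changes how the design matrix is built, which is the flexibility the abstract refers to.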

9.
Recent climate reconstructions are analyzed specifically for insights into those patterns of climate variability in past centuries with the greatest impact on the North American region. Regional variability, largely associated with the El Niño/Southern Oscillation (ENSO) phenomenon, the North Atlantic Oscillation (NAO), and multidecadal patterns of natural variability, is found to mask the emergence of an anthropogenic temperature signal in North America. Substantial recent temperature anomalies may, however, indicate a possible recent emergence of this signal in the region. Multidecadal North Atlantic variability is likely to positively reinforce any anthropogenic warming over substantial parts of North America in coming decades. The recent magnitudes of El Niño events appear to be unprecedented over the past several centuries. These recent changes, if anthropogenic in nature, may outweigh the projection of larger-scale climate change patterns onto the region in a climate change scenario. The implications of such changes for North America, however, are not yet clear. These observations suggest caution in assessing regional climate change scenarios in North America without a detailed consideration of possible anthropogenic changes in the climate patterns influencing the region.

11.

Background

The recent explosion of biological data poses a great challenge to traditional clustering algorithms. As data sets grow, cluster identification demands much larger memory and longer runtimes. The affinity propagation algorithm outperforms many classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a major bottleneck when handling large-scale data sets. Moreover, because the algorithm clusters data based on the similarities between data pairs, a similarity matrix must be constructed before the algorithm can run, and constructing this matrix itself takes a long time.

Methods

This paper proposes two types of parallel architecture to accelerate both the construction of the similarity matrix and the affinity propagation algorithm itself. A shared-memory architecture is used to construct the similarity matrix, while a distributed system, chosen for its large memory size and computing capacity, runs the affinity propagation algorithm. An appropriate scheme of data partitioning and reduction is designed to minimize the global communication cost among processes.

Results

A speedup of 100 is achieved with 128 cores. The runtime is reduced from several hours to a few seconds, indicating that the parallel algorithm can handle large-scale data sets effectively. The parallel affinity propagation algorithm also performs well when clustering large-scale gene-expression (microarray) data and detecting families within large protein superfamilies.
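A serial reference version of the message-passing updates being parallelized might look like the following sketch (dense NumPy, not the paper's distributed code). The responsibility and availability updates are the standard Frey-Dueck rules; damping and preference values are illustrative assumptions.

```python
import numpy as np

def affinity_propagation(S, damping=0.7, n_iter=200):
    """Minimal dense affinity propagation.
    S: (n, n) similarity matrix with preferences on the diagonal.
    Returns an exemplar index for each point."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities: suitability of k as exemplar for i
    A = np.zeros((n, n))  # availabilities: evidence that k should be an exemplar
    idx = np.arange(n)
    for _ in range(n_iter):
        # Responsibility: r(i,k) = s(i,k) - max_{k'!=k} [a(i,k') + s(i,k')]
        AS = A + S
        first = AS.max(axis=1)
        first_k = AS.argmax(axis=1)
        AS[idx, first_k] = -np.inf
        second = AS.max(axis=1)
        Rnew = S - first[:, None]
        Rnew[idx, first_k] = S[idx, first_k] - second
        R = damping * R + (1 - damping) * Rnew
        # Availability: a(i,k) = min(0, r(k,k) + sum of positive r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        col = Rp.sum(axis=0)
        Anew = np.minimum(0, col[None, :] - Rp)
        np.fill_diagonal(Anew, col - Rp.diagonal())
        A = damping * A + (1 - damping) * Anew
    exemplars = np.flatnonzero((A + R).diagonal() > 0)
    labels = exemplars[S[:, exemplars].argmax(axis=1)]
    labels[exemplars] = exemplars
    return labels

# Toy example: two tight, well-separated point clusters.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(10, 0.1, (10, 2))])
S = -((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
np.fill_diagonal(S, -50.0)  # uniform preference controls the cluster count
labels = affinity_propagation(S)
```

Both updates are row- or column-local sums over the message matrices, which is why the paper can partition the matrices across processes and reduce only the per-column aggregates.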

12.
There is increasing interest in employing shotgun sequencing, rather than amplicon sequencing, to analyze microbiome samples. Typical projects may involve hundreds of samples and billions of sequencing reads. The comparison of such samples against a protein reference database generates billions of alignments, and the analysis of such data is computationally challenging. To address this, we have substantially rewritten and extended our widely used microbiome analysis tool MEGAN so as to facilitate the interactive analysis of the taxonomic and functional content of very large microbiome datasets. Other new features include a functional classifier called InterPro2GO, gene-centric read assembly, principal coordinate analysis of taxonomy and function, and support for metadata. The new program is called MEGAN Community Edition (CE) and is open source. By integrating MEGAN CE with our high-throughput DNA-to-protein alignment tool DIAMOND and by providing a new program, MeganServer, that allows access to metagenome analysis files hosted on a server, we provide a straightforward, yet powerful and complete pipeline for the analysis of metagenome shotgun sequences. We illustrate how to perform a full-scale computational analysis of a metagenomic sequencing project, involving 12 samples and 800 million reads, in less than three days on a single server. All source code is available at https://github.com/danielhuson/megan-ce

13.
Ancient retroposon insertions can be used as virtually homoplasy-free markers to reconstruct the phylogenetic history of species. Inherited, orthologous insertions in related species offer reliable signals of a common origin of the given species. One prerequisite for such a phylogenetically informative insertion is that the inserted element was fixed in the ancestral population before speciation; if not, polymorphically inserted elements may lead to random distributions of presence/absence states during speciation and possibly to apparently conflicting reconstructions of their ancestry. Fortunately, such misleading fixed cases are relatively rare but nevertheless need to be considered. Here, we present novel, comprehensive statistical models applicable for (1) analyzing any pattern of rare genomic changes, (2) testing and differentiating conflicting phylogenetic reconstructions based on rare genomic changes caused by incomplete lineage sorting and/or ancestral hybridization, and (3) differentiating between search strategies involving genome information from one or several lineages. When the new statistics are applied, in non-conflicting cases a minimum of three elements present in both of two species and absent in a third group are considered significant support (p<0.05) for the branching of the third from the other two, if all three of the given species are screened equally for genome or experimental data. Five elements are necessary for significant support (p<0.05) if a diagnostic locus derived from only one of three species is screened, and no conflicting markers are detected. Most potentially conflicting patterns can be evaluated for their significance, and ancestral hybridization can be distinguished from incomplete lineage sorting by considering the symmetric or asymmetric distribution of rare genomic changes among the possible tree configurations. Additionally, we provide an R application to make the new KKSC insertion significance test available to the scientific community at http://retrogenomics.uni-muenster.de:3838/KKSC_significance_test/.
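The three-marker and five-marker thresholds quoted above can be sanity-checked under a deliberately simplified null model: assume each conflict-free marker independently supports one of the alternative topologies with equal probability, 1/3 when all three lineages are screened equally and 1/2 when markers come from a single diagnostic lineage. This is a back-of-envelope reading, not the full KKSC statistic.

```python
# Probability that n conflict-free markers all support the same topology
# by chance, if each marker independently lands on one of the alternative
# topologies with probability p under the null.
def p_all_same(n, p):
    return p ** n

# Three-lineage screen: three equally likely topologies (p = 1/3).
# Three concordant markers cross the 0.05 threshold; two do not.
three_lineage = [p_all_same(n, 1 / 3) for n in (2, 3)]   # ~0.111, ~0.037

# Single-lineage screen: only two observable outcomes (p = 1/2).
# Five concordant markers cross the threshold; four do not.
single_lineage = [p_all_same(n, 1 / 2) for n in (4, 5)]  # 0.0625, ~0.031
```

Under these assumptions the minima fall exactly where the abstract states: three markers (p ≈ 0.037 < 0.05) for an equal three-way screen and five markers (p ≈ 0.031 < 0.05) for a one-lineage screen.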

14.
Biomedical research is becoming increasingly interdisciplinary and collaborative in nature. Researchers need to collaborate efficiently and effectively, and to make decisions by meaningfully assembling, mining, and analyzing large volumes of complex, multi-faceted data residing in different sources. In line with research showing that, despite recent advances in data mining and computational analysis, humans can easily detect patterns that computer algorithms may have difficulty finding, this paper reports on the practical use of an innovative web-based collaboration support platform in a biomedical research context. Arguing that dealing with data-intensive and cognitively complex settings is not a technical problem alone, the proposed platform adopts a hybrid approach that builds on the synergy between machine and human intelligence to facilitate the underlying sense-making and decision-making processes. User experience shows that the platform enables more informed and quicker decisions by displaying aggregated information according to users' needs, while also exploiting the associated human intelligence.

15.
Discriminant procedures are often used to classify data based on observed characteristics of the response variables. This paper discusses validation techniques for the discriminant function approach. A numerical example illustrates its application.
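As a concrete illustration of the kind of validation being discussed (not the paper's own numerical example), the following sketch fits a two-class linear discriminant and validates it by leave-one-out cross-validation; the simulated data and all names are hypothetical.

```python
import numpy as np

def lda_classify(X_train, y_train, x):
    """Two-class linear discriminant with equal priors and a pooled
    covariance matrix: assign x to the class with the larger score."""
    m0 = X_train[y_train == 0].mean(axis=0)
    m1 = X_train[y_train == 1].mean(axis=0)
    X0 = X_train[y_train == 0] - m0
    X1 = X_train[y_train == 1] - m1
    Sp = (X0.T @ X0 + X1.T @ X1) / (len(X_train) - 2)  # pooled covariance
    Sp_inv = np.linalg.inv(Sp)
    scores = [x @ Sp_inv @ m - 0.5 * m @ Sp_inv @ m for m in (m0, m1)]
    return int(scores[1] > scores[0])

def loo_error_rate(X, y):
    """Leave-one-out validation: hold out each observation, refit the
    rule on the rest, and count misclassifications. This avoids the
    optimistic bias of reclassifying the training data themselves."""
    errors = sum(
        lda_classify(np.delete(X, i, 0), np.delete(y, i), X[i]) != y[i]
        for i in range(len(X))
    )
    return errors / len(X)

# Hypothetical example: two well-separated bivariate normal classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
err = loo_error_rate(X, y)
```

The contrast between the resubstitution error (scoring the training data) and the leave-one-out error is the essence of validating a discriminant rule rather than merely fitting it.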

18.
After the quasi-extinction of much of the European vertebrate megafauna during the last few centuries, many reintroduction projects seek to restore decimated populations. However, the future of numerous species depends on the management of metapopulations in which the flow of individuals can be critical to ensure their viability. This is the case of the bearded vulture Gypaetus barbatus, an Old World, large-bodied, long-lived scavenger living in mountain ranges. Although persecution in Western Europe confined it to the Pyrenees, the species is nowadays present in other mountain ranges thanks to reintroduction projects. We examined the movement patterns of pre-adult non-breeding individuals born in the wild population of the Pyrenees (n = 9) and in the reintroduced populations of the Alps (n = 24) and Andalusia (n = 13). Most birds were equipped with GPS-GSM radio transmitters, which allowed accurate determination of individual dispersal patterns. Two estimators were considered: i) step length (the distance travelled per day by each individual, calculated considering only successive days); and ii) total dispersal distance (the distance between each mean daily location and the point of release). Both dispersal estimators showed a positive relationship with age but were also highly dependent on the source population, with birds in Andalusia and the Alps moving farther than those in the Pyrenees. Future research should confirm whether such differences in dispersal distances are the rule, in which case the dynamics of future populations would be strongly influenced. In summary, our findings highlight that inter-population differences can affect the flow of individuals among patches (a key aspect to ensure the viability of the European metapopulation of the endangered bearded vulture), and should thus be taken into account when planning reintroduction programs. This result also raises the question of whether similar scenarios may occur in other restoration projects of European megafauna.
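The two estimators defined above could be computed from daily mean GPS fixes as in the following sketch; the coordinates and function names are hypothetical, and the great-circle distance uses the standard haversine formula.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    R = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

def dispersal_estimators(daily_fixes, release_point):
    """daily_fixes: (lat, lon) mean daily locations on successive days.
    Returns (step lengths between successive days,
             total dispersal distance of each day from the release point)."""
    steps = [haversine_km(*daily_fixes[i], *daily_fixes[i + 1])
             for i in range(len(daily_fixes) - 1)]
    total = [haversine_km(*release_point, *fix) for fix in daily_fixes]
    return steps, total

# Hypothetical track of one bird released in the Pyrenees.
release = (42.6, 0.5)
fixes = [(42.6, 0.5), (42.8, 0.9), (43.1, 1.4), (43.5, 2.0)]
steps, total = dispersal_estimators(fixes, release)
```

Note that step length only uses pairs of consecutive days, matching the abstract's restriction, while total dispersal distance is always measured back to the release point.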

19.
Perchlorate Chemistry: Implications for Analysis and Remediation
Since the discovery of perchlorate in the ground and surface waters of several western states, there has been increasing interest in the health effects resulting from chronic exposure to low (parts per billion [ppb]) levels. With this concern has come a need to investigate technologies that might be used to remediate contaminated sites or to treat contaminated water to make it safe for drinking. Possible technologies include physical separation (precipitation, anion exchange, reverse osmosis, and electrodialysis), chemical and electrochemical reduction, and biological or biochemical reduction. An unusual combination of chemical and physical properties of perchlorate poses challenges to its analysis and reduction in the environment or in drinking water. The implications of these properties are discussed in terms of remediation or treatment strategies. Recent developments are also covered.
