Similar literature
20 similar records found (search time: 15 ms)
1.
Statistical models are helping palaeontologists to elucidate the history of biodiversity. Sampling standardization has been extensively applied to remedy the effects of uneven sampling in large datasets of fossil invertebrates. However, many vertebrate datasets are smaller, and the issue of uneven sampling has commonly been ignored, or approached using pairwise comparisons with a numerical proxy for sampling effort. Although most authors find a strong correlation between palaeodiversity and sampling proxies, weak correlation is recorded in some datasets. This has led several authors to conclude that uneven sampling does not influence our view of vertebrate macroevolution. We demonstrate that a multivariate regression model incorporating a model of underlying biological diversification, as well as a sampling proxy, fits observed sauropodomorph dinosaur palaeodiversity best. This bivariate model is a better fit than either univariate model, and illustrates that observed palaeodiversity is a composite pattern, representing a biological signal overprinted by variation in sampling effort. Multivariate models and other approaches that consider sampling as an essential component of palaeodiversity are central to gaining a more complete understanding of deep-time vertebrate diversification.
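To make the model comparison concrete, here is a minimal sketch using simulated data and ordinary least squares in statsmodels; the variable names, effect sizes, and AIC-based comparison are illustrative assumptions, not the authors' actual data or fitting procedure.

```python
# Minimal sketch (illustrative data): compare univariate sampling-only and
# biology-only models against the bivariate model by AIC.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 30                                           # geological time bins
formations = rng.poisson(20, n).astype(float)    # sampling proxy, e.g. formation counts
lineage_model = np.linspace(5, 40, n)            # modelled underlying diversification
diversity = 0.8 * formations + 0.5 * lineage_model + rng.normal(0, 3, n)

def fit(*cols):
    return sm.OLS(diversity, sm.add_constant(np.column_stack(cols))).fit()

models = {"sampling only": fit(formations),
          "biology only": fit(lineage_model),
          "bivariate": fit(formations, lineage_model)}
for name, m in models.items():
    print(f"{name:14s} AIC={m.aic:7.1f}  R2={m.rsquared:.2f}")
```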

2.
High-resolution Ca2+ imaging to study cellular Ca2+ behaviors has led to the creation of large datasets with a profound need for standardized and accurate analysis. To analyze these datasets, spatio-temporal maps (STMaps) that allow for 2D visualization of Ca2+ signals as a function of time and space are often used. Existing methods of STMap analysis rely on a highly arduous process of user-defined segmentation and event-based data retrieval. These methods are time-consuming, lack accuracy, and are extremely variable between users. We designed a novel automated machine-learning-based plugin for the analysis of Ca2+ STMaps (STMapAuto). The plugin includes optimized tools for Ca2+ signal preprocessing, automated segmentation, and automated extraction of key Ca2+ event information such as duration, spatial spread, frequency, propagation angle, and intensity in a variety of cell types, including the interstitial cells of Cajal (ICC). The plugin is fully implemented in Fiji and able to accurately detect and expeditiously quantify Ca2+ transient parameters from ICC. The plugin's analysis of large datasets was 197-fold faster than the commonly used single-pixel-line method. The plugin dramatically reduces opportunities for user error and provides a consistent method for high-throughput analysis of STMap datasets.
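The plugin itself runs in Fiji; purely as a hedged illustration of the automated segmentation-and-measurement step, the following Python sketch performs analogous operations on a synthetic STMap (the threshold, the synthetic event, and scipy's connected-component labelling are assumptions, not the plugin's actual pipeline).

```python
# Illustrative analogue only; the real plugin is a Fiji component.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
stmap = rng.normal(0, 0.05, (200, 100))   # rows = time frames, cols = position
stmap[50:70, 20:40] += 1.0                # one synthetic Ca2+ event

mask = stmap > 0.5                        # simple intensity threshold (assumption)
labels, n_events = ndimage.label(mask)    # connected-component segmentation
for i in range(1, n_events + 1):
    t, x = np.nonzero(labels == i)
    print(f"event {i}: duration = {t.max() - t.min() + 1} frames, "
          f"spread = {x.max() - x.min() + 1} px, "
          f"mean intensity = {stmap[labels == i].mean():.2f}")
```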

3.
A number of studies have tried to exploit subtle phase differences in BOLD time series to resolve the order of sequential activation of brain regions or, more generally, the ability of signal in one region to predict subsequent signal in another region. More recently, such lag-based measures have been applied to investigate directed functional connectivity, although this application has been controversial. We attempted to use large publicly available datasets (FCON 1000, ADHD 200, Human Connectome Project) to determine whether consistent spatial patterns of Granger causality are observed in typical fMRI data. For BOLD datasets from 1,240 typically developing subjects aged 7–40, we measured Granger causality between time series for every pair of 7,266 spherical ROIs covering the gray matter and 264 seed ROIs at hubs of the brain's functional network architecture. Granger causality estimates were strongly reproducible for connections in a test and a replication sample (n=620 subjects for each group), as well as in data from a single subject scanned repeatedly, both during rest and during passive video viewing. The same effect was even stronger in high-temporal-resolution fMRI data from the Human Connectome Project, and was observed independently in data collected during performance of 7 task paradigms. The spatial distribution of Granger causality reflected vascular anatomy, with a progression from Granger causality sources, in Circle of Willis arterial inflow distributions, to sinks, near large venous vascular structures such as the dural venous sinuses and at the periphery of the brain. Attempts to resolve BOLD phase differences with Granger causality should consider the possibility of reproducible vascular confounds, a problem that is independent of the known regional variability of the hemodynamic response.
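For readers unfamiliar with the measure, here is a minimal sketch of pairwise Granger causality between two simulated BOLD-like series, using statsmodels; the lag order and the synthetic coupling are assumptions.

```python
# Two simulated series in which "source" leads "sink" by two samples.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(2)
n = 500
source = rng.normal(size=n)
sink = np.roll(source, 2) + 0.5 * rng.normal(size=n)

# grangercausalitytests asks whether column 1 Granger-causes column 0.
results = grangercausalitytests(np.column_stack([sink, source]), maxlag=3, verbose=False)
for lag, (tests, _) in results.items():
    print(f"lag {lag}: F-test p-value = {tests['ssr_ftest'][1]:.3g}")
```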

4.
The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data, and rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately with the volume of data. This paper presents FTSPlot, a visualization concept for large-scale time series datasets that uses techniques from the field of high-performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n log n); the visualization itself can be done with a complexity of O(1) and is therefore independent of the amount of data. A demonstration prototype has been implemented, and benchmarks show that the technology is capable of displaying large amounts of time series data, event, and interval annotations lag-free, with millisecond-scale response times. The current 64-bit implementation theoretically supports datasets of up to 2^64 bytes; on the x86_64 architecture, up to 2^48 bytes are currently supported, and benchmarks have been conducted with 2^40 bytes (1 TiB), i.e. 2^37 double-precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments.
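The hierarchic level-of-detail idea can be sketched in a few lines: precompute a min/max pyramid so that any zoom level is drawn from a fixed, small number of samples regardless of dataset size. This Python sketch is an illustrative analogue, not FTSPlot's own C++/Qt implementation; the block size and signal are assumptions.

```python
# Min/max pyramid: each level quarters the sample count, so a screenful of
# pixels can always be filled from a bounded number of precomputed values.
import numpy as np

def build_pyramid(samples, block=4):
    levels = [(samples, samples)]
    lo = hi = samples
    while lo.size > block:
        n = lo.size // block * block          # drop the ragged tail
        lo = lo[:n].reshape(-1, block).min(axis=1)
        hi = hi[:n].reshape(-1, block).max(axis=1)
        levels.append((lo, hi))
    return levels

signal = np.random.default_rng(3).normal(size=1_000_000)
pyramid = build_pyramid(signal)
# To draw ~1000 pixels, take the first level coarse enough to fit the screen:
lo, hi = next(level for level in pyramid if level[0].size <= 1000)
print(f"{len(pyramid)} levels; drawing from {lo.size} min/max pairs")
```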

5.
Principal component analysis (PCA) is routinely used to analyze genome-wide single-nucleotide polymorphism (SNP) data, for detecting population structure and potential outliers. However, the size of SNP datasets has increased immensely in recent years, and PCA of large datasets has become a time-consuming task. We have developed flashpca, a highly efficient PCA implementation based on randomized algorithms, which delivers identical accuracy in extracting the top principal components compared with existing tools, in substantially less time. We demonstrate the utility of flashpca on both HapMap3 and a large Immunochip dataset. For the latter, flashpca performed PCA of 15,000 individuals up to 125 times faster than existing tools, with identical results, and PCA of 150,000 individuals using flashpca completed in 4 hours. The increasing size of SNP datasets will make tools such as flashpca essential, as traditional approaches will not scale adequately. This approach will also help to scale other applications that leverage PCA or eigen-decomposition to substantially larger datasets.
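The randomized-algorithm idea behind flashpca can be illustrated with scikit-learn's randomized SVD solver; the simulated genotype matrix below is an assumption, and this is not flashpca's own code.

```python
# Simulated genotype matrix; flashpca itself is a standalone tool.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
genotypes = rng.integers(0, 3, size=(1000, 5000)).astype(float)  # individuals x SNPs
genotypes -= genotypes.mean(axis=0)                              # centre each SNP

pca = PCA(n_components=10, svd_solver="randomized", random_state=0)
pcs = pca.fit_transform(genotypes)     # top PCs without a full eigendecomposition
print(pcs.shape, pca.explained_variance_ratio_[:3].round(4))
```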

6.
Harmonic analysis on manifolds and graphs has recently led to mathematical developments in the field of data analysis. The resulting new tools can be used to compress and analyze large and complex datasets, such as those derived from sensor networks or neuronal activity, obtained in the laboratory or through computer modeling. The nature of the algorithms (based on diffusion maps and connectivity strengths on graphs) bears a certain analogy to neural information processing, and has the potential to provide inspiration for modeling and understanding biological organization in perception and memory formation.
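One standard construction of a diffusion map can be sketched as follows, under illustrative parameter choices (Gaussian kernel bandwidth, embedding dimension): build a kernel graph, row-normalise it into a Markov matrix, and embed with its leading non-trivial eigenvectors.

```python
# Diffusion-map sketch: Gaussian kernel graph -> Markov matrix -> spectral embedding.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(5)
points = rng.normal(size=(300, 10))       # e.g. sensor or neural-activity vectors

eps = 1.0                                 # kernel bandwidth (assumption)
kernel = np.exp(-cdist(points, points, "sqeuclidean") / eps)
markov = kernel / kernel.sum(axis=1, keepdims=True)   # row-stochastic diffusion operator

evals, evecs = np.linalg.eig(markov)
order = np.argsort(-evals.real)
# Skip the trivial stationary eigenvector; scale by eigenvalues (one diffusion step).
coords = evecs.real[:, order[1:3]] * evals.real[order[1:3]]
print(coords.shape)                       # 2-D diffusion coordinates
```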

7.
Gene expression analysis is generally performed on heterogeneous tissue samples consisting of multiple cell types. Current methods developed to separate heterogeneous gene expression rely on prior knowledge of the cell-type composition and/or signatures, which are not available in most public datasets. We present a novel method to identify the cell-type composition, signatures, and proportions per sample without the need for a priori information. The method was successfully tested on controlled and semi-controlled datasets and performed as accurately as current methods that do require additional information. As such, this method enables the analysis of cell-type-specific gene expression using the existing large pools of publicly available microarray datasets.
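The abstract does not spell out the algorithm, so the sketch below uses non-negative matrix factorisation, one common blind approach to the same problem, merely to illustrate separating mixed expression into cell-type signatures and per-sample proportions; all data are simulated.

```python
# NMF stand-in for blind deconvolution of bulk expression (illustrative only).
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(6)
true_sigs = rng.gamma(2.0, size=(500, 3))              # genes x cell types
true_props = rng.dirichlet(np.ones(3), size=40).T      # cell types x samples
mixed = true_sigs @ true_props                         # bulk expression: genes x samples

model = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
signatures = model.fit_transform(mixed)                # estimated cell-type signatures
proportions = model.components_
proportions /= proportions.sum(axis=0, keepdims=True)  # rescale to proportions per sample
print(signatures.shape, proportions.shape)
```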

8.
Artifacts arising when differential phase images are integrated are a common problem in several X-ray phase-based experimental techniques. The combination of noise and insufficient sampling of the high-frequency differential phase signal leads to the formation of streak artifacts in the projections, translating into poor image quality in the tomography slices. In this work, we apply a non-iterative integration algorithm, previously shown to reduce streak artifacts in planar (2D) images, to a differential phase tomography scan. We report on how the reduction of streak artifacts in the projections improves the quality of the tomography slices, especially in directions other than the reconstruction plane. Importantly, the method is compatible with large tomography datasets in terms of computation time.
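The generic idea of non-iterative phase integration can be sketched in one dimension as division by ik in Fourier space; this is an illustrative stand-in, not the specific streak-reducing algorithm of the paper, and the regularisation constant is an assumption.

```python
# 1-D Fourier antiderivative: integrate a differential phase profile by
# dividing its spectrum by ik (regularised at DC).
import numpy as np

def integrate_fourier(dphi, dx=1.0, reg=1e-6):
    k = 2j * np.pi * np.fft.fftfreq(dphi.size, d=dx)
    k[0] = reg                          # avoid division by zero at zero frequency
    phi = np.fft.ifft(np.fft.fft(dphi) / k).real
    return phi - phi.mean()             # the constant of integration is undetermined

x = np.linspace(0, 2 * np.pi, 512, endpoint=False)
dphi = np.cos(x)                        # derivative of sin(x)
print(np.allclose(integrate_fourier(dphi, dx=x[1] - x[0]), np.sin(x), atol=1e-6))
```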

9.
Machine learning algorithms, including recent advances in deep learning, are promising tools for the detection and classification of broadband high-frequency signals in passive acoustic recordings. However, these methods are generally data-hungry, and progress has been limited by challenges related to the lack of labeled datasets adequate for training and testing. Large quantities of known and as-yet-unidentified broadband signal types mingle in marine recordings, with variability introduced by acoustic propagation, source depths and orientations, and interacting signals. Manual classification of these datasets is unmanageable without in-depth knowledge of the acoustic context of each recording location. A signal classification pipeline is presented which combines unsupervised and supervised learning phases with opportunities for expert oversight to label signals of interest. The method is illustrated with a case study using unsupervised clustering to identify five toothed whale echolocation click types and two anthropogenic signal categories. These categories are used to train a deep network to classify detected signals, either in averaged time bins or as individual detections, in two independent datasets. Bin-level classification achieved higher overall precision (>99%) than click-level classification. However, click-level classification had the advantage of providing a label for every signal, and achieved higher overall recall, with overall precision of 92–94%. The results suggest that unsupervised learning is a viable solution for efficiently generating the large, representative training sets needed for applications of deep learning in passive acoustics.
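A compact sketch of the cluster-then-classify idea, under toy assumptions (synthetic features, k-means as the unsupervised phase, a small neural network as the supervised phase):

```python
# Cluster-then-classify sketch with synthetic 8-D "click" features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
features = np.vstack([rng.normal(m, 0.5, size=(200, 8)) for m in (0, 2, 4)])

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
labels = clusters                 # stand-in for expert-reviewed cluster labels

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(features, labels)
print(f"training accuracy: {clf.score(features, labels):.2f}")
```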

10.
The field of social network analysis has received increasing attention during the past decades and has been used to tackle a variety of research questions, from prevention of sexually transmitted diseases to humanitarian relief operations. In particular, social network analyses are becoming an important component in studies of criminal networks and in criminal intelligence analysis. At the same time, intelligence analyses and assessments have become a vital component of modern approaches to policing, with policy implications for crime prevention, especially in the fight against organized crime. In this study, we have a unique opportunity to examine one specific Swedish street gang through three different datasets, drawn from the most common information sources in studies of criminal networks: intelligence, surveillance, and co-offending data. We use the data sources to build networks, and compare them by computing distance, centrality, and clustering measures. This study illustrates the complexity of the problem: different data sources describing the same object of study have a fundamental impact on the results. The same individuals receive different importance rankings depending on the dataset and measure. Consequently, the data source plays a vital role in grasping the complexity of the phenomenon under study. Researchers, policy makers, and practitioners should therefore pay greater attention to the biases affecting the sources of the analysis, and be cautious when drawing conclusions based on intelligence assessments and limited network data. This study contributes to strengthening social network analysis as a reliable tool for understanding and analyzing criminality and criminal networks.
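A minimal sketch of the comparison step using networkx, with two toy edge lists standing in for, e.g., the surveillance and co-offending networks:

```python
# Toy comparison of centrality rankings across two data sources.
import networkx as nx

surveillance = nx.Graph([(1, 2), (2, 3), (3, 4), (2, 5), (5, 6)])
co_offending = nx.Graph([(1, 2), (1, 3), (3, 4), (4, 6), (5, 6)])

for name, g in [("surveillance", surveillance), ("co-offending", co_offending)]:
    centrality = nx.betweenness_centrality(g)
    ranking = sorted(centrality, key=centrality.get, reverse=True)
    print(f"{name}: most central actors = {ranking[:3]}")
```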

11.
Freshwater ecosystems are some of the most endangered environments in the world, being affected at multiple scales by the surrounding landscape and human activities therein. Effective research, conservation and management of these ecosystems require integrating environmental and landscape data with hierarchic river networks by means of summarisation and synthesis of information for large and comprehensive areas at different scales (e.g. basin, sub-basin, upstream drainage area). The dendritic nature of river networks, the need to tackle multiple scales and the ever-growing sources of digital information (e.g. temperature or land-use data grids) have increasingly led to barely manageable processing times and stringent hardware requirements when integrating and working with this information. Here we present the River Network Toolkit (RivTool), a software package that uses only tabular data to derive and calculate new information at multiple scales for riverine landscapes. It uses data from linear hierarchical river networks and the environmental/landscape data from their respective drainage areas. The software allows the acquisition of: 1) information that characterises river networks based on their topographic nature; 2) data obtained via mathematical calculations that account for the hierarchical and network nature of these systems; and 3) output information using different spatial data sources (e.g. climatic, land use, topologic) that result from up- and downstream summarisations, as sketched below. This user-friendly software considers two units of analysis (segment and sub-basin) and is time-effective even with large datasets. RivTool facilitates and reduces the time required for extracting information for freshwater ecosystems, and may thus contribute to increased scientific productivity, efficiency and accuracy when generating new, or improving existing, knowledge on large-scale patterns and processes in river networks.
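The kind of upstream summarisation RivTool performs on tabular network data can be sketched as a recursion over a segment-to-downstream-segment table; the toy topology and attribute values are assumptions.

```python
# Upstream summarisation over a toy segment table (attribute values assumed).
downstream = {"A": "C", "B": "C", "C": "E", "D": "E", "E": None}  # segment -> next
local_area = {"A": 4.0, "B": 2.0, "C": 1.0, "D": 3.0, "E": 0.5}   # e.g. forest km^2

def upstream_total(segment):
    """Sum a local attribute over a segment and everything draining into it."""
    return local_area[segment] + sum(upstream_total(s)
                                     for s, d in downstream.items() if d == segment)

for seg in downstream:
    print(f"{seg}: upstream total = {upstream_total(seg)}")
```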

12.
Recent advances in sensor and recording technology have allowed scientists to acquire very large time-series datasets. Researchers often analyze these datasets in the context of events, which are intervals of time where the properties of the signal change relative to a baseline signal. We have developed DETECT, a MATLAB toolbox for detecting event time intervals in long, multi-channel time series. Our primary goal is to produce a toolbox that is simple for researchers to use, allowing them to quickly train a model on multiple classes of events, assess the accuracy of the model, and determine how closely the results agree with their own manual identification of events, without requiring extensive programming knowledge or machine learning experience. As an illustration, we discuss the application of the DETECT toolbox for detecting signal artifacts found in continuous multi-channel EEG recordings and show the functionality of the tools found in the toolbox. As an additional illustration, we also discuss the application of DETECT for identifying irregular heartbeat waveforms found in electrocardiogram (ECG) data.
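As a hedged illustration of event-interval detection, a simple amplitude threshold below stands in for DETECT's trained models (DETECT itself is a MATLAB toolbox; this analogue, the threshold, and the injected epoch are assumptions):

```python
# Threshold-based stand-in for trained event detection (threshold assumed).
import numpy as np

rng = np.random.default_rng(8)
signal = rng.normal(0, 1, 2000)
signal[500:600] += 8                      # injected "artifact" epoch

above = np.abs(signal) > 4
edges = np.flatnonzero(np.diff(above.astype(int)))
starts, ends = edges[::2] + 1, edges[1::2] + 1
print(list(zip(starts, ends)))            # detected event intervals (sample indices)
```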

13.
The ability to generate large molecular datasets for phylogenetic studies benefits biologists, but such data expansion introduces numerous analytical problems. A typical molecular phylogenetic study implicitly assumes that sequences evolve under stationary, reversible and homogeneous conditions, but this assumption is often violated in real datasets. When an analysis of large molecular datasets results in unexpected relationships, this often reflects a violation of phylogenetic assumptions rather than a correct phylogeny. Molecular evolutionary phenomena such as base-compositional heterogeneity and among-site rate variation are known to affect phylogenetic inference, resulting in incorrect phylogenetic relationships. The ability of methods to overcome such bias has not been measured on real and complex datasets. We investigated how base-compositional heterogeneity and among-site rate variation affect phylogenetic inference in the context of a mitochondrial genome phylogeny of the insect order Coleoptera. We show statistically that our dataset is affected by base-compositional heterogeneity regardless of how the data are partitioned or recoded. Among-site rate variation is shown by comparing topologies generated using models of evolution with and without a rate-variation parameter in a Bayesian framework. When compared for their effectiveness in dealing with systematic bias, standard phylogenetic methods tend to perform poorly, and parsimony without any data transformation performs worst. Two methods designed specifically to overcome systematic bias, LogDet and a Bayesian method implementing variable composition vectors, can overcome some level of base-compositional heterogeneity, but are still affected by among-site rate variation. A large degree of variation in both noise and phylogenetic signal is observed among all three codon positions. We caution that more data exploration is imperative, especially when many genes are included in an analysis.
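A quick statistical check for base-compositional heterogeneity, of the kind referred to above, can be sketched as a chi-square test on per-taxon base counts; the toy sequences are illustrative, not the study's data.

```python
# Chi-square test of base composition across taxa (toy sequences).
import numpy as np
from scipy.stats import chi2_contingency

seqs = {"taxon1": "ACGTACGTAATTTT",
        "taxon2": "GGGCCCGCGCGCAT",
        "taxon3": "ATATATATGCGCAA"}
counts = np.array([[s.count(b) for b in "ACGT"] for s in seqs.values()])
chi2, p, dof, _ = chi2_contingency(counts)
print(f"chi2 = {chi2:.1f}, p = {p:.3g}  (small p suggests compositional heterogeneity)")
```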

14.
15.
The empirical mode decomposition (EMD) method can adaptively decompose a non-stationary time series into a number of amplitude- or frequency-modulated functions known as intrinsic mode functions. This paper combines the EMD method with information analysis and presents a framework for information-preserving EMD. The enhanced EMD method has been exploited in the analysis of neural recordings: it decomposes a signal and extracts only the most informative oscillations contained in the non-stationary signal. Information analysis has shown that the extracted components retain the information content of the signal. More importantly, a limited number of components reveal the main oscillations present in the signal and their instantaneous frequencies, which are often not obvious from the original signal. This information-coupled EMD method has been tested on several field potential datasets for the analysis of stimulus coding in visual cortex, from single and multiple channels, and for finding information connectivity among channels. The results demonstrate the usefulness of the method in extracting relevant responses from the recorded signals. An investigation is also conducted into utilizing the Hilbert phase for cases where phase information can further improve information analysis and stimulus discrimination. The components of the proposed method have been integrated into a toolbox, and the initial implementation is also described.
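A minimal EMD example using the PyEMD package (distributed on PyPI as EMD-signal; assuming that package, not the authors' toolbox), on an illustrative two-tone signal:

```python
# EMD of a two-tone signal; IMFs come out ordered fast to slow.
import numpy as np
from PyEMD import EMD

t = np.linspace(0, 1, 1000)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)

imfs = EMD().emd(signal, t)
print(f"{imfs.shape[0]} IMFs extracted")   # expect the 40 Hz, then the 5 Hz component
```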

16.
Surface myoelectric signals often appear to carry more information than is resolved by root-mean-square analysis of the progress curves or by their power spectra. Time-frequency analysis of myoelectric signals has not yet led to satisfactory results with respect to separating simultaneous events in time and frequency. In this study, a time-frequency analysis of the intensities in time series was developed. This intensity analysis uses a filter bank of non-linearly scaled wavelets with a specified time resolution to extract time-frequency aspects of the signal. Special procedures were developed to calculate intensity in such a way as to approximate the power of the signal over time. Applied to an EMG signal, the intensity analysis was called a functional EMG analysis. The method resolves events within the EMG signal: the time when the events occur, and their intensity and frequency distribution, are well resolved in the intensity patterns extracted from the EMG signal. Averaging intensity patterns from multiple experiments resolves repeatable functional aspects of muscle activation. Various properties of the functional EMG analysis were demonstrated and discussed using model EMG data and real EMG data.
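As a hedged sketch of a wavelet filter-bank intensity analysis, the following uses a Morlet continuous wavelet transform from PyWavelets as a stand-in for the paper's specifically designed non-linearly scaled wavelet bank; the scales and the synthetic burst signal are assumptions.

```python
# Morlet CWT as a stand-in wavelet filter bank for EMG intensity patterns.
import numpy as np
import pywt

fs = 1000.0
t = np.arange(0, 1, 1 / fs)
rng = np.random.default_rng(9)
emg = rng.normal(size=t.size) * (np.sin(2 * np.pi * 2 * t) > 0)   # bursting "EMG"

scales = np.geomspace(4, 64, 20)          # non-linearly spaced scales (assumption)
coef, freqs = pywt.cwt(emg, scales, "morl", sampling_period=1 / fs)
intensity = np.abs(coef) ** 2             # time-frequency intensity pattern
print(intensity.shape, f"bands span {freqs.min():.0f}-{freqs.max():.0f} Hz")
```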

17.
We develop an approach for the exploratory analysis of gene expression data, based upon blind source separation techniques. This approach exploits higher-order statistics to identify a linear model for (logarithms of) expression profiles, described as linear combinations of "independent sources." As a result, it yields "elementary expression patterns" (the "sources"), which may be interpreted as potential regulation pathways. Further analysis of the obtained sources shows that they are generally characterized by a small number of specific coexpressed or antiexpressed genes. In addition, the projections of the expression profiles onto the estimated sources often provide significant clustering of conditions. The algorithm relies on a large number of runs of independent component analysis with random initializations, followed by a search for "consensus sources." It thus provides estimates for independent sources, together with an assessment of their robustness. The results obtained on two datasets (breast cancer data and Bacillus subtilis sulfur metabolism data) show that some of the obtained gene families correspond to well-known families of coregulated genes, which validates the proposed approach.
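The multiple-runs-plus-consensus scheme can be sketched with scikit-learn's FastICA: run it from many random initialisations and keep sources that reappear across runs, matched by absolute correlation. The dimensions, the 0.9 threshold, and the simulated data are assumptions, not the authors' algorithm.

```python
# Consensus ICA: repeated FastICA runs, sources matched by absolute correlation.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(10)
true_sources = rng.laplace(size=(100, 4))   # sparse "elementary expression patterns"
mixing = rng.normal(size=(4, 20))           # 20 conditions as mixtures of 4 sources
X = true_sources @ mixing                   # log-expression matrix: genes x conditions

runs = [FastICA(n_components=4, random_state=s, max_iter=2000).fit_transform(X).T
        for s in range(10)]                 # each run yields 4 estimated sources

for k, src in enumerate(runs[0]):
    support = sum(max(abs(np.corrcoef(src, other)[0, 1]) for other in run) > 0.9
                  for run in runs[1:])
    print(f"source {k}: reproduced in {support}/9 runs")
```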

18.
Trichoptera are holometabolous insects with aquatic larvae that, together with the Lepidoptera, make up the Amphiesmenoptera. Despite extensive previous morphological work, little phylogenetic agreement has been reached about the relationships among the three suborders (Annulipalpia, Spicipalpia, and Integripalpia) or about the monophyly of Spicipalpia. In an effort to resolve this conflict, we sequenced fragments of the large and small subunit nuclear ribosomal RNAs (1078 nt; D1, D3, V4-5), the nuclear elongation factor 1 alpha gene (EF-1 alpha; 1098 nt), and a fragment of mitochondrial cytochrome oxidase I (COI; 411 nt). Seventy adult and larval morphological characters were reanalyzed and added to the molecular data in a combined analysis. We evaluated signal and homoplasy in each of the molecular datasets and attempted to rank the datasets according to how appropriate they were for inferring relationships among suborders. This evaluation included testing for conflict among datasets, comparing tree lengths among alternative hypotheses, measuring the left-skew of tree-length distributions from maximally divergent sets of taxa, evaluating the recovery of expected clades, visualizing whether or not substitutions were accumulating with time, and estimating nucleotide compositional bias. Although all these measures cast doubt on the reliability of the deep-level signal coming from the nucleotides of the COI and EF-1 alpha genes, these data could still be included in combined analyses without overturning the results from the most conservative marker, the rRNA. The different datasets were found to be evolving at extremely different rates. A site-specific likelihood method for dealing with combined data with nonoverlapping parameters was proposed, and a similar weighting scheme under parsimony was evaluated. Among our phylogenetic conclusions, we found Annulipalpia to be the most basal of the three suborders, with Spicipalpia and Integripalpia forming a clade. Monophyly of Annulipalpia and Integripalpia was confirmed, but the relationships among spicipalpians remain equivocal.

19.
Phylogenetic comparative methods have become a standard statistical approach for analysing interspecific data, under the assumption that traits of species are more similar than expected by chance (i.e. phylogenetic signal is present). Here I test for phylogenetic signal in intraspecific body size datasets to evaluate whether intraspecific datasets may require phylogenetic analysis. I also compare amounts of phylogenetic signal in intraspecific and interspecific body size datasets. Some intraspecific body size datasets contain significant phylogenetic signal. Detection of significant phylogenetic signal was dependent upon the number of populations (n) and the amount of phylogenetic signal (K) for a given dataset. Amounts of phylogenetic signal do not differ between intraspecific and interspecific datasets. Further, the relationships between the significance of phylogenetic signal and both sample size and the amount of phylogenetic signal are similar for intraspecific and interspecific datasets. Thus, intraspecific body size datasets are similar to interspecific body size datasets with respect to phylogenetic signal. Whether these results are general for all characters requires further study.

20.
Regulatory DNA elements, short genomic segments that regulate gene expression, have been implicated in developmental disorders and human disease. Despite this clinical urgency, only a small fraction of the regulatory DNA repertoire has been confirmed through reporter gene assays, and the overall success rate of functional validation of candidate regulatory elements is low. Moreover, the number and diversity of datasets from which putative regulatory elements can be identified is large and rapidly increasing. We generated a flexible and user-friendly tool to integrate the information from different types of genomic datasets (e.g. ATAC-seq, ChIP-seq, conservation), aiming to increase the ease and success rate of functional prediction. To this end, we developed the EMERGE program, which merges all datasets that the user considers informative and uses a logistic regression framework, based on validated functional elements, to set optimal weights for these datasets. ROC curve analysis shows that a combination of datasets leads to improved prediction of tissue-specific enhancers in human, mouse and Drosophila genomes. Functional assays based on this prediction can be expected to have substantially higher success rates. The resulting integrated signal for prediction of functional elements can be plotted in a built-in genome browser or exported for further analysis.
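The logistic-regression integration described above can be sketched as follows, with simulated tracks and labels standing in for real genomic datasets and validated elements (this is not EMERGE's own code):

```python
# Logistic-regression weighting of simulated genomic tracks, assessed by ROC AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(11)
n = 2000
tracks = rng.normal(size=(n, 3))           # e.g. ATAC-seq, ChIP-seq, conservation
logit = 1.5 * tracks[:, 0] + 0.8 * tracks[:, 2] - 1.0
is_enhancer = rng.random(n) < 1 / (1 + np.exp(-logit))   # "validated" labels

model = LogisticRegression().fit(tracks, is_enhancer)
score = model.predict_proba(tracks)[:, 1]  # integrated prediction signal
print("weights:", model.coef_.round(2), " AUC:", round(roc_auc_score(is_enhancer, score), 3))
```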
