Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Ryu Minho, Lee Geonseok, Lee Kichun. Cluster Computing, 2021, 24(3):1975-1987

In the new era of big data, numerous information and technology systems can store huge amounts of streaming data in real time, for example, in server-access logs on web application servers. The importance of anomaly detection in voluminous quantities of streaming data from such systems is rapidly increasing. One of the biggest challenges in the detection task is to carry out real-time contextual anomaly detection in streaming data with varying patterns that are visually detectable but unsuitable for a parametric model. Most anomaly detection algorithms have weaknesses in dealing with streaming time-series data containing such patterns. In this paper, we propose a novel method for online contextual anomaly detection in streaming time-series data using generalized extreme studentized deviates (GESD) tests. The GESD test is relatively accurate and efficient because it performs statistical hypothesis testing but it is unable to handle streaming time-series data. Thus, focusing on streaming time-series data, we propose an online version of the test capable of detecting outliers under varying patterns. We perform extensive experiments with simulated data, syntactic data, and real online traffic data from Yahoo Webscope, showing a clear advantage of the proposed method, particularly for analyzing streaming data with varying patterns.
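The classical (batch) generalized ESD test that the abstract builds on can be sketched as follows. This is a minimal illustration of the standard test only, not the authors' online variant; the function names `t_quantile` and `gesd` and the sample values are assumptions made for the example. The Student's t quantile is computed numerically with the standard library so that the block stays self-contained.

```python
import math
import statistics


def t_quantile(p, df, steps=200):
    """Quantile of Student's t for p > 0.5, via Simpson integration and bisection."""
    # Substituting t = sqrt(df) * tan(phi) turns the CDF into a bounded
    # integral of cos(phi) ** (df - 1) with this normalising constant.
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)

    def cdf_at(theta):
        h = theta / steps
        total = 1.0 + math.cos(theta) ** (df - 1)
        for j in range(1, steps):
            total += (4 if j % 2 else 2) * math.cos(j * h) ** (df - 1)
        return 0.5 + k * total * h / 3

    lo, hi = 0.0, math.pi / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if cdf_at(mid) < p:
            lo = mid
        else:
            hi = mid
    return math.sqrt(df) * math.tan((lo + hi) / 2)


def gesd(values, max_outliers, alpha=0.05):
    """Return the indices flagged as outliers by the generalized ESD test."""
    n = len(values)
    if max_outliers > n - 2:
        raise ValueError("max_outliers must be at most len(values) - 2")
    remaining = list(range(n))
    removed, n_out = [], 0
    for i in range(1, max_outliers + 1):
        sample = [values[j] for j in remaining]
        mean, sd = statistics.fmean(sample), statistics.stdev(sample)
        if sd == 0:
            break
        # Test statistic R_i: the largest studentized deviation still present.
        worst = max(remaining, key=lambda j: abs(values[j] - mean))
        r_i = abs(values[worst] - mean) / sd
        # Critical value lambda_i for the i-th step.
        p = 1 - alpha / (2 * (n - i + 1))
        t = t_quantile(p, n - i - 1)
        lam = (n - i) * t / math.sqrt((n - i - 1 + t * t) * (n - i + 1))
        remaining.remove(worst)
        removed.append(worst)
        if r_i > lam:
            n_out = i  # the outlier count is the largest i with R_i > lambda_i
    return removed[:n_out]


# Illustrative values only: one obvious spike among otherwise stable readings.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.7, 10.3, 10.0, 9.9]
print(gesd(readings, max_outliers=2))
```

The batch test needs the whole sample up front, which is the limitation the abstract describes for streaming data.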

2.
3.
4.
Babnigg G, Giometti CS. Proteomics, 2003, 3(5):584-600
The analysis of proteomes, i.e., the proteins expressed by biological organisms under a given set of conditions at a given time, requires separating complex protein mixtures into discrete protein components, measuring their relative abundances, and identifying the individual protein components. Many types of data are generated during the course of proteome analysis, including graphic images of the protein profiles, flat files containing numeric data, spreadsheets for assimilating numeric data, and relational database tables for integrating data from multiple experiments. As part of a project to describe the proteomes of microbes of interest to the U.S. Department of Energy, a World-Wide Web-based interface has been developed for the display of protein profiles generated by two-dimensional gel electrophoresis. The web interface is capable of obtaining protein identifications on the fly, interrogating the quantitative data in the context of available genome sequence information, and relating the proteome data to existing metabolic pathway databases. Analysis of protein expression profiles is expedited, providing the capability to efficiently determine the gene locations for proteins modulated in abundance in response to different growth conditions and to locate the positions of the proteins within specific metabolic pathways. The proteome of the archaeon Methanococcus jannaschii, a microbe for which the complete genome sequence is available, is used to demonstrate the capabilities of this evolving web interface (http://proteomeweb.anl.gov).

5.

Background

Searching for the orthologs of a given protein or DNA sequence is one of the most important and most commonly used bioinformatics methods in biology. Programs like BLAST or the orthology search engine Inparanoid can be used to find orthologs when the similarity between two sequences is sufficiently high. However, they fail when the level of conservation is low. Detecting remotely conserved proteins often involves sophisticated manual intervention that is difficult to automate.

Results

Here, we introduce morFeus, a search program to find remotely conserved orthologs. Based on relaxed sequence similarity searches, morFeus selects sequences based on the similarity of their alignments to the query, tests for orthology by iterative reciprocal BLAST searches and calculates a network score for the resulting network of orthologs that is a measure of orthology independent of the E-value. Detecting remotely conserved orthologs of a protein using morFeus thus requires no manual intervention. We demonstrate the performance of morFeus by comparing it to state-of-the-art orthology resources and methods. We provide an example of remotely conserved orthologs, which were experimentally shown to be functionally equivalent in the respective organisms and therefore meet the criteria of the orthology-function conjecture.
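The basic reciprocal-best-hit check that underlies reciprocal BLAST searches can be sketched as below. This is not morFeus itself, which iterates the searches and adds a network score; the function name, the dictionary inputs and the sequence identifiers are all hypothetical, and the best hits are assumed to have been computed already by a search tool.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    return sorted(
        (a, b) for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a
    )


# Hypothetical best-hit tables for two proteomes A and B.
a_to_b = {"a1": "b1", "a2": "b2", "a3": "b2"}
b_to_a = {"b1": "a1", "b2": "a3"}
print(reciprocal_best_hits(a_to_b, b_to_a))
```

Here `a2` is dropped because its best hit `b2` points back to `a3` rather than to `a2`.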

Conclusions

Based on our results, we conclude that morFeus is a powerful and specific search method for detecting remotely conserved orthologs. morFeus is freely available at http://bio.biochem.mpg.de/morfeus/. Its source code is available from Sourceforge.net (https://sourceforge.net/p/morfeus/).

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-263) contains supplementary material, which is available to authorized users.

6.

Background  

Bioinformatics often leverages recent advancements in computer science to support biologists in their scientific discovery process. Such efforts include the development of easy-to-use web interfaces to biomedical databases. Recent advancements in interactive web technologies require us to rethink the standard submit-and-wait paradigm, and craft bioinformatics web applications that share analytical and interactive power with their desktop relatives, while retaining simplicity and availability.

7.
8.
The power of several neutrality tests to reject a simple bottleneck model is examined in a coalescent framework. Several tests are considered including some relying on the frequency spectrum of mutations and some reflecting the linkage disequilibrium structure of the data. We evaluate the effect of the age and of the strength of the bottleneck, and their interaction. We contrast two qualitatively different bottleneck effects depending on their strength. In genealogical terms, during severe bottlenecks, all lineages coalesce leading to a star-like gene genealogy of the sample. Some time after the bottleneck, once new mutations have arisen, they tend to show an excess of rare variants and a slight excess of haplotypes. On the contrary, more moderate bottlenecks allow several lineages to survive the demographic crash, leading to a balanced genealogy with long internal branches. Soon after the event, data tend to show an excess of intermediate frequency variants and a deficit of haplotypes. We show that for moderate sequencing efforts, severe bottlenecks can be detected only after an intermediate time period has allowed for mutations to occur, preferably by frequency spectrum statistics. Moderate bottlenecks can be more easily detected for more recent events, especially using haplotype statistics. Finally, for a single locus, the bottleneck results closely approximate those of a simple hitchhiking model. The main difference concerns the frequency distribution of mutations and haplotypes after moderate perturbations. Hitchhiking increases the number of rare ancestral mutations and leads to a more predominant major haplotype class. Thus, despite a number of common features between the two processes, hitchhiking cannot be strictly modeled by bottlenecks.
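The two kinds of summary that such tests draw on, the frequency spectrum of mutations and the number of distinct haplotypes, can be computed from a sample as sketched below. This illustrates the raw summaries only, not any specific test from the paper; the 0/1 encoding (1 = derived allele), the function names and the toy sample are assumptions.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Entry k-1 counts segregating sites whose derived allele occurs in k sequences."""
    n = len(haplotypes)
    counts = Counter(sum(column) for column in zip(*haplotypes))
    # Sites with 0 or n derived copies are not segregating and are left out.
    return [counts.get(k, 0) for k in range(1, n)]


def haplotype_count(haplotypes):
    """Number of distinct haplotypes in the sample."""
    return len(set(map(tuple, haplotypes)))


# Toy sample: 4 sequences (rows) by 4 sites (columns).
sample = [
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
print(site_frequency_spectrum(sample), haplotype_count(sample))
```

An excess of rare variants shows up as a spectrum weighted towards its first entries, while an excess of intermediate-frequency variants shifts weight towards the middle.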

9.
SUMMARY: With the availability of whole-genome sequences for many species, linkage analysis, positional cloning and microarrays are gradually becoming powerful tools for investigating the links between phenotype and genotype or genes. However, with these methods, the causative genes underlying a quantitative trait locus, or a disease, are usually located within a large genomic region or a large set of genes. Examining the function of every gene is very time-consuming and requires retrieving and integrating information from multiple databases or genome resources. PGMapper is a software tool for automatically matching phenotype to genes from a defined genome region or a group of given genes by combining the mapping information from the Ensembl database and gene function information from the OMIM and PubMed databases. PGMapper is currently available for candidate gene search of human, mouse, rat, zebrafish and 12 other species. AVAILABILITY: Available online at http://www.genediscovery.org/pgmapper/index.jsp.

10.
11.
SUMMARY: Each organism has traits that are shared with some, but not all, organisms. Identification of genes needed for a particular trait can be accomplished by a comparative genomics approach using three or more organisms. Genes that occur in organisms without the trait are removed from the set of genes in common among organisms with the trait. To facilitate these comparisons, a web-based server, Procom, was developed to identify the subset of genes that may be needed for a trait. AVAILABILITY: The Procom program is freely available with documentation and examples at http://ural.wustl.edu/~billy/Procom/ CONTACT: billy@ural.wustl.edu.
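The subtraction described in the summary amounts to a set intersection followed by a set difference. The sketch below assumes that genes have already been mapped to shared identifiers across organisms (for example, ortholog groups); how Procom itself matches genes is not stated here, and the function name, organism labels and gene identifiers are hypothetical.

```python
def trait_candidate_genes(with_trait, without_trait):
    """Genes present in every organism with the trait and in none without it."""
    if not with_trait:
        raise ValueError("at least one organism with the trait is required")
    shared = set.intersection(*(set(genes) for genes in with_trait.values()))
    excluded = set().union(*(set(genes) for genes in without_trait.values()))
    return shared - excluded


# Hypothetical gene sets for three organisms.
has_trait = {"organism_A": {"g1", "g2", "g3"}, "organism_B": {"g2", "g3", "g4"}}
lacks_trait = {"organism_C": {"g3", "g5"}}
print(trait_candidate_genes(has_trait, lacks_trait))
```

With these inputs, `g2` and `g3` are common to both trait-bearing organisms, and `g3` is then removed because it also occurs in the organism that lacks the trait.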

12.
13.
14.
15.
Mediante is a MIAME-compliant microarray data manager that links together annotations and experimental data. Developed as a J2EE three-tier application, Mediante integrates a management system for production of long oligonucleotide microarrays, an experimental data repository suitable for home-made or commercial microarrays, and a user interface dedicated to the management of microarray projects. Several tools allow quality control of hybridizations and submission of validated data to public repositories. AVAILABILITY: http://www.microarray.fr. SUPPLEMENTARY INFORMATION: http://www.microarray.fr/SP/lebrigand2007/

16.
A web interface to PHYLIP (version 3.57 C) is implemented using CGI/Perl programming. It enables users to do phylogenetic analysis through the Internet.

17.
18.
19.
20.
Age-related hearing impairment (ARHI) affects 25-40% of individuals over the age of 65. Despite the high prevalence of this complex trait, ARHI is still poorly understood. We hypothesized that variance in hearing ability with age is largely determined by genetic factors. We collected audiologic data on females of Northern European ancestry and compared different audiogram representations. A web-based speech-to-noise ratio (SNR) hearing test was compared with pure-tone thresholds to see whether hearing ability could be determined accurately for people at home, and the genetic contribution to each trait was compared. Volunteers were recruited from the TwinsUK cohort. Hearing ability was determined using pure-tone audiometry and a web-based hearing test. Different audiogram presentations were compared for age-correlation and reflection of audiogram shape. Using structural equation modelling based on the classical twin model, the heritability of ARHI, as measured by the different phenotypes, was estimated, and the shared variance between the web-based SNR test and pure-tone audiometry was determined using bivariate modelling. Pure-tone audiometric data were collected on 1033 older females (age: 41-86). A total of 1970 volunteers (males and females, age: 18-85) participated in the SNR test. In the comparison between different ARHI phenotypes, the difference between the first two principal components (PC1-PC2) best represented ARHI. The SNR test showed a sensitivity and specificity of 89% and 80%, respectively, in comparison with pure-tone audiogram data. Univariate heritability estimates ranged from 0.70 (95% CI: 0.63-0.76) for (PC1-PC2) to 0.56 (95% CI: 0.48-0.63) for PC2. The genetic correlation of PC1-PC2 and SNR was -0.67, showing that the two traits share variance attributed to additive genetic factors. Hearing ability showed considerable heritability in our sample. We have shown that the SNR test provides a useful surrogate marker of hearing. This will enable a much larger sample to be collected at a fraction of the cost, facilitating future genetic association studies.
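Sensitivity and specificity of a screening test against a reference test follow from the four confusion-matrix counts. The sketch below shows the arithmetic only; the function name and the toy labels are invented for illustration and are not data from the study. Here `reference` stands in for the pure-tone classification and `screening` for the web-based test.

```python
def sensitivity_specificity(reference, screening):
    """Return (sensitivity, specificity) of `screening` against `reference`."""
    pairs = list(zip(reference, screening))
    tp = sum(1 for r, s in pairs if r and s)          # impaired, flagged
    fn = sum(1 for r, s in pairs if r and not s)      # impaired, missed
    tn = sum(1 for r, s in pairs if not r and not s)  # unimpaired, cleared
    fp = sum(1 for r, s in pairs if not r and s)      # unimpaired, flagged
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels: True means "hearing impairment present".
reference = [True, True, True, True, False, False, False, False, False]
screening = [True, True, True, False, False, False, False, False, True]
print(sensitivity_specificity(reference, screening))
```

Sensitivity is the fraction of reference-positive cases that the screening test flags, and specificity is the fraction of reference-negative cases that it clears.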
