Similar Documents
20 similar documents found (search time: 31 ms)
1.
The objective of this research is to examine the efficiency of the EUR/USD market through the application of a trading system. The system uses a genetic algorithm based on technical analysis indicators, such as the Exponential Moving Average (EMA), Moving Average Convergence Divergence (MACD), Relative Strength Index (RSI), and a filter rule, to give buying and selling recommendations to investors. The algorithm optimizes the strategies by dynamically searching for parameters that improve profitability in the training period. The best sets of rules are then applied to the testing period. The results show that no set of trading rules performs consistently well in both periods. Strategies that achieve very good returns in the training period struggle to return positive results in the testing period, which is consistent with the efficient market hypothesis (EMH).
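To make the training/testing logic concrete, here is a minimal sketch, assuming a single long/flat rule driven by two EMAs (the study also used MACD, RSI, and a filter rule, whose exact encoding is not given). The GA settings (population size, truncation selection, 20% mutation rate), the helper names, and the synthetic random-walk prices are illustrative assumptions, not the authors' configuration.

```python
import random

def ema(prices, span):
    """Exponential moving average with smoothing factor alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out

def crossover_return(prices, fast, slow):
    """Summed bar returns of a long/flat rule: long while fast EMA > slow EMA."""
    f, s = ema(prices, fast), ema(prices, slow)
    total = 0.0
    for t in range(1, len(prices)):
        if f[t - 1] > s[t - 1]:  # act on the previous bar's signal to avoid look-ahead
            total += prices[t] / prices[t - 1] - 1
    return total

def genetic_search(prices, pop_size=20, generations=30, seed=0):
    """Evolve (fast, slow) EMA spans that maximize in-sample return."""
    rng = random.Random(seed)

    def fitness(genome):
        return crossover_return(prices, *genome)

    pop = []
    for _ in range(pop_size):
        fast = rng.randint(2, 30)
        pop.append((fast, rng.randint(fast + 1, 100)))
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            fast, slow = rng.choice([a[0], b[0]]), rng.choice([a[1], b[1]])  # uniform crossover
            if rng.random() < 0.2:  # mutation
                fast = max(2, fast + rng.randint(-3, 3))
                slow += rng.randint(-5, 5)
            children.append((fast, max(slow, fast + 1)))  # enforce fast < slow
        pop = parents + children
    return max(pop, key=fitness)

# Synthetic random-walk "exchange rate", split into training and testing periods.
rng = random.Random(42)
prices = [1.10]
for _ in range(599):
    prices.append(prices[-1] * (1 + rng.gauss(0, 0.005)))
train, test = prices[:400], prices[400:]
best = genetic_search(train)
print("best spans:", best)
print("train: %.4f  test: %.4f" % (crossover_return(train, *best), crossover_return(test, *best)))
```

On a random walk, spans tuned on the training window typically carry no reliable edge into the testing window, which is the kind of pattern the abstract reads as consistent with the EMH.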

2.

Background

Using a hybrid approach for gene selection and classification is common, as the results obtained are generally better than those from performing the two tasks independently. Yet, for some microarray datasets, both the classification accuracy and the stability of the gene sets obtained still leave room for improvement. This may be due to the presence of samples with wrong class labels (i.e., outliers). Outlier detection algorithms proposed so far are either not suitable for microarray data or address outlier detection only in isolation.

Results

We tackle the outlier detection problem based on a previously proposed Multiple-Filter-Multiple-Wrapper (MFMW) model, which was shown to yield promising results compared with other hybrid approaches (Leung and Hung, 2010). To incorporate outlier detection and overcome limitations of the existing MFMW model, our proposed MFMW-outlier approach introduces three new features: 1) an unbiased external Leave-One-Out Cross-Validation framework replaces the internal cross-validation of the previous MFMW model; 2) wrongly labeled samples are identified within the MFMW-outlier model; and 3) a stable set of genes is selected using an L1-norm SVM that removes any redundant genes present. Six binary-class microarray datasets were tested. Compared with outlier detection studies on the same datasets, MFMW-outlier detected all the outliers found in the original paper (for which the data was provided for analysis), and the genes selected after outlier removal were shown to have biological relevance. We also compared MFMW-outlier with PRAPIV (Zhang et al., 2006) on the same synthetic datasets; MFMW-outlier gave better average precision and recall in three different settings. Lastly, artificially flipped microarray datasets were created by removing our detected outliers and flipping the labels of some of the remaining samples. Almost all the ‘wrong’ (artificially flipped) samples were detected, suggesting that MFMW-outlier is sufficiently powerful to detect outliers in high-dimensional microarray datasets.
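The core idea of flagging wrongly labeled samples through leave-one-out prediction can be sketched as follows. This is a simplified stand-in, not MFMW-outlier: a nearest-centroid classifier (hypothetical helper names) replaces the paper's multiple filters and wrappers, and a sample is flagged when its held-out prediction contradicts its given label.

```python
import math

def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest_centroid_predict(train_x, train_y, x):
    """Assign x to the class whose training centroid is closest (Euclidean)."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(train_y)):
        c = centroid([r for r, lab in zip(train_x, train_y) if lab == label])
        d = math.dist(x, c)  # math.dist needs Python 3.8+
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

def loo_suspect_labels(x, y):
    """Indices whose leave-one-out prediction disagrees with the given label."""
    suspects = []
    for i in range(len(x)):
        train_x, train_y = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_x, train_y, x[i]) != y[i]:
            suspects.append(i)
    return suspects

x = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [0.1, 0.0]]
y = ["A", "A", "A", "B", "B", "B", "B"]  # the last sample sits among class A
print(loo_suspect_labels(x, y))  # → [6]
```

Holding each sample out before predicting it mirrors the motivation for the external LOOCV in the abstract: a sample cannot vouch for its own label.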

3.
Outlier detection and cleaning procedures were evaluated for estimating mathematically restricted variogram models from discrete insect population count data. Because variogram modeling is strongly affected by outliers, methods to detect and clean outliers from data sets are critical for proper variogram modeling. In this study, we examined spatial data in the form of discrete measurements of insect counts on a rectangular grid. Two well-known insect pest population datasets were analyzed: one was the western flower thrips, Frankliniella occidentalis (Pergande), on greenhouse cucumbers, and the other was the greenhouse whitefly, Trialeurodes vaporariorum (Westwood), on greenhouse cherry tomatoes. A spatial additive outlier model was constructed to detect outliers with both isolated and patchy spatial distributions, and the outliers were cleaned with a neighboring median cleaner. To analyze the effect of outliers, we compared the relative nugget effects of transformed data cleaned of outliers and transformed data still containing outliers. In addition, the correlation coefficients between actual and predicted values were compared by leave-one-out cross-validation for cleaned and non-cleaned data after unbiased back-transformation. The outlier detection and cleaning procedure improved the geostatistical analysis, particularly by reducing the nugget effect, which greatly affects the prediction variance of kriging. Consequently, the procedures used here improved the results of geostatistical analysis of highly skewed and strongly fluctuating data such as insect counts.
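A neighboring median cleaner can be sketched as below. The detection rule is an assumption: the abstract does not specify the spatial additive outlier model, so this sketch flags a cell when its residual from the neighborhood median is extreme on a robust (median/MAD) scale, then replaces it with that median. Helper names and the cutoff `k=3.0` are hypothetical.

```python
import statistics

def neighbors(grid, r, c):
    """Values of the up-to-8 adjacent cells (queen contiguity) around (r, c)."""
    rows, cols = len(grid), len(grid[0])
    return [grid[i][j]
            for i in range(max(0, r - 1), min(rows, r + 2))
            for j in range(max(0, c - 1), min(cols, c + 2))
            if (i, j) != (r, c)]

def clean_outliers(grid, k=3.0):
    """Replace cells far from their neighborhood median by that median."""
    rows, cols = len(grid), len(grid[0])
    medians = [[statistics.median(neighbors(grid, r, c)) for c in range(cols)]
               for r in range(rows)]
    resid = [grid[r][c] - medians[r][c] for r in range(rows) for c in range(cols)]
    center = statistics.median(resid)
    # robust scale of the residuals; fall back to 1.0 if the MAD is zero
    mad = statistics.median(abs(e - center) for e in resid) or 1.0
    cleaned, flagged = [row[:] for row in grid], []
    for r in range(rows):
        for c in range(cols):
            if abs(grid[r][c] - medians[r][c] - center) > k * 1.4826 * mad:
                cleaned[r][c] = medians[r][c]
                flagged.append((r, c))
    return cleaned, flagged

counts = [[3, 3, 3, 3],
          [3, 3, 40, 3],
          [3, 3, 3, 3]]
cleaned, flagged = clean_outliers(counts)
print(flagged)  # → [(1, 2)]
```

Using the median of neighbors rather than their mean keeps a single spike from contaminating the replacement values of its own neighbors.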

4.
Outlier detection methods were used to scan the genome of the boreal conifer black spruce (Picea mariana [Mill.] B.S.P.) for gene single-nucleotide polymorphisms (SNPs) potentially involved in adaptation to temperature and precipitation variation. The scan involved 583 SNPs from 313 genes potentially playing adaptive roles. Differentiation estimates among population groups defined by variation in temperature and precipitation were moderately high for adaptive quantitative characters such as the timing of bud set or tree height (QST = 0.189-0.314). Average differentiation estimates for gene SNPs were null, with FST values of 0.005 and 0.006 among temperature and precipitation population groups, respectively. Using two detection approaches, a total of 26 SNPs from 25 genes distributed among 11 of the 12 linkage groups of black spruce were detected as outliers, with FST as high as 0.078. Nearly half of the outlier SNPs were located in exons, and half of those were nonsynonymous. The functional annotations of genes carrying outlier SNPs and regression analyses between the frequencies of these SNPs and climatic variables supported their involvement in adaptive processes. Several genes carrying outlier SNPs belonged to gene families previously found to harbour outlier SNPs in a reproductively isolated but largely sympatric congeneric species, suggesting differential subfunctionalization of gene duplicates. Selection coefficient estimates (S) were moderate but well above the magnitude of drift (>1/Ne), indicating that the signature of natural selection could be detected at the nucleotide level despite the recent establishment of these populations during the Holocene.
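The FST values quoted above summarize how allele frequencies differ among population groups. As an illustration only (the abstract does not name its estimator), here is Nei's G_ST for one biallelic SNP, assuming equal-sized populations so that heterozygosities can be averaged without weights. Real outlier scans compare such values against a null distribution, for example one simulated under neutrality, rather than against a fixed cutoff.

```python
def fst_biallelic(freqs):
    """Nei's G_ST = (H_T - H_S) / H_T for one biallelic SNP.

    freqs: frequency of one allele in each population (equal sizes assumed).
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # monomorphic across all populations
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population value
    return (h_t - h_s) / h_t

for freqs in ([0.5, 0.5], [0.4, 0.6], [0.0, 1.0]):
    print(freqs, round(fst_biallelic(freqs), 3))
```

Identical frequencies give 0, fixed differences give 1, and modest frequency shifts give small values comparable in scale to those reported above.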

5.
In the last few years, dozens of studies have documented the detection of loci influenced by selection from genome scans in a wide range of non-model species. Many of those studies used amplified fragment length polymorphism (AFLP) markers, which became popular for being easily applicable to any organism. However, because they are anonymous markers, AFLPs impose many challenges for their isolation and identification. Most recent AFLP genome scans used capillary electrophoresis (CE), which adds even more obstacles to the isolation of bands with a specific size for sequencing. These caveats might explain the extremely low number of studies that moved from the detection of outlier AFLP markers to their actual isolation and characterization. We document our efforts to characterize a set of outlier AFLP markers from a previous genome scan with CE in ocellated lizards (Lacerta lepida). Seven outliers were successfully isolated, cloned and sequenced. Their sequences are noncoding and show internal indels or polymorphic repetitive elements (microsatellites). Three outliers were converted into codominant markers by using specific internal primers to sequence and screen population variability from undigested DNA. Amplification in closely related lizard species was also achieved, revealing remarkable interspecific conservation in outlier loci sequences. We stress the importance of following up AFLP genome scans to validate selection signatures of outlier loci, but also report the main challenges and pitfalls that may be faced during the process.

6.
The tendency for experimental and industrial variables to include a certain proportion of outliers has become the rule rather than the exception. These clusters of outliers, if left undetected, can distort the mean and the covariance matrix of Hotelling's T² multivariate control charts constructed to monitor individual quality characteristics. As a result, the control chart becomes unreliable because it exhibits masking and swamping: an out-of-control process is erroneously declared in control (masking), or an in-control process is erroneously declared out of control (swamping). To handle these problems, this article proposes a control chart based on cluster-regression adjustment for retrospective monitoring of individual quality characteristics in a multivariate setting. The performance of the proposed method is investigated through Monte Carlo simulation experiments and historical datasets. The results indicate that the proposed method improves on state-of-the-art methods in outlier detection while keeping masking and swamping rates under control.
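Masking is easy to reproduce with the classical chart. The sketch below computes the ordinary phase I Hotelling's T² for individual bivariate observations (not the article's cluster-regression-adjusted chart; data are invented). A single outlier receives a large T², but when four identical outliers are present they pull the mean and inflate the covariance, so each one's T² drops sharply.

```python
def hotelling_t2(data):
    """T^2 = (x - mean)' S^{-1} (x - mean) for each bivariate observation."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    # closed-form inverse of the 2x2 sample covariance matrix
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    return [ixx * (x - mx) ** 2 + 2 * ixy * (x - mx) * (y - my) + iyy * (y - my) ** 2
            for x, y in data]

base = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (1.1, 2.2), (1.0, 1.9),
        (0.8, 2.0), (1.3, 2.3), (1.0, 2.1), (1.1, 1.9), (0.9, 2.0)]
one = hotelling_t2(base + [(3.0, 1.0)])          # a single outlier
cluster = hotelling_t2(base + [(3.0, 1.0)] * 4)  # a cluster of identical outliers
print("single outlier T2:", round(one[-1], 2))
print("clustered outlier T2:", round(cluster[-1], 2))
```

A useful check: with the (n - 1) divisor, the T² values of all n observations always sum to (n - 1) times the number of variables, so a cluster of outliers has to share a fixed budget with the in-control points.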

7.

Aim

Species distribution data play a pivotal role in the study of ecology, evolution, biogeography and biodiversity conservation. Although large amounts of location data are available and accessible from public databases, data quality remains problematic. Of the potential sources of error, positional errors are critical for spatial applications, particularly where these errors place observations beyond the environmental or geographical range of species. These outliers need to be identified, checked and removed to improve data quality and minimize the impact on subsequent analyses. Manually checking all species records within large multispecies datasets is prohibitively costly. This work investigates algorithms that may assist in the efficient vetting of outliers in such large datasets.

Location

We used real, spatially explicit environmental data derived from the western part of Victoria, Australia, and simulated species distributions within this same region.

Methods

By adapting species distribution modelling (SDM), we developed a pseudo‐SDM approach for detecting outliers in species distribution data, which was implemented with random forest (RF) and support vector machine (SVM), resulting in two new methods: RF_pdSDM and SVM_pdSDM. Using virtual species, we compared these two new methods with eight existing multivariate outlier detection methods under various conditions.

Results

The two new methods based on the pseudo‐SDM approach had higher true skill statistic (TSS) values than other approaches, with TSS values always exceeding 0. More than 70% of the true outliers in datasets for species with a low and intermediate prevalence can be identified by checking 10% of the data points with the highest outlier scores.

Main conclusions

Pseudo‐SDM‐based methods were more effective than other outlier detection methods. However, this outlier detection procedure can only be considered as a screening tool, and putative outliers must be examined by experts to determine whether they are actual errors or important records within an inherently biased set of data.
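The two evaluation measures in the Results can be computed as follows: TSS (sensitivity + specificity - 1) for a yes/no outlier call, and the share of true outliers recovered when experts check only the top 10% of records by outlier score. The function names and the 20 records are hypothetical; the scores themselves would come from the RF or SVM pseudo-SDM, which is not reproduced here.

```python
def true_skill_statistic(is_outlier, flagged):
    """TSS = sensitivity + specificity - 1, from parallel lists of booleans."""
    pairs = list(zip(is_outlier, flagged))
    tp = sum(1 for t, f in pairs if t and f)
    fn = sum(1 for t, f in pairs if t and not f)
    tn = sum(1 for t, f in pairs if not t and not f)
    fp = sum(1 for t, f in pairs if not t and f)
    return tp / (tp + fn) + tn / (tn + fp) - 1

def recall_at_fraction(is_outlier, scores, fraction=0.10):
    """Share of all true outliers found when checking only the top-scoring fraction."""
    k = max(1, round(fraction * len(scores)))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(1 for i in top if is_outlier[i]) / sum(1 for t in is_outlier if t)

# 20 hypothetical records; the first two are true outliers.
truth = [True, True] + [False] * 18
scores = [0.9, 0.2, 0.5] + [0.1] * 17
flagged = [s >= 0.5 for s in scores]
print(true_skill_statistic(truth, flagged))  # sensitivity 1/2, specificity 17/18
print(recall_at_fraction(truth, scores))     # the top 2 records hold 1 of 2 outliers
```

TSS ranges from -1 to 1, and a value above 0 means the method does better than random flagging, which is why "always exceeding 0" is a meaningful floor.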

8.
BACKGROUND: Mass spectrometry (MS) data are often generated from various biological or chemical experiments, and there may be outlying observations that are extreme for technical reasons. Determining outlying observations is important in the analysis of replicated MS data because elaborate pre-processing is essential for successful analysis with reliable results, and manual outlier detection, as one of the pre-processing steps, is time-consuming. Heterogeneous variability and low replication are frequent obstacles to successful analysis, including outlier detection. Existing approaches, which assume constant variability, can generate many false positives (outliers) and/or false negatives (non-outliers). Thus, a more powerful and accurate approach is needed to account for heterogeneous variability and low replication. FINDINGS: We proposed an outlier detection algorithm using projection and quantile regression for MS data from multiple experiments. The performance of the algorithm and program was demonstrated using both simulated and real-life data. The projection approach with linear, nonlinear, or nonparametric quantile regression was appropriate for heterogeneous high-throughput data with low replication. CONCLUSION: Various quantile regression approaches combined with projection were proposed for detecting outliers. The choice among linear, nonlinear, and nonparametric regression depends on the degree of heterogeneity of the data. The proposed approach was illustrated with MS data with two or more replicates.
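For two replicates, the projection idea can be sketched as a 45-degree rotation: one axis carries average intensity, the other carries replicate disagreement. The authors then model the disagreement's spread with quantile regression; this sketch substitutes a crude nonparametric stand-in (Tukey fences computed within intensity bins), so the bin count, `k=1.5`, and helper names are assumptions.

```python
import math
import statistics

def project(x, y):
    """Rotate replicate pairs by 45 degrees: a = average axis, d = difference axis."""
    a = [(xi + yi) / math.sqrt(2) for xi, yi in zip(x, y)]
    d = [(yi - xi) / math.sqrt(2) for xi, yi in zip(x, y)]
    return a, d

def binned_fence_outliers(x, y, n_bins=4, k=1.5):
    """Flag pairs whose difference falls outside Tukey fences computed within
    bins of average intensity, so the band can widen where variance is larger."""
    a, d = project(x, y)
    order = sorted(range(len(a)), key=lambda i: a[i])
    size = math.ceil(len(order) / n_bins)
    flagged = []
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        if len(idx) < 4:
            continue  # too few points in this bin to estimate quartiles
        q1, _, q3 = statistics.quantiles([d[i] for i in idx], n=4)
        iqr = q3 - q1
        flagged += [i for i in idx if d[i] < q1 - k * iqr or d[i] > q3 + k * iqr]
    return sorted(flagged)

x = [float(v) for v in range(1, 21)]
y = [xi + 0.1 if i % 2 == 0 else xi - 0.1 for i, xi in enumerate(x)]
y[4] = 15.0  # replicate disagreement far larger than elsewhere at this intensity
print(binned_fence_outliers(x, y, n_bins=2))  # → [4]
```

Because the fences are recomputed per bin, a disagreement that is ordinary for high-intensity features can still be flagged for low-intensity ones, which is the heterogeneity a constant-variance rule misses.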

9.
Insomnia is an epidemic in the US. Neurofeedback (NFB) is a little-used psychophysiological treatment with demonstrated usefulness for treating insomnia. Our objective was to assess whether two distinct Z-Score NFB protocols, a modified sensorimotor rhythm (SMR) protocol and a sequential, quantitative EEG (sQEEG)-guided, individually designed (IND) protocol, would alleviate sleep and associated daytime dysfunction in participants with insomnia. Both protocols used instantaneous Z scores to determine the reward condition, administered while awake. Twelve adults with insomnia, free of other mental and uncontrolled physical illnesses, were randomly assigned to the SMR or IND group. Eight completed this randomized, parallel-group, single-blind study. Both groups received fifteen 20-min sessions of Z-Score NFB. Pre-post assessments included sQEEG, mental health, quality of life, and insomnia status. ANOVA yielded significant post-treatment improvement for the combined group on all primary insomnia scores: Insomnia Severity Index (ISI p < .005), Pittsburgh Sleep Quality Index (PSQI p < .0001), PSQI Sleep Efficiency (p < .007), and Quality of Life Inventory (p < .02). Binomial tests of baseline EEGs indicated a significant proportion of excessively high levels of Delta and Beta power (p < .001), which were lowered post-treatment (paired z-tests p < .001). Baseline EEGs showed excessive sleepiness and hyperarousal, which improved post-treatment. Both Z-Score NFB groups improved in sleep and daytime functioning. Post-treatment, all participants were normal sleepers. Because there were no significant differences in findings between the two groups, our future large-scale studies will use the Z-Score SMR protocol, which is less burdensome to administer.
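The phrase "instantaneous Z scores to determine the reward condition" can be illustrated with a small sketch. The reward rule here is hypothetical: each EEG metric is standardized against a normative mean and SD, and feedback is given only while every tracked metric stays within a chosen band. The metric names, normative values, and the 1.5 SD threshold are invented, not taken from the study's protocols.

```python
def z_scores(sample, norms):
    """Standardize each EEG metric against a normative (mean, sd) pair."""
    return {name: (value - norms[name][0]) / norms[name][1] for name, value in sample.items()}

def reward(sample, norms, threshold=1.5):
    """Hypothetical rule: reward only while every tracked metric lies within +/- threshold SD."""
    return all(abs(z) <= threshold for z in z_scores(sample, norms).values())

# Invented normative values and readings, in arbitrary units.
norms = {"delta_power": (20.0, 4.0), "beta_power": (8.0, 2.0)}
too_high = {"delta_power": 30.0, "beta_power": 9.0}   # delta z = 2.5
near_norm = {"delta_power": 22.0, "beta_power": 9.0}  # both z = 0.5
print(reward(too_high, norms), reward(near_norm, norms))  # → False True
```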

10.
The identification of loci under selection (outliers) is a major challenge in evolutionary biology, being critical to understanding the evolutionary processes leading to population differentiation and speciation, and for conservation purposes, also in light of recent climate change. However, detection of selected loci can be difficult when populations are weakly differentiated. This is the case for marine fish populations, which are often characterized by high levels of gene flow and connectivity, and particularly for fish living in the Antarctic marine environment, which is characterized by a complex and strong circulation system promoting individual dispersal all around the continent. With the final aim of identifying outlier loci putatively under selection in the genus Chionodraco, we used 21 microsatellites, including both genomic (Type II) and EST-linked (Type I) loci, to investigate the genetic differentiation among the three recently derived Chionodraco species that are endemic to the freezing Antarctic waters. Neutrality tests were applied in interspecific comparisons to identify candidate loci showing high levels of genetic differentiation, which might reveal imprints of past selection. Three outlier loci were identified, showing higher differentiation between species than neutral loci. The outliers showed sequence similarity to a calmodulin gene, to an antifreeze glycoprotein/trypsinogen-like protease gene, and to nonannotated fish mRNAs. Selective pressures acting on the outlier loci identified in this study might reflect past evolutionary processes that led to species divergence and local adaptation in the genus Chionodraco. The loci used here will provide a valuable tool for future population genetic studies in Antarctic notothenioids.

11.
Sequence-based understanding and identification of protein binding interfaces is a challenging research topic due to the complexity of protein systems and the imbalanced distribution between interface and noninterface residues. This paper presents an outlier detection idea to address the redundancy problem in protein interaction data. The cleaned training data are then used to improve prediction performance. We use three novel measures to describe the extent to which a residue is considered an outlier compared with the other residues: the distance of a residue instance from the center instance of all residue instances with the same class label (Dist), the probability of the class label of the residue instance (PCL), and the importance of within-class and between-class (IWB) residue instances. Outlier scores are computed by integrating the three factors; instances with a sufficiently large score are treated as outliers and removed. The data sets without outliers are taken as input for a support vector machine (SVM) ensemble. The proposed SVM ensemble trained on input data without outliers performs better than one trained with outliers. Our method is also more accurate than many methods in the literature on benchmark data sets. From our empirical studies, we found that some outlier interface residues are truly near noninterface regions, and some outlier noninterface residues are close to interface regions.
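The score-then-remove step can be sketched as below, under stated assumptions: the abstract does not give the formulas for Dist, PCL, or IWB, so Dist is taken as the Euclidean distance to the class centroid, PCL is replaced by the share of the k nearest neighbours carrying the same label, IWB is omitted, and the two factors are combined with equal weights. The names and the 0.6 threshold are hypothetical.

```python
import math

def minmax(values):
    """Rescale values to [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def outlier_scores(x, y, k=3):
    n = len(x)
    # Dist: distance of each instance from the centroid of its own class
    cents = {}
    for label in set(y):
        rows = [r for r, lab in zip(x, y) if lab == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    dist = [math.dist(x[i], cents[y[i]]) for i in range(n)]
    # PCL stand-in: share of the k nearest neighbours carrying the same label
    pcl = []
    for i in range(n):
        nn = sorted((j for j in range(n) if j != i), key=lambda j: math.dist(x[i], x[j]))[:k]
        pcl.append(sum(y[j] == y[i] for j in nn) / k)
    # equal-weight combination of the two factors, both on a 0..1 scale
    return [(d + (1 - p)) / 2 for d, p in zip(minmax(dist), pcl)]

def remove_outliers(x, y, threshold=0.6, k=3):
    """Keep only instances whose combined score is below the threshold."""
    scores = outlier_scores(x, y, k)
    keep = [i for i, s in enumerate(scores) if s < threshold]
    return [x[i] for i in keep], [y[i] for i in keep]

x = [[0, 0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],
     [5, 5], [5.2, 5.1], [4.9, 5.2], [5.1, 4.8], [0.15, 0.15]]
y = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]  # the last "B" sits inside cluster A
scores = outlier_scores(x, y)
kept_x, kept_y = remove_outliers(x, y)
print([round(s, 2) for s in scores])
print(len(kept_x), "instances kept")
```

The cleaned `kept_x, kept_y` would then be the training input for the classifier, which in the paper is an SVM ensemble.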

12.
The present study investigated the genetic diversity, population structure, FST outliers, and extent and pattern of linkage disequilibrium in five populations of Keteleeria davidiana var. formosana, which is listed as a critically endangered species by the Council of Agriculture, Taiwan. Twelve amplified fragment length polymorphism primer pairs generated a total of 465 markers, of which 83.74% on average were polymorphic across populations, with a mean Nei's genetic diversity of 0.233 and a low level of genetic differentiation (approximately 6%) based on the total dataset. Linkage disequilibrium and HICKORY analyses suggested recent population bottlenecks and inbreeding in K. davidiana var. formosana. Both STRUCTURE and BAPS showed extensive admixture of individual genotypes among populations based on the total dataset in various clustering scenarios, which probably resulted from incomplete lineage sorting of ancestral variation rather than a high rate of recent gene flow. Our outlier analysis revealed generally high levels of genetic differentiation and suggests that divergent selection arising from environmental variation has been driven by differences in temperature, precipitation, and humidity. The identification of ecologically associated outliers among environmentally disparate populations further supports divergent selection and potential local adaptation.
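Because AFLP bands are dominant, allele frequencies behind a figure like the mean Nei's genetic diversity of 0.233 must be inferred from band presence. A minimal sketch using the simple square-root estimator, which assumes Hardy-Weinberg equilibrium (an assumption the abstract's evidence of inbreeding may strain; bias-corrected estimators exist, and the study's exact estimator is not stated). Helper names and data are hypothetical.

```python
import math

def null_allele_freq(band_presence):
    """Square-root estimate: q = sqrt(share of individuals lacking the band), assuming HWE."""
    absent = band_presence.count(0) / len(band_presence)
    return math.sqrt(absent)

def nei_gene_diversity(band_matrix):
    """Mean over loci of h = 1 - p^2 - q^2; each row holds 0/1 band scores for one locus."""
    hs = []
    for locus in band_matrix:
        q = null_allele_freq(locus)
        p = 1 - q
        hs.append(1 - p ** 2 - q ** 2)
    return sum(hs) / len(hs)

bands = [[1, 1, 1, 0],   # band absent in 25% of individuals -> q = 0.5, h = 0.5
         [1, 1, 1, 1]]   # monomorphic locus -> h = 0
print(nei_gene_diversity(bands))  # → 0.25
```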

13.
14.
15.
Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using an inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions were significantly enriched in behavioral ontology categories. The third top hit, CCRN4L, plays a major role in lipid metabolism, a role supported by additional metabolism-related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method with an empirical outlier approach that does not directly account for demography, we found only modest overlap between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower false discovery rates. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers.
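One common way to turn a demographic model into an FDR is to simulate neutral regions under that model and compare how many simulated versus observed regions exceed a statistic threshold. The sketch below assumes the simulated neutral statistics are already available (for example from a coalescent simulator, not shown); function names and numbers are hypothetical, not the paper's pipeline.

```python
def empirical_fdr(observed, neutral_sims, threshold):
    """Estimated FDR at a threshold: expected neutral exceedances / observed exceedances.

    The neutral exceedance count is rescaled to the number of observed regions.
    """
    n_obs = sum(s >= threshold for s in observed)
    if n_obs == 0:
        return 0.0
    expected_false = sum(s >= threshold for s in neutral_sims) * len(observed) / len(neutral_sims)
    return min(1.0, expected_false / n_obs)

def threshold_for_fdr(observed, neutral_sims, target=0.05):
    """Smallest observed value usable as a cutoff with estimated FDR <= target."""
    for t in sorted(set(observed)):
        if empirical_fdr(observed, neutral_sims, t) <= target:
            return t
    return None

observed = list(range(1, 11))            # statistic for 10 genomic regions
neutral_sims = [1, 2, 3, 4, 5] * 2       # statistics simulated without selection
print(empirical_fdr(observed, neutral_sims, 5))   # 2 expected false / 6 observed
print(threshold_for_fdr(observed, neutral_sims))  # → 6
```

An empirical outlier approach would instead take, say, the top few percent of observed regions, with no estimate of how many of them the demography alone could produce.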

16.
Emi Tanaka. Biometrics, 2020, 76(4): 1374-1382
The aim of plant breeding trials is often to identify crop varieties that are well adapted to target environments. These varieties are identified through genomic prediction from the analysis of multi-environment field trials (METs) using linear mixed models. Outliers are common in METs and are known to adversely affect the accuracy of genomic prediction, yet outlier detection is often neglected. There are several reasons for this. First, complex data such as METs give rise to distinct levels of residuals (e.g., at the trial level or the individual observation level), which poses additional challenges for an outlier detection method. Second, many linear mixed model software packages that cater for the complex variance structures needed in the analysis of METs are not well streamlined for diagnostics by practitioners. We demonstrate outlier detection methods that are simple to implement in any linear mixed model software package and computationally fast. Although these methods are not optimal for outlier detection, they offer practical value through their ease of application in the analysis pipeline of regularly collected data. They are demonstrated using simulations based on two real bread wheat yield METs. In particular, models that analyze yield trials either independently or jointly (thus borrowing strength across trials) are considered. Case studies are presented to highlight the benefit of joint analysis for outlier detection.
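In the spirit of "simple to implement in any software", one option is to take residuals from an already-fitted model and standardize them within each trial using the median and MAD, so that a noisy trial does not set the scale for a quiet one. This is a sketch under assumptions (the cutoff 3.5 and the helper name are hypothetical) and not necessarily one of the methods demonstrated in the paper.

```python
import statistics

def flag_residual_outliers(residuals_by_trial, cutoff=3.5):
    """Flag (trial, index) pairs whose residual is extreme relative to its own trial.

    Residuals are assumed to come from an already-fitted model; each trial is
    scaled separately with its median and MAD (x 1.4826 for normal consistency).
    """
    flagged = []
    for trial, resid in residuals_by_trial.items():
        med = statistics.median(resid)
        mad = statistics.median(abs(r - med) for r in resid) * 1.4826
        if mad == 0:
            continue  # no spread to scale by
        flagged += [(trial, i) for i, r in enumerate(resid) if abs(r - med) / mad > cutoff]
    return flagged

residuals = {"T1": [0.1, -0.2, 0.0, 0.15, -0.1, 3.0],
             "T2": [1.0, -1.2, 0.8, -0.9, 1.1, -1.0]}
print(flag_residual_outliers(residuals))  # → [('T1', 5)]
```

A residual of 3.0 stands out within the quiet trial T1, but on T2's wider scale it would fall inside the cutoff, which is why residual levels matter in METs.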

17.
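Effects of high-temperature stress on leaf physiological characteristics of jojoba (Simmondsia chinensis) from different provenances   Cited: 6 (self-citations: 0; by others: 6)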
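An artificial climate chamber was used to simulate high-temperature conditions in order to study the effects of different intensities of heat treatment on leaf relative water content (LRWC), leaf photosynthetic characteristics, osmotic adjustment, antioxidant protective enzymes, and membrane lipid peroxidation in jojoba from three provenances (Z1: Kehe, Huidong; Z2: 肯多伯冷, Australia; Z3: Phoenix, USA). The results showed that LRWC, net photosynthetic rate (Pn), and transpiration rate (Tr) of seedlings from all three provenances declined highly significantly as heat stress intensified, and chlorophyll (Chl) and soluble sugar contents decreased significantly; proline (Pro) content rose highly significantly; and malondialdehyde (MDA) content and relative electrical conductivity (REC) both increased significantly with increasing stress intensity. The effects of heat stress on leaf peroxidase (POD) and superoxide dismutase (SOD) activities differed among provenances: POD activity in Z1 and Z2 rose continuously with increasing stress, whereas in Z3 it first rose and then fell; SOD activity in Z1 and Z3 rose continuously, whereas in Z2 it first fell and then rose. A comprehensive evaluation of seedling heat tolerance using the membership function method ranked the provenances from most to least tolerant as Z1, Z3, and Z2. These results suggest that Z1, which has long grown in the dry-hot valley of the Jinsha River, may have adapted to the local environment and therefore resists high temperature better. The differences in heat tolerance among provenances may have arisen from the combined action of climate, soil, altitude, and other factors at each place of origin over a long evolutionary history.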
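The membership function method combines indicators measured on different scales into one tolerance score. In its commonly used form, each indicator is rescaled to U = (X - Xmin) / (Xmax - Xmin) across the materials compared, indicators where higher values mean more damage (such as MDA or REC) use 1 - U, and the mean U ranks the materials. The sketch below assumes that form with equal weights; the authors' exact indicator set and any weighting are not given, and the numbers are invented, not measurements from the study.

```python
def membership_ranking(data, negative=()):
    """Rank materials by mean membership-function value.

    data maps material -> {indicator: value}; indicators in `negative` are those
    where larger values indicate more damage, so their membership is inverted.
    """
    indicators = list(next(iter(data.values())).keys())
    means = {}
    for name in data:
        total = 0.0
        for ind in indicators:
            values = [data[m][ind] for m in data]
            lo, hi = min(values), max(values)
            u = 0.0 if hi == lo else (data[name][ind] - lo) / (hi - lo)
            total += 1 - u if ind in negative else u
        means[name] = total / len(indicators)
    return sorted(means, key=means.get, reverse=True), means

# Invented values for illustration only.
data = {"Z1": {"LRWC": 80.0, "Pn": 6.0, "MDA": 20.0},
        "Z2": {"LRWC": 70.0, "Pn": 4.0, "MDA": 30.0},
        "Z3": {"LRWC": 75.0, "Pn": 5.0, "MDA": 25.0}}
order, means = membership_ranking(data, negative={"MDA"})
print(order, means)
```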

18.
Paris M, Despres L. Molecular Ecology, 2012, 21(7): 1672-1686
AFLP‐based genome scans are widely used to study the genetics of adaptation and to identify genomic regions potentially under selection. However, this approach usually fails to detect the actual genes or mutations targeted by selection owing to the difficulty of obtaining DNA sequences from AFLP fragments. Here, we combine classical AFLP outlier detection with 454 sequencing of AFLP fragments to obtain sequences from outlier loci. We applied this approach to the study of resistance to Bacillus thuringiensis israelensis (Bti) toxins in the dengue vector Aedes aegypti. A genome scan of Bti‐resistant and Bti‐susceptible A. aegypti laboratory strains was performed based on 432 AFLP markers. Fourteen outliers were detected using two different population genetic algorithms. Of these, 11 were successfully sequenced. Three contained transposable element (TE) sequences, and the 10 outliers that could be mapped to a unique location in the reference genome were located on different supercontigs. One outlier was in the vicinity of a gene coding for an aminopeptidase potentially involved in Bti toxin binding. Patterns of sequence variability of this gene showed significant deviation from neutrality in the resistant strain but not in the susceptible strain, even after taking into account the known demographic history of the selected strain. This gene is a promising candidate for future functional analysis.
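The abstract reports a deviation from neutrality in sequence variability but does not name the test. Tajima's D is one widely used neutrality statistic: it contrasts mean pairwise differences (pi) with the number of segregating sites (S) scaled by a1. The sketch below is the standard formula for aligned sequences without gaps; it does not model demographic history, which the authors explicitly accounted for, so it is illustrative only.

```python
import math
from itertools import combinations

def tajimas_d(seqs):
    """Tajima's D from aligned, equal-length sequences (no gaps or missing data assumed)."""
    n = len(seqs)
    seg_sites = sum(len(set(col)) > 1 for col in zip(*seqs))
    if seg_sites == 0:
        return 0.0  # no variation: D is undefined, report 0 by convention here
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - seg_sites / a1) / math.sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))

balanced = ["AAAA", "AAAT", "AATA", "AATT"]            # intermediate-frequency variants
singletons = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]  # excess of rare variants
print(round(tajimas_d(balanced), 3), round(tajimas_d(singletons), 3))
```

Positive D (excess intermediate-frequency variants) is compatible with balancing selection or population contraction, and negative D (excess rare variants) with a selective sweep or expansion, which is why demography must be ruled out before inferring selection.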

19.
Identifying local adaptation is crucial in conservation biology to define ecotypes and establish management guidelines. Local adaptation is often inferred from the detection of loci showing a high differentiation between populations, the so‐called FST outliers. Methods of detection of loci under selection are reputed to be robust in most spatial population models. However, using simulations we showed that FST outlier tests produced a high rate of false positives (up to 60%) in fractal environments such as river networks. Surprisingly, the number of sampled demes was correlated with parameters of population genetic structure, such as the variance of FSTs, and hence strongly influenced the rate of outliers. This unappreciated property of river networks therefore needs to be accounted for in genetic studies on adaptation and conservation of river organisms.

20.
Recent studies in streams and ponds have demonstrated that the distribution and biomass of aquatic organisms can be estimated by detection and quantification of environmental DNA (eDNA). In more open systems such as seas, it is not evident whether eDNA can represent the distribution and biomass of aquatic organisms, because various environmental factors (e.g., water flow) are expected to affect eDNA distribution and concentration. To test the relationship between the distribution of fish and eDNA, we conducted a grid survey in Maizuru Bay, Sea of Japan, and sampled surface and bottom waters while monitoring the biomass of the Japanese jack mackerel (Trachurus japonicus) using echo sounder technology. A linear model showed a high R² value (0.665) without outlier data points, and the association between estimated eDNA concentrations from the surface water samples and echo intensity was significantly positive, suggesting that the estimated spatial variation in eDNA concentration can reflect the local biomass of the jack mackerel. We also found that the best-fit model included echo intensity obtained within 10–150 m of the water sampling sites, indicating that the estimated eDNA concentration most likely reflects fish biomass within 150 m in the bay. Although eDNA from a wholesale fish market partially affected eDNA concentration, we conclude that eDNA generally provides a ‘snapshot’ of fish distribution and biomass over a large area. Further studies that take into account the dynamics of eDNA under field conditions (e.g., patterns of release, degradation, and diffusion of eDNA) will provide better estimates of fish distribution and biomass based on eDNA.
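The R² of 0.665 is the share of variance in one variable explained by a straight-line fit to the other, and a single extreme point can change it substantially, which is why the abstract reports it without outliers. A minimal ordinary-least-squares sketch with invented numbers (the paper's model specification, transformations, and the 10–150 m buffer selection are not reproduced):

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot

# Invented echo-intensity (x) and eDNA-concentration (y) pairs; the last is an outlier.
echo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
edna = [2.0, 4.0, 6.0, 8.0, 10.0, 30.0]
_, _, r2_all = fit_line(echo, edna)
_, _, r2_clean = fit_line(echo[:-1], edna[:-1])
print(round(r2_all, 3), round(r2_clean, 3))
```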
