Similar literature
Found 10 similar articles; search took 203 ms
1.
《Genomics》2019,111(6):1946-1955
Feature selection is the problem of finding the subset of features that has the greatest impact on predicting class labels. It is especially valuable for high-dimensional datasets. In this paper, a filter feature selection method is proposed for high-dimensional binary medical datasets: Colon, Central Nervous System (CNS), GLI_85, and SMK_CAN_187. The proposed method has three stages. First, the whale algorithm is used to discard irrelevant features. Second, the remaining features are ranked using a frequency-based heuristic called Mutual Congestion. Third, majority voting is applied to the best feature subsets constructed using forward feature selection with threshold τ = 10. This work provides evidence that Mutual Congestion alone is a powerful predictor of class labels. Furthermore, applying the whale algorithm increases the overall accuracy of Mutual Congestion in most cases. The findings also show that the proposed method improves prediction while selecting fewer features than state-of-the-art methods. https://github.com/hnematzadeh

2.
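Application of a genetic optimization algorithm to gene data classification   Total citations: 1, self-citations: 0, citations by others: 1
This paper proposes a genetic-algorithm-based feature extraction method for gene microarray data. The raw data are first standardized, analysis of variance is then used to reduce the dimensionality of the data, and finally a genetic algorithm is used to optimize the data. The genetic operators and fitness function are configured for gene data, and feature genes are selected from the optimized dataset to obtain a small feature subset. To validate the selected features, a classifier built by discriminant analysis under a sample-partitioning scheme is used for evaluation. The experiments show that the method achieves good classification performance and that the algorithm is stable and efficient.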

3.
Given the pace at which human-induced environmental changes occur, a pressing challenge is to determine the speed with which selection can drive evolutionary change. A key determinant of adaptive response to multivariate phenotypic selection is the additive genetic variance–covariance matrix (G). Yet knowledge of G in a population experiencing new or altered selection is not sufficient to predict selection response because G itself evolves in ways that are poorly understood. We experimentally evaluated changes in G when closely related behavioural traits experience continuous directional selection. We applied the genetic covariance tensor approach to a large dataset (n = 17 328 individuals) from a replicated, 31-generation artificial selection experiment that bred mice for voluntary wheel running on days 5 and 6 of a 6-day test. Selection on this subset of G induced proportional changes across the matrix for all 6 days of running behaviour within the first four generations. The changes in G induced by selection resulted in a fourfold slower-than-predicted rate of response to selection. Thus, selection exacerbated constraints within G and limited future adaptive response, a phenomenon that could have profound consequences for populations facing rapid environmental change.

4.
The land surface phenology (LSP) start of season (SOS) metric signals the seasonal onset of vegetation activity, including canopy growth and associated increases in land-atmosphere water, energy and carbon (CO2) exchanges influencing weather and climate variability. The vegetation optical depth (VOD) parameter determined from satellite passive microwave remote sensing provides for global LSP monitoring that is sensitive to changes in vegetation canopy water content and biomass, and insensitive to atmosphere and solar illumination constraints. Direct field measures of canopy water content and biomass changes desired for LSP validation are generally lacking due to the prohibitive costs of maintaining regional monitoring networks. Alternatively, a normalized microwave reflectance index (NMRI) derived from GPS base station measurements is sensitive to daily vegetation water content changes and may provide for effective microwave LSP validation. We compared multiyear (2007–2011) NMRI and satellite VOD records at over 300 GPS sites in North America, and their derived SOS metrics for a subset of 24 homogeneous land cover sites to investigate VOD and NMRI correspondence, and potential NMRI utility for LSP validation. Significant correlations (P < 0.05) were found at 276 of 305 sites (90.5%), with generally favorable correspondence in the resulting SOS metrics (r² = 0.73, P < 0.001, RMSE = 36.8 days). This study is the first attempt to compare satellite microwave LSP metrics to a GPS-network-derived reflectance index and highlights both the utility and limitations of the NMRI data for LSP validation, including spatial scale discrepancies between local NMRI measurements and relatively coarse satellite VOD retrievals.
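The agreement statistics quoted in this abstract (r² and RMSE between two sets of SOS dates) can be illustrated with a minimal sketch. The function name and the site values below are hypothetical and chosen only for illustration; the abstract does not describe the authors' actual computation, so this shows only the standard definitions of the Pearson r² and the root-mean-square error.

```python
import math

def r_squared_and_rmse(x, y):
    """Return (r^2, RMSE) for two equal-length sequences of SOS dates (day of year)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    r = sxy / math.sqrt(sxx * syy)
    # RMSE measures the typical absolute disagreement in days.
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)
    return r * r, rmse

# Hypothetical SOS dates (day of year) at five sites, for illustration only.
vod_sos = [100.0, 110.0, 120.0, 130.0, 140.0]
nmri_sos = [110.0, 120.0, 130.0, 140.0, 150.0]
r2, rmse = r_squared_and_rmse(vod_sos, nmri_sos)
print(f"r^2 = {r2:.2f}, RMSE = {rmse:.1f} days")
```

The two statistics answer different questions: r² captures how well one series tracks the other, while RMSE captures the size of the offset, so a constant bias between the two series leaves r² high but still shows up in the RMSE.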

5.
This paper presents an adaptive statistical test for QRS detection in electrocardiography (ECG) signals. The method is based on an M-ary generalized likelihood ratio test (LRT) defined over a multiple observation window in the Fourier domain. The motivation for proposing another detection algorithm based on maximum a posteriori (MAP) estimation lies in the high complexity of the signal models proposed in previous approaches, which (i) makes them computationally infeasible or unsuited to real-time applications such as intensive care monitoring and (ii) makes overall performance dependent on parameter selection. We therefore propose an alternative model based on the independent Gaussian properties of the Discrete Fourier Transform (DFT) coefficients, which allows a simplified MAP probability function to be defined. In addition, the proposed approach defines an adaptive MAP statistical test in which a global hypothesis is defined on particular hypotheses of the multiple observation window. The observation interval is thus modeled as a discontinuous-transmission discrete-time stochastic process, avoiding the inclusion of parameters that constrain the morphology of the QRS complexes.

6.
Risk factors in patients with a particular disease can be identified in clinical data sets by using the feature selection procedures of pattern recognition and data mining methods. The applicability of the relaxed linear separability (RLS) method of feature subset selection was checked for high-dimensional, mixed-type (genetic and phenotypic) clinical data of patients with end-stage renal disease. The RLS method allowed for substantial reduction of the dimensionality through omitting redundant features while maintaining the linear separability of data sets of patients with high and low levels of an inflammatory biomarker. The synergy between genetic and phenotypic features in differentiating between these two subgroups was demonstrated.

7.
The large size of the sorghum [Sorghum bicolor (L.) Moench] landrace collection maintained by ICRISAT led to the establishment of a core collection. Three subsets of around 200 accessions each were established from: (1) a random sampling after stratification of the entire landrace collection (L), (2) a selective sampling based on quantitative characters (PCS), and (3) a selection based on the geographical origin of landraces and the traits under farmers’ selection (T). The genetic diversity retained by each sampling strategy was assessed using the polymorphisms at 15 microsatellite loci. The landraces of each subset were genotyped with three multiplex polymerase chain reactions (PCRs) of five fluorescent primer pairs each, with semi-automated allele sizing. The average allelic richness for each subset was equivalent (16.1, 16.3 and 15.4 alleles per locus for the subsets PCS, L, and T, respectively). The average genetic diversity was also comparable for the three subsets (0.81, 0.77 and 0.80 for the subsets PCS, L, and T, respectively). Allelic frequency distributions for the subsets were compared with a chi-square test, but few significant differences were observed. A high percentage of rare alleles (71 to 76% of 206 total rare alleles) was maintained in the three subsets. The global molecular diversity retained in each subset was not affected by a sampling procedure based upon phenotypic characters.
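The two per-locus summaries reported in this abstract (alleles per locus and genetic diversity) can be sketched as follows. The abstract does not state which diversity estimator was used; this sketch assumes the common definition of gene diversity, 1 minus the sum of squared allele frequencies, without any small-sample correction. The function names and the allele sizes are hypothetical.

```python
from collections import Counter

def allele_count(alleles):
    """Number of distinct alleles observed at one locus."""
    return len(set(alleles))

def gene_diversity(alleles):
    """Gene diversity at one locus: 1 - sum of squared allele frequencies.

    Assumption: the uncorrected estimator; the source abstract does not say
    which estimator the authors used.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def mean_over_loci(per_locus_alleles, statistic):
    """Average a per-locus statistic over all loci."""
    values = [statistic(alleles) for alleles in per_locus_alleles]
    return sum(values) / len(values)

# Hypothetical allele sizes (bp) scored at two microsatellite loci.
loci = [
    [150, 150, 152, 154],
    [201, 203, 203, 203],
]
print(mean_over_loci(loci, allele_count))
print(mean_over_loci(loci, gene_diversity))
```

Computing both statistics for each subset and comparing the averages mirrors the kind of comparison the abstract reports for the PCS, L, and T subsets.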

8.
A computer algorithm is presented which allows selection of a subset of multiplex markers based on the minimisation of an optimality criterion for a genetic linkage map. It could be applied for choosing a subset of primers (e.g. RAPD, IMA or AFLP), each of which provides several unevenly spaced genetic markers. The goal is to achieve a saturated map of evenly spaced markers, using as few primers as possible to minimise cost and labour. Minimising the average map distance between markers is trivial, but simply leads to selection of those primers which provide the greatest number of markers. However, minimising the standard deviation of interval length ensures that weight is given both to the number of markers and to the evenness of their distribution on the linkage map. This criterion was found empirically to give a result fairly close to the optimum. A stepwise-like selection procedure is therefore implemented, which stops when the optimality criterion no longer decreases. An example is given of a molecular map of perennial ryegrass with 463 markers obtained from 17 AFLP primers. It is demonstrated that this can be safely reduced to a 175-marker map with only 6 primers. Genetic diversity studies may also benefit from using such a subset of less-redundant markers in genetic distance estimation. Received: 17 March 1999 / Accepted: 23 August 1999
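The stepwise procedure described in this abstract can be sketched as a greedy forward selection. This is a simplified illustration under stated assumptions, not the authors' implementation: it treats the map as a single linkage group, it counts the gaps from the map ends to the first and last markers as intervals, and it uses the population standard deviation. The names `interval_sd` and `select_primers` and the marker positions are hypothetical.

```python
import statistics

def interval_sd(positions, map_length):
    """Standard deviation of interval lengths between adjacent points on the map.

    Assumption: the map ends (0 and map_length) count as interval boundaries.
    """
    points = [0.0] + sorted(positions) + [map_length]
    gaps = [b - a for a, b in zip(points, points[1:])]
    return statistics.pstdev(gaps)

def select_primers(primer_markers, map_length):
    """Greedily add the primer that most reduces the criterion; stop when none does."""
    remaining = {name: list(markers) for name, markers in primer_markers.items()}
    selected, positions = [], []
    best = float("inf")
    while remaining:
        # Score every candidate primer when added to the current selection.
        name, score = min(
            ((p, interval_sd(positions + m, map_length)) for p, m in remaining.items()),
            key=lambda item: item[1],
        )
        if score >= best:
            break  # the criterion no longer decreases
        best = score
        selected.append(name)
        positions += remaining.pop(name)
    return selected, best

# Hypothetical marker positions (cM) per primer on a 100 cM map.
primers = {
    "A": [10.0, 40.0, 90.0],
    "B": [20.0, 60.0, 75.0],
    "C": [10.5, 40.5, 90.5],  # nearly redundant with primer A
}
chosen, criterion = select_primers(primers, 100.0)
print(chosen, round(criterion, 3))
```

In this toy data set the primer whose markers nearly duplicate an already selected primer is left out, because adding it creates very short intervals and raises the standard deviation. A greedy search of this kind is not guaranteed to find the global optimum, which is consistent with the abstract's remark that the result is only fairly close to it.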

9.
The recent development of lightweight GPS collars has enabled medium-to-small sized animals to be tracked via GPS telemetry. Evaluation of the performance and accuracy of GPS collars is largely confined to devices designed for large animals for deployment in natural environments. This study aimed to assess the performance of lightweight GPS collars within a suburban environment, which may differ from natural environments in ways that are relevant to satellite signal acquisition. We assessed the effects of vegetation complexity, sky availability (percentage of clear sky not obstructed by natural or artificial features of the environment), proximity to buildings, and satellite geometry on fix success rate (FSR) and location error (LE) for lightweight GPS collars within a suburban environment. Sky availability had the largest effect on FSR, while LE was influenced by sky availability, vegetation complexity, and HDOP (Horizontal Dilution of Precision). Despite the complexity and modified nature of suburban areas, values for FSR (90.6%) and LE (30.1 m) obtained within the suburban environment are comparable to those from previous evaluations of GPS collars designed for larger animals and within less built-up environments. Due to the fine-scale patchiness of habitat within urban environments, it is recommended that resource selection methods that are not reliant on buffer sizes be utilised for selection studies.

10.
In order to classify real and pseudo human precursor microRNA (pre-miRNA) hairpins with ab initio methods, numerous features are extracted from the primary sequence and secondary structure of pre-miRNAs. However, these include some redundant and useless features. It is essential to select the most representative feature subset, which helps improve the classification accuracy. We propose a novel feature selection method based on a genetic algorithm, according to the characteristics of human pre-miRNAs. The information gain of a feature, the feature conservation relative to stem parts of the pre-miRNA, and the redundancy among features are all considered. Feature conservation is introduced here for the first time. Experimental results were validated by cross-validation using datasets composed of human real and pseudo pre-miRNAs. Compared with microPred, our classifier, miPredGA, achieved more reliable sensitivity and specificity. The accuracy was improved by nearly 12%. The feature selection algorithm is useful for constructing more efficient classifiers for identifying real human pre-miRNAs among pseudo hairpins.
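Of the three quantities this abstract says are considered, information gain has a standard definition that can be sketched briefly. The sketch below assumes features that have already been discretized into categories; how the authors discretize features, and how they combine information gain with feature conservation and redundancy inside the genetic algorithm, is not stated in the abstract. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical toy data: four hairpins with a real/pseudo label.
labels = ["real", "real", "pseudo", "pseudo"]
informative = ["high", "high", "low", "low"]    # separates the classes
uninformative = ["high", "low", "high", "low"]  # independent of the class
print(information_gain(informative, labels))
print(information_gain(uninformative, labels))
```

A feature that separates the two classes perfectly recovers the full label entropy, while a feature that is independent of the class gains nothing, which is why information gain is a natural ingredient for ranking candidate features.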
