Similar Literature
20 similar records found (search time: 31 ms)
1.
IRBM 2022, 43(2): 107-113
Background and objective: An important task in a motor imagery brain-computer interface (BCI) is to extract effective time-domain, frequency-domain, or time-frequency features from the raw electroencephalogram (EEG) signals for classification of motor imagery. However, choosing an appropriate way to combine time-domain and frequency-domain features to improve recognition performance remains an open research question. Methods: To fully extract and exploit the time-domain and frequency-domain features of EEG in classification tasks, this paper proposes a novel dual-stream convolutional neural network (DCNN) that takes the time-domain and frequency-domain signals as inputs; the extracted time-domain and frequency-domain features are fused by linear weighting for classification training, and the fusion weight is learned automatically by the DCNN. Results: Experiments on BCI Competition II dataset III and BCI Competition IV dataset 2a showed that the proposed model outperforms other conventional methods. The model taking time-frequency signals as inputs performed better than models using only time-domain or only frequency-domain signals, and classification accuracy improved for every subject compared with the single-input models. Conclusions: Further analysis showed that the fusion weights are subject-specific; adjusting the weight coefficients automatically helps improve classification accuracy.
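A minimal PyTorch sketch of such a dual-stream network with a learnable scalar fusion weight; the layer sizes, input shapes, and toy data are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class DualStreamCNN(nn.Module):
    """Sketch of a dual-stream CNN: one 1-D conv branch per domain,
    features fused by a learnable linear weight before classification."""
    def __init__(self, n_channels=3, n_classes=2, feat_dim=64):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv1d(n_channels, 16, kernel_size=7, padding=3),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
                nn.Flatten(),
                nn.Linear(16, feat_dim),
            )
        self.time_branch = branch()      # raw EEG samples
        self.freq_branch = branch()      # e.g. per-channel spectra
        # fusion weight is a learnable parameter, as in the paper
        self.alpha = nn.Parameter(torch.tensor(0.5))
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, x_time, x_freq):
        f_t = self.time_branch(x_time)
        f_f = self.freq_branch(x_freq)
        fused = self.alpha * f_t + (1 - self.alpha) * f_f  # linear weighted fusion
        return self.classifier(fused)

# toy forward pass: batch of 8 trials, 3 EEG channels, 256 samples / 129 freq bins
model = DualStreamCNN()
logits = model(torch.randn(8, 3, 256), torch.randn(8, 3, 129))
print(logits.shape)  # torch.Size([8, 2])
```

Because `alpha` is an `nn.Parameter`, backpropagation tunes the time/frequency balance during training, mirroring the automatically learned fusion weight described above.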

2.
3.
Automatic text categorization is one of the key techniques in information retrieval and data mining. Classification is usually time-consuming when the training dataset is large and high-dimensional. Many methods have been proposed to solve this problem, but few achieve satisfactory efficiency. In this paper, we present a method that combines the Latent Dirichlet Allocation (LDA) algorithm and the Support Vector Machine (SVM). LDA is first used to generate a reduced-dimensional representation of topics as features in the vector space model; it reduces the number of features dramatically while keeping the necessary semantic information. The SVM is then employed to classify the data based on the generated features. We evaluate the algorithm on the 20 Newsgroups and Reuters-21578 datasets. The experimental results show that classification based on our proposed LDA+SVM model achieves high precision, recall and F1 measure, and does so within a much shorter time-frame. Our approach improves greatly upon previous work in this field and shows strong potential as a streamlined classification process for a wide range of applications.
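A compact scikit-learn version of this pipeline on 20 Newsgroups (the dataset is downloaded on first use; the topic count and vocabulary size are illustrative choices, not the paper's settings):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

# LDA compresses the bag-of-words space to a low-dimensional topic space;
# the SVM then classifies documents by their topic mixtures.
clf = make_pipeline(
    CountVectorizer(max_features=20000, stop_words="english"),
    LatentDirichletAllocation(n_components=100, random_state=0),
    LinearSVC(),
)
clf.fit(train.data, train.target)
print(classification_report(test.target, clf.predict(test.data)))
```

The LDA step maps each document from a ~20,000-dimensional bag-of-words vector to a 100-dimensional topic mixture, which is what makes the downstream SVM fast.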

4.
The development of new schemes for weighting DNA sequence data for phylogenetic analysis continues to outpace the development of consensus on the most appropriate weights. The present study is an exploration of the similarities and differences between results from 22 character weighting schemes when applied to a study of barbet and toucan (traditional avian families Capitonidae and Ramphastidae) phylogenetic relationships. The dataset comprises cytochrome b sequences for representatives of all toucan and Neotropical barbet genera, as well as for several genera of Paleotropical barbets. The 22 weighting schemes produced conflicting patterns of relationship among taxa, often with conflicting patterns each receiving strong bootstrap support. Use of multiple weighting schemes helped to identify the source within the dataset (codon position, transitions, transversions) of the various putative phylogenetic signals. Importantly, some phylogenetic hypotheses were consistently supported despite the wide range of weights employed. The use of phylogenetic frameworks to summarize the results of these multiple analyses proved very informative. Relationships among barbets and toucans inferred from these data support the paraphyly of the traditional Capitonidae. Additionally, these data support paraphyly of Neotropical barbets, but rather than indicating a relationship between Semnornis and toucans, as previously suggested by morphological data, most analyses indicate a basal position of Semnornis within the Neotropical radiation. The cytochrome b data also allow inference of relationships among toucans. Supported hypotheses include Ramphastos as the sister to all other toucans, a close relationship of Baillonius and Pteroglossus with these two genera as the sister group to an (Andigena, Selenidera) clade, and the latter four genera as a sister group to Aulacorhynchus.

5.
Lee BK, Lessler J, Stuart EA. PLoS ONE 2011, 6(3): e18174
Propensity score weighting is sensitive to model misspecification and outlying weights that can unduly influence results. The authors investigated whether trimming large weights downward can improve the performance of propensity score weighting and whether the benefits of trimming differ by propensity score estimation method. In a simulation study, the authors examined the performance of weight trimming following logistic regression, classification and regression trees (CART), boosted CART, and random forests to estimate propensity score weights. Results indicate that although misspecified logistic regression propensity score models yield increased bias and standard errors, weight trimming following logistic regression can improve the accuracy and precision of final parameter estimates. In contrast, weight trimming did not improve the performance of boosted CART and random forests. The performance of boosted CART and random forests without weight trimming was similar to the best performance obtainable by weight trimmed logistic regression estimated propensity scores. While trimming may be used to optimize propensity score weights estimated using logistic regression, the optimal level of trimming is difficult to determine. These results indicate that although trimming can improve inferences in some settings, in order to consistently improve the performance of propensity score weighting, analysts should focus on the procedures leading to the generation of weights (i.e., proper specification of the propensity score model) rather than relying on ad-hoc methods such as weight trimming.
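A short sketch of the procedure being compared here: logistic-regression propensity scores turned into inverse-probability-of-treatment weights, with large weights trimmed at a chosen percentile. The simulated data and the 99th-percentile cutoff are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def iptw_weights(X, treated, trim_percentile=99):
    """Inverse-probability-of-treatment weights with trimming:
    weights above the given percentile are truncated to that percentile."""
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    w = np.where(treated == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    cap = np.percentile(w, trim_percentile)
    return np.minimum(w, cap)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
treated = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
w = iptw_weights(X, treated)
print(w.max())  # large weights are capped at the 99th percentile
```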

6.
The evolution of the influenza A virus to increase its host range is a major concern worldwide. Molecular mechanisms for increasing host range are largely unknown. Influenza surface proteins play determining roles in the reorganization of host sialic-acid receptors and host range. In an attempt to uncover the physico-chemical attributes that govern HA subtyping, we performed a large-scale functional analysis of over 7000 sequences of 16 different HA subtypes. A large number (896) of physico-chemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine learning models that can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on the calculated protein characteristics dataset as well as on 10 trimmed datasets generated by the attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross-validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), the percentage/frequency of Tyr, the percentage of Cys, and the frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features associated with HA subtyping. The Random Forest tree induction algorithm and the RBF kernel function of SVM (scaled by grid search) showed high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which the influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption. Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represents a new avenue for understanding and predicting the possible future structure of influenza pandemics.
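As a sketch of the general approach — deriving per-sequence amino-acid attributes and feeding them to a Random Forest — the following uses only the simplest attribute family (composition frequencies); the two toy sequences and their labels are placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each of the 20 amino acids -- the simplest of the
    physico-chemical attribute families used in such studies."""
    seq = seq.upper()
    return np.array([seq.count(a) / len(seq) for a in AMINO_ACIDS])

# `sequences` and `subtypes` are assumed: HA protein strings and labels like "H5"
sequences = ["MKTIIALSYIFCLALG", "MEKIVLLFAIVSLVKS"]   # placeholder toy data
subtypes = ["H3", "H5"]

X = np.vstack([composition(s) for s in sequences])
clf = RandomForestClassifier(n_estimators=500, random_state=0)
# with a real dataset, accuracy would be estimated by 10-fold CV as in the paper:
# print(cross_val_score(clf, X, subtypes, cv=10).mean())
```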

7.
Combined with neural language models, distributed word representations achieve significant advantages in computational linguistics and text mining. Most existing models estimate distributed word vectors from large-scale data in an unsupervised fashion and thus do not take rich linguistic knowledge into consideration. Linguistic knowledge can be represented as either link-based knowledge or preference-based knowledge, and we propose knowledge regularized word representation models (KRWR) to incorporate this prior knowledge when learning distributed word representations. Experimental results demonstrate that our estimated word representations achieve better performance in the task of semantic relatedness ranking. This indicates that our method can efficiently encode both prior knowledge from knowledge bases and statistical knowledge from large-scale text corpora into a unified word representation model, which will benefit many tasks in text mining.
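One common way to realize link-based regularization (a sketch of the idea, not KRWR itself) is to add a penalty lam * sum ||w_i - w_j||^2 over linked word pairs to the embedding objective. The gradient step below shows just that penalty acting on random toy vectors:

```python
import numpy as np

def knowledge_regularizer_step(emb, links, lam=0.1, lr=0.05):
    """One gradient step on a link-based penalty lam * sum ||w_i - w_j||^2,
    which pulls embeddings of linked words together. In a full model this
    term is added to the usual (e.g. skip-gram) training objective."""
    grad = np.zeros_like(emb)
    for i, j in links:
        diff = emb[i] - emb[j]
        grad[i] += 2 * lam * diff
        grad[j] -= 2 * lam * diff
    return emb - lr * grad

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))          # 5 words, 8-dim vectors
links = [(0, 1), (1, 2)]               # prior knowledge: word pairs known to be related
for _ in range(100):
    emb = knowledge_regularizer_step(emb, links)
print(np.linalg.norm(emb[0] - emb[1]))  # linked words have moved closer
```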

8.
Summary: In this paper we present an iterative character weighting method for the construction of phyletic trees. An initial tree is used to calculate the character weights, which are based on the number of mutations, normalized so that the possible range is corrected for. The weights obtained are used to adjust the tree, and this process is iterated until a stable tree is found. Using data generated according to a model tree, we show that the trees constructed by the iterative character weighting method converge to the true underlying tree. Using biological data, the trees become closer to the systematic classification of the species concerned, and patterns conflicting with the phylogenetic pattern can be singled out. The method combines minimal length methods and similarity methods, whereby the strict parsimony criterion is relaxed.
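The iteration itself can be written down compactly. In this skeleton, `build_tree` and `steps_on_tree` are assumed helpers (any parsimony toolkit could supply them), and the range-correcting rescale is one plausible reading of the normalization described above:

```python
def iterative_character_weights(matrix, build_tree, steps_on_tree, max_iter=20):
    """Skeleton of the iterative scheme: weights derived from a tree are used
    to rebuild the tree until the topology stabilizes."""
    n_chars = len(matrix[0])
    weights = [1.0] * n_chars
    tree = build_tree(matrix, weights)
    for _ in range(max_iter):
        steps = steps_on_tree(matrix, tree)          # mutations per character
        lo, hi = min(steps), max(steps)
        # rescale so the possible range is corrected for:
        # fewer observed mutations -> higher character weight
        weights = [1.0 if hi == lo else 1 - (s - lo) / (hi - lo) for s in steps]
        new_tree = build_tree(matrix, weights)
        if new_tree == tree:                          # stable tree found
            return new_tree, weights
        tree = new_tree
    return tree, weights
```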

9.
The species-rich high-altitude vegetation of the class Carici rupestris-Kobresietea bellardii Ohba 1974 (CK), with its many arctic-alpine and endemic species, was chosen for a case study. The analyses were based on a dataset of 37,204 phytosociological relevés from the Slovak Vegetation Database. The traditional classification of the class CK, based on cluster analyses, was reproduced satisfactorily by means of formalised classification based on formal definitions created with the Cocktail method together with the frequency-positive fidelity index affiliation. Unequivocal assignment criteria were formulated for all eight associations of both alliances [Oxytropido-Elynion Br.-Bl. (1948) 1949 and Festucion versicoloris Krajina 1933] of the class CK. The formal delimitations followed the traditional ones very well. It was demonstrated that the results of applying formal definitions created on the basis of a large, geographically stratified dataset capturing the occurrence of all vegetation types in Slovakia were highly similar to the traditional classification based on cluster analysis. The reliability and the pros and cons of the expert system are also discussed.

10.
Rapid and reliable estimation of population size is needed for the efficient monitoring of animal populations of conservation concern. Unfortunately, technical advances in this area have not been matched by uptake in conservation, which may be due to difficulties in implementation or the lack of general guidelines for application. Here we tested five different methods for estimating population size [capture–mark–recapture (CMR), finite-mixture models, model averaging of finite-mixture models, accumulation curve methods (ACM), and the line transect method (LT)] using extensive capture–recapture data on the giant day gecko (Gekkonidae, Phelsuma madagascariensis grandis, Gray 1870) at the Masoala rainforest exhibit, Zurich Zoo. When the complete data were analyzed [30 sessions (27 sessions for the LT)], all methods except the LT produced similar estimates of population size. The simple ACM gave a small coefficient of variation (CV) but did not cover the most likely value of population size at moderate sampling effort. Nevertheless, the ACM was the only method that showed reasonable convergence when subsets of data were used. CMR and Pledger models included the reference value in their confidence intervals (CI) after 25 and 30 sessions, respectively. Although model averaging slightly improved the estimate, the CV was still high for the full dataset. Our method of using subsets of data to test the robustness of estimates is simple to apply and could be adopted more widely in such analyses to evaluate sensitivity to the method of estimation. In conclusion, simple accumulation methods showed efficiency similar to more complex statistical models and are likely to be sufficiently precise for most conservation monitoring purposes. Electronic supplementary material: The online version of this article (doi:) contains supplementary material, which is available to authorized users.
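For orientation, the simplest member of the CMR family is the two-session Lincoln–Petersen estimator; the sketch below uses Chapman's bias-corrected form with made-up counts (the study itself fitted far richer multi-session models):

```python
def chapman_estimate(n1, n2, m2):
    """Chapman's bias-corrected Lincoln-Petersen estimator for a two-session
    capture-mark-recapture survey: n1 animals marked in session 1, n2 caught
    in session 2, m2 of which were already marked."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

# toy example: 50 geckos marked, 60 caught later, 20 of them recaptures
print(round(chapman_estimate(50, 60, 20)))  # ~147 individuals
```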

11.
We present an application of a novel method, MUTIC (model utilization-based clustering), for identifying complex interactions between genes or gene categories based on gene expression data. The method deals with binary categorical data consisting of a set of gene expression profiles divided into two biologically meaningful categories; it does not require data from multiple time points. Gene expression profiles are represented by feature vectors whose component features are either gene expression values or averaged expression values corresponding to gene ontology or protein information resource categories. A supervised learning algorithm (genetic programming) is used to learn an ensemble of classification models distinguishing the two categories based on the feature vectors of their members. Each feature is associated with a "model utilization vector", which has an entry for each high-quality classification model found, indicating whether or not the feature was used in that model. These utilization vectors are then clustered using a variant of hierarchical clustering called Omniclust. The result is a set of model utilization-based clusters, in which features are gathered together if they are often considered together by classification models - which may be because they are co-expressed, or for subtler reasons involving multi-gene interactions. The MUTIC method is illustrated here by applying it to a dataset of gene expression in prostate cancer and control samples. Compared to traditional expression-based clustering, MUTIC yields clusters that have higher mathematical quality (in the sense of homogeneity and separation) and that also yield novel insights into the underlying biological processes.
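The clustering step is easy to sketch once the utilization matrix exists. Here a random binary matrix stands in for the real model-utilization vectors, and SciPy's average-linkage/Jaccard hierarchical clustering stands in for the paper's Omniclust variant:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# model utilization matrix: rows = features, columns = high-quality models,
# entry 1 if the feature was used by that model (random here as a stand-in)
rng = np.random.default_rng(0)
utilization = rng.integers(0, 2, size=(30, 100))

# cluster features by the similarity of their utilization vectors
Z = linkage(utilization, method="average", metric="jaccard")
labels = fcluster(Z, t=4, criterion="maxclust")
print(labels)  # features grouped when models tend to use them together
```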

12.

Background  

Microarray experiments are becoming a powerful tool for clinical diagnosis, as they have the potential to discover gene expression patterns that are characteristic of a particular disease. To date, this problem has received most attention in the context of cancer research, especially in tumor classification. Various feature selection methods and classifier design strategies have been widely used and compared. However, most published articles on tumor classification apply a single technique to a single dataset, and only recently have several researchers compared these techniques across several public datasets. It has been verified that differently selected features reflect different aspects of the dataset, and some selected features obtain better solutions for certain problems. At the same time, faced with a large amount of microarray data and little prior knowledge, it is difficult to find the intrinsic characteristics using traditional methods. In this paper, we introduce a combinational feature selection method in conjunction with ensemble neural networks to improve the accuracy and robustness of sample classification.
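One plausible reading of "combinational feature selection plus ensemble neural networks" — not necessarily the authors' exact design — is a committee in which each network sees the genes chosen by a different selection criterion:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score

# stand-in for microarray data: few samples, many features
X, y = make_classification(n_samples=100, n_features=500, n_informative=20,
                           random_state=0)

# each ensemble member uses a *different* feature selection criterion,
# so the committee covers different aspects of the dataset
members = [
    ("anova", make_pipeline(SelectKBest(f_classif, k=50),
                            MLPClassifier(max_iter=2000, random_state=0))),
    ("mi", make_pipeline(SelectKBest(mutual_info_classif, k=50),
                         MLPClassifier(max_iter=2000, random_state=1))),
]
ensemble = VotingClassifier(members, voting="soft")
print(cross_val_score(ensemble, X, y, cv=5).mean())
```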

13.
IRBM 2021, 42(6): 435-441
Background: A complete dataset is essential for biomedical applications. Owing to objective or subjective factors, missing data often occur, which introduces uncertainty into subsequent data processing. Commonly used interpolation methods substitute values that minimize error, and statistical techniques are typically applied to this problem. Methods: We seek an interpolation method with higher performance than the usual statistical methods by using artificial intelligence. This paper uses the prediction and classification capabilities of a backpropagation neural network (BPNN) to build a missing-data interpolation model that mines association rules in the data. In the experiments, the multi-layer network is trained and tested on sample data, repeatedly revising its weights and thresholds; the error function decreases along the negative gradient direction and approaches the expected real output. The model is validated on a breast cancer dataset, with real samples selected from the dataset for validation and four traditional methods added as a control group. Results: The proposed method greatly improves the interpolation of missing data. Experimental results show that its interpolation accuracy (84%) is higher than that of the four traditional methods mentioned in this paper (1.33%, 74.67%, 73.33%, 77.33%), and the BPNN stays low in the MSE evaluation. Finally, we analyze the performance of the various methods in processing missing data. Conclusions: This study estimates missing data with accuracy as high as possible to reduce the negative impact on real-life diagnosis, and it can also assist missing-data processing in the biomedical field.
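A small sketch of BPNN-style imputation using scikit-learn's MLPRegressor (a multilayer network trained by backpropagation); the synthetic data and the single incomplete attribute are illustrative:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))
data[:, 4] = data[:, :4].sum(axis=1) + rng.normal(scale=0.1, size=200)

# simulate missingness in the last column for 20 records
missing = rng.choice(200, size=20, replace=False)
observed = np.setdiff1d(np.arange(200), missing)

# train a backpropagation network to predict the incomplete attribute
# from the complete ones, then fill the gaps with its predictions
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
net.fit(data[observed, :4], data[observed, 4])
imputed = net.predict(data[missing, :4])
mse = np.mean((imputed - data[missing, 4]) ** 2)
print(f"imputation MSE: {mse:.4f}")
```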

14.
We develop a new weighting approach for gene ontology (GO) terms for predicting protein subcellular localization. The weights of individual GO terms, corresponding to their contribution to the prediction algorithm, are determined by term-weighting methods used in text categorization. We evaluate several term-weighting methods based on inverse document frequency, information gain, gain ratio, odds ratio, and chi-square and its variants. Additionally, we propose a new term-weighting method based on the logarithmic transformation of chi-square. The proposed term-weighting method performs better than the other term-weighting methods and also outperforms state-of-the-art subcellular prediction methods. Our proposed method achieves 98.1%, 99.3%, 98.1%, 98.1%, and 95.9% overall accuracy for the animal BaCelLo independent dataset (IDS), fungal BaCelLo IDS, animal Höglund IDS, fungal Höglund IDS, and PLOC dataset, respectively. Furthermore, the close correlation between high-weighted GO terms and subcellular localizations suggests that our proposed method appropriately weights GO terms according to their relevance to the localizations.
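A sketch of chi-square term weighting from a 2×2 term/class contingency table, with a log transformation applied on top; the exact form of the paper's transformation is not given in the abstract, so log(1 + χ²) is an assumption:

```python
import numpy as np

def log_chi2_weight(n_11, n_10, n_01, n_00):
    """Chi-square association between a GO term and a localization class,
    computed from a 2x2 contingency table (term present/absent vs. class
    member/non-member), then log-transformed (assumed form: log(1 + chi2))."""
    n = n_11 + n_10 + n_01 + n_00
    num = n * (n_11 * n_00 - n_10 * n_01) ** 2
    den = (n_11 + n_01) * (n_10 + n_00) * (n_11 + n_10) * (n_01 + n_00)
    chi2 = num / den if den else 0.0
    return np.log(1.0 + chi2)

# a term appearing in 40/50 nuclear proteins but only 5/100 others gets a high weight
print(log_chi2_weight(40, 10, 5, 95))
```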

15.
16.
Generative models have shown breakthroughs in a wide spectrum of domains due to recent advancements in machine learning algorithms and increased computational power. Despite these impressive achievements, the ability of generative models to create realistic synthetic data is still under-exploited in genetics and absent from population genetics. Yet a known limitation in the field is reduced access to many genetic databases due to concerns about violations of individual privacy, although such databases would provide a rich resource for data mining and integration towards advancing genetic studies. In this study, we demonstrated that deep generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be trained to learn the complex distributions of real genomic datasets and generate novel high-quality artificial genomes (AGs) with little to no privacy loss. We show that our generated AGs replicate characteristics of the source dataset such as allele frequencies, linkage disequilibrium, pairwise haplotype distances and population structure. Moreover, they can also inherit complex features such as signals of selection. To illustrate the promising outcomes of our method, we showed that imputation quality for low-frequency alleles can be improved by data augmentation of reference panels with AGs, and that the RBM latent space provides a relevant encoding of the data, allowing further exploration of the reference dataset and of features for solving supervised tasks. Generative models and AGs have the potential to become valuable assets in genetic studies by providing a rich yet compact representation of existing genomes and high-quality, easy-access and anonymous alternatives for private databases.
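A bare-bones GAN over binary haplotype vectors, to show the shape of the approach; the tiny fully-connected networks, random stand-in data, and training length are all illustrative and far from the published architecture:

```python
import torch
import torch.nn as nn

n_snps, z_dim = 100, 32
G = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(),
                  nn.Linear(128, n_snps), nn.Sigmoid())   # allele probabilities
D = nn.Sequential(nn.Linear(n_snps, 128), nn.ReLU(),
                  nn.Linear(128, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

real = (torch.rand(512, n_snps) < 0.3).float()  # stand-in for real 0/1 haplotypes

for step in range(200):
    # discriminator step: real haplotypes vs. generated ones
    fake = G(torch.randn(512, z_dim)).detach()
    loss_d = bce(D(real), torch.ones(512, 1)) + bce(D(fake), torch.zeros(512, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # generator step: try to fool the discriminator
    fake = G(torch.randn(512, z_dim))
    loss_g = bce(D(fake), torch.ones(512, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

ags = (G(torch.randn(10, z_dim)) > 0.5).int()   # 10 artificial genomes
print(ags.shape)
```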

17.
Radiation-related risks of cancer can be transported from one population to another population at risk, for the purpose of calculating lifetime risks from radiation exposure. Transfer via excess relative risks (ERR) or excess absolute risks (EAR) or a mixture of both (i.e., from the life span study (LSS) of Japanese atomic bomb survivors) has been done in the past based on qualitative weighting. Consequently, the values of the weights applied and the method of application of the weights (i.e., as additive or geometric weighted means) have varied both between reports produced at different times by the same regulatory body and also between reports produced at similar times by different regulatory bodies. Since the gender and age patterns are often markedly different between EAR and ERR models, it is useful to have an evidence-based method for determining the relative goodness of fit of such models to the data. This paper identifies a method, using Akaike model weights, which could aid expert judgment and be applied to help to achieve consistency of approach and quantitative evidence-based results in future health risk assessments. The results of applying this method to recent LSS cancer incidence models are that the relative EAR weighting by solid cancer site, on a scale of 0–1, is zero for breast and colon, 0.02 for all solid, 0.03 for lung, 0.08 for liver, 0.15 for thyroid, 0.18 for bladder and 0.93 for stomach. The EAR weighting for female breast cancer increases from 0 to 0.3, if a generally observed change in the trend between female age-specific breast cancer incidence rates and attained age, associated with menopause, is accounted for in the EAR model. Application of this method to preferred models from a study of multi-model inference from many models fitted to the LSS leukemia mortality data results in an EAR weighting of 0. From these results it can be seen that lifetime risk transfer is most highly weighted by EAR only for stomach cancer. However, the generalization and interpretation of radiation effect estimates based on the LSS cancer data, when projected to other populations, are particularly uncertain if considerable differences exist between site-specific baseline rates in the LSS and the other populations of interest. Definitive conclusions, regarding the appropriate method for transporting cancer risks, are limited by a lack of knowledge in several areas including unknown factors and uncertainties in biological mechanisms and genetic and environmental risk factors for carcinogenesis; uncertainties in radiation dosimetry; and insufficient statistical power and/or incomplete follow-up in data from radio-epidemiological studies.
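Akaike model weights are straightforward to compute from the candidate models' AIC values; the two AIC numbers below are made up purely for illustration:

```python
import numpy as np

def akaike_weights(aic):
    """Akaike model weights: the relative likelihood of each model given
    the data, normalized over the candidate set (weights sum to 1)."""
    aic = np.asarray(aic, dtype=float)
    rel = np.exp(-0.5 * (aic - aic.min()))
    return rel / rel.sum()

# e.g. an EAR model and an ERR model fitted to the same cancer incidence data
print(akaike_weights([1023.4, 1018.1]))  # relative weight of each model
```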

18.
Objective: Long non-coding RNAs (lncRNAs) participate in many important biological processes and are closely associated with various human diseases, so predicting lncRNA-disease associations aids the diagnosis and treatment of disease and the molecular-level understanding of the mechanisms by which human diseases arise and progress. Most current lncRNA-disease association prediction methods integrate lncRNA and disease information only shallowly, ignoring the deep embedding features in the network topology; in addition, they build negative training sets by randomly selecting unassociated lncRNA-disease pairs, which harms robustness. Methods: This paper proposes NELDA, a network-embedding-based method for predicting potential lncRNA-disease associations. NELDA first constructs an lncRNA similarity network, a disease similarity network, and an lncRNA-disease association network from lncRNA expression profiles, disease ontology, and known lncRNA-disease associations. Four deep autoencoders then learn low-dimensional network embeddings of lncRNAs and diseases from the similarity networks and the association network, respectively. The concatenated similarity-network embeddings and the concatenated association-network embeddings of lncRNAs and diseases are fed into two support vector machine (SVM) classifiers to predict lncRNA-disease associations, and a weighted fusion strategy combines the predictions of the two SVM classifiers into the final result. In addition, a negative-sample selection strategy based on known lncRNA-disease associations and disease semantic similarity constructs a relatively reliable set of unassociated lncRNA-disease pairs to improve classifier robustness: a scoring function scores each lncRNA-disease pair, and low-scoring pairs are taken as unassociated (i.e., negative) samples. Results: Ten-fold cross-validation shows that NELDA effectively predicts lncRNA-disease associations, achieving an AUC of 0.9827, which is 0.0627 and 0.0207 higher than the existing LDASR and LDNFSGB methods, respectively. The negative-sample selection strategy and the decision-level weighted fusion strategy both effectively improve NELDA's predictive performance. In case studies of gastric cancer and breast cancer, supporting evidence was found in the recent literature and public databases for 29/40 (72.5%) of the lncRNAs predicted to be associated with these cancers. Conclusion: These results show that NELDA is an effective lncRNA-disease association prediction method capable of uncovering potential lncRNA-disease associations.
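NELDA's final step — decision-level weighted fusion of two SVMs trained on different embedding views — can be sketched as follows; the random features, the 64-dimensional split, and the fixed fusion weight are placeholders (NELDA tunes its own):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# stand-ins for the two feature views: similarity-network embeddings and
# association-network embeddings of (lncRNA, disease) pairs
X, y = make_classification(n_samples=400, n_features=128, random_state=0)
X_sim, X_assoc = X[:, :64], X[:, 64:]

idx_tr, idx_te = train_test_split(np.arange(400), random_state=0)
svm_sim = SVC(probability=True).fit(X_sim[idx_tr], y[idx_tr])
svm_assoc = SVC(probability=True).fit(X_assoc[idx_tr], y[idx_tr])

# decision-level weighted fusion of the two classifiers' scores
w = 0.6   # fusion weight; the value here is purely illustrative
score = (w * svm_sim.predict_proba(X_sim[idx_te])[:, 1]
         + (1 - w) * svm_assoc.predict_proba(X_assoc[idx_te])[:, 1])
print((score > 0.5).astype(int)[:10])  # fused association predictions
```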

19.
The Charipinae are a major group of hyperparasitoids of Hemiptera. Here we present the first cladistic analysis of this subfamily's internal relationships, based on 96 morphological characters of adults. The data matrix was analysed using uniformly weighted parsimony, and the effects of alternative weighting schemes were explored through additional searches employing implied weights criteria. One caveat of implied weights analysis is that it lacks an objective criterion for selecting the value of the concavity function. In the present study, differential weighting was used to explore the sensitivity of our results to the alternative assumptions made in the analysis and to select one of the most parsimonious trees under equal weights, which we regard as the hypothesis that minimizes the amount of ad hoc assumptions. The validity of the two existing tribes and the monophyly of all the genera of Charipinae were tested, in particular the cosmopolitan and highly species-rich Alloxysta and Phaenoglyphis, which appear repeatedly in ecological and biochemical studies of host–parasitoid associations. The evolution of several major characters and the relationships between genera are discussed. On the basis of the phylogenetic results, we discuss a number of taxonomic issues. A new classification of the subfamily is proposed in which no tribes are maintained, Carvercharips is synonymized with Alloxysta, and the creation of a new genus from Nepal is justified. Our analysis points to the need for a world revision of the basal genus Phaenoglyphis, which is shown to be paraphyletic.
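The concavity function at issue is Goloboff's implied-weighting fit, k / (k + s); a one-liner makes the dependence on the concavity constant k concrete:

```python
def implied_weight_fit(extra_steps, k=3):
    """Goloboff's implied-weighting fit for one character: k / (k + s), where
    s is the number of extra steps (homoplasy) and k is the concavity constant.
    A tree's score sums the fits; smaller k punishes homoplasy harder."""
    return k / (k + extra_steps)

for s in range(5):
    print(s, round(implied_weight_fit(s), 3))
# s=0 gives fit 1.0; each extra step lowers the character's effective weight
```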

20.
Genomics 2020, 112(5): 3089-3096
Automatic classification of glaucoma from fundus images is a vital diagnostic tool for a Computer-Aided Diagnosis (CAD) system. In this work, a novel fused feature extraction technique and ensemble classifier fusion are proposed for the diagnosis of glaucoma. The proposed method comprises three stages. Initially, the fundus images are preprocessed, followed by feature extraction and feature fusion by Intra-Class and Extra-Class Discriminative Correlation Analysis (IEDCA). The feature fusion approach eliminates between-class correlation while retaining sufficient Feature Dimension (FD) for Correlation Analysis (CA). The fused features are then fed individually to the classifiers, namely Support Vector Machine (SVM), Random Forest (RF) and K-Nearest Neighbor (KNN), for classification. Finally, a classifier fusion scheme is designed that combines the decisions of the ensemble of classifiers using the Consensus-based Combining Method (CCM). CCM-based classifier fusion adjusts the weights iteratively after comparing the outputs of all the classifiers. The proposed fusion classifier provides better accuracy and convergence than the individual algorithms, and a classification accuracy of 99.2% is accomplished by the two-level hybrid fusion approach. The method is evaluated on the public High Resolution Fundus (HRF) and DRIVE datasets with cross-dataset validation.
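A toy sketch of consensus-style decision fusion over the three classifiers' 0/1 outputs — the real CCM differs in its update details, so treat the agreement-based re-weighting below as an illustration only:

```python
import numpy as np

def consensus_fusion(pred_matrix, n_iter=10):
    """Sketch of consensus-style fusion: classifier weights are adjusted
    iteratively according to agreement with the current weighted consensus.
    `pred_matrix` is (n_classifiers, n_samples) of 0/1 predictions."""
    n_clf = pred_matrix.shape[0]
    w = np.full(n_clf, 1.0 / n_clf)
    for _ in range(n_iter):
        consensus = (w @ pred_matrix > 0.5).astype(int)
        agreement = (pred_matrix == consensus).mean(axis=1)  # per-classifier accord
        w = agreement / agreement.sum()                      # re-weight, renormalize
    return consensus, w

preds = np.array([[1, 0, 1, 1, 0],      # SVM
                  [1, 0, 1, 0, 0],      # Random Forest
                  [0, 1, 1, 1, 1]])     # KNN
labels, weights = consensus_fusion(preds)
print(labels, weights)
```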
