Similar literature
Found 20 similar articles.
1.
A review of feature selection techniques in bioinformatics   Cited by: 13 (self-citations: 0; citations by others: 13)
Feature selection techniques have become an apparent need in many bioinformatics applications. In addition to the large pool of techniques that have already been developed in the machine learning and data mining fields, specific applications in bioinformatics have led to a wealth of newly proposed techniques. In this article, we make the interested reader aware of the possibilities of feature selection, providing a basic taxonomy of feature selection techniques, and discussing their use, variety and potential in a number of both common and upcoming bioinformatics applications.

2.
Holstein cows (n=221) from eight commercial dairy herds were examined for endometritis between 28 and 41 days postpartum using 5 diagnostic techniques: (1) vaginoscopy; (2) ultrasonographic assessment of uterine fluid volume; (3) ultrasonographic assessment of endometrial thickness; (4) endometrial cytology collected by cytobrush; and (5) endometrial cytology collected by uterine lavage. Concordance correlation was used to evaluate the reliability of cytobrush and lavage cytology. Cytobrush cytology was found to have the greatest intraobserver repeatability (cytobrush, rho(c)=0.85 versus lavage, rho(c)=0.76) and was chosen as the reference diagnostic test. Pregnancy data at 150 days postpartum were available for 189 cows. Survival analysis was used to determine the lowest percentage of polymorphonuclear cells (PMNs) associated with time to pregnancy. The sensitivity and specificity of the diagnostic techniques were determined using pregnancy status at 150 days and cytobrush cytology as the diagnostic standards. The risk of non-pregnancy at 150 days was 1.9 times higher in cows with more than 8% PMNs identified using cytobrush cytology than in cows with less than 8% PMNs (P=0.04). Twenty-one of 189 cows (11.1%) had >8% PMNs and were considered to be positive for endometritis. Cows with endometritis had a 17.9% lower first service conception rate (P=0.03) and a 24-day increase in median days open (P=0.04). The sensitivities of all five diagnostic tests relative to 150-day pregnancy status ranged from 7.1 to 14.3% and the specificities from 84.0 to 93.3%. Relative to cytobrush cytology, the respective sensitivity and specificity values are as follows: vaginoscopy (53.9%, 95.4%); lavage cytology (92.3%, 93.9%); ultrasonographic assessment of uterine fluid (30.8%, 92.8%); and ultrasonographic assessment of endometrial thickness (3.9%, 89.2%). Endometritis impaired reproductive performance. Cytobrush cytology was the most reliable method of diagnosing endometritis in cattle.
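The sensitivity and specificity figures quoted in this abstract follow from a 2×2 comparison of each test against the reference standard. A minimal Python sketch of that arithmetic is shown here; the function name and the example counts are hypothetical and are not taken from the study. Only the 21-of-189 prevalence check uses numbers stated in the abstract.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 confusion counts.

    tp/fn: reference-positive animals the test called positive/negative.
    tn/fp: reference-negative animals the test called negative/positive.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for illustration only (not data from the study).
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=90, fp=10)

# Prevalence stated in the abstract: 21 of 189 cows had >8% PMNs.
prevalence = 21 / 189
```

Reporting both values matters because a test can score well on one while failing on the other, as the ultrasonographic measures in the abstract illustrate.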

3.
A plenitude of feature selection (FS) methods is available in the literature, most of them arising from the need to analyze data of very high dimension, usually hundreds or thousands of variables. Such data sets are now available in various application areas like combinatorial chemistry, text mining, multivariate imaging, or bioinformatics. As a generally accepted rule, these methods are grouped into filters, wrappers, and embedded methods. More recently, a new group of methods has been added in the general framework of FS: ensemble techniques. The focus in this survey is on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization, or biomarker discovery. We present them in a unified framework, using standardized notations in order to reveal their technical details and to highlight their common characteristics as well as their particularities.

4.
5.
The aim of the present study was to investigate the reflection of psychoemotional stress in the body surface potential distribution as documented by isointegral maps of cardiac activation and recovery. In 72 young men (18.3 ± 7.3 years) with no cardiovascular history, body surface potential maps (BSPMs) were recorded at rest and during a test of mental arithmetic (MA). The digitalized data for each point of the QRS, STT and QRST integral maps, for each subject in both situations, were processed and evaluated by methods of univariate as well as spatial mathematical and statistical modeling. The results showed a significant decrease of repolarization integral values during MA over the sternum and right precordium, which contributed to analogously localized decrements in the QRST BSM. The decrease occurred in more than 2/3 of lead points. The most pronounced changes were observed in the right precordial area, where potentials decreased in more than 70% of subjects. In conclusion, the discriminative power of the difference STT and QRST integral maps was strong enough to distinguish the mental arithmetic induced changes in the superficial cardiac electric field. These adrenergic transient alterations in ventricular recovery may be of importance in subjects at risk for ventricular arrhythmias.

6.

Background  

Promoter prediction is an integral step for understanding gene regulation and annotating genomes. Traditional promoter analysis is mainly based on sequence compositional features. Recently, many kinds of structural features have been employed in promoter prediction. However, considering the high-dimensionality and overfitting problems, it is infeasible to utilize all available features for promoter prediction. Thus it is necessary to choose some appropriate features for the prediction task.

7.
Kusum. Bioethics 1993, 7(2-3): 149-165
Conclusions: The use of pre-natal diagnostic techniques only for sex determination, followed by termination of pregnancy on a finding of a female foetus, is an atrocious and unethical practice. The bias against a female has been stretched further back: from cradle to grave, it is now from womb to the grave. One cannot however ignore the conditions of the society which breed and encourage such practices. A girl suffers neglect and discrimination right from childhood; she is tortured, harassed and maltreated after marriage. At the work place she is exploited. A widow or a divorcee is looked down upon by the family and the society. All these things make her life miserable and not worth existence....

8.
In the era of structural genomics, the prediction of protein interactions using docking algorithms is an important goal. The success of this method critically relies on the identification of good docking solutions among a vast excess of false solutions. We have adapted the concept of mutual information (MI) from information theory to achieve a fast and quantitative screening of different structural features with respect to their ability to discriminate between physiological and nonphysiological protein interfaces. The strategy includes the discretization of each structural feature into distinct value ranges to optimize its mutual information. We have selected 11 structural features and two datasets to demonstrate that the MI is dimensionless and can be directly compared for diverse structural features and between datasets of different sizes. Conversion of the MI values into a simple scoring function revealed that those features with a higher MI are actually more powerful for the identification of good docking solutions. Thus, an MI-based approach allows the rapid screening of structural features with respect to their information content and should therefore be helpful for the design of improved scoring functions in the future. In addition, the concept presented here may also be adapted to related areas that require feature selection for biomolecules or organic ligands.
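The screening idea in this abstract, discretizing a feature into value ranges and measuring its mutual information with a binary label, can be sketched in a few lines of Python. This is an illustrative sketch only: the abstract does not describe how the authors optimize the discretization, so the single-threshold search in `best_threshold`, along with every function name and the toy data, is an assumption made here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def discretize(values, edges):
    """Map each value to a bin index: the number of edges it reaches or exceeds."""
    return [sum(v >= e for e in edges) for v in values]


def best_threshold(values, labels):
    """Assumed optimization: try each observed value as a single cut point
    and keep the one that maximizes MI with the labels."""
    return max(
        sorted(set(values)),
        key=lambda t: mutual_information(discretize(values, [t]), labels),
    )


# Toy data (hypothetical): one feature value per interface,
# label 1 = physiological, 0 = nonphysiological.
feature = [0.1, 0.2, 0.8, 0.9]
label = [0, 0, 1, 1]
cut = best_threshold(feature, label)
mi = mutual_information(discretize(feature, [cut]), label)
```

Because MI is computed from probabilities alone, the resulting score carries no units of the underlying feature, which is what allows the direct comparison across features that the abstract mentions.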

9.
In choosing between various scanning techniques the factors to be considered include availability, cost, the type of equipment, the expertise of the medical and technical staff, and the inherent capabilities of the system. Although it is difficult to state dogmatically which scanning technique is best for each patient and condition, one or other technique is clearly preferable in some areas of medicine. Ultrasound, for example, should be used in obstetrics, while computerized tomography has revolutionised neuroradiological diagnosis. Nevertheless, there is still no substitute for good history taking and a thorough physical examination. The most important factor determining the choice of technique is the system's ability to answer the specific question required for the management of the patient.

10.
The potential value of immunoperoxidase techniques in diagnostic cytology   Cited by: 1 (self-citations: 0; citations by others: 1)
M Nadji. Acta cytologica 1980, 24(5): 442-447
A slightly modified immunoperoxidase method was developed in our laboratory and applied to a variety of aspiration and exfoliative cytologic material. Our aims were: (1) to explore the applicability of the immunoperoxidase procedure to diagnostic cytology, (2) to attempt to define the histogenesis of neoplastic cells when morphology alone proved insufficient, and (3) to investigate the possibility of differentiating reactive from neoplastic lymphoreticular disorders by studying their immunoglobulin patterns. Our findings indicate that the immunoperoxidase technique is applicable to cytologic material. The simplicity of the procedure, combined with its high sensitivity and excellent morphology, merits wider application of this technique to routine diagnostic cytology.

11.

Background  

Gene selection is an important step when building predictors of disease state based on gene expression data. Gene selection generally improves performance and identifies a relevant subset of genes. Many univariate and multivariate gene selection approaches have been proposed. Frequently the claim is made that genes are co-regulated (due to pathway dependencies) and that multivariate approaches are therefore by definition more desirable than univariate selection approaches. Based on the published performances of all these approaches, a fair comparison of the available results cannot be made. This mainly stems from two factors. First, the results are often biased, since the validation set is in one way or another involved in training the predictor, resulting in optimistically biased performance estimates. Second, the published results are often based on a small number of relatively simple datasets. Consequently no generally applicable conclusions can be drawn.

12.
Only limited data are available on body surface potential distribution during atrial activation. The aim of this study was to establish the distributions and to analyze chosen quantitative parameters of atrial isointegral maps recorded using a limited 24-lead system in a young healthy population. A total of 166 subjects underwent a procedure of body surface potential mapping. Isointegral maps during the P wave were constructed and qualitatively and quantitatively evaluated. Three types of atrial activation in individual maps were found according to the different shape of the zero isointegral line and to mutual positions of extrema. The most frequently occurring type resembled the group mean maps and was in good agreement with published data obtained from full lead systems. The highest extrema were found in the young men group, while, surprisingly, the lowest values were found in the young women group. All minima and the majority of maxima were recorded outside the ranges of standard chest leads. The usefulness of the limited lead system to record isointegral P wave maps was shown and new data were presented that can be useful in noninvasive evaluation of atrial pathologies.

13.
In this retrospective study we analysed changes of the ST segment in patients with arterial hypertension using multi-lead body surface mapping of the electric heart field, as the ST segment often shows non-specific changes and is influenced by many different conditions. We constructed isointegral maps (IIM) of chosen intervals (the first 35 ms, the first 80 ms, and the whole ST segment) in 42 patients with arterial hypertension (with and without left ventricular hypertrophy) and in a control group of 23 healthy persons. We analysed the position and values of map extrema. Spatial distribution of voltage integrals was similar in the control group and in the "pure" hypertensives. Patients with left ventricular hypertrophy exhibited shifts of the integral minima. Despite our expectations, the highest extrema values were found in the control group and not in the left ventricular hypertrophy group. The extrema values were similar in all hypertensives, with or without left ventricular hypertrophy. Differences could be explained neither by the influence of age, nor by body habitus.

14.
E Ferrada, A Wagner. Biophysical journal 2012, 102(8): 1916-1925
The relationship between the genotype (sequence) and the phenotype (structure) of macromolecules affects their ability to evolve new structures and functions. We here compare the genotype space organization of proteins and RNA molecules to identify differences that may affect this ability. To this end, we computationally study the genotype-phenotype relationship for short RNA and lattice proteins of a reduced monomer alphabet size, to make exhaustive analysis and direct comparison of their genotype spaces feasible. We find that many fewer protein molecules than RNA molecules fold, but they fold into many more structures than RNA. In consequence, protein phenotypes have smaller genotype networks whose member genotypes tend to be more similar than for RNA phenotypes. Neighborhoods in sequence space of a given radius around an RNA molecule contain more novel structures than for protein molecules. We compare this property to evidence from natural RNA and protein molecules, and conclude that RNA genotype space may be more conducive to the evolution of new structure phenotypes.

15.

Background  

Due to the large number of genes in a typical microarray dataset, feature selection looks set to play an important role in reducing noise and computational cost in gene expression-based tissue classification while improving accuracy at the same time. Surprisingly, this does not appear to be the case for all multiclass microarray datasets. The reason is that many feature selection techniques applied to microarray datasets are either rank-based and hence do not take into account correlations between genes, or are wrapper-based, which require high computational cost and often yield difficult-to-reproduce results. In studies where correlations between genes are considered, attempts to establish the merit of the proposed techniques are hampered by evaluation procedures which are less than meticulous, resulting in overly optimistic estimates of accuracy.

16.
An increased risk of myocardial ischemic changes was demonstrated in patients suffering from panic disorder (PD). Using classical ECG methods, this risk cannot be evaluated in most patients. We measured the vectocardiogram (VCG) using Frank orthogonal leads and body surface maps (BSM) including 12-lead ECG. In our study of 11 PD patients (2 men, 9 women), without any seizures or pharmacological treatment and without cardiovascular symptoms, we found marked sinus tachycardia (heart rate 90.1 ± 12.2 min⁻¹) and a shorter R-R interval (678 ± 93.6 ms) than in 27 controls (heart rate 73.6 ± 7.7 min⁻¹, R-R 822.7 ± 86.4 ms) (5 men, 22 women) (p<0.001). The VCG-measured spatial QRS-STT angle was wider (70.3 ± 24.5 degrees) than in the control group (49.5 ± 19.5 degrees) (p<0.05). The maximum (extremum) in depolarization (DIAM max 30, 40) and repolarization (RIAM max 35) of body surface isoarea and isointegral (RIIM max) maps was less positive (p<0.001) and the minimum (DIAM min 40) was less negative than in the controls (p<0.05), even in the period free of a panic attack. Our results showed that changes in the heart electric field parameters occurred in PD patients when compared to the control group.

17.
A potential diagnostic reagent for bovine cysticercosis   Cited by: 1 (self-citations: 0; citations by others: 1)
A fraction of larval Taenia hydatigena cyst fluid was shown to have high sensitivity and specificity in the enzyme-linked immunosorbent assay (ELISA) for the detection of bovine antibodies to the heterologous parasite Taenia saginata. This antigenically active lipoprotein fraction was isolated by ultracentrifugal density flotation using either ammonium sulfate (specific gravity = 1.231 g per ml) or NaCl/KBr (specific gravity = 1.225 g per ml), followed by ion-exchange chromatography. Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) indicated that this fraction was composed of high molecular weight (65,000 to 77,000 Mr) and low molecular weight (9,500 to 16,000 Mr) proteins. Electrophoresis under non-denaturing conditions in either acrylamide (5%) or agarose (1%) resulted in 1 major diffuse band staining for both protein and lipid. The high and low molecular weight proteins observed on SDS-PAGE under reducing conditions could not be resolved by gel filtration chromatography and emerged as a single lipoprotein peak. This T. hydatigena cyst fluid fraction appears promising as a diagnostic reagent in the ELISA for bovine cysticercosis.

18.
19.
With the frenetic growth of high-dimensional datasets in different biomedical domains, there is an urgent need to develop predictive methods able to deal with this complexity. Feature selection is a relevant strategy in machine learning to address this challenge. We introduce a novel feature selection algorithm for linear regression called BOSO (Bilevel Optimization Selector Operator). We conducted a benchmark of BOSO against key algorithms in the literature, finding superior accuracy for feature selection in high-dimensional datasets. A proof of concept of BOSO for predicting drug sensitivity in cancer is presented. A detailed analysis is carried out for methotrexate, a well-studied drug targeting cancer metabolism.

20.
Most of the conventional feature selection algorithms have a drawback whereby a weakly ranked gene that could perform well in terms of classification accuracy with an appropriate subset of genes will be left out of the selection. Considering this shortcoming, we propose a feature selection algorithm in gene expression data analysis of sample classifications. The proposed algorithm first divides genes into subsets, the sizes of which are relatively small (roughly of size h), then selects informative smaller subsets of genes (of size r < h) from a subset and merges the chosen genes with another gene subset (of size r) to update the gene subset. We repeat this process until all subsets are merged into one informative subset. We illustrate the effectiveness of the proposed algorithm by analyzing three distinct gene expression data sets. Our method shows promising classification accuracy for all the test data sets. We also show the relevance of the selected genes in terms of their biological functions.
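The divide, select, and merge loop described in this abstract might look roughly like the Python sketch below. It is one possible reading, not the authors' implementation: the abstract does not specify the scoring criterion or the exact merge order, so the `score` callback, the exhaustive search over r-sized subsets, the function name `merge_select`, and the toy weights are all assumptions introduced here.

```python
from itertools import combinations


def merge_select(genes, h, r, score):
    """Scan genes in chunks of size h, carrying forward the best r-subset.

    score: hypothetical callback mapping a tuple of genes to a number
    (for example, cross-validated accuracy of a classifier on those genes).
    """
    selected = []
    for start in range(0, len(genes), h):
        # Merge the genes kept so far with the next chunk.
        pool = selected + genes[start:start + h]
        k = min(r, len(pool))
        # Evaluate subsets jointly, so a gene that is weak on its own
        # can still survive as part of a strong combination.
        selected = list(max(combinations(pool, k), key=score))
    return selected


# Toy example with a purely additive score (hypothetical weights).
weights = {"a": 1, "b": 5, "c": 2, "d": 4, "e": 0, "f": 6}
chosen = merge_select(list("abcdef"), h=3, r=2,
                      score=lambda subset: sum(weights[g] for g in subset))
```

Scoring whole subsets rather than ranking genes one at a time is what addresses the drawback the abstract raises; the cost is that each step evaluates every r-subset of a pool of up to r + h genes, so h and r must stay small.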
