Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
BACKGROUND: Comparative genomic hybridization (CGH) is a relatively new molecular cytogenetic method for detecting chromosomal imbalance. Karyotyping of human metaphases is an important step that assigns each chromosome to one of 23 or 24 classes (22 autosomes and two sex chromosomes). Automatic karyotyping in CGH analysis is needed. However, conventional karyotyping approaches based on DAPI images require complex image enhancement procedures. METHODS: This paper proposes a simple feature extraction method, one that generates density profiles from original true color CGH images and uses normalized profiles as feature vectors without quantization. A classifier is developed using a support vector machine (SVM). It has good generalization ability and needs only limited training samples. RESULTS: Experimental results show that using the color information in CGH images for feature extraction can greatly improve the classification success rate. The SVM classifier is able to acquire knowledge about human chromosomes from relatively few samples and has good generalization ability. A success rate of more than 90% has been achieved, and the time for training and testing is very short. CONCLUSIONS: The feature extraction method proposed here and the SVM-based classifier offer a promising computerized intelligent system for automatic karyotyping of CGH human chromosomes.
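The density-profile idea in entry 1 can be illustrated with a minimal sketch. It assumes a straightened chromosome given as rows of pixel intensities from a single color channel; the function name `density_profile`, the fixed `length`, and the min-max scaling are illustrative assumptions, not the authors' procedure. For a true color image, one such profile per channel could be concatenated into the feature vector.

```python
def density_profile(image, length=8):
    """Return a fixed-length, normalized density profile (hypothetical helper).

    image: list of rows, each row a list of pixel intensities for one channel.
    Rows are assumed to run along the chromosome's long axis.
    """
    # The mean intensity of every row gives the raw profile along the axis.
    raw = [sum(row) / len(row) for row in image]
    # Resample by linear interpolation so chromosomes of different sizes
    # yield feature vectors of equal dimension.
    resampled = []
    for i in range(length):
        pos = i * (len(raw) - 1) / (length - 1) if length > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, len(raw) - 1)
        frac = pos - lo
        resampled.append(raw[lo] * (1 - frac) + raw[hi] * frac)
    # Scale to [0, 1] so overall brightness does not dominate the features.
    low, high = min(resampled), max(resampled)
    span = high - low
    return [(v - low) / span if span else 0.0 for v in resampled]
```

The resulting vectors keep their real values (no quantization into discrete levels) and could be passed to any SVM implementation.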

2.
Classification and feature selection algorithms for multi-class CGH data   Total citations: 1 (self-citations: 0, citations by others: 1)
Recurrent chromosomal alterations provide cytological and molecular positions for the diagnosis and prognosis of cancer. Comparative genomic hybridization (CGH) has been useful in understanding these alterations in cancerous cells. CGH datasets consist of samples that are represented by large dimensional arrays of intervals. Each sample consists of long runs of intervals with losses and gains. In this article, we develop novel SVM-based methods for classification and feature selection of CGH data. For classification, we developed a novel similarity kernel that is shown to be more effective than the standard linear kernel used in SVM. For feature selection, we propose a novel method based on the new kernel that iteratively selects the features that provide the maximum benefit for classification. We compared our methods against the best wrapper-based and filter-based approaches that have been used for feature selection of large dimensional biological data. Our results on datasets generated from the Progenetix database suggest that our methods are considerably superior to existing methods. AVAILABILITY: All software developed in this article can be downloaded from http://plaza.ufl.edu/junliu/feature.tar.gz.

3.
BACKGROUND: Multiplex or multicolor fluorescence in situ hybridization (M-FISH) is a recently developed cytogenetic technique for cancer diagnosis and research on genetic disorders. By simultaneously viewing the multiply labeled specimens in different color channels, M-FISH facilitates the detection of subtle chromosomal aberrations. The success of this technique largely depends on the accuracy of pixel classification (color karyotyping). Improvements in classifier performance would allow the elucidation of more complex and more subtle chromosomal rearrangements. Normalization of M-FISH images has a significant effect on the accuracy of classification. In particular, misalignment or misregistration across multiple channels seriously affects classification accuracy. Image normalization, including automated registration, must be done before pixel classification. METHODS AND RESULTS: We studied several image normalization approaches that affect image classification. In particular, we developed an automated registration technique to correct misalignment across the different fluor images (caused by chromatic aberration and other factors). This new registration algorithm is based on wavelets and spline approximations that have computational advantages and improved accuracy. To evaluate the performance improvement brought about by these data normalization approaches, we used the downstream pixel classification accuracy as a measurement. A Bayesian classifier assumed that each of 24 chromosome classes had a normal probability distribution. The effects that this registration and other normalization steps have on subsequent classification accuracy were evaluated on a comprehensive M-FISH database established by Advanced Digital Imaging Research (http://www.adires.com/05/Project/MFISH_DB/MFISH_DB.shtml). CONCLUSIONS: Pixel misclassification errors result from different factors. These include uneven hybridization, spectral overlap among fluors, and image misregistration. 
Effective preprocessing of M-FISH images can decrease the effects of those factors and thereby increase pixel classification accuracy. The data normalization steps described in this report, such as image registration and background flattening, can significantly improve subsequent classification accuracy. An improved classifier in turn would allow subtle DNA rearrangements to be identified in genetic diagnosis and cancer research.

4.
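Identifying the recombination sites of chromosomal translocations is important for diagnosing many inherited chromosomal disorders. Using data on 24 chromosome classes and on normal and abnormal chromosome 9 collected during clinical diagnosis, this paper builds a set of models and methods for automatically identifying translocation recombination sites. First, chromosome images are preprocessed to obtain histogram of oriented gradients (HOG) and local binary pattern (LBP) features, and a multi-channel network model based on these texture features is built for 24-class chromosome classification, reaching a classification accuracy of 95.99%; fusing this model with a ResNet18 model (classification accuracy 95.86%) raises the final classification accuracy to 97.08%. Second, the chromosome density profile is used as the feature for separating normal from abnormal chromosomes, and support vector machine, random forest and XGBoost models are combined by voting into an ensemble classifier, which classifies normal and abnormal chromosome 9 with 100% accuracy. Finally, for abnormal chromosomes carrying a translocation, an automatic recombination-site identification algorithm based on dynamic time warping (DTW) is proposed: it locates the recombination site on the density profile curve of the abnormal chromosome and maps it onto the G-banding ideogram to produce a standard diagnostic result. Comparison with the diagnoses of clinical experts demonstrates the validity of the automatic identification. The models and methods designed here are of considerable help for computer-aided clinical diagnosis and could be developed into a software system for clinical use, improving the efficiency and accuracy of diagnosing the related diseases.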
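As an illustration of the DTW step in entry 4, the sketch below aligns two density profiles with textbook dynamic time warping and reports the first aligned position where they differ by more than a tolerance. The helper `first_divergence`, its `tol` parameter, and the idea of reading a candidate recombination site off the first divergence are assumptions made for illustration; the abstract does not describe how the authors locate the site.

```python
def dtw_path(a, b):
    """Return (total_cost, path) aligning sequences a and b by dynamic time warping."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def first_divergence(normal, abnormal, tol):
    """Hypothetical helper: index in `abnormal` of the first aligned pair differing by more than tol."""
    _, path = dtw_path(normal, abnormal)
    for i, j in path:
        if abs(normal[i] - abnormal[j]) > tol:
            return j
    return None
```

Because DTW tolerates local stretching, a profile that is merely longer or shorter than the reference aligns at low cost, while a translocated segment shows up as a run of large aligned differences.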

5.
This paper presents a novel system for the automated classification of wireless capsule endoscope images. Classification is achieved by a classical statistical approach, but novel features are extracted from the wavelet domain and they contain both color and texture information. First, a shift-invariant discrete wavelet transform (SIDWT) is computed to ensure that the multiresolution feature extraction scheme is robust to shifts. The SIDWT expands the signal (in a shift-invariant way) over the basis functions which maximize information. Then cross-co-occurrence matrices of wavelet subbands are calculated and used to extract both texture and color information. Canonical discriminant analysis is utilized to reduce the feature space, and then a simple 1D classifier with the leave-one-out method is used to automatically classify normal and abnormal small bowel images. A classification rate of 94.7% is achieved with a database of 75 images (41 normal and 34 abnormal cases). The high success rate could be attributed to the robust feature set, which combines multiresolutional color and texture features with shift, scale and semi-rotational invariance. This result is very promising, and the method could be used in a computer-aided diagnosis system or a content-based image retrieval scheme.
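The co-occurrence step in entry 5 can be sketched as follows. This is a plain co-occurrence count between two already quantized images of equal size, with a single contrast statistic; passing the same image twice gives the ordinary co-occurrence matrix, and passing two different channels gives a cross-co-occurrence matrix. The SIDWT subbands, the quantization, and the actual statistics used by the authors are not reproduced here, and the function names are illustrative.

```python
def cooccurrence(a, b, levels, dx=1, dy=0):
    """Count how often level p in image a co-occurs with level q in image b at offset (dx, dy).

    a, b: equally sized lists of rows holding integer levels in range(levels).
    """
    rows, cols = len(a), len(a[0])
    matrix = [[0] * levels for _ in range(levels)]
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            # Skip pairs whose second pixel falls outside the image.
            if 0 <= y2 < rows and 0 <= x2 < cols:
                matrix[a[y][x]][b[y2][x2]] += 1
    return matrix


def contrast(matrix):
    """Contrast statistic: mean squared level difference over all counted pairs."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    weighted = sum(
        (p - q) ** 2 * count
        for p, row in enumerate(matrix)
        for q, count in enumerate(row)
    )
    return weighted / total
```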

6.
Analysis of array CGH data: from signal ratio to gain and loss of DNA regions   Total citations: 12 (self-citations: 0, citations by others: 12)
MOTIVATION: Genomic DNA regions are frequently lost or gained during tumor progression. Array Comparative Genomic Hybridization (array CGH) technology makes it possible to assess these changes in DNA in cancers, by comparison with a normal reference. The identification of systematically deleted or amplified genomic regions in a set of tumors enables biologists to identify genes involved in cancer progression, because tumor suppressor genes are thought to be located in lost genomic regions and oncogenes in gained regions. Array CGH profiles should also improve the classification of tumors. The achievement of these goals requires a methodology for detecting the breakpoints delimiting altered regions in genomic patterns and assigning a status (normal, gained or lost) to each chromosomal region. RESULTS: We have developed a methodology for the automatic detection of breakpoints from array CGH profiles and the assignment of a status to each chromosomal region. The breakpoint detection step is based on the Adaptive Weights Smoothing (AWS) procedure and provides highly convincing results: our algorithm detects 97, 100 and 94% of breakpoints in simulated data, karyotyping results and manually analyzed profiles, respectively. The percentage of correctly assigned statuses ranges from 98.9 to 99.8% for simulated data and is 100% for karyotyping results. Our algorithm also outperforms other solutions on a public reference dataset. AVAILABILITY: The R package GLAD (Gain and Loss Analysis of DNA) is available upon request.
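To make the status-assignment idea in entry 6 concrete, here is a deliberately simple sketch: given breakpoints that have already been detected, each region is labeled by comparing the median of its log ratios with a symmetric threshold. This is not the GLAD or AWS procedure; the function `assign_status`, the use of the median, and the default `threshold=0.2` are assumptions chosen only for illustration.

```python
from statistics import median


def assign_status(log_ratios, breakpoints, threshold=0.2):
    """Label each region between breakpoints as 'gained', 'lost' or 'normal'.

    log_ratios: tumor/reference log ratios ordered along the chromosome.
    breakpoints: indices at which a new region starts (assumed already detected).
    """
    bounds = [0] + sorted(breakpoints) + [len(log_ratios)]
    regions = []
    for start, end in zip(bounds, bounds[1:]):
        if start == end:
            continue  # ignore empty regions from duplicate or boundary breakpoints
        level = median(log_ratios[start:end])
        if level > threshold:
            status = "gained"
        elif level < -threshold:
            status = "lost"
        else:
            status = "normal"
        regions.append((start, end, status))
    return regions
```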

7.
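A texture feature extraction and classification method for endoscopic ultrasound images of the pancreas is proposed, which can be applied to computer-aided diagnosis of pancreatic cancer from endoscopic ultrasound images. Digital image processing algorithms are used to extract 69 texture features in 9 categories from the pancreatic endoscopic ultrasound images. The between-class distance is used as the separability criterion for an initial screening of the features, a sequential forward search algorithm then selects features further, and classification is performed by a support vector machine. Training and test sets were drawn at random from 216 cases, and repeated random experiments show that the proposed algorithm achieves high classification accuracy and provides a valuable reference for the clinical diagnosis of pancreatic cancer.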
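The two-stage feature selection in entry 7 can be sketched under stated assumptions. `separability` is one common between-class distance for a single feature (squared distance between class means over the summed class variances), and `forward_select` is a generic sequential forward search driven by a caller-supplied `score` function. Both names and the exact criterion are illustrative; the abstract does not give the authors' formulas.

```python
def separability(values_a, values_b):
    """Between-class distance of one feature: squared mean gap over summed variances."""
    mean_a = sum(values_a) / len(values_a)
    mean_b = sum(values_b) / len(values_b)
    var_a = sum((v - mean_a) ** 2 for v in values_a) / len(values_a)
    var_b = sum((v - mean_b) ** 2 for v in values_b) / len(values_b)
    gap = (mean_a - mean_b) ** 2
    denom = var_a + var_b
    if denom == 0:
        # Constant within both classes: perfectly separable unless the means coincide.
        return float("inf") if gap else 0.0
    return gap / denom


def forward_select(n_features, score, k):
    """Sequential forward search: greedily add the feature that maximizes score(subset)."""
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

In practice `score` might be the cross-validated accuracy of a classifier trained on the candidate subset, which is what makes the search a wrapper method.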

8.
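Construction and testing of an automated digital-image identification system for fruit flies of the genus Bactrocera (Tephritidae)   Total citations: 2 (self-citations: 0, citations by others: 2)
For the automatic identification of fruit flies of the genus Bactrocera (Diptera: Tephritidae), this paper proposes using local binary pattern (LBP) features of wing and mesonotum images together with the Adaboost algorithm to design and develop the Automated Fruit fly Identification System-Bactrocera (AFIS-B). The system comprises seven modules: image acquisition, image cropping, preprocessing, feature extraction, classifier design, identification and display. The results show that LBP features can effectively discriminate Bactrocera fruit flies; in tests on 8 species of the genus, the system showed high accuracy and stability, with an average identification rate above 80%. In addition, identification rates for wing and mesonotum images were tested under different conditions, including uneven illumination, distorted posture, damaged specimens and varying sample size. The results show that the system is robust to uneven illumination, distorted posture and specimen damage in the test samples; under certain conditions the correct identification rate is clearly positively correlated with the number of training samples per species and negatively correlated with the total number of species in the training set. This study provides theoretical, methodological and data support for building and applying automatic identification systems for tephritid pests, and may also serve as a useful reference for research on and construction of automatic identification systems for other insects.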
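The LBP feature named in entry 8 can be illustrated with the basic 3x3 operator: each interior pixel is encoded by comparing its eight neighbors with the center, and the codes are collected into a 256-bin histogram. This is a minimal sketch of the standard operator; the neighbor ordering is a convention chosen here, and whether the authors used this variant, a uniform-pattern variant, or block-wise histograms is not stated in the abstract.

```python
def lbp_histogram(image):
    """Return the 256-bin histogram of basic 3x3 LBP codes over interior pixels.

    image: list of rows of gray levels; border pixels are skipped.
    """
    # Neighbors visited clockwise starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            center = image[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # A neighbor at least as bright as the center sets its bit.
                if image[y + dy][x + dx] >= center:
                    code |= 1 << bit
            hist[code] += 1
    return hist
```

The histogram (often normalized by the number of pixels) would then serve as the input vector for a classifier such as Adaboost.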

9.
10.
MOTIVATION: Since DNA microarray experiments provide us with a huge amount of gene expression data, they should be analyzed with statistical methods to extract the meaning of the experimental results. Some dimensionality reduction methods such as Principal Component Analysis (PCA) are used to roughly visualize the distribution of high dimensional gene expression data. However, in the case of binary classification of gene expression data, PCA does not utilize class information when choosing axes. Thus data that are clearly separable in the original space may not be so in the reduced space used in PCA. RESULTS: For visualization and class prediction of gene expression data, we have developed a new SVM-based method called multidimensional SVMs, which generates multiple orthogonal axes. This method projects high dimensional data into a lower dimensional space to exhibit properties of the data clearly and to visualize a distribution of the data roughly. Furthermore, the multiple axes can be used for class prediction. The basic properties of conventional SVMs are retained in our method: solutions of mathematical programming are sparse, and nonlinear classification is implemented implicitly through the use of kernel functions. The application of our method to the experimentally obtained gene expression datasets for patients' samples indicates that our algorithm is efficient and useful for visualization and class prediction. CONTACT: komura@hal.rcast.u-tokyo.ac.jp.

11.
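New advances in methods for analyzing chromosomal aberrations in tumors   Total citations: 1 (self-citations: 0, citations by others: 1)
Xue Yuanbo, Song Xin. Yi Chuan (Hereditas), 2008, 30(12): 1529-1535
Abstract: Tumorigenesis is often associated with chromosomal aberrations, and establishing the relationship between chromosomal aberrations and tumors inevitably depends on detecting and analyzing those aberrations. This article briefly reviews several commonly used methods for detecting chromosomal aberrations and recent advances in them, including G-banding, fluorescence in situ hybridization (FISH), spectral karyotyping (SKY), multicolor fluorescence in situ hybridization (M-FISH), multicolor banding analysis (Rx-FISH), comparative genomic hybridization (CGH) and array comparative genomic hybridization (array CGH), as well as the applications of these methods in tumor diagnosis and research.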

12.
In chromosome analysis, local band analysis plays the main role in identifying the matching chromosomes in metaspread images for karyotyping. The literature on band analysis of chromosome images is limited because of the complexity of the problem. This paper proposes a pixel-level Conditional Seed Point Algorithm (CSPA). The algorithm separates weak band regions from strong band regions, and the area of each strong band region is evaluated from the Region of Seed condition Points. The algorithm works well across different intensity levels and adapts to structural changes when identifying the bands in an image. It was run on more than 450 individual chromosomes to identify the local bands in the chromosome images and achieved an accuracy of more than 96%.

13.
The modeling of the spatial distribution of image properties is important for many pattern recognition problems in science and engineering. Mathematical methods are needed to quantify the variability of this spatial distribution based on which a decision of classification can be made in an optimal sense. However, image properties are often subject to uncertainty due to both incomplete and imprecise information. This paper presents an integrated approach for estimating the spatial uncertainty of vagueness in images using the theory of geostatistics and the calculus of probability measures of fuzzy events. Such a model for the quantification of spatial uncertainty is utilized as a new image feature extraction method, based on which classifiers can be trained to perform the task of pattern recognition. Applications of the proposed algorithm to the classification of various types of image data suggest the usefulness of the proposed uncertainty modeling technique for texture feature extraction.

14.
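Zhu Leqing, Zhang Daxing, Zhang Zhen. Acta Entomologica Sinica, 2015, 58(12): 1331-1337
[Objective] This study explores methods for automatically classifying insect images using advanced computer vision techniques. [Methods] The background of each collected insect specimen image is removed by preprocessing to obtain a foreground mask; the minimum bounding box of the foreground is computed from the contour defined by the mask, the effective foreground region defined by that box is cropped, and features are extracted from the cropped image. First, color name features are extracted: the pixel values of the original RGB (Red-Green-Blue) image are mapped into a space of 11 color names, where each value is the probability that the RGB value belongs to that color name; each color name plane is divided into a grid of 3×3-pixel cells, the mean probability of each cell serves as the descriptor of the cell's center point, and a color name bag-of-visual-words feature is finally formed by spatial pyramid histogram statistics. Second, OpponentSIFT (Opponent Scale Invariant Feature Transform) features are extracted: the RGB image is first transformed into the opponent color space, SIFT features are extracted from each channel of that space, and an OpponentSIFT bag of visual words is formed by spatial pooling and histogram statistics. Concatenating the two bag-of-words features gives the feature vector of the insect image. SVM (Support Vector Machine) classifiers are trained on the feature vectors extracted from the training set of insect images, and the trained classifiers are then used to classify and identify Lepidoptera. [Results] The method was tested on an insect image database of 576 samples from 10 species and achieved a recognition accuracy of 100%. [Conclusion] The experimental results show that features based on color names and OpponentSIFT can effectively recognize images of Lepidoptera.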

15.
A miscarriage is the most frequent complication of a pregnancy. Poor chromosome preparations, culture failure, or maternal cell contamination may hamper conventional karyotyping. Techniques such as chromosomal comparative genomic hybridization (chromosomal‐CGH), array-comparative genomic hybridization (array-CGH), fluorescence in situ hybridization (FISH), multiplex ligation-dependent probe amplification (MLPA) and quantitative fluorescent polymerase chain reaction (QF-PCR) enable us to trace submicroscopic abnormalities. We found the prevalence of chromosome abnormalities in women facing a single sporadic miscarriage to be 45% (95% CI: 38–52; 13 studies, 7012 samples). The prevalence of chromosome abnormalities in women experiencing a subsequent miscarriage after preceding recurrent miscarriage proved to be comparable: 39% (95% CI: 29–50; 6 studies 1359 samples). More chromosome abnormalities are detected by conventional karyotyping compared to FISH or MLPA only (chromosome region specific techniques), and the same amount of abnormalities compared to QF-PCR (chromosome region specific techniques) and chromosomal‐CGH and array-CGH (whole genome techniques) only. Molecular techniques could play a role as an additional technique when culture failure or maternal contamination occurs: recent studies show that by using array-CGH, an additional 5% of submicroscopic chromosome variants can be detected. Because of the small sample size as well as the unknown clinical relevance of these molecular aberrations, more and larger studies should be performed of submicroscopic chromosome abnormalities among sporadic miscarriage samples. For recurrent miscarriage samples molecular technique studies are relatively new. It has often been suggested that miscarriages are due to chromosomal abnormalities in more than 50%, but the present review has determined that chromosomal and submicroscopic genetic abnormalities on average are prevalent in maximally half of the miscarriage samples. 
This article is part of a Special Issue entitled: Molecular Genetics of Human Reproductive Failure.

16.
Image classification is a challenging problem in organizing a large image database. However, an effective method for such an objective is still under investigation. A method based on wavelet analysis to extract features for image classification is presented in this paper. After an image is decomposed by wavelet, the statistics of its features can be obtained from the distribution of histograms of wavelet coefficients, which are respectively projected onto two orthogonal axes, i.e., the x and y directions. Therefore, the nodes of the tree representation of images can be represented by the distribution. The high level features are described in a low dimensional space of 16 attributes, so that the computational complexity is significantly decreased. 2,800 images derived from seven categories are used in the experiments. Half of the images were used for training a neural network and the other half for testing. The features extracted by wavelet analysis and the conventional features are used in the experiments to prove the efficacy of the proposed method. The classification rate on the training data set with wavelet analysis is up to 91%, and the classification rate on the testing data set reaches 89%. Experimental results show that our proposed approach for image classification is more effective.

17.

Background

Predicting protein subnuclear localization is a challenging problem. Some previous works based on non-sequence information including Gene Ontology annotations and kernel fusion have respective limitations. The aim of this work is twofold: one is to propose a novel individual feature extraction method; another is to develop an ensemble method to improve prediction performance using comprehensive information represented in the form of high dimensional feature vector obtained by 11 feature extraction methods.

Methodology/Principal Findings

A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It only considers those feature extraction methods based on amino acid classifications and physicochemical properties. In order to speed up our system, an automatic search method for the kernel parameter is used. The prediction performance of our method is evaluated on four datasets: Lei dataset, multi-localization dataset, SNL9 dataset and a new independent dataset. The overall accuracy of prediction for 6 localizations on Lei dataset is 75.2% and that for 9 localizations on SNL9 dataset is 72.1% in the leave-one-out cross validation, 71.7% for the multi-localization dataset and 69.8% for the new independent dataset, respectively. Comparisons with those existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities on large-size and small-size subcellular localizations. The overall accuracy improvements are 4.0% and 4.7% for single-localization proteins and 6.5% for multi-localization proteins. The reliability and stability of our classification model are further confirmed by permutation analysis.

Conclusions

It can be concluded that our method is effective and valuable for predicting protein subnuclear localizations. A web server has been designed to implement the proposed method. It is freely available at http://bioinformatics.awowshop.com/snlpred_page.php.

18.
In wheat (Triticum aestivum L.) and other cereals, the number of ears per unit area is one of the main yield‐determining components. An automatic evaluation of this parameter may contribute to the advance of wheat phenotyping and monitoring. There is no standard protocol for wheat ear counting in the field, and moreover it is time consuming. An automatic ear‐counting system is proposed using machine learning techniques based on RGB (red, green, blue) images acquired from an unmanned aerial vehicle (UAV). Evaluation was performed on a set of 12 winter wheat cultivars with three nitrogen treatments during the 2017–2018 crop season. The automatic system uses a frequency filter, segmentation and feature extraction, with different classification techniques, to discriminate wheat ears in micro‐plot images. The relationship between the image‐based manual counting and the algorithm counting exhibited high levels of accuracy and efficiency. In addition, manual ear counting was conducted in the field for secondary validation. The correlations of the automatic and the manual in‐situ ear counts with grain yield were also compared. Correlations between the automatic ear counting and grain yield were stronger than those between manual in‐situ counting and grain yield, particularly for the lower nitrogen treatment. Methodological requirements and limitations are discussed.

19.
Wide interest has been observed in medical health care applications that interpret neuroimaging scans with machine learning systems. This research proposes an intelligent, automatic, accurate, and robust classification technique to classify human brain magnetic resonance images (MRI) as normal or abnormal, in order to reduce human error in identifying diseases in brain MRIs. In this study, fast discrete wavelet transform (DWT), principal component analysis (PCA), and least squares support vector machine (LS-SVM) are used as basic components. First, fast DWT is employed to extract the salient features of the brain MRI, followed by PCA, which reduces the dimensions of the features. These reduced feature vectors also shrink the memory storage consumption by 99.5%. Finally, an advanced classification technique based on LS-SVM is applied to brain MR image classification using the reduced features. To improve efficiency, LS-SVM is used with a non-linear radial basis function (RBF) kernel. The proposed algorithm intelligently determines the optimized values of the hyper-parameters of the RBF kernel and also applies k-fold stratified cross validation to enhance the generalization of the system. The method was tested on benchmark datasets of T1-weighted and T2-weighted scans from 340 patients. From the analysis of experimental results and performance comparisons, it is observed that the proposed medical decision support system outperformed all other modern classifiers and achieved a 100% accuracy rate (specificity/sensitivity 100%/100%). Furthermore, in terms of computation time, the proposed technique is significantly faster than recent well-known methods, and it improves the efficiency by 71%, 3%, and 4% in the feature extraction, feature reduction, and classification stages, respectively.
These results indicate that the proposed well-trained machine learning system has the potential to make accurate predictions about brain abnormalities in individual subjects; it could therefore be used as a significant tool in clinical practice.
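The wavelet feature extraction stage in entry 19 can be illustrated with a one-dimensional, unnormalized Haar transform, the simplest discrete wavelet. This is only a sketch: the abstract does not name the wavelet family, the real method works on 2D images, and the PCA and LS-SVM stages are omitted. The functions `haar_step` and `haar_features` are hypothetical names.

```python
def haar_step(signal):
    """One level of an unnormalized Haar transform: pairwise means and half-differences."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_features(signal, levels):
    """Keep only the approximation coefficients after `levels` decomposition steps."""
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_step(approx)
    return approx
```

Each level halves the number of retained coefficients, which is why wavelet approximation coefficients are a convenient first reduction before a method such as PCA.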

20.
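Cancer gene expression datasets are characterized by small sample sizes and high dimensionality, which makes them hard for general-purpose learning machines to classify effectively. Dimensionality is therefore usually reduced using some feature extraction criterion, but commonly used criteria can themselves lead to poor classification performance. Based on the differential capacity control machine (DCCM), a new feature extraction criterion, NFEC, is proposed; based on NFEC and DCCM, a feature extraction algorithm suited to cancer gene expression datasets, DCCFE, is then proposed. Experiments show that the new criterion NFEC and the new feature extraction algorithm DCCFE classify cancer gene expression datasets more effectively than existing methods. The significance of this work is twofold: (1) it proposes a new and more meaningful feature extraction criterion; (2) DCCM can use first-order differentiable functions, which are more general than kernel functions, so the proposed feature extraction algorithm is more broadly applicable.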

