Similar literature
20 similar records retrieved.
1.
The Shannon entropy is a common way of measuring conservation of sites in multiple sequence alignments, and has also been extended to the relative Shannon entropy to account for background frequencies. The von Neumann entropy is another extension of the Shannon entropy, adapted from quantum mechanics in order to account for amino acid similarities. However, no relative von Neumann entropy has yet been defined for sequence analysis. We introduce a new definition of the von Neumann entropy for use in sequence analysis, which we found to perform better than the previous definition. We also introduce the relative von Neumann entropy and a way of parametrizing it so as to recover the Shannon entropy, the relative Shannon entropy and the von Neumann entropy at special parameter values. We performed an exhaustive search of this parameter space and found better predictions of catalytic sites compared with any of the previously used entropies.
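To make the quantities concrete, below is a minimal Python sketch of the Shannon entropy, the relative Shannon entropy, and one common density-matrix construction of a von Neumann entropy for a single alignment column; the similarity matrix, background frequencies, and the paper's specific parametrization are not reproduced, and all names are illustrative.

```python
import numpy as np

def shannon_entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def relative_shannon_entropy(p, q):
    m = p > 0
    return np.sum(p[m] * np.log2(p[m] / q[m]))

def von_neumann_entropy(p, S):
    # One common construction (an assumption, not the paper's definition): a
    # density-like matrix from the column frequencies p and a 20x20 amino acid
    # similarity matrix S, normalized to unit trace.
    rho = np.diag(np.sqrt(p)) @ S @ np.diag(np.sqrt(p))
    rho = rho / np.trace(rho)
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return -np.sum(evals * np.log2(evals))

p = np.zeros(20); p[0], p[5] = 0.6, 0.4   # toy column with mass on two residue types
q = np.full(20, 0.05)                     # uniform background (an assumption)
S = np.eye(20)                            # identity similarity: reduces to Shannon entropy
print(shannon_entropy(p), relative_shannon_entropy(p, q), von_neumann_entropy(p, S))
```

With S equal to the identity the density matrix is diagonal and the von Neumann entropy coincides with the Shannon entropy, which is a convenient sanity check for this kind of construction.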

2.
We have investigated the registration of mammograms based on the Tsallis entropy using a mutual information measure. The Tsallis entropy has an additional parameter q, and the value of q determines the quality of the registration. Existing Tsallis-entropy-based algorithms are not as automatic as claimed. In this article, an automatic affine image registration based on the Tsallis entropy is proposed and its performance is analyzed for globally registering clinically acquired mammograms. The accuracy is compared with the traditionally used mutual information and normalized mutual information based on the Shannon entropy. Our algorithm shows promising results, with increased accuracy and a reduced number of evaluations. Further, the need for pre-registration of mammograms is discussed in detail. Through this experiment, it is found that the proposed algorithm is effective enough to replace Shannon- and existing Tsallis-entropy-based affine registration schemes.
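A hedged Python sketch of a Tsallis-entropy-based similarity measure between a fixed and a moving image follows; the affine search and optimizer used in the article are omitted, and the function names and the particular mutual-information generalization are illustrative assumptions.

```python
import numpy as np

def tsallis_entropy(p, q=1.5):
    p = p[p > 0]
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

def tsallis_mutual_information(img_a, img_b, q=1.5, bins=64):
    # Joint and marginal intensity histograms, normalized to probabilities.
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pab = joint / joint.sum()
    pa, pb = pab.sum(axis=1), pab.sum(axis=0)
    # One common generalization: S_q(A) + S_q(B) - S_q(A, B).
    return (tsallis_entropy(pa, q) + tsallis_entropy(pb, q)
            - tsallis_entropy(pab.ravel(), q))
```

During registration such a measure is maximized over the affine parameters; the entropic index q tunes how strongly rare intensity pairs contribute to the score.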

3.
In an attempt to analyze the structure, function and evolution of HIV-1 GP120 V3, interactions among the Hartree–Fock energy, the conformational entropy and the Shannon entropy were determined for the 1NJ0 set of antibody-bound V3 loop conformers. The Hartree–Fock energy of each conformer was determined at the MINI level with GAMESS. The conformational entropy was determined per conformer and per residue from the mass-weighted covariance matrices. The Shannon entropy per residue was determined from sequence-substitution frequencies. Correlations were determined by linear regression analysis. There was a negative correlation between the Hartree–Fock energy and the conformational entropy (R = −0.4840, p = 0.0078, df = 28) that enhanced the negative Helmholtz free energy change for the binding of the GP120 ligand to target CD4. The Shannon entropy of V3 was a function of the conformational entropy variance (R = 0.7225, p = 0.00157, df = 15) and of the V3 Hartree–Fock energy. The biological implications of this work are that (1) conformational entropy interacts with the V3 Hartree–Fock energy to enhance GP120 binding to CD4 cell receptors and that (2) the Hartree–Fock energy of V3 interacts with the evolutionary system to participate in the regulation of V3 diversity.

4.
Deep learning based retinopathy classification with optical coherence tomography (OCT) images has recently attracted great attention. However, existing deep learning methods fail to work well when the training and testing datasets differ, owing to the general issue of domain shift between datasets caused by different collection devices, subjects, imaging parameters, etc. To address this practical and challenging issue, we propose a novel deep domain adaptation (DDA) method to train a model on a labeled dataset and adapt it to an unlabeled dataset (collected under different conditions). It consists of two modules for domain alignment, namely adversarial learning and entropy minimization. We conduct extensive experiments on three public datasets to evaluate the performance of the proposed method. The results indicate that there are large domain shifts between datasets, resulting in poor performance for conventional deep learning methods. The proposed DDA method can significantly outperform existing methods for retinopathy classification with OCT images. It achieves retinopathy classification accuracies of 0.915, 0.959 and 0.990 under three cross-domain (cross-dataset) scenarios. Moreover, it obtains performance comparable with that of human experts on a dataset none of whose labeled data were used to train the proposed DDA method. We have also visualized the learnt features using the t-distributed stochastic neighbor embedding (t-SNE) technique. The results demonstrate that the proposed method can learn discriminative features for retinopathy classification.
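As a rough illustration of the entropy-minimization module, here is a minimal PyTorch sketch; the adversarial branch and the exact DDA architecture from the paper are not shown, and all names are placeholders.

```python
import torch
import torch.nn.functional as F

def entropy_minimization_loss(logits, eps=1e-8):
    # Encourage confident (low-entropy) predictions on unlabeled target-domain OCT images.
    probs = F.softmax(logits, dim=1)
    return -(probs * torch.log(probs + eps)).sum(dim=1).mean()

# Typical use inside a training step (hypothetical names, not the paper's code):
#   logits_target = model(target_batch)                 # unlabeled target domain
#   loss = ce_loss(model(source_batch), source_labels) \
#          + lambda_ent * entropy_minimization_loss(logits_target)
```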

5.
《IRBM》2022,43(4):272-278
Purpose: Vulnerable plaque in carotid atherosclerosis is prone to rupture, which can easily lead to acute cardiovascular and cerebrovascular accidents. Accurate identification of vulnerable plaque is a challenging task, especially on limited datasets.
Methods: This paper proposes a multi-feature fusion method to identify high-risk plaque, in which three types of features are combined: global features of carotid ultrasound images, echo features of regions of interest (ROI), and expert knowledge from ultrasound reports. Because the three types of features are fused, more of the features critical for identifying high-risk plaque are included in the feature set, so better performance can be achieved even on limited datasets.
Results: Testing all combinations of the three types of features showed that using all three gives the highest accuracy. The experiments also showed that the performance of the proposed method is better than that of other plaque classification methods and classical Convolutional Neural Networks (CNNs) on the Plaque dataset.
Conclusion: The proposed method helps to build a more complete feature set so that machine learning models can identify vulnerable plaque more accurately, even on datasets of poor quality and small scale.

6.
Camera traps are a popular tool to sample animal populations because they are noninvasive, detect a variety of species, and can record many thousands of animal detections per deployment. Cameras are typically set to take bursts of multiple photographs for each detection and are deployed in arrays of dozens or hundreds of sites, often resulting in millions of photographs per study. The task of converting photographs to animal detection records from such large image collections is daunting, and is made worse by situations that generate copious empty pictures from false triggers (e.g., camera malfunction or moving vegetation) or pictures of humans. We developed computer vision algorithms to detect and classify moving objects to aid the first step of camera trap image filtering: separating the animal detections from the empty frames and pictures of humans. Our new work couples foreground object segmentation through background subtraction with deep learning classification to provide a fast and accurate scheme for human-animal detection. We provide these programs as both a Matlab GUI and a command prompt tool developed in C++. The software reads folders of camera trap images and outputs images annotated with bounding boxes around moving objects together with a text file summarizing the results. The software maintains high accuracy while reducing the execution time by a factor of 14; it takes about 6 seconds to process a sequence of ten frames (on a 2.6 GHz CPU). For cameras with excessive empty frames due to camera malfunction or blowing vegetation, it automatically removes 54% of the false-trigger sequences without affecting the human/animal sequences. We achieve 99.58% on image-level empty-versus-object classification of the Serengeti dataset. We offer the first computer vision tool for processing camera trap images, providing substantial time savings for processing large image datasets and thus improving our ability to monitor wildlife across large scales with camera traps.
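The released tool is a Matlab GUI / C++ program; the sketch below is an illustrative Python/OpenCV 4.x analogue of the foreground-segmentation stage only, with parameter values chosen arbitrarily.

```python
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=10, varThreshold=25)

def moving_object_boxes(frames, min_area=500):
    """Return bounding boxes of moving objects for a burst of camera trap frames."""
    boxes = []
    for frame in frames:
        mask = subtractor.apply(frame)
        mask = cv2.medianBlur(mask, 5)                      # suppress small noise
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        boxes.append([cv2.boundingRect(c) for c in contours
                      if cv2.contourArea(c) >= min_area])
    return boxes

# A burst with no sufficiently large moving object in any frame can then be flagged
# as a likely false trigger before the deep learning classification step.
```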

7.
Homology detection and protein structure prediction are central themes in bioinformatics. Establishing relationships between protein sequences or predicting their structure by sequence comparison methods runs into limitations when sequence similarity is low. Recent work demonstrates that the use of profiles improves homology detection and protein structure prediction. Profiles can be inferred from protein multiple alignments using different approaches. The "Conservatism-of-Conservatism" is an effective profile analysis method to identify structural features between proteins having the same fold but no detectable sequence similarity. The information obtained from protein multiple alignments varies according to the amino acid classification employed to calculate the profile. In this work, we calculated entropy profiles from PSI-BLAST-derived multiple alignments and used different amino acid classifications summarizing almost 500 different attributes. These entropy profiles were converted into pseudocodes, which were compared using the FASTA program with an ad hoc matrix. We tested the performance of our method in identifying relationships between proteins with a similar fold using a nonredundant subset of sequences having less than 40% identity. We then compared our results, using Coverage versus Error per query curves, to those obtained by methods such as PSI-BLAST, COMPASS and HHSEARCH. Our method, named HIP (Homology Identification with Profiles), presented higher accuracy in detecting relationships between proteins with the same fold. The use of different amino acid classifications reflecting a large number of amino acid attributes improved the recognition of distantly related folds. We propose the use of pseudocodes representing profile information as a fast and powerful tool for homology detection, fold assignment and analysis of the evolutionary information enclosed in protein profiles.
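A loosely hedged sketch of the pseudocode idea follows: per-column entropies are discretized into a small alphabet so that profile similarity can be scored with a standard sequence comparison program such as FASTA. The binning scheme, alphabet, and function names below are illustrative assumptions, not the HIP parameters.

```python
import numpy as np

def column_entropy(column, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    counts = np.array([column.count(a) for a in alphabet], dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def entropy_profile_to_pseudocode(columns, code="ABCDEFGHIJ"):
    entropies = np.array([column_entropy(c) for c in columns])
    # Map each entropy value to one of len(code) equally spaced bins on [0, log2(20)] bits.
    bins = np.linspace(0.0, np.log2(20), len(code) + 1)
    idx = np.clip(np.digitize(entropies, bins) - 1, 0, len(code) - 1)
    return "".join(code[i] for i in idx)

# Columns of an alignment given as strings of the residues observed at each position.
print(entropy_profile_to_pseudocode(["AAAA", "ACDG", "AAAG"]))
```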

8.
Purpose: The classification of urinary stones is important prior to treatment because the treatment depends on which of three types of urinary stone is present, i.e., calcium, uric acid, or mixture stones. We have developed an automatic approach for classifying urinary stones into the three types from microcomputed tomography (micro-CT) images using a convolutional neural network (CNN).
Materials and methods: Thirty urinary stones from different patients were scanned in vitro using micro-CT (pixel size: 14.96 μm; slice thickness: 15 μm); a total of 2,430 images (micro-CT slices) were produced. The slices (227 × 227 pixels) were classified into the three categories based on their energy dispersive X-ray (EDX) spectra obtained via scanning electron microscopy (SEM). The images of urinary stones from each category were divided into three parts; 66%, 17%, and 17% of the dataset were assigned to the training, validation, and test datasets, respectively. The CNN model with 15 layers was assessed based on validation accuracy for the optimization of hyperparameters such as batch size, learning rate, and number of epochs with different optimizers. The model with the optimized hyperparameters was then evaluated on the test dataset to obtain the classification accuracy and error.
Results: The validation accuracy of the developed CNN approach with optimized hyperparameters was 0.9852. The trained CNN model achieved a test accuracy of 0.9959 with a classification error of 1.2%.
Conclusions: The proposed automated CNN-based approach could successfully classify urinary stones into three types, namely calcium, uric acid, and mixture stones, using micro-CT images.
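For orientation, a compact PyTorch sketch of a small CNN for three-class classification of 227 × 227 micro-CT slices is shown below; it does not reproduce the paper's 15-layer architecture or its optimized hyperparameters, and all names are illustrative.

```python
import torch
import torch.nn as nn

class StoneCNN(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):                      # x: (batch, 1, 227, 227) grayscale slices
        return self.classifier(self.features(x).flatten(1))

# Batch size, learning rate, number of epochs and the optimizer would then be selected
# on the validation split, as described in the abstract above.
model = StoneCNN()
print(model(torch.randn(2, 1, 227, 227)).shape)   # torch.Size([2, 3])
```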

9.
Objective: To explore the feasibility and effectiveness of an electrocardiogram (ECG) analysis method based on multiscale fast sample entropy and random forests for the automatic diagnosis of common arrhythmias (atrial premature beats and ventricular premature beats). Methods: Exploiting the differences in complexity among the ECG signals of different arrhythmias, multiscale entropy was used to compute the sample entropy of the ECG signal at different scales to form feature vectors; a kd-tree was used to improve the computational efficiency of the multiscale entropy and enhance the real-time performance of the algorithm. A random forest classifier was built from the feature vectors of the training samples, and the arrhythmia type of a test sample was determined by combining the classification results of the many decision trees under a voting rule. Results: The proposed ECG analysis method could effectively identify normal rhythm, atrial premature beats (APB) and ventricular premature beats (VPB), with an average recognition accuracy of 91.60%. Conclusion: The proposed ECG analysis method achieves high recognition accuracy for common arrhythmias (APB, VPB) and has clinical practical value.
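A minimal Python sketch of this pipeline (coarse-graining, a simplified sample entropy estimate, and a random forest classifier) is given below; the kd-tree speedup is omitted, and the scales, parameters, and variable names are illustrative rather than the paper's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sample_entropy(x, m=2, r_factor=0.2):
    # Simplified SampEn estimate; standard implementations differ slightly in how
    # template counts are normalized.
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()

    def matches(length):
        t = np.array([x[i:i + length] for i in range(len(x) - length)])
        d = np.max(np.abs(t[:, None] - t[None, :]), axis=2)   # Chebyshev distances
        return np.sum(d <= r) - len(t)                        # exclude self-matches

    b, a = matches(m), matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def multiscale_sample_entropy(signal, scales=range(1, 6)):
    signal = np.asarray(signal, dtype=float)
    feats = []
    for s in scales:
        n = len(signal) // s
        coarse = signal[:n * s].reshape(n, s).mean(axis=1)    # coarse-graining at scale s
        feats.append(sample_entropy(coarse))
    return feats

# X = [multiscale_sample_entropy(seg) for seg in ecg_segments]   # feature vectors
# clf = RandomForestClassifier(n_estimators=100).fit(X, labels)  # normal / APB / VPB by voting
```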

10.
Background

Genomic islands (GIs) are clusters of alien genes in some bacterial genomes that are not seen in the genomes of other strains within the same genus. The detection of GIs is extremely important to the medical and environmental communities. Despite the discovery of GI-associated features, accurate detection of GIs is still far from satisfactory.

Results

In this paper, we combined multiple GI-associated features and applied and compared various machine learning approaches to evaluate the classification accuracy on GI datasets from three genera (Salmonella, Staphylococcus, Streptococcus) and on a mixed dataset of all three genera. The experimental results show that, in general, the decision tree approach outperformed the other machine learning methods according to five performance evaluation metrics. Using J48 decision trees as base classifiers, we further applied four ensemble algorithms, including AdaBoost, bagging, MultiBoost and random forest, on the same datasets. We found that, overall, these ensemble classifiers could improve classification accuracy.

Conclusions

We conclude that decision-tree-based ensemble algorithms can accurately classify GIs and non-GIs, and we recommend the use of these methods for future GI data analysis. The software package for detecting GIs can be accessed at http://www.esu.edu/cpsc/che_lab/software/GIDetector/.
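A brief scikit-learn sketch of this comparison is shown below; CART trees stand in for the J48 (C4.5) trees used in the paper, MultiBoost has no scikit-learn implementation and is omitted, and X, y are placeholders for the extracted GI feature matrix and labels.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

models = {
    "decision_tree": DecisionTreeClassifier(),
    "adaboost": AdaBoostClassifier(),        # boosts decision stumps by default
    "bagging": BaggingClassifier(),          # bags decision trees by default
    "random_forest": RandomForestClassifier(),
}

# for name, model in models.items():
#     scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
#     print(name, scores.mean())
```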


11.

We are developing a program to calculate optimal RNA secondary structures. The model uses di-nucleotide pairing energies, as in most traditional approaches. For long-range entropy interactions, however, the approach uses an entropy-loss model based on the accumulated sum of the entropy of bonding between each base pair, weighted inversely by the correlation of the RNA sequence (the Kuhn length). Stiff RNA forms very different structures from flexible RNA. The results demonstrate that the long-range folding is largely governed by this entropy and the Kuhn length.

12.
13.
《IRBM》2022,43(5):479-485
Objective: The structural complexity and uneven gray-level distribution of pneumonia images seriously affect the accuracy of pneumonia classification. Because DenseNet continuously passes the features learned by each layer backwards, it not only reduces the number of model parameters but also learns local features better. Therefore, this paper proposes a DenseNet-based method to classify pneumonia.
Material and methods: The method adds a feature-channel attention block, Squeeze and Excitation (SE), to DenseNet to highlight pneumonia information in the feature maps, and replaces the average pooling of the third transition layer in DenseNet with max pooling to focus further on the lesion region. After comparing several activation functions, we chose PReLU to avoid neuron death during model training. Moreover, we preprocess the chest X-ray2017 dataset with data augmentation and normalization.
Results: The experimental results show that, compared with DenseNet, our model's Accuracy, Precision, Recall and F1-score are improved by 2.4%, 2.0%, 1.8% and 1.8%, respectively, reaching 92.8%, 92.6%, 96.2% and 94.3%.
Conclusion: In this paper, we propose an attention-based DenseNet method for pneumonia classification, which makes the network pay more attention to the pneumonia areas and improves classification performance.
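A minimal PyTorch sketch of the Squeeze-and-Excitation channel attention block is given below for reference; the reduction ratio and the surrounding DenseNet, max-pooling, and PReLU modifications are assumptions left out of the sketch.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)            # global average pooling
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                      # reweight feature channels

print(SEBlock(64)(torch.randn(2, 64, 28, 28)).shape)      # torch.Size([2, 64, 28, 28])
```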

14.

Background

The quantification of species-richness and species-turnover is essential to effective monitoring of ecosystems. Wetland ecosystems are particularly in need of such monitoring due to their sensitivity to rainfall, water management and other external factors that affect hydrology, soil, and species patterns. A key challenge for environmental scientists is determining the linkage between natural and human stressors, and the effect of that linkage at the species level in space and time. We propose pixel intensity based Shannon entropy for estimating species-richness, and introduce a method based on statistical wavelet multiresolution texture analysis to quantitatively assess interseasonal and interannual species turnover.

Methodology/Principal Findings

We model satellite images of regions of interest as textures. We define a texture in an image as a spatial domain where the variations in pixel intensity across the image are both stochastic and multiscale. To compare two textures quantitatively, we first obtain a multiresolution wavelet decomposition of each. Either an appropriate probability density function (pdf) model for the coefficients at each subband is selected and its parameters estimated, or a non-parametric approach using histograms is adopted. We choose the former, where the wavelet coefficients of the multiresolution decomposition at each subband are modeled as samples from the generalized Gaussian pdf. We then obtain the joint pdf for the coefficients of all subbands, assuming independence across subbands; an approximation that simplifies the computational burden significantly without sacrificing the ability to statistically distinguish textures. We measure the difference between two textures' representative pdfs via the Kullback-Leibler (KL) divergence. Species turnover, or diversity, is estimated using both this KL divergence and the difference in Shannon entropy. Additionally, we predict species richness, or diversity, based on the Shannon entropy of pixel intensity. To test our approach, we specifically use the green band of Landsat images for a water conservation area in the Florida Everglades. We validate our predictions against data on species occurrences over a twenty-eight-year period for both wet and dry seasons. Our method correctly predicts 73% of species richness. For species turnover, the newly proposed KL divergence prediction performance is nearly 100% accurate. This represents a significant improvement over the more conventional Shannon entropy difference, which provides 85% accuracy. Furthermore, we find that changes in soil and water patterns, as measured by fluctuations of the Shannon entropy of the red and blue bands respectively, are positively correlated with changes in vegetation. The fluctuations are smaller in the wet season than in the dry season.

Conclusions/Significance

Texture-based statistical multiresolution image analysis is a promising method for quantifying interseasonal differences and, consequently, the degree to which vegetation, soil, and water patterns vary. The proposed automated method for quantifying species richness and turnover can also provide analysis at higher spatial and temporal resolution than is currently obtainable from expensive monitoring campaigns, thus enabling more prompt, more cost-effective inference and decision-making support regarding anomalous variations in biodiversity. Additionally, a matrix-based visualization of the statistical multiresolution analysis is presented to facilitate both insight and quick recognition of anomalous data.
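As a rough sketch of the texture machinery, the Python fragment below computes the pixel-intensity Shannon entropy (the species-richness predictor) and a summed per-subband KL divergence between two bands using wavelet decompositions and coefficient histograms; histograms are the non-parametric alternative the authors mention, used here instead of fitting generalized Gaussian pdfs, and the wavelet, level, and bin counts are illustrative.

```python
import numpy as np
import pywt   # PyWavelets

def pixel_entropy(band, bins=256):
    h, _ = np.histogram(band, bins=bins)
    p = h / h.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def texture_kl(band_a, band_b, wavelet="db4", level=3, bins=64):
    ca = pywt.wavedec2(band_a, wavelet, level=level)
    cb = pywt.wavedec2(band_b, wavelet, level=level)
    kl = 0.0
    for det_a, det_b in zip(ca[1:], cb[1:]):          # skip approximation coefficients
        for sa, sb in zip(det_a, det_b):              # horizontal, vertical, diagonal details
            lim = max(np.abs(sa).max(), np.abs(sb).max()) + 1e-9
            edges = np.linspace(-lim, lim, bins + 1)  # shared bin edges per subband
            p, _ = np.histogram(sa, bins=edges)
            q, _ = np.histogram(sb, bins=edges)
            p = (p + 1e-6) / (p + 1e-6).sum()
            q = (q + 1e-6) / (q + 1e-6).sum()
            kl += np.sum(p * np.log(p / q))           # subband KLs add under independence
    return kl
```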

15.

Background

Using a hybrid approach for gene selection and classification is common, as the results obtained are generally better than when the two tasks are performed independently. Yet, for some microarray datasets, both the classification accuracy and the stability of the gene sets obtained still have room for improvement. This may be due to the presence of samples with wrong class labels (i.e. outliers). Outlier detection algorithms proposed so far are either not suitable for microarray data or address the outlier detection problem only in isolation.

Results

We tackle the outlier detection problem based on a previously proposed Multiple-Filter-Multiple-Wrapper (MFMW) model, which was demonstrated to yield promising results when compared to other hybrid approaches (Leung and Hung, 2010). To incorporate outlier detection and overcome limitations of the existing MFMW model, three new features are introduced in our proposed MFMW-outlier approach: 1) an unbiased external Leave-One-Out Cross-Validation framework is developed to replace internal cross-validation in the previous MFMW model; 2) wrongly labeled samples are identified within the MFMW-outlier model; and 3) a stable set of genes is selected using an L1-norm SVM that removes any redundant genes present. Six binary-class microarray datasets were tested. Compared with outlier detection studies on the same datasets, MFMW-outlier could detect all the outliers found in the original paper (for which the data were provided for analysis), and the genes selected after outlier removal were shown to have biological relevance. We also compared MFMW-outlier with PRAPIV (Zhang et al., 2006) on the same synthetic datasets. MFMW-outlier gave better average precision and recall values in three different settings. Lastly, artificially flipped microarray datasets were created by removing our detected outliers and flipping some of the remaining samples' labels. Almost all the 'wrong' (artificially flipped) samples were detected, suggesting that MFMW-outlier is sufficiently powerful to detect outliers in high-dimensional microarray datasets.
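Two ingredients of the approach, external leave-one-out cross-validation and L1-norm SVM gene selection, are sketched below in scikit-learn; the multiple-filter-multiple-wrapper machinery itself is not reproduced and the regularization strength is an arbitrary placeholder.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import LinearSVC

def loo_accuracy_with_l1_selection(X, y, C=0.1):
    """X: (samples x genes) NumPy array, y: binary label array."""
    correct, selected = 0, []
    for train_idx, test_idx in LeaveOneOut().split(X):
        svm = LinearSVC(penalty="l1", dual=False, C=C, max_iter=5000)
        svm.fit(X[train_idx], y[train_idx])
        genes = np.flatnonzero(np.abs(svm.coef_).sum(axis=0))  # non-zero weights = kept genes
        selected.append(genes)
        correct += int(svm.predict(X[test_idx])[0] == y[test_idx][0])
    return correct / len(X), selected
```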

16.
The family Gigasporaceae consisted of the two genera Gigaspora and Scutellospora when first erected. In a recent revision of this classification, Scutellospora was divided into three families and four genera based on two main lines of evidence: (1) phylogenetic patterns of coevolving small and large rRNA genes and (2) morphology of spore germination shields. The rRNA trees were assumed to accurately reflect species evolution, and shield characters were selected because they correlated with gene trees. These characters were then used selectively to support gene trees and validate the classification. To test this new classification, a phylogenetic tree was reconstructed from concatenated 25S rRNA and β-tubulin gene sequences using 35% of the known species in Gigasporaceae. A tree was also reconstructed from 23 morphological characters represented in 71% of the known species. Results from both datasets showed that the revised classification was untenable. The classification also failed to accurately represent sister group relationships amongst higher taxa. Only two clades were fully resolved and congruent among datasets: Gigaspora and Racocetra (a clade consisting of species with spores having one inner germinal wall). Other clades were unresolved, which was attributed in part to undersampling of species. The topology of the morphology-based phylogeny was incongruent with gene evolution. Five shield characters were reduced to three, of which two were phylogenetically uninformative because they were homoplastic. Therefore, most taxa erected in the new classification are rejected. The classification is revised to restore the family Gigasporaceae, within which are the three genera Gigaspora, Racocetra, and Scutellospora. This classification does not reflect the strict topology of either gene or morphological evolution. Further revisions must await sampling of additional characters and taxa to better ascertain congruence between datasets and infer a more accurate phylogeny of this important group of fungi.

17.
Shannon entropy H and related measures are increasingly used in molecular ecology and population genetics because (1) unlike measures based on heterozygosity or allele number, these measures weigh alleles in proportion to their population fraction, thus capturing a previously ignored aspect of allele frequency distributions that may be important in many applications; (2) these measures connect directly to the rich predictive mathematics of information theory; (3) Shannon entropy is completely additive and has an explicitly hierarchical nature; and (4) Shannon entropy-based differentiation measures obey strong monotonicity properties that heterozygosity-based measures lack. We derive simple new expressions for the expected value of the Shannon entropy of the equilibrium allele distribution at a neutral locus in a single isolated population under two models of mutation: the infinite allele model and the stepwise mutation model. Surprisingly, this complex stochastic system has, for each model, an entropy expressible as a simple combination of well-known mathematical functions. Moreover, entropy- and heterozygosity-based measures for each model are linked by simple relationships that are shown by simulations to be approximately valid even far from equilibrium. We also identify a bridge between the two models of mutation. We apply our approach to subdivided populations that follow the finite island model, obtaining the Shannon entropy of the equilibrium allele distributions of the subpopulations and of the total population. We also derive the expected mutual information and normalized mutual information ("Shannon differentiation") between subpopulations at equilibrium, and identify the model parameters that determine them. We apply our measures to data from the common starling (Sturnus vulgaris) in Australia. Our measures provide a test for neutrality that is robust to violations of equilibrium assumptions, as verified on real-world data from starlings.
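A minimal sketch of the entropy bookkeeping behind these measures is shown below: the Shannon entropy of each subpopulation's allele frequencies, the entropy of the pooled population, and their difference as the mutual information ("Shannon differentiation"). The closed-form equilibrium expectations derived in the paper are not reproduced; the toy counts are illustrative.

```python
import numpy as np

def shannon(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def shannon_differentiation(subpop_counts):
    counts = np.asarray(subpop_counts, dtype=float)      # rows: subpopulations, cols: alleles
    weights = counts.sum(axis=1) / counts.sum()          # subpopulation sample fractions
    h_within = np.sum(weights * [shannon(row / row.sum()) for row in counts])
    h_total = shannon(counts.sum(axis=0) / counts.sum())
    return h_total - h_within                            # mutual information I(allele; subpopulation)

print(shannon_differentiation([[50, 50, 0], [0, 50, 50]]))   # two partially differentiated demes
```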

18.
Protein sequence conservation is a powerful and widely used indicator for predicting catalytic residues from enzyme sequences. One way to incorporate amino acid similarity into conservation measures is to group amino acids into disjoint sets. In this paper, based on the overlapping amino acid classification proposed by Taylor, we define the relative entropy of the Venn diagram (RVD) and RVD2. In large-scale testing, we demonstrate that RVD and RVD2 perform better than many existing conservation measures in identifying catalytic residues, especially the commonly used relative entropy (RE) and Jensen–Shannon divergence (JSD). To further improve RVD and RVD2, two new conservation measures are obtained by combining them with the classical JSD. Experimental results suggest that these combined measures perform excellently in identifying catalytic residues.
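For reference, the two classical baselines that RVD and RVD2 are compared with and combined with are easy to state in code; the sketch below implements relative entropy and the Jensen–Shannon divergence only, not the Venn-diagram construction over Taylor's overlapping classes, and the toy distributions are placeholders.

```python
import numpy as np

def relative_entropy(p, q):
    m = p > 0
    return np.sum(p[m] * np.log2(p[m] / q[m]))

def jensen_shannon_divergence(p, q):
    mix = 0.5 * (p + q)
    return 0.5 * relative_entropy(p, mix) + 0.5 * relative_entropy(q, mix)

# p: observed amino acid frequencies at an alignment column; q: background frequencies.
p = np.array([0.7, 0.2, 0.1, 0.0]); q = np.array([0.25, 0.25, 0.25, 0.25])
print(relative_entropy(p, q), jensen_shannon_divergence(p, q))
```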

19.
《IRBM》2022,43(5):362-371
Objectives: Hyperspectral imaging (HSI) has great potential for assessing the health condition of neonates, as it provides diagnostic information about tissue without requiring a biopsy. HSI yields more features than thermal imaging, which acquires images at a single wavelength, because it acquires images at a large number of wavelengths. The data obtained with hyperspectral sensors are three-dimensional data called hypercubes, comprising two spatial dimensions and a third, spectral dimension.
Material and methods: In this study, hyperspectral data were obtained from 19 different neonates in the Neonatal Intensive Care Unit (NICU) of Selcuk University, Medical Faculty. Over a period of three months, 16 hypercubes were acquired from 16 unhealthy neonates and 16 hypercubes from 3 healthy neonates, for 32 hypercubes in total. For training the 3D-CNN model, data augmentation methods such as rotation, height shifting, width shifting, and shearing were applied to the hyperspectral data, augmenting the 32 hypercubes taken from neonates in the NICU to 160 hypercubes. The spectral signatures were examined, and 51 bands in the range of 700-850 nm with distinctive features were used for classification. The spectral dimension was reduced by applying Principal Component Analysis (PCA) to all hypercubes. In addition, the 3D-CNN was used to capture both spectral and spatial features. To increase classification efficiency, ROI extraction was performed and four datasets with different spatial dimensions were created, containing 160, 640, 1440, and 5760 hypercubes, respectively.
Results: The best result was achieved using 5760 hypercubes of size 25x25x51. Classification of the hypercubes yielded an accuracy of 98.00%, a sensitivity of 97.22%, and a specificity of 98.78%. The number of principal components needed to achieve the best result was determined. Further, the proposed 3D-CNN model was compared to a 2D-CNN model to evaluate the performance of the study.
Conclusion: The aim was to evaluate the health status of neonates rapidly by using HSI and a 3D-CNN for the first time. The results obtained indicate that HSI and 3D-CNN are very effective for distinguishing unhealthy from healthy neonates.
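A hedged sketch of the spectral preprocessing is given below: PCA applied along the spectral axis of a hypercube followed by extraction of fixed-size spatial patches for a 3D CNN. The network itself and the clinical data are not shown, and the shapes, stride, and component count are illustrative rather than the paper's settings.

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_spectral_dim(hypercube, n_components=10):
    h, w, bands = hypercube.shape
    flat = hypercube.reshape(-1, bands)                  # one spectrum per pixel
    reduced = PCA(n_components=n_components).fit_transform(flat)
    return reduced.reshape(h, w, n_components)

def extract_patches(cube, size=25, stride=25):
    h, w, _ = cube.shape
    return np.stack([cube[i:i + size, j:j + size, :]
                     for i in range(0, h - size + 1, stride)
                     for j in range(0, w - size + 1, stride)])

cube = np.random.rand(100, 100, 51)                      # stand-in for one hypercube
patches = extract_patches(reduce_spectral_dim(cube))     # (N, 25, 25, n_components)
print(patches.shape)
```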

20.
  1. A time-consuming challenge faced by camera trap practitioners is the extraction of meaningful data from images to inform ecological management. An increasingly popular solution is automated image classification software. However, most solutions are not sufficiently robust to be deployed on a large scale, due to a lack of location invariance when transferring models between sites. This prevents optimal use of ecological data, resulting in significant expenditure of time and resources to annotate and retrain deep learning models.
  2. We present a method ecologists can use to develop optimized location invariant camera trap object detectors by (a) evaluating publicly available image datasets characterized by high intradataset variability in training deep learning models for camera trap object detection and (b) using small subsets of camera trap images to optimize models for high accuracy domain‐specific applications.
  3. We collected and annotated three datasets of images of striped hyena, rhinoceros, and pigs, from the image‐sharing websites FlickR and iNaturalist (FiN), to train three object detection models. We compared the performance of these models to that of three models trained on the Wildlife Conservation Society and Camera CATalogue datasets, when tested on out‐of‐sample Snapshot Serengeti datasets. We then increased FiN model robustness by infusing small subsets of camera trap images into training.
  4. In all experiments, the mean Average Precision (mAP) of the FiN trained models was significantly higher (82.33%–88.59%) than that achieved by the models trained only on camera trap datasets (38.5%–66.74%). Infusion further improved mAP by 1.78%–32.08%.
  5. Ecologists can use FiN images for training deep learning object detection solutions for camera trap image processing to develop location invariant, robust, out‐of‐the‐box software. Models can be further optimized by infusion of 5%–10% camera trap images into training data. This would allow AI technologies to be deployed on a large scale in ecological applications. Datasets and code related to this study are open source and available on this repository: https://doi.org/10.5061/dryad.1c59zw3tx.
