Similar Articles

20 similar articles found.
1.

Motivation

The two-locus model is a typical disease model of interest in genome-wide association studies (GWAS). Owing to the intensive computational burden and the diversity of disease models, existing methods suffer from low detection power, high computational cost, and a preference for certain types of disease models.

Method

In this study, two scoring functions (the Bayesian-network-based K2-score and the Gini-score) are used to characterize a pair of SNP loci as a candidate model; adopting both criteria simultaneously improves identification power and mitigates the preference for particular disease models. An improved harmony search algorithm (HSA) quickly locates the most likely candidate models among all two-locus models; within it, a local search algorithm with a two-dimensional tabu table avoids repeatedly evaluating disease models with strong marginal effects. Finally, the G-test statistic is used to further test the candidate models.

Results

We evaluate our method, named FHSA-SED, on 82 simulated datasets and a real AMD dataset, and compare it with two recently developed methods based on swarm-intelligence search (MACOED and CSE). The simulation results indicate that our method outperforms both compared algorithms in terms of detection power, computation time, number of evaluations, sensitivity (TPR), specificity (SPC), positive predictive value (PPV), and accuracy (ACC). In the AMD dataset, our method identified two SNPs (rs3775652 and rs10511467) that may also be associated with the disease.
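The Gini-score half of the dual-criterion scoring above can be sketched as follows; the function name, the genotype encoding (pairs of 0/1/2 values), and the case/control labels are illustrative assumptions, not the authors' implementation, and the Bayesian K2-score is omitted.

```python
def gini_score(genotype_pairs, labels):
    """Weighted Gini impurity of case/control labels across the genotype
    cells of a two-SNP pair; lower means the pair separates cases from
    controls better."""
    cells = {}
    for gp, y in zip(genotype_pairs, labels):
        cells.setdefault(gp, []).append(y)
    n = len(labels)
    score = 0.0
    for ys in cells.values():
        p_case = sum(1 for y in ys if y == 1) / len(ys)
        score += len(ys) / n * (1.0 - p_case ** 2 - (1.0 - p_case) ** 2)
    return score
```

A perfectly separating SNP pair scores 0.0, while a pair whose genotype cells each contain balanced case/control labels scores 0.5.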

2.
3.
In the era of metagenomics and diagnostic sequencing, the importance of protein comparison methods with boosted performance cannot be overstated. Here we present PSimScan (Protein Similarity Scanner), a flexible open-source protein similarity search tool which provides a significant gain in speed over BLASTP at the price of a controlled sensitivity loss. The PSimScan algorithm introduces a number of novel performance optimization methods that the community can reuse to improve the speed and lower the hardware requirements of bioinformatics software. The optimization starts at lookup table construction; the initial lookup-table-based hits are then passed through a pipeline of filtering and aggregation routines of increasing computational complexity. The first step in this pipeline is a novel algorithm that builds and selects 'similarity zones' aggregated from neighboring matches on small arrays of adjacent diagonals. PSimScan runs 5 to 100 times faster than the standard NCBI BLASTP, depending on the chosen parameters, and runs on commodity hardware. Its sensitivity and selectivity at the slowest settings are comparable to those of NCBI BLASTP; they decrease as speed increases, yet stay at levels reasonable for many tasks. PSimScan is most advantageous when used on large collections of query sequences: comparing the entire proteome of Streptococcus pneumoniae (2,042 proteins) to the NCBI non-redundant protein database of 16,971,855 records takes 6.5 hours on a moderately powerful PC, while the same task with NCBI BLASTP takes over 66 hours. We describe the innovations in the PSimScan algorithm in considerable detail to encourage bioinformaticians to improve on the tool and to use the innovations in their own software development.
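The lookup-table seeding that starts the pipeline can be illustrated with a minimal sketch: hits sharing a diagonal (target position minus query position) are the raw material for the 'similarity zones' described above. The function name and word size are illustrative assumptions, not PSimScan's actual parameters.

```python
def kmer_hits(query, target, k=3):
    """Seed stage sketch: index every k-mer of the target, then report
    (query_pos, target_pos, diagonal) hits; hits that share a diagonal
    (target_pos - query_pos) hint at a candidate similarity zone."""
    index = {}
    for i in range(len(target) - k + 1):
        index.setdefault(target[i:i + k], []).append(i)
    hits = []
    for j in range(len(query) - k + 1):
        for i in index.get(query[j:j + k], []):
            hits.append((j, i, i - j))
    return hits
```

For example, `kmer_hits("MKVLA", "XXMKVLAYY")` yields three hits, all on diagonal 2, which a later aggregation stage would merge into one zone.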

4.
In this paper, a stochastic leader gravitational search algorithm (SL-GSA) based on a randomized k is proposed. Standard GSA (SGSA) utilizes the best agents without any randomization and is thus prone to converging to suboptimal results. Initially, the new approach randomly chooses k agents from the set of all agents to improve global search ability. Gradually, the set of agents is reduced by eliminating the agents with the poorest performance to allow rapid convergence. The performance of SL-GSA was analyzed on six well-known benchmark functions, and the results are compared with SGSA and some of its variants. Furthermore, SL-GSA is applied to the minimum variance distortionless response (MVDR) beamforming technique to ensure compatibility with real-world optimization problems. The proposed algorithm demonstrates a superior convergence rate and solution quality on both the real-world problem and the benchmark functions compared to the original algorithm and other recent variants of SGSA.
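The leader-selection idea that distinguishes SL-GSA from SGSA can be sketched as follows; this is a schematic fragment (the gravitational force and velocity updates of full GSA are omitted), and the function names and shrink rate are assumptions.

```python
import random

def select_leaders(num_agents, k, rng):
    """SL-GSA leader choice (sketch): draw k agents at random from the
    whole population, instead of SGSA's deterministic k-best, to
    strengthen global exploration early in the run."""
    return rng.sample(range(num_agents), k)

def shrink_leaders(k, k_min=1, rate=0.9):
    """Between iterations the agent set shrinks as the poorest performers
    are dropped, trading exploration for rapid convergence."""
    return max(k_min, int(k * rate))
```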

5.
Classification of datasets with imbalanced sample distributions has always been a challenge. A popular approach for enhancing classification performance is to construct an ensemble of classifiers; however, an ensemble's performance depends on the choice of constituent base classifiers. We therefore propose a genetic algorithm-based search method for finding the optimal combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, uses 10-fold cross-validation on training data to evaluate the quality of each candidate ensemble. To combine the base classifiers' decisions into the ensemble's output, we use the simple and widely used majority voting approach. The proposed algorithm, along with random sub-sampling to balance the class distribution, has been used to classify class-imbalanced datasets. Additionally, when a feature set was not available, we used the (α, β) − k Feature Set method to select a better subset of features for classification. We tested GA-EoC on three benchmark datasets from the UCI Machine Learning repository, one Alzheimer's disease dataset, and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study, we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction, and we expect the proposed GA-EoC to perform consistently in other cases.
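A toy version of the GA search over classifier subsets might look like this; the fitness is a stand-in (mean accuracy of the selected members) rather than the paper's 10-fold cross-validated majority-vote accuracy, and all parameter values are illustrative assumptions.

```python
import random

def ga_select_ensemble(accuracies, pop_size=20, gens=30, seed=1):
    """Toy GA-EoC sketch: each chromosome is a bitmask over base
    classifiers; truncation selection keeps the top half, and children
    come from one-point crossover plus a small mutation rate."""
    rng = random.Random(seed)
    n = len(accuracies)

    def fitness(mask):
        chosen = [a for a, b in zip(accuracies, mask) if b]
        return sum(chosen) / len(chosen) if chosen else 0.0

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] ^= rng.random() < 0.2  # mutate one gene
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)
```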

6.
This paper proposes a novel method to improve the efficiency of a swarm of robots searching an unknown environment. The approach focuses on the feeding process and individual coordination characteristics inspired by foraging behavior in nature. A predatory strategy was used for searching; this hybrid approach integrates a random search technique with a dynamic particle swarm optimization (DPSO) search algorithm. If a search robot cannot find any target information, it uses the random search algorithm for a global search; once it finds target information in a region, the DPSO search algorithm takes over for a local search. The particle swarm optimization search is dynamic in that all parameters of the algorithm are refreshed synchronously through a communication mechanism until the robots find the target position, after which the robots fall back to random searching. Thus, in this strategy, the robots alternate between the two search algorithms until the whole area is covered. During the search, the robots use a local communication mechanism to share map information and DPSO parameters, reducing the communication burden and overcoming hardware limitations. If the search area is very large, search efficiency may be greatly reduced when only one robot searches an entire region, given limited resources and time constraints. In this research we divided the entire search area into several subregions and selected a target utility function to determine which subregion should be searched first, thereby reducing the residence time of the target and improving search efficiency.
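The mode-switching core of the hybrid strategy can be sketched in one dimension; the step sizes, the inertia-free pull toward the swarm's shared best, and the function name are all simplifying assumptions rather than the authors' controller.

```python
import random

def hybrid_step(pos, sensed, swarm_best, rng, step=1.0, w=0.5):
    """One 1-D search step (sketch): a robot random-walks while no target
    information is sensed, then switches to a DPSO-style pull toward the
    swarm's shared best position; losing the target reverts the mode."""
    if not sensed or swarm_best is None:
        return pos + rng.uniform(-step, step)                 # global random search
    return pos + w * (swarm_best - pos) + rng.uniform(-0.1, 0.1)  # local DPSO pull
```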

7.
The shortest common supersequence problem is a classical problem with many applications in different fields such as planning, artificial intelligence and, especially, bioinformatics. Due to its NP-hardness, we cannot expect to solve this problem efficiently using conventional exact techniques. This paper presents a heuristic that tackles the problem by using, at different levels, a probabilistic variant of the classical Beam Search heuristic. The proposed algorithm is empirically analysed and compared to current approaches in the literature. Experiments show that it provides better-quality solutions in reasonable time for medium and large instances of the problem. For very large instances our heuristic also provides better solutions, but the required execution times may increase considerably.
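A plain deterministic Beam Search for the shortest common supersequence can be sketched as follows; the paper's heuristic is a probabilistic variant applied at several levels, which this sketch does not reproduce.

```python
def is_supersequence(sup, s):
    """True if s is a subsequence of sup."""
    it = iter(sup)
    return all(c in it for c in s)

def scs_beam(strings, beam=5):
    """Beam Search sketch for the shortest common supersequence.
    A state records how far into each string the built prefix reaches;
    each step appends one character and keeps the `beam` best states."""
    states = {tuple(0 for _ in strings): ""}
    while True:
        for pos, sup in states.items():
            if all(p == len(s) for p, s in zip(pos, strings)):
                return sup
        nxt = {}
        for pos, sup in states.items():
            # candidate next characters: any string's next unmatched char
            for c in {s[p] for s, p in zip(strings, pos) if p < len(s)}:
                new = tuple(p + (p < len(s) and s[p] == c)
                            for s, p in zip(strings, pos))
                if new not in nxt or len(nxt[new]) > len(sup) + 1:
                    nxt[new] = sup + c
        # keep the `beam` states that have matched the most characters
        states = dict(sorted(nxt.items(), key=lambda kv: -sum(kv[0]))[:beam])
```

For instance, `scs_beam(["abc", "bca"])` returns a string of length at most 6 that contains both inputs as subsequences.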

8.
9.
Grid computing uses distributed, interconnected computers and resources collectively to achieve higher-performance computing and resource sharing. Task scheduling is one of the core steps for efficiently exploiting the capabilities of a Grid environment. Recently, heuristic algorithms have been successfully applied to task scheduling on computational Grids. In this paper, the Gravitational Search Algorithm (GSA), one of the latest population-based metaheuristics, is used for task scheduling on computational Grids. The proposed method employs GSA to find the solution with the minimum makespan and flowtime. We evaluate this approach against Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) methods. The results demonstrate that the benefits of GSA are its speed of convergence and its capability to obtain feasible schedules.
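The two objectives named above can be computed for a candidate schedule as follows; the representation (a task-to-machine assignment list, task lengths, and CPU speeds) is an illustrative assumption, not the paper's exact model.

```python
def makespan_flowtime(schedule, task_len, cpu_speed):
    """Objectives for scoring a Grid schedule (sketch): schedule[i] is the
    machine assigned to task i; tasks run in list order on each machine.
    Returns (makespan, flowtime): latest finish time, and the sum of all
    task finish times."""
    finish = [0.0] * len(cpu_speed)
    flow = 0.0
    for t, m in enumerate(schedule):
        finish[m] += task_len[t] / cpu_speed[m]
        flow += finish[m]
    return max(finish), flow
```

A metaheuristic such as GSA would then search over `schedule` vectors to minimize this pair of values.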

10.
In this paper, a randomized numerical approach is used to obtain approximate solutions for a class of nonlinear Fredholm integral equations of the second kind. The proposed approach consists of two steps. First, we define a discretized form of the integral equation by quadrature formula methods; under certain conditions on the kernel, the solution of this discretized form converges to the exact solution of the integral equation. We then convert the problem to an optimal control problem by introducing an artificial control function. In the second step, the solution of the discretized form is approximated by a Monte Carlo (MC) random search algorithm. Finally, some examples are given to show the efficiency of the proposed approach.
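The second step, approximating the discretized solution by random search, can be sketched generically; in practice the residual function would come from the quadrature discretization of the kernel, and the toy residual used below is only a stand-in for illustration.

```python
import random

def mc_random_search(residual_sq, dim, iters=20000, scale=0.5, seed=0):
    """Monte Carlo random search (sketch): perturb the current discretized
    solution vector with Gaussian noise and keep any candidate that
    lowers the squared residual of the discretized equation."""
    rng = random.Random(seed)
    u = [0.0] * dim
    best = residual_sq(u)
    for _ in range(iters):
        cand = [ui + rng.gauss(0, scale) for ui in u]
        r = residual_sq(cand)
        if r < best:
            u, best = cand, r
    return u, best

def toy_residual_sq(u):
    # Stand-in residual with known minimizer (0, 1, 2, ...); a real one
    # comes from the quadrature-discretized integral equation.
    return sum((ui - i) ** 2 for i, ui in enumerate(u))
```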

11.
Cloud computing has attracted significant attention from the research community because of the rapid rate at which Information Technology services are migrating to its domain. Advances in virtualization technology have made cloud computing very popular by easing the deployment of application services. Tasks are submitted to cloud datacenters to be processed on a pay-as-you-go basis. Task scheduling is one of the significant research challenges in cloud computing environments: the current formulation of the task scheduling problem is NP-complete, so finding the exact solution, especially for large problem sizes, is intractable. The heterogeneous and dynamic nature of cloud resources makes optimal task scheduling non-trivial, and efficient task scheduling algorithms are therefore required for optimal resource utilization. Symbiotic Organisms Search (SOS) has been shown to perform competitively with Particle Swarm Optimization (PSO). The aim of this study is to optimize task scheduling in the cloud computing environment with a proposed Simulated Annealing (SA) based SOS (SASOS), improving the convergence rate and solution quality of SOS. The SOS algorithm has strong global exploration capability and uses few parameters; the systematic reasoning ability of SA is employed to find better solutions in local solution regions, adding local exploitation ability to SOS. A fitness function is also proposed that takes into account the utilization level of virtual machines (VMs), which reduces makespan and the degree of imbalance among VMs. The CloudSim toolkit was used to evaluate the efficiency of the proposed method on both synthetic and standard workloads. Simulation results showed that the hybrid SASOS performs better than SOS in terms of convergence speed, response time, degree of imbalance, and makespan.
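The SA ingredient grafted onto SOS can be reduced to its acceptance rule; the cooling schedule and the surrounding SOS phases (mutualism, commensalism, parasitism) are omitted, and the names are illustrative.

```python
import math
import random

def sa_accept(cur_cost, cand_cost, temp, rng):
    """Simulated Annealing acceptance rule (sketch): always take
    improvements, occasionally take uphill moves so the search can leave
    local optima; temp is cooled between calls by the outer loop."""
    if cand_cost <= cur_cost:
        return True
    return rng.random() < math.exp(-(cand_cost - cur_cost) / temp)
```

In a SASOS-style hybrid, a candidate schedule produced by an SOS phase would be passed through this rule instead of being accepted greedily.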

12.
13.
14.
Abstract

The genetic algorithm is a function optimization technique derived from the principles of evolutionary theory. We have adapted it to perform conformational search on polypeptides and proteins. The algorithm was first tested on several small polypeptides and on the 46-amino-acid protein crambin under the AMBER potential energy function. The probable global minimum conformations of the polypeptides were located 90% of the time, and a non-native conformation of crambin was located that was 150 kcal/mol lower in potential energy than the minimized crystal structure conformation. Next, we used a knowledge-based potential function to predict the structures of melittin, pancreatic polypeptide, and crambin. A 2.31 Å ΔRMS conformation of melittin and a 5.33 Å ΔRMS conformation of pancreatic polypeptide were located by genetic algorithm-based conformational search under the knowledge-based potential function. Although the ΔRMS of pancreatic polypeptide was somewhat high, most of the secondary structure was correct. The secondary structure of crambin was predicted correctly, but the potential failed to promote packing interactions. Finally, we tested the packing aspects of our potential function by attempting to predict the tertiary structure of cytochrome b562 given the correct secondary structure as a constraint. The final predicted conformation of cytochrome b562 was an almost completely extended continuous helix, which indicated that the knowledge-based potential was useless for tertiary structure prediction. This work serves as a warning against testing potential functions designed for tertiary structure prediction on small proteins.

15.
16.
Abstract

An algorithm is described for generating a long sequence, written in a four-letter alphabet, from the constituent k-tuple words of a minimal number of separate, randomly defined fragments of the starting sequence. It is primarily intended for use in the sequencing by hybridization (SBH) process, a potential method for sequencing human genomic DNA (Drmanac et al., Genomics 4, pp. 114–128, 1989). The algorithm is based on previously defined rules and informative entities of the linear sequence.

The algorithm requires neither knowledge of the number of occurrences of a given k-tuple in the sequence fragments nor information about which k-tuple words lie at the ends of a fragment. It operates on a mixed content of k-tuples of various lengths. The concept of the algorithm permits operation with k-tuple sets containing false-positive and false-negative k-tuples; the content of false k-tuples primarily affects the completeness of the generated sequence, and its correctness only in specific cases. The algorithm can be used to optimize SBH parameters in simulation experiments, as well as for sequence generation in real SBH experiments on genomic DNA.
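The core extension step of such a reconstruction can be sketched as a greedy walk over (k−1)-overlaps; real SBH reconstruction must handle repeats, branching, and false k-tuples far more carefully than this illustration, and the function name is an assumption.

```python
def extend_from_ktuples(start, ktuples, k=3):
    """SBH-style reconstruction sketch: greedily extend the sequence while
    exactly one unused k-tuple continues the current (k-1)-suffix;
    ambiguity (branching) or a missing tuple stops the walk."""
    seq = start
    pool = set(ktuples) - {start}
    while True:
        suffix = seq[-(k - 1):]
        nxt = [t for t in pool if t.startswith(suffix)]
        if len(nxt) != 1:
            return seq
        seq += nxt[0][-1]
        pool.discard(nxt[0])
```

With the complete tuple set of a repeat-free sequence the walk recovers it exactly; an ambiguous continuation simply halts the extension.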

17.
18.
The population migration algorithm (PMA) is an intelligent algorithm that simulates the migration of a population. Given PMA's premature convergence and low precision, this paper introduces a local search mechanism from the leap-frog algorithm and a crossover operator to improve PMA's search speed and global convergence properties. Typical test functions verify the performance of the improved algorithm. Compared with the original population migration algorithm and other intelligent algorithms, the results show that the convergence rate of the improved PMA is very high, and its convergence is proved.

19.
During inflammation, the resulting oxidative stress can damage surrounding host tissue, forming protein-carbonyls. The SJL mouse is an experimental animal model used to assess in vivo toxicological responses to reactive oxygen and nitrogen species from inflammation. The goals of this study were to identify the major serum proteins modified with a carbonyl functionality and to identify the types of carbonyl adducts. To select for carbonyl-modified proteins, serum proteins were reacted with an aldehyde reactive probe that biotinylated the carbonyl modification. Modified proteins were enriched by avidin affinity and identified by two-dimensional liquid chromatography tandem MS. To identify the carbonyl modifications, tryptic peptides from serum proteins were subjected to avidin affinity, and the enriched modified peptides were analyzed by liquid chromatography tandem MS. It was noted that the aldehyde reactive probe tag created tag-specific fragment ions and neutral losses, and these extra features in the mass spectra inhibited identification of the modified peptides by database searching. To enhance the identification of carbonyl-modified peptides, a program was written that used the tag-specific fragment ions as a fingerprint (the in silico filter program) and filtered the mass spectrometry data to highlight only modified peptides. A de novo-like database search algorithm was written (the biotin peptide identification program) to identify the carbonyl-modified peptides. Although written specifically for our experiments, this software can be adapted to other modification and enrichment systems. Using these routines, a number of lipid peroxidation-derived protein carbonyls and direct side-chain oxidation protein carbonyls were identified in SJL mouse serum.

During inflammation, activated phagocytes secrete reactive nitrogen species (RNS) and reactive oxygen species (ROS) that can eliminate infectious agents.
If inflammation is chronic, RNS and ROS can also damage surrounding host tissue, resulting in protein modification in the form of protein-carbonyls (1). Total protein carbonylation has been used as a marker of oxidative stress and inflammation, and increased levels have been seen in heart disease, lung disease, aging, neurodegenerative disorders, and inflammatory bowel disease (2–7). The carbonylation of proteins can result from the direct oxidation of protein side-chains, forming the glutamate and aminoadipate semialdehydes (Scheme 1) (8, 9), but can also occur through the indirect oxidation of polyunsaturated fatty acids (lipid peroxidation) and carbohydrates, leading to a variety of reactive aldehydes (Scheme 2) (10). These aldehydes covalently modify proteins through conjugate addition (often Michael addition) to nucleophilic amino acid side chains, creating protein-bound carbonyls (10, 11).

Scheme 1. Direct oxidative carbonylation of proteins to form glutamate and aminoadipate semialdehydes.

Scheme 2. Reactive aldehydes, arising from oxidation of polyunsaturated fatty acids and carbohydrates, can indirectly lead to protein carbonylation.

In a previous study, DNA oxidative damage products from tissues of the SJL mouse model of inflammation were quantitated (12). Only the lipid peroxidation adducts increased in association with inflammation, which suggested an important role for lipids in inflammatory disease progression and established a direct correlation between inflammation and the increased formation of reactive aldehydes from oxidized lipids. Although DNA modification due to inflammation has been the focus of many animal and human studies, it is proteins that are considered most likely to be ubiquitously affected by disease, response, and recovery (13), and the biological consequences include more rapid protein turnover as well as novel signaling (14–16).
The overall carbonylation of proteins has been well documented in other inflammatory animal models, which have shown significant increases in protein-carbonyls in the mucosal lining of rat colon (17) and mouse colon (5), whereas increased levels of protein carbonyls were observed in rat serum along with a higher turnover of proteins from the inflamed tissue (18, 19). Furthermore, increased protein carbonyl modification has been reported in studies of the colon mucosal lining from patients diagnosed with inflammatory bowel disease (20, 21). Taken together, these observations suggest that an increase in carbonylated proteins is likely in the SJL mouse and that the extent and type of protein-carbonyls could potentially be a marker for inflammation and disease.

The SJL mouse is an experimental animal model used to assess in vivo toxicological responses to nitric oxide (NO) overproduction from inflammation (22). Injections of RcsX lymphoma cells into these mice result in rapid tumor growth as well as host T-cell proliferation in lymph nodes, spleen, and liver, resulting in morbidity within 15 days. The induced macrophages create a 50-fold increase in NO production in spleen and lymph nodes, and the post-translational modification 3-nitrotyrosine is highly elevated in spleen tissue.

The identification of endogenously formed protein carbonyls in serum is challenging because of their low abundance and the large number of possible modifications (1, 2, 23), some of which are shown in Schemes 1 and 2. We recently identified proteins modified by the carbonyl 9,12-dioxo-10(E)-dodecenoic acid (DODE) in cells treated with the hydroperoxide of linoleic acid (13-HPODE) (24). This work used a technique first demonstrated by Maier and coworkers (25, 26).
Protein carbonyls were derivatized with an aldehyde reactive probe (ARP), a biotinylated hydroxylamine that reacts preferentially with aldehyde and keto groups (27), allowing for subsequent enrichment of the modified proteins by avidin affinity. DODE-modified proteins were also identified using an anti-DODE antibody and Western blots. Although a number of DODE-modified proteins were identified, we were unable to definitively identify the carbonyl-modified peptides by mass spectrometry, due both to their low abundance and to the interference of ARP-tag-specific fragment ions with database searching.

In the current study, SJL mouse serum was screened for the presence of protein carbonyls endogenously formed during inflammation. Carbonyl-modified proteins were identified using previously established techniques (24): first anti-DODE Western blotting, followed by ARP derivatization/enrichment and two-dimensional liquid chromatography tandem MS (2D-LC-MS/MS). These proteins then formed a database of putative carbonyl-modified proteins from SJL mouse serum. To identify the type of carbonyl modification and the modified peptide, the ARP-derivatized peptides were enriched and analyzed by mass spectrometry. To minimize the confounding effect of ARP fragmentation, an algorithm (the in silico filter) was written that filtered the mass spectrometry data to select only those peptides containing the known ARP pattern of fragmentation; this alone effectively reduced the number of false positives. To further alleviate the interfering effects of ARP fragments on peptide identification by database searching, a de novo searching algorithm (the Biotin Peptide Identification program, BPI) was written. Peptides were evaluated against the database of proteins that had previously been identified as potentially carbonyl-modified.
Because modified peptides were searched against a finite list of proteins and all results were manually evaluated, the BPI program did not calculate a statistical peptide score, which allowed the identification of lower-abundance modified peptides that would not be considered significant by standard search engines such as Mascot. The BPI program was also written with the flexibility to evaluate a wide range of known carbonyl-adduct masses and could therefore screen for a large number of carbonyl adducts at one time; this should also allow the program to be used with modification/enrichment systems other than the one used here. The program thus selected a finite number of carbonyl-modified peptides, resulting in the identification of a number of proteins that were endogenously carbonylated in serum from the SJL mouse inflammation model.
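The idea behind the in silico filter, keeping only spectra that contain the tag-specific fragment fingerprint, can be sketched as follows; the m/z values, tolerance, and hit threshold are illustrative assumptions, not the actual ARP fragment masses or the authors' program.

```python
def filter_spectra(spectra, tag_fragments, tol=0.3, min_hits=2):
    """In silico filter sketch: keep only MS/MS spectra (peak m/z lists)
    that contain at least min_hits of the tag-specific fragment masses,
    flagging them as candidate carbonyl-modified peptides."""
    def has_peak(peaks, mz):
        return any(abs(p - mz) <= tol for p in peaks)
    return [s for s in spectra
            if sum(has_peak(s, mz) for mz in tag_fragments) >= min_hits]
```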

20.
Color is one of the most prominent features of an image and is used in many skin and face detection applications. Color space transformation is widely used by researchers to improve face and skin detection performance. Despite substantial research efforts in this area, choosing a proper color space, in terms of skin and face classification performance, that can address issues such as illumination variations, varying camera characteristics, and diversity in skin color tones has remained an open issue. This research proposes a new three-dimensional hybrid color space, termed SKN, which employs a Genetic Algorithm heuristic and Principal Component Analysis (PCA) to find the optimal representation of human skin color across over seventeen existing color spaces. The Genetic Algorithm heuristic finds the optimal color component combination in terms of skin detection accuracy, while PCA projects the optimal Genetic Algorithm solution to a lower-dimensional space. Pixel-wise skin detection was used to evaluate the performance of the proposed color space. We employed four classifiers, Random Forest, Naïve Bayes, Support Vector Machine, and Multilayer Perceptron, to generate the human skin color predictive model. The proposed color space was compared with existing color spaces and shows superior results in terms of pixel-wise skin detection accuracy. Experimental results show that with the Random Forest classifier, the proposed SKN color space obtained an average F-score and True Positive Rate of 0.953 and a False Positive Rate of 0.0482, outperforming the existing color spaces in pixel-wise skin detection accuracy. The results also indicate that, among the classifiers used in this study, Random Forest is the most suitable for pixel-wise skin detection applications.
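The PCA projection step can be illustrated in pure Python for the two-dimensional case; the actual method applies PCA to the GA-selected color components to obtain the 3-D SKN space, whereas the sketch below only finds the first principal direction of toy 2-D data.

```python
import math

def first_pc_2d(xs, ys):
    """PCA sketch (2-D): direction of maximum variance of the centred
    data, via the closed-form angle of the 2x2 covariance matrix's
    leading eigenvector."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)
```

For data lying along the line y = x, the principal direction comes out as (√2/2, √2/2), i.e. the 45° diagonal.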
