20 similar documents found (search time: 15 ms)
1.
M A Charleston 《Journal of computational biology》2001,8(1):79-91
The article introduces a parallel heuristic search strategy ("Hitch-hiking") which can be used in conjunction with other random-walk heuristic search strategies. It is applied to an artificial phylogeny problem, in which character sequences are evolved using pseudo-random numbers from a hypothetical ancestral sequence. The objective function to be minimized is the minimum number of character-state changes required on a binary tree that could account for the sequences observed at the tips (leaves) of the tree -- the Maximum Parsimony criterion. The Hitch-hiking strategy is shown to be useful in that it is robust and that on average the solutions found using the strategy are better than those found without. Also the strategy can dynamically provide information on the characteristics of the landscape of the problem. I argue that Hitch-hiking as a scheme for parallelization of existing heuristic search strategies is of potentially very general use, in many areas of combinatorial optimization.
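The objective function above is the classic small-parsimony count. As a rough illustration only (not the Hitch-hiking strategy itself), the sketch below computes Fitch's minimum number of state changes for a single character on a binary tree; the nested-tuple tree encoding and function names are my own.

```python
# Minimal sketch of Fitch's small-parsimony count for a single character (site):
# the minimum number of state changes a binary tree must postulate for the
# states observed at its tips. This illustrates only the objective function,
# not the Hitch-hiking search strategy.

def fitch(tree):
    """Return (possible_states, min_changes) for the subtree `tree`.
    A leaf is a state string; an internal node is a (left, right) tuple."""
    if isinstance(tree, str):
        return {tree}, 0
    (lstates, lcost), (rstates, rcost) = fitch(tree[0]), fitch(tree[1])
    common = lstates & rstates
    if common:
        return common, lcost + rcost                  # no extra change needed
    return lstates | rstates, lcost + rcost + 1       # one state change required

tree = (("A", "A"), ("C", ("C", "G")))                # tip states of a binary tree
print(fitch(tree)[1])                                 # -> 2 changes for this site
```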
2.
Kyungtaek Lim Kazunori D. Yamada Martin C. Frith Kentaro Tomii 《Journal of structural and functional genomics》2016,17(4):147-154
Protein database search for public databases is a fundamental step in the target selection of proteins in structural and functional genomics and also for inferring protein structure, function, and evolution. Most database search methods employ amino acid substitution matrices to score amino acid pairs. The choice of substitution matrix strongly affects homology detection performance. We earlier proposed a substitution matrix named MIQS that was optimized for distant protein homology search. Herein we further evaluate MIQS in combination with LAST, a heuristic and fast database search tool with a tunable sensitivity parameter m, where larger m denotes higher sensitivity. Results show that MIQS substantially improves the homology detection and alignment quality performance of LAST across diverse m parameters. Against a protein database consisting of approximately 15 million sequences, LAST with m = 10^5 achieves better homology detection performance than BLASTP, and completes the search 20 times faster. Compared to the most sensitive existing methods being used today, CS-BLAST and SSEARCH, LAST with MIQS and m = 10^6 shows comparable homology detection performance at 2.0 and 3.9 times greater speed, respectively. Results demonstrate that MIQS-powered LAST is a time-efficient method for sensitive and accurate homology search.
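To make the role of the substitution matrix concrete, the sketch below scores an ungapped alignment by summing per-pair matrix entries. The matrix values shown are a small illustrative fragment, not MIQS, and this is not how LAST itself is invoked.

```python
# Sketch of how a database search scores aligned residue pairs with a
# substitution matrix. The values below are a tiny illustrative fragment,
# NOT the MIQS matrix described in the paper.
TOY_MATRIX = {
    ("A", "A"): 4, ("S", "S"): 4, ("W", "W"): 11,
    ("A", "S"): 1, ("S", "A"): 1,
    ("A", "W"): -3, ("W", "A"): -3, ("S", "W"): -3, ("W", "S"): -3,
}

def score_ungapped(query, subject, matrix=TOY_MATRIX, mismatch_default=-1):
    """Sum matrix scores over an ungapped alignment of two equal-length strings."""
    assert len(query) == len(subject)
    return sum(matrix.get((q, s), mismatch_default) for q, s in zip(query, subject))

print(score_ungapped("ASW", "ASW"))  # 4 + 4 + 11 = 19
print(score_ungapped("ASW", "SAW"))  # 1 + 1 + 11 = 13
```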
3.
Le Van Quyen M 《Biological research》2003,36(1):67-88
My purpose in this paper is to sketch a research direction based on Francisco Varela's pioneering work in neurodynamics (see also Rudrauf et al. 2003, in this issue). Very early on he argued that the internal coherence of every mental-cognitive state lies in the global self-organization of brain activities at the large scale, constituting a fundamental pole of integration called here a "dynamic core". Recent neuroimaging evidence appears to broadly support this hypothesis and suggests that a global brain dynamics emerges at the large-scale level from the cooperative interactions among widely distributed neuronal populations. Despite a growing body of evidence supporting this view, our understanding of these large-scale brain processes remains hampered by the lack of a theoretical language for expressing these complex behaviors in dynamical terms. In this paper, I propose a rough cartography of a comprehensive approach that offers a conceptual and mathematical framework to analyze spatio-temporal large-scale brain phenomena. I emphasize how these nonlinear methods can be applied, what properties might be inferred from neuronal signals, and where one might productively proceed in the future. This paper is dedicated, with respect and affection, to the memory of Francisco Varela.
4.
MOTIVATION: Long terminal repeat (LTR) retrotransposons constitute a substantial fraction of most eukaryotic genomes and are believed to have a significant impact on genome structure and function. Conventional methods used to search for LTR retrotransposons in genome databases are labor intensive. We present an efficient, reliable and automated method to identify and analyze members of this important class of transposable elements. RESULTS: We have developed a new data-mining program, LTR_STRUC (LTR retrotransposon structure program) which identifies and automatically analyzes LTR retrotransposons in genome databases by searching for structural features characteristic of such elements. LTR_STRUC has significant advantages over conventional search methods in the case of LTR retrotransposon families having low sequence homology to known queries or families with atypical structure (e.g. non-autonomous elements lacking canonical retroviral ORFs) and is thus a discovery tool that complements established methods. LTR_STRUC finds LTR retrotransposons using an algorithm that encompasses a number of tasks that would otherwise have to be initiated individually by the user. For each LTR retrotransposon found, LTR_STRUC automatically generates an analysis of a variety of structural features of biological interest. AVAILABILITY: The LTR_STRUC program is currently available as a console application free of charge to academic users from the authors.
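As a toy illustration of the structural idea (two similar long terminal repeats flanking an internal region of plausible size), the sketch below finds exact direct repeats separated by a distance window. The parameters are toy-scaled and the approach is far simpler than LTR_STRUC's actual algorithm.

```python
# Toy illustration of the structural signature behind LTR-retrotransposon
# discovery: two near-identical direct repeats (candidate LTRs) whose
# separation falls within a plausible element-size window. Repeat length and
# separation bounds here are toy-scaled, not LTR_STRUC's parameters.

def candidate_ltr_pairs(seq, ltr_len=8, min_sep=45, max_sep=200):
    """Yield (start1, start2) of exact repeats of length `ltr_len`
    whose separation lies within [min_sep, max_sep]."""
    index = {}
    for i in range(len(seq) - ltr_len + 1):
        index.setdefault(seq[i:i + ltr_len], []).append(i)
    for positions in index.values():
        for a in positions:
            for b in positions:
                if min_sep <= b - a <= max_sep:
                    yield a, b

genome = "CCAT" + "TGTTGGAG" + "A" * 50 + "TGTTGGAG" + "GTCA"
print(list(candidate_ltr_pairs(genome)))   # -> [(4, 62)]
```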
5.
6.
Marc Tessera 《Origins of life and evolution of the biosphere》2017,47(1):57-68
The search for the origin of ‘life’ is made even more complicated by differing definitions of the subject matter, although a general consensus is that an appropriate definition should center on Darwinian evolution (Cleland and Chyba 2002). Within a physical approach which has been defined as a level-4 evolution (Tessera and Hoelzer 2013), one mechanism could be described showing that only three conditions are required to allow natural selection to apply to populations of different system lineages. This approach leads to a vesicle-based model with the necessary properties. Of course such a model has to be tested. Thus, after a brief presentation of the model, an experimental program is proposed that implements the different steps able to show whether this new direction of research in the field is valid and workable.
7.
High-throughput genotyping technologies such as DNA pooling and DNA microarrays mean that whole-genome screens are now practical for complex disease gene discovery using association studies. Because it is currently impractical to use all available markers, a subset is typically selected on the basis of required saturation density. Restricting markers to those within annotated genomic features of interest (e.g., genes or exons) or within feature-rich regions reduces workload and cost while retaining much information. We have designed a program (MaGIC) that exploits genome assembly data to create lists of markers correlated with other genomic features. Marker lists are generated at a user-defined spacing and can target features with a user-defined density. Maps are in base pairs or linkage disequilibrium units (LDUs) as derived from the International HapMap data, which is useful for association studies and fine-mapping. Markers may be selected on the basis of heterozygosity and source database, and single nucleotide polymorphism (SNP) markers may additionally be selected on the basis of validation status. The import function means the method can be used for any genomic features such as housekeeping genes, long interspersed elements (LINEs), or Alu repeats in humans, and is also functional for other species with equivalent data. The program and source code are freely available at http://cogent.iop.kcl.ac.uk/MaGIC.cogx.
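A hedged sketch of the selection logic described (markers restricted to annotated features and thinned to a user-defined spacing) is given below; the marker and feature representations are hypothetical, not MaGIC's own input format.

```python
# Sketch of the marker-selection logic described above: keep only markers that
# fall inside annotated features and are at least `min_spacing` bp apart.
# Marker/feature representations here are hypothetical, not MaGIC's format.

def select_markers(markers, features, min_spacing):
    """markers: iterable of (name, position); features: list of (start, end) intervals."""
    def in_feature(pos):
        return any(start <= pos <= end for start, end in features)

    selected, last_pos = [], None
    for name, pos in sorted(markers, key=lambda m: m[1]):
        if not in_feature(pos):
            continue
        if last_pos is None or pos - last_pos >= min_spacing:
            selected.append(name)
            last_pos = pos
    return selected

markers = [("rs1", 1_200), ("rs2", 4_800), ("rs3", 5_100), ("rs4", 9_900)]
features = [(1_000, 6_000)]                   # e.g. exons of a gene of interest
print(select_markers(markers, features, min_spacing=1_000))   # ['rs1', 'rs2']
```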
8.
Dosenbach NU Visscher KM Palmer ED Miezin FM Wenger KK Kang HC Burgund ED Grimes AL Schlaggar BL Petersen SE 《Neuron》2006,50(5):799-812
When performing tasks, humans are thought to adopt task sets that configure moment-to-moment data processing. Recently developed mixed blocked/event-related designs allow task set-related signals to be extracted in fMRI experiments, including activity related to cues that signal the beginning of a task block, "set-maintenance" activity sustained for the duration of a task block, and event-related signals for different trial types. Data were conjointly analyzed from mixed design experiments using ten different tasks and 183 subjects. Dorsal anterior cingulate cortex/medial superior frontal cortex (dACC/msFC) and bilateral anterior insula/frontal operculum (aI/fO) showed reliable start-cue and sustained activations across all or nearly all tasks. These regions also carried the most reliable error-related signals in a subset of tasks, suggesting that the regions form a "core" task-set system. Prefrontal regions commonly related to task control carried task-set signals in a smaller subset of tasks and lacked convergence across signal types.
9.
ABSTRACT: BACKGROUND: Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving the classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task because these genes might serve as tumor biomarkers, which is of great benefit to not only tumor molecular diagnosis but also drug development. RESULTS: This paper proposes a novel gene selection method with rich biomedical meaning based on a Heuristic Breadth-first Search Algorithm (HBSA) to find as many optimal gene subsets as possible. Due to the curse of dimensionality, this type of method could suffer from over-fitting and selection bias problems. To address these potential problems, an HBSA-based ensemble classifier is constructed using a majority voting strategy from individual classifiers constructed from the selected gene subsets, and a novel HBSA-based gene ranking method is designed to find important tumor-related genes by measuring the significance of genes using their occurrence frequencies in the selected gene subsets. The experimental results on nine tumor datasets, including three pairs of cross-platform datasets, indicate that the proposed method can not only obtain better generalization performance but also find many important tumor-related genes. CONCLUSIONS: It is found that the frequencies of the selected genes follow a power-law distribution, indicating that only a few top-ranked genes can be used as potential diagnosis biomarkers. Moreover, the top-ranked genes leading to very high prediction accuracy are closely related to specific tumor subtypes and even hub genes. Compared with other related methods, the proposed method can achieve higher prediction accuracy with fewer genes. These findings are further justified by analyzing the top-ranked genes in the context of individual gene function, biological pathway, and protein-protein interaction network.
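The sketch below illustrates, on made-up data, two of the components described: majority voting over classifiers built from different selected gene subsets, and ranking genes by their occurrence frequency across those subsets. The HBSA subset search itself is not reproduced, and the gene names and predictions are invented for illustration.

```python
# Sketch of two components described above, on toy data: (1) majority voting
# over classifiers built from different selected gene subsets, and (2) ranking
# genes by how frequently they occur in those subsets. The subsets and
# per-classifier predictions are made up; the HBSA search is not reproduced.
from collections import Counter

selected_subsets = [("TP53", "BRCA1"), ("TP53", "MYC"), ("TP53", "BRCA1", "EGFR")]

# Hypothetical class predictions (one per sample) from each subset's classifier.
predictions = [
    ["tumor",  "normal", "tumor"],
    ["tumor",  "tumor",  "tumor"],
    ["normal", "normal", "tumor"],
]

def majority_vote(per_classifier_predictions):
    """Combine per-sample predictions by simple majority."""
    return [Counter(sample_votes).most_common(1)[0][0]
            for sample_votes in zip(*per_classifier_predictions)]

def rank_genes(subsets):
    """Rank genes by occurrence frequency across the selected subsets."""
    return Counter(g for subset in subsets for g in subset).most_common()

print(majority_vote(predictions))     # ['tumor', 'normal', 'tumor']
print(rank_genes(selected_subsets))   # TP53 first with frequency 3
```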
10.
A computer program that facilitates the creation of a culture collection database has been written for a microcomputer (Apple IIe with a Z-80 card) using dBASE II® (Ashton-Tate). The Culture Collection Program accommodates up to 250 individual strain records on one 5 1/4" floppy disk. For each strain, information that can be stored includes the name of the micro-organism, culture collection number, antibiotic resistance markers, plasmids, genetic markers, references, growth medium, growth temperature and additional comments. The last date of subculturing can be ascertained and information about the status of the preserved cultures can also be noted. With a menu-driven format which requires no computer programming expertise, the user can readily create new entries, update old ones and search the database for strains with certain common properties.
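A minimal present-day analogue of such a strain record and property search might look like the sketch below; the field names follow the abstract, while everything else (class name, example strains) is illustrative rather than a reconstruction of the dBASE II program.

```python
# Minimal present-day analogue of the strain records described above, showing
# the stored fields and a search over common properties. Field names follow
# the abstract; the class and example data are illustrative.
from dataclasses import dataclass, field

@dataclass
class StrainRecord:
    name: str
    collection_number: str
    antibiotic_markers: list = field(default_factory=list)
    plasmids: list = field(default_factory=list)
    genetic_markers: list = field(default_factory=list)
    growth_medium: str = ""
    growth_temperature_c: float = 37.0
    last_subcultured: str = ""           # ISO date string
    comments: str = ""

strains = [
    StrainRecord("Escherichia coli K-12", "CC-001", antibiotic_markers=["ampR"]),
    StrainRecord("Bacillus subtilis 168", "CC-002", growth_temperature_c=30.0),
]

# Search the collection for strains sharing a property, e.g. ampicillin resistance.
amp_resistant = [s.collection_number for s in strains if "ampR" in s.antibiotic_markers]
print(amp_resistant)                     # ['CC-001']
```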
11.
12.
ABSTRACT: Selection of appropriate outcomes or domains is crucial when designing clinical trials to compare directly the effects of different interventions in ways that minimise bias. If the findings are to influence policy and practice then the chosen outcomes need to be relevant and important to key stakeholders including patients and the public, health care professionals and others making decisions about health care. There is a growing recognition that insufficient attention has been paid to the outcomes measured in clinical trials. These issues could be addressed through the development and use of an agreed standardised collection of outcomes, known as a core outcome set, which should be measured and reported, as a minimum, in all trials for a specific clinical area. Accumulating work in this area has identified the need for general guidance on the development of core outcome sets. Key issues to consider in the development of a core outcome set include its scope, the stakeholder groups to be involved, choice of consensus method and the achievement of a consensus.
13.
This paper presents a language for describing arrangements of motifs in biological sequences, and a program that uses the language to find the arrangements in motif match databases. The program does not by itself search for the constituent motifs, and is thus independent of how they are detected, which allows it to use motif match data of various origins. AVAILABILITY: The program can be tested online at http://hits.isb-sib.ch and the distribution is available from ftp://ftp.isrec.isb-sib.ch/pub/software/unix/mmsearch-1.0.tar.gz CONTACT: Thomas.Junier@isrec.unil.ch SUPPLEMENTARY INFORMATION: The full documentation about mmsearch is available from http://hits.isb-sib.ch/~tjunier/mmsearch/doc.
14.
15.
Summary: CLUSLA, a computer program for the clustering of very large phytosociological data sets, is described. It is an elaboration of Janssen's (1975) simple procedure. The essence of the program is the creation of clusters, each starting with one relevé, as the relevés are entered into the program. Each new relevé that is sufficiently distinct from already existing clusters is considered a new cluster. The fusion criterion is the attainment of a certain level of (dis-)similarity between relevé and cluster. Bray and Curtis' dissimilarity measure with presence-absence data was used. The program, written in FORTRAN for an IBM 370–158 system, can deal with practically unlimited numbers of relevés, provided the product of the number of primary clusters and the number of species does not exceed 140,000. We adopted maxima of 100 and 1400 respectively. After the primary clustering round a reallocation is performed. Then a simple table is printed with information on the significance of occurrence of species in clusters according to a chi-square approach. The primary clusters can be treated again with a higher fusion threshold, or approached with more elaborate methods, in our case particularly the TABORD program. The program is demonstrated with a collection of 6072 relevés with 889 species of salt marsh vegetation from the Working-Group for Data-Processing. Contribution from the Working Group for Data-Processing in Phytosociology, International Society for Vegetation Science. Nomenclature follows the Trieste system, which will be published later. The authors are very grateful to Drs. Jan Janssen, Mike Dale, László Orlóci and Mike Austin for their comments on drafts of the program, and to Wil Kortekaas for her help in the interpretation of the tables.
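The sketch below is a minimal rendering of the sequential clustering idea described: relevés are processed in input order and fused to the first cluster within a Bray-Curtis (presence-absence) dissimilarity threshold, otherwise they start a new cluster. Representing a cluster by the species union of its members is my simplification, and the reallocation pass and chi-square table are omitted.

```python
# Minimal sketch of the sequential clustering idea described above: relevés are
# processed in input order; each is fused to the first cluster within the
# dissimilarity threshold, otherwise it starts a new cluster. Presence-absence
# Bray-Curtis (= 1 - Sørensen similarity) is used, as in the abstract; how
# CLUSLA represents a cluster for comparison (here: the species union of its
# members) is a simplification, and the reallocation pass is omitted.

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity for presence-absence data (species sets)."""
    if not a and not b:
        return 0.0
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))

def sequential_cluster(releves, threshold=0.5):
    clusters = []                        # each cluster: set of species seen in it
    membership = []                      # cluster index assigned to each relevé
    for species in releves:
        for idx, cluster_species in enumerate(clusters):
            if bray_curtis(species, cluster_species) <= threshold:
                cluster_species |= species
                membership.append(idx)
                break
        else:                            # sufficiently distinct: new cluster
            clusters.append(set(species))
            membership.append(len(clusters) - 1)
    return membership

releves = [{"Salicornia", "Suaeda"}, {"Salicornia", "Aster"}, {"Festuca", "Juncus"}]
print(sequential_cluster(releves))       # [0, 0, 1]
```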
16.
G Valle 《Nucleic acids research》1993,21(22):5152-5156
DISCOVER1 (DIStribution COunter VERsion 1) is a new program that can identify DNA motifs occurring with a high deviation from the expected frequency. The program generates families of patterns, each family having a common set of defined bases. Undefined bases are inserted amongst the defined bases in different ways, thus generating the diverse patterns of each family. The occurrences of the different patterns are then compared and analysed within each family, assuming that all patterns should have the same probability of occurrence. An extensive use of computer memory, combined with the immediate sorting of counts by address calculation, allows a complete counting of all DNA motifs in a single pass over the DNA sequence. This approach offers a very fast way to search for unusually distributed patterns and can identify inexact patterns as well as exact patterns.
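As a much simplified, single-family illustration of the approach, the sketch below generates the patterns obtained by inserting one undefined base between a set of defined bases, counts their overlapping occurrences, and compares the counts within the family. It does not reproduce DISCOVER1's memory-based address-calculation counting.

```python
# Simplified, single-family illustration of the pattern-family idea described
# above: take a set of defined bases ("GATC" here), generate the family of
# patterns obtained by inserting one undefined base ('N') between them, count
# each pattern's overlapping occurrences, and compare the counts within the
# family (which are assumed to have equal probability of occurrence).
import re
from statistics import mean

def family_patterns(defined="GATC"):
    """All patterns made by inserting a single 'N' between the defined bases."""
    return [defined[:i] + "N" + defined[i:] for i in range(1, len(defined))]

def count_overlapping(pattern, seq):
    return len(re.findall("(?=" + pattern.replace("N", "[ACGT]") + ")", seq))

seq = "GATCGATCGAATCGACTCGTATC" * 10
counts = {p: count_overlapping(p, seq) for p in family_patterns("GATC")}
expected = mean(counts.values())
for pattern, n in counts.items():
    print(f"{pattern}: {n} occurrences (family mean {expected:.1f})")
```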
17.
Persistent cell motion in the absence of external signals: a search strategy for eukaryotic cells
Background
Eukaryotic cells are large enough to detect signals and then orient to them by differentiating the signal strength across the length and breadth of the cell. Amoebae, fibroblasts, neutrophils and growth cones all behave in this way. Little is known, however, about cell motion and searching behavior in the absence of a signal. Is individual cell motion best characterized as a random walk? Do individual cells have a search strategy when they are beyond the range of the signal they would otherwise move toward? Here we ask if single, isolated Dictyostelium and Polysphondylium amoebae bias their motion in the absence of external cues.
Methodology
We placed single well-isolated Dictyostelium and Polysphondylium cells on a nutrient-free agar surface and followed them at 10 sec intervals for ∼10 hr, then analyzed their motion with respect to velocity, turning angle, persistence length, and persistence time, comparing the results to the expectation for a variety of different types of random motion.
Conclusions
We find that amoeboid behavior is well described by a special kind of random motion: Amoebae show a long persistence time (∼10 min) beyond which they start to lose their direction; they move forward in a zig-zag manner; and they make turns every 1–2 min on average. They bias their motion by remembering the last turn and turning away from it. Interpreting the motion as consisting of runs and turns, the duration of a run and the amplitude of a turn are both found to be exponentially distributed. We show that this behavior greatly improves their chances of finding a target relative to performing a random walk. We believe that other eukaryotic cells may employ a strategy similar to Dictyostelium when seeking conditions or signal sources not yet within range of their detection system.
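The kind of track analysis described can be illustrated with a short sketch: compute step-to-step turning angles from an (x, y) trajectory and summarize directional persistence. The synthetic zig-zag track and the summary statistic are illustrative and not the authors' analysis pipeline.

```python
# Sketch of the kind of track analysis described above: compute step-to-step
# turning angles from an (x, y) trajectory and summarize how persistent the
# direction of motion is. The synthetic zig-zag track is illustrative only.
import math

def turning_angles(track):
    """Signed turning angle (radians) between successive displacement vectors."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(track, track[1:], track[2:]):
        h1 = math.atan2(y1 - y0, x1 - x0)        # heading of first step
        h2 = math.atan2(y2 - y1, x2 - x1)        # heading of second step
        d = h2 - h1
        angles.append(math.atan2(math.sin(d), math.cos(d)))   # wrap to (-pi, pi]
    return angles

# A zig-zag track: mostly forward motion with alternating small turns.
track = [(0, 0), (1, 0.2), (2, -0.2), (3, 0.2), (4, -0.2), (5, 0.2)]
angles = turning_angles(track)
print([round(math.degrees(a), 1) for a in angles])
mean_cos = sum(math.cos(a) for a in angles) / len(angles)
print(f"directional persistence (mean cosine of turning angle): {mean_cos:.2f}")
```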
18.
Baumgartner C Rejtar T Kullolli M Akella LM Karger BL 《Journal of proteome research》2008,7(9):4199-4208
A novel computational approach, termed Search for Modified Peptides (SeMoP), for the unrestricted discovery and verification of peptide modifications in shotgun proteomic experiments using low resolution ion trap MS/MS spectra is presented. Various peptide modifications, including post-translational modifications, sequence polymorphisms, as well as sample handling-induced changes, can be identified using this approach. SeMoP utilizes a three-step strategy: (1) a standard database search to identify proteins in a sample; (2) an unrestricted search for modifications using a newly developed algorithm; and (3) a second standard database search targeted to specific modifications found using the unrestricted search. This targeted approach provides verification of discovered modifications and, due to increased sensitivity, a general increase in the number of peptides with the specific modification. The feasibility of the overall strategy has been first demonstrated in the analysis of 65 plasma proteins. Various sample handling-induced modifications, such as beta-elimination of disulfide bridges and pyrocarbamidomethylation, as well as biologically induced modifications, such as phosphorylation and methylation, have been detected. A subsequent targeted Sequest search has been used to verify selected modifications, and a 4-fold increase in the number of modified peptides was obtained. In a second application, 1367 proteins of a cervical cancer cell line were processed, leading to the detection of several novel amino acid substitutions. Conducting the search against a database of peptides derived from proteins with decoy sequences yielded a false discovery rate of less than 5% for the unrestricted search. SeMoP is shown to be an effective and easily implemented approach for the discovery and verification of peptide modifications.
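The core idea behind the unrestricted step, searching for recurring precursor mass shifts without specifying the modification in advance, can be sketched as below. The numbers are made-up observations, and a real implementation (SeMoP included) works on MS/MS spectra and uses mass tolerances rather than simple rounding.

```python
# Toy illustration of the core idea behind an unrestricted modification search:
# tabulate the mass differences between observed precursor masses and the
# theoretical masses of the peptides they were matched to, then look for
# recurring shifts (a recurring shift near +79.97 Da would suggest
# phosphorylation, ~+14.02 Da methylation). The values below are made up, and
# SeMoP itself works on MS/MS fragment spectra with proper mass tolerances.
from collections import Counter

observations = [            # (theoretical peptide mass, observed precursor mass) in Da
    (1024.50, 1104.47),
    (988.52,  1068.49),
    (1746.80, 1826.77),
    (1502.71, 1502.71),     # unmodified
    (1210.60, 1224.62),     # shift of ~+14.02 Da
]

def delta_mass_counts(pairs, precision=2):
    """Count observed-minus-theoretical mass shifts, rounded to `precision` decimals."""
    return Counter(round(observed - theoretical, precision)
                   for theoretical, observed in pairs)

for shift, count in delta_mass_counts(observations).most_common():
    print(f"{shift:+.2f} Da seen {count} time(s)")
```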
19.
MATCH-UP/MATRIX is a program designed to aid the investigator interested in determining primary protein structure. It is written in Applesoft BASIC for the Apple IIe microcomputer. MATCH-UP will survey any set of proteinaceous materials for amino acid sequence homology; however, it is primarily intended to compare the structures of newly sequenced peptides with the established structure of a protein with suspected homology. Any peptide-to-protein alignment which shows a homology greater than or equal to the percentage specified by the user will result in output. MATRIX will compare the sequences of two proteins (peptides) in whatever alignment is specified by the user and is intended to spot insertions and/or deletions between structures.
Received on December 2, 1985; accepted on March 10, 1986
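A hedged sketch of the MATCH-UP idea, sliding a peptide along a protein and reporting ungapped alignments whose percent identity meets a user threshold, is given below; the sequences are arbitrary examples and MATRIX's insertion/deletion comparison is not reproduced.

```python
# Sketch of the MATCH-UP idea described above: slide a peptide along a protein
# sequence and report every ungapped alignment whose percent identity meets the
# user-specified threshold. (The original ran in Applesoft BASIC; MATRIX's
# insertion/deletion comparison is not reproduced here.)

def match_up(peptide, protein, min_percent_identity=60.0):
    hits = []
    for offset in range(len(protein) - len(peptide) + 1):
        window = protein[offset:offset + len(peptide)]
        identical = sum(p == w for p, w in zip(peptide, window))
        percent = 100.0 * identical / len(peptide)
        if percent >= min_percent_identity:
            hits.append((offset, window, percent))
    return hits

protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
peptide = "AKQRQIS"
for offset, window, percent in match_up(peptide, protein, 70.0):
    print(f"position {offset}: {peptide} vs {window} ({percent:.0f}% identity)")
```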
20.
Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences
MOTIVATION: In 2001 and 2002, we published two papers (Bioinformatics, 17, 282-283; Bioinformatics, 18, 77-82) describing an ultrafast protein sequence clustering program called cd-hit. This program can efficiently cluster a huge protein database with millions of sequences. However, the applications of the underlying algorithm are not limited to protein sequence clustering; here we present several new programs using the same algorithm, including cd-hit-2d, cd-hit-est and cd-hit-est-2d. Cd-hit-2d compares two protein datasets and reports similar matches between them; cd-hit-est clusters a DNA/RNA sequence database; and cd-hit-est-2d compares two nucleotide datasets. All these programs can handle huge datasets with millions of sequences and can be hundreds of times faster than methods based on popular sequence comparison and database search tools such as BLAST.
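The greedy incremental scheme underlying cd-hit can be sketched as below: sequences are processed longest-first and joined to the first cluster whose representative is similar enough, otherwise they become a new representative. The identity measure here (difflib's ratio) is a crude stand-in for cd-hit's short-word filter and banded alignment, so this is only a toy illustration.

```python
# Toy sketch of the greedy incremental clustering scheme cd-hit is based on:
# process sequences longest-first, compare each to existing cluster
# representatives, join the first cluster whose representative is similar
# enough, otherwise start a new cluster. difflib's ratio is a crude stand-in
# for cd-hit's short-word filter plus banded alignment identity.
from difflib import SequenceMatcher

def greedy_cluster(sequences, identity_threshold=0.9):
    clusters = []                        # list of (representative, [members])
    for seq in sorted(sequences, key=len, reverse=True):
        for representative, members in clusters:
            if SequenceMatcher(None, representative, seq).ratio() >= identity_threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters

seqs = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
        "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEV",      # near-identical fragment
        "MADEEKLPPGWEKRMSRSSGRVYYFNHITNASQWERPSG"]
for representative, members in greedy_cluster(seqs, 0.9):
    print(len(members), "member(s); representative:", representative[:20] + "...")
```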