Similar articles
Found 20 similar articles (search time: 15 ms)
1.
We propose a new algorithm for identifying cis-regulatory modules in genomic sequences. The proposed algorithm, named RISO, uses a new data structure, called box-link, to store the information about conserved regions that occur in a well-ordered and regularly spaced manner in the data set sequences. Conserved regions of this type, called structured motifs, are extremely relevant to research on gene regulatory mechanisms since they can effectively represent promoter models. The complexity analysis shows a time and space gain over the best known exact algorithms that is exponential in the spacings between binding sites. A full implementation of the algorithm was developed and made available online. Experimental results show that the algorithm is much faster than existing ones, sometimes by more than four orders of magnitude. The application of the method to biological data sets shows its ability to extract relevant consensus sequences.
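To make the notion of a structured motif concrete, the sketch below naively scans one sequence for two boxes separated by a spacer whose length lies in a given range. It is not the RISO algorithm and does not use the box-link structure; the function names, the example boxes and the test sequence are all hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_structured_motif(seq, box1, box2, min_gap, max_gap, max_mismatch=0):
    """Return (start1, start2) pairs where box1 is followed by box2 after a
    spacer of min_gap..max_gap bases. Naive scan, for illustration only."""
    hits = []
    for i in range(len(seq) - len(box1) + 1):
        if hamming(seq[i:i + len(box1)], box1) > max_mismatch:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + len(box1) + gap
            if j + len(box2) > len(seq):
                break  # longer spacers would run past the end of the sequence
            if hamming(seq[j:j + len(box2)], box2) <= max_mismatch:
                hits.append((i, j))
    return hits


# Hypothetical sequence: two boxes separated by a 17-base spacer.
seq = "GG" + "TTGACA" + "ACGTACGTACGTACGTA" + "TATAAT" + "GG"
hits = find_structured_motif(seq, "TTGACA", "TATAAT", min_gap=15, max_gap=19)
```

A scan like this must be repeated for every candidate pair of boxes, which is the kind of cost that grows quickly with the allowed spacer range.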

2.

Background  

Complex networks are studied across many fields of science and are particularly important for understanding biological processes. Motifs in networks are small connected sub-graphs that occur at significantly higher frequencies than in random networks. They have recently gathered much attention as a useful concept for uncovering structural design principles of complex networks. Existing algorithms for finding network motifs are extremely costly in CPU time and memory consumption and, in practice, restrict the size of the motifs that can be found.
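The definition above, a sub-graph that occurs more often than in random networks, can be sketched as follows. This is a brute-force illustration rather than any published algorithm: it counts a single three-node pattern (the feed-forward loop) and compares the count with degree-preserving randomizations. The choice of pattern, the function names and the example edge list are assumptions.

```python
import random
from itertools import permutations


def count_ffl(edges):
    """Count feed-forward loops: ordered triples (a, b, c) with a->b, b->c, a->c."""
    edge_set = set(edges)
    nodes = {n for e in edge_set for n in e}
    return sum(
        (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
        for a, b, c in permutations(nodes, 3)
    )


def rewire(edges, n_swaps, rng):
    """Randomize a directed graph by swapping edge targets, which keeps every
    node's in-degree and out-degree unchanged."""
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Skip swaps that would create self-loops or duplicate edges.
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def compare_with_random(edges, n_random=100, seed=0):
    """Return (observed count, mean and standard deviation over random graphs)."""
    rng = random.Random(seed)
    observed = count_ffl(edges)
    counts = [count_ffl(rewire(edges, 10 * len(edges), rng)) for _ in range(n_random)]
    mean = sum(counts) / n_random
    sd = (sum((c - mean) ** 2 for c in counts) / n_random) ** 0.5
    return observed, mean, sd


# Hypothetical toy network containing exactly one feed-forward loop.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
observed, mean, sd = compare_with_random(edges)
```

The triple loop in `count_ffl` is cubic in the number of nodes, and the cost rises steeply for larger patterns, which illustrates why motif size is a practical limit.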

3.
Short motifs of many cis-regulatory elements (CREs) can be found in the promoters of most Arabidopsis genes, which raises the question of how their presence can confer specific regulation. We developed a universal algorithm to test the biological significance of CREs by first identifying every Arabidopsis gene with a CRE and then statistically correlating the presence or absence of the element with the gene expression profile on multiple DNA microarrays. This algorithm was successfully verified for previously characterized abscisic acid, ethylene, sucrose and drought responsive CREs in Arabidopsis, showing that the presence of these elements indeed correlates with treatment-specific gene induction. Later, we used standard motif sampling methods to identify 128 putative motifs induced by excess light, reactive oxygen species and sucrose. Our algorithm was able to filter 20 out of 128 novel CREs which significantly correlated with gene induction by heat, reactive oxygen species and/or sucrose. The position, orientation and sequence specificity of CREs were tested in silico by analyzing the expression of genes with naturally occurring sequence variations. In three novel CREs the forward orientation correlated with sucrose induction and the reverse orientation with sucrose suppression. The functionality of the predicted novel CREs was experimentally confirmed using Arabidopsis cell-suspension cultures transformed with short promoter fragments or artificial promoters fused with the GUS reporter gene. Our genome-wide analysis opens up new possibilities for in silico verification of the biological significance of newly discovered CREs, and allows for subsequent selection of such CREs for experimental studies.
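The abstract does not say which statistic was used to correlate CRE presence with gene induction. One common choice, assumed here purely for illustration, is a one-sided hypergeometric test: given how many genes carry the element and how many are induced, how likely is an overlap at least as large as the observed one? The function name and the counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genes, n_induced, n_with_cre, n_both):
    """P(X >= n_both) when n_with_cre genes are drawn at random from n_genes,
    of which n_induced are induced (hypergeometric upper tail)."""
    upper = min(n_induced, n_with_cre)
    total = comb(n_genes, n_with_cre)
    tail = sum(
        comb(n_induced, k) * comb(n_genes - n_induced, n_with_cre - k)
        for k in range(n_both, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes, 5 induced, 5 carry the CRE, all 5 overlap.
p = enrichment_p_value(n_genes=10, n_induced=5, n_with_cre=5, n_both=5)
```

In a genome-wide screen that tests many candidate elements, p-values of this kind would also need a multiple-testing correction.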

4.
5.
Motivation: Genomes contain biologically significant information that extends beyond that encoded in genes. Some of this information relates to various short dispersed repeats distributed throughout the genome. The goal of this work was to combine tools for detection of statistically significant dispersed repeats in DNA sequences with tools to aid development of hypotheses regarding their possible physiological functions in an easy-to-use web-based environment. Results: Ab Initio Motif Identification Environment (AIMIE) was designed to facilitate investigations of dispersed sequence motifs in prokaryotic genomes. We used AIMIE to analyze the Escherichia coli and Haemophilus influenzae genomes in order to demonstrate the utility of the new environment. AIMIE detected repeated extragenic palindrome (REP) elements, CRISPR repeats, uptake signal sequences, intergenic dyad sequences and several other over-represented sequence motifs. Distributional patterns of these motifs were analyzed using the tools included in AIMIE. Availability: AIMIE and the related software can be accessed at our web site http://www.cmbl.uga.edu/software.html. Contact: mrazek{at}uga.edu Associate Editor: Alex Bateman

6.
7.

Background

Chitin is a polysaccharide that forms the hard, outer shell of arthropods and the cell walls of fungi and some algae. Peptidoglycan is a polymer of sugars and amino acids constituting the cell walls of most bacteria. Enzymes that are able to hydrolyze these cell wall polymers generally play important roles in protecting plants and animals against insects and pathogens. A particular group of such glycoside hydrolase enzymes share some common features in their three-dimensional structure and in their molecular mechanism, forming the lysozyme superfamily.

Results

Besides having a similar fold, all known catalytic domains of glycoside hydrolase proteins of the lysozyme superfamily (families and subfamilies GH19, GH22, GH23, GH24 and GH46) share two structural elements: the central helix of the all-α domain, which invariably contains the catalytic glutamate residue acting as general-acid catalyst, and a β-hairpin pointed towards the substrate binding cleft. Interestingly, the invariant β-hairpin structure displays the highest amino acid conservation in aligned sequences of a given family, which allows signature motifs to be defined for each GH family. Most of these signature motifs perform promisingly when used to search sequence databases. Our structural analysis further indicates that the GH motifs participate in enzymatic catalysis essentially by containing the residue that positions the catalytic water in the inverting mechanism.

Conclusions

The seven families and subfamilies of the lysozyme superfamily all have in common a β-hairpin structure which displays a family-specific sequence motif. These GH β-hairpin motifs contain potentially important residues for the catalytic activity, thereby suggesting the participation of the GH motif in catalysis and also revealing a common catalytic scheme utilized by enzymes of the lysozyme superfamily.

8.

Background

Carbonylation, which takes place through oxidation of specific residues by reactive oxygen species (ROS), is an irreversible oxidative modification of proteins. Carbonylation has been reported to be related to a number of metabolic or aging diseases including diabetes, chronic lung disease, Parkinson’s disease, and Alzheimer’s disease. Because no computational methods were dedicated to exploring motif signatures of protein carbonylation sites, we were motivated to use an iterative statistical method to characterize and identify carbonylated sites with motif signatures.

Results

By manually curating experimental data from research articles, we obtained 332, 144, 135, and 140 verified substrate sites for K (lysine), R (arginine), T (threonine), and P (proline) residues, respectively, from 241 carbonylated proteins. To examine which attributes are informative for distinguishing carbonylated from non-carbonylated sites, several features were investigated in this study, including composition of the twenty amino acids (AAC), composition of amino acid pairs (AAPC), position-specific scoring matrix (PSSM), and positional weighted matrix (PWM). Additionally, to explore the motif signatures of carbonylation sites, an iterative statistical method was adopted to detect statistically significant dependencies of amino acid compositions between specific positions around substrate sites. A profile hidden Markov model (HMM) was then trained as a predictive model for each motif signature. Moreover, we used a support vector machine (SVM) to construct an integrative model that combines the bit scores obtained from the profile HMMs. The combined model provided enhanced performance with balanced predictive sensitivity and specificity in cross-validation and independent testing.

Conclusion

This study provides a new scheme for exploring potential motif signatures at substrate sites of protein carbonylation. The usefulness of the revealed motifs in the identification of carbonylated sites is demonstrated by their effective performance in cross-validation and independent testing. Finally, these substrate motifs were adopted to build an available online resource (MDD-Carb, http://csb.cse.yzu.edu.tw/MDDCarb/) and are also anticipated to facilitate the study of large-scale carbonylated proteomes.
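Two of the feature encodings named in the Results above, AAC and PWM, can be sketched in a few lines. The exact definitions used in the study are not given in the abstract, so the versions below (plain residue fractions and per-position residue frequencies over sequence windows) are assumptions, as are the function names and the toy windows.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(window):
    """Amino acid composition: fraction of each of the 20 residues in the window."""
    return [window.count(aa) / len(window) for aa in AMINO_ACIDS]


def pwm(windows):
    """Positional weighted matrix: per-position residue frequencies over
    equal-length windows centred on the substrate site."""
    length = len(windows[0])
    return [
        {aa: sum(w[pos] == aa for w in windows) / len(windows) for aa in AMINO_ACIDS}
        for pos in range(length)
    ]


# Hypothetical three-residue windows with a lysine (K) at the centre.
composition = aac("AAKK")
matrix = pwm(["AKC", "GKC"])
```

Vectors like `composition` have a fixed length of 20 regardless of window size, which makes them convenient inputs for a classifier such as an SVM.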

9.

Background

The Illumina HumanMethylation450 BeadChip (HM450K) measures the DNA methylation of 485,512 CpGs in the human genome. The technology relies on hybridization of genomic fragments to probes on the chip. However, certain genomic factors, such as single nucleotide polymorphisms (SNPs), small insertions and deletions (INDELs), repetitive DNA, and regions with reduced genomic complexity, may compromise the ability to measure methylation using the array. Currently, there is no clear method or pipeline for determining which of the probes on the HM450K bead array should be retained for subsequent analysis in light of these issues.

Results

We comprehensively assessed the effects of SNPs, INDELs, repeats and bisulfite induced reduced genomic complexity by comparing HM450K bead array results with whole genome bisulfite sequencing. We determined which CpG probes provided accurate or noisy signals. From this, we derived a set of high-quality probes that provide unadulterated measurements of DNA methylation.

Conclusions

Our method significantly reduces the risk of false discoveries when using the HM450K bead array, while maximising the power of the array to detect methylation status genome-wide. Additionally, we demonstrate the utility of our method through extraction of biologically relevant epigenetic changes in prostate cancer.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-51) contains supplementary material, which is available to authorized users.

10.
In high-throughput studies of diseases, terms enriched with disease-related genes based on Gene Ontology (GO) are routinely found. However, most current algorithms used to find significant GO terms cannot handle the redundancy that results from the dependencies of GO terms. Simply based on some numerical considerations, current algorithms developed for reducing this redundancy may produce results that do not account for biologically interesting cases. In this article, we present several rules used to design a tool called GO-function for extracting biologically relevant terms from statistically significant GO terms for a disease. Using one gene expression profile for colorectal cancer, we compared GO-function with four algorithms designed to treat redundancy. Then, we validated results obtained in this data set by GO-function using another data set for colorectal cancer. Our analysis showed that GO-function can identify disease-related terms that are more statistically and biologically meaningful than those found by the other four algorithms.

11.
12.
13.
The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if a clique consisting of a sufficiently large number of mutated copies of the motif (i.e., the signals) is present in the DNA sequence. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum detectable clique size q_c as a function of sequence length N for random sequences. We found that q_c increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces q_c by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4).
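The clique formulation rests on a simple observation: two signals that each differ from the same motif in at most d positions can differ from each other in at most 2d positions, so all signals of one motif are pairwise connected in a graph that links similar l-mers. The sketch below builds such a graph for a single sequence and checks a candidate clique. It does not implement the consensus constraint or the sub-clique counting of cWINNOWER; the function names, the non-overlap rule and the example sequence are assumptions.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def similarity_graph(seq, l, d):
    """Edges (i, j), i < j, between start positions of non-overlapping l-mers
    that differ in at most 2*d positions."""
    starts = range(len(seq) - l + 1)
    return {
        (i, j)
        for i, j in combinations(starts, 2)
        if j - i >= l and hamming(seq[i:i + l], seq[j:j + l]) <= 2 * d
    }


def is_clique(positions, edges):
    """True if every pair of the given start positions is connected."""
    return all((i, j) in edges for i, j in combinations(sorted(positions), 2))


# Hypothetical sequence: three copies of ACGTAC, each with one mutation (d = 1),
# separated by filler.
seq = "ACGTAA" + "GGGGGG" + "TCGTAC" + "GGGGGG" + "ACCTAC"
edges = similarity_graph(seq, l=6, d=1)
signals_form_clique = is_clique([0, 12, 24], edges)
```

In random sequence many unrelated l-mers also fall within 2d of each other, so spurious edges accumulate as the sequence gets longer, which is consistent with the reported growth of the minimum detectable clique size with N.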

14.

Background  

Existing biological databases support a variety of queries such as keyword or definition search. However, they do not provide any measure of relevance for the instances reported, and result sets are usually sorted arbitrarily.

15.
16.
Chu JS, Johnsen RC, Chua SY, Tu D, Dennison M, Marra M, Jones SJ, Baillie DL, Rose AM. Genetics, 2012, 190(4): 1225-1233
The issue of heterozygosity continues to be a challenge in the analysis of genome sequences. In this article, we describe the use of allele ratios to distinguish biologically significant single-nucleotide variants from background noise. An application of this approach is the identification of lethal mutations in Caenorhabditis elegans essential genes, which must be maintained by the presence of a wild-type allele on a balancer. The h448 allele of let-504 is rescued by the duplication balancer sDp2. We readily identified the extent of the duplication when the percentage of read support for the lesion was between 70 and 80%. Examination of the EMS-induced changes throughout the genome revealed that these mutations exist in contiguous blocks. During early embryonic division in self-fertilizing C. elegans, alkylated guanines pair with thymines. As a result, EMS-induced changes become fixed as either G→A or C→T changes along the length of the chromosome. Thus, examination of the distribution of EMS-induced changes revealed the mutational and recombinational history of the chromosome, even generations later. We identified the mutational change responsible for the h448 mutation and sequenced PCR products for an additional four alleles, correlating let-504 with the DNA-coding region for an ortholog of a NFκB-activating protein, NKAP. Our results confirm that whole-genome sequencing is an efficient and inexpensive way of identifying nucleotide alterations responsible for lethal phenotypes and can be applied on a large scale to identify the molecular basis of essential genes.

17.
We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences, but maintains tractability by constructing multi-sequence alignments from pairwise comparisons. The algorithm finds the multiple alignments using a greedy approach and has similarities to both CLUSTAL and CONSENSUS, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. The choice of scoring system and the method of progressively constructing the final solution are important considerations that are discussed. Example solutions, and comparisons with other approaches, are provided. The solutions include finding consensus structures identical to published ones.

18.

Background  

Understanding gene regulatory networks has become one of the central research problems in bioinformatics. More than thirty algorithms have been proposed to identify DNA regulatory sites during the past thirty years. However, the prediction accuracy of these algorithms is still quite low. Ensemble algorithms have emerged as an effective strategy in bioinformatics for improving the prediction accuracy by exploiting the synergetic prediction capability of multiple algorithms.

19.
MAVisto: a tool for the exploration of network motifs
SUMMARY: MAVisto is a tool for the exploration of motifs in biological networks. It provides a flexible motif search algorithm and different views for the analysis and visualization of network motifs. These views help to explore interesting motifs: the frequency of motif occurrences can be compared with randomized networks; a list of motifs, along with information about their structure and number of occurrences depending on the reuse of network elements, shows potentially interesting motifs; a motif fingerprint reveals the overall distribution of motifs of a given size; and the distribution of a particular motif in the network can be visualized using an advanced layout algorithm. AVAILABILITY: MAVisto is platform independent and available free of charge as a Java webstart application at http://mavisto.ipk-gatersleben.de/ CONTACT: schwoebb@ipk-gatersleben.de SUPPLEMENTARY INFORMATION: Can be found at http://mavisto.ipk-gatersleben.de/

20.
RNA molecules, which are found in all living cells, fold into characteristic structures that account for their diverse functional activities. Many of these RNA structures consist of a collection of fundamental RNA motifs. The various combinations of RNA basic components form different RNA classes and define their unique structural and functional properties. The availability of many genome sequences makes it possible to search computationally for functional RNAs. Biological experiments indicate that functional RNAs have characteristic RNA structural motifs represented by specific combinations of base pairings and conserved nucleotides in the loop regions. Searching for these well-ordered RNA structures and their homologues in genomic sequences is very helpful for understanding RNA-based gene regulation. In this paper, we consider the following problem: given an RNA sequence with a known secondary structure, efficiently determine candidate segments in genomic sequences that can potentially form RNA secondary structures similar to the given RNA secondary structure. Our new bottom-up approach first searches for all potential stem-loops similar to those of the given RNA secondary structure, and then, based on the located stem-loops, detects potential homologous structural RNAs in genomic sequences.
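The first step of such a bottom-up search, locating candidate stem-loops, can be illustrated with a naive scan. This is not the approach of the paper: it only finds perfect stems of a fixed length around a loop of bounded size, with no similarity scoring against a query structure. The pairing table (Watson-Crick plus G-U wobble), the function name and the example sequence are assumptions.

```python
# Allowed base pairs: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stem_loops(seq, stem_len, min_loop, max_loop):
    """Return (stem_start, loop_start, loop_len) for every perfect stem of
    stem_len base pairs enclosing a loop of min_loop..max_loop nucleotides."""
    hits = []
    for i in range(len(seq)):
        for loop in range(min_loop, max_loop + 1):
            end = i + 2 * stem_len + loop  # one past the 3' end of the hairpin
            if end > len(seq):
                break
            # Base k of the 5' strand must pair with base k (from the end)
            # of the 3' strand.
            if all((seq[i + k], seq[end - 1 - k]) in PAIRS for k in range(stem_len)):
                hits.append((i, i + stem_len, loop))
    return hits


# Hypothetical sequence: a 4-bp stem (GGGC / GCCC) closing a GAAA loop.
seq = "AA" + "GGGC" + "GAAA" + "GCCC" + "AA"
hits = find_stem_loops(seq, stem_len=4, min_loop=3, max_loop=5)
```

Candidate stem-loops found this way would then have to be compared with the stem-loops of the query structure and combined into larger structures, which is the part the abstract attributes to the proposed method.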
