Similar literature
1.
More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/).

2.
We have developed a restriction landmark genome scanning (RLGS) system in silico, involving two-dimensional electrophoretic analysis of DNA by computer simulation that is based on the availability of whole-genome sequences for specific organisms. We applied the technique to the analysis of the Xanthomonas oryzae pathovar oryzae (Xoo) MAFF 311018, which causes bacterial blight in rice. The coverage that was found to be achievable using RLGS in silico, as a percentage of the genomic regions that could be detected, ranged from 44.5% to 72.7% per image. However, this reached a value of 96.7% using four images that were obtained with different combinations of landmark restriction enzymes. Interestingly, the signal intensity of some of the specific spots obtained was significantly lower than that of other surrounding spots when MboI, which cleaves unmethylated 5'-GATC-3' sites, was used. DNA gel blot analysis with both DNA adenine methylase (Dam)-sensitive and -insensitive isoschizomers (MboI and Sau3AI) revealed that Dam-mediated DNA adenine methylation had indeed occurred at these particular sites. These results suggest that a significant portion of the 5'-GATC-3' sites within the Xoo genome is stably methylated by Dam.

3.
Summary: A list is presented of published reports of DNA polymorphisms found in the human genome by restriction enzyme analysis. While the list indicates the large number of restriction fragment length polymorphisms (RFLPs) detected to date, the information collated is insufficient to permit an estimate of heterozygosity for the genome as a whole. Data from our laboratory are therefore also presented on RFLPs detected using a random sample of cloned DNA segments. Such an analysis has permitted a first unbiased estimate of heterozygosity for the human genome. Since this figure is an order of magnitude higher than previous estimates derived from protein data, the majority of polymorphic variation present in the human genome must, by implication, occur in noncoding sequences. In addition, it was confirmed that enzymes containing the dinucleotide CpG in their recognition sequences detect more polymorphic variation than those that do not contain a CpG. Also presented are the clinical applications of DNA polymorphisms in the diagnosis of human genetic disease. Supported by the Deutsche Forschungsgemeinschaft.

4.
The open reading frames of human cytomegalovirus (human herpesvirus-5, HHV5) encode some 213 unique proteins with mostly unknown functions. Using the threading program ProCeryon, we calculated possible matches between the amino acid sequences of these proteins and the Protein Data Bank library of three-dimensional structures. Thirty-six proteins were fully identified in terms of their structure and, often, function; 65 proteins were recognized as members of narrow structural/functional families (e.g. DNA-binding factors, cytokines, enzymes, signaling particles, cell surface receptors, etc.); and 87 proteins were assigned to broad structural classes (e.g. all-beta, 3-layer alpha-beta-alpha, multidomain, etc.). Genes encoding proteins with similar folds, or containing identical structural traits (extreme sequence length, runs of unstructured (Pro and/or Gly-rich) residues, transmembrane segments, etc.) often formed tandem clusters throughout the genome. In the course of this work, benchmarks on about 20 known folds were used to optimize adjustable parameters of threading calculations, i.e. gap penalty weights used in sequence/structure alignments; new scores obtained as simple combinations of existing scoring functions; and number of threading runs conducive to meaningful results. The introduction of summed, per-residue-normalized scores has been essential for the discovery of subdomains (EGF-like, SH2, SH3) in longer protein sequences, such as the eight "open sandwich" cytokine domains, 60-70 amino acids long and having the 3-beta/1-alpha fold with one or two disulfide bridges, present in otherwise unrelated proteins.

5.
Studies on the nature of restriction fragment length polymorphisms (RFLPs) were undertaken to characterize the Citrus genome. This type of analysis has not been carried out with any other perennial crop. Citrus reticulata Blanco cv Clementine, C. × paradisi Macf. cv Duncan, and an F1 hybrid (LB 1–21) were used to determine what probe/enzyme combinations revealed polymorphisms in Southern analysis, and a backcross family (LB 1–21 × Clementine) of 65 randomly selected hybrid seedlings was used for some analyses. A majority (73%) of the clones examined from a PstI genomic library appeared to detect single-copy sequences based on RFLP banding patterns, while clones from a cDNA library revealed a lower percentage of single-copy sequences. When hybridization stringencies were lowered, 21% of the genomic clones examined revealed greater copy numbers. PstI digestion of Duncan DNA indicated abundant methylation, so the relatively high frequency of multiple-copy sequences observed at moderate stringency cannot be attributed to a lack of methylation of the Citrus DNA. The polymorphisms in banding patterns observed primarily resulted from insertions and/or deletions rather than from base substitutions, and a model is presented to account for the varying patterns obtained from individual probes with different restriction enzymes. Finally, a model for transposon activity in Citrus is proposed, based on observations made during the course of these studies.

6.
A computer-based differential display tool named HsAnalyst has been developed and successfully used for the comparison of expression patterns in a set of tumours versus a set of normal tissues. A list of EST clusters highly represented in tumours and rarely observed in normal tissues has been developed as a resulting output file of the program. These differentially expressed EST clusters (genes) can be useful for developing new tumour markers and prognostic indicators for a wide set of human malignancies. Tumour-specific protein-coding genes may be considered a manifestation of tumour-specific gene expression.

7.
8.
9.
Recent advances in DNA sequencing technology have enabled elucidation of whole genome information from a plethora of organisms. In parallel with this technology, various bioinformatics tools have driven the comparative analysis of the genome sequences between species and within isolates. While drawing meaningful conclusions from a large amount of raw material, computer-aided identification of suitable targets for further experimental analysis and characterization has also led to the prediction of non-human homologous essential genes in bacteria as promising candidates for novel drug discovery. Here, we present a comparative genomic analysis to identify essential genes in Burkholderia pseudomallei. Our in silico prediction has identified 312 essential genes which could also be potential drug candidates. These genes encode essential proteins to support the survival of B. pseudomallei, including outer-inner membrane and surface structures, regulators, proteins involved in pathogenicity, adaptation, chaperones as well as degradation of small and macromolecules, energy metabolism, information transfer, central/intermediate/miscellaneous metabolism pathways and some conserved hypothetical proteins of unknown function. Therefore, our in silico approach has enabled rapid screening and identification of potential drug targets for further characterization in the laboratory.

10.
In silico tools have been developed to predict variants that may have an impact on pre-mRNA splicing. The major limitation of the application of these tools to basic research and clinical practice is the difficulty in interpreting the output. Most tools only predict potential splice sites given a DNA sequence without measuring splicing signal changes caused by a variant. Another limitation is the lack of large-scale evaluation studies of these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis. The Position Weight Matrix model and MaxEntScan outperformed other methods. Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that take advantage of individual methods. Both models further improved prediction, with outputs of directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer database. Analysis showed that predicted splice-altering scSNVs are enriched in recurrent scSNVs and known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.
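To illustrate what measuring a splicing-signal change caused by a variant can look like, here is a minimal Python sketch of a position weight matrix score difference between a reference and a variant sequence. The matrix values, the uniform background, and the function names are hypothetical and chosen only for illustration; they are not taken from the study or from any of the eight tools it compared.

```python
import math

# Toy position probability matrix for a 4-base motif.
# These numbers are invented for illustration, not a real splice-site model.
TOY_PPM = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.5, "C": 0.1, "G": 0.3, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def pwm_score(seq, ppm=TOY_PPM):
    """Sum of per-position log2 odds of the motif model versus background."""
    if len(seq) != len(ppm):
        raise ValueError("sequence length must match the matrix length")
    return sum(math.log2(col[base] / BACKGROUND)
               for col, base in zip(ppm, seq.upper()))


def variant_delta(ref_seq, pos, alt_base, ppm=TOY_PPM):
    """Score change when the base at 0-based `pos` is replaced by `alt_base`."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return pwm_score(alt_seq, ppm) - pwm_score(ref_seq, ppm)


if __name__ == "__main__":
    # A negative delta means the variant weakens the match to the motif.
    print(variant_delta("GTAA", 1, "C"))
```

A per-variant difference of this kind is one simple way to turn a site predictor into a variant-effect score; the ensemble models described in the abstract combine several such signals.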

11.
We consider the general problem of constructing a physical map of a genome by welding islands of overlapping clones. Both distribution of clone length and non-uniform probability of overlap detection are taken into account, the latter restricted to the Markov case in which only the location of the end of the developing island is required. Exact results for the distribution of island length are obtained in the special cases of fixed clone length or rigid overlap criterion, and mean and variance for the general situation. Determination of ocean length distribution permits island number and contig number distributions to be found as well. Received: 21 December 1998

12.
In selenoproteins, incorporation of the amino acid selenocysteine is specified by the UGA codon, usually a stop signal. The alternative decoding of UGA is conferred by an mRNA structure, the SECIS element, located in the 3′-untranslated region of the selenoprotein mRNA. Because of the non-standard use of the UGA codon, current computational gene prediction methods are unable to identify selenoproteins in the sequence of the eukaryotic genomes. Here we describe a method to predict selenoproteins in genomic sequences, which relies on the prediction of SECIS elements in coordination with the prediction of genes in which the strong codon bias characteristic of protein coding regions extends beyond a TGA codon interrupting the open reading frame. We applied the method to the Drosophila melanogaster genome, and predicted four potential selenoprotein genes. One of them belongs to a known family of selenoproteins, and we have tested experimentally two other predictions with positive results. Finally, we have characterized the expression pattern of these two novel selenoprotein genes.
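The idea of an open reading frame that continues past an in-frame TGA can be sketched in Python as follows. This is only a crude first filter under simplifying assumptions: it scans the forward strand, ignores introns, and does not evaluate codon bias or SECIS elements, both of which the described method relies on. The function name is hypothetical.

```python
def candidate_readthrough_orfs(dna):
    """List (start, end, tga_positions) for forward-strand ORFs that begin at
    ATG, end at TAA or TAG, and contain at least one in-frame TGA.

    TGA is deliberately not treated as a stop here, so each hit is an ORF
    that a standard gene finder would have truncated at its first TGA.
    """
    dna = dna.upper()
    hits = []
    for frame in range(3):
        pos = frame
        while pos + 3 <= len(dna):
            if dna[pos:pos + 3] != "ATG":
                pos += 3
                continue
            tga_sites = []
            end = None
            # Walk codon by codon until TAA or TAG closes the frame.
            for j in range(pos + 3, len(dna) - 2, 3):
                codon = dna[j:j + 3]
                if codon in ("TAA", "TAG"):
                    end = j + 3
                    break
                if codon == "TGA":
                    tga_sites.append(j)
            if end is None:
                break  # no closing TAA/TAG remains in this frame
            if tga_sites:
                hits.append((pos, end, tga_sites))
            pos = end
    return hits


if __name__ == "__main__":
    print(candidate_readthrough_orfs("ATGAAATGACCCTAA"))
```

In a fuller pipeline each hit would still need the two checks named in the abstract: coding-like codon usage downstream of the TGA and a predicted SECIS element in the 3′-untranslated region.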

13.
Genomic DNA size was measured in three strains of Pseudomonas aeruginosa, ATCC 29260 (exotoxin A), ATCC 33467 (type I smooth) and ATCC 33468 (type 2 mucoid) by transverse alternating field electrophoresis of restriction fragments. Because of the high (67%) G + C content of Pseudomonas aeruginosa, restriction enzymes that recognize sequences with at least 4 AT base pairs were expected to be rare cutters. Eight enzymes produced fragments greater than 200 kb in size: DraI (TTT/AAA), AsnI (AT/TAAT), HpaI (GTT/AAC), AflII (C/TTAAG), XbaI (T/CTAGA), SpeI (A/CTAGT), SspI (AAT/ATT) and NdeI (CA/TATG). All eight enzymes recognized one of three rare tetranucleotide sequences, TTAA, CTAG or ATAT. Pseudomonas aeruginosa strain 29260 has a genomic DNA size of 5573 kb. Strains 33467 and 33468 have identical restriction patterns and a possible deletion with a genomic size of 5407 kb.
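The rare-cutter expectation can be illustrated with a back-of-the-envelope calculation. The Python sketch below assumes that bases occur independently with frequencies set only by G+C content, a simplification the abstract does not state; the function names are hypothetical.

```python
def site_probability(site, gc_content):
    """Chance that a random position starts `site`, assuming independent bases."""
    p_gc = gc_content / 2.0          # probability of G, and likewise of C
    p_at = (1.0 - gc_content) / 2.0  # probability of A, and likewise of T
    prob = 1.0
    for base in site.upper():
        if base in "GC":
            prob *= p_gc
        elif base in "AT":
            prob *= p_at
        else:
            raise ValueError(f"unsupported base: {base}")
    return prob


def expected_sites(site, gc_content, genome_length):
    """Expected number of occurrences of `site` in a genome of this length."""
    return site_probability(site, gc_content) * genome_length


if __name__ == "__main__":
    genome_length = 5_573_000  # strain ATCC 29260, from the abstract
    for name, site in [("DraI", "TTTAAA"), ("XbaI", "TCTAGA"), ("NdeI", "CATATG")]:
        print(name, round(expected_sites(site, 0.67, genome_length), 1))
```

A composition-only model like this may overestimate the number of sites for enzymes such as XbaI, which is consistent with the abstract's observation that the underlying tetranucleotides (TTAA, CTAG, ATAT) are themselves rare.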

14.
Genomic DNA size was measured in clinical isolates of Haemophilus influenzae by pulsed-field gel electrophoresis of DNA restriction fragments. Because of the high (64%) A+T content of H. influenzae DNA, restriction enzymes that recognize sequences with at least four GC base pairs were expected to be rare cutters. Five enzymes that produced fragments greater than 200 kb in size were used to digest intact chromosomes, and the fragments were resolved by TAFE and/or FIGE: ApaI (GGGCCC), EagI (CGGCCG), NotI (GCGGCCGC), RsrII (CGG(A/T)CCG), and SmaI (CCCGGG). All five had recognition sequences with at least six GC base pairs. The genomic DNA size of H. influenzae serotype b, estimated with ApaI, EagI, NotI, RsrII, and SmaI, is 1,950 kb.

15.
We have developed a website, www.in-silico.com, which runs a software program that performs three basic tasks in completely sequenced bacterial genomes by in silico analysis: PCR amplification, amplified fragment length polymorphism (AFLP-PCR) and endonuclease restriction. For PCR, after selection of the genome and introduction of primers, the fragment size, DNA sequence and corresponding open reading frame (ORF) identity of the resulting PCR product are computed. Plasmids of sequenced species may be included in the analysis. Theoretical AFLP-PCR analyzes similar parameters, and includes a suggestion tool providing a list of commercial restriction enzyme pairs yielding up to 50 amplicons in the selected genome. Endonuclease restriction analysis of complete genomes and plasmids calculates the number of restriction sites for endonucleases in a given genome. If the number of fragments is 50 or fewer, a pulsed-field gel electrophoresis image and restriction maps are illustrated. Other tools that have been included in this site are ORF search by name and DNA to protein translation, as well as restriction digestion of user-defined DNA sequences. AVAILABILITY: This is a new molecular biology resource freely available over the Internet at http://www.in-silico.com
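Restriction digestion of a user-defined DNA sequence can be sketched in a few lines of Python. This is an illustrative example, not the code behind the website: the function names are hypothetical, only the given strand is searched (sufficient for palindromic sites), and sites that span the origin of a circular molecule are not detected.

```python
def find_cut_positions(sequence, site, cut_offset):
    """0-based cut coordinates for every occurrence of `site`.

    `cut_offset` is the number of bases of the site left of the cut,
    for example 1 for EcoRI (G/AATTC).
    """
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)  # allow overlapping sites
    return positions


def digest(sequence, site, cut_offset, circular=False):
    """Fragment lengths after complete digestion with one enzyme."""
    cuts = find_cut_positions(sequence, site, cut_offset)
    if not cuts:
        return [len(sequence)]
    if circular:
        # n cuts in a circle give n fragments; the last one wraps around.
        lengths = [b - a for a, b in zip(cuts, cuts[1:])]
        lengths.append(len(sequence) - cuts[-1] + cuts[0])
        return lengths
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    plasmid = "AAGAATTCAAAAGAATTCAA"
    print(digest(plasmid, "GAATTC", 1))
    print(digest(plasmid, "GAATTC", 1, circular=True))
```

Counting the returned fragments also gives the quantity the abstract uses as a threshold (50 or fewer fragments) before a gel image is drawn.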

16.
Repeating restriction fragments of human DNA.
Human DNA digested with HaeIII showed multiple repeats of a 170 base pair fragment. The most prominent band was the 340 base pair dimer, estimated to be 0.8% of the entire genome. EcoRI and HhaI yielded fragments with similar electrophoretic mobility to the HaeIII dimer. In each case this band was markedly enriched in DNA reassociating at a C0t of less than or equal to 1. Hybridization of the HaeIII dimer to gels eluted on to filters demonstrated that the multiple HaeIII fragments and EcoRI fragments contained compatible sequences. These sequences may comprise a distinct subclass of DNA.

17.
18.
19.
Terminal restriction fragment length polymorphism (T-RFLP) analysis of PCR-amplified genes is a widely used fingerprinting technique in molecular microbial ecology. In this study, we show that besides expected terminal restriction fragments (T-RFs), additional secondary T-RFs occur in T-RFLP analysis of amplicons from cloned 16S rRNA genes at high frequency. A total of 50% of 109 bacterial and 78% of 68 archaeal clones from the guts of cetoniid beetle larvae, using MspI and AluI as restriction enzymes, respectively, were affected by the presence of these additional T-RFs. These peaks were called "pseudo-T-RFs" since they can be detected as terminal fluorescently labeled fragments in T-RFLP analysis but do not represent the primary terminal restriction site as indicated by sequence data analysis. Pseudo-T-RFs were also identified in T-RFLP profiles of pure culture and environmental DNA extracts. Digestion of amplicons with the single-strand-specific mung bean nuclease prior to T-RFLP analysis completely eliminated pseudo-T-RFs. This clearly indicates that single-stranded amplicons are the reason for the formation of pseudo-T-RFs, most probably because single-stranded restriction sites cannot be cleaved by restriction enzymes. The strong dependence of pseudo-T-RF formation on the number of cycles used in PCR indicates that (partly) single-stranded amplicons can be formed during amplification of 16S rRNA genes. In a model, we explain how transiently formed secondary structures of single-stranded amplicons may render single-stranded amplicons accessible to restriction enzymes. The occurrence of pseudo-T-RFs has consequences for the interpretation of T-RFLP profiles from environmental samples, since pseudo-T-RFs may lead to an overestimation of microbial diversity. Therefore, it is advisable to establish 16S rRNA gene sequence clone libraries in parallel with T-RFLP analysis from the same sample and to check clones for their in vitro digestion T-RF pattern to facilitate the detection of pseudo-T-RFs.
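Checking a clone sequence against an observed T-RF pattern starts from the expected primary T-RF. The Python sketch below computes it for an amplicon labeled at its 5' end, and also lists the fragment lengths that would result if upstream sites were left uncut. Treating those downstream sites as pseudo-T-RF candidates is an assumption made here for illustration, and the function names are hypothetical.

```python
def cut_positions(amplicon, site, cut_offset):
    """0-based cut coordinates of `site` along the labeled strand.

    `cut_offset` is the number of site bases left of the cut:
    1 for MspI (C/CGG), 2 for AluI (AG/CT).
    """
    amplicon, site = amplicon.upper(), site.upper()
    positions = []
    start = amplicon.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = amplicon.find(site, start + 1)
    return positions


def primary_trf_length(amplicon, site, cut_offset):
    """Length of the labeled terminal fragment after complete digestion."""
    cuts = cut_positions(amplicon, site, cut_offset)
    return cuts[0] if cuts else len(amplicon)


def downstream_trf_lengths(amplicon, site, cut_offset):
    """Terminal fragment lengths that appear if earlier sites stay uncut."""
    return cut_positions(amplicon, site, cut_offset)[1:]


if __name__ == "__main__":
    toy_amplicon = "ATGCCGGTTTTCCGGAA"  # made-up sequence with two MspI sites
    print(primary_trf_length(toy_amplicon, "CCGG", 1))
    print(downstream_trf_lengths(toy_amplicon, "CCGG", 1))
```

Comparing these predicted lengths with the peaks measured for a clone is one way to flag peaks that do not correspond to the primary terminal restriction site.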

20.
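Analysis of rRNA gene polymorphism in Leptospira from China
Using Dig-dUTP-labeled 16S rRNA and 23S rRNA genes as probes, we analyzed the rRNA gene restriction patterns of EcoRI-digested chromosomal DNA from 64 Chinese and international reference strains of pathogenic Leptospira, representing 54 serovars in eight serogroups, and from 27 wild strains. A total of 56 ribotypes (RTs) were found among the 91 strains. Apart from a few different serovars within some serogroups that shared the same RT, most serovars had a unique RT, and strains of the same serogroup often shared common core fragments. Except for serovar icterohaemorrhagiae of serogroup Icterohaemorrhagiae, the Chinese and international reference strains of the same serovar had different RTs. Most wild strains had the same RT as the Chinese reference strain of the corresponding serovar, and any differences were limited to the absence or addition of individual bands in the pattern. The serovar pomona wild strains studied had the same RT as the international reference strain but differed from the Chinese reference strain.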
