1.

Detecting overlapping coding sequences in virus genomes

Andrew E Firth Chris M Brown 《BMC bioinformatics》2006,7(1):1-6

Background

With the advances in DNA sequencer-based technologies, it has become possible to automate several steps of the genotyping process leading to increased throughput. To efficiently handle the large amounts of genotypic data generated and help with quality control, there is a strong need for a software system that can help with the tracking of samples and capture and management of data at different steps of the process. Such systems, while serving to manage the workflow precisely, also encourage good laboratory practice by standardizing protocols, recording and annotating data from every step of the workflow.

Results

A laboratory information management system (LIMS) has been designed and implemented at the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) that meets the requirements of a moderately high throughput molecular genotyping facility. The application is modular and is simple to learn and use. The application leads the user through each step of the process from starting an experiment to the storing of output data from the genotype detection step with auto-binning of alleles; thus ensuring that every DNA sample is handled in an identical manner and all the necessary data are captured. The application keeps track of DNA samples and generated data. Data entry into the system is through the use of forms for file uploads. The LIMS provides functions to trace back to the electrophoresis gel files or sample source for any genotypic data and for repeating experiments. The LIMS is presently being used for the capture of high-throughput SSR (simple-sequence repeat) genotyping data from the legume (chickpea, groundnut and pigeonpea) and cereal (sorghum and millets) crops of importance in the semi-arid tropics.

Conclusion

A laboratory information management system is available that has been found useful in the management of microsatellite genotype data in a moderately high throughput genotyping laboratory. The application with source code is freely available for academic users and can be downloaded from http://www.icrisat.org/gt-bt/lims/lims.asp.
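The abstract mentions auto-binning of alleles without describing it. A minimal sketch, assuming that binning means snapping each measured fragment size to the nearest expected allele size (bin) within a fixed tolerance; the function name `bin_alleles`, the bin list and the 0.5 bp tolerance are hypothetical and are not taken from the ICRISAT LIMS.

```python
def bin_alleles(sizes, bins, tolerance=0.5):
    """Snap measured fragment sizes (bp) to the nearest expected allele size.

    Hypothetical helper, not the ICRISAT LIMS code.
    sizes: fragment sizes reported by the sizing software.
    bins: expected allele sizes for one SSR marker (must be non-empty).
    tolerance: maximum distance (bp) from a bin centre for a call.
    Returns one entry per size: the bin value, or None if no bin is close enough.
    """
    calls = []
    for size in sizes:
        # Closest expected allele size to this measurement.
        nearest = min(bins, key=lambda b: abs(b - size))
        calls.append(nearest if abs(nearest - size) <= tolerance else None)
    return calls
```

For example, with dinucleotide-repeat bins at 150, 152 and 154 bp, `bin_alleles([150.3, 151.9, 153.2], [150, 152, 154])` assigns the first two sizes to 150 and 152 and leaves 153.2 unbinned for manual review.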

2.

Detecting overlapping coding sequences with pairwise alignments

Firth AE Brown CM 《Bioinformatics (Oxford, England)》2005,21(3):282-292

MOTIVATION: Overlapping gene coding sequences (CDSs) are particularly common in viruses but also occur in more complex genomes. Detecting such genes with conventional gene-finding algorithms can be difficult for several reasons. If an overlapping CDS is on the same read-strand as a known CDS, then there may not be a distinct promoter or mRNA. Furthermore, the constraints imposed by double-coding can result in atypical codon biases. However, these same constraints lead to particular mutation patterns that may be detectable in sequence alignments. RESULTS: In this paper, we investigate several statistics for detecting double-coding sequences with pairwise alignments--including a new maximum-likelihood method. We also develop a model for double-coding sequence evolution. Using simulated sequences generated with the model, we characterize the distribution of each statistic as a function of sequence composition, length, divergence time and double-coding frame. Using these results, we develop several algorithms for detecting overlapping CDSs. The algorithms were tested on known overlapping CDSs and other overlapping open reading frames (ORFs) in the hepatitis B virus (HBV), Escherichia coli and Salmonella typhimurium genomes. The algorithms should prove useful for detecting novel overlapping genes--especially short coding ORFs in viruses. AVAILABILITY: Programs may be obtained from the authors. SUPPLEMENTARY INFORMATION: http://biochem.otago.ac.nz/double.html.
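One way to see the mutation patterns that double-coding leaves in a pairwise alignment is to tally substitutions by codon position. In a single-coding region differences tend to concentrate at third codon positions, but in a region read in two frames a third position in one frame is a first or second position in the other. The helper below is an illustrative sketch of that idea only, not one of the statistics or the maximum-likelihood method from the paper; `substitutions_by_codon_position` is a hypothetical name and the input is assumed to be an ungapped alignment.

```python
from collections import Counter


def substitutions_by_codon_position(seq_a, seq_b, frame=0):
    """Count mismatches between two aligned, gap-free sequences by codon
    position (1, 2 or 3) in a reading frame that starts at offset `frame`.

    Illustrative only; not the statistics described in the paper.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    counts = Counter()
    for i, (x, y) in enumerate(zip(seq_a, seq_b)):
        if i < frame:
            continue  # sites before the first codon of this frame
        if x != y:
            counts[(i - frame) % 3 + 1] += 1
    return counts
```

In the toy alignment `"ATGAAACCC"` vs `"ATAAAGCCG"`, all three differences sit at third positions of frame 0 but at second positions of the +1 frame, where a change almost always alters the encoded amino acid.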

3.

Novel overlapping coding sequences in Chlamydia trachomatis

Jensen KT Petersen L Falk S Iversen P Andersen P Theisen M Krogh A 《FEMS microbiology letters》2006,265(1):106-117

4.

Correlations between coding and contiguous non-coding sequences in isochore families from vertebrate genomes

Costantini M Bernardi G 《Gene》2008,410(2):241-248

Many years ago compositional correlations were found to hold between coding and contiguous non-coding sequences. These correlations were essentially studied in whole genomes of mammals, which are characterized by strong compositional heterogeneities. Here we investigated whether these correlations also hold within the much more homogeneous isochore families. This point was checked not only in the case of mammals, but also in that of phylogenetically distant vertebrates, which are characterized by very different compositional patterns. Indeed, these are remarkably different in cold- and warm-blooded vertebrates. Fish genomes, for instance, are much more homogeneous than those of mammals and birds. The compositional correlations between coding sequences and the corresponding introns, or their 5′ and 3′ flanking regions, were studied in the isochore families of the fully sequenced genomes from four fishes (Brachydanio rerio, Oryzias latipes, Gasterosteus aculeatus and Tetraodon nigroviridis), human and chicken.

5.

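Genetic evolution trends of non-coding sequences of Newcastle disease virus

徐怀英 秦卓明 亓丽红 张伟 王友令 刘金华 (Xu Huaiying, Qin Zhuoming, Qi Lihong, Zhang Wei, Wang Youling, Liu Jinhua)《Acta Microbiologica Sinica》2014,54(9):1073-1081

[Objective] To investigate the molecular evolution of the non-coding sequences in the complete genome of Newcastle disease virus (NDV). [Methods] Together with the complete genome of a duck-origin NDV strain that our laboratory isolated and sequenced in 2012 from a duck flock with decreased egg production, full-length cDNA sequences of 35 NDV strains of different genotypes were downloaded from GenBank and their non-coding sequences were extracted. Phylogenetic trees were constructed separately for the leader sequence, the trailer sequence, and the F-HN and HN-L intergenic sequences (IGS), and the nucleotide substitution characteristics of the 5' and 3' UTR sequences of the coding genes were compared. [Results] The length and position of the non-coding sequences are highly conserved, whereas their nucleotide sequences vary continuously, following the same trend of variation as the coding gene sequences. [Conclusion] Across the whole NDV genome, coding and non-coding sequences vary in step with each other.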

6.

Analysis on frequency and density of microsatellites in coding sequences of several eukaryotic genomes (cited 9 times: 0 self-citations, 9 by others)

Li B Xia Q Lu C Zhou Z Xiang Z 《Genomics, Proteomics & Bioinformatics》2004,2(1):24-31

Microsatellites or simple sequence repeats (SSRs) have been found in most organisms during the last decade. As large-scale sequence data are generated, especially data that can be searched for microsatellites, developing these markers is becoming more convenient. In view of the importance of SSR applications, available CDS (coding sequences) or ESTs (expressed sequence tags) of several eukaryotic species were used to study the frequency and density of various types of microsatellites. Based on surveys of CDS or EST sequences totaling 66.6 Mb in silkworm, 37.2 Mb in fly, 20.8 Mb in mosquito, 60.0 Mb in mouse, 34.9 Mb in zebrafish and 33.5 Mb in Caenorhabditis elegans, the frequency of SSRs was 1/1.00 kb in silkworm, 1/0.77 kb in fly, 1/1.03 kb in mosquito, 1/1.21 kb in mouse, 1/1.25 kb in zebrafish and 1/1.38 kb in C. elegans. The overall average SSR frequency of these species is 1/1.07 kb. Hexanucleotide repeats (64.5%-76.6%) are the most abundant class of SSR in the investigated species, followed by trimeric, dimeric, tetrameric, monomeric and pentameric repeats. Furthermore, A-rich repeats are predominant in each type of SSR, whereas G-rich repeats are rare in the coding regions.
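The overall average of 1/1.07 kb can be reproduced from the per-species figures if it is read as a pooled rate (total sequence divided by total SSRs) rather than a plain mean of the six intervals. The check below back-calculates approximate SSR counts from the reported intervals; this is an assumption, since the abstract gives rounded intervals rather than counts.

```python
# (species, surveyed sequence in Mb, mean interval between SSRs in kb), from the abstract
surveys = [
    ("silkworm", 66.6, 1.00),
    ("fly", 37.2, 0.77),
    ("mosquito", 20.8, 1.03),
    ("mouse", 60.0, 1.21),
    ("zebrafish", 34.9, 1.25),
    ("C. elegans", 33.5, 1.38),
]

total_kb = sum(mb * 1000 for _, mb, _ in surveys)
# Implied SSR counts per species; approximate because the intervals are rounded.
total_ssrs = sum(mb * 1000 / interval for _, mb, interval in surveys)

pooled_interval = total_kb / total_ssrs  # length-weighted average
simple_mean = sum(interval for _, _, interval in surveys) / len(surveys)

print(f"pooled: 1/{pooled_interval:.2f} kb")   # pooled: 1/1.07 kb
print(f"simple mean: 1/{simple_mean:.2f} kb")  # simple mean: 1/1.11 kb
```

The pooled figure matches the reported 1/1.07 kb, while an unweighted mean of the six intervals would give about 1/1.11 kb.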

7.

Using networks to analyze and visualize the distribution of overlapping genes in virus genomes

Laura Muñoz-Baena Art F. Y. Poon 《PLoS pathogens》2022,18(2)

Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs was encoded in the same frame on the opposite strand (−0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.
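The number, length and frameshift of overlaps are computed from annotated ORF coordinates. A minimal sketch for two ORFs on the same strand, assuming 0-based half-open coordinates and defining the frame offset as the distance between start positions modulo 3; the paper's exact +1/+2 labeling convention may differ, antisense overlaps such as −0 are not handled, and `same_strand_overlap` is a hypothetical name.

```python
def same_strand_overlap(orf_a, orf_b):
    """Return (overlap_length, frame_offset) for two ORFs on the same strand.

    ORFs are (start, end) in 0-based, half-open genome coordinates.
    frame_offset is (start_b - start_a) mod 3, measured relative to orf_a:
    0 means the same frame, 1 or 2 a shifted frame. This labeling is an
    assumption and may not match the paper's +1/+2 convention.
    """
    start = max(orf_a[0], orf_b[0])
    end = min(orf_a[1], orf_b[1])
    length = max(0, end - start)  # 0 when the ORFs do not overlap
    offset = (orf_b[0] - orf_a[0]) % 3
    return length, offset
```

For example, ORFs spanning (0, 300) and (100, 400) share 200 nt with a frame offset of 1, whereas (0, 300) and (302, 500) do not overlap at all.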

8.

Complete DNA sequences of two oka strain varicella-zoster virus genomes (cited 1 time: 0 self-citations, 1 by others)

Tillieux SL Halsey WS Thomas ES Voycik JJ Sathe GM Vassilev V 《Journal of virology》2008,82(22):11023-11044

Varicella-zoster virus (VZV) is a herpesvirus and is the causative agent of chicken pox (varicella) and shingles (herpes zoster). Active immunization against varicella became possible with the development of live attenuated varicella vaccine. The Oka vaccine strain was isolated in Japan from a child who had typical varicella, and it was then attenuated by serial passages in cell culture. Several manufacturers have obtained this attenuated Oka strain and, following additional passages, have developed their own vaccine strains. Notably, the vaccines Varilrix and Varivax are produced by GlaxoSmithKline Biologicals and Merck & Co., Inc., respectively. Both vaccines have been well studied in terms of safety and immunogenicity. In this study, we report the complete nucleotide sequence of the Varilrix (Oka-V_GSK) and Varivax (Oka-V_Merck) vaccine strain genomes. Their genomes are composed of 124,821 and 124,815 bp, respectively. Full genome annotations covering the features of Oka-derived vaccine genomes have been established for the first time. Sequence analysis indicates 36 nucleotide differences between the two vaccine strains throughout the entire genome, among which only 14 are involved in unique amino acid substitutions. These results demonstrate that, although Oka-V_GSK and Oka-V_Merck vaccine strains are not identical, they are very similar, which supports the clinical data showing that both vaccines are well tolerated and elicit strong immune responses against varicella.
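Separating nucleotide differences that change an amino acid from those that do not requires translating the affected codons. A minimal sketch for two aligned, gap-free coding sequences using the standard genetic code; `classify_differences` is a hypothetical helper, differences outside annotated CDSs are not covered, and a codon with several differences is counted wholly as synonymous or nonsynonymous, which is a simplification.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_differences(cds_a, cds_b):
    """Count (synonymous, nonsynonymous) nucleotide differences between two
    aligned, gap-free coding sequences read in frame 0."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("expected equal-length, in-frame coding sequences")
    syn = nonsyn = 0
    for i in range(0, len(cds_a), 3):
        codon_a, codon_b = cds_a[i:i + 3].upper(), cds_b[i:i + 3].upper()
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff == 0:
            continue
        # Simplification: all differences in one codon share one class.
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += n_diff
        else:
            nonsyn += n_diff
    return syn, nonsyn
```

In the toy pair `"ATGGCTAAA"` vs `"ATGGCCGAA"`, GCT to GCC keeps alanine (synonymous) while AAA to GAA changes lysine to glutamate (nonsynonymous), giving `(1, 1)`.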

9.

Detecting uber-operons in prokaryotic genomes (cited 3 times: 1 self-citation, 3 by others)

Che D Li G Mao F Wu H Xu Y 《Nucleic acids research》2006,34(8):2418-2427

10.

Insertion sequences in prokaryotic genomes

Siguier P Filée J Chandler M 《Current opinion in microbiology》2006,9(5):526-531

Insertion sequences (ISs) are small DNA segments that are often capable of moving neighbouring genes. Over 1500 different ISs have been identified to date. They can have large and spectacular effects in shaping and reshuffling the bacterial genome. Recent studies have provided dramatic examples of such IS activity, including massive IS expansion during the emergence of some pathogenic bacterial species and the intimate involvement of ISs in assembling genes into complex plasmid structures. However, a global understanding of their impact on bacterial genomes requires detailed knowledge of their distribution across the eubacterial and archaeal kingdoms, understanding their partition between chromosomes and extra-chromosomal elements (e.g. plasmids and viruses) and the factors which influence this, and appreciation of the different transposition mechanisms in action, the target preferences and the host factors that influence transposition. In addition, defective (non-autonomous) elements, which can be complemented by related active elements in the same cell, are often overlooked in genome annotations but also contribute to the evolution of genome organisation.

11.

Intervening sequences in chloroplast genomes (cited 13 times: 0 self-citations, 13 by others)

B Koller H Delius 《Cell》1984,36(3):613-622

12.

Reticuloendotheliosis virus sequences within the genomes of field strains of fowlpox virus display variability (cited 8 times: 0 self-citations, 8 by others)

Singh P Schnitzlein WM Tripathy DN 《Journal of virology》2003,77(10):5855-5862

Nine field strains of fowlpox virus (FPV) isolated during a 24-year span from geographically diverse outbreaks of fowlpox in the United States were screened for the presence of reticuloendotheliosis virus (REV) sequences in their genomes by PCR. Each isolate appeared to be heterogeneous in that either a nearly intact provirus or just a 248- or 508-nucleotide fusion of portions of the integrated REV 5' and 3' long terminal repeats (LTRs) was exclusively present at the same genomic site. In contrast, four fowlpox vaccines of FPV origin and three originating from pigeonpox virus were genetically homogeneous in having retained only the 248-bp LTR fusion, whereas two other FPV-based vaccines had only the larger one. These remnants of integrated REV presumably arose during homologous recombination at one of the two regions common to both LTRs or during retroviral excision from the FPV genome. Loss of the provirus appeared to be a natural event because the tripartite population could be detected in a field sample (tracheal lesion). Moreover, the provirus was also readily deleted during propagation of FPV in cultured cells, as evidenced by the detection of truncated LTRs after one passage of a plaque-purified FPV recombinant having a "genetically marked" provirus. However, the deletion mutants did not appear to have a substantial replicative advantage in vitro because even after 55 serial passages the original recombinant FPV was still prevalent. As to the in vivo environment, retention of the REV provirus may confer some benefit to FPV for infection of poultry previously vaccinated against fowlpox.

13.

A compression-based approach for coding sequences identification. I. Application to prokaryotic genomes.

Giulia Menconi Roberto Marangoni 《Journal of computational biology》2006,13(8):1477-1488

Most gene prediction algorithms for prokaryotes are based on Hidden Markov Models or similar machine-learning approaches, which require optimizing a large number of parameters. The present paper presents a novel method for classifying coding and non-coding regions in prokaryotic genomes, based on a suitably defined compression index of a DNA sequence. The main features of this new method are its non-parametric logic and the construction of a dictionary of words extracted from the sequences. These dictionaries can be very useful for further analyses of the genomic sequences themselves. The proposed approach has been applied to several complete prokaryotic genomes, obtaining optimal scores of correctly recognized coding and non-coding regions. Several false-positive and false-negative cases have been investigated in detail, revealing that this approach can fail in the presence of highly structured coding regions (e.g., genes coding for modular proteins) or quasi-random non-coding regions (e.g., regions hosting non-functional fragments of copies of functional genes; regions hosting promoters or other protein-binding sequences). We do not perform an overall comparison with other gene-finder software, since at this stage we are not interested in building another gene-finder system, but only in exploring the potential of the suggested approach.
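The abstract does not define its compression index. As an illustration of a dictionary-based index, the sketch below uses an LZ78 parse (which builds exactly such a dictionary of words) and estimates the compressed size in bits per nucleotide; this particular formula is an assumption, not the paper's definition, and `lz78_parse` and `compression_index` are hypothetical names.

```python
import math
import random


def lz78_parse(seq):
    """LZ78 parse: split seq into phrases, each the shortest prefix of the
    remaining text that is not yet in the dictionary.

    Returns (set of dictionary words, number of phrases emitted).
    """
    dictionary = set()
    word = ""
    phrases = 0
    for ch in seq:
        word += ch
        if word not in dictionary:
            dictionary.add(word)
            phrases += 1
            word = ""
    if word:  # leftover text that matched an existing phrase
        phrases += 1
    return dictionary, phrases


def compression_index(seq, alphabet_size=4):
    """Approximate LZ78 code length in bits per symbol (lower = more compressible).

    Each phrase costs about log2(phrases) bits for the pointer to its prefix
    plus log2(alphabet_size) bits for the new symbol.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    _, c = lz78_parse(seq)
    return c * (math.log2(c) + math.log2(alphabet_size)) / len(seq)


periodic = "ACGT" * 100
rng = random.Random(0)
irregular = "".join(rng.choice("ACGT") for _ in range(400))
print(round(compression_index(periodic), 2), round(compression_index(irregular), 2))
```

A periodic sequence compresses far better than an irregular one of the same length; a classifier along these lines would compare such indices across sequence windows, which also hints at why highly structured coding regions and quasi-random non-coding regions are the failure cases the authors report.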

14.

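Distribution model of open reading frame lengths in genomes and genome evolution (cited 3 times: 1 self-citation, 2 by others)

冯立芹 李宏 (Feng Liqin, Li Hong)《Acta Biophysica Sinica》2004,20(5):375-381

The distribution of the number of open reading frames (ORFs) by length was analyzed in the genomes of 5 eukaryotes, 15 bacteria and 10 archaea. The distributions were similar across organisms and showed clear regularity. Fitting and comparing various distribution models showed that the distribution for every organism follows a Γ(α, β) distribution, which led to the hypothesis that the number of ORFs as a function of length in biological genomes follows a Γ(α, β) distribution. Analysis of the fitted parameters for each genome revealed a clear correlation between the α and β values and genome evolution. The evolutionary significance of α and β is discussed, and the conclusion is drawn that eukaryotes preferentially use long genes. From the Γ(α, β) distribution, the upper limit of the number of ORFs in the yeast genome was estimated at 5,870. The method is of constructive value for studying genome evolution and for assessing the reliability of theoretically predicted genes.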

15.

Detecting specific repeated sequences in large, complex genomes by using representative difference analysis and double-probe verification

Francois Sabot Pierre Sourdille Michel Bernard 《Plant Molecular Biology Reporter》2004,22(1):91-91

We present a modification of the representative difference analysis (RDA) technique used to target AT-rich repeated sequences, such as transposable elements, with a double-probe verification system. RDA is a subtractive/amplification PCR-based technology used to identify specific sequences that are different between 2 related genomes. The VspI restriction enzyme was used to target AT-rich sequences. RDA products were cloned with a high efficiency. Double-probe verification is based on reverse dot-blot of cloned RDA products and uses a positive and a negative probe. We tested this VspI-modified RDA on different combinations of bread wheat (Triticum aestivum) and relatives. Triticeae members have large, complex genomes with various ploidy levels. RDA experiments were performed with single or bulked DNA. Reverse dot-blot double-probe verification detected specific repeated sequences quickly and efficiently. Together, the 2 systems provide a powerful tool for obtaining specific transposable elements and repeated sequences that are different between related genomes, regardless of genome size and ploidy.
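VspI recognizes the all-A/T site ATTAAT and cuts it as AT^TAAT, so AT-rich DNA is cut more often and yields shorter fragments. A minimal in silico digest of a linear sequence illustrates this; the helper `digest` is hypothetical, and because ATTAAT is palindromic a scan of one strand finds every site.

```python
def digest(seq, site="ATTAAT", cut_offset=2):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the top-strand cut position within the site
    (VspI: AT^TAAT, so 2). Overlapping sites are all found.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)  # continue just past this match
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```

For example, `digest("GGATTAATCC")` returns `[4, 6]`, while a sequence without the site comes back as a single fragment.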

16.

Detecting overlapping protein complexes in protein-protein interaction networks

Nepusz T Yu H Paccanaro A 《Nature methods》2012,9(5):471-472

We introduce clustering with overlapping neighborhood expansion (ClusterONE), a method for detecting potentially overlapping protein complexes from protein-protein interaction data. ClusterONE-derived complexes for several yeast data sets showed better correspondence with reference complexes in the Munich Information Center for Protein Sequence (MIPS) catalog and complexes derived from the Saccharomyces Genome Database (SGD) than the results of seven popular methods. The results also showed a high extent of functional homogeneity.
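As we understand ClusterONE, candidate complexes are grown greedily from seed proteins to increase a cohesiveness score f(V) = w_in / (w_in + w_bound + p|V|), where w_in is the total weight of edges inside the node set V, w_bound the weight of edges leaving it, and p a node penalty. The sketch below computes only that score; the graph representation, the name `cohesiveness` and the penalty values are illustrative assumptions, and self-loops are not supported.

```python
def cohesiveness(nodes, edges, penalty):
    """Cohesiveness f(V) = w_in / (w_in + w_bound + penalty * |V|).

    nodes: set of node ids forming the candidate complex.
    edges: dict mapping frozenset({u, v}) -> weight (undirected, no self-loops).
    penalty: per-node penalty term p.
    """
    w_in = w_bound = 0.0
    for edge, weight in edges.items():
        inside = len(edge & nodes)
        if inside == 2:      # both endpoints in the candidate complex
            w_in += weight
        elif inside == 1:    # edge crosses the boundary
            w_bound += weight
    return w_in / (w_in + w_bound + penalty * len(nodes))
```

For a triangle a-b-c with one extra edge c-d (all weights 1), the set {a, b, c} scores 3/4 with no penalty and 3/7 with p = 1; the penalty makes small, loosely attached sets less attractive.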

17.

Multiple coding sequences for the genome-linked virus protein (VPg) in dicistroviruses

Nakashima N Shibuya N 《Journal of invertebrate pathology》2006,92(2):100-104

N-terminal Edman sequencing of the genome-linked viral protein (VPg) of Plautia stali intestine virus (PSIV, Dicistroviridae) detected heterologous residues. The VPg sequence determined was found to be triplicated in the nonstructural protein precursor. Multiple VPg-like sequences were also found in 10 of the 12 dicistroviruses, with a maximum of six copies in Solenopsis invicta virus-1. We postulate that redundant VPg coding sequences facilitate multiplication of dicistroviruses, because fewer cycles of translation of the nonstructural protein precursor produce larger amounts of VPg proteins, in parallel with the increased production of capsid proteins by intergenic internal ribosome entry site (IRES)-mediated translation.
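Finding repeated VPg-like copies in a polyprotein amounts to scanning it for a peptide that recurs with a few substitutions. A minimal ungapped sketch; the abstract does not describe the search method, and the helper name, mismatch threshold and toy peptides below are hypothetical, not real PSIV sequences.

```python
def find_motif_copies(protein, motif, max_mismatches=2):
    """Return start positions where motif matches protein with at most
    max_mismatches substitutions (ungapped Hamming-distance scan)."""
    hits = []
    k = len(motif)
    for i in range(len(protein) - k + 1):
        window = protein[i:i + k]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits
```

In the toy sequence `"XXGAYTQKXXGAYSQKXX"`, searching for `"GAYTQK"` with one allowed mismatch reports copies at positions 2 and 10; real VPg copies would more likely need an alignment that tolerates insertions and deletions.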

18.

Fv-1 N- and B-tropism-specific sequences in murine leukemia virus and related endogenous proviral genomes (cited 2 times: 2 self-citations, 0 by others)

L R Boone P L Glover C L Innes L A Niver M C Bondurant W K Yang 《Journal of virology》1988,62(8):2644-2650

Oligonucleotide probes specific for the Fv-1 N- and B-tropic host range determinants of the gag p30-coding sequence were used to analyze DNA clones of various murine leukemia virus (MuLV) and endogenous MuLV-related proviral genomes and chromosomal DNA from four mouse strains. The group of DNA clones consisted of ecotropic MuLVs of known Fv-1 host range, somatically acquired ecotropic MuLV proviruses, xenotropic MuLV isolates, and endogenous nonecotropic MuLV-related proviral sequences from mouse chromosomal DNA. As expected, the prototype N-tropism determinant is carried by N-tropic viruses of several different origins. All seven endogenous nonecotropic MuLV-related proviral sequence clones derived from RFM/Un mouse chromosomal DNA, although not recognized by the N probe, showed positive hybridization with the prototype B-tropism-specific probe. The two xenotropic MuLV clones derived from infectious virus (one of BALB:virus-2 and one of AKR xenotropic virus) failed to hybridize with the N- and B-tropic oligonucleotide probes tested and with one probe specific for NB-tropic Moloney MuLV. One of two endogenous xenotropic class proviruses derived from HRS/J mouse chromosomal DNA (J. P. Stoye and J. M. Coffin, J. Virol. 61:2659-2669, 1987) also failed to hybridize to the N- and B-tropic probes, whereas the other hybridized to the B-tropic probe. In addition, analysis of mouse chromosomal DNA from four strains indicates that hybridization with the N-tropic probe correlates with the presence or absence of endogenous ecotropic MuLV provirus, whereas the B-tropic probe detects abundant copies of endogenous nonecotropic MuLV-related proviral sequences. These results suggest that the B-tropism determinant in B-tropic ecotropic MuLV may arise from recombination between N-tropic ecotropic MuLV and members of the abundant endogenous nonecotropic MuLV-related classes including a subset of endogenous xenotropic proviruses.

19.

Proceedings of the SMBE Tri-National Young Investigators' Workshop 2005. Positional conservation of clusters of overlapping promoter-like sequences in enterobacterial genomes

Huerta AM Collado-Vides J Francino MP;SMBE Tri-National Young Investigators 《Molecular biology and evolution》2006,23(5):997-1010

20.

Viral sequences integrated into plant genomes (cited 5 times: 0 self-citations, 5 by others)

Hull R Harper G Lockhart B 《Trends in plant science》2000,5(9):362-365
