20 similar documents found.
1.
Detection of Short Protein Coding Regions within the Cyanobacterium Genome: Application of the Hidden Markov Model (Total citations: 3; self-citations: 0; citations by others: 3)
The gene-finding programs developed so far have not paid much attention to the detection of short protein coding regions (CDSs). However, the detection of short CDSs is important for the study of photosynthesis. We used GeneHacker, a gene-finding program based on the hidden Markov model (HMM), to detect short CDSs (from 90 to 300 bases) in a 1.0-Mb contiguous sequence of the cyanobacterium Synechocystis sp. strain PCC6803, which carries a complete set of genes for oxygenic photosynthesis. GeneHacker differs from other HMM-based gene-finding programs in that it also uses di-codon statistics. GeneHacker successfully detected seven of the eight short CDSs annotated in this sequence and was clearly superior to GeneMark in this length range. GeneHacker detected 94 potentially new CDSs, 9 of which have counterparts in the genetic databases. Four of these nine CDSs were shorter than 150 bases and were photosynthesis-related genes. The results show the effectiveness of GeneHacker in detecting very short CDSs corresponding to genes.
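The role of di-codon statistics can be illustrated with a minimal sketch. This is not GeneHacker's HMM; it is a simplified, assumed stand-in that scores one reading frame by summing log-odds ratios of adjacent in-frame codon pairs, trained on example coding and non-coding sequences supplied by the user. The function names and the pseudocount of 1 are illustrative choices, not taken from the paper.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons; di-codons are ordered pairs of these (4096 in total).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codons_in_frame(seq):
    """Split a sequence into consecutive codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def dicodon_counts(seqs):
    """Count adjacent in-frame codon pairs across a collection of sequences."""
    counts = Counter()
    for s in seqs:
        codons = codons_in_frame(s)
        counts.update(zip(codons, codons[1:]))
    return counts


def train_log_odds(coding, noncoding, pseudocount=1.0):
    """Log-odds table log(P_coding / P_noncoding) for every di-codon, with pseudocounts."""
    c, n = dicodon_counts(coding), dicodon_counts(noncoding)
    n_pairs = len(CODONS) ** 2
    total_c = sum(c.values()) + pseudocount * n_pairs
    total_n = sum(n.values()) + pseudocount * n_pairs
    table = {}
    for pair in product(CODONS, repeat=2):
        p_c = (c[pair] + pseudocount) / total_c
        p_n = (n[pair] + pseudocount) / total_n
        table[pair] = math.log(p_c / p_n)
    return table


def score_frame(seq, table):
    """Sum di-codon log-odds along one reading frame; pairs with non-ACGT bases score 0."""
    codons = codons_in_frame(seq)
    return sum(table.get(pair, 0.0) for pair in zip(codons, codons[1:]))
```

A positive score suggests coding-like di-codon usage. In an HMM gene finder such statistics would act as emission probabilities inside a state model rather than as a standalone score.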
2.
A search for new members of the mammalian interspersed repeat (MIR) family was carried out over the coding regions of the human genome in GenBank release 116. Only 254 nucleotide sequences contained MIRs in their coding regions; among them were 45 previously unknown MIR copies, including 17 located in translated gene regions. The search program developed by the authors was shown to outperform the CENSOR program in search power. The evolution of the MIR copies located in translated regions of the human genome is discussed.
4.
Characteristics of Nucleotide Substitution in the Hepatitis C Virus Genome: Constraints on Sequence Change in Coding Regions at Both Ends of the Genome (Total citations: 19; self-citations: 0; citations by others: 19)
Comparison of complete genome sequences for different variants of hepatitis C virus (HCV) reveals several different constraints
on sequence change. Synonymous changes are suppressed in coding regions at both 5′ and 3′ ends of the genome. No evidence
was found for the existence of alternative reading frames or for a lower mutation frequency in these regions. Instead, suppression
may be due to constraints imposed by RNA secondary structures identified within the core and NS5b genes. Nonsynonymous substitutions
are less frequent than synonymous ones except in the hypervariable region of E2 and, to a lesser extent, in E1, NS2, and NS5b.
Transitions are more frequent than transversions, particularly at the third position of codons where the bias is 16:1. In
addition, nucleotide substitutions may not occur symmetrically since there is a bias toward G or C at the third position of
codons, while T ↔ C transitions were twice as frequent as A ↔ G transitions. These different biases do not affect the phylogenetic
analysis of HCV variants but need to be taken into account in interpreting sequence change in longitudinal studies.
Received: 9 September 1996 / Accepted: 20 April 1997
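The transition/transversion bias by codon position can be tallied with a small script. This is a minimal sketch, assuming two already-aligned, in-frame coding sequences of equal length, with gaps and ambiguous bases skipped; the function names are hypothetical. It does not correct for multiple substitutions at a site, and it cannot tell the direction of a change (for example T to C versus C to T), which would require an outgroup or ancestral sequence.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(a, b):
    """Return 'transition', 'transversion', or None if the bases are identical."""
    if a == b:
        return None
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def ts_tv_by_codon_position(seq1, seq2):
    """Count transitions and transversions at codon positions 1, 2 and 3 of two aligned CDSs."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {pos: {"transition": 0, "transversion": 0} for pos in (1, 2, 3)}
    for i, (a, b) in enumerate(zip(seq1.upper(), seq2.upper())):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip alignment gaps and ambiguity codes
        kind = substitution_type(a, b)
        if kind is not None:
            counts[i % 3 + 1][kind] += 1
    return counts
```

Dividing the third-position transition count by the third-position transversion count gives the kind of ratio quoted above (16:1 for HCV), although a reliable estimate needs many sites.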
7.
Equine herpesvirus type 9 (EHV-9), which we isolated from a case of epizootic encephalitis in a herd of Thomson's gazelles (Gazella thomsoni) in 1993, is known to cause fatal encephalitis in Thomson's gazelles, giraffes, and polar bears in natural infections. Our previous report indicated that EHV-9 is similar to the equine pathogen equine herpesvirus type 1 (EHV-1), which mainly causes abortion, respiratory infection, and equine herpesvirus myeloencephalopathy. We determined the genome sequence of EHV-9. The genome is 148,371 bp long and contains all 80 of the open reading frames (ORFs) found in the EHV-1 genome. The nucleotide sequences of the EHV-9 ORFs are 86 to 95% identical to those of EHV-1. The whole genome sequence should help to elucidate the neuropathogenicity of EHV-9.
9.
Wayne S. Kontur Wendy S. Schackwitz Natalia Ivanova Joel Martin Kurt LaButti Shweta Deshpande Hope N. Tice Christa Pennacchio Erica Sodergren George M. Weinstock Daniel R. Noguera Timothy J. Donohue 《Journal of bacteriology》2012,194(24):7016-7017
The DNA sequences of chromosomes I and II of Rhodobacter sphaeroides strain 2.4.1 have been revised, and the annotation of the entire genomic sequence, including both chromosomes and the five plasmids, has been updated. Errors in the originally published sequence have been corrected, and ∼11% of the coding regions in the original sequence have been affected by the revised annotation.
11.
Philippe Lefrançois Raymond K. Auerbach Christopher M. Yellman G. Shirleen Roeder Michael Snyder 《PLoS genetics》2013,9(1)
Accurate chromosome segregation requires centromeres (CENs), the DNA sequences where kinetochores form, to attach chromosomes to microtubules. In contrast to most eukaryotes, which have broad centromeres, Saccharomyces cerevisiae possesses sequence-defined point CENs. Chromatin immunoprecipitation followed by sequencing (ChIP–Seq) reveals colocalization of four kinetochore proteins at novel, discrete, non-centromeric regions, especially when levels of the centromeric histone H3 variant, Cse4 (a.k.a. CENP-A or CenH3), are elevated. These regions of overlapping protein binding enhance the segregation of plasmids and chromosomes and have thus been termed Centromere-Like Regions (CLRs). CLRs form in close proximity to S. cerevisiae CENs and share characteristics typical of both point and regional CENs. CLR sequences are conserved among related budding yeasts. Many genomic features characteristic of CLRs are also associated with these conserved homologous sequences from closely related budding yeasts. These studies provide general and important insights into the origin and evolution of centromeres.
12.
Novel methods for identifying a new type of DNA latent periodicity, called latent profile periodicity or latent profility, are used to search for periodic structures in genes. These methods reveal two distinct levels of organization of genetic information encoding. It is shown that latent profility in genes may correlate with specific structural features of their encoded proteins.
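As a rough illustration of searching a gene for periodic structure, the sketch below uses a simple compositional test in place of the authors' latent-profile method, whose details are not given here. For a trial period p it tabulates nucleotide counts by position modulo p and computes a chi-square statistic of independence between phase and nucleotide; larger values mean base composition depends on phase, as with the period-3 signal of coding regions. The function name is hypothetical.

```python
def periodicity_chi2(seq, period):
    """Chi-square statistic for dependence between nucleotide and position modulo `period`."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    table = [[0] * 4 for _ in range(period)]
    for i, base in enumerate(seq.upper()):
        if base in index:  # non-ACGT characters are skipped but keep their phase slot
            table[i % period][index[base]] += 1
    n = sum(map(sum, table))
    if n == 0:
        return 0.0
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(4)]
    chi2 = 0.0
    for r in range(period):
        for j in range(4):
            expected = row_totals[r] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[r][j] - expected) ** 2 / expected
    return chi2
```

In practice one would scan a range of periods and compare each value against shuffled versions of the same sequence to judge significance.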
13.
Klimke W O'Donovan C White O Brister JR Clark K Fedorov B Mizrachi I Pruitt KD Tatusova T 《Standards in genomic sciences》2011,5(1):168-193
The promise of genome sequencing was that the vast undiscovered country would be mapped out by comparison of the multitude of sequences available and would aid researchers in deciphering the role of each gene in every organism. Researchers recognize that there is a need for high quality data. However, different annotation procedures, numerous databases, and a diminishing percentage of experimentally determined gene functions have resulted in a spectrum of annotation quality. NCBI, in collaboration with sequencing centers, archival databases, and researchers, has developed the first international annotation standards, a fundamental step in ensuring that high quality complete prokaryotic genomes are available as gold standard references. Highlights include the development of annotation assessment tools, community acceptance of protein naming standards, comparison of annotation resources to provide consistent annotation, and improved tracking of the evidence used to generate a particular annotation. The development of a set of minimal standards, including the requirement for annotated complete prokaryotic genomes to contain a full set of ribosomal RNAs, transfer RNAs, and proteins encoding core conserved functions, is a historic milestone. The use of these standards in existing genomes and future submissions will increase the quality of databases, enabling researchers to make accurate biological discoveries.
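Part of the minimal standards, the requirement for a full set of rRNAs and tRNAs, lends itself to a mechanical check. The sketch below assumes that "full set" means at least one 5S, 16S and 23S rRNA and at least one tRNA for each of the 20 standard amino acids, and that features arrive as hypothetical (feature_type, label) pairs; it does not check the core conserved protein-coding genes.

```python
# Assumed minimal RNA complement; adjust if the standard you follow differs.
AMINO_ACIDS = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}
REQUIRED_RRNAS = {"5S", "16S", "23S"}


def missing_minimal_features(features):
    """Report required rRNAs and tRNA amino acids absent from (feature_type, label) pairs."""
    features = list(features)  # allow generators, which can only be iterated once
    rrnas = {label for kind, label in features if kind == "rRNA"}
    trnas = {label for kind, label in features if kind == "tRNA"}
    return {
        "rRNA": sorted(REQUIRED_RRNAS - rrnas),
        "tRNA": sorted(AMINO_ACIDS - trnas),
    }
```

An annotation passes this partial check when both returned lists are empty.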
14.
When divergence between viral species is large, the analysis and comparison of nucleotide or protein sequences are dependent
on mutation biases and multiple substitutions per site leading, among other things, to the underestimation of branch lengths
in phylogenetic trees. To avoid the problem of multiply substituted sites, a method not directly based on nucleic acid or protein
sequences was applied to retroviruses. It consisted of asking questions about genome structure or organization and gene
function; the series of answers formed coded sequences that were analyzed with phylogenetic software. This method recovered the principal
retroviral groups such as the lentiviruses and spumaviruses and highlighted questions and answers characteristic of each group
of retroviruses. In general, there was reasonable concordance between the coded genome methodology and that based on conventional
phylogeny of the integrase protein sequence, indicating that integrase was fixing mutations slowly enough to marginalize the
problem of multiple substitutions at sites. To a first approximation, this suggests that the acquisition of novel genetic
features generally parallels the fixation of amino acid substitutions.
Received: 18 May 2001 / Accepted: 7 September 2001
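The coded-genome idea, turning answers about genome organization and gene function into character strings, can be sketched as follows. The questions, taxon names and answers are hypothetical placeholders: each answer is coded as '1' (yes), '0' (no) or '?' (unknown), and a p-distance (fraction of differing answers) is computed over the questions answered for both taxa. Feeding the resulting matrix to a distance-based tree-building program is an assumed workflow, not necessarily the authors' exact pipeline.

```python
from itertools import combinations


def encode(answers, questions):
    """Code each question as '1' (yes), '0' (no) or '?' (not answered)."""
    coded = []
    for q in questions:
        value = answers.get(q)
        coded.append("?" if value is None else ("1" if value else "0"))
    return "".join(coded)


def p_distance(a, b):
    """Fraction of differing answers among questions answered for both taxa."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "?" and y != "?"]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(coded):
    """Pairwise p-distances for a dict mapping taxon name -> coded string."""
    return {(m, n): p_distance(coded[m], coded[n]) for m, n in combinations(sorted(coded), 2)}
```

Treating unknown answers as missing, rather than as "no", keeps incompletely characterized genomes from looking artificially distant.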
16.
Protein Kinase Activity in Equine Herpesvirus (Total citations: 1; self-citations: 24; citations by others: 1)
Charles C. Randall Howell W. Rogers Donald N. Downer Glenn A. Gentry 《Journal of virology》1972,9(2):216-222
A protein kinase which is intimately associated with equine herpesvirus (equine abortion virus) was found by using [γ-³²P]adenosine triphosphate as a phosphate donor and virus protein as an acceptor. Consistent demonstration of the activity requires prior removal of phosphohydrolase. The kinase activity requires Mg²⁺, is not stimulated by cyclic adenosine monophosphate, but is enhanced by added protamine or arginine-rich histone. The labeled product is resistant to ribonuclease, deoxyribonuclease, and chloroform-methanol but is sensitive to Pronase. Other tests suggest that serine and threonine residues are the acceptor sites. In the in vitro reaction, the incorporation represents an average of approximately 4,500 phosphate residues per virion, and all 17 virus protein bands resolved by polyacrylamide gel electrophoresis appear to be labeled.
17.
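A Feasibility Study of Using Proteome Expression Maps for Genome Functional Annotation (Total citations: 1; self-citations: 0; citations by others: 1)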
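Using ECO2DBASE (Edition 6) as study material, this paper explores the feasibility of using the information on dynamic cellular activity provided by proteome expression maps to improve genome functional annotation. Based on a fairly complete clustering scheme for cellular functional clusters (CRCs) that we designed, 79 proteins were found to group into 4 distinct CRCs. The results show that functionally related proteins tend to cluster in the same CRC; for example, 9 aminoacyl-tRNA synthetases and 4 heat shock proteins were accurately grouped into CRC2 and CRC3, respectively. These results suggest that, provided sufficient proteomic data are available and an effective algorithm is used, proteome expression maps can supply important functional information for genome annotation beyond sequence similarity.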
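The abstract does not name the algorithm behind the CRC clustering scheme. As an assumed stand-in, the sketch below groups proteins by their expression profiles (one numeric vector per protein, for example spot intensities across growth conditions) with plain k-means, using squared Euclidean distance and a deterministic farthest-point initialization. Protein names and profile values are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def init_centroids(profiles, names, k):
    """Farthest-point initialization: start at the first protein, then add the most distant one."""
    centroids = [list(profiles[names[0]])]
    while len(centroids) < k:
        farthest = max(names, key=lambda n: min(sq_dist(profiles[n], c) for c in centroids))
        centroids.append(list(profiles[farthest]))
    return centroids


def kmeans(profiles, k, max_iter=100):
    """Assign each protein (dict key) to one of k clusters by its expression profile."""
    names = list(profiles)
    centroids = init_centroids(profiles, names, k)
    assignment = {}
    for _ in range(max_iter):
        new = {n: min(range(k), key=lambda c: sq_dist(profiles[n], centroids[c])) for n in names}
        if new == assignment:
            break  # converged: no protein changed cluster
        assignment = new
        for c in range(k):
            members = [profiles[n] for n in names if assignment[n] == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

For expression data, a correlation-based distance (1 minus the Pearson correlation) is often preferred over Euclidean distance; swapping it in only requires replacing sq_dist.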
19.
James W. Wynne Torsten Seemann Dieter M. Bulach Scott A. Coutts Adel M. Talaat Wojtek P. Michalski 《Journal of bacteriology》2010,192(23):6319-6320
We report the resequencing and revised annotation of the Mycobacterium avium subsp. paratuberculosis K10 genome. A total of 90 single-nucleotide errors and a 51-bp indel in the original K10 genome were corrected, and the whole genome annotation was revised. Correction of these sequencing errors resulted in 28 frameshift alterations. The amended genome sequence is accessible via the supplemental section of study SRR060191 in the NCBI Sequence Read Archive and will serve as a valuable reference genome for future studies.

The American bovine isolate K10 remains the only Mycobacterium avium subsp. paratuberculosis genome to be fully sequenced and published to date (1). Although this 4.8-Mbp genome likely contains some assembly errors (3), it has provided, and will continue to provide, an invaluable resource for Mycobacterium research. The assembly errors were identified through optical mapping of the related M. avium subsp. paratuberculosis strain ATCC 19698, which revealed a 648-kb inversion around the origin of replication and two additional copies of the insertion sequences IS1311 and IS_MAP03 (3). These findings were subsequently validated via PCR, Southern blotting, and (Sanger) sequence analysis in ATCC 19698 and were also confirmed to be present in K10 (3). We designate this interim corrected genome M. avium subsp. paratuberculosis K10′. To further improve this resource, we undertook a resequencing project of the original M. avium subsp. paratuberculosis K10 genome.

Whole-genome sequencing was performed on the Illumina GAIIx platform using one flow cell lane with 36-cycle paired-end chemistry. Reads were variably trimmed at the 3′ end based on the Illumina Read Segment Quality Indicator (Illumina manual), and read pairs containing ambiguous bases were removed. Read mapping onto the K10′ genome sequence was performed using SHRiMP (ver. 1.3.2) (2), and single-nucleotide polymorphisms and indels (deletion and insertion polymorphisms [DIPs]) were called using Nesoni (ver. 0.29; Monash University Victorian Bioinformatics Consortium) with default parameters. Read mapping showed an average sequence coverage of 72.6× across the K10′ genome. This high coverage allowed differences between K10/K10′ and the resequenced version of the genome, designated K10″, to be identified with high confidence.

Ninety single-nucleotide differences and one 51-bp indel were identified in the K10″ genome. As confirmation that these differences likely represent errors in the original genome sequence, we also detected these polymorphisms in two additional bovine M. avium subsp. paratuberculosis genomes recently sequenced and assembled in our laboratory (data not shown). Seven of the 90 differences and the 51-bp indel were verified by PCR and Sanger sequencing. All of the polymorphisms were confirmed to be present in K10″ compared to the original genome sequence.

Thirty-six single-nucleotide deletions and four nucleotide insertions were identified in K10″ compared to the reference. These DIPs resulted in 27 frameshift mutations of protein coding loci. As a consequence of these frameshifts, one complete coding sequence (CDS) feature was removed (MAPK_3751), one novel CDS was created (MAPK_2081b), and one pseudogene was repaired (MAPK_4158-4159). In almost all of the other cases, the frameshifts resulted in proteins which more closely resembled their orthologs in M. avium subsp. hominissuis and M. intracellulare. Other frameshifts of biological interest include the truncation of a PPE family protein (MAPK_1173) and the extension of an MCE (mammalian cell entry) family protein (MAPK_4086). Compared to the reference, K10″ also had a 51-bp indel within a possible MCE family protein (MAPK_1575). This indel consisted of an 11-bp deletion (bases 2436510 to 2436520 in the original K10 sequence) and an insertion of 51 bp. The resulting protein sequence now more closely resembles orthologs of the MCE family in other Mycobacterium spp. In conclusion, the fact that so many of the amended bases resulted in revised coding regions underlines the importance of this exercise.
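Two small calculations sit behind this kind of resequencing summary: whether an indel's net length change alters the reading frame, and the mean sequencing depth. A minimal sketch follows, assuming the indel lies within a coding sequence; it ignores effects such as a premature stop codon created in frame, and the function names are illustrative.

```python
def net_indel_length(deleted_bp, inserted_bp):
    """Net change in sequence length caused by an indel (positive = longer)."""
    return inserted_bp - deleted_bp


def shifts_reading_frame(deleted_bp, inserted_bp):
    """True if the net length change is not a multiple of three."""
    return net_indel_length(deleted_bp, inserted_bp) % 3 != 0


def mean_coverage(aligned_bases, genome_length):
    """Average depth: total aligned bases divided by genome length."""
    return aligned_bases / genome_length
```

For example, a single-nucleotide deletion gives a net change of -1 bp and shifts the frame, whereas deleting a whole codon (3 bp) does not.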