Similar articles
1.
The African malaria mosquito Anopheles gambiae was the first disease vector chosen for genome sequencing. Although its genome assembly has been facilitated by physical mapping, large gaps still pose a serious problem for accurate annotation and genome analysis. The majority of the gaps are located in regions of pericentromeric and intercalary heterochromatin. Genomic analysis has identified protein-coding genes and various classes of repetitive elements in the Anopheles heterochromatin. Molecular and cytogenetic studies have demonstrated that heterochromatin is a structurally heterogeneous and rapidly evolving part of the malaria mosquito genome.

2.
Genome sequences are annotated by computational prediction of coding sequences, followed by similarity searches such as BLAST, which provide a layer of possible functional information. While the existence of processes such as alternative splicing complicates matters for eukaryote genomes, the view of bacterial genomes as a linear series of closely spaced genes leads to the assumption that computational annotations that predict such arrangements completely describe the coding capacity of bacterial genomes. We undertook a proteomic study to identify proteins expressed by Pseudomonas fluorescens Pf0-1 from genes that were not predicted during the genome annotation. Mapping peptides to the Pf0-1 genome sequence identified sixteen non-annotated protein-coding regions, of which nine were antisense to predicted genes, six were intergenic, and one read in the same direction as an annotated gene but in a different frame. The expression of all but one of the newly discovered genes was verified by RT-PCR. Few clues as to the function of the new genes were gleaned from informatic analyses, but potential orthologs in other Pseudomonas genomes were identified for eight of the new genes. The 16 newly identified genes improve the quality of the Pf0-1 genome annotation, and the detection of antisense protein-coding genes indicates the under-appreciated complexity of bacterial genome organization.
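The abstract above does not describe how the peptides were mapped to the genome. One common way to think about it is a six-frame translation search, sketched below under that assumption. The function names (`reverse_complement`, `translate`, `six_frame_hits`) are hypothetical and are not taken from the study; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_hits(genome, peptide):
    """List (strand, frame, protein_offset) for every exact peptide match."""
    genome = genome.upper()
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                hits.append((strand, frame, pos))
                pos = protein.find(peptide, pos + 1)
    return hits
```

A hit on the "-" strand inside an annotated "+" strand gene would correspond to the antisense case described in the abstract. A real pipeline would also score spectra and control false discovery rates, which this sketch omits.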

3.
The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non–protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.

4.
Xing XB, Li QR, Sun H, Fu X, Zhan F, Huang X, Li J, Chen CL, Shyr Y, Zeng R, Li YX, Xie L. Genomics 2011, 98(5):343-351
Identifying protein-coding genes in eukaryotic genomes remains a challenge in the post-genome era because of complex gene models. We applied a proteogenomics strategy to detect un-annotated protein-coding regions in the mouse genome. High-accuracy tandem mass spectrometry (MS/MS) data from diverse mouse samples were generated in-house on an LTQ-Orbitrap mass spectrometer. Two searchable diagnostic proteomic datasets were constructed, one with all possible encoding exon junctions and the other with all putative encoding exons, for the discovery of novel exon splicing events and novel uninterrupted protein-coding regions. Altogether, 29,586 unique peptides were identified. When the peptides were aligned back to the mouse genome, the translation of 4471 annotated genes was validated by the known peptides, and 172 genic events were defined in the mouse genome by the novel peptides. The approach in the current work can provide substantial evidence for the annotation of protein-coding genes in eukaryote genomes.

6.
Rheumatoid arthritis (RA) is an autoimmune disease, the pathogenesis of which is affected by multiple genetic and environmental factors. To understand the genetic and molecular basis of RA, a large number of quantitative trait loci (QTL) that regulate experimental autoimmune arthritis have been identified using various rat models for RA. However, identifying the particular responsible genes within these QTL remains a major challenge. Using currently available genome data and gene annotation information, we systematically examined RA-associated genes and polymorphisms within and outside QTL over the whole rat genome. By the whole genome analysis of genes and polymorphisms, we found that there are significantly more RA-associated genes in QTL regions as contrasted with non-QTL regions. Further experimental studies are necessary to determine whether these known RA-associated genes or polymorphisms are genetic components causing the QTL effect.

10.
Yang Z, Huang J. FEBS Letters 2011,(4):641-644
The origin of new genes is critical for organisms adapting to new niches. Here, we present evidence for a recent de novo origin of at least 13 protein-coding genes in the genome of Plasmodium vivax. Although recently de novo originated genes have often been suggested to be initially intronless, five of the genes identified in our analysis contain introns in their coding regions. Further investigations revealed that these introns likely evolved from previously intergenic regions together with the coding sequences. We discuss the potential mechanisms for intron formation in these genes and propose that intronization be considered in the formation of de novo originated genes.

11.
As more and more complete bacterial genome sequences become available, the annotation of previously sequenced genomes may quickly become outdated. This is primarily due to the discovery and functional characterization of new genes. We have reannotated the recently published genome of Shewanella oneidensis with the following results: 51 new genes have been identified, and functional annotation has been added to 97 genes, including 15 new genes and 82 existing ones with previously unassigned function. The new genes were identified by predicting protein-coding regions with the HMM-based program GeneMark.hmm. Functional annotation was then derived by comparing the predicted gene products to the non-redundant protein database using BLAST and to the COG (Clusters of Orthologous Groups) database using COGNITOR.

12.
We carried out a comprehensive genomic analysis of porcine copy number variants (CNVs) based on whole-genome SNP genotyping data and provided new measures of genomic diversity (number, length and distribution of CNV events) for a highly inbred strain (the Guadyerbas strain). This strain represents one of the most ancient surviving populations of the Iberian breed, and it is currently in serious danger of extinction. CNV detection was conducted on the complete Guadyerbas population, adjusted for genomic waves, and used strict quality criteria, pedigree information and the latest porcine genome annotation. The analysis led to the detection of 65 CNV regions (CNVRs). These regions cover 0.33% of the autosomal genome of this particular strain. Twenty-nine of these CNVRs were identified here for the first time. The relatively low number of detected CNVRs is in line with the low variability and high inbreeding estimated previously for this Iberian strain using pedigree, microsatellite or SNP data. A comparison across different porcine studies revealed that more than half of these regions overlap with previously identified CNVRs or multicopy regions. Also, a preliminary analysis of CNV detection using whole-genome sequence data for four Guadyerbas pigs showed overlap for 16 of the CNVRs, supporting their reliability. Some of the identified CNVRs contain relevant functional genes (e.g., the SCD and USP15 genes), which merit further investigation because of their importance in determining the quality of Iberian pig products. The CNVR data generated could be useful for improving the porcine genome annotation.

17.
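[Objective] To improve the genome annotation of the Chinese oak silkworm Antheraea pernyi and thereby extend its use in comparative genomics and breed improvement research. [Methods] Full-length transcriptome sequencing was performed on A. pernyi. The full-length transcripts were aligned to the reference genome to identify novel genes and novel transcripts, which were then functionally annotated and used to predict long non-coding RNAs (lncRNAs). A large number of protein-coding transcripts and lncRNAs were used to revise gene structures in the A. pernyi genome. Finally, a corrected gene annotation of the A. pernyi genome was created. [Results] A total of 1,997 protein-coding genes and 3,399 lncRNA genes were newly identified, supported by 2,402 and 3,574 full-length transcripts, respectively. The A. pernyi genome was found to contain 25,021 genes, of which 19,825 are protein-coding genes, including 7 juvenile hormone acid methyltransferase genes. [Conclusion] This study advances the understanding of gene annotation in the A. pernyi genome and provides a useful data resource for functional and comparative genomics research on A. pernyi and related species.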

18.

Background  

Complete genome annotation will likely be achieved by combining computer-based analysis of available genome sequences with direct experimental characterization of expressed regions of individual genomes. We used a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes.

19.
The published sequence of the Vibrio cholerae genome indicates that, in addition to the genes that encode proteins of known and unknown function, there are 1577 ORFs identified as conserved hypothetical or hypothetical gene candidates. Because the annotation is not 100% accurate, it is not known which of the 1577 ORFs are true protein-coding genes. In this paper, an algorithm based on the Z curve method, with sensitivity, specificity and accuracy greater than 98%, is used to solve this problem. Twenty-fold cross-validation tests show that the accuracy of the algorithm is 98.8%. A detailed discussion of the mechanism of the algorithm is also presented. It was found that 172 of the 1577 ORFs are unlikely to be protein-coding genes. The number of protein-coding genes in the V. cholerae genome was re-estimated and found to be approximately 3716. This result should be of use in microarray analysis of gene expression in the genome, because the cost of preparing chips may be somewhat decreased. A computer program was written to calculate a coding score called VCZ for gene identification in the genome. Coding/noncoding is simply determined by VCZ > 0/VCZ < 0. The program is freely available on request for academic use.
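The abstract does not give the formula for the VCZ score, so the sketch below covers only the basic Z curve transform on which such methods build. It maps a DNA sequence to three cumulative coordinates: purine versus pyrimidine (x), amino versus keto (y), and weak versus strong hydrogen bonding (z). The function name `z_curve` is hypothetical, and this is an illustration of the transform, not the authors' program.

```python
def z_curve(seq):
    """Return the cumulative Z curve points (x, y, z) for a DNA sequence.

    After each base, with running counts A, C, G, T:
      x = (A + G) - (C + T)   purine vs. pyrimidine
      y = (A + C) - (G + T)   amino vs. keto
      z = (A + T) - (G + C)   weak vs. strong hydrogen bonding
    Characters other than A, C, G, T are skipped (an assumption of this sketch).
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base not in counts:
            continue  # ignore ambiguous symbols such as N
        counts[base] += 1
        a, c, g, t = (counts[b] for b in "ACGT")
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)))
    return points
```

A coding-score classifier would typically derive features from such coordinates and then apply a discriminant with a threshold, which is consistent with the VCZ > 0 / VCZ < 0 decision rule quoted in the abstract. The feature set and discriminant used for VCZ are not stated here.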
