Similar literature
Found 20 similar articles; search took 31 ms
1.
Gene and SNP annotation are among the first and most important steps in analyzing a genome. As the number of sequenced genomes continues to grow, a key question is: how does the quality of the assembled sequence affect the annotations? We compared the gene and SNP annotations for two different Bos taurus genome assemblies built from the same data but with significant improvements in the later assembly. The same annotation software was used for annotating both sequences. While some annotation differences are expected even between high-quality assemblies such as these, we found that a staggering 40% of the genes (>9,500) varied significantly between assemblies, due in part to the availability of new gene evidence but primarily to genome mis-assembly events and local sequence variations. For instance, although the later assembly is generally superior, 660 protein coding genes in the earlier assembly are entirely missing from the later genome's annotation, and approximately 3,600 (15%) of the genes have complex structural differences between the two assemblies. In addition, 12–20% of the predicted proteins in both assemblies have relatively large sequence differences when compared to their RefSeq models, and 6–15% of bovine dbSNP records are unrecoverable in the two assemblies. Our findings highlight the consequences of genome assembly quality on gene and SNP annotation and argue for continued improvements in any draft genome sequence. We also found that tracking a gene between different assemblies of the same genome is surprisingly difficult, due to the numerous changes, both small and large, that occur in some genes. As a side benefit, our analyses helped us identify many specific loci for improvement in the Bos taurus genome assembly.

2.
3.
Sea urchin actin gene subtypes. Gene number, linkage and evolution   (Total citations: 12, self-citations: 0, citations by others: 12)
The actin gene family of the sea urchin Strongylocentrotus purpuratus was analyzed by the genome blot method, using subcloned probes specific to the 3' terminal non-translated actin gene sequence, intervening sequence and coding region probes. We define an actin gene subtype as that gene or set of genes displaying homology with a given 3' terminal sequence probe, when hybridized at 55 degrees C, 0.75 M-Na+. By determining the often polymorphic restriction fragment band pattern displayed in genome blots by each probe, all, or almost all of the actin genes in this species could be classified. Our evidence shows that the S. purpuratus genome probably contains seven to eight actin genes, and these can be assigned to four subtypes. Studies of the expression of the genes (Shott et al., 1983) show that the actin genes of three of these subtypes code for cytoskeletal actins (Cy), while the fourth gives rise to a muscle-specific actin (M). We denote the array of S. purpuratus actin genes indicated by our data as follows. There is a single CyI actin gene, two or possibly three CyII genes (CyIIa, CyIIb, and possibly CyIIc), three CyIII actin genes (CyIIIa, CyIIIb, CyIIIc), and a single M actin gene. Comparative studies were carried out on the actin gene families of five other sea urchin species. At least the CyIIa and CyIIb genes are also linked in the Strongylocentrotus franciscanus genome, and this species also has a CyI gene, an M actin gene and at least two CyIII actin genes. It is not clear whether it also possesses a CyIIc actin gene, or a CyIIIc actin gene. The genome of a more closely related congener, Strongylocentrotus dröbachiensis, includes 3' terminal sequences suggesting the presence of a CyIIc gene. In S. franciscanus and S. dröbachiensis the first intron of the CyI gene has remained homologous with intron sequences of both the CyIIa and CyIIb genes, indicating a common origin of these three linked cytoskeletal actin genes. Of the four S. purpuratus 3' terminal subtype probe sequences only the CyI 3' terminal sequence has been conserved sufficiently during evolution to permit detection outside of the genus Strongylocentrotus. An unexpected observation was that a sequence found only in the 3' untranslated region of the CyII actin gene in the DNA of S. dröbachiensis and S. purpuratus is represented as a large family of interspersed repeat sequences in the genome of S. franciscanus.

4.
5.
6.
TARGETS, 2003, 2(3): 109-114
The publication of the sequence of the human genome revealed that the gene count in humans is much lower than previously estimated. Although textbooks usually place the number at 100,000, it is currently estimated that the human genome contains no more than 30,000 protein-coding genes. How can the great complexity of human life be explained by this number, which is less than twice the number of genes in the primitive worm C. elegans? The answer probably lies in the recent discovery that about half of all human genes undergo alternative splicing. This paper reviews the broad implications of alternative splicing for the drug-discovery process.

7.
8.
The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non–protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.

9.
10.
11.
Structure and organization of the human transglutaminase 1 gene.   (Total citations: 9, self-citations: 0, citations by others: 9)
Membrane-associated transglutaminases (TGase1) have recently been found to be common in mammalian cells, but it is not clear whether these derive from the same or different genes. In order to determine the complexity of this system, we have isolated and characterized the human gene (TGM1). The gene of 14,133 base pairs was found to contain 15 exons spliced by 14 introns. Interestingly, the positions of these introns have been conserved in comparison with the genes of two other transglutaminase-like activities described in the literature, but the TGM1 gene is by far the smallest characterized to date because its introns are relatively smaller. On the other hand, the TGase1 enzyme is the largest known transglutaminase (about 90 kDa), apparently because its gene acquired tracts that encode additional sequences on its amino and carboxyl termini that confer its unique properties. Southern blot analyses of total human genomic DNA cut with several restriction enzymes reveal only one band. Use of human-rodent cell hybrid panels and chromosomal in situ hybridization with biotin-labeled probes revealed that the human TGM1 gene maps to chromosome position 14q11.2-13. Such data suggest there is a single gene copy per haploid human genome. Comparisons of sequence identities and homologies indicate that the transglutaminase family of genes arose by duplications and subsequent divergent evolution from a common ancestor but later became scattered in the human genome. Although our present Southern blot and chromosomal localization studies revealed no restriction fragment length polymorphisms, comparisons of published sequences and our genomic clone indicate there are two sequence variants for TGase1 within the human population. The rare smaller variant contains a two-nucleotide deletion near the 5'-end, uses an alternate initiation codon, and differs from the common larger variant only in the first 15 amino acids. 
Furthermore, the DNA sequences of intron 14 possess several tracts of dinucleotide repeats that by polymerase chain reaction analysis show wide size polymorphism within the human population. Accordingly, this gene system constitutes a useful polymorphic marker for genetic linkage analyses.

12.
From a human gene library we have isolated and sequenced a beta-actin-like pseudogene, H beta Ac-psi 2, which lacks intervening sequences and contains several mutations resulting in frame-shifts, stop codons and a departure from the known beta-actin protein sequence. We have also extended our sequence work on the intronless human beta-actin-related pseudogene H beta Ac-psi 1 described previously, and we find that both genes are processed genes ending in a poly(dA) tract and flanked by direct repeats. The gene H beta Ac-psi 2 is preceded by a 230-bp region in which the simple sequence 5'-GAAA-3' is repeated more than 40 times. This satellite-like sequence is highly repetitive in the human genome.

13.
14.
The central gene cluster of chromosome III was one of the first regions to be sequenced by the Caenorhabditis elegans genome project. We have performed an essential gene analysis on the left part of this cluster, in the region around dpy-17III balanced by the duplication sDp3. We isolated 151 essential gene mutations and characterized them with regard to their arrest stages. To facilitate positioning of these mutations, we generated six new deficiencies that, together with preexisting chromosomal rearrangements, subdivide the region into 14 zones. The 151 mutations were mapped into these zones. They define 112 genes, of which 110 were previously unidentified. Thirteen of the zones have been anchored to the physical sequence by polymerase chain reaction deficiency mapping. Of the 112 essential genes mapped, 105 are within these 13 zones. They span 4.2 Mb of nucleotide sequence. From the nucleotide sequence data, 920 genes are predicted. From a Poisson distribution of our mutations, we predict that 234 of the genes will be essential genes. Thus, the 105 genes constitute 45% of the estimated number of essential genes in the physically defined zones and between 2 and 5% of all essential genes in C. elegans.
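
The Poisson reasoning in this abstract can be illustrated with a short sketch. It is not the authors' procedure, which the abstract does not describe. It assumes that mutations fall on essential genes independently and with equal probability, so that for N essential genes and M mutations the expected number of genes hit at least once is N(1 - exp(-M/N)). The function name `estimate_total_genes` is hypothetical. Because the abstract's figure of 234 applies to the physically defined zones and may rest on a different fit, this sketch need not reproduce it exactly.

```python
import math


def estimate_total_genes(mutations: int, genes_hit: int) -> float:
    """Solve N * (1 - exp(-mutations / N)) = genes_hit for N by bisection.

    Assumption (not from the abstract): every essential gene is an equally
    likely target, so hits per gene follow a Poisson distribution with
    mean mutations / N.
    """
    if not 0 < genes_hit < mutations:
        raise ValueError("need 0 < genes_hit < mutations")

    def expected_hit(n: float) -> float:
        # Expected number of genes with at least one mutation.
        return n * -math.expm1(-mutations / n)

    # expected_hit increases with n, starting below genes_hit at n = genes_hit.
    lo = hi = float(genes_hit)
    while expected_hit(hi) < genes_hit:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if expected_hit(mid) < genes_hit:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Figures quoted in the abstract: 151 mutations defining 112 genes.
total = estimate_total_genes(151, 112)

# The abstract's own ratio: 105 mapped genes out of 234 predicted.
fraction_identified = 105 / 234
```

The second calculation only restates the abstract's arithmetic: 105 of a predicted 234 essential genes corresponds to the quoted 45%.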

15.
16.
17.
A human being or person cannot be reduced to a set of human genes, or the human genome. Genetic essentialism is wrong because a person must have self-consciousness and a capacity for social interaction, both of which develop within interpersonal relationships. Genetic determinism is wrong too: the relationship between a gene and a trait is not a linear model of causation but a non-linear one. The human genome is a complex system that functions within the complex system of the human body and the complex systems of the natural and social environment. Genetic determinism also raises the question of how much responsibility an agent should take for her or his actions, and how many degrees of freedom a human being has. Human genome research has raised several conceptual issues. Can we call a gene 'good' or 'bad', 'superior' or 'inferior'? Is a boy who is found to carry the gene for Huntington's chorea or Alzheimer disease a patient? What should the term 'eugenics' mean? What do terms such as 'gene therapy', 'treatment', 'enhancement' and 'human cloning' mean? Research on the human genome and its application has raised, and will continue to raise, ethical issues. Can human genome research and its application be used for eugenics, or only for the treatment and prevention of diseases? Must the principle of informed consent/choice be upheld in human genome research and its application? How can gene privacy be protected and discrimination on the basis of genes combated? How can equality between persons, harmony between ethnic groups and peace between countries be promoted? How can a fair, just, equal and equitable relationship between developing and developed countries be established with regard to human genome research and its application?

18.
Several publicly funded large-scale sequencing efforts have been initiated with the goal of completing the first reference human genome sequence by the year 2005. Here we present the results of analysis of 11.8 Mb of genomic sequence from chromosome 16. The apparent gene density varies throughout the region, but the number of genes predicted (84) suggests that this is a gene-poor region. This result may also suggest that the total number of human genes is likely to be at the lower end of published estimates. One of the most interesting aspects of this region of the genome is the presence of highly homologous, recently duplicated tracts of sequence distributed throughout the p-arm. Such duplications have implications for mapping and gene analysis as well as the predisposition to recurrent chromosomal structural rearrangements associated with genetic disease.
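
The density argument in this abstract comes down to simple arithmetic, sketched below. The 84 genes and 11.8 Mb come from the abstract. The genome size of 3,000 Mb is an assumption introduced here for illustration, and the function names are hypothetical. Because the authors describe the region as gene-poor, a naive whole-genome extrapolation from it would be expected to understate the true gene count, so the result is a rough illustration and not an estimate made in the source.

```python
def gene_density(genes: int, region_mb: float) -> float:
    """Return predicted genes per megabase for a sequenced region."""
    return genes / region_mb


def extrapolate_gene_count(density_per_mb: float, genome_mb: float) -> float:
    """Scale a regional gene density to a whole genome (naive, uniform density)."""
    return density_per_mb * genome_mb


# Figures from the abstract: 84 predicted genes in 11.8 Mb of chromosome 16.
density = gene_density(84, 11.8)

# ASSUMPTION: a genome size of 3,000 Mb, which is not stated in the abstract.
naive_total = extrapolate_gene_count(density, 3000.0)
```

Treating density as uniform is the weakest part of this sketch, since the abstract itself notes that apparent gene density varies throughout the region.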

19.
Bacteriophage K1F specifically infects Escherichia coli strains that produce the K1 polysaccharide capsule. Like several other K1 capsule-specific phages, K1F encodes an endo-neuraminidase (endosialidase) that is part of the tail structure which allows the phage to recognize and degrade the polysaccharide capsule. The complete nucleotide sequence of the K1F genome reveals that it is closely related to bacteriophage T7 in both genome organization and sequence similarity. The most striking difference between the two phages is that K1F encodes the endosialidase in the analogous position to the T7 tail fiber gene. This is in contrast with bacteriophage K1-5, another K1-specific phage, which encodes a very similar endosialidase which is part of a tail gene "module" at the end of the phage genome. It appears that diverse phages have acquired endosialidase genes by horizontal gene transfer and that these genes or gene products have adapted to different genome and virion architectures.

20.
The rabbit genome encodes an opal suppressor tRNA gene. The coding region is strictly conserved between the rabbit gene and the corresponding gene in the human genome. The rabbit opal suppressor gene contains the consensus sequence in the 3' internal control region but, like the human and chicken genes, the rabbit 5' internal control region contains two additional nucleotides. The 5' flanking sequences of the rabbit and the human opal suppressor genes contain extensive regions of homology. A subset of these homologies is also present 5' to the chicken opal suppressor gene. Both the rabbit and the human genomes also encode a pseudogene. That of the rabbit lacks the 3' half of the coding region. Neither pseudogene has regions homologous to the 5' flanking regions of the genes. The presence of 5' homologies flanking only the transcribed genes and not the pseudogenes suggests that these regions may be regulatory control elements specifically involved in the expression of the eukaryotic opal suppressor gene. Moreover, the strict conservation of coding sequences indicates functional importance for the opal suppressor tRNA genes.
