20 similar documents found (search time: 15 ms)
1.
Profile-based detection of microRNA precursors in animal genomes (total citations: 8; self-citations: 0; citations by others: 8)
MOTIVATION: MicroRNAs (miRNAs) are essential 21–22-nt regulatory RNAs produced from larger hairpin-like precursors. Local sequence alignment tools such as BLAST can identify some, but not all, new members of known miRNA families. We set out to estimate how many new miRNAs could be recovered with a profile-based strategy such as the one implemented in the ERPIN program. RESULTS: We constructed alignments for 18 miRNA families and performed ERPIN searches on animal genomes. Results were compared with those of a WU-BLAST search at the same E-value cutoff. Together, the two approaches produced 265 new miRNA candidates that were not found in miRNA databases. About 17% of hits were ERPIN-specific. They showed better structural characteristics than BLAST-specific hits and included interesting candidates such as members of the miR-17 cluster in Tetraodon. Profile-based RNA detection will be an important complement to similarity search programs in the completion of miRNA collections.
2.
Transcription and processing of human microRNA precursors (total citations: 17; self-citations: 0; citations by others: 17)
Cullen BR. Molecular Cell, 2004, 16(6): 861-865
3.
4.
Correlation between sequence conservation and the genomic context after gene duplication (total citations: 4; self-citations: 0; citations by others: 4)
A key complication in comparative genomics for reliable gene function prediction is the existence of duplicated genes. To study the effect of gene duplication on function prediction, we analyze orthologs between pairs of genomes in which, in one genome, the orthologous gene has duplicated after the speciation of the two genomes (i.e. inparalogs). For these duplicated genes, we investigate whether the gene that is most similar at the sequence level is also the gene that has retained the ancestral gene neighborhood. Although the majority of investigated cases show a consistent pattern between sequence similarity and gene-neighborhood conservation, a substantial fraction, 29–38%, is inconsistent. This inconsistency is not a chance outcome owing to a lack of divergence time between the inparalogs; rather, it appears to be a chance outcome caused by both inparalogs evolving at very similar rates relative to their ortholog. If one-to-one orthologous relationships are required, it is advisable to combine contextual information (i.e. gene neighborhood in prokaryotes and co-expression in eukaryotes) with protein sequence information to predict the most probable functionally equivalent ortholog in the presence of inparalogs.
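The consistency test described in this abstract can be sketched as below. This is a minimal illustration, not the authors' code: it assumes each case is a hypothetical mapping from inparalog name to (sequence similarity to the ortholog, whether that inparalog kept the ancestral gene neighborhood), and the function names are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical record for one ortholog and its inparalogs in the other genome:
# inparalog name -> (sequence similarity to the ortholog,
#                    True if this inparalog kept the ancestral gene neighborhood)
Case = Dict[str, Tuple[float, bool]]


def is_consistent(case: Case) -> bool:
    """True if the inparalog most similar in sequence to the ortholog is also
    one that retained the ancestral gene neighborhood (ties: first one wins)."""
    most_similar = max(case, key=lambda name: case[name][0])
    return case[most_similar][1]


def inconsistent_fraction(cases: List[Case]) -> float:
    """Share of cases where sequence similarity and gene neighborhood disagree."""
    if not cases:
        return 0.0
    return sum(not is_consistent(c) for c in cases) / len(cases)
```

For example, with one case where the most similar inparalog kept the neighborhood and one where it did not, `inconsistent_fraction` returns 0.5.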
5.
This study was undertaken to identify novel candidate genes at quantitative trait loci (QTL) on chicken chromosome Z (GGAZ) by comparing orthologous regions of the chicken, human and mouse genomes. Primer sequences of markers flanking QTL positions (https://acedb.asg.wur.nl/) were obtained from www.iastate.edu/chickmap and searched against the chicken genome (www.ensembl.org) using BLASTN. The best matches were those with the highest score, lowest E-value and highest percent identity. Orthologous regions in mouse and human, together with the genes located on or around those loci, were identified using the Ensembl website. Forty-six chicken genes, 91 mouse genes and 60 human genes associated with QTL on GGAZ were identified in the current study. Among the most promising candidate genes for egg production and eggshell quality are annexin A1 (ANXA1), osteoclast stimulating factor (OSF), thrombospondin-4 (THBS4), programmed cell death proteins (PDCD), follistatin (FST), growth hormone receptor (GHR), and interferon (IFN) alpha and beta. The chicken IFN alpha and beta genes were located on GGAZ around position 13,000,000 bp on the draft chicken sequence map. The neuronal nicotinic acetylcholine receptor (nAChR) is located in a QTL region for abdominal fat (GGAZ, 25,483,091 bp). Nicotine is an agonist at nAChRs and has been shown to decrease lipolysis and triglyceride uptake, thereby reducing net storage in adipose tissue. Therefore, nAChRs could be used as therapeutic targets for regulating feed intake and obesity. This study has identified 197 putative candidate genes in probable QTL regions of chicken chromosome Z.
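The best-match rule in this abstract (highest score, lowest E-value, highest percent identity) can be sketched as a per-query ranking. This sketch assumes the searches were saved in BLAST+ tabular format (`-outfmt 6`, default 12 columns) and that the three criteria are applied in that priority order; the abstract states neither, so both are assumptions.

```python
import csv
from typing import Dict, Iterable, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def rank_key(hit: Dict[str, str]) -> Tuple[float, float, float]:
    """Higher is better: bit score first, then lower E-value, then identity."""
    return (float(hit["bitscore"]), -float(hit["evalue"]), float(hit["pident"]))


def best_hits(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Keep the single best-ranked hit for each query (e.g. each marker primer)."""
    best: Dict[str, Dict[str, str]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blank and comment lines
            continue
        hit = dict(zip(FIELDS, row))
        query = hit["qseqid"]
        if query not in best or rank_key(hit) > rank_key(best[query]):
            best[query] = hit
    return best
```

Sorting on a tuple keeps the tie-breaking explicit: the E-value and identity only matter when bit scores are equal.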
6.
Evolutionary conservation and selection of human disease gene orthologs in the rat and mouse genomes (total citations: 2; self-citations: 1; citations by others: 2)
Huang H, Winter EE, Wang H, Weinstock KG, Xing H, Goodstadt L, Stenson PD, Cooper DN, Smith D, Albà MM, Ponting CP, Fechtel K. Genome Biology, 2004, 5(7): R47-15
Background
Model organisms have contributed substantially to our understanding of the etiology of human disease and have assisted in the development of new treatment modalities. The availability of the human, mouse and, most recently, rat genome sequences now permits the comprehensive investigation of the rodent orthologs of genes associated with human disease. Here, we investigate whether human disease genes differ significantly from their rodent orthologs with respect to their overall levels of conservation and their rates of evolutionary change.
Results
Human disease genes are unevenly distributed among human chromosomes and are highly represented (99.5%) among human-rodent ortholog sets. Different categories of human disease genes differ in evolutionary conservation and selection. Although selection appears not to have greatly discriminated between disease and non-disease genes, synonymous substitution rates are significantly higher for disease genes. In neurological and malformation syndrome disease systems, associated genes have evolved slowly, whereas genes of the immune, hematological and pulmonary disease systems have changed more rapidly. Amino-acid substitutions associated with human inherited disease occur at sites that are more highly conserved than average; nevertheless, 15 substituting amino acids associated with human disease were identified as wild-type amino acids in the rat. Rodent orthologs of human trinucleotide repeat-expansion disease genes were found to contain substantially fewer such repeats. Six human genes that share the same characteristics as triplet repeat-expansion disease-associated genes were identified; although four of these genes are expressed in the brain, none is currently known to be associated with disease.
Conclusions
Most human disease genes have been retained in rodent genomes. Synonymous nucleotide substitutions occur at a higher rate in disease genes, a finding that may reflect increased mutation rates in the chromosomal regions in which disease genes are found. Rodent orthologs associated with neurological function exhibit the greatest evolutionary conservation; this suggests that rodent models of human neurological disease are likely to represent human disease processes most faithfully. However, for neurological human disease genes associated with triplet repeat expansion, the contraction of rodent trinucleotide repeats relative to human suggests that rodent loci may not reach the 'critical repeat threshold' necessary to undergo spontaneous pathological repeat expansions. The identification in this study of six genes that have multiple characteristics associated with repeat expansion-disease genes raises the possibility that not all human loci capable of facilitating neurological disease by repeat expansion have yet been identified.
7.
Wynn SL, Fisher RA, Pagel C, Price M, Liu QY, Khan IM, Zammit P, Dadrah K, Mazrani W, Kessling A, Lee JS, Buluwela L. Genomics, 2000, 68(1): 57-62
The SON gene, which maps to human chromosome 21q22.1-q22.2, encodes a novel regulatory protein. Here we describe the organization of the Son locus in the mouse genome. The mouse Son gene spans a region of approximately 35 kb. The coding region is more than 8 kb in length and has been completely sequenced. The gene is organized into 11 coding exons and 1 noncoding 3'UTR exon, with over 70% of the coding region residing in one 5.7-kb exon. The gene contains at least one alternative exon, N/C exon 1, which can be used, by splicing, to generate a truncated form of the SON protein. Further investigation of the mouse Son locus has identified the genes directly flanking Son. The glycinamide ribonucleotide formyltransferase gene, Gart, is encoded 5' of Son in a head-to-head arrangement, with the starts of both genes lying within 899 bp of each other. Sequence comparison with the expressed sequence tag database identified a novel gene within 65 bp of the 3' end of Son, which we have named Donson. In this unusually compact gene cluster, we have found overlap in the pattern of expression between Gart, Son, and Donson. However, at least two of these genes have very different functions. While GART is involved in purine biosynthesis, we find that SON shows the characteristics of "SR-type" proteins, which are involved in mRNA processing and gene expression.
8.
9.
Long-term conservation of six duplicated structural genes in cephalopod mitochondrial genomes (total citations: 1; self-citations: 0; citations by others: 1)
Yokobori S, Fukuda N, Nakamura M, Aoyama T, Oshima T. Molecular Biology and Evolution, 2004, 21(11): 2034-2046
The complete nucleotide sequences of the mitochondrial (mt) genomes of three cephalopods, Octopus vulgaris (Octopodiformes, Octopoda, Incirrata), Todarodes pacificus (Decapodiformes, Oegopsida, Ommastrephidae), and Watasenia scintillans (Decapodiformes, Oegopsida, Enoploteuthidae), were determined. These three mt genomes encode the standard set of metazoan mt genes. However, the W. scintillans and T. pacificus mt genomes share duplications of the longest noncoding region, three cytochrome oxidase subunit genes, two ATP synthase subunit genes, and the tRNA(Asp) gene. Southern hybridization analysis of the W. scintillans mt genome shows that this single genome carries both duplicated regions. The near-identical sequences of the duplicates suggest that concerted evolutionary mechanisms operate, at least in cephalopod mitochondria. Molecular phylogenetic analyses of mt protein genes are suggestive, although not statistically significantly so, of a monophyletic relationship between W. scintillans and T. pacificus.
10.
Mouse models are often used to study human genes because the expression and function of the majority of orthologous genes are believed to be similar between the two species. However, recent comparisons of microarray data from thousands of orthologous human and mouse genes suggested rapid evolution of gene expression profiles under minimal or no selective constraint. These findings appear to contradict non-array-based observations from many individual genes and would imply that mouse models are useless for studying human genes. Because absolute levels of gene expression are not comparable between species when the data are generated by species-specific microarrays, the use of relative mRNA abundance among tissues (RA) is preferred to that of absolute expression signals. We thus reanalyze human and mouse genome-wide gene expression data generated by oligonucleotide microarrays. We show that the mean correlation coefficient among expression profiles detected by different probe sets of the same gene is only 0.38 for humans and 0.28 for mice, indicating that current measures of expression divergence are flawed: the large estimation error (the discrepancy in expression signal detected by different probe sets of the same gene) is mistakenly included in the between-species divergence. When this error is subtracted, 84% of human-mouse orthologous gene pairs show significantly lower expression divergence than random gene pairs. In contrast to a previous finding, but consistent with common sense, expression profiles of orthologous tissues between species are more similar to each other than to those of nonorthologous tissues. Furthermore, the evolutionary rate of expression divergence and that of coding sequence divergence are found to be weakly but significantly positively correlated when RA and the Euclidean distance are used to measure expression-profile divergence.
These results highlight the importance of properly considering various estimation errors when comparing microarray data between species.
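The two measures named in this abstract, the correlation between probe-set profiles and the Euclidean distance between RA profiles, can be sketched in a few lines. This is an illustrative sketch under an assumption: RA is taken here to be each tissue's signal divided by the gene's total signal over the tissues compared, which the abstract does not spell out. The function names are hypothetical, and the sketch does not perform the paper's error subtraction.

```python
import math
from typing import List, Sequence


def relative_abundance(signals: Sequence[float]) -> List[float]:
    """RA, assumed here to be each tissue's signal divided by the gene's
    total signal over the tissues being compared."""
    total = sum(signals)
    if total <= 0:
        raise ValueError("total expression signal must be positive")
    return [s / total for s in signals]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("profiles must cover the same tissues")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation, e.g. between two probe sets of the same gene."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length profiles of length >= 2")
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation undefined for a constant profile")
    return cov / math.sqrt(var_a * var_b)


def expression_divergence(human: Sequence[float], mouse: Sequence[float]) -> float:
    """Euclidean distance between RA profiles; tissues must be listed in the
    same (orthologous) order for both species."""
    return euclidean(relative_abundance(human), relative_abundance(mouse))
```

Because RA divides out the gene's total signal, a species-specific array that reports every tissue at half the scale, such as `[50, 150, 300]` versus `[100, 300, 600]`, yields zero divergence. This illustrates why the abstract prefers RA over absolute signals.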
11.
12.
13.
14.
The completion of the human genome reference (HGR) marked the beginning of the genomics era, yet despite its utility, its universal application is limited by the small number of individuals used in its development. This is highlighted by the presence of high-quality sequence reads that fail to map to the HGR. Unmapped sequences generally represent 2–5% of total reads and may harbor regions that would enhance our understanding of population variation, evolution, and disease. Alternatively, complete de novo assemblies can be created, but these effectively ignore the groundwork of the HGR. In an effort to find a middle ground, we developed a bioinformatic pipeline that maps paired-end reads to the HGR as separate single reads, exports unmappable reads, de novo assembles these reads per individual, and then combines the assemblies into a secondary reference assembly used for comparative analysis. Using 45 diverse 1000 Genomes Project individuals, we identified 351,361 contigs covering 195.5 Mb of sequence not incorporated in GRCh38. Of these, 30,879 contigs are represented in multiple individuals, with ~40% showing high sequence complexity. Genomic coordinates were generated for 99.9%, with 52.5% exhibiting high-quality mapping scores. Comparative genomic analyses with archaic humans and primates revealed significant sequence alignments, and comparisons with model organism RefSeq gene datasets identified novel human genes. If incorporated, these sequences will expand the HGR; more importantly, our data show that with this method, low-coverage (~10–20×) next-generation sequencing can still be used to identify novel unmapped sequences and to explore biological functions contributing to human phenotypic variation, disease and functionality for personal genomic medicine.
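The "export unmappable reads" step of this pipeline can be illustrated with the SAM FLAG field, where bit 0x4 marks an unmapped segment. This sketch assumes the alignments are available as plain-text SAM. The abstract does not name the aligner or file format, and this step does not cover the subsequent de novo assembly. Secondary and supplementary records are skipped so that each read is emitted at most once.

```python
from typing import Iterable, Iterator, Tuple

# SAM FLAG bits (SAM format specification)
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read name, sequence) for primary records flagged as unmapped."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip header lines
            continue
        cols = line.rstrip("\n").split("\t")
        name, flag, seq = cols[0], int(cols[1]), cols[9]  # QNAME, FLAG, SEQ
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        if flag & FLAG_UNMAPPED:
            yield name, seq


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Format records as FASTA, e.g. as input for a per-individual de novo assembly."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```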
15.
We previously described the production of monoclonal antibodies against a preparation of membrane glycoproteins from human brain [Berglund et al. (1987) J. Neurochem. 48, 809-815]. One of the glycoproteins, recognized by monoclonal antibody CF3, was specifically expressed in the brain. We now report the isolation and characterization of this glycoprotein, called glycoprotein 135 (Gp135). Gp135 was purified from a crude membrane extract of human brain cortex by lentil lectin affinity chromatography and by immunoaffinity chromatography using monoclonal antibody CF3. Gp135 was shown to consist of a single glycosylated polypeptide chain with an apparent molecular mass of 135 kDa. The size of the polypeptide moiety was estimated at 115 kDa following N-glycanase digestion. The glycoprotein is anchored in the membrane by a glycosylphosphatidylinositol tail, as shown by phospholipase C digestion and liposome incorporation experiments. Amino acid sequence analysis of the amino terminus, and of an internal peptide obtained by V8 protease digestion of the glycoprotein, revealed a strong similarity to three previously described glycoproteins from chicken (contactin and F11) and mouse (F3) brains. These glycoproteins belong to the immunoglobulin superfamily and are implicated in cell adhesion phenomena in the developing brain. Gp135 may be the human counterpart of one or more of these glycoproteins.
16.
17.
18.
Sequence similarity search is a fundamental way of analyzing nucleotide sequences. Despite decades of research, this is not a solved problem, because there exist many similarities that current methods do not find. Search methods are typically based on a seed-and-extend approach, which has many variants (e.g. spaced seeds, transition seeds), and it remains unclear how to optimize this approach. This study designs and tests seeding methods for inter-mammal and inter-insect genome comparison. By considering the substitution patterns of real genomes, we design sets of multiple complementary transition seeds, which perform better (sensitivity per run time) than previous seeding strategies. The best seed patterns often have more transition positions than those used previously. We also point out that recent computer memory sizes (e.g. 60 GB) make it feasible to use multiple (e.g. eight) seeds for whole mammal genomes. Interestingly, the most sensitive settings reach diminishing returns for the human–dog and melanogaster–pseudoobscura comparisons, but not for human–mouse, which suggests that we still miss many human–mouse alignments. Our optimized heuristics find ~20,000 new human–mouse alignments that are missing from the standard UCSC alignments. We tabulate seed patterns and parameters that work well so that they can be used in future research.
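To make the idea of transition seeds concrete, the toy sketch below tests whether a seed pattern fits a gapless alignment of two equal-length sequences. It also shows the "multiple complementary seeds" rule: a region counts as found if any seed in the set hits. The seed notation is hypothetical: `1` means exact match, `T` means match or transition (A↔G, C↔T), and `0` means don't care. Real aligners index a genome rather than scanning every offset, so this sketch shows only the matching rule and does not implement the paper's seed design.

```python
from typing import Iterable, List

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def position_ok(symbol: str, a: str, b: str) -> bool:
    """Check one aligned base pair against one seed position.

    Hypothetical notation: '1' = exact match, 'T' = match or transition
    (A<->G, C<->T), '0' = don't care.
    """
    if symbol == "0":
        return True
    if symbol == "1":
        return a == b
    if symbol == "T":
        return a == b or {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    raise ValueError(f"unknown seed symbol: {symbol!r}")


def seed_hits(pattern: str, s1: str, s2: str) -> List[int]:
    """Offsets where the seed fits a gapless alignment of s1 and s2."""
    if len(s1) != len(s2):
        raise ValueError("expects two aligned sequences of equal length")
    s1, s2 = s1.upper(), s2.upper()
    width = len(pattern)
    return [i for i in range(len(s1) - width + 1)
            if all(position_ok(p, s1[i + k], s2[i + k])
                   for k, p in enumerate(pattern))]


def detected(seeds: Iterable[str], s1: str, s2: str) -> bool:
    """With multiple complementary seeds, a region is found if any seed hits."""
    return any(seed_hits(p, s1, s2) for p in seeds)
```

For example, `AAAA` aligned to `AGAA` contains an A↔G transition: the exact seed `1111` misses it, while `1T11` hits at offset 0. An A↔C transversion (`ACAA`) defeats both seeds.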
19.
A very simple new program, G-SQUARES, is presented. It is useful for visualizing the composition and basic structural features of whole genomes and selected chromosome regions. The frequencies of all dimer and tetramer sequences are reported. Overall structural features, such as the tendency for alternation, are calculated. Different sequences can easily be compared visually. Furthermore, the visualized features point to further studies that should be carried out. Examples are presented for Alu sequences, CpG islands, and whole eukaryotic and bacterial genomes.
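G-SQUARES's own code and output format are not described in this abstract. The sketch below shows only the kind of dimer and tetramer frequency counting it mentions. It assumes overlapping windows read on one strand, and it skips windows containing symbols other than A, C, G and T (for example N).

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_frequencies(seq: str, k: int) -> Dict[str, float]:
    """Relative frequency of every ACGT k-mer in overlapping windows of seq.

    Windows containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)
    return {km: (counts[km] / total if total else 0.0) for km in kmers}


def dimers(seq: str) -> Dict[str, float]:
    return kmer_frequencies(seq, 2)      # 16 possible dimers


def tetramers(seq: str) -> Dict[str, float]:
    return kmer_frequencies(seq, 4)      # 256 possible tetramers
```

Reporting all 4^k k-mers, including those with zero counts, yields fixed-size profiles that can be compared directly between sequences.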