Similar literature
 20 similar articles found (search time: 31 ms)
1.
MicroRNAs are short (approximately 22 nt) regulatory RNA molecules that play key roles in metazoan development and have been implicated in human disease. Since their first discovery in Caenorhabditis elegans, over 2500 microRNAs have been isolated in metazoans and plants; it has been estimated that there may be more than a thousand microRNA genes in the human genome alone. Motivated by the experimental observation of strong conservation of the microRNA let-7 among nearly all metazoans, we developed a novel methodology to characterize the class of such strongly conserved sequences: we identified a non-redundant set of all sequences 20 to 29 bases in length that are shared among three insects: fly, bee and mosquito. Among the few hundred sequences greater than 20 bases in length are close to 40% of the 78 confirmed fly microRNAs, along with other non-coding RNAs and coding sequence.
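The shared-sequence search described above can be illustrated with a small sketch. It is not the authors' pipeline: it assumes exact matching on one strand only, and the function names, the 22-nt motif and the three toy sequences are hypothetical stand-ins for the fly, bee and mosquito genomes.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(genomes, k):
    """Return the k-mers that occur in every sequence in genomes."""
    sets = [kmers(g.upper(), k) for g in genomes]
    return set.intersection(*sets)


# Hypothetical 22-nt conserved motif embedded in different flanking sequence.
MOTIF = "TGAGGTAGTAGGTTGTATAGTT"
fly = "ACGT" + MOTIF + "CCA"
bee = "TT" + MOTIF + "GGA"
mosquito = "CC" + MOTIF + "AAC"

shared_22 = shared_kmers([fly, bee, mosquito], k=22)
shared_20 = shared_kmers([fly, bee, mosquito], k=20)
```

A real analysis would also have to consider the reverse strand and collapse overlapping k-mers into a non-redundant set, which this sketch does not attempt.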

2.
3.
4.
5.
6.
Neisseria meningitidis is one of the main agents of bacterial meningitis, causing substantial morbidity and mortality worldwide. However, most of the time N. meningitidis is carried as a commensal not associated with invasive disease. The genomic basis of the difference between disease-associated and carried isolates of N. meningitidis may provide critical insight into mechanisms of virulence, yet it has remained elusive. Here, we have taken a comparative genomics approach to interrogate the difference between disease-associated and carried isolates of N. meningitidis at the level of individual nucleotide variations (i.e., single nucleotide polymorphisms [SNPs]). We aligned complete genome sequences of 8 disease-associated and 4 carried isolates of N. meningitidis to search for SNPs that show mutually exclusive patterns of variation between the two groups. We found 63 SNPs that distinguish the 8 disease-associated genomes from the 4 carried genomes of N. meningitidis, which is far more than can be expected by chance alone given the level of nucleotide variation among the genomes. The putative list of SNPs that discriminate between disease-associated and carriage genomes may be expected to change with increased sampling or changes in the identities of the isolates being compared. Nevertheless, we show that these discriminating SNPs are more likely to reflect phenotypic differences than shared evolutionary history. Discriminating SNPs were mapped to genes, and the functions of the genes were evaluated for possible connections to virulence mechanisms. A number of overrepresented functional categories related to virulence were uncovered among SNP-associated genes, including genes related to the category "symbiosis, encompassing mutualism through parasitism."
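One way to read "mutually exclusive patterns of variation" is that, at a given alignment column, no allele seen in one group appears in the other. The sketch below implements that reading on toy data. The function name, the gap handling and the sequences are assumptions made for illustration, not the authors' procedure.

```python
def discriminating_columns(group_a, group_b):
    """Return 0-based alignment columns whose allele sets are disjoint
    between the two groups. Columns containing a gap ('-') are skipped."""
    length = len(group_a[0])
    if any(len(s) != length for s in group_a + group_b):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for i in range(length):
        alleles_a = {s[i] for s in group_a}
        alleles_b = {s[i] for s in group_b}
        if "-" in alleles_a | alleles_b:
            continue
        if alleles_a.isdisjoint(alleles_b):
            hits.append(i)
    return hits


# Toy aligned sequences (hypothetical, not N. meningitidis data).
disease = ["ACGTA", "ACGTA", "ACGAA"]
carried = ["ACCTA", "ACCTT"]
snp_columns = discriminating_columns(disease, carried)
```

Here only column 2 qualifies: columns 3 and 4 vary, but each shares an allele between the groups.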

7.
8.
9.
Although whole human genome sequencing can be done with readily available technical and financial resources, the need for detailed analyses of genomes of certain populations still exists. Here we present, for the first time, sequencing and analysis of a Turkish human genome. We sequenced the genome to 35x coverage using paired-end sequencing; over 95% of sequencing reads mapped to the reference genome, covering more than 99% of its bases. The assembly of unmapped reads rendered 11,654 contigs, 2,168 of which did not reveal any homology to known sequences, resulting in ∼1 Mbp of unmapped sequence. Single nucleotide polymorphism (SNP) discovery resulted in 3,537,794 SNP calls with 29,184 SNPs identified in coding regions, where 106 were nonsense and 259 were categorized as having a high-impact effect. The homozygous/heterozygous (1,415,123∶2,122,671 or 1∶1.5) and transition/transversion ratios (2,383,204∶1,154,590 or 2.06∶1) were within expected limits. Of the identified SNPs, 480,396 were potentially novel with 2,925 in coding regions, including 48 nonsense and 95 high-impact SNPs. Functional analysis of novel high-impact SNPs revealed various interaction networks, notably involving hereditary and neurological disorders or diseases. Assembly results indicated 713,640 indels (1∶1.09 insertion/deletion ratio), ranging from −52 bp to 34 bp in length and causing about 180 codon insertion/deletions and 246 frame shifts. Using paired-end- and read-depth-based methods, we discovered 9,109 structural variants and compared our variant findings with other populations. Our results suggest that whole genome sequencing is a valuable tool for understanding variations in the human genome across different populations. Detailed analyses of genomes of diverse origins greatly benefit research in genetics and medicine and should be conducted on a larger scale.
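The ratios quoted in this abstract can be checked with a few lines of arithmetic. The counts below are copied from the abstract. The helper is_transition is a hypothetical illustration of the standard purine/pyrimidine rule and is not taken from the paper.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions;
    False for transversions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


# Counts as reported in the abstract.
transitions, transversions = 2_383_204, 1_154_590
homozygous, heterozygous = 1_415_123, 2_122_671

titv_ratio = transitions / transversions      # transitions per transversion
het_per_hom = heterozygous / homozygous       # heterozygous per homozygous
total_snps = transitions + transversions
```

Both pairs of counts sum to the 3,537,794 SNP calls reported, and the two ratios round to the 2.06 and 1.5 given in the text.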

10.
11.
12.
13.
It has become clear that a large proportion of functional DNA in the human genome does not code for protein. Identification of this non-coding functional sequence using comparative approaches is proving difficult and has previously been thought to require deep sequencing of multiple vertebrates. Here we introduce a new model and comparative method that, instead of nucleotide substitutions, uses the evolutionary imprint of insertions and deletions (indels) to infer the past consequences of selection. The model predicts the distribution of indels under neutrality, and shows an excellent fit to human–mouse ancestral repeat data. Across the genome, many unusually long ungapped regions are detected that are unaccounted for by the neutral model, and which we predict to be highly enriched in functional DNA that has been subject to purifying selection with respect to indels. We use the model to determine the proportion under indel-purifying selection to be between 2.56% and 3.25% of human euchromatin. Since annotated protein-coding genes comprise only 1.2% of euchromatin, these results lend further weight to the proposition that more than half the functional complement of the human genome is non-protein-coding. The method is surprisingly powerful at identifying selected sequence using only two or three mammalian genomes. Applying the method to the human, mouse, and dog genomes, we identify 90 Mb of human sequence under indel-purifying selection, at a predicted 10% false-discovery rate and 75% sensitivity. As expected, most of the identified sequence represents unannotated material, while the recovered proportions of known protein-coding and microRNA genes closely match the predicted sensitivity of the method. The method's high sensitivity to functional sequence such as microRNAs suggests that as yet unannotated microRNA genes are enriched among the sequences identified.
Furthermore, its independence of substitutions allowed us to identify sequence that has been subject to heterogeneous selection, that is, sequence subject to both positive selection with respect to substitutions and purifying selection with respect to indels. The ability to identify elements under heterogeneous selection enables, for the first time, the genome-wide investigation of positive selection on functional elements other than protein-coding genes.
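The raw quantity such a method works from is the length of each ungapped stretch in an alignment. The sketch below only extracts those lengths from a pairwise alignment. It does not implement the authors' neutral model, and the function name and toy alignment are hypothetical.

```python
def ungapped_lengths(aln_a, aln_b):
    """Return the lengths of maximal runs of columns in which neither
    aligned sequence has a gap character ('-')."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    lengths, run = [], 0
    for x, y in zip(aln_a, aln_b):
        if x == "-" or y == "-":
            if run:
                lengths.append(run)
            run = 0
        else:
            run += 1
    if run:
        lengths.append(run)
    return lengths


# Toy pairwise alignment (hypothetical data).
row_a = "ACGT--ACGTACG-A"
row_b = "ACG-TTACGTACGTA"
segments = ungapped_lengths(row_a, row_b)
```

Segments much longer than a neutral expectation would be the candidates of interest; choosing that expectation is the modelling step the abstract describes, and it is left out here.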

14.
15.
Mononucleotide repeats (MNRs) are abundant in eukaryotic genomes and exhibit a high degree of length variability due to insertion and deletion events. However, the relationship between these repeats and mutation rates in surrounding sequences has not been systematically investigated. We have analyzed the frequency of single nucleotide polymorphisms (SNPs) at positions close to and within MNRs in the human genome. Overall, we find a 2- to 4-fold increase in the SNP frequency at positions immediately adjacent to the boundaries of MNRs, relative to that at more distant bases. This relationship exhibits a strong asymmetry between 3' and 5' ends of repeat tracts and is dependent upon the repeat motif, length and orientation of surrounding repeats. Our analysis suggests that the incorporation or exclusion of bases adjacent to the boundary of the repeat through substitutions, in which these nucleotides mutate towards or away from the base present within the repeat, respectively, may be another mechanism by which MNRs expand and contract in the human genome.
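Locating repeat tracts and the positions immediately adjacent to them is the first step of an analysis like this one. A minimal sketch follows, assuming a minimum tract length of 5 (a threshold not stated in the abstract) and using hypothetical function names and a toy sequence.

```python
import re


def find_mnrs(seq, min_len=5):
    """Return (start, end, base) for each run of a single base that is at
    least min_len long. Coordinates are 0-based and end is exclusive."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(seq.upper())]


def flanking_positions(seq, start, end):
    """Return the positions immediately 5' and 3' of a tract, if they exist."""
    return [p for p in (start - 1, end) if 0 <= p < len(seq)]


# Toy sequence (hypothetical data).
seq = "GCAAAAAATCGTTTTTG"
tracts = find_mnrs(seq)
flanks = [flanking_positions(seq, s, e) for s, e, _ in tracts]
```

SNP counts at the flanking positions could then be compared with counts at more distant positions, which is the comparison the abstract reports.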

16.
Kim KJ, Lee HL. Molecules and Cells, 2005, 19(1):104-113
Large inversions are well characterized in the chloroplast genomes of land plants. In contrast, reports of small inversions are rare and involve limited plant groups. In this study, we report the widespread occurrence of small inversions ranging from 5 to 50 bp in fully and partially sequenced chloroplast genomes of both monocots and dicots. We found that small inversions were much more common than large inversions. The small inversions were scattered over the chloroplast genome including the IR, SSC, and LSC regions. Several small inversions were uncovered in chloroplast genomes even though they shared the same overall gene order. The majority of these small inversions were located within 100 bp downstream of the 3' ends of genes. All had inverted repeat sequences, ranging from 11 to 24 bp, at their ends. Such small inversions form stem-loop hairpin structures that usually have the function of stabilizing the corresponding mRNA molecules. Intra-molecular recombination between the inverted sequences in the stem-forming regions is responsible for generating flip-flop orientations of the loops. The presence of two different orientations of the stem-loop in the trnL-F noncoding region of a single species, Jasminum elegans, suggests that a short inversion can be generated within a short period of time. Small inversions of non-coding sequences may influence sequence alignment and character interpretation in phylogeny reconstructions, as shown in nine species of Jasminum. Many small inversions may have been generated by parallel or back mutation events during chloroplast genome evolution. Our data indicate that caution is needed when using chloroplast non-coding sequences for phylogenetic analysis.
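The flip-flop behaviour described here has a simple sequence-level consequence: when the two flanks are inverted repeats, reverse-complementing the whole hairpin leaves the stems in place and inverts only the loop. The sketch below demonstrates this on a toy hairpin. The function names and sequences are hypothetical and are not taken from the paper.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_inverted_repeat(left, right):
    """True if right is the reverse complement of left, so that the two
    flanks can pair into a stem."""
    return reverse_complement(left) == right.upper()


def flip_loop(left, loop, right):
    """Return the hairpin with the loop in the opposite orientation."""
    return left + reverse_complement(loop) + right


# Toy hairpin with an 11-bp stem and a 5-bp loop (hypothetical data).
left = "GGATCCTTAGC"
right = reverse_complement(left)
loop = "AATGC"
hairpin = left + loop + right
flipped = flip_loop(left, loop, right)
```

Because the flipped and unflipped forms differ at several adjacent bases, an alignment that treats each mismatch as an independent substitution would overcount changes, which is the caution raised in the abstract.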

17.
18.
19.
Fast algorithms for large-scale genome alignment and comparison   Cited by: 35 (self-citations: 5, citations by others: 30)
We describe a suffix-tree algorithm that can align the entire genome sequences of eukaryotic and prokaryotic organisms with minimal use of computer time and memory. The new system, MUMmer 2, runs three times faster while using one-third as much memory as the original MUMmer system. It has been used successfully to align the entire human and mouse genomes to each other, and to align numerous smaller eukaryotic and prokaryotic genomes. A new module permits the alignment of multiple DNA sequence fragments, which has proven valuable in the comparison of incomplete genome sequences. We also describe a method to align more distantly related genomes by detecting protein sequence homology. This extension to MUMmer aligns two genomes after translating the sequence in all six reading frames, extracts all matching protein sequences and then clusters the matches together. This method has been applied to both incomplete and complete genome sequences in order to detect regions of conserved synteny, in which multiple proteins from one organism are found in the same order and orientation in another. The system code is being made freely available by the authors.
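The six-frame translation step mentioned above can be sketched independently of MUMmer. The code below is not MUMmer's implementation: it assumes the standard genetic code, writes stop codons as "*" and unknown codons as "X", and uses hypothetical function names.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame(seq):
    """Return translations for frames +1, +2, +3, -1, -2, -3, in that order."""
    rc = reverse_complement(seq)
    return [translate(s[offset:]) for s in (seq, rc) for offset in range(3)]


frames = six_frame("ATGGCCTAA")
```

Matching the resulting protein strings between two genomes, rather than the DNA itself, is what lets more distantly related sequences be compared, since many nucleotide changes leave the encoded amino acid unchanged.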

20.
We examined evolutionary mechanisms in the tetraploid Elymus caninus by comparing the phylogenetic relationships of 21 accessions suggested by sequence data from two single copy nuclear genes, the largest subunit of RNA polymerase II (RPB2) and phosphoenolpyruvate carboxylase (pepC), and one non-coding chloroplast region, TrnD/T. Elymus caninus is known to combine two different genomes, an St genome and an H genome. Data from the two single copy nuclear genes showed that there are two versions of the St genome in the species, St1 and St2. Most accessions combined one of these versions with an H genome version, but two accessions had both versions of the St sequence for RPB2. This suggests that the RPB2 gene may have been duplicated without chromosome doubling, possibly induced by a transposable element. Our data also indicate that the H genome sequences in E. caninus have multiple origins, and show a close phylogenetic relationship between Hordeum bogdanii and the H sequences in some accessions of E. caninus. Thus, it is likely that H. bogdanii is one of the major donors of the H copy in E. caninus. The maternal origin of E. caninus is the St genome species. There was no correlation between the geographic origin of the accessions and their sequence divergence.
