Similar literature
 Found 20 similar articles; search took 15 ms
1.
Genetic variation in 49 human papillomavirus type 6 (HPV6) and 22 HPV11 isolates from recurrent respiratory papillomatosis (RRP) (n = 17), genital warts (n = 43), anal cancer (n = 6) and cervical neoplasia cells (n = 5) was determined by sequencing the long control region (LCR) and the E6 and E7 genes. Genetic variability was compared to determine whether different disease states resulting from HPV6 or HPV11 infection cluster into distinct variant groups. Sequence variation analysis of HPV6 revealed that isolates cluster into variants within previously described HPV6 lineages, with the majority (65%) clustering to HPV6 sublineage B1 across the three genomic regions examined. Overall, 72 HPV6 and 25 HPV11 single nucleotide variations, insertions and deletions were observed in the samples examined. In addition, missense alterations were observed in the E6/E7 genes for 6 HPV6 and 5 HPV11 variants. No nucleotide variations were identified in any isolate at the four E2 binding sites for HPV6 or HPV11, nor were any isolates found to be identical to the HPV6 lineage A or HPV11 sublineage A1 reference genomes. Overall, a high degree of sequence conservation was observed between isolates across each of the regions investigated for both HPV6 and HPV11. Analysis of genetic variants identified a slight association between HPV6 and anogenital lesions (p = 0.04). This study provides important information on the genetic diversity of circulating HPV6 and HPV11 variants within the Australian population and supports the observation that the majority of HPV6 isolates cluster to the HPV6 sublineage B1, with anogenital lesions demonstrating an association with this sublineage (p = 0.02). Comparative analysis, based on the LCR, of Australian HPV6 and HPV11 isolates with those from other geographical regions revealed a high degree of sequence similarity throughout the world, confirming previous observations that there are no geographically specific variants for these HPV types.

2.
An international effort is underway to generate a comprehensive haplotype map (HapMap) of the human genome represented by an estimated 300 000 to 1 million ‘tag’ single nucleotide polymorphisms (SNPs). Our analysis indicates that the current human SNP map is not sufficiently dense to support the HapMap project. For example, 24.6% of the genome currently lacks SNPs at the minimal density and spacing that would be required to construct even a conservative tag SNP map containing 300 000 SNPs. In an effort to improve the human SNP map, we identified 140 696 additional SNP candidates using a new bioinformatics pipeline. Over 51 000 of these SNPs mapped to the largest gaps in the human SNP map, leading to significant improvements in these regions. Our SNPs will be immediately useful for the HapMap project, and will allow for the inclusion of many additional genomic intervals in the final HapMap. Nevertheless, our results also indicate that additional SNP discovery projects will be required both to define the haplotype architecture of the human genome and to construct comprehensive tag SNP maps that will be useful for genetic linkage studies in humans.

3.
4.
Original results of the analysis of genetic linkage between some genomic markers and two complex clinical phenotypes, schizophrenia and mental retardation, in pedigrees from Daghestan genetic isolates are described. Interpopulation differences in the epidemiology of the complex phenotypes and in their genetic linkage were demonstrated. These differences are evidently related to the genetic structure of the isolates, which is determined by their demographic history. The epidemiological index MR, which characterizes the lifetime morbid risk of schizophrenia, varies from 0 to 4.95% in the Daghestan isolates studied; the upper value is almost five times the average worldwide population rate of 1%. Comparative genetic mapping in different isolates permitted determination of the most probable genetic linkages and associations of loci in chromosomal regions 17p11.1–12, 3q13.3, and a locus from 22q with schizophrenia and locus 12q23 with mental retardation. There is evidence that this approach is effective for detailed study of the relationship between the genetic (allele and locus) and clinical heterogeneity of complex diseases, which favors successful identification of the genes determining them. The study of linkage disequilibrium (LD) in genetic isolates of Daghestan ethnic populations (which have a common genetic background) may be an effective methodological approach for clarifying the numerous contradictory results obtained when different researchers in different regions of the world map genes for the same complex disease.

5.
Sequence capture methods for targeted next generation sequencing promise to massively reduce the cost of genomics projects compared to untargeted sequencing. However, evaluated capture methods specifically dedicated to biologically relevant genomic regions are rare. Whole exome capture has been shown to be a powerful tool to discover the genetic origin of disease and provides a reduction in target size, and thus in calculated sequencing capacity, of >90-fold compared to untargeted whole genome sequencing. For further cost reduction, a valuable complementary approach is the analysis of smaller, relevant gene subsets in large cohorts of samples. However, effective adjustment of target sizes and sample numbers is hampered by the limited scalability of enrichment systems. We report a highly scalable and automated method to capture a 480 Kb exome subset of 115 cancer-related genes using microfluidic DNA arrays. The arrays are adaptable from 125 Kb to 1 Mb target size and/or one to eight samples without barcoding strategies, representing a further 26- to 270-fold reduction of calculated sequencing capacity compared to whole exome sequencing. Illumina GAII analysis of a HapMap genome enriched for this exome subset revealed a completeness of >96%. Uniformity was such that >68% of exons had at least half the median depth of coverage. An analysis of reference SNPs revealed a sensitivity of up to 93% and a specificity of 98.2% or higher.
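The uniformity figure quoted in this abstract (the share of exons reaching at least half the median depth of coverage) can be illustrated with a minimal Python sketch. The function name and the per-exon depth values below are hypothetical and chosen only for illustration; the abstract does not say how the authors computed per-exon depth, so this is one plausible reading of the metric rather than their method.

```python
from statistics import median


def uniformity_at_half_median(exon_depths):
    """Return the fraction of exons whose depth is at least half the median depth.

    exon_depths: one depth-of-coverage value per exon (hypothetical input format).
    """
    if not exon_depths:
        raise ValueError("exon_depths must not be empty")
    threshold = 0.5 * median(exon_depths)
    covered = sum(1 for depth in exon_depths if depth >= threshold)
    return covered / len(exon_depths)


# Made-up per-exon depths, for illustration only (not data from the study).
depths = [40, 38, 10, 52, 45, 5, 41, 39]
print(uniformity_at_half_median(depths))  # 0.75: six of eight exons reach 19.75x
```

Using the median rather than the mean as the reference keeps a few very deeply covered exons from inflating the threshold.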

6.
The completion of the human genome reference (HGR) marked the beginning of the genomics era, yet despite its utility, its universal application is limited by the small number of individuals used in its development. This is highlighted by the presence of high-quality sequence reads that fail to map within the HGR. Sequences failing to map generally represent 2–5% of total reads and may harbor regions that would enhance our understanding of population variation, evolution, and disease. Alternatively, complete de novo assemblies can be created, but these effectively ignore the groundwork of the HGR. In an effort to find a middle ground, we developed a bioinformatic pipeline that maps paired-end reads to the HGR as separate single reads, exports unmappable reads, de novo assembles these reads per individual and then combines the assemblies into a secondary reference assembly used for comparative analysis. Using 45 diverse 1000 Genomes Project individuals, we identified 351,361 contigs covering 195.5 Mb of sequence unincorporated in GRCh38. A total of 30,879 contigs are represented in multiple individuals, with ~40% showing high sequence complexity. Genomic coordinates were generated for 99.9%, with 52.5% exhibiting high-quality mapping scores. Comparative genomic analyses with archaic humans and primates revealed significant sequence alignments, and comparisons with model organism RefSeq gene datasets identified novel human genes. If incorporated, these sequences will expand the HGR, but more importantly our data highlight that with this method low coverage (~10–20×) next-generation sequencing can still be used to identify novel unmapped sequences to explore biological functions contributing to human phenotypic variation, disease and functionality for personal genomic medicine.

7.
8.
Chlamydomonas reinhardtii is a widely used reference organism in studies of photosynthesis, cilia, and biofuels. Most research in this field uses a few dozen standard laboratory strains that are reported to share a common ancestry, but exhibit substantial phenotypic differences. In order to facilitate ongoing Chlamydomonas research and explain the phenotypic variation, we mapped the genetic diversity within these strains using whole-genome resequencing. We identified 524,640 single nucleotide variants and 4812 structural variants among 39 commonly used laboratory strains. Nearly all (98.2%) of the total observed genetic diversity was attributable to the presence of two, previously unrecognized, alternate haplotypes that are distributed in a mosaic pattern among the extant laboratory strains. We propose that these two haplotypes are the remnants of an ancestral cross between two strains with ∼2% relative divergence. These haplotype patterns create a fingerprint for each strain that facilitates the positive identification of that strain and reveals its relatedness to other strains. The presence of these alternate haplotype regions affects phenotype scoring and gene expression measurements. Here, we present a rich set of genetic differences as a community resource to allow researchers to more accurately conduct and interpret their experiments with Chlamydomonas.

9.
Comparative genomic hybridizations have been used to examine genetic relationships among bacteria. The microarrays used in these experiments may have open reading frames from one or more reference strains (whole-genome microarrays), or they may be composed of random DNA fragments from a large number of strains (mixed-genome microarrays [MGMs]). In this work both experimental and virtual arrays are analyzed to assess the validity of genetic inferences from these experiments with a focus on MGMs. Empirical data are analyzed from an Enterococcus MGM, while a virtual MGM is constructed in silico using sequenced genomes (Streptococcus). On average, a small MGM is capable of correctly deriving phylogenetic relationships between seven species of Enterococcus with accuracies of 100% (n = 100 probes) and 95% (n = 46 probes); more probes are required for intraspecific differentiation. Compared to multilocus sequence methods and whole-genome microarrays, MGMs provide additional discrimination between closely related strains and offer the possibility of identifying unique strain or lineage markers. Representational bias can have mixed effects. Microarrays composed of probes from a single genome can be used to derive phylogenetic relationships, although branch length can be exaggerated for the reference strain. We describe a case where disproportional representation of different strains used to construct an MGM can result in inaccurate phylogenetic inferences, and we illustrate an algorithm that is capable of correcting this type of bias. The bias correction algorithm automatically provides bootstrap confidence values and can provide multiple bias-corrected trees with high confidence values.

10.

Background

DNA barcoding promises to revolutionize the way taxonomists work, facilitating species identification by using small, standardized portions of the genome as substitutes for morphology. The concept has gained considerable momentum in many animal groups, but the higher plant world has been largely recalcitrant to the effort. In plants, efforts are concentrated on various regions of the plastid genome, but no agreement exists as to what kinds of regions are ideal, though most researchers agree that more than one region is necessary. One reason for this discrepancy is differences in the tests that are used to evaluate the performance of the proposed regions. Most tests have been made in a floristic setting, where the genetic distance and therefore the level of variation of the regions between taxa is large, or in a limited set of congeneric species.

Methodology and Principal Findings

Here we present the first in-depth coverage of a large taxonomic group, all 86 known species (except two doubtful ones) of crocus. Even six average-sized barcode regions do not identify all crocus species. This is currently an unrealistic burden in a barcode context. Whereas most proposed regions work well in a floristic context, the majority will – as is the case in crocus – undoubtedly be less efficient in a taxonomic setting. However, a reasonable but less than perfect level of identification may be reached – even in a taxonomic context.

Conclusions/Significance

The time is ripe for selecting barcode regions in plants, and for prudent examination of their utility. Thus, there is no reason for the plant community to hold back the barcoding effort by continued search for the Holy Grail. We must acknowledge that an emerging system will be far from perfect, fraught with problems and work best in a floristic setting.

11.
12.
High-throughput sequence alignment using Graphics Processing Units   Total citations: 1, self-citations: 0, citations by others: 1

Background  

The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies.

13.

Background

The ability to identify regions of the genome inherited with a dominant trait in one or more families has become increasingly valuable with the wide availability of high throughput sequencing technology. While a number of methods exist for mapping of homozygous variants segregating with recessive traits in consanguineous families, dominant conditions are conventionally analysed by linkage analysis, which requires computationally demanding haplotype reconstruction from marker genotypes and, even using advanced parallel approximation implementations, can take substantial time, particularly for large pedigrees. In addition, linkage analysis lacks sensitivity in the presence of phenocopies (individuals sharing the trait but not the genetic variant responsible). Combinatorial Conflicting Homozygosity (CCH) analysis uses high density biallelic single nucleotide polymorphism (SNP) marker genotypes to identify genetic loci within which consecutive markers are not homozygous for different alleles. This allows inference of identical by descent (IBD) inheritance of a haplotype among a set or subsets of related or unrelated individuals.

Results

A single genome-wide conflicting homozygosity analysis takes <3 seconds and parallelisation permits multiple combinations of subsets of individuals to be analysed quickly. Analysis of unrelated individuals demonstrated that in the absence of IBD inheritance, runs of no CH exceeding 4 cM are not observed. At this threshold, CCH is >97% sensitive and specific for IBD regions within a pedigree exceeding this length and was able to identify the locus responsible for a dominantly inherited kidney disease in a Turkish Cypriot family in which six out of 17 affected individuals were phenocopies. It also revealed shared ancestry at the disease-linked locus among affected individuals from two different Cypriot populations.

Conclusions

CCH does not require computationally demanding haplotype reconstruction and can detect regions of shared inheritance of a haplotype among subsets of related or unrelated individuals directly from SNP genotype data. In contrast to parametric linkage allowing for phenocopies, CCH directly provides the exact number and identity of individuals sharing each locus. CCH can also identify regions of shared ancestry among ostensibly unrelated individuals who share a trait. CCH is implemented in Python and is freely available (as source code) from http://sourceforge.net/projects/cchsnp/.
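The core idea of conflicting homozygosity described in this abstract can be sketched in a few lines of Python. This is a simplified illustration and not the code distributed by the CCH project: the genotype encoding ("AA", "AB", "BB", None for a missing call), the function names and the example data are all assumptions made for this sketch. A marker is treated as conflicting when at least two individuals are homozygous for different alleles, and runs of consecutive non-conflicting markers longer than a threshold (4 cM in the abstract) are reported as candidate shared regions.

```python
def has_conflict(calls):
    """True if the individuals include both homozygous genotypes at this marker."""
    observed = set(calls)
    return "AA" in observed and "BB" in observed


def runs_without_conflict(positions_cm, genotype_rows, min_length_cm=4.0):
    """Return (start_cm, end_cm) for runs of consecutive non-conflicting markers.

    positions_cm: genetic map positions of the markers, in ascending order.
    genotype_rows: one list per marker holding one call per individual.
    Only runs whose length exceeds min_length_cm are kept.
    """
    runs = []
    start = end = None
    for pos, calls in zip(positions_cm, genotype_rows):
        if has_conflict(calls):
            # A conflicting marker closes the current run, if there is one.
            if start is not None and end - start > min_length_cm:
                runs.append((start, end))
            start = end = None
        else:
            if start is None:
                start = pos
            end = pos
    if start is not None and end - start > min_length_cm:
        runs.append((start, end))
    return runs


# Made-up genotypes for three individuals at seven markers (illustration only).
positions_cm = [0.0, 1.5, 3.0, 5.0, 6.0, 7.0, 9.0]
genotype_rows = [
    ["AA", "AB", "AA"],
    ["AB", "AB", "AA"],
    ["AA", "AA", "AB"],
    ["BB", "AB", "BB"],
    ["AA", "BB", "AB"],  # conflicting homozygosity: AA versus BB
    ["AB", "AB", "AB"],
    ["BB", "AB", "BB"],
]
print(runs_without_conflict(positions_cm, genotype_rows))  # [(0.0, 5.0)]
```

The combinatorial part of the method, repeating the scan over subsets of individuals, is not shown; in a sketch like this it would amount to calling `runs_without_conflict` on the columns selected for each subset.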

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1360-4) contains supplementary material, which is available to authorized users.

14.
S10-spc-α is a 17.5 kb cluster of 32 genes encoding ribosomal proteins. This locus has an unusual composition and organization in Leptospira interrogans. We demonstrate the highly conserved nature of this region among diverse Leptospira and show its utility as a phylogenetically informative region. Comparative analyses were performed by PCR using primer sets covering the whole locus. Correctly sized fragments were obtained by PCR from all L. interrogans strains tested for each primer set indicating that this locus is well conserved in this species. Few differences were detected in amplification profiles between different pathogenic species, indicating that the S10-spc-α locus is conserved among pathogenic Leptospira. In contrast, PCR analysis of this locus using DNA from saprophytic Leptospira species and species with an intermediate pathogenic capacity generated varied results. Sequence alignment of the S10-spc-α locus from two pathogenic species, L. interrogans and L. borgpetersenii, with the corresponding locus from the saprophyte L. biflexa serovar Patoc showed that genetic organization of this locus is well conserved within Leptospira. Multilocus sequence typing (MLST) of four conserved regions resulted in the construction of well-defined phylogenetic trees that help resolve questions about the interrelationships of pathogenic Leptospira. Based on the results of secY sequence analysis, we found that reliable species identification of pathogenic Leptospira is possible by comparative analysis of a 245 bp region commonly used as a target for diagnostic PCR for leptospirosis. Comparative analysis of Leptospira strains revealed that strain H6 previously classified as L. inadai actually belongs to the pathogenic species L. interrogans and that L. meyeri strain ICF phylogenetically co-localized with the pathogenic clusters. 
These findings demonstrate that the S10-spc-α locus is highly conserved throughout the genus and may be more useful in comparing evolution of the genus than loci studied previously.

15.
With the aim of understanding the relationship between genetic and phenotypic variation in cultivated tomato, single nucleotide polymorphism (SNP) markers covering the whole genome of cultivated tomato were developed and genome-wide association studies (GWAS) were performed. The whole genomes of six tomato lines were sequenced with the ABI-5500xl SOLiD sequencer. Sequence reads covering ∼13.7× of the genome for each line were obtained and mapped onto the tomato reference genome (SL2.40) to detect ∼1.5 million SNP candidates. Of the identified SNPs, 1.5% were considered to affect gene function. In the subsequent Illumina GoldenGate assay for 1536 SNPs, 1293 SNPs were successfully genotyped, and 1248 showed polymorphisms among 663 tomato accessions. The whole-genome linkage disequilibrium (LD) analysis detected markedly different LD decay in euchromatic (58 kb) and heterochromatic (13.8 Mb) regions. Subsequent GWAS identified SNPs that were significantly associated with agronomic traits, with SNP loci located near genes that were previously reported as candidates for these traits. This study demonstrates that attractive loci can be identified by performing GWAS with a large number of SNPs obtained from re-sequencing analysis.

16.
A new insertion sequence (IS), IS1405, was isolated from a Ralstonia solanacearum race 1 strain by insertional inactivation of the sacB gene and then characterized. Sequence analysis indicated that the IS is closely related to members of the IS5 family, but the nucleotide sequence identity in the 5′ and 3′ noncoding regions between IS1405 and other members of the IS5 family is only 23 to 31%. Nucleotide sequences of these regions were used to design specific oligonucleotide primers for detection of race 1 strains by PCR. The PCR amplified a specific DNA fragment for all R. solanacearum race 1 strains tested, and no amplification was observed with some other plant-pathogenic bacteria. Analysis of nucleotide sequences flanking IS1405 and five additional endogenous IS1405s that reside in the chromosome of R. solanacearum race 1 strains indicated that IS1405 prefers a target site of CTAR and has two different insertional orientations with respect to this target site. Restriction fragment length polymorphism (RFLP) pattern analysis using IS1405 as a probe revealed extensive genetic variation among strains of R. solanacearum race 1 isolated from eight different host plants in Taiwan. The RFLP patterns were then used to subdivide the race 1 strains into two groups and several subgroups, which allowed for tracking different subgroup strains of R. solanacearum through a host plant community. Furthermore, specific insertion sites of IS1405 in certain subgroups were used as genetic markers to develop subgroup-specific primers for detection of R. solanacearum; thus, the subgroup strains can be easily identified through a rapid PCR assay rather than RFLP analysis.
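The CTAR target site mentioned in this abstract uses the IUPAC nucleotide code R, which stands for A or G. As a small illustration, the Python sketch below scans a DNA string for CTAR on the forward strand and for its reverse complement, YTAG (Y = C or T), which corresponds to CTAR on the opposite strand. The function name and the example sequence are hypothetical; scanning both strands is an assumption made here and is not a procedure described in the abstract.

```python
import re

# CTAR on the forward strand; R = A or G in IUPAC notation.
FORWARD_SITE = re.compile(r"(?=(CTA[AG]))")
# Reverse complement of CTAR is YTAG (Y = C or T): CTAR on the reverse strand.
REVERSE_SITE = re.compile(r"(?=([CT]TAG))")


def find_ctar_sites(sequence):
    """Return sorted (position, strand, matched_text) tuples for CTAR sites.

    Positions are 0-based offsets on the forward strand, and matched_text is
    always the forward-strand sequence at that offset. Lookahead patterns are
    used so that overlapping matches are not skipped.
    """
    sequence = sequence.upper()
    hits = [(m.start(), "+", m.group(1)) for m in FORWARD_SITE.finditer(sequence)]
    hits += [(m.start(), "-", m.group(1)) for m in REVERSE_SITE.finditer(sequence)]
    return sorted(hits)


# Made-up sequence for illustration only.
for hit in find_ctar_sites("GGCTAGTTCTAACC"):
    print(hit)
# (2, '+', 'CTAG')
# (2, '-', 'CTAG')
# (8, '+', 'CTAA')
```

CTAG is its own reverse complement, which is why the site at offset 2 is reported on both strands.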

17.
We typed 147 simple sequence length polymorphisms in the SWXJ recombinant inbred (RI) strain set spanning Chromosomes (Chrs) 1–6. The strain distribution pattern for these loci was combined with data from 18 previously typed loci for SWXJ, resulting in new chromosome maps for this RI set, with an average density of 3.5 cM between loci. This is the first systematic effort to develop a more highly resolved genetic map for the SWXJ RI set, and it thereby improves the usefulness of this genetic tool for mapping genes underlying both simple and complex genetic disorders.

18.
We developed 21,499 genome-wide insertion–deletion (InDel) markers (2- to 54-bp in silico fragment length polymorphism) by comparing the genomic sequences of four (desi, kabuli and wild C. reticulatum) chickpea [Cicer arietinum (L.)] accessions. InDel markers showing 2- to 6-bp fragment length polymorphism among accessions were abundant (76.8%) in the chickpea genome. The 7,643 and 13,856 markers physically mapped on eight chromosomes and unanchored scaffolds, respectively, were structurally and functionally annotated. A total of 4,506 coding (23% large-effect frameshift mutations) and regulatory InDel markers were identified in 3,228 genes (representing 11.7% of the total 27,571 desi genes), suggesting their functional relevance for trait association/genetic mapping. High amplification success (97%), intra-specific polymorphic potential (60–83%) and wide genetic diversity (15–89%) were detected with 6,254 genome-wide InDel markers among desi, kabuli and wild accessions, even with a simple, cost-effective agarose gel-based assay. This signifies added advantages of this user-friendly genetic marker system for manifold large-scale genotyping applications in laboratories with limited infrastructure and resources. Using a high-density (inter-marker distance: 0.212 cM) inter-specific genetic linkage map (ICC 4958 × ICC 17160) of chickpea based on the 6,254 InDel markers as a reference, three major genomic regions harboring six robust flowering and maturity time QTLs (16.4–27.5% phenotypic variation explained, 8.1–11.5 logarithm of odds) were identified. Integration of genetic and physical maps at these target QTL intervals mapped on three chromosomes delineated five candidate genes that contain InDel markers and are tightly linked to the QTLs governing flowering and maturity time in chickpea.
Taken together, our study demonstrated the practical utility of developing and high-throughput genotyping of such beneficial InDel markers at a genome-wide scale to expedite genomics-assisted breeding applications in chickpea.

19.
Physical maps and recombination frequency of six rice chromosomes   Total citations: 2, self-citations: 0, citations by others: 2
We constructed physical maps of rice chromosomes 1, 2, and 6-9 with P1-derived artificial chromosome (PAC) and bacterial artificial chromosome (BAC) clones. These maps, with only 20 gaps, cover more than 97% of the predicted length of the six chromosomes. We submitted a total of 193 Mbp of non-overlapping sequences to public databases. We analyzed the DNA sequences of 1316 genetic markers and six centromere-specific repeats to facilitate characterization of chromosomal recombination frequency and of the genomic composition and structure of the centromeric regions. We found marked changes in the relative recombination rate along the length of each chromosome. Chromosomal recombination at the centromere core and surrounding regions on the six chromosomes was completely suppressed. These regions have a total physical length of about 23 Mbp, corresponding to 11.4% of the entire size of the six chromosomes. Chromosome 6 has the longest quiescent region, with about 5.6 Mbp, followed by chromosome 8, with a quiescent region about half this size. Repetitive sequences accounted for at least 40% of the total genomic sequence on the partly sequenced centromeric region of chromosome 1. Rice CentO satellite DNA is arrayed in clusters and is closely associated with the presence of Centromeric Retrotransposon of Rice (CRR)- and RIce RetroElement 7 (RIRE7)-like retroelement sequences. We also detected relatively small coldspot regions outside the centromeric region; their repetitive content and gene density were similar to those of regions with normal recombination rates. Sequence analysis of these regions suggests that either the amount or the organization patterns of repetitive sequences may play a role in the inactivation of recombination.
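Relating a genetic map to a physical map, as this abstract does, comes down to dividing genetic distance by physical distance between neighbouring markers. The Python sketch below shows that calculation under simple assumptions: markers are given as hypothetical (physical position in Mbp, genetic position in cM) pairs sorted by physical position, and an interval counts as suppressed when its rate is zero. The function names and values are illustrative and are not taken from the rice data.

```python
def recombination_rates(markers):
    """Return (start_mbp, end_mbp, cm_per_mbp) for each pair of adjacent markers.

    markers: list of (physical_mbp, genetic_cm) tuples sorted by physical position.
    """
    rates = []
    for (mbp0, cm0), (mbp1, cm1) in zip(markers, markers[1:]):
        span = mbp1 - mbp0
        if span <= 0:
            raise ValueError("markers must have strictly increasing physical positions")
        rates.append((mbp0, mbp1, (cm1 - cm0) / span))
    return rates


def suppressed_intervals(markers, max_rate=0.0):
    """Return (start_mbp, end_mbp) for intervals whose rate is at most max_rate."""
    return [(start, end) for start, end, rate in recombination_rates(markers)
            if rate <= max_rate]


# Made-up marker positions, for illustration only.
markers = [(0.0, 0.0), (1.0, 4.0), (2.0, 8.0), (6.0, 8.0), (7.0, 12.0)]
print(recombination_rates(markers))
print(suppressed_intervals(markers))  # [(2.0, 6.0)]
```

Raising `max_rate` above zero would also flag low-recombination intervals, similar in spirit to the coldspot regions the abstract describes, although the threshold the authors used is not stated.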

20.
Metabarcoding has improved the way we understand plants within our environment, from their ecology and conservation to invasive species management. The notion of identifying plant taxa within environmental samples relies on the ability to match unknown sequences to known reference libraries. Without comprehensive reference databases, species can go undetected or be incorrectly assigned, leading to false-positive and false-negative detections. To improve our ability to generate reference sequence databases, we developed a targeted capture approach using the OZBaits_CP V1.0 set, designed to capture chloroplast gene regions across the entirety of flowering plant diversity. We focused on generating a reference database for coastal temperate plant species given the lack of reference sequences for these taxa. Our approach was successful across all specimens with a target gene recovery rate of 92%, which was achieved in a single assay (i.e., samples were pooled), thus making this approach much faster and more efficient than standard barcoding. Further testing of this database showed that 80% of all samples could be discriminated to family level across all gene regions, with some genes achieving greater resolution than others, which also depended on the taxon of interest. Thus, we demonstrate the importance of generating reference sequences across multiple chloroplast gene regions, as no single locus is sufficient to discriminate across all plant groups. The targeted capture approach outlined in this study provides a way forward to achieve this.
