Similar articles
20 similar articles found
1.

Background  

Many molecules are flexible and undergo significant shape deformation as part of their function, and yet most existing molecular shape comparison (MSC) methods treat them as rigid bodies, which may lead to incorrect shape recognition.

2.

Background

As Next-Generation Sequencing data becomes available, existing hardware environments do not provide sufficient storage space and computational power to store and process the data because of its enormous size. This is, and will remain, a problem that researchers working on genetic data encounter every day. Some options are available for compressing and storing such data, such as general-purpose compression software, the PBAT/PLINK binary format, etc. However, these currently available methods either do not offer sufficient compression rates or require a great amount of CPU time for decompression and loading every time the data is accessed.

Results

Here, we propose a novel and simple algorithm for storing such sequencing data. We show that the compression factor of the algorithm ranges from 16 to several hundred, which potentially allows SNP data of hundreds of Gigabytes to be stored in hundreds of Megabytes. We provide a C++ implementation of the algorithm, which supports direct loading and parallel loading of the compressed format without requiring extra time for decompression. By applying the algorithm to simulated and real datasets, we show that the algorithm achieves a higher compression rate than the commonly used compression methods, and that the data-loading process takes less time. The C++ library also provides direct-data-retrieving functions, which allow the compressed information to be easily accessed by other C++ programs.

Conclusions

The SpeedGene algorithm enables the storage and analysis of next-generation sequencing data in current hardware environments, making system upgrades unnecessary.
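
The abstract does not describe how SpeedGene encodes its data, so the following is only a minimal sketch of one generic idea behind compact SNP storage: packing each genotype into 2 bits so that four genotypes share a byte and any single genotype can be read without unpacking the rest. The genotype codes (0, 1, 2 for the minor-allele count and 3 for missing) and the function names `pack_genotypes` and `get_genotype` are assumptions made for illustration, not part of the SpeedGene library.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes (0, 1, 2, or 3 for missing) into bytes, four per byte.

    Hypothetical encoding for illustration; not the SpeedGene format.
    """
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, 3):
            raise ValueError(f"genotype code out of range: {g}")
        # Each genotype occupies 2 bits; position within the byte is i % 4.
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def get_genotype(packed, i):
    """Read genotype i directly from the packed bytes, without unpacking the rest."""
    return (packed[i // 4] >> (2 * (i % 4))) & 0b11


genotypes = [0, 1, 2, 3, 2, 2, 0, 1, 1]
packed = pack_genotypes(genotypes)
restored = [get_genotype(packed, i) for i in range(len(genotypes))]
print(len(genotypes), "genotypes stored in", len(packed), "bytes")
```

Random access by index is what makes "direct loading" possible in a scheme like this: a reader only needs the byte offset `i // 4` and a bit shift, with no decompression pass.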

3.
David I Smith 《Cytometry》2002,47(1):60-62
Of the cancers unique to women, ovarian cancer has the highest mortality rate. Over 26,000 women are diagnosed with this disease in the U.S. annually, and 60% of those diagnosed will die of the disease. One of the greatest problems with this disease is the lack of strong early warning signs or symptoms resulting in advanced stage at presentation in most women, followed by the outgrowth of chemotherapy-resistant disease in the majority of patients. The 5-year survival for patients with early stage disease ranges from 50-90%, but it is less than 25% for advanced-stage disease. In collaboration with researchers at Millennium Predictive Medicine (Cambridge, MA), the Ovarian Cancer Program of the Mayo Clinic Cancer Center analyzed gene expression in over 50 primary ovarian tumors, as compared with normal ovarian epithelial cells. The technologies utilized included microarray analysis with nitrocellulose filters containing 25,000 arrayed human cDNAs, as well as the construction of subtraction suppression hybridization cDNA libraries and their subsequent sequencing. Our specific focus has been on genes that are underexpressed during the development of ovarian cancer, although this analysis has revealed a large number of consistently up- and down-regulated genes. There were more down-regulated genes in ovarian tumors than up-regulated genes. In addition, the number of genes that had altered expression levels was quite large. For example, we found 409 genes down-regulated at least 5-fold, and 72 genes up-regulated at least 5-fold in 33% of the tumors analyzed. We also observed that most of the expression alterations observed in later stage (Stages III/IV) tumors were also observed in early-stage tumors (Stages I/II). This was corroborated using comparative genomic hybridization analysis on the same tumors that were expression profiled. 
This analysis revealed that the late-stage tumors had more gene amplification than early-stage tumors, but most regions of change (either increases or decreases) were in common between different stage tumors. We also have verified the altered expression levels of several of these genes using several complementary strategies. Finally, we are taking top candidate genes that are consistently under-expressed in ovarian tumors and attempting to determine their functional role in the development of ovarian cancer.

4.
EXALT (EXpression signature AnaLysis Tool) is a computational system enabling comparisons of microarray data across experimental platforms and different laboratories. An essential feature of EXALT is a database holding thousands of gene expression signatures extracted from the Gene Expression Omnibus and encoded in a searchable format. This novel approach to performing global comparisons of shared microarray data may have enormous value when coupled directly with a shared data repository.

5.

Background  

Molecular signatures are sets of genes, proteins, genetic variants or other variables that can be used as markers for a particular phenotype. Reliable signature discovery methods could yield valuable insight into cell biology and mechanisms of human disease. However, it is currently not clear how to control error rates such as the false discovery rate (FDR) in signature discovery. Moreover, signatures for cancer gene expression have been shown to be unstable, that is, difficult to replicate in independent studies, casting doubts on their reliability.

6.
7.
Advances in sequencing technology and genome-wide association studies are now revealing the complex interactions between hosts and pathogens through genomic variation signatures, which arise from evolutionary co-existence.

8.
9.
SIMS: computation of a smooth invariant molecular surface.   Cited by: 1 (self-citations: 0, citations by others: 1)
SIMS, a new method of calculating a smooth invariant molecular dot surface, is presented. The SIMS method generates the smooth molecular surface by rolling two probe spheres. A solvent probe sphere is rolled over the molecule and produces a Richards-Connolly molecular surface (MS), which envelops the solvent-excluded volume of the molecule. In deep crevices, Connolly's method of calculating the MS has two deficiencies. First, it produces self-intersecting parts of the molecular surface, which must be removed to obtain the correct MS. Second, the correct MS is not smooth, i.e., the direction of the normal vector of the MS is not continuous, and some points of the MS are singular. We present an exact method for removing self-intersecting parts and smoothing the singular regions of the MS. The singular MS is smoothed by rolling a smoothing probe sphere over the inward side of the singular MS. The MS in the vicinity of singularities is replaced with the reentrant surface of the smoothing probe sphere. The smoothing method does not disturb the topology of a singular MS, and the smooth MS is a better approximation of the dielectric border between the high-dielectric solvent and the low-dielectric molecular interior. The SIMS method generates a smooth molecular dot surface, which has a quasi-uniform dot distribution in two orthogonal directions on the molecular surface, which is invariant with molecular rotation and stable under changes in the molecular conformation, and which can be used in a variety of implicit methods of modeling solvent effects. The SIMS program is faster than the Connolly MS program, and in a matter of seconds generates a smooth dot MS of a 200-residue protein. The program is available from the authors on request (see http://femto.med.unc.edu/SIMS).

10.
The Persian cat is mainly characterized by an extremely brachycephalic face as part of the standard body conformation. Despite the popularity, world-wide distribution, and economic importance of the Persian cat as a fancy breed, little is known about the genetics of its hallmark morphology, brachycephaly. Over 800 cats from different breeds including Persian, non-Persian breeds (Abyssinian, Cornish Rex, Bengal, La Perm, Norwegian Forest, Maine Coon, Manx, Oriental, and Siamese), and Persian-derived breeds (British Shorthair, Scottish Fold, Selkirk Rex) were genotyped with the Illumina 63 K feline DNA array. The experimental strategy was composed of three main steps: (i) the Persian dataset was screened for runs of homozygosity to find and select highly homozygous regions; (ii) selected Persian homozygous regions were evaluated for the difference of homozygosity between Persians and those considered non-Persian breeds; and (iii) the Persian homozygous regions most divergent from the non-Persian breeds were investigated by haplotype analysis in the Persian-derived breeds. Four regions with high homozygosity (H > 0.7) were detected, each with an average length of 1 Mb. Three regions can be considered unique to the Persian breed, with a less conservative haplotype pattern in the Persian-derived breeds. Moreover, two genes, CHL1 and CNTN6, known to determine face shape modification in humans, reside in one of the identified regions and therefore are positional candidates for the brachycephalic face in Persians. In total, the homozygous regions contained several neuronal genes that could be involved in Persian cat behavior and can provide new insights into cat domestication.
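
Step (i) above, screening for runs of homozygosity, can be illustrated with a minimal sketch. The abstract does not state the software or thresholds used, so the function name `runs_of_homozygosity`, the input layout (sorted SNP positions plus a per-SNP homozygous flag for one animal), and the cutoffs `min_snps` and `min_length_bp` are all assumptions chosen for illustration.

```python
def runs_of_homozygosity(positions, is_homozygous, min_snps=3, min_length_bp=1000):
    """Return (start_bp, end_bp) for each run of consecutive homozygous SNPs.

    positions: SNP coordinates in base pairs, sorted ascending.
    is_homozygous: one boolean per SNP for a single individual.
    The thresholds are illustrative defaults, not values from the study.
    """
    runs = []
    start = None
    for i, hom in enumerate(is_homozygous):
        if hom and start is None:
            start = i                      # a new run begins here
        elif not hom and start is not None:
            runs.append((start, i - 1))    # a heterozygous call ends the run
            start = None
    if start is not None:                  # close a run that reaches the last SNP
        runs.append((start, len(is_homozygous) - 1))
    # Keep only runs that are long enough in both SNP count and physical length.
    return [
        (positions[a], positions[b])
        for a, b in runs
        if b - a + 1 >= min_snps and positions[b] - positions[a] >= min_length_bp
    ]


positions = [100, 600, 1200, 1900, 2500, 3100, 3600, 4200]
calls = [True, True, False, True, True, True, True, False]
roh = runs_of_homozygosity(positions, calls)
print(roh)
```

Real analyses typically tolerate a small number of heterozygous or missing calls inside a run; this sketch omits that to keep the core idea visible.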

11.
TH Lee  TS Wu  CP Tseng  JT Qiu 《PloS one》2012,7(8):e42051

Background

Genotyping of human papillomavirus (HPV) is crucial for patient management in a clinical setting. This study assesses the combined use of broad-range real-time PCR and high-resolution melting (HRM) analysis for rapid identification of HPV genotypes.

Methods

Genomic DNA sequences of 8 high-risk genotypes (HPV16/18/39/45/52/56/58/68) were subjected to bioinformatic analysis to select an appropriate PCR amplicon. Asymmetric broad-range real-time PCR in the presence of HRM dye and two unlabeled probes specific to HPV16 and 18 was employed to generate HRM molecular signatures for HPV genotyping. The method was validated via assessment of 119 clinical HPV isolates.

Results

A DNA fragment within the L1 region, ranging from 215–221 bp for different HPV genotypes, was selected as the PCR amplicon. Each genotype displayed a distinct HRM molecular signature with minimal inter-assay variability. According to the HRM molecular signatures, HPV genotypes can be determined with one PCR within 3 h from the time of viral DNA isolation. In the validation assay, a 91% accuracy rate was achieved when the genotypes were in the database. Concomitantly, the HRM molecular signatures for 6 additional low-risk genotypes were established.

Conclusions

This assay provides a novel approach for HPV genotyping in a rapid and cost-effective manner.

12.
Thermotogae species are currently identified mainly on the basis of their unique toga and distinct branching in the rRNA and other phylogenetic trees. No biochemical or molecular markers are known that clearly distinguish the species from this phylum from all other bacteria. The taxonomic/evolutionary relationships within this phylum, which consists of a single family, are also unclear. We report detailed phylogenetic analyses on Thermotogae species based on concatenated sequences for many ribosomal as well as other conserved proteins that identify a number of distinct clades within this phylum. Additionally, comprehensive analyses of protein sequences from Thermotogae genomes have identified >60 Conserved Signature Indels (CSI) that are specific for the Thermotogae phylum or its different subgroups. Eighteen CSIs in important proteins such as PolI, RecA, TrpRS and ribosomal proteins L4, L7/L12, S8, S9, etc. are uniquely present in various Thermotogae species and provide molecular markers for the phylum. Many CSIs were specific for a number of Thermotogae subgroups. Twelve of these CSIs were specific for a clade consisting of various Thermotoga species except Tt. lettingae, which was separated from other Thermotoga species by a long branch in phylogenetic trees; fourteen CSIs were specific for a clade consisting of the Fervidobacterium and Thermosipho genera and eight additional CSIs were specific for the genus Thermosipho. In addition, the existence of a clade consisting of the deep branching species Petrotoga mobilis, Kosmotoga olearia and Thermotogales bacterium mesG1 was supported by seven CSIs. The deep branching of this clade was also supported by a number of CSIs that were present in various Thermotogae species, but absent in this clade and all other bacteria. Most of these clades were strongly supported by phylogenetic analyses based on two datasets of protein sequences and they identify potential higher taxonomic grouping (viz. families) within this phylum. We also report 16 CSIs that are shared by either some or all Thermotogae species and some species from other taxa such as Archaea, Aquificae, Firmicutes, Proteobacteria, Deinococcus, Fusobacteria, Dictyoglomus, Chloroflexi and eukaryotes. The shared presence of some of these CSIs could be due to lateral gene transfers between these groups. However, no clear preference for any particular group was observed in this regard. The molecular probes based on different genes/proteins, which contain these Thermotogae-specific CSIs, provide novel and highly specific means for identification of both known as well as previously unknown Thermotogae species in different environments. Additionally, these CSIs also provide valuable tools for genetic and biochemical studies that could lead to discovery of novel properties that are unique to these bacteria.

13.
Although the APOE region is the strongest genetic risk factor for Alzheimer's diseases (ADs), its pathogenic role remains poorly understood. Elucidating genetic predisposition to ADs, a subset of age‐related diseases characteristic for postreproductive period, is hampered by the undefined role of evolution in establishing molecular mechanisms of such diseases. This uncertainty is an inevitable source of natural‐selection–free genetic heterogeneity in predisposition to ADs. We performed the first large‐scale analysis of linkage disequilibrium (LD) structures characterized by 30 polymorphisms from five genes in the APOE 19q13.3 region (BCAM, NECTIN2, TOMM40, APOE, and APOC1) in 2,673 AD‐affected and 16,246 unaffected individuals from five cohorts. Consistent with the undefined role of evolution in age‐related diseases, we found that these structures, being highly heterogeneous, are significantly different in subjects with and without ADs. The pattern of the difference represents a molecular signature of AD comprising single nucleotide polymorphisms (SNPs) from all five genes in the APOE region. Significant differences in LD in subjects with and without ADs indicate SNPs from different genes likely involved in AD pathogenesis. Significant and highly heterogeneous molecular signatures of ADs provide unprecedented insight into complex polygenetic predisposition to ADs in the APOE region. These findings are more consistent with a complex haplotype than with a single genetic variant origin of ADs in this region.
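
For readers unfamiliar with linkage disequilibrium, the following minimal sketch computes the two textbook pairwise measures, the coefficient D and the squared correlation r², from phased haplotypes. It is not the estimator used in the study, which the abstract does not specify; the function name `ld_d_and_r2` and the 0/1 allele coding are assumptions for illustration.

```python
def ld_d_and_r2(hap_a, hap_b):
    """Return (D, r^2) for two biallelic SNPs observed on the same haplotypes.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per haplotype.
    Textbook definitions only; not the method used in the cited study.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and of equal length")
    p_a = sum(hap_a) / n                                   # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                                   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                                   # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d, d * d / denom


# Perfectly correlated SNPs versus independent SNPs.
d_linked, r2_linked = ld_d_and_r2([1, 1, 0, 0], [1, 1, 0, 0])
d_indep, r2_indep = ld_d_and_r2([1, 1, 0, 0], [1, 0, 1, 0])
print(r2_linked, r2_indep)
```

Comparing such pairwise values between affected and unaffected groups is one simple way to picture what "different LD structures" means, though the study's actual statistical comparison may differ.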

14.
15.
16.
We present a new method for conducting protein structure similarity searches, which improves on the efficiency of some existing techniques. Our method is grounded in the theory of differential geometry on 3D space curve matching. We generate shape signatures for proteins that are invariant, localized, robust, compact, and biologically meaningful. The invariance of the shape signatures allows us to improve similarity searching efficiency by adopting a hierarchical coarse-to-fine strategy. We index the shape signatures using an efficient hashing-based technique. With the help of this technique we screen out unlikely candidates and perform detailed pairwise alignments only for a small number of candidates that survive the screening process. Unlike other hashing-based techniques, our technique employs domain-specific information (not just geometric information) in constructing the hash key, and hence is more tuned to the domain of biology. Furthermore, the invariance, localization, and compactness of the shape signatures allow us to utilize a well-known local sequence alignment algorithm for aligning two protein structures. One measure of the efficacy of the proposed technique is that we were able to perform structure alignment queries 36 times faster (on average) than a well-known method while keeping the quality of the query results at an approximately similar level.
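
The abstract does not name the local sequence alignment algorithm it reuses. Smith-Waterman is one well-known candidate, and the sketch below shows how it could score two shape signatures if each signature were written as a string of discrete symbols. That string representation, the function name `smith_waterman_score`, and the scoring values are assumptions for illustration, not details from the paper.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (Smith-Waterman).

    Uses a linear gap penalty and keeps only two rows of the score matrix.
    Scoring parameters are illustrative defaults.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below zero, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two hypothetical signatures sharing the local motif "ACGT".
score = smith_waterman_score("XXACGTYY", "ZZACGTWW")
print(score)
```

Because local alignment is quadratic in signature length, running it only on the few candidates that pass the hash-based screen is what keeps the overall search fast.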

17.
Protein folding taking shape: Workshop on molecular chaperones   Cited by: 1 (self-citations: 0, citations by others: 1)

18.
Modern genetic and immunological techniques have become important tools for assessing protistan species diversity for both the identification and quantification of specific taxa in natural microbial communities. Although these methods are still gaining use among ecologists, the new approaches have already had a significant impact on our understanding of protistan diversity and biogeography. For example, genetic studies of environmental samples have uncovered many protistan phylotypes that do not match the DNA sequences of any cultured organisms, and whose morphological identities are unknown at the present time. Additionally, rapid and sensitive methods for detecting and enumerating taxa of special importance (e.g. bloom-forming algae, parasitic protists) have enabled much more detailed distributional and experimental studies than have been possible using traditional methods. Nevertheless, while the application of molecular approaches has advanced some aspects of aquatic protistan ecology, significant issues still thwart the widespread adoption of these approaches. These issues include the highly technical nature of some of the molecular methods, the reconciliation of morphology-based and sequence-based species identifications, and the species concept itself.

19.
The size and shape of macromolecules such as proteins and nucleic acids play an important role in their functions. Prior efforts to quantify these properties have been based on various discretization or tessellation procedures involving analytical or numerical computations. In this article, we present an analytically exact method for computing the metric properties of macromolecules based on the alpha shape theory. This method uses the duality between alpha complex and the weighted Voronoi decomposition of a molecule. We describe the intuitive ideas and concepts behind the alpha shape theory and the algorithm for computing areas and volumes of macromolecules. We apply our method to compute areas and volumes of a number of protein systems. We also discuss several difficulties commonly encountered in molecular shape computations and outline methods to overcome these problems. Proteins 33:1–17, 1998. © 1998 Wiley-Liss, Inc.

20.
Prostate cancer (PCa) has a variable biological potential. It constitutes the second most common cancer amongst men worldwide and the fifth most common cancer in Saudi Arabia. Identifying men at higher risk of developing PCa, differentiating indolent from aggressive disease and predicting the likelihood of progression will improve decision-making and selection for active surveillance protocols. Biomarkers have been utilized for PCa screening and predicting cancer behavior and response to treatment. Prostate specific antigen (PSA) screening helps detect PCa in early stages, while implementing a plan for management and outcome. However, PSA screening is still controversial, due to the risks of overdiagnosis and overtreatment, and its inability to detect a good proportion of advanced tumors. Alternatively, a new era of PCa biomarkers has emerged with higher PCa specificity than PSA and its isoforms, hopefully improving screening methods, such as Prostate Health Index (PHI) score, Progensa Prostate Cancer Antigen 3 (PCA3), Mi-Prostate Score (MiPS), Prostate Stem Cell Antigen (PSCA), 4Kscore test, and Urokinase Plasminogen Activation (uPA and uPAR). Few novel biomarkers have shown promise in preliminary results. This review will display promising biomarkers including some important FDA-approved ones, highlighting their clinical implication and future place in the PCa puzzle, along with addressing their current limitations.
