Similar Documents
20 similar documents found.
1.
Comparison of primate genomic sequences has demonstrated that a substantial part of intra- and interspecific genetic variation is provided by retroelements (REs). The human genome contains many thousands of polymorphic RE copies, which are regarded as a promising source of new-generation molecular genetic markers. However, the absence of systematized data on RE number, distribution, genomic context, and abundance in various human populations limits the use of RE insertion polymorphism. We designed the first bilingual (Russian/English) web resource on known polymorphic REs discovered both by our team and by other researchers. The database contains information about the genomic location of each RE, its position relative to known and predicted genes, its abundance in human populations, and other data. Our web portal allows the database to be searched with user-specified parameters. The database makes it possible to analyze RE distribution in the human genome comprehensively and to design molecular genetic markers for studies of human genome diversity and for biomedical applications.

2.
MOTIVATION: Many bioinformatic approaches exist for finding novel genes within genomic sequence data. Homology search-based methods are traditionally the first approach used to determine whether a novel gene similar to a known gene exists. Unfortunately, distantly related genes or motifs are often difficult to find using single query-based homology search algorithms against large sequence datasets such as the human genome. The motivation behind this work was therefore to develop an approach that enhances the sensitivity of traditional single query-based homology algorithms against genomic data without losing search selectivity. RESULTS: We demonstrate that searching against a genome fragmented into all possible reading frames enhances the sensitivity of homology-based searches without degrading their selectivity. Using the ETS-domain, bromodomain and acetyl-CoA acetyltransferase genes as queries, we demonstrate that direct protein-protein searches using BLAST2P or FASTA3 against a human genome segmented into all possible reading frames and translated were substantially more sensitive than traditional protein-DNA searches against raw genomic sequence using an application such as TBLAST2N. Receiver operating characteristic analysis showed that the algorithms remained selective, while comparisons of the algorithms showed that the protein-protein searches were more sensitive in identifying hits. Thus, by overpredicting reading frames and exploiting the greater sensitivity of protein-protein homology search algorithms, this method allows a genome to be mined deeply, potentially finding hits that protein-DNA searches against raw genomic data overlook.
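The fragmentation step can be sketched as a six-frame translation. This minimal Python example illustrates the idea under stated assumptions and is not the authors' pipeline. It translates both strands in all three frames with the standard genetic code. As a hypothetical design choice, it then splits each frame at stop codons and keeps stretches of at least `min_len` residues. The resulting peptides could be written to a protein FASTA file and searched with a protein-protein tool.

```python
# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons containing N or other symbols become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_fragments(dna: str, min_len: int = 20):
    """Yield (strand, frame, aa_offset, peptide) for stop-free stretches of each frame.

    aa_offset is the peptide's start, in codons, within that frame's translation.
    Splitting at stop codons is a hypothetical choice made for this sketch."""
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            offset = 0
            for piece in translate(seq[frame:]).split("*"):
                if len(piece) >= min_len:
                    yield strand, frame, offset, piece
                offset += len(piece) + 1  # step past the stop codon


if __name__ == "__main__":
    for record in six_frame_fragments("ATGAAATTTGGGTAAATGCCC", min_len=3):
        print(record)
```

Splitting at stops is one way to overpredict reading frames while keeping database entries short. The function and parameter names are hypothetical.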

3.
PANZEA is the first public database for studying maize genomic diversity. It was initiated as a repository of genomic diversity data for an NSF Plant Genome project on 'Maize Evolutionary Genomics'. PANZEA is hosted at the Bioinformatics Research Center, North Carolina State University, and is open to the public (http://statgen.ncsu.edu/panzea). PANZEA is designed to capture the interrelationships between germplasm, molecular diversity, phenotypic diversity and genome structure. It can store, integrate and visualize DNA sequence, enzymatic, SSR (simple sequence repeat) marker, germplasm and phenotypic data. A relational data model was chosen and implemented in Oracle. An automated DNA sequence data submission tool allows project researchers to submit their DNA sequence data remotely and directly to PANZEA. Online database search forms and reports allow users to search for or download germplasm, DNA sequence, gene/locus data and much more directly from the web.
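The following minimal sketch makes concrete the idea of a relational model linking germplasm, molecular data and phenotypic data. It uses Python's built-in `sqlite3` module instead of Oracle so that it runs anywhere. Every table, column, accession and trait name is hypothetical and does not reflect PANZEA's actual schema.

```python
import sqlite3

# Hypothetical schema: germplasm, loci, sequences and phenotypes linked by keys.
SCHEMA = """
CREATE TABLE germplasm (
    germplasm_id INTEGER PRIMARY KEY,
    accession    TEXT NOT NULL UNIQUE
);
CREATE TABLE locus (
    locus_id   INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    chromosome TEXT
);
CREATE TABLE dna_sequence (
    sequence_id  INTEGER PRIMARY KEY,
    germplasm_id INTEGER NOT NULL REFERENCES germplasm (germplasm_id),
    locus_id     INTEGER NOT NULL REFERENCES locus (locus_id),
    bases        TEXT NOT NULL
);
CREATE TABLE phenotype (
    germplasm_id INTEGER NOT NULL REFERENCES germplasm (germplasm_id),
    trait        TEXT NOT NULL,
    value        REAL,
    PRIMARY KEY (germplasm_id, trait)
);
"""

# Join molecular data (the sequence at one locus) with one phenotype per line.
# Parameters bind in order: first the trait, then the locus name.
QUERY = """
SELECT g.accession, s.bases, p.value
FROM dna_sequence AS s
JOIN germplasm AS g ON g.germplasm_id = s.germplasm_id
JOIN locus AS l ON l.locus_id = s.locus_id
LEFT JOIN phenotype AS p
       ON p.germplasm_id = g.germplasm_id AND p.trait = ?
WHERE l.name = ?
ORDER BY g.accession
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with toy, illustrative records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO germplasm VALUES (?, ?)",
                     [(1, "LINE_A"), (2, "LINE_B")])
    conn.execute("INSERT INTO locus VALUES (1, 'locus1', '1')")
    conn.executemany("INSERT INTO dna_sequence VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, "ACGT"), (2, 2, 1, "ACGA")])
    conn.executemany("INSERT INTO phenotype VALUES (?, ?, ?)",
                     [(1, "plant_height", 180.0), (2, "plant_height", 150.5)])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for row in conn.execute(QUERY, ("plant_height", "locus1")):
        print(row)
```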

4.
Rat Genome Database (RGD): mapping disease onto the genome   Total citations: 5 (self-citations: 0; by others: 5)
The Rat Genome Database (RGD, http://rgd.mcw.edu) is an NIH-funded project whose stated mission is ‘to collect, consolidate and integrate data generated from ongoing rat genetic and genomic research efforts and make these data widely available to the scientific community’. RGD was created to meet these aims through a collaboration between the Bioinformatics Research Center at the Medical College of Wisconsin, the Jackson Laboratory and the National Center for Biotechnology Information. The rat is uniquely suited to its role as a model of human disease, and the primary focus of RGD is to help researchers study the rat and apply their results in a wider context. To support this, we have integrated a large amount of rat genetic and genomic resources into RGD, and these resources are constantly being expanded through ongoing literature and bulk dataset curation. RGD version 2.0, released in June 2001, includes curated data on rat genes, quantitative trait loci (QTL), microsatellite markers and rat strains used in genetic and genomic research. VCMap, a dynamic sequence-based homology tool, was also introduced. It allows researchers working on rat, mouse and human to view mapped genes and sequences and their locations in the other two organisms, an essential capability for comparative genomics. In addition, RGD provides tools for gene prediction, radiation hybrid mapping, polymorphic marker selection and more. Future developments will include the introduction of disease-based curation, expanding the curated information to cover popular disease systems studied in the rat. This will be integrated with the emerging rat genomic sequence and annotation pipelines to provide a high-quality, disease-centric resource that is applicable to human and mouse via comparative tools such as VCMap.
RGD has a defined community outreach focus, with a Visiting Scientist program and the Rat Community Forum, a web-based forum for rat researchers and others interested in using the rat as an experimental model. Thus, RGD is a valuable resource not only for those working with the rat but also for researchers in other model organisms who wish to harness the existing genetic and physiological data available in the rat to complement their own work.

5.
HOWDY: an integrated database system for human genome research   Total citations: 1 (self-citations: 0; by others: 1)
HOWDY is an integrated database system for accessing and analyzing human genomic information (http://www-alis.tokyo.jst.go.jp/HOWDY/). HOWDY stores information about the relationships between genetic objects, together with data extracted from a number of databases. It provides an Internet-accessible user interface that allows thorough searching of the human genomic databases using gene symbols and their aliases, and it permits flexible editing of sequence data. The database can be searched using simple words, and the search can be restricted to a specific cytogenetic location. Linear maps displaying markers and genes on contig sequences are available, and an object can be chosen from them. Whatever the starting point of a search, all information matching the query is retrieved. HOWDY provides a convenient environment for searching human genomic data, aimed at scientists who are unsure which database is most appropriate for their search.
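Searching by gene symbol or alias, optionally restricted to a cytogenetic location, can be sketched as an alias index. This Python example is a hypothetical illustration, not HOWDY's implementation. The two gene records are toy data chosen for illustration. The location filter is a naive string-prefix match on the cytoband.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    aliases: tuple
    cytoband: str  # cytogenetic location, e.g. "17p13.1"


# Toy records for illustration only; a real system would load these from source databases.
GENES = [
    Gene("TP53", ("P53", "LFS1"), "17p13.1"),
    Gene("BRCA1", ("BRCC1", "RNF53"), "17q21.31"),
]


def build_alias_index(genes):
    """Map every official symbol and alias (case-insensitively) to the genes it names."""
    index = {}
    for gene in genes:
        for name in (gene.symbol, *gene.aliases):
            index.setdefault(name.upper(), []).append(gene)
    return index


def search(index, term, cytoband_prefix=None):
    """Return official symbols matching `term`, optionally limited to a cytoband prefix.

    A plain prefix test is naive: "1" would also match "17p13.1"."""
    hits = index.get(term.upper(), [])
    if cytoband_prefix is not None:
        hits = [g for g in hits if g.cytoband.startswith(cytoband_prefix)]
    return [g.symbol for g in hits]


if __name__ == "__main__":
    idx = build_alias_index(GENES)
    print(search(idx, "p53"))         # ['TP53']
    print(search(idx, "p53", "17q"))  # []
```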

6.
SUMMARY: The amount of genomic sequence data being generated and made available through public databases continues to grow at an ever-increasing rate. Downloading, copying, sharing and manipulating these large datasets are becoming difficult and time-consuming for researchers. We need to consider using advanced compression techniques as part of a standard data format for genomic data. The inherent structure of genome data allows more efficient lossless compression than generic compression programs can achieve. We apply a series of techniques to James Watson's genome that, in combination, reduce it to a mere 4 MB, small enough to be sent as an email attachment.
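One simple idea in the family of structure-aware genome compression is to store an individual genome as its differences from a reference. The sketch below rests on stated assumptions and is not the authors' exact series of techniques. It handles substitutions only, with no indels. Each substitution is recorded as the gap from the previous substitution, in a LEB128-style variable-length integer, followed by a base code.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode_varint(n: int) -> bytes:
    """Unsigned varint: 7 data bits per byte; a set high bit means more bytes follow."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Return (value, position just after the varint)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def compress_snps(reference: str, genome: str) -> bytes:
    """Store only substitutions relative to the reference as (gap, base) pairs."""
    assert len(reference) == len(genome), "this sketch handles substitutions only"
    out = bytearray()
    last = 0
    for pos, (ref_base, base) in enumerate(zip(reference, genome)):
        if ref_base != base:
            out += encode_varint(pos - last)  # small gaps fit in a single byte
            out.append(BASE_CODE[base])       # a real format could pack 4 bases per byte
            last = pos
    return bytes(out)


def decompress_snps(reference: str, blob: bytes) -> str:
    """Rebuild the genome by reapplying each substitution to a copy of the reference."""
    genome = list(reference)
    pos = i = 0
    while i < len(blob):
        gap, i = decode_varint(blob, i)
        pos += gap
        genome[pos] = CODE_BASE[blob[i]]
        i += 1
    return "".join(genome)


if __name__ == "__main__":
    ref, ind = "ACGTACGTAC", "ACCTACGTAA"
    blob = compress_snps(ref, ind)
    print(len(blob), decompress_snps(ref, blob) == ind)
```

Delta-encoding positions keeps most entries to one or two bytes, because variants are numerous but individually sparse along the sequence.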

7.
8.
Comparison of primate genome sequences has confirmed that a substantial part of intra- and interspecies differences is provided by retroelements. The human genome contains thousands of polymorphic retroelement copies that are considered promising new-generation molecular genetic markers. However, the use of polymorphic retroelements as molecular genetic markers is limited by the lack of systematic data on their number, genomic context and distribution among human populations. We have created the first bilingual (Russian/English) internet resource devoted to known polymorphic retroelements discovered in the human genome by our group as well as by other researchers worldwide. The database contains information on the location of each retroelement copy, its position relative to known and predicted genes, allele frequencies in human populations and more. Our internet portal, available at http://labcfg.ibch.ru/home.html, allows the database to be searched using multiple search conditions. The database makes it possible to investigate the distribution of polymorphic retroelements in the human genome and to design new genetic markers for various population and medical studies.
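A multi-condition search over polymorphic retroelement records can be sketched as a filter in which every supplied condition must hold. The record fields, IDs and population names below are hypothetical. They do not reflect the portal's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class RetroelementInsertion:
    element_id: str
    family: str        # subfamily name, e.g. "AluYa5" or "L1HS"
    chromosome: str
    position: int
    gene_context: str  # e.g. "intron", "exon", "intergenic"
    allele_freq: dict = field(default_factory=dict)  # population -> insertion allele frequency


def search(records, family=None, chromosome=None, gene_context=None,
           population=None, min_freq=0.0, max_freq=1.0):
    """Return IDs of records satisfying every supplied condition (logical AND).

    The frequency bounds apply only when `population` is given; records with no
    frequency for that population are excluded."""
    hits = []
    for r in records:
        if family is not None and r.family != family:
            continue
        if chromosome is not None and r.chromosome != chromosome:
            continue
        if gene_context is not None and r.gene_context != gene_context:
            continue
        if population is not None:
            freq = r.allele_freq.get(population)
            if freq is None or not min_freq <= freq <= max_freq:
                continue
        hits.append(r.element_id)
    return hits


# Hypothetical toy records.
RECORDS = [
    RetroelementInsertion("RE_001", "AluYa5", "chr1", 1_000_000, "intron",
                          {"pop_A": 0.35, "pop_B": 0.10}),
    RetroelementInsertion("RE_002", "L1HS", "chr1", 2_500_000, "intergenic",
                          {"pop_A": 0.05}),
    RetroelementInsertion("RE_003", "AluYa5", "chr7", 3_000_000, "intergenic",
                          {"pop_B": 0.60}),
]

if __name__ == "__main__":
    print(search(RECORDS, chromosome="chr1", population="pop_A", min_freq=0.1))
```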

9.
The Homeodomain Resource is an annotated collection of non-redundant protein sequences, three-dimensional structures and genomic information for the homeodomain protein family. Release 2.0 contains 765 full-length homeodomain-containing sequences, 29 experimentally derived structures and 116 homeobox loci implicated in human genetic disorders. Entries are fully hyperlinked to allow easy retrieval of the original records from source databases. A simple search engine with a graphical user interface is provided to query the component databases and assemble customized data sets. New in this release are more automated methods for database searching and maintenance, and more efficient data management. The Homeodomain Resource is freely available on the WWW (http://genome.nhgri.nih.gov/homeodomain).

10.
Leukemias are exceptionally well studied at the molecular level, and a wealth of high-throughput data has been published. However, further use of these data by researchers is severely hampered by the lack of accessible integrative tools for viewing and analysis. We developed the Leukemia Gene Atlas (LGA) as a public platform to support research on and analysis of the diverse genomic data published in the field of leukemia. For leukemia research, the LGA is a unique resource with comprehensive search and browse functions. It provides extensive analysis and visualization tools for various types of molecular data. Its database currently contains data from more than 5,800 leukemia and hematopoiesis samples generated by microarray gene expression, DNA methylation, SNP and next-generation sequencing analyses. The LGA allows easy retrieval of large published data sets and thus helps to avoid redundant investigations. It is accessible at www.leukemia-gene-atlas.org.

11.
B. Markus, I. Alshafee, O. S. Birk. Heredity, 2014, 112(2):182-189
The Israeli Bedouin population is highly inbred and structured, with a very high prevalence of recessive diseases. Over the past two decades, many studies have focused on linkage analysis in multiple large consanguineous pedigrees from this population. The advent of high-throughput technologies motivated researchers to search for rare variants shared between smaller pedigrees, integrating data from clinically similar yet seemingly unrelated sporadic cases. However, such analyses are challenging because, without pedigree data, there is no prior knowledge of possible relatedness between the sporadic cases. Here, we describe models and techniques for studying relationships between pedigrees and use them to infer tribal co-ancestry, delineating the complex social interactions between different tribes of the Negev Bedouins of southern Israel. Through our analysis, we differentiate between tribes that share many small genomic segments because of co-ancestry and tribes that share larger segments because of recent admixture. The emerging pattern correlates well with the prevalence of rare mutations in the different tribes. Tribes that do not intermarry, mostly because of social restrictions, hold private mutations, whereas tribes that do intermarry show a flow of mutations between them. Thus, the social structure within an inbred community can be delineated through genomic data, with implications for genetic counseling and genetic mapping.
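The distinction between many short shared segments and fewer long ones can be illustrated by summarizing shared-segment lengths for each pair of tribes. The rationale is that segments inherited from a more recent common ancestor have been broken up by fewer meioses, so they tend to be longer. This Python sketch is hypothetical and does not reproduce the study's models. Both the input format and the cutoffs (`long_cutoff_cm`, `min_count`) are illustrative assumptions, not values from the study.

```python
from collections import defaultdict
from statistics import mean


def summarize_sharing(segments):
    """Group shared segments by unordered tribe pair.

    `segments` is an iterable of (tribe_a, tribe_b, length_cM) tuples, one per
    segment shared between two individuals from the named tribes (hypothetical format).
    Returns {(tribe_x, tribe_y): (segment_count, mean_length_cM)}."""
    by_pair = defaultdict(list)
    for tribe_a, tribe_b, length_cm in segments:
        by_pair[tuple(sorted((tribe_a, tribe_b)))].append(length_cm)
    return {pair: (len(lengths), mean(lengths)) for pair, lengths in by_pair.items()}


def classify(count, mean_length_cm, long_cutoff_cm=10.0, min_count=3):
    """Toy rule with illustrative cutoffs: long segments suggest recent admixture,
    while many short ones suggest older shared ancestry."""
    if mean_length_cm >= long_cutoff_cm:
        return "recent admixture"
    if count >= min_count:
        return "distant co-ancestry"
    return "little sharing"


if __name__ == "__main__":
    demo = [("T1", "T2", 2.0), ("T2", "T1", 3.0), ("T1", "T2", 1.0),
            ("T1", "T3", 25.0), ("T3", "T1", 15.0)]
    for pair, (count, mean_len) in summarize_sharing(demo).items():
        print(pair, count, mean_len, classify(count, mean_len))
```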

12.
13.
14.
The ’omics revolution has made a large amount of sequence data available to researchers and industry. This has had a profound impact on the field of bioinformatics, stimulating unprecedented advances in the discipline. This impact is usually viewed from the perspective of human ’omics, in particular human genomics. However, plant and animal genomics have also been deeply influenced by next-generation sequencing technologies, and several genomics applications are now popular among researchers and the breeding industry. Genomics tends to generate huge amounts of data, and genomic sequence data account for an increasing proportion of big data in the biological sciences. This is due largely to decreasing sequencing and genotyping costs and to large-scale sequencing and resequencing projects. The analysis of big data poses a challenge to scientists. Data are currently gathered faster than they can be processed and analysed, and the associated computational burden is increasingly taxing, making even simple manipulation, visualization and transfer of data cumbersome. The time consumed by processing and analysing huge data sets may come at the expense of data quality assessment and critical interpretation. Additionally, when large amounts of data are analysed, something is likely to go awry (the software may crash or stop), and tracking down the error can be very frustrating. We review the most relevant issues related to tackling these challenges and problems from the perspective of animal genomics. We also provide researchers who lack extensive computing experience with guidelines for processing large genomic data sets.
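One common guideline for large sequence files is to stream them rather than load them whole. The sketch below is a hypothetical illustration, not code from the review. It reads FASTA records one at a time from any text handle and computes the GC content of each record. For gzip-compressed files, the same functions can be given the handle returned by `gzip.open(path, "rt")`.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple]:
    """Yield (header, sequence) one record at a time.

    Memory use scales with the largest single record, not with the whole file.
    Sequence lines appearing before the first header are ignored."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(handle: TextIO) -> Iterator[tuple]:
    """Yield (header, GC fraction) per record without holding the file in memory."""
    for header, seq in read_fasta(handle):
        seq = seq.upper()
        gc = seq.count("G") + seq.count("C")
        yield header, (gc / len(seq) if seq else 0.0)


if __name__ == "__main__":
    demo = io.StringIO(">seq1\nACGT\nGG\n>seq2\nATAT\n")
    for header, fraction in gc_content(demo):
        print(header, round(fraction, 3))
```

Because both functions are generators, a crash partway through a large file can be localized by noting which header was last processed.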

15.
Many researchers have questioned the ability of biota to adapt to rapid anthropogenic environmental shifts. Here, we synthesize emerging genomic evidence for rapid insect evolution in response to human pressure. These new data reveal diverse genomic mechanisms (single-locus, polygenic and structural shifts, and introgression) underpinning rapid adaptive responses to a variety of anthropogenic selective pressures. While the effects of some human impacts (e.g. pollution and pesticides) have been documented previously, here we highlight startling new evidence for rapid evolutionary responses to additional anthropogenic processes such as deforestation. These recent findings indicate that diverse insect assemblages can indeed respond dynamically to major anthropogenic evolutionary challenges. Our synthesis also emphasizes the critical roles of genomic architecture, standing variation and gene flow in maintaining future adaptive potential. Broadly, genomic approaches are clearly essential for predicting, monitoring and responding to ongoing anthropogenic biodiversity shifts in a fast-changing world.

16.

Background

Suffix arrays and their variants are used widely for representing genomes in search applications. Enhanced suffix arrays (ESAs) provide fast search speed, but require large auxiliary data structures for storing longest common prefix and child interval information. We explore techniques for compressing ESAs to accelerate genomic search and reduce memory requirements.
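For readers unfamiliar with the underlying structure, the basic suffix-array search that ESAs accelerate can be sketched as follows. This minimal Python example has three parts:

- a naive construction step, which is far too slow for a real genome;
- a binary search for the interval of suffixes that start with a pattern;
- Kasai's algorithm for the longest common prefix (LCP) array.

It omits the child-interval table and is not the authors' implementation.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive construction by sorting suffixes; fine for a demo, not for a genome."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Binary-search the suffix array for all suffixes starting with `pattern`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound: first suffix whose m-prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # upper bound: first suffix whose m-prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def lcp_array(text: str, sa: list[int]) -> list[int]:
    """Kasai et al.: lcp[i] = LCP of suffixes sa[i-1] and sa[i]; lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for i, s in enumerate(sa):
        rank[s] = i
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h-1 characters
        else:
            h = 0
    return lcp


if __name__ == "__main__":
    genome = "ACGTACGTTACG"
    sa = build_suffix_array(genome)
    print(find_occurrences(genome, sa, "ACG"))
    print(lcp_array(genome, sa))
```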

Results

We evaluate various bitpacking techniques that store integers in fewer than 32 bits each, as well as bytecoding methods that reserve a single byte per integer whenever possible. Our results on the fly, chicken, and human genomes show that bytecoding with an exception guide array is the fastest method for retrieving auxiliary information. Genomic searching can be further accelerated using a data structure called a discriminating character array, which reduces memory accesses to the suffix array and the genome string. Finally, integrating storage of the auxiliary and discriminating character arrays further speeds up genomic search.
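The bytecoding idea can be sketched in Python for an LCP-style integer array. Values below 255 take one byte, 255 acts as an escape code, and values of 255 or more go to a separate exception list. The guide-array layout here is an assumption about how such a guide might work, not the paper's exact design: it stores the number of exceptions preceding each fixed-size block, so a lookup scans only within one block. `block_size` is a hypothetical parameter.

```python
ESCAPE = 255


class ByteCodedArray:
    """One byte per non-negative value when it fits; larger values go to an exception list.

    guide[b] holds the number of exceptions stored before block b, so resolving an
    escaped entry only scans the escape codes inside its own block."""

    def __init__(self, values, block_size=64):
        self.block_size = block_size
        self.codes = bytearray()
        self.exceptions = []
        self.guide = []
        for i, v in enumerate(values):
            if i % block_size == 0:
                self.guide.append(len(self.exceptions))
            if v < ESCAPE:
                self.codes.append(v)
            else:
                self.codes.append(ESCAPE)
                self.exceptions.append(v)

    def __len__(self):
        return len(self.codes)

    def __getitem__(self, i):
        """Return the i-th value (non-negative indices only in this sketch)."""
        code = self.codes[i]
        if code != ESCAPE:
            return code
        block_start = i - i % self.block_size
        # Exceptions before this block, plus escaped entries earlier in this block.
        k = self.guide[i // self.block_size] + self.codes.count(ESCAPE, block_start, i)
        return self.exceptions[k]


if __name__ == "__main__":
    lcp = [0, 3, 300, 7, 255, 1000, 12]
    arr = ByteCodedArray(lcp, block_size=2)
    print([arr[i] for i in range(len(arr))], arr.exceptions, arr.guide)
```

Smaller blocks make lookups faster at the cost of a larger guide array. This is the same speed/space trade-off the paper measures.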

Conclusions

The combination of exception guide arrays, a discriminating character array, and integrated data storage provides a 2- to 3-fold increase in genomic search speed compared with bytecoding alone, and is 20% faster and 40% more space-efficient than an uncompressed ESA.

17.
18.
The recognition of immune epitopes is an important molecular mechanism by which the vertebrate immune system discriminates between self and non-self. Increasing amounts of data on immune epitopes are becoming available, owing to technological advances in epitope-mapping techniques and the availability of genomic information for pathogens. Organizing these data poses a challenge similar to the one successfully met for genomic data. That effort required the establishment of centralized databases that complement the primary literature and make the data readily accessible and searchable by researchers. As described in this Innovation article, the Immune Epitope Database and Analysis Resource aims to achieve the same for the more complex and context-dependent information on immune epitopes, and to integrate these data with existing and emerging knowledge resources.

19.
Y. Joly, N. Zeps, B. M. Knoppers. Human Genetics, 2011, 130(3):441-449
Large-scale public genomic databases have greatly improved researchers' capacity to do genomic research. To ensure that the scientific community uses data from these public resources properly, data access agreements have been developed to complement existing legal and ethical norms. Sanctions addressing cases of data misuse are an essential part of this compliance framework, which is meant to protect stakeholders in genomic research. Yet very little research or community debate has addressed this important topic. This paper reviews the different sanctions that could be invoked in cases of non-compliance by data users. They were identified through comprehensive research and analysis of over 450 documents (journal articles, policies, guidelines, access policies, etc.) related to this topic. Even the milder sanctions considered in our paper can have a considerable impact on users. It is therefore essential that stakeholders strive for the highest degree of standardization and transparency when designing controlled-access agreements. After all, it is only fair that users be able to expect the border between acceptable and unacceptable conduct to be clearly delineated and predictable in controlled-access policies. This underscores the importance of researchers undertaking additional empirical studies on the clarity and accessibility of existing database access agreements and related policies in the near future.

20.
A long-standing question in biology is how organisms change through time and space in response to their environment. This knowledge is particularly relevant to predicting how organisms might respond to future environmental changes caused by human-induced global change. Researchers usually make inferences about past events based on an understanding of current, static genetic patterns, but such inferences are limited in their capacity to inform on underlying past processes. Natural history collections (NHCs) are a unique and critical source of information, providing temporally deep and spatially broad time series of samples. By using NHC samples, researchers can directly observe genetic changes over time and space and link those changes to specific ecological or evolutionary events. Until recently, such genetic studies were hindered by the intrinsic challenges of NHC samples (i.e. low yields of highly fragmented DNA). However, recent methodological and technological developments have revolutionized the possibilities in the novel field of NHC genomics. In this Special Feature, we compile a range of studies, from methodological work to particular case studies. Together, they demonstrate the enormous potential of NHC samples for accessing large genomic data sets from the past, to advance our knowledge of how populations and species respond to global change at multiple spatial–temporal scales. We also highlight possible limitations, recommendations and a few opportunities for future researchers aiming to study NHC genomics.
