Similar articles
 20 similar articles found (search time: 31 ms)
1.
2.

Background  

Regions of protein sequences with biased amino acid composition (so-called Low-Complexity Regions (LCRs)) are abundant in the protein universe. A number of studies have revealed that i) these regions show significant divergence across protein families; ii) the genetic mechanisms from which they arise lend them remarkable degrees of compositional plasticity. They have therefore proved difficult to compare using conventional sequence analysis techniques, and functions remain to be elucidated for most of them. Here we undertake a systematic investigation of LCRs in order to explore their possible functional significance, placed in the particular context of Protein-Protein Interaction (PPI) networks and Gene Ontology (GO)-term analysis.
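As a rough illustration of what "biased amino acid composition" means in practice, the sketch below flags windows of a protein sequence whose Shannon entropy falls under a threshold. This is not the method used in the study above; the function names, the window size of 12, and the 2.2-bit cutoff are all assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, max_entropy: float = 2.2):
    """Return merged half-open (start, end) spans whose windows have low entropy.

    The window size and entropy cutoff are illustrative assumptions,
    not values taken from any published tool.
    """
    spans = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= max_entropy:
            if spans and i <= spans[-1][1]:
                # Overlaps or touches the previous span: extend it.
                spans[-1] = (spans[-1][0], i + window)
            else:
                spans.append((i, i + window))
    return spans
```

A homopolymeric stretch has entropy 0 and is always flagged, whereas a window of 12 distinct residues has entropy log2(12), about 3.58 bits, and is not.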

3.

Background  

Biological sequence repeats arranged in tandem patterns are widespread in DNA and proteins. While many software tools have been designed to detect DNA tandem repeats (TRs), useful algorithms for identifying protein TRs with varied levels of degeneracy are still needed.
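For orientation, the simplest form of the problem, finding exact tandem repeats, can be sketched as below. It deliberately ignores degeneracy, which is the hard case the abstract refers to, and the function name and default parameters are assumptions made for this example.

```python
def find_tandem_repeats(seq: str, min_unit: int = 1, max_unit: int = 6,
                        min_copies: int = 3):
    """Return (start, unit, copies) for exact tandem repeats, left to right.

    Only perfect copies are counted; degenerate repeats would need a
    mismatch-tolerant comparison, which is not attempted here.
    """
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # ran off the end of the sequence
            copies = 1
            while seq[i + copies * k: i + (copies + 1) * k] == unit:
                copies += 1
            # Keep the candidate that covers the longest stretch.
            if copies >= min_copies and (
                best is None or copies * k > best[1] * len(best[0])
            ):
                best = (unit, copies)
        if best:
            hits.append((i, best[0], best[1]))
            i += len(best[0]) * best[1]  # skip past the reported repeat
        else:
            i += 1
    return hits
```

For example, `find_tandem_repeats("AGPQPQPQPQTL")` reports a single hit: the unit `PQ` repeated four times starting at index 2.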

4.
5.

Background  

Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch-like pairwise alignment methods. We introduce a new tool, Sigma, with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. This problem acquires importance with the increasing number of published sequences of closely-related species. In particular, studies of gene regulation seek to take advantage of comparative genomics, and recent algorithms for finding regulatory sites in phylogenetically-related intergenic sequence require alignment as a preprocessing step. Much can also be learned about evolution from intergenic DNA, which tends to evolve faster than coding DNA. Sigma uses a strategy of seeking the best possible gapless local alignments (a strategy earlier used by DiAlign), at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA.
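The basic building block mentioned above, a best gapless local alignment between two sequences, can be sketched as follows. This is not Sigma's algorithm or scoring scheme: the simple match/mismatch scores stand in for its length- and background-based significance score, and the function name and parameters are assumptions for illustration only.

```python
def best_gapless_local_alignment(a: str, b: str, match: int = 1,
                                 mismatch: int = -2):
    """Return (score, start_a, start_b, length) of the best ungapped segment pair.

    Every diagonal of the comparison matrix is scanned once, keeping a
    running score that is reset to zero whenever it becomes non-positive.
    The match/mismatch scores are placeholder assumptions.
    """
    best = (0, 0, 0, 0)
    for offset in range(-(len(b) - 1), len(a)):
        # Starting cell of this diagonal.
        i, j = (offset, 0) if offset >= 0 else (0, -offset)
        run_score, run_start = 0, (i, j)
        while i < len(a) and j < len(b):
            if run_score == 0:
                run_start = (i, j)  # a new candidate segment begins here
            run_score += match if a[i] == b[j] else mismatch
            if run_score <= 0:
                run_score = 0
            elif run_score > best[0]:
                best = (run_score, run_start[0], run_start[1],
                        i - run_start[0] + 1)
            i += 1
            j += 1
    return best
```

A progressive scheme in the spirit of the description would call such a routine repeatedly, accepting only fragments that do not conflict with those already fixed; that consistency check is left out of this sketch.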

6.
7.
Molecular studies of unstable regions in the human genome have identified region-specific low-copy repeats (LCRs). Unlike highly repetitive sequences (e.g. Alus and LINEs), LCRs are usually 10-400 kb in size and exhibit ≥95-97% similarity. According to computer analyses of available sequencing data, LCRs may constitute >5% of the human genome. Through the process of non-allelic homologous recombination using paralogous genomic segments as substrates, LCRs have been shown to facilitate meiotic DNA rearrangements associated with disease traits, referred to as genomic disorders. In addition, this LCR-based complex genome architecture appears to play a major role in both primate karyotype evolution and human tumorigenesis.

8.
Genome architecture, rearrangements and genomic disorders   Cited by: 35 (self-citations: 0, citations by others: 35)
An increasing number of human diseases are recognized to result from recurrent DNA rearrangements involving unstable genomic regions. These are termed genomic disorders, in which the clinical phenotype is a consequence of abnormal dosage of gene(s) located within the rearranged genomic fragments. Both inter- and intrachromosomal rearrangements are facilitated by the presence of region-specific low-copy repeats (LCRs) and result from nonallelic homologous recombination (NAHR) between paralogous genomic segments. LCRs usually span approximately 10-400 kb of genomic DNA, share ≥97% sequence identity, and provide the substrates for homologous recombination, thus predisposing the region to rearrangements. Moreover, it has been suggested that higher order genomic architecture involving LCRs plays a significant role in karyotypic evolution accompanying primate speciation.

9.
10.

Background  

Deluged by the rate and complexity of completed genomic sequences, the need to align longer sequences becomes more urgent, and many more tools have thus been developed. In the initial stage of genomic sequence analysis, a biologist is usually faced with the questions of how to choose the best tool to align sequences of interest and how to analyze and visualize the alignment results, and then with the question of whether poorly aligned regions produced by the tool are indeed not homologous or are just results due to inappropriate alignment tools or scoring systems used. Although several systematic evaluations of multiple sequence alignment (MSA) programs have been proposed, they may not provide a standard-bearer for most biologists because those poorly aligned regions in these evaluations are never discussed. Thus, a tool that allows cross comparison of the alignment results obtained by different tools simultaneously could help a biologist evaluate their correctness and accuracy.

11.
High-throughput sequence alignment using Graphics Processing Units   Cited by: 1 (self-citations: 0, citations by others: 1)

Background  

The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies.

12.

Background  

Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high performance computation becomes necessary.

13.

Background  

Improvements in DNA sequencing technology and methodology have led to the rapid expansion of databases comprising DNA sequence, gene and genome data. Lower operational costs and heightened interest resulting from initial intriguing novel discoveries from genomics are also contributing to the accumulation of these data sets. A major challenge is to analyze and to mine data from these databases, especially whole genomes. There is a need for computational tools that look globally at genomes for data mining.

14.
Spontaneous, submembrane local Ca2+ releases (LCRs) generated by the sarcoplasmic reticulum in sinoatrial nodal cells, the cells of the primary cardiac pacemaker, activate inward Na+/Ca2+-exchange current to accelerate the diastolic depolarization rate, and therefore to impact on cycle length. Since LCRs are generated by Ca2+ release channel (i.e. ryanodine receptor) openings, they exhibit a degree of stochastic behavior, manifested as notable cycle-to-cycle variations in the time of their occurrence.

Aim

The present study tested whether variation in LCR periodicity contributes to intrinsic (beat-to-beat) cycle length variability in single sinoatrial nodal cells.

Methods

We imaged single rabbit sinoatrial nodal cells using a 2D-camera to capture LCRs over the entire cell, and, in selected cells, simultaneously measured action potentials by perforated patch clamp.

Results

LCRs begin to occur on the descending part of the action potential-induced whole-cell Ca2+ transient, at about the time of the maximum diastolic potential. Shortly after the maximum diastolic potential (mean 54±7.7 ms, n = 14), the ensemble of waxing LCR activity converts the decay of the global Ca2+ transient into a rise, resulting in a late, whole-cell diastolic Ca2+ elevation, accompanied by a notable acceleration in diastolic depolarization rate. On average, cells (n = 9) generate 13.2±3.7 LCRs per cycle (mean±SEM), varying in size (7.1±4.2 µm) and duration (44.2±27.1 ms), with both size and duration being greater for later-occurring LCRs. While the timing of each LCR occurrence also varies, the LCR period (i.e. the time from the preceding Ca2+ transient peak to an LCR’s subsequent occurrence) averaged for all LCRs in a given cycle closely predicts the time of occurrence of the next action potential, i.e. the cycle length.

Conclusion

Intrinsic cycle length variability in single sinoatrial nodal cells is linked to beat-to-beat variations in the average period of individual LCRs each cycle.

15.

Background  

SARS coronavirus (SARS-CoV) was identified as the etiological agent of SARS, and extensive investigations indicated that it originated from an animal source (probably bats) and was recently introduced into the human population via wildlife animals from wet markets in southern China. Previous studies revealed that the spike (S) protein of SARS had experienced adaptive evolution, but whether other functional proteins of SARS have undergone adaptive evolution is not known.

16.

Background  

The multitude of motif detection algorithms developed to date have largely focused on the detection of patterns in primary sequence. Since sequence-dependent DNA structure and flexibility may also play a role in protein-DNA interactions, the simultaneous exploration of sequence- and structure-based hypotheses about the composition of binding sites and the ordering of features in a regulatory region should be considered as well. The consideration of structural features requires the development of new detection tools that can deal with data types other than primary sequence.

17.

Background  

In order to take full advantage of the newly available public human genome sequence data and associated annotations, biologists require visualization tools ("genome browsers") that can accommodate the high frequency of alternative splicing in human genes and other complexities.

18.

Background  

New "next generation" DNA sequencing technologies offer individual researchers the ability to rapidly generate large amounts of genome sequence data at dramatically reduced costs. As a result, a need has arisen for new software tools for storage, management and analysis of genome sequence data. Although bioinformatic tools are available for the analysis and management of genome sequences, limitations still remain. For example, restrictions on the submission of data and use of these tools may be imposed, thereby making them unsuitable for sequencing projects that need to remain in-house or proprietary during their initial stages. Furthermore, the availability and use of next generation sequencing in industrial, governmental and academic environments requires biologists to have access to computational support for the curation and analysis of the data generated; however, this type of support is not always immediately available.

19.
Hidden Markov model speed heuristic and iterative HMM search procedure   Cited by: 1 (self-citations: 0, citations by others: 1)

Background  

Profile hidden Markov models (profile-HMMs) are sensitive tools for remote protein homology detection, but the main scoring algorithms, Viterbi or Forward, require considerable time to search large sequence databases.

20.

Background

Generation of long (>5 Kb) DNA sequencing reads provides an approach for interrogation of complex regions in the human genome. Currently, large-insert whole genome sequencing (WGS) technologies from Pacific Biosciences (PacBio) enable analysis of chromosomal structural variations (SVs), but the cost to achieve the required sequence coverage across the entire human genome is high.

Results

We developed a method (termed PacBio-LITS) that combines oligonucleotide-based DNA target-capture enrichment technologies with PacBio large-insert library preparation to facilitate SV studies at specific chromosomal regions. PacBio-LITS provides deep sequence coverage at the specified sites at substantially reduced cost compared with PacBio WGS. The efficacy of PacBio-LITS is illustrated by delineating the breakpoint junctions of low copy repeat (LCR)-associated complex structural rearrangements on chr17p11.2 in patients diagnosed with Potocki–Lupski syndrome (PTLS; MIM#610883). We successfully identified previously determined breakpoint junctions in three PTLS cases, and also were able to discover novel junctions in repetitive sequences, including LCR-mediated breakpoints. The new information has enabled us to propose mechanisms for formation of these structural variants.

Conclusions

The new method leverages the cost efficiency of targeted capture-sequencing as well as the mappability and scaffolding capabilities of long sequencing reads generated by the PacBio platform. It is therefore suitable for studying complex SVs, especially those involving LCRs, inversions, and the generation of chimeric Alu elements at the breakpoints. Other genomic research applications, such as haplotype phasing and small insertion and deletion validation could also benefit from this technology.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1370-2) contains supplementary material, which is available to authorized users.
