News items on 'Bioinformatics'

28 January 2022
De Novo Discovery of Structured ncRNA Motifs in Genomic Sequences

De novo discovery of “motifs” capturing the commonalities among related structured noncoding RNAs (ncRNAs) is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphasis on an approach based on the CMfinder program as a case study. Applications to genomic screens for novel structured ncRNAs, including structured RNA elements in untranslated portions of protein-coding genes, are presented. (Source: Springer protocols feed by Bioinformatics)

28 January 2022
Systems Medicine: Sketching the Landscape

To understand the meaning of the term Systems Medicine and to distinguish it from seemingly related expressions currently in use, such as precision, personalized, -omics, or big data medicine, its history and development up to the present need to be highlighted. With this development in mind, it becomes evident that Systems Medicine is a genuine concept as well as a novel way of tackling the manifold complexity of present-day clinical medicine, and not just a rebranding of earlier work. Looking back, it seems clear to many in the field that Systems Medicine has its origin in an integrative method to unravel biocomplexity, namely, Systems Biology. Here, scientists have by now gained useful experience that is on the verge toward imple...

28 January 2022
Epistasis, Complexity, and Multifactor Dimensionality Reduction

Genome-wide association studies (GWASs) and other high-throughput initiatives have led to an information explosion in human genetics and genetic epidemiology. Conversion of this wealth of new information about genomic variation to knowledge about public health and human biology will depend critically on the complexity of the genotype-to-phenotype mapping. We review here computational approaches to genetic analysis that embrace, rather than ignore, the complexity of human health. We focus on multifactor dimensionality reduction (MDR) as an approach for modeling one of these complexities: epistasis or gene–gene interaction. (Source: Springer protocols feed by Bioinformatics)
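
MDR's core step pools multilocus genotype combinations into a single high-risk/low-risk attribute. The Python sketch below illustrates that step for one pair of SNPs. It is a minimal illustration that assumes several things: genotypes are coded 0/1/2, status is 1 for cases and 0 for controls, and the function names are hypothetical. The cross-validation and the exhaustive search over SNP pairs that a full MDR analysis performs are omitted.

```python
from collections import Counter
from itertools import product

def mdr_two_locus(snp_a, snp_b, status):
    """Label each two-SNP genotype cell as high-risk (sketch of the MDR pooling step).

    A cell is high-risk when its case:control ratio exceeds the overall
    case:control ratio of the sample; all other non-empty cells are low-risk.
    """
    cases, controls = Counter(), Counter()
    for a, b, s in zip(snp_a, snp_b, status):
        if s == 1:
            cases[(a, b)] += 1
        else:
            controls[(a, b)] += 1
    threshold = sum(cases.values()) / sum(controls.values())
    high_risk = set()
    for cell in product(range(3), repeat=2):  # 3 x 3 genotype combinations
        n_case, n_control = cases[cell], controls[cell]
        if n_case + n_control == 0:
            continue  # empty cell: left unclassified
        if n_control == 0 or n_case / n_control > threshold:
            high_risk.add(cell)
    return high_risk

def balanced_accuracy(snp_a, snp_b, status, high_risk):
    """Score the pooled high/low-risk attribute as a case/control classifier."""
    tp = fn = tn = fp = 0
    for a, b, s in zip(snp_a, snp_b, status):
        predicted = 1 if (a, b) in high_risk else 0
        if s == 1:
            tp += predicted
            fn += 1 - predicted
        else:
            fp += predicted
            tn += 1 - predicted
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))
```

Reducing the two-locus genotype to one binary attribute is what lets a simple classifier capture an interaction that neither SNP shows strongly on its own.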

28 January 2022
Taking Bioinformatics to Systems Medicine

Systems medicine promotes a range of approaches and strategies to study human health and disease at a systems level with the aim of improving the overall well-being of (healthy) individuals, and preventing, diagnosing, or curing disease. In this chapter we discuss how bioinformatics critically contributes to systems medicine. First, we explain the role of bioinformatics in the management and analysis of data. In particular we show the importance of publicly available biological and clinical repositories to support systems medicine studies. Second, we discuss how the integration and analysis of multiple types of omics data through integrative bioinformatics may facilitate the determination of more predictive and robust disease signatures, lead to a better understanding of (patho)physiologic...

28 January 2022
The Principles of RNA Structure Architecture

Being informational, enzymatic, and a nanoscale molecular machine, ribonucleic acid (RNA) permeates all areas of biology and has been exploited in biotechnology as a drug and as a sensor. Here we describe the composition and fundamental properties of RNA and how single-stranded RNA chains fold into certain motifs that are repeatedly observed in different structures. Small and large molecular mass RNA binders are touched upon, as is the technology for selecting RNA molecules in vitro that bind almost any kind of natural or artificial target. Recognizing the versatility of RNA is expected to foster the development of tools that monitor RNA in the environment, including plants, animals, and patients. Many of the noncoding RNAs are yet to be identified in the rapidly emerging g...

28 January 2022
Systems Medicine: The Future of Medical Genomics, Healthcare, and Wellness

Recent advances in genomics have led to the rapid and relatively inexpensive collection of patient molecular data, including multiple types of omics data. The integration of these data with clinical measurements has the potential to impact our understanding of the molecular basis of disease and of disease management. Systems medicine is an approach to understanding disease through the integration of large patient datasets. It offers the possibility of personalized healthcare strategies through the development of a new taxonomy of disease. Advanced computing will be an important component in effectively implementing systems medicine. In this chapter we describe three computational challenges associated with systems medicine: disease subtype discovery using integrated datasets, obtaini...

28 January 2022
Text Mining for Drug–Drug Interaction

To understand the mechanisms of drug–drug interaction (DDI), the study of pharmacokinetics (PK), pharmacodynamics (PD), and pharmacogenetics (PG) data is important. In recent years, drug PK parameters, drug interaction parameters, and PG data have been unevenly collected in different databases and published extensively in the literature. In addition, the lack of an appropriate PK ontology and a well-annotated PK corpus, which would provide the background knowledge and the criteria for determining DDI, respectively, makes it difficult to develop DDI text mining tools for collecting PK data from the literature and for integrating data from multiple databases. (Source: Springer protocols feed by Bioinformatics)

28 January 2022
Next-Generation Pathology

The field of pathology is rapidly transforming from a semiquantitative and empirical science toward a big data discipline. Large data sets from across multiple omics fields may now be extracted from a patient’s tissue sample. Tissue is, however, complex, heterogeneous, and prone to artifact. A reductionist view of tissue and disease progression, which does not take this complexity into account, may lead to single biomarkers failing in clinical trials. The integration of standardized multi-omics big data and the retention of valuable information on spatial heterogeneity are imperative to model complex disease mechanisms. Mathematical modeling through systems pathology approaches is the ideal medium to distill the significant information from these large, multi-parametric, and hierarch...

28 January 2022
MicroRNA Target Finding by Comparative Genomics

MicroRNAs (miRNAs) have been implicated in virtually every metazoan biological process, exerting a widespread impact on gene expression. MicroRNA repression is conferred by relatively short “seed match” sequences, although the degree of repression varies widely for individual target sites. The factors controlling whether, and to what extent, a target site is repressed are not fully understood. As an alternative to target prediction based on sequence alone, comparative genomics has emerged as an invaluable tool for identifying miRNA targets that are conserved by natural selection, and hence likely effective and important. Here we present a general method for quantifying conservation of miRNA seed match sites, separating it from background conservation, controlling for various bi...
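
The "seed match" idea can be made concrete with a short scan. The seed is miRNA nucleotides 2–8, and a target site is the reverse complement of the seed in the mRNA. The Python sketch below finds such sites in a 3′ UTR and uses the common 7mer-m8/8mer naming: an 8mer site is one that also has an A opposite miRNA position 1. It is a sequence-only sketch with hypothetical function names. It does not implement the conservation scoring the chapter describes.

```python
def reverse_complement_rna(seq):
    """Reverse complement of an RNA string."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))

def find_seed_matches(mirna, utr):
    """Return (position, site_type) for 8mer and 7mer-m8 seed matches in utr.

    Both sequences are RNA written 5'->3'. The site is the reverse complement
    of miRNA nucleotides 2-8; the UTR base just 3' of the site faces miRNA
    position 1, and an A there makes the site an 8mer.
    """
    seed = mirna[1:8]  # nucleotides 2-8 (1-based)
    site = reverse_complement_rna(seed)
    hits = []
    start = utr.find(site)
    while start != -1:
        end = start + len(site)
        site_type = "8mer" if end < len(utr) and utr[end] == "A" else "7mer-m8"
        hits.append((start, site_type))
        start = utr.find(site, start + 1)
    return hits
```

For example, the let-7a seed GAGGUAG gives the site CUACCUC. A conservation-based method would go on to ask how often such sites are preserved across species compared with control sequences.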

28 January 2022
Systems Medicine in Pharmaceutical Research and Development

The development of new drug therapies requires substantial and ever-increasing investments from pharmaceutical companies. Ten years ago, the average time from early target identification and optimization until initial market authorization of a new drug compound was more than 10 years, with costs on the order of one billion US dollars. Recent studies indicate that costs have grown significantly since then, mainly driven by the increasing complexity of the diseases addressed by pharmaceutical research. (Source: Springer protocols feed by Bioinformatics)

28 January 2022
Use of Ancestral Haplotypes in Genome-Wide Association Studies

We herein present a haplotype-based method to perform genome-wide association studies. The method relies on hidden Markov models to describe haplotypes from a population as a mosaic of a set of ancestral haplotypes. At a given position in the genome, haplotypes derived from the same ancestral haplotype are also likely to carry the same risk alleles. Therefore, the model can be used in several applications, such as haplotype reconstruction, imputation, association studies, or genomic prediction. We then illustrate the model with two applications: the fine-mapping of a QTL affecting live weight in cattle and association studies in a stratified cattle population. Both applications show the potential of the method and the high linkage disequilibrium between ancestral haplotypes and causative ...
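
The mosaic idea translates into a hidden Markov model with these parts:

- The hidden states are the K ancestral haplotypes.
- Each ancestral haplotype emits an allele at each marker.
- Between adjacent markers, the chain either stays on its current ancestral haplotype or jumps to a new one.

The Python sketch below runs the forward-backward algorithm to get, for each marker, the posterior probability of each ancestral state. It makes simplifying assumptions: emission probabilities `theta` and a single `switch_prob` are taken as known rather than estimated (for example by EM), jumps land uniformly on any state, and the function name is hypothetical. It is not the authors' implementation.

```python
import numpy as np

def ancestral_posteriors(haplotype, theta, switch_prob):
    """Posterior P(ancestral state | haplotype) at each marker via forward-backward.

    haplotype: length-M sequence of alleles coded 0/1.
    theta: K x M array, theta[k, m] = P(allele 1 | ancestral haplotype k, marker m).
    switch_prob: probability of jumping to a uniformly chosen state between markers.
    Returns an M x K array whose rows sum to 1.
    """
    hap = np.asarray(haplotype)
    K, M = theta.shape
    emit = np.where(hap[:, None] == 1, theta.T, 1.0 - theta.T)  # M x K
    trans = (1.0 - switch_prob) * np.eye(K) + switch_prob / K  # rows sum to 1
    alpha = np.zeros((M, K))
    beta = np.ones((M, K))
    # Forward pass, rescaled at every marker to avoid numerical underflow.
    alpha[0] = emit[0] / K
    alpha[0] /= alpha[0].sum()
    for m in range(1, M):
        alpha[m] = (alpha[m - 1] @ trans) * emit[m]
        alpha[m] /= alpha[m].sum()
    # Backward pass, also rescaled; per-marker constants cancel below.
    for m in range(M - 2, -1, -1):
        beta[m] = trans @ (emit[m + 1] * beta[m + 1])
        beta[m] /= beta[m].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)
```

In an association study, these per-marker posteriors could serve as covariates in place of single-SNP genotypes. That is the sense in which haplotypes sharing an ancestral state are expected to share risk alleles.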

28 January 2022
Systems Medicine in Oncology: Signaling Network Modeling and New-Generation Decision-Support Systems

Two different perspectives are the main focus of this book chapter: (1) A forward-looking perspective, with the goal of devising rational associations of targeted inhibitors against distinct altered signaling-network pathways. This goal implies a sufficiently in-depth molecular diagnosis of the personal cancer of a given patient. Sufficiently robust and extended dynamic modeling will suggest rational combinations of the abovementioned oncoprotein inhibitors. Work toward new selective drugs in the field of medicinal chemistry is very intensive. Rational associations of selective drug inhibitors will progressively become a more realistic goal within the next 3–5 years. Toward the possibility of an implementation in standard oncologic structures of technologically suffi...

28 January 2022
Genomic Selection in Animal Breeding Programs

Genomic selection can have a major impact on animal breeding programs, especially where traits that are important in the breeding objective are otherwise hard to select for. Genomic selection provides more accurate estimates of breeding values earlier in the life of breeding animals, giving more selection accuracy and allowing shorter generation intervals. From sheep to dairy cattle, rates of genetic improvement could increase by 20 to 100%, and hard-to-measure traits can be improved more effectively. (Source: Springer protocols feed by Bioinformatics)
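
Why earlier, more accurate breeding values speed up improvement follows from the standard expression for annual genetic gain. Here $i$ is selection intensity, $r$ is the accuracy of the estimated breeding values, $\sigma_A$ is the additive genetic standard deviation, and $L$ is the generation interval in years. The numbers in the worked lines are hypothetical and only illustrate the arithmetic.

```latex
\Delta G_{\text{year}} = \frac{i \, r \, \sigma_A}{L}

% Hypothetical illustration with i and \sigma_A held fixed:
% halving L from 5 to 2.5 years:   \Delta G' / \Delta G = 5 / 2.5 = 2     (+100%)
% raising r from 0.5 to 0.6:       \Delta G' / \Delta G = 0.6 / 0.5 = 1.2 (+20%)
```

Both levers that genomic selection acts on (higher $r$ and smaller $L$) appear in this ratio, so their gains multiply.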

28 January 2022
Neurological Diseases from a Systems Medicine Point of View

The difficulty of understanding, diagnosing, and treating neurological disorders stems from the great complexity of the central nervous system at different levels of physiological granularity. The individual components, their interactions, and the dynamics involved in brain development and function can be represented as molecular, cellular, or functional networks, in which diseases are perturbations of networks. These networks can become a useful research tool in investigating neurological disorders if they are properly tailored to reflect the corresponding mechanisms. Here, we review approaches to constructing networks specific to neurological disorders that describe disease-related pathology on different scales: the molecular, cellular, and brain level. We also briefly discuss cross-scale network analysis as a ...

28 January 2022
Computational Prediction of RNA–RNA Interactions

We describe different tools and approaches for RNA–RNA interaction prediction. Recognition of ncRNA targets is predominantly governed by two principles, namely the stability of the duplex between the two interacting RNAs and the internal structure of both the mRNA and the ncRNA. Approaches can therefore be divided into major categories depending on how they consider inter- and intramolecular structure. The first class completely neglects internal structure and measures only the stability of the duplex. The second class abstracts from specific intramolecular structures and uses an ensemble-based approach to calculate the effect of internal structure on a putative binding site, thus measuring the accessibility of the binding sites. (Source: Springer protocols feed...
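
The first class of methods, which ignores intramolecular structure and looks only at the duplex, can be sketched as follows. This version finds the longest run of consecutive Watson–Crick or G·U pairs between two antiparallel strands. That run length is a crude, assumed proxy for duplex stability. Real tools of this class use nearest-neighbor energy models and allow bulges and interior loops, neither of which is modeled here. The function name is hypothetical.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def best_duplex(rna1, rna2):
    """Longest run of consecutive base pairs between rna1 and rna2 (both 5'->3').

    Strands pair antiparallel: rna1[i] with rna2[j], rna1[i+1] with rna2[j-1], ...
    Intramolecular structure is ignored (first-class approach).
    Returns (run_length, i1, j2), where rna1[i1] pairs with rna2[j2] at the
    start of the run, or (0, -1, -1) if no pair is possible.
    """
    n, m = len(rna1), len(rna2)
    # run[i][j]: length of the pairing run ending with rna1[i-1] paired to rna2[j].
    run = [[0] * (m + 1) for _ in range(n + 1)]
    best = (0, -1, -1)
    for i in range(1, n + 1):
        for j in range(m - 1, -1, -1):
            if (rna1[i - 1], rna2[j]) in PAIRS:
                run[i][j] = run[i - 1][j + 1] + 1
                if run[i][j] > best[0]:
                    length = run[i][j]
                    best = (length, i - length, j + length - 1)
    return best
```

The dynamic program mirrors longest-common-substring search, except that it moves forward on one strand and backward on the other.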

28 January 2022
Systems Medicine and Infection

By using a systems-based approach, mathematical and computational techniques can be used to develop models that describe the important mechanisms involved in infectious diseases. An iterative approach to model development allows new discoveries to continually improve the model and ultimately increase the accuracy of predictions. (Source: Springer protocols feed by Bioinformatics)
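
As a minimal example of a mechanistic model of infection, the classic SIR compartment model can be integrated numerically. The choice of SIR, the forward-Euler scheme and all parameter values are assumptions made for illustration. The abstract names no specific model. In an iterative workflow, the parameters `beta` and `gamma` would be re-estimated as new data arrive, and the model would then be rerun.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Forward-Euler integration of the SIR model on population fractions.

    dS/dt = -beta*S*I,  dI/dt = beta*S*I - gamma*I,  dR/dt = gamma*I.
    Returns a list of (time, S, I, R) tuples.
    """
    s, i, r = s0, i0, r0
    trajectory = [(0.0, s, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((step * dt, s, i, r))
    return trajectory
```

For example, `simulate_sir(0.3, 0.1, 0.999, 0.001, 0.0, days=300)` (a hypothetical basic reproduction number of 3) ends with most of the population in R. Each Euler step moves individuals between compartments, so the total S + I + R is conserved.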

28 January 2022
Abstract Shape Analysis of RNA

Abstract shape analysis is a method to learn more about the complete Boltzmann ensemble of the secondary structures of a single RNA molecule. Abstract shapes classify competing secondary structures into classes that are defined by their arrangement of helices. This allows us to compute, in addition to the minimum free energy structure, a set of structures that represents relevant and interesting structural alternatives. Furthermore, it allows us to compute the probabilities of all structures within a shape class. This makes it possible to ensure that our representative subset covers the complete Boltzmann ensemble, except for a portion of negligible probability. This chapter explains the main functions of abstract shape analysis, as implemented in the tool RNA shape...
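
Mapping a structure to its shape class is what makes the classification possible. The Python sketch below computes the most abstract kind of shape from a dot-bracket string:

- Unpaired bases are dropped.
- A base pair whose only inner pair is another pair is merged with it, so each helix collapses to `[]`, including any bulges or interior loops inside it.

This roughly corresponds to the most abstract shape level in RNAshapes. The function is a hypothetical illustration, assumes well-formed input, and does not cover the finer shape levels.

```python
def abstract_shape(dot_bracket):
    """Most abstract shape of a well-formed dot-bracket structure, e.g. '[[][]]'."""
    # Build a tree of base pairs; each node is the list of pairs directly inside it.
    root = []
    stack = [root]
    for char in dot_bracket:
        if char == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif char == ")":
            stack.pop()
        # '.' (unpaired) is ignored at this abstraction level

    def render(node):
        # A pair enclosing exactly one pair belongs to the same helix: merge them.
        while len(node) == 1:
            node = node[0]
        return "[" + "".join(render(child) for child in node) + "]"

    return "".join(render(child) for child in root)
```

Two structures with different loop sizes or helix lengths but the same helix arrangement receive the same shape string. Grouping by this string gives the shape classes whose probabilities the abstract describes.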

28 January 2022
Systems Medicine for Lung Diseases: Phenotypes and Precision Medicine in Cancer, Infection, and Allergy

Lung diseases cause an enormous socioeconomic burden. Four of them are among the ten most important causes of death worldwide: pneumonia has the highest death toll of all infectious diseases, lung cancer kills the most people of all malignant proliferative disorders, chronic obstructive pulmonary disease (COPD) ranks third in mortality among the chronic noncommunicable diseases, and tuberculosis is still one of the most important chronic infectious diseases. Despite all efforts, for example, by the World Health Organization and by clinical and experimental researchers, these diseases are still highly prevalent and harmful. This is in part due to the specific organization of tissue homeostasis, architecture, and immunity of the lung. Recently, several consortia have formed and aim to bring to...

28 January 2022
An Introduction to RNA Databases

We present an introduction to RNA databases. The history and technology behind RNA databases are briefly discussed. We examine differing methods of data collection and curation and discuss their impact on both the scope and accuracy of the resulting databases. Finally, we demonstrate these principles through detailed examination of four leading RNA databases: Noncode, miRBase, Rfam, and SILVA. (Source: Springer protocols feed by Bioinformatics)

28 January 2022
Third-Kind Encounters in Biomedicine: Immunology Meets Mathematics and Informatics to Become Quantitative and Predictive

The understanding of the immune response is currently at the center of biomedical research. There are growing expectations that immune-based interventions will, in the medium term, provide new, personalized, and targeted therapeutic options for many severe and highly prevalent diseases, from aggressive cancers to infectious and autoimmune diseases. To this end, immunology should move beyond its current descriptive and phenomenological nature and become quantitative, and thereby predictive. (Source: Springer protocols feed by Bioinformatics)

28 January 2022
Validation of Genome-Wide Association Studies (GWAS) Results

Validation of the results of genome-wide association studies or genomic selection studies is an essential component of the experimental program. Validation allows users to quantify the benefit of applying gene tests or genomic prediction, relative to the costs of implementing the program. Further, if implemented, an appropriate weight in a selection index can only be derived if estimates of the accuracy of genomic predictions are available. In this chapter the reasons for validation are explored, and a range of commonly encountered scenarios described. General principles are stated, and options for performing validation discussed. Designs for validation are heavily dependent on the availability of phenotyped animals, and also on the pedigree structures that characterize the breeding progra...

28 January 2022
Modeling and Simulation Tools: From Systems Biology to Systems Medicine

Modeling is an integral component of modern biology. In this chapter we look into the role of the model, as it pertains to Systems Medicine, and the software that is required to instantiate and run it. We do this by comparing the development, implementation, and characteristics of tools that have been developed to work with two divergent methodologies: Systems Biology and Pharmacometrics. From the Systems Biology perspective we consider the concept of “Software as a Medical Device” and what this may imply for the migration of research-oriented simulation software into the domain of human health. (Source: Springer protocols feed by Bioinformatics)

28 January 2022
Implementing a QTL Detection Study (GWAS) Using Genomic Prediction Methodology

Genomic prediction exploits historical genotypic and phenotypic data to predict the performance of selection candidates based only on their genotypes. It achieves this through a process known as training, which derives the values of all the chromosome fragments that can be characterized by regressing the historical phenotypes on some or all of the genotyped loci. A genome-wide association study (GWAS) involves a genome-wide search for chromosome fragments with a significant association with phenotype. One Bayesian approach to GWAS makes inferences using samples from the posterior distribution of genotypic effects obtained in the training phase of genomic prediction. Here we describe how to do this with commonly used Bayesian methods for genomic prediction, and we comment on how to interpret the results...
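
The training step, regressing historical phenotypes on all genotyped loci and then predicting candidates from genotypes alone, can be sketched with ridge regression (SNP-BLUP-style shrinkage). This is a deliberate substitution. The chapter uses Bayesian samplers, and ridge regression corresponds only to the posterior mean under a single Gaussian prior on SNP effects, with `lam` equal to the residual-to-SNP variance ratio. Function names are hypothetical.

```python
import numpy as np

def train_snp_ridge(genotypes, phenotypes, lam):
    """Estimate SNP effects by ridge regression on centered genotypes.

    genotypes: n x p array coded 0/1/2; phenotypes: length-n array.
    Returns (intercept, genotype column means, SNP effect estimates).
    """
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotypes, dtype=float)
    means = X.mean(axis=0)
    Xc = X - means  # center genotypes so the intercept is the phenotype mean
    intercept = y.mean()
    lhs = Xc.T @ Xc + lam * np.eye(X.shape[1])
    effects = np.linalg.solve(lhs, Xc.T @ (y - intercept))
    return intercept, means, effects

def predict_gebv(model, genotypes):
    """Genomic estimated breeding values for genotyped selection candidates."""
    intercept, means, effects = model
    return intercept + (np.asarray(genotypes, dtype=float) - means) @ effects
```

A GWAS-style scan in the chapter's spirit would instead examine posterior samples of `effects` window by window. A point estimate like this one cannot provide such samples.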

28 January 2022
Mathematical and Statistical Techniques for Systems Medicine: The Wnt Signaling Pathway as a Case Study

We present methods for the analysis of a single model, comprising applications of standard dynamical systems approaches such as nondimensionalization, steady state, asymptotic and sensitivity analysis, and more recent statistical and algebraic approaches to compare models with data. We present parameter estimation and model comparison techniques, focusing on Bayesian analysis and coplanarity via algebraic geometry. Our intention is that this (non-exhaustive) review may serve as a useful starting point for the analysis of models in systems medicine. (Source: Springer protocols feed by Bioinformatics)

28 January 2022
Quality Control for Genome-Wide Association Studies

This chapter gives an overview of the quality control (QC) issues for SNP-based genotyping methods used in genome-wide association studies. The main metrics for evaluating the quality of the genotypes are discussed, followed by a worked example of a QC pipeline that starts with raw data and finishes with a fully filtered dataset ready for downstream analysis. The emphasis is on automating data storage, filtering, and manipulation to ensure data integrity throughout the process, and on extracting a global summary from these high-dimensional datasets to allow better-informed downstream analytical decisions. All examples are run in the R statistical programming language, followed by a practical example using a fully automated QC pipeline for the Illumina platform. (Source: Springer protocols ...
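
Three common SNP-level QC metrics are call rate, minor allele frequency (MAF) and a Hardy–Weinberg equilibrium (HWE) test. The sketch below combines them into a single filter. It is written in Python rather than the chapter's R. The default thresholds are placeholders rather than recommendations, sample-level QC is omitted, and function names are hypothetical.

```python
import math
import numpy as np

def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """1-df chi-square test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chisq = sum((obs - exp) ** 2 / exp
                for obs, exp in zip((n_aa, n_ab, n_bb), expected) if exp > 0)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chisq / 2))

def snp_qc(genotypes, min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-6):
    """Boolean mask of SNPs passing call-rate, MAF and HWE filters.

    genotypes: samples x SNPs array coded 0/1/2, with np.nan for missing calls.
    """
    G = np.asarray(genotypes, dtype=float)
    keep = np.zeros(G.shape[1], dtype=bool)
    for j in range(G.shape[1]):
        calls = G[~np.isnan(G[:, j]), j]
        call_rate = calls.size / G.shape[0]
        if calls.size == 0 or call_rate < min_call_rate:
            continue
        freq = calls.mean() / 2  # frequency of the allele counted by the 0/1/2 code
        maf = min(freq, 1 - freq)
        counts = [int(np.sum(calls == g)) for g in (0, 1, 2)]
        if maf >= min_maf and hwe_chisq_pvalue(*counts) >= min_hwe_p:
            keep[j] = True
    return keep
```

The mask can be applied column-wise to drop failing SNPs before association testing. Keeping the per-SNP metrics around also supports the kind of global summary the chapter recommends.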

28 January 2022
Anatomy and Physiology of Multiscale Modeling and Simulation in Systems Medicine

Systems medicine is the application of systems biology concepts, methods, and tools to medical research and practice. It aims to integrate data and knowledge from different disciplines into biomedical models and simulations for the understanding, prevention, cure, and management of complex diseases. Complex diseases arise from the interactions among disease-influencing factors across multiple levels of biological organization from the environment to molecules. To tackle the enormous challenges posed by complex diseases, we need a modeling and simulation framework capable of capturing and integrating information originating from multiple spatiotemporal and organizational scales. Multiscale modeling and simulation in systems medicine is an emerging methodology and discipline that has already...

28 January 2022
Training in Systems Approaches for the Next Generation of Life Scientists and Medical Doctors

We describe the current challenges and scattered best practices of introducing wider systems medicine topics into medical education, as well as possibilities for systems medicine training at the doctoral and lifelong levels. (Source: Springer protocols feed by Bioinformatics)

28 January 2022
Network-Assisted Disease Classification and Biomarker Discovery

Developing improved approaches for diagnosis, treatment, and prevention of diseases is a major goal of biomedical research. Therefore, the discovery of biomarker signatures from high-throughput “omics” data is an active research topic in the field of bioinformatics and systems medicine. A major issue is the low reproducibility and the limited biological interpretability of candidate biomarker signatures identified from high-throughput data. This impedes the use of discovered biomarker signatures in clinical applications. Currently, much focus is placed on developing strategies to improve reproducibility and interpretability. Researchers have fruitfully started to incorporate prior knowledge derived from pathways and molecular networks into the process of biomarker identificat...

28 January 2022
Accessing Biomedical Literature in the Current Information Landscape

We present this chapter in light of three consecutive steps of literature access: searching for citations, retrieving full text, and viewing the article. The first section presents the current state of practice of biomedical literature access, including an analysis of the search tools most frequently used, such as PubMed, Google Scholar, Web of Science, Scopus, and Embase, and a study of biomedical literature archives such as PubMed Central. The next section describes current research and state-of-the-art systems motivated by the challenges a user faces during query formulation and interpretation of search results. The research solutions are classified into five key areas related to text and data mining, text similarity search, semantic search, query support, relevan...

28 January 2022
Mathematical Models of Pluripotent Stem Cells: At the Dawn of Predictive Regenerative Medicine

Regenerative medicine, ranging from stem cell therapy to organ regeneration, promises to revolutionize the treatment of diseases and aging. These approaches require a perfect understanding of cell reprogramming and differentiation. Predictive modeling of cellular systems has the potential to provide insights into the dynamics of cellular processes and to guide their control. Moreover, in many cases it provides an alternative to experimental tests that are difficult to perform for practical or ethical reasons. The variety and accuracy of biological processes represented in mathematical models grew in line with the discovery of the underlying molecular mechanisms. High-throughput data generation led to the development of models based on data analysis, as an alternative to more established modeling based on...

28 January 2022
The Art of Editing RNA Structural Alignments

Manual editing of RNA structural alignments may be considered more art than science, since it still requires an expert biologist to take multiple levels of information into account and to be slightly creative when constructing high-quality alignments. Even though the task is rather tedious, it is rewarded by great insight into the evolution of structure and function of your favorite RNA molecule. In this chapter I review the methods and considerations that go into constructing RNA structural alignments at the secondary and tertiary structure level; introduce software, databases, and algorithms that have proven useful in semiautomating the work process; and suggest future directions towards full automation. (Source: Springer protocols feed by Bioinformatics)

28 January 2022
RNA Systems Biology for Cancer: From Diagnosis to Therapy

Advances in high-throughput omics data generation have brought RNA species back into the focus of biomedical research. International collaborative efforts, like the ENCODE and GENCODE projects, have identified thousands of previously unknown functional non-coding RNAs (ncRNAs) with various, but primarily regulatory, roles. Many of these are linked to the emergence and progression of human diseases. In particular, interdisciplinary studies integrating bioinformatics, systems biology, and biotechnological approaches have successfully characterized the role of ncRNAs in different human cancers. These efforts led to the identification of a new toolkit for cancer diagnosis, monitoring, and treatment, which is now starting to enter and impact clinical practice. This chapter is to ela...

28 January 2022
RNA Structural Alignments, Part I: Sankoff-Based Approaches for Structural Alignments

Simultaneous alignment and secondary structure prediction of RNA sequences is often referred to as “RNA structural alignment.” One class of methods for structural alignment is based on the principles proposed by Sankoff more than 25 years ago. The Sankoff algorithm simultaneously folds and aligns two or more sequences. Its advantage over methods that separate the folding and alignment steps is that it makes better predictions. The disadvantage is that it is slower and requires more computer memory to run. The computational resources needed to run the Sankoff algorithm are so high that it took more than a decade before the first implementation of a Sankoff-style algorithm was published. However, with the faster computers available today and the improve...
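
The resource problem becomes clear from the standard complexity bounds for full Sankoff-style alignment of sequences of length $n$. Here $m$ is the number of sequences, and the bounds are compared with single-sequence folding. Practical implementations restrict the search space to reduce these costs. The worked line ignores constant factors.

```latex
\begin{aligned}
\text{single-sequence folding:}  &\quad O(n^3)\ \text{time}, \quad O(n^2)\ \text{memory} \\
\text{Sankoff, two sequences:}   &\quad O(n^6)\ \text{time}, \quad O(n^4)\ \text{memory} \\
\text{Sankoff, } m \text{ sequences:} &\quad O(n^{3m})\ \text{time}, \quad O(n^{2m})\ \text{memory} \\[4pt]
n = 100: &\quad n^3 = 10^6 \quad \text{versus} \quad n^6 = 10^{12}
\end{aligned}
```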

28 January 2022
From Systems Understanding to Personalized Medicine: Lessons and Recommendations Based on a Multidisciplinary and Translational Analysis of COPD

In conclusion, in our hands the scope and efforts of systems medicine need to concurrently consider these aspects of clinical implementation, which inherently drives the selection of the most relevant and urgent issues and methods that need further development in a systems analysis of disease. (Source: Springer protocols feed by Bioinformatics)

28 January 2022
Concepts and Introduction to RNA Bioinformatics

RNA bioinformatics and computational RNA biology have emerged from implementing methods for predicting the secondary structure of single sequences. The field has evolved to exploit multiple sequences to take evolutionary information into account, such as compensating (and structure preserving) base changes. These methods have been developed further and applied for computational screens of genomic sequence. Furthermore, a number of additional directions have emerged. These include methods to search for RNA 3D structure, RNA–RNA interactions, and design of interfering RNAs (RNAi) as well as methods for interactions between RNA and proteins. (Source: Springer protocols feed by Bioinformatics)

28 January 2022
Computational Modeling of Human Metabolism and Its Application to Systems Biomedicine

Modern high-throughput techniques offer immense opportunities to investigate whole-systems behavior, such as those underlying human diseases. However, the complexity of the data presents challenges in interpretation, and new avenues are needed to address the complexity of both diseases and data. Constraint-based modeling is one formalism applied in systems biology. It relies on a genome-scale reconstruction that captures extensive biochemical knowledge regarding an organism. The human genome-scale metabolic reconstruction is increasingly used to understand normal cellular and disease states because metabolism is an important factor in many human diseases. The application of human genome-scale reconstruction ranges from mere querying of the model as a knowledge base to studies that take adv...

28 January 2022
SCFGs in RNA Secondary Structure Prediction: A Hands-on Approach

Stochastic context-free grammars (SCFGs) were first established in the context of natural language modelling, and only later found their applications in RNA secondary structure prediction. In this chapter, we discuss the basic SCFG algorithms (CYK and inside–outside algorithms) in an application-centered manner and use the pfold grammar as a case study to show how the algorithms can be adapted to a grammar in a nonstandard form. We extend our discussion to the use of grammars with additional information (such as evolutionary information) to improve the quality of predictions. Finally, we provide a brief survey of programs that use stochastic context-free grammars for RNA secondary structure prediction and modelling. (Source: Springer protocols feed by Bioinformatics)
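
The CYK algorithm, which finds the single most probable parse, can be illustrated on a much simpler grammar than pfold. The toy grammar is S → aS | aSâS | ε, read as follows:

- aS: a base is left unpaired.
- aSâS: a base pairs with a later base â.
- ε: the derivation ends.

In the Python sketch below, single bases are emitted uniformly and pairs uniformly over the six canonical pairs. The grammar, its rule probabilities and the function name are all assumptions for illustration. The grammar has no minimum hairpin length, so it can produce unrealistically tight pairs.

```python
import math

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def cyk_fold(seq, p_unpaired=0.4, p_pair=0.3, p_end=0.3):
    """CYK (Viterbi) parse of seq under the toy SCFG  S -> aS | aSa'S | e.

    Rule probabilities should sum to 1. Returns (dot-bracket structure,
    log-probability of the most probable parse).
    """
    n = len(seq)
    log_single = math.log(p_unpaired / 4)          # rule prob x uniform base emission
    log_pair = math.log(p_pair / len(CANONICAL))   # rule prob x uniform pair emission
    # best[i][j]: best log-probability of deriving seq[i:j] from S.
    best = [[float("-inf")] * (n + 1) for _ in range(n + 1)]
    back = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        best[i][i] = math.log(p_end)  # S -> e on the empty span
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best[i][j] = log_single + best[i + 1][j]  # S -> a S
            for k in range(i + 1, j):  # S -> a S a' S: seq[i] pairs with seq[k]
                if (seq[i], seq[k]) in CANONICAL:
                    score = log_pair + best[i + 1][k] + best[k + 1][j]
                    if score > best[i][j]:
                        best[i][j], back[i][j] = score, k
    # Traceback of the winning derivation.
    structure = ["."] * n
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if i == j:
            continue
        k = back[i][j]
        if k is None:
            stack.append((i + 1, j))
        else:
            structure[i], structure[k] = "(", ")"
            stack.extend([(i + 1, k), (k + 1, j)])
    return "".join(structure), best[0][n]
```

Replacing `max` with a sum over alternatives, in linear rather than log space, turns this recursion into the inside algorithm. Grammars such as pfold's use more nonterminals, but the tables are filled in the same way.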

28 January 2022
Free Energy Minimization to Predict RNA Secondary Structures and Computational RNA Design

Determining RNA secondary structure from sequence data by computational prediction is a long-standing problem. It has been approached in two distinct ways. If a multiple sequence alignment of a collection of homologous sequences is available, the comparative method uses phylogeny to determine conserved base pairs that are more likely to form as a result of billions of years of evolution than by chance. For single sequences, recursive algorithms that compute minimum free energy structures using empirically derived energy parameters have been developed. This latter approach, RNA folding prediction by energy minimization, is widely used to predict RNA secondary structure from sequence. For a significant number of RNA molecules, the secondary structure of the RNA molec...
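
The recursive structure of single-sequence folding can be shown with its simplest relative, Nussinov-style base-pair maximization. This is a substitution for real energy minimization: each canonical pair counts as one unit, as if it had a toy energy of −1, instead of using empirical nearest-neighbor parameters. A minimum hairpin loop of three bases is assumed. The function name is hypothetical.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Maximize canonical base pairs (toy stand-in for free energy minimization).

    Hairpin loops must contain at least min_loop unpaired bases.
    Returns (dot-bracket structure, number of pairs).
    """
    n = len(seq)
    if n == 0:
        return "", 0
    score = [[0] * n for _ in range(n)]
    for length in range(min_loop + 1, n):
        for i in range(n - length):
            j = i + length
            best = score[i][j - 1]  # seq[j] unpaired
            for k in range(i, j - min_loop):  # seq[j] paired with seq[k]
                if (seq[k], seq[j]) in CANONICAL:
                    left = score[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + score[k + 1][j - 1])
            score[i][j] = best
    # Traceback: recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if score[i][j] == score[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in CANONICAL:
                left = score[i][k - 1] if k > i else 0
                if score[i][j] == left + 1 + score[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure), score[0][n - 1]
```

For example, `nussinov("GGGAAAUCC")` gives `("(((...)))", 3)`. Energy-based algorithms keep this decomposition into subintervals but score loops and stacked pairs rather than counting pairs.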

28 January 2022
Energy-Directed RNA Structure Prediction

In this chapter we present the classic dynamic programming algorithms for RNA structure prediction by energy minimization, as well as variations of this approach that allow the computation of suboptimal foldings, or even of the partition function over all possible secondary structures. The latter are essential for dealing with the inaccuracy of minimum free energy (MFE) structure prediction, and can be used, for example, to derive reliability measures that assign a confidence value to all or part of a predicted structure. In addition, we discuss recently proposed alternatives to the MFE criterion, such as the use of maximum expected accuracy (MEA) or centroid structures. The dynamic programming algorithms implicitly assume that the RNA molecule is in thermodynamic equilibrium. However, especially...
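
Turning the minimization into a partition function means summing Boltzmann weights over all structures instead of taking a minimum. The Python sketch below does this under a toy energy model. Every canonical pair contributes `pair_energy` kcal/mol, loops cost nothing, and a hairpin needs at least three unpaired bases. The energy model, the 37 °C value of RT and the function name are assumptions. Real implementations use full nearest-neighbor parameters. Each structure is counted exactly once because the first base is either unpaired or paired with exactly one partner k.

```python
import math

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
RT = 0.61632  # kcal/mol at 37 degrees C (gas constant x 310.15 K)

def partition_function(seq, pair_energy=-1.0, min_loop=3):
    """Partition function over all secondary structures of seq (toy energy model)."""
    n = len(seq)
    weight = math.exp(-pair_energy / RT)  # Boltzmann factor of one base pair
    # Z[i][j] is the partition function of seq[i..j]; entries with i > j stay 1 (empty).
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            total = Z[i + 1][j]  # seq[i] unpaired
            for k in range(i + min_loop + 1, j + 1):  # seq[i] pairs with seq[k]
                if (seq[i], seq[k]) in CANONICAL:
                    total += weight * Z[i + 1][k - 1] * Z[k + 1][j]
            Z[i][j] = total
    return Z[0][n - 1] if n else 1.0
```

The Boltzmann probability of any single structure S is exp(−E(S)/RT) divided by the returned value. For instance, the fully unfolded chain has probability 1/Z under this model. Comparing such probabilities is the basis of the reliability measures the abstract mentions.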

28 January 2022
Using Deep Sequencing Data for Identification of Editing Sites in Mature miRNAs

Deep sequencing has many possible applications; one of them is the identification and quantification of RNA editing sites. The most common type of RNA editing is adenosine to inosine (A-to-I) editing. A prerequisite for this editing process is a double-stranded RNA (dsRNA) structure. Such dsRNAs are formed as part of the microRNA (miRNA) maturation process, and it is therefore expected that miRNAs are affected by A-to-I editing. Indeed, tens of editing sites were found in miRNAs, some of which change the miRNA binding specificity. Here, we describe a protocol for the identification of RNA editing sites in mature miRNAs using deep sequencing data. (Source: Springer protocols feed by Bioinformatics)
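
Sequencers read inosine as guanosine, so A-to-I editing in aligned reads appears as A-to-G mismatches. The editing level at a genomic A can then be estimated as G/(A+G). The Python sketch below computes that level for each adenosine of a mature miRNA. It assumes reads are already aligned to the miRNA start without indels, and the coverage cutoff and function name are hypothetical. It leaves out the statistical testing against sequencing error, SNPs and cross-mapping that a real protocol needs.

```python
from collections import Counter

def editing_levels(aligned_reads, mirna_seq, min_coverage=10):
    """A-to-I editing level G/(A+G) at each adenosine of a mature miRNA.

    aligned_reads: read sequences starting at the first miRNA nucleotide
    (same coordinates, no indels). Returns {position: level} for sites with
    at least min_coverage reads showing A or G.
    """
    levels = {}
    for pos, ref in enumerate(mirna_seq):
        if ref != "A":
            continue  # A-to-I editing can only occur at reference adenosines
        counts = Counter(read[pos] for read in aligned_reads if pos < len(read))
        coverage = counts["A"] + counts["G"]
        if coverage >= min_coverage:
            levels[pos] = counts["G"] / coverage
    return levels
```

For example, with a hypothetical reference `TAGCA`, 15 reads matching it and 5 reads reading `TGGCA` give a level of 0.25 at position 1.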

MedWorm Message: Have you tried our new medical search engine? More powerful than before. 100% free.

28 January 2022Bioinformatics of siRNA Design

RNA interference mediated by small interfering RNAs (siRNAs) is a powerful tool for investigating gene function and is increasingly used as a therapeutic agent. However, not all siRNAs are equally potent, and although simple rules for selecting good siRNAs were proposed early on, siRNA efficiency still fluctuates widely. Recently, new design tools that incorporate both the structural features of the targeted RNAs and the sequence features of the siRNAs have substantially improved siRNA efficacy. In this chapter we review the sequence- and structure-based algorithms behind these tools. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Prediction of miRNA Targets

Computational methods for miRNA target prediction are currently undergoing extensive review and evaluation. These tools still need considerable improvement, and bioinformatics approaches are turning to high-throughput experiments to validate predictions. Combining large-scale techniques with computational tools will not only lend greater credence to computational predictions but also lead to a better understanding of specific biological questions. Current miRNA target prediction tools use probabilistic learning algorithms, machine learning methods, and even empirical, biologically defined rules to build models based on experimentally verified miRNA targets. Large-scale protein downregulation assays and next-generation sequencing (NGS) are no...
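
Many rule-based predictors start from seed complementarity: a site in the 3′ UTR that pairs with nucleotides 2–8 of the miRNA. The sketch below only finds such exact 7-nt seed matches (the pairing found in a 7mer-m8 site); conservation, site context, and the probabilistic or machine-learning scoring described above are deliberately left out. The let-7a sequence is used purely as an example input, and the UTR is made up.

```python
def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence (A/C/G/U only)."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq))


def find_seed_sites(mirna, utr):
    """0-based start positions in `utr` that are perfectly complementary
    to miRNA nucleotides 2-8 (1-based), i.e. a 7-nt seed match."""
    seed = mirna[1:8]                 # nucleotides 2..8 of the miRNA
    site = reverse_complement_rna(seed)
    hits = []
    start = utr.find(site)
    while start != -1:
        hits.append(start)
        start = utr.find(site, start + 1)
    return hits
```

For example, `find_seed_sites("UGAGGUAGUAGGUUGUAUAGUU", "AAACUACCUCAAAACUACCUCA")` returns `[3, 14]`.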

28 January 2022R for Genome-Wide Association Studies

In recent years R has become the de facto statistical programming language of choice for statisticians, and it is arguably also the most widely used general-purpose environment for analyzing high-throughput genomic data. In this chapter we discuss some approaches to improving the performance of R when working with large SNP datasets. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Detection of Post-Transcriptional RNA Editing Events

The advent of deep sequencing technologies has greatly improved the study of complex eukaryotic genomes and transcriptomes, providing a unique opportunity to investigate posttranscriptional molecular mechanisms such as alternative splicing and RNA editing at single-base-pair resolution. RNA editing by adenosine deamination (A-to-I) is widespread in humans and can lead to a variety of biological effects depending on the RNA type or the RNA region that is modified. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Genomic Best Linear Unbiased Prediction (gBLUP) for the Estimation of Genomic Breeding Values

Genomic best linear unbiased prediction (gBLUP) is a method that utilizes genomic relationships to estimate the genetic merit of an individual. For this purpose, a genomic relationship matrix is used, estimated from DNA marker information. The matrix defines the covariance between individuals based on observed similarity at the genomic level, rather than on expected similarity based on pedigree, so that more accurate predictions of merit can be made. gBLUP has been used for the prediction of merit in livestock breeding, may also have some applications to the prediction of disease risk, and is also useful in the estimation of variance components and genomic heritabilities. (Source: Springer protocols feed by Bioinformatics)
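
To illustrate how a genomic relationship matrix enters the prediction, the sketch below assumes VanRaden's first method for G (genotypes coded 0/1/2, allele frequencies estimated from the sample) and a known variance ratio λ = σ²e/σ²g; the chapter may construct G or estimate variance components differently. Predicted genomic values follow û = G(G + λI)⁻¹(y − ȳ), a simplification of the mixed-model solution that replaces the GLS estimate of the mean with the sample mean.

```python
import numpy as np


def vanraden_g(M):
    """Genomic relationship matrix (VanRaden method 1, assumed here).

    M: individuals x SNPs array, genotypes coded as 0/1/2 copies of an allele.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0            # allele frequencies from the sample
    Z = M - 2.0 * p                     # centre each SNP column
    denom = 2.0 * np.sum(p * (1.0 - p))  # scales G to be analogous to pedigree A
    return Z @ Z.T / denom


def gblup(G, y, lam):
    """Genomic values u_hat = G (G + lam*I)^-1 (y - mean(y)).

    lam = sigma_e^2 / sigma_g^2 is assumed known; using mean(y) for the
    fixed effect is a simplification of the full mixed-model equations.
    """
    y = np.asarray(y, dtype=float)
    alpha = np.linalg.solve(G + lam * np.eye(len(y)), y - y.mean())
    return G @ alpha
```

Monomorphic SNPs contribute nothing to either Z or the scaling denominator; if every SNP is monomorphic the denominator is zero, so such data must be filtered first.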

28 January 2022Transcriptome Assembly and Alternative Splicing Analysis

Alternative splicing (AS) is the molecular phenomenon whereby multiple transcripts are produced from the same gene locus, and it is therefore responsible for the expansion of eukaryotic transcriptomes. Aberrant AS is involved in the onset and progression of several human diseases. Characterizing the exon–intron structure of a gene and detecting the corresponding transcript isoforms is therefore a highly relevant biological task. Nonetheless, computational prediction of AS events and of the repertoire of alternative transcripts remains challenging. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Association Weight Matrix: A Network-Based Approach Towards Functional Genome-Wide Association Studies

In this chapter we describe the Association Weight Matrix (AWM), a novel procedure that exploits the results of genome-wide association studies (GWAS) and, in combination with network inference algorithms, generates gene networks with regulatory and functional significance. In simple terms, the AWM is a matrix whose rows represent genes and whose columns represent phenotypes. The {i, j}th element of the AWM corresponds to the association of the SNP in the ith gene with the jth phenotype. While our main objective is to provide a recipe-like tutorial on how to build and use the AWM, we also take the opportunity to briefly explain the reasoning behind each step in the process. To conclude, we discuss the impact on AWM of issues like the number of phenotypes under scrutiny, the density of the SNP...
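
A minimal sketch of the AWM layout, assuming a hypothetical input that already maps each gene to the effect of its selected SNP on each phenotype. The network step uses plain Pearson correlation between gene rows with an arbitrary cutoff; it stands in for the network inference algorithms the chapter combines with the AWM and does not reproduce them, nor any normalization steps the recipe may prescribe.

```python
import numpy as np


def build_awm(snp_effects, genes, phenotypes):
    """Rows = genes, columns = phenotypes; entry (i, j) is the effect of the SNP
    assigned to gene i on phenotype j (hypothetical input layout)."""
    return np.array([[snp_effects[g][p] for p in phenotypes] for g in genes],
                    dtype=float)


def coassociation_edges(awm, genes, threshold=0.9):
    """Connect gene pairs whose association profiles across phenotypes are
    strongly correlated (Pearson correlation used as a simple stand-in)."""
    r = np.corrcoef(awm)  # correlations between rows, i.e. between genes
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if abs(r[i, j]) >= threshold:
                edges.append((genes[i], genes[j], float(r[i, j])))
    return edges
```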

28 January 2022Quantifying Entire Transcriptomes by Aligned RNA-Seq Data

Massively parallel sequencing (MPS) methods can extend and improve the knowledge obtained by conventional microarray technology, both for mRNAs and noncoding RNAs. Although RNA quality and library preparation protocols are the main sources of variability, bioinformatics pipelines for RNA-seq data analysis are very complex, and the choice of tools at each stage of the analysis can significantly affect the overall results. In this chapter we describe the pipelines we use to detect differential expression of miRNAs and mRNAs. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Statistical Analysis of Genomic Data

In this chapter we describe methods for statistical analysis of GWAS data, with the goal of quantifying evidence for genomic effects associated with trait variation while avoiding spurious associations caused by poorly quantified evidence or by population structure. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Fast Prediction of RNA–RNA Interaction Using Heuristic Algorithm

We describe the algorithm’s concurrency and parallelism on a multicore chip. The proposed algorithm has been applied to several datasets, including CopA-CopT, R1inv-R2inv, Tar-Tar*, DIS-DIS, and IncRNA54-RepZ in Escherichia coli. The method shows high validity and efficiency and requires less computational time than other approaches. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Designing a GWAS: Power, Sample Size, and Data Structure

In this chapter we describe a novel Bayesian approach to designing GWAS, with the goal of ensuring robust detection of the effects of genomic loci associated with trait variation. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Modeling and Predicting RNA Three-Dimensional Structures

Modeling the three-dimensional structure of RNAs is a milestone toward better understanding and prediction of the molecular functions of nucleic acids. Physics-based approaches and molecular dynamics simulations are not tractable for large molecules with all-atom models. To address this issue, coarse-grained models of RNA three-dimensional structures have been developed. In this chapter, we describe a graphical modeling approach based on the Leontis–Westhof extended base-pair classification. This representation of RNA structures enables us to identify highly conserved structural motifs with complex nucleotide interactions in structure databases. Further, we show how to take advantage of this knowledge to quickly and simply predict three-dimensional structures of large RNA molecules. (Source: Springer...

28 January 2022Higher Order Interactions: Detection of Epistasis Using Machine Learning and Evolutionary Computation

Higher order interactions are known to affect many different phenotypic traits. The advent of large-scale genotyping has, however, shown that finding interactions is not a trivial task. Classical genome-wide association studies (GWAS) are a useful starting point for unraveling the genetic architecture of a phenotypic trait. However, to move beyond the additive model we need new analysis tools specifically developed to deal with high-dimensional genotypic data. Here we show that evolutionary algorithms are a useful tool in high-dimensional analyses designed to identify gene–gene interactions in current large-scale genotypic data. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Drawing and Editing the Secondary Structure(s) of RNA

We describe the file formats and structural descriptions accepted by popular RNA visualization tools. We also provide command lines and Python scripts to ease the user’s access to advanced features. Finally, we discuss and illustrate alternative approaches to visualize the secondary structure in the presence of probing data, pseudoknots, RNA–RNA interactions, and comparative data. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Biological Information Extraction and Co-occurrence Analysis

Nowadays, it is possible to identify terms corresponding to biological entities within passages in biomedical text corpora: critically, their potential relationships then need to be detected. These relationships are typically detected by co-occurrence analysis, revealing associations between bioentities through their coexistence in single sentences and/or entire abstracts. These associations implicitly define networks, whose nodes represent terms/bioentities/concepts being connected by relationship edges; edge weights might represent confidence for these semantic connections. (Source: Springer protocols feed by Bioinformatics)
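
Sentence-level co-occurrence can be sketched with a simple dictionary lookup. The example assumes a hypothetical list of term strings, case-insensitive whole-word matching, and a naive regex sentence splitter; edge weights are raw sentence counts rather than any association or confidence statistic the chapter may use.

```python
import itertools
import re
from collections import Counter


def cooccurrence_network(abstracts, terms):
    """Count sentence-level co-occurrence of dictionary terms.

    Returns Counter{(term_a, term_b): n_sentences} with term_a < term_b;
    each key is an edge and each count its weight.
    """
    patterns = {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE)
                for t in terms}
    edges = Counter()
    for text in abstracts:
        # naive splitter: break after ., ! or ? followed by whitespace
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            found = sorted(t for t, pat in patterns.items() if pat.search(sentence))
            for a, b in itertools.combinations(found, 2):
                edges[(a, b)] += 1
    return edges
```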

28 January 2022De Novo Secondary Structure Motif Discovery Using RNAProfile

We describe here how conserved secondary structure motifs shared by functionally related RNA sequences can be detected through the software tool RNAProfile. RNAProfile takes as input a set of unaligned RNA sequences expected to share a common motif, and outputs the regions that are most conserved throughout the sequences, according to a similarity measure that takes into account both the sequence of the regions and the secondary structure they can form according to base-pairing and thermodynamic rules. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Mapping of Biomedical Text to Concepts of Lexicons, Terminologies, and Ontologies

Concept mapping is a fundamental task in biomedical text mining in which textual mentions of concepts of interest are annotated with specific entries of lexicons, terminologies, ontologies, or databases representing these concepts. Though there has been a significant amount of research, there are still a limited number of practical, publicly available tools for concept mapping of biomedical text specified by the user as an independent task. In this chapter, several tools that can automatically map biomedical text to concepts from a wide range of terminological resources are presented, followed by those that can map to more restricted sets of these resources. This presentation is intended to serve as a guide to researchers without a background in biomedical concept mapping of text for the s...

28 January 2022RNA Secondary Structure Prediction from Multi-Aligned Sequences

It has been well accepted that the RNA secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, prediction of conserved secondary structures from evolutionarily related sequences is one important task in RNA bioinformatics; the methods are useful not only to further functional analyses of ncRNAs but also to improve the accuracy of secondary structure predictions and to find novel functional RNAs from the genome. In this review, I focus on common secondary structure prediction from a given aligned RNA sequence, in which one secondary structure whose length is equal to that of the input alignment is predicted. I systematically review and classify existing tools and algorithms for the problem, by utilizi...

28 January 2022Introduction to Biomedical Literature Text Mining: Context and Objectives

Information: If you are reading this, you know how important it is and almost certainly look to the biomedical literature for a large part of the information you need. We work hard to find more and more biomedical literature, seeking new content from multiple sources. But, can there be too much of a good thing? (Source: Springer protocols feed by Bioinformatics)

28 January 2022A Simple Protocol for the Inference of RNA Global Pairwise Alignments

In conclusion, the proposed workflow for pairwise RNA alignment depends on the input RNA primary sequence identity and the availability of reliable secondary structures. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Automated Modeling of RNA 3D Structure

This chapter gives an overview of the current methods for automated modeling of RNA structures, with emphasis on template-based methods. The approaches currently used for RNA modeling are presented alongside a look at the protein world, where many similar ideas have been used. Two main programs for automated template-based modeling are presented: ModeRNA, which assembles structures from fragments, and MacroMoleculeBuilder, which performs a simulation to satisfy spatial restraints. Both approaches require an alignment of the target sequence to a known RNA structure that is used as a modeling template. To find promising template structures and to align the target and template sequences, we propose a pipeline combining the ParAlign and Infernal programs on RNA family da...

28 January 2022NGS-Trex: An Automatic Analysis Workflow for RNA-Seq Data

RNA-Seq technology allows the rapid analysis of whole transcriptomes by taking advantage of next-generation sequencing platforms. Moreover, with the steadily decreasing cost of NGS analysis, RNA-Seq is becoming very popular and widespread. Unfortunately, data analysis is quite demanding in terms of the bioinformatic skills and infrastructure required, which limits the potential users of this method. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Phylogeny and Evolution of RNA Structure

Darwin’s conviction that all living beings on Earth are related and the graph of relatedness is tree-shaped has been essentially confirmed by phylogenetic reconstruction first from morphology and later from data obtained by molecular sequencing. Limitations of the phylogenetic tree concept were recognized as more and more sequence information became available. The other path-breaking idea of Darwin, natural selection of fitter variants in populations, is cast into simple mathematical form and extended to mutation-selection dynamics. In this form the theory is directly applicable to RNA evolution in vitro and to virus evolution. Phylogeny and population dynamics of RNA provide complementary insights into evolution and the interplay between the two concepts will be pursued throughout t...

28 January 2022Deciphering Metatranscriptomic Data

Metatranscriptomic data contributes another piece of the puzzle to understanding the phylogenetic structure and function of a community of organisms. High-quality total RNA is a bountiful mixture of ribosomal, transfer, messenger and other noncoding RNAs, where each family of RNA is vital to answering questions concerning the hidden microbial world. Software tools designed for deciphering metatranscriptomic data fall under two main categories: the first is to reassemble millions of short nucleotide fragments produced by high-throughput sequencing technologies into the original full-length transcriptomes for all organisms within a sample, and the second is to taxonomically classify the organisms and determine their individual functional roles within a community. Species identification is ma...

28 January 2022RNA Structural Alignments, Part II: Non-Sankoff Approaches for Structural Alignments

In structural alignments of RNA sequences, the computational cost of the Sankoff algorithm, which simultaneously optimizes the score of the common secondary structure and the score of the alignment, is too high for long sequences (O(L^6) time for two sequences of length L). In this chapter, we introduce methods that predict the structures and the alignment separately to avoid the heavy computations of the Sankoff algorithm. In these methods, neither of the two prediction processes is independent; each uses information from the other. The first process typically includes prediction of base-pairing probabilities (BPPs) or candidate stems, and the alignment process uses those results. At the same time, it is also important to reflect the informat...

28 January 2022Computational Design of Artificial RNA Molecules for Gene Regulation

RNA interference (RNAi) is a powerful tool for the regulation of gene expression. Small exogenous noncoding RNAs (ncRNAs) such as siRNA and shRNA are the active silencing agents, intended to target and cleave complementary mRNAs in a specific way. They are widely and successfully employed in functional studies, and several ongoing and already completed siRNA-based clinical trials suggest encouraging results in the regulation of overexpressed genes in disease. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Introduction to RNA Secondary Structure Comparison

Many methods have been proposed for RNA secondary structure comparison, and new ones are still being developed. In this chapter, we first consider structure representations and discuss their suitability for structure comparison. Then, we take a look at the more commonly used methods, restricting ourselves to structures without pseudo-knots. For comparing structures of the same sequence, we study base pair distances. For structures of different sequences (and of different length), we study variants of the tree edit model. We name some of the available tools and give pointers to the literature. We end with a short review on comparing structures with pseudo-knots as an unsolved problem and topic of active research. (Source: Springer protocols feed by Bioinformatics)
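
For two structures of the same sequence, the base pair distance is the number of base pairs present in one structure but not in the other (the size of the symmetric difference of their pair sets). A minimal sketch for pseudoknot-free dot-bracket strings:

```python
def pair_set(structure):
    """Set of (i, j) base pairs (0-based) from a pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def bp_distance(s1, s2):
    """Base pair distance between two structures of the same sequence."""
    if len(s1) != len(s2):
        raise ValueError("structures must describe the same sequence")
    return len(pair_set(s1) ^ pair_set(s2))
```

For example, `bp_distance("((....))", ".((..)).")` is 2, because the pairs (0, 7) and (2, 5) each occur in only one of the two structures.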

28 January 2022Analysis of Alternative Splicing Events in Custom Gene Datasets by AStalavista

Alternative splicing (AS) is a eukaryotic mechanism for deriving more than one RNA product from a transcribed gene by removing distinct subsets of introns from the pre-mRNA. We now know that this process is highly regulated and accounts for a large part of the differences between species, cell types, and states. The key to comparing AS across different genes or organisms is to decompose the AS phenomenon into atomic units, so-called AS events. These events are then usually grouped by common patterns to investigate the molecular mechanisms that drive their regulation. However, attempts to decompose loci with AS observations into events are often hampered by the use of a limited set of a priori defined event patterns, which cannot describe all AS configurations and therefo...

28 January 2022Class-Specific Prediction of ncRNAs

Many RNA families, i.e., groups of homologous RNA genes, belong to RNA classes, such as tRNAs, snoRNAs, or microRNAs, that are characterized by common sequence motifs and/or common secondary structure features. The detection of new members of RNA classes, as well as the comprehensive annotation of genomes with members of RNA classes is a challenging task that goes beyond simple homology search. Computational methods addressing this problem typically use a three-tiered approach: In the first step an efficient and sensitive filter is employed. In the second step the candidate set is narrowed down using computationally expensive methods geared towards specificity. In the final step the hits are annotated with class-specific features and scored. Here we review the tools that are currently avai...

28 January 2022ASPicDB: A Database Web Tool for Alternative Splicing Analysis

Alternative splicing (AS) is a basic molecular phenomenon that increases the functional complexity of higher eukaryotic transcriptomes. Indeed, through AS, individual gene loci can generate multiple RNAs from the same pre-mRNA. AS has been investigated in a variety of clinical and pathological studies, such as transcriptome regulation in cancer. In humans, recent work based on massive RNA sequencing indicates that >95% of pre-mRNAs are processed to yield multiple transcripts. Given the biological relevance of AS, several computational efforts have led to the implementation of novel algorithms and specialized databases. Here we describe the web application ASPicDB, which allows the retrieval of detailed biological information about the splicing mechanism. ASPicDB p...

28 January 2022Computational Prediction of MicroRNA Genes

The computational identification of novel microRNA (miRNA) genes is a challenging task in bioinformatics. Massive amounts of data describing unknown functional RNA transcripts have to be analyzed for putative miRNA candidates with automated computational pipelines. Beyond those miRNAs that meet the classical definition, high-throughput sequencing techniques have revealed additional miRNA-like molecules that are derived by alternative biogenesis pathways. Exhaustive bioinformatics analyses on such data involve statistical issues as well as precise sequence and structure inspection not only of the functional mature part but also of the whole precursor sequence of the putative miRNA. Apart from a considerable amount of species-specific miRNAs, the majority of all those genes are conserved at ...

28 January 2022Rfam: Annotating Families of Non-Coding RNA Sequences

The primary task of the Rfam database is to collate experimentally validated noncoding RNA (ncRNA) sequences from the published literature and facilitate the prediction and annotation of new homologues in novel nucleotide sequences. We group homologous ncRNA sequences into “families” and related families are further grouped into “clans.” We collate and manually curate data cross-references for these families from other databases and external resources. Our Web site offers researchers a simple interface to Rfam and provides tools with which to annotate their own sequences using our covariance models (CMs), through our tools for searching, browsing, and downloading information on Rfam families. In this chapter, we will work through examples of annotating a query seque...

28 January 2022Annotating Functional RNAs in Genomes Using Infernal

Many different types of functional non-coding RNAs participate in a wide range of important cellular functions, but the large majority of these RNAs are not routinely annotated in published genomes. Several programs have been developed for identifying RNAs, including specific tools tailored to a particular RNA family as well as more general ones designed to work for any family. Many of these tools use covariance models (CMs), statistical models of the conserved sequence and structure of an RNA family. In this chapter, as an illustrative example, the Infernal software package and CMs from the Rfam database are used to identify RNAs in the genome of the archaeon Methanobrevibacter ruminantium, uncovering some additional RNAs not present in the genome’s initial annotation. Analysis ...

28 January 2022A Guideline for the Annotation of UTR Regulatory Elements in the UTRsite Collection

Gene expression regulatory elements are scattered in gene promoters and pre-mRNAs. In particular, RNA elements lying in untranslated regions (5′ and 3′ UTRs) are poorly studied because of their peculiar features (i.e., a combination of primary and secondary structure elements), which also pose remarkable computational challenges. Several years ago, we began collecting experimentally characterized UTR regulatory elements, developing the specialized database UTRsite. This paper describes detailed guidelines for annotating cis-regulatory elements in 5′ and 3′ untranslated regions (UTRs) by computational analyses, retracing all the main steps used by UTRsite curators. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Energy-Based RNA Consensus Secondary Structure Prediction in Multiple Sequence Alignments

Many biologically important RNA structures are conserved in evolution, leading to characteristic mutational patterns. RNAalifold is a widely used program to predict consensus secondary structures in multiple alignments by combining evolutionary information with traditional energy-based RNA folding algorithms. Here we describe the theory and applications of the RNAalifold algorithm. Consensus secondary structure prediction not only leads to significantly more accurate structure models but also allows one to study the structural conservation of functional RNAs. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Exploring the RNA Editing Potential of RNA-Seq Data by ExpEdit

Revealing the impact of A-to-I RNA editing in RNA-Seq experiments is relevant in humans because RNA editing can influence gene expression. In addition, its deregulation has been linked to a variety of human diseases. Exploiting the RNA editing potential in complete RNA-Seq datasets, however, is a challenging task. Indeed, no dedicated software is available, and sometimes deep computational skills and appropriate hardware resources are required. To explore the impact of known RNA editing events in massive transcriptome sequencing experiments, we developed the ExpEdit web service application. In the present work, we provide an overview of ExpEdit as well as methodologies to investigate known RNA editing in human RNA-Seq datasets. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Introduction to Stochastic Context Free Grammars

Stochastic context free grammars are a formalism which plays a prominent role in RNA secondary structure analysis. This chapter provides the theoretical background on stochastic context free grammars. We recall the general definitions and study the basic properties, virtues, and shortcomings of stochastic context free grammars. We then introduce two ways in which they are used in RNA secondary structure analysis, secondary structure prediction and RNA family modeling. This prepares for the discussion of applications of stochastic context free grammars in the chapters on Rfam (6), Pfold (8), and Infernal (9). (Source: Springer protocols feed by Bioinformatics)

28 January 2022The ViennaRNA Web Services

The ViennaRNA package is a widely used collection of programs for thermodynamic RNA secondary structure prediction. Over the years, many additional tools have been developed building on the core programs of the package to also address issues related to noncoding RNA detection, RNA folding kinetics, or efficient sequence design considering RNA-RNA hybridizations. The ViennaRNA web services provide easy and user-friendly web access to these tools. This chapter describes how to use this online platform to perform tasks such as prediction of minimum free energy structures, prediction of RNA-RNA hybrids, or noncoding RNA detection. The ViennaRNA web services can be used free of charge and can be accessed via . (Source: Springer protocols f...

28 January 2022The Determination of RNA Folding Nearest Neighbor Parameters

The stability of RNA secondary structure can be predicted using a set of nearest neighbor parameters. These parameters are widely used by algorithms that predict secondary structure. This contribution introduces the UV optical melting experiments that are used to determine the folding stability of short RNA strands. It explains how the nearest neighbor parameters are chosen and how the values are fit to the data. A sample nearest neighbor calculation is provided. The contribution concludes with new methods that use the database of sequences with known structures to determine parameter values. (Source: Springer protocols feed by Bioinformatics)
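
A sample calculation in the spirit of the Watson–Crick helix model: ΔG°37(duplex) = ΔG°init + Σ ΔG°37(stacks) + terminal AU penalties + symmetry correction. The parameter values below are the Turner 2004 Watson–Crick stacking, initiation, terminal AU, and symmetry terms as we recall them from the Nearest Neighbor Database (originally Xia et al., 1998); verify them against that source before any real use. The sketch handles only perfectly paired duplexes, with no G–U pairs, mismatches, or dangling ends.

```python
# Watson-Crick stacking free energies at 37 degC (kcal/mol), keyed by the
# top-strand dinucleotide read 5'->3' ("CU" means 5'CU3'/3'GA5').
# Values as recalled from the Turner 2004 set; check against the NNDB.
STACK = {
    "AA": -0.93, "AU": -1.10, "UA": -1.33, "CU": -2.08, "CA": -2.11,
    "GU": -2.24, "GA": -2.35, "CG": -2.36, "GG": -3.26, "GC": -3.42,
}
INIT, TERMINAL_AU, SYMMETRY = 4.09, 0.45, 0.43
COMP = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(s):
    return "".join(COMP[b] for b in reversed(s))


def duplex_dg37(top):
    """Delta G 37 of the duplex formed by `top` and its perfect complement."""
    dg = INIT
    for i in range(len(top) - 1):
        step = top[i:i + 2]
        # each stack appears once in the table, under itself or its reverse complement
        dg += STACK[step] if step in STACK else STACK[revcomp(step)]
    # one penalty per helix end closed by an A-U pair
    dg += TERMINAL_AU * sum(1 for end in (top[0], top[-1]) if end in "AU")
    if top == revcomp(top):
        dg += SYMMETRY  # self-complementary duplexes
    return round(dg, 2)
```

For example, `duplex_dg37("GCGC")` gives −4.68 kcal/mol: three stacks (−3.42, −2.36, −3.42) plus initiation (+4.09) plus the symmetry term (+0.43) for this self-complementary duplex.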

28 January 2022Accurate Mapping of RNA-Seq Data

Mapping RNA-Seq data to a genome is not the same as mapping DNA-Seq data, because junction reads span two exons and have no contiguous exact match in the reference genome. In this chapter, we describe SpliceMap, a junction read aligner based on an algorithm of “half-read seeding” and “seeding extension.” Four analysis steps are integrated in SpliceMap (half-read mapping, seeding selection, seeding extension and junction search, and paired-end filtering), and all tuning parameters of these steps can be edited in a single configuration file. Thus, SpliceMap can be executed with a single command. While describing the analysis steps of SpliceMap, we illustrate how to choose the parameters according to the research interest and RNA-Seq data quality, using an example of human...
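
The core idea of half-read seeding and seed extension can be shown on a toy example. This is a simplified illustration, not SpliceMap's implementation: it seeds only with the left half of the read, takes the first exact hit, extends greedily, and then searches for the unmatched remainder within a hypothetical `max_intron` window downstream. Real aligners also seed with the right half, tolerate mismatches, and use splice-site signals to resolve ambiguous junction positions.

```python
def map_read(ref, read, max_intron=50000):
    """Toy junction mapping by half-read seeding and seed extension.

    Returns (start, donor, acceptor); donor and acceptor are None for an
    unspliced hit, and the intron is ref[donor:acceptor]. Returns None if
    the read cannot be placed.
    """
    half = len(read) // 2
    start = ref.find(read[:half])  # seed: first exact match of the left half
    if start == -1:
        return None
    i = half
    # seed extension: walk right while read and reference agree
    while i < len(read) and start + i < len(ref) and ref[start + i] == read[i]:
        i += 1
    if i == len(read):
        return (start, None, None)  # read maps contiguously
    donor = start + i  # first reference base the read no longer matches
    rest = read[i:]
    # junction search: remainder must match exactly within the intron window
    acceptor = ref.find(rest, donor, donor + max_intron + len(rest))
    if acceptor == -1:
        return None
    return (start, donor, acceptor)
```

With `ref = "ACGTACGTAC" + "GTTTTTTTTTAG" + "TGCATGCATG"`, the read `"GTACGTACTGCATG"` maps as `(2, 10, 22)`, so the inferred intron is `ref[10:22]`.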

28 January 2022RNA–Protein Interactions: An Overview

RNA binding proteins (RBPs) are key players in the regulation of gene expression. In this chapter we discuss the main protein–RNA recognition modes used by RBPs in order to regulate multiple steps of RNA processing. We discuss traditional and state-of-the-art technologies that can be used to study RNAs bound by individual RBPs, or vice versa, for both in vitro and in vivo methodologies. To help highlight the biological significance of RBP mediated regulation, online resources on experimentally verified protein–RNA interactions are briefly presented. Finally, we present the major tools to computationally infer RNA binding sites according to the modeling features and to the unsupervised or supervised frameworks that are adopted. Since some RNA binding site search algorithms are d...

28 January 2022Quality Control of RNA-Seq Experiments

Direct sequencing of complementary DNA (cDNA) using high-throughput sequencing technologies (RNA-seq) is widely used and allows a more comprehensive understanding of the transcriptome than microarrays. In theory, RNA-seq should be able to precisely identify and quantify all RNA species, small or large, at low or high abundance. However, RNA-seq is a complicated, multistep process involving reverse transcription, amplification, fragmentation, purification, adaptor ligation, and sequencing. Improper handling at any of these steps can produce biased or even unusable data. Additionally, intrinsic RNA-seq biases (such as GC bias and nucleotide composition bias) and transcriptome complexity can also make data imperfect. Therefore, comprehensive quality assessment is the first and most crit...
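
GC bias is one of the intrinsic biases named above. A minimal per-read GC check might look like the following; the 0.3 to 0.7 window is an arbitrary assumption for illustration, ambiguous bases are simply ignored, and dedicated RNA-seq QC tools report much more (per-position composition, duplication, coverage uniformity).

```python
from statistics import mean


def gc_fraction(read):
    """Fraction of G/C among called bases (A, C, G, T); N and others are ignored."""
    called = [b for b in read.upper() if b in "ACGT"]
    return sum(b in "GC" for b in called) / len(called) if called else 0.0


def gc_report(reads, low=0.3, high=0.7):
    """Mean per-read GC and the fraction of reads outside [low, high]."""
    fractions = [gc_fraction(r) for r in reads]
    outside = sum(1 for f in fractions if f < low or f > high)
    return {"mean_gc": mean(fractions), "fraction_outside": outside / len(fractions)}
```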

28 January 2022Genotype Imputation to Increase Sample Size in Pedigreed Populations

Genotype imputation is a cost-effective way to increase the power of genomic selection or genome-wide association studies. While several genotype imputation algorithms are available, this chapter focuses on a heuristic algorithm, as implemented in the AlphaImpute software. This algorithm combines long-range phasing, haplotype library imputation, and segregation analysis and it is specifically designed to work with pedigreed populations. (Source: Springer protocols feed by Bioinformatics)

28 January 2022RIP-Seq Data Analysis to Determine RNA–Protein Associations

Next-generation sequencing (NGS) technologies have opened new avenues of unprecedented power for research in molecular biology and genetics. In particular, their application to the study of RNA-binding proteins (RBPs) extracted through immunoprecipitation (RIP) makes it possible to sequence and characterize all RNAs bound in vivo by a given RBP (RIP-Seq). However, NGS-based experiments, including RIP-Seq, produce millions of short sequence fragments that must be processed with suitable bioinformatic tools and methods to recover and/or quantify the original sequence sample. In this chapter we provide a survey of different approaches that can be taken for the analysis of RIP-Seq data and the identification of the RNAs bound by a given RBP. (Source: Springer protoco...
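One of the simplest ways to flag candidate bound RNAs is to compare read counts in the RIP library with those in a control (input) library. The sketch below computes a per-transcript log2 enrichment of counts-per-million with a pseudocount; it is a hypothetical helper, not one of the specific approaches surveyed in the chapter, and it ignores replicates and statistical testing, which real analyses handle with dedicated count-based methods.

```python
import math
from typing import Dict


def log2_enrichment(ip_counts: Dict[str, int], input_counts: Dict[str, int],
                    pseudocount: float = 1.0) -> Dict[str, float]:
    """log2 ratio of library-size-normalized RIP counts to control counts, per transcript."""
    ip_total = sum(ip_counts.values())
    input_total = sum(input_counts.values())
    if ip_total == 0 or input_total == 0:
        raise ValueError("both libraries need at least one mapped read")
    result = {}
    for tx in set(ip_counts) | set(input_counts):
        # Counts-per-million with a pseudocount so transcripts absent from one library stay finite
        ip_cpm = (ip_counts.get(tx, 0) + pseudocount) / ip_total * 1e6
        input_cpm = (input_counts.get(tx, 0) + pseudocount) / input_total * 1e6
        result[tx] = math.log2(ip_cpm / input_cpm)
    return result
```

Transcripts with large positive values are candidates for RBP binding; the pseudocount value is an arbitrary choice that mostly affects low-count transcripts.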

28 January 2022Detection of Signatures of Selection Using FST

Natural selection has molded the evolution of species across all taxa. Much more recently, on an evolutionary scale, human-oriented selection started to play an important role in shaping organisms, markedly so after the domestication of animals and plants. These selection processes have left traceable marks in the genome. Following from the recent advances in molecular genetics technologies, a number of methods have been developed to detect such signals, termed genomic signatures of selection. In this chapter we discuss a straightforward protocol based on the FST statistic to identify genomic regions that exhibit high variation in allelic frequency between groups, which is a characteristic of genomic regions that have gone through differential selection. How to define the borders o...
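To make the FST idea concrete, the sketch below uses the simple Nei-style form FST = (HT - HS) / HT for one biallelic SNP in two groups, where HT is the expected heterozygosity of the pooled groups and HS the mean within-group heterozygosity, and then averages per-SNP values over sliding windows of consecutive SNPs. It is an illustration under these assumptions, not the chapter's exact estimator: sample-size-corrected estimators such as Weir and Cockerham's are generally preferred in practice, and the function names are hypothetical.

```python
from typing import List, Sequence


def fst_nei(p1: float, p2: float) -> float:
    """Nei-style FST = (HT - HS) / HT for one biallelic SNP in two groups.

    p1, p2 are the frequencies of the same allele in each group. No sample-size
    correction is applied (unlike, e.g., the Weir & Cockerham estimator).
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                        # expected heterozygosity, pooled
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2    # mean within-group heterozygosity
    if h_t == 0:
        return 0.0  # same allele fixed in both groups: no differentiation to measure
    return (h_t - h_s) / h_t


def window_means(values: Sequence[float], size: int, step: int) -> List[float]:
    """Average per-SNP values over sliding windows of `size` consecutive SNPs."""
    return [sum(values[i:i + size]) / size
            for i in range(0, len(values) - size + 1, step)]
```

For example, allele frequencies of 0.2 and 0.8 give FST = 0.36, while identical frequencies give 0. Windows whose mean lies in the extreme upper tail of the genome-wide distribution are then natural candidates for regions under differential selection.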

28 January 2022e-DNA Meta-Barcoding: From NGS Raw Data to Taxonomic Profiling

In recent years, thanks to the essential support provided by Next-Generation Sequencing (NGS) technologies, metagenomics has enabled direct access to the taxonomic and functional composition of mixed microbial communities living in any environmental niche, without the need to isolate or culture individual organisms. This approach has already been successfully applied to the analysis of many habitats, such as natural water and soil environments (including those with extreme physical and chemical conditions), food supply chains, and animal organisms, including humans. A shotgun sequencing approach can be used to investigate both organism and gene diversity. However, if the purpose is limited to exploring taxonomic complexity, an amplicon-based approach, based on PCR-targeted ...

28 January 2022Genotype Phasing in Populations of Closely Related Individuals

Knowledge of phase has many potential applications for empowering genomic information. For example, phase can facilitate the identification of identical-by-descent sharing between pairs of individuals, support genotype imputation, and enable parent-of-origin modeling of alleles in order to quantify the effect of parental imprinting. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Mining the Electronic Health Record for Disease Knowledge

The growing amount and availability of electronic health record (EHR) data present enhanced opportunities for discovering new knowledge about diseases. In the past decade, there has been an increasing number of data and text mining studies focused on the identification of disease associations (e.g., disease–disease, disease–drug, and disease–gene) in structured and unstructured EHR data. This chapter presents a knowledge discovery framework for mining the EHR for disease knowledge and describes each step for data selection, preprocessing, transformation, data mining, and interpretation/validation. Topics including natural language processing, standards, and data privacy and security are also discussed in the context of this framework. (Source: Springer protocols feed by B...

28 January 2022Detecting Regions of Homozygosity to Map the Cause of Recessively Inherited Disease

Homozygosity is a component of genetic patterning that can be used to search for the cause of genetic disease. In this chapter, methods are presented to analyze SNP data for the presence of homozygosity. Two exercises demonstrate methods to define runs of homozygosity, to identify shared homozygosity between individuals, and to evaluate the results in light of the expectations of a recessively inherited genetic disorder. An example dataset is used to aid in data interpretation. (Source: Springer protocols feed by Bioinformatics)
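As a rough sketch of what "defining runs of homozygosity" and "shared homozygosity" mean computationally, the code below scans SNP genotypes coded as 0/1/2 copies of the alternative allele, reports maximal stretches of homozygous calls containing at least a minimum number of SNPs, and finds stretches where two individuals carry the same homozygous genotype. The coding, the SNP-count threshold, and the function names are assumptions for illustration; real tools typically use physical length thresholds and tolerate occasional genotyping errors.

```python
from typing import List, Optional, Sequence, Tuple

Run = Tuple[int, int]  # (first SNP index, last SNP index), inclusive


def runs_of_homozygosity(genotypes: Sequence[Optional[int]], min_snps: int) -> List[Run]:
    """Maximal stretches of homozygous calls (0 or 2 copies of the alternative allele).

    Heterozygous (1) and missing (None) calls end a run; runs with fewer than
    min_snps consecutive SNPs are discarded.
    """
    runs: List[Run] = []
    start = None
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes) - 1))
    return runs


def shared_homozygous_runs(g1: Sequence[Optional[int]], g2: Sequence[Optional[int]],
                           min_snps: int) -> List[Run]:
    """Runs in which two individuals carry the same homozygous genotype at every SNP."""
    # Keep the genotype where both share the same homozygous call; otherwise recode as 1,
    # which runs_of_homozygosity treats like a heterozygote and so breaks the run
    shared = [a if a == b and a in (0, 2) else 1 for a, b in zip(g1, g2)]
    return runs_of_homozygosity(shared, min_snps)
```

For a recessive disorder, the expectation is that affected relatives share a homozygous run spanning the causal locus, so shared runs among affected individuals that are absent in unaffected relatives are the regions worth inspecting.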

28 January 2022Systematic Drug Repurposing Through Text Mining

Drug development remains a time-consuming and highly expensive process with high attrition rates at each stage. Given the safety hurdles drugs must pass due to increased regulatory scrutiny, it is essential for pharmaceutical companies to maximize their return on investment by effectively extending drug life cycles. There have been many effective techniques, such as phenotypic screening and compound profiling, which identify new indications for existing drugs, often referred to as drug repurposing or drug repositioning. This chapter explores the use of text mining leveraging several publicly available knowledge resources and mechanism of action representations to link existing drugs to new diseases from biomedical abstracts in an attempt to generate biologically meaningful alternative drug...

28 January 2022Genome-Enabled Prediction Using the BLR (Bayesian Linear Regression) R-Package

The BLR (Bayesian linear regression) package of R implements several Bayesian regression models for continuous traits. The package was originally developed for implementing the Bayesian LASSO (BL) of Park and Casella (J Am Stat Assoc 103(482):681–686, 2008), extended to accommodate fixed effects and regressions on pedigree using methods described by de los Campos et al. (Genetics 182(1):375–385, 2009). In 2010 we further developed the code into an R-package, reprogrammed some internal aspects of the algorithm in the C language to increase computational speed, and further documented the package (Plant Genome J 3(2):106–116, 2010). The first version of BLR was launched in 2010 and since then the package has been used for multiple publications and is being routinely used for...

28 January 2022Role of Text Mining in Early Identification of Potential Drug Safety Issues

Drugs are an important part of today’s medicine, designed to treat, control, and prevent diseases; however, besides their therapeutic effects, drugs may also cause adverse effects that range from cosmetic to severe morbidity and mortality. To identify these potential drug safety issues early, surveillance must be conducted for each drug throughout its life cycle, from drug development to different phases of clinical trials, and continued after market approval. A major aim of pharmacovigilance is to identify the potential drug–event associations that may be novel in nature, severity, and/or frequency. Currently, the state-of-the-art approach to signal detection relies on automated procedures that analyze vast quantities of data for clinical knowledge. There exists a variety of...

28 January 2022Bayesian Methods Applied to GWAS

Bayesian multiple-regression methods are being successfully used for genomic prediction and selection. These regression models simultaneously fit many more markers than the number of observations available for the analysis. Thus, Bayes' theorem is used to combine prior beliefs about marker effects, expressed as prior distributions, with information from the data for inference. Often the analyses are too complex for closed-form solutions, and Markov chain Monte Carlo (MCMC) sampling is used to draw inferences from posterior distributions. This chapter describes how these Bayesian multiple-regression analyses can be used for GWAS. In most GWAS, false positives are controlled by limiting the genome-wise error rate, which is the probability of one or more false-positive results,...
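To show how MCMC output can be turned into a statement about a genomic region, the sketch below estimates the posterior probability that a window of markers contains at least one nonzero effect, as the fraction of retained MCMC draws in which some marker in the window has a nonzero sampled effect. This assumes a mixture prior (as in BayesB or BayesCπ) under which many sampled effects are exactly zero; it is a generic Monte Carlo estimate with a hypothetical function name, not necessarily the exact inference procedure of the chapter.

```python
from typing import Sequence, Tuple


def window_posterior_probability(samples: Sequence[Sequence[float]],
                                 window: Tuple[int, int]) -> float:
    """Fraction of MCMC draws in which at least one marker in `window` has a nonzero effect.

    samples: one list of sampled marker effects per retained MCMC iteration
    window:  (first, last) marker indices, inclusive
    """
    first, last = window
    hits = sum(1 for draw in samples
               if any(draw[j] != 0.0 for j in range(first, last + 1)))
    return hits / len(samples)
```

With four retained draws `[[0, 0.1, 0], [0, 0, 0], [0.2, 0, 0], [0, 0, 0]]`, the window covering markers 0 and 1 has a nonzero effect in two draws, giving an estimated posterior probability of 0.5.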

28 January 2022Integrative Literature and Data Mining to Rank Disease Candidate Genes

While genomics-derived discoveries promise benefits to basic research and health care, the speed and affordability of sequencing that followed recent technological advances have further aggravated the data deluge. Seamless integration of the ever-increasing clinical, genomic, and experimental data and efficient mining for knowledge extraction, delivering actionable insight and generating testable hypotheses, are therefore critical for the needs of biomedical research. For instance, high-throughput techniques are frequently applied to detect disease candidate genes. Experimental validation of these candidates, however, is both time-consuming and expensive. Hence, several computational approaches based on literature and data mining have been developed to identify the most promising candidates fo...

28 January 2022Mixed Effects Structural Equation Models and Phenotypic Causal Networks

Complex networks with causal relationships among variables are pervasive in biology. Their study, however, requires special modeling approaches. Structural equation models (SEM) make it possible to represent causal mechanisms among phenotypic traits and to infer the magnitude of causal relationships. This information is important not only for understanding how variables relate to each other in a biological system, but also for predicting how this system reacts to external interventions, which are common in fields related to health and food production. Nevertheless, fitting a SEM requires defining a priori the causal structure among traits, which is the qualitative information that describes how traits are causally related to each other. Here, we present directions for the applications of SEM t...

28 January 2022Mining Emerging Biomedical Literature for Understanding Disease Associations in Drug Discovery

Systematically evaluating the exponentially growing body of scientific literature has become a critical task that every drug discovery organization must undertake in order to understand emerging trends for scientific investment and strategy development. Developing-trends analysis uses the number of publications within a 3-year window for concepts derived from well-established disease and gene ontologies to help recognize and predict emerging areas of scientific discovery relevant to that space. In this chapter, we describe such a method and use obesity and psoriasis as use-case examples by analyzing the frequency of disease-related MeSH terms in PubMed abstracts over time. We share how our system can be used to predict emerging trends at a relatively early stage and we an...
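The core counting step behind a windowed trends analysis can be sketched as follows, assuming the publication years of PubMed abstracts indexed with a given MeSH term have already been retrieved (retrieval is not shown). For each year, the hypothetical helper below counts publications in that year plus the preceding two, giving a trailing 3-year window; how the chapter turns such counts into trend predictions is not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable


def rolling_publication_counts(pub_years: Iterable[int], window: int = 3) -> Dict[int, int]:
    """For each year, the number of publications in that year and the preceding window-1 years."""
    per_year = Counter(pub_years)
    if not per_year:
        return {}
    first, last = min(per_year), max(per_year)
    return {year: sum(per_year.get(year - k, 0) for k in range(window))
            for year in range(first, last + 1)}
```

Comparing these windowed counts between successive periods, term by term, is one simple way to spot concepts whose literature is accelerating.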

28 January 2022Using PLINK for Genome-Wide Association Studies (GWAS) and Data Analysis

Within this chapter we introduce the basic PLINK functions for reading in data, applying quality control, and running association analyses. Three worked examples are provided to illustrate: data management and assessment of population substructure, association analysis of a quantitative trait, and qualitative or case–control association analyses. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Scientific Collaboration Networks Using Biomedical Text

The combination of scientific knowledge and experience is key to success in biomedical research. This chapter demonstrates some of the strategies used to help identify key opinion leaders with the expertise you need, thereby supporting efforts to increase collaborative biomedical research. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Overview of Statistical Methods for Genome-Wide Association Studies (GWAS)

This chapter provides an overview of statistical methods for genome-wide association studies (GWAS) in animals, plants, and humans. The simplest form of GWAS, a marker-by-marker analysis, is illustrated with a simple example. The problem of selecting a significance threshold that accounts for the large amount of multiple testing that occurs in GWAS is discussed. Population structure causes false positive associations in GWAS if not accounted for, and methods to deal with this are presented. Methodology for more complex models for GWAS, including haplotype-based approaches, accounting for identical by descent versus identical by state, and fitting all markers simultaneously are described and illustrated with examples. (Source: Springer protocols feed by Bioinformatics)
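The simplest marker-by-marker analysis can be sketched as a separate linear regression of phenotype on allele count (0/1/2) for each marker, followed by a Bonferroni threshold alpha/m to account for the m tests. This is a minimal illustration with hypothetical function names: the p-value uses a normal approximation to the t distribution (adequate only for large samples), and neither covariates nor population structure, which the chapter stresses must be accounted for, are modeled.

```python
import math
from statistics import NormalDist, fmean
from typing import List, Sequence, Tuple


def marker_test(genotypes: Sequence[int], phenotypes: Sequence[float]) -> Tuple[float, float]:
    """Regress phenotype on allele count; return (slope, approximate two-sided p-value)."""
    n = len(genotypes)
    mx, my = fmean(genotypes), fmean(phenotypes)
    sxx = sum((x - mx) ** 2 for x in genotypes)
    if sxx == 0:
        return 0.0, 1.0  # monomorphic marker: no test possible
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    rss = sum((y - my - slope * (x - mx)) ** 2 for x, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    if se == 0:
        return slope, (0.0 if slope != 0 else 1.0)  # perfect fit
    z = slope / se
    return slope, 2 * (1 - NormalDist().cdf(abs(z)))


def bonferroni_hits(markers: Sequence[Sequence[int]], phenotypes: Sequence[float],
                    alpha: float = 0.05) -> List[int]:
    """Indices of markers whose p-value passes the Bonferroni threshold alpha / m."""
    threshold = alpha / len(markers)
    return [i for i, g in enumerate(markers) if marker_test(g, phenotypes)[1] < threshold]
```

With millions of markers, alpha/m becomes very small, which is exactly the multiple-testing problem the chapter discusses; Bonferroni is conservative because nearby markers in linkage disequilibrium are not independent tests.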

28 January 2022Mining Biological Networks from Full-Text Articles

The study of biological networks is playing an increasingly important role in the life sciences. Many different kinds of biological systems can be modelled as networks; perhaps the most important examples are protein–protein interaction (PPI) networks, metabolic pathways, gene regulatory networks, and signalling networks. Although much useful information is easily accessible in public databases, a lot of extra relevant data lies scattered in numerous published papers. Hence there is a pressing need for automated text-mining methods capable of extracting such information from full-text articles. Here we present practical guidelines for constructing a text-mining pipeline from existing code and software components capable of extracting PPI networks from full-text articles. This approa...
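As the simplest possible baseline for extracting candidate PPI edges from text (not the chapter's pipeline, which is built from existing components), the sketch below counts pairs of dictionary protein names that co-occur in the same sentence. The sentence splitter and tokenizer are deliberately naive, matching is case-sensitive, multi-word names are not handled, and the function name is hypothetical; co-occurrence also yields many false positives that relation-extraction steps are meant to filter.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable


def cooccurrence_edges(text: str, protein_names: Iterable[str]) -> Counter:
    """Count sentence-level co-mentions of dictionary protein names as candidate PPI edges."""
    names = sorted(set(protein_names))
    # Naive sentence splitter: breaks after '.', '!' or '?' followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text)
    edges: Counter = Counter()
    for sentence in sentences:
        tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence))
        found = sorted(n for n in names if n in tokens)
        for pair in combinations(found, 2):  # each unordered pair once, alphabetically ordered
            edges[pair] += 1
    return edges
```

Edge counts can then serve as weights when assembling the network, with frequently co-mentioned pairs prioritized for closer inspection.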

28 January 2022Managing Large SNP Datasets with SNPpy

Using relational databases to manage SNP datasets is a very useful technique that has significant advantages over alternative methods, including the ability to leverage the power of relational databases to perform data validation, and the use of the powerful SQL query language to export data. SNPpy is a Python program which uses the PostgreSQL database and the SQLAlchemy Python library to automate SNP data management. This chapter shows how to use SNPpy to store and manage large datasets. (Source: Springer protocols feed by Bioinformatics)
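SNPpy itself uses PostgreSQL through SQLAlchemy; to keep the illustration self-contained, the sketch below swaps in Python's built-in `sqlite3` module. It shows the two advantages named above: constraints that validate data on insert (foreign keys and an allowed set of genotype codes) and plain SQL for exporting derived data, here a per-SNP call rate. The schema, the genotype codes, and the function names are hypothetical and are not SNPpy's.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema; the genotype code '--' marks a missing call
SCHEMA = """
CREATE TABLE snp (
    snp_id TEXT PRIMARY KEY,
    chrom  TEXT NOT NULL,
    pos    INTEGER NOT NULL CHECK (pos > 0)
);
CREATE TABLE sample (
    sample_id TEXT PRIMARY KEY
);
CREATE TABLE genotype (
    sample_id TEXT NOT NULL REFERENCES sample (sample_id),
    snp_id    TEXT NOT NULL REFERENCES snp (snp_id),
    gt        TEXT NOT NULL CHECK (gt IN ('AA', 'AB', 'BB', '--')),
    PRIMARY KEY (sample_id, snp_id)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
    conn.executescript(SCHEMA)
    return conn


def call_rates(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    """Per-SNP call rate (fraction of non-missing genotypes), exported with plain SQL."""
    return conn.execute(
        "SELECT snp_id, AVG(CASE WHEN gt = '--' THEN 0.0 ELSE 1.0 END) "
        "FROM genotype GROUP BY snp_id ORDER BY snp_id"
    ).fetchall()
```

Inserting a genotype for an unknown SNP or with an unexpected code raises `sqlite3.IntegrityError`, so malformed data is rejected at load time rather than discovered during analysis.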

28 January 2022Functional Molecular Units for Guiding Biomarker Panel Design

The field of biomarker research has experienced a major boost in recent years, and the number of publications on biomarker studies, both evaluating established candidates and proposing novel ones, is increasing rapidly for numerous clinically relevant disease areas. However, individual markers often lack sensitivity and specificity in the clinical context: sensitivity is hampered mainly by intra-individual phenotype variability, and specificity is impaired when a marker reflects more general processes downstream of the causative molecular events characterizing a disease term. To circumvent these shortcomings, the trend is toward multimarker panels, which combine the strengths of individual markers to further enhance performance regarding both sensitivity...

28 January 2022Descriptive Statistics of Data: Understanding the Data Set and Phenotypes of Interest

A good understanding of the design of an experiment and of the observational data collected as part of it is a key prerequisite for correct and meaningful preparation of field data for further analysis. In this chapter, I provide guidelines on how an understanding of the field data can be gained, on the preparation steps that arise from the experimental or data structure, and on how to fit a linear model to extract data for further analysis. (Source: Springer protocols feed by Bioinformatics)

28 January 2022Roles for Text Mining in Protein Function Prediction

The Human Genome Project has provided science with a hugely valuable resource: the blueprints for life; the specification of all of the genes that make up a human. While the genes have all been identified and deciphered, it is proteins that are the workhorses of the human body: they are essential to virtually all cell functions and are the primary mechanism through which biological function is carried out. Hence in order to fully understand what happens at a molecular level in biological organisms, and eventually to enable development of treatments for diseases where some aspect of a biological system goes awry, we must understand the functions of proteins. However, experimental characterization of protein function cannot scale to the vast amount of DNA sequence data now available. Computa...

28 January 2022Incorporating Prior Knowledge to Increase the Power of Genome-Wide Association Studies

Typical methods of analyzing genome-wide single nucleotide variant (SNV) data in cases and controls involve testing each variant’s genotypes separately for phenotype association, and then using a substantial multiple-testing penalty to minimize the rate of false positives. This approach, however, can result in low power for modestly associated SNVs. Furthermore, simply looking at the most associated SNVs may not directly yield biological insights about disease etiology. SNVset methods attempt to address both limitations of the traditional approach by testing biologically meaningful sets of SNVs (e.g., genes or pathways). The number of tests run in a SNVset analysis is typically much lower (hundreds or thousands instead of millions) than in a traditional analysis, so the false-positiv...
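One simple way to test a set of SNVs (for example, all SNVs in a gene) is to combine their per-variant p-values with Fisher's method, then apply a Bonferroni correction over the much smaller number of sets. The sketch below does this; Fisher's method assumes independent tests, which linkage disequilibrium between SNVs violates, so practical SNVset methods typically account for LD (for example, via permutation). It is an illustration of the general idea, not one of the specific methods in the chapter, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def fisher_combined_p(pvalues: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square with 2k df under independent nulls.

    For even degrees of freedom the chi-square survival function has the closed form
    exp(-x/2) * sum_{i<k} (x/2)^i / i!, so no special-function library is needed.
    """
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must be in (0, 1]")
    half = -sum(math.log(p) for p in pvalues)  # x / 2
    term, total = 1.0, 1.0
    for i in range(1, len(pvalues)):
        term *= half / i  # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def significant_sets(set_pvalues: Dict[str, Sequence[float]], alpha: float = 0.05) -> List[str]:
    """Names of SNV sets passing a Bonferroni threshold over the number of sets tested."""
    threshold = alpha / len(set_pvalues)
    return sorted(name for name, ps in set_pvalues.items() if fisher_combined_p(ps) < threshold)
```

Because the correction divides alpha by the number of sets rather than the number of SNVs, a gene carrying several modestly associated variants can reach significance even when none of them would individually.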

28 January 2022Predicting Future Discoveries from Current Scientific Literature

Knowledge discovery in biomedicine is a time-consuming process starting from the basic research, through preclinical testing, towards possible clinical applications. Crossing of conceptual boundaries is often needed for groundbreaking biomedical research that generates highly inventive discoveries. We demonstrate the ability of a creative literature mining method to advance valuable new discoveries based on rare ideas from existing literature. When emerging ideas from scientific literature are put together as fragments of knowledge in a systematic way, they may lead to original, sometimes surprising, research findings. If enough scientific evidence is already published for the association of such findings, they can be considered as scientific hypotheses. In this chapter, we describe a meth...

28 January 2022Applications of Multifactor Dimensionality Reduction to Genome-Wide Data Using the R Package ‘MDR’

This chapter describes how to use the R package ‘MDR’ to search and identify gene–gene interactions in high-dimensional data and illustrates applications for exploratory analysis of multi-locus models by providing specific examples. (Source: Springer protocols feed by Bioinformatics)
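The core step of multifactor dimensionality reduction can be sketched for a single pair of SNPs: each two-locus genotype cell is labeled high-risk when its case:control ratio exceeds the overall case:control ratio, and the resulting one-dimensional classifier is scored by balanced accuracy. This is a minimal Python illustration of the technique, not the interface of the R package ‘MDR’; cross-validation, the search over all SNP combinations, and permutation testing are omitted, and the function name is hypothetical.

```python
from collections import Counter
from typing import Sequence, Set, Tuple

Cell = Tuple[int, int]


def mdr_pair(g1: Sequence[int], g2: Sequence[int],
             status: Sequence[int]) -> Tuple[Set[Cell], float]:
    """Core MDR step for one SNP pair: label genotype cells high/low risk, score the model.

    status is 1 for cases and anything else for controls. Returns the high-risk
    cells and the balanced accuracy of classifying individuals by those cells.
    """
    cases, controls = Counter(), Counter()
    for a, b, s in zip(g1, g2, status):
        (cases if s == 1 else controls)[(a, b)] += 1
    n_case, n_ctrl = sum(cases.values()), sum(controls.values())
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need both cases and controls")
    # cases/controls > n_case/n_ctrl, cross-multiplied to avoid dividing by zero
    high = {cell for cell in cases if cases[cell] * n_ctrl > controls[cell] * n_case}
    tp = sum(1 for a, b, s in zip(g1, g2, status) if s == 1 and (a, b) in high)
    tn = sum(1 for a, b, s in zip(g1, g2, status) if s != 1 and (a, b) not in high)
    return high, (tp / n_case + tn / n_ctrl) / 2
```

In a purely epistatic toy example where disease status equals "genotypes differ at the two loci", the pair classifies perfectly (balanced accuracy 1.0) while either locus alone does no better than chance (0.5), which is the kind of interaction MDR is designed to detect.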

XomicX - Tel: +31 (0)10 73 70 410 - Mobile: +31 (0)6 - 28 91 94 13
Address: Postbox 303, Rotterdam, The Netherlands
Email: info(at) - Trade Register: 50283618 - Account number: 5551968 - Terms and conditions
Development by BitWorX - Hosting by RackWerk