GCB 2021 - Workshops

Workshops

Wednesday, September 08, 2021, 2 - 5 pm

WS 1: Metabolic Reconstruction and Flux Modelling

WS 2: Cell segmentation using KNIME Analytics Platform and its Tensorflow2 Integration

WS 3: BioC++ - solving daily bioinformatic tasks with C++ efficiently

Thursday, September 09, 2021 9 am -12pm

WS 4: Modern hashing for alignment-free sequence analysis

WS 5: Bioinformatics tools for analyzing clinical metaproteomics samples of the human gut

Thursday, September 09, 2 - 5 pm

WS 6: How to build and analyse mathematical models of biological systems using Python & modelbase

WS 7: Non-targeted label-free proteomics

WS 8: Exploring Target Structures with ProteinsPlus

WS 1: Metabolic Reconstruction and Flux Modelling

Wednesday, September 08, 2021, 2 pm

Organizers:
Sam Seaver, Nadine Töpfer
IPK Gatersleben

Part I: Introduction to Model Reconstruction
Genome-scale metabolic reconstructions help us to understand and engineer metabolism. Next-generation sequencing technologies are delivering genomes and transcriptomes for an ever-widening range of species. While such omic data can, in principle, be used to compare metabolic reconstructions in different species, organs and environmental conditions, these comparisons require a standardized framework for the reconstruction of metabolic networks from transcript data. We introduce PlantSEED as such a framework to cover the primary metabolism for any plant species.

Part II: Introduction to Flux Balance Analysis
Flux Balance Analysis (FBA) is a linear programming technique for predicting metabolic flux distributions by using the underlying stoichiometry of a metabolic network and a biologically motivated objective function. FBA was initially developed to model steady-state metabolism of microbial systems. Over the past decade various applications to modelling plant metabolic systems have emerged. Genome-scale metabolic reconstructions, for instance those reconstructed by the PlantSEED framework, can be analysed using FBA and extensions of it.

WS 2: Cell segmentation using KNIME Analytics Platform and its Tensorflow2 Integration

Wednesday, September 08, 2021, 2 pm

Organizers:
Janina Mothes, PhD - Data Scientist (Life Sciences), KNIME GmbH
Temesgen H. Dadi, PhD - Technical Data Scientist (Life Sciences), KNIME GmbH

Image analysis is one of the hallmarks of biomedical research due to its wide range of potential applications. This includes enhancing our understanding of brain function by analyzing the connectivity of individual neuronal processes and synapses through serial transmission electron microscopy (EM). Machine learning approaches, in particular convolutional neural networks, allow the automatic segmentation of neural structures in EM images, an important step towards automating the extraction of neuronal connectivity.
The open source KNIME Analytics Platform offers an accessible tool based on the visual programming paradigm to analyse diverse kinds of data, including images. In addition, one can choose from a wide array of data transformations, machine learning algorithms, and visualizations and combine those in one reproducible workflow. KNIME Analytics Platform is freely available from https://www.knime.com/downloads.
In this hands-on tutorial, participants will produce a workflow to create and train a specific Convolutional Network (U-Net) for segmenting cell images. We will start by importing and cleaning up the input data (Transmission Electron Microscopy data)[1,2]. Afterwards, with the help of the KNIME Tensorflow2 integration, we will then train a U-Net model and use the trained network to predict the segmentation of unseen data. In the last step, we visualize our results.

Participants will learn how to
- Use the open source KNIME Analytics Platform for importing, blending and transforming data
- Work with images in KNIME Analytics Platform
- Train a U-Net model and apply it to unseen data
- Visualize the results

Intended audience and level - Beginner
Students (grad/undergrad), researchers, principal investigators with an interest in machine learning, images, data manipulation are welcome to attend the tutorial. A little background on machine learning and imaging data is a plus. We will provide a short introduction to the KNIME Analytics Platform, cell segmentation, and convolutional neural networks, before starting the hands-on sessions.

Requirements
For a hands-on tutorial, participants need to bring their own laptop. All the necessary software and data will be made available for download before the tutorial day.

[1] Arganda-Carreras, Ignacio, et al. "Crowdsourcing the creation of image segmentation algorithms for connectomics." Frontiers in neuroanatomy 9 (2015): 142.
[2] Cardona, Albert, et al. "An integrated micro-and macroarchitectural analysis of the Drosophila brain by computer-assisted serial section electron microscopy." PLoS Biol 8.10 (2010): e1000502.

WS 3: BioC++ - solving daily bioinformatic tasks with C++ efficiently

Wednesday, September 08, 2021, 2 pm

Organizers:
René Rahn, Max Planck Institute for Molecular Genetics, Algorithmic Bioinformatics, Germany
Marcel Ehrhardt, Free University Berlin, Algorithmic Bioinformatics, Germany
Enrico Seiler, Max Planck Institute for Molecular Genetics, Algorithmic Bioinformatics, Germany

Overview:
In this half-day tutorial we are going to teach how to use modern C++ and utilise efficient C++ libraries to rapidly develop tools and scripts for operating on and manipulating large-scale sequencing data.

Motivation:
The high variability and heterogeneity often observed within various genomic data is challenging for many standard tools, for example for read alignment and variant calling. Often, these tools are wrapped in complicated pre- and postprocessing data curation steps in order to obtain results with higher quality. However, these additional steps incur a high maintenance and performance burden to the established work process and often do not scale with larger data sets. Seldomly, C++ is considered as the language of choice for these small processes, although it is the main language used in high-performance computing. We are going to show that implementing modern C++ can be as easy as using other modern high-level languages.

Course outline:
This tutorial is organised as a half-day tutorial. At the beginning we are going to introduce fundamental concepts and principles of the C++ programming language. Further, we will teach how modern C++ features such as ranges and concepts can be used to rapidly develop high-quality C++ applications. This introduction to C++ follows a practical session where participants will read in typical files from sequencing experiments using the C++ library SeqAn and operate on the data with the taught principles to solve diverse problems, e.g. filtering out reads with low sequencing quality and others. In the last 30 minutes of the day we are going to summarise the learned concepts and compare the developed methods to current approaches.

Intended audience and level:
This tutorial is mostly suited for computational biologist and bioinformaticians with research focus on sequence analysis (e.g., genomics, metagenomics, proteomics, read alignment, variant detection, etc.). A fundamental knowledge about sequencing experiments and the involved data is required. We expect that attendees have an intermediate knowledge in programming with any high-level programming language, e.g. Python, Java or C++. Some basic C++-knowledge is helpful but not mandatory to successfully complete the course.

This tutorial is targeting beginners and intermediate C++ developers that want to learn more about modern C++ features like ranges and concepts.

Learning Objectives:
Students will develop

* skills in developing an application using the C++ programming language
* knowledge and understanding of modern C++ features, such as ranges and concepts
* knowledge and understanding about how to develop and sustain high-quality software

Requirements:
Attendees should bring their own laptop.
Software for the tutorial can be installed beforehand, but we will also dedicate some extra time for installing required software during the tutorial.

    Git
    g++ >= 7
    SeqAn 3 - (https://github.com/seqan/seqan3)
    CMake >= 3.12

or, VirtualBox if the attendee wishes to use the provided virtual image running Ubuntu.

WS 4: Modern hashing for alignment-free sequence analysis

Thursday, September 09, 2021, 9 am

Organizers:
Sven Rahmann - Center for Bioinformatics, Saarland University
Jens Zentgraf - TU Dortmund University

In recent years, alignment free sequence analysis methods have gained importance, due to their superior speed at equivalent results in comparison to traditional mapping- and alignment-based methods. Recently, methods have emerged that are able to index very large collections of sequenced DNA samples (e.g. any genome ever sequenced).

The basis of each alignment-free method is a so called k-mer dictionary (or key-value-store) hat associates a value (e.g., a transcript ID, chromosome number, species ID or counter) to each DNA substring of length k (from a genome or a sequenced sample). Almost always, such a dictionary is implemented via hashing. Ideally, considering that billions of k-mers have to be processed, such a hash table is both small and fast. It is both a science and an art to design fast and small hash tables for a given task.

In this two-part tutorial/workshop, we will first give an overview of state-of-the-art hashing methods (and low-level engineering tricks) and then have a small number of contributed application-specific talks.

The tutorial part is addressed to bioinformaticians who would like to know more about the underlying hashing algorithms. Following the tutorial will enable you to better understand the underlying methods (and their limitations) of many state-of-the-art sequence analysis tools in genomics, transcriptomics, metagenomics and pangenomics. It will also help you to design your own method efficiently when the need arisies.

The tutorial will cover:
- introduction to k-mer key-value stores
- canonical encodings of k-mers
- basics of Cuckoo hashing
- extensions of Cuckoo hashing: buckets, several hash functions
- theoretical load thresholds
- bit-level layout of a hash table
- performance engineering:
- importance of cache locality and prefetching
- shortcuts for unsuccessful lookups
- compact encoding of hash choices and values
- optimization of a hash table
- parallelization

The workshop part will cover several application examples.
The following list is preliminary and may change until the conference.
- general-purpose k-mer counting
- metagenomic read classification
- xenograft sorting, i.e., classifying reads from a mixed sample containing graft tumor and host reads

WS 5: Bioinformatics tools for analyzing clinical metaproteomics samples of the human gut

Thursday, September 09, 2021, 9 am

Organizers:
Dr. Robert Heyer - Otto von Guericke University, Magdeburg
Dr. Stephan Fuchs - Robert Koch-Institut, Berlin
Dr. Thilo Muth - Federal Institute for Materials Research and Testing, Berlin
Kay Schallert - Otto von Guericke University, Magdeburg
Prof. D. Benndorf - Otto von Guericke University, Magdeburg, FH Köthen

Metaproteomics analyzes the entirety of proteins from whole microbial communities such as complex microbiomes from medical and technical applications, e.g., in fecal diagnostics and the operation of biogas plants or wastewater treatment plants. A precondition for successful metaproteomics studies is, in addition to experimental knowledge, comprehensive knowledge about the bioinformatics data evaluation (Heyer et al., 2017). This workshop aims to train people by a hands-on workshop in the required bioinformatics tools and skills required for the complete workflow of metaproteomics data analysis. It starts with identifying peptides and inferring proteins from mass spectrometry data using the MetaProteomeAnalyzer (Muth et al., 2015, Muth et al., 2018, Heyer et al., 2019) and the taxonomic and functional annotation using Prophane (Schiebenhoefer et al. 2020). Subsequently, we will illuminate the biostatistical data analysis and data visualization. As a use case, we selected fecal samples from patients with inflammatory bowel disease.

References
Heyer, R., Schallert, K., Zoun, R., Becher, B., Saake, G., Benndorf, D., (2017). Challenges and perspectives of metaproteomic data analysis. Journal of Biotechnology, 261:24-36. doi: 10.1016/j.jbiotec.2017.06.1201
Heyer, R., Schallert, K., (shared first), Büdel, A., Zoun, R., Dorl, S., Kohrs, F., Püttker, S., Siewert, C., Muth, T., Saake, G., Reichl, U., Benndorf, D., (2019) MPA-WORKFLOW: A robust and universal metaproteomics workflow for research studies and routine diagnostics within 24°h using phenol extraction, FASP digest, and the MetaProteomeAnalyzer. Frontiers in Microbiology, 10: 1883. doi: 10.3389/fmicb.2019.01883
Muth, T., Behne, A., Heyer, R., Kohrs, F., Benndorf, D., Hoffmann, M., Lehtevä, M., Reichl, U., Martens, L., Rapp, E. (2015). The MetaProteomeAnalyzer: a powerful open-source software suite for metaproteomics data analysis and interpretation. Journal of Proteome Research, 14, 1557-1565. doi: 10.1002/pmic.201400559
Muth, T., Kohrs, F., Heyer, R., Benndorf, D., Rapp, E., Reichl, U., Martens, L., Renard, B.Y. (2018) MPA Portable: a Stand-alone Software Package for Analyzing Metaproteome Samples on the Go. Analytical Chemistry, 90 (1), pp 685–689, doi: 10.1021/acs.analchem.7b03544
Schiebenhoefer, H., Schallert, K., Renard, B.Y., Trappe, K., Schmid, E., Benndorf, E., Riedel, K., Muth, T., Fuchs, S., (2020), A complete and flexible workflow for metaproteomics data analysis based on MetaProteomeAnalyzer and Prophane. Nature Protocols, 15, 3212–3239. https://doi.org/10.1038/s41596-020-0368-7

WS 6: How to build and analyse mathematical models of biological systems using Python & modelbase

Thursday, September 09, 2021, 2 pm – 5 pm

Organizers:
Dr Anna Matuszyńska, Marvin van Aalst
Heinrich Heine University Düsseldorf

Computational mathematical models of biological and biomedical systems have been successfully applied to advance our understanding of various regulatory processes, metabolic fluxes, effects of drug therapies and disease evolution or transmission. Many computational approaches have been developed to support model construction and analysis. We have developed modelbase, an open-source toolbox, to facilitate synergies within the emerging research fields of systems biology and medicine making the overall process of model construction more consistent, understandable, transparent and reproducible.

During this workshop, we will show how to use the software by building a toy model of glycolysis and analysing its dynamic behaviour. We will start with a short introduction to differential equation-based kinetic modelling, during which we will give guidelines to when they should be used and when other techniques are more applicable. We will also demonstrate key features of modelbase using a previously developed model of carbon fixation. During the second part (2 hours) we will provide an interactive learning experience where you will perform an in silico labelling experiment on the glycolysis pathway and investigate how drugs cure cancer, playing with a simple pharmacodynamic model.

Considering the increasing use of Python by computational biologists, a fully embedded modelling package like modelbase is desired and we are happy to present it to you.

Requirements:

basic knowledge of Python

Resources:

modelbase software: https://gitlab.com/qtb-hhu/modelbase-software
Documentation: https://modelbase.readthedocs.io/en/latest/

WS 7: Non-targeted label-free proteomics

Thursday, September 09, 2021, 2 pm

Organizers:
Dr. Timo Sachsenberg
de.NBI/CIBI Tübingen

The course introduces key concepts of non-targeted label-free proteomics. Non-targeted methods are ideal for unbiased discovery studies and scale well for large-scale studies (e.g., clinical proteomics). Based on example datasets we will then introduce several open-source software tools for proteomics primarily focusing on OpenMS (www.openms.org). We will demonstrate how these tools can be combined into complex data analysis workflows including visualization of results.

Prerequisites:
Target audience are computational scientists interested in working with raw mass spectrometric data.

Learning goals:
- Introduction to computational mass spectrometry proteomics
- OpenMS and the integration platform KNIME
- Optional: Developing tools with the OpenMS library
- Optional: Large scale data processing with OpenMS (nextflow or galaxy)

WS 8: Exploring Target Structures with ProteinsPlus

Thursday, September 09, 2021, 2 pm

Organizers:
Katrin Schöning-Stierand, Christiane Ehrt and Matthias Rarey
Universität Hamburg, ZBH – Center for Bioinformatics, Hamburg, Germany

Contents:
Three-dimensional protein structures are a fundamental basis for understanding, modulating, and manipulating protein functionality. With almost 175,000 structures (access date: March 1st, 2021), the Protein Data Bank (PDB) is one of the most important bioinformatics resources for life sciences. Roughly 1,000 structures are SARS-CoV-2 structures, forming a good basis for structure-based modeling processes.
In this workshop, we present the ProteinsPlus server [1-3] enriching structural knowledge from the PDB by additional computed information required for typical biological research questions. ProteinsPlus enables easy access to this information for all researchers in the fields of molecular life sciences. The provided computational services comprise various tools for the assessment, representation, preprocessing, and interconnection of structural data. Many of the provided tools focus on protein binding pockets and molecular interactions to small molecules due to their relevance for drug design.
Participants will get to know a combination of tools and web services for searching and analyzing protein structure data. The focus will be on protein preparation for molecular docking scenarios related to COVID-19. We will work with the ProteinsPlus web service that contains a diverse range of software solutions for the analysis of protein structures and its application in molecular modeling approaches.

Learning goals:
This course is designed for life and computer scientists with interest in protein structures, but only very basic experience in 3D modeling. Topics include: Finding and selecting protein structure data, evaluating the quality of experimental data, preprocessing structure data for modeling, first modeling steps like the analysis of binding site properties and conformational flexibility, fully automated docking. The usage of the ProteinsPlus tools is free and open to all users.

Prerequisites:
General knowledge of proteins and their role in life sciences.

[1] https://proteins.plus
[2] K. Schöning-Stierand, K. Diedrich, R. Fährrolfes, F. Flachsenberg, A. Meyder, E. Nittinger, R. Steinegger, M. Rarey. Nucleic Acids Res. 2020, 48: W48-W53.
[3] R. Fährrolfes, S. Bietz, F. Flachsenberg, A. Meyder, E. Nittinger, T. Otto, A. Volkamer, M. Rarey, Nucleic Acids Res. 2017, 45: W337–W343.

Supported by

Workshops

WS 1: Metabolic Reconstruction and Flux Modelling

WS 2: Cell segmentation using KNIME Analytics Platform and its Tensorflow2 Integration

WS 3: BioC++ - solving daily bioinformatic tasks with C++ efficiently

WS 4: Modern hashing for alignment-free sequence analysis

WS 5: Bioinformatics tools for analyzing clinical metaproteomics samples of the human gut

WS 6: How to build and analyse mathematical models of biological systems using Python & ​modelbase

WS 7: Non-targeted label-free proteomics

WS 8: Exploring Target Structures with ProteinsPlus

Supported by

WS 6: How to build and analyse mathematical models of biological systems using Python & modelbase