BioHackathon Europe 2025, Berlin, Germany, 2025

BioHackathon Europe is an annual event that brings together bioinformaticians and computational biologists from around the world. It’s organised by ELIXIR Europe, and offers an intense week of hacking, with participants working on diverse and exciting projects. BioHackathon is a community-driven event, which provides an opportunity for members of the life sciences community to meet and work together on topics of common interest. The goal is to create code that addresses challenges in bioinformatics research.

YAML instructions

biohackathon_name: "BioHackathon Europe 2025"
biohackathon_url: "https://biohackathon-europe.org/"
biohackathon_location: "Berlin, Germany, 2025"

Preprints

  • Tools to develop constraint-based models in R: adapting existing toolboxes

    As part of BioHackathon Europe 2025, we report on the progress of the hacking team preparing tools to develop constraint-based models in R for the Systems Biology community. This preliminary development relies on the adaptation of existing toolboxes. In this project, we proposed the (re)development of an R-based framework for developing and simulating constraint-based models. We proposed to expand the Sybil library for model simulation with the functionalities for model reconstruction and analysis available in the widely used RAVEN toolbox in MATLAB. The outcome will make constraint-based modelling more accessible to experimental scientists, thereby helping to bridge the gap between data users and data generators. It will also be more FAIR by being usable with non-proprietary software, and will align with software best practices as collected by the ELIXIR Tools Platform. We will work towards increased reproducibility by also considering implementation of FROG analysis in R. Moreover, as a tool developed by the ELIXIR Systems Biology Community for the wider community, the long-term maintenance burden is spread across a wider membership. Two weeks before the BioHackathon, we discovered a new R tool for simulating models, called cobrar (https://github.com/Waschina/cobrar), which calls for an assessment of its current state and the definition of new development areas.
  • Bidirectional bridge: GitHub ⇄ bio.tools

    Research software metadata can be found across many code repositories and software registries. Here, we describe the tooling for a bidirectional bridge between the software development platform GitHub and the ELIXIR bio.tools registry of life sciences software tools and data resources. The developed bridge maps and improves metadata records across these two platforms, thereby benefiting both and helping make research software more FAIR: findable, accessible, interoperable, and reusable. Specifically, the bridge enables production of high-quality, rich bio.tools entries from the content already available in GitHub repositories, and uses bio.tools records to suggest improvements to GitHub repositories through pull requests or issues. This includes adding missing information and standardized descriptions for increased compliance with Software Management Plans. The bidirectional bridge makes extensive use of existing APIs (GitHub, bio.tools, Europe PMC) and large language models (LLMs) to enrich metadata on both platforms. By automating metadata extraction, improvement suggestion, and integration, the bridge reduces the manual overhead required to FAIRify research software, lowering barriers for researchers to contribute or maintain well-annotated, reusable software.
  • METRICS - Monitoring of Key Performance Indicators for ELIXIR Services

    Key Performance Indicators (KPIs) are increasingly requested by a diverse range of stakeholders across the research ecosystem. Funders want to measure the impact of the projects and related services they fund, while research organisations want to track service use for informed decision making. Service providers themselves are also interested in monitoring their services to gather feedback and improve service quality. KPIs are a simple but powerful tool for these purposes. As part of BioHackathon Europe 2025, we report on the activities of the METRICS project, which addresses the need for consistent and transparent evaluation of services across ELIXIR and related initiatives using KPIs. The project brings together experts from multiple ELIXIR Nodes and scientific domains to identify, harmonise, and semantically model KPIs that reflect service quality, usage, sustainability, and impact. By exploring existing evaluation frameworks and processes, the team aims to design a flexible yet coherent foundation for KPI monitoring of ELIXIR services. This report summarises the project’s motivation, current landscape analysis, and initial steps toward developing an ontology-driven framework for KPI representation, fostering interoperability and supporting evidence-based management of life science infrastructures.
  • BioHackEU25 Report Project 16: MiCoReCa (Microbiome Community Resource Catalogue) - Towards Centralized Curation And Integration Of Microbiome Bioinformatics Resources

    The rapid growth of microbiome research has led to the development of numerous bioinformatics tools and databases, but information about them remains fragmented across disparate, often outdated cataloging efforts, hindering resource discovery and utilization. To address this critical gap, the ELIXIR Microbiome Community proposes the development of MiCoReCa (Microbiome Community Resource Catalogue), a comprehensive, dynamic, open-access catalogue of microbiome-related bioinformatics resources (tools, workflows, training, standards, and databases). Leveraging our community’s expertise, this initiative will utilize standardized ontologies like EDAM and cross-reference established platforms like bio.tools and WorkflowHub to create a centralized, findable inventory. A key feature is the community-driven process for identifying and curating missing ontological terms and metadata, ensuring MiCoReCa’s accuracy and relevance in collaboration with partner platforms. Furthermore, the catalogue will integrate links to training materials from TeSS to support appropriate tool usage, and connect with OpenEBench for benchmarking capabilities. This project will not only provide a vital resource for the microbiome field, enhancing research efficiency and reproducibility, but will also establish a sustainable, adaptable infrastructure potentially applicable to other ELIXIR Communities. This effort represents a significant contribution by the ELIXIR Microbiome Community to streamline microbiome bioinformatics.
  • Decoding Complex Genotype-Phenotype Interactions by Discretizing the Genome

    Background: Despite the ease and affordability of genome sequencing in biomedical research, the genetic causes of many diseases or their subtypes remain unknown due to diverse biological mechanisms that complicate genotype-phenotype relationships. Most previous studies have focused on single variants or sets of variants presumed to be directly causal for the disease. However, incomplete penetrance, in which some individuals carry disease-associated variants yet exhibit no phenotype, suggests that these variants, the genomic background and other secondary factors combine to shape the susceptibility to the disease.
  • BioHackEU25 report: Towards a Robust Validation Service for Data and Metadata in ARC RO-Crates

    Robust validation of both research data and its accompanying metadata is essential for ensuring adherence to FAIR principles. Current approaches often handle these aspects separately, hindering a holistic quality assessment. Building upon previous BioHackathon work establishing ARCs (Annotated Research Context) as RO-Crates (ARC RO-Crate), we aim to develop and demonstrate an integrated validation strategy for FAIR digital objects. It distinguishes between validating the metadata descriptor and the payload data files. For the metadata descriptor, validation will ensure structural and semantic compliance with the base RO-Crate specification and the ARC-ISA family of RO-Crate profiles, using and extending the RO-Crate validator tool. For the payload data files, validation targets the actual content: data files often require domain-specific structural and value constraints, which in turn require explicit schema definitions. For this, we will integrate Frictionless for checking data content against community standards (e.g. MIAPPE, as demonstrated in the HORIZON project AGENT). Crucially, this project will also explore mechanisms for specifying the requirements on expected data structures within the ARC RO-Crate itself. This aims to provide a more self-contained description of data, investigating how such internal requirements can be linked to data validation frameworks, complementing the crate’s metadata validation. The overall goal is to provide a powerful, holistic validation mechanism for ARC RO-Crates, enhancing their reliability, trustworthiness, and FAIRness. A MIAPPE-compliant plant phenomics dataset will serve as a use case. This integrated validation approach aims to streamline quality control for researchers and will be packaged as a deployable microservice, offering broad applicability across diverse research workflows.
  • Mining the potential of knowledge graphs for metadata on training

    Training metadata in the life‑science community is increasingly standardized through Bioschemas, yet remains fragmented and under‑utilized. In this work we harvested training records from ELIXIR’s TeSS platform and the Galaxy Training Network, converting them into a unified knowledge graph. A dedicated pipeline parses RDF/Turtle dumps, deduplicates entries, and builds rich indexes (keyword, provider, location, date, topic) that power a Model Context Protocol (MCP) server. The MCP server offers live and offline search tools, including keyword, provider, location, date, topic, and SPARQL queries, enabling natural‑language access to training resources via LLM‑driven clients. User‑story driven evaluations demonstrate the system’s ability to generate custom learning paths, assemble trainer profiles, and link training data to external repositories. Findings highlight gaps in persistent identifiers (ORCID, ROR) and location granularity, informing recommendations for metadata providers. The project showcases how knowledge‑graph‑backed metadata can enhance discoverability, interoperability, and AI‑assisted exploration of scientific training materials.
  • BioHackEU25 report: Scop3PTM Next - Interactive visualization of PTM data across sequence, structure and interactions

    Scop3PTM Next was developed during BioHackathon Europe 2025 to address the need for integrated visualization of protein-centric data across sequence (and modification), interaction and structural contexts. The project delivers an open-source library of modular JavaScript components, built with Vue.js and documented with Storybook, enabling reusable and interoperable visualizations. The framework provides 1D sequence tracks, contact-map networks and interactive 3D structural renders, using MolSpecView for rendering 3D structures linked to Nightingale 1D tracks. Together, these components offer a unified interface for exploring PTM features across multiple representational layers. This work establishes the basis for a community-oriented visualization library to support proteomics analysis.
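
The deduplication and indexing step mentioned in the training-metadata knowledge graph abstract above can be illustrated with a minimal Python sketch. It is not the project's pipeline, which parses RDF/Turtle dumps. The record fields (`url`, `name`, `keywords`, `provider`), the sample records, and the function names are all hypothetical, chosen only to show the idea of merging duplicate entries and building lookup indexes.

```python
from collections import defaultdict

# Hypothetical training records; field names and values are assumptions
# made for illustration, not data from TeSS or the Galaxy Training Network.
RECORDS = [
    {"url": "https://example.org/t/1", "name": "Intro to RNA-seq",
     "keywords": ["RNA-seq", "Transcriptomics"], "provider": "Provider A"},
    {"url": "https://example.org/t/1/", "name": "Intro to RNA-seq",
     "keywords": ["rna-seq"], "provider": "Provider A"},
    {"url": "https://example.org/t/2", "name": "Variant calling basics",
     "keywords": ["Genomics"], "provider": "Provider B"},
]


def normalise_url(url):
    """Treat URLs that differ only in case or a trailing slash as equal."""
    return url.strip().rstrip("/").lower()


def deduplicate(records):
    """Merge records sharing a normalised URL, taking the union of keywords."""
    merged = {}
    for rec in records:
        key = normalise_url(rec["url"])
        if key not in merged:
            merged[key] = {**rec, "url": key, "keywords": []}
        seen = {kw.lower() for kw in merged[key]["keywords"]}
        for kw in rec["keywords"]:
            if kw.lower() not in seen:
                merged[key]["keywords"].append(kw)
                seen.add(kw.lower())
    return list(merged.values())


def build_index(records, field):
    """Map each lower-cased value of `field` to the set of matching URLs."""
    index = defaultdict(set)
    for rec in records:
        values = rec[field] if isinstance(rec[field], list) else [rec[field]]
        for value in values:
            index[value.lower()].add(rec["url"])
    return index


unique = deduplicate(RECORDS)
keyword_index = build_index(unique, "keywords")
provider_index = build_index(unique, "provider")
```

A search tool could then answer a keyword or provider query with a single dictionary lookup, for example `keyword_index["genomics"]`. Location, date, and topic indexes would follow the same pattern under the same assumptions.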