Meetings
BioHackSWAT4HCLS 2025
BioHackathon Europe 2025
4th BioHackathon Germany
DBCLS BioHackathon 2025
ELIXIR INTOXICOM
Recent preprints
BioHackEU25 report: Towards a Robust Validation Service for Data and Metadata in ARC RO-Crates
Robust validation of both research data and its accompanying metadata is essential for ensuring adherence to the FAIR principles. Current approaches often handle these aspects separately, hindering a holistic quality assessment. Building upon previous BioHackathon work establishing ARCs (Annotated Research Contexts) as RO-Crates (ARC RO-Crate), we aim to develop and demonstrate an integrated validation strategy for FAIR digital objects that distinguishes between validating the metadata descriptor and the payload data files. For the metadata descriptor, validation will ensure structural and semantic compliance with the base RO-Crate specification and the ARC-ISA family of RO-Crate profiles, using and extending the RO-Crate validator tool. For the payload data files, validation targets the actual content: data files often carry domain-specific structural and value constraints, which require explicit schema definitions. For this, we will integrate Frictionless to check data content against community standards (e.g. MIAPPE, as demonstrated in the HORIZON project AGENT). Crucially, this project will also explore mechanisms for specifying the expected structure of data within the ARC RO-Crate itself. This aims to provide a more self-contained description of the data, investigating how such internal requirements can be linked to data validation frameworks, complementing the crate's metadata validation. The overall goal is to provide a powerful, holistic validation mechanism for ARC RO-Crates, enhancing their reliability, trustworthiness, and FAIRness. A MIAPPE-compliant plant phenomics dataset will serve as a use case. This integrated validation approach aims to streamline quality control for researchers and will be packaged as a deployable microservice, offering broad applicability across diverse research workflows.
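To illustrate the payload-validation idea described above, here is a minimal stdlib-only sketch of checking tabular rows against an explicit schema. It is not the actual Frictionless API, and the schema and example rows are hypothetical, not taken from MIAPPE:

```python
# Minimal sketch of payload validation against an explicit schema,
# illustrating the idea behind Frictionless-style content checks.
# The schema format and example data are hypothetical.

def validate_rows(rows, schema):
    """Check each row dict against per-field 'required' and 'type' rules."""
    errors = []
    for i, row in enumerate(rows):
        for field, rules in schema.items():
            value = row.get(field)
            if value is None:
                if rules.get("required"):
                    errors.append((i, field, "missing required value"))
                continue
            if not isinstance(value, rules["type"]):
                errors.append((i, field, f"expected {rules['type'].__name__}"))
    return errors

schema = {
    "sample_id": {"type": str, "required": True},
    "height_cm": {"type": float, "required": False},
}
rows = [
    {"sample_id": "S1", "height_cm": 12.5},
    {"height_cm": "tall"},  # missing required id, wrong value type
]
print(validate_rows(rows, schema))
```

In the full project, such per-field rules would come from a schema definition embedded in (or linked from) the ARC RO-Crate rather than being hard-coded.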
MCP server tools with RDF shapes
In this paper, we present the work done during the DBCLS BioHackathon 2025 in Japan on implementing MCP (Model Context Protocol) servers backed by RDF data shapes to improve natural-language interactions with large RDF datasets via SPARQL.
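One way shapes can support such interactions is by turning a declared shape into a query template. The sketch below is a hedged, simplified illustration of that idea in plain Python (not SHACL processing, and `up:Protein` / `up:mnemonic` are example terms, not a claim about any specific dataset's schema):

```python
# Sketch: derive a SPARQL SELECT query from a simple "shape",
# i.e. a target class plus the properties the shape declares for it.
# Prefixed names are assumed to be declared elsewhere in the query context.

def shape_to_sparql(target_class, properties, limit=10):
    """Build triple patterns that follow the shape's declared properties."""
    vars_ = [f"?{p.split(':')[-1]}" for p in properties]
    patterns = "\n  ".join(
        f"?s {p} {v} ." for p, v in zip(properties, vars_)
    )
    return (
        f"SELECT ?s {' '.join(vars_)} WHERE {{\n"
        f"  ?s a {target_class} .\n"
        f"  {patterns}\n"
        f"}} LIMIT {limit}"
    )

query = shape_to_sparql("up:Protein", ["up:mnemonic", "up:organism"])
print(query)
```

Giving an LLM-driven client such shape-derived templates constrains generated SPARQL to properties that actually exist in the data.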
Mining the potential of knowledge graphs for metadata on training
Training metadata in the life-science community is increasingly standardized through Bioschemas, yet remains fragmented and under-utilized. In this work we harvested training records from ELIXIR's TeSS platform and the Galaxy Training Network, converting them into a unified knowledge graph. A dedicated pipeline parses RDF/Turtle dumps, deduplicates entries, and builds rich indexes (keyword, provider, location, date, topic) that power a Model Context Protocol (MCP) server. The MCP server offers live and offline search tools, including keyword, provider, location, date, topic, and SPARQL queries, enabling natural-language access to training resources via LLM-driven clients. User-story-driven evaluations demonstrate the system's ability to generate custom learning paths, assemble trainer profiles, and link training data to external repositories. Findings highlight gaps in persistent identifiers (ORCID, ROR) and location granularity, informing recommendations for metadata providers. The project showcases how knowledge-graph-backed metadata can enhance discoverability, interoperability, and AI-assisted exploration of scientific training materials.
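The deduplicate-and-index step described above can be sketched in a few lines. This is a hedged stdlib-only illustration, not the project's actual pipeline; the record format and identifiers (`tess:1`, `gtn:2`) are hypothetical:

```python
# Sketch: deduplicate harvested training records by identifier and
# build an inverted keyword index of the kind that powers the MCP search tools.
from collections import defaultdict

def build_keyword_index(records):
    """Map lowercased keyword -> set of record ids, skipping duplicates."""
    seen, index = set(), defaultdict(set)
    for rec in records:
        if rec["id"] in seen:   # simple dedup by identifier
            continue
        seen.add(rec["id"])
        for kw in rec.get("keywords", []):
            index[kw.lower()].add(rec["id"])
    return index

records = [
    {"id": "tess:1", "keywords": ["RNA-seq", "Galaxy"]},
    {"id": "gtn:2", "keywords": ["rna-seq"]},
    {"id": "tess:1", "keywords": ["RNA-seq"]},  # duplicate harvest entry
]
index = build_keyword_index(records)
print(sorted(index["rna-seq"]))
```

The same pattern extends to the other indexes (provider, location, date, topic) by swapping the indexed field.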
BioHackEU25 report: Scop3PTM Next - Interactive visualization of PTM data across sequence, structure and interactions
Scop3PTM Next was developed during BioHackathon Europe 2025 to address the need for integrated visualization of protein-centric data across sequence (and modification), interaction and structural contexts. The project delivers an open-source library of modular JavaScript components, built with Vue.js and documented with Storybook, enabling reusable and interoperable visualizations. The framework provides 1D sequence tracks, contact-map networks and interactive 3D structural renders, using MolSpecView for rendering 3D structures linked to Nightingale 1D tracks. Together, these components offer a unified interface for exploring PTM features across multiple representational layers. This work establishes the basis for a community-oriented visualization library to support proteomics analysis.
A Blueprint for Open Science: How Transatlantic Teams Built and Deployed Knowledge Graphs to Enable Biological (AI) Models
Knowledge graphs (KGs) and large language models (LLMs) are increasingly applied in biomedical research; however, LLMs' tendency to hallucinate and lack of evidence traceability pose significant challenges for rigorous scientific applications. To address these limitations, the NVIDIA - AWS Open Data Knowledge Graph Hackathon, which brought together transatlantic teams, catalyzed the development of novel frameworks that built or integrated KGs with graph-based retrieval-augmented generation (GraphRAG) to enhance evidence-grounded generative AI. The hackathon took place on October 1-3, 2025, at two locations: the AWS Skills Center in Arlington, VA, USA and the European Bioinformatics Institute (EBI) Training Center in Cambridge, UK. Across seven prototype projects, participating teams developed systems that construct, validate, and deploy biomedical KGs using open data and cloud-native infrastructure. These included GeNETwork, which integrates pediatric oncology datasets to identify therapeutic targets; ECoGraph, a multi-omics graph framework for characterizing colorectal cancer drivers; ClassiGraph, a graph neural network classifier for cancer subtypes; EasyGiraffe, a validator for multisite polygenicity extraction; MIDAS (Model Integration and Data Assembly System), a pipeline for harmonizing heterogeneous biomedical datasets; KG Model Garbage Collection, a framework for detecting and pruning erroneous AI-generated edges; and BioGraphRAG, which combines precision medicine and literature-derived KGs for evidence-based question answering. Together, these prototypes demonstrate practical strategies for constructing and deploying biomedical KGs and highlight the potential of GraphRAG to produce interpretable, verifiable AI-driven insights. By emphasizing open data, reproducible pipelines, and evidence-grounded reasoning, this work advances methodologies for trustworthy generative AI in biomedical discovery.
DBCLS BioHackathon 2025 report on the WikiBlitz
As part of the DBCLS BioHackathon 2025, we organized a WikiBlitz to improve biodiversity knowledge by integrating iNaturalist, GBIF, Wikidata, and Wikipedia. Participants identified local flora and fauna, filling gaps in multilingual Wikipedia articles. This report summarizes the methodology, results, and insights, illustrating the usefulness of combining citizen science with digital platforms to enrich ecological data and promote biodiversity awareness.
on2vec: Ontology Embeddings with Graph Neural Networks and Sentence Transformers
Ontologies provide structured vocabularies and relationships essential for organizing biological knowledge, yet their symbolic nature limits integration with modern machine learning methods. Leveraging recent advances in graph neural networks (GNNs) and transformer-based language models, we present on2vec, a toolkit developed during the DBCLS BioHackathon 2025 for generating vector embeddings from OWL ontologies. on2vec integrates structural information from ontology hierarchies with semantic features from textual annotations using HuggingFace Sentence Transformers, producing domain-aware embeddings suitable for downstream biomedical applications and ontology-based reasoning tasks.
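The core idea of combining structural and textual signals can be sketched without any ML dependencies. In this toy illustration, node degree stands in for a GNN-derived structural embedding and a bag-of-words vector stands in for a Sentence Transformer embedding; the ontology terms and labels are hypothetical, and this is not the on2vec implementation:

```python
# Sketch: fuse a structural feature with a textual feature per ontology term,
# mirroring (in miniature) how on2vec combines graph and label information.

def structural_features(edges, nodes):
    """Toy structural feature: node degree (stand-in for a GNN embedding)."""
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {n: [float(degree[n])] for n in nodes}

def text_features(labels, vocab):
    """Toy textual feature: bag-of-words over the term label
    (stand-in for a Sentence Transformer embedding)."""
    return {
        n: [float(w in label.lower().split()) for w in vocab]
        for n, label in labels.items()
    }

nodes = ["GO:1", "GO:2"]
edges = [("GO:1", "GO:2")]           # one subclass-style relation
labels = {"GO:1": "kinase activity", "GO:2": "binding"}
vocab = ["kinase", "binding"]

struct = structural_features(edges, nodes)
text = text_features(labels, vocab)
# Concatenate the two views into one embedding per term.
embeddings = {n: struct[n] + text[n] for n in nodes}
print(embeddings["GO:1"])
```

In on2vec itself, both components are learned rather than hand-built, but the fusion-by-concatenation pattern conveys why the resulting vectors carry both hierarchy and annotation semantics.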