Meetings
BioHackSWAT4HCLS 2025
BioHackathon Europe 2025
4th BioHackathon Germany
DBCLS BioHackathon 2025
ELIXIR INTOXICOM
Recent preprints
Mining the potential of knowledge graphs for metadata on training
Training metadata in the life-science community is increasingly standardized through Bioschemas, yet remains fragmented and under-utilized. In this work we harvested training records from ELIXIR's TeSS platform and the Galaxy Training Network, converting them into a unified knowledge graph. A dedicated pipeline parses RDF/Turtle dumps, deduplicates entries, and builds rich indexes (keyword, provider, location, date, topic) that power a Model Context Protocol (MCP) server. The MCP server offers live and offline search tools (keyword, provider, location, date, topic, and SPARQL queries), enabling natural-language access to training resources via LLM-driven clients. User-story-driven evaluations demonstrate the system's ability to generate custom learning paths, assemble trainer profiles, and link training data to external repositories. Findings highlight gaps in persistent identifiers (ORCID, ROR) and location granularity, informing recommendations for metadata providers. The project showcases how knowledge-graph-backed metadata can enhance discoverability, interoperability, and AI-assisted exploration of scientific training materials.
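The deduplication and indexing steps described in the abstract above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: it assumes the records have already been parsed from RDF/Turtle into plain dictionaries, and the field names (`url`, `provider`, `keywords`), the helper names, and the example URLs are all hypothetical.

```python
from collections import defaultdict


def deduplicate(records):
    """Keep the first record seen for each normalized URL (assumed unique key)."""
    seen = {}
    for rec in records:
        key = rec["url"].strip().rstrip("/").lower()
        seen.setdefault(key, rec)
    return list(seen.values())


def build_index(records, field):
    """Map each lowercased value of `field` to the set of record URLs carrying it."""
    index = defaultdict(set)
    for rec in records:
        values = rec.get(field, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            index[value.lower()].add(rec["url"])
    return index


# Hypothetical records standing in for harvested training metadata.
records = [
    {"url": "https://example.org/t/1", "provider": "Provider A",
     "keywords": ["RNA-seq", "Galaxy"]},
    {"url": "https://example.org/t/1/", "provider": "Provider A",
     "keywords": ["RNA-seq"]},
    {"url": "https://example.org/t/2", "provider": "Provider B",
     "keywords": ["galaxy", "Proteomics"]},
]

unique = deduplicate(records)
keyword_index = build_index(unique, "keywords")
provider_index = build_index(unique, "provider")

print(sorted(keyword_index["galaxy"]))
```

A search tool of the kind the abstract mentions could then answer a keyword or provider query with a single dictionary lookup; a real system would also need the date, location, and topic indexes and a more careful notion of record identity.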
BioHackEU25 report: Scop3PTM Next - Interactive visualization of PTM data across sequence, structure and interactions
Scop3PTM Next was developed during BioHackathon Europe 2025 to address the need for integrated visualization of protein-centric data across sequence (and modification), interaction, and structural contexts. The project delivers an open-source library of modular JavaScript components, built with Vue.js and documented with Storybook, enabling reusable and interoperable visualizations. The framework provides 1D sequence tracks, contact-map networks, and interactive 3D structural renders, using MolViewSpec for rendering 3D structures linked to Nightingale 1D tracks. Together, these components offer a unified interface for exploring PTM features across multiple representational layers. This work establishes the basis for a community-oriented visualization library to support proteomics analysis.
A Blueprint for Open Science: How Transatlantic Teams Built and Deployed Knowledge Graphs to Enable Biological (AI) Models
Knowledge graphs (KGs) and large language models (LLMs) are increasingly applied in biomedical research; however, LLMs' tendency to hallucinate and their lack of evidence traceability pose significant challenges for rigorous scientific applications. To address these limitations, the NVIDIA - AWS Open Data Knowledge Graph Hackathon, which brought together transatlantic teams, catalyzed the development of novel frameworks that built or integrated KGs with graph-based retrieval-augmented generation (GraphRAG) to enhance evidence-grounded generative AI. The hackathon took place on October 1-3, 2025, at two locations: the AWS Skills Center in Arlington, VA, USA and the European Bioinformatics Institute (EBI) Training Center in Cambridge, UK. Across seven prototype projects, participating teams developed systems that construct, validate, and deploy biomedical KGs using open data and cloud-native infrastructure. These included GeNETwork, which integrates pediatric oncology datasets to identify therapeutic targets; ECoGraph, a multi-omics graph framework for characterizing colorectal cancer drivers; ClassiGraph, a graph neural network classifier for cancer subtypes; EasyGiraffe, a validator for multisite polygenicity extraction; MIDAS (Model Integration and Data Assembly System), a pipeline for harmonizing heterogeneous biomedical datasets; KG Model Garbage Collection, a framework for detecting and pruning erroneous AI-generated edges; and BioGraphRAG, which combines precision medicine and literature-derived KGs for evidence-based question answering. Together, these prototypes demonstrate practical strategies for constructing and deploying biomedical KGs and highlight the potential of GraphRAG to produce interpretable, verifiable AI-driven insights. By emphasizing open data, reproducible pipelines, and evidence-grounded reasoning, this work advances methodologies for trustworthy generative AI in biomedical discovery.
DBCLS BioHackathon 2025 report on the WikiBlitz
As part of the DBCLS BioHackathon 2025, we organized a WikiBlitz to improve biodiversity knowledge by integrating iNaturalist, GBIF, Wikidata, and Wikipedia. Participants identified local flora and fauna, filling gaps in multilingual Wikipedia articles. This report summarizes the methodology, results, and insights, illustrating the usefulness of combining citizen science with digital platforms to enrich ecological data and promote biodiversity awareness.
on2vec: Ontology Embeddings with Graph Neural Networks and Sentence Transformers
Ontologies provide structured vocabularies and relationships essential for organizing biological knowledge, yet their symbolic nature limits integration with modern machine learning methods. Leveraging recent advances in graph neural networks (GNNs) and transformer-based language models, we present on2vec, a toolkit developed during the DBCLS BioHackathon 2025 for generating vector embeddings from OWL ontologies. on2vec integrates structural information from ontology hierarchies with semantic features from textual annotations using HuggingFace Sentence Transformers, producing domain-aware embeddings suitable for downstream biomedical applications and ontology-based reasoning tasks.
AI in Practice: Insights from a Community Survey of Biohackathon Participants
Understanding the practical application of artificial intelligence (AI) in research is increasingly important as it becomes embedded in life sciences and bioinformatics. This paper reports on a multilingual survey, developed through community discussions at the 2025 BioHackathon in Japan and distributed through its networks, to capture current practices, successes, and challenges in AI adoption. The survey, offered in English, Japanese, and Thai, received 105 responses spanning diverse demographics, regions, and professional backgrounds. Findings reveal that most participants are frequent AI users and that tools like ChatGPT, Gemini, and Claude are widely adopted, with ChatGPT the most common response. AI is primarily used to assist or draft tasks in coding, research, and writing, while full task automation remains uncommon, reflecting a preference for AI as a collaborative aid rather than a replacement. Successes were noted in efficiency, coding support, and proposal writing, whereas challenges centered on accuracy and reliability. Institutional support emerged as a key factor: respondents in Japan, Thailand, and the private sector reported stronger support and higher satisfaction than English-speaking or academic counterparts. By documenting real-world practices and concerns, this survey provides a valuable community-driven resource to guide responsible AI development and foster international collaboration in bioinformatics.
Translating and Formalizing the MIRAGE Guidelines to a Prototype MIRAGE Ontology and DCAT3 Extension Vocabulary for Glycomics Data Management
The Minimum Information Required for A Glycomics Experiment (MIRAGE) guidelines have established comprehensive reporting standards for glycomics research, yet their implementation in semantic web technologies remains limited. We present the first comprehensive semantic formalization of MIRAGE guidelines through an integrated RDF ontology framework comprising the MIRAGE Ontology and MIRAGE-DCAT3 vocabulary. The MIRAGE Ontology models glycan structures, biological specimens, analytical instruments, and experimental processes with formal OWL semantics and SHACL validation constraints. The complementary MIRAGE-DCAT3 vocabulary extends W3C DCAT3 with glycomics-specific metadata properties for dataset cataloging and discovery. Our implementation addresses critical challenges in glycomics data interoperability through comprehensive mappings to established ontologies including GlycoRDF, PSI-MS, and DCTERMS. This semantic framework enables automated quality assessment, federated data querying, and enhanced reproducibility in glycomics research, supporting broader adoption of FAIR principles in the glycobiology community. The framework demonstrates comprehensive coverage of MIRAGE reporting requirements across multiple analytical platforms including mass spectrometry, liquid chromatography, capillary electrophoresis, NMR spectroscopy, and lectin microarray analysis.