Recent preprints

  • Translating and Formalizing the MIRAGE Guidelines to a Prototype MIRAGE Ontology and DCAT3 Extension Vocabulary for Glycomics Data Management

    The Minimum Information Required for A Glycomics Experiment (MIRAGE) guidelines have established comprehensive reporting standards for glycomics research, yet their implementation in semantic web technologies remains limited. We present the first comprehensive semantic formalization of MIRAGE guidelines through an integrated RDF ontology framework comprising the MIRAGE Ontology and MIRAGE-DCAT3 vocabulary. The MIRAGE Ontology models glycan structures, biological specimens, analytical instruments, and experimental processes with formal OWL semantics and SHACL validation constraints. The complementary MIRAGE-DCAT3 vocabulary extends W3C DCAT3 with glycomics-specific metadata properties for dataset cataloging and discovery. Our implementation addresses critical challenges in glycomics data interoperability through comprehensive mappings to established ontologies including GlycoRDF, PSI-MS, and DCTERMS. This semantic framework enables automated quality assessment, federated data querying, and enhanced reproducibility in glycomics research, supporting broader adoption of FAIR principles in the glycobiology community. The framework demonstrates comprehensive coverage of MIRAGE reporting requirements across multiple analytical platforms including mass spectrometry, liquid chromatography, capillary electrophoresis, NMR spectroscopy, and lectin microarray analysis.
  • DBCLS BioHackathon 2025 report: Creation and Publication Analytical Workflow of Creators' Interests

    At the DBCLS BioHackathon 2025, we converted metatranscriptomic analytical shell scripts into Common Workflow Language (CWL) containerized with Docker. Sub-workflows were created for metagenomic assembly, read mapping, and gene annotation, and validated with test datasets. The workflows, released on GitHub and WorkflowHub, improve reproducibility and address issues of reusability and software environment dependency. We also evaluated CWL best practices from the perspective of life scientists, classifying them by difficulty, importance, and applicability to promote FAIR principles and software quality. In parallel, we established a benchmarking framework for pangenome-based structural variant (SV) calling using data from the Dai population. Graph-based references from the Human and Chinese Pangenome Consortia were compared with linear references using minimap2 and vg giraffe. Results showed improved alignment accuracy and variant detection with pangenomes, demonstrating their value for reducing mapping bias and enhancing SV discovery.
  • A Standards-Compliant, Multi-Modal Platform for Offline Access to SRA Metadata

    The SRAmetaDBB project, presented at BioHackathon Japan 2023, introduced an experimental JavaScript pipeline for creating SQLite databases from NCBI SRA (Sequence Read Archive) metadata dumps, with a vision for offline analysis and integration with Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems. While promising, the prototype faced significant challenges in performance, memory management, and production readiness when scaling to the full SRA dataset of over 45 million records. This paper presents SRAKE (SRA Knowledge Engine), a complete reimplementation in Go that not only addresses these limitations but extends the original vision with semantic search capabilities, quality control mechanisms, and multiple access interfaces. SRAKE achieves a 20-fold improvement in ingestion speed, maintains constant memory usage through zero-copy streaming, and provides standards-compliant interfaces following clig.dev guidelines. The platform introduces biomedical-specific semantic search using SapBERT embeddings via ONNX Runtime, implements comprehensive quality control thresholds for search results, and offers multiple access modalities including a CLI, REST API, MCP server for AI integration, and a simple web interface. Our development implementation demonstrates that SRAKE successfully transforms the experimental SRAmetaDBB concept into a production-ready platform that integrates seamlessly with modern AI workflows while maintaining the core vision of providing offline-capable, LLM-ready access to SRA metadata.
  • A Lightweight PURL Resolver for Linked Life Science Data

    Knowledge graphs in the life sciences are increasingly published using the Resource Description Framework (RDF) and queried via SPARQL endpoints. While these technologies enable powerful data integration, the identifiers returned in SPARQL results often do not resolve to meaningful resources, leaving users with non-actionable links. To address this issue, we developed a lightweight Persistent Uniform Resource Locator (PURL) resolver during the BioHackathon Japan 2025. The resolver is implemented in PHP, chosen for its ubiquity on standard web servers and its compatibility with the EasyRDF library for RDF handling. It is easy to configure, requires minimal maintenance, and supports both database redirects and ontology term rendering with content negotiation for RDF serializations. The system is available as open-source software (https://github.com/JKoblitz/purl-resolver) and deployed at https://purl.dsmz.de, where it now resolves most identifiers from the DSMZ Digital Diversity SPARQL endpoint (https://sparql.dsmz.de). Database IRIs lead to the corresponding web interfaces, ontology IRIs from the DSMZ Digital Diversity Ontology render directly as term pages, and unmapped entities are delegated to database-side resolvers. This approach enhances the usability of knowledge graphs by ensuring that all identifiers remain actionable for both humans and machines.
  • AI for Computational Biology: Highlights from the first BioAI Hackathon at University of Warsaw

    The BioAI Hackathon at the Centre of New Technologies at the University of Warsaw convened 43 international researchers to collaboratively explore artificial intelligence (AI) approaches for solving complex challenges in computational biology. Nine interdisciplinary and multi-institutional teams addressed the following problems: disease-gene prioritization, microbiome analysis, drug-protein interactions, alternative splicing prediction, chromatin architecture study, and toxicological profiling. Using cutting-edge tools such as graph neural networks (GNNs), large language models (LLMs), and multi-omics integration frameworks, participants developed scalable and reproducible analytical pipelines. The results include a disease-gene prioritization framework using GNNs, a microbiome dynamics analysis for poultry health prediction, and the construction of chromatin structure-aware regulatory networks. All projects follow open science principles and show translational potential. This hackathon underscores the transformative role of AI in biomedicine and the value of collaborative, time-bounded innovation for accelerating discovery in life sciences. All projects are publicly available on GitHub: https://github.com/SFGLab
  • Enhancing Digital Infrastructures and Data Handling Practices for Single Specimen Barcoding - the 2024 BGE Barcoding Hackathon

    The 2024 BGE Barcoding Hackathon, hosted by the Biodiversity Genomics Europe (BGE) project in Leiden, Netherlands, focused on advancing digital infrastructures and data handling practices for single specimen barcoding. This event brought together 24 participants from various consortium member institutes to enhance workflows in DNA barcoding, which complements genome sequencing efforts within BGE. The hackathon was structured around four thematic pillars: Data Generation and Processing, BOLD Release Candidate, Wider Data Integration, and Reference Data Curation. Participants worked on optimizing barcode generation pipelines, developing a new version of the Barcode of Life Data Systems (BOLD) data publishing platform, harmonizing data standards for better integration with databases like ENA and UNITE, and automating curation processes for DNA barcoding records. The outcomes include improved workflows, enhanced data interoperability, and refined curation standards, which collectively support BGE’s goals of achieving a step change in molecular biodiversity monitoring and research.
  • Oncomatch- Optimizing Oncology Combination Therapy Prediction

    Advances in precision medicine are reshaping cancer treatment by tailoring therapies to a patient’s specific genetic profile. Despite this, matching cancer mutations to effective drugs remains a complex task due to variability in mutations across cancer types and limited tools for practical clinical application. In this project, initially developed during the BioIT Hackathon 2025, we created OncoMatch, an open-data-powered web application designed to bridge this gap by integrating genomic, transcriptomic, proteomic, and drug-target interaction data to support therapy selection. Building on prior work in colorectal cancer, we expanded our scope to include bladder, ovarian, and non-small cell lung cancer (NSCLC), using the COSMIC and DrugCentral databases to identify relevant gene mutations and therapeutics. We developed two novel scoring systems, the Cancer Precision Score (CPS) and Gene Precision Score (GPS), to evaluate drug specificity and potential effectiveness. Using data from DrugCentral, LINCS L1000, and DeepCoverMOA, we created a unified bioactivity dataset for over 4,000 drugs, including measures such as IC50 and Kd values. The OncoMatch platform features interactive tools to visualize drug bioactivity, assess multi-omic and structural similarity among compounds, and explore potential drug combinations. Users can query drugs by cancer type and gene mutation, generating insights into the most promising therapies and alternatives. Our open-source approach not only democratizes access to high-quality bioinformatics tools but also encourages data-driven, personalized cancer care. Future directions include refining subtype-level predictions and improving the platform’s utility for combinatorial therapy planning. We have developed a Streamlit app to make this data easy to access. The app can be found at https://oncomatchapp-precision-medicine.streamlit.app.
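The three-way behaviour described in the PURL resolver abstract above (database IRIs redirect to web interfaces, ontology IRIs render as term pages with content negotiation, and unmapped entities are delegated elsewhere) can be illustrated with a minimal sketch. This is not the project's code: the actual resolver is written in PHP with EasyRDF, whereas the sketch below is Python, and every prefix, URL template, and function name in it is a hypothetical placeholder chosen for illustration. It also ignores `q` weights in the `Accept` header, which a real implementation would likely honour.

```python
from dataclasses import dataclass

# Hypothetical configuration; none of these prefixes or URLs come from the source.
DATABASE_REDIRECTS = {"exampledb": "https://db.example.org/entry/{id}"}
ONTOLOGY_PREFIXES = {"exampleonto"}
FALLBACK_RESOLVER = "https://fallback.example.org/{prefix}/{id}"

# RDF serializations the sketch pretends to support.
RDF_MEDIA_TYPES = {"text/turtle", "application/rdf+xml", "application/ld+json"}


@dataclass
class Resolution:
    action: str  # "redirect" or "render"
    target: str  # redirect URL, or the media type to render


def pick_media_type(accept: str) -> str:
    """Return the first supported RDF media type listed in the Accept header,
    falling back to an HTML term page. Quality values (q=...) are ignored."""
    for part in accept.split(","):
        media = part.split(";")[0].strip().lower()
        if media in RDF_MEDIA_TYPES:
            return media
    return "text/html"


def resolve(path: str, accept: str = "text/html") -> Resolution:
    """Decide what to do with a request path of the form /<prefix>/<id>."""
    prefix, _, local_id = path.strip("/").partition("/")
    if prefix in ONTOLOGY_PREFIXES:
        # Ontology terms are rendered directly, in the negotiated format.
        return Resolution("render", pick_media_type(accept))
    if prefix in DATABASE_REDIRECTS:
        # Known database entries redirect to their web interface.
        return Resolution("redirect", DATABASE_REDIRECTS[prefix].format(id=local_id))
    # Anything unmapped is delegated to another resolver.
    return Resolution("redirect", FALLBACK_RESOLVER.format(prefix=prefix, id=local_id))
```

Under these assumptions, `resolve("/exampleonto/Strain", "text/turtle")` would render the term as Turtle, while `resolve("/exampledb/123")` would redirect to the placeholder database URL.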