<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://index.biohackrxiv.org//feed/by_tag/BH25JP.xml" rel="self" type="application/atom+xml" /><link href="https://index.biohackrxiv.org//" rel="alternate" type="text/html" /><updated>2026-06-14T20:19:19+00:00</updated><id>https://index.biohackrxiv.org//feed/by_tag/BH25JP.xml</id><title type="html">BioHackrXiv Preprints</title><subtitle>Preprints for BioHackathons</subtitle><author><name>GitHub User</name><email>your-email@domain.com</email></author><entry><title type="html">QPX: Pathway analysis environment</title><link href="https://index.biohackrxiv.org//2026/01/06/m37f2.html" rel="alternate" type="text/html" title="QPX: Pathway analysis environment" /><published>2026-01-06T00:00:00+00:00</published><updated>2026-01-06T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2026/01/06/m37f2</id><content type="html" xml:base="https://index.biohackrxiv.org//2026/01/06/m37f2.html"><![CDATA[<p>Building on our work at DBCLS BioHackathon 2023 (BH23), where we introduced QPX and promoted pathway modeling with WikiPathways (Pico et al., 2008)
using PathVisio (Kutmon et al., 2015), we focused on creating new pathway diagrams for diverse species and registering them in WikiPathways with
functional annotations. In parallel, we deployed WikiPathways node data into Elasticsearch to enable fast and flexible search and integration of
pathway information.</p>]]></content><author><name>Hidemasa Bono</name></author><category term="BH25JP" /><summary type="html"><![CDATA[Building on our work at DBCLS BioHackathon 2023 (BH23), where we introduced QPX and promoted pathway modeling with WikiPathways (Pico et al., 2008) using PathVisio (Kutmon et al., 2015), we now focused on creating new pathway diagrams for diverse species and registering them in WikiPathways with functional annotations. In parallel, we deployed WikiPathways node data into Elasticsearch to enable fast and flexible search and integration of pathway information.]]></summary></entry><entry><title type="html">MCP server tools with RDF shapes</title><link href="https://index.biohackrxiv.org//2025/12/16/8qeh5.html" rel="alternate" type="text/html" title="MCP server tools with RDF shapes" /><published>2025-12-16T00:00:00+00:00</published><updated>2025-12-16T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2025/12/16/8qeh5</id><content type="html" xml:base="https://index.biohackrxiv.org//2025/12/16/8qeh5.html"><![CDATA[<p>In this paper, we present the work we have done during the Japan Biohackathon 2025 about implementing MCP servers
supported by RDF data shapes to improve natural language interactions with large RDF datasets using SPARQL.</p>]]></content><author><name>Jose Emilio Labra-Gayo</name></author><category term="BH25JP" /><summary type="html"><![CDATA[In this paper, we present the work we have done during the Japan Biohackathon 2025 about implementing MCP servers supported by RDF data shapes to improve natural language interactions with large RDF datasets using SPARQL.]]></summary></entry><entry><title type="html">DBCLS BioHackathon 2025 report on the WikiBlitz</title><link href="https://index.biohackrxiv.org//2025/10/24/7s6da.html" rel="alternate" type="text/html" title="DBCLS BioHackathon 2025 report on the WikiBlitz" /><published>2025-10-24T00:00:00+00:00</published><updated>2025-10-24T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2025/10/24/7s6da</id><content type="html" xml:base="https://index.biohackrxiv.org//2025/10/24/7s6da.html"><![CDATA[<p>As part of the DBCLS BioHackathon 2025, we organized a WikiBlitz to improve biodiversity knowledge by integrating iNaturalist, GBIF, Wikidata, and Wikipedia.
Participants identified local flora and fauna, filling gaps in multilingual Wikipedia articles. This report summarizes the methodology, results, and insights,
illustrating the usefulness of combining citizen science with digital platforms to enrich ecological data and promote biodiversity awareness.</p>]]></content><author><name>Andra Waagmeester</name></author><category term="BH25JP" /><category term="justdoi:10.1093/biosci/biaf104" /><category term="cito:usesMethodIn:10.37044/osf.io/5ue2s_v1" /><summary type="html"><![CDATA[As part of the DBCLS BioHackathon 2025, we organized a WikiBlitz to improve biodiversity knowledge by integrating iNaturalist, GBIF, Wikidata, and Wikipedia. Participants identified local flora and fauna, filling gaps in multilingual Wikipedia articles. This report summarizes the methodology, results, and insights, illustrating the usefulness of combining citizen science with digital platforms to enrich ecological data and promote biodiversity awareness.]]></summary></entry><entry><title type="html">on2vec: Ontology Embeddings with Graph Neural Networks and Sentence Transformers</title><link href="https://index.biohackrxiv.org//2025/10/21/4f763.html" rel="alternate" type="text/html" title="on2vec: Ontology Embeddings with Graph Neural Networks and Sentence Transformers" /><published>2025-10-21T00:00:00+00:00</published><updated>2025-10-21T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2025/10/21/4f763</id><content type="html" xml:base="https://index.biohackrxiv.org//2025/10/21/4f763.html"><![CDATA[<p>Ontologies provide structured vocabularies and relationships essential for organizing biological knowledge, yet their
symbolic nature limits integration with modern machine learning methods. Leveraging recent advances in graph neural
networks (GNNs) and transformer-based language models, we present on2vec, a toolkit developed during the DBCLS BioHackathon 2025
for generating vector embeddings from OWL ontologies. on2vec integrates structural information from ontology hierarchies with
semantic features from textual annotations using HuggingFace Sentence Transformers, producing domain-aware embeddings suitable
for downstream biomedical applications and ontology-based reasoning tasks.</p>]]></content><author><name>David Steinberg</name></author><category term="BH25JP" /><summary type="html"><![CDATA[Ontologies provide structured vocabularies and relationships essential for organizing biological knowledge, yet their symbolic nature limits integration with modern machine learning methods. Leveraging recent advances in graph neural networks (GNNs) and transformer-based language models, we present on2vec, a toolkit developed during the DBCLS BioHackathon 2025 for generating vector embeddings from OWL ontologies. on2vec integrates structural information from ontology hierarchies with semantic features from textual annotations using HuggingFace Sentence Transformers, producing domain-aware embeddings suitable for downstream biomedical applications and ontology-based reasoning tasks.]]></summary></entry><entry><title type="html">AI in Practice: Insights from a Community Survey of Biohackathon Participants</title><link href="https://index.biohackrxiv.org//2025/10/12/pza7v.html" rel="alternate" type="text/html" title="AI in Practice: Insights from a Community Survey of Biohackathon Participants" /><published>2025-10-12T00:00:00+00:00</published><updated>2025-10-12T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2025/10/12/pza7v</id><content type="html" xml:base="https://index.biohackrxiv.org//2025/10/12/pza7v.html"><![CDATA[<p>Understanding the practical application of artificial intelligence (AI) in research is increasingly
important as it becomes embedded in life sciences and bioinformatics. This paper reports on
a multilingual survey, developed through community discussions at the 2025 BioHackathon
in Japan and distributed through its networks, to capture current practices, successes, and
challenges in AI adoption. The survey, offered in English, Japanese, and Thai, received 105
responses spanning diverse demographics, regions, and professional backgrounds. Findings
reveal that most participants are frequent AI users, with tools such as ChatGPT, Gemini, and
Claude widely adopted and ChatGPT the most common response. AI is primarily used to assist
or draft tasks in coding, research, and writing, while full task automation remains uncommon,
reflecting a preference for AI as a collaborative aid rather than a replacement. Successes
were noted in efficiency, coding support, and proposal writing, whereas challenges centered
on accuracy and reliability. Institutional support emerged as a key factor: respondents in
Japan, Thailand, and the private sector reported stronger support and higher satisfaction than
English-speaking or academic counterparts. By documenting real-world practices and concerns,
this survey provides a valuable community-driven resource to guide responsible AI development
and foster international collaboration in bioinformatics.</p>]]></content><author><name>Lucas Feriau</name></author><category term="BH25JP" /><summary type="html"><![CDATA[Understanding the practical application of artificial intelligence (AI) in research is increasingly important as it becomes embedded in life sciences and bioinformatics. This paper reports on a multilingual survey, developed through community discussions at the 2025 BioHackathon in Japan and distributed through its networks, to capture current practices, successes, and challenges in AI adoption. The survey, offered in English, Japanese, and Thai, received 105 responses spanning diverse demographics, regions, and professional backgrounds. Findings reveal that most participants are frequent AI users, with tools such as ChatGPT, Gemini, and Claude widely adopted and ChatGPT the most common response. AI is primarily used to assist or draft tasks in coding, research, and writing, while full task automation remains uncommon, reflecting a preference for AI as a collaborative aid rather than a replacement. Successes were noted in efficiency, coding support, and proposal writing, whereas challenges centered on accuracy and reliability. Institutional support emerged as a key factor: respondents in Japan, Thailand, and the private sector reported stronger support and higher satisfaction than English-speaking or academic counterparts. 
By documenting real-world practices and concerns, this survey provides a valuable community-driven resource to guide responsible AI development and foster international collaboration in bioinformatics.]]></summary></entry><entry><title type="html">A Lightweight PURL Resolver for Linked Life Science Data</title><link href="https://index.biohackrxiv.org//2025/09/30/8kap3.html" rel="alternate" type="text/html" title="A Lightweight PURL Resolver for Linked Life Science Data" /><published>2025-09-30T00:00:00+00:00</published><updated>2025-09-30T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2025/09/30/8kap3</id><content type="html" xml:base="https://index.biohackrxiv.org//2025/09/30/8kap3.html"><![CDATA[<p>Knowledge graphs in the life sciences are increasingly published using the Resource Description Framework (RDF) and
queried via SPARQL endpoints. While these technologies enable powerful data integration, the identifiers returned in
SPARQL results often do not resolve to meaningful resources, leaving users with non-actionable links. To address this
issue, we developed a lightweight Persistent Uniform Resource Locator (PURL) resolver during the BioHackathon Japan 2025.
The resolver is implemented in PHP, chosen for its ubiquity on standard web servers and its compatibility with
the EasyRDF library for RDF handling. It is easy to configure, requires minimal maintenance, and supports both database
redirects and ontology term rendering with content negotiation for RDF serializations. The system is available as
open-source software (https://github.com/JKoblitz/purl-resolver) and deployed at https://purl.dsmz.de, where it now
resolves most identifiers from the DSMZ Digital Diversity SPARQL endpoint (https://sparql.dsmz.de). Database IRIs
lead to the corresponding web interfaces, ontology IRIs from the DSMZ Digital Diversity Ontology render directly as
term pages, and unmapped entities are delegated to database-side resolvers. This approach enhances the usability of
knowledge graphs by ensuring that all identifiers remain actionable for both humans and machines.</p>]]></content><author><name>Julia Koblitz</name></author><category term="BH25JP" /><category term="cito:citesAsDataSource:10.1093/nar/gkaa1025" /><category term="cito:citesAsRelated:10.1093/nar/gkr1097" /><category term="cito:citesAsDataSource:10.1093/nar/gkac803" /><category term="cito:citesAsDataSource:10.1093/nar/gkae959" /><summary type="html"><![CDATA[Knowledge graphs in the life sciences are increasingly published using the Resource Description Framework (RDF) and queried via SPARQL endpoints. While these technologies enable powerful data integration, the identifiers returned in SPARQL results often do not resolve to meaningful resources, leaving users with non-actionable links. To address this issue, we developed a lightweight Persistent Uniform Resource Locator (PURL) resolver during the BioHackathon Japan 2025. The resolver is implemented in PHP, chosen for its ubiquity on standard web servers and its compatibility with the EasyRDF library for RDF handling. It is easy to configure, requires minimal maintenance, and supports both database redirects and ontology term rendering with content negotiation for RDF serializations. The system is available as open-source software (https://github.com/JKoblitz/purl-resolver) and deployed at https://purl.dsmz.de, where it now resolves most identifiers from the DSMZ Digital Diversity SPARQL endpoint (https://sparql.dsmz.de). Database IRIs lead to the corresponding web interfaces, ontology IRIs from the DSMZ Digital Diversity Ontology render directly as term pages, and unmapped entities are delegated to database-side resolvers. 
This approach enhances the usability of knowledge graphs by ensuring that all identifiers remain actionable for both humans and machines.]]></summary></entry><entry><title type="html">A Standards-Compliant, Multi-Modal Platform for Offline Access to SRA Metadata</title><link href="https://index.biohackrxiv.org//2025/09/30/9jau6.html" rel="alternate" type="text/html" title="A Standards-Compliant, Multi-Modal Platform for Offline Access to SRA Metadata" /><published>2025-09-30T00:00:00+00:00</published><updated>2025-09-30T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2025/09/30/9jau6</id><content type="html" xml:base="https://index.biohackrxiv.org//2025/09/30/9jau6.html"><![CDATA[<p>The SRAmetaDBB project, presented at BioHackathon Japan 2023, introduced an experimental JavaScript pipeline
for creating SQLite databases from NCBI SRA (Sequence Read Archive) metadata dumps, with a vision for offline
analysis and integration with Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems.
While promising, the prototype faced significant challenges in performance, memory management, and production
readiness when scaling to the full SRA dataset of over 45 million records. This paper presents SRAKE (SRA
Knowledge Engine), a complete reimplementation in Go that not only addresses these limitations but extends
the original vision with semantic search capabilities, quality control mechanisms, and multiple access
interfaces. SRAKE achieves a 20-fold improvement in ingestion speed, maintains constant memory usage through
zero-copy streaming, and provides standards-compliant interfaces following clig.dev guidelines. The platform
introduces biomedical-specific semantic search using SapBERT embeddings via ONNX Runtime, implements
comprehensive quality control thresholds for search results, and offers multiple access modalities including
a CLI, REST API, MCP server for AI integration, and a simple web interface. Our development implementation
demonstrates that SRAKE successfully transforms the experimental SRAmetaDBB concept into a production-ready
platform that integrates seamlessly with modern AI workflows while maintaining the core vision of providing
offline-capable, LLM-ready access to SRA metadata.</p>]]></content><author><name>Nishad Thalhath</name></author><category term="BH25JP" /><summary type="html"><![CDATA[The SRAmetaDBB project, presented at BioHackathon Japan 2023, introduced an experimental JavaScript pipeline for creating SQLite databases from NCBI SRA (Sequence Read Archive) metadata dumps, with a vision for offline analysis and integration with Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems. While promising, the prototype faced significant challenges in performance, memory management, and production readiness when scaling to the full SRA dataset of over 45 million records. This paper presents SRAKE (SRA Knowledge Engine), a complete reimplementation in Go that not only addresses these limitations but extends the original vision with semantic search capabilities, quality control mechanisms, and multiple access interfaces. SRAKE achieves a 20-fold improvement in ingestion speed, maintains constant memory usage through zero-copy streaming, and provides standards-compliant interfaces following clig.dev guidelines. The platform introduces biomedical-specific semantic search using SapBERT embeddings via ONNX Runtime, implements comprehensive quality control thresholds for search results, and offers multiple access modalities including a CLI, REST API, MCP server for AI integration, and a simple web interface. 
Our development implementation demonstrates that SRAKE successfully transforms the experimental SRAmetaDBB concept into a production-ready platform that integrates seamlessly with modern AI workflows while maintaining the core vision of providing offline-capable, LLM-ready access to SRA metadata.]]></summary></entry><entry><title type="html">DBCLS BioHackathon 2025 report: Creation and Publication Analytical Workflow of Creators’ Interests</title><link href="https://index.biohackrxiv.org//2025/09/30/qd5sz.html" rel="alternate" type="text/html" title="DBCLS BioHackathon 2025 report: Creation and Publication Analytical Workflow of Creators’ Interests" /><published>2025-09-30T00:00:00+00:00</published><updated>2025-09-30T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2025/09/30/qd5sz</id><content type="html" xml:base="https://index.biohackrxiv.org//2025/09/30/qd5sz.html"><![CDATA[<p>At the DBCLS BioHackathon 2025, we converted metatranscriptomic analytical shell scripts into Common Workflow Language (CWL) containerized
with Docker. Sub-workflows were created for metagenomic assembly, read mapping, and gene annotation, and validated with test datasets. The
workflows, released on GitHub and WorkflowHub, improve reproducibility and address issues of reusability and software environment dependency.
We also evaluated CWL best practices from the perspective of life scientists, classifying them by difficulty, importance, and applicability
to promote FAIR principles and software quality. In parallel, we established a benchmarking framework for pangenome-based structural variant
(SV) calling using data from the Dai population. Graph-based references from the Human and Chinese Pangenome Consortia were compared with
linear references using minimap2 and vg giraffe. Results showed improved alignment accuracy and variant detection with pangenomes,
demonstrating their value for reducing mapping bias and enhancing SV discovery.</p>]]></content><author><name>Ryo Mameda</name></author><category term="BH25JP" /><category term="justdoi:10.1038/s41597-025-05652-y" /><category term="justdoi:10.1145/3676288.3676300" /><category term="justdoi:10.1038/s41586-025-09290-7" /><category term="justdoi:10.1038/s41586-023-06173-7" /><summary type="html"><![CDATA[At the DBCLS BioHackathon 2025, we converted metatranscriptomic analytical shell scripts into Common Workflow Language (CWL) containerized with Docker. Sub-workflows were created for metagenomic assembly, read mapping, and gene annotation, and validated with test datasets. The workflows, released on GitHub and WorkflowHub, improve reproducibility and address issues of reusability and software environment dependency. We also evaluated CWL best practices from the perspective of life scientists, classifying them by difficulty, importance, and applicability to promote FAIR principles and software quality. In parallel, we established a benchmarking framework for pangenome-based structural variant (SV) calling using data from the Dai population. Graph-based references from the Human and Chinese Pangenome Consortia were compared with linear references using minimap2 and vg giraffe. 
Results showed improved alignment accuracy and variant detection with pangenomes, demonstrating their value for reducing mapping bias and enhancing SV discovery.]]></summary></entry><entry><title type="html">Translating and Formalizing the MIRAGE Guidelines to a Prototype MIRAGE Ontology and DCAT3 Extension Vocabulary for Glycomics Data Management</title><link href="https://index.biohackrxiv.org//2025/09/30/wj8bz.html" rel="alternate" type="text/html" title="Translating and Formalizing the MIRAGE Guidelines to a Prototype MIRAGE Ontology and DCAT3 Extension Vocabulary for Glycomics Data Management" /><published>2025-09-30T00:00:00+00:00</published><updated>2025-09-30T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2025/09/30/wj8bz</id><content type="html" xml:base="https://index.biohackrxiv.org//2025/09/30/wj8bz.html"><![CDATA[<p>The Minimum Information Required for A Glycomics Experiment (MIRAGE) guidelines have
established comprehensive reporting standards for glycomics research, yet their implementation
in semantic web technologies remains limited. We present the first comprehensive semantic
formalization of MIRAGE guidelines through an integrated RDF ontology framework comprising
the MIRAGE Ontology and MIRAGE-DCAT3 vocabulary. The MIRAGE Ontology
models glycan structures, biological specimens, analytical instruments, and experimental
processes with formal OWL semantics and SHACL validation constraints. The complementary
MIRAGE-DCAT3 vocabulary extends W3C DCAT3 with glycomics-specific metadata properties
for dataset cataloging and discovery. Our implementation addresses critical challenges in
glycomics data interoperability through comprehensive mappings to established ontologies
including GlycoRDF, PSI-MS, and DCTERMS. This semantic framework enables automated
quality assessment, federated data querying, and enhanced reproducibility in glycomics research,
supporting broader adoption of FAIR principles in the glycobiology community. The
framework demonstrates comprehensive coverage of MIRAGE reporting requirements across
multiple analytical platforms including mass spectrometry, liquid chromatography, capillary
electrophoresis, NMR spectroscopy, and lectin microarray analysis.</p>]]></content><author><name>Achille Zappa</name></author><category term="BH25JP" /><summary type="html"><![CDATA[The Minimum Information Required for A Glycomics Experiment (MIRAGE) guidelines have established comprehensive reporting standards for glycomics research, yet their implementation in semantic web technologies remains limited. We present the first comprehensive semantic formalization of MIRAGE guidelines through an integrated RDF ontology framework comprising the MIRAGE Ontology and MIRAGE-DCAT3 vocabulary. The MIRAGE Ontology models glycan structures, biological specimens, analytical instruments, and experimental processes with formal OWL semantics and SHACL validation constraints. The complementary MIRAGE-DCAT3 vocabulary extends W3C DCAT3 with glycomics-specific metadata properties for dataset cataloging and discovery. Our implementation addresses critical challenges in glycomics data interoperability through comprehensive mappings to established ontologies including GlycoRDF, PSI-MS, and DCTERMS. This semantic framework enables automated quality assessment, federated data querying, and enhanced reproducibility in glycomics research, supporting broader adoption of FAIR principles in the glycobiology community. The framework demonstrates comprehensive coverage of MIRAGE reporting requirements across multiple analytical platforms including mass spectrometry, liquid chromatography, capillary electrophoresis, NMR spectroscopy, and lectin microarray analysis.]]></summary></entry></feed>