<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://index.biohackrxiv.org//feed/by_tag/BH23JP.xml" rel="self" type="application/atom+xml" /><link href="https://index.biohackrxiv.org//" rel="alternate" type="text/html" /><updated>2026-06-14T20:19:19+00:00</updated><id>https://index.biohackrxiv.org//feed/by_tag/BH23JP.xml</id><title type="html">BioHackrXiv Preprints</title><subtitle>Preprints for BioHackathons</subtitle><author><name>GitHub User</name><email>your-email@domain.com</email></author><entry><title type="html">Enhancement of the Interoperability of Trait Data on Genetic Resources between Japan and France</title><link href="https://index.biohackrxiv.org//2025/12/23/hw2fj.html" rel="alternate" type="text/html" title="Enhancement of the Interoperability of Trait Data on Genetic Resources between Japan and France" /><published>2025-12-23T00:00:00+00:00</published><updated>2025-12-23T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2025/12/23/hw2fj</id><content type="html" xml:base="https://index.biohackrxiv.org//2025/12/23/hw2fj.html"><![CDATA[<p>Japan’s National Agriculture and Food Research Organization initiated a collaborative research project with France’s National Research Institute
for Agriculture, Food and Environment to evaluate wheat genetic resources and to identify materials with desirable traits using standardized
criteria. This paper presents the current status of trait data standardization between the two organizations and outlines a direction for
standardization. Trait data for genetic resources in Japan and France are managed using independently developed standards. The lack of mapping
standards hinders data integration and interoperability. To support experts in the mapping process, we developed a tool that translates trait
terms. A generative AI-based translation tool appears to be applicable for collecting relevant information to support mapping between trait
terms, as well as translating newly submitted Japanese trait terms into English.</p>]]></content><author><name>Akane Takezaki</name></author><category term="BH23JP" /><summary type="html"><![CDATA[Japan’s National Agriculture and Food Research Organization initiated a collaborative research project with France’s National Research Institute for Agriculture, Food and Environment to evaluate wheat genetic resources and to identify materials with desirable traits using standardized criteria. This paper presents the current status of trait data standardization between the two organizations and outlines a direction for standardization. Trait data for genetic resources in Japan and France are managed using independently developed standards. The lack of mapping standards hinders data integration and interoperability. To support experts in the mapping process, we developed a tool that translates trait terms. A generative AI-based translation tool appears to be applicable for collecting relevant information to support mapping between trait terms, as well as translating newly submitted Japanese trait terms into English.]]></summary></entry><entry><title type="html">SPARQL services for InterMine databases</title><link href="https://index.biohackrxiv.org//2024/04/24/dpnry.html" rel="alternate" type="text/html" title="SPARQL services for InterMine databases" /><published>2024-04-24T00:00:00+00:00</published><updated>2024-04-24T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2024/04/24/dpnry</id><content type="html" xml:base="https://index.biohackrxiv.org//2024/04/24/dpnry.html"><![CDATA[<p>InterMine is an open source data warehouse system that can be used to create biological databases that can be accessed via web query tools. There are many public InterMine instances that are currently deployed worldwide and they share a core data model pertaining to common biological entities. 
Besides the core data model, each instance of InterMine typically has an extended data model to cover data specific to that particular deployment. The data is organised according to a graph-based data model but exists in a relational store (Postgres). The goal of this project was to explore the possibility of translating InterMine data from relational form to graph form using the Resource Description Framework (RDF) as the exchange format. This could provide a route to exposing data from InterMine instances as RDF triples and thus making it possible to query the data using the SPARQL Protocol and RDF Query Language (SPARQL).</p>]]></content><author><name>François Belleau</name></author><category term="BH23JP" /><summary type="html"><![CDATA[InterMine is an open source data warehouse system that can be used to create biological databases that can be accessed via web query tools. There are many public InterMine instances that are currently deployed worldwide and they share a core data model pertaining to common biological entities. Besides the core data model, each instance of InterMine typically has an extended data model to cover data specific to that particular deployment. The data is organised according to a graph-based data model but exists in a relational store (Postgres). The goal of this project was to explore the possibility of translating InterMine data from relational form to graph form using the Resource Description Framework (RDF) as the exchange format. 
This could provide a route to exposing data from InterMine instances as RDF triples and thus making it possible to query the data using the SPARQL Protocol and RDF Query Language (SPARQL).]]></summary></entry><entry><title type="html">BioHackJP 2023 Report R1:Improving phenotype ontology interoperability</title><link href="https://index.biohackrxiv.org//2024/01/24/d27fw.html" rel="alternate" type="text/html" title="BioHackJP 2023 Report R1:Improving phenotype ontology interoperability" /><published>2024-01-24T00:00:00+00:00</published><updated>2024-01-24T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2024/01/24/d27fw</id><content type="html" xml:base="https://index.biohackrxiv.org//2024/01/24/d27fw.html"><![CDATA[<p>Ontologies play a crucial role in data management, and in the life sciences especially they have been indispensable for decades, as the complexity of life science data requires rigor. Biomedical ontologies often undergo change and improvement; for example, disease and phenotype ontologies develop constantly along with our scientific understanding. In order to bridge the gap between ontologies and annotated datasets and thus to semantically enable applications and datasets to retrieve insights and improve interoperability, ontology mapping plays a key role. To implement a sophisticated search supported by semantics, interoperability to address cross-disciplinary needs is crucial. In this paper we focus on different aspects of interoperability of ontologies, especially in the phenotype and disease domain, and how they could be improved. During BioHackJP 2023, a variety of approaches were discussed and evaluated. 
We give an overview of the results of each investigation, covering 1: Linguistic and Social Interoperability, 2: Technical and Structural Interoperability, 3: Ontology Alignments and Mappings, 4: Use of Large Language Models (LLMs), and 5: Model Mice Exploration, and discuss future work to address these challenges.</p>]]></content><author><name>Eisuke Dohi</name></author><category term="BH23JP" /><summary type="html"><![CDATA[Ontologies play a crucial role in data management, and in the life sciences especially they have been indispensable for decades, as the complexity of life science data requires rigor. Biomedical ontologies often undergo change and improvement; for example, disease and phenotype ontologies develop constantly along with our scientific understanding. In order to bridge the gap between ontologies and annotated datasets and thus to semantically enable applications and datasets to retrieve insights and improve interoperability, ontology mapping plays a key role. To implement a sophisticated search supported by semantics, interoperability to address cross-disciplinary needs is crucial. In this paper we focus on different aspects of interoperability of ontologies, especially in the phenotype and disease domain, and how they could be improved. During BioHackJP 2023, a variety of approaches were discussed and evaluated. 
We give an overview of the results of each investigation, covering 1: Linguistic and Social Interoperability, 2: Technical and Structural Interoperability, 3: Ontology Alignments and Mappings, 4: Use of Large Language Models (LLMs), and 5: Model Mice Exploration, and discuss future work to address these challenges.]]></summary></entry><entry><title type="html">BioHackJP 2023 Report R1: Mapping human genome variations to their mouse counterparts for identifying disease model mouse strains</title><link href="https://index.biohackrxiv.org//2024/01/20/8kuzr.html" rel="alternate" type="text/html" title="BioHackJP 2023 Report R1: Mapping human genome variations to their mouse counterparts for identifying disease model mouse strains" /><published>2024-01-20T00:00:00+00:00</published><updated>2024-01-20T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2024/01/20/8kuzr</id><content type="html" xml:base="https://index.biohackrxiv.org//2024/01/20/8kuzr.html"><![CDATA[<p>In disease model mouse strains used for human disease studies, information on genomic variations is essential for elucidating the relationship between haplotypes and disease susceptibility. To select an appropriate disease model mouse, it is crucial to identify mouse variants with the same effect as disease-causing variants in humans. In BioHackathon Japan J2023, we focused on nucleotide variants involved in amino acid substitutions. We developed an API that matches mouse variants from the MoG+ database to human variants within gene regions defined by HGNC identifiers or symbols. After the hackathon, we will map non-coding variants in addition to coding variants. 
The outcomes of our variant mapping will be presented as links connecting the comprehensive human variation database, TogoVar, and the model mouse genome database, MoG.</p>]]></content><author><name>Nobutaka Mitsuhashi</name></author><category term="BH23JP" /><summary type="html"><![CDATA[In disease model mouse strains used for human disease studies, information on genomic variations is essential for elucidating the relationship between haplotypes and disease susceptibility. To select an appropriate disease model mouse, it is crucial to identify mouse variants with the same effect as disease-causing variants in humans. In BioHackathon Japan J2023, we focused on nucleotide variants involved in amino acid substitutions. We developed an API that matches mouse variants from the MoG+ database to human variants within gene regions defined by HGNC identifiers or symbols. After the hackathon, we will map non-coding variants in addition to coding variants. The outcomes of our variant mapping will be presented as links connecting the comprehensive human variation database, TogoVar, and the model mouse genome database, MoG.]]></summary></entry><entry><title type="html">Efforts to analyze pathways in non-model organisms</title><link href="https://index.biohackrxiv.org//2023/10/26/spf3q.html" rel="alternate" type="text/html" title="Efforts to analyze pathways in non-model organisms" /><published>2023-10-26T00:00:00+00:00</published><updated>2023-10-26T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/10/26/spf3q</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/10/26/spf3q.html"><![CDATA[<p>In addition to functional annotation of genes, annotating genes to pathways is important in current molecular biology. However, annotating genes to pathway nodes requires pathway diagrams. Therefore, it is important to draw pathway diagrams with genes and metabolites assigned. Existing metabolic pathway databases focus on generic pathways, while 
secondary metabolism is emphasized in organisms producing useful substances. Moreover, they cannot accept third-party annotation of those data. A practical system for pathway analysis is therefore needed. Following on from the previous BioHackathon (BH23), we first discussed how to create a database of pathway information for non-model species at a domestic version of the BioHackathon called BH23.9, held in Shirahama, Wakayama, Japan (25-29 September 2023). We then gave a tutorial on how to draw a pathway diagram using PathVisio, a free open-source pathway analysis and drawing tool that allows drawing, editing, and analyzing biological pathways. Finally, we tried to establish a system, called txt2gpml, for converting text data to Graphical Pathway Markup Language (GPML). txt2gpml will drastically reduce the time and effort required to create pathway diagrams. After stimulating discussions at BH23 and BH23.9, we were able to clarify the current issues in pathway analysis for non-model organisms.</p>]]></content><author><name>Naoya Oec</name></author><category term="BH23JP" /><summary type="html"><![CDATA[In addition to functional annotation of genes, annotating genes to pathways is important in current molecular biology. However, annotating genes to pathway nodes requires pathway diagrams. Therefore, it is important to draw pathway diagrams with genes and metabolites assigned. Existing metabolic pathway databases focus on generic pathways, while secondary metabolism is emphasized in organisms producing useful substances. Moreover, they cannot accept third-party annotation of those data. A practical system for pathway analysis is therefore needed. Following on from the previous BioHackathon (BH23), we first discussed how to create a database of pathway information for non-model species at a domestic version of the BioHackathon called BH23.9, held in Shirahama, Wakayama, Japan (25-29 September 2023). We then gave a tutorial on how to draw a 
pathway diagram using PathVisio, a free open-source pathway analysis and drawing tool that allows drawing, editing, and analyzing biological pathways. Finally, we tried to establish a system, called txt2gpml, for converting text data to Graphical Pathway Markup Language (GPML). txt2gpml will drastically reduce the time and effort required to create pathway diagrams. After stimulating discussions at BH23 and BH23.9, we were able to clarify the current issues in pathway analysis for non-model organisms.]]></summary></entry><entry><title type="html">BioHackJP 2023 Report R3: Plant data integration for findability across multiple databases</title><link href="https://index.biohackrxiv.org//2023/09/14/ghzcx.html" rel="alternate" type="text/html" title="BioHackJP 2023 Report R3: Plant data integration for findability across multiple databases" /><published>2023-09-14T00:00:00+00:00</published><updated>2023-09-14T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/09/14/ghzcx</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/09/14/ghzcx.html"><![CDATA[<p>Plant research generates vast amounts of heterogeneous data held in dispersed repositories. Accessing, integrating, and analyzing these datasets is therefore challenging because of their low findability and the variability of formats and standards. Several solutions, including data standards (MIAPPE, BrAPI) and portals (FAIDARE), are recommended by the ELIXIR plant community through the RDM Kit plant pages. 
BioHackathon Japan 2023 was an ideal event for promoting those solutions to Japanese researchers and bioinformaticians, in order to increase the visibility of Japanese databases in the plant research data discovery portal FAIDARE and to explore the use of the Breeding API for knowledge graphs.</p>]]></content><author><name>Cyril Pommier</name></author><category term="BH23JP" /><summary type="html"><![CDATA[Plant research generates vast amounts of heterogeneous data held in dispersed repositories. Accessing, integrating, and analyzing these datasets is therefore challenging because of their low findability and the variability of formats and standards. Several solutions, including data standards (MIAPPE, BrAPI) and portals (FAIDARE), are recommended by the ELIXIR plant community through the RDM Kit plant pages. BioHackathon Japan 2023 was an ideal event for promoting those solutions to Japanese researchers and bioinformaticians, in order to increase the visibility of Japanese databases in the plant research data discovery portal FAIDARE and to explore the use of the Breeding API for knowledge graphs.]]></summary></entry><entry><title type="html">Redesign of the validation framework in LinkML</title><link href="https://index.biohackrxiv.org//2023/07/18/8ukwz.html" rel="alternate" type="text/html" title="Redesign of the validation framework in LinkML" /><published>2023-07-18T00:00:00+00:00</published><updated>2023-07-18T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/07/18/8ukwz</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/07/18/8ukwz.html"><![CDATA[<p>LinkML is a data modeling language that can be used to describe the structure and semantics of data from a specific domain. As with any modeling language, there is a need for tools that support data validation. LinkML provides a set of validation tools, but there is a growing need to adapt them for a broader audience. 
This report describes the effort to redesign the validation framework in LinkML to better support a wider range of validation scenarios and use cases.</p>]]></content><author><name>Deepak Unni</name></author><category term="BH23JP" /><summary type="html"><![CDATA[LinkML is a data modeling language that can be used to describe the structure and semantics of data from a specific domain. As with any modeling language, there is a need for tools that support data validation. LinkML provides a set of validation tools, but there is a growing need to adapt them for a broader audience. This report describes the effort to redesign the validation framework in LinkML to better support a wider range of validation scenarios and use cases.]]></summary></entry><entry><title type="html">Machine learning of transcriptome data treated with DNA base editor</title><link href="https://index.biohackrxiv.org//2023/07/13/zytkj.html" rel="alternate" type="text/html" title="Machine learning of transcriptome data treated with DNA base editor" /><published>2023-07-13T00:00:00+00:00</published><updated>2023-07-13T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/07/13/zytkj</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/07/13/zytkj.html"><![CDATA[<p>Base Editor, a technique that utilizes Cas9 nickase fused with deaminase to introduce single base substitutions, has significantly facilitated the creation of valuable genome variants in medical and agricultural fields. However, a phenomenon known as RNA off-target effects is recognized with Base Editor, resulting in unintended substitutions in the transcriptome. It has been reported that such substitutions often occur in specific base motifs (ACW), but whether these motif mutations are dominant has not been investigated. 
In this study, we constructed a pipeline for analyzing RNA off-target effects, called the Pipeline for CRISPR-induced Transcriptome-wide Unintended RNA Editing (PiCTURE), and analyzed previously reported RNA-seq data. We found minor RNA off-target effects associated with the reported base motifs, and most were indistinguishable in motif analysis. Consequently, we trained a Large Language Model (LLM) specialized for DNA base sequences on RNA off-target sequences and developed a classifier for assessing the risk of RNA off-target effects based on the sequences. When the model’s estimates were applied to the RNA off-target data for BE4-rAPOBEC1 and BE4-RrA3F, satisfactory results were obtained. This study is the first to demonstrate the efficacy of machine learning approaches in determining RNA off-target effects caused by Base Editor and presents a predictive model for the safer use of Base Editor.</p>]]></content><author><name>Kazuki Nakamae</name></author><category term="BH23JP" /><summary type="html"><![CDATA[Base Editor, a technique that utilizes Cas9 nickase fused with deaminase to introduce single base substitutions, has significantly facilitated the creation of valuable genome variants in medical and agricultural fields. However, a phenomenon known as RNA off-target effects is recognized with Base Editor, resulting in unintended substitutions in the transcriptome. It has been reported that such substitutions often occur in specific base motifs (ACW), but whether these motif mutations are dominant has not been investigated. In this study, we constructed a pipeline for analyzing RNA off-target effects, called the Pipeline for CRISPR-induced Transcriptome-wide Unintended RNA Editing (PiCTURE), and analyzed previously reported RNA-seq data. 
We found minor RNA off-target effects associated with the reported base motifs, and most were indistinguishable in motif analysis. Consequently, we trained a Large Language Model (LLM) specialized for DNA base sequences on RNA off-target sequences and developed a classifier for assessing the risk of RNA off-target effects based on the sequences. When the model’s estimates were applied to the RNA off-target data for BE4-rAPOBEC1 and BE4-RrA3F, satisfactory results were obtained. This study is the first to demonstrate the efficacy of machine learning approaches in determining RNA off-target effects caused by Base Editor and presents a predictive model for the safer use of Base Editor.]]></summary></entry><entry><title type="html">BioHackJP 2023 Report R3: Expand the pathway analysis environment to non-model organisms</title><link href="https://index.biohackrxiv.org//2023/07/12/4uskb.html" rel="alternate" type="text/html" title="BioHackJP 2023 Report R3: Expand the pathway analysis environment to non-model organisms" /><published>2023-07-12T00:00:00+00:00</published><updated>2023-07-12T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/07/12/4uskb</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/07/12/4uskb.html"><![CDATA[<p>Despite decades of pathway database efforts and freely available pathway modeling tools, most researchers publish their biological pathway knowledge as static image figures made with general illustration tools. Prior to the BioHackathon, we had identified 103,009 pathway figures in the literature and performed optical character recognition (OCR) (Pathway Figure OCR; Hanspers et al., 2020). As an initial exploration, we extracted chemical names, disease terms, and human gene names. 
We knew, however, that many of the pathways represented biological processes and entities specific to plant, microbial, and numerous non-model organisms. To expand the pathway analysis environment to non-model organisms whose genomic and functional annotations are not organized in a central public database, we sought to expand the number of organism species included in the Pathway Figure OCR (PFOCR) database. Also, with the continuing goal of expanding the use of WikiPathways (Pico et al., 2008) and the practice of modeling pathway information as proper data models, we trained new users of PathVisio (Kutmon et al., 2015) and guided them through the process of publishing at WikiPathways.</p>]]></content><author><name>Alexander Pico</name></author><category term="BH23JP" /><summary type="html"><![CDATA[Despite decades of pathway database efforts and freely available pathway modeling tools, most researchers publish their biological pathway knowledge as static image figures made with general illustration tools. Prior to the BioHackathon, we had identified 103,009 pathway figures in the literature and performed optical character recognition (OCR) (Pathway Figure OCR; Hanspers et al., 2020). As an initial exploration, we extracted chemical names, disease terms, and human gene names. We knew, however, that many of the pathways represented biological processes and entities specific to plant, microbial, and numerous non-model organisms. To expand the pathway analysis environment to non-model organisms whose genomic and functional annotations are not organized in a central public database, we sought to expand the number of organism species included in the Pathway Figure OCR (PFOCR) database. 
Also, with the continuing goal of expanding the use of WikiPathways (Pico et al., 2008) and the practice of modeling pathway information as proper data models, we trained new users of PathVisio (Kutmon et al., 2015) and guided them through the process of publishing at WikiPathways.]]></summary></entry><entry><title type="html">RDF Data integration using Shape Expressions</title><link href="https://index.biohackrxiv.org//2023/07/04/md73k.html" rel="alternate" type="text/html" title="RDF Data integration using Shape Expressions" /><published>2023-07-04T00:00:00+00:00</published><updated>2023-07-04T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/07/04/md73k</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/07/04/md73k.html"><![CDATA[<p>This paper reports on the activities carried out during the BioHackathon 2023 in Shodoshima, Japan, in a project on RDF data integration using Shape Expressions. It describes several approaches that were discussed for creating RDF data subsets, along with some preliminary results from applying some of those technologies. It also describes work comparing RDF data modeling approaches such as ShEx, LinkML, and the YAML files from rdfconfig.</p>]]></content><author><name>Jose Emilio Labra-Gayo</name></author><category term="BH23JP" /><summary type="html"><![CDATA[This paper reports on the activities carried out during the BioHackathon 2023 in Shodoshima, Japan, in a project on RDF data integration using Shape Expressions. It describes several approaches that were discussed for creating RDF data subsets, along with some preliminary results from applying some of those technologies. It also describes work comparing RDF data modeling approaches such as ShEx, LinkML, and the YAML files from rdfconfig.]]></summary></entry></feed>