{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "BioHackrXiv Preprints",
  "description": "Preprints for BioHackathons",
  "home_page_url": "https://index.biohackrxiv.org//",
  "feed_url": "https://index.biohackrxiv.org//feed.json",
  "icon": "https://index.biohackrxiv.org//assets/images/chem-bla-ics_logo.png",
  "language": "en",
  "authors": [
    {
      "name": "Egon Willighagen",
      "url": "https://orcid.org/0000-0001-7542-0286",
      "_orcid": "0000-0001-7542-0286"
    }
  ],
  "items": [
    {
      "id": "https://doi.org/10.37044/osf.io/etp3g_v1",
      "url": "https://index.biohackrxiv.org//2025/06/11/tp3g.html",
      "title": "Software Quality Indicators: extraction, categorisation andrecommendations from canonical sources",
      "content_html": "<p>Research software plays a central role in modern science, and its quality is increasinglyrecognized as essential for reproducibility, sustainability, and trust. Numerous initiatives haveproposed indicators to guide quality assessment, yet these indicators are dispersed acrossdomains and vary in scope, terminology, and practical use. This work presents a curatedcatalogue of software quality indicators tailored to the needs of research software. Developedduring BioHackathon Europe 2024 and refined in collaboration with the ELIXIR Tools Platformand EVERSE project, the catalogue consolidates and structures indicators from a range ofauthoritative sources.</p>\n\n<p>Over 300 indicators were gathered and systematically reviewed for relevance, clarity, andimplementation feasibility. Each was classified into thematic categories—such as Documen-tation, Security, Usability, and Sustainability—and annotated with target applicability, easeof evaluation, and recommended actions. Redundant, overly abstract, or narrowly scopedindicators were excluded or flagged, while additional tags highlighted cross-cutting concernssuch as licensing, testing, and community practices.</p>\n\n<p>The resulting open dataset, available as a structured spreadsheet, includes detailed metadataand decision criteria to support reuse, adaptation, and extension. The catalogue offers afoundation for context-specific assessment frameworks. Intended users include research softwaredevelopers and maintainers, evaluators, and developers of quality-focused tools and guidelines.</p>",
      "summary": "Research software plays a central role in modern science, and its quality is increasinglyrecognized as essential for reproducibility, sustainability, and trust. Numerous initiatives haveproposed indicators to guide quality assessment, yet these indicators are dispersed acrossdomains and vary in scope, terminology, and practical use. This work presents a curatedcatalogue of software quality indicators tailored to the needs of research software. Developedduring BioHackathon Europe 2024 and refined in collaboration with the ELIXIR Tools Platformand EVERSE project, the catalogue consolidates and structures indicators from a range ofauthoritative sources.",
      
      "date_published": "2025-06-11T00:00:00+00:00",
      "date_modified": "2025-06-11T00:00:00+00:00",
      "tags": ["BH24EU"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.37044/osf.io/3a8cn_v1",
      "url": "https://index.biohackrxiv.org//2025/06/04/3a8cn.html",
      "title": "Addressing Background Genomic and Environmental Effects on Health through Accelerated Computing and Machine Learning: Results from the 2025 Hackathon at Carnegie Mellon University",
      "content_html": "<p>In March 2025, 34 scientists from the United States, Ireland, the United Kingdom, Switzerland,France, Germany, Spain, India, and Australia gathered in Pittsburgh, Pennsylvania and virtuallyfor a collaborative biohackathon, hosted by DNAnexus and Carnegie Mellon University Libraries.The goal of the hackathon was to explore machine learning approaches for multimodalproblems in computational biology using public datasets. Teams worked on the followinginnovative projects: applying machine learning techniques for clustering and similarity analysisof haplotypes; adapting the StructLMM framework to study Gene-Gene (GxG) interactions;creating a nextflow workflow for generating an imputation reference panel using large-scalecohort data; optimizing discovery of causal relationships in large electronic health record (EHR)datasets using the open source causal analysis software Tetrad; examining the evolution of agraph neural network in a Lenski-esque experiment; and developing tools and workflows forgenerating pathway intersection diagrams and graph-based analyses for multiomics data. Allprojects were dedicated to study the background genomic and environmental effects underlyingcomplex genotype-phenotype relationships. Their objective was to set foundations for furtherstudies on predicting complex phenotypic traits using integrative multi-omic and environmentalanalyses.</p>",
      "summary": "In March 2025, 34 scientists from the United States, Ireland, the United Kingdom, Switzerland,France, Germany, Spain, India, and Australia gathered in Pittsburgh, Pennsylvania and virtuallyfor a collaborative biohackathon, hosted by DNAnexus and Carnegie Mellon University Libraries.The goal of the hackathon was to explore machine learning approaches for multimodalproblems in computational biology using public datasets. Teams worked on the followinginnovative projects: applying machine learning techniques for clustering and similarity analysisof haplotypes; adapting the StructLMM framework to study Gene-Gene (GxG) interactions;creating a nextflow workflow for generating an imputation reference panel using large-scalecohort data; optimizing discovery of causal relationships in large electronic health record (EHR)datasets using the open source causal analysis software Tetrad; examining the evolution of agraph neural network in a Lenski-esque experiment; and developing tools and workflows forgenerating pathway intersection diagrams and graph-based analyses for multiomics data. Allprojects were dedicated to study the background genomic and environmental effects underlyingcomplex genotype-phenotype relationships. Their objective was to set foundations for furtherstudies on predicting complex phenotypic traits using integrative multi-omic and environmentalanalyses.",
      
      "date_published": "2025-06-04T00:00:00+00:00",
      "date_modified": "2025-06-04T00:00:00+00:00",
      "tags": ["CMUDNA25"],
      
      
      
      
        "authors": [
        
          
            { "name": "Siddharth Sabata", "url": "https://orcid.org/" },
          
        
          
            { "name": "Jędrzej Kubica", "url": "https://orcid.org/" },
          
        
          
            { "name": "Rishika Gupta", "url": "https://orcid.org/" },
          
        
          
            { "name": "Lars Warren Ericson", "url": "https://orcid.org/" },
          
        
          
            { "name": "Halimat Chisom Atanda", "url": "https://orcid.org/" },
          
        
          
            { "name": "Gobikrishnan Subramaniam", "url": "https://orcid.org/" },
          
        
          
            { "name": "Abraham G. Moller", "url": "https://orcid.org/" },
          
        
          
            { "name": "Rachael Oluwakamiye Abolade", "url": "https://orcid.org/" },
          
        
          
            { "name": "Arth Banka", "url": "https://orcid.org/" },
          
        
          
            { "name": "Samuel Blechman", "url": "https://orcid.org/" },
          
        
          
            { "name": "Rorry Brenner", "url": "https://orcid.org/" },
          
        
          
            { "name": "Maria Chikina", "url": "https://orcid.org/" },
          
        
          
            { "name": "Li Chuin Chong", "url": "https://orcid.org/" },
          
        
          
            { "name": "Nicholas Cooley", "url": "https://orcid.org/" },
          
        
          
            { "name": "Daniel Chang", "url": "https://orcid.org/" },
          
        
          
            { "name": "Phil Greer", "url": "https://orcid.org/" },
          
        
          
            { "name": "Anshika Gupta", "url": "https://orcid.org/" },
          
        
          
            { "name": "Avish A. Jha", "url": "https://orcid.org/" },
          
        
          
            { "name": "Emrah Kacar", "url": "https://orcid.org/" },
          
        
          
            { "name": "Nanami Kubota", "url": "https://orcid.org/" },
          
        
          
            { "name": "William Lu", "url": "https://orcid.org/" },
          
        
          
            { "name": "Louison Luo", "url": "https://orcid.org/" },
          
        
          
            { "name": "Tien Ly", "url": "https://orcid.org/" },
          
        
          
            { "name": "Rajarshi Mondal", "url": "https://orcid.org/" },
          
        
          
            { "name": "Ciara O’Donoghue", "url": "https://orcid.org/" },
          
        
          
            { "name": "Aung Myat Phyo", "url": "https://orcid.org/" },
          
        
          
            { "name": "Peng Qiu", "url": "https://orcid.org/" },
          
        
          
            { "name": "Glenn Ross-Dolan", "url": "https://orcid.org/" },
          
        
          
            { "name": "Ali Saadat", "url": "https://orcid.org/" },
          
        
          
            { "name": "Shivank Sadasivan", "url": "https://orcid.org/" },
          
        
          
            { "name": "Rebecca Satterwhite", "url": "https://orcid.org/" },
          
        
          
            { "name": "Soham Shirolkar", "url": "https://orcid.org/" },
          
        
          
            { "name": "Yuning Zheng", "url": "https://orcid.org/" },
          
        
          
            { "name": "Huajin Wang", "url": "https://orcid.org/0000-0003-0121-4257" },
          
        
          
            { "name": "Melanie Gainey", "url": "https://orcid.org/" },
          
        
          
            { "name": "Ben Busby", "url": "https://orcid.org/0000-0001-5267-4988" }
          
        
        ]
      
    },
    {
      "id": "https://doi.org/10.37044/osf.io/xza73_v1",
      "url": "https://index.biohackrxiv.org//2025/06/02/xza73.html",
      "title": "Leveraging RDF and CURIE metadata resolution with identifiers.org",
      "content_html": "<p>Identifiers.org provides two core services for CURIEs in life sciences. One is a registry of CURIE prefixes and URL locations that contain entries for the main life sciences datasets. The other is a resolver that allows for consistent data access using registry information to redirect to current URLs for CURIE identifiers. For this work, we aimed to expand these services to facilitate the integration of CURIE-related metadata into different contexts. The first part of this exports the registry in RDF with a SPARQL server to allow queries on the dataset. Through these, RDF-based systems can associate with registry metadata on different data collections. Allowing, for example, services that have identifiers.org URLs to collect metadata on the collection that it references. The second part expands on the existing metadata resolver to be able to collect CURIE-related metadata from different metadata providers.While the previous resolver could only collect LDJSON notations from pages, it can now be expanded to collect from any metadata provider.For this work, we implement two proof of concept retrievers, one for EBI Search, a text search engine that allows for metadata acquisition, and one for TogoID, an ID mapping service for life sciences.Finally, we gather some future tasks for identifiers.org services.</p>",
      "summary": "Identifiers.org provides two core services for CURIEs in life sciences. One is a registry of CURIE prefixes and URL locations that contain entries for the main life sciences datasets. The other is a resolver that allows for consistent data access using registry information to redirect to current URLs for CURIE identifiers. For this work, we aimed to expand these services to facilitate the integration of CURIE-related metadata into different contexts. The first part of this exports the registry in RDF with a SPARQL server to allow queries on the dataset. Through these, RDF-based systems can associate with registry metadata on different data collections. Allowing, for example, services that have identifiers.org URLs to collect metadata on the collection that it references. The second part expands on the existing metadata resolver to be able to collect CURIE-related metadata from different metadata providers.While the previous resolver could only collect LDJSON notations from pages, it can now be expanded to collect from any metadata provider.For this work, we implement two proof of concept retrievers, one for EBI Search, a text search engine that allows for metadata acquisition, and one for TogoID, an ID mapping service for life sciences.Finally, we gather some future tasks for identifiers.org services.",
      
      "date_published": "2025-06-02T00:00:00+00:00",
      "date_modified": "2025-06-02T00:00:00+00:00",
      "tags": ["BH24EU"],
      
      
      
      
        "authors": [
        
          
            { "name": "Renato Juacaba Neto", "url": "https://orcid.org/0000-0002-0626-984X" },
          
        
          
            { "name": "Nick Juty", "url": "https://orcid.org/" },
          
        
          
            { "name": "Vijay Subramoniam", "url": "https://orcid.org/" },
          
        
          
            { "name": "Rose Neis", "url": "https://orcid.org/" },
          
        
          
            { "name": "Shuya Ikeda", "url": "https://orcid.org/" },
          
        
          
            { "name": "Shuichi Kawashima", "url": "https://orcid.org/0000-0001-7883-3756" },
          
        
          
            { "name": "Yasunori Yamamoto", "url": "https://orcid.org/0000-0002-6943-6887" },
          
        
          
            { "name": "Toshiaki Katayama", "url": "https://orcid.org/0000-0003-2391-0384" },
          
        
          
            { "name": "Henning Hermjakob", "url": "https://orcid.org/" }
          
        
        ]
      
    },
    {
      "id": "https://doi.org/10.37044/osf.io/ptmg5_v1",
      "url": "https://index.biohackrxiv.org//2025/05/17/ptmg5.html",
      "title": "BioHackEU24 report: Expanding FAIR database integration through elucidation and transformation of underlying graph schemas",
      "content_html": "<p>The BioDataFuse (BDF) project aims to enhance the interoperability of biomedical data through modular integration of data from diverse life sciences resources into context-specific knowledge graphs. This paper discusses the efforts made during BioHackathon Europe 2024 to improve the FAIR (Findable, Accessible, Interoperable, and Reusable) data integration process by clarifying and transforming graph schemas. We explored tools such as VoID-generator, RDF-config, and sheXer for data schema extraction and the integration of RDF Portal data into the BDF framework. By leveraging these tools, we automated the generation of SPARQL queries, created GraphQL endpoints, and enhanced BDF’s ability to integrate new databases. Additionally, we explored the potential of large language models (LLMs) for automated reasoning and data interpretation within the BDF ecosystem. This work lays the foundation for building more efficient and standardized data models, contributing to the seamless integration of multiple biomedical databases.</p>",
      "summary": "The BioDataFuse (BDF) project aims to enhance the interoperability of biomedical data through modular integration of data from diverse life sciences resources into context-specific knowledge graphs. This paper discusses the efforts made during BioHackathon Europe 2024 to improve the FAIR (Findable, Accessible, Interoperable, and Reusable) data integration process by clarifying and transforming graph schemas. We explored tools such as VoID-generator, RDF-config, and sheXer for data schema extraction and the integration of RDF Portal data into the BDF framework. By leveraging these tools, we automated the generation of SPARQL queries, created GraphQL endpoints, and enhanced BDF’s ability to integrate new databases. Additionally, we explored the potential of large language models (LLMs) for automated reasoning and data interpretation within the BDF ecosystem. This work lays the foundation for building more efficient and standardized data models, contributing to the seamless integration of multiple biomedical databases.",
      
      "date_published": "2025-05-17T00:00:00+00:00",
      "date_modified": "2025-05-17T00:00:00+00:00",
      "tags": ["BH24EU"],
      
      
      
      
        "authors": [
        
          
            { "name": "Javier Millán Acosta", "url": "https://orcid.org/" },
          
        
          
            { "name": "Shuichi Kawashima", "url": "https://orcid.org/0000-0001-7883-3756" },
          
        
          
            { "name": "Toshiaki Katayama", "url": "https://orcid.org/0000-0003-2391-0384" },
          
        
          
            { "name": "Jerven Bolleman", "url": "https://orcid.org/0000-0002-7449-1266" },
          
        
          
            { "name": "Dominik Martinat", "url": "https://orcid.org/" },
          
        
          
            { "name": "Harald Detering", "url": "https://orcid.org/" },
          
        
          
            { "name": "Jose Emilio Labra Gayo", "url": "https://orcid.org/" },
          
        
          
            { "name": "Yojana Gadiya", "url": "https://orcid.org/0000-0002-7683-0452" },
          
        
          
            { "name": "Tooba Abbassi-Daloii", "url": "https://orcid.org/0000-0002-4904-3269" }
          
        
        ]
      
    },
    {
      "id": "https://doi.org/10.37044/osf.io/xqe8h_v1",
      "url": "https://index.biohackrxiv.org//2025/05/04/xqe8h.html",
      "title": "BioHackSWAT4HCLS25 report: Towards AI-Ready Datasets for the Life Sciences",
      "content_html": "<p>At the SWAT4HCLS 2025 Hackathon, we continued our work on dataset interoperability and AI-readiness, extending our\nefforts from the 2024 Elixir Biohackathon. This report outlines the progress made in graph serialization, metadata\nembedding, and knowledge graph analysis, which further enhance machine learning workflows and data integration</p>",
      "summary": "At the SWAT4HCLS 2025 Hackathon, we continued our work on dataset interoperability and AI-readiness, extending our efforts from the 2024 Elixir Biohackathon. This report outlines the progress made in graph serialization, metadata embedding, and knowledge graph analysis, which further enhance machine learning workflows and data integration",
      
      "date_published": "2025-05-04T00:00:00+00:00",
      "date_modified": "2025-05-04T00:00:00+00:00",
      "tags": ["SWAT4HCLS25"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.37044/osf.io/chbsj_v1",
      "url": "https://index.biohackrxiv.org//2025/05/04/chbsj.html",
      "title": "Reusable RDM Planning Environments for Trainings and Workshops: A BioHackathon Europe 2024 Report",
      "content_html": "<p>This report provides an overview of our activities and accomplishments related to the creation of reusable RDM\n(Research Data Management) Planning Environments for trainings and workshops conducted during the ELIXIR BioHackathon\nEurope 2024. ELIXIR recognizes the critical role of effective data management planning in enabling sustainable and\nreproducible research outcomes. This effectiveness is achieved through the use of appropriate Data Management\nPlanning tools, such as the Data Stewardship Wizard. The Data Stewardship Wizard is used to conduct various trainings\nwhich require instance with data which are different for each training. Goal of this project was to provide easy and\neffective way to prepare “recipes” for DSW Data Seeder</p>",
      "summary": "This report provides an overview of our activities and accomplishments related to the creation of reusable RDM (Research Data Management) Planning Environments for trainings and workshops conducted during the ELIXIR BioHackathon Europe 2024. ELIXIR recognizes the critical role of effective data management planning in enabling sustainable and reproducible research outcomes. This effectiveness is achieved through the use of appropriate Data Management Planning tools, such as the Data Stewardship Wizard. The Data Stewardship Wizard is used to conduct various trainings which require instance with data which are different for each training. Goal of this project was to provide easy and effective way to prepare “recipes” for DSW Data Seeder",
      
      "date_published": "2025-05-04T00:00:00+00:00",
      "date_modified": "2025-05-04T00:00:00+00:00",
      "tags": ["BH24EU"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.37044/osf.io/8m5ey_v1",
      "url": "https://index.biohackrxiv.org//2025/05/04/8m5ey.html",
      "title": "Enhancing bio.tools by Semantic Literature Mining",
      "content_html": "<p>Mining mentions of software tools in scientific literature is important for resource discovery and analysis in bioinformatics. Despite advancements in deep-learning-based natural language processing techniques, accurately identifying software mentions remains challenging due to naming ambiguities, inconsistent citation practices, and homonyms. In this study, we developed methods to enhance the bio.tools registry through integration with Europe PMC. We systematically explored three distinct article-tool relationships: direct associations, citations of associated articles, and textual mentions without explicit citations. A hybrid approach combining rule-based heuristics and machine learning was evaluated at a F1-score of 74.4% in contextual software mention disambiguation tasks. We further demonstrated the potential for mining software co-mentions and co-citations from EuropePMC, constructing interactive networks in Cytoscape to visualize relationships between tools. Leveraging bio.tools metadata significantly improved disambiguation accuracy, including for tools with generic names. In the future, we will expand annotated datasets, handle software synonyms, and make bio.tools software mentions retrievable through the Europe PMC Annotations API to enrich bio.tools with usage data, making software more findable, including for recommendation systems.</p>",
      "summary": "Mining mentions of software tools in scientific literature is important for resource discovery and analysis in bioinformatics. Despite advancements in deep-learning-based natural language processing techniques, accurately identifying software mentions remains challenging due to naming ambiguities, inconsistent citation practices, and homonyms. In this study, we developed methods to enhance the bio.tools registry through integration with Europe PMC. We systematically explored three distinct article-tool relationships: direct associations, citations of associated articles, and textual mentions without explicit citations. A hybrid approach combining rule-based heuristics and machine learning was evaluated at a F1-score of 74.4% in contextual software mention disambiguation tasks. We further demonstrated the potential for mining software co-mentions and co-citations from EuropePMC, constructing interactive networks in Cytoscape to visualize relationships between tools. Leveraging bio.tools metadata significantly improved disambiguation accuracy, including for tools with generic names. In the future, we will expand annotated datasets, handle software synonyms, and make bio.tools software mentions retrievable through the Europe PMC Annotations API to enrich bio.tools with usage data, making software more findable, including for recommendation systems.",
      
      "date_published": "2025-05-04T00:00:00+00:00",
      "date_modified": "2025-05-04T00:00:00+00:00",
      "tags": ["BH24EU"],
      
      
      
      
        "authors": [
        
          
            { "name": "Aleksandra Szmigiel", "url": "https://orcid.org/" },
          
        
          
            { "name": "Ana Mendes", "url": "https://orcid.org/" },
          
        
          
            { "name": "Erik Jaaniso", "url": "https://orcid.org/" },
          
        
          
            { "name": "Magnus Palmblad", "url": "https://orcid.org/0000-0002-5865-8994" },
          
        
          
            { "name": "Rob M. Ewing", "url": "https://orcid.org/" },
          
        
          
            { "name": "SANTOSH TIRUNAGARI", "url": "https://orcid.org/" },
          
        
          
            { "name": "Tess AV Afanasyeva", "url": "https://orcid.org/" },
          
        
          
            { "name": "Vedran Kasalica", "url": "https://orcid.org/0000-0002-0097-1056" },
          
        
          
            { "name": "Veit Schwämmle", "url": "https://orcid.org/" },
          
        
          
            { "name": "Zunaira Shafique", "url": "https://orcid.org/" }
          
        
        ]
      
    },
    {
      "id": "https://doi.org/10.37044/osf.io/dsgnw_v1",
      "url": "https://index.biohackrxiv.org//2025/04/08/dsgnw.html",
      "title": "BioHackEU24 report: Integrating Bioconductor packages with the ELIXIR Research Software Ecosystem using EDAM",
      "content_html": "<p>This project seeks to enhance the ELIXIR Research Software Ecosystem (RSEc) by increasing the findability, accessibility, interoperability, and reusability (FAIR principles) of Bioconductor’s extensive collection of over 2,000 bioinformatics packages. By aligning Bioconductor metadata with the EDAM ontology and integrating detailed package descriptions into the <em>bio.tools</em> registry, we aim to improve the discoverability and usability of bioinformatics analysis tools. Short-term goals include mapping Bioconductor’s biocViews controlled vocabulary to EDAM concepts, developing a set of manually annotated “gold standard” packages, and evaluating tools for automated EDAM concept suggestions. Long-term, we intend to expand EDAM coverage across Bioconductor, phase out biocViews, and implement automated synchronisation with <em>bio.tools</em>. This initiative fosters collaboration between Bioconductor and ELIXIR, establishing a foundation for sustainable software management in European bioinformatics.Key results from the ELIXIR BioHackathon 2024 week include substantial progress in mapping the biocViews vocabulary to EDAM concepts, initiating the curation of a reference set of packages with manual annotations, integrating Bioconductor metadata into the ELIXIR Research Software Ecosystem (RSEc) with automated updates, and prototyping a tool for automated EDAM concept suggestions. Together, these achievements establish a strong foundation for further integration and refinement.</p>",
      "summary": "This project seeks to enhance the ELIXIR Research Software Ecosystem (RSEc) by increasing the findability, accessibility, interoperability, and reusability (FAIR principles) of Bioconductor’s extensive collection of over 2,000 bioinformatics packages. By aligning Bioconductor metadata with the EDAM ontology and integrating detailed package descriptions into the bio.tools registry, we aim to improve the discoverability and usability of bioinformatics analysis tools. Short-term goals include mapping Bioconductor’s biocViews controlled vocabulary to EDAM concepts, developing a set of manually annotated “gold standard” packages, and evaluating tools for automated EDAM concept suggestions. Long-term, we intend to expand EDAM coverage across Bioconductor, phase out biocViews, and implement automated synchronisation with bio.tools. This initiative fosters collaboration between Bioconductor and ELIXIR, establishing a foundation for sustainable software management in European bioinformatics.Key results from the ELIXIR BioHackathon 2024 week include substantial progress in mapping the biocViews vocabulary to EDAM concepts, initiating the curation of a reference set of packages with manual annotations, integrating Bioconductor metadata into the ELIXIR Research Software Ecosystem (RSEc) with automated updates, and prototyping a tool for automated EDAM concept suggestions. Together, these achievements establish a strong foundation for further integration and refinement.",
      
      "date_published": "2025-04-08T00:00:00+00:00",
      "date_modified": "2025-04-08T00:00:00+00:00",
      "tags": ["BH24EU"],
      
      
      
      
        "authors": [
        
          
            { "name": "Claire Rioualen", "url": "https://orcid.org/0000-0002-7684-8679" },
          
        
          
            { "name": "Aurélien Barre", "url": "https://orcid.org/" },
          
        
          
            { "name": "Benjamin Dartigues", "url": "https://orcid.org/" },
          
        
          
            { "name": "Vincent J Carey", "url": "https://orcid.org/" },
          
        
          
            { "name": "Matus Kalas", "url": "https://orcid.org/" },
          
        
          
            { "name": "Sebastian Lobentanzer", "url": "https://orcid.org/0000-0003-3399-6695" },
          
        
          
            { "name": "Hervé MENAGER", "url": "https://orcid.org/0000-0002-7552-1009" },
          
        
          
            { "name": "Steffen Neumann", "url": "https://orcid.org/" },
          
        
          
            { "name": "Kozo Nishida", "url": "https://orcid.org/" },
          
        
          
            { "name": "Veit Schwämmle", "url": "https://orcid.org/" },
          
        
          
            { "name": "Anh Nguyet Vu", "url": "https://orcid.org/0000-0003-1488-6730" },
          
        
          
            { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" },
          
        
          
            { "name": "Maria A Doyle", "url": "https://orcid.org/" }
          
        
        ]
      
    },
    {
      "id": "https://doi.org/10.37044/osf.io/4sgdq_v1",
      "url": "https://index.biohackrxiv.org//2025/04/02/4sgdq.html",
      "title": "An assessment of Croissant ML metadata descriptors for AI-ready datasets",
      "content_html": "<p>To advance the use of machine learning to address humanity’s grand challenges, such as the understanding of disease conditions and biodiversity loss in the Anthropocene, it is important to promote FAIR AI-ready datasets, since data scientists and bioinformaticians spend 80% of their time on data finding and preparation. Metadata descriptors for datasets are pivotal for the creation of machine learning models as they facilitate the definition of strategies for data discovery, feature selection, data cleaning, and data pre-processing. ML-ready datasets, whether by design or after pre-processing, can be enriched with metadata so they become FAIRer, i.e., autonomously discoverable and processable by machines (machine-actionable). Croissant ML is an extension of schema.org to better describe ML-ready datasets, released in early 2024 and already adopted by some ML-model platforms such as Hugging Face (see Croissant ML viewer documentation) and OpenML. However, as commonly happens with metadata, there are limits to how much metadata can be automatically extracted. How much Croissant metadata can be programmatically extracted from ML-ready datasets? And how could this automation be improved? In this project, we explored answers to these two questions.</p>",
      "summary": "To advance the use of machine learning to address humanity’s grand challenges, such as the understanding of disease conditions and biodiversity loss in the Anthropocene, it is important to promote FAIR AI-ready datasets, since data scientists and bioinformaticians spend 80% of their time on data finding and preparation. Metadata descriptors for datasets are pivotal for the creation of machine learning models as they facilitate the definition of strategies for data discovery, feature selection, data cleaning, and data pre-processing. ML-ready datasets, whether by design or after pre-processing, can be enriched with metadata so they become FAIRer, i.e., autonomously discoverable and processable by machines (machine-actionable). Croissant ML is an extension of schema.org to better describe ML-ready datasets, released in early 2024 and already adopted by some ML-model platforms such as Hugging Face (see Croissant ML viewer documentation) and OpenML. However, as commonly happens with metadata, there are limits to how much metadata can be automatically extracted. How much Croissant metadata can be programmatically extracted from ML-ready datasets? And how could this automation be improved? In this project, we explored answers to these two questions.",
      
      "date_published": "2025-04-02T00:00:00+00:00",
      "date_modified": "2025-04-02T00:00:00+00:00",
      "tags": ["BH24EU"],
      
      
      
      
        "authors": [
        
          
            { "name": "Jerven Bolleman", "url": "https://orcid.org/0000-0002-7449-1266" },
          
        
          
            { "name": "Leyla Jael Castro", "url": "https://orcid.org/0000-0003-3986-0510" },
          
        
          
            { "name": "Alban Gaignard", "url": "https://orcid.org/0000-0002-3597-8557" },
          
        
          
            { "name": "Agoritsa Kalampaliki", "url": "https://orcid.org/" },
          
        
          
            { "name": "Matúš Kalaš", "url": "https://orcid.org/0000-0002-1509-4981" },
          
        
          
            { "name": "Edwin Jun Kiat Ong", "url": "https://orcid.org/" },
          
        
          
            { "name": "Núria Queralt-Rosinach", "url": "https://orcid.org/0000-0003-0169-8159" },
          
        
          
            { "name": "Nelson Quiñones", "url": "https://orcid.org/" },
          
        
          
            { "name": "Rohitha Ravinder", "url": "https://orcid.org/" },
          
        
          
            { "name": "Dhwani Solanki", "url": "https://orcid.org/" },
          
        
          
            { "name": "David Steinberg", "url": "https://orcid.org/0000-0001-6683-2270" },
          
        
          
            { "name": "Claus Weiland", "url": "https://orcid.org/" },
          
        
          
            { "name": "Daphne Wijnbergen", "url": "https://orcid.org/" }
          
        
        ]
      
    },
    {
      "id": "https://doi.org/10.37044/osf.io/5uhwz_v2",
      "url": "https://index.biohackrxiv.org//2025/03/14/5uhwz.html",
      "title": "2024 OME-NGFF workflows hackathon",
      "content_html": "<p>The 2024 OME-NGFF Workflows Hackathon, held at the BioVisionCenter at the University of Zurich, brought together an international group of researchers and developers to develop the ecosystem around the open, scalable, and FAIR bioimage file format OME-Zarr. Over five days, participants tackled key challenges in four main areas: (1) advancing the OME-Zarr specification, (2) enabling workflow interoperability by integrating OME-Zarr image processing tasks across multiple open-source frameworks, (3) expanding Java support for Zarr v3 and enhancing the compatibility of OME-Zarr with the popular bioimage analysis software Fiji, and (4) improving the Python resources supporting OME-Zarr. The event led to the release of OME-Zarr 0.5, which formalizes the adoption of Zarr v3 and introduces a sharding strategy to reduce file system overhead. This report provides an overview of the key discussions, outcomes, and future directions emerging from the hackathon, with the goal of fostering continued community engagement in developing OME-Zarr as a robust open bioimaging standard.</p>",
      "summary": "The 2024 OME-NGFF Workflows Hackathon, held at the BioVisionCenter at the University of Zurich, brought together an international group of researchers and developers to develop the ecosystem around the open, scalable, and FAIR bioimage file format OME-Zarr. Over five days, participants tackled key challenges in four main areas: (1) advancing the OME-Zarr specification, (2) enabling workflow interoperability by integrating OME-Zarr image processing tasks across multiple open-source frameworks, (3) expanding Java support for Zarr v3 and enhancing the compatibility of OME-Zarr with the popular bioimage analysis software Fiji, and (4) improving the Python resources supporting OME-Zarr. The event led to the release of OME-Zarr 0.5, which formalizes the adoption of Zarr v3 and introduces a sharding strategy to reduce file system overhead. This report provides an overview of the key discussions, outcomes, and future directions emerging from the hackathon, with the goal of fostering continued community engagement in developing OME-Zarr as a robust open bioimaging standard.",
      
      "date_published": "2025-03-14T00:00:00+00:00",
      "date_modified": "2025-03-14T00:00:00+00:00",
      "tags": ["OMENGFFWH24"],
      
      
      
      
        "authors": [
        
          
            { "name": "Joel Lüthi", "url": "https://orcid.org/0000-0003-3023-170X" },
          
        
          
            { "name": "Marvin Albert", "url": "https://orcid.org/" },
          
        
          
            { "name": "Liviu Anita", "url": "https://orcid.org/" },
          
        
          
            { "name": "Kola Babalola", "url": "https://orcid.org/" },
          
        
          
            { "name": "Davis Bennett", "url": "https://orcid.org/" },
          
        
          
            { "name": "John A. Bogovic", "url": "https://orcid.org/" },
          
        
          
            { "name": "Lorenzo Cerrone", "url": "https://orcid.org/" },
          
        
          
            { "name": "Rémy Dornier", "url": "https://orcid.org/" },
          
        
          
            { "name": "Jan Eglinger", "url": "https://orcid.org/" },
          
        
          
            { "name": "Vera Galinova", "url": "https://orcid.org/" },
          
        
          
            { "name": "Reto Gerber", "url": "https://orcid.org/0000-0001-5414-8906" },
          
        
          
            { "name": "Oane Gros", "url": "https://orcid.org/" },
          
        
          
            { "name": "Stefan Hahmann", "url": "https://orcid.org/" },
          
        
          
            { "name": "Max Hess", "url": "https://orcid.org/" },
          
        
          
            { "name": "Ruth Hornbachner", "url": "https://orcid.org/" },
          
        
          
            { "name": "Dmytro Horyslavets", "url": "https://orcid.org/" },
          
        
          
            { "name": "Rachael Huxford", "url": "https://orcid.org/" },
          
        
          
            { "name": "Daniel Krentzel", "url": "https://orcid.org/0000-0002-6234-7259" },
          
        
          
            { "name": "Tong LI", "url": "https://orcid.org/0000-0002-8240-4476" },
          
        
          
            { "name": "Luca Marconato", "url": "https://orcid.org/" },
          
        
          
            { "name": "Matthew McCormick", "url": "https://orcid.org/0000-0001-9475-3756" },
          
        
          
            { "name": "Franziska Moos", "url": "https://orcid.org/" },
          
        
          
            { "name": "Filip Mroz", "url": "https://orcid.org/" },
          
        
          
            { "name": "Bugra Özdemir", "url": "https://orcid.org/" },
          
        
          
            { "name": "Benjamin Pavie", "url": "https://orcid.org/" },
          
        
          
            { "name": "Eric Perlman", "url": "https://orcid.org/" },
          
        
          
            { "name": "Maximilian Schulz", "url": "https://orcid.org/" },
          
        
          
            { "name": "Leonardo Schwarz", "url": "https://orcid.org/" },
          
        
          
            { "name": "Hannes M. Spitz", "url": "https://orcid.org/" },
          
        
          
            { "name": "David Stansby", "url": "https://orcid.org/" },
          
        
          
            { "name": "Fabio Steffen", "url": "https://orcid.org/" },
          
        
          
            { "name": "Szymon Stoma", "url": "https://orcid.org/" },
          
        
          
            { "name": "Flurin Sturzenegger", "url": "https://orcid.org/" },
          
        
          
            { "name": "Wouter-Michiel Vierdag", "url": "https://orcid.org/0000-0003-1666-5421" },
          
        
          
            { "name": "Jonas Windhager", "url": "https://orcid.org/" },
          
        
          
            { "name": "Kevin Yamauchi", "url": "https://orcid.org/" },
          
        
          
            { "name": "Igor Zubarev", "url": "https://orcid.org/" },
          
        
          
            { "name": "Josh Moore", "url": "https://orcid.org/0000-0003-4028-811X" },
          
        
          
            { "name": "Norman Rzepka", "url": "https://orcid.org/" },
          
        
          
            { "name": "Christian Tischer", "url": "https://orcid.org/" },
          
        
          
            { "name": "Vladimir Ulman", "url": "https://orcid.org/" },
          
        
          
            { "name": "Virginie Uhlmann", "url": "https://orcid.org/" }
          
        
        ]
      
    }
  ]
}