Meetings

Recent preprints

  • BioHackJP 2023 Report R1: Improving phenotype ontology interoperability

    Ontologies play a crucial role in data management, and in the life sciences they have been indispensable for decades because the complexity of life science data requires rigor. Biomedical ontologies often undergo change and improvement; disease and phenotype ontologies, for example, develop constantly along with our scientific understanding. Ontology mapping plays a key role in bridging the gap between ontologies and annotated datasets, and thus in semantically enabling applications and datasets to retrieve insights and improve interoperability. To implement a sophisticated search supported by semantics, interoperability that addresses cross-disciplinary needs is crucial. In this paper we focus on different aspects of the interoperability of ontologies, especially in the phenotype and disease domain, and on how they could be improved. During BioHackJP 2023, a variety of approaches were discussed and evaluated. We report an overview of the results of each investigation, covering (1) Linguistic and Social Interoperability, (2) Technical and Structural Interoperability, (3) Ontology Alignments and Mappings, (4) Use of Large Language Models (LLMs), and (5) Model Mice Exploration, and we discuss future work to address these challenges.
  • BioHackJP 2023 Report R1: Mapping human genome variations to their mouse counterparts for identifying disease model mouse strains

    In disease model mouse strains used for human disease studies, information on genomic variations is essential for elucidating the relationship between haplotypes and disease susceptibility. To select a disease model mouse appropriately, it is crucial to identify mouse variants with the same effect as disease-causing variants in humans. At BioHackathon Japan 2023, we focused on nucleotide variants involved in amino acid substitutions. We developed an API that matches mouse variants from the MoG+ database to human variants within gene regions defined by HGNC identifiers or symbols. After the hackathon, we will map non-coding variants in addition to coding variants. The outcomes of our variant mapping will be presented as links connecting the comprehensive human variation database, TogoVar, and the model mouse genome database, MoG.
  • BioHackEU23 report: Enabling FAIR Digital Objects with RO-Crate, Signposting and Bioschemas

    As part of BioHackathon Europe 2023, we report here on the progress of hackathon project #15: “Enabling FAIR Digital Objects with RO-Crate, Signposting and Bioschemas”. We added Signposting to three existing resources and made a Chrome browser extension to show Signposting headers. We added RO-Crate to two existing resources and explored making a hybrid FDO using both a Handle PID Record and a Signposting/RO-Crate approach.
  • Benchmarks for Bioinformatics Workflow Bake Offs

    This BioHackathon Project focused on establishing a “Great Bake Off of Bioinformatics Workflows” by developing workflow-level benchmarks for evaluating tools in computational tasks. Initially tested in proteomics, the project expanded to genomics and metabolomics. Collaborating with ELIXIR Implementation Studies, the team created rudimentary benchmarks, aiming for formalization before production use. The project consolidated efforts to produce a minimum set of workflow-specific benchmarks, aligning tools and workflow definitions. Short-term goals include drafting benchmarks with examples, while long-term plans involve implementing them in the Workflomics project and Proteomics Community ELIXIR Implementation Studies for community sharing.
  • BioHackEU23 report: Enabling continuous RDM using Annotated Research Contexts with RO-Crate profiles for ISA

    A prevailing paradigm in Research Data Management (RDM) is to publish research datasets in designated archives upon conclusion of a research process. However, it is beneficial to abandon the notion of final or static data artifacts and instead adopt a continuous approach to working with research data, where data is constantly shared, versioned, and updated. This immutable yet evolving perspective allows for the application of existing technologies and processes from software engineering, such as continuous integration, release practices, and version management, which are backed by decades of experience and adaptable to RDM. To facilitate this, we propose the Annotated Research Context (ARC), a data and metadata layout convention based on the well-established ISA model for metadata annotation and implemented using Git repositories. ARCs are amenable to frequent, lightweight data management operations, such as (meta)data validation and transformation. The Omnipy Python library is designed to help develop stepwise validated (meta)data transformations as scalable data flows that can be incrementally designed, updated, and rerun as requirements or data evolve. To demonstrate the concept of continuous RDM, we will use Omnipy to define and orchestrate Git-backed CI/CD (Continuous Integration/Continuous Delivery) data flows to convert ISA metadata present in ARCs into validated RO-Crate representations adhering to the Bioschemas convention. An RO-Crate package combines the actual research data with its metadata description. Downstream, this allows semantic interpretation by Galaxy, e.g. for workflow execution, as well as machine-readable data access and data harvesting for search engines such as FAIDARE.
  • BioHackEU23 report: Extending interoperability of experimental data using modular queries across biomedical resources

    This report provides an overview of the significant accomplishments achieved during the ELIXIR Biohackathon 2023 under Project 17: “Extending interoperability of experimental data using modular queries across biomedical resources”. The project diligently addressed four key aspects: the expansion of data resources, the creation of knowledge graphs, advancements in data visualization, and the development of a use-case-driven pipeline. The collective efforts during the Biohackathon aimed to enhance the integration and accessibility of experimental data across diverse biomedical resources by developing a tool named BioDataFuse.
  • The fourth annual Carnegie Mellon Libraries hackathon for biomedical data management, knowledge graphs, and deep learning

    In October 2023, a group of 44 scientists hailing from several U.S. states, Canada, Poland, and Switzerland came together for a hybrid in-person and virtual hackathon. The event was jointly hosted by Carnegie Mellon University Libraries and DNAnexus, a California-based cloud computing and bioinformatics company. This collaborative effort revolved around the theme of “Data Management and Graph Extraction for Large Transformer Models in the Biomedical Space.” In the spirit of fostering collaboration, participants organized themselves into five teams, which ultimately resulted in the successful completion of four hackathon projects. These projects encompassed a wide range of topics, from detecting features contributing to virus susceptibility to validating models using knowledge graphs. Repositories for the hackathon projects are available at https://github.com/collaborativebioinformatics. We hope that the insights and experiences shared by these teams, as detailed in the following manuscript, will prove valuable to the broader scientific community.