Meetings

Recent preprints

  • Genome Annotation and Other Post-Assembly Workflows for the Tree of Life

    Rapid advances in genome sequencing technologies have resulted in an explosion of reference-quality genome assemblies across the tree of life. While these resources will be invaluable towards goals of species and biodiversity conservation, their application is limited when they lack accurate annotations of their functional elements. The European Reference Genome Atlas (ERGA) is the European node of the Earth Biogenome Project (EBP) and aims to share resources and knowledge to create fully-annotated reference genomes. ERGA strives to do this in a distributed manner, bringing together researchers from across the world, with common goals and understandings. In the BioHackathon Europe 2023, we came together to construct and test tools, pipelines and workflows for annotating protein-coding regions in assembled genomes. We specifically aimed to evaluate (a) the performance in a wide variety of non-model organisms and (b) the “usability” of pipelines for newcomers to annotation. This work required installing and implementing tools in a number of computational environments and infrastructures, sharing both genomic resources and expertise between researchers from a range of institutes, and evaluating the performance of annotation workflows and the input data required to achieve a high-quality genome annotation. Here we present the results of over 20 researchers in 8 time-zones working towards a robust implementation of genome annotation workflows in eukaryotic organisms.
  • Improving Bioschemas creation and community adoption through process improvements, tool development, and advancing compliance to FAIR standards

    Scientists in many communities now produce large volumes of diverse datasets, which they need to combine to answer new scientific questions. To do so, these diverse computational resources first need to be found by search engines. Bioschemas provides a simple and lightweight mechanism to annotate online resources in a standardized way and expose key metadata. To improve the accessibility and value of Bioschemas to existing and emerging communities, we aim to develop an automated system to assess the adoption of Bioschemas, work with identified groups that have specific needs addressable by Bioschemas, address usability issues in the Bioschemas profile and type development process, and extend the reach of Bioschemas by making it available in a domain-agnostic manner.
  • Bioschemas Resource Index for Chem and Plants

    As part of the BioHackathon Europe 2023, we here report on the progress of the hacking team preparing a resource index and knowledge graph based on the JSON-LD Bioschemas markup from several resources in the life and natural sciences, predominantly from the fields of plant and (bio)chemistry research. This preliminary analysis will allow us to better understand how Bioschemas markup is currently used in these two communities, so we can take actions to improve guidelines and validation on both the Bioschemas markup side and the data providers' side. The lessons learnt will be useful for other communities as well. The ultimate goal is facilitating and improving interoperability across resources.
  • BioHackJP 2023 Report R1: Improving phenotype ontology interoperability

    Ontologies play a crucial role in data management, and in life science especially they have been indispensable for decades, as the complexity of life science data requires rigor. Biomedical ontologies often undergo change and improvement; disease and phenotype ontologies, for example, develop constantly along with our scientific understanding. Ontology mapping plays a key role in bridging the gap between ontologies and annotated datasets, and thus in semantically enabling applications and datasets to retrieve insights and improve interoperability. To implement a sophisticated search supported by semantics, interoperability that addresses cross-disciplinary needs is crucial. In this paper we focus on different aspects of the interoperability of ontologies, especially in the phenotype and disease domain, and on how they could be improved. During the BioHackJP 2023, a variety of approaches were discussed and evaluated. We give an overview of the results of each investigation, covering (1) Linguistic and Social Interoperability, (2) Technical and Structural Interoperability, (3) Ontology Alignments and Mappings, (4) Use of Large Language Models (LLMs), and (5) Model Mice Exploration, and we discuss future work to address these challenges.
  • BioHackJP 2023 Report R1: Mapping human genome variations to their mouse counterparts for identifying disease model mouse strains

    In disease model mouse strains used for human disease studies, information on genomic variations is essential for elucidating the relationship between haplotypes and disease susceptibility. To select a disease model mouse appropriately, it is crucial to identify mouse variants with the same effect as disease-causing variants in humans. In BioHackathon Japan 2023, we focused on nucleotide variants involved in amino acid substitutions. We developed an API that matches mouse variants from the MoG+ database to human variants within gene regions defined by HGNC identifiers or symbols. After the Hackathon, we will map non-coding variants in addition to coding variants. The outcomes of our variant mapping will be presented as links connecting the comprehensive human variation database, TogoVar, and the model mouse genome database, MoG.
  • BioHackEU23 report: Enabling FAIR Digital Objects with RO-Crate, Signposting and Bioschemas

    As part of the BioHackathon Europe 2023, we here report on the progress of the hackathon project #15: “Enabling FAIR Digital Objects with RO-Crate, Signposting and Bioschemas”. We added Signposting to three existing resources, and made a Chrome browser extension to show Signposting headers. We added RO-Crate to two existing resources, and explored making a hybrid FDO using both a Handle PID Record and a Signposting/RO-Crate approach.
  • Benchmarks for Bioinformatics Workflow Bake Offs

    This BioHackathon Project focused on establishing a “Great Bake Off of Bioinformatics Workflows” by developing workflow-level benchmarks for evaluating tools in computational tasks. Initially tested in proteomics, the project expanded to genomics and metabolomics. Collaborating with ELIXIR Implementation Studies, the team created rudimentary benchmarks, aiming for formalization before production use. The project consolidated efforts to produce a minimum set of workflow-specific benchmarks, aligning tools and workflow definitions. Short-term goals include drafting benchmarks with examples, while long-term plans involve implementing them in the Workflomics project and Proteomics Community ELIXIR Implementation Studies for community sharing.