Meetings

Recent preprints

  • DBCLS BioHackathon 2024 Report for Project: Human Glycome Atlas

    As part of BioHackathon 2024, we report on our analysis of the tools this group reviewed to implement a new knowledgebase, TOHSA, for the Human Glycome Atlas (HGA) Project. In particular, we focus on our experience integrating the QLever framework, a promising Semantic Web tool for handling “Triple Stores” and SPARQL technologies, with the aim of creating a reliable and performant Semantic Knowledge-Base (Infrastructure and Portal) for TOHSA. QLever highlights the ongoing relevance and potential of these technologies to deliver scalable and reliable solutions. It was encouraging to see that “Triple Store” and SPARQL implementations continue to be developed for the community, and that increasingly useful, performant, scalable and reliable open-source software is becoming available. We also carried out a general review and comparison of relevant Semantic Web frameworks for our use case.
  • The Plant Breeding Ontology (PBO): towards an ontology for the plant breeding community

    The need to standardize the language used within a community has been recognized as one of the major requirements for better integration of data and their further analysis. The plant breeding community uses a highly specialized language, which has evolved with new technologies and the needs of its end users (e.g. farmers). The community is dispersed all over the world. Therefore, translation of the most commonly used terms has always been a key asset for accomplishing its objectives as well as those of its collaborators. Here, we present PBO (Plant Breeding Ontology), an ontology for the plant breeding community which captures more than 2200 entries, of which 80 represent the core terms. PBO provides translations in 8 languages: English (main language), Spanish, French, Dutch, German, Japanese, Catalan and Thai, together with definitions, synonyms, derived terms and usage examples. PBO has been built partly manually and partly semi-automatically.
  • DBCLS BioHackathon 2024 report: Everything about workflow and container

    Workflow engines are now widely used for genome analysis workflows. However, building and executing these workflows remains difficult in several respects. Examples of such difficulties include: How do we develop workflows in workflow languages such as Common Workflow Language (CWL), Snakemake, Nextflow, and others? How do we integrate workflows with containers such as Docker, Singularity, and Podman? How do we integrate workflows with job schedulers such as Slurm and GridEngine? Our group solved these problems with the following activities. First, we cooperated with other groups to develop their workflows and to integrate those workflows with containers. Second, we developed and improved workflow ecosystems to remove the barriers to developing and executing workflows. These ecosystems include workflow executors, specifications of workflow languages, and workflow-related tools. This paper reports what we did during the DBCLS BioHackathon 2024.
  • Ontologies for single-cell experiments

    Research data management is becoming increasingly important in the scientific community. A critical challenge in this field is making research data FAIR (findable, accessible, interoperable and reusable; Wilkinson et al., 2016). Metadata plays a vital role in this challenge as it allows researchers to accurately understand and recreate experiments. To tackle this challenge, various approaches are being taken towards this goal, including the development of domain-overarching and domain-specific standards. In the different scientific communities, multiple general, as well as domain-specific minimum information standards have been developed, such as MIAPPE (Ćwiek-Kupczyńska, 2016), the minimum information about a plant phenotyping experiment, MIAME (H. Brazma A., 2001), the minimum information about a microarray experiment, and MINSEQE (B. Brazma A., 2012), the minimum information about a high-throughput sequencing experiment. These standards are designed to describe specific types of experiments. Recently, a minimum information standard for single-cell experiments, minSCe (Minimum Information about a Single-Cell Experiment), has been introduced (Füllgrabe et al., 2020). However, it is not yet widely applied. Minimum information standards are an important part of the solution and should be built upon. In addition, the use of controlled vocabularies and ontology terms is also essential. Ontology terms have a persistent identifier, an expressive name and a curated definition. Using these terms enables different researchers to understand and recreate annotated experiments. In this BioHackathon Europe project, we propose to expand biological, experimental and technical metadata schema as well as ontologies for single-cell experiments across domains with a focus on transcriptomics. This will facilitate the sharing and reuse of single-cell data and promote collaboration among researchers in different domains. Our goal is to improve data management practices and enhance the reproducibility of single-cell research.
  • An analysis of sex ratios using a biodiversity data cube

    This investigation uses biodiversity data cubes derived from the datasets mobilised by the Global Biodiversity Information Facility (GBIF) to analyse sex ratios of ducks across Europe. Encompassing over 4 million occurrences extracted from nearly 5000 datasets, this study elucidates sex distribution patterns across various species, focussing on temporal and spatial dynamics. The aim of this study is to highlight the availability of open sex data and its potential usefulness in research and monitoring of sex ratios of wild organisms, particularly in sexually dimorphic species.
  • VIB Hackathon on spatial omics tools and methods

    During a three-day hackathon, work was performed on various topics within the field of spatial omics data analysis. The topics were organized in five workgroups and included benchmarking, pipelines, spatial transcriptomics, spatial proteomics, spatial multi-omics and cell-cell communication. Most tools and methods were considered in the context of the Python ecosystem for spatial (SpatialData) and single-cell (scverse) data analysis.
  • Phenological Diversity Trends with Remote Sensing Datacubes

    During the 2024 B-Cubed Hackathon, we extended the R package “rasterdiv” by incorporating Time-Weighted Dynamic Time Warping (TWDTW) into the package’s pre-existing paRao() function for the calculation of the parametric Rao’s Quadratic Diversity (Rao’s Q) index. This expands the user’s ability to assess biodiversity trends when using time series of Earth Observations. Biodiversity indices like Shannon’s H do not consider spatio-temporal dynamics, and others (e.g. Rao’s Q) only incorporate geographic distance between observations, often leaving phenological variation overlooked. By integrating TWDTW into the paRao() function, users can assess different facets of an ecosystem’s biodiversity by incorporating phenological differences among its plant communities. This is also valuable for distinguishing between natural habitats that follow a seasonal phenological trend and artificial land cover types, which may lack phenological changes. Previous studies have also found that the time weighting of TWDTW makes it possible to discern different floral community types which could otherwise be misclassified as the same with traditional Dynamic Time Warping (DTW). To evaluate the efficacy of TWDTW within the paRao() function, we compared the ability of the TWDTW Rao’s Q index against other biodiversity indices at classifying the different plant communities in a disturbed grassland in Calabria, Italy. Our study used a Plant Phenological Index (PPI) time series from the Sentinel-2 satellite network. The results indicated that accounting for phenological cycles can filter out artefacts and better distinguish habitats with differing plant species diversity. This improves the ability to assess ecosystem changes through space and time, providing a more comprehensive understanding of biodiversity dynamics, and the ability to gauge the resilience of different vegetation patches. We conclude that the inclusion of plant phenology in biodiversity assessment is necessary, and that our modifications to paRao() will be valuable to facilitate the accurate detection and description of ecosystem trends in response to our changing environment.
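
The idea behind the abstract above can be illustrated with a minimal sketch. Rao's Q is the abundance-weighted mean pairwise distance, Q = Σᵢ Σⱼ dᵢⱼ pᵢ pⱼ; here every pixel in a window is assumed to carry equal weight 1/N, and each pixel is a short time series. The sketch is written in Python rather than R and is not the rasterdiv or paRao() implementation: the function names, the logistic penalty and its `steepness` and `midpoint` parameters are hypothetical, chosen only to show how a time penalty changes the distance that feeds Rao's Q.

```python
import math
from itertools import combinations


def dtw_distance(a, b, time_weight=None):
    """DTW cost between two numeric series sampled at the same time steps.

    time_weight (hypothetical helper) maps the index gap |i - j| to a penalty
    that is added to each local cost; None gives plain DTW.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            if time_weight is not None:
                local += time_weight(abs(i - j))
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def logistic_time_weight(steepness=1.0, midpoint=2.0):
    """Simplified logistic penalty on the time gap (assumed form and defaults)."""
    def weight(gap):
        return 1.0 / (1.0 + math.exp(-steepness * (gap - midpoint)))
    return weight


def rao_q(series, distance):
    """Rao's Q for equally weighted observations: sum of d_ij * (1/N) * (1/N)."""
    n = len(series)
    total = sum(distance(series[i], series[j]) for i, j in combinations(range(n), 2))
    # Each unordered pair appears twice in the double sum; d_ii is taken as 0.
    return 2.0 * total / (n * n)


# Two pixels with the same peak, shifted by one time step (toy values).
early = [0.0, 1.0, 0.0, 0.0]
late = [0.0, 0.0, 1.0, 0.0]


def weighted(a, b):
    return dtw_distance(a, b, logistic_time_weight())


plain_q = rao_q([early, late], dtw_distance)
weighted_q = rao_q([early, late], weighted)
print(plain_q, weighted_q)
```

In this toy case plain DTW warps the shifted peak away, so the two pixels look identical, while the time-penalised distance keeps them apart. That is the kind of phenological difference the abstract describes as being overlooked otherwise.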