Meetings

Recent preprints

  • An iNaturalist-Pl@ntNet-workflow to identify plant-pollinator interactions – a case study of Isodontia mexicana

    Research benefits not only from the primary observations of species occurrences but also from the additional, mostly unintentionally documented information in citizen science observations, the so-called secondary data. In this study, we investigated the plant-pollinator interactions of the Mexican grass-carrying wasp Isodontia mexicana using a data exchange workflow from two globally operating citizen science platforms, iNaturalist and Pl@ntNet. Images from iNaturalist observations of the target species were used to query the Pl@ntNet application to identify possible plant species present in the pictures. At the same time, botanists manually identified the plants at family, genus and species level from the images. The goals were to calibrate Pl@ntNet’s accuracy in relation to this workflow, to update the list of plant species that I. mexicana visits, and to investigate colour preferences and other interactions of the wasp recorded by citizen scientists. Although the list of plant species visited could be confirmed and expanded, identifying plants from images that predominantly show an insect proved similarly difficult for both experts and the Pl@ntNet app. The presented approach can nevertheless help to get an overview or first insight into species interactions and generate more specific research questions.
  • Bioschemas data harvesting project report

    The promise of Bioschemas is that it makes consuming data from multiple resources more straightforward. However, this promise has not been tested by conducting a large-scale harvest of deployed markup and making it available for others to reuse. Therefore, the goal of this hackathon project is to harvest a collection of Bioschemas markup from a number of different sites listed on the Bioschemas live deploys page using the Bioschemas Markup Scraper and Extractor (BMUSE). The harvested data will be made available for others and loaded into a triplestore to allow for further exploration.
  • DS Wizard Meets DAISY: A Romance Solving Data Protection Requirements in Data Management Planning

    This report summarises our activities and achievements in integrating the Data Stewardship Wizard (DSW) and Data Information System (DAISY) tools during the ELIXIR BioHackathon Europe 2021. As a data information system for GDPR compliance, DAISY is focused on a single goal – gathering all information required for GDPR accountability of biomedical research projects. On the other hand, DSW is very flexible and can be used beyond data management planning. We worked on the integration between both tools on two fronts. Firstly, we created a new Knowledge Model in DSW together with a document output template to be able to generate a data protection impact assessment (DPIA). Secondly, we introduced a new integration type between projects in DSW and DAISY that allows the querying of DAISY data upon document generation in DSW. Both of these independent activities brought successful results that were polished and published after the actual BioHackathon. Finally, we provide the related materials as an on-demand training course in the ELIXIR eLearning Platform.
  • Network analysis of specimen co-collection

    We took data on the collectors of specimens from natural history collections. Co-collectors of specimens were extracted from the data and a network of co-collection was constructed. This network was used to analyze the age and gender balance of collectors and how this has changed with time. Men outnumber women in the network, but women's participation increases with time, as does the number of all-female pairs of collectors. Most collector pairs have an age difference of less than 50 years, and it is suggested that co-collections above this age difference should be checked for errors. This project has proven the value of analyzing co-collection data, but also highlighted the many additional avenues for future research on this subject.
  • ELIXIR Software Management Plan for Life Sciences

    Data Management Plans are now considered a key element of Open Science. They describe the data management life cycle for the data to be collected, processed and/or generated within the lifetime of a particular project or activity. A Software Management Plan (SMP) plays the same role but for software. Beyond its management perspective, the main advantage of an SMP is that it both provides clear context to the software that is being developed and raises awareness. Although there are a few SMPs already available, most of them require significant technical knowledge to be effectively used. ELIXIR has developed a low-barrier SMP, specifically tailored for life science researchers, aligned to the FAIR Research Software principles. Starting from the Four Recommendations for Open Source Software, the ELIXIR SMP was iteratively refined by surveying the practices of the community and incorporating the received feedback. The ELIXIR SMP is currently available as a survey; future plans include a human- and machine-readable version that can be automatically queried and connected to relevant tools and metrics within the ELIXIR Tools ecosystem and beyond.
  • Rapid metagenomic workflow using annotated 16S RNA dataset

    Thanks to the dramatic progress in DNA sequencing technology, it is now possible to decipher sequences in a mixed state. Therefore, the subsequent data analysis has become important, and the demand for metagenomic analysis is very high. Existing metagenomic data analysis workflows for 16S amplicon sequences have mainly focused on sequences from short-read sequencers, and researchers cannot apply those workflows to sequences from long-read sequencers. A practical metagenome workflow for long-read sequencers is therefore needed. In a domestic version of the BioHackathon called BH21.8, held in Aomori, Japan (23-27 August 2021), we first discussed a reproducible workflow for metagenome analysis. We then designed a rapid metagenomic workflow using an annotated 16S RNA dataset (Ref16S) and a practical use case for the workflow. Finally, we discussed how to maintain Ref16S and asked the Life Science Database Archive at JST NBDC to archive the dataset. After a stimulating discussion at BH21.8, we were able to clarify the current issues in metagenomic data analysis. We also successfully constructed a rapid workflow for such data, especially from long reads, using the newly constructed Ref16S.
  • The COVID-19 epidemiology and monitoring ontology

    The novel COVID-19 infectious disease emerged and spread, causing high mortality and morbidity rates worldwide. In the OBO Foundry, there are more than one hundred ontologies to share and analyse large-scale datasets for biological and biomedical sciences. However, this pandemic revealed that we lack tools for an efficient and timely exchange of the epidemiological data needed to assess the impact of disease outbreaks and the efficacy of mitigating interventions, and to provide a rapid response. In this study we present our findings and contributions for the bio-ontologies community.
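
The co-collection network described under "Network analysis of specimen co-collection" above can be illustrated with a minimal sketch. This is not the authors' code: the record layout, the collector names and birth years, and the function names `co_collection_edges` and `suspicious_pairs` are all hypothetical. Only the 50-year age gap used for flagging comes from the abstract.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: one entry per specimen, listing (collector, birth year).
specimens = [
    [("A. Smith", 1850), ("B. Jones", 1855)],
    [("A. Smith", 1850), ("B. Jones", 1855)],
    [("A. Smith", 1850), ("C. Brown", 1920)],
    [("D. Green", 1900)],  # a single collector contributes no edge
]


def co_collection_edges(specimens):
    """Count how often each unordered pair of collectors shares a specimen."""
    edges = Counter()
    for collectors in specimens:
        names = sorted({name for name, _ in collectors})
        for pair in combinations(names, 2):
            edges[pair] += 1
    return edges


def suspicious_pairs(specimens, max_age_gap=50):
    """Return pairs whose birth years differ by more than max_age_gap years."""
    birth = {name: year for collectors in specimens for name, year in collectors}
    return sorted(
        pair
        for pair in co_collection_edges(specimens)
        if abs(birth[pair[0]] - birth[pair[1]]) > max_age_gap
    )


edges = co_collection_edges(specimens)
flagged = suspicious_pairs(specimens)
print(edges)
print(flagged)
```

Edge weights record how many specimens a pair collected together, and the flagged pairs are candidates for manual checking rather than confirmed errors.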