Meetings

Recent preprints

Dec 7, 2021

Network analysis of specimen co-collection
We took data on the collectors of specimens from natural history collections. Co-collectors of specimens were extracted from the data and a network of co-collection was constructed. This network was used to analyze the age and gender balance of collectors and how this has changed with time. Men outnumber women in the network, but women participation increases with time, as are the all female pairs of collectors. Most collector pairs have less than 50 years age difference and it is suggested that co-collections above this age difference should be checked for errors. This project has proven the value of analyzing co-collection data, but also highlighted the many additional avenues for future research on this subject. less than 1 minute read
Oct 25, 2021

ELIXIR Software Management Plan for Life Sciences
Data Management Plans are now considered a key element of Open Science. They describe the data management life cycle for the data to be collected, processed and/or generated within the lifetime of a particular project or activity. A Software Management Plan (SMP) plays the same role but for software. Beyond its management perspective, the main advantage of an SMP is that it both provides clear context to the software that is being developed and raises awareness. Although there are a few SMPs already available, most of them require significant technical knowledge to be effectively used. ELIXIR has developed a low-barrier SMP, specifically tailored for life science researchers, aligned to the FAIR Research Software principles. Starting from the Four Recommendations for Open Source Software, the ELIXIR SMP was iteratively refined by surveying the practices of the community and incorporating the received feedback. Currently available as a survey, future plans of the ELIXIR SMP include a human- and machine-readable version, that can be automatically queried and connected to relevant tools and metrics within the ELIXIR Tools ecosystem and beyond. less than 1 minute read
Sep 30, 2021

Rapid metagenomic workflow using annotated 16S RNA dataset
Thanks to the dramatic progress in DNA sequencing technology, it is now possible to decipher sequences in a mixed state. Therefore, the subsequent data analysis has become important, and the demand for metagenomic analysis is very high. Existing metagenomic data analysis workflows for 16S amplicon sequences have been mainly focused on sequences from short reads sequencers, while researchers cannot apply those workflows for sequences from long read sequencers. A practical metagenome workflow for long read sequencers is therefore really needed. In a domestic version of the BioHackathon called BH21.8 held in Aomori, Japan (23-27 August 2021), we first discussed the reproducible workflow for metagenome analysis. We then designed a rapid metagenomic workflow using annotated 16S RNA dataset (Ref16S) and the practical use case for using the workflow developed. Finally, we discussed how to maintain Ref16S and requested Life Science Database Archive in JST NBDC to archive the dataset. After a stimulus discussion in BH21.8, we could clarify the current issues in the metagenomic data analysis. We also could successfully construct a rapid workflow for those data specially from long reads by using newly constructed Ref16S. 1 minute read
Aug 11, 2021

The COVID-19 epidemiology and monitoring ontology
The novel COVID-19 infectious disease emerged and spread, causing high mortality and morbidity rates worldwide. In the OBO Foundry, there are more than one hundred ontologies to share and analyse large-scale datasets for biological and biomedical sciences. However, this pandemic revealed that we lack tools for an efficient and timely exchange of this epidemiological data which is necessary to assess the impact of disease outbreaks, the efficacy of mitigating interventions and to provide a rapid response. In this study we present our findings and contributions for the bio-ontologies community. less than 1 minute read
Aug 10, 2021

Measuring outcomes and impact from the BioHackathon Europe
One of the recurring questions when it comes to BioHackathons is how to measure their impact, especially when funded and/or supported by the public purse (e.g., research agencies, research infrastructures, grants). In order to do so, we first need to understand the outcomes from a BioHackathon, which can include software, code, publications, new or strengthened collaborations, along with more intangible effects such as accelerated progress and professional and personal outcomes. In this manuscript, we report on three complementary approaches to assess outcomes of three BioHackathon Europe events: survey-based, publication-based and GitHub-based measures. We found that post-event surveys bring very useful insights into what participants feel they achieved during the hackathon, including progressing much faster on their hacking projects, broadening their professional network and improving their understanding of other technical fields and specialties. With regards to published outcomes, manual tracking of publications from specific servers is straightforward and useful to highlight the scientific legacy of the event, though there is much scope to automate this via text-mining. Finally, GitHub-based measures bring insights on some of the software and data best practices (e.g., license usage) but also on how the hacking activities evolve in time (e.g., activities observed in GitHub repositories prior, during and after the event). Altogether, these three approaches were found to provide insightful preliminary evidence of outcomes, thereby supporting the value of financing such large-scale events with public funds. 1 minute read
Jun 29, 2021

SnpReportR: A Tool for Clinical Reporting of RNAseq Expression and Variants
With the increasing availability of next-generation sequencing (NGS), patients and non-specialist health care professionals are obtaining their genomic information without sufficient bioinformatics skills to analyze and interpret the data. In January 2021, four teams of scientists,clinicians, and developers from around the world worked collaboratively in a virtual hackathon to create a framework for the automated analysis and interpretation of RNA sequencing data inthe clinic. Here, we present SnpReportR: A Tool for Clinical Reporting of RNAseq Expression and Variants aimed for use by clinicians and others without in-depth knowledge of genetics. less than 1 minute read
Jun 22, 2021

Exploiting Bioschemas Markup to Populate IDPcentral
One of the goals of the ELIXIR Intrinsically Disordered Protein (IDP) community is create a registry called IDPcentral. The registry will aggregate data contained in the community’s specialist data sources such as DisProt, MobiDB, and Protein Ensemble Database (PED) so that proteins that are known to be intrinsically disordered can be discovered; with summary details of the protein presented, and the specialist source consulted for more detailed data. At the ELIXIR BioHackathon-Europe 2020, we aimed to investigate the feasibility of populating IDPcentral harvesting the Bioschemas markup that has been deployed on the IDP community data sources. The benefit of using Bioschemas markup, which is embedded in the HTML web pages for each protein in the data source, is that a standard harvesting approach can be used for all data sources; rather than needing bespoke wrappers for each data source API. We expect to harvest the markup using the Bioschemas Markup Scraper and Extractor (BMUSE) tool that has been developed specifically for this purpose. The challenge, however, is that the sources contain overlapping information about proteins but use different identifiers for the proteins. After the data has been harvested, it will need to be processed so that information about a particular protein, which will come from multiple sources, is consolidated into a single concept for the protein, with links back to where each piece of data originated.As well as populating the IDPcentral registry, we plan to consolidate the markup into a knowledge graph that can be queried to gain further insight into the IDPs. 1 minute read

Meetings

Recent preprints

Network analysis of specimen co-collection

ELIXIR Software Management Plan for Life Sciences

Rapid metagenomic workflow using annotated 16S RNA dataset

The COVID-19 epidemiology and monitoring ontology

Measuring outcomes and impact from the BioHackathon Europe

SnpReportR: A Tool for Clinical Reporting of RNAseq Expression and Variants

Exploiting Bioschemas Markup to Populate IDPcentral