Meetings
BioHackSWAT4HCLS 2025
BioHackathon Europe 2025
4th BioHackathon Germany
DBCLS BioHackathon 2025
ELIXIR INTOXICOM
Recent preprints
-
Connecting molecular sequences to their voucher specimens
When sequencing molecules from an organism it is standard practice to create voucher specimens. This ensures that the results are repeatable and that the identification of the organism can be verified. It also means that the sequence data can be linked to a whole host of other data related to the specimen, including traits, other sequences, environmental data, and geography. It is therefore critical that explicit, preferably machine-readable, links exist between voucher specimens and sequences. However, such links do not exist in the databases of the International Nucleotide Sequence Database Collaboration (INSDC). If it were possible to create permanent bidirectional links between specimens and sequences, it would not only make data more findable, but would also open new avenues for research. In the BioHackathon we built a semi-automated workflow to take specimen data from the Meise Herbarium and search for references to those specimens in the European Nucleotide Archive (ENA). We achieved this by matching data elements of the specimen and sequence records and by adding a “human-in-the-loop” process whereby possible matches could be confirmed. Although we found that it was possible to discover and match sequences to their vouchers in our collection, we encountered many problems of data standardization, missing data and errors. These problems make the process unreliable and unsuitable for rediscovering all the possible links that exist. Ultimately, improved standards and training would remove the need for retrospective relinking of specimens with their sequences. Therefore, we make some tentative recommendations for how this could be achieved in the future. -
Linking PubDictionaries with UniBioDicts to support Community Curation
One of the many challenges that biocurators face is the continuous evolution of ontologies and controlled vocabularies and their lack of coverage of biological concepts. To help biocurators annotate new information that cannot yet be covered with terms from authoritative resources, we produced an update of PubDictionaries: a resource of publicly editable, simple-structured dictionaries, accessible through a dedicated REST API. PubDictionaries was equipped with both an enhanced API and a new software client that connects it to the Unified Biological Dictionaries (UBDs) uniform data exchange format. This client enables efficient search and retrieval of ad hoc created terms, and easy integration with tools that further support the curator’s specific annotation tasks. A demo that combines the Visual Syntax Method (VSM) interface for general-purpose knowledge formalization with this new PubDictionaries-powered UBD client shows that it is now easy to incorporate user-created PubDictionaries terminologies into biocuration tools. -
Progress on Data Stewardship Wizard during BioHackathon Europe 2020
We used the Virtual BioHackathon Europe 2020 to work on a number of projects to improve the Data Stewardship Wizard: (a) we took first steps towards analysing what is needed to make all questions and answers machine actionable; (b) we worked on supporting the Horizon 2020 Data Management Plan Template; (c) we added several new integrations, e.g. to ROR and Wikidata; (d) we made a draft plan for supporting multiple languages; and (e) we implemented many improvements to the knowledge model that had been suggested to us in the past. Shortly after the BioHackathon, the adapted knowledge model, the new integrations and the H2020 template were made available to all users of the wizard. -
Disease and pathway maps for Rare Diseases
In this article we present a workflow for the construction of prototype rare disease maps based on the phenotypic description of a rare disease. We use stable disease and phenotype identifiers to i) retrieve disease-associated genes and genetic variants, ii) identify relevant mechanistic components from pathways, disease maps and interaction repositories, and iii) assemble these components into a single diagram, available for visualisation and pathway analysis. The workflow allows the construction of prototype diagrams representing mechanisms related to a rare disease, useful for further refinement and data interpretation. -
TogoEx: the integration of gene expression data
Report on the group ‘TogoEx’ in the BioHackathon Europe 2019. -
Characterization of Potential Drug Treatments for COVID-19 using Twitter
Since the classification of COVID-19 as a global pandemic, there have been many attempts to treat and contain the virus. Although there is no specific antiviral treatment recommended for COVID-19, there are several drugs that can potentially help with symptoms. In this work, we mined a large Twitter dataset of 280 million tweets of COVID-19 chatter to identify discourse around potential treatments. Although this seems a straightforward task, the informal nature of language use on Twitter means that machine learning methods are needed to aid in it, as we demonstrate. By applying these methods we are able to recover almost 21% more data than with traditional methods. -
Determining a novel feature-space for SARS-CoV-2 sequence data
The pandemic spread of SARS-CoV-2 and its ability to reinfect a cured subject, among its other damaging characteristics, took everybody by surprise. A global collaborative scientific effort was urgently needed to bring together experts from different areas of medicine and data science. Such a platform was provided by the COVID19 Virtual BioHackathon, organized from the 5th to the 11th of April, 2020, to work on pressing related issues ranging from text mining to genomics. Under the “Machine learning” track, we determined the optimal k-mer length for feature extraction, constructed continuous distributed representations for protein sequences to create phylogenetic trees in an alignment-free manner, and clustered predicted MHC class I and II binding affinity to aid in vaccine design. All the related work is available in a GitHub repository under an MIT license for future research.
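
For readers unfamiliar with k-mer feature extraction, the idea mentioned in the abstract above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `kmer_counts` and `kmer_feature_vector`, the DNA alphabet `ACGT`, and the choice to ignore k-mers containing other characters are all assumptions made for the example. It shows how a sequence is turned into a fixed-length vector for a given k; how the optimal k was chosen in the preprint is not covered here.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_feature_vector(sequence, k, alphabet="ACGT"):
    """Return relative k-mer frequencies in a fixed, lexicographic order.

    The vector always has len(alphabet) ** k entries, so sequences of
    different lengths map to the same feature space. K-mers containing
    characters outside the alphabet (an assumption of this sketch) are
    ignored.
    """
    counts = kmer_counts(sequence, k)
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[kmer] for kmer in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]


if __name__ == "__main__":
    example = "ACGTAC"  # toy sequence, not real SARS-CoV-2 data
    print(kmer_counts(example, 2))
    print(len(kmer_feature_vector(example, 2)))
```

Larger values of k give more specific features but a vector that grows as 4^k for nucleotides, which is one reason the choice of k matters for downstream models.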