Genome Annotation and Other Post-Assembly Workflows for the Tree of Life
Rapid advances in genome sequencing technologies have resulted in an explosion of reference-quality genome assemblies across the tree of life. While these resources will be invaluable for species and biodiversity conservation, their application is limited when they lack accurate annotations of their functional elements. The European Reference Genome Atlas (ERGA) is the European node of the Earth BioGenome Project (EBP) and aims to share resources and knowledge to create fully annotated reference genomes. ERGA strives to do this in a distributed manner, bringing together researchers from across the world with common goals and understandings. At BioHackathon Europe 2023, we came together to construct and test tools, pipelines and workflows for annotating protein-coding regions in assembled genomes. We specifically aimed to evaluate (a) their performance across a wide variety of non-model organisms and (b) the “usability” of the pipelines for newcomers to annotation. This work required installing and running tools in a number of computational environments and infrastructures, sharing both genomic resources and expertise between researchers from a range of institutes, and evaluating the performance of annotation workflows and the input data required to achieve a high-quality genome annotation. Here we present the results of more than 20 researchers across 8 time zones working towards a robust implementation of genome annotation workflows for eukaryotic organisms.