Wednesday, June 21, 2017

2017 GBIF Ebbe Nielsen Challenge






For the third year, GBIF is running its Ebbe Nielsen Challenge. Developers and data scientists have three months to create and submit tools capable of liberating species records from open data repositories for scientific discovery and reuse. Here are some more details:

Background
This year's Challenge will seek to leverage the growth of open data policies among scientific journals and research funders, which require researchers to make the data underlying their findings publicly available. Adoption of these policies represents an important first step toward increasing openness, transparency and reproducibility across all scientific domains, including biodiversity-related research.

To abide by these requirements, researchers often deposit datasets in public open-access repositories. Potential users are then able to find and access the data through repositories as well as data aggregators like OpenAIRE and DataONE. Many of these datasets are already structured in tables that contain the basic elements of biodiversity information needed to build species occurrence records: scientific names, dates, and geographic locations, among others.

However, the practices adopted by most repositories, funders and journals do not yet encourage the use of standardized formats. This approach significantly limits the interoperability and reuse of these datasets. As a result, the wider reuse of data implied, if not stated, by many open data policies falls short, even in cases where open licensing designations (like those provided through Creative Commons) seem to encourage it.

The challenge
The 2017 GBIF Ebbe Nielsen Challenge seeks submissions that repurpose these datasets and adapt them into the Darwin Core Archive format (DwC-A), the interoperable and reusable standard that powers the publication of almost 800 million species occurrence records from the nearly 1,000 institutions worldwide now active in the GBIF network.

The 2017 Ebbe Nielsen Challenge will task developers and data scientists with creating web applications, scripts or other tools that automate the discovery and extraction of relevant biodiversity data from open data repositories. Such tools might generate datasets ready for publication on GBIF.org by:

  • Automating searches of open data available in public repositories
  • Effectively mining the information needed to generate checklists, species occurrence and sampling-event datasets (e.g. scientific names, date and location of occurrence, etc.) from datasets in these repositories
  • Mapping datasets’ column headings and/or contents to standardized Darwin Core terms
  • Routinely converting the reformatted data into the Darwin Core Archive format, ready for publication through GBIF.org
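To make the column-mapping step a little more concrete, here is a minimal Python sketch. It is not part of the Challenge materials: the synonym table is a hypothetical, hand-written assumption, and the sketch only renames recognised columns to Darwin Core terms (scientificName, eventDate, decimalLatitude, decimalLongitude, country, basisOfRecord). Packaging the result into an actual Darwin Core Archive (data file plus a meta.xml descriptor) is left out.

```python
import csv
import io

# Hypothetical lookup table: lower-cased source column headings -> Darwin Core terms.
# These synonyms are illustrative assumptions; a real tool would need a much larger,
# curated list (or fuzzy matching) to cope with arbitrary repository datasets.
HEADER_SYNONYMS = {
    "species": "scientificName",
    "scientific name": "scientificName",
    "taxon": "scientificName",
    "date": "eventDate",
    "collection date": "eventDate",
    "lat": "decimalLatitude",
    "latitude": "decimalLatitude",
    "lon": "decimalLongitude",
    "long": "decimalLongitude",
    "longitude": "decimalLongitude",
    "country": "country",
}


def map_headers(headers):
    """Return {source_heading: dwc_term} for every heading found in the synonym table."""
    mapping = {}
    for heading in headers:
        term = HEADER_SYNONYMS.get(heading.strip().lower())
        if term is not None:
            mapping[heading] = term
    return mapping


def to_dwc_occurrences(csv_text):
    """Rename recognised columns to Darwin Core terms and drop unrecognised ones."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapping = map_headers(reader.fieldnames or [])
    records = []
    for row in reader:
        record = {term: row[source] for source, term in mapping.items()}
        # GBIF expects basisOfRecord in occurrence data; "HumanObservation" is an
        # assumed default here, not something inferred from the source dataset.
        record["basisOfRecord"] = "HumanObservation"
        records.append(record)
    return records


if __name__ == "__main__":
    # A made-up example row; the "Notes" column has no mapping and is dropped.
    sample = "Species,Lat,Long,Date,Notes\nPuma concolor,-33.4,-70.6,2016-05-01,seen at dusk\n"
    print(to_dwc_occurrences(sample))
```

A real submission would also need date and coordinate normalisation, occurrence identifiers, and validation before anything is published through GBIF.org.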

Friday, June 16, 2017

Weekend reads

Hot off the press: more reading material from the DNA barcoding community. Not as many as last week, when I had a lot of catching up to do. Nevertheless, very interesting reads.

Thirty-four species of Culicidae are present in the UK, of which 15 have been implicated as potential vectors of arthropod-borne viruses such as West Nile virus. Identification of mosquito feeding preferences is paramount to the understanding of vector-host-pathogen interactions which, in turn, would assist in the control of disease outbreaks. Results are presented on the application of DNA barcoding for vertebrate species identification in blood-fed female mosquitoes in rural locations. Blood-fed females (n = 134) were collected in southern England from rural sites and identified based on morphological criteria. Blood meals from 59 specimens (44%) were identified as feeding on eight hosts: European rabbit, cow, human, barn swallow, dog, great tit, magpie and blackbird. Analysis of the cytochrome c oxidase subunit I mtDNA barcoding region and the internal transcribed spacer 2 rDNA region of the specimens morphologically identified as Anopheles maculipennis s.l. revealed the presence of An. atroparvus and An. messeae. A similar analysis of specimens morphologically identified as Culex pipiens/Cx. torrentium showed all specimens to be Cx. pipiens (typical form). This study demonstrates the importance of using molecular techniques to support species-level identification in blood-fed mosquitoes to maximize the information obtained in studies investigating host feeding patterns.

We used a 227-bp fragment of the mitochondrial gene cytochrome oxidase I (DNA "barcode") in conjunction with morphological data to study specimens of the Neotropical genus Orthocomotis Dognin, 1906, acquired from natural history collections. We examined over 20 species of Orthocomotis from 17 localities in Colombia, Ecuador, and Peru. The analysis identified 32 haplotypes among the 62 specimens and found no haplotypes shared among species. The molecular study revealed not only the usefulness of short COI sequences in discriminating among Orthocomotis species but also showed distinctness of four clusters which correspond to those based on morphological (genitalia) characters. Moreover, the molecular results suggest the occurrence of rapid speciation in Orthocomotis. We hypothesize that this may be linked to the great biodiversity of potential host plants in Neotropical ecosystems.

Taxonomic identification of pollen has historically been accomplished via light microscopy but requires specialized knowledge and reference collections, particularly when identification to lower taxonomic levels is necessary. Recently, next-generation sequencing technology has been used as a cost-effective alternative for identifying bee-collected pollen; however, this novel approach has not been tested on a spatially or temporally robust number of pollen samples. Here, we compare pollen identification results derived from light microscopy and DNA sequencing techniques with samples collected from honey bee colonies embedded within a gradient of intensive agricultural landscapes in the Northern Great Plains throughout the 2010-2011 growing seasons. We demonstrate that at all taxonomic levels, DNA sequencing was able to discern a greater number of taxa, and was particularly useful for the identification of infrequently detected species. Importantly, substantial phenological overlap did occur for commonly detected taxa using either technique, suggesting that DNA sequencing is an appropriate, and enhancing, substitutive technique for accurately capturing the breadth of bee-collected species of pollen present across agricultural landscapes. We also show that honey bees located in high and low intensity agricultural settings forage on dissimilar plants, though with overlap of the most abundantly collected pollen taxa. We highlight practical applications of utilizing sequencing technology, including addressing ecological issues surrounding land use, climate change, importance of taxa relative to abundance, and evaluating the impact of conservation program habitat enhancement efforts.

BACKGROUND:
Claims abound that the Transvaal red milkwood, Mimusops zeyheri, indigenous to areas with tropical and subtropical commercial fruit trees and fruiting vegetables in South Africa, is relatively pest free owing to its copious concentrations of latex in the above-ground organs. On account of observed fruit fly damage symptoms, a study was conducted to determine whether M. zeyheri was a host to the notorious quarantined Mediterranean fruit fly (Ceratitis capitata).
RESULTS:
Fruit samples were kept for 16-21 days in plastic pots containing moist steam-pasteurised growing medium with tops covered with a mesh sheath capable of retaining emerging flies. Microscopic diagnosis of the trapped flies suggested that the morphological characteristics were congruent with those of C. capitata, which was confirmed through cytochrome c oxidase I (COI) gene sequence alignment with a 100% bootstrap value and 99% confidence probability when compared with those from the National Center for Biotechnology Information database.
CONCLUSION:
This study demonstrated that M. zeyheri is a host of C. capitata. Therefore, C. capitata from infestation reservoirs of M. zeyheri fruit trees could be a major threat to the tropical and subtropical fruit industries in South Africa owing to the fruit-bearing nature of the new host.

International agreements mandate the expansion of Earth's protected-area network as a bulwark against the continued extinction of wild populations, species, and ecosystems. Yet many protected areas are underfunded, poorly managed, and ecologically damaged; the conundrum is how to increase their coverage and effectiveness simultaneously. Innovative restoration and rewilding programmes in Costa Rica's Area de Conservacion Guanacaste and Mozambique's Parque Nacional da Gorongosa highlight how degraded ecosystems can be rehabilitated, expanded, and woven into the cultural fabric of human societies. Worldwide, enormous potential for biodiversity conservation can be realized by upgrading existing nature reserves while harmonizing them with the needs and aspirations of their constituencies.

Seed dispersal constitutes a pivotal process in an increasingly fragmented world, promoting population connectivity, colonization and range shifts in plants. Unveiling how multiple frugivore species disperse seeds through fragmented landscapes, operating as mobile links, has remained elusive owing to methodological constraints for monitoring seed dispersal events. We combine for the first time DNA barcoding and DNA microsatellites to identify, respectively, the frugivore species and the source trees of animal-dispersed seeds in forest and matrix of a fragmented landscape. We found a high functional complementarity among frugivores in terms of seed deposition at different habitats (forest vs. matrix), perches (isolated trees vs. electricity pylons) and matrix sectors (close vs. far from the forest edge), cross-habitat seed fluxes, dispersal distances, and canopy-cover dependency. Seed rain at the landscape-scale, from forest to distant matrix sectors, was characterized by turnovers in the contribution of frugivores and source-tree habitats: open-habitat frugivores replaced forest-dependent frugivores, whereas matrix trees replaced forest trees. As a result of such turnovers, the magnitude of seed rain was evenly distributed between habitats and landscape sectors. We thus uncover key mechanisms behind 'biodiversity-ecosystem function' relationships, in this case, the relationship between frugivore diversity and landscape-scale seed dispersal. Our results reveal the importance of open-habitat frugivores, isolated fruiting trees, and anthropogenic perching sites (infrastructures) in generating seed dispersal events far from the remnant forest, highlighting their potential to drive regeneration dynamics through the matrix. This study helps to broaden the 'mobile link' concept in seed dispersal studies by providing a comprehensive and integrative view of the way in which multiple frugivore species disseminate seeds through real-world landscapes.

Thursday, June 15, 2017

Plants and climate change

Plants provide us with food, pastures for livestock, and places for recreation and wellbeing. They also directly and indirectly provide numerous invaluable ecosystem services such as water regulation, carbon sequestration and flood prevention. As a result, it is imperative that we understand how plant populations are responding to climate constraints now, and use that information to predict how they are likely to respond to climatic changes in the future.

In fact, it might be very important to assess the persistence strategies of plants in any given habitat. Noting a species' mere presence does not paint a very useful picture: a species may be found in a particular area without making much of a living there; it may just be making ends meet for the time being. An international group of ecologists tested the links between climate suitability and persistence strategies for nearly 100 populations of over 30 species of trees and herbs growing in 16 countries on 3 continents. Some of these data were gathered over a decade, allowing the researchers to identify emergent patterns linked to climate change with greater confidence.

What they found is that while many species are able to persist in less favourable climate conditions, those same species often do so by adopting last-stand strategies such as shrinking in size and temporarily suspending reproductive and vegetative growth. This merely helps them survive and leaves them more vulnerable to further changes and to disturbances such as wildfires or pest outbreaks. Many such disturbances are more likely today due to changing climates.

Not all plants have the life strategies to persist for extended periods of time in less favourable climates, but our research is already helping to pinpoint those that do. One of the next steps is to design management strategies to help support these species and to safeguard the ecosystem services that they provide us.


Wednesday, June 14, 2017

Invasive species hotspots

Human-mediated transport beyond biogeographic barriers has led to the introduction and establishment of alien species in new regions worldwide. However, we lack a global picture of established alien species richness for multiple taxonomic groups. 

The number of established alien species varies across the world, but it has been unclear where the most established alien species can be found and which factors influence their distribution. An international team created a database for eight animal and plant groups (mammals, birds, amphibians, reptiles, fishes, spiders, ants and vascular plants) that were found to occur in regions outside their original habitat. Studying the distribution of these species across a total of 186 islands and 423 mainland regions allowed the research team to map the global distribution of established alien species.

The highest number of alien species can be found on islands and in the coastal regions of continents. The island of Hawaii was found to have the most alien species, followed by the North Island of New Zealand and the Lesser Sunda Islands of Indonesia. What these places have in common is that they are remote islands that used to be very isolated, lacking some taxa altogether, e.g. mammals. Today, these island regions are economically highly developed and maintain intense trade relationships with the mainlands.

We found the number of alien species to be particularly high in densely populated areas as well as in economically highly developed ones. These factors increase the likelihood of humans introducing many new species to an area. Dense settlement and development also almost invariably result in the destruction of natural habitats, which in turn allows non-indigenous species to spread. Islands and coastal regions seem to be particularly vulnerable because they play leading roles in global overseas trade. There is yet another considerable risk besides the introduction of new alien species. Many of the alien plants and animals that, until now, have been kept in people's homes and gardens and are not yet found in the wild might well spread in the future. Given the worldwide effects of climate change, this is in fact a distinct possibility.




Tuesday, June 13, 2017

Wide-Open

Number of samples in the NCBI GEO
Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data.

Researchers routinely deposit data in online repositories. But they are only human, and it is not rare for them to forget to tell a repository to release their data once a paper is published. Open data enables other researchers to reproduce results and to use the same datasets to produce novel discoveries. While many scientific journals now require published authors to make the data underlying their findings publicly available, these policies often go unenforced. The challenge is substantial: the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus repository (GEO) alone contains 80,985 public datasets, spanning hundreds of tissue types in thousands of organisms, and the rapid growth in data makes it difficult for journals or data repositories to "police" whether datasets that should be made publicly available actually are.

A new tool, developed by University of Washington and Microsoft researchers, automatically identifies datasets overdue for public release by applying text mining to dataset references in published articles and parsing query results from repositories to determine whether the datasets remain private. The system is called Wide-Open and is available under an open-source license on GitHub.

The researchers tested their tool on two popular data repositories maintained by the NCBI: GEO and the Sequence Read Archive (SRA). Wide-Open identified a large number of overdue datasets, which spurred repository administrators to respond by releasing 400 datasets within one week.
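The core idea, finding accession numbers in published text and checking whether the repository still lists them as private, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Wide-Open's actual code: the regular expression covers only a few common GEO and SRA accession prefixes, and the `public_accessions` set is a placeholder for a real repository query.

```python
import re

# GEO series accessions (GSE...) and SRA study/experiment/run/sample accessions
# (SRP/SRX/SRR/SRS...). Other prefixes (e.g. ENA or DDBJ mirrors) are ignored here.
ACCESSION_RE = re.compile(r"\b(GSE\d+|SR[PXRS]\d+)\b")


def find_accessions(article_text):
    """Return the distinct repository accessions mentioned in an article, sorted."""
    return sorted(set(ACCESSION_RE.findall(article_text)))


def overdue_accessions(article_text, public_accessions):
    """Accessions cited in a published article that the repository does not list as public.

    `public_accessions` stands in for a live repository lookup; how that lookup is
    done, and how a private record is recognised, is an assumption of this sketch.
    """
    return [acc for acc in find_accessions(article_text) if acc not in public_accessions]


if __name__ == "__main__":
    text = "Data are in GEO under GSE12345 and in SRA under SRP000123; see also GSE99999."
    print(overdue_accessions(text, public_accessions={"GSE12345"}))
    # prints ['GSE99999', 'SRP000123']
```

Because the article is already published, any accession it cites that is still missing from the public set is a candidate for release.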

Monday, June 12, 2017

Defying Muller's Ratchet?

Meloidogyne incognita in action
For most animal species, sexual reproduction is favored over asexual reproduction. A proposed mechanism to explain this is Muller's ratchet, which assumes that the genomes of an asexual population accumulate deleterious mutations in an irreversible manner. However, this negative effect may not be prevalent in organisms that reproduce asexually but still undergo other forms of recombination.

Root-knot nematodes (Meloidogyne spp.) exhibit a diversity of reproductive modes ranging from obligatory sexual to fully asexual reproduction. Intriguingly, the most widespread and devastating species to global agriculture are those that reproduce asexually, without meiosis.  Instead of hitting an evolutionary dead-end, these plant pests have a wider geographic range and can infect greater numbers of crops than sexual species. 

To investigate the reasons behind their success, researchers sequenced and assembled the genomes of the three most damaging root-knot nematodes and compared them to a sexual relative. The asexual genomes are large, with numerous duplicated regions resulting from past hybridization events in which at least two individual genomes recently came together. The researchers detected signs of positive selection between these gene copies and confirmed functional divergence at the level of expression patterns. They think that it is this peculiar hybrid genome structure that provides these nematodes with a potential for adaptation and plasticity, and that explains their paradoxical success in the absence of sex:

By analyzing and comparing their genomes, we provide large-scale evidence that these asexual nematodes underwent hybridization and are polyploid. Their duplicated hybrid genome architectures provide these nematodes with multi-copy genes showing diverged sequence and expression patterns where their sexual relatives have very closely related alleles. We suspect these multiple copies provide a reservoir to adapt to different environments and plant hosts, and constitute an evolutionary advantage over their sexual relatives (at least in the short term). Their intriguing parasitic success despite absence of sex could thus be due to their hybrid origin where they combined multiple genomes of adapted parasitic nematodes in one single species.

In addition, transposable elements (TEs) cover a ~1.7 times higher proportion of the genomes of the ameiotic asexual Meloidogyne than of the genome of their sexual relative, and might also contribute to their plasticity. The intriguing parasitic success of asexually-reproducing Meloidogyne species could be partly explained by their TE-rich composite genomes, resulting from allopolyploidization events, and promoting plasticity and functional divergence between gene copies in the absence of sex and meiosis.

It becomes paramount to understand under what conditions these hybrids came to be. It is a scary thought that similar conditions could favor the rise of even more aggressive and devastating new hybrids.

Friday, June 9, 2017

Weekend reads

After a longer silence due to job changes and travel, I am slowly picking up posting duties again. I have decided to move the barcoding paper suggestions to Friday and rename these posts. If you happen to have nothing else to do on the weekend, or in case you need some good reads for a quiet moment, here they are:

Wine is a complex beverage, comprising hundreds of metabolites produced through the action of yeasts and bacteria in fermenting grape must. Commercially, there is now a growing trend away from using wine yeast (Saccharomyces) starter cultures, towards the historic practice of uninoculated or "wild" fermentation, where the yeasts and bacteria associated with the grapes and/or winery perform the fermentation. It is the varied metabolic contributions of these numerous non-Saccharomyces species that are thought to impart complexity and desirable taste and aroma attributes to wild ferments in comparison to their inoculated counterparts. To map the microflora of spontaneous fermentation, metagenomic techniques were employed to characterize and monitor the progression of fungal species in five different wild fermentations. Both amplicon-based ribosomal DNA internal transcribed spacer (ITS) phylotyping and shotgun metagenomics were used to assess community structure across different stages of fermentation. While providing a sensitive and highly accurate means of characterizing the wine microbiome, the shotgun metagenomic data also uncovered a significant over-abundance bias in the ITS phylotyping abundance estimations for the common non-Saccharomyces wine yeast genus Metschnikowia. By identifying biases such as that observed for Metschnikowia, abundance measurements from future ITS-phylotyping datasets can be corrected to provide more accurate species representation. Ultimately, as more shotgun metagenomic and single-strain de novo assemblies for key wine species become available, the accuracy of both ITS-amplicon and shotgun studies will greatly increase, providing a powerful methodology for deciphering the influence of the microbial community on wine flavor and aroma.
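The abstract does not say how such a correction would be done. One simple possibility, sketched here purely as an assumption, is to treat the bias as a multiplicative factor per taxon (estimated, say, by dividing a taxon's ITS share by its shotgun share in the same samples), divide it out, and renormalise. The function name and the numbers in the usage example are made up for illustration.

```python
def correct_its_abundances(observed, bias):
    """Correct ITS relative abundances with per-taxon multiplicative bias factors.

    observed: taxon -> relative abundance from ITS amplicon data (sums to 1).
    bias: taxon -> factor by which ITS over-estimates that taxon (assumed to be
          estimated elsewhere, e.g. ITS share / shotgun share); taxa without a
          factor are treated as unbiased (1.0).
    """
    adjusted = {taxon: share / bias.get(taxon, 1.0) for taxon, share in observed.items()}
    total = sum(adjusted.values())
    # Renormalise so the corrected abundances again sum to 1.
    return {taxon: value / total for taxon, value in adjusted.items()}


if __name__ == "__main__":
    # Hypothetical ITS profile in which Metschnikowia is over-estimated threefold.
    its = {"Metschnikowia": 0.6, "Hanseniaspora": 0.3, "Saccharomyces": 0.1}
    print(correct_its_abundances(its, bias={"Metschnikowia": 3.0}))
```

With a threefold over-estimate removed, Metschnikowia drops from 60% to about a third of the community, and the other genera rise proportionally.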

Genetic barcodes of arctic medusae and meiobenthic cnidarians have uncovered a fortuitous connection between the medusa Plotocnide borealis Wagner, 1885 and the minute, mud-dwelling polyp Boreohydra simplex Westblad, 1937. Little to no sequence differences exist among independently collected samples identified as Boreohydra simplex and Plotocnide borealis, showing that the two different forms represent a single species that is henceforth known by the older name Plotocnide borealis Wagner, 1885. The polyp form has been observed to produce bulges previously hypothesized to be gonophores, and the results here are consistent with that view. Interestingly, the polyp has also been reported to produce egg cells in the epiderm, a surprising phenomenon that we document here for only the second time. Thus, P. borealis produces eggs in two different life stages, polyp and medusa. This is the first documented case of a metagenetic medusozoan species being able to produce gametes in both the medusa and polyp stage. It remains unclear what environmental/ecological conditions modulate the production of eggs and/or medusa buds in the polyp stage. Similarly, sperm production, fertilization and development are unknown, warranting further studies.

The mosquito family (Diptera: Culicidae) constitutes the most medically important group of arthropods because certain species are vectors of human pathogens. In some parts of the world, the diversity is so high that the accurate delimitation and/or identification of species is challenging. A DNA-based identification system for all animals has been proposed, the so-called DNA barcoding approach. In this study, our objectives were (i) to establish DNA barcode libraries for the mosquitoes of French Guiana based on the COI and the 16S markers, (ii) to compare distance-based and tree-based methods of species delimitation to traditional taxonomy, and (iii) to evaluate the accuracy of each marker in identifying specimens. A total of 266 specimens belonging to 75 morphologically identified species or morphospecies were analyzed allowing us to delimit 86 DNA clusters with only 21 of them already present in the BOLD database. We thus provide a substantial contribution to the global mosquito barcoding initiative. Our results confirm that DNA barcodes can be successfully used to delimit and identify mosquito species with only a few cases where the marker could not distinguish closely related species. Our results also validate the presence of new species identified based on morphology, plus potential cases of cryptic species. We found that both COI and 16S markers performed very well, with successful identifications at the species level of up to 98% for COI and 97% for 16S when compared to traditional taxonomy. This shows great potential for the use of metabarcoding for vector monitoring and eco-epidemiological studies.

Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labor-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences ('mislabels') using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity/91.7% precision) as well as correction (94.9% sensitivity/89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria. SATIVA is freely available at https://github.com/amkozlov/sativa.

1. In recent years, large-scale DNA barcoding campaigns have generated an enormous number of COI barcodes, which are usually stored in NCBI's GenBank and the official Barcode of Life database (BOLD). BOLD data are generally associated with more detailed and better curated metadata, because a large proportion is based on expert-verified and vouchered material, accessible in public collections. In the course of the German Barcode of Life (GBOL) initiative, data were generated for the reference library of 2,846 species of Coleoptera from 13,516 individuals.
2. Confronted with the high effort associated with identification, verification and data validation, the authors developed a bioinformatic pipeline, “TaxCI”, that i) identifies taxonomic inconsistencies in a given tree topology (optionally including a reference data set), ii) discriminates between different cases of incongruence in order to identify contamination or misidentified specimens, and iii) graphically marks those cases in the tree, so that they can be checked again and, if needed, corrected or removed from the dataset. For this, “TaxCI” may use DNA-based species delimitations from other approaches (e.g., mPTP) or may perform its own threshold-based clustering.
3. The data-processing pipeline was tested on a newly generated set of barcodes, using the available BOLD records as a reference. After revising the data based on the first TaxCI run, a second TaxCI analysis yielded a taxonomic match ratio very similar to that of the reference set (92 vs 94%). Through this procedure, the revised dataset improved by nearly 20% compared to the original, uncorrected one.
4. Overall, the new processing pipeline for DNA barcode data allows for the rapid and easy identification of inconsistencies in large datasets, which can be dealt with before submitting them to public data repositories like BOLD or GenBank. Ultimately, this will increase the quality of submitted data and the speed of data submission, while primarily avoiding the deterioration of the accuracy of the data repositories due to ambiguously identified or contaminated specimens.
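TaxCI's own implementation is not described here, but the general idea of threshold-based clustering followed by a check for taxonomic inconsistencies can be illustrated with a small Python sketch. Assumptions: sequences are already aligned and of equal length, distances are uncorrected p-distances, clusters are formed by single linkage at a user-chosen threshold (the 2% default is only an illustrative value), and a cluster is flagged when its specimens carry more than one species name. TaxCI itself works on a tree topology and distinguishes several kinds of incongruence, which this sketch does not attempt.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def threshold_clusters(seqs, threshold=0.02):
    """Single-linkage clustering: link any two sequences whose distance is <= threshold."""
    ids = list(seqs)
    parent = {i: i for i in ids}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(ids, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def flag_inconsistencies(seqs, labels, threshold=0.02):
    """Return clusters whose specimens are identified as more than one species."""
    flagged = []
    for cluster in threshold_clusters(seqs, threshold):
        if len({labels[i] for i in cluster}) > 1:
            flagged.append(sorted(cluster))
    return flagged


if __name__ == "__main__":
    # Toy 10-bp "barcodes" with hypothetical specimen IDs and species labels.
    seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "TTTTTTTTTT"}
    labels = {"s1": "species_A", "s2": "species_B", "s3": "species_A"}
    print(flag_inconsistencies(seqs, labels, threshold=0.1))
    # prints [['s1', 's2']]
```

Here s1 and s2 differ at one site in ten and fall into one cluster despite carrying different names, so that cluster is flagged for re-checking; the fact that species_A is also split across two clusters would be another kind of incongruence that a fuller tool would report.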

Food trade globalization and the growing demand for selected food varieties have led to the intensification of adulteration cases, especially in the form of species substitution/mixing with cheaper taxa. This phenomenon has acquired a huge economic impact and sometimes even public health implications. DNA barcoding represents a well-proven molecular tool to assess the authenticity of food items, although its diffusion is hampered by analytical constraints and timeframes that are often prohibitive for the food market. To address such issues, we have introduced a new technology, named NanoTracer, which allows for rapid and naked-eye molecular traceability of any food, employing limited instrumentation and cost-effective reagents. Moreover, unlike sequencing, this method can identify not only the substitution of a fine ingredient, but also its dilution with cheaper ones.

In this study, we used several molecular techniques to develop a fast and reliable protocol (DNA Verity Test, DVT) for the characterization and confirmation of the species or taxa present in herbal infusions. As a model plant for this protocol, Camellia sinensis, a traditional tea plant, was selected for the following reasons: its historical popularity as a (healthy) beverage, its high selling value, the importation of barely recognizable raw product (i.e., crushed), and the scarcity of studies concerning adulterants or contamination. The DNA Verity Test includes both the sequencing of DNA barcoding markers and genotyping of labeled-PCR DNA barcoding fragments for each sample analyzed. This protocol (DVT) was successively applied to verify the authenticity of 32 commercial teas (simple or admixture), and the main results can be summarized as follows: (1) the DVT protocol is suitable to detect adulteration in tea matrices (contaminations or absence of certified ingredients), and the method can be exported for the study of other similar systems; (2) based on the BLAST analysis of the sequences of rbcL+matK±rps7-trnV(GAC) chloroplast markers, C. sinensis can be taxonomically characterized; (3) rps7-trnV(GAC) can be employed to discriminate C. sinensis from C. pubicosta; (4) ITS2 is not an ideal DNA barcode for tea samples, reflecting potential incomplete lineage sorting and hybridization/introgression phenomena in C. sinensis taxa; (5) the genotyping approach is an easy, inexpensive and rapid pre-screening method to detect anomalies in the tea templates using the trnH(GUG)-psbA barcoding marker; (6) two herbal companies supplied non-authentic products, either containing a contaminant or lacking some of the listed ingredients; and (7) the leaf matrices present in some teabags could be constituted using an admixture of different C. sinensis haplotypes and/or allied species (C. pubicosta).

A large-scale comprehensive reference library of DNA barcodes for European marine fishes was assembled, allowing the evaluation of taxonomic uncertainties and species genetic diversity that were otherwise hidden in geographically restricted studies. A total of 4118 DNA barcodes were assigned to 358 species generating 366 Barcode Index Numbers (BIN). Initial examination revealed as many as 141 BIN discordances (more than one species in each BIN). After implementing an auditing and five-grade (A-E) annotation protocol, the number of discordant species BINs was reduced to 44 (13% grade E), while concordant species BINs amounted to 271 (78% grades A and B) and 14 others had insufficient data (grade D). Fifteen species displayed comparatively high intraspecific divergences ranging from 2.6 to 18.5% (grade C), which is biologically paramount information to be considered in fish species monitoring and stock assessment. On balance, this compilation contributed to the detection of 59 European fish species probably in need of taxonomic clarification or re-evaluation. The generalized implementation of an auditing and annotation protocol for reference libraries of DNA barcodes is recommended.
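The first step of such an audit, sorting BINs into concordant (one species name) and discordant (more than one species name) ones, is easy to express in code. Here is a minimal Python sketch, assuming the library is available as simple (BIN, species) pairs; the BIN identifiers and species names in the example are placeholders, and the paper's five-grade (A-E) annotation is not reproduced.

```python
from collections import defaultdict


def bin_concordance(records):
    """Split BINs into concordant (one species name) and discordant (several names).

    records: iterable of (bin_id, species_name) pairs, one per barcode.
    """
    species_per_bin = defaultdict(set)
    for bin_id, species in records:
        species_per_bin[bin_id].add(species)
    concordant = sorted(b for b, names in species_per_bin.items() if len(names) == 1)
    discordant = sorted(b for b, names in species_per_bin.items() if len(names) > 1)
    return concordant, discordant


if __name__ == "__main__":
    # Placeholder BINs and species names, not data from the study.
    barcodes = [
        ("BOLD:AAA0001", "Species A"),
        ("BOLD:AAA0001", "Species A"),
        ("BOLD:AAA0002", "Species B"),
        ("BOLD:AAA0002", "Species C"),
    ]
    print(bin_concordance(barcodes))
    # prints (['BOLD:AAA0001'], ['BOLD:AAA0002'])
```

Each discordant BIN would then be examined by hand (misidentification, contamination, or genuinely shared haplotypes) before a grade is assigned.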