Tuesday, January 29, 2013

The seven deadly sins of DNA Barcoding (2)

Inadequate a priori identification of specimens


Let's get to sin number two in our series on the Collins and Cruickshank paper.

The authors point out an issue that limits the use of DNA Barcoding as a practical resource. Human error and uncertainty in creating and curating reference libraries result in conflicting identifications: multiple labs work on the same taxa, and in the course of their morphological identifications they ascribe different taxonomic names to the same species.

The problem is as old as taxonomy. The accuracy of an identification relies heavily on the experience of the identifier and the availability of proper taxonomic keys. I am sure that all collections have specimens in their drawers whose identifications conflict, both within and among institutions.

The good thing about DNA Barcoding is that it can uncover such issues. The public availability of DNA Barcode data through BOLD allowed Collins and Cruickshank, for example, to show that there are many ambiguous species-level identifications for ornamental cyprinid fishes and that the amount of error increased over time. This is indeed a problem for DNA Barcoding when it comes to practical applications, but it certainly is not a problem that was generated by it.

Over the years I've collaborated with quite a few museum curators, and many of them have started using DNA Barcoding to clean up their collections. Identifying incorrect names within a collection by traditional approaches would be a life-long undertaking, so they use DNA Barcoding as a first filter. They weed out the specimens that are not placed within their expected group and then revisit the actual voucher specimen to determine the reason for the wrong placement, thereby reducing the number of errors in their collection.

But back to the problem at hand. The rapid accumulation of DNA Barcodes over the last 10 years has led to a growing number of contradictory identifications, and the introduction of Barcode Index Numbers (BINs) in BOLD has made it quite simple to spot them. The authors emphasize that a crucial aspect of DNA barcoding is the maintenance of records, supporting information and voucher specimens; this is what sets BOLD apart from GenBank. I find it important to stress that all the information associated with a barcode record makes it so much easier to investigate contradictions. BOLD has also begun to provide a framework for community-based annotation of barcode data that can facilitate communication between researchers on the subject. As a result, names can hopefully be changed and harmonised much more easily and quickly. In addition, it can spur necessary discussions about the taxonomic status of some species.
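
To illustrate why BINs make contradictions easy to spot, here is a minimal sketch in Python. It assumes barcode records are available as simple dictionaries with a BIN and a species name; the field names, record IDs, BIN labels and species names are all invented for this example and are not BOLD's actual export format or API.

```python
from collections import defaultdict


def find_conflicting_bins(records):
    """Return {bin_id: sorted species names} for every BIN that
    carries more than one distinct species name."""
    names_by_bin = defaultdict(set)
    for rec in records:
        names_by_bin[rec["bin"]].add(rec["species"].strip())
    return {
        bin_id: sorted(names)
        for bin_id, names in names_by_bin.items()
        if len(names) > 1
    }


# Hypothetical records: all identifiers and names are placeholders.
records = [
    {"id": "REC-1", "bin": "BIN-0001", "species": "Genus alpha"},
    {"id": "REC-2", "bin": "BIN-0001", "species": "Genus beta"},
    {"id": "REC-3", "bin": "BIN-0001", "species": "Genus alpha"},
    {"id": "REC-4", "bin": "BIN-0002", "species": "Genus gamma"},
]

conflicts = find_conflicting_bins(records)
print(conflicts)
```

A BIN flagged this way is only a starting point. Whether the conflict reflects a misidentification, a synonym or a genuine taxonomic question still has to be settled by going back to the vouchers and the supporting information.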

Often the problem of different taxonomic names ascribed to the same species could be solved by more diligence over how identifications are generated and justified. This would require a little more information with each record but would go a long way to help. In 2011 Bob Hanner and I therefore proposed a system of identification confidence to the FishBOL community. It is based on a system that is already in use at the Commonwealth Scientific and Industrial Research Organisation (CSIRO) in Australia. Identifications are rated according to the degree of expertise used and the effort made, on five levels. Highly reliable identifications should be provided by either an internationally recognized authority on the group or a specialist who is presently studying the group in the respective region. The lowest level is defined as superficial: the specimen was identified by a trained identifier who is uncertain of the family placement of the species, by an untrained identifier using, at best, figures in a guide, or by someone whose status and expertise are unknown. In our paper we also stressed a sixth option: the case of an unknown specimen identified using only the BOLD ID engine or another genetic database. It is essential to introduce such labels to avoid the creation of a self-referential database.
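
Here is a rough sketch of how such confidence labels could be used in practice, again in Python. The numeric coding (1 for highly reliable through 5 for superficial, 6 for genetic-only) is my assumption for this example, as are the class, field and function names; the three intermediate levels are not spelled out here, so the code treats them only as numbers.

```python
from dataclasses import dataclass

# Assumed numeric coding, for illustration only.
HIGHLY_RELIABLE = 1  # recognized authority or regional specialist
SUPERFICIAL = 5      # uncertain, untrained or unknown identifier
GENETIC_ONLY = 6     # named only via the BOLD ID engine or another genetic database


@dataclass
class Identification:
    record_id: str
    species: str
    level: int


def usable_as_reference(ident, worst_allowed=SUPERFICIAL):
    """Decide whether a record may serve as a reference for identifying others.

    Records named only from genetic data are always excluded, so the
    library never ends up confirming its own identifications.
    """
    if ident.level == GENETIC_ONLY:
        return False
    return HIGHLY_RELIABLE <= ident.level <= worst_allowed


# Hypothetical records: IDs and names are placeholders.
idents = [
    Identification("REC-1", "Genus alpha", HIGHLY_RELIABLE),
    Identification("REC-2", "Genus alpha", SUPERFICIAL),
    Identification("REC-3", "Genus alpha", GENETIC_ONLY),
]

strict = [i.record_id for i in idents if usable_as_reference(i, worst_allowed=2)]
lenient = [i.record_id for i in idents if usable_as_reference(i)]
print(strict, lenient)
```

The point of the `worst_allowed` threshold is that different applications can demand different levels of confidence, while the genetic-only records stay out of the reference set no matter how lenient the threshold is.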

The authors go a step further still and stipulate that a bibliography of the reference material and morphological characters used for identification should be mandatory for publication. I think this is a very helpful idea, and I would go further again and ask identifiers to provide this information in the database already. Many records will probably never be published formally in a paper but still represent very valuable data. Some taxonomic keys provide a hierarchical numbering system for every species. In such a case a reference to the source and the number would be all that is needed to reduce the amount of detective work necessary to clarify conflicting identifications.
