Taxonomic Process Workshops

Presenters: variously Garry Jolley-Rogers, Margaret Cawsey, Jim Croft, Jeremy Price

You can read an annotated version of this page here.

Intro (Margaret)

What do we want from this?

  • An understanding of what you do
  • An understanding of where you think Informatics can and cannot contribute to your taxonomic process
  • General information which will contribute to the way we think about our project

We are not alone in the world

  • There are other groups in the world addressing these very issues (e.g. EDIT).
  • We are not working in isolation, but talking to these groups, contributing to them and taking from them what they have already done so we don't waste time reinventing any wheels.
  • However, it is clear that none of these groups have satisfactorily addressed the issues involved in the use of informatics to facilitate the taxonomic process.
  • We have begun our own analysis, which we hope to use as a foil to elicit information that we don't have or haven't thought of, so that we can develop our understanding further, sharpen the analysis and models, and perhaps find out what hasn't worked in the past and why.

Workshop run from the Wiki (led by Garry and facilitated by Margaret)

Notes from Plant Taxonomists Workshop, 31/07/2008

Present: Judy West (JW), Joe Miller (JM), Brendan Lepschi (BL), Bernard Pfeil (BP), Richard Watts (RW)
Garry Jolley-Rogers (GJR), Jeremy Price (JP), Margaret Cawsey (MC)

Feedback and Responses

Overview

  • JW: Can see why we need to start with an institution but need to deal with organisms and disciplines. Doesn't like institutions in the table.
  • BP: Particularly as institutions change. Taxa are the way to go.

Collecting

  • JW: likes the notion of a "collection event".
  • JM: his project went around in the red circle for 8 years and is now going around in the blue circle.
  • 3rd party not clearly understood.
  • See BP's amendment to the flowchart; there needs to be a large circle around Collection events and 3rd party, and a loans process needs to be added. Need to define types (?).
  • BL: "Loans" is an essential bottleneck; people always need materials; BP: often cannot anticipate the need to borrow material until quite a way into the process.
  • Need to publish where you've collated material from.
  • Imaging could speed up this "process" (Informatics input).
  • BP: raised the use of PDAs to collect field data (Informatics input).


Accession

  • JW: doesn't like the term "reception"; doesn't apply to plants? "Pre-processing"? "Arrival"? Terminology needs to be considered.
  • The processing circle needs to be a big black box. This box needs to be analysed in its own right, and may need to become a stage of the Taxonomic Process with its own flowchart.
  • Data-basing needs to be a process, with arrows leading to/from it and Assessment, Processing, Curation etc. and leading to/from it and the Refining specimen documentation black box (see also BP's amended flowchart).
  • Curation is a stage of the Taxonomic Process in its own right, as opposed to merely a part of the Accession stage, and needs its own flowchart. This will more usefully encompass "Further curation" as well.
  • Borrowing Material for Assessment, Curation, and Research (came up in the meeting).
  • Lending Material.
  • The sub-samples/tracking samples will need expansion to take account of entomological collection processes (e.g. tree-fogging) where there are lots and sub-lots as opposed to merely samples and sub-samples. The specimens from tree-fogging could feed into many different projects, so tracking samples, lots, sublots will have to be treated differently. (BP: the individual specimen is more easily associated with sub-samples of itself, whereas all of the other specimens from the same tree may not be easily associated/tracked together.) This area may well have to become a full stage in the Taxonomic Process with its own flowchart.
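One way to picture the lot/sub-lot tracking problem raised above is as a tree of units linked by parent identifiers. The following is a minimal sketch only, assuming a hypothetical `Unit` record and `Registry` class and made-up identifiers such as `FOG-1`; it is not a description of any existing collection database.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Unit:
    """A trackable unit: lot, sub-lot, specimen or sub-sample (hypothetical model)."""
    unit_id: str
    kind: str                       # "lot", "sub-lot", "specimen", "sub-sample"
    parent_id: Optional[str] = None
    project: Optional[str] = None   # project currently using this unit, if any


class Registry:
    """In-memory register of units, keyed by identifier."""

    def __init__(self) -> None:
        self.units: Dict[str, Unit] = {}

    def add(self, unit: Unit) -> Unit:
        # Refuse units whose parent has not been registered yet.
        if unit.parent_id is not None and unit.parent_id not in self.units:
            raise KeyError(f"unknown parent {unit.parent_id}")
        self.units[unit.unit_id] = unit
        return unit

    def lineage(self, unit_id: str) -> List[str]:
        """Walk parent links from a unit back to its originating lot."""
        chain: List[str] = []
        current: Optional[str] = unit_id
        while current is not None:
            chain.append(current)
            current = self.units[current].parent_id
        return chain

    def descendants(self, unit_id: str) -> List[str]:
        """All units derived, directly or indirectly, from the given unit."""
        found: List[str] = []
        stack = [unit_id]
        while stack:
            parent = stack.pop()
            for u in self.units.values():
                if u.parent_id == parent:
                    found.append(u.unit_id)
                    stack.append(u.unit_id)
        return sorted(found)


# Illustrative data: one tree-fogging lot split between two projects.
reg = Registry()
reg.add(Unit("FOG-1", "lot"))
reg.add(Unit("FOG-1/a", "sub-lot", parent_id="FOG-1", project="beetles"))
reg.add(Unit("FOG-1/b", "sub-lot", parent_id="FOG-1", project="flies"))
reg.add(Unit("FOG-1/a/001", "specimen", parent_id="FOG-1/a"))
reg.add(Unit("FOG-1/a/001-leg", "sub-sample", parent_id="FOG-1/a/001"))
```

With this structure, `reg.lineage("FOG-1/a/001-leg")` traces a sub-sample back to its lot, while `reg.descendants("FOG-1")` recovers everything that came from the same tree, which is the association BP noted is otherwise hard to keep.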

Research

The conversation around this subject concentrated a great deal on LIMS.
  • Replace the word "Taxonomic" with "Research" in the Hypothesis box, otherwise the Research stage appears to be relevant only to Alpha taxonomy and not phylogenetic research.
  • Research Hypothesis process is not predictable. Some steps are linked, other elements need not be linked.
  • JW: HubRIS should not look at characters but at links to other methods.
  • The inductive reasoning box needs to be expanded to consider elements such as measurements, analysis.
  • JW: contract someone to look at the sample labelling process. Labelling changes as taxonomic revisions occur and the labelling could be assisted by Informatics e.g. barcodes, and other automatic methods of recording specimen/sample/sub-sample identification.
  • LIMS - needs to be teased out; varies from lab to lab.
  • RW: dynamics in the workforce mean that you end up with freezers full of stuff that nobody knows about.
  • Corporate databases should store data on primers which didn't work. Corporate protocols should be developed to ensure that these data are recorded as an accepted part of ordinary work practice. Until both databases and protocols are available, it is likely that these data will remain incompletely recorded.
  • Informatics can assist with the assemblage of metadata which should assist in keeping track of samples and sub-samples and the development of a FailBank (as opposed to SuccessBank = GenBank - JP).
  • It is not necessary to record details of methods, temperatures etc. Only the primer information is of use.
  • BP: If FailBank was linked to primer ordering accounts, e.g. via the oligoform, this would be a useful way of standardising data capture as well as managing the ordering process. You get the primer information from the same LIMS information and this would hook the DNA to the specimen database.
  • JW: a link to ANHSIR and APNI is good, but not to be tightly tied as part of the Oracle database.
  • Data capture needs to be standardised and made as easy as possible so it doesn't drive the scientists crazy. If it does drive them crazy, it won't get done.
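The FailBank idea discussed above could be captured with a very small record: the primer, the specimen it was tried on, and whether it worked. The sketch below assumes a hypothetical `PrimerResult` record and placeholder primer names, sequences and specimen identifiers; in line with the point above, it deliberately stores no method or temperature details.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PrimerResult:
    """One primer attempt against one specimen (hypothetical FailBank record)."""
    primer_name: str    # placeholder name, e.g. as entered on an ordering form
    sequence: str       # primer sequence as ordered
    specimen_id: str    # identifier linking back to the specimen database
    worked: bool        # False entries are what a FailBank would keep


def failed_primers(results: Iterable[PrimerResult], specimen_id: str) -> List[str]:
    """Names of primers already known to have failed for a given specimen."""
    return sorted({
        r.primer_name
        for r in results
        if r.specimen_id == specimen_id and not r.worked
    })


# Illustrative data only; names and sequences are made up.
results = [
    PrimerResult("primer-A", "ACGTACGTACGT", "SPEC-0001", worked=False),
    PrimerResult("primer-B", "TTGACCGGTAAC", "SPEC-0001", worked=True),
    PrimerResult("primer-C", "GGCATTACGGAT", "SPEC-0001", worked=False),
    PrimerResult("primer-A", "ACGTACGTACGT", "SPEC-0002", worked=True),
]

print(failed_primers(results, "SPEC-0001"))
```

Because each record carries a specimen identifier, the same data that feeds primer ordering would also hook the DNA work to the specimen database, as BP suggested.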

Web-based tools to facilitate scientific collaboration

  • It is clear that there is some confusion about what the Wiki will be useful for, accompanied by some ignorance as to the benefits of other web-based tools which facilitate communication and collaboration, each in different ways.
  • At the end of this workshop, the participants indicated an interest in finding out about the variety and usage of such tools.
  • HubRIS will investigate the logistics of putting together (a) workshop(s) aimed at educating TRIN scientists in more sophisticated use of internet capabilities to facilitate their science and collaboration with colleagues.

-- MargaretC - 31 Jul 2008

Notes from Aquatic Macroinvertebrates workshop 28-29 August 2008

Present: Phil Suter (PS), Jeff Webb (JW), David Yeates, Garry Jolley-Rogers (GJR), Jeremy Price (JP), Margaret Cawsey (MC)

Responses and Further Annotations

General issues with the flowcharts

  • The "black box" terminology is confusing to the participants as it is open to misinterpretation. The term should be changed to something else.
  • The flowcharts are getting too complicated and the iteration or non-linearity is making them very difficult to interpret. In a sense, although the process can be iterative, in part or in whole, the steps are still linear. Bernard Pfeil's phylogenetic process flowchart demonstrates this; it is far easier to understand and is thus more descriptive of the different components of each sub-process etc. The original flowcharts are important to illustrate the iteration, but the internal components need to be described by more linear flowcharts.

Components of the process

Curation (specimens and data)

  • ANIC will take the mayfly specimens, depending on the quality of curation.
  • The AMI project suffers from the fact that they have nowhere to curate anything or store anything. Researchers are reduced to storing things at home etc.
  • This leads to the recurring issue - the long term problem of data (=specimens, papers etc.) getting lost when researchers retire, leave, drop off the perch. This occurs right through the university system and government.
  • Collection and collection maintenance are a major problem in Australia.
  • There are no long-term data repositories or databases in Australian universities; basically, universities are not suitable as hosts for corporate collections and databases.
  • Poor curation also leads to data loss; e.g. where location is not recorded specimens are useless.
  • Where research is done and published, it is crucial that specimens do NOT vanish. For ecological collections, if specimens are lost this is unfortunate but not tragic. It IS tragic for published specimens.
  • Given the university-institutional relationship, a partnership with a corporate time-frame is the only way to go, otherwise published specimens will not be universally available to the taxonomic community. Getting this into the universal informatics mindset is crucial.
  • The information cannot necessarily be parsed from publications because descriptions are not standard across taxa (or within taxa?) and there are things missing.
  • We need to institute web services from corporate databases, i.e. a trusted server on the corporate network that hosts the database and can serve the data to outside aggregators. This means that institutions involved with science need to grasp that they need to run scientific databases.

Collection

Collecting event

Mayflies
  1. One locality (= site), one time, one GPS coordinate, 1 photo, 1 substrate, flowing yes/no, flow speed fast/slow, 1 netful from which desired taxa are picked out and put into a jar of ethanol (i.e. only keep taxa of interest).
  2. Take the jar back to the lab.
  3. Sort the jar into taxa - which can take days.
    1. Sort first into morphological groups
    2. each group gets an accession number
    3. DNA samples taken from each group to see whether they've got single or mixes of species.
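The mayfly steps above map fairly directly onto a single "collecting event" record with accession numbers issued per morphological group. The following is a rough sketch under stated assumptions: the `CollectingEvent` class, its field names and the accession number format are all hypothetical, chosen only to mirror the items listed in step 1 and step 3.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class CollectingEvent:
    """One site, one time, one netful (hypothetical record layout)."""
    event_id: str
    locality: str
    collected_at: datetime
    latitude: float
    longitude: float
    substrate: str
    flowing: bool
    flow_speed: Optional[str] = None   # "fast" or "slow"; None when not flowing
    photo: Optional[str] = None        # file name or reference for the site photo
    groups: Dict[str, str] = field(default_factory=dict)  # accession no. -> group label

    def __post_init__(self) -> None:
        # Flow speed only makes sense for flowing water.
        if self.flowing and self.flow_speed not in ("fast", "slow"):
            raise ValueError("flowing sites need flow_speed 'fast' or 'slow'")
        if not self.flowing and self.flow_speed is not None:
            raise ValueError("non-flowing sites must not record a flow speed")

    def accession_groups(self, labels: List[str]) -> List[str]:
        """Issue one accession number per morphological group sorted in the lab."""
        numbers = []
        for label in labels:
            number = f"{self.event_id}-{len(self.groups) + 1:03d}"
            self.groups[number] = label
            numbers.append(number)
        return numbers


# Illustrative values only.
event = CollectingEvent(
    event_id="EV1",
    locality="Hypothetical Creek",
    collected_at=datetime(2008, 8, 28, 10, 0),
    latitude=-35.0,
    longitude=149.0,
    substrate="cobble",
    flowing=True,
    flow_speed="fast",
)
numbers = event.accession_groups(["morphogroup A", "morphogroup B"])
print(numbers)
```

Keeping the accession numbers attached to the event means that any DNA sample taken from a group can be traced back to the site data without re-entering it.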

Flies
  1. One locality, one malaise trap, left out for a week with ethanol.
  2. Take the jar back to the lab.
  3. Sort the taxa of interest and leave the rest of the bycatch in the jar, which is stored. At this time, data are lost because the permit and other constraints pertaining to the capture are not linked to the jar in any way.
  4. Unsure how the jar is documented. David Y. knows where it is. I think the information is in a database. I also think that this information is not readily available to others so the valuable resource i.e. the bycatch might be, to all intents and purposes, lost.
  5. Conclusion: Collection events should be documented on the web so people are generally aware of the availability of valuable bycatches which might contain taxa in which they are interested.
  6. However, David Y. suggests that experts in different taxa manage to collect more of their speciality taxa in traps than others (due to placement for example), so the bycatches are likely to be less valuable for other taxa than the sample was for flies.
  7. David Y. would have to be convinced of the value of using PDAs to collect field data.

Application of informatics to assist
  1. Phil S. and Jeff W. would be amenable to the use of PDAs to collect the site data (point 1) in the field and sync this directly with the database on return to the lab.

Notes from Discussion with Helen Thompson ABRS

Details?