Plate reader files

I work a lot with plate readers for measuring enzyme kinetics.
Every reader has its own output format (often several, depending on settings), and they are generally a pain to parse.

I am contemplating writing a package with parsers for this kind of equipment (kinetic, endpoint, absorbance, fluorescence, …).

The only existing example I have found is getEnVisionRawData() from the cellHTS2 Bioconductor package. The plater package (https://github.com/ropensci/plater) deals with similar data and could be a good starting point.

Would anybody here be interested in collaborating on such a package?

@seaaan :point_up_2: fyi

Thanks @stefanie!

I would be happy to collaborate on something like this if I could be helpful. The approach I took in plater was to create a general purpose format that people could conveniently copy-and-paste their data into, regardless of the specific format from the particular instrument. This has the advantages of not having to provide a separate function for every instrument and storing metadata about the wells in the same file. It has the disadvantage that users have to copy and paste from the raw instrument file into a new file.

If I understand your goal correctly, you would like to automatically read in the data in the format provided by the instrument. I could imagine doing this in a couple of ways: (1) functions that convert from the raw instrument output to plater format and write a CSV file that users could add any necessary additional data to or (2) functions that convert from the raw instrument output to a tidy data frame. Another possibility would be to create a framework where users could easily create their own function to handle the raw data from their specific instrument.
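To make (2) concrete, here is a minimal sketch of what such a function could look like; the function name and columns are placeholders, not anything that exists yet:

```r
# Placeholder sketch of option (2): one exported function per instrument format
# that turns the raw export into a tidy data frame (one row per well reading).
read_raw_kinetic <- function(file) {
  lines <- readLines(file)   # raw instrument export
  # ... instrument-specific parsing of the header and measurement blocks goes here ...
  tibble::tibble(
    file   = basename(file),
    well   = character(0),
    time_s = numeric(0),
    value  = numeric(0)
  )
}
```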

Happy to help and discuss further! Let me know how I can be useful to you.


Hi @seaaan,

Thanks a lot for offering to help.

What I have in mind at present is mostly your suggestion 2: parse the raw instrument file to a tidy data frame, and perhaps in time evolve a framework for easily writing these parsers.

I also like your idea of letting users add extra data (eg sample identifiers and location of controls) in plater format.

Here are my initial thoughts on which columns such a data frame should have for a kinetic absorbance experiment in 384-well format:

  • readerfile (char) the name of the raw file parsed
  • barcode (char) the barcode of the plate (my reader can read barcodes)
  • well384 (char) the well (A01, A02, … P24)
  • absorbance_nm (num) the detected wavelength in nm, eg 405
  • kinetic_step (num) the cycle number (1, 2, … up to number of kinetic steps)
  • kinetic_sec (num) seconds since beginning of experiment
  • OD (num) the measured intensity (for absorbance typically between 0 and 3)
  • chamber_temperature_C (num) the temperature in degrees Celsius.
  • warnings (char) warnings reported for the plate or well

So the first row could look like this in CSV format:
readerfile,barcode,well384,absorbance_nm,kinetic_step,kinetic_sec,OD,chamber_temperature_C,warnings
exp01.xlsx,000XCFR,A01,405,1,3,0.0003,23.2,
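(Just to pin down the column types, here is how I would read such a file back in; a sketch using readr, where "exp01_tidy.csv" is a made-up name for the tidy file:)

```r
library(readr)

kinetic <- read_csv(
  "exp01_tidy.csv",   # made-up name for the tidy CSV above
  col_types = cols(
    readerfile            = col_character(),
    barcode               = col_character(),
    well384               = col_character(),
    absorbance_nm         = col_double(),
    kinetic_step          = col_double(),
    kinetic_sec           = col_double(),
    OD                    = col_double(),
    chamber_temperature_C = col_double(),
    warnings              = col_character()
  )
)
```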

I consider this to be the minimum information needed for analyzing the results (I guess kinetic_step is redundant, but it is very convenient). Any immediate comments on this?

Further columns I’m considering, but less sure about:

  • kinetic_timestamp (date time) wall-clock timestamp of the measurement
  • table_version (char) the name (including version) of this format. In the case above it could be “kinetic_absorbance_384_v1”

The table_version could also be an S3 class, but an advantage of a column is that it survives being stored as a CSV file.
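To illustrate what I mean (just a sketch; d stands in for the data frame above and the file name is made up):

```r
d <- data.frame(well384 = "A01", kinetic_sec = 3, OD = 0.0003)

# (a) version as a column: survives a round trip through CSV
d$table_version <- "kinetic_absorbance_384_v1"
readr::write_csv(d, "exp01_tidy.csv")
readr::read_csv("exp01_tidy.csv")$table_version  # still "kinetic_absorbance_384_v1"

# (b) version as an S3 class: handy for dispatch, but lost on write_csv()
class(d) <- c("kinetic_absorbance_384_v1", class(d))
inherits(d, "kinetic_absorbance_384_v1")         # TRUE, but only within the R session
```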

Please comment.

That seems fine. Can you post an example raw data file? I would be in favor of including all the columns that you might use. I am also always in favor of storing as much as possible in data frames rather than in a special object, since data frames are more interoperable with other packages, so I would recommend against a special object. But you’re the one who will be using it, so it’s up to you!

It should be straightforward for me (I think!) to write a function that takes a raw file and gives back the data frame you describe. Do you want me to do that? Or are you looking more for advice on how to go about it? Happy to help however is useful.

Hi @seaaan,

Thanks for the kind offer to help on the implementation. For now I’m mostly looking for advice on the columns of the data frame to store the data in. I find naming stuff surprisingly difficult :wink:
I appreciate your advice to include all relevant information. If we get this right, I hope I will not be the only user of this down the road.

Here’s a second version of the format, please comment:

  • table_version (char) the name (including version) of this format. In this case: “kinetic_absorbance_384_v1”
  • readerfile (char) the name of the raw file parsed
  • readerplate_barcode (char) the barcode of the plate (my reader can read barcodes)
  • well384 (char) the well (A01, A02, … P24)
  • absorbance_nm (num) the detected wavelength in nm, eg 405
  • kinetic_step (num) the cycle number (1, 2, … up to number of kinetic steps)
  • kinetic_sec (num) seconds since beginning of experiment
  • kinetic_timestamp (date time) wall-clock timestamp (ISO 8601) of the reading (eg 2018-05-29T15:29:00Z)
  • absorbance_value (num) the measured intensity (for absorbance typically between 0 and 3)
  • chamber_temperature_C (num) the temperature in degrees Celsius.
  • warnings (char) warnings reported for the plate or well

My thinking is that column names should be specific rather than generic, and they should include the unit (if any). This is to make it easier to write functions that use the data frame, and to track the columns in merges.
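For example, a downstream function can then rely on those names directly (a toy sketch only; kinetic_data stands for a data frame in the format above, and rate_OD_per_sec is a name I just made up):

```r
library(dplyr)

# Initial rate per well as the slope of absorbance_value over kinetic_sec
initial_rates <- kinetic_data %>%
  group_by(readerfile, readerplate_barcode, well384) %>%
  summarise(
    rate_OD_per_sec = coef(lm(absorbance_value ~ kinetic_sec))[["kinetic_sec"]],
    .groups = "drop"
  )
```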

I’ll be happy to share examples of raw files. It seems I can only upload images here. What would be a good way to share such files?

Hi again @seaaan,

I’ve uploaded an example here: https://github.com/tp2750/platereader/tree/master/inst/ExampleFiles

As you probably know, these types of software have a dozen settings affecting the layout of the output.

It looks to me like it should be relatively feasible for you to extract the information out of the file. In terms of column names, I have just a couple of suggestions:

Maybe:

readerfile -> reader_file (or just file)

readerplate_barcode -> reader_plate_barcode (or just barcode)

well384 -> well_384 (or just well for consistency if you use a 96-well plate one day)

Otherwise it seems good to me. Let me know what else I can do to help!

Thanks a lot for the suggestions. I’ll keep them in mind.

I’ll see if I can get some time to add a first version to the github repo next week.

I really appreciate the interest you have taken in this, and I hope you will follow the development and possibly continue giving advice.

Also, if you come across an existing project in this problem space, I hope you will let me know :grinning:

Good luck! Ping me on here when you have a first version and I’ll take a look.

Will do,
Thanks a lot.

A group of us just started working on a similar package, but with the optimistic goal of including as many instrument/sensor types as we can. I have a function that I am currently using for reading in raw plate reader data, but I haven’t completely generalized it. It is currently set up for a single instrument: it takes the raw .txt export file and converts it to tidy data. I would be happy to contribute what I have so far to the effort, or to have you get involved with ours on ingestr. So far we’ve been focusing on package infrastructure and a generalized ingest function template to make adding new instruments easier.

Hi @jpshanno

Thanks for following up on this. I’ll be very happy to collaborate.
I wanted to get my existing code cleaned up a bit before adding it to a repo, but it dragged out :smile:

There are a lot of good ideas in ingestr that I like, eg your solution to the header information.
It makes a lot of sense to use the column names from the input files. In my experience different vendors tend to use different names for the same thing (like “step”, “cycle”, “iteration” etc). Having the ingestion convert this to a standard set of column names makes it easier to work with later. But possibly it is better to do this in a separate step.
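Something like this is what I have in mind for that separate step (a sketch only; the vendor names and the lookup are just examples):

```r
# Map vendor-specific column names onto one standard set
standardise_names <- function(d,
                              lookup = c(Cycle     = "kinetic_step",   # vendor A
                                         Step      = "kinetic_step",   # vendor B
                                         Iteration = "kinetic_step")) {# vendor C
  hits <- names(d) %in% names(lookup)
  names(d)[hits] <- unname(lookup[names(d)[hits]])
  d
}

standardise_names(data.frame(Cycle = 1:3, OD = c(0.10, 0.15, 0.22)))
# columns come back as kinetic_step and OD
```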

I’d love to see the function you already have for a plate reader; I did not find it in the current version of ingestr.

You didn’t find it because it’s not there yet. I haven’t included it in ingestr because we’re trying to figure out a new standard for handling the header data, and have held off on adding new functions to focus on that. Just inserting it into the global environment isn’t R best practice, so we’re trying to come up with an alternative. I can get it uploaded with an example file and post the link here.
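To give a sense of the kind of alternative we’re weighing (a sketch only, not what ingestr actually does; ingest_example is a made-up name): keep the measurements as the return value and attach the header information, rather than assigning anything into the global environment.

```r
ingest_example <- function(file) {
  header <- list(instrument = "example reader", exported = Sys.time())  # parsed header block
  data   <- data.frame(well = "A01", value = 0.1)                       # parsed readings
  attr(data, "header") <- header   # header rides along with the data frame
  data
}

d <- ingest_example("raw_export.txt")
attr(d, "header")$instrument
```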

Regarding the column naming, we’ve taken the approach that our goal is to import the raw data into a tidy data frame and not make any decisions for the user, i.e. just give them the raw sensor data in R rather than in some weird manufacturer’s format. That means we decided not to try to come up with standard column names, because there’ll be just as much variability between researchers as there is between manufacturers.

And we’d love to have you collaborate on ingestr, especially if you have instruments you already wrote these functions for!

Here’s the link to the function I have right now, and the raw export from the instrument. The function has only been tested on data from a 96-well plate, but I tried to write it as generally as I could in the hope that it won’t need too much work to accommodate other data.

Cool!

I have a parser for Tecan i-control files that I’ll be happy to contribute.
Will it be ok if I make it as a pull request to ingestr?

Pull request created here: https://github.com/jpshanno/ingestr/pull/27