Analytical methods are generally delivered as documents describing the analytes that can be studied, supported matrices, reagents, methodological details, statistical performance, interlaboratory validation and other details. Regulatory bodies including the US Environmental Protection Agency (US-EPA), US Geological Survey (USGS), US Department of Agriculture (USDA) and others provide detailed analytical methods for the community to adopt in their analyses. Rich sources of historical analytical data are also available to the community, including the US-EPA Environmental Chemistry Methods (https://www.epa.gov/pesticide-analytical-methods/environmental-chemistry-methods-ecm) provided by manufacturers from the agrochemical industry. Instrument vendors also provide access to many hundreds of application notes, which can be considered summary analytical methods, albeit specific to particular instruments. This poster describes a cheminformatically-enabled database of methods that integrates chemical-related data extracted from the methods, with the identifiers (names and/or Chemical Abstracts Service Registry Numbers (CASRNs)) mapped to chemical structures. The resulting database of almost 3000 methods can be searched by chemical name, CASRN, structure and structural similarity. It has been integrated into a web-based application that includes integration with public domain mass spectral data and filtering of the methods by analyte, chemical class, method source and other related metadata.
Integrating an Analytical Methods and Mass Spectral Database with Cheminformatics Capabilities
Gregory Janesch1, Erik Carr1, Vicente Samano2, Brian Meyer2 and Antony Williams3
1. ORAU Student Services Contractor to Center for Computational Toxicology & Exposure, Office of Research & Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
2. Senior Environmental Employment Program, US Environmental Protection Agency, Research Triangle Park, USA
3. Center for Computational Toxicology & Exposure, Office of Research & Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
ACS West
San Francisco, CA
August 13-17, 2023
Data
There are three kinds of data contained within the database.
- Fact sheets are results-oriented documents with data associated
with one or more substances, ranging from basic descriptions of
health effects to monographs with NMR, Raman, and IR spectra.
- Methods document an end-to-end analytical procedure for one or
more substances, sometimes hundreds of chemicals. The documents
are curated to extract the chemical compounds and then
annotated with information such as matrix and methodology.
- Spectra, in the form of lists of m/z-intensity pairs and
associated parameters.
In addition to the above information, records have assorted
metadata stored in the database. These data include information
such as experimental conditions, authors, a synopsis for the method
or fact sheet, and other data depending on what kind of record it is.
Data are open access and are derived from a variety of sources.
These include online spectral databases, vendor methods, research
groups, EPA databases and other government agencies.
At the time of writing the database contains approximately:
- 165,000 spectra (plus 600,000 externally linked spectra)
- >700 fact sheets
- >3300 methods
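The three record kinds and their metadata could be modelled roughly as follows. This is only an illustrative sketch, not the AMOS schema: the class name `Record`, the enum `RecordType`, every field name, and the example identifiers are assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class RecordType(Enum):
    """The three kinds of record described in the poster."""
    FACT_SHEET = "fact sheet"
    METHOD = "method"
    SPECTRUM = "spectrum"


@dataclass
class Record:
    """Hypothetical record layout; field names are assumptions."""
    record_id: str
    record_type: RecordType
    source: str
    # Identifiers of the substances the record covers (placeholders here).
    substance_ids: list = field(default_factory=list)
    # Free-form metadata such as synopsis, authors or experimental conditions.
    metadata: dict = field(default_factory=dict)
    # (m/z, intensity) pairs; left empty for fact sheets and methods.
    peaks: list = field(default_factory=list)


def count_by_type(records):
    """Count how many records of each type are present."""
    counts = {record_type: 0 for record_type in RecordType}
    for record in records:
        counts[record.record_type] += 1
    return counts


# Made-up records, used only to exercise the sketch.
example_records = [
    Record("r1", RecordType.METHOD, "hypothetical agency",
           substance_ids=["SUBSTANCE-A", "SUBSTANCE-B"],
           metadata={"matrix": "water", "methodology": "LC-MS"}),
    Record("r2", RecordType.SPECTRUM, "hypothetical spectral library",
           substance_ids=["SUBSTANCE-A"],
           metadata={"methodology": "LC-MS"},
           peaks=[(80.0, 100.0), (99.0, 45.0)]),
    Record("r3", RecordType.FACT_SHEET, "hypothetical agency",
           substance_ids=["SUBSTANCE-B"]),
]

print(count_by_type(example_records))
```

Keeping the spectrum peaks on the same record type as the documents is a simplification for the example; a real database would more likely store them separately.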
Description
A large variety of sources for spectra, documented analytical
procedures and methods, and other associated documentation exist
and are, in theory, easily available with the usual web search.
However, these sources are largely isolated from each other, hard
to find via general searches because of inconsistencies in
chemical names and identifiers, and highly varied in format.
To address these challenges, the Analytical Methods and Open
Spectra (AMOS) web application has been developed. AMOS is a
database and associated web-based application containing several
types of records searchable by common identifiers known to
chemists (i.e., CASRNs, InChIKeys and chemical names).
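As an illustration of how a search box might tell these identifier types apart, the sketch below classifies a query string as a CASRN, an InChIKey or a chemical name. It is not taken from AMOS: the function names are hypothetical, and falling back to "name" for anything else is an assumption. The CASRN check uses the standard check-digit rule (digits weighted 1, 2, 3, ... from the right, summed modulo 10), and the InChIKey pattern follows the standard 14-10-1 uppercase layout.

```python
import re

# CASRN layout: 2-7 digits, 2 digits, 1 check digit, separated by hyphens.
CASRN_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")
# InChIKey layout: 14 letters, 10 letters, 1 letter, all uppercase.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def is_valid_casrn(text: str) -> bool:
    """Return True if text has CASRN layout and a correct check digit."""
    match = CASRN_RE.match(text)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit


def classify_identifier(query: str) -> str:
    """Classify a query as 'casrn', 'inchikey' or 'name' (the fallback)."""
    text = query.strip()
    if is_valid_casrn(text):
        return "casrn"
    if INCHIKEY_RE.match(text.upper()):
        return "inchikey"
    return "name"


for example in ["1763-23-1", "XLYOFNOQVPJJNP-UHFFFAOYSA-N", "water"]:
    print(example, classify_identifier(example))
```

A string that looks like a CASRN but fails the check digit is treated as a name here, so a mistyped number still reaches a name search rather than being rejected outright.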
Acknowledgements
The authors thank the data curation team for their rigorous work in
annotating and identifying information in the records. Chemical data
extraction, curation and annotation are essential parts of this work.
General Searching
The primary search function searches all records for a single
chemical substance. One half of the page (Fig. 1) shows the searched
compound (assuming a match) and a table of records containing that
substance, with the data source, associated methodology, and a short
description of each record.
Selecting a row in that table allows the contents of that record to
be viewed more closely, whether by opening an analytical method or
displaying a spectrum.
Spectrum Search
For spectral data, an additional search option is available. If a
mass range, methodology, and spectrum (as x,y pairs) are supplied,
the search returns the spectra matching that mass range and
methodology, ranked by their similarity to the user-supplied
spectrum (see Fig. 2).
The top table lists the substance associated with each spectrum
found (with its DTXSID), the similarity of that spectrum, and a
description of that spectrum. Below that table is an interactive
plot of the overlap of the two spectra.
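The filter-then-rank behaviour of the spectrum search can be sketched as below. The poster does not say which similarity measure AMOS uses, so cosine similarity over unit-width m/z bins is an assumption chosen for illustration. The function names, the dictionary keys (`name`, `mass`, `methodology`, `peaks`) and the library entries are likewise hypothetical.

```python
import math


def bin_peaks(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) pairs into fixed-width bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity of two binned spectra (0.0 if either is empty)."""
    a = bin_peaks(peaks_a, bin_width)
    b = bin_peaks(peaks_b, bin_width)
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_spectra(library, query_peaks, mass_range, methodology):
    """Filter by mass range and methodology, then rank by similarity."""
    low, high = mass_range
    hits = [
        (cosine_similarity(query_peaks, entry["peaks"]), entry["name"])
        for entry in library
        if entry["methodology"] == methodology and low <= entry["mass"] <= high
    ]
    return sorted(hits, reverse=True)  # most similar spectrum first


# Made-up library entries, used only to exercise the sketch.
example_library = [
    {"name": "A", "mass": 100.0, "methodology": "LC-MS",
     "peaks": [(50.0, 100.0), (80.0, 50.0)]},
    {"name": "B", "mass": 100.2, "methodology": "LC-MS",
     "peaks": [(50.0, 10.0), (99.0, 100.0)]},
    {"name": "C", "mass": 100.0, "methodology": "GC-MS",
     "peaks": [(50.0, 100.0), (80.0, 50.0)]},
    {"name": "D", "mass": 300.0, "methodology": "LC-MS",
     "peaks": [(50.0, 100.0), (80.0, 50.0)]},
]

query = [(50.0, 100.0), (80.0, 50.0)]
for score, name in search_spectra(example_library, query, (99.0, 101.0), "LC-MS"):
    print(name, round(score, 3))
```

Filtering on mass and methodology before scoring keeps the similarity calculation limited to spectra that could plausibly match.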
Method Searches
AMOS contains two functions for searching for methods. One is a simple
table that lists all methods in the database (not pictured). This list can be
filtered by several fields including matrix, analyte, and method name,
allowing for quick discovery of methods that cover a known topic.
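Filtering such a list by several fields at once can be sketched in a few lines. The field names (`matrix`, `analyte`, `method_name`), the case-insensitive substring matching and the method entries are assumptions made for this example, not a description of the AMOS interface.

```python
def filter_methods(methods, **criteria):
    """Keep methods whose fields contain every requested value.

    Matching is a case-insensitive substring test; a method lacking a
    requested field never matches a non-empty value for that field.
    """
    def matches(method):
        return all(
            str(wanted).lower() in str(method.get(field_name, "")).lower()
            for field_name, wanted in criteria.items()
        )

    return [method for method in methods if matches(method)]


# Made-up method entries, used only to exercise the sketch.
example_methods = [
    {"method_name": "Hypothetical method A", "matrix": "Drinking water",
     "analyte": "PFOS"},
    {"method_name": "Hypothetical method B", "matrix": "Soil",
     "analyte": "PFOS"},
    {"method_name": "Hypothetical method C", "matrix": "Surface water",
     "analyte": "Atrazine"},
]

for method in filter_methods(example_methods, matrix="water", analyte="pfos"):
    print(method["method_name"])
```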
The other, shown below, is a search for methods containing similar
substances, thereby providing a starting point even for chemicals without
methods. A substance is searched for and, if methods exist for it, they
are returned. If no methods exist for that chemical, AMOS returns all
methods which contain at least one substance with a sufficiently high
Tanimoto structural similarity coefficient. In the example below (see
Fig. 3), the drug was only available starting in 2015, so there has been
relatively little time to develop and publish methods for it.
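The fallback logic described here can be sketched as follows. The Tanimoto coefficient of two fingerprints is the number of shared on-bits divided by the number of on-bits in either, T = |A ∩ B| / |A ∪ B|. Everything else is an assumption for illustration: fingerprints are represented as plain sets of bit positions rather than generated by a cheminformatics toolkit, the 0.5 threshold stands in for the unspecified "sufficiently high" cutoff, and the function and key names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def find_methods(query_id, fingerprints, methods, threshold=0.5):
    """Return methods for the query, else methods with similar substances."""
    exact = [m["name"] for m in methods if query_id in m["substances"]]
    if exact:
        return exact
    # No methods list the query substance: fall back to similarity.
    query_fp = fingerprints[query_id]
    similar = []
    for method in methods:
        best = max(
            (tanimoto(query_fp, fingerprints[s]) for s in method["substances"]),
            default=0.0,
        )
        if best >= threshold:
            similar.append((best, method["name"]))
    # Methods containing the most similar substance come first.
    return [name for _, name in sorted(similar, reverse=True)]


# Made-up fingerprints and methods, used only to exercise the sketch.
example_fingerprints = {
    "X": frozenset({1, 2, 3, 4}),
    "Y": frozenset({1, 2, 3, 5}),
    "Z": frozenset({7, 8}),
}
example_method_list = [
    {"name": "M1", "substances": ["Y"]},
    {"name": "M2", "substances": ["Z"]},
]

print(find_methods("X", example_fingerprints, example_method_list))
```

In this example substance X has no methods of its own, but it shares three of five on-bits with Y (a coefficient of 0.6), so the method covering Y is returned.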
Disclaimers
This tool is currently internal to the US-EPA and still under development.
Plans to release it to the public have not been finalized, but the process
is hoped to be complete by early 2024. The data used in this application
have not been thoroughly reviewed by the EPA, and users need to
exercise judgement in their use of the results.
The views expressed in this poster are those of the authors and do not
necessarily reflect the views or policies of the U.S. EPA.
Figure 1: The list of methods and
LC-MS or GC-MS spectra associated with
perfluorooctanesulfonic acid (PFOS).
Figure 2: A spectral similarity search
result includes the similarity match for
spectra and the list of associated
chemical compounds.
Figure 3: When a search finds no matching methods for a chemical, its
structure is passed to a Tanimoto structural similarity search, which
returns methods containing structurally similar substances.