Machine learning for ecotoxicology and bee pesticide toxicity prediction
Jakub Adamczyk
Agrochemistry, in contrast to medicinal chemistry, is relatively unexplored in terms of rational design and molecular ML. Data science techniques and predictive ML models, exemplified by ADMET QSAR models, have long been used in the pharmaceutical industry. Pesticides are the largest and most economically important group of agrochemicals. To be approved for use, they must meet multiple regulatory requirements, demonstrating safety not only for humans (toxicology), but also for a variety of wildlife organisms, such as honey bees, earthworms, birds, and fish (ecotoxicology). This is in many ways more challenging than the pharmaceutical setting, due to the wide variety of properties that need to be analyzed and predicted. At the same time, pesticides must be strongly toxic, but highly selective, preferably killing only the target organisms, e.g. weeds in the case of herbicides.
The recently created ApisTox (https://doi.org/10.1038/s41597-024-04232-w) is the largest dataset in the literature on the toxicity of pesticides to honey bees (Apis mellifera). It enables broad analyses of agrochemicals and the building of ML models for predicting that toxicity. Creating it required a complex data processing workflow, which utilized freely available data sources, such as the ECOTOX database. In this talk, we will go over the tools and techniques used, so that attendees understand the challenges of such tasks and how to create similar datasets for practical use.
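As a rough illustration of one step such a workflow might contain, the sketch below aggregates repeated LD50 measurements per compound and turns them into a binary toxicity label. Everything in it is an assumption made for illustration: the record schema, the use of the median for aggregation, the function name label_compounds, and the 11 µg/bee cutoff (chosen here to resemble regulatory-style categorization). It is not the actual ApisTox pipeline.

```python
from collections import defaultdict
from statistics import median

# Assumed cutoff in ug/bee: compounds at or below it are labeled toxic.
TOXICITY_THRESHOLD_UG_PER_BEE = 11.0


def label_compounds(records, threshold=TOXICITY_THRESHOLD_UG_PER_BEE):
    """Aggregate LD50 measurements per compound and assign a binary label.

    records: iterable of (compound_id, ld50_ug_per_bee) pairs
             (a hypothetical schema, not the ECOTOX format).
    Returns a dict {compound_id: 1 if toxic else 0}.
    """
    grouped = defaultdict(list)
    for compound_id, ld50 in records:
        # Skip missing or non-physical values instead of guessing them.
        if ld50 is None or ld50 <= 0:
            continue
        grouped[compound_id].append(ld50)

    # The median is one simple way to reconcile conflicting measurements.
    return {
        compound_id: int(median(values) <= threshold)
        for compound_id, values in grouped.items()
    }


# Toy records with made-up values, for illustration only.
toy_records = [
    ("compound_A", 0.05), ("compound_A", 0.08), ("compound_A", 0.04),
    ("compound_B", 100.0), ("compound_B", 85.0),
    ("compound_C", None),  # no usable measurement, so no label
]
labels = label_compounds(toy_records)
```

Compounds without any usable measurement are left out of the result rather than being given a default label, which keeps missing data visible to later steps.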
The ApisTox paper was followed up by additional analyses of molecular datasets and by building ML models (currently under review). In this talk, we will also explore initial results of pesticide toxicity classification and how to approach building molecular ML models for agrochemistry, using e.g. molecular fingerprints, graph kernels, and graph neural networks. The results are highly distinct from those on medicinal chemistry datasets, indicating a lot of unexplored potential.
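To make the fingerprint-based approach more concrete, here is a minimal sketch of a nearest-neighbor classifier using Tanimoto similarity. It assumes fingerprints are already available as sets of on-bit indices; in practice they would be computed from molecular structures with a cheminformatics toolkit. The toy fingerprints, labels, and function names are hypothetical, and this is not the modeling setup used in the work described above.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def predict_1nn(query, train):
    """Return the label of the training fingerprint most similar to query.

    train: list of (fingerprint, label) pairs, where fingerprint is a set.
    """
    _, best_label = max(train, key=lambda item: tanimoto(query, item[0]))
    return best_label


# Toy training data: made-up bit sets, label 1 = toxic, 0 = non-toxic.
train = [
    (frozenset({1, 4, 7, 9}), 1),
    (frozenset({1, 4, 8}), 1),
    (frozenset({2, 3, 5, 6}), 0),
    (frozenset({2, 5, 10}), 0),
]

query = frozenset({1, 4, 9})
prediction = predict_1nn(query, train)
```

A similarity-based baseline like this is a common first reference point before moving to graph kernels or graph neural networks, since it needs no training beyond storing the labeled fingerprints.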