Confidence score calculation for the carcinogenic potency categorization approach (CPCA) predictions for N-nitrosamines

Dr. @chakravarti_suman et al. published a new paper on the reliability of CPCA. According to the paper, of the 265 NDSRIs with established regulatory AI limits in the list, approximately 68% received strong confidence scores for accurate CPCA potency-class predictions. However, 8% received poor confidence scores and lacked sufficient neighbour support due to uncommon structural features. It’s a great job!


Yes, it is a very good and informative research paper by Dr. @chakravarti_suman and team.


Thank you for sharing, Sir!

Dr. Suman Chakravarti’s recent paper on CPCA for N-nitrosamines is noteworthy. Among 265 NDSRIs, 68% showed strong confidence in accurate potency predictions, while 8% faced challenges due to uncommon structural features. Great job!

Doesn’t this need more nuance?
Please correct me if I’m reading it wrongly; it is not an easy paper:

Isn’t it rather about the appropriateness of the categorisation (which structures belong together, based on QSAR only) than about the appropriateness of the potency assignment?
The evaluated confidence in the categorisation by computational means doesn’t focus on deriving an AI for a compound, and the method used to evaluate confidence is relative rather than absolute. If CPCA is to a large extent based on QSAR and local similarities, then rebuilding a categorisation model on the same grounds and obtaining a similar grouping only gives relative confidence in the methodology and in the repeatability of the design, not absolute confidence in the design of CPCA itself (i.e. the risk remains that for a specific NDSRI the real AI is much lower, or much higher, than the categorisation AI, which is only visualised in the 30 small-nitrosamine examples)?

Isn’t this just showing that, based on local similarities around the nitroso group, the bucketing under CPCA has been done appropriately given the available data (the structures that belong together based on CPCA/QSAR are mostly also identified as belonging together by the computational local-similarity evaluation), and visualising room for categorisation improvement on structural grounds?

It doesn’t say anything about the right AI being assigned to the bucket (i.e. the AI of the category being suitable as the real AI for most of the nitrosamines in the bucket); verification of confidence in the CPCA-based AI against TD50 data is only discussed for a smaller subset of 30 small nitrosamines (whereas CPCA as a model is claimed by regulators to be verified with 84 nitrosamines).

It does remain interesting, though, that a low calculated confidence can be indicative of a high risk that the CPCA-based AI doesn’t match the real (possibly unknown) AI (based on the verification on 30 nitrosamines, and up for further verification in the future):
“When applied to a dataset of 30 small nitrosamines with known carcinogenicity potencies, our approach revealed that cases with lower confidence often corresponded to larger discrepancies between CPCA predictions and experimentally observed potencies.”
So the calculation can be helpful to prioritise areas of CPCA optimisation, or in vivo or in vitro testing, to increase the knowledge span.
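To make the neighbour-support idea discussed above concrete, here is a minimal, purely illustrative sketch of a similarity-based confidence score: the fraction of a query's nearest structural neighbours (by Tanimoto similarity) that share its predicted category. This is not the paper's actual method; the fingerprints, feature names, categories, and thresholds below are all hypothetical.

```python
# Illustrative neighbour-support confidence for a CPCA-style category
# prediction. Fingerprints are modelled as plain Python feature sets;
# all data, names and thresholds here are hypothetical.

def tanimoto(a, b):
    """Tanimoto similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def neighbour_confidence(query_fp, predicted_cat, training, k=5, min_sim=0.3):
    """Confidence = fraction of the k most similar training compounds
    (above min_sim) whose known category matches the prediction.
    A low neighbour count itself signals poor support, i.e. uncommon
    structural features. Returns (confidence, n_neighbours)."""
    scored = sorted(
        ((tanimoto(query_fp, fp), cat) for fp, cat in training),
        reverse=True,
    )
    neighbours = [(s, c) for s, c in scored[:k] if s >= min_sim]
    if not neighbours:
        return 0.0, 0
    agree = sum(1 for _, c in neighbours if c == predicted_cat)
    return agree / len(neighbours), len(neighbours)

# Hypothetical training set: (fingerprint, known potency category)
training = [
    (frozenset({"N-NO", "alpha-CH2", "n-alkyl"}), 1),
    (frozenset({"N-NO", "alpha-CH2", "branched"}), 1),
    (frozenset({"N-NO", "alpha-CH", "aryl"}), 3),
    (frozenset({"N-NO", "alpha-CH2", "n-alkyl", "ether"}), 1),
]

query = frozenset({"N-NO", "alpha-CH2", "n-alkyl", "hydroxyl"})
conf, n = neighbour_confidence(query, predicted_cat=1, training=training)
```

Note that a score like this is exactly what the discussion above warns about: it can only report agreement within the existing knowledge span, so high confidence can still coexist with a knowledge gap the fingerprints don't capture.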

However, the reverse is not necessarily true, and high confidence can also be linked to a QSAR knowledge gap that isn’t recognised by the method.
A high confidence in CPCA calculated for structure X that originates from a false sense of security, i.e. from abstracting away important factors contributing to potency, is still possible, and it does not rule out that the real AI is significantly different from the CPCA one.
Elements like molecular-weight considerations are not implemented in CPCA and, as far as I understand, unfortunately also not in this QSAR approach (a missed chance?). For NDSRIs, which are often larger structures, the fit into enzyme pockets can become more critical, while other groups in the structure can contribute to (unknown) detoxification mechanisms. These limitations of QSAR approaches (originating from the limited TD50 data available on NDSRIs) are present both in CPCA and in the computational confidence verification of CPCA, yet theoretically they strongly influence the confidence in CPCA, especially if confidence is meant to imply not only that the real AI is not lower, but also that the CPCA AI is reasonably close to the real AI.
Or, put differently: I still expect many real AIs to differ from CPCA AIs, as can also be seen from published non-CPCA-based AIs, although the present study reconfirms that CPCA as it stands is close to the best possible job with the available data.

Note that European regulators have also used the remaining theoretical risk that the real AI is lower than the CPCA AI to motivate the AI capping mechanism in Q&A 22.


Thanks for the comment.

The primary goal of our paper is to highlight the significance of incorporating confidence scores in predictions, including those made by CPCA. We understand that opinions may vary regarding the particular methodology we’ve adopted for this purpose. However, I think that acknowledging the uncertainties associated with specific nitrosamine queries, as is customary for other (Q)SAR predictions, is necessary.