Adversarial attack


An adversarial attack is a type of model attack that changes a model's expected predictions by modifying its input data in a deliberate way. To help you prepare for such attacks, H2O Model Security simulates them by systematically altering the input data sent to your deployed model (in H2O MLOps) when scoring new data.

Adversarial attacks in production (and the need for H2O Model Security)

Because machine learning (ML) models are typically nonlinear and rely on high-degree interactions to increase accuracy, some combination of input values can always produce unexpected model outputs. As a result, adversarial attacks are common in production: an attacker feeds your model strange or subtle combinations of data that cause it to return the prediction the attacker wants, without the attacker ever needing access to your model's internals.
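To make the idea concrete, here is a minimal sketch (not part of H2O Model Security) of how a small, targeted input change can flip a model's decision. The `score` function and its point thresholds are entirely hypothetical:

```python
def score(income, num_accounts):
    """Hypothetical loan model: approve (1) when points reach 45, else deny (0)."""
    points = income // 1000 + 10 * num_accounts
    return 1 if points >= 45 else 0

# An applicant with income 30000 and 1 account scores 40 points -> denied.
print(score(30000, 1))  # 0

# Claiming one extra account raises the score to 50 points -> approved.
# A single subtle input change flips the model's output.
print(score(30000, 2))  # 1
```

The attacker never needs to see the scoring rule; trying small variations of the input is enough to discover a change that produces the desired prediction.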

With the above in mind, H2O Model Security helps by letting you trick your own model first: you observe its outcomes on many different combinations of input data values. In particular, H2O Model Security lets you see how your model reacts to an adversarial attack, so you can address weaknesses before the model reaches production.
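The general idea of probing a model with many input combinations can be sketched as follows. This is an illustration only, assuming the same hypothetical `score` function as above; it is not H2O Model Security's API:

```python
import itertools

def score(income, num_accounts):
    """Hypothetical loan model: approve (1) when points reach 45, else deny (0)."""
    points = income // 1000 + 10 * num_accounts
    return 1 if points >= 45 else 0

baseline = score(30000, 1)  # 0: denied

# Systematically perturb the inputs and record every combination
# that changes the model's decision relative to the baseline.
flips = [
    (d_income, d_accounts)
    for d_income, d_accounts in itertools.product(range(0, 20001, 5000), range(0, 3))
    if score(30000 + d_income, 1 + d_accounts) != baseline
]

# `flips` now lists the input changes an attacker could use, e.g. (5000, 0):
# an income bump of 5000 alone is enough to flip the decision.
print(flips)
```

Running this kind of sweep against your own model before deployment reveals which small input changes move its predictions, so you can harden the model (or add monitoring) before an attacker finds the same combinations.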