Model inversion attack


A model inversion attack is a type of model attack in which an attacker trains a surrogate model to imitate the original model. The surrogate model learns by sending inputs to the original model and fitting itself to the predictions that come back. Through a model inversion attack, H2O Model Security therefore highlights how likely it is that a surrogate model could come to resemble your original model.

Model inversion attacks in production (and the need for H2O Model Security)

Because of weak security or a distributed attack on your production model API, attackers can simulate data, submit it, receive predictions, and train a surrogate model that maps their simulated data to your model's predictions. This surrogate can:

  • Expose your proprietary business logic, which is known as "model stealing"
  • Reveal sensitive information about your training data
  • Be the first stage of a membership inference attack
  • Be a test-bed for adversarial example attacks
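
The simulate, submit, receive, and train loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how H2O Model Security runs the attack: `original_model` is a hypothetical stand-in for a production model API, the feature range and the logistic-regression surrogate are arbitrary choices, and all function names are invented for this example.

```python
import math
import random

# Hypothetical stand-in for a production model API. The attacker only sees
# its predictions, never this decision rule.
def original_model(x1, x2):
    return 1 if 2.0 * x1 - 1.0 * x2 + 0.5 > 0 else 0

def simulate_data(n, rng):
    """Attacker step 1: simulate inputs (uniform over an assumed feature range)."""
    return [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(n)]

def train_surrogate(points, labels, epochs=200, lr=0.1):
    """Attacker step 3: fit a logistic-regression surrogate with plain SGD."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            # Clamp the logit so math.exp cannot overflow.
            z = max(-30.0, min(30.0, w1 * x1 + w2 * x2 + b))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            w1 -= lr * err * x1
            w2 -= lr * err * x2
            b -= lr * err
    return w1, w2, b

def surrogate_predict(params, x1, x2):
    w1, w2, b = params
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

rng = random.Random(0)
queries = simulate_data(500, rng)
# Attacker step 2: submit the simulated data and collect the predictions.
labels = [original_model(x1, x2) for x1, x2 in queries]
params = train_surrogate(queries, labels)

# Measure how often the surrogate agrees with the original on unseen inputs.
fresh = simulate_data(1000, rng)
matches = sum(
    surrogate_predict(params, x1, x2) == original_model(x1, x2)
    for x1, x2 in fresh
)
fidelity = matches / len(fresh)
print(f"surrogate agrees with the original on {fidelity:.1%} of fresh inputs")
```

The agreement rate on fresh inputs is one simple way to express how closely a surrogate resembles the original. A high value suggests that the model's logic can be copied from its predictions alone.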

With the above in mind, H2O Model Security helps by highlighting the areas of your model that make it easier for a surrogate model to copy it, so that you can protect your model's proprietary logic.