Membership inference attack


A membership inference attack is a type of model attack that aims to discover the training data of the original model (deployed in H2O MLOps). It is a two-stage attack that creates two surrogate models, referred to as the first-level and second-level surrogate models. The first stage is a model inversion attack, which creates the first-level surrogate model. Accordingly, you need to execute a model inversion attack first.

The first-level surrogate model produced by the model inversion attack is then used to create a second-level surrogate model. This second-level surrogate model is trained to discriminate between rows that are in the first-level surrogate's training data and rows that are not. As a result, the second-level surrogate model can indicate whether a given row was in the original model's training data.
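
The data flow of the two stages can be sketched in plain Python. This is a toy illustration under stated assumptions, not how H2O Model Security implements the attack: `query_endpoint` is a hypothetical stand-in for a deployed scoring endpoint, the first-level surrogate is a simple kernel-weighted average, the second-level surrogate is a single-threshold rule, and the "disagreement" feature is an illustrative choice.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def sample_row():
    """Draw a synthetic two-feature row from an assumed input range."""
    return (rng.random(), rng.random())


def query_endpoint(row):
    """Hypothetical stand-in for the deployed model's scoring endpoint."""
    return 1.0 / (1.0 + math.exp(-8.0 * (row[0] + row[1] - 1.0)))


def fit_surrogate(rows, scores, bandwidth=0.02):
    """First-level surrogate: kernel-weighted average of the queried scores."""
    def predict(row):
        num = den = 0.0
        for r, s in zip(rows, scores):
            d2 = (r[0] - row[0]) ** 2 + (r[1] - row[1]) ** 2
            w = math.exp(-d2 / (2.0 * bandwidth ** 2))
            num += w * s
            den += w
        return num / den if den > 0.0 else 0.5
    return predict


def fit_membership_stump(member_feats, nonmember_feats):
    """Second-level surrogate: one threshold, 'member' if feature <= threshold."""
    total = len(member_feats) + len(nonmember_feats)
    best_t, best_acc = 0.0, 0.0
    for t in sorted(set(member_feats + nonmember_feats)):
        hits = sum(f <= t for f in member_feats) + sum(f > t for f in nonmember_feats)
        if hits / total > best_acc:
            best_t, best_acc = t, hits / total
    return best_t, best_acc


# Stage 1 (model inversion): query the endpoint and fit the first-level surrogate.
train_rows = [sample_row() for _ in range(200)]
train_scores = [query_endpoint(r) for r in train_rows]
surrogate = fit_surrogate(train_rows, train_scores)


def disagreement(row):
    """Feature for stage 2: gap between surrogate and endpoint on one row."""
    return abs(surrogate(row) - query_endpoint(row))


# Stage 2: label rows as inside (members) or outside the surrogate's training
# data, then fit the second-level surrogate on the disagreement feature.
holdout_rows = [sample_row() for _ in range(200)]
member_feats = [disagreement(r) for r in train_rows]
nonmember_feats = [disagreement(r) for r in holdout_rows]
threshold, attack_accuracy = fit_membership_stump(member_feats, nonmember_feats)


def looks_like_member(row):
    """Second-level prediction: True if the row resembles a surrogate training row."""
    return disagreement(row) <= threshold


print(f"threshold={threshold:.4f}, accuracy on attack training data={attack_accuracy:.2f}")
```

The reported accuracy is measured on the same rows used to pick the threshold, so it is optimistic. A real assessment would score the second-level surrogate on rows it has not seen.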

Membership inference attacks in production (and the need for H2O Model Security)

If your model API or other model endpoints lack adequate security or are subjected to a distributed attack, hackers can initiate a membership inference attack that can reveal the original model's training data. For example, simply knowing whether a person was in the training dataset can violate individual or group privacy. Furthermore, when executed to its fullest extent, a membership inference attack can allow a bad actor to rebuild your sensitive training data.

H2O Model Security can help by highlighting areas of the model that increase the probability of a successful membership inference attack. In particular, it can highlight how severe the impact of a membership inference attack on your model in production could be.