CROP-leaderboard

A standardized benchmark for certified robustness of RL algorithms

The goal of **CROP** is to systematically certify the robustness of different RL algorithms under two certification criteria: per-state action stability and a lower bound on the cumulative reward. To this end, we propose three novel certification methods: LoAct, GRe, and LoRe.

In **CROP-leaderboard**, we present the certification results in four RL environments under two certification criteria via three certification methods. Notably, we compare our certification with empirical results under attack to show the tightness of our certification.

The related paper can be found here.

Leaderboard: CartPole-v0 (LoAct)

Robustness certification for per-state action in terms of the certified radius *r* at all time steps. Each column corresponds to one smoothing variance σ and each row corresponds to one RL algorithm. For each figure, the x-axis is the time step *t*, and the y-axis is the certified radius *r_{t}*. The shaded area represents the standard deviation. The benign performance of the locally smoothed policy under different smoothing variances σ can be found here.
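To make the notion of a locally smoothed policy concrete, the following is a minimal sketch of how an action could be chosen from Q-values averaged over Gaussian state noise. It is an illustration under stated assumptions, not the CROP implementation: `q_fn`, `toy_q`, `smoothed_q_values`, and `smoothed_action` are hypothetical names, σ is treated here as the standard deviation of the noise, and the sketch does not compute the certified radius itself.

```python
import numpy as np


def smoothed_q_values(q_fn, state, sigma, n_samples=1000, rng=None):
    """Monte Carlo estimate of Q-values averaged over Gaussian state noise.

    q_fn is a hypothetical callable mapping a state array to one Q-value per action.
    sigma is assumed to be the standard deviation of the isotropic Gaussian noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    state = np.asarray(state, dtype=float)
    # Draw n_samples noise vectors with the same shape as the state.
    noise = rng.normal(0.0, sigma, size=(n_samples,) + state.shape)
    q_samples = np.stack([np.asarray(q_fn(state + eps), dtype=float) for eps in noise])
    return q_samples.mean(axis=0)


def smoothed_action(q_fn, state, sigma, n_samples=1000, rng=None):
    """Pick the action with the largest smoothed Q-value."""
    return int(np.argmax(smoothed_q_values(q_fn, state, sigma, n_samples, rng)))


def toy_q(state):
    """Hypothetical two-action Q-function used only for this illustration."""
    return np.array([state.sum(), -state.sum()])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(smoothed_action(toy_q, [1.0, 1.0], sigma=0.1, rng=rng))
```

A larger σ averages over a wider neighbourhood of the state, which is why the leaderboard reports one column per smoothing variance.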

Leaderboard: CartPole-v0 (GRe-mean)

Robustness certification for cumulative reward in terms of the *expectation bound* *J_{E}*. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

Leaderboard: CartPole-v0 (GRe-median)

Robustness certification for cumulative reward in terms of the *percentile bound* *J_{P}* (*p* = 50%). Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.
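The difference between the mean and median variants of GRe can be illustrated with plain sample statistics. The sketch below assumes, as the names suggest, that the expectation and percentile criteria refer to the mean and the *p*-th percentile of the cumulative reward. It computes only empirical statistics of sampled rewards, not the certified bounds, and `reward_statistics` and the sample values are hypothetical.

```python
import numpy as np


def reward_statistics(returns, p=50.0):
    """Return the sample mean and the p-th percentile of cumulative rewards.

    These are empirical statistics only; they are not certified bounds.
    """
    returns = np.asarray(returns, dtype=float)
    if returns.size == 0:
        raise ValueError("returns must contain at least one value")
    return float(returns.mean()), float(np.percentile(returns, p))


if __name__ == "__main__":
    # Hypothetical cumulative rewards from five rollouts, one of them an outlier.
    mean_reward, median_reward = reward_statistics([1, 2, 3, 4, 100])
    print(mean_reward, median_reward)
```

With these made-up values the single outlier pulls the mean far above the median, which is one reason a percentile-based criterion can behave differently from an expectation-based one.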

Leaderboard: CartPole-v0 (LoRe)

Robustness certification for cumulative reward in terms of the *absolute lower bound* *J*. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.
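Comparing the solid and dashed lines amounts to measuring the gap between a certified lower bound and the reward observed under attack. A minimal sketch of that comparison follows; `tightness_gap` and the numbers are hypothetical, and the inputs are assumed to be aligned arrays with one entry per attack strength.

```python
import numpy as np


def tightness_gap(certified_bound, empirical_reward):
    """Compare a certified lower bound with the empirical reward under attack.

    Returns the element-wise gap (empirical minus certified) and a flag that is
    True when the certified bound never exceeds the empirical reward.
    """
    certified = np.asarray(certified_bound, dtype=float)
    empirical = np.asarray(empirical_reward, dtype=float)
    if certified.shape != empirical.shape:
        raise ValueError("inputs must have the same shape")
    gap = empirical - certified
    return gap, bool(np.all(gap >= 0))


if __name__ == "__main__":
    # Hypothetical values, one entry per attack strength.
    gap, consistent = tightness_gap([10.0, 8.0, 5.0], [12.0, 9.0, 5.0])
    print(gap, consistent)
```

A small non-negative gap indicates a tight certificate, while a negative gap would mean the attack achieved a reward below the claimed lower bound.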

PongNoFrameskip-v4

Leaderboard: PongNoFrameskip-v4 (LoAct)

Robustness certification for per-state action in terms of the certified radius *r*, with the number of time steps set to 500. Each column corresponds to one smoothing variance σ and each row corresponds to one RL algorithm. For each figure, the x-axis is the time step *t*, and the y-axis is the certified radius *r_{t}*. The shaded area represents the standard deviation. The benign performance of the locally smoothed policy under different smoothing variances σ can be found here.

Leaderboard: PongNoFrameskip-v4 (GRe-mean)

Robustness certification for cumulative reward in terms of the *expectation bound* *J_{E}*, with the number of time steps set to 500. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

Leaderboard: PongNoFrameskip-v4 (GRe-median)

Robustness certification for cumulative reward in terms of the *percentile bound* *J_{P}* (*p* = 50%), with the number of time steps set to 500. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

Leaderboard: PongNoFrameskip-v4 (LoRe)

Robustness certification for cumulative reward in terms of the *absolute lower bound* *J*, with the number of time steps set to 200. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

FreewayNoFrameskip-v4

Leaderboard: FreewayNoFrameskip-v4 (LoAct)

Robustness certification for per-state action in terms of the certified radius *r*, with the number of time steps set to 500. Each column corresponds to one smoothing variance σ and each row corresponds to one RL algorithm. For each figure, the x-axis is the time step *t*, and the y-axis is the certified radius *r_{t}*. The shaded area represents the standard deviation. The benign performance of the locally smoothed policy under different smoothing variances σ can be found here.

Leaderboard: FreewayNoFrameskip-v4 (GRe-mean)

Robustness certification for cumulative reward in terms of the *expectation bound* *J_{E}*, with the number of time steps set to 500. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

Leaderboard: FreewayNoFrameskip-v4 (GRe-median)

Robustness certification for cumulative reward in terms of the *percentile bound* *J_{P}* (*p* = 50%), with the number of time steps set to 500. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

Leaderboard: FreewayNoFrameskip-v4 (LoRe)

Robustness certification for cumulative reward in terms of the *absolute lower bound* *J*, with the number of time steps set to 200. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

highway-fast-v0

Leaderboard: highway-fast-v0 (LoAct)

Robustness certification for per-state action in terms of the certified radius *r*, with the number of time steps set to 30. Each column corresponds to one smoothing variance σ and each row corresponds to one RL algorithm. For each figure, the x-axis is the time step *t*, and the y-axis is the certified radius *r_{t}*. The shaded area represents the standard deviation. The benign performance of the locally smoothed policy under different smoothing variances σ can be found here.

Leaderboard: highway-fast-v0 (GRe-mean)

Robustness certification for cumulative reward in terms of the *expectation bound* *J_{E}*, with the number of time steps set to 30. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

Leaderboard: highway-fast-v0 (GRe-median)

Robustness certification for cumulative reward in terms of the *percentile bound* *J_{P}* (*p* = 50%), with the number of time steps set to 30. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

Leaderboard: highway-fast-v0 (LoRe)

Robustness certification for cumulative reward in terms of the *absolute lower bound* *J*, with the number of time steps set to 30. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.