Invariant Causal Imitation Learning for Generalizable Policies

Ioana Bica, Daniel Jarrett, Mihaela van der Schaar

Neural Information Processing Systems (NeurIPS) 2021


The code was implemented in Python 3.6 and requires the following packages:

  • gym==0.17.2

  • numpy==1.18.2

  • pandas==1.0.4

  • tensorflow==1.15.0

  • torch==1.6.0

  • tqdm==4.32.1

  • scipy==1.1.0

  • scikit-learn==0.22.2

  • stable-baselines==2.10.1
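The pins above can be checked against the active environment with a short script. This is a minimal sketch, not part of the repository. `check_pins` and `installed_version` are hypothetical helper names. The lookup relies on `importlib.metadata` (Python 3.8+), so under the Python 3.6 interpreter used by the authors the `importlib_metadata` backport would need to be installed.

```python
"""Sketch: compare installed package versions against the pins listed above."""
from typing import Callable, Dict, Optional

try:  # Python 3.8+
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.6/3.7: requires the `importlib_metadata` backport
    from importlib_metadata import PackageNotFoundError, version

# Pins copied from the dependency list in this README.
PINS: Dict[str, str] = {
    "gym": "0.17.2",
    "numpy": "1.18.2",
    "pandas": "1.0.4",
    "tensorflow": "1.15.0",
    "torch": "1.6.0",
    "tqdm": "4.32.1",
    "scipy": "1.1.0",
    "scikit-learn": "0.22.2",
    "stable-baselines": "2.10.1",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_pins(pins: Dict[str, str],
               get_version: Callable[[str], Optional[str]] = installed_version) -> Dict[str, str]:
    """Return {package: problem} for every pin that is missing or mismatched."""
    problems = {}
    for name, wanted in pins.items():
        found = get_version(name)
        if found is None:
            problems[name] = f"not installed (want {wanted})"
        elif found != wanted:
            problems[name] = f"found {found}, want {wanted}"
    return problems


if __name__ == "__main__":
    for pkg, msg in check_pins(PINS).items():
        print(f"{pkg}: {msg}")
```

Running it prints one line per missing or mismatched package. No output means every pin matches.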

Running and evaluating the model:

The control tasks used for the experiments are from OpenAI Gym [1]. Each control task is associated with a true reward
function (unknown to the imitation algorithm). In each case, the “expert” demonstrator can be obtained by using a
pre-trained, hyperparameter-optimized agent from the RL Baselines Zoo [2], which is built on Stable Baselines [3].
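Expert demonstrations of this kind can be gathered by rolling out the pre-trained agent and recording its observation-action pairs. The sketch below assumes the gym 0.17 interface, where `reset()` returns an observation and `step(action)` returns `(obs, reward, done, info)`. `collect_trajectories` is a hypothetical helper. This README does not describe how the repository actually generated its demonstration files.

```python
"""Sketch: roll out an expert policy in a gym-style environment to collect demonstrations."""
from typing import Any, Callable, List, Tuple

Transition = Tuple[Any, Any]  # (observation, action)


def collect_trajectories(env, policy: Callable[[Any], Any],
                         num_trajectories: int) -> Tuple[List[List[Transition]], List[float]]:
    """Run `policy` for `num_trajectories` episodes.

    Assumes the gym 0.17 API: reset() -> obs and step(a) -> (obs, reward, done, info).
    Returns the (observation, action) pairs of every episode and each episode's return.
    """
    trajectories, returns = [], []
    for _ in range(num_trajectories):
        obs, done, episode, total = env.reset(), False, [], 0.0
        while not done:
            action = policy(obs)
            episode.append((obs, action))  # record the state the expert acted in
            obs, reward, done, _ = env.step(action)
            total += reward
        trajectories.append(episode)
        returns.append(total)
    return trajectories, returns
```

With Stable Baselines, a saved zoo agent could be wrapped as `policy = lambda obs: model.predict(obs, deterministic=True)[0]`, where `model` comes from the `load` class method of the algorithm the agent was trained with (for example `DQN.load(...)`), and `env = gym.make('CartPole-v1')`. The model file path is left open because this README does not give it.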

In this implementation, we provide expert demonstrations for CartPole-v1, collected in 2 environments, in ‘volume/CartPole-v1’.
Note that the code in ‘contrib/baselines_zoo’ was taken from [2].

To train and evaluate ICIL on CartPole-v1, run the following command with the chosen command line arguments. For reference,
the expert performance is 500.

Options:
   --env                  # Environment name.
   --num_trajectories     # Number of expert trajectories used for training the imitation learning algorithm.
   --trial                # Trial number.
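For readers adapting the script, the three options could be parsed with Python's standard `argparse` module, as sketched below. This is an illustration, not the repository's own parser. `build_parser` is a hypothetical name, and the defaults are hypothetical, simply mirroring the example command.

```python
"""Sketch: a command-line parser accepting the three options listed above."""
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train and evaluate ICIL (sketch).")
    # Defaults are illustrative only; they copy the values from the example usage.
    parser.add_argument("--env", type=str, default="CartPole-v1",
                        help="Environment name.")
    parser.add_argument("--num_trajectories", type=int, default=20,
                        help="Number of expert trajectories used for training.")
    parser.add_argument("--trial", type=int, default=0,
                        help="Trial number.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.env, args.num_trajectories, args.trial)
```

`argparse` accepts both `--trial=0` and `--trial 0`, and `type=int` rejects non-integer values for `--num_trajectories` and `--trial`.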


Outputs:

  • Average reward over 10 repetitions of running ICIL.
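The reported figure is a mean over repeated runs. A minimal sketch of summarising such runs follows, assuming one return value per repetition. `summarise_returns` is a hypothetical helper. The sample standard deviation and the ratio to the expert score of 500 are additions for illustration; this README does not say the script reports them.

```python
"""Sketch: summarise the returns of repeated evaluation runs."""
import statistics
from typing import Dict, Sequence

EXPERT_RETURN = 500.0  # expert performance on CartPole-v1 quoted in this README


def summarise_returns(returns: Sequence[float],
                      expert_return: float = EXPERT_RETURN) -> Dict[str, float]:
    """Mean, sample standard deviation and mean-as-fraction-of-expert for a list of returns."""
    if len(returns) < 2:
        raise ValueError("need at least two repetitions to estimate a standard deviation")
    mean = statistics.mean(returns)
    return {
        "mean": mean,
        "std": statistics.stdev(returns),
        "fraction_of_expert": mean / expert_return,
    }
```

For example, eight repetitions scoring 500 plus one at 480 and one at 470 give a mean of 495, i.e. 99% of the expert score.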

Example usage

python testing/  --env='CartPole-v1' --num_trajectories=20 --trial=0 


If you use this code, please cite:

  title={Invariant Causal Imitation Learning for Generalizable Policies},
  author={Bica, Ioana and Jarrett, Daniel and van der Schaar, Mihaela},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021}


