How to make TensorFlow run faster

by rmoehn, 28th Nov 2019



I quickly put this together for a fellow AI alignment researcher/engineer, so I thought I'd share it here.

The main recommendations are from Intel:

I implement these recommendations by setting the following in the code:

tf.config.threading.set_intra_op_parallelism_threads(6)  # Number of physical cores.

I also set the environment variables in the PyCharm run configuration and in the Python Console settings.
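The post doesn't list the environment variables themselves. As a sketch, Intel's commonly published OpenMP/MKL-DNN guidance looks like the following; the variable values are illustrative and may not match the exact recommendations the Intel link gives, so tune them for your machine:

```python
import os

# These must be set before TensorFlow is imported, or they have no effect.
os.environ["OMP_NUM_THREADS"] = "6"  # number of physical cores
os.environ["KMP_BLOCKTIME"] = "0"    # how long threads spin-wait after work, in ms
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"  # pin threads to cores
os.environ["KMP_SETTINGS"] = "TRUE"  # print the effective settings at startup
```

Setting them in the PyCharm run configuration achieves the same thing, with the advantage that they are guaranteed to be set before the interpreter starts.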

As a result, the per-epoch training time dropped by 20 % with a small multi-layer perceptron on Fashion MNIST.

Another important point is to use a TensorFlow binary that is compiled to use all of your CPU's instruction set extensions. I.e., at startup it should log something like this:

2019-11-27 17:26:42.782399: I tensorflow/core/platform/] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA

And not something like this:

2019-11-28 09:18:26.315191: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

One way to achieve this is to install TensorFlow with Conda instead of Pip.
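A minimal sketch of the Conda route (the environment name is arbitrary; the exact TensorFlow version available from the Anaconda channel varies):

```shell
# Create a fresh environment and install TensorFlow from the Anaconda
# channel, whose builds are typically compiled against Intel MKL-DNN.
conda create -n tf python=3.7
conda activate tf
conda install tensorflow
```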

Here is more information about that:

If you're on a Mac and the Conda-TensorFlow crashes with an OMP error, here's the solution:
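The linked solution isn't reproduced here, but a commonly cited workaround for the macOS "OMP: Error #15: Initializing libiomp5.dylib" crash (an assumption on my part, not necessarily what the link recommends) is:

```python
import os

# Allow duplicate OpenMP runtimes to be loaded. This must run before
# TensorFlow is imported. Caveat: it silences the error rather than fixing
# the underlying library conflict, and can in principle cause crashes or
# wrong results, so a clean environment is the better long-term fix.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
```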