I quickly put this together for a fellow AI alignment researcher/engineer, so I thought I'd share it here.

The main recommendations are from Intel: https://software.intel.com/en-us/articles/maximize-tensorflow-performance-on-cpu-considerations-and-recommendations-for-inference

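I implement these recommendations by setting the following in the code, before TensorFlow runs any ops:

import tensorflow as tf
tf.config.threading.set_inter_op_parallelism_threads(2)  # Threads for running independent ops concurrently.
tf.config.threading.set_intra_op_parallelism_threads(6)  # Number of physical cores.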

I also set the environment variables recommended in the Intel article, in both the PyCharm run configuration and the Python Console settings.
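
If you don't use PyCharm, the same variables can be set from Python. Below is a minimal sketch, not the exact settings from this post: the variable names are the OpenMP/Intel runtime ones (OMP_NUM_THREADS, KMP_BLOCKTIME, KMP_AFFINITY, KMP_SETTINGS), while the values, the `NUM_PHYSICAL_CORES` constant and the `apply_env` helper are assumptions for illustration. Check the values against the Intel article and tune them for your machine.

```python
import os

# Assumption: 6 physical cores, matching the intra-op thread count used above.
NUM_PHYSICAL_CORES = 6

# Illustrative starting values only; verify against the Intel article.
INTEL_ENV = {
    "OMP_NUM_THREADS": str(NUM_PHYSICAL_CORES),
    "KMP_BLOCKTIME": "1",
    "KMP_AFFINITY": "granularity=fine,verbose,compact,1,0",
    "KMP_SETTINGS": "1",
}


def apply_env(env=INTEL_ENV):
    """Set each variable unless it is already defined; return the effective values."""
    for name, value in env.items():
        os.environ.setdefault(name, value)
    return {name: os.environ[name] for name in env}


# Run this before `import tensorflow`, since the OpenMP runtime
# typically reads these variables when it initializes.
effective = apply_env()
print(effective)
```

Using `setdefault` means values already set in the shell or the IDE take precedence over the defaults in the script.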

As a result, the per-epoch training time of a small multi-layer perceptron on Fashion MNIST dropped by 20%.

Another important point is to use a TensorFlow binary that takes advantage of all the instruction sets your CPU supports. That is, it should log something like this:

2019-11-27 17:26:42.782399: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA

And not something like this:

2019-11-28 09:18:26.315191: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

One way to achieve this is to install TensorFlow with Conda instead of Pip.

Here is more information about that: https://software.intel.com/en-us/articles/intel-optimization-for-tensorflow-installation-guide
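
To see which of the instruction sets named in those log messages your CPU actually advertises, here is a rough sketch. It assumes an x86 Linux machine, where /proc/cpuinfo has a "flags" line; macOS has no such file. The helper names `parse_cpu_flags` and `report` are made up for this example.

```python
from pathlib import Path

# Instruction sets from the log messages above, spelled as in /proc/cpuinfo.
WANTED = ("sse4_1", "sse4_2", "avx", "avx2", "fma")


def parse_cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report(flags, wanted=WANTED):
    """Map each wanted instruction set to whether the CPU advertises it."""
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux only
    if cpuinfo.exists():
        print(report(parse_cpu_flags(cpuinfo.read_text())))
    else:
        print("No /proc/cpuinfo on this system.")
```

If the CPU reports an instruction set that the TensorFlow startup log lists as unused, the installed binary is leaving performance on the table.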

If you're on a Mac and the Conda build of TensorFlow crashes with an OMP error, here's the solution: https://stackoverflow.com/a/53692707/5091738
