I quickly put this together for a fellow AI alignment researcher/engineer, so I thought I'd share it here.
The main recommendations are from Intel: https://software.intel.com/en-us/articles/maximize-tensorflow-performance-on-cpu-considerations-and-recommendations-for-inference
I implement these recommendations by setting the following in the code:
import tensorflow as tf

tf.config.threading.set_inter_op_parallelism_threads(2)
tf.config.threading.set_intra_op_parallelism_threads(6)  # Number of physical cores.
And setting the environment variables in the PyCharm run configuration and Python Console settings.
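If you'd rather not touch the PyCharm settings, the environment variables from Intel's article can also be set in Python itself, before TensorFlow is imported. The values below are the ones I recall from the Intel article; treat them as a starting point and tune them for your machine:

```python
import os

# Set these BEFORE importing TensorFlow so the OpenMP runtime picks them up.
os.environ["OMP_NUM_THREADS"] = "6"   # number of physical cores
os.environ["KMP_BLOCKTIME"] = "0"     # ms a thread spins after a parallel region before sleeping
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"  # pin threads to cores
os.environ["KMP_SETTINGS"] = "1"      # print the effective OpenMP settings at startup

import tensorflow as tf  # noqa: E402  (import deliberately after the env setup)
```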
As a result, the per-epoch training time for a small multi-layer perceptron on Fashion MNIST dropped by 20%.
Another important point is to use a TensorFlow binary that is compiled for all of your CPU's capabilities. That is, on startup it should log something like this:
2019-11-27 17:26:42.782399: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
And not something like this:
2019-11-28 09:18:26.315191: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
One way to achieve this is to install TensorFlow with Conda instead of Pip.
Here is more information about that: https://software.intel.com/en-us/articles/intel-optimization-for-tensorflow-installation-guide
If you're on a Mac and the Conda build of TensorFlow crashes with an OMP error, here's the solution: https://stackoverflow.com/a/53692707/5091738
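For reference, if I remember that Stack Overflow answer correctly, the workaround is to let duplicate OpenMP runtimes coexist via an environment variable. It's a workaround rather than a real fix (see the linked answer for the caveats), and again it must be set before TensorFlow is imported:

```python
import os

# Work around the macOS crash
#   "OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib
#    already initialized."
# by allowing duplicate OpenMP runtimes to be loaded. Set this BEFORE
# importing TensorFlow. Workaround, not a fix -- see the linked answer.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
```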