crYOLO is very slow and does not use the GPUs to their full potential
So far, this problem has only been reported for CentOS/RockyLinux, but other distributions might be affected as well.
The problem is caused by problematic entries in
LD_LIBRARY_PATH. Please check whether your path contains many entries, for example with echo $LD_LIBRARY_PATH.
CUDA entries from other software packages are especially problematic. If
LD_LIBRARY_PATH is not empty, check whether the following prefix fixes your problem. Put it in front of
your crYOLO command, for example:
LD_LIBRARY_PATH='' cryolo_train.py -c your_config.json -w 5
If that works, you can make the fix permanent. The following instructions set your
LD_LIBRARY_PATH to an empty value when the cryolo environment is activated and restore the old
LD_LIBRARY_PATH once the environment is deactivated. I assume that your crYOLO environment is named
cryolo. Do the following:
conda activate cryolo
conda env config vars set LD_LIBRARY_PATH=
conda deactivate
conda activate cryolo
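If you now run echo $LD_LIBRARY_PATH inside the activated environment, the output should be empty.
If you deactivate the environment with conda deactivate and run it again,
you should see all your libraries again.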
Now, the fix should be permanent and you don’t need to add
LD_LIBRARY_PATH='' in front of the crYOLO commands.
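To see which entries are actually involved, it can help to print the path one entry per line and filter for CUDA. The following is a minimal sketch, not part of crYOLO: the helper names list_ld_entries and list_cuda_entries are hypothetical, and it should be run in a shell where the cryolo environment is not active (otherwise the variable is already empty).

```shell
# Hypothetical helper: print each entry of a colon-separated path list
# on its own line, skipping empty entries.
list_ld_entries() {
    printf '%s\n' "${1:-}" | tr ':' '\n' | grep -v '^$' || true
}

# Hypothetical helper: keep only entries that mention CUDA (case-insensitive).
list_cuda_entries() {
    list_ld_entries "${1:-}" | grep -i 'cuda' || true
}

echo "All LD_LIBRARY_PATH entries:"
list_ld_entries "${LD_LIBRARY_PATH:-}"

echo "Entries mentioning CUDA:"
list_cuda_entries "${LD_LIBRARY_PATH:-}"
```

If the second list contains CUDA directories that belong to other software packages, those are the likely candidates for the slowdown described above.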
crYOLO crashed with glibc errors
This problem should be solved with crYOLO 1.8.2.
Within your crYOLO environment run:
pip install nvidia-tensorflow==1.15.5+nv22.1
The current CUDA 11 instructions need a quite recent glibc version (>=2.29). Not all systems provide such a recent version. However, you can manually compile it and force crYOLO to use it:
Download and compile a recent glibc (>= 2.29)
wget http://ftp.gnu.org/gnu/libc/glibc-2.34.tar.xz
tar xvf glibc-2.34.tar.xz
mkdir glibc-2.34/build
cd glibc-2.34/build
sudo mkdir /opt/glibc-2.34
../configure --prefix=/opt/glibc-2.34
make -j 8
sudo make install
Add an environment variable to the cryolo environment (I assume the environment name is “cryolo”):
conda activate cryolo
conda env config vars set LD_PRELOAD=/opt/glibc-2.34/lib/libm.so.6
Reload your environment
conda deactivate
conda activate cryolo
Now you should be able to run cryolo with CUDA 11.
Thanks to Wolfgang Lugmayr for the instructions!
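Before compiling glibc by hand, it may be worth checking whether the system version already satisfies the 2.29 minimum mentioned above. The sketch below is an illustration only: version_ge is a hypothetical helper, and it assumes a sort that supports -V (as GNU sort does) and a glibc-based system where getconf GNU_LIBC_VERSION is available.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Assumes a sort implementation that supports -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# On glibc-based systems getconf prints a line such as "glibc 2.28".
current="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"

if [ -z "$current" ]; then
    echo "Could not determine the glibc version"
elif version_ge "$current" 2.29; then
    echo "glibc $current is recent enough (>= 2.29)"
else
    echo "glibc $current is older than 2.29"
fi
```

Only if the last message appears is the manual build described in this section likely to be needed.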
On some machines crYOLO freezes during or at the end of training.
Since crYOLO 1.7.4 this problem is solved, because multithreading replaced multiprocessing.
In older versions, the problem occurs in combination with multiprocessing and lies deep in one of the libraries we use.
You can solve it by using multithreading instead of multiprocessing. To do so, you can either use the corresponding option
or make it a permanent change by setting the following environment variables in your crYOLO environment:
conda activate cryolo
conda env config vars set CRYOLO_MP_START="fork"
conda env config vars set CRYOLO_USE_MULTITHREADING="True"
You need to reactivate your environment for the changes to take effect:
conda activate cryolo
Now you use multithreading instead of multiprocessing.
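To confirm that the two variables are really visible after reactivating the environment, a quick check like the following can be used. This is a sketch under the assumption that printenv is available; show_var is a hypothetical helper, not a crYOLO command.

```shell
# Hypothetical helper: print NAME=value, or NAME=<unset> if the variable
# is not present in the environment.
show_var() {
    name="$1"
    # printenv exits with a non-zero status when the variable is not set.
    value="$(printenv "$name")" || value="<unset>"
    echo "$name=$value"
}

show_var CRYOLO_MP_START
show_var CRYOLO_USE_MULTITHREADING
```

If either line reports <unset>, the environment was probably not reactivated after the variables were set.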
crYOLO has memory problems
crYOLO can crash during training because of memory problems. In those cases you can try the following:
Reduce the batch_size. I recommend reducing it stepwise by 1. I would not choose a value below 3. You can find the batch_size in your configuration file or in the Training options tab of the config Action.
Reduce the input_size. Instead of 1024 you can choose any multiple of 32, so 31*32=992 would be the next smaller input size. Don’t go too low (< 768), as you might run into problems with very small particles. You can find the input_size in your configuration file or in the Model options tab of the config Action.
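The arithmetic behind the input_size values can be sketched in a few lines of shell. The helper names next_smaller_input_size and candidate_input_sizes are hypothetical and only illustrate the "multiple of 32, not below 768" rule from above.

```shell
# Hypothetical helper: largest multiple of 32 strictly below the given size.
next_smaller_input_size() {
    echo $(( ( ($1 - 1) / 32 ) * 32 ))
}

# Hypothetical helper: all multiples of 32 from 1024 down to 768.
candidate_input_sizes() {
    size=1024
    while [ "$size" -ge 768 ]; do
        echo "$size"
        size=$(( size - 32 ))
    done
}

next_smaller_input_size 1024   # prints 992
candidate_input_sizes
```

Stepping down one candidate at a time, in the same way as for batch_size, keeps the input size as large as memory allows.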
I need more help
Find help on our mailing list!