Troubleshooting: "H26xDec_0_0: Could not open codec" / "Nvidia driver Failed to initialize NVML"

Due to a known issue in Nvidia Container Toolkit versions 1.17.8 through 1.18.0, we recommend using Nvidia Container Toolkit version 1.17.7 or earlier.

Nvidia documents the issue here: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html#containers-losing-access-to-gpus-with-error-failed-to-initialize-nvml-unknown-error

In Live Transcoder, the issue appears as one or both of the following errors:

H26xDec_0_0: Could not open codec

Nvidia driver Failed to initialize NVML

The Nvidia Container Toolkit is unable to pass the GPU to the container.
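
To confirm whether a host is running an affected version, the installed package version can be compared against the range given above. The following is a minimal sketch for Ubuntu/Debian hosts, assuming bash and GNU sort; the function name is_affected_version is illustrative and not part of any Nvidia tooling.

```shell
#!/usr/bin/env bash

# Return 0 if the given toolkit version lies in the affected range
# (1.17.8 through 1.18.0 inclusive), otherwise return 1.
is_affected_version() {
  local v="${1%%-*}"            # drop the package revision: "1.17.8-1" -> "1.17.8"
  local lo="1.17.8" hi="1.18.0"
  [ -n "$v" ] || return 1       # an empty version is never "affected"
  # sort -V orders version strings numerically, field by field.
  [ "$(printf '%s\n' "$lo" "$v" | sort -V | head -n1)" = "$lo" ] &&
    [ "$(printf '%s\n' "$v" "$hi" | sort -V | tail -n1)" = "$hi" ]
}

# Report on the installed package, if dpkg is available on this host.
if command -v dpkg-query >/dev/null 2>&1; then
  installed="$(dpkg-query -W -f='${Version}' nvidia-container-toolkit 2>/dev/null)"
  if is_affected_version "$installed"; then
    echo "nvidia-container-toolkit $installed is in the affected range"
  else
    echo "nvidia-container-toolkit ${installed:-(not installed)} is not in the affected range"
  fi
fi
```

If the host reports a version in the affected range and Live Transcoder logs either error above, one of the two remedies below should apply.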

Downgrade Nvidia Container Toolkit to 1.17.7 (Ubuntu/Debian)

Please note: on other distros, you will need to adapt the commands below to your package manager.

sudo apt-get purge -y nvidia-container-toolkit libnvidia-container-tools libnvidia-container1 nvidia-container-toolkit-base

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    
sudo apt-get update

export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.7-1
sudo apt-get install -y \
    nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
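
After the downgrade, a later apt-get upgrade could pull an affected version back in. One way to guard against that is to hold the four packages with apt-mark. The sketch below is an assumption about how you might do this rather than part of the documented procedure; the helper names run and hold_toolkit are hypothetical, and restarting Docker afterwards is included as a common precaution.

```shell
#!/usr/bin/env bash

# The four packages installed by the downgrade step above.
TOOLKIT_PACKAGES="nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

hold_toolkit() {
  # Stop apt from upgrading the pinned packages.
  # $TOOLKIT_PACKAGES is left unquoted on purpose so it splits into four names.
  run sudo apt-mark hold $TOOLKIT_PACKAGES
  # Restart Docker so running containers are recreated against the reinstalled toolkit.
  run sudo systemctl restart docker
}
```

Running DRY_RUN=1 hold_toolkit prints the two commands without executing them; running hold_toolkit on its own applies them. The hold can later be lifted with sudo apt-mark unhold followed by the same package names.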

Unable to downgrade?

If your image relies on Nvidia Container Toolkit versions 1.17.8 through 1.18.0, or you prefer not to downgrade, the following workaround is available:

  1. SSH to the host instance
  2. Edit the docker-compose.yml in a text editor
  3. Add the following under devices:
    • /dev/nvidia0:/dev/nvidia0
    • /dev/nvidiactl:/dev/nvidiactl
    • /dev/nvidia-uvm:/dev/nvidia-uvm
  4. Here is an example of the complete docker-compose.yml:
version: "3.9"
services:
  transcoder:
    image: "comprimato/live-transcoder:latest"
    container_name: transcoder-0
    hostname: transcoder-0
    network_mode: host
    tty: true
    stdin_open: true
    shm_size: "1gb"
    stop_signal: SIGRTMIN+3
    restart: "always"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    tmpfs:
      - /run:exec
    logging:
      driver: "journald"
    volumes:
      - /var/lib/systemd/coredump:/var/lib/systemd/coredump:rw
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
      - transcoder0-data:/etc/transcoder/transcoder
      - /var/log:/var/log/host:ro
    environment:
      - NVIDIA_DRIVER_CAPABILITIES=utility,compute,video
      - NVIDIA_VISIBLE_DEVICES=all
      - TRC_LICENSE_KEY=yourlicensekey
      # uncomment to enable sending data to the Monitoring Dashboard
      # - TRC_monitoring_enabled=1
    cap_add:
      - CAP_SYS_RESOURCE
      - CAP_SYS_PTRACE
      - CAP_SYSLOG
      - CAP_SYS_RAWIO
      - CAP_SYS_NICE
    devices:
      - /dev/mem:/dev/mem
      - /dev/nvidia0:/dev/nvidia0
      - /dev/nvidiactl:/dev/nvidiactl
      - /dev/nvidia-uvm:/dev/nvidia-uvm
    ulimits:
      core: -1
      memlock: -1
      nofile: 65536
volumes:
  transcoder0-data:
    name: transcoder0-data
  5. Save the changes to the docker-compose.yml
  6. Recreate the Docker container:
    docker-compose up -d --force-recreate
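
The device entries in step 3 cover a host with a single GPU (/dev/nvidia0). On a host with several GPUs, it is an assumption here that each additional node (/dev/nvidia1, /dev/nvidia2, and so on) would need its own entry. The sketch below prints a devices entry for every matching node it finds; the function name nvidia_device_entries is hypothetical.

```shell
#!/usr/bin/env bash

# Print a docker-compose "devices:" entry for each Nvidia device node found
# in the given directory (default: /dev). Only the per-GPU nodes (nvidia0,
# nvidia1, ...), nvidiactl and nvidia-uvm are considered.
nvidia_device_entries() {
  local dir="${1:-/dev}" node name
  for node in "$dir"/nvidia[0-9]* "$dir"/nvidiactl "$dir"/nvidia-uvm; do
    [ -e "$node" ] || continue      # skip patterns that matched nothing
    name="$(basename "$node")"
    printf '      - /dev/%s:/dev/%s\n' "$name" "$name"
  done
}
```

Calling nvidia_device_entries on the host prints lines that can be pasted under devices: in the docker-compose.yml. Once the container has been recreated, running docker exec transcoder-0 nvidia-smi is one way to check that the GPU is visible inside it, assuming the container name from the example file.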