1. Uninstall old Docker versions
sudo apt-get remove docker
sudo apt-get remove --auto-remove docker
sudo apt remove docker-ce
If none of the above works, use dpkg to find the installed packages and remove them one by one:
# List the related packages
dpkg -l | grep docker
# Remove each listed package (repeat for every entry the query returns)
sudo apt remove --purge docker.io
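Instead of removing each package from the dpkg list by hand, the loop can be scripted. A minimal sketch, assuming bash on Debian/Ubuntu; list_docker_pkgs and purge_docker_pkgs are hypothetical helper names, not part of any tool:

```shell
#!/usr/bin/env bash
# Sketch (assumes Debian/Ubuntu with dpkg): collect every installed package
# whose name contains "docker" so they can be purged in a single command.

# Read `dpkg -l` output on stdin and print the names of installed ("ii")
# packages whose name (column 2) contains "docker".
list_docker_pkgs() {
  awk '$1 == "ii" && $2 ~ /docker/ { print $2 }'
}

# Purge every package reported by list_docker_pkgs (needs sudo).
purge_docker_pkgs() {
  local pkgs
  pkgs=$(dpkg -l | list_docker_pkgs)
  if [ -z "$pkgs" ]; then
    echo "no installed docker packages found"
    return 0
  fi
  # $pkgs is left unquoted on purpose: one argument per package name.
  # shellcheck disable=SC2086
  sudo apt-get remove --purge -y $pkgs
}
```

The pattern also matches packages such as nvidia-docker2, so it may be worth running dpkg -l | list_docker_pkgs first to review the list before calling purge_docker_pkgs.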
2. Install Docker
sudo apt-get update
sudo apt-get install -y docker.io  # or: sudo snap install docker (version 19.03.11)
systemctl start docker
systemctl enable docker
docker version
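Tip: if a command asks for password authentication, prefix it with sudo; if that still does not work, run sudo -i first to open a root shell, after which no further password prompts appear.
Once this works, edit /etc/docker/daemon.json so that it contains: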
{
  "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn/"],
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
Then run:
systemctl daemon-reload
systemctl restart docker
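A malformed daemon.json (for example a stray comma) stops the Docker daemon from starting, so it can help to validate the file before restarting. A minimal sketch, assuming python3 is installed (used only for its standard json.tool module); write_daemon_json is a hypothetical helper. Because /etc/docker is root-owned, the sketch writes to a path you choose and you copy it into place with sudo:

```shell
#!/usr/bin/env bash
# Sketch: write the daemon.json from this step to the given path and check
# that it parses as JSON. Assumes python3 is available for validation.
write_daemon_json() {
  local target=$1
  cat > "$target" <<'EOF'
{
  "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn/"],
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
  # json.tool exits non-zero if the file is not valid JSON.
  python3 -m json.tool "$target" > /dev/null
}
```

Example: write_daemon_json /tmp/daemon.json && sudo cp /tmp/daemon.json /etc/docker/daemon.json, then run the daemon-reload and restart commands above.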
3. Install nvidia-docker
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
Running sudo apt-get update after adding the sources failed with:
E: Conflicting values set for option Signed-By regarding source https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64/ /: /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg !=
E: The list of sources could not be read.
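Fix: delete the NVIDIA source lists, then add the sources again with the commands above and re-run sudo apt-get update: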
cd /etc/apt/sources.list.d
sudo rm nvidia-*
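Before deleting every nvidia-* file, you can check which entries actually collide. The Signed-By error means the same repository is listed more than once with different signed-by settings. A minimal sketch; find_nvidia_sources is a hypothetical helper and the directory argument defaults to the standard APT location:

```shell
#!/usr/bin/env bash
# Sketch: list the APT source entries that point at nvidia.github.io,
# prefixed with the file they come from, so duplicates are easy to spot.
find_nvidia_sources() {
  local dir=${1:-/etc/apt/sources.list.d}
  # -H prints the file name; the pattern requires that no "#" appears before
  # the URL, so commented-out entries are skipped.
  grep -H '^[^#]*nvidia\.github\.io' "$dir"/*.list 2>/dev/null
}
```

If two files show the same libnvidia-container URL, one with [signed-by=...] and one without, deleting just the extra file and running sudo apt-get update should be enough.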
Finally, test the installation.
First check the CUDA version, which you will need to pick an image tag:
cat /usr/local/cuda/version.txt
Then find the image tag that matches your CUDA version here: https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/supported-tags.md
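The version lookup can also be scripted. A minimal sketch, assuming the toolkit lives under /usr/local/cuda; cuda_version is a hypothetical helper. As far as I know, toolkits up to 11.0 ship version.txt while later ones ship version.json instead, so the sketch falls back to parsing nvcc --version:

```shell
#!/usr/bin/env bash
# Sketch: print the installed CUDA toolkit version.
cuda_version() {
  local root=${1:-/usr/local/cuda}
  if [ -f "$root/version.txt" ]; then
    # "CUDA Version 11.0.221" -> "11.0.221"
    sed -n 's/^CUDA Version \([0-9.]*\).*/\1/p' "$root/version.txt"
  elif [ -x "$root/bin/nvcc" ]; then
    # "Cuda compilation tools, release 11.4, V11.4.152" -> "11.4"
    "$root/bin/nvcc" --version | sed -n 's/.*release \([0-9.]*\),.*/\1/p'
  else
    echo "no CUDA toolkit found under $root" >&2
    return 1
  fi
}
```

For example, echo "CUDA $(cuda_version)" prints the version to look up in the tag list.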
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-cudnn8-devel-ubuntu18.04 nvidia-smi
To list the downloaded images:
docker images -a
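This completes the nvidia-docker installation.
4. Install the NVIDIA Container Toolkit
If nvidia-docker is already installed, you can skip this step.
Previously, after installing Docker you also had to install nvidia-docker2 separately; now installing the NVIDIA Container Toolkit alone is enough.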
# 1. Add the repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# 2. Install and restart Docker
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
# 3. Test
# --gpus accepts all, a single device such as "device=0", or a list such as '"device=0,1,2,3"'
docker run --name test1 -it --gpus all nvidia/cuda:10.2-base  # this worked
This drops you into a shell inside the container; run nvidia-smi there.
If it fails with:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
run:
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
This resolves the error.
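That error typically means Docker cannot find the NVIDIA runtime, so a quick first check is whether the package is really installed. A minimal sketch using dpkg-query; pkg_installed and check_pkgs are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Sketch: report whether each given package is fully installed.
pkg_installed() {
  # dpkg-query prints "install ok installed" for a fully installed package;
  # anything else, or an unknown package, counts as not installed.
  [ "$(dpkg-query -W -f='${Status}' "$1" 2>/dev/null)" = "install ok installed" ]
}

# Print ok/missing per package; return non-zero if any is missing.
check_pkgs() {
  local missing=0 pkg
  for pkg in "$@"; do
    if pkg_installed "$pkg"; then
      echo "ok: $pkg"
    else
      echo "missing: $pkg"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_pkgs nvidia-container-toolkit || sudo apt-get install -y nvidia-container-toolkit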
To list all images:
docker images
To leave the container:
exit (or press Ctrl+D)
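The quoting of the --gpus value in the test step is easy to get wrong. As I understand it, Docker parses the value as CSV, so a multi-device list must carry literal double quotes or it gets split at the commas. A minimal sketch; gpus_arg is a hypothetical helper that treats its input as "all", one device index, or a comma-separated list of indices (it does not cover the GPU-count form):

```shell
#!/usr/bin/env bash
# Sketch: build the value for `docker run --gpus`.
gpus_arg() {
  case $1 in
    all) printf '%s' all ;;                # every GPU
    *,*) printf '"device=%s"' "$1" ;;      # list: wrap in literal double quotes
    *)   printf 'device=%s' "$1" ;;        # single device index
  esac
}
```

Usage: docker run --rm --gpus "$(gpus_arg 0,1)" nvidia/cuda:10.2-base nvidia-smi passes the same argument as the '"device=0,1"' form.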
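5. Permissions: run docker without sudo
Create a group named docker (if the group already exists, the command reports an error that can be ignored):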
sudo groupadd docker
Add the current user to the docker group:
sudo gpasswd -a ${USER} docker
Restart the Docker service (be careful doing this in production):
sudo systemctl restart docker
Make the Docker socket readable and writable by all users:
sudo chmod a+rw /var/run/docker.sock
Reboot so the new group membership takes effect:
sudo reboot
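To confirm the group change, a minimal sketch follows; in_group is a hypothetical helper. id -nG USER reads the group database, so it already lists docker right after gpasswd -a, even though existing shells only pick up the group after logging in again (Docker's post-install notes also mention newgrp docker for the current shell). Normally the socket is group-owned by docker, so group membership alone should be enough without the chmod:

```shell
#!/usr/bin/env bash
# Sketch: succeed if the user belongs to the given group.
in_group() {
  local user=$1 group=$2
  # One group name per line, then require an exact whole-line match.
  id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$group"
}
```

Usage: in_group "$USER" docker && docker ps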
References:
https://zhuanlan.zhihu.com/p/305952676
https://blog.csdn.net/weixin_47062350/article/details/120896578
ubuntu18.04安装nvidia-docker (Installing nvidia-docker on Ubuntu 18.04), RayChiu_Labloy's blog, CSDN