How to use a local image when juju creates an LXD container (by quqi99)


Author: Zhang Hua  Published: 2023-03-01


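There is no external network access, so I set up a local custom image mirror and pointed container-image-metadata-url at it, but juju still reports that it cannot find an image when creating an LXD container.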




1, Use juju to create a focal machine 0, then deploy a xenial LXD container on machine 0.

juju add-model test
juju add-machine --series focal
juju model-config logging-config="<root>=DEBUG"
juju remove-application ceph-radosgw && juju deploy ceph-radosgw --series=xenial --to="lxd:0"

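2, Run the iptables rules below on the juju controller (juju ssh -m controller 0) and on machine 0 to simulate losing connectivity to cloud-images.ubuntu.com. Here is what I found:

  • The machine 0 log (/var/log/juju/machine-0.log) suggests that it downloads the image from the juju controller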
2023-03-01 07:58:21 INFO juju.cloudconfig userdatacfg_unix.go:613 Fetching agent: curl -sSf --connect-timeout 20 --noproxy "*" --insecure -o $bin/tools.tar.gz <[]>
2023-03-01 07:59:03 INFO juju.container.lxd container.go:256 starting new container "juju-68d726-0-lxd-2" (image "ubuntu-16.04-server-cloudimg-amd64-lxd.tar.xz")
2023-03-01 07:59:03 DEBUG juju.container.lxd container.go:257 new container has profiles [default]
2023-03-01 07:59:42 DEBUG juju.container.lxd container.go:286 created container "juju-68d726-0-lxd-2", waiting for start...
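  • But if the iptables rules below are not applied on machine 0, testing shows that machine 0 can also bypass the juju controller and download the image directly from cloud-images.ubuntu.com.
  • Both seem to be involved, so apply the iptables rules below on both.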
dig cloud-images.ubuntu.com  # and
juju ssh -m controller 0 -- sudo iptables -A OUTPUT -d -j DROP
juju ssh -m controller 0 -- sudo iptables -A OUTPUT -d -j DROP
cat << EOF |tee test.yaml
cloudinit-userdata: |
  postruncmd:
    - bash -c 'echo quqi.com >> /etc/hosts'
    - bash -c 'iptables -A OUTPUT -d -j DROP'
    - bash -c 'iptables -A OUTPUT -d -j DROP'
EOF
juju model-config ./test.yaml
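
The DROP rules above need the addresses that dig returns for cloud-images.ubuntu.com. A minimal sketch of turning resolver output into such rules, assuming IPv4 A records only. The helper name gen_drop_rules is hypothetical, and it only prints the commands instead of running them:

```shell
#!/usr/bin/env bash
# gen_drop_rules (hypothetical helper): read candidate addresses on stdin,
# one per line, and print one iptables DROP rule per IPv4 address.
# Lines that are not IPv4 addresses (for example CNAME targets printed by
# `dig +short`) are skipped. Nothing is executed here.
gen_drop_rules() {
    grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' |
    while read -r ip; do
        printf 'iptables -A OUTPUT -d %s -j DROP\n' "$ip"
    done
}
```

A possible use is `dig +short cloud-images.ubuntu.com | gen_drop_rules`, then reviewing the printed rules before running them with sudo on the controller and on machine 0.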

3, On the bastion, run sstream-mirror to mirror the xenial amd64 images from cloud-images.ubuntu.com.

sudo apt -y install simplestreams
sudo sstream-mirror --keyring=/usr/share/keyrings/ubuntu-cloudimage-keyring.gpg --progress --max=1 --path=streams/v1/index.json https://cloud-images.ubuntu.com/releases/ $workdir 'arch=amd64' 'release~(xenial)' 'ftype~(lxd.tar.xz|squashfs|root.tar.xz|root.tar.gz|disk1.img|.json|.sjson)'
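
Before serving the mirror it can help to confirm that the index file requested by the curl checks in this post (streams/v1/index.json) actually exists in the mirror directory. A small sketch; the function name check_mirror is hypothetical:

```shell
#!/usr/bin/env bash
# check_mirror (hypothetical helper): succeed only if DIR contains a
# non-empty streams/v1/index.json, the file the curl checks request.
check_mirror() {
    dir=$1
    if [ -s "$dir/streams/v1/index.json" ]; then
        echo "ok: $dir/streams/v1/index.json"
    else
        echo "missing: $dir/streams/v1/index.json" >&2
        return 1
    fi
}
```

For example, `check_mirror "$workdir"` right after the sstream-mirror run.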


openssl genrsa -out ca.key 4096
openssl req -x509 -new -nodes -sha512 -days 3650 -subj "/C=CN/ST=Beijing/L=Beijing/O=example/OU=Personal/CN=quqi.com" -key ca.key -out ca.crt
openssl genrsa -out quqi.com.key 4096
openssl req -sha512 -new -subj "/C=CN/ST=Beijing/L=Beijing/O=example/OU=Personal/CN=quqi.com" -key quqi.com.key -out quqi.com.csr
#complies with the Subject Alternative Name (SAN) and x509 v3 extension requirements to avoid 'x509: certificate relies on legacy Common Name field, use SANs instead'
cat > v3.ext <<-EOF
keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
DNS.1=quqi.com
EOF

openssl x509 -req -sha512 -days 3650 -extfile v3.ext -CA ca.crt -CAkey ca.key -CAcreateserial -in quqi.com.csr -out quqi.com.crt
#for docker, the Docker daemon interprets .crt files as CA certificates and .cert files as client certificates.
openssl x509 -inform PEM -in quqi.com.crt -out quqi.com.cert
curl --resolve quqi.com:443: --cacert ~/ca/ca.crt https://quqi.com:443/streams/v1/index.json
sudo cp ~/ca/ca.crt /usr/local/share/ca-certificates/ca.crt
sudo chmod 644 /usr/local/share/ca-certificates/ca.crt
sudo update-ca-certificates --fresh
curl --resolve quqi.com:443: https://quqi.com:443/streams/v1/index.json
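
The comment above about the "certificate relies on legacy Common Name field" error means the signed certificate must carry a Subject Alternative Name. A sketch for checking that, written as a filter over the text dump of a certificate so it has no other dependencies; the name san_names is hypothetical:

```shell
#!/usr/bin/env bash
# san_names (hypothetical helper): read the output of
# `openssl x509 -noout -text` on stdin and print each DNS name listed
# under "X509v3 Subject Alternative Name", one per line.
# Prints nothing if the certificate has no SAN extension.
san_names() {
    grep -A1 'X509v3 Subject Alternative Name' |
    grep -o 'DNS:[^, ]*' |
    sed 's/^DNS://'
}
```

With the files from this post, `openssl x509 -in quqi.com.crt -noout -text | san_names` should list quqi.com if the v3.ext extensions were applied when signing.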

$ cat /etc/nginx/sites-available/default
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name quqi.com;
    ssl_certificate /home/ubuntu/ca/quqi.com.crt;
    ssl_certificate_key /home/ubuntu/ca/quqi.com.key;
    #ssl_protocols TLSv1.2;
    ssl_prefer_server_ciphers on; 
    location / {
       root /home/ubuntu/simplestreams2;
       index index.html;
    }
}
# Note: because the new directory /home/ubuntu/simplestreams2 is used as root above, add 'user root;' to /etc/nginx/nginx.conf to avoid permission problems
#curl --resolve quqi.com:443: --cacert ~/ca/ca.crt https://quqi.com:443/images/streams/v1/index.json
curl --resolve quqi.com:443: --cacert ~/ca/ca.crt https://quqi.com:443/streams/v1/index.json

4, Set container-image-metadata-url in juju to the https-based local image mirror above

juju model-config container-image-metadata-url=https://quqi.com:443
juju model-config image-metadata-url=https://quqi.com:443

5, Because the juju controller has to reach the local image mirror, add a hosts entry and install the CA certificate on it

echo ' quqi.com' >> /etc/hosts

curl --resolve quqi.com:443: --cacert ~/ca/ca.crt https://quqi.com:443/streams/v1/index.json
sudo cp ~/ca/ca.crt /usr/local/share/ca-certificates/ca.crt
sudo chmod 644 /usr/local/share/ca-certificates/ca.crt
sudo update-ca-certificates --fresh
curl --resolve quqi.com:443: https://quqi.com:443/streams/v1/index.json

6, Remember to delete the image cache on machine 0 before retesting

juju ssh 0 -- sudo lxc image delete juju/xenial/amd64
juju remove-application ceph-radosgw

7, Retest

juju deploy ceph-radosgw --series=xenial --to="lxd:0"
sudo tail -f /var/log/juju/machine-0.log

The following log lines show up in /var/log/juju/machine-0.log on machine 0:

2023-03-01 08:26:45 WARNING juju.worker.lxdprovisioner provisioner_task.go:1371 machine 0/lxd/3 failed to start: acquiring LXD image: no matching image found
2023-03-01 08:26:45 WARNING juju.worker.lxdprovisioner provisioner_task.go:1410 failed to start machine 0/lxd/3 (acquiring LXD image: no matching image found), retrying in 10s (10 more attempts)
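
When the log is long, the failing containers can be pulled out with a filter. This is a sketch that assumes the message wording of the two lines above; the function name failed_image_machines is hypothetical:

```shell
#!/usr/bin/env bash
# failed_image_machines (hypothetical helper): read machine-N.log lines on
# stdin and print the unique machine IDs whose start failed with
# "acquiring LXD image". Only the "machine X failed to start: ..." form is
# matched, so the matching "retrying" line does not produce a duplicate.
failed_image_machines() {
    grep 'acquiring LXD image' |
    sed -n 's/.* machine \([0-9][0-9a-z/]*\) failed to start: acquiring LXD image.*/\1/p' |
    sort -u
}
```

For example, `sudo cat /var/log/juju/machine-0.log | failed_image_machines` on machine 0.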

On the juju controller, quqi sometimes shows up in the logs and sometimes does not, which is odd.

2023-02-23 07:33:52 WARNING juju.apiserver.provisioner provisioninginfo.go:801 encountered "https://quqi.com:443/images/streams/v1/streams/v1/index.json": Get "https://quqi.com:443/images/streams/v1/streams/v1/index.json": dial tcp i/o timeout while getting published images metadata from image-metadata-url
2023-03-01 08:52:56 WARNING juju.environs.simplestreams datasource.go:184 Got error requesting "https://quqi.com:443/streams/v1/index.json": Get "https://quqi.com:443/streams/v1/index.json": x509: certificate relies on legacy Common Name field, use SANs instead
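
The doubled streams/v1 path in the first log line above suggests that the configured URL should be the base of the mirror, not the path to the index itself. A sketch of that composition, assuming the client simply appends streams/v1/index.json to the configured base, which is what the logged URLs imply but which I have not confirmed in the juju source; index_url is a hypothetical name:

```shell
#!/usr/bin/env bash
# index_url (hypothetical helper): build the index URL from a configured
# base URL, ASSUMING "streams/v1/index.json" is appended to the base
# (inferred from the logged URLs, not confirmed against juju's code).
index_url() {
    base=${1%/}    # drop one trailing slash, if any
    printf '%s/streams/v1/index.json\n' "$base"
}
```

Under that assumption, a base of https://quqi.com:443/images/streams/v1 reproduces the doubled path from the first log line, while https://quqi.com:443 gives the URL from the second one.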

cloud-images.ubuntu.com still shows up in the logs on the juju controller

2023-03-01 08:34:54 WARNING juju.apiserver.provisioner provisioninginfo.go:801 encountered "http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson": Get "http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson": dial tcp i/o timeout while getting published images metadata from default ubuntu cloud images


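The above served simplestreams from the mirror created in step 3. Now switch to simplestreams metadata generated for an image in glance and keep testing (I am not sure whether this approach only applies to creating the juju controller or can also be used for creating VMs/LXD containers, so let's try it)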

mkdir -p ~/simplestreams/images
juju metadata generate-image -d ~/simplestreams -i $IMAGE_ID -s $SERIES -r RegionOne -u $OS_AUTH_URL

Then edit /etc/nginx/sites-available/default to change the /home/ubuntu/simplestreams2 used in the test above to /home/ubuntu/simplestreams, restart nginx, and set container-image-metadata-url (note: the URL now ends with an extra /images)

juju model-config container-image-metadata-url=https://quqi.com:443/images
juju model-config image-metadata-url=https://quqi.com:443/images

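# Note: the config below was not produced by the two commands above; it was created by manually running lxc commands (lxc remote add xxx), and even with it in place things do not work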
root@juju-4e4d8f-test-0:~# cat ~/snap/lxd/common/config/config.yml
default-remote: local
    addr: https://images.linuxcontainers.org
    protocol: simplestreams
    public: true
    addr: unix://
    public: false
    addr: https://quqi.com:443
    protocol: simplestreams
    public: true
aliases: {}


systemctl restart jujud-machine-0.service

After repeating the test the problem persists, and the controller shows the following log lines:

2023-03-01 10:29:32 WARNING juju.environs.simplestreams datasource.go:184 Got error requesting "https://streams.canonical.com/juju/tools/streams/v1/index.sjson": Get "https://streams.canonical.com/juju/tools/streams/v1/index.sjson": dial tcp i/o timeout
2023-03-01 10:29:36 INFO juju.state addmachine.go:505 new machine "0/lxd/11" has preferred addresses: private "", public ""
2023-03-01 10:29:37 WARNING juju.apiserver.instancemutater lxdprofilewatcher.go:206 unit ceph-radosgw/11 has no machine id, start watching when machine id assigned.
2023-03-01 10:29:41 WARNING juju.apiserver.provisioner provisioninginfo.go:801 encountered index file has no data for cloud {stsstack} not found while getting published images metadata from image-metadata-url
2023-03-01 10:30:11 WARNING juju.environs.simplestreams datasource.go:184 Got error requesting "http://cloud-images.ubuntu.com/releases/streams/v1/index2.sjson": Get "http://cloud-images.ubuntu.com/releases/streams/v1/index2.sjson": dial tcp i/o timeout
2023-03-01 10:30:41 WARNING juju.environs.simplestreams datasource.go:184 Got error requesting "http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson": Get "http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson": dial tcp i/o timeout
2023-03-01 10:30:41 WARNING juju.apiserver.provisioner provisioninginfo.go:801 encountered "http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson": Get "http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson": dial tcp i/o timeout while getting published images metadata from default ubuntu cloud images




juju model-config container-image-metadata-url=https://quqi.com:443/
juju model-config image-metadata-url=https://quqi.com:443/

Then test cloudinit-userdata; this one works and can serve as a workaround

cat << EOF |tee cloudinit-userdata.yaml
cloudinit-userdata: |
  postruncmd:
    - echo ' quqi.com' >> /etc/hosts
    - if hostname |grep -qv lxd; then wget --tries=15 --retry-connrefused --timeout=15 --random-wait=on -O /home/ubuntu/ubuntu-16.04-server-cloudimg-amd64-lxd.tar.xz https://quqi.com:443/server/releases/xenial/release-20211001/ubuntu-16.04-server-cloudimg-amd64-lxd.tar.xz --no-check-certificate; wget --tries=15 --retry-connrefused --timeout=15 --random-wait=on -O /home/ubuntu/ubuntu-16.04-server-cloudimg-amd64.squashfs https://quqi.com:443/server/releases/xenial/release-20211001/ubuntu-16.04-server-cloudimg-amd64.squashfs --no-check-certificate; fi
    - sleep 30
    - if hostname |grep -qv lxd; then lxc image import /home/ubuntu/ubuntu-16.04-server-cloudimg-amd64-lxd.tar.xz /home/ubuntu/ubuntu-16.04-server-cloudimg-amd64.squashfs --alias juju/xenial/amd64; fi
EOF
juju model-config ./cloudinit-userdata.yaml
juju model-config cloudinit-userdata --format yaml
#juju model-config --reset cloudinit-userdata
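
The workaround above relies on the hostname to run the download and import only on the host machine and not inside its containers. A sketch of the same check as a reusable function, assuming juju container hostnames contain "-lxd-" as in the names seen in this post (for example juju-68d726-0-lxd-2). This is slightly stricter than the `grep -qv lxd` test above, which matches any "lxd" substring. The function name is hypothetical:

```shell
#!/usr/bin/env bash
# is_container_hostname (hypothetical helper): succeed if NAME looks like a
# juju LXD container hostname, i.e. contains "-lxd-". This pattern is an
# assumption based on the container names shown in this post.
is_container_hostname() {
    case $1 in
        *-lxd-*) return 0 ;;
        *)       return 1 ;;
    esac
}
```

For example, `is_container_hostname "$(hostname)" || echo "host machine: import the image here"`.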

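Note: the reason this did not work for a long time was the | that I had added after postruncmd:. How I found the answer is described in the "Debugging cloud-init" section below.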



cat << EOF |tee test.yaml
cloudinit-userdata: |
  postruncmd: |
    - echo ' quqi.com' >> /etc/hosts
    - echo 'test' > /home/ubuntu/cloud-init.txt
EOF


cat << EOF |tee test.yaml
cloudinit-userdata: |
  postruncmd:
    - bash -c 'echo quqi.com >> /etc/hosts'
    - bash -c 'echo test > /home/ubuntu/cloud-init.txt'
EOF


cat << EOF |tee test.yaml
cloudinit-userdata: |
  postruncmd: |
    bash -c 'echo quqi.com >> /etc/hosts'
    bash -c 'echo test > /home/ubuntu/cloud-init.txt'
EOF

The following is even less likely to work; it fails immediately with: ERROR json: unsupported type: map[interface {}]interface {}

cat << EOF |tee test.yaml
cloudinit-userdata: |
    bash -c 'echo quqi.com >> /etc/hosts'
    bash -c 'echo test > /home/ubuntu/cloud-init.txt'
EOF


juju add-model test
juju model-config ./test.yaml
juju model-config cloudinit-userdata --format yaml
juju model-config ssl-hostname-verification=false
juju add-machine --series focal
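
Since the failure noted earlier came from a | after postruncmd:, a quick check of the YAML file before feeding it to juju model-config can catch that form. A minimal sketch; the function name has_block_postruncmd is hypothetical and the check is a plain text match, not a YAML parse:

```shell
#!/usr/bin/env bash
# has_block_postruncmd (hypothetical helper): succeed if FILE contains a
# "postruncmd:" key immediately followed by "|", the form reported in this
# post as not working. Plain text match only; the YAML is not parsed.
has_block_postruncmd() {
    grep -Eq '^[[:space:]]*postruncmd:[[:space:]]*[|]' "$1"
}
```

For example, `has_block_postruncmd test.yaml && echo "remove the | after postruncmd:"`.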

1, check cloud-init log:    cloud-init collect-logs && tar -xf cloud-init.tar.gz
2, check cloud-init config: /etc/cloud/cloud.cfg
3, cloud-init is enabled: systemctl list-unit-files | grep cloud
4, /var/lib/cloud/instances/af2d721e-e38e-4937-81ad-7cc72a49c184/cloud-config.txt
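
The instance ID in the path of item 4 differs per machine, so it is easier to search for the file than to type the ID. A small sketch; find_cloud_configs is a hypothetical name, and the optional root argument exists only so the function can be tried against a copied directory tree:

```shell
#!/usr/bin/env bash
# find_cloud_configs (hypothetical helper): print every cloud-config.txt
# under ROOT's cloud-init instance directories. ROOT defaults to "/",
# i.e. the live system.
find_cloud_configs() {
    root=${1:-/}
    find "${root%/}/var/lib/cloud/instances" -maxdepth 2 -name cloud-config.txt 2>/dev/null
}
```

Running `find_cloud_configs` on the machine and opening the printed file shows how the commands from cloudinit-userdata were actually rendered.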

lp bug 1797168


juju add-model test2
juju model-config container-image-metadata-url=https://quqi.com:443/
juju model-config image-metadata-url=https://quqi.com:443/
juju model-config logging-config="<root>=DEBUG"
juju model-config ssl-hostname-verification=false
juju add-machine --series xenial

# Be sure to copy ca.crt to machine 0 (not to controller 0)
juju scp -m m ~/ca/ca.crt 0:~/
juju ssh -m m 0 -- sudo cp /home/ubuntu/ca.crt /usr/local/share/ca-certificates/ca.crt
juju ssh -m m 0 -- sudo update-ca-certificates --fresh
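
To confirm that the CA really landed in the trust store of machine 0, one option is to look for the entry that update-ca-certificates creates. This sketch assumes that a file /usr/local/share/ca-certificates/NAME.crt is exposed as NAME.pem in /etc/ssl/certs; the function name ca_link_present is hypothetical:

```shell
#!/usr/bin/env bash
# ca_link_present (hypothetical helper): succeed if CERTDIR (default
# /etc/ssl/certs) has an entry NAME.pem. ASSUMPTION: update-ca-certificates
# exposes /usr/local/share/ca-certificates/NAME.crt under that name.
ca_link_present() {
    name=$1
    certdir=${2:-/etc/ssl/certs}
    [ -e "$certdir/$name.pem" ]
}
```

On machine 0 this would be `ca_link_present ca && echo trusted`, or equivalently `juju ssh -m m 0 -- ls -l /etc/ssl/certs/ca.pem` from the client.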

juju add-machine --series xenial lxd:0
#juju remove-application ceph-radosgw && juju deploy ceph-radosgw --series=xenial --to="lxd:0"

NOTE: the reason it never worked was that ca.crt had been copied to controller 0; it should be copied to machine 0 instead

Testing on the lxc remote side

On the lxc side, cloud-images.ubuntu.com is the default remote, and this default cannot be replaced:

# lxc remote list |grep releases
| ubuntu          | https://cloud-images.ubuntu.com/releases | simplestreams | none        | YES    | YES    | NO     |

root@juju-4e4d8f-test-7:~# lxc remote set-url ubuntu https://quqi.com:443
Error: Remote ubuntu is static and cannot be modified
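
The error above comes from the STATIC flag of the remote. A sketch that lists which remotes are static by filtering the table printed by lxc remote list, assuming the column order of the row shown above (NAME, URL, PROTOCOL, AUTH TYPE, PUBLIC, STATIC, GLOBAL); the function name static_remotes is hypothetical:

```shell
#!/usr/bin/env bash
# static_remotes (hypothetical helper): read `lxc remote list` table output
# on stdin and print the names of remotes whose STATIC column is YES.
# ASSUMES the column order NAME | URL | PROTOCOL | AUTH TYPE | PUBLIC |
# STATIC | GLOBAL, as in the row shown in this post.
static_remotes() {
    awk -F'|' 'NF >= 8 {
        name = $2; static = $7
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        gsub(/^[ \t]+|[ \t]+$/, "", static)
        if (static == "YES") print name
    }'
}
```

For example, `lxc remote list | static_remotes`; any remote it prints cannot be changed with lxc remote set-url.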


lxc remote add test https://quqi.com:443 --protocol=simplestreams
lxc remote remove test && lxc remote add test https://quqi.com:443 --protocol=simplestreams --public
sudo snap set lxd daemon.debug=true
sudo systemctl reload snap.lxd.daemon

LXD_INSECURE_TLS=true also has to be set on the machine (otherwise: remote error: tls: protocol version not supported). In short, I made sure that launching from the test mirror (lxc launch test:16.04 i1) works.

vim /etc/systemd/system/snap.lxd.daemon.service
# Alternatively, removing ssl_protocols TLSv1.2 from the nginx config also works


2023-03-02 03:26:30 DEBUG juju.container.lxd manager.go:283 checking default image metadata sources
2023-03-02 03:27:51 WARNING juju.worker.lxdprovisioner provisioner_task.go:1371 machine 7/lxd/3 failed to start: acquiring LXD image: no matching image found
2023-03-02 03:27:51 WARNING juju.worker.lxdprovisioner provisioner_task.go:1410 failed to start machine 7/lxd/3 (acquiring LXD image: no matching image found), retrying in 10s (10 more attempts)

## Reporting the bug
In the end I filed an LP bug: https://bugs.launchpad.net/juju/+bug/2008993


