K8S-Docker-VirtualBox Cluster Setup Lab

Wednesday, July 20, 2022


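Starting the Virtual Machines in VirtualBox

Minimum requirements for K8S: 4 GB+ of memory and a 2+ core CPU

Network Preparation

Enable two network adapters: a bridged adapter so that the host and the VMs can reach each other, and a **Network Address Translation (NAT)** adapter for access to the external network. After Ubuntu boots, edit /etc/netplan/00-installer-config.yaml to configure both adapters: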

# This is the network config written by 'subiquity'
network:
  ethernets:
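    enp0s3: # ip a lists both adapters; use the names that appear on your system
      dhcp4: no
      addresses: [192.168.56.101/24] # manually assigned node IP; it must be in this adapter's subnet
    enp0s8: # ip a lists both adapters; use the names that appear on your system
      dhcp4: no
      addresses: [10.0.2.101/24] # manually assigned node IP; it must be in this adapter's subnet
      gateway4: 10.0.2.1 # gateway for this adapter; this is essential, otherwise packets cannot be routed out
      nameservers:
        addresses: [8.8.8.8, 8.8.4.4] # DNS servers, so that external domain names resolve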
  version: 2

After editing, run sudo netplan apply to apply the changes.
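The comments in the config stress that each static address must sit in its adapter's subnet and that the gateway must be reachable from the NAT adapter. As a minimal sketch, assuming /24 networks as in the config above, the helper below compares the network part of two IPv4 addresses so the values can be sanity-checked before running netplan apply. The name same_subnet_24 is hypothetical and not part of netplan.

```shell
# Hypothetical helper: succeeds when two IPv4 addresses share the same /24 network.
# Assumes dotted-quad input and a /24 prefix, as used in the config above.
same_subnet_24() {
  # ${1%.*} strips the last octet, leaving the /24 network part.
  [ "${1%.*}" = "${2%.*}" ]
}

# Example: the NAT address and its gateway from the config above.
if same_subnet_24 10.0.2.101 10.0.2.1; then
  echo "gateway is on the same /24"
else
  echo "gateway is NOT on the same /24"
fi
```

For other prefix lengths the comparison would need real netmask arithmetic; this sketch deliberately covers only the /24 case used in this lab.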

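Installing Docker

Docker can be installed in many ways; see the official documentation for details. The steps below use the one-step convenience script:

# Download the installation script
curl -fsSL https://get.docker.com -o get-docker.sh
# --version selects the Docker version to install
sudo sh get-docker.sh --mirror Aliyun --version 19.03

# Enable and start the docker service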
sudo systemctl enable docker
sudo systemctl start docker

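Installing K8S

Installing the Components

kubelet and kubeadm must be installed on the master and on every worker node.

See the official documentation for the full installation procedure; the key steps are extracted below:

  1. Update the apt package index and install the packages needed to use the Kubernetes apt repository: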

    sudo apt-get update
    # apt-transport-https may be a dummy package; if so, you can skip installing it
    sudo apt-get install -y apt-transport-https ca-certificates curl gpg
    
  2. Add the GPG key for the Kubernetes APT repository:

    curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /usr/share/keyrings/kubernetes-archive-keyring.gpg
    
    # In mainland China: add the Aliyun mirror's signing key
    curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
    
  3. Add the Kubernetes apt repository:

    echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list > /dev/null
    
    # In mainland China: add the Aliyun mirror repository
    sudo add-apt-repository "deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main"
    
  4. Update the apt package index, install kubelet, kubeadm and kubectl, and hold their versions:

    sudo apt-get update
    # Install a version older than 1.24 here, so that cri-dockerd does not have to be installed
    #sudo apt-get install -y kubelet=1.22.0-00 kubeadm=1.22.0-00 kubectl=1.22.0-00
    sudo apt-get install -y kubelet=1.11.0-00 kubeadm=1.11.0-00 kubectl=1.11.0-00 kubernetes-cni=0.6.0-00
    # Without a pinned version (latest release), cri-dockerd must be installed as well
    #sudo apt-get install -y kubelet kubeadm kubectl
    # Hold the versions
    #sudo apt-mark hold kubelet kubeadm kubectl
    sudo apt-mark hold kubelet kubeadm kubectl kubernetes-cni
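Holding the packages matters because a routine apt upgrade could otherwise move kubelet or kubeadm to a release that no longer matches this setup. A minimal sketch for confirming the hold, assuming the output of apt-mark showhold (one package name per line) is piped in; missing_holds is a hypothetical helper name:

```shell
# Hypothetical helper: reads a held-package list on stdin (one name per line,
# the format printed by `apt-mark showhold`) and prints every required
# package that is NOT on hold.
missing_holds() {
  held=$(cat)
  for pkg in "$@"; do
    # -x matches the whole line, -F treats the name literally, -q stays silent.
    printf '%s\n' "$held" | grep -qxF "$pkg" || printf '%s\n' "$pkg"
  done
}

# Intended usage on a node (prints nothing when every package is held):
#   apt-mark showhold | missing_holds kubelet kubeadm kubectl kubernetes-cni
```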
    

Building the Cluster

Initializing the Master Node

sudo kubeadm init \
--apiserver-advertise-address=192.168.56.101 \
--pod-network-cidr=10.244.0.0/16

# kubeadm init prints a join command like the one below; save it so that new nodes can join the cluster
kubeadm join 192.168.56.101:6443 --token 1iwx1m.r80xwy5mtil99pcr --discovery-token-ca-cert-hash sha256:1baf8447f18945b204c43cba9d36d17e2c7bb57cf5bf0b4c03f33fa20f1f4b78

# ubuntu003
kubeadm join 192.168.57.101:6443 --token 51n2um.mf2leydcsfc8q7ey --discovery-token-ca-cert-hash sha256:be1d81ab25802d18171e696d72b62a876cbd40faca7beaf27ef03f5ccf305fd8

# Copy the admin kubeconfig into $HOME
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Troubleshoot

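  • While initializing the cluster, the kubelet process keeps failing to start. Checking its log with journalctl -xefu kubelet shows "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""
    1. Run sudo docker info | grep -i cgroup to see the cgroup driver Docker actually uses; here it should be cgroupfs
    2. Edit /var/lib/kubelet/kubeadm-flags.env and add the --cgroup-driver=cgroupfs flag to the KUBELET_KUBEADM_ARGS variable
      1. After the change, restart kubelet with sudo systemctl restart kubelet and confirm with systemctl status kubelet that its state is active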
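The check and the edit described above can be sketched as a pair of text filters. This is only an illustration under assumptions: docker_cgroup_driver and set_kubelet_cgroup_driver are hypothetical helper names, the first expects the output of docker info on stdin, and the second expects the contents of kubeadm-flags.env with KUBELET_KUBEADM_ARGS written as a single double-quoted line.

```shell
# Hypothetical helper: extract the driver name from `docker info` output on stdin.
docker_cgroup_driver() {
  sed -n 's/^ *Cgroup Driver: *//p'
}

# Hypothetical helper: rewrite kubeadm-flags.env content (stdin) so that
# KUBELET_KUBEADM_ARGS carries --cgroup-driver=<driver>. Other lines pass through.
set_kubelet_cgroup_driver() {
  driver=$1
  while IFS= read -r line; do
    case $line in
      KUBELET_KUBEADM_ARGS=*--cgroup-driver=*)
        # Flag already present: replace its value.
        line=$(printf '%s\n' "$line" | sed "s/--cgroup-driver=[a-z]*/--cgroup-driver=$driver/") ;;
      KUBELET_KUBEADM_ARGS=*\")
        # Flag absent: drop the closing quote, append the flag, re-add the quote.
        line=${line%\"}
        line="$line --cgroup-driver=$driver\"" ;;
    esac
    printf '%s\n' "$line"
  done
}

# Intended usage on the master (review the result before replacing the file):
#   driver=$(sudo docker info 2>/dev/null | docker_cgroup_driver)
#   sudo cat /var/lib/kubelet/kubeadm-flags.env | set_kubelet_cgroup_driver "$driver"
```

Printing the rewritten content instead of editing the file in place keeps the change reviewable; the file is regenerated by kubeadm, so a manual edit may need to be repeated after a later kubeadm init or join.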

Installing the Pod Network Add-on (flannel)

# k8s
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# k8s v1.11
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/c5d10c8/Documentation/kube-flannel.yml

When this finishes, kubectl get pods --all-namespaces should show the kube-flannel-ds-xxx and coredns-xxx pods running.

$ kubectl get pods --all-namespaces
NAMESPACE      NAME                                READY   STATUS    RESTARTS       AGE
kube-flannel   kube-flannel-ds-kljnw               1/1     Running   1 (114m ago)   127m
kube-flannel   kube-flannel-ds-rrzcp               1/1     Running   1 (114m ago)   120m
kube-system    coredns-6d4b75cb6d-gx2pj            1/1     Running   1 (114m ago)   131m
kube-system    coredns-6d4b75cb6d-n84gd            1/1     Running   1 (114m ago)   131m
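Reading the STATUS column by eye gets tedious while the pods are still coming up. A minimal sketch, assuming the default table layout shown above (STATUS is the fourth column); pods_not_running is a hypothetical helper name:

```shell
# Hypothetical helper: reads `kubectl get pods --all-namespaces` output on stdin
# and prints namespace/name for every pod whose STATUS column is not "Running".
pods_not_running() {
  # NR > 1 skips the header row; STATUS is the 4th whitespace-separated column.
  awk 'NR > 1 && $4 != "Running" { print $1 "/" $2 }'
}

# Intended usage:
#   kubectl get pods --all-namespaces | pods_not_running
```

An empty result means every listed pod reports Running. Pods that legitimately finish, such as those in a Completed state, would also be listed by this simple filter.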

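# If kubectl has not been configured yet, set it up as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Alternatively, as the root user:
export KUBECONFIG=/etc/kubernetes/admin.conf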

Joining New Nodes

# This is the token/hash information printed when the master/cluster was initialized; copy it over and run it as-is
$ sudo kubeadm join 192.168.56.3:6443 --token y95vm1.jb705h1i0s0bcy1j --discovery-token-ca-cert-hash sha256:210b042c21aa09fcf65dc152581f78532d8d8e17dfd191cb3a440441dec28d80
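Because the join command is copied by hand, it is easy to paste a stale or mismatched one. A minimal sketch, with join_endpoint as a hypothetical helper name, that extracts the API server endpoint from a saved join command so it can be compared with the master's address before running it. If the token has expired (kubeadm bootstrap tokens expire after 24 hours by default), a fresh join command can be printed on the master with kubeadm token create --print-join-command.

```shell
# Hypothetical helper: print the <host>:<port> argument that follows
# "join" in a saved join command passed as a single string.
join_endpoint() {
  # Unquoted $1 is split into words on purpose; set -- makes them $1, $2, ...
  set -- $1
  while [ $# -gt 0 ]; do
    if [ "$1" = "join" ]; then
      printf '%s\n' "$2"
      return 0
    fi
    shift
  done
  return 1
}

saved='kubeadm join 192.168.56.3:6443 --token y95vm1.jb705h1i0s0bcy1j --discovery-token-ca-cert-hash sha256:210b042c21aa09fcf65dc152581f78532d8d8e17dfd191cb3a440441dec28d80'
join_endpoint "$saved"   # prints 192.168.56.3:6443
```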

Results & Troubleshooting

If kubectl cluster-info prints the following, the cluster has been set up successfully:

Kubernetes control plane is running at https://192.168.56.3:6443
CoreDNS is running at https://192.168.56.3:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

kubectl get nodes lists the nodes in the cluster.
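To confirm that every joined node has actually become usable, the node list can be counted rather than read by eye. A minimal sketch, assuming the default kubectl get nodes table (STATUS is the second column); all_nodes_ready is a hypothetical helper name and the expected node count is supplied by the caller:

```shell
# Hypothetical helper: reads `kubectl get nodes` output on stdin and succeeds
# only if the number of nodes whose STATUS is exactly "Ready" equals $1.
all_nodes_ready() {
  expected=$1
  # NR > 1 skips the header row; n + 0 prints 0 when no node matched.
  ready=$(awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }')
  echo "ready nodes: $ready / $expected"
  [ "$ready" -eq "$expected" ]
}

# Intended usage for one master plus one worker:
#   kubectl get nodes | all_nodes_ready 2
```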

  • CRI runtime misconfiguration: containerd, the runtime that kubeadm init talks to here, reads /etc/containerd/config.toml by default. If you hit the following error:
    $ sudo kubeadm init  --config=config.yaml 
    W1125 12:58:32.733485   26426 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
    [init] Using Kubernetes version: v1.19.4
    [preflight] Running pre-flight checks
    error execution phase preflight: [preflight] Some fatal errors occurred:
            [ERROR CRI]: container runtime is not running: output: time="2020-11-25T12:58:32Z" level=fatal msg="getting status of runtime failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
    , error: exit status 1
    [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
    To see the stack trace of this error execute with --v=5 or higher
    

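    It can be fixed by removing the default config file (its single disable line is what causes the problem):

    sudo rm /etc/containerd/config.toml # the default config contains disabled_plugins = ["cri"], which is the likely culprit; delete the file so that containerd falls back to its built-in defaults
    sudo systemctl restart containerd
    sudo kubeadm init <args>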
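    Deleting the file outright loses anything else it contained. A more cautious sketch, assuming the problem really is a disabled_plugins entry that lists cri: cri_disabled is a hypothetical helper that detects that entry in a config passed on stdin, so the file can be checked and moved aside instead of removed.

```shell
# Hypothetical helper: succeeds if the containerd config on stdin has an
# uncommented disabled_plugins line that includes "cri".
cri_disabled() {
  grep -Eq '^[[:space:]]*disabled_plugins[[:space:]]*=.*"cri"'
}

# Intended usage: only move the file aside when the check matches.
#   if cri_disabled < /etc/containerd/config.toml; then
#     sudo mv /etc/containerd/config.toml /etc/containerd/config.toml.bak
#     sudo systemctl restart containerd
#   fi
```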