Setting Up Cilium Networking on EKS Without Default Add-Ons


INTRODUCTION

Kubernetes offers powerful orchestration capabilities that ensure scalability, reliability, and ease of management. At the same time, a Kubernetes cluster can be complex and resource-intensive, especially when it comes to managing resources and the underlying infrastructure.

Amazon EKS smooths out much of that experience. EKS supports add-ons: default plugins installed on your cluster to extend its functionality. These add-ons handle various tasks, from networking and security to monitoring and logging, enabling you to tailor your Kubernetes environment to specific needs. However, there are cases where you might choose to forgo these EKS add-ons and implement your own custom solutions to achieve more control and functionality.

Here, we will discuss the reasons for using Cilium for networking while opting out of the EKS add-ons, and demonstrate how to create an Amazon EKS cluster without the add-ons and implement Cilium as the networking solution. Whether you are a seasoned Kubernetes expert or just starting your journey, understanding how to pair EKS with a custom networking solution like Cilium will empower you to build more robust, efficient, and secure applications.

What are AWS EKS Default Networking Add-Ons?

These are the default networking plugins that enhance and manage network functionality within Kubernetes clusters. They are crucial for handling various networking tasks such as pod communication, service discovery, and network security:

  • AWS VPC CNI Plugin: The VPC CNI add-on creates Elastic Network Interfaces (ENIs) and attaches them to your Amazon EC2 nodes. It assigns a private IPv4 or IPv6 address from your VPC to each pod and service on each node.

  • CoreDNS: A flexible, extensible DNS server purpose-built for Kubernetes. It is the default DNS server in Kubernetes and provides name resolution for all pods in the cluster.

  • Kube-proxy: This add-on maintains network rules on your Amazon EC2 nodes and enables network communication to your pods.

  • Amazon EBS CSI Driver: The Amazon Elastic Block Store Container Storage Interface (CSI) driver allows an Amazon EKS cluster to manage the lifecycle of Amazon EBS volumes. (Strictly a storage add-on rather than a networking one, but commonly managed alongside the others.)

Other network add-ons include the AWS Load Balancer Controller and AWS Gateway API Controller.
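
To see which add-ons are installed on an existing cluster, you can list them with eksctl (substitute your own cluster name and region; the placeholders below are illustrative):

eksctl get addons --cluster <cluster-name> --region <region>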

Whenever an EKS cluster is created, these add-ons are installed automatically. However, as noted above, there are cases where you might choose to forgo them and implement your own solutions to gain more control and functionality.

One such custom solution is Cilium, a networking plugin that can replace traditional components like VPC CNI and kube-proxy. By using Cilium, you can bypass some of the limitations and dependencies associated with EKS add-ons, providing a more flexible and powerful networking layer for your Kubernetes cluster.

The Case for Custom Networking with Cilium

Cilium leverages eBPF (extended Berkeley Packet Filter) technology to offer a range of features, including:

  • Enhanced Security: Cilium allows you to define fine-grained security policies that control the communication between pods. This ensures that only authorized traffic is allowed, enhancing the security posture of your cluster (see the example policy after this list).

  • Improved Load Balancing: With Cilium, you can achieve more efficient load balancing for your services, optimizing resource utilization and improving application performance.

  • Deep Network Visibility: Cilium provides detailed insights into network traffic, helping you monitor and troubleshoot network issues more effectively.

  • Integration with Service Meshes: Cilium integrates seamlessly with service meshes like Istio, providing advanced traffic management and security capabilities.
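
As an illustration of the fine-grained policies mentioned above, here is a minimal CiliumNetworkPolicy sketch that only allows pods labelled app=frontend to reach pods labelled app=backend on TCP port 8080 (the labels and port are hypothetical placeholders):

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  endpointSelector:
    matchLabels:
      app: backend          # the policy applies to backend pods
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend   # only frontend pods may connect
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP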

Prerequisites

  • You should have an AWS account.

  • Create a cluster IAM role if you're going to create your cluster with eksctl

  • Install kubectl

  • Install Helm

  • The Cilium operator requires EC2 privileges to perform ENI creation and IP allocation; see the Cilium ENI documentation for the exact IAM policy.

  • Install eksctl (Version 0.186.0 or higher)

  • Install awscli

  • Cilium CLI: Cilium provides a CLI tool that automatically collects the logs and debug information needed to troubleshoot your Cilium installation. You can install the Cilium CLI for Linux, macOS, or other distributions on your local machine or server; a Linux install is sketched below.
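
For example, installing the Cilium CLI on a Linux amd64 machine looks roughly like this, following the pattern from the Cilium documentation (check the docs for other platforms and the current commands):

# fetch the latest stable CLI version advertised by the project
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
# download the release tarball and its checksum, then verify and install
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}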

Disabling Default Add-Ons and Installing Cilium

To use Cilium as your networking plugin, you can disable the default EKS add-ons and create the EKS cluster without a managed node group, then install Cilium as your CNI before creating the worker nodes. This is done by setting the addonsConfig.disableDefaultAddons parameter in your EKS cluster configuration (thanks to Amit Gupta for the assist). Here's how you can do it:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: cluster-test
  region: us-east-1
  version: "1.29"

iam:
  withOIDC: true

addonsConfig:
  disableDefaultAddons: true
addons:
  - name: coredns

Here the default add-ons are disabled and only coredns is explicitly re-enabled, which leaves the following add-ons out:

  • kube-proxy

  • vpc-cni (the AWS VPC CNI)

Then create the cluster:

eksctl create cluster -f eks-cluster/ekscluster.yaml

Once the cluster is created with the default add-ons disabled, check the resources running in the kube-system namespace:

$ kubectl get all -n kube-system
NAME                           READY   STATUS    RESTARTS   AGE  
pod/coredns-54d6f577c6-hw2zk   0/1     Pending   0          3m31s
pod/coredns-54d6f577c6-m5bnx   0/1     Pending   0          3m31s

NAME               TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE
service/kube-dns   ClusterIP   10.100.0.10   <none>        53/UDP,53/TCP,9153/TCP   3m32s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/coredns   0/2     2            0           3m33s

NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/coredns-54d6f577c6   2         2         0       3m33s

The cluster is up, but the CoreDNS pods are stuck in the Pending state; there is no CNI and there are no nodes to schedule them on yet. Next we install Cilium using Helm, but first we have to obtain the cluster's KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT, which the Cilium deployment needs:

$ kubectl cluster-info
Kubernetes control plane is running at https://17535DBDBFCD6A8D01567C54391DB73B.gr7.us-east-1.eks.amazonaws.com
CoreDNS is running at https://17535DBDBFCD6A8D01567C54391DB73B.gr7.us-east-1.eks.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
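
The Helm chart expects the bare hostname, without the https:// scheme. If you prefer to script this, a small sketch using the AWS CLI, assuming the cluster name and region from the config above:

# query the API server endpoint for the cluster
API_SERVER=$(aws eks describe-cluster --name cluster-test --region us-east-1 --query "cluster.endpoint" --output text)
# k8sServiceHost expects a bare hostname, so strip the https:// prefix
echo "${API_SERVER#https://}"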

Now deploy Cilium using Helm. Because kube-proxy is disabled, kubeProxyReplacement=true lets Cilium's eBPF datapath take over service handling, while eni.enabled=true and ipam.mode=eni keep pod IPs allocated directly from your VPC.
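
If the Cilium chart repository is not configured yet, add it first:

helm repo add cilium https://helm.cilium.io/
helm repo update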

helm install cilium cilium/cilium --version 1.15.6 \
  --namespace kube-system \
  --set eni.enabled=true \
  --set ipam.mode=eni \
  --set egressMasqueradeInterfaces=eth0 \
  --set routingMode=native \
  --set k8sServiceHost=17535DBDBFCD6A8D01567C54391DB73B.gr7.us-east-1.eks.amazonaws.com \
  --set k8sServicePort=443 \
  --set kubeProxyReplacement=true
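
You can confirm that the release was created before moving on:

helm list -n kube-system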

Check the status of your pods:

$ kubectl get pods -A -o wide
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE     IP       NODE     NOMINATED NODE   READINESS GATES
kube-system   cilium-operator-589bfbd7b6-d6h9b   0/1     Pending   0          28s     <none>   <none>   <none>           <none>      
kube-system   cilium-operator-589bfbd7b6-fq8w9   0/1     Pending   0          28s     <none>   <none>   <none>           <none>      
kube-system   coredns-54d6f577c6-hw2zk           0/1     Pending   0          7m26s   <none>   <none>   <none>           <none>      
kube-system   coredns-54d6f577c6-m5bnx           0/1     Pending   0          7m26s   <none>   <none>   <none>           <none>

The Cilium operator and CoreDNS pods remain Pending because there are still no worker nodes to schedule them on. Now deploy the nodes:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: cluster-test
  region: us-east-1

managedNodeGroups:
  - name: cluster-node
    instanceType: t2.medium
    desiredCapacity: 2
    privateNetworking: true

Now you can create the node group:

eksctl create nodegroup -f eks-cluster/eksnode.yaml
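
Node registration can take a couple of minutes. Optionally, wait for the nodes to become Ready before inspecting the pods:

kubectl wait --for=condition=Ready nodes --all --timeout=300s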

Now you can check whether your nodes and pods are running properly:

$ kubectl get nodes -o wide
NAME                             STATUS   ROLES    AGE     VERSION               INTERNAL-IP      EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-192-168-89-72.ec2.internal    Ready    <none>   2m45s   v1.29.6-eks-1552ad0   192.168.89.72    <none>        Amazon Linux 2   5.10.220-209.869.amzn2.x86_64   containerd://1.7.11
ip-192-168-96-208.ec2.internal   Ready    <none>   2m51s   v1.29.6-eks-1552ad0   192.168.96.208   <none>        Amazon Linux 2   5.10.220-209.869.amzn2.x86_64   containerd://1.7.11


$ kubectl get pods -A -o wide
NAMESPACE     NAME                               READY   STATUS    RESTARTS        AGE     IP                NODE
         NOMINATED NODE   READINESS GATES
kube-system   cilium-d9gf4                       1/1     Running   0               4m8s    192.168.96.208    ip-192-168-96-208.ec2.internal   <none>           <none>
kube-system   cilium-operator-589bfbd7b6-d6h9b   1/1     Running   0               9m24s   192.168.96.208    ip-192-168-96-208.ec2.internal   <none>           <none>
kube-system   cilium-operator-589bfbd7b6-fq8w9   1/1     Running   0               9m24s   192.168.89.72     ip-192-168-89-72.ec2.internal    <none>           <none>
kube-system   cilium-wv5x8                       1/1     Running   2 (3m23s ago)   4m2s    192.168.89.72     ip-192-168-89-72.ec2.internal    <none>           <none>
kube-system   coredns-54d6f577c6-hw2zk           1/1     Running   0               16m     192.168.113.145   ip-192-168-96-208.ec2.internal   <none>           <none>
kube-system   coredns-54d6f577c6-m5bnx           1/1     Running   0               16m     192.168.117.197   ip-192-168-96-208.ec2.internal   <none>           <none>

All the pods are now Running, and the CoreDNS pods have been assigned IPs by Cilium's ENI IPAM. You can view the details of one of the Cilium pods listed in the output:

$ kubectl describe pod cilium-d9gf4 -n kube-system
Name:                 cilium-d9gf4
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      cilium
Node:                 ip-192-168-96-208.ec2.internal/192.168.96.208
Start Time:           Sun, 04 Aug 2024 14:46:49 +0100
Labels:               app.kubernetes.io/name=cilium-agent
                      app.kubernetes.io/part-of=cilium
                      controller-revision-hash=64449bc679
                      k8s-app=cilium
                      pod-template-generation=1
Annotations:          container.apparmor.security.beta.kubernetes.io/apply-sysctl-overwrites: unconfined
                      container.apparmor.security.beta.kubernetes.io/cilium-agent: unconfined
                      container.apparmor.security.beta.kubernetes.io/clean-cilium-state: unconfined
                      container.apparmor.security.beta.kubernetes.io/mount-cgroup: unconfined
Status:               Running
IP:                   192.168.96.208
IPs:
  IP:           192.168.96.208
Controlled By:  DaemonSet/cilium
Init Containers:
  config:
    Container ID:  containerd://66084ba93a2dac75ba9555e1d929e261307b40f86f5cd0677e720744a7767bba
    Image:         quay.io/cilium/cilium:v1.15.6@sha256:6aa840986a3a9722cd967ef63248d675a87add7e1704740902d5d3162f0c0def
    Image ID:      quay.io/cilium/cilium@sha256:6aa840986a3a9722cd967ef63248d675a87add7e1704740902d5d3162f0c0def
    Port:          <none>
    Host Port:     <none>
    Command:
      cilium-dbg
      build-config
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 04 Aug 2024 14:47:02 +0100
      Finished:     Sun, 04 Aug 2024 14:47:02 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      K8S_NODE_NAME:             (v1:spec.nodeName)
      CILIUM_K8S_NAMESPACE:     kube-system (v1:metadata.namespace)
      KUBERNETES_SERVICE_HOST:  17535DBDBFCD6A8D01567C54391DB73B.gr7.us-east-1.eks.amazonaws.com
      KUBERNETES_SERVICE_PORT:  443
    Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jwd7j (ro)
  mount-cgroup:
    Container ID:  containerd://2d96327c2209c801ef62aa5ee656f36710a87e8a01160fc8058e45795964270a
    Image:         quay.io/cilium/cilium:v1.15.6@sha256:6aa840986a3a9722cd967ef63248d675a87add7e1704740902d5d3162f0c0def
    Image ID:      quay.io/cilium/cilium@sha256:6aa840986a3a9722cd967ef63248d675a87add7e1704740902d5d3162f0c0def
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -ec
      cp /usr/bin/cilium-mount /hostbin/cilium-mount;
      nsenter --cgroup=/hostproc/1/ns/cgroup --mount=/hostproc/1/ns/mnt "${BIN_PATH}/cilium-mount" $CGROUP_ROOT;
      rm /hostbin/cilium-mount

    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 04 Aug 2024 14:47:07 +0100
      Finished:     Sun, 04 Aug 2024 14:47:07 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      CGROUP_ROOT:  /run/cilium/cgroupv2
      BIN_PATH:     /opt/cni/bin
    Mounts:
      /hostbin from cni-path (rw)
      /hostproc from hostproc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jwd7j (ro)
  apply-sysctl-overwrites:
    Container ID:  containerd://9747be543a4ed9abb3a0fda02b9e46cf2aa56981147e4c8639dff0ec62853f0d
    Image:         quay.io/cilium/cilium:v1.15.6@sha256:6aa840986a3a9722cd967ef63248d675a87add7e1704740902d5d3162f0c0def
    Image ID:      quay.io/cilium/cilium@sha256:6aa840986a3a9722cd967ef63248d675a87add7e1704740902d5d3162f0c0def
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -ec
      cp /usr/bin/cilium-sysctlfix /hostbin/cilium-sysctlfix;
      nsenter --mount=/hostproc/1/ns/mnt "${BIN_PATH}/cilium-sysctlfix";
      rm /hostbin/cilium-sysctlfix

    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 04 Aug 2024 14:47:08 +0100
      Finished:     Sun, 04 Aug 2024 14:47:08 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      BIN_PATH:  /opt/cni/bin
    Mounts:
      /hostbin from cni-path (rw)
      /hostproc from hostproc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jwd7j (ro)
  mount-bpf-fs:
    Container ID:  containerd://a294f96a7bff20c123ca8e7639b9f58cbec8ffb27f312c30b7e45914dc70f14e
    Image:         quay.io/cilium/cilium:v1.15.6@sha256:6aa840986a3a9722cd967ef63248d675a87add7e1704740902d5d3162f0c0def
    Image ID:      quay.io/cilium/cilium@sha256:6aa840986a3a9722cd967ef63248d675a87add7e1704740902d5d3162f0c0def
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      --
    Args:
      mount | grep "/sys/fs/bpf type bpf" || mount -t bpf bpf /sys/fs/bpf
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 04 Aug 2024 14:47:09 +0100
      Finished:     Sun, 04 Aug 2024 14:47:09 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /sys/fs/bpf from bpf-maps (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jwd7j (ro)
  clean-cilium-state:
    Container ID:  containerd://cc5d00316b12e945b59935ac02b79ed5ef81d2b5c5d47a3c1a2e60964d1c50da
    Image:         quay.io/cilium/cilium:v1.15.6@sha256:6aa840986a3a9722cd967ef63248d675a87add7e1704740902d5d3162f0c0def
    Image ID:      quay.io/cilium/cilium@sha256:6aa840986a3a9722cd967ef63248d675a87add7e1704740902d5d3162f0c0def
    Port:          <none>
    Host Port:     <none>
    Command:
      /init-container.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 04 Aug 2024 14:47:10 +0100
      Finished:     Sun, 04 Aug 2024 14:47:10 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      CILIUM_ALL_STATE:           <set to the key 'clean-cilium-state' of config map 'cilium-config'>         Optional: true
      CILIUM_BPF_STATE:           <set to the key 'clean-cilium-bpf-state' of config map 'cilium-config'>     Optional: true
      WRITE_CNI_CONF_WHEN_READY:  <set to the key 'write-cni-conf-when-ready' of config map 'cilium-config'>  Optional: true
      KUBERNETES_SERVICE_HOST:    17535DBDBFCD6A8D01567C54391DB73B.gr7.us-east-1.eks.amazonaws.com
      KUBERNETES_SERVICE_PORT:    443
    Mounts:
      /run/cilium/cgroupv2 from cilium-cgroup (rw)
      /sys/fs/bpf from bpf-maps (rw)
      /var/run/cilium from cilium-run (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jwd7j (ro)
  install-cni-binaries:
    Container ID:  containerd://29b6d2329f0fd8a8e3da8cffe322544814d87944c70eec85f4e21bf1e90623b7
    Image:         quay.io/cilium/cilium:v1.15.6@sha256:6aa840986a3a9722cd967ef63248d675a87add7e1704740902d5d3162f0c0def
    Image ID:      quay.io/cilium/cilium@sha256:6aa840986a3a9722cd967ef63248d675a87add7e1704740902d5d3162f0c0def
    Port:          <none>
    Host Port:     <none>
    Command:
      /install-plugin.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 04 Aug 2024 14:47:11 +0100
      Finished:     Sun, 04 Aug 2024 14:47:11 +0100
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        100m
      memory:     10Mi
    Environment:  <none>
    Mounts:
      /host/opt/cni/bin from cni-path (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jwd7j (ro)
Containers:
  cilium-agent:
    Container ID:  containerd://74b97029d25210bc28fe643ac150a2a0f60810a00e64fab07af78f630d00fc54
    Image:         quay.io/cilium/cilium:v1.15.6@sha256:6aa840986a3a9722cd967ef63248d675a87add7e1704740902d5d3162f0c0def
    Image ID:      quay.io/cilium/cilium@sha256:6aa840986a3a9722cd967ef63248d675a87add7e1704740902d5d3162f0c0def
    Port:          <none>
    Host Port:     <none>
    Command:
      cilium-agent
    Args:
      --config-dir=/tmp/cilium/config-map
    State:          Running
      Started:      Sun, 04 Aug 2024 14:47:12 +0100
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://127.0.0.1:9879/healthz delay=0s timeout=5s period=30s #success=1 #failure=10
    Readiness:      http-get http://127.0.0.1:9879/healthz delay=0s timeout=5s period=30s #success=1 #failure=3
    Startup:        http-get http://127.0.0.1:9879/healthz delay=5s timeout=1s period=2s #success=1 #failure=105
    Environment:
      K8S_NODE_NAME:               (v1:spec.nodeName)
      CILIUM_K8S_NAMESPACE:       kube-system (v1:metadata.namespace)
      CILIUM_CLUSTERMESH_CONFIG:  /var/lib/cilium/clustermesh/
      GOMEMLIMIT:                 node allocatable (limits.memory)
      KUBERNETES_SERVICE_HOST:    17535DBDBFCD6A8D01567C54391DB73B.gr7.us-east-1.eks.amazonaws.com
      KUBERNETES_SERVICE_PORT:    443

Then we check that the Cilium DaemonSet is running:

$ kubectl get ds -n kube-system
NAME     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
cilium   2         2         2       2            2           kubernetes.io/os=linux   15m
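
Since the Cilium CLI was installed as part of the prerequisites, you can also verify the overall installation from your workstation; the --wait flag blocks until all Cilium components report ready:

cilium status --wait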

To check the health of Cilium, run the cilium status command inside one of the agent pods:

$ kubectl exec ds/cilium -n kube-system -- cilium status
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
KVStore:                 Ok   Disabled
Kubernetes:              Ok   1.29+ (v1.29.6-eks-db838b0) [linux/amd64]
Kubernetes APIs:         ["EndpointSliceOrEndpoint", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "cilium/v2alpha1::CiliumCIDRGroup", "core/v1::Namespace", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:    True   [eth0   192.168.96.208 fe80::105b:fbff:fe9a:3b89 (Direct Routing), eth1   fe80::10a0:f4ff:fef8:2789 192.168.114.201, eth2   fe80::100c:26ff:fe69:52a5 192.168.123.218]
Host firewall:           Disabled
SRv6:                    Disabled
CNI Chaining:            none
CNI Config file:         successfully wrote CNI configuration file to /host/etc/cni/net.d/05-cilium.conflist
Cilium:                  Ok   1.15.6 (v1.15.6-a09e05e6)
NodeMonitor:             Listening for events on 15 CPUs with 64x4096 of shared memory
Cilium health daemon:    Ok
IPAM:                    IPv4: 4/12 allocated,
IPv4 BIG TCP:            Disabled
IPv6 BIG TCP:            Disabled
BandwidthManager:        Disabled
Host Routing:            Legacy
Masquerading:            IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status:       30/30 healthy
Proxy Status:            OK, ip 192.168.113.42, 0 redirects active on ports 10000-20000, Envoy: embedded
Global Identity Range:   min 256, max 65535
Hubble:                  Ok              Current/Max Flows: 4095/4095 (100.00%), Flows/s: 4.69   Metrics: Disabled
Encryption:              Disabled
Cluster health:          2/2 reachable   (2024-08-04T14:02:26Z)
Modules Health:          Stopped(0) Degraded(0) OK(11)

Then we list the endpoints and verify that the Cilium endpoints are healthy:

$ kubectl exec cilium-d9gf4 -n kube-system -- cilium endpoint list
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                                                  IPv6   IPv4              STATUS   
           ENFORCEMENT        ENFORCEMENT                                                                                                                        
98         Disabled           Disabled          55806      k8s:eks.amazonaws.com/component=coredns                                             192.168.113.145   ready   
                                                           k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=kube-system                                    
                                                           k8s:io.cilium.k8s.policy.cluster=default                                                                      
                                                           k8s:io.cilium.k8s.policy.serviceaccount=coredns                                                               
                                                           k8s:io.kubernetes.pod.namespace=kube-system                                                                   
                                                           k8s:k8s-app=kube-dns                                                                                          
127        Disabled           Disabled          55806      k8s:eks.amazonaws.com/component=coredns                                             192.168.117.197   ready   
                                                           k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=kube-system                                    
                                                           k8s:io.cilium.k8s.policy.cluster=default                                                                      
                                                           k8s:io.cilium.k8s.policy.serviceaccount=coredns                                                               
                                                           k8s:io.kubernetes.pod.namespace=kube-system                                                                   
                                                           k8s:k8s-app=kube-dns                                                                                          
906        Disabled           Disabled          1          k8s:alpha.eksctl.io/cluster-name=cluster-test                                                         ready   
                                                           k8s:alpha.eksctl.io/nodegroup-name=cluster-node                                                               
                                                           k8s:eks.amazonaws.com/capacityType=ON_DEMAND                                                                  
                                                           k8s:eks.amazonaws.com/nodegroup-image=ami-0de6fddfe8b32a7dc                                                   
                                                           k8s:eks.amazonaws.com/nodegroup=cluster-node                                                                  
                                                           k8s:eks.amazonaws.com/sourceLaunchTemplateId=lt-0c00202c5047e8b2e                                             
                                                           k8s:eks.amazonaws.com/sourceLaunchTemplateVersion=1                                                           
                                                           k8s:node.kubernetes.io/instance-type=t2.medium                                                                
                                                           k8s:topology.kubernetes.io/region=us-east-1                                                                   
                                                           k8s:topology.kubernetes.io/zone=us-east-1b                                                                    
                                                           reserved:host                                                                                                 
2135       Disabled           Disabled          4          reserved:health                                                                     192.168.106.198   ready

To check the cluster connectivity health, we can determine the overall connectivity status of the cluster with cilium-health. This tool periodically runs bidirectional traffic across multiple paths through the cluster and through each node, using different protocols to determine the health status of each path and protocol:

$ kubectl exec cilium-d9gf4 -n kube-system -- cilium-health status
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
Probe time:   2024-08-04T14:29:26Z
Nodes:
  ip-192-168-96-208.ec2.internal (localhost):
    Host connectivity to 192.168.96.208:
      ICMP to stack:   OK, RTT=338.737µs
      HTTP to agent:   OK, RTT=225.214µs
    Endpoint connectivity to 192.168.106.198:
      ICMP to stack:   OK, RTT=336.85µs
      HTTP to agent:   OK, RTT=320.779µs
  ip-192-168-89-72.ec2.internal:
    Host connectivity to 192.168.89.72:
      ICMP to stack:   OK, RTT=768.781µs
      HTTP to agent:   OK, RTT=787.024µs
    Endpoint connectivity to 192.168.94.129:
      ICMP to stack:   OK, RTT=717.739µs
      HTTP to agent:   OK, RTT=799.092µs

Then we can monitor the datapath state using cilium-dbg monitor, which lets you quickly inspect if and where packet drops happen:

$ kubectl -n kube-system exec -ti cilium-d9gf4 -- cilium-dbg monitor --type drop
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
Listening for events on 15 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
time="2024-08-04T14:50:18Z" level=info msg="Initializing dissection cache..." subsys=monitor
xx drop (Unsupported L3 protocol) flow 0x0 to endpoint 0, ifindex 13, file bpf_lxc.c:1496, , identity 55806->unknown: fe80::c6f:d2ff:fe95:f069 -> ff02::2 RouterSolicitation
xx drop (Unsupported L3 protocol) flow 0x0 to endpoint 2135, ifindex 6, file bpf_lxc.c:2509, , identity world->health: fe80::b0a8:39ff:fea9:9fe -> ff02::2 RouterSolicitation
xx drop (Unsupported L3 protocol) flow 0x0 to endpoint 98, ifindex 13, file bpf_lxc.c:2509, , identity world->55806: fe80::4057:aaff:fe3a:d410 -> ff02::2 RouterSolicitation

Optionally, you can run the Cilium connectivity test, which deploys several services, deployments, and CiliumNetworkPolicies, then exercises the various connectivity paths between them:

cilium connectivity test
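
The connectivity test deploys its workloads into a dedicated namespace (cilium-test, by default); once the checks pass, you can clean up with:

kubectl delete namespace cilium-test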

Conclusion

AWS EKS provides a robust and managed environment for running Kubernetes clusters, complete with default add-ons that simplify networking and service management. However, there are scenarios where these default components may not suffice, and a more advanced solution like Cilium becomes necessary.

By disabling the default add-ons and leveraging Cilium, you can achieve enhanced security, improved load balancing, and deeper visibility into your network traffic. This customization allows you to tailor your Kubernetes environment to meet specific requirements, ensuring optimal performance and security.

Here, we have explored the basics of Amazon EKS, its default networking add-ons, and the steps to implement Cilium as a custom networking solution. By understanding and utilizing these tools, you can unlock the full potential of Kubernetes and build more robust, efficient, and secure applications.

If you found this blog insightful and want to dive deeper into topics like AWS cloud, Kubernetes, and cloud-native projects or anything related, you can check me out on: linkedin | X | github
