Best Practices for Log Collection Inside Serverless Containers

    I. Overview

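    This article describes Guance Cloud's best practice for collecting logs inside Serverless containers. Collection is implemented with the Guance Cloud CRD plus DataKit Operator, which injects a logfwd sidecar into the workload Pods. The main features of this approach are: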

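    • Centralized management of collection configs: the Operator watches the Kubernetes ClusterLoggingConfig CRD and exposes the match results, which the logfwd sidecar fetches by polling (by default the sidecar sends an HTTP request to the Operator every 60 seconds; logfwd ≥ 1.86.0 is required).
    • Hot reload & fine-grained matching: changes to the CRD selector (Namespace/Pod/Label/Container) take effect without recreating the workload; the sidecar picks them up on its next poll.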
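
    The two bullets are linked: because the sidecar re-fetches its match result on a fixed interval, a CRD edit is applied at the next poll, so the worst-case delay is just under one interval. The Python sketch below simulates this with a fake clock. The `fetch` callable stands in for the HTTP request to the Operator, and the version string used to detect changes is an assumption; this is not logfwd's actual protocol or payload.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

POLL_INTERVAL_S = 60  # default sidecar polling interval stated in the text

# Hypothetical shape of the Operator's answer: (version, enabled log sources).
Snapshot = Tuple[Optional[str], List[str]]


@dataclass
class Sidecar:
    """Toy model of a logfwd sidecar that caches the last config it applied."""
    version: Optional[str] = None
    sources: List[str] = field(default_factory=list)

    def poll(self, fetch: Callable[[], Snapshot]) -> bool:
        """Fetch the current match result; hot-reload only when it changed."""
        version, sources = fetch()
        if version == self.version:
            return False
        self.version, self.sources = version, list(sources)
        return True


def simulate(edits: dict, duration_s: int) -> List[int]:
    """Replay CRD edits ({second: snapshot}) and return the seconds at which
    the sidecar applied a new config."""
    sidecar, current, applied = Sidecar(), (None, []), []
    for t in range(duration_s + 1):
        if t in edits:  # an admin edits the ClusterLoggingConfig at second t
            current = edits[t]
        if t % POLL_INTERVAL_S == 0 and sidecar.poll(lambda: current):
            applied.append(t)
    return applied
```

    For example, `simulate({0: ("v1", ["demo-file"]), 75: ("v2", ["demo-file", "demo-stdout"])}, 180)` returns `[0, 120]`: the edit made at t=75 s is picked up by the poll at t=120 s, with no Pod restart.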

    II. Prerequisites

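    • Kubernetes cluster version 1.16+
    • DataKit installed with the logfwdserver collector enabled (its default listening port is 9533)
    • The DataKit Service exposes port 9533, so that other Pods can reach datakit-service.datakit.svc:9533
    • DataKit-Operator v1.7.0 or later
    • Cluster administrator permissions (needed to register the CRD)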
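
    The version requirements above can be checked before installing anything. The following is a hypothetical preflight helper, not part of DataKit or DataKit-Operator; it assumes you have already collected the version strings yourself (for example from `kubectl version` and the image tags you deploy).

```python
import re

# Minimum versions taken from the prerequisites above.
MINIMUMS = {
    "kubernetes": (1, 16, 0),
    "datakit-operator": (1, 7, 0),
    "logfwd": (1, 86, 0),
}


def parse_version(text: str) -> tuple:
    """Turn 'v1.7.0', '1.86.2' or 'v1.28.3-eks-abc' into (major, minor, patch)."""
    m = re.match(r"^v?(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if not m:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def check_prerequisites(found: dict) -> list:
    """Return human-readable problems; an empty list means every minimum is met."""
    problems = []
    for component, minimum in MINIMUMS.items():
        if component not in found:
            problems.append(f"{component}: version unknown")
            continue
        if parse_version(found[component]) < minimum:
            want = ".".join(map(str, minimum))
            problems.append(f"{component}: {found[component]} < {want}")
    return problems
```

    Tuples compare element by element, so `(1, 15, 12) < (1, 16, 0)` holds without any string tricks.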

    III. Collection Workflow

    1. Register the Kubernetes CRD

    • Register the ClusterLoggingConfig CRD with the following YAML:
    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: clusterloggingconfigs.logging.datakits.io
      labels:
        app: datakit-logging-config
        version: v1alpha1
    spec:
      group: logging.datakits.io
      versions:
        - name: v1alpha1
          served: true
          storage: true
          schema:
            openAPIV3Schema:
              type: object
              properties:
                apiVersion:
                  type: string
                kind:
                  type: string
                metadata:
                  type: object
                spec:
                  type: object
                  required:
                    - selector
                  properties:
                    selector:
                      type: object
                      properties:
                        namespaceRegex:
                          type: string
                        podRegex:
                          type: string
                        podLabelSelector:
                          type: string
                        containerRegex:
                          type: string
                    podTargetLabels:
                      type: array
                      items:
                        type: string
                    configs:
                      type: array
                      items:
                        type: object
                        required:
                          - source
                          - type
                        properties:
                          source:
                            type: string
                          type:
                            type: string
                          disable:
                            type: boolean
                          path:
                            type: string
                          multiline_match:
                            type: string
                          pipeline:
                            type: string
                          storage_index:
                            type: string
                          tags:
                            type: object
                            additionalProperties:
                              type: string
      scope: Cluster
      names:
        plural: clusterloggingconfigs
        singular: clusterloggingconfig
        kind: ClusterLoggingConfig
        shortNames:
          - logging
    
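    • Create the CRD resource (the YAML above, saved as clusterloggingconfig-crd.yaml); collection configs defined as ClusterLoggingConfig objects are then applied automatically: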
    kubectl apply -f clusterloggingconfig-crd.yaml
    
    • Verify that the CRD is registered:
    kubectl get crd clusterloggingconfigs.logging.datakits.io
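
    The CRD above uses apiextensions.k8s.io/v1 with a structural schema, so the API server enforces the `required` fields and drops (prunes) any field the schema does not declare; depending on client and server versions this may happen without an error, so a misspelled key such as `multiline-match` can simply disappear. The hypothetical helper below mirrors that schema so a ClusterLoggingConfig `spec` can be checked, for example in CI, before `kubectl apply`. It covers only the checks shown and does not replace server-side validation.

```python
from typing import Any

SPEC_KEYS = {"selector", "podTargetLabels", "configs"}
SELECTOR_KEYS = {"namespaceRegex", "podRegex", "podLabelSelector", "containerRegex"}
CONFIG_KEYS = {"source", "type", "disable", "path", "multiline_match",
               "pipeline", "storage_index", "tags"}
CONFIG_REQUIRED = ("source", "type")


def unknown(obj: dict, allowed: set, path: str) -> list:
    """Report keys the schema does not declare, in a stable order."""
    return [f"{path}.{k}: unknown field (not in schema)"
            for k in sorted(obj.keys() - allowed)]


def validate_spec(spec: Any) -> list:
    """Check a ClusterLoggingConfig spec against the CRD schema; [] means OK."""
    if not isinstance(spec, dict):
        return ["spec: must be an object"]
    errors = unknown(spec, SPEC_KEYS, "spec")
    selector = spec.get("selector")
    if not isinstance(selector, dict):
        errors.append("spec.selector: required object")
    else:
        errors += unknown(selector, SELECTOR_KEYS, "spec.selector")
        for key in sorted(selector.keys() & SELECTOR_KEYS):
            if not isinstance(selector[key], str):
                errors.append(f"spec.selector.{key}: must be a string")
    labels = spec.get("podTargetLabels", [])
    if not isinstance(labels, list) or not all(isinstance(x, str) for x in labels):
        errors.append("spec.podTargetLabels: must be a list of strings")
    configs = spec.get("configs", [])
    if not isinstance(configs, list):
        return errors + ["spec.configs: must be a list"]
    for i, cfg in enumerate(configs):
        path = f"spec.configs[{i}]"
        if not isinstance(cfg, dict):
            errors.append(f"{path}: must be an object")
            continue
        errors += [f"{path}.{k}: required" for k in CONFIG_REQUIRED if k not in cfg]
        errors += unknown(cfg, CONFIG_KEYS, path)
        if "disable" in cfg and not isinstance(cfg["disable"], bool):
            errors.append(f"{path}.disable: must be a boolean")
        tags = cfg.get("tags", {})
        if not isinstance(tags, dict) or not all(isinstance(v, str) for v in tags.values()):
            errors.append(f"{path}.tags: must map strings to strings")
    return errors
```

    A spec with a misspelled key reports both the missing required field and the unknown one, e.g. `validate_spec({"selector": {}, "configs": [{"source": "x", "multiline-match": "^\\d"}]})`.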
    

    2. Install and Configure DataKit-Operator

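    • Install DataKit-Operator v1.7.0 or later. Applying the latest datakit-operator.yaml with kubectl apply -f datakit-operator.yaml already includes the required permissions; alternatively, refer to the following minimal example: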
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: datakit-operator
    rules:
    - apiGroups: ["logging.datakits.io"]
      resources: ["clusterloggingconfigs"]
      verbs: ["get", "list", "watch"]
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: datakit-operator
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: datakit-operator
    subjects:
    - kind: ServiceAccount
      name: datakit-operator
      namespace: datakit
    
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: datakit-operator
      namespace: datakit
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: datakit-operator
      namespace: datakit
      labels:
        app: datakit-operator
    spec:
      replicas: 1  # Do not change the replica count!
      selector:
        matchLabels:
          app: datakit-operator
      template:
        metadata:
          labels:
            app: datakit-operator
        spec:
          serviceAccountName: datakit-operator
          containers:
          - name: operator
            # other..
    
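    • As shown in the figure, configure the logfwds array in the DataKit-Operator configuration. The key fields are the matching rules namespace_selectors/label_selectors and the mount-directory field log_volume_paths. namespace_selectors and label_selectors are combined with logical AND.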
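
    The AND relationship between namespace_selectors and label_selectors can be illustrated with a short sketch. Only the AND comes from the text above; everything else is assumed for illustration: namespace_selectors is treated as a list of namespace names, a Pod satisfies label_selectors if any one selector string matches, and an empty list places no constraint. Check the DataKit-Operator documentation for the exact semantics.

```python
def label_selector_matches(selector: str, labels: dict) -> bool:
    """Evaluate equality-based terms: 'k=v', 'k==v', 'k!=v' and bare 'k' (exists).

    Set-based terms ('in', 'notin') and '!k' are out of scope for this sketch.
    """
    for term in (t.strip() for t in selector.split(",")):
        if not term:
            continue
        if "!=" in term:
            key, value = (s.strip() for s in term.split("!=", 1))
            ok = labels.get(key) != value  # a missing label also satisfies '!='
        elif "=" in term:
            key, value = (s.strip() for s in term.replace("==", "=", 1).split("=", 1))
            ok = labels.get(key) == value
        else:
            ok = term in labels
        if not ok:
            return False
    return True


def rule_matches(rule: dict, namespace: str, labels: dict) -> bool:
    """Decide whether one (hypothetical) logfwds rule applies to a Pod."""
    ns_list = rule.get("namespace_selectors", [])
    sel_list = rule.get("label_selectors", [])
    ns_ok = not ns_list or namespace in ns_list  # assumption: names; empty = any
    label_ok = not sel_list or any(             # assumption: any selector may match
        label_selector_matches(s, labels) for s in sel_list)
    return ns_ok and label_ok                    # AND, as stated in the text
```

    With `{"namespace_selectors": ["default"], "label_selectors": ["app=demo"]}`, a Pod labelled `app=demo` in `prod` is not matched, because only one of the two conditions holds.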

    3. Deploy DataKit as a Deployment

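    • Install DataKit as a Deployment in the super-node cluster. Pay attention to the resource kind (Deployment), the replica count, enabling the logfwdserver collector, and the modified Deployment update strategy, as follows: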
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: datakit
    rules:
    - apiGroups: ["rbac.authorization.k8s.io"]
      resources: ["clusterroles"]
      verbs: ["get", "list", "watch"]
    - apiGroups: [""]
      resources: ["nodes", "nodes/stats", "nodes/metrics", "namespaces", "pods", "pods/log", "events", "services", "endpoints", "persistentvolumes", "persistentvolumeclaims", "pods/exec"]
      verbs: ["get", "list", "watch"]
    - apiGroups: ["apps"]
      resources: ["deployments", "daemonsets", "statefulsets", "replicasets"]
      verbs: ["get", "list", "watch"]
    - apiGroups: ["batch"]
      resources: ["jobs", "cronjobs"]
      verbs: [ "get", "list", "watch"]
    - apiGroups: ["monitoring.coreos.com"]
      resources: ["podmonitors", "servicemonitors"]
      verbs: ["get", "list", "watch"]
    - apiGroups: ["logging.datakits.io"]
      resources: ["clusterloggingconfigs"]
      verbs: ["get", "list", "watch"]
    - apiGroups: ["metrics.k8s.io"]
      resources: ["pods", "nodes"]
      verbs: ["get", "list"]
    - nonResourceURLs: ["/metrics"]
      verbs: ["get"]
    
    ---
    
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: datakit
      namespace: datakit
    
    ---
    
    apiVersion: v1
    kind: Service
    metadata:
      name: datakit-service
      namespace: datakit
    spec:
      selector:
        app: daemonset-datakit
      ports:
        - name: svc-http-port
          protocol: TCP # for HTTP apis and some collector(inputs) HTTP server, such as DDTrace
          port: 9529
          targetPort: http-port
        - name: svc-statsd-port
          protocol: UDP
          port: 8125
          targetPort: statsd-port
        - name: svc-otel-grpc-port
          protocol: TCP
          port: 4317
          targetPort: otel-grpc-port
        - name: svc-logfwd-port
          protocol: TCP
          port: 9533
          targetPort: logfwd-port
    
    ---
    
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: datakit
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: datakit
    subjects:
    - kind: ServiceAccount
      name: datakit
      namespace: datakit
    
    ---
    
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: daemonset-datakit
      name: datakit
      namespace: datakit
    spec:
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: daemonset-datakit
      template:
        metadata:
          labels:
            app: daemonset-datakit
        spec:
          hostNetwork: true
          dnsPolicy: ClusterFirstWithHostNet
          containers:
          - env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
    
            - name: ENV_K8S_NODE_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.hostIP
    
            - name: ENV_K8S_NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
    
            #- name: ENV_K8S_CLUSTER_NODE_NAME
            #  value: cluster_a_$(ENV_K8S_NODE_NAME)
    
            - name: ENV_DATAWAY
          value: https://openway.guance.com?token=<YOUR_WORKSPACE_TOKEN> # Fill in your real Dataway server and/or workspace token
            - name: ENV_CLUSTER_NAME_K8S
              value: lyr-test
            - name: ENV_GLOBAL_HOST_TAGS
              value: host=__datakit_hostname,host_ip=__datakit_ip
            - name: ENV_GLOBAL_ELECTION_TAGS # Default not set
              value: ""
            - name: ENV_DEFAULT_ENABLED_INPUTS
              value: statsd,dk,cpu,disk,diskio,mem,swap,system,hostobject,net,host_processes,container,kubernetesprometheus,logfwdserver,ddtrace
            - name: ENV_ENABLE_ELECTION
              value: enable
            - name: ENV_HTTP_LISTEN
              value: 0.0.0.0:9529
            - name: HOST_PROC
              value: /rootfs/proc
            - name: HOST_SYS
              value: /rootfs/sys
            - name: HOST_ETC
              value: /rootfs/etc
            - name: HOST_VAR
              value: /rootfs/var
            - name: HOST_RUN
              value: /rootfs/run
            - name: HOST_DEV
              value: /rootfs/dev
            - name: HOST_ROOT
              value: /rootfs
            image: pubrepo.guance.com/datakit/datakit:1.86.2
            imagePullPolicy: IfNotPresent
            name: datakit
            ports:
            - containerPort: 9529
              hostPort: 9529
              name: http-port
              protocol: TCP
            - containerPort: 8125
              hostPort: 8125
              name: statsd-port
              protocol: UDP
            - containerPort: 4317
              hostPort: 4317
              name: otel-grpc-port
              protocol: TCP
            - containerPort: 9533
              hostPort: 9533
              name: logfwd-port
              protocol: TCP
            resources:
              requests:
                cpu: "200m"
                memory: "128Mi"
              limits:
                cpu: "2000m"
                memory: "4Gi"
            securityContext:
              privileged: true
            volumeMounts:
            - mountPath: /usr/local/datakit/cache
              name: cache
              readOnly: false
            - mountPath: /rootfs
              name: rootfs
              mountPropagation: HostToContainer
            - mountPath: /var/run
              name: run
              mountPropagation: HostToContainer
            - mountPath: /sys/kernel/debug
              name: debugfs
            - mountPath: /var/lib/containerd/container_logs
              name: container-logs
              mountPropagation: HostToContainer
          hostIPC: true
          hostPID: true
          restartPolicy: Always
          serviceAccount: datakit
          serviceAccountName: datakit
          tolerations:
          - operator: Exists
          volumes:
          - configMap:
              name: datakit-conf
            name: datakit-conf
          # - name: hellopythond
          #   configMap:
          #     name: python-scripts
          - hostPath:
              path: /
            name: rootfs
          - hostPath:
              path: /var/run
            name: run
          - hostPath:
              path: /sys/kernel/debug
            name: debugfs
          - hostPath:
              path: /root/datakit_cache
            name: cache
          - hostPath:
              path: /var/lib/containerd/container_logs
            name: container-logs
          # # ---iploc-start
          #- emptyDir: {}
          #  name: datakit-ipdb
          # # ---iploc-end
      strategy:
        rollingUpdate:
          maxUnavailable: 1
        type: RollingUpdate
    
    • Apply the deployment:
    kubectl apply -f datakit.yaml
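
    The text does not say why the update strategy matters. One plausible reason is that this Pod uses hostNetwork and hostPort (9529, 8125, 4317, 9533), so a replacement Pod placed on the same node cannot bind those ports while the old Pod is still running. The sketch below reproduces the Deployment controller's documented rounding (maxSurge rounds up, maxUnavailable rounds down, both default to 25%) to compare the default with `maxUnavailable: 1` at `replicas: 1`.

```python
def resolve(value, replicas: int, round_up: bool) -> int:
    """Turn an int-or-percent rollout value into a Pod count (integer math only)."""
    if isinstance(value, int):
        return value
    pct = int(value.rstrip("%"))
    if round_up:
        return -(-replicas * pct // 100)  # ceiling division
    return replicas * pct // 100          # floor division


def rollout_bounds(replicas: int, max_surge="25%", max_unavailable="25%"):
    """Return (max Pods during the rollout, min available Pods)."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        unavailable = 1  # controller fallback so the rollout can make progress
    return replicas + surge, replicas - unavailable
```

    With the defaults, `rollout_bounds(1)` gives `(2, 1)`: the old Pod must stay until the new one is Ready. With `max_unavailable=1` it gives `(2, 0)`, so the controller may remove the old Pod without waiting for the new one.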
    

    4. Create the Log Collection Config (ClusterLoggingConfig)

    • Save the following as logging-config.yaml:
    apiVersion: logging.datakits.io/v1alpha1
    kind: ClusterLoggingConfig
    metadata:
      name: demo-logs
    spec:
      selector:
        namespaceRegex: "^(default)$"
        podRegex: "^(deploy.*)$"
        podLabelSelector: "app=demo"
    
      podTargetLabels:
        - app
        - version
        - environment
    
      configs:
        - source: "demo-file"
          type: "file"
          path: "/data/logs/server/server.log"
          tags:
            log_type: "server"
            component: "springboot-server"
    
    • Apply the config:
    kubectl apply -f logging-config.yaml
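
    The demo-logs selector combines a namespace regex, a pod-name regex and a label selector. The sketch below shows how such a selector might be evaluated and how podTargetLabels could be merged into the tags. These semantics are assumptions, not the Operator's code: every field that is set must match (AND), regexes are applied with `re.search` (hence the `^...$` anchors), podLabelSelector supports only comma-separated `key=value` terms, and a Pod label overrides a static tag of the same name.

```python
import re
from typing import Optional

# Mirrors part of the demo-logs spec (podTargetLabels shortened for brevity).
DEMO_SPEC = {
    "selector": {"namespaceRegex": "^(default)$", "podRegex": "^(deploy.*)$",
                 "podLabelSelector": "app=demo"},
    "podTargetLabels": ["app", "version"],
    "configs": [{"source": "demo-file", "type": "file",
                 "path": "/data/logs/server/server.log",
                 "tags": {"log_type": "server", "component": "springboot-server"}}],
}


def selector_matches(selector: dict, namespace: str, pod: str,
                     labels: dict, container: Optional[str] = None) -> bool:
    """Assumed AND semantics: every field that is set must match."""
    for key, value in (("namespaceRegex", namespace), ("podRegex", pod),
                       ("containerRegex", container)):
        pattern = selector.get(key)
        if pattern and (value is None or not re.search(pattern, value)):
            return False
    for term in filter(None, (selector.get("podLabelSelector") or "").split(",")):
        key, _, want = term.partition("=")  # only key=value terms are handled
        if labels.get(key.strip()) != want.strip():
            return False
    return True


def build_tags(spec: dict, cfg: dict, labels: dict) -> dict:
    """Assumed merge: static tags, then any podTargetLabels present on the Pod."""
    tags = dict(cfg.get("tags", {}))
    for name in spec.get("podTargetLabels", []):
        if name in labels:
            tags[name] = labels[name]
    return tags
```

    Under these assumptions, a Pod named `deploy-demo-7c9f` in `default` with label `app=demo` is selected, while `demo-7c9f` is not, because its name does not start with `deploy`.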
    

    5. Check Log Reporting (the first time, restart the workload so that the logfwd sidecar is injected)

    • Inside the DataKit container, run the "datakit monitor" command to check that logs are being reported:

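    • The in-container logs appear as in the figure: the data has been reported to Guance Cloud. In the Guance Cloud console, filter on source "demo-file" to view them, together with the fields configured in the CRD: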
