sentimental programmer: k8s

Sentimental Programmer | ysoftman


k8s event exporter

# k8s event exporter watches the events that occur in k8s and can send notifications for them.

# https://github.com/resmoio/kubernetes-event-exporter

# https://github.com/bitnami/charts/tree/main/bitnami/kubernetes-event-exporter/

# download the helm chart

helm repo add bitnami https://charts.bitnami.com/bitnami

helm repo update

helm pull bitnami/kubernetes-event-exporter --untar

# send notifications to a receiver when pod deployments change
# values.yaml
config:
  logLevel: debug
  logFormat: pretty
  # ignore events older than 7 seconds (default is 5 seconds if unset)
  maxEventAgeSeconds: 7
  # if `client-side throttling` errors occur:
  # set `Burst` to roughly match your events per minute
  # set `QPS` to be 1/5 of the burst
  kubeQPS: 100
  kubeBurst: 500
  receivers:
    - name: "webhook-alerts"
      webhook:
        endpoint: "https://ysoftman.test"
        headers:
          X-API-KEY: "123"
          User-Agent: kube-event-exporter 1.0
          Content-Type: "application/json"
        # optional custom payload
        layout:
          to: "ysoftman"
          message: |
            This is a test message.
          kind: "{{ .InvolvedObject.Kind }}"
          createdAt: "{{ .GetTimestampMs }}"
          details:
            message: "{{ .Message }}"
            reason: "{{ .Reason }}"
            type: "{{ .Type }}"
            kind: "{{ .InvolvedObject.Kind }}"
            namespace: "{{ .Namespace }}"
            component: "{{ .Source.Component }}"
            host: "{{ .Source.Host }}"
            labels: "{{ toJson .InvolvedObject.Labels}}"
  route:
    routes:
      # ignore argocd application events
      - drop:
          - namespace: "^argocd.*"
          - kind: "Application"
        match:
          - receiver: "webhook-alerts"
      # match conditions are ANDed by default, so to express OR use several match blocks
      # - match:
      #     - kind: "Pod|Deployment|ReplicaSet"
      #       receiver: "webhook-alerts"
      - match:
          - reason: "BackOff"
            receiver: "webhook-alerts"
      - match:
          - reason: "CrashLoopBackOff"
            receiver: "webhook-alerts"
      - match:
          - reason: "Killing"
            receiver: "webhook-alerts"
      - match:
          - reason: "Scheduled"
            receiver: "webhook-alerts"
      - match:
          - reason: "Pulling"
            receiver: "webhook-alerts"
      - match:
          - reason: "Created"
            receiver: "webhook-alerts"
      - match:
          - reason: "Failed"
            receiver: "webhook-alerts"
      - match:
          - reason: "FailedSync"
            receiver: "webhook-alerts"
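The route semantics above can be modeled in a few lines. The following Go sketch is a simplified model, not the exporter's source: the `event`, `rule`, and `route` types are hypothetical, and it assumes, as the project README describes, that the fields inside one rule are ANDed, that each field is an unanchored regular expression, and that any matching drop rule discards the event before the match rules are checked.

```go
package main

import (
	"fmt"
	"regexp"
)

// event carries the few event fields this sketch matches on.
type event struct {
	Namespace, Kind, Reason string
}

// rule: every non-empty field is a regex, and all of them must match (AND).
type rule struct {
	Namespace, Kind, Reason string
	Receiver                string
}

func (r rule) matches(e event) bool {
	checks := [][2]string{{r.Namespace, e.Namespace}, {r.Kind, e.Kind}, {r.Reason, e.Reason}}
	for _, c := range checks {
		if c[0] == "" {
			continue // empty pattern: this field is not constrained
		}
		// Unanchored match: "Failed" also matches "FailedSync".
		if ok, err := regexp.MatchString(c[0], c[1]); err != nil || !ok {
			return false
		}
	}
	return true
}

// route: any matching drop rule discards the event; otherwise every
// matching match rule sends it to its receiver.
type route struct {
	Drop  []rule
	Match []rule
}

func (rt route) receivers(e event) []string {
	for _, d := range rt.Drop {
		if d.matches(e) {
			return nil
		}
	}
	var out []string
	for _, m := range rt.Match {
		if m.matches(e) && m.Receiver != "" {
			out = append(out, m.Receiver)
		}
	}
	return out
}

func main() {
	argocd := route{
		Drop:  []rule{{Namespace: "^argocd.*"}, {Kind: "Application"}},
		Match: []rule{{Receiver: "webhook-alerts"}},
	}
	fmt.Println(argocd.receivers(event{Namespace: "argocd", Kind: "Pod"}))          // []
	fmt.Println(argocd.receivers(event{Namespace: "default", Kind: "Application"})) // []
	fmt.Println(argocd.receivers(event{Namespace: "default", Kind: "Pod"}))         // [webhook-alerts]

	reasons := []route{
		{Match: []rule{{Reason: "Failed", Receiver: "webhook-alerts"}}},
		{Match: []rule{{Reason: "FailedSync", Receiver: "webhook-alerts"}}},
	}
	var sent []string
	for _, rt := range reasons {
		sent = append(sent, rt.receivers(event{Kind: "Pod", Reason: "FailedSync"})...)
	}
	fmt.Println(sent) // [webhook-alerts webhook-alerts]
}
```

If the exporter really matches unanchored regexes as assumed here, `reason: "Failed"` also matches `FailedSync`, so such events would be delivered twice by the routes above; a pattern like `^Failed$` avoids that. Likewise, separate items under `drop` are alternatives, so the argocd route drops every event from an `argocd*` namespace as well as every `Application` event.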

# install

helm install k8s-event-exporter . -n ysoftman --create-namespace

Because Prometheus kube_pod_xxx metrics can already drive alerts through alert-manager or grafana, this tool does not seem to be widely used.

That said, it runs inside the cluster and carries more detailed information, so it can detect events and send alerts faster, and sending to external targets such as slack or es (Elasticsearch) is easy to configure.

It looks like a good fit for deployment-complete notifications and crash/error alerts.

It is also used as k8s event exporter > promtail > loki (log collection) > grafana.

The promtail agent collects the logs and stores them in loki, and grafana views them with loki as its datasource.

promtail & loki : https://github.com/grafana/loki

grafana k8s event exporter dashboard: https://grafana.com/grafana/dashboards/17882-kubernetes-event-exporter/

Note that promtail has been deprecated; alloy (https://github.com/grafana/alloy) replaces it.

k8s metrics-server pod resource issue

# In tools such as k9s and ktop, pod resource (cpu, mem) usage shows up as twice the actual value.
# kubectl (k) top pods shows the same.
# The pod runs a single container, but an unnamed container reports the same usage, so the pod total appears doubled.
kubectl top pods
kubectl top pods --containers

# Requesting the pod directly from the metrics API shows two containers, one of which has no name.
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/{namespace}/pods/{pod} | jq .

# Every pod in every namespace shows doubled resources like this.
# Restarting metrics-server made no difference.
kubectl rollout restart deployment metrics-server -n kube-system

# The metrics-server version looks old for the current k8s version.
# k8s v1.26.4
# metrics-server k8s.gcr.io/metrics-server/metrics-server:v0.4.2
# Reinstall with the current latest version, 0.7.2.
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# The metrics-server pod then fails with the following TLS error
tls: failed to verify certificate: x509

# Adding the following option makes it restart automatically and work normally.
kubectl edit deployment metrics-server -n kube-system
args:
  - --kubelet-insecure-tls

# metrics-server is now the latest version, but the problem remains.
# Even after deleting metrics-server as follows, kubectl top still works.
kubectl delete -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Checking the API service shows it is served by the prometheus-adapter service.
kubectl get apiservice v1beta1.metrics.k8s.io

# According to the prometheus-adapter helm chart, besides the v1beta1.custom.metrics.k8s.io API it can also serve /apis/metrics.k8s.io/v1beta1 (v1beta1.metrics.k8s.io), which kubectl top uses.
# The actual prometheus-adapter configmap had this configured under resourceRules.
# https://github.com/prometheus-community/helm-charts/blob/995b3392c82ecfdec4cbc432064b04b40c71c8aa/charts/prometheus-adapter/README.md?plain=1#L114
# The description says it provides the functionality of https://github.com/helm/charts/tree/master/stable/metrics-server, but judging by the chart version it seems to be based on the deprecated 0.3.6.

# Delete the v1beta1.metrics.k8s.io API service that prometheus-adapter has claimed.
# Reportedly, for an apiservice the first registration is used and later ones are ignored.
kubectl delete apiservice v1beta1.metrics.k8s.io

# Note: the prometheus-adapter chart is auto-synced by argocd and kept trying to reclaim v1beta1.metrics.k8s.io, so its auto-sync had to be disabled temporarily.
# After reinstalling the latest metrics-server, the v1beta1.metrics.k8s.io API service is served by metrics-server.
kubectl get apiservice v1beta1.metrics.k8s.io

# Now kubectl top pods --containers no longer shows the duplicate.

# Checking prometheus-adapter's resource collection rules:
# the upstream prometheus-adapter chart excludes unnamed containers with container_cpu_usage_seconds_total{container!=""}, but this condition was missing from the config applied to our cluster; after adding it, container resources are no longer double-counted.

https://github.com/prometheus-community/helm-charts/blob/995b3392c82ecfdec4cbc432064b04b40c71c8aa/charts/prometheus-adapter/values.yaml#L167
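To see why the missing `container!=""` condition doubles the numbers, here is a small Go model. It assumes, as is typical for cAdvisor metrics, that besides one series per named container there is a pod-level series with an empty `container` label that carries the pod total; the `sample` type and the values are made up for illustration.

```go
package main

import "fmt"

// sample is a simplified stand-in for one cAdvisor CPU series:
// only the labels this sketch needs, plus the (rate) value.
type sample struct {
	Pod       string
	Container string // "" for the pod-level cgroup series
	Value     float64
}

// podUsage sums samples per pod. With excludeUnnamed it mimics the
// PromQL selector container!="" and skips the pod-level aggregate.
func podUsage(samples []sample, excludeUnnamed bool) map[string]float64 {
	out := make(map[string]float64)
	for _, s := range samples {
		if excludeUnnamed && s.Container == "" {
			continue
		}
		out[s.Pod] += s.Value
	}
	return out
}

func main() {
	samples := []sample{
		{Pod: "ysoftman-app", Container: "app", Value: 0.25},
		{Pod: "ysoftman-app", Container: "", Value: 0.25}, // pod-level aggregate
	}
	fmt.Println(podUsage(samples, false)["ysoftman-app"]) // 0.5 (doubled)
	fmt.Println(podUsage(samples, true)["ysoftman-app"])  // 0.25
}
```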

argo-cd sync error

# To upgrade prometheus and grafana, I bumped the dependency versions in the prometheus (operator) Chart.yaml > dependencies as follows.
apiVersion: v2
version: 0.0.1
description: Chart for ysoftman-prometheus
dependencies:
  - name: "kube-prometheus-stack"
    version: "65.3.2"
    repository: "https://prometheus-community.github.io/helm-charts"
  - name: "prometheus-adapter"
    version: "4.11.0"
    repository: "https://prometheus-community.github.io/helm-charts"

# regenerate Chart.lock
helm dependencies build

# then commit to the git develop branch

# Auto-sync is enabled for this app (prometheus operator) in argocd, and the following error occurred during auto-sync.

Failed to compare desired state to live state: failed to calculate diff: error calculating structured merge diff: error building typed value from config resource: .spec.scrapeConfigSelector: field not declared in schema

# Fix

# In argocd, open the application > details > sync policy, disable automated, and sync manually.

# After re-enabling automated, auto-sync no longer produces the error.

# Check versions

http://ysoftman-prometheus.aaa.bbb/ > query the prometheus_build_info metric

http://ysoftman-grafana.aaa.bbb/api/health

# Also check other values spec changes in

Kind: Prometheus > spec

Kind: Alertmanager > spec

k8s topologyKey

# On k8s 1.26.4,

# when using topologySpreadConstraints to spread pods across nodes in multiple IDCs,

https://kubernetes.io/ko/docs/concepts/scheduling-eviction/topology-spread-constraints/

# the docs say to use one of the following well-known node labels.

# Note: a custom node label that identifies the IDC also works.

topology.kubernetes.io/zone

topology.kubernetes.io/region

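# Note: Service topologyKeys only accepted these three keys (this field was deprecated in v1.21 and removed in later releases in favor of topology aware hints).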

https://kubernetes.io/ko/docs/concepts/services-networking/service-topology/#%EC%A0%9C%EC%95%BD%EB%93%A4

kubernetes.io/hostname

topology.kubernetes.io/zone

topology.kubernetes.io/region

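# If pods are not spread evenly across the nodes within one topology domain, use the following setting.

# For reference:

# DoNotSchedule: the pod is not scheduled and waits until a suitable node that satisfies all constraints appears

# ScheduleAnyway: the scheduler still places the pod even if the spread constraint cannot be fully satisfied, but prefers nodes that minimize the imbalance

whenUnsatisfiable: ScheduleAnyway

# Scaling replicas down may leave pods unevenly spread.

# In that case, delete the pods on the node that has too many so that they are rescheduled.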

https://kubernetes.io/ko/docs/concepts/scheduling-eviction/topology-spread-constraints/#%EC%95%8C%EB%A0%A4%EC%A7%84-%EC%A0%9C%ED%95%9C%EC%82%AC%ED%95%AD
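The spread check itself is simple arithmetic. A minimal Go sketch, assuming the skew formula from the Kubernetes docs (matching pods in the candidate domain after placement minus the minimum count over the eligible domains) and ignoring node affinity, `minDomains`, and taints; the `idc-*` domain names and counts are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// minCount returns the smallest matching-pod count among the domains.
func minCount(counts map[string]int) int {
	first, m := true, 0
	for _, c := range counts {
		if first || c < m {
			m, first = c, false
		}
	}
	return m
}

// placementSkew: pods in the candidate domain after adding one,
// minus the global minimum before placement.
func placementSkew(counts map[string]int, domain string) int {
	return counts[domain] + 1 - minCount(counts)
}

// allowed reports whether placing one more pod in domain keeps skew <= maxSkew.
func allowed(counts map[string]int, domain string, maxSkew int) bool {
	return placementSkew(counts, domain) <= maxSkew
}

func main() {
	// Hypothetical pod counts per topology.kubernetes.io/zone value.
	counts := map[string]int{"idc-a": 2, "idc-b": 1, "idc-c": 1}
	domains := make([]string, 0, len(counts))
	for d := range counts {
		domains = append(domains, d)
	}
	sort.Strings(domains) // stable output order
	for _, d := range domains {
		fmt.Printf("%s: skew=%d allowed(maxSkew=1)=%v\n", d, placementSkew(counts, d), allowed(counts, d, 1))
	}
}
```

With `DoNotSchedule`, a placement whose skew exceeds `maxSkew` is rejected (the pod stays Pending if no domain qualifies); with `ScheduleAnyway`, a higher skew only lowers that node's score.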

golang gomaxprocs in k8s

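The Go garbage collector (GC) needs brief stop-the-world pauses to keep data consistent.

Reference: https://tip.golang.org/doc/gc-guide

The Linux Completely Fair Scheduler (CFS) allocates CPU (core) time to processes.

Go is not aware of the container's CPU (time) limit, which is enforced by Linux CFS: GOMAXPROCS defaults to the number of host cores, so the runtime, including the GC workers, can use up the CFS quota early in a period and get throttled, which shows up as pauses. (Go 1.25 and later take the cgroup CPU limit into account when choosing the default GOMAXPROCS on Linux.)

Reference: https://news.hada.io/topic?id=11747

Matching GOMAXPROCS to the k8s cpu limit can reduce these GC-related pauses.

Matching GOMAXPROCS to the k8s cpu limit

Importing Uber's automaxprocs in the Go main package does this, but it does not handle GOMEMLIMIT.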

https://github.com/uber-go/automaxprocs
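automaxprocs essentially reads the container's CPU quota from the cgroup filesystem and sets GOMAXPROCS accordingly. The Go sketch below shows the idea for cgroup v2 only; it assumes `cpu.max` is at `/sys/fs/cgroup/cpu.max` inside the container, and its round-down-with-minimum-1 rule is a simplification rather than the library's exact code. Like automaxprocs, it leaves an explicitly set `GOMAXPROCS` environment variable alone.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// procsFromCPUMax converts the contents of a cgroup v2 cpu.max file
// ("<quota> <period>" or "max <period>") into a GOMAXPROCS value,
// rounding down and never returning less than 1. ok is false when
// there is no CPU limit or the content cannot be parsed.
func procsFromCPUMax(content string) (procs int, ok bool) {
	fields := strings.Fields(content)
	if len(fields) != 2 || fields[0] == "max" {
		return 0, false
	}
	quota, err1 := strconv.ParseFloat(fields[0], 64)
	period, err2 := strconv.ParseFloat(fields[1], 64)
	if err1 != nil || err2 != nil || quota <= 0 || period <= 0 {
		return 0, false
	}
	procs = int(math.Floor(quota / period))
	if procs < 1 {
		procs = 1
	}
	return procs, true
}

func main() {
	if os.Getenv("GOMAXPROCS") == "" {
		// Assumes cgroup v2 mounted at /sys/fs/cgroup inside the container.
		if data, err := os.ReadFile("/sys/fs/cgroup/cpu.max"); err == nil {
			if n, ok := procsFromCPUMax(string(data)); ok {
				runtime.GOMAXPROCS(n)
			}
		}
	}
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```

For the `cpu: "2000m"` limit used below, the kubelet writes a quota of twice the period, so the sketch would pick 2.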

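Alternatively, set GOMAXPROCS and GOMEMLIMIT as environment variables from pod > resources > limits.

Reference: https://blog.howardjohn.info/posts/gomaxprocs/

Add the following to the Go main() to print the GOMAXPROCS and GOMEMLIMIT values at startup.

By default, GOMAXPROCS prints the number of cores and GOMEMLIMIT prints math.MaxInt64 (no limit).

// needs: import ("fmt"; "runtime"; "runtime/debug")
fmt.Printf("GOMAXPROCS: %d\n", runtime.GOMAXPROCS(0))
fmt.Printf("GOMEMLIMIT: %d\n", debug.SetMemoryLimit(-1))

Now apply the GOMAXPROCS and GOMEMLIMIT environment variables to the deployment as follows;

the pod (container) restarts and the Go program picks up the new GOMAXPROCS and GOMEMLIMIT values.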

spec:
  template:
    spec:
      containers:
        - name: ysoftman-app
          resources:
            requests:
              cpu: "2000m"
              memory: "2048Mi"
            limits:
              cpu: "2000m"
              memory: "2048Mi"
          env:
            - name: GOMEMLIMIT
              valueFrom:
                resourceFieldRef:
                  resource: limits.memory
            - name: GOMAXPROCS
              valueFrom:
                resourceFieldRef:
                  resource: limits.cpu

argocd sync k8tz replicaset

I noticed that argocd rolls out k8tz (new replicaset, new pod) once a day.

The argocd sync policy is automated, so argocd keeps it in sync automatically (every 3 minutes by default).

repo url: https://k8tz.github.io/k8tz

chart: k8tz:0.16.0

Automatic and manual syncs show no change, but at a certain time each day checksum/config changes and a rolling update happens.

I wanted to check the k8tz pod logs, but because the pod restarts every day, the previous logs are gone.

Ten k8tz replicasets had piled up, each created at about the same time of day.

The k8s events show that the k8tz replicaset was created by an argocd sync.

(kube-apiserver --event-ttl defaults to 1h0m0s, so only events from the last hour are visible)

# The following shows that the replicaset's checksum/config value changed every time.

for r in $(k get rs -n k8tz | sed 1d | awk '{print $1}'); do k get rs $r -n k8tz -o json | jq '.spec.template.metadata.annotations'; done

Looking at the k8tz helm chart,

admission-webhook generates a new self-signed value with the genSelfSignedCert function every time helm renders it.

https://github.com/k8tz/k8tz/blob/a07bfbc76c4fcea972b46da4ebc26408c2dfc1a1/charts/k8tz/templates/admission-webhook.yaml#L2

The deployment uses the sha256 of the admission-webhook.yaml above as its checksum/config value.

https://github.com/k8tz/k8tz/blob/a07bfbc76c4fcea972b46da4ebc26408c2dfc1a1/charts/k8tz/templates/controller.yaml#L21

# The certificate generated at install time produces a new checksum every time.

helm install k8tz k8tz/k8tz --dry-run | rg -i checksum

A checksum/config annotation like this restarts the deployment's pods to pick up changes whenever a secret or configmap changes.

https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
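The mechanism is just a hash over the rendered template. This Go sketch mimics `sha256sum` over a rendered manifest; `renderWebhook` is a hypothetical stand-in in which random bytes play the role of the certificate from `genSelfSignedCert`, so every render yields a different checksum and therefore a new pod template.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// checksumConfig mimics `... | sha256sum` in a Helm template:
// the hex SHA-256 of the rendered text.
func checksumConfig(rendered string) string {
	sum := sha256.Sum256([]byte(rendered))
	return hex.EncodeToString(sum[:])
}

// renderWebhook is a hypothetical stand-in for rendering admission-webhook.yaml;
// the random bytes play the role of the freshly generated self-signed cert.
func renderWebhook() string {
	cert := make([]byte, 16)
	if _, err := rand.Read(cert); err != nil {
		panic(err)
	}
	return "kind: Secret\ndata:\n  tls.crt: " + hex.EncodeToString(cert) + "\n"
}

func main() {
	a := checksumConfig(renderWebhook())
	b := checksumConfig(renderWebhook())
	fmt.Println("render 1:", a)
	fmt.Println("render 2:", b)
	fmt.Println("same checksum:", a == b) // false in practice: each render has a new cert
}
```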

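See also https://github.com/stakater/Reloader for another approach.

argocd syncs a pinned k8tz version with nothing changed, so the deployment (checksum/config) should stay the same, yet it changes once a day...

With self heal disabled for the k8tz app, the sync that happened a day later was not applied immediately, so I could inspect the differences with diff.

The differences were in the following three resources.

Secret

MutatingWebhookConfiguration

Deployment

The secret changed, and that in turn changed the deployment checksum/config.

Looking for argocd settings tied to 24 hours (1 day), I found that the repo cache expiration defaults to 24h.

Because the repo is cached for 24 hours by default, syncing during that time produces no change.

After 24 hours the repo cache is refreshed, and syncing at that point produces changes.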

The repo cache time can be shortened in the argocd-repo-server Deployment as follows.

https://argo-cd.readthedocs.io/en/stable/operator-manual/high_availability/

spec:
  template:
    spec:
      containers:
        - args:
            - /usr/local/bin/argocd-repo-server
            - --port=8081
            - --metrics-port=8084
            - --repo-cache-expiration=30s

Run a hard refresh on k8tz once (this forces a restart); after that, every argocd sync of k8tz (every 3 minutes) restarts the pod.

Summary

- k8tz generates a self-signed certificate whenever helm renders it, which changes the k8tz deployment > checksum/config and restarts the pod

- because of the argocd repo cache, syncing shows no change for 24 hours (the default) (a hard refresh does pick up the change)

Fix

- to prevent k8tz restarts, disable auto-sync (sync policy > automated).

vector helm default values

# download the vector helm chart locally
helm fetch vector/vector
tar zxvf vector-0.35.0.tgz

# To change the service settings, I added a Values.service variable to templates/service.yaml as follows.
spec:

# Then I created values-ysoftman.yaml as follows.
service:
  type: NodePort
  externalTrafficPolicy: Cluster

# A dry run
helm install vector -f ./values-ysoftman.yaml . --dry-run

# shows that the defaults (empty values) from values.yaml > service: are added as well.
# Source: vector/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  ... omitted ...
spec:
  annotations: {}
  enabled: true
  externalTrafficPolicy: ""
  internalTrafficPolicy: ""
  ipFamilies: []
  ipFamilyPolicy: ""
  loadBalancerIP: ""
  ports: []
  topologyKeys: []
  type: NodePort
  externalTrafficPolicy: Cluster

# This happens because the name Values.service collides with vector's default service: key in values.yaml; giving it a different name fixes it.
# templates/service.yaml
spec:

# values-ysoftman.yaml
ysoftman_service:
  type: NodePort
  externalTrafficPolicy: Cluster
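Why the defaults leak in: Helm merges the user values file into the chart's `values.yaml`, and nested maps are merged key by key instead of being replaced. The Go sketch below models only that rule (real Helm coalescing has more, for example setting a key to `null` removes the default); the default keys are a subset of the ones in the dry-run output above.

```go
package main

import "fmt"

// mergeValues: user keys win, and when both sides hold a map the two
// maps are merged recursively instead of the user map replacing the default.
func mergeValues(defaults, user map[string]any) map[string]any {
	out := make(map[string]any, len(defaults))
	for k, v := range defaults {
		out[k] = v
	}
	for k, uv := range user {
		dm, dIsMap := out[k].(map[string]any)
		um, uIsMap := uv.(map[string]any)
		if dIsMap && uIsMap {
			out[k] = mergeValues(dm, um)
		} else {
			out[k] = uv
		}
	}
	return out
}

func main() {
	defaults := map[string]any{
		"service": map[string]any{
			"enabled": true, "externalTrafficPolicy": "", "ipFamilies": []any{}, "ports": []any{},
		},
	}
	user := map[string]any{
		"service": map[string]any{"type": "NodePort", "externalTrafficPolicy": "Cluster"},
	}
	renamed := map[string]any{
		"ysoftman_service": map[string]any{"type": "NodePort", "externalTrafficPolicy": "Cluster"},
	}
	fmt.Println(mergeValues(defaults, user)["service"])             // defaults leak into service
	fmt.Println(mergeValues(defaults, renamed)["ysoftman_service"]) // only the two user keys
}
```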

prometheus etcd-client-cert

# When installing prometheus from https://prometheus-community.github.io/helm-charts/,
# I configured the prometheus pod to reference the etcd-client-cert secret when it starts.
# values.yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      replicas: 2
      secrets:
        - etcd-client-cert

# Create the etcd-client-cert secret
# Log in to the k8s master server and copy the three files from the locations below.
ssh ysoftman@ysoftman-master-1.server
sudo -i
cp -v /etc/kubernetes/pki/etcd/etcd-ca.crt /home/ysoftman/
cp -v /etc/kubernetes/pki/apiserver-etcd-client.crt /home/ysoftman/
cp -v /etc/kubernetes/pki/apiserver-etcd-client.key /home/ysoftman/
exit; exit;

# Copy the three files to the local machine.
rsync ysoftman@ysoftman-master-1.server:/home/ysoftman/etcd-ca.crt .
rsync ysoftman@ysoftman-master-1.server:/home/ysoftman/apiserver-etcd-client.crt .
rsync ysoftman@ysoftman-master-1.server:/home/ysoftman/apiserver-etcd-client.key .

# Create the secret from these files.
kubectl create secret generic etcd-client-cert -n prometheus \
  --from-literal=etcd-ca="$(cat etcd-ca.crt)" \
  --from-literal=etcd-client="$(cat apiserver-etcd-client.crt)" \
  --from-literal=etcd-client-key="$(cat apiserver-etcd-client.key)"

strimzi kafka nodeport ingress

# When a kafka cluster is set up on k8s with the strimzi operator,
# the brokers advertise in-cluster service hostnames, so clients outside the k8s cluster cannot connect to kafka port 9092 (bootstrap/broker).
# kafka must be installed locally for testing.
# This installs command scripts under /opt/homebrew/opt/kafka/bin.
brew install kafka kcat

# Creating a nodeport
# Adding the following to kafka resource > spec > kafka > listeners creates nodeports.
# A 9094 nodeport setting is added to the service/pod.
# Reference https://strimzi.io/blog/2019/04/23/accessing-kafka-part-2/
- name: external1 # must match ^[a-z0-9]{1,11}$ and be unique
  port: 9094
  type: nodeport
  tls: false
  configuration:
    bootstrap:
      nodePort: 32100
    brokers:
      - broker: 0
        nodePort: 32000
      - broker: 1
        nodePort: 32001
      - broker: 2
        nodePort: 32002
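A quick Go check for the listener naming rule quoted in the comment above (pattern `^[a-z0-9]{1,11}$`, unique within one Kafka resource). The sample names are hypothetical; note that the nodeport and ingress examples in this post both use `external1`, so defining both listeners on one cluster would need one of them renamed.

```go
package main

import (
	"fmt"
	"regexp"
)

// listenerName is the pattern from the listener comment.
var listenerName = regexp.MustCompile(`^[a-z0-9]{1,11}$`)

// validateListeners reports names that break the pattern or repeat.
func validateListeners(names []string) []error {
	var errs []error
	seen := make(map[string]bool)
	for _, n := range names {
		if !listenerName.MatchString(n) {
			errs = append(errs, fmt.Errorf("invalid listener name %q", n))
		}
		if seen[n] {
			errs = append(errs, fmt.Errorf("duplicate listener name %q", n))
		}
		seen[n] = true
	}
	return errs
}

func main() {
	for _, err := range validateListeners([]string{"plain", "tls", "external1", "external1", "External_2"}) {
		fmt.Println(err)
	}
}
```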

# Verify nodeport access
# produce messages to the topic
/opt/homebrew/opt/kafka/bin/kafka-console-producer \
  --broker-list ysoftman-node1:32100 \
  --topic test

# check messages arriving on the topic
/opt/homebrew/opt/kafka/bin/kafka-console-consumer \
  --bootstrap-server ysoftman-node1:32100 \
  --topic test \
  --from-beginning

# or
kcat -b ysoftman-node1:32100 -t test

#####

# Creating an ingress
# Ingress works with the HTTP protocol, but kafka uses TCP.
# So use the nginx ingress ssl-passthrough feature, which forwards the TLS (TCP) connection straight to the service.
# Adding the following to kafka resource > spec > kafka > listeners creates ingresses.
# A 9096 port setting is added to the service/pod.
# Reference https://strimzi.io/blog/2019/05/23/accessing-kafka-part-5/
- name: external1 # must match ^[a-z0-9]{1,11}$ and be unique
  port: 9096
  tls: true # Ingress type listener and requires enabled TLS encryption
  type: ingress
  configuration:
    bootstrap:
      host: ysoftman-bootstrap.ysoftman.abc
    brokers:
      - broker: 0
        host: ysoftman-0.ysoftman.abc
      - broker: 1
        host: ysoftman-1.ysoftman.abc
      - broker: 2
        host: ysoftman-2.ysoftman.abc

# After a moment, one of the created ingresses looks like the following.
# There is no separate secretName under tls.
# Instead, ssl-passthrough is enabled.
# ssl-passthrough on an ingress only works if the nginx-ingress-controller daemonset (or deployment) runs with --enable-ssl-passthrough.
# Reference https://kubernetes.github.io/ingress-nginx/user-guide/tls/#ssl-passthrough
spec:
  template:
    spec:
      containers:
        - args:
            - /nginx-ingress-controller
            - --enable-ssl-passthrough=true

# The kafka server (broker) receives the TLS connection and handles TLS authentication itself.
# References
# https://github.com/strimzi/strimzi-kafka-operator/issues/3521#issuecomment-675116408
# https://github.com/strimzi/strimzi-kafka-operator/blob/main/documentation/modules/security/proc-accessing-kafka-using-ingress.adoc
metadata:
  annotations:
    ingress.kubernetes.io/ssl-passthrough: "true"
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
  ... omitted ...
spec:
  tls:
    - hosts:
        - ysoftman-bootstrap.ysoftman.abc

# Secrets with names such as client and cluster are created as well.
# Save the cluster CA certificate (ca.crt) to a file as follows.
kubectl get secret ysoftman-kafka-cluster-cluster-ca-cert -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt

# create the truststore.jks file used by the kafka commands
keytool -import -trustcacerts -alias root -file ca.crt -keystore truststore.jks -storepass password -noprompt

# connect to the kafka cluster and produce
/opt/homebrew/opt/kafka/bin/kafka-console-producer --broker-list ysoftman-bootstrap.ysoftman.abc:443 --producer-property security.protocol=SSL --producer-property ssl.truststore.password=password --producer-property ssl.truststore.location=./truststore.jks --topic test

# connect to the kafka cluster and consume (the consumer needs the same SSL settings)
/opt/homebrew/opt/kafka/bin/kafka-console-consumer --bootstrap-server ysoftman-bootstrap.ysoftman.abc:443 --consumer-property security.protocol=SSL --consumer-property ssl.truststore.password=password --consumer-property ssl.truststore.location=./truststore.jks --topic test

# inspect the certificate
openssl s_client -connect ysoftman-bootstrap.ysoftman.abc:443 \
  -servername ysoftman-bootstrap.ysoftman.abc \
  -showcerts

# If the following SSL failure error occurs
failed authentication due to: SSL handshake failed

# look at the SSL debug output
export KAFKA_OPTS="-Djavax.net.debug=ssl"

#####

# When kafka is installed with the strimzi operator, the broker and controller pods
# are managed by a custom resource called strimzipodset (similar to a statefulset).
# After I manually deleted one broker pod, the new pod had connection errors with the other pods.
# In my tests, deleting both the broker and controller strimzipodsets so that they restart fixes it.
managed-kafka-cluster-broker
managed-kafka-cluster-controller

kube-ops-view

# kube-ops-view visualizes where and how k8s resources are running.

# Install
git clone https://codeberg.org/hjacobs/kube-ops-view.git
cd kube-ops-view
kubectl create ns kube-ops-view

# install all resources in the deploy directory
kubectl apply -k deploy -n kube-ops-view

# port-forward, then open http://localhost:8080
kubectl port-forward service/kube-ops-view 8080:80 -n kube-ops-view

# delete it when no longer needed
kubectl delete -k deploy -n kube-ops-view

# Note
# If the pod gets 403 Forbidden errors, change the following value
ClusterRoleBinding > kube-ops-view > subjects > namespace: kube-ops-view

kaniko args

# kaniko is used to build images inside k8s pods.
# Prepare the following two secrets for pulling from github and pushing to the docker registry.
# github > personal access token: create one with repo access checked
kubectl create secret generic ysoftman-generic-secret \
  --from-literal=git-personal-access-token="abc123" \
  --namespace=ysoftman-test

# create a docker secret for pushing images
kubectl create secret docker-registry ysoftman-secret \
  --docker-server=ysoftman \
  --docker-username=ysoftman \
  --docker-password=ysoftman123 \
  --namespace=ysoftman-test

# Now run kaniko (executor) from an argo workflow,
# using the --build-arg option as follows to pass values to the Dockerfile ARGs.
# argo workflow fields <https://argo-workflows.readthedocs.io/en/stable/fields/>
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: ysoftman-test
  namespace: ysoftman-test
spec:
  entrypoint: build-image-and-push
  serviceAccountName: workflow-template
  templates:
    - name: build-image-and-push
      inputs:
        parameters:
          - name: fruit
            value: "lemon"
      # Argo accepts only one of script/container per template; keep the one you need
      script:
        image: "rockylinux:latest"
        command: [bash]
        source: |
          curl -X GET "https://httpbin.org/get" -H "accept: application/json"
          echo "-----"
          echo $ysoftman1
          echo $ysoftman2
        env:
          - name: ysoftman1
            value: lemon
          - name: ysoftman2
            valueFrom:
              secretKeyRef:
                key: mypassword # 'key' subcomponent of the secret
      container:
        image: "gcr.io/kaniko-project/executor:debug"
        env:
          - name: github_personal_access_token
            valueFrom:
              secretKeyRef:
                name: ysoftman-generic-secret
                key: git-personal-access-token
        command: [executor]
        args:
          - "--context=git://$(github_personal_access_token)@github.com/ysoftman/foobar.git#refs/heads/branch1"
          - "--context-sub-path=./aaa/bbb"
          - "--dockerfile=Dockerfile"
          - "--destination=ysoftman/foobar:test"
          - "--build-arg var1={{inputs.parameters.fruit}}"
        volumeMounts:
          - name: kaniko-secret
            mountPath: /kaniko/.docker/
      volumes:
        - name: kaniko-secret
          secret:
            secretName: ysoftman-secret
            items:
              - key: .dockerconfigjson
                path: config.json

# But the pod log shows the following error.
Error: unknown flag: --build-arg var1

# The kaniko docs say --build-arg values containing spaces are not split natively and suggest setting export IFS='' before running the executor.
# https://github.com/GoogleContainerTools/kaniko?tab=readme-ov-file#flag---build-arg
# IFS cannot be set in YAML args like the above: each list item is passed as exactly one argument, so "--build-arg var1=..." is read as a single unknown flag. Pass the flag and its value as separate items instead.
args:
  - "--build-arg"
  - "var1={{inputs.parameters.fruit}}"
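The error comes from how argv is split, not from the flag value. The Go sketch below uses the standard `flag` package as a stand-in (kaniko uses a different flag library, so its error text differs) to show that `"--build-arg var1=lemon"` as one list item becomes a single unknown flag named `build-arg var1`, while two items, or `--build-arg=var1=lemon`, parse fine. `parseBuildArgs` and `multiFlag` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"strings"
)

// multiFlag collects a flag that may be given several times.
type multiFlag []string

func (m *multiFlag) String() string     { return strings.Join(*m, ",") }
func (m *multiFlag) Set(v string) error { *m = append(*m, v); return nil }

// parseBuildArgs parses argv as a Go CLI receives it: each YAML args
// list item is exactly one argv element.
func parseBuildArgs(argv []string) ([]string, error) {
	fs := flag.NewFlagSet("executor", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on errors
	var buildArgs multiFlag
	fs.Var(&buildArgs, "build-arg", "ARG value passed to the Dockerfile (repeatable)")
	if err := fs.Parse(argv); err != nil {
		return nil, err
	}
	return buildArgs, nil
}

func main() {
	_, err := parseBuildArgs([]string{"--build-arg var1=lemon"})
	fmt.Println("single element:", err) // flag provided but not defined: -build-arg var1
	args, err := parseBuildArgs([]string{"--build-arg", "var1=lemon"})
	fmt.Println("two elements:", args, err) // [var1=lemon] <nil>
}
```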

# Also, to reference an env var in container > args, use $(VAR_NAME).
# https://argo-workflows.readthedocs.io/en/stable/executor_swagger/#container
args:
  - "foobar=$(github_personal_access_token)"

k8s service account secret

# A pod (app) uses a service account (sa) to authenticate to the k8s api.
# Besides the default sa that every pod uses by default, I created a separate sa for the app.
kubectl get sa
NAME        SECRETS   AGE
default     0         16h
ysoftman1   0         16h

# The ysoftman1 pod uses spec > template > spec > serviceAccountName: ysoftman1.
# But looking up its secret returns not found.
kubectl describe secret ysoftman1
Error from server (NotFound): secrets "ysoftman1" not found

# Creating a new sa also gives not found.
kubectl create serviceaccount ysoftman2
kubectl describe secret ysoftman2
Error from server (NotFound): secrets "ysoftman2" not found

# It turns out that since 1.24 (we are on 1.26), creating an sa no longer auto-generates a secret.
# Reference https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#auto-generated-legacy-serviceaccount-token-clean-up

# create a secret of the service-account-token type for the ysoftman1 sa to use
cat << zzz | kubectl apply -f -
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: ysoftman1
  annotations:
    kubernetes.io/service-account.name: "ysoftman1"
zzz

# Now the ysoftman1 sa has its secret set up.
kubectl describe secret ysoftman1

# If you only need a new token value, you can get one with the following command.
kubectl create token ysoftman1 --duration=999999h

vector certificate verify failed

# I renewed the k8s internal certificates (apiserver, apiserver-etcd-client, ...).
# Reference https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#check-certificate-expiration
# After restarting the vector daemonset, the vector pods log errors like the following.
# There are so many error logs that they also show up in the cluster dump.
kubectl cluster-info dump | rg -i error

2024-05-17T02:38:08.766363Z WARN vector::kubernetes::reflector: Watcher Stream received an error. Retrying. error=InitialListFailed(HyperError(hyper::Error(Connect, ConnectError { error: Error { code: ErrorCode(1), cause: Some(Ssl(ErrorStack([Error { code: 167772294, library: "SSL routines", function: "(unknown function)", reason: "certificate verify failed", file: "ssl/statem/statem_clnt.c", line: 2092 }]))) }, verify_result: X509VerifyResult { code: 26, error: "unsuitable certificate purpose" } })))

2024-05-17T02:38:35.158930Z ERROR kube_client::client::builder: failed with error error trying to connect: error:0A000086:SSL routines:(unknown function):certificate verify failed:ssl/statem/statem_clnt.c:2092:: unsuitable certificate purpose

# vector uses kubernetes_logs as a source
# and seems to authenticate in order to access k8s.
# There is a kube_config_file option to specify a kubeconfig file, but it is not used,
# so the in-cluster configuration is used by default.
# Reference https://vector.dev/docs/reference/configuration/sources/kubernetes_logs/#kube_config_file
# The Rust kube client that vector uses (the Go client is the same) appears to read credentials from the following paths.
// Mounted credential files
const SERVICE_TOKENFILE: &str = "/var/run/secrets/kubernetes.io/serviceaccount/token";
const SERVICE_CERTFILE: &str = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt";
# Reference https://github.com/kube-rs/kube/blob/a6f4337a951058db7d3c12ed6b12a33718ac1fa2/kube-client/src/config/incluster_config.rs#L8

# The vector pod also has the following mount settings.
spec:
  containers:
    - volumeMounts:
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          readOnly: true
  volumes:
    - name: kube-api-access-l82p5
      projected:
        defaultMode: 420
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
                  path: namespace

# kube-root-ca.crt is registered as a configmap in every namespace.
# It turned out not to be a ca.crt problem: the cause was the /etc/kubernetes/ssl/apiserver.crt certificate on the master node, and after replacing that certificate the error went away.

observability data pipeline - vector

# kubernetes (k8s) pod stdout/stderr is stored on the node at the following paths.
/var/log/pods
/var/log/containers (symlinks to the container log files under /var/log/pods)
https://kubernetes.io/docs/concepts/cluster-administration/logging/#log-location-node

# These pod logs used to be processed by fluentd and sent to es / kafka,
# but these days vector (an observability data pipeline, agent & aggregator) seems to be widely used instead of fluentd.
# Perhaps because it is written in Rust, it seems stable and fast.
# https://github.com/vectordotdev/vector

# install with helm
helm repo add vector https://helm.vector.dev
helm repo update

# helm values configuration
# https://github.com/vectordotdev/helm-charts/blob/develop/charts/vector/values.yaml#L14
# vector can be deployed in the following three roles.
# agent: a daemonset that collects data (stdout) from every node
# sidecar: runs vector as a sidecar in each pod to collect only that pod's data
# aggregator: receives (collects) input from other streams
# customConfig can replace the default configuration.
# config reference: https://vector.dev/docs/reference/configuration/
# it is rendered into configmap > data.
# flow: kubernetes_logs, host_metrics, internal_metrics (sources) -> transform -> prometheus_exporter, console (sinks)
cat << zzz > values.yaml
role: Agent
customConfig:
  data_dir: /vector-data-dir
  api:
    enabled: true
    address: 127.0.0.1:8686
    playground: false
  sources:
    kubernetes_logs:
      type: kubernetes_logs
    host_metrics:
      filesystem:
        devices:
          excludes: [binfmt_misc]
        filesystems:
          excludes: [binfmt_misc]
        mountpoints:
          excludes: ["*/proc/sys/fs/binfmt_misc"]
      type: host_metrics
    internal_metrics:
      type: internal_metrics
  sinks:
    prom_exporter:
      type: prometheus_exporter
      inputs: [host_metrics, internal_metrics]
      address: 0.0.0.0:9090
    stdout:
      type: console
      inputs: [kubernetes_logs]
      encoding:
        codec: json
zzz

# With role Agent, vector pods are installed on the worker/ingress nodes as a daemonset.
helm install vector vector/vector --namespace vector --create-namespace --values values.yaml

# to upgrade vector
helm repo update
helm upgrade vector vector/vector --namespace vector --values values.yaml

# to uninstall vector
helm uninstall vector --namespace vector

# view vector processing status
kubectl -n vector exec -it daemonset/vector -- vector top --url http://127.0.0.1:8686/graphql

# sending k8s logs -> filter -> remap -> kafka, elasticsearch, console
# console output shows up in the vector pod logs
# vi values.yaml
role: Agent
customConfig:
  data_dir: /vector-data-dir
  api:
    enabled: true
    address: 127.0.0.1:8686
    playground: false
  sources:
    k8s_log:
      type: kubernetes_logs
      # only the ysoftman namespace, excluding kube-system
      # https://kubernetes.io/docs/concepts/overview/working-with-objects/field-selectors/#chained-selectors
      extra_field_selector: metadata.namespace!=kube-system,metadata.namespace=ysoftman
  transforms:
    k8s_transform1:
      type: filter
      inputs:
        - k8s_log
      condition: .level != "debug"
    k8s_transform2:
      type: remap
      inputs:
        - k8s_transform1
      source: |
        # % root of the event metadata
        # . root of the event
        # set es index name
        %custom_type = "sample"
        if match(string!(.message), r'.*error.*') {
          # % root of the event metadata
          %custom_type = "error"
        }
  sinks:
    kafka_log:
      type: kafka
      inputs: [k8s_transform2]
      bootstrap_servers: logis-kafka-dev.daumtools.com:9092
      topic: kave-dev-sample
      encoding:
        codec: json
    es_log:
      type: elasticsearch
      inputs:
        - k8s_transform2
      endpoints:
        - http://ysoftman.es:9200
      bulk:
        index: "ysoftman-sample-%Y-%m-%d"
    console_log:
      type: console
      inputs: [k8s_transform2]
      encoding:
        codec: json
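The filter -> remap stage can be mirrored in Go to reason about what reaches the sinks. This is a hypothetical model, not VRL: `event` keeps fields and metadata (VRL's `%`) as string maps, drops `debug` events, and sets `custom_type` to `error` when the message contains `error` (equivalent to the `.*error.*` regex).

```go
package main

import (
	"fmt"
	"regexp"
)

// event is a simplified log event: fields plus a metadata map (VRL's "%").
type event struct {
	Fields   map[string]string
	Metadata map[string]string
}

var errorPattern = regexp.MustCompile(`error`)

// filterNonDebug mirrors the filter transform: keep events whose level is not "debug".
func filterNonDebug(in []event) []event {
	var out []event
	for _, e := range in {
		if e.Fields["level"] != "debug" {
			out = append(out, e)
		}
	}
	return out
}

// tagCustomType mirrors the remap transform: custom_type is "sample",
// or "error" when the message matches.
func tagCustomType(in []event) []event {
	for i := range in {
		if in[i].Metadata == nil {
			in[i].Metadata = map[string]string{}
		}
		in[i].Metadata["custom_type"] = "sample"
		if errorPattern.MatchString(in[i].Fields["message"]) {
			in[i].Metadata["custom_type"] = "error"
		}
	}
	return in
}

func main() {
	events := []event{
		{Fields: map[string]string{"level": "debug", "message": "tick"}},
		{Fields: map[string]string{"level": "info", "message": "started"}},
		{Fields: map[string]string{"level": "warn", "message": "connection error"}},
	}
	for _, e := range tagCustomType(filterNonDebug(events)) {
		fmt.Println(e.Fields["message"], "->", e.Metadata["custom_type"])
	}
}
```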

change grafana pod localtime

# The grafana timezone is UTC.
kubectl exec monitoring-grafana-aaa -it -- date
Wed Mar 6 07:35:45 UTC 2024

# So the logs are recorded in UTC.
kubectl logs --tail=3 monitoring-grafana-aaa | rg -i "t="
... t=2024-03-06T07:45:00.515393518Z ...

# To switch to KST, in the deployment
set env > TZ to Asia/Seoul.

# Alternatively, mount the node's timezone file as the container's /etc/localtime as follows.
kubectl edit deploy monitoring-grafana
spec > template > spec > containers > volumeMounts
volumeMounts:
  - mountPath: /etc/localtime
spec > template > spec > volumes
volumes:
  - hostPath:
      path: /usr/share/zoneinfo/Asia/Seoul

# After the pod restarts, it uses KST.
kubectl exec monitoring-grafana-aaa -it -- date
Wed Mar 6 16:45:55 KST 2024

# Now the logs are recorded in KST as well.
kubectl logs --tail=3 monitoring-grafana-aaa | rg -i "t="
... t=2024-03-06T16:54:49.939479809+09:00 ...

# k8tz makes this easy to apply to pods.
# Once deployed, its service, deployment, pod, etc. run in the k8tz namespace by default.
# install k8tz
helm repo add k8tz https://k8tz.github.io/k8tz/
helm install k8tz k8tz/k8tz --set timezone=Asia/Seoul

# Restart your deployments etc.
# Newly started pods get the local time applied through the injected k8tz container.

# Using the k8tz command

# install k8tz cli command

wget -c https://github.com/k8tz/k8tz/releases/download/v0.16.0/k8tz_0.16.0_darwin_amd64.tar.gz -O - | tar zx

./k8tz version

# manually apply to every deployment in the current namespace

kubectl get deploy -oyaml | k8tz inject --timezone=Asia/Seoul - | kubectl apply -f -

# Note: the default grafana dashboard timezone can be set with the following value.

# https://github.com/prometheus-community/helm-charts/blob/24979b1d1aee425312e039882de3d27c8fa9b607/charts/kube-prometheus-stack/values.yaml#L981

grafana:
  defaultDashboardsTimezone: Asia/Seoul

#####

# register with argocd
# If --parameter namespace= is not specified, the pods run in the k8tz namespace.
argocd app create ysoftman-k8tz \
  --dest-server https://kubernetes.default.svc \
  --sync-option CreateNamespace=true \
  --sync-policy automated \
  --project ysoftman \
  --repo https://k8tz.github.io/k8tz \
  --helm-chart k8tz \
  --revision "0.16.0" \
  --dest-namespace ysoftman-k8tz \
  --parameter namespace=ysoftman-k8tz \
  --parameter timezone=Asia/Seoul

prometheus "found duplicate series" error

# Running the following per-pod network traffic query

avg_over_time(container_network_transmit_bytes_total{pod=~"ysoftman-.*", interface="eth0"}[1w:1m]) + on(pod) group_left avg_over_time(container_network_receive_bytes_total{pod=~"ysoftman-.*", interface="eth0"}[1w:1m])

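# fails because the right-hand side has more than one series for the same pod, so they cannot be matched one-to-one, and the following error is returned.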

Error executing query: found duplicate series for the match group {pod="ysoftman-123"} on the right hand-side of the operation:

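# The cause was that the ysoftman-123 pod had three series with different id labels.

# The data in those series is meaningless and can be removed.

# Fix 1

# If the prometheus admin API is enabled (--web.enable-admin-api), the series can be deleted as follows.

# The data is not removed from disk immediately; that happens at the next compaction.

# match[] takes a series selector only; use the start/end parameters to limit the time range if needed.

curl -X POST -g 'http://localhost:8090/api/v1/admin/tsdb/delete_series?match[]=container_network_transmit_bytes_total{pod=~"ysoftman-.*"}'

# To remove it immediately, call the following API once more.

curl -X POST -g 'http://localhost:8090/api/v1/admin/tsdb/clean_tombstones'

# Fix 2

# Match on both pod and id with on(pod, id).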

avg_over_time(container_network_transmit_bytes_total{pod=~"ysoftman-.*", interface="eth0"}[1w:1m]) + on(pod, id) group_left avg_over_time(container_network_receive_bytes_total{pod=~"ysoftman-.*", interface="eth0"}[1w:1m])
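What `on(...)` with `group_left` requires: on the right-hand ("one") side, each match key must identify at most one series. The Go sketch models only that uniqueness check with hypothetical label sets; the pod name matches the example above, while the `id` values are made up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// series is a simplified Prometheus series: just its label set.
type series struct {
	Labels map[string]string
}

// matchKey builds the grouping key from the labels listed in on(...).
func matchKey(s series, on []string) string {
	parts := make([]string, 0, len(on))
	for _, l := range on {
		parts = append(parts, l+"="+s.Labels[l])
	}
	return strings.Join(parts, ",")
}

// duplicateGroups returns match keys with more than one series on the
// "one" side, the condition behind "found duplicate series".
func duplicateGroups(right []series, on []string) []string {
	count := map[string]int{}
	for _, s := range right {
		count[matchKey(s, on)]++
	}
	var dups []string
	for k, n := range count {
		if n > 1 {
			dups = append(dups, k)
		}
	}
	sort.Strings(dups)
	return dups
}

func main() {
	right := []series{
		{Labels: map[string]string{"pod": "ysoftman-123", "id": "/a"}},
		{Labels: map[string]string{"pod": "ysoftman-123", "id": "/b"}},
		{Labels: map[string]string{"pod": "ysoftman-456", "id": "/c"}},
	}
	fmt.Println("on(pod):    ", duplicateGroups(right, []string{"pod"}))       // [pod=ysoftman-123]
	fmt.Println("on(pod, id):", duplicateGroups(right, []string{"pod", "id"})) // []
}
```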

Registering etcdctl defrag with a systemd timer

# Prometheus sends the following alert about k8s etcd disk allocation

etcd cluster "kube-etcd": database size in use on instance 10.10.10.10:2379 is 48.18% of the actual allocated disk space, please run defragmentation (e.g. etcdctl defrag) to retrieve the unused fragmented disk space.

# Log in to a master node and run etcdctl

# Take the authentication options from the kube-apiserver process

ps -ef | grep kube-apiserver

...

--etcd-cafile=/etc/ssl/etcd/ssl/ca.pem --etcd-certfile=/etc/ssl/etcd/ssl/node-master1.pem --etcd-keyfile=/etc/ssl/etcd/ssl/node-master1-key.pem

# Rename them to the corresponding etcdctl options to run etcdctl.

# Check the cluster members.

sudo etcdctl --cacert=/etc/ssl/etcd/ssl/ca.pem --cert=/etc/ssl/etcd/ssl/node-master1.pem --key=/etc/ssl/etcd/ssl/node-master1-key.pem member list

# Run etcdctl defrag.

# https://etcd.io/docs/v3.3/op-guide/maintenance/#defragmentation

sudo etcdctl --cacert=/etc/ssl/etcd/ssl/ca.pem --cert=/etc/ssl/etcd/ssl/node-master1.pem --key=/etc/ssl/etcd/ssl/node-master1-key.pem defrag --cluster

Finished defragmenting etcd member[https://10.10.10.10:2379]

#####

# Running etcdctl defrag periodically

# An etcd service unit already exists.

# /etc/systemd/system/etcd.service

# Check that the etcd service is running

sudo journalctl -f -u etcd.service

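# To run etcdctl defrag periodically, use a systemd service plus timer instead of cron.

# The unit files must share the same name: <name>.service and <name>.timer (without a Unit= line, a timer activates the service with its own name).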
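By default a timer activates the .service unit with the same base name; an explicit Unit= line in [Timer] overrides this. Below is a small, hypothetical helper that flags timers whose partner service is missing, for example a naming slip such as etcd-defrag vs etcdctl-defrag.

```python
from pathlib import PurePath


def timers_without_service(unit_files):
    """Return .timer files that have no .service with the same base name.

    Only meaningful for timers without an explicit Unit= line.
    """
    names = {PurePath(f).name for f in unit_files}
    return sorted(
        name for name in names
        if name.endswith(".timer") and name[: -len(".timer")] + ".service" not in names
    )


units = [
    "/etc/systemd/system/etcdctl-defrag.service",
    "/etc/systemd/system/etcdctl-defrag.timer",
    "/etc/systemd/system/etcd-defrag.timer",  # hypothetical mistyped timer name
]
print(timers_without_service(units))
```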

# Register the etcdctl defrag service

sudo vi /etc/systemd/system/etcdctl-defrag.service

[Unit]

Description=Run etcdctl defrag

# Ordering for this unit: start it after network.target (i.e. once networking has been brought up).

After=network.target

[Service]

# oneshot: a service that runs once and then exits

Type=oneshot

Environment="LOG_DIR=/var/log"

Environment="ETCDCTL_API=3"

ExecStart=/usr/local/bin/etcdctl defrag --cacert=/etc/ssl/etcd/ssl/ca.pem --cert=/etc/ssl/etcd/ssl/node-master1.pem --key=/etc/ssl/etcd/ssl/node-master1-key.pem

[Install]

# When enabled, pull this unit in at multi-user.target (the equivalent of Linux run level 3, where networking etc. is up)

WantedBy=multi-user.target

# Create a timer file so that the etcdctl-defrag service runs every day at 01:00

sudo vi /etc/systemd/system/etcdctl-defrag.timer

[Unit]

Description=Run etcdctl-defrag.service every day

After=network.target

[Timer]

OnCalendar=*-*-* 01:00:00

[Install]

WantedBy=multi-user.target
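To double-check the OnCalendar expression, run `systemd-analyze calendar '*-*-* 01:00:00'`, which prints the next elapse time. Once the timer is running, `systemctl list-timers` shows it too. The Python sketch below only illustrates the daily schedule logic, and the helper name is hypothetical. It ignores time zones, DST, and timer options such as Persistent= or RandomizedDelaySec=.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now, at=time(1, 0, 0)):
    """Next wall-clock time matching a daily OnCalendar=*-*-* HH:MM:SS schedule (no time zones)."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:  # today's slot has already passed, so it fires tomorrow
        candidate += timedelta(days=1)
    return candidate


print(next_daily_run(datetime(2023, 5, 10, 13, 24)))  # 2023-05-11 01:00:00
print(next_daily_run(datetime(2023, 5, 10, 0, 30)))   # 2023-05-10 01:00:00
```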

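# Reload systemd so it picks up the new unit files, then start the service and the timer.

sudo systemctl daemon-reload

# Starting the oneshot service runs one defrag right away; starting the timer schedules the daily run.

sudo systemctl start etcdctl-defrag

sudo systemctl start etcdctl-defrag.timer

# start alone does not survive a reboot; enable the timer so it is activated again at boot.

sudo systemctl enable etcdctl-defrag.timer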

# Check the status with systemctl

sudo systemctl status etcdctl-defrag --no-pager

sudo systemctl status etcdctl-defrag.timer

# Reference

https://www.gojek.io/blog/a-few-notes-on-etcd-maintenance

Changing k8s PersistentVolume values

# I want to change the nfs server IP of a k8s PersistentVolume (pv > nfs > ip).

# Changing it with patch fails with an error saying the field cannot be changed after creation.

kubectl patch pv ysoftmanPV -p '{"spec":{"nfs":{"server":"10.10.10.10"}}}'

Forbidden: spec.persistentvolumesource is immutable after creation

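# Note that a pvc's storage request can be patched, but it cannot be reduced (increasing it also requires a StorageClass with allowVolumeExpansion: true).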

kubectl patch pvc prometheus-1 -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}' -n monitoring

... spec.resources.requests.storage: Forbidden: field can not be less than previous value
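The shrink rule can be checked client-side before patching. Below is a simplified Python sketch with hypothetical helper names. It converts integer Kubernetes quantities (binary Ki/Mi/Gi... and decimal k/M/G... suffixes) to bytes. Fractional values and exponent notation are not handled.

```python
# Binary (Ki, Mi, Gi, ...) and decimal (k, M, G, ...) suffixes of Kubernetes quantities.
SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}


def quantity_to_bytes(quantity):
    """Convert integer quantities such as "10Gi" or "500M" to bytes (no fractions or exponents)."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):  # try two-letter suffixes first
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * SUFFIXES[suffix]
    return int(quantity)


def resize_allowed(current, requested):
    """Client-side pre-check mirroring the error above: the requested size may not be smaller."""
    return quantity_to_bytes(requested) >= quantity_to_bytes(current)


print(resize_allowed("20Gi", "10Gi"))  # False: shrinking is forbidden
print(resize_allowed("10Gi", "20Gi"))  # True
```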

# https://kubernetes.io/ko/docs/concepts/storage/persistent-volumes/#%EB%8B%A8%EA%B3%84-phase

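# Available: a free resource that is not yet bound to a claim

# Bound: the volume is bound to a claim

# Released: the claim has been deleted, but the resource has not yet been reclaimed by the cluster

# Failed: the volume has failed its automatic reclamation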

status:

phase: Bound

# Removing claimRef puts the pv back into the Available state.

kubectl patch pv ysoftmanPV -p '{"spec":{"claimRef":null}}'
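Hand-written JSON patch bodies are easy to break with a single misplaced quote. One way to avoid that is to let json.dumps produce the body. Below is a minimal Python sketch that only builds the command string. The helper name is hypothetical, and "ysoftman-pv" is a hypothetical name written in lowercase because Kubernetes object names must be lowercase.

```python
import json
import shlex


def kubectl_patch_pv(name, patch):
    """Build a kubectl patch command whose -p body is generated (and therefore valid) JSON."""
    body = json.dumps(patch, separators=(",", ":"))  # None becomes null
    return shlex.join(["kubectl", "patch", "pv", name, "-p", body])


print(kubectl_patch_pv("ysoftman-pv", {"spec": {"claimRef": None}}))
print(kubectl_patch_pv("ysoftman-pv", {"metadata": {"finalizers": None}}))
```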

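# However, when deleted, the pv gets stuck in the Terminating state.

# finalizers: conditions that must be satisfied before the object is actually deleted

# kubernetes.io/pv-protection: added by default so that a pv still bound to a pvc is not deleted by mistake (pvcs get kubernetes.io/pvc-protection in the same way).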

kind: PersistentVolume

metadata:

finalizers:

- kubernetes.io/pv-protection

# Patching out the finalizers as follows (or deleting that part with kubectl edit ...) lets the pv be deleted.

kubectl patch pv ysoftmanPV -p '{"metadata":{"finalizers":null}}'
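kubectl shows a PV as Terminating when metadata.deletionTimestamp is set but finalizers are still present. Below is a small Python sketch with a hypothetical helper and a hand-made, trimmed sample. It lists such PVs from the JSON printed by `kubectl get pv -o json`, so it can be fed that command's output.

```python
import json


def stuck_terminating(pv_list_json):
    """Names of PVs that have a deletionTimestamp but still carry finalizers."""
    items = json.loads(pv_list_json).get("items", [])
    return [
        item["metadata"]["name"]
        for item in items
        if item["metadata"].get("deletionTimestamp") and item["metadata"].get("finalizers")
    ]


# Hypothetical, heavily trimmed output of: kubectl get pv -o json
sample = json.dumps({"items": [
    {"metadata": {"name": "pv-a", "deletionTimestamp": "2023-05-10T13:24:33Z",
                  "finalizers": ["kubernetes.io/pv-protection"]}},
    {"metadata": {"name": "pv-b", "finalizers": ["kubernetes.io/pv-protection"]}},
]})
print(stuck_terminating(sample))  # ['pv-a']
```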

#####

# Based on the above, the following script changes many PVs and re-applies them.

# First, back up the pvs to be changed locally as yaml.

mkdir -p pv

for name in $(kubectl get pv -A | grep -i aaa | awk '{print $1}'); do

echo "backup pv manifest(yaml)... ./pv/$name.yaml"

kubectl get pv $name -o yaml > ./pv/$name.yaml

done

# Delete the pvs

for name in $(kubectl get pv -A | grep -i aaa | awk '{print $1}'); do

echo "delete pv manifest(yaml)"

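# After delete, the pv stays Terminating and is only fully removed once the kubernetes.io/pv-protection finalizer is removed.

# --wait=false makes delete return immediately instead of blocking until the finalizer is gone, so the patch can run right after it.

kubectl delete pv "$name" --wait=false && kubectl patch pv "$name" -p '{"metadata":{"finalizers":null}}'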

done

# Apply the backed-up pv yaml with only the IP changed

for f in ./pv/*.yaml; do

sed -e 's/server: 10\.10\.10\.11/server: 10.10.10.12/g' "$f" | kubectl apply -f -

done
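The sed step matches 10.10.10.11 anywhere in a line, so an address such as 10.10.10.111 would also be rewritten. Below is a stricter alternative, sketched in Python with the stdlib only. It writes the changed files to a separate directory so they can be reviewed first and then applied with `kubectl apply -f pv-new/`. The helper names and the pv-new directory are hypothetical.

```python
import re
from pathlib import Path


def rewrite_nfs_server(manifest, old_ip, new_ip):
    """Replace whole "server: <old_ip>" lines only; re.escape keeps the dots literal."""
    pattern = re.compile(rf"^([ \t]*server:[ \t]*){re.escape(old_ip)}[ \t]*$", re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + new_ip, manifest)


def rewrite_backups(src_dir="pv", dst_dir="pv-new", old_ip="10.10.10.11", new_ip="10.10.10.12"):
    """Write rewritten copies of ./pv/*.yaml into a separate directory for review."""
    out = Path(dst_dir)
    out.mkdir(exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.yaml")):
        (out / path.name).write_text(rewrite_nfs_server(path.read_text(), old_ip, new_ip))


sample = "spec:\n  nfs:\n    path: /data\n    server: 10.10.10.11\n"
print(rewrite_nfs_server(sample, "10.10.10.11", "10.10.10.12"))
```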

Using helmfile

# helmfile is a tool that deploys the helm releases declared in helmfile.yaml.

# https://github.com/helmfile/helmfile

# https://helmfile.readthedocs.io

# Installation (macOS)

# Note that helm must already be installed.

brew install helmfile

# Initialize: installs the plugins helmfile needs.

helmfile init

# Also install helm-diff, which the diff feature relies on.

helm plugin install https://github.com/databus23/helm-diff

# Show the full diff against what is currently deployed

helmfile diff

# Diff only the release with a specific name

# A namespace can also be specified separately with -n ysoftman1.

helmfile diff -l name=ysoftman-server

# Show only 3 lines of context before and after each change.

helmfile diff -l name=ysoftman-server --context 3

# Diff all releases except the one named ysoftman-server

helmfile diff -l name!=ysoftman-server
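The -l selectors used above can be modeled roughly as follows. This simplified Python sketch is based on my reading of helmfile's documentation. In that reading, comma-separated terms are AND-ed, repeated -l flags are OR-ed, and every release carries name/namespace/chart labels. The release list is hypothetical, and helmfile's own implementation may differ in details.

```python
def matches(labels, selector):
    """One -l value: comma-separated key=value / key!=value terms that must all hold (AND)."""
    for term in selector.split(","):
        if "!=" in term:
            key, value = term.split("!=", 1)
            if labels.get(key) == value:
                return False
        else:
            key, value = term.split("=", 1)
            if labels.get(key) != value:
                return False
    return True


def select_releases(releases, selectors):
    """Several -l flags are OR-ed: a release is picked if any selector matches."""
    return [r["name"] for r in releases if any(matches(r["labels"], s) for s in selectors)]


# Hypothetical releases with the implicit name/namespace labels filled in.
releases = [
    {"name": "ysoftman-server", "labels": {"name": "ysoftman-server", "namespace": "ysoftman1"}},
    {"name": "ysoftman-web", "labels": {"name": "ysoftman-web", "namespace": "ysoftman1"}},
    {"name": "monitoring", "labels": {"name": "monitoring", "namespace": "monitoring"}},
]
print(select_releases(releases, ["name=ysoftman-server"]))   # ['ysoftman-server']
print(select_releases(releases, ["name!=ysoftman-server"]))  # ['ysoftman-web', 'monitoring']
```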

# Apply all releases declared in helmfile.yaml

# apply runs diff internally and then runs sync only if there are changes.

helmfile apply

# Deploy only the release with a specific name, showing up to 3 lines of context around each change in the diff output

helmfile apply -l name=ysoftman-server --context 3

# Delete only the release with a specific name

helmfile delete -l name=ysoftman-server

nginx https websocket newline error

# Symptom

# When exec-ing into a k8s pod through nginx over https, pressing Enter shows the prompt like this

root@ysoftman-123:/aaa#

(cursor) It stays stuck here. Pressing Enter prints the prompt again, and again the cursor ends up on the line below the prompt.

Going through nginx over http, there is no problem.

# I am using the Kubernetes Python client (kubernetes-client/python)

# https://github.com/kubernetes-client/python/blob/4722e8f7e52369f650b9b2dfdb125f55d62e4f28/kubernetes/base/stream/ws_client.py

# https://github.com/kubernetes-client/python/blob/master/examples/pod_exec.py

# While the websocket is open, read stdout and stderr and print them.

# websocket_client: the WSClient returned by kubernetes.stream.stream(..., _preload_content=False), as in pod_exec.py

import sys

while websocket_client.is_open():
    websocket_client.update(timeout=1)
    if websocket_client.peek_stdout():
        print(websocket_client.read_stdout(), file=sys.stdout, flush=True, end='')
    if websocket_client.peek_stderr():
        print(websocket_client.read_stderr(), file=sys.stderr, flush=True, end='')

# Test environment

# Check whether nginx was built with --with-debug (nginx -V writes to stderr, hence 2>&1)

nginx -V 2>&1 | grep -i with-debug

# Add the debug level to the error log in nginx.conf.

error_log /usr/local/var/log/nginx/error.log debug;

# Reload nginx

sudo nginx -s reload

# Watch the debug log

tail -F /usr/local/var/log/nginx/error.log

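# After connecting to the pod, nginx writes debug logs like the following every time a key is pressed.

# Pressing (Enter) should print the prompt.

# Case where the cursor moves to a new line but the prompt does not appear: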

2023/05/10 13:24:33 [debug] 40385#0: *58 http upstream process upgraded, fu:1

2023/05/10 13:24:33 [debug] 40385#0: *58 recv: eof:0, avail:150, err:0

2023/05/10 13:24:33 [debug] 40385#0: *58 recv: fd:15 150 of 4096

2023/05/10 13:24:33 [debug] 40385#0: *58 SSL to write: 150

2023/05/10 13:24:33 [debug] 40385#0: *58 SSL_write: 150

2023/05/10 13:24:33 [debug] 40385#0: *58 event timer: 15, old: 17342356, new: 17342362

2023/05/10 13:24:33 [debug] 40385#0: timer delta: 6

2023/05/10 13:24:33 [debug] 40385#0: worker cycle

# Occasionally the prompt did appear correctly after the newline:

2023/05/10 13:24:50 [debug] 40385#0: *58 http upstream process upgraded, fu:1

2023/05/10 13:24:50 [debug] 40385#0: *58 recv: eof:0, avail:147, err:0

2023/05/10 13:24:50 [debug] 40385#0: *58 recv: fd:15 147 of 4096

2023/05/10 13:24:50 [debug] 40385#0: *58 SSL to write: 147

2023/05/10 13:24:50 [debug] 40385#0: *58 SSL_write: 147

2023/05/10 13:24:50 [debug] 40385#0: *58 event timer: 15, old: 17359466, new: 17359540

2023/05/10 13:24:50 [debug] 40385#0: timer delta: 2

# When connected over http, the prompt appears correctly even when recv is 150 bytes.

2023/05/11 13:44:27 [debug] 41253#0: *48 http upstream process upgraded, fu:1

2023/05/11 13:44:27 [debug] 41253#0: *48 recv: eof:0, avail:150, err:0

2023/05/11 13:44:27 [debug] 41253#0: *48 recv: fd:13 150 of 4096

2023/05/11 13:44:27 [debug] 41253#0: *48 send: fd:12 150 of 150

2023/05/11 13:44:27 [debug] 41253#0: *48 event timer: 13, old: 104937207, new: 104937220

2023/05/11 13:44:27 [debug] 41253#0: timer delta: 12
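Comparing the broken (150) and working (147) keystrokes is easier when the byte counts are pulled out of the debug log. Below is a small Python sketch with a hypothetical helper name. The regular expression is written only against the line formats shown above and may need adjusting for other nginx versions.

```python
import re

# Matches the byte-count lines from the debug excerpts above, e.g.
#   "... [debug] 40385#0: *58 recv: fd:15 150 of 4096"
#   "... [debug] 40385#0: *58 SSL_write: 150"
#   "... [debug] 41253#0: *48 send: fd:12 150 of 150"
DEBUG_IO = re.compile(
    r"^(?P<ts>\S+ \S+) \[debug\] \d+#\d+: \*(?P<conn>\d+) "
    r"(?P<op>recv|SSL_write|send):(?: fd:\d+)? (?P<bytes>\d+)"
)


def io_events(lines):
    """Yield (timestamp, connection, operation, bytes) for recv/SSL_write/send lines."""
    for line in lines:
        m = DEBUG_IO.match(line)
        if m:
            yield m["ts"], int(m["conn"]), m["op"], int(m["bytes"])


sample = [
    "2023/05/10 13:24:33 [debug] 40385#0: *58 recv: eof:0, avail:150, err:0",
    "2023/05/10 13:24:33 [debug] 40385#0: *58 recv: fd:15 150 of 4096",
    "2023/05/10 13:24:33 [debug] 40385#0: *58 SSL to write: 150",
    "2023/05/10 13:24:33 [debug] 40385#0: *58 SSL_write: 150",
    "2023/05/11 13:44:27 [debug] 41253#0: *48 send: fd:12 150 of 150",
]
for event in io_events(sample):
    print(event)
```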

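# Pressing just Enter, the data sizes differ: 150 bytes (broken) vs 147 bytes (working).

# Over http both 150 and 147 also occur, and in both cases the prompt is printed correctly.

# The prompt bytes follow the newline in the same data, and over https it looks as if only the newline part is being handled.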

# Adding print(data) in update() to dump the data received when Enter is pressed:

150 -> broken case: b'\x01\r\n'

147 -> working case: b'\x01\r\n\x1b]0;<prompt string>'

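# 0x01: in the Kubernetes exec websocket protocol, the first byte of every message is the channel number, and 1 is stdout (as a plain ASCII byte 0x01 would be SOH, start of heading, but here it is the channel prefix)

# \r\n (CR: carriage return, LF: line feed): newline

# \x1b]0; starts with \x1b (ESC); ESC ]0; is the xterm escape sequence that sets the terminal window title, typically emitted as part of the shell prompt (PS1)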

# Over http, even when only b'\x01\r\n' arrives, a prompt response starting with b'\x01\x1b]0;...' follows.
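The received bytes can be decoded with the channel prefix in mind. Below is a minimal Python sketch. The channel table matches the constants in the kubernetes client's ws_client.py. The title-sequence regex assumes the common BEL-terminated form, and the sample message is hypothetical.

```python
import re

# Channel numbers used by the Kubernetes exec websocket protocol (as in ws_client.py).
CHANNELS = {0: "stdin", 1: "stdout", 2: "stderr", 3: "error", 4: "resize"}

# xterm "set window title" sequence: ESC ] 0 ; <title> BEL
OSC_TITLE = re.compile(rb"\x1b\]0;[^\x07]*\x07")


def split_channel(message):
    """Split one websocket message into (channel name, payload)."""
    return CHANNELS.get(message[0], "unknown"), message[1:]


# Hypothetical message shaped like the working case above.
channel, payload = split_channel(b"\x01\r\n\x1b]0;root@ysoftman-123: /aaa\x07root@ysoftman-123:/aaa# ")
print(channel, OSC_TITLE.sub(b"", payload))
```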

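# Enabling tracing in the Python websocket library (websocket-client):

import websocket

websocket.enableTrace(True)

# update() polls and shows the received data, but it contains only b'\x01\r\n'; the prompt data never shows up.

# nginx seems to report that it sent 150 bytes, but the ws client receives only the 3-byte newline (b'\x01\r\n'),

# and the trace shows no further received data after that.

# (Over http, the trace shows the prompt data being received after the newline.)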

# In the ws_client source, update() is where the response packets are received

# https://github.com/kubernetes-client/python/blob/4722e8f7e52369f650b9b2dfdb125f55d62e4f28/kubernetes/base/stream/ws_client.py#L193

# Removing the `if r:` check makes the prompt data arrive over https as well.

# if r:

op_code, frame = self.sock.recv_data_frame(True)

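# So the problem appears to be that the polling step (r), which should signal that websocket data is ready to read, does not report it correctly.

# In update(), read with sock.recv_data_frame(True) without the polling check,

# and in peek_channel(), remove self.update(timeout=timeout);

# after that, the prompt was displayed correctly after the newline over both http and https.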
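Here is one plausible explanation, which the post does not verify. With TLS, the ssl layer decrypts whole records and keeps any bytes beyond what the caller asked for in its own buffer. As a result, select()/poll() on the underlying socket can report "nothing to read" even though data is waiting. Python's ssl documentation warns about exactly this and points to SSLSocket.pending(). Plain http has no such buffer, which would fit the "http works, https breaks" pattern. The sketch below has two parts. The first is a hypothetical readable() helper that checks pending() before select(). The second is a stdlib-only demo of the same effect that uses a buffered reader on a socketpair instead of real TLS. Whether the kubernetes client's polling is affected in exactly this way is an assumption.

```python
import select
import socket
import ssl


def readable(sock, timeout):
    """select() plus a check for bytes already decrypted inside an SSL socket."""
    if isinstance(sock, ssl.SSLSocket) and sock.pending():
        return True  # data is buffered in the TLS layer, invisible to select()
    r, _, _ = select.select([sock], [], [], timeout)
    return bool(r)


# Plain-socket stand-in for the TLS buffering: a buffered reader pulls in more
# bytes than requested, after which select() on the socket sees nothing left.
a, b = socket.socketpair()
reader = b.makefile("rb")
a.sendall(b"\x01\r\n" + b"\x01\x1b]0;prompt")  # two messages arrive together
first = reader.read(3)                           # the caller only asked for 3 bytes
still_readable = readable(b, 0)                  # False: the kernel buffer is already empty
rest = reader.peek()                             # ...but the rest sits in the user-space buffer
print(first, still_readable, rest)
for s in (reader, a, b):
    s.close()
```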