Performance Tuning Guide

A guide to optimizing the performance of a Kafka cluster.

Performance Tuning Overview

Performance Optimization Areas

┌─────────────────────────────────────────────────────────────────┐
│                  Kafka Performance Tuning Areas                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│  │     Broker      │  │    Producer     │  │    Consumer     │ │
│  │  - Threads      │  │  - Batching     │  │  - Fetch        │ │
│  │  - Memory       │  │  - Compression  │  │  - Processing   │ │
│  │  - Disk I/O     │  │  - Buffers      │  │  - Parallelism  │ │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘ │
│                                                                 │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│  │     Network     │  │       OS        │  │       JVM       │ │
│  │  - Socket buffer│  │  - Filesystem   │  │  - Heap size    │ │
│  │  - Connections  │  │  - Kernel params│  │  - GC settings  │ │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘ │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Broker Tuning

Thread Settings

# server.properties
 
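# Network threads (handle client requests)
# Suggested: the number of CPU cores or more
num.network.threads=8

# I/O threads (process requests, including disk reads/writes)
# Suggested: 2x to 8x the number of disks
num.io.threads=16

# Replica fetcher threads (replicate data from leaders)
num.replica.fetchers=4

# Background threads (various background tasks such as log deletion)
background.threads=10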

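Memory Settings

# Socket buffers (100KB each)
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
# Maximum request size (100MB)
socket.request.max.bytes=104857600

# Log flush (performance vs. durability trade-off)
# By default Kafka leaves flushing to the OS page cache and relies on replication for durability;
# forcing periodic fsyncs like the values below lowers throughput
log.flush.interval.messages=10000
log.flush.interval.ms=1000

# Maximum message size (10MB)
message.max.bytes=10485760
# Slightly larger than message.max.bytes
replica.fetch.max.bytes=10485880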
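server.properties is read with Java's .properties format, which only treats a line as a comment when it starts with `#` or `!`. A trailing `# 100KB` after a value therefore becomes part of the value, and a numeric setting written that way no longer parses as a number. The sketch below uses only `java.util.Properties` to show the difference; the class name `InlineCommentCheck` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class InlineCommentCheck {
    // Parses a .properties snippet the same way a properties file is loaded.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringReader does not actually fail
        }
        return props;
    }

    public static void main(String[] args) {
        Properties inline = parse("socket.send.buffer.bytes=102400      # 100KB\n");
        Properties ownLine = parse("# 100KB\nsocket.send.buffer.bytes=102400\n");

        // The trailing comment is kept as part of the value: [102400      # 100KB]
        System.out.println("[" + inline.getProperty("socket.send.buffer.bytes") + "]");
        // A comment on its own line is ignored: [102400]
        System.out.println("[" + ownLine.getProperty("socket.send.buffer.bytes") + "]");
    }
}
```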

Replication Settings

# Replica fetch behavior (replication throughput)
replica.fetch.wait.max.ms=500
replica.fetch.min.bytes=1

# ISR management
replica.lag.time.max.ms=30000

# Leader balancing
auto.leader.rebalance.enable=true
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10
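The Kafka broker documentation says replica.fetch.wait.max.ms should always be less than replica.lag.time.max.ms, otherwise followers of low-throughput topics can fall out of the ISR repeatedly. A minimal pre-deployment check over a `Properties` object might look like this; the class name is hypothetical, and the fallback values are the defaults in recent Kafka versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class ReplicationConfigCheck {
    // Returns human-readable problems; an empty list means the check passed.
    static List<String> validate(Properties broker) {
        List<String> problems = new ArrayList<>();
        long fetchWait = Long.parseLong(broker.getProperty("replica.fetch.wait.max.ms", "500").trim());
        long lagMax = Long.parseLong(broker.getProperty("replica.lag.time.max.ms", "30000").trim());
        if (fetchWait >= lagMax) {
            problems.add("replica.fetch.wait.max.ms (" + fetchWait
                    + ") must be less than replica.lag.time.max.ms (" + lagMax + ")");
        }
        return problems;
    }

    public static void main(String[] args) {
        Properties broker = new Properties();
        broker.setProperty("replica.fetch.wait.max.ms", "500");
        broker.setProperty("replica.lag.time.max.ms", "30000");
        System.out.println(validate(broker)); // []
    }
}
```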

Log Management

# Segment size (smaller segments let retention delete old data sooner)
# 1GB
log.segment.bytes=1073741824

# Index size (10MB)
log.index.size.max.bytes=10485760
log.index.interval.bytes=4096

# Retention
log.retention.hours=168
log.retention.bytes=-1
log.cleanup.policy=delete
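With log.index.interval.bytes=4096 the broker adds at most about one offset-index entry per 4KB appended, and each offset-index entry is 8 bytes (a 4-byte relative offset plus a 4-byte file position). If the index fills up, the broker rolls a new segment early, so it is worth checking that a full segment's index fits under log.index.size.max.bytes. The back-of-the-envelope sketch below (hypothetical class name) treats the result as an upper bound, since entries are only written at record-batch boundaries.

```java
final class IndexSizeEstimate {
    static final int OFFSET_INDEX_ENTRY_BYTES = 8; // 4-byte relative offset + 4-byte position

    // Upper-bound estimate of offset-index bytes needed for one full segment.
    static long offsetIndexBytes(long segmentBytes, long indexIntervalBytes) {
        long entries = segmentBytes / indexIntervalBytes;
        return entries * OFFSET_INDEX_ENTRY_BYTES;
    }

    public static void main(String[] args) {
        long segment = 1_073_741_824L; // log.segment.bytes (1GB)
        long interval = 4_096L;        // log.index.interval.bytes
        long maxIndex = 10_485_760L;   // log.index.size.max.bytes (10MB)
        long needed = offsetIndexBytes(segment, interval);
        // prints "2097152 bytes needed, fits = true"
        System.out.println(needed + " bytes needed, fits = " + (needed <= maxIndex));
    }
}
```

With the settings above, a 1GB segment needs about 2MB of offset index, well under the 10MB cap.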

Producer Tuning

Throughput Optimization

Properties props = new Properties();

// Batching (higher throughput)
props.put("batch.size", 65536);           // 64KB (default 16KB)
props.put("linger.ms", 10);               // wait up to 10ms to fill a batch
props.put("buffer.memory", 67108864);     // 64MB buffer

// Compression (less network, more CPU)
props.put("compression.type", "lz4");     // lz4 recommended (fast)

// Parallelism
props.put("max.in.flight.requests.per.connection", 5);

// acks (throughput vs. durability)
props.put("acks", "1");                   // leader acknowledgment only (fast)
// props.put("acks", "all");              // all in-sync replicas (safe)
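batch.size and linger.ms work together: a batch is sent when it is full or when linger.ms expires, whichever comes first. Assuming a steady per-partition byte rate (and ignoring compression and record overhead), the time needed to fill one batch shows which of the two limits will usually trigger the send. This is a hypothetical estimate, not producer code.

```java
import java.util.Locale;

final class BatchFillEstimate {
    // Milliseconds needed to fill one batch at a steady per-partition rate.
    static double fillTimeMs(int batchSizeBytes, double recordsPerSecPerPartition, int recordBytes) {
        double bytesPerMs = recordsPerSecPerPartition * recordBytes / 1000.0;
        return batchSizeBytes / bytesPerMs;
    }

    // A batch leaves when it is full or when linger.ms expires, whichever comes first.
    static double expectedWaitMs(int batchSizeBytes, int lingerMs,
                                 double recordsPerSecPerPartition, int recordBytes) {
        return Math.min(lingerMs, fillTimeMs(batchSizeBytes, recordsPerSecPerPartition, recordBytes));
    }

    public static void main(String[] args) {
        // Hypothetical load: 5,000 records/s per partition, 1KB per record
        double fill = fillTimeMs(65536, 5000, 1024);
        double wait = expectedWaitMs(65536, 10, 5000, 1024);
        System.out.printf(Locale.ROOT, "fill=%.1f ms, wait=%.1f ms%n", fill, wait);
    }
}
```

With these assumed numbers a 64KB batch takes about 12.8 ms to fill, so linger.ms=10 is what normally releases it; at higher rates the batch fills first and batch.size becomes the limit.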

Latency Optimization

Properties props = new Properties();

// Send immediately (lower latency)
props.put("linger.ms", 0);
props.put("batch.size", 1);               // effectively no batching (one record per batch)

// acks
props.put("acks", "1");                   // fast acknowledgment

// Retries
props.put("retries", 3);
props.put("retry.backoff.ms", 100);

// Timeouts (delivery.timeout.ms must be >= linger.ms + request.timeout.ms)
props.put("delivery.timeout.ms", 30000);
props.put("request.timeout.ms", 10000);
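The producer requires delivery.timeout.ms to be at least linger.ms + request.timeout.ms; if delivery.timeout.ms is set explicitly below that sum, the KafkaProducer fails to construct with a configuration error. A stand-alone check (hypothetical class name) catches this before deployment:

```java
final class DeliveryTimeoutCheck {
    // True when delivery.timeout.ms >= linger.ms + request.timeout.ms.
    static boolean isValid(long deliveryTimeoutMs, long lingerMs, long requestTimeoutMs) {
        return deliveryTimeoutMs >= lingerMs + requestTimeoutMs;
    }

    public static void main(String[] args) {
        // Latency-oriented values: 30000 >= 0 + 10000
        System.out.println(isValid(30000, 0, 10000));     // true
        // Raising linger.ms to 25000 would break it: 30000 < 25000 + 10000
        System.out.println(isValid(30000, 25000, 10000)); // false
    }
}
```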

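Reliability Optimization

Properties props = new Properties();

// Durability
props.put("acks", "all");
props.put("enable.idempotence", true);

// Ordering (idempotence preserves order with up to 5 in-flight requests)
props.put("max.in.flight.requests.per.connection", 5);

// Retries
props.put("retries", Integer.MAX_VALUE);
props.put("retry.backoff.ms", 100);

// Compression (saves network and storage; integrity comes from record CRC checksums, not compression)
props.put("compression.type", "snappy");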
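Per the producer documentation, enable.idempotence=true requires acks=all, retries greater than 0, and max.in.flight.requests.per.connection of at most 5. The sketch below checks those rules on a `Properties` object built like the example above; the class name is hypothetical, and it only checks configs that set enable.idempotence explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class IdempotenceCheck {
    // Uses Properties.get, not getProperty: the examples store Integer and Boolean
    // values, and getProperty returns null for non-String values.
    private static String value(Properties p, String key, String defaultValue) {
        Object v = p.get(key);
        return v == null ? defaultValue : String.valueOf(v).trim();
    }

    static List<String> validate(Properties p) {
        List<String> problems = new ArrayList<>();
        if (!Boolean.parseBoolean(value(p, "enable.idempotence", "false"))) {
            return problems; // idempotence not requested explicitly: nothing to check
        }
        String acks = value(p, "acks", "all");
        if (!acks.equals("all") && !acks.equals("-1")) {
            problems.add("acks must be all (got " + acks + ")");
        }
        if (Long.parseLong(value(p, "retries", String.valueOf(Integer.MAX_VALUE))) <= 0) {
            problems.add("retries must be greater than 0");
        }
        if (Integer.parseInt(value(p, "max.in.flight.requests.per.connection", "5")) > 5) {
            problems.add("max.in.flight.requests.per.connection must be <= 5");
        }
        return problems;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("acks", "all");
        props.put("enable.idempotence", true);
        props.put("max.in.flight.requests.per.connection", 5);
        props.put("retries", Integer.MAX_VALUE);
        System.out.println(validate(props)); // []
    }
}
```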

Recommended Producer Settings by Scenario

Scenario          batch.size (bytes)  linger.ms  acks  compression
Max throughput    131072              50         1     lz4
Min latency       1                   0          1     none
Max reliability   16384               5          all   snappy
Balanced          32768               10         all   lz4

Consumer Tuning

Throughput Optimization

Properties props = new Properties();

// Fetch settings
props.put("fetch.min.bytes", 1048576);    // 1MB (default 1 byte)
props.put("fetch.max.wait.ms", 500);      // wait up to 500ms
props.put("max.partition.fetch.bytes", 10485760);  // 10MB per partition

// Poll settings
props.put("max.poll.records", 1000);      // up to 1000 records per poll

// Session management
props.put("session.timeout.ms", 30000);
props.put("heartbeat.interval.ms", 10000);
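A fetch request returns once fetch.min.bytes of data are available or fetch.max.wait.ms elapses, whichever comes first. Assuming data arrives at a steady rate for the fetched partitions, the broker-side wait can be estimated as below; the arrival rates and the class name are hypothetical.

```java
final class FetchWaitEstimate {
    // Estimated broker-side wait for one fetch, in ms, given a steady arrival rate in bytes/second.
    static double waitMs(long fetchMinBytes, long fetchMaxWaitMs, double bytesPerSec) {
        if (bytesPerSec <= 0) {
            return fetchMaxWaitMs; // no data arriving: the broker waits the full fetch.max.wait.ms
        }
        double timeToMinBytesMs = fetchMinBytes / bytesPerSec * 1000.0;
        return Math.min(fetchMaxWaitMs, timeToMinBytesMs);
    }

    public static void main(String[] args) {
        // 1MB minimum, 500ms cap, 4MB/s arriving: the minimum is reached first
        System.out.println(waitMs(1_048_576, 500, 4 * 1_048_576.0)); // 250.0
        // Same settings at 1MB/s: the 500ms cap is reached first
        System.out.println(waitMs(1_048_576, 500, 1_048_576.0));     // 500.0
    }
}
```

This is why a large fetch.min.bytes raises throughput on busy topics but adds up to fetch.max.wait.ms of latency on quiet ones.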

Latency Optimization

Properties props = new Properties();

// Fetch immediately
props.put("fetch.min.bytes", 1);          // return as soon as any data is available
props.put("fetch.max.wait.ms", 100);      // short wait when no data is available

// Small batches
props.put("max.poll.records", 100);

// Fast reconnect
props.put("reconnect.backoff.ms", 50);
props.put("reconnect.backoff.max.ms", 1000);
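reconnect.backoff.ms is the base delay, and the client increases it exponentially for each consecutive connection failure to a host, up to reconnect.backoff.max.ms, with random jitter added. The sketch below prints the nominal schedule for the values above; it assumes the delay doubles per failure and omits jitter, so real delays will vary around these numbers. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ReconnectBackoff {
    // Nominal (jitter-free) delays: base * 2^attempt, capped at maxMs.
    static List<Long> schedule(long baseMs, long maxMs, int attempts) {
        List<Long> delays = new ArrayList<>();
        long delay = baseMs;
        for (int i = 0; i < attempts; i++) {
            delays.add(Math.min(delay, maxMs));
            delay = Math.min(delay * 2, maxMs); // cap each step so the value cannot overflow
        }
        return delays;
    }

    public static void main(String[] args) {
        // reconnect.backoff.ms=50, reconnect.backoff.max.ms=1000
        System.out.println(schedule(50, 1000, 7)); // [50, 100, 200, 400, 800, 1000, 1000]
    }
}
```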

Reliability Optimization

Properties props = new Properties();

// Manual offset commits
props.put("enable.auto.commit", false);

// Session management (rebalance stability)
props.put("session.timeout.ms", 45000);
props.put("heartbeat.interval.ms", 15000);
props.put("max.poll.interval.ms", 300000);

// Isolation level (transactions): read only committed transactional messages
props.put("isolation.level", "read_committed");
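Two timing rules keep a consumer group stable: the consumer documentation recommends heartbeat.interval.ms no higher than one third of session.timeout.ms, and processing one poll() batch must finish within max.poll.interval.ms, or the consumer is considered failed and the group rebalances. The sketch below checks both; the per-record processing time and max.poll.records=500 (the default, since the example above does not set it) are assumptions, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ConsumerTimingCheck {
    static List<String> validate(long sessionTimeoutMs, long heartbeatIntervalMs,
                                 long maxPollIntervalMs, int maxPollRecords, double perRecordMs) {
        List<String> problems = new ArrayList<>();
        // Guideline: heartbeat interval no higher than 1/3 of the session timeout.
        if (heartbeatIntervalMs * 3 > sessionTimeoutMs) {
            problems.add("heartbeat.interval.ms should be <= session.timeout.ms / 3");
        }
        // Worst-case time to process one poll() batch must stay under max.poll.interval.ms.
        double batchMs = maxPollRecords * perRecordMs;
        if (batchMs >= maxPollIntervalMs) {
            problems.add("max.poll.records * per-record time (" + batchMs
                    + " ms) reaches max.poll.interval.ms");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Reliability settings above, assuming 500 records per poll at 100 ms each
        System.out.println(validate(45000, 15000, 300000, 500, 100.0)); // []
    }
}
```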

Recommended Consumer Settings by Scenario

Scenario          fetch.min.bytes  fetch.max.wait.ms  max.poll.records
Max throughput    1048576          500                2000
Min latency       1                100                100
Batch processing  5242880          1000               5000
Balanced          524288           300                500

OS Tuning

Linux Kernel Parameters

# /etc/sysctl.conf

# File descriptors
fs.file-max=1000000

# Network buffers
net.core.wmem_default=1048576
net.core.wmem_max=16777216
net.core.rmem_default=1048576
net.core.rmem_max=16777216
net.core.netdev_max_backlog=30000

# TCP settings
net.ipv4.tcp_wmem=4096 65536 16777216
net.ipv4.tcp_rmem=4096 65536 16777216
net.ipv4.tcp_max_syn_backlog=8096
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_fin_timeout=15

# VM settings
vm.swappiness=1
vm.dirty_ratio=60
vm.dirty_background_ratio=5

# Apply (run as root from a shell; not part of the file)
sysctl -p

File Descriptor Limits

# /etc/security/limits.conf
kafka soft nofile 128000
kafka hard nofile 128000
kafka soft nproc 128000
kafka hard nproc 128000

# Verify (as the kafka user)
ulimit -n

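Disk Settings

# XFS is recommended (mkfs erases all data on the device)
mkfs.xfs /dev/sdb

# Mount options
mount -o noatime,nodiratime /dev/sdb /var/kafka-logs

# /etc/fstab
/dev/sdb /var/kafka-logs xfs noatime,nodiratime 0 0

# I/O scheduler (SSD): "none" on multi-queue kernels, "noop" on older kernels
echo none > /sys/block/sdb/queue/scheduler

# I/O scheduler (HDD): "mq-deadline" on multi-queue kernels, "deadline" on older kernels
echo mq-deadline > /sys/block/sdb/queue/scheduler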

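JVM Tuning

Heap Settings

# Set via KAFKA_HEAP_OPTS (kafka-server-start.sh falls back to 1GB if it is unset)

# Baseline (25-50% of RAM; leave the rest to the OS page cache, which Kafka relies on)
export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"

# Large clusters
export KAFKA_HEAP_OPTS="-Xms12g -Xmx12g"

# Caution: an oversized heap lengthens GC pauses
# Keep it at or below about 31GB so the JVM can use Compressed OOPs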
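The 25-50% rule and the ~31GB Compressed OOPs ceiling can be combined into a small calculation; the remaining RAM is left to the OS page cache. The helper below is hypothetical, and 31GB is a conservative ceiling (the exact threshold depends on the JVM and object alignment).

```java
final class HeapSizing {
    static final long COMPRESSED_OOPS_LIMIT_GB = 31; // conservative ceiling

    // Returns {lowGb, highGb}: 25% and 50% of physical RAM, both capped at the ceiling.
    static long[] heapRangeGb(long ramGb) {
        long low = Math.min(ramGb / 4, COMPRESSED_OOPS_LIMIT_GB);
        long high = Math.min(ramGb / 2, COMPRESSED_OOPS_LIMIT_GB);
        return new long[] {low, high};
    }

    public static void main(String[] args) {
        long[] range = heapRangeGb(24);
        // prints "-Xms6g -Xmx6g (up to 12g)"
        System.out.println("-Xms" + range[0] + "g -Xmx" + range[0] + "g (up to " + range[1] + "g)");
    }
}
```

On a 24GB host this reproduces the 6g baseline and the 12g large-cluster setting shown above.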

G1GC Settings (Java 11+)

export KAFKA_JVM_PERFORMANCE_OPTS="-XX:+UseG1GC \
    -XX:MaxGCPauseMillis=20 \
    -XX:InitiatingHeapOccupancyPercent=35 \
    -XX:G1HeapRegionSize=16M \
    -XX:MinMetaspaceFreeRatio=50 \
    -XX:MaxMetaspaceFreeRatio=80 \
    -XX:+ExplicitGCInvokesConcurrent"

ZGC Settings (production-ready since Java 15; -XX:+ZGenerational requires Java 21+)

export KAFKA_JVM_PERFORMANCE_OPTS="-XX:+UseZGC \
    -XX:+ZGenerational"

GC Logging

# Java 11+ (JVM option)
-Xlog:gc*:file=/var/log/kafka/gc.log:time,tags:filecount=10,filesize=100M

# Analysis tools
# e.g., GCViewer, GCEasy

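Network Tuning

Broker Network Settings

# server.properties

# Connection management
connections.max.idle.ms=600000
max.connections=1000
max.connections.per.ip=100

# Request queue (queued requests allowed before the network threads block)
queued.max.requests=500

# Incremental fetch session cache (maximum number of cached fetch sessions)
max.incremental.fetch.session.cache.slots=1000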

Client Network Settings

// Producer/Consumer
props.put("connections.max.idle.ms", 600000);
props.put("reconnect.backoff.ms", 50);
props.put("reconnect.backoff.max.ms", 30000);

Performance Measurement

kafka-producer-perf-test

# Producer performance test (--throughput -1 disables throttling)
kafka-producer-perf-test.sh \
    --topic perf-test \
    --num-records 10000000 \
    --record-size 1024 \
    --throughput -1 \
    --producer-props \
        bootstrap.servers=localhost:9092 \
        batch.size=65536 \
        linger.ms=10 \
        compression.type=lz4

# Example output:
# 10000000 records sent, 500000.0 records/sec (488.28 MB/sec),
# 10.5 ms avg latency, 250.0 ms max latency
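The MB/sec figure in the example output is records/sec times record size divided by 1024 × 1024, so it can be checked by hand. A tiny helper (hypothetical name) reproduces the 488.28 figure:

```java
import java.util.Locale;

final class PerfMath {
    // MB/sec with 1 MB = 1024 * 1024 bytes, as in the perf-test output.
    static double mbPerSec(double recordsPerSec, int recordSizeBytes) {
        return recordsPerSec * recordSizeBytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // 500,000 records/s of 1,024 bytes each
        System.out.println(String.format(Locale.ROOT, "%.2f MB/sec", mbPerSec(500_000, 1024))); // 488.28 MB/sec
    }
}
```

Running the same arithmetic in reverse is a quick way to sanity-check whether a reported throughput is plausible for your NIC and disks.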

kafka-consumer-perf-test

# Consumer performance test
kafka-consumer-perf-test.sh \
    --bootstrap-server localhost:9092 \
    --topic perf-test \
    --messages 10000000 \
    --threads 1 \
    --fetch-size 1048576

# Output columns:
# start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec

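End-to-End Latency Measurement

# End-to-end latency
# Arguments: broker list, topic, number of messages, producer acks, message size (bytes)
kafka-run-class.sh kafka.tools.EndToEndLatency \
    localhost:9092 perf-test 10000 all 1024

# Example output: Avg latency: 5.0 ms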

Tuning Checklist

Throughput Optimization

□ Increase batch.size (Producer)
□ Increase linger.ms (Producer)
□ Set compression.type (lz4)
□ Increase fetch.min.bytes (Consumer)
□ Increase num.network.threads (Broker)
□ Increase num.io.threads (Broker)
□ Increase the partition count

Latency Optimization

□ linger.ms = 0 (Producer)
□ fetch.min.bytes = 1 (Consumer)
□ Decrease fetch.max.wait.ms (Consumer)
□ Check network bandwidth
□ Check disk I/O
□ Tune GC

Reliability Optimization

□ acks=all (Producer)
□ enable.idempotence=true (Producer)
□ min.insync.replicas >= 2 (Broker/Topic)
□ unclean.leader.election.enable=false (Broker/Topic)
□ enable.auto.commit=false (Consumer)
□ Set up monitoring

Best Practices

1. Incremental Tuning

1. Measure a baseline
2. Change one parameter at a time
3. Measure the impact of each change
4. Have a rollback plan ready

2. Per-Environment Settings

# Development (producer)
batch.size=16384
linger.ms=1
acks=1

# Staging (producer)
batch.size=32768
linger.ms=5
acks=all

# Production (producer)
batch.size=65536
linger.ms=10
acks=all

3. Monitoring-Driven Tuning

Monitor metrics → Identify bottleneck → Adjust parameters → Measure impact
                           ↑                                      │
                           └──────────────────────────────────────┘

Yong's Park
