# frontmatter
date:2024-03-15
author:@backend_arch
type:architecture
status:COMPLETED
tags:[distributed-systems, go, redis]
summary:An analysis of race conditions in high-throughput microservices and the implementation of a custom locking mechanism using Redis and Lua.

# Optimizing Distributed Locks in Go

An analysis of race conditions in high-throughput microservices and the implementation of a custom locking mechanism using Redis and Lua.


## The Problem

Go's standard library mutexes (`sync.Mutex`, `sync.RWMutex`) work well within a single process, but they cannot coordinate across processes. As we scaled our worker nodes to handle 50k events/sec, we started encountering significant data corruption in our user wallet service.

The root cause was a "thundering herd" scenario in which multiple instances attempted to reconcile the same ledger entry simultaneously. Our initial Redis SETNX implementation failed to account for process crashes: a worker that died while holding the lock never released it, so every other instance waiting on that key deadlocked.
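
The crash scenario can be reproduced without Redis. The sketch below is a minimal in-memory model, not the service's actual code: `lockStore`, `setNX`, and `acquireAfterCrash` are hypothetical names, and the clock is passed in explicitly so the run is deterministic. It assumes the original SETNX lock was written without an expiry, which the text implies but does not state.

```go
package main

import (
	"fmt"
	"time"
)

// entry is one lock key in the model. A zero expiresAt means the key
// never expires, like a bare SETNX with no TTL.
type entry struct {
	owner     string
	expiresAt time.Time
}

// lockStore is a hypothetical, single-goroutine stand-in for Redis.
type lockStore struct {
	keys map[string]entry
}

func newLockStore() *lockStore {
	return &lockStore{keys: make(map[string]entry)}
}

// setNX stores owner under key only if the key is absent or has expired
// as of now. ttl <= 0 means "no expiry".
func (s *lockStore) setNX(key, owner string, ttl time.Duration, now time.Time) bool {
	if e, ok := s.keys[key]; ok {
		expired := !e.expiresAt.IsZero() && !now.Before(e.expiresAt)
		if !expired {
			return false
		}
	}
	e := entry{owner: owner}
	if ttl > 0 {
		e.expiresAt = now.Add(ttl)
	}
	s.keys[key] = e
	return true
}

// acquireAfterCrash simulates worker A taking the lock and crashing
// without releasing it, then worker B trying again after wait.
func acquireAfterCrash(ttl, wait time.Duration) bool {
	s := newLockStore()
	start := time.Unix(0, 0)
	s.setNX("ledger:42", "worker-a", ttl, start) // A acquires, then dies
	return s.setNX("ledger:42", "worker-b", ttl, start.Add(wait))
}

func main() {
	fmt.Println("no TTL, B retries after 1h:", acquireAfterCrash(0, time.Hour))
	fmt.Println("5s TTL, B retries after 6s:", acquireAfterCrash(5*time.Second, 6*time.Second))
}
```

Without a TTL, worker B is locked out no matter how long it waits. With a TTL, the key expires on its own and B can proceed once the expiry has passed.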

## Solution Set

The atomic acquisition is handled via a Lua script to ensure that the existence check and the SET command happen in a single operation. This prevents the classic check-then-act race condition.
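
To make the atomicity property concrete, here is a small in-memory model of it. This is an illustration under assumptions, not the Redis implementation: `atomicStore`, `setIfAbsent`, and `countWinners` are hypothetical names, and a `sync.Mutex` stands in for the fact that Redis runs a script to completion before serving any other command.

```go
package main

import (
	"fmt"
	"sync"
)

// atomicStore models the property the Lua script relies on: the
// existence check and the write run as one indivisible step.
type atomicStore struct {
	mu   sync.Mutex
	keys map[string]string
}

// setIfAbsent checks and writes inside a single critical section, so no
// other caller can slip in between the two.
func (s *atomicStore) setIfAbsent(key, token string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, held := s.keys[key]; held {
		return false
	}
	s.keys[key] = token
	return true
}

// countWinners starts n goroutines racing for the same key and returns
// how many of them acquired it.
func countWinners(n int) int {
	s := &atomicStore{keys: make(map[string]string)}
	var wg sync.WaitGroup
	var mu sync.Mutex
	winners := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			if s.setIfAbsent("ledger:42", fmt.Sprintf("worker-%d", id)) {
				mu.Lock()
				winners++
				mu.Unlock()
			}
		}(i)
	}
	wg.Wait()
	return winners
}

func main() {
	fmt.Println("winners:", countWinners(100)) // prints "winners: 1"
}
```

If the check and the write were two separate calls, two goroutines could both see the key as absent and both write, which is the check-then-act race the script is meant to rule out.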

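```go
// pkg/distlock/mutex.go
package distlock

import (
    "context"
    "time"

    "github.com/redis/go-redis/v9" // assumed import path; the original does not name the client library
)

// DistributedLock is a Redis-backed lock on a single key.
type DistributedLock struct {
    client *redis.Client
    key    string
    ttl    time.Duration // expiry applied to the lock key
}

// TryLock makes one attempt to acquire the lock and reports whether it succeeded.
func (l *DistributedLock) TryLock(ctx context.Context) (bool, error) {
    // KEYS[1] = lock key, ARGV[1] = value to store, ARGV[2] = expiry in milliseconds.
    const luaScript = `
        if redis.call("exists", KEYS[1]) == 0 then
            return redis.call("set", KEYS[1], ARGV[1], "PX", ARGV[2])
        end
        return nil
    `
    // Atomic acquire with backoff...
    // acquire is defined elsewhere in the package and is not shown here.
    return l.acquire(ctx, luaScript)
}
```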
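
The comment in `TryLock` mentions backoff, but the helper that performs it is not shown. As a rough sketch of what a retry loop around a single-attempt function might look like, the example below uses exponential backoff with a cap. Everything here is hypothetical: `tryFunc`, `lockWithBackoff`, `simulate`, and the delay values are illustrative names and numbers, and the fake lock in `simulate` replaces Redis so the example runs on its own.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// tryFunc has the same shape as TryLock: one non-blocking attempt.
type tryFunc func(ctx context.Context) (bool, error)

// lockWithBackoff retries try with exponential backoff until it
// succeeds, returns an error, or ctx is cancelled or times out.
func lockWithBackoff(ctx context.Context, try tryFunc, base, max time.Duration) error {
	delay := base
	for {
		ok, err := try(ctx)
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		timer := time.NewTimer(delay)
		select {
		case <-ctx.Done():
			timer.Stop()
			return ctx.Err()
		case <-timer.C:
		}
		// Double the delay after each failed attempt, up to max.
		delay *= 2
		if delay > max {
			delay = max
		}
	}
}

// simulate runs lockWithBackoff against a fake lock that becomes free on
// the freeOn-th attempt, and reports how many attempts were made.
func simulate(freeOn int, timeout time.Duration) (attempts int, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	try := func(context.Context) (bool, error) {
		attempts++
		return attempts >= freeOn, nil
	}
	err = lockWithBackoff(ctx, try, time.Millisecond, 8*time.Millisecond)
	return attempts, err
}

func main() {
	n, err := simulate(3, time.Second)
	fmt.Println("attempts:", n, "err:", err)
}
```

Bounding the loop with the context deadline keeps a caller from waiting indefinitely on a contended key. A production version would likely add random jitter to the delay so that waiting workers do not retry in lockstep.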

## Benchmarks

> Environment: AWS c6g.2xlarge
> Commit: 8f3a2c9

| Strategy | Ops/sec | P99 Latency |
|---|---|---|
| Mutex (In-Memory) | 1,200k | <0.1ms |
| Redis (SETNX) | 15k | 4ms |
| Redis (Redlock+Lua) | 12.5k | 6ms |
| Etcd v3 | 4.2k | 18ms |

While the in-memory mutex is roughly two orders of magnitude faster (1,200k vs. 12.5k ops/sec), it fundamentally cannot solve distributed coordination. The Lua script approach provided the best balance of safety and performance for our specific latency requirements.

