Part 6: Caching strategies and application-layer optimization

In the first five posts of this series, we covered the foundations of database optimization, analyzing and tuning SQL queries, advanced indexing strategies, schema design, and managing transactions and concurrency. This post looks at another important area: caching strategies and application-layer optimization.

Caching is one of the most effective techniques for improving system performance. By temporarily storing frequently accessed data, we can significantly reduce database load and improve response times for users. However, caching also brings challenges around consistency, invalidation, and memory management.

In this post, we will explore effective caching strategies and application-layer techniques for improving database performance.

Cache levels and cache invalidation patterns

Cache tiers in a modern architecture

In a modern system, caches can be deployed at several tiers:


  graph TD
    A[Client] --> B[Browser Cache]
    A --> C[CDN Cache]
    C --> D[API Gateway Cache]
    D --> E[Application Cache]
    E --> F[Database Cache]
    F --> G[Database]

- Browser Cache: stores static resources (JS, CSS, images) on the client
- CDN Cache: stores and serves static content close to users
- API Gateway Cache: caches responses to API calls
- Application Cache: caching at the application tier (in-memory, distributed cache)
- Database Cache: the database engine's buffer pool and, where the engine has one, its query cache

Each tier has its own characteristics and suits different kinds of data:

Cache tier	Lifetime	Scope	Suitable for
Browser	Long (days, weeks)	A specific user	Static resources, UI components
CDN	Long (hours, days)	All users	Static content, assets
API Gateway	Medium (minutes, hours)	All users	API responses, authentication
Application	Short to medium (seconds, minutes)	Per instance or per cluster	Business logic, computed data
Database	Short (milliseconds, seconds)	Database instance	Query results, index data

Cache strategies

There are several caching strategies, each with its own trade-offs:

Cache-Aside (Lazy Loading):
- The application checks the cache first; on a cache miss it reads from the database and then populates the cache
- Suited to read-heavy data that changes rarely
- Can lead to many simultaneous cache misses on the same key (thundering herd)


  sequenceDiagram
    participant App as Application
    participant Cache as Cache
    participant DB as Database

    App->>Cache: Get data
    alt Cache hit
        Cache->>App: Return data
    else Cache miss
        Cache->>App: Data not found
        App->>DB: Query data
        DB->>App: Return data
        App->>Cache: Store data
        App->>App: Process data
    end

/**
 * Get a user, using the cache
 *
 * @param int $userId
 * @return \App\Models\User|null
 */
public function getUser(int $userId)
{
    // Approach 1: the basic Cache facade
    $cacheKey = "user:{$userId}";

    // A single get() avoids the has()/get() race where the entry expires between the two calls
    $user = Cache::get($cacheKey);
    if ($user !== null) {
        return $user;
    }

    // Cache miss: load from the database
    $user = User::find($userId);

    // Store for later requests; null is skipped because get() cannot tell it apart from a miss
    if ($user !== null) {
        Cache::put($cacheKey, $user, now()->addHour()); // TTL: 1 hour
    }

    return $user;
}

/**
 * Approach 2: Cache::remember() keeps the code short
 *
 * @param int $userId
 * @return \App\Models\User|null
 */
public function getUserOptimized(int $userId)
{
    return Cache::remember("user:{$userId}", now()->addHour(), function () use ($userId) {
        return User::find($userId);
    });
}

/**
 * Approach 3: the Repository pattern with a cache
 */
class UserRepository
{
    protected $cache;

    public function __construct(\Illuminate\Contracts\Cache\Repository $cache)
    {
        $this->cache = $cache;
    }

    public function find(int $userId)
    {
        $cacheKey = "user:{$userId}";

        return $this->cache->remember($cacheKey, now()->addHour(), function () use ($userId) {
            // More complex logic can go here
            $user = User::find($userId);

            if ($user) {
                // Eager load relationships if needed
                $user->load('profile', 'roles');
            }

            return $user;
        });
    }

    // Forget the cached entry when the user is updated
    public function update(User $user, array $data)
    {
        $user->update($data);
        $this->cache->forget("user:{$user->id}");
        return $user;
    }
}

Entity caching in ORMs

Many ORM frameworks provide an entity caching mechanism to improve performance:

First-level cache (Session/Persistence Context):
- Scoped to a single session/transaction
- Automatic, no configuration needed
- Backs the identity map (a given entity is loaded only once per session)


  graph TD
    A[Application] --> B[ORM Session]
    B --> C[First-level Cache]
    B --> D[Database]

    C -->|Cache Hit| B
    D -->|Cache Miss| C
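
To make the identity-map behaviour concrete, here is a minimal sketch of a first-level cache. The `Session` class, its `find` method, and the `fake_load` loader are hypothetical names invented for illustration; they are not the API of any particular ORM.

```python
class Session:
    """Toy ORM session with a first-level cache (identity map)."""

    def __init__(self, load_row):
        self._load_row = load_row   # hypothetical loader: (entity, id) -> object
        self._identity_map = {}     # (entity, id) -> object already loaded in this session
        self.db_hits = 0            # how many times the loader (the "database") was called

    def find(self, entity, entity_id):
        key = (entity, entity_id)
        if key not in self._identity_map:
            self.db_hits += 1
            self._identity_map[key] = self._load_row(entity, entity_id)
        return self._identity_map[key]


def fake_load(entity, entity_id):
    # Stand-in for a real SELECT; returns a fresh dict on every call
    return {"type": entity, "id": entity_id}


session = Session(fake_load)
first = session.find("user", 1)
second = session.find("user", 1)   # served from the identity map, no second load
```

Both lookups return the very same object, which is what lets an ORM track changes to an entity in one place for the duration of the session.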

Second-level cache (Shared Cache):
- An application-level cache shared across many sessions
- Must be configured explicitly, usually through a provider such as EHCache, Redis, or Hazelcast
- Reduces database load when many users access the same data


  graph TD
    A1[Session 1] --> B[Second-level Cache]
    A2[Session 2] --> B
    A3[Session 3] --> B
    B --> C[Database]

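Laravel's Eloquent has no built-in second-level cache like Hibernate's; the closest equivalent is a shared cache store. An example of configuring Redis as that store: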

// config/cache.php
return [
    'default' => env('CACHE_DRIVER', 'redis'),

    'stores' => [
        'redis' => [
            'driver' => 'redis',
            'connection' => 'cache',
            'lock_connection' => 'default',
        ],
    ],

    'prefix' => env('CACHE_PREFIX', 'laravel_cache'),
];

// Laravel has no global "default TTL" setting; the TTL is passed directly to put/remember.
// A simple helper via a macro or config:

// app/Providers/AppServiceProvider.php
namespace App\Providers;

use Illuminate\Support\ServiceProvider;
use Illuminate\Support\Facades\Cache;

class AppServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        // Add a macro with a "default TTL" of 600 seconds, for use wherever needed
        Cache::macro('rememberDefault', function (string $key, \Closure $cb) {
            return Cache::remember($key, now()->addSeconds(600), $cb);
        });
    }
}

// Usage:
// $users = Cache::rememberDefault('users.active', fn () => User::active()->get());

// Or set the TTL directly with Laravel's documented API
Cache::put('key', $value, now()->addMinutes(10));
Cache::remember('users.1', 600, fn () => User::find(1));
Cache::forever('settings.global', $settings);  // no TTL; removed by forget/flush (or eviction by the store)

Note: Cache::setDefaultCacheTime() is not part of Laravel's documented cache API, and put/remember do not fall back to a default TTL, so do not rely on it (it may be confused with an internal helper from an older project). For a consistent TTL:
- Define a constant/config value (config/cache.php → 'default_ttl' => 600) and call config('cache.default_ttl') everywhere.
- Create a CacheService wrapper that handles TTL centrally.
- With Redis, use CONFIG SET maxmemory-policy volatile-ttl together with a default TTL at the application layer.

Other caching strategies

Besides Cache-Aside, there are other caching strategies:

Write-Through Cache:
- Data is written to the cache and the database in the same synchronous operation
- Keeps the cache consistent with the database
- Can slow down writes


  sequenceDiagram
    participant App as Application
    participant Cache as Cache
    participant DB as Database

    App->>App: Update data
    App->>Cache: Write data
    Cache->>DB: Write data
    DB->>Cache: Acknowledge
    Cache->>App: Acknowledge
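
The write-through flow can be sketched in a few lines. This is a toy under stated assumptions: a plain dict stands in for the database, and `WriteThroughCache` is a made-up class name, not a library API.

```python
class WriteThroughCache:
    """Toy write-through cache: every write reaches the store before it returns."""

    def __init__(self, store):
        self._store = store   # dict-like stand-in for the database
        self._cache = {}

    def write(self, key, value):
        self._store[key] = value   # synchronous write to the backing store
        self._cache[key] = value   # cache is updated only after the store write succeeded

    def read(self, key):
        if key in self._cache:
            return self._cache[key]
        value = self._store.get(key)
        if value is not None:
            self._cache[key] = value
        return value


database = {}
products = WriteThroughCache(database)
products.write("product:1001", {"price": 10})
```

Writing to the store first is a deliberate choice in this sketch: if that write fails, the cache never holds a value the database does not have.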

Write-Behind (Write-Back) Cache:
- Data is written to the cache first and persisted to the database later (asynchronously)
- Optimizes write performance
- Risks data loss if the cache fails before it has synced to the database


  sequenceDiagram
    participant App as Application
    participant Cache as Cache
    participant DB as Database

    App->>App: Update data
    App->>Cache: Write data
    Cache->>App: Acknowledge
    Note over Cache,DB: Asynchronously
    Cache->>DB: Write data (delayed)
    DB->>Cache: Acknowledge
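
A minimal sketch of write-behind, again with a dict standing in for the database. `WriteBehindCache` and its `flush` method are hypothetical names; a real system would call the flush from a timer or background worker and would need durability guarantees this toy does not have.

```python
from collections import OrderedDict


class WriteBehindCache:
    """Toy write-behind cache: writes are acknowledged first and persisted later."""

    def __init__(self, store):
        self._store = store           # dict-like stand-in for the database
        self._cache = {}
        self._dirty = OrderedDict()   # pending writes, oldest first

    def write(self, key, value):
        self._cache[key] = value
        self._dirty[key] = value      # repeated writes to one key coalesce into one
        self._dirty.move_to_end(key)

    def read(self, key):
        if key in self._cache:
            return self._cache[key]
        return self._store.get(key)

    def flush(self):
        """Persist all pending writes and return how many were written."""
        flushed = 0
        while self._dirty:
            key, value = self._dirty.popitem(last=False)
            self._store[key] = value
            flushed += 1
        return flushed


database = {}
counters = WriteBehindCache(database)
counters.write("views:42", 1)
counters.write("views:42", 2)          # overwrites the pending write
pending_before_flush = "views:42" in database
written = counters.flush()
```

Anything still in `_dirty` when the process dies is lost, which is exactly the data-loss risk listed above.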

Read-Through Cache:
- The cache itself loads data from the database on a cache miss
- The application talks only to the cache, never directly to the database
- Simplifies application logic


  sequenceDiagram
    participant App as Application
    participant Cache as Cache
    participant DB as Database

    App->>Cache: Get data
    alt Cache hit
        Cache->>App: Return data
    else Cache miss
        Cache->>DB: Query data
        DB->>Cache: Return data
        Cache->>App: Return data
    end
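
As a rough illustration of read-through, the cache below owns the loader function, so callers never touch the database directly. `ReadThroughCache` and `load_user` are invented for this sketch, and the 60-second TTL is an arbitrary assumption.

```python
import time


class ReadThroughCache:
    """Toy read-through cache: the cache itself calls the loader on a miss."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # hypothetical function: key -> value from the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                        # hit: entry is still within its TTL
        value = self._loader(key)                  # miss or expired: load through the cache
        self._entries[key] = (value, now + self._ttl)
        return value


loader_calls = []

def load_user(user_id):
    loader_calls.append(user_id)   # records each simulated database query
    return {"id": user_id}


users = ReadThroughCache(load_user)
users.get(7)
users.get(7)   # second call is served from the cache
```

Compared with cache-aside, the loading logic lives in one place instead of being repeated at every call site.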

Cache invalidation patterns

One of the biggest challenges in caching is keeping cached data in sync with the database. Several common patterns address this:

Time-based invalidation (TTL - Time To Live):
- Set an expiry time on each cache entry
- Simple and easy to implement
- Data can be stale for up to the TTL

// Set cache with TTL of 5 minutes
Cache::put("product:1001", $productData, now()->addMinutes(5));

Event-based invalidation:
- Delete or update the cache entry when a data-change event occurs
- Gives strong consistency
- More complex; requires a mechanism for tracking changes

function updateProduct($productId, $newData)
{
    // Update database
    Product::where('id', $productId)->update($newData);

    // Invalidate cache
    Cache::forget("product:{$productId}");
    Cache::forget("products:recent");
    Cache::forget("products:featured");
}

Version-based invalidation:
- Attach a version to each cache entry
- Increment the version when the data changes
- The cache key includes the version, so old versions are never read again and simply expire

function getProduct($productId)
{
    // Get current version
    $version = VersionService::getVersion("product", $productId);

    // Try to get from cache with version
    $cacheKey = "product:{$productId}:v{$version}";
    $product = Cache::get($cacheKey);

    if ($product === null) {
        $product = Product::find($productId);
        Cache::put($cacheKey, $product, now()->addHour());
    }

    return $product;
}

function updateProduct($productId, $newData)
{
    // Update database
    Product::where('id', $productId)->update($newData);

    // Increment version (no need to invalidate old cache)
    VersionService::incrementVersion("product", $productId);
}

Pattern-based invalidation:
- Delete many cache entries at once based on a pattern
- Useful when one change affects many cache entries
- In Laravel, cache tags group related entries so they can be flushed together

// Invalidate all product caches
Cache::tags(['products'])->flush();

// Invalidate all caches related to a specific category
Cache::tags(['category:electronics'])->flush();

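Warning: do NOT use redis.call('keys', ARGV[1]) in production. KEYS is O(N) in the total number of keys in the database and blocks Redis's main thread; on an instance with around 10M keys it can block for seconds. Use SCAN instead, which is cursor-based and does a bounded amount of work per call. A Lua script runs atomically, so a script that loops over the whole keyspace would block Redis just as KEYS does; the script below therefore performs one SCAN step per call and the client drives the loop:

-- Lua script: delete one SCAN batch matching a pattern
-- ARGV[1] = pattern, ARGV[2] = cursor ('0' on the first call)
-- On Redis versions before 5.0, call redis.replicate_commands() first
local result = redis.call('SCAN', ARGV[2], 'MATCH', ARGV[1], 'COUNT', 500)
local keys = result[2]
local count = 0
if #keys > 0 then
  count = redis.call('UNLINK', unpack(keys))  -- UNLINK = async DEL
end
return {result[1], count}  -- next cursor and number of keys removed

Used from PHP:

$script = file_get_contents(base_path('scripts/invalidate.lua'));
$cursor = '0';
$deleted = 0;

do {
    // The pattern must include any key prefix configured for the Redis connection
    [$cursor, $count] = Redis::eval($script, 0, 'product:*', $cursor);
    $deleted += $count;
} while ((string) $cursor !== '0');

UNLINK was introduced in Redis 4.0. It removes the key from the keyspace immediately and frees the memory in a background thread, so prefer it over DEL for batch invalidation.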

Preventing cache stampedes (thundering herd)

When a hot key expires and thousands of requests go straight to the database at once, you have a cache stampede. Five techniques are commonly combined:

1. TTL jitter: spread expiry times randomly by ±10-20%:

$ttl = 3600 + random_int(-600, 600);   // 60 ± 10 minutes
Cache::put($key, $value, $ttl);

2. Early refresh ("recompute before expire"): refresh before the entry expires, with a probability that rises as expiry approaches (XFetch / probabilistic early expiration):

import math
import random
import time

def get_or_refresh(key, ttl, recompute, beta=1.0):
    # cache.get_with_meta / set_with_meta are placeholder helpers that store the
    # value, the time the last recompute took (delta), and the expiry timestamp
    value, delta, expiry = cache.get_with_meta(key)
    # 1.0 - random.random() lies in (0, 1], so math.log never receives 0
    if value is None or time.time() - delta * beta * math.log(1.0 - random.random()) >= expiry:
        t0 = time.time()
        value = recompute()
        cache.set_with_meta(key, value, ttl, delta=time.time() - t0)
    return value

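3. Mutex / single-flight: only one process is allowed to compute the key; the others wait:

import random
import time
import uuid

def get_with_lock(key, recompute, lock_ttl=5, max_wait=2.0):
    # `cache` and `redis` are placeholder clients; `redis` is assumed to be
    # created with decode_responses=True so get() returns str
    lock_key = f"lock:{key}"
    deadline = time.monotonic() + max_wait
    while True:
        value = cache.get(key)
        if value is not None:
            return value

        token = uuid.uuid4().hex  # identifies this holder of the lock
        if redis.set(lock_key, token, nx=True, ex=lock_ttl):
            try:
                value = recompute()
                cache.set(key, value, ttl=3600 + random.randint(-600, 600))
                return value
            finally:
                # Release only a lock we still own; it may have expired and been
                # taken by another process. Use a Lua compare-and-delete if this
                # check must be atomic.
                if redis.get(lock_key) == token:
                    redis.delete(lock_key)
        if time.monotonic() >= deadline:
            return recompute()  # waited long enough: compute directly
        time.sleep(0.05)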

4. Negative caching: cache empty results / 404s too, with a short TTL (30-60 seconds), so repeated requests for missing data (including deliberate DoS attempts) do not reach the database each time:

import json
import random

def get_user(user_id):
    key = f"user:{user_id}"
    v = cache.get(key)
    if v == "__NULL__":
        return None
    if v is not None:
        return json.loads(v)

    user = db.users.find_one(user_id)
    if user is None:
        cache.set(key, "__NULL__", ttl=60)   # negative cache entry with a short TTL
    else:
        cache.set(key, json.dumps(user), ttl=3600 + random.randint(-300, 300))
    return user

5. Stale-while-revalidate: serve the old copy to requests while a single worker refreshes in the background (a good fit for CDN/HTTP caches and frameworks such as fastify-cache or Next.js revalidate).
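
Stale-while-revalidate (technique 5) is the only item in this list without code, so here is a minimal in-process sketch. `SwrCache`, `fresh_for`, and `stale_for` are names invented for illustration; it assumes a single process with threads and is not how any specific CDN or framework implements the idea.

```python
import threading
import time


class SwrCache:
    """Toy stale-while-revalidate cache backed by an in-process dict."""

    def __init__(self, loader, fresh_for=60.0, stale_for=300.0, clock=time.monotonic):
        self._loader = loader
        self._fresh_for = fresh_for    # seconds an entry counts as fresh
        self._stale_for = stale_for    # extra seconds it may still be served while refreshing
        self._clock = clock
        self._entries = {}             # key -> (value, stored_at)
        self._refreshing = set()       # keys with a background refresh in flight
        self._lock = threading.Lock()

    def _store(self, key):
        try:
            value = self._loader(key)
            with self._lock:
                self._entries[key] = (value, self._clock())
            return value
        finally:
            with self._lock:
                self._refreshing.discard(key)

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            age = self._clock() - entry[1] if entry else None
            if entry and age < self._fresh_for:
                return entry[0]                      # fresh hit
            if entry and age < self._fresh_for + self._stale_for:
                if key not in self._refreshing:      # at most one refresh per key
                    self._refreshing.add(key)
                    threading.Thread(target=self._store, args=(key,), daemon=True).start()
                return entry[0]                      # stale, but served immediately
        return self._store(key)                      # miss or too old: load inline
```

Only the first request after the fresh window pays for a refresh, and even that request gets the stale value right away. Concurrent cold misses still each call the loader in this sketch, so in practice it is combined with the mutex technique above.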

ORM optimization and solving the N+1 query problem

The N+1 query problem

The N+1 problem is one of the most common causes of poor performance in applications that use an ORM. It occurs when:

1. The application runs 1 query to fetch a list of N records
2. It then runs N separate queries to fetch related data for each record


  sequenceDiagram
    participant App as Application
    participant DB as Database

    App->>DB: SELECT * FROM orders WHERE user_id = 123
    DB->>App: Return N orders

    loop For each order
        App->>DB: SELECT * FROM order_items WHERE order_id = ?
        DB->>App: Return order items
    end

An example with an ORM:

// The N+1 problem
$orders = Order::where('user_id', $userId)->get();

// Each order triggers 1 more query to fetch its items
foreach ($orders as $order) {
    $items = $order->items;  // Triggers lazy loading: 1 extra query
    echo "Order #{$order->id} has " . count($items) . " items";
}

Solutions to the N+1 problem

Eager Loading (JOIN / Fetch Join):
- Load the related data up front instead of lazily inside the loop
- ORMs with fetch joins use a JOIN and need 1 query; Eloquent's with() instead runs a second query with WHERE ... IN, so N+1 becomes 2

// Solution: eager loading with with() (2 queries: orders, then items WHERE order_id IN (...))
$orders = Order::with('items')
    ->where('user_id', $userId)
    ->get();

// No further queries run inside the loop
foreach ($orders as $order) {
    $items = $order->items;  // Already loaded; no extra query
    echo "Order #{$order->id} has " . count($items) . " items";
}

Batch Loading:
- Instead of N separate queries, use 1 query with an IN condition
- Reduces the query count from N+1 to 2

// Solution: batch loading
$orders = Order::where('user_id', $userId)->get();

// Collect all order ids
$orderIds = $orders->pluck('id')->toArray();

// Run 1 query to fetch the items for all orders
$allItems = OrderItem::whereIn('order_id', $orderIds)->get();

// Group items by order_id
$itemsByOrder = [];
foreach ($allItems as $item) {
    if (!isset($itemsByOrder[$item->order_id])) {
        $itemsByOrder[$item->order_id] = [];
    }
    $itemsByOrder[$item->order_id][] = $item;
}

// Use the batch-loaded data
foreach ($orders as $order) {
    $items = $itemsByOrder[$order->id] ?? [];
    echo "Order #{$order->id} has " . count($items) . " items";
}
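
To see the difference in query counts without an ORM, the sketch below replays both approaches against an in-memory SQLite database and records each statement with `sqlite3`'s trace callback. The table layout and sample rows are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, name TEXT);
    INSERT INTO orders (id, user_id) VALUES (1, 123), (2, 123), (3, 123);
    INSERT INTO order_items (order_id, name) VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")

statements = []
conn.set_trace_callback(statements.append)   # records every SQL statement executed from here on


def items_n_plus_one(user_id):
    orders = conn.execute("SELECT id FROM orders WHERE user_id = ?", (user_id,)).fetchall()
    grouped = {}
    for (order_id,) in orders:               # one extra query per order
        rows = conn.execute("SELECT name FROM order_items WHERE order_id = ?", (order_id,))
        grouped[order_id] = [name for (name,) in rows]
    return grouped


def items_batched(user_id):
    orders = conn.execute("SELECT id FROM orders WHERE user_id = ?", (user_id,)).fetchall()
    ids = [order_id for (order_id,) in orders]
    grouped = {order_id: [] for order_id in ids}
    if ids:
        placeholders = ",".join("?" * len(ids))
        rows = conn.execute(
            f"SELECT order_id, name FROM order_items WHERE order_id IN ({placeholders})", ids
        )
        for order_id, name in rows:          # one query for all orders, grouped in memory
            grouped[order_id].append(name)
    return grouped


statements.clear()
lazy_result = items_n_plus_one(123)
n_plus_one_count = len(statements)

statements.clear()
batched_result = items_batched(123)
batched_count = len(statements)
```

Both functions return the same grouping; only the number of recorded statements differs, and the gap grows with the number of orders.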

Subquery Loading:
- Use a subquery to fetch the related data
- Suited to large one-to-many relationships

// Solution: Laravel has no separate subquery loading mode; constrained eager loading is the closest equivalent
$orders = Order::with(['items' => function($query) {
    $query->orderBy('created_at', 'desc');
}])->where('user_id', $userId)->get();

// No further queries run inside the loop
foreach ($orders as $order) {
    $items = $order->items;  // Already loaded
    echo "Order #{$order->id} has " . count($items) . " items";
}

Other ORM optimization techniques

Using projections:
- Select only the columns you need instead of all of them
- Reduces the amount of data transferred from the database to the application

// Instead of
$users = User::all();

// Select only the required columns
$users = User::select('id', 'name', 'email')->get();

Pagination:
- Page through results instead of fetching everything at once
- Reduces memory usage and improves response time

$page = 1;
$pageSize = 20;

$users = User::orderBy('created_at', 'desc')
    ->skip(($page - 1) * $pageSize)
    ->take($pageSize)
    ->get();

// Or use Laravel's paginate
$users = User::orderBy('created_at', 'desc')->paginate($pageSize);

Bulk Operations:
- Use bulk inserts/updates instead of processing records one by one
- Fewer queries and better performance

// Instead of
foreach ($users as $user) {
    $user->status = 'active';
    $user->save();
}

// Use a bulk update (note: this bypasses Eloquent model events)
User::whereIn('id', $users->pluck('id')->toArray())
    ->update(['status' => 'active']);

Using native queries for complex queries:
- ORMs sometimes fail to generate optimal SQL for complex queries
- Native SQL can improve performance noticeably

// Instead of a complicated ORM API call
$activeUsersWithManyOrders = DB::select("
    SELECT u.id, u.name, COUNT(o.id) as order_count
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
    WHERE u.status = 'active'
    GROUP BY u.id, u.name
    HAVING COUNT(o.id) > 5
    ORDER BY order_count DESC
    LIMIT 10
");

Combining caching and ORM optimization

For the best performance, combine caching with ORM optimization:

Cache query results:
- Cache the results of complex or frequently used queries
- Invalidate the cache when the data changes

function getTopProducts($categoryId)
{
    $cacheKey = "top_products:{$categoryId}";

    // Try to get from cache
    $products = Cache::get($cacheKey);
    if ($products !== null) {
        return $products;
    }

    // Cache miss, query with optimized ORM
    $products = Product::with('reviews')
        ->where('category_id', $categoryId)
        ->orderBy('rating', 'desc')
        ->limit(10)
        ->get();

    // Store in cache
    Cache::put($cacheKey, $products, now()->addMinutes(30));  // TTL: 30 minutes

    return $products;
}

Using the cache to reduce the N+1 problem:
- Cache frequently accessed entities
- Combine batch loading with the cache

use Illuminate\Support\Facades\Cache;

function getOrderWithItems($orderId)
{
    // Try the cache first for the order
    $order = Cache::get("order:{$orderId}");
    if ($order === null) {
        $order = Order::find($orderId);
        if ($order === null) {
            return null;  // No such order: nothing to cache or attach
        }
        Cache::put("order:{$orderId}", $order, now()->addHour());
    }

    // Try the cache first for the items
    $items = Cache::get("order_items:{$orderId}");
    if ($items === null) {
        $items = OrderItem::where('order_id', $orderId)->get();
        Cache::put("order_items:{$orderId}", $items, now()->addHour());
    }

    // Attach the items to the order
    $order->setRelation('items', $items);

    return $order;
}

Cache with the right pattern, handle stampedes, and invalidate consistently

Caching and application-layer optimization are two important strategies for improving system performance. By applying suitable caching techniques and optimizing the ORM, we can significantly reduce database load and improve response times for users.

However, caching also brings challenges around consistency and memory management. Choosing the right caching strategy and invalidation pattern is essential to keeping the system efficient and reliable.

Frequently asked questions

My cache miss rate is very high (~80%). Is caching still worth it?

Answer: If the data changes constantly (real-time feeds, stock prices), a high miss rate is normal, so consider dropping the cache. If the data is stable but misses are high, check for a TTL that is too short, key cardinality that is too high (one key per user), or evictions caused by too little memory.
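
A quick back-of-the-envelope model can help with this decision. The sketch below assumes every read checks the cache first and a miss then pays for the database query as well; it ignores the cost of writing the entry back. The 1 ms and 20 ms latencies are made-up example numbers, not measurements.

```python
def expected_latency_ms(hit_ratio, cache_ms, db_ms):
    """Average read latency when every request checks the cache first."""
    # hit: cache_ms; miss: cache_ms + db_ms
    return cache_ms + (1.0 - hit_ratio) * db_ms


def break_even_hit_ratio(cache_ms, db_ms):
    """Hit ratio above which the cache beats querying the database directly."""
    return cache_ms / db_ms


with_cache = expected_latency_ms(hit_ratio=0.2, cache_ms=1.0, db_ms=20.0)
threshold = break_even_hit_ratio(cache_ms=1.0, db_ms=20.0)
```

Under these assumed numbers, even a 20% hit ratio lowers average latency (17 ms against 20 ms), because the cache lookup is cheap relative to the query. The case for dropping the cache is strongest when lookups are not cheap or when stale data is costly.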

Redis `KEYS *` is slow in production. What should replace it?

Answer: SCAN (cursor-based, bounded work per call). KEYS blocks Redis because commands run on a single thread; with 10M keys that can mean a freeze of seconds. Likewise, use UNLINK instead of DEL for batch deletes (memory is freed asynchronously).

When does a cache stampede (thundering herd) happen?

Answer: When a hot key expires and hundreds of requests query the database at the same moment, overloading it. Solutions: TTL jitter (add a random ±10% to the TTL), a mutex/lock (only 1 request rebuilds the cache), or stale-while-revalidate (serve old data while refreshing).

Next post: Optimizing SQL databases, with in-depth MySQL InnoDB and PostgreSQL tuning.