From 600 Queries to 15: How We Fixed a Legacy PHP ERP N+1 Problem

A sales team complained their quote page was hanging for 10 seconds. Nothing in the code had changed. The culprit turned out to be a single line in the MySQL 8.0 changelog — and fixing it meant applying three layers of caching to a codebase that pre-dated modern PHP frameworks.

One morning, the sales team started filing complaints. The quote edit page — the most-used screen in the ERP — was hanging for 9 to 10 seconds every time someone changed a value in the parts grid.

Not crashing. Not throwing an error. Just sitting there, spinning.

The frustrating part: nothing in the application code had changed. No deployments, no config edits, no schema migrations. The codebase was identical to what had been running fine for years.

What had changed was the database server. We had just finished migrating from a self-managed MariaDB instance to a managed MySQL 8.0 service. Everything worked — except the quote page.

The reason turned out to be one line in the MySQL 8.0 changelog that we had not paid enough attention to:

MySQL 8.0 removed the query cache entirely.

MariaDB's built-in query cache had been silently absorbing hundreds of repeated identical queries, serving them from RAM in microseconds. The moment we moved to MySQL 8.0, every single one of those queries hit the query engine for real. For a page that fired 600–700 queries per load, the cost was immediate.

This post is a full account of how we diagnosed the problem and fixed it — the right approaches, the wrong approaches, and the tradeoffs we knowingly accepted.


Understanding the Quote Page

To understand the fix, you need to understand what the page was doing.

A quote in this system can have 70+ line items. Each line is a part from a specific brand, with a price, a cost, a supplier, stock status, incoming order status, and a dozen other computed fields. When a salesperson edits any cell in the grid, the browser fires an AJAX request to the server. The server receives the entire grid state as a POST payload, recomputes every line item from scratch, and returns the updated grid.

That phrase — "recomputes every line item from scratch" — is the problem.

For each of those 70 lines, the original code was doing something like this:

foreach ($line_items as $k => $v) {
    $brand = $v['brand'];
    $sku   = $v['sku'];

    // Is this part in our catalogue?
    $product = db_fetch_row("SELECT * FROM ims_products
                             WHERE brand_code = '$brand'
                             AND sku = '$sku'");

    // What are this brand's pricing tiers?
    $brand_data = db_fetch_row("SELECT * FROM ims_brands
                                WHERE brand_id = '$brand'");

    // Does this part have any special category flags?
    $flag_data = db_fetch_row("SELECT * FROM ims_product_flags
                               WHERE brand_id = '$brand'
                               AND sku = '$sku'");

    // What stock do we have? What's incoming? Any backorders?
    $stock = resolveStockAvailability($sku, $brand, $warehouse, $qty);

    // ... and so on for another eight queries
}

And resolveStockAvailability() was itself firing four raw database calls per invocation — one against the stock table, three against the order lines table. For a 70-line quote, that function alone accounted for 280 queries. None of them cached.

In total: roughly 600–700 database queries per page load.

On MariaDB with query cache, most of those were served in microseconds from memory. On MySQL 8.0, every one of them went all the way to the query engine.


The Fix: Three Layers of Caching

We did not rewrite the page. We did not refactor the data model. We applied three distinct caching layers, each targeted at a different category of repeated query.


Layer 1 — In-Memory KV Cache for Shared Queries

The codebase already had a helper function, cached_query(), that wrapped any SELECT with a short-lived key-value cache:

function cached_query(string $sql, int $ttl = 60, $cache = null): array {
    if (!preg_match('/^\s*SELECT\s/i', trim($sql))) {
        return db_run_query($sql); // Never cache writes
    }

    if ($cache === null) {
        return db_run_query($sql); // No cache client supplied — fall back to a direct query
    }

    $key    = 'erp:qc:' . md5(trim(strtolower($sql)));
    $cached = $cache->get($key);

    if ($cached !== false) {
        return json_decode($cached, true); // Cache hit
    }

    $rows = db_run_query($sql);
    $json = json_encode($rows);

    if (strlen($json) <= 1048576) { // Don't cache payloads over 1 MiB
        $cache->setex($key, $ttl, $json); // Cache miss — store result
    }

    return $rows;
}

The three core helper functions — db_fetch_row(), db_fetch_value(), db_count() — were already wired to call cached_query() internally. Any query going through those helpers was automatically cached with zero changes to the call sites.
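
For illustration, the wiring might look roughly like this — an assumed shape rather than the exact helpers; the real ones also supply the shared KV client and handle escaping and errors:

function db_fetch_row(string $sql, $cache = null): array|false {
    $rows = cached_query($sql, 60, $cache); // Cached only when $sql is a SELECT
    return $rows[0] ?? false;               // First row, or false when empty
}

function db_fetch_value(string $sql, $cache = null) {
    $row = db_fetch_row($sql, $cache);
    return $row ? reset($row) : false;      // First column of the first row
}

function db_count(string $sql, $cache = null): int {
    return (int) db_fetch_value($sql, $cache); // Expects a SELECT COUNT(...) query
}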

The problem: resolveStockAvailability() was using raw db_query() calls directly. It completely bypassed the caching layer. That is where the real work began.


Layer 2 — PHP Static Arrays for Reference Data

Before touching the big N+1 patterns, we looked for the cheapest wins first.

The brands table has around 500 rows. The suppliers table around 200. Both were being queried individually inside the main loop — once per line item. For a 70-line quote, that meant up to 140 round-trips fetching data from tables that do not change during a request.

The fix: a PHP static global, loaded once on first access and served from memory for every subsequent call within the same request.

function get_brand(string $brand_id): array|false {
    global $_brand_cache;

    if (!isset($_brand_cache)) {
        $_brand_cache = [];
        $r = db_query(
            "SELECT brand_id, brand_type, sales_code, tier_a, tier_b
             FROM ims_brands"
        );
        while ($row = db_fetch_assoc($r)) {
            $_brand_cache[$row['brand_id']] = $row;
        }
    }

    return $_brand_cache[$brand_id] ?? false;
}

One query for all brands. One query for all suppliers. Done for the entire request.
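
At each call site inside the loop, the per-line brand query then collapses to a plain array lookup:

// Inside the main loop — served from the static array, no DB call per line
$brand_data = get_brand($brand);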

One deliberate choice here: we listed only the columns the quote page actually uses, rather than SELECT *. Loading 500 full brand rows with every column would waste memory on data the page never reads. The tradeoff is that a future call site needing an unlisted column will silently get null — easy to fix, but requires knowing the behaviour exists.


Layer 3 — Batch Preloads with IN() Clauses

This was the highest-impact change. Instead of querying once per part inside the loop, we query once for all parts before the loop starts.

The pattern: collect all unique brands and SKUs from the request payload, then fire a single IN() query and store the results in a two-level PHP array.

function preload_products(array $line_items): void {
    global $_product_cache, $_product_lookup;

    $brands = [];
    $skus   = [];

    foreach ($line_items as $v) {
        $brand = trim_input($v['brand']);
        $sku   = normalize_sku(strtoupper(trim_input($v['sku'])));
        $brands[$brand] = true;
        $skus[$sku]     = true;
        $_product_lookup[$brand][$sku] = true;
    }

    $brand_ids = implode("','", array_map('db_escape', array_keys($brands)));
    $sku_list  = implode("','", array_map('db_escape', array_keys($skus)));

    $result = db_query(
        "SELECT * FROM ims_products
         WHERE brand_code IN ('$brand_ids')
         AND (sku IN ('$sku_list')
              OR alt_sku IN ('$sku_list'))
         ORDER BY id ASC"
    );

    while ($row = db_fetch_assoc($result)) {
        // Key by the primary SKU, and by the alternate SKU when present,
        // so a lookup by either identifier resolves to the same row
        $_product_cache[$row['brand_code']][$row['sku']] = $row;
        if (!empty($row['alt_sku'])) {
            $_product_cache[$row['brand_code']][$row['alt_sku']] = $row;
        }
    }
}

We applied this pattern to five tables — the product catalogue, category flags, refurbished stock, reorder flags, and the live stock table — plus three queries against the order lines table.

The main loop then looks like this:

// All preloads fire before the loop
preload_products($line_items);
preload_flags($line_items);
preload_stock($line_items);
preload_order_lines($line_items);
// ... etc.

foreach ($line_items as $k => $v) {
    // Everything is now served from PHP arrays
    // Zero per-line database calls
}


The Mistake We Made First

Our first implementation of the batch queries used OR pairs instead of IN() clauses:

-- WRONG — defeats composite indexes
WHERE (brand_code = 'ACME'   AND sku = 'BRK-0042')
   OR (brand_code = 'ACME'   AND sku = 'FLT-1138')
   OR (brand_code = 'GLOBEX' AND sku = 'EXH-7700')
   -- ... 70 more pairs

The page got slower. With a long chain of OR'd pairs, MySQL stopped using the composite index on (brand_code, sku) and fell back to a full table scan.

The correct form uses IN() on each column independently:

-- RIGHT — uses indexes
WHERE brand_code IN ('ACME', 'GLOBEX', ...)
AND (sku IN ('BRK-0042', 'FLT-1138', 'EXH-7700', ...)
  OR alt_sku IN ('BRK-0042', 'FLT-1138', 'EXH-7700', ...))

This fetches cross-product rows — for example, ACME paired with every SKU in the list — rather than exact (brand, SKU) pairs. The result set is slightly larger than necessary, but because the PHP array is keyed by both brand and SKU, false positives are filtered naturally at lookup time without any extra logic.
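
In code, that filtering is nothing more than the shape of the lookup — a hypothetical helper here, but the array it reads is the one built in preload_products():

// A cross-product row such as ACME × EXH-7700 is simply never asked for:
// the loop only looks up the (brand, SKU) pairs that came in the payload.
function get_preloaded_product(string $brand, string $sku): array|false {
    global $_product_cache;
    return $_product_cache[$brand][$sku] ?? false;
}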


The Static Result Cache Inside _resolveStockAvailability

resolveStockAvailability() in the shared functions file is called from 40+ places across the codebase — reports, admin pages, scheduled jobs. We could not safely modify it in place.

A static result cache added to a function used by cron jobs processing thousands of parts would cause unbounded memory growth. So we created a separate function, _resolveStockAvailability(), scoped to the quote page only. It is copied from the original and modified to:

  1. Use the batch preloads for stock and order line lookups.
  2. Cache its own results within the request using a static local array.

function _resolveStockAvailability(
    string $sku,
    string $brand,
    string $warehouse,
    int    $qty
): array {
    static $result_cache = [];

    $key = "{$brand}|{$sku}|{$warehouse}|{$qty}";
    if (isset($result_cache[$key])) {
        return $result_cache[$key];
    }

    // Serve from preloaded arrays — no DB call
    $stock  = get_preloaded_stock($sku, $brand);
    $orders = get_preloaded_order_lines($sku, $brand);

    $result = compute_stock_summary($stock, $orders, $warehouse, $qty);

    $result_cache[$key] = $result;
    return $result;
}

The static result cache handles a subtle edge case: the quote page calls the function in two separate loops (the main parts loop and a secondary fulfilment table loop), so the same (SKU, brand, warehouse, qty) combination can appear more than once within a single request. Without the cache, both loops would redundantly recompute the same result even after the batch preloads were in place.

The original function in the shared file was left completely untouched.


The Honest Tradeoffs

No optimisation of this kind is free. Here is exactly what we knowingly accepted.

Variant SKUs (minor, fixable)
Some parts are stored with a brand suffix appended to their SKU (e.g., BRK-0042-ACME). These modified identifiers are not in the preload — the preload only knows about base SKUs from the request payload. When _resolveStockAvailability is called with a suffixed SKU, the preload returns nothing and there is no DB fallback. The stock indicator for those parts shows zero. The fix is a targeted DB fallback on preload miss, at a cost of roughly 150–250ms. It is a known gap, not an unknown one.
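
A sketch of what that fallback-aware lookup could look like — purely illustrative, not shipped code; the table name and SELECT are assumptions:

// Hypothetical fallback-aware version of get_preloaded_stock().
// On a preload miss (e.g. a suffixed variant SKU like BRK-0042-ACME),
// fire one targeted query and remember the answer for the rest of the request.
function get_preloaded_stock(string $sku, string $brand): array|false {
    global $_stock_cache;

    if (isset($_stock_cache[$brand][$sku])) {
        return $_stock_cache[$brand][$sku]; // Normal path: batch-preloaded row
    }

    $row = db_fetch_row("SELECT * FROM ims_stock
                         WHERE brand_code = '" . db_escape($brand) . "'
                         AND sku = '" . db_escape($sku) . "'");

    $_stock_cache[$brand][$sku] = $row;     // Cache the result, even a miss (false)
    return $row;
}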

Related parts section (accepted compromise)
The page includes a secondary table showing parts frequently bought alongside items in the quote. For each related part, the code calls _resolveStockAvailability to show stock at the main warehouse. These related SKUs are not in the preload. Fully preloading them would require up to 200+ individual queries or a very large IN() query, adding 1–2 seconds. The team accepted showing a dash for related part stock as a reasonable tradeoff, since it does not affect the main stock count that salespeople actually act on.

Main stock counts are accurate
The primary stock count per line — the number shown in the main grid — is fully accurate. The preload query is semantically identical to the original per-part query: same filters, same ordering, same exclusions. The two-level array key structure means cross-product false positives from the IN() clause never corrupt results for the wrong brand.


The Numbers

For a 70-line quote:

Query type                    Before             After
Brands                        70 individual      1 static load
Suppliers                     ~140 individual    1 static load
Product catalogue             70 individual      1 batch IN()
Category flags                70 individual      1 batch IN()
Reorder flags                 ~80 individual     1 batch IN()
Live stock (raw, uncached)    70 individual      1 batch IN()
Incoming order lines          70 individual      1 batch IN()
Problem/missing lines         70 individual      1 batch IN()
Customer backorders           70 individual      1 batch IN()
Total (approx)                600–700            ~15

Response time: 9–10 seconds → 3–4 seconds.


What This Really Reveals

The N+1 pattern is extremely common in mature procedural codebases. It is not the result of carelessness — it is the natural consequence of writing readable, self-contained code over many years, where each function fetches exactly what it needs. The problem only becomes visible when the caching layer that was silently absorbing the cost disappears.

A few things worth taking away:

Query cache dependency is invisible debt. You do not know your code relies on it until the cache is gone. A database migration is not the time to discover you have 600 queries per page load.

"We have a caching layer" is not enough. The helpers going through the KV cache sounded like a complete solution. They were not — direct db_query() calls bypass every helper. The actual bottleneck was in code that never touched the caching layer. Knowing your cache layer exists is not the same as knowing that all your queries go through it.

Batch preloads beat per-item caching for hot loops. Caching per query still requires a network round-trip to the cache service per query. For 70 identical patterns within one request, a single IN() query served from a PHP array is faster than 70 cache round-trips.

Isolation matters. The temptation was to modify the shared resolveStockAvailability() directly. We resisted it. Modifying a function called from 40+ files — including long-running cron jobs — to serve the performance needs of one page would have been a dangerous global change with unpredictable side effects. Copying and specialising kept the risk contained.


What Comes Next

The remaining 3–4 seconds are now dominated by three things: a data enrichment function called once per part from within _resolveStockAvailability; per-row order line lookups that go through the KV cache but are still called one at a time in a secondary loop; and the general PHP processing time for rendering the grid payload.

The variant SKU gap is the next concrete fix — a single targeted fallback query when a preload miss is detected. The related parts section is a larger project: the clean solution is replacing the per-part function call with a single batch query against the stock table for just the one field that section actually displays.
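
A hedged sketch of that batch query for the related parts section — the table, column, and helper names are assumptions about what the clean version could look like:

// Hypothetical: one batch query for the single figure the related-parts table
// shows (stock at the main warehouse), replacing one per-part function call.
function preload_related_stock(array $related, string $warehouse): array {
    $brands = implode("','", array_map('db_escape', array_column($related, 'brand')));
    $skus   = implode("','", array_map('db_escape', array_column($related, 'sku')));

    $result = db_query(
        "SELECT brand_code, sku, qty_available FROM ims_stock
         WHERE warehouse = '" . db_escape($warehouse) . "'
         AND brand_code IN ('$brands')
         AND sku IN ('$skus')"
    );

    $stock = [];
    while ($row = db_fetch_assoc($result)) {
        $stock[$row['brand_code']][$row['sku']] = (int) $row['qty_available'];
    }
    return $stock;
}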

600 queries to 15. The page that blocked a team is usable again. And for the first time, we actually know exactly how many queries it fires.

Usama Hafeez

Senior Software Engineer
