PHP performance tuning is one of those areas where the gap between what people think matters and what actually matters is enormous, and this guide is built around closing that gap with practical, measurable techniques. This page targets developers searching for concrete guidance on profiling PHP applications, configuring OPcache and JIT compilation, implementing data caching with Redis or Memcached, optimising database interactions, and tuning PHP-FPM for production workloads. The content here comes from years of profiling production applications serving millions of requests - work where guessing at bottlenecks costs real money. Below you will find a profiling methodology that actually works, opcode and data caching strategies, N+1 query detection, Composer autoloader optimisation, and PHP-FPM pool tuning with concrete numbers.
For performance considerations specific to Zend Framework applications, see the Survive the Deep End guides hub.
Profiling Methodology: Measure Before You Optimise
The single most expensive performance mistake is optimising without profiling. I have watched developers spend a week rewriting a function that accounts for 0.3% of total request time while a database query consuming 60% of the budget sits untouched. Always profile first.
A sound profiling workflow looks like this:
- Reproduce the slow behaviour in a controlled environment.
- Capture a profile of the request or process.
- Identify the hottest code paths - the functions or operations consuming the most wall time.
- Form a hypothesis about why the hot path is slow.
- Make a targeted change.
- Profile again to verify the change had the expected effect.
- Repeat until response time meets your target.
Never skip step 6. Optimisations that seem obvious sometimes have no measurable effect, and occasionally they make things worse due to unexpected interactions.
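The verification step is cheap to script. A minimal sketch of a before/after timing check, using `hrtime()` (the function name `medianMs` and the wrapped callable are illustrative, not part of any framework):

```php
<?php

// Median wall time of a callable over several runs, in milliseconds.
// Median is used rather than mean so a single slow outlier run
// does not distort the comparison.
function medianMs(callable $fn, int $runs = 5): float
{
    $times = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);
        $fn();
        $times[] = (hrtime(true) - $start) / 1e6; // nanoseconds -> ms
    }
    sort($times);
    return $times[intdiv($runs, 2)];
}

// Usage (hypothetical): record medianMs(fn () => renderHomepage())
// before the change, apply the change, then measure again.
```

Run it against the same code path before and after a change; if the two medians are indistinguishable, the optimisation did not work, no matter how obvious it seemed.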
Xdebug Profiling with KCachegrind and Webgrind
Xdebug’s profiler generates cachegrind-format files that capture every function call, its execution time, and its memory usage. Enable it selectively - never leave profiling on in production.
Configure Xdebug 3.x for on-demand profiling:
```ini
[xdebug]
xdebug.mode=profile
xdebug.start_with_request=trigger
xdebug.output_dir=/tmp/xdebug
```
With start_with_request=trigger, profiling only activates when the request includes the XDEBUG_PROFILE cookie or query parameter. Install a browser extension to toggle this conveniently.
Open the generated cachegrind files in KCachegrind (Linux) or QCachegrind (Windows/macOS) for visual call-graph analysis. The call graph shows you exactly which functions call which other functions and how much time each consumes. Look for:
- Functions with high “self” time (time spent in the function itself, not in callees)
- Functions called an unusually high number of times
- Deep call stacks that suggest excessive abstraction layers
Webgrind is a web-based alternative that requires less setup. Drop it into a directory accessible by your local web server, point it at your profile output directory, and you get a sortable table of function calls. It lacks the visual call graph but is faster to get running.
Blackfire.io for Production Profiling
Blackfire takes a different approach: it instruments the PHP runtime with a C extension that has near-zero overhead when not actively profiling. This makes it safe to install in production and trigger profiles on demand.
The workflow with Blackfire:
- Install the Blackfire agent and PHP probe on your server.
- Trigger a profile from the browser extension, CLI tool, or API.
- View the profile in the Blackfire web interface, which provides call graphs, timeline views, and comparison tools.
The comparison feature is where Blackfire earns its keep. Profile a request before your change and after. Blackfire shows you exactly which functions got faster, which got slower, and the net impact. This eliminates guesswork from the optimisation cycle.
Blackfire also supports automated performance testing through its .blackfire.yaml configuration, where you define assertions like “this endpoint must respond in under 200ms” and run them in CI. This catches performance regressions before they reach production.
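A minimal `.blackfire.yaml` sketch of such an assertion (the test name and path pattern are illustrative):

```yaml
tests:
    "Entry list endpoint must stay fast":
        path: "/entries.*"
        assertions:
            - "main.wall_time < 200ms"
```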
OPcache Configuration for PHP 8.x
OPcache compiles PHP source files into opcode and caches the compiled representation in shared memory. This eliminates the parsing and compilation overhead on every request. On a typical application, enabling OPcache reduces response time by 30-50% with zero code changes.
Recommended production settings for PHP 8.x:
```ini
[opcache]
opcache.enable=1
opcache.memory_consumption=256
opcache.max_accelerated_files=20000
opcache.validate_timestamps=0
opcache.save_comments=1
```
Key decisions:
- memory_consumption=256: 256 MB is generous for most applications. Monitor actual usage with opcache_get_status() and adjust. If you see frequent cache restarts, increase this value.
- max_accelerated_files=20000: the default of 10000 is often too low for framework-based applications. Symfony and Laminas applications can easily have 15000+ unique files. Set this higher than your actual file count.
- validate_timestamps=0: in production, disable timestamp validation. OPcache will never check whether source files have changed on disk. This improves performance but means you must restart PHP-FPM (or call opcache_reset()) after each deployment to pick up new code.
- save_comments=1: some frameworks and libraries read annotations from docblocks at runtime. If yours does, keep this enabled. If not, disabling it saves a small amount of memory.
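Monitoring boils down to a little arithmetic over the statistics that opcache_get_status() returns. A sketch, assuming OPcache is enabled (the helper name is my own):

```php
<?php

// Derive the OPcache hit rate from the 'opcache_statistics' array
// returned by opcache_get_status(). A healthy production cache
// should sit above 99%.
function opcacheHitRate(array $stats): float
{
    $total = $stats['hits'] + $stats['misses'];
    return $total > 0 ? ($stats['hits'] / $total) * 100 : 0.0;
}

// In production (requires OPcache to be enabled):
// $stats = opcache_get_status(false)['opcache_statistics'];
// printf("hit rate: %.2f%%, OOM restarts: %d\n",
//     opcacheHitRate($stats), $stats['oom_restarts']);
```

A rising `oom_restarts` count is the signal mentioned above that `memory_consumption` needs to grow.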
JIT Compilation in PHP 8.x
PHP 8.0 introduced a JIT (Just-In-Time) compiler that compiles opcodes into native machine code. The performance impact depends heavily on your workload:
- CPU-bound code (mathematical operations, string processing, tight loops): JIT can provide significant speedups, sometimes 2-3x.
- I/O-bound code (database queries, API calls, file operations): JIT provides minimal benefit because the bottleneck is not PHP execution speed.
Most web applications are I/O-bound. The time spent waiting for MySQL or Redis dwarfs the time spent executing PHP code. For typical web applications, JIT adds complexity without measurable improvement.
If your application does heavy computation (image processing, report generation, data transformation), enable JIT and measure:
```ini
opcache.jit=1255
opcache.jit_buffer_size=128M
```

Note that opcache.jit_buffer_size must be non-zero - with the default of 0, the JIT stays disabled regardless of the opcache.jit setting.
The 1255 value enables the tracing JIT with optimistic type specialisation. Profile with and without JIT on your actual workload. Do not trust synthetic benchmarks.
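A quick way to run such a measurement on your own hardware is a CPU-bound micro-task (this is an illustrative workload of my own, not a benchmark of anything in particular): run it once with opcache.jit_buffer_size=0 and once with the JIT enabled, and compare.

```php
<?php

// Illustrative CPU-bound workload for A/B-testing the JIT:
// naive trial-division prime counting in a tight loop.
function countPrimes(int $limit): int
{
    $count = 0;
    for ($n = 2; $n <= $limit; $n++) {
        for ($d = 2; $d * $d <= $n; $d++) {
            if ($n % $d === 0) {
                continue 2; // not prime, try the next candidate
            }
        }
        $count++;
    }
    return $count;
}

$start = hrtime(true);
countPrimes(200000);
printf("%.1f ms\n", (hrtime(true) - $start) / 1e6);
```

If the two runs are within noise of each other, your workload is not CPU-bound enough for the JIT to matter.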
Data Caching with Redis and Memcached
Once you have eliminated unnecessary computation through profiling and OPcache, the next layer is caching computed results so you do not repeat expensive work.
Redis is the default choice for most applications. It supports data structures (strings, hashes, lists, sets, sorted sets), persistence, replication, and Lua scripting. For PHP, the phpredis extension provides a fast C-based client:
```php
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$key = 'entry:front-page:list';
$cached = $redis->get($key);

if ($cached === false) {
    $entries = $entryRepository->findPublished(); // the expensive work
    $redis->setEx($key, 300, serialize($entries)); // cache for 5 minutes
} else {
    $entries = unserialize($cached);
}
```
Cache invalidation patterns that work in practice:
- TTL-based expiry: set a time-to-live on every cache entry. Accept that data may be stale by up to the TTL duration. This is the simplest approach and works for most read-heavy scenarios.
- Write-through invalidation: when data changes, explicitly delete or update the cache entry. This keeps the cache fresh but requires discipline to invalidate every cache key affected by a write operation.
- Cache tags: group related cache entries under tags and invalidate by tag. Redis does not support this natively, but you can implement it with sets. Many PHP caching libraries (Symfony Cache, Laravel Cache) provide tag support out of the box.
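The tag pattern described above can be sketched without any server dependency (an in-memory stand-in of my own, not phpredis; the comments note the Redis set commands each step maps onto):

```php
<?php

// Minimal tag-based cache invalidation. With Redis, each tag would be
// a set of keys maintained via SADD / SMEMBERS / DEL.
final class TaggedCache
{
    private array $values = [];
    private array $tags = [];   // tag => [key, key, ...]

    public function set(string $key, mixed $value, array $tags = []): void
    {
        $this->values[$key] = $value;
        foreach ($tags as $tag) {
            $this->tags[$tag][] = $key;   // Redis: SADD tag:$tag $key
        }
    }

    public function get(string $key): mixed
    {
        return $this->values[$key] ?? null;
    }

    public function invalidateTag(string $tag): void
    {
        foreach ($this->tags[$tag] ?? [] as $key) { // Redis: SMEMBERS tag:$tag
            unset($this->values[$key]);             // Redis: DEL $key
        }
        unset($this->tags[$tag]);
    }
}
```

One write-path call (`invalidateTag('entries')`) then clears every cached view of the entries data, which is the discipline problem that per-key invalidation creates.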
Memcached is simpler than Redis and slightly faster for pure key-value lookups. Use it when you need nothing more than get and set with TTL. It lacks persistence, data structures, and replication, but its simplicity is a feature in high-throughput scenarios.
Database Query Analysis and N+1 Detection
Database queries are the performance bottleneck in the majority of PHP applications I have profiled. The two most common problems are missing indexes and N+1 queries.
Missing indexes: enable the MySQL slow query log with a low threshold (100ms or even 50ms) and review it regularly. Run EXPLAIN on every slow query to check whether it uses indexes effectively. Look for type: ALL (full table scan) and rows counts that seem disproportionate to the result set.
```sql
EXPLAIN SELECT e.*, a.name AS author_name
FROM entries e
JOIN authors a ON a.id = e.author_id
WHERE e.status = 'published'
ORDER BY e.created_at DESC;
```
If the entries table lacks an index on (status, created_at), this query scans every row. Adding a composite index drops the query from hundreds of milliseconds to under one millisecond:
```sql
CREATE INDEX idx_entries_status_created ON entries (status, created_at DESC);
```
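Queries like this only show up if the slow query log is actually on. A my.cnf fragment to enable it with the thresholds suggested above (the log file path is illustrative):

```ini
[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 0.1
log_queries_not_using_indexes = 1
```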
N+1 queries: this pattern occurs when code fetches a list of records and then executes a separate query for each record to load related data. The classic example:
```php
$entries = $entryRepository->findPublished(); // 1 query

foreach ($entries as $entry) {
    // 1 additional query per entry - the "+N"
    $author = $authorRepository->find($entry->getAuthorId());
    echo $entry->getTitle() . ' by ' . $author->getName();
}
```
For 20 entries, this executes 21 queries. Fix it with a join or a batch load:
```php
$entries = $entryRepository->findPublishedWithAuthors(); // 1 query with JOIN
```
Or batch the author lookup:
```php
$entries = $entryRepository->findPublished(); // 1 query

$authorIds = array_unique(array_map(
    fn ($entry) => $entry->getAuthorId(),
    $entries
));
// 1 query: SELECT ... FROM authors WHERE id IN (...)
$authors = $authorRepository->findByIds($authorIds);
```
To detect N+1 queries systematically, log all queries during a request and look for repeated patterns with different parameters. Tools like Clockwork, Laravel Debugbar, or a custom PDO wrapper that counts queries can surface these patterns automatically.
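A minimal sketch of such a PDO wrapper (the class name, normalisation rule, and table are illustrative assumptions): it records every statement, then flags query shapes that repeat suspiciously often within one request.

```php
<?php

// Wraps a PDO connection, counts queries, and surfaces repeated
// query shapes - the signature of an N+1 problem.
final class QueryCounter
{
    private array $queries = [];

    public function __construct(private PDO $pdo)
    {
    }

    public function query(string $sql, array $params = []): array
    {
        $this->queries[] = $sql;
        $stmt = $this->pdo->prepare($sql);
        $stmt->execute($params);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }

    /**
     * Query shapes executed at least $threshold times. Literal numbers
     * and placeholders are normalised to 'N' so that
     * "WHERE id = 1" and "WHERE id = 2" count as the same shape.
     */
    public function repeatedPatterns(int $threshold = 5): array
    {
        $normalized = array_map(
            fn (string $sql) => preg_replace('/\d+|\?/', 'N', $sql),
            $this->queries
        );
        return array_filter(
            array_count_values($normalized),
            fn (int $count) => $count >= $threshold
        );
    }
}
```

Dump `repeatedPatterns()` at the end of each request in development; a shape with a count near your result-set size is almost always an N+1.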
Composer Autoloader Optimisation
Composer’s default autoloader checks the filesystem on every class load. In production, generate an optimised classmap:
```shell
composer dump-autoload --optimize --classmap-authoritative
```
The --optimize flag generates a full classmap for PSR-4 and PSR-0 namespaces, eliminating filesystem checks. The --classmap-authoritative flag tells the autoloader that if a class is not in the classmap, it does not exist - do not fall back to filesystem scanning.
On applications with thousands of classes, this reduces autoload time measurably. Combined with OPcache (which caches the autoloader itself), the per-request cost of autoloading drops to nearly zero.
One caveat: --classmap-authoritative breaks any code that generates classes at runtime (some ORMs, proxy generators, and template engines do this). If your application generates classes dynamically, use --optimize without --classmap-authoritative.
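Both flags can also be made the default for every `composer install` via the config section of composer.json:

```json
{
    "config": {
        "optimize-autoloader": true,
        "classmap-authoritative": true
    }
}
```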
PHP-FPM Tuning
PHP-FPM (FastCGI Process Manager) is the layer between your web server and PHP. Misconfigured FPM pools are responsible for more production outages than most teams realise.
Key settings in your pool configuration (www.conf or equivalent):
```ini
pm = dynamic
pm.max_children = 50
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
pm.max_requests = 500
```
How to calculate pm.max_children: determine the peak memory usage of a single PHP process (check memory_get_peak_usage() or monitor with ps). Divide your available server memory (minus what the OS, database, and web server need) by that number.
Example: server has 8 GB RAM. OS and services use 2 GB. MySQL uses 2 GB. That leaves 4 GB for PHP. If each PHP process peaks at 80 MB, you can run 4096 / 80 = 51 children. Set pm.max_children = 50 to leave a small buffer.
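The same arithmetic as a trivial helper (the headroom factor is my own convention for the "small buffer", not an FPM setting):

```php
<?php

// Compute pm.max_children from the memory left for PHP and the peak
// memory of one worker. headroom < 1.0 leaves a safety buffer.
function maxChildren(int $availableMb, int $peakPerProcessMb, float $headroom = 0.95): int
{
    return (int) floor(($availableMb / $peakPerProcessMb) * $headroom);
}

// maxChildren(4096, 80) gives 48 - in line with the 50 used above.
```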
pm.max_requests = 500: this recycles worker processes after 500 requests, which prevents memory leaks from accumulating. Set this to a value that balances recycle overhead against memory stability. Lower values recycle more often but keep memory usage tighter.
Static vs. dynamic pool mode: pm = static pre-forks all children at startup. It uses more memory when idle but avoids the overhead of forking under load. Use static for high-traffic applications with predictable load. Use dynamic for applications with variable traffic where you want to reduce idle memory consumption.
Real-World Measurement
After making optimisations, measure their actual impact in production. Application Performance Monitoring tools capture response times, throughput, error rates, and resource utilisation across real user requests.
If you do not have an APM tool, at minimum track these metrics:
- P50 and P95 response times from your access logs (not averages - averages hide outliers)
- PHP-FPM active process count over time
- OPcache hit rate via opcache_get_status() (should be above 99%)
- Database query count per request from your query logger
- Cache hit rate from Redis INFO stats (look at keyspace_hits vs keyspace_misses)
Set up alerts on these metrics. A sudden drop in cache hit rate or a spike in query count per request tells you something changed before users start complaining. Performance is not a one-time project - it is a continuous practice of measuring, understanding, and improving. For guidance on configuring the local tooling you need to make profiling part of your daily workflow, see the guide on local development environments for legacy PHP.