Mixing C and PHP code in a PHP extension is a long-awaited feature. The maintainability benefits are obvious, and it is now widely agreed that porting a lot of non-performance-critical C code to PHP would be welcome. With the PHP 7 performance improvements, even more code becomes a potential candidate for a port to PHP.
In theory, including PHP code in an extension and executing it has been possible for a long time. Unfortunately, two important issues have been hard to solve:
- A PHP script run from a memory buffer (using zend_compile_string() for instance) cannot be cached by opcode caches. Code must be re-compiled every time it is loaded.
- Exposing symbols (classes, functions, constants) to the PHP user layer requires loading every script at RINIT time, even if these features are not used in the request.
Those constraints may be acceptable when considering two or three scripts, but we are potentially considering hundreds of scripts, recompiled from scratch at the beginning of every request (note that embedding the MongoDB library in the MongoDB extension, for instance, represents about 60 scripts).
For these reasons, despite some exceptions, mixing C and PHP code in an extension is very rare today.
Unlike other PHP extensions, PCS does not expose features to the user space, but provides a service to other extensions. The schema below shows how PCS interacts with other extensions and the PHP core:
Client extensions can interact with PCS after the PHP code registration step but, in most cases, they don't: they fully delegate the management of their PHP code to PCS.
Let's see how PCS solves the two issues described above:
Opcode caches, like any cache, require a key for each object they cache. So, trapping zend_compile_string() is not an option, as there is no way to get a key under which to cache the compiled contents. Scripts must therefore be executed via zend_compile_file(), and we need to provide unique and persistent paths to identify each of them. PCS maintains a stream wrapper for this. This stream wrapper, using the 'pcs://' prefix, maintains a tree of virtual files registered by the client extensions. As these files cannot be overwritten, a given stream-wrapped path is guaranteed to always correspond to the same contents.
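To make the write-once idea concrete, here is a minimal language-neutral sketch in Python. It is a toy model, not the PCS implementation: the `VirtualFileTree` class, its methods, and the `pcs://<extension>/<relative path>` layout are assumptions made for illustration.

```python
class VirtualFileTree:
    """Toy model of a write-once tree of embedded files (hypothetical, not the PCS API)."""

    PREFIX = "pcs://"

    def __init__(self):
        self._files = {}  # URI -> file contents

    def register(self, ext_name, rel_path, contents):
        # Assumed layout: pcs://<extension>/<relative path>
        uri = f"{self.PREFIX}{ext_name}/{rel_path}"
        if uri in self._files:
            # Refusing overwrites is what makes the URI a safe cache key:
            # one URI can only ever map to one version of the contents.
            raise ValueError(f"{uri} is already registered")
        self._files[uri] = contents
        return uri

    def read(self, uri):
        return self._files[uri]


tree = VirtualFileTree()
uri = tree.register("mongodb", "src/Client.php", "<?php class Client {}")
print(uri)  # pcs://mongodb/src/Client.php
```

Because a second registration under the same URI is rejected, anything keyed on the URI can never be served stale contents.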
This is the first required step, but it is not enough. When it detects a 'stream-wrapped' path, an opcode cache has no way to know whether the path should be cached or not. Some should, like 'pcs://' ones, but many are transient by nature and must not be cached. Today, the 'logic' is to cache everything belonging to the plain-file and 'phar' wrappers, and to ignore the rest. The easy way would be to add 'pcs' to the list, but I'd rather not go the 'phar' way. So, an additional stream operation, named 'cache_key', will be proposed soon for inclusion in the PHP core. This operation will be used by opcode caches to ask stream wrappers whether a given URI must be cached, and which key to use (the key may potentially differ from the URI).
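The difference between the two approaches can be sketched as follows. This is only a Python model of the decision logic. The real 'cache_key' operation is still a proposal and its signature is not defined here, so every function name below, and the choice of 'http' as an example of a transient wrapper, is an assumption.

```python
def scheme_of(uri):
    """Return the wrapper scheme of a URI, or 'file' for a plain path."""
    head, sep, _ = uri.partition("://")
    return head if sep else "file"


# Behaviour described above: a fixed list of cacheable wrappers.
HARDCODED_CACHEABLE = {"file", "phar"}


def cache_key_today(uri):
    return uri if scheme_of(uri) in HARDCODED_CACHEABLE else None


# Proposed behaviour (modelled): each wrapper answers for its own URIs.
def pcs_cache_key(uri):
    return uri  # registered files never change, so the URI is a stable key


def http_cache_key(uri):
    return None  # transient contents: never cache


WRAPPER_CACHE_KEY = {
    "file": lambda uri: uri,
    "pcs": pcs_cache_key,
    "http": http_cache_key,
}


def cache_key_proposed(uri):
    handler = WRAPPER_CACHE_KEY.get(scheme_of(uri))
    # A wrapper that does not implement the operation is never cached.
    return handler(uri) if handler else None
```

With this shape, the opcode cache no longer needs to know wrapper names: a `None` answer means 'do not cache', and any other value is the key to use.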
Several approaches were considered to avoid loading everything at the beginning of each request:
- Some consider that, using PHP 7 speed improvements and the opcache extension, the overhead induced by script loading at the beginning of each request will remain negligible. I have no measurements proving or disproving such claims. If the measured overhead is really negligible for several hundreds of scripts, we may decide to remove the whole autoloading stuff from PCS. Unfortunately, this would require changing most of the registration API because script load order could not be managed transparently anymore.
- Concatenating the scripts and loading a single big script is not possible because of the different namespaces potentially used by the scripts.
- Persistent user classes/functions/constants open the 'persistence' can of worms. This goes far beyond our current need and would require years of discussion and flame wars.
So, PCS combines these constraints and uses two load mechanisms:
- PHP scripts defining classes/interfaces/traits only are autoloaded,
- and scripts that define functions and/or constants are registered at RINIT time.
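The rule above can be written as a small decision function. This is a Python sketch, not PCS code: the mode names are made up for illustration, and the behaviour for a script that defines no symbol at all is an assumption, since the rule does not cover that case.

```python
CLASS_LIKE = {"class", "interface", "trait"}
EAGER = {"function", "constant"}


def load_mode(symbol_kinds):
    """Pick a load mode from the kinds of symbols a script defines."""
    kinds = set(symbol_kinds)
    unknown = kinds - CLASS_LIKE - EAGER
    if unknown:
        raise ValueError(f"unknown symbol kinds: {sorted(unknown)}")
    if kinds & EAGER:
        # Any function or constant forces a load at RINIT time,
        # even if the script also defines classes.
        return "rinit"
    if kinds:
        return "autoload"  # classes/interfaces/traits only
    return "none"  # assumption: a script defining nothing is never loaded


print(load_mode(["class", "trait"]))     # autoload
print(load_mode(["class", "function"]))  # rinit
```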
The reasons:
- The overhead introduced by a fast map-based autoloader is near-zero, as the map is stored in persistent memory,
- Most APIs exposed today in the PHP world are object-oriented (the MongoDB library, for instance, contains 56 fully object-oriented scripts, and only one defining functions),
- If the overhead introduced by RINIT loads becomes unacceptable, we can easily extend the autoloader to functions and constants in a minor release (this can be done with no BC break).
Note that the autoloader is based on a symbol map. File paths and names are unconstrained, and there's no limit to the number of classes/functions/constants defined in a single file. Symbols are automatically extracted from the PHP source at registration time.
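A symbol-map autoloader of this kind can be modelled in a few lines of Python. This is a rough sketch under stated assumptions: the regular expressions are a deliberately naive stand-in for real symbol extraction (they only see one `namespace X;` declaration and top-level class-like declarations), and `SymbolMap` with its methods is a hypothetical name, not part of PCS.

```python
import re

# Naive patterns, for illustration only: a real extractor needs a PHP tokenizer.
NAMESPACE_RE = re.compile(r"^\s*namespace\s+([A-Za-z_][\w\\]*)\s*;", re.MULTILINE)
CLASS_RE = re.compile(
    r"^\s*(?:abstract\s+|final\s+)?(?:class|interface|trait)\s+([A-Za-z_]\w*)",
    re.MULTILINE,
)


def extract_class_symbols(source):
    """Return the fully qualified class-like names declared in a script."""
    ns = NAMESPACE_RE.search(source)
    prefix = ns.group(1) + "\\" if ns else ""
    return [prefix + m.group(1) for m in CLASS_RE.finditer(source)]


class SymbolMap:
    """Maps symbol names to the virtual file that defines them."""

    def __init__(self):
        self._map = {}

    def register(self, uri, source):
        # Symbols are extracted once, at registration time.
        for name in extract_class_symbols(source):
            # PHP class names are case-insensitive, so keys are lowercased.
            self._map[name.lower()] = uri

    def resolve(self, class_name):
        # Called by the autoloader: which file defines this class?
        return self._map.get(class_name.lstrip("\\").lower())


symbols = SymbolMap()
symbols.register(
    "pcs://mongodb/src/Client.php",
    "<?php\nnamespace MongoDB;\n\nclass Client {}\ninterface Marker {}\n",
)
print(symbols.resolve("MongoDB\\Client"))  # pcs://mongodb/src/Client.php
```

Since the lookup goes through the map and never derives a path from the class name, the file name does not have to match the class name, and one file can define several classes.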
You may also note that:
- The structure of original file trees is preserved. So, relative paths (prefixed with '__DIR__/') may be used to access other files in the tree.
- PCS also allows embedding non-script files (aka 'resource' files). Such files are recognized as not containing a PHP script and are never loaded automatically by PCS. They may be accessed through the stream wrapper like any other file of the environment (a potential example is the embedded magic database).
- The automatic determination of load modes at registration time may be bypassed by the calling extension. Generally, this feature will be used to disable automatic loading of scripts when it is handled by another mechanism managed by the client.
Several use cases have been suggested in past discussions:
- Integrating the MongoDB library in the MongoDB extension is the subject of this tutorial,
- Most, if not all, of the generic PDO layer might be rewritten in PHP,
- Adding some high-level date handling,
- Adding high-level crypto code,
- Easily adding an OO API on top of function-only legacy extensions,
- and probably much, much more...