
The Mysteries Of Asynchronous Processing With PHP – Part 3: Implementation With Spawned Child Processes Using Simple Scripts Or Zend Framework

In Part 1 of this series, we started an exploration of the concept of Asynchronous Processing as it applies to PHP. We covered the benefits it offers, the basic implementation directions often applied, and also discussed how to identify and separate tasks from the main application so they could be made subject to asynchronous processing. It is highly recommended that you read Part 1 before continuing with Part 3 so you can follow the discussion below.

With the theory-heavy portion of the series out of the way, we can begin to explore the various implementation possibilities. In this part, we will examine implementing Asynchronous Processing using a child process, i.e. a separate PHP process we create from our application during a request. We’ll analyse this implementation option before introducing the source code so we may understand its advantages and disadvantages.

Note: If you have already read Part 2 of the series concerning enabling CLI access to Zend Framework applications, please note it has been subsequently edited to include an improved copy of the BootstrapCli.php and zfrun.php files. The change allows for more dynamic control over command line option definitions from within a controller’s action which makes the source code in this Part workable.

Advantages Of Asynchronous Processing With Child Processes

Performing asynchronous processing using a child process is probably one of the simplest methods. It involves a parent process (usually the one serving the current application request) spawning an immediate child process which will continue running in the background even when the parent process has exited. This child process can be a simple PHP process called from the command line to execute a standalone script or initiate a CLI request to a fuller application. Typically you can avoid the web server altogether (strongly recommended), which spares these child processes the overhead of a web server dispatch. Because it is triggered by the application itself as part of a request, a child process is an excellent means of performing asynchronous processing on the spot, without incurring the delays sometimes associated with using a Job Queue coupled with a scheduled task manager (cron, for example, has a minimum interval of one minute).

To this advantage we can add its simplicity of operation. Since the child process is spawned from within the application itself, it requires very little surrounding code. Just add the task to a script and away you go. Creating the child process in the first place is easily done using pipes, process management or inline execution.

For security reasons, I prefer to avoid direct execution of a task, i.e. using the exec() function, since it can add a risk of code injection and in general is considered extremely dangerous (unfortunately you may not have another choice in a shared host environment). Process management (using proc_open() and proc_close()) or pipes (using popen() and pclose()) are much preferred, though the process management functions are typically not suitable for use in a web environment and are generally reserved for use in CLI applications. Bear in mind that popen() and proc_open() also hand the command string to the system shell, so any value interpolated into a command must be escaped with escapeshellarg() whichever function you choose.

Using these functions within your applications is extremely simple, as we’ll see later.

Disadvantages Of Asynchronous Processing With Child Processes

Of course, simplicity does not mean a method is perfect. Child processes, when triggered within requests, bind the asynchronous tasks to the time of the request. If you think of it, any request may also spawn additional child processes. This means extra resources are being used, CPU cycles are needed and memory is consumed. As the number of requests to the application increases, so too will the number of resource needy child processes. It should also be noted that all child processes are created on the same server as the parent. There is no opportunity to offload work to another server.

This is exacerbated if your script requires bootstrapping, i.e. the process of initialising and configuring any resources needed by the script. In many cases, there is an inclination to depend on the application’s framework. You can imagine the resource needs of a script which, for example, is treated as a console call to a Zend Framework action. Now you not only need to contend with the resources needed by a simple script, but with the resources demanded by a complete application framework. This can push your server load to new and unnecessary heights. In fact you may easily find that up to 90%+ of a script’s work is actually just bootstrapping. Note that I’m not saying to never use a framework – merely that it comes with a cost to be aware of.

This is all obviously a problem from a scaling perspective. It may even worsen a situation where the application already operates in a resource-scarce environment. So while ideally it buys us extra responsiveness to serve clients and users, it does so at the cost of a difficult-to-scale method that requires additional resources. It is also difficult to load balance as the child processes are spawned on the same server as the parent.

Nevertheless, for certain tasks it may be an excellent fit. Not all tasks are resource intensive. Some simply take a while to complete for other reasons and may be time sensitive, requiring completion as soon as possible. You’ll understand the trade offs in greater detail as the series continues to cover the alternative asynchronous strategies.

What About Forking?

In all our discussions, we’ve avoided mentioning an alternative to creating a brand new PHP process: forking. Using process forking (see the pcntl_fork() function) we are basically copying or cloning the parent process. This means the new process shares the parent’s resources, including its open database connections and all variables used up to the forking point in the application. This, in theory, avoids the setup costs associated with a from-scratch process since a fork is simply copying the parent which is already bootstrapped.

While there’s nothing fundamentally wrong with using forking, it is more difficult to manage than simple spawning due to the fact it requires additional management to ensure shared resources (e.g. a database connection) are not prematurely closed by any forked process when the parent or sibling forks are using the same connection (the same can be said for other shared resources). Often this still requires some limited bootstrapping to replace shared resources with separate instances invulnerable to premature closing, most commonly database connections.

We also need to manage the source code of the application to determine whether it’s being executed by the parent process or a forked process (both continue from the exact same execution point, the point where the forking occurred). Forking may also not be supported by the underlying system by default. For example, when PHP is used as an Apache module, pcntl support is usually disabled by default. Also, forking is not supported on Windows systems at all. Adding even more weight against forking, the parent ideally should not exit while forked processes are still running, since that leaves the children orphaned, and a parent that keeps running has to wait on each child so that finished children are not left behind as “zombie” processes.
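As a rough illustration of the parent/child check described above, here is a minimal sketch for the command line only. It assumes the pcntl extension is loaded (so it will not run on Windows or under most web server setups), and the messages it prints are purely illustrative.

[geshi lang=php]
<?php
// CLI-only sketch: requires the pcntl extension.
if (!function_exists('pcntl_fork')) {
    exit("pcntl extension not available\n");
}

$pid = pcntl_fork();

if ($pid === -1) {
    // The fork failed, so no child process was created.
    exit("Could not fork\n");
} elseif ($pid === 0) {
    // Child: pcntl_fork() returns 0 inside the forked copy.
    echo 'child running as PID ', getmypid(), "\n";
    exit(0);
} else {
    // Parent: $pid holds the child's process ID.
    // Waiting on the child reaps it so it does not linger as a zombie.
    pcntl_waitpid($pid, $status);
    echo 'parent saw child ', $pid, ' exit with status ',
        pcntl_wexitstatus($status), "\n";
}[/geshi]

Both branches run the same source file; only the return value of pcntl_fork() tells them apart, which is why the surrounding application code has to be written with that check in mind.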

Forking is a great means of achieving concurrency in an application where the parent remains in control of all forked processes. Concurrency refers to the ability to perform any number of simultaneous tasks which perhaps will interact with each other. This is somewhat related to asynchronous processing which is why I mention it. The interaction can occur at a few places, like shared memory, a commonly used file, or even a database.

The main reason I don’t address forking in great detail, however, is stated in the PHP manual itself:

Process Control should not be enabled within a web server environment and unexpected results may happen if any Process Control functions are used within a web server environment.

Most of my own applications use Apache or FastCGI somehow and forking in these scenarios is not recommended. If, on the other hand, you’re working from the CLI on Linux, forking can have some neat uses particularly for concurrency or when managing a daemon that is allowed to spawn and control worker processes. We’ll be covering daemons later in this series.

Basic Implementation

Describing it may be complicated, but implementing child process spawning is very simple. All we are doing in effect is putting out a call to the command line to execute a script using PHP. We also add the notation needed to make the process operate in the background (i.e. the method used to initiate the process can itself be immediately closed without any consequences). We must close the resulting pointer to allow the parent to continue its own processing so it can deliver a response without waiting for the child process to finish.

We’ll start with a basic example using pipes:

[geshi lang=php]if (PHP_OS == 'WINNT' || PHP_OS == 'WIN32') {
    // Windows: "start /b" runs the command without opening a new window
    $ppointer = popen('start /b php c:\\www\\myapp\\scripts\\deliver_registration_email.php', 'r');
} else {
    // *nix: discard output and push the command to the background with "&"
    $ppointer = popen('php /var/www/myapp/scripts/deliver_registration_email.php > /dev/null &', 'r');
}
pclose($ppointer);[/geshi]

In the above code example, we are using two functions to open a “process file pointer” (identical to a file pointer) to a new process we execute using a typical command line call. The command used differs between *nix and Windows for obvious reasons, but both are pushed to the background using “&” in Linux or “start /b” in Windows. The base command is simply “php”: since this does not require a web server dispatch, we can use the PHP CLI directly. You could call this code from anywhere in the application.

Just to be absolutely clear, there is nothing which says this MUST be a php call. You could use a batch file or bash script, or practically any other command line tool on your system, so long as it’s operable as a background process.

The second parameter to popen() is familiar from the basic file functions. It means that the pointer returned from the function is a file pointer, just as if we’d used fopen() with the same parameter. This also means we can use it with other file functions, however we intend cutting the child process loose as soon as possible so the parent process need not wait for it (and besides, it’s being executed in the background).
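To make the “it’s just a file pointer” point concrete, here is a small sketch that does the opposite of what we want for asynchronous work: it keeps the pointer open and reads the child’s output before closing it. It assumes a Unix-like shell and that the PHP_BINARY constant (PHP 5.4+) points at a CLI-capable executable; the one-line child script is made up for the demonstration.

[geshi lang=php]
<?php
// Assumption: PHP_BINARY is a php executable that accepts the -r switch.
$command = escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg('echo 6 * 7;');

$handle = popen($command, 'r');         // 'r': we read what the child writes
$output = stream_get_contents($handle); // blocks until the child closes its output
pclose($handle);                        // waits for the child to terminate

echo 'child printed: ', $output, "\n";[/geshi]

Because the read blocks until the child finishes, this pattern only suits short-lived commands. For background tasks, stick to redirecting the output and closing the pointer immediately, as in the example above.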

The lack of parameters used in the command does raise the question of how the script finds out what email needs to be sent. There are a few ways of handling this, among them are passing a reference to a database user record to get the email and status from, passing the email address itself, or using something more deliberate like a queue implementation to keep it lightweight (perhaps using memcached, APC or a dedicated message queue for overkill ;) ).

Another parameter concern is indicating an operation mode: production, development, staging or testing. Usually we’d communicate this in an application using an environmental value set perhaps in an application’s .htaccess file or its virtual host configuration for Apache. However, a script operates from the command line so these would obviously not be available.

Solving the first is done easily enough:

[geshi lang=php]$email = 'joe@example.com';

if (PHP_OS == 'WINNT' || PHP_OS == 'WIN32') {
    $ppointer = popen('start /b php c:\\www\\myapp\\scripts\\deliver_registration_email.php --email ' . escapeshellarg($email), 'r');
} else {
    $ppointer = popen('php /var/www/myapp/scripts/deliver_registration_email.php --email ' . escapeshellarg($email) . ' > /dev/null &', 'r');
}
pclose($ppointer);[/geshi]

We just add a parameter to the script containing the user’s email address, allowing the script to parse this from the arguments (via $_SERVER['argv'] which contains a flat array of all space delimited terms in the command line arguments).

Solving the second can also be accomplished using another parameter which tells the script to manually set a matching environmental value during any bootstrapping. We can probably accomplish this to match the normal application mode using its current value. For example it may be set as $_ENV['APPLICATION_ENV'] from the app’s .htaccess file or virtual host configuration.

[geshi lang=php]$email = 'joe@example.com';

if (PHP_OS == 'WINNT' || PHP_OS == 'WIN32') {
    $ppointer = popen('start /b php c:\\www\\myapp\\scripts\\deliver_registration_email.php --email ' . escapeshellarg($email) . ' --environment=' . escapeshellarg($_ENV['APPLICATION_ENV']), 'r');
} else {
    $ppointer = popen('php /var/www/myapp/scripts/deliver_registration_email.php --email ' . escapeshellarg($email) . ' --environment=' . escapeshellarg($_ENV['APPLICATION_ENV']) . ' > /dev/null &', 'r');
}
pclose($ppointer);[/geshi]

So we now have the basics for calling a script. Let’s look at a simple script example itself. I’m deliberately not complicating it beyond the absolute essentials, but I’ll delve into a much better example in the next section where we apply what we’ve learned to the Zend Framework (a more realistic application structure).

[geshi lang=php]
<?php

// remove script path (first array element)
array_shift($_SERVER['argv']);

// compile options (assume all are key/value pairs)
$options = array();
while (count($_SERVER['argv']) > 1) {
    $key = str_replace('--', '', array_shift($_SERVER['argv']));
    $options[$key] = array_shift($_SERVER['argv']);
}

mail(
    $options['email'],
    'Thanks for registering!',
    'Thank you for registering. You may now log into your new account.'
);

// In case manually run - echo feedback to command line
echo 'Email to ', $options['email'], ' sent.', "\n";[/geshi]

The script first parses out any options passed. Specifically it’s looking for the --email option. The parsing logic is very simple since we assume it must be --email, not a shorter form, and all options are always in key/value pairs. Typically, it’s recommended to use PEAR’s Console_CommandLine or even ZF’s Zend_Console_Getopt. I actually prefer the PEAR package where possible: it’s a much more functional and feature rich option though slightly more complicated to set up.
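If pulling in a library feels like too much for a small script, the hand-rolled parsing can at least be made a little more forgiving. The following is only a sketch: parseOptions() is a hypothetical helper name, not something from the original script, and it accepts both the “--key value” and “--key=value” forms used by the spawning examples earlier.

[geshi lang=php]
<?php
/**
 * Hypothetical helper: turn an argv-style array into an options array.
 * Accepts "--key value", "--key=value" and bare "--flag" terms.
 */
function parseOptions(array $args)
{
    $options = array();
    array_shift($args); // drop the script path
    while (count($args) > 0) {
        $arg = array_shift($args);
        if (strpos($arg, '--') !== 0) {
            continue; // ignore anything that is not a long option
        }
        $arg = substr($arg, 2);
        if (strpos($arg, '=') !== false) {
            list($key, $value) = explode('=', $arg, 2);
        } else {
            $key = $arg;
            // use the next term as the value unless it is another option
            $hasValue = count($args) > 0 && strpos($args[0], '--') !== 0;
            $value = $hasValue ? array_shift($args) : true;
        }
        $options[$key] = $value;
    }
    return $options;
}

$options = parseOptions(array(
    'deliver_registration_email.php',
    '--email', 'joe@example.com',
    '--environment=development',
));
print_r($options);[/geshi]

In a real script you would pass $_SERVER['argv'] instead of the literal array. Anything more involved (short options, validation, help output) is where the libraries mentioned above earn their keep.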

Once we get the email address from the command line options, we simply fire out an email with a simple registration message using mail(). In the next example, we’ll use an email library which is a better method.

Basic Implementation With Zend Framework

As covered in Part 2 of this series, using the Zend Framework from the command line allows you to run asynchronous tasks from any Zend Framework application. All that is required is ensuring the relevant Controller and Action for the task cannot be reached from the web, so public access is impossible.
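The same concern applies to the standalone scripts from the previous section if they happen to live somewhere the web server can reach. A minimal sketch of a guard, assuming the script should only ever be started by the CLI binary; isCommandLine() is a hypothetical helper name used here for illustration:

[geshi lang=php]
<?php
/**
 * Hypothetical helper: true when the given SAPI name is the CLI.
 * PHP_SAPI reports "cli" for command line runs and names such as
 * "apache2handler" or "cgi-fcgi" for web requests.
 */
function isCommandLine($sapi = PHP_SAPI)
{
    return $sapi === 'cli';
}

if (!isCommandLine()) {
    // Refuse to do any work when reached through the web server.
    header('HTTP/1.1 403 Forbidden');
    exit('This script may only be run from the command line');
}

// ... the task itself would follow here[/geshi]

Keeping task scripts outside the document root is still the simpler protection; the guard is just a second line of defence.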

We can graduate the above task to a Zend Framework application by adding it using a Controller. Here I’ve created the controller MailController with the action “registration” in /application/controllers/MailController.php:

[geshi lang=php]
<?php

class MailController extends Zend_Controller_Action
{

    public function init()
    {
        // refuse anything that did not arrive as a CLI request
        if (!$this->getRequest() instanceof ZFExt_Controller_Request_Cli) {
            exit('MailController may only be accessed from the command line');
        }
    }

    public function registrationAction()
    {
        // register the extra --email (-p) option, then parse the arguments
        $this->getInvokeArg('bootstrap')->addOptionRules(
            array('email|p=s' => 'Email address for task (required)')
        );
        $options = $this->getInvokeArg('bootstrap')->getGetOpt();

        $mail = new Zend_Mail();
        $mail->setBodyText('Thank you for registering. You may now log into your new account.')
            ->setFrom('me@example.com', 'Padraic Brady')
            ->addTo($options->email)
            ->setSubject('Thanks for registering!');
        try {
            $mail->send();
        } catch (Zend_Mail_Exception $e) {
            // mail probably failed - at this point add to a Job Queue to try
            // again later or analyse the failure.
        }
    }

}[/geshi]

You can, of course, manually call this from the command line using:

php /path/to/myapp/scripts/zfrun.php -c mail -a registration --email joe@example.com

Update (03 Oct): The generated command from the example will actually place single quotes around the argument values as part of the escaping mechanism I added. As Bruce Weirdan, quite rightly, points out in the comments, I really should have used escaping when this article was originally released.

The zfrun.php file was detailed in Part 2 of the series. It’s basically what you would find in the index.php file of a ZF application using Zend_Application for bootstrapping.

The above action uses a bit of magic from the bootstrap class (from Part 2) which enables it to dynamically set additional valid arguments available from its command line call. Here, we supplement the default arguments (mainly those needed for setting the MVC values of controller, action, module and environment) with an additional --email argument (which can be shortened to -p, i.e. just a reference to it being a lone “p”arameter). The parameter is marked as being a required string (=s) argument when it’s added.

The rest is simply using the bootstrap’s copy of Zend_Console_Getopt to parse the arguments with updated rules, so we can later grab the email address value for use with a typical example of Zend_Mail. The exception check should yield an exception if the email, for whatever reason, failed to be sent. To be perfectly honest, I don’t use Zend_Mail (I’m a Swiftmailer user ;) ), so if there’s a better way of getting the context of a failure please let me know in the comments to this article.

If we assume we have a RegistrationController where registration takes place, we could trigger this new task asynchronously using the following:

[geshi lang=php]
<?php

class RegistrationController extends Zend_Controller_Action
{

    public function indexAction()
    {
        // perform registration processing here but delegate emailing to
        // an asynchronous process

        // ...

        $email = 'joe@example.com';
        if (PHP_OS == 'WINNT' || PHP_OS == 'WIN32') {
            $ppointer = popen('start /b php c:\\path\\to\\myapp\\scripts\\zfrun.php -c mail -a registration --email ' . escapeshellarg($email), 'r');
        } else {
            $ppointer = popen('php /path/to/myapp/scripts/zfrun.php -c mail -a registration --email ' . escapeshellarg($email) . ' > /dev/null &', 'r');
        }
        pclose($ppointer);

        // ...
    }

}[/geshi]

Now, repeating the same block of code with minor variations can be a pain. So, let’s simplify things a bit.

Simplifying The Zend Framework Controller Calls

Since all these Zend Framework based asynchronous tasks may coexist with non-ZF based tasks (which exist when we really don’t need the framework heavyweight for the task), we may end up with a lot of calls using the above method. In addition, there’s no guarantee all tasks would remain as is, operated by child processes spawned from the parent.

It would be more flexible to delegate the calling of asynchronous processes to an Action Helper which implements a simple API more amenable to diverting asynchronous processing to specific implementations (process spawn, job queue, daemon, etc.). Here’s an example of such a helper (simplified since we’ve only covered child process spawning so far!):

[geshi lang=php]
<?php

class ZFExt_Controller_Action_Helper_Spawn
    extends Zend_Controller_Action_Helper_Abstract
{

    // one-off script path, reset after every spawn
    protected $_scriptPath = null;

    // fallback script path used when no one-off path is set
    protected $_defaultScriptPath = null;

    public function setScriptPath($script = null)
    {
        if (PHP_OS == 'WINNT' || PHP_OS == 'WIN32') {
            $script = str_replace('/', '\\', $script);
        }
        $this->_scriptPath = $script;
        return $this;
    }

    public function setDefaultScriptPath($script)
    {
        if (PHP_OS == 'WINNT' || PHP_OS == 'WIN32') {
            $script = str_replace('/', '\\', $script);
        }
        $this->_defaultScriptPath = $script;
        return $this;
    }

    public function direct(array $parameters = null, $controller = null,
        $action = null, $module = null)
    {
        if (is_null($parameters)) {
            $parameters = array();
        } else {
            // escape every value before it reaches the shell
            foreach ($parameters as $key => $value) {
                $parameters[$key] = escapeshellarg($value);
            }
        }
        if ($module) {
            $parameters['-m'] = escapeshellarg($module);
        }
        if ($controller) {
            $parameters['-c'] = escapeshellarg($controller);
        }
        if ($action) {
            $parameters['-a'] = escapeshellarg($action);
        }
        $this->_spawnProcess($parameters);
        $this->_scriptPath = null; // reset
    }

    protected function _spawnProcess(array $args)
    {
        if (is_null($this->_scriptPath)) {
            $script = $this->_defaultScriptPath;
        } else {
            $script = $this->_scriptPath;
        }
        $command = 'php ' . $script;
        foreach ($args as $key => $value) {
            $command .= ' ' . $key . ' ' . $value;
        }
        if (PHP_OS == 'WINNT' || PHP_OS == 'WIN32') {
            $pcommand = 'start /b ' . $command;
        } else {
            $pcommand = $command . ' > /dev/null &';
        }
        pclose(popen($pcommand, 'r'));
    }

}[/geshi]

The new helper makes spawning a new process that bit simpler. Besides setting parameters and specific arguments for MVC calls into the application, it allows the setting of a default script to execute (e.g. it could be our zfrun.php script for a Zend Application call). You can set a custom script before a new spawning, but it will revert back to the default on the next spawn attempt (simply to prevent the default being overridden by accident). Obviously the class could be improved a lot more, but let’s leave it here for now.

Let’s amend the RegistrationController example to use this helper, and then show how it can be initialised/configured during our bootstrapping process:

[geshi lang=php]
<?php

class RegistrationController extends Zend_Controller_Action
{

    public function indexAction()
    {
        // perform registration processing here but delegate emailing to
        // an asynchronous process

        // ...

        $this->_helper->getHelper('Spawn')->setScriptPath('/path/to/myapp/scripts/zfrun.php');
        $this->_helper->spawn(array('--email' => 'joe@example.com'), 'mail', 'registration');

        // ...
    }

}[/geshi]

The new helper object’s direct() method can be called directly from any action using $this->_helper->spawn(). Beforehand, we can set the script we intend using. This could also be configured by default during bootstrapping, using the setDefaultScriptPath() method. Using spawn(), we set the “--email” option to pass onto the CLI, and set the controller/action/module separately. You could simply set the MVC options through the first array; just remember the full and short options they reserve.

After that, the helper does the rest of the work in creating the command and spawning the child correctly.
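To see what the helper ends up sending to the shell without actually spawning anything, the command assembly can be mirrored in a standalone function. This is only a sketch: buildSpawnCommand() is a hypothetical name, and the optional third argument exists purely so the Windows and *nix branches can be inspected from either platform.

[geshi lang=php]
<?php
/**
 * Hypothetical standalone mirror of the helper's command building.
 * Returns the command string instead of passing it to popen().
 */
function buildSpawnCommand($script, array $parameters, $os = PHP_OS)
{
    $command = 'php ' . $script;
    foreach ($parameters as $key => $value) {
        // every value is escaped before it is placed on the command line
        $command .= ' ' . $key . ' ' . escapeshellarg($value);
    }
    if ($os == 'WINNT' || $os == 'WIN32') {
        return 'start /b ' . $command;
    }
    return $command . ' > /dev/null &';
}

$command = buildSpawnCommand(
    '/path/to/myapp/scripts/zfrun.php',
    array('--email' => 'joe@example.com', '-c' => 'mail', '-a' => 'registration'),
    'Linux'
);
echo $command, "\n";[/geshi]

The exact quoting around each value depends on the platform PHP is running on, since escapeshellarg() quotes differently on Windows and *nix.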

To enable this helper, you just need to make it accessible by registering it from your application.ini settings file (at /config/application.ini in the example application from Part 2). See the new line in “Standard Resource Options”:

[geshi lang=php][production]
; PHP INI Settings
phpSettings.display_startup_errors = 0
phpSettings.display_errors = 0

; Bootstrap Location
bootstrap.path = APPLICATION_ROOT "/library/ZFExt/Bootstrap.php"
bootstrap.class = "ZFExt_Bootstrap"

; Standard Resource Options
resources.frontController.controllerDirectory = APPLICATION_PATH "/controllers"
resources.frontController.moduleDirectory = APPLICATION_PATH "/modules"
resources.frontController.plugins[] = "ZFExt_Controller_Plugin_ModuleConfigurator"
resources.view.encoding = "UTF-8"
resources.view.helperPath.ZFExt_View_Helper = "ZFExt/View/Helper/"
resources.view.helperPath.SpotSec_View_Helper = "SpotSec/View/Helper/"
resources.modifiedFrontController.contentType = "text/html;charset=utf-8"
resources.layout.layout = "default"
resources.layout.layoutPath = APPLICATION_PATH "/views/layouts"
resources.frontController.actionHelperPaths.ZFExt_Controller_Action_Helper = "ZFExt/Controller/Action/Helper"

;resources.layout.pluginClass = "ZFExt_Controller_Plugin_LayoutSwitcher"

; Module Options (Required For Mysterious Reasons)
resources.modules[] =

; Autoloader Options
autoloaderNamespaces[] = "ZFExt_"

; HTML Markup Options
resources.view.charset = "utf-8"
resources.view.doctype = "XHTML5"
resources.view.language = "en"

[staging : production]

[testing : production]
phpSettings.display_startup_errors = 1
phpSettings.display_errors = 1
resources.frontController.throwExceptions = 1

[development : production]
phpSettings.display_startup_errors = 1
phpSettings.display_errors = 1
resources.frontController.throwExceptions = 1[/geshi]

You could also register the same helper using cli.ini settings so it’s accessible from there also.

Conclusion

In Part 3 of this series on Asynchronous Processing in PHP, we’ve covered two concepts: forking and spawning. Forking is not covered in detail (yet) since it is problematic to run within a web environment. However it is extremely useful in any command line application since it does not incur the startup costs associated with spawning a new process. Nevertheless, in a web environment spawning a new process from scratch remains the simplest option to implement.

I’ve done my best to at least outline in sufficient detail the advantages and disadvantages of spawning processes. With some luck, I’ve impressed upon you that asynchronous processing is not some weird and technically infeasible strategy for improving an application’s responsiveness. It is, actually, very simple to implement – even in something as architected and involved as a Zend Framework application.

If you want to give the code here a trial run, you can simply change things for a few quick tests by having the asynchronous task do something like write its parameters to a temporary file. This “instant gratification” test will more than prove that the strategy above works in practice and with very little effort.
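A minimal sketch of such a trial task follows. The log file name is an arbitrary choice for the test, and the script simply appends whatever arguments the child process received to a file in the system’s temporary directory.

[geshi lang=php]
<?php
// Hypothetical trial task: record the arguments this process was given.
// The file name is an assumption made for this test only.
$logFile = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'async_task_test.log';

// Everything after the script path, joined back into one line.
$arguments = implode(' ', array_slice($_SERVER['argv'], 1));
$line = date('c') . ' ' . $arguments . "\n";

// Append with a lock so concurrent children do not interleave their lines.
file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);[/geshi]

Point one of the popen() calls above at this script, load the page, and then check the log file: the entry should appear even though the request returned without waiting for it.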

In Part 4, I’ll be exploring an alternative to process spawning (at least as described above) by using the combination of a scheduled task and a Job/Message Queue. This offers a few improvements over process spawning, primarily that it allows spreading the performance of asynchronous tasks over time and, with a little nudging, over multiple servers (even to the point of being able to use a server dedicated to specific tasks).

