HEAD first with PHP Streams

PHP has a built-in function called get_headers that returns the response headers of a URL. But it has some downsides: it sends a GET request by default instead of a HEAD request, so the whole response is requested and not only the headers, and it is hard to control how redirects are handled.

On Stack Overflow, users ask from time to time how to check the status of a remote file (whether it exists or not) with a HEAD request instead of a GET request, which actually downloads the file in question. The answers often involve cURL, which is indeed a great HTTP library.

But let’s not forget that PHP itself already offers a great way to solve the problem: the HTTP stream wrapper. It is worth learning more about it; often the key lies in the HTTP context options.

Let’s start with a simple example that shows it in action:

HTTP Status Code via HEAD Request

A first simple example: send a HEAD request to a URL without following redirects and obtain the status code (Demo):

$url = 'http://example.com/';
$code = FALSE;

$options['http'] = array(
    'method' => "HEAD", 
    'follow_location' => 0
);

$context = stream_context_create($options);

$body = file_get_contents($url, FALSE, $context);

if (!empty($http_response_header))
{
    sscanf($http_response_header[0], 'HTTP/%*d.%*d %d', $code);
}

echo $code;

The example uses the HTTP context options to set the request method ('method' => "HEAD") and to tell the wrapper not to follow redirects ('follow_location' => 0).

The context is then passed to file_get_contents as its third parameter. Since the response to a HEAD request has no body, file_get_contents returns an empty string on success; on error it returns FALSE. The return value is assigned to $body.
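To see exactly what ends up in such a context, its options can be read back with stream_context_get_options. The following is a minimal sketch, assuming PHP 7 or later; the helper name make_head_context and the values chosen for max_redirects and timeout are illustrative assumptions, not part of the original example. It only builds and inspects the context, no request is sent.

```php
<?php
// Hypothetical helper: builds an HTTP stream context for a HEAD request.
// 'method', 'follow_location', 'max_redirects' and 'timeout' are standard
// HTTP context options; the default values used here are only examples.
function make_head_context(bool $followRedirects = false, float $timeout = 10.0)
{
    $options = array(
        'http' => array(
            'method'          => 'HEAD',
            // 0 = stop at the first response, 1 = follow Location headers
            'follow_location' => $followRedirects ? 1 : 0,
            // only relevant when redirects are followed
            'max_redirects'   => 5,
            // read timeout in seconds
            'timeout'         => $timeout,
        ),
    );
    return stream_context_create($options);
}

$context = make_head_context();

// Read the options back; no network request is made here.
$http = stream_context_get_options($context)['http'];
echo $http['method'], ' follow_location=', $http['follow_location'], "\n";
```

Passing the same context to file_get_contents as the third parameter, as in the example above, is what turns the request into a HEAD request.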

To obtain the HTTP Status-Line, which contains the HTTP Status Code, a special PHP variable is used: $http_response_header. It is an array with one entry per response header line; the first entry (index 0) is the HTTP Status-Line.

The status code is taken out of that line and assigned to $code. Easy.
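The Status-Line itself has three parts: the HTTP version, the three-digit status code and the reason phrase. As a sketch of how all three can be taken apart (the example above only keeps the code), here is a hypothetical helper using a regular expression instead of sscanf; it assumes PHP 7.1 or later and runs on made-up sample data shaped like $http_response_header.

```php
<?php
// Hypothetical helper: splits an HTTP Status-Line such as "HTTP/1.1 404 Not Found"
// into version, code and reason phrase. Returns null if the line is not a
// Status-Line (for example an ordinary "Name: value" header line).
function parse_status_line(string $line): ?array
{
    if (!preg_match('~^HTTP/(\d+\.\d+)\s+(\d{3})(?:\s+(.*))?$~', $line, $m)) {
        return null;
    }
    return array(
        'version' => $m[1],
        'code'    => (int) $m[2],
        // the reason phrase is optional
        'phrase'  => isset($m[3]) ? $m[3] : '',
    );
}

// Sample data shaped like $http_response_header (no request is made).
$sample = array(
    'HTTP/1.1 404 Not Found',
    'Content-Type: text/html; charset=UTF-8',
    'Content-Length: 1256',
);

$status = parse_status_line($sample[0]);
echo $status['code'], ' ', $status['phrase'], "\n"; // 404 Not Found
```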

Before continuing with the next example, let’s look into two things more specifically: file_get_contents and $http_response_header.

file_get_contents in HTTP Mode

file_get_contents returns FALSE on error (and emits a warning). For HTTP requests, an error is any status code outside the 2xx (success) and 3xx (redirection) classes. So if you request a file and the server answers with a 404 code, file_get_contents emits a warning and returns FALSE.

To override this default behaviour, set the ignore_errors HTTP context option to TRUE. This suppresses the warning and, instead of FALSE, returns the response body, for example the error page of a GET request.
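With ignore_errors enabled, an error status no longer turns into FALSE, so the status code has to be checked explicitly. A minimal sketch: the helper name is_failure is hypothetical and simply mirrors the rule described above (anything outside 2xx and 3xx counts as an error). Only the context is built here; the commented lines indicate where the actual request would go.

```php
<?php
// Context for a GET request that returns the body even for 4xx/5xx responses.
$context = stream_context_create(array(
    'http' => array(
        'method'        => 'GET',
        'ignore_errors' => true, // no warning and no FALSE on error status codes
    ),
));

// Hypothetical helper mirroring the rule above: the HTTP wrapper treats
// status codes outside the 2xx and 3xx classes as failures.
function is_failure(int $code): bool
{
    return $code < 200 || $code >= 400;
}

// With ignore_errors set, FALSE signals that the request itself failed
// (for example DNS failure, refused connection, timeout), not an error status:
//
// $body = file_get_contents('http://example.com/missing', false, $context);
// if ($body === false) { /* the request failed */ }
// sscanf($http_response_header[0], 'HTTP/%*d.%*d %d', $code);
// if (is_failure($code)) { /* $body holds the error page */ }

foreach (array(200, 302, 404, 503) as $code) {
    echo $code, ': ', is_failure($code) ? 'error' : 'ok', "\n";
}
```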

$http_response_header

This is a special variable. The HTTP stream wrapper sets it when it executes a request, and it is set in the current scope: if you execute the HTTP request within a function, it is available there as a local variable and does not overwrite a variable of the same name in another function.

The good thing about it is that PHP already deals with some details of the HTTP protocol for us. For example, some headers stretch over multiple lines in the raw response; in this variable those header values are already normalized into a single line.

But there is a downside, too: if the request follows redirects, the headers of all responses are stored one after another in the same array. Since it is sometimes desirable to follow redirects (and to know about each of them, especially the final response) or to get more information out of specific headers, the next examples deal with that.
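To illustrate, this is roughly what $http_response_header could look like after a HEAD request that was redirected once; the header values are made up for the example. Because every response starts with its own Status-Line, counting those lines gives the number of responses. The helper count_responses is hypothetical.

```php
<?php
// Illustrative (made-up) content of $http_response_header after one redirect:
// the headers of both responses simply follow each other in one flat array.
$headers = array(
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://www.example.com/',
    'Content-Length: 0',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=UTF-8',
    'Content-Length: 1256',
);

// Hypothetical helper: each response starts with a Status-Line ("HTTP/..."),
// so the number of such lines equals the number of responses received.
function count_responses(array $headers): int
{
    $count = 0;
    foreach ($headers as $header) {
        if (strncmp($header, 'HTTP/', 5) === 0) {
            $count++;
        }
    }
    return $count;
}

$number = count_responses($headers);
echo "Responses: $number, redirects: ", $number - 1, "\n"; // Responses: 2, redirects: 1
```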

Dealing with Redirects

This example is a little different: this time file_get_contents follows redirects (the default behaviour). As the status code from the first example (302) shows, the URL I’m using in the demo does redirect.

To obtain the final status code, the whole $http_response_header array needs to be parsed; the last status code wins. This is done very quickly (Demo):

$url = 'http://example.com/';
$code = FALSE;

$options['http'] = array(
    'method' => "HEAD"
);

$context = stream_context_create($options);

$body = file_get_contents($url, FALSE, $context);

foreach($http_response_header as $header)
{
    sscanf($header, 'HTTP/%*d.%*d %d', $code);
}

echo "Status code (after all redirects): $code<br>\n";

This is a bit quick and dirty. Often it is also useful to get all response headers associated with each request in the redirect chain.

The next example does this more nicely by parsing header values into an array structure that is easy to use:

$url = 'http://example.com/';

$options['http'] = array(
    'method' => "HEAD"
);

$context = stream_context_create($options);

$body = file_get_contents($url, FALSE, $context);

$responses = parse_http_response_header($http_response_header);

$code = $responses[0]['status']['code']; // last status code

echo "Status code (after all redirects): $code<br>\n";

The goal again is to obtain the final status code, but this time the $http_response_header array is parsed by a new function, parse_http_response_header. It splits the array into the individual responses and orders them in reverse, so the last response comes first. Additionally, it divides each response into status (the Status-Line, HTTP version, code and reason phrase) and fields (all header values keyed by the upper-cased header name).

So it’s easy to get the final status code: $responses[0]['status']['code']. Even more information can be easily obtained, for example to display the chain of all redirects nicely, which is part of the full example (Demo):

$url = 'http://example.com/';

$options['http'] = array(
    'method' => "HEAD"
);

$context = stream_context_create($options);

$body = file_get_contents($url, FALSE, $context);

$responses = parse_http_response_header($http_response_header);

$code = $responses[0]['status']['code']; // last status code

echo "Status code (after all redirects): $code<br>\n";

$number = count($responses);

$redirects = $number - 1;

echo "Number of responses: $number ($redirects Redirect(s))<br>\n";

if ($redirects)
{
    $from = $url;
    
    foreach (array_reverse($responses) as $response)
    {
        if (!isset($response['fields']['LOCATION']))
            break;
        $location = $response['fields']['LOCATION'];
        $code = $response['status']['code'];
        
        echo " * $from -- $code --> $location<br>\n";
        $from = $location;
    }
    echo "<br>\n";
}

/**
 * parse_http_response_header
 * 
 * @param array $headers as in $http_response_header
 * @return array status and headers grouped by response, last first 
 */
function parse_http_response_header(array $headers)
{
    $responses = array();
    $buffer = NULL;
    foreach ($headers as $header)
    {
        if ('HTTP/' === substr($header, 0, 5))
        {
            // add buffer on top of all responses
            if ($buffer) array_unshift($responses, $buffer);
            $buffer = array();
                
            list($version, $code, $phrase) = explode(' ', $header, 3) + array('', FALSE, '');
            
            $buffer['status'] = array(
                'line' => $header, 
                'version' => $version, 
                'code' => (int) $code, 
                'phrase' => $phrase
            );
            $fields = &$buffer['fields'];
            $fields = array();
            continue;
        }
        // split "Name: value"; the whitespace after the colon is optional
        list($name, $value) = explode(':', $header, 2) + array('', '');
        $value = trim($value);
        // header-names are case insensitive
        $name = strtoupper(trim($name));
        // values of multiple fields with the same name are normalized into
        // a comma separated list (HTTP/1.0+1.1)
        if (isset($fields[$name]))
        {
            $value = $fields[$name].','.$value;
        }
        $fields[$name] = $value;
    }
    unset($fields); // remove reference
    array_unshift($responses, $buffer);
    
    return $responses;
}

PHP’s HTTP streams have a lot to offer. In combination with $http_response_header, simple things can be accomplished easily. The last example shows how to get the final status code and even more. Naturally, a much shorter version will do if only the last status code is needed.
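A sketch of such a shorter version, assuming only the final status code is of interest: it walks the header lines backwards and stops at the first Status-Line it finds, which belongs to the last response. The function name last_status_code is hypothetical, and it uses preg_match rather than the sscanf pattern from the earlier examples.

```php
<?php
// Hypothetical helper: returns the status code of the last response in an
// array shaped like $http_response_header (or the numerically indexed
// result of get_headers), or FALSE if no Status-Line is present.
function last_status_code(array $headers)
{
    // Walk backwards: the first Status-Line found belongs to the final response.
    foreach (array_reverse($headers) as $header) {
        if (preg_match('~^HTTP/\d+\.\d+\s+(\d{3})~', $header, $m)) {
            return (int) $m[1];
        }
    }
    return false;
}

$sample = array(
    'HTTP/1.1 302 Found',
    'Location: http://www.example.com/',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
);
echo last_status_code($sample), "\n"; // 200
```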

get_headers in Context

By the way, the parse_http_response_header function can be used with the return value of get_headers as well. The original problem with get_headers was that it only did a GET request. Setting the default stream context options works around that (Demo):

$url = 'http://example.com/';
$code = FALSE;

$options['http'] = array(
    'method' => "HEAD",
);

stream_context_set_default($options);

$header = get_headers($url);

$responses = parse_http_response_header($header);

echo 'URL:', $url, "\n";

echo "Status code:", $responses[0]['status']['code'], "\n";

print_r($responses);
...
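Note that stream_context_set_default changes the default context for all following stream operations in the script, not just for get_headers. Since PHP 7.1, get_headers also accepts a context as its third parameter, so the HEAD method can be limited to a single call. A minimal sketch with hypothetical helper names; the request itself needs network access, so only the context is inspected here.

```php
<?php
// Hypothetical helper: builds a context that makes get_headers() send HEAD.
function head_context()
{
    return stream_context_create(array(
        'http' => array('method' => 'HEAD'),
    ));
}

// Hypothetical wrapper: fetch headers via HEAD without touching the default
// context (the $context parameter of get_headers requires PHP 7.1+).
function head_headers(string $url)
{
    // Returns the numerically indexed header array, or FALSE on failure.
    return get_headers($url, false, head_context());
}

// Usage (needs network access):
// $responses = parse_http_response_header(head_headers('http://example.com/'));

$options = stream_context_get_options(head_context());
echo $options['http']['method'], "\n"; // HEAD
```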

See also: Improved handling of HTTP requests in PHP

This entry was posted in Hakre's Tips, PHP Development, Pressed.

One Response to HEAD first with PHP Streams

  1. AskApache says:

    Great, great, great! Nice post, I’ve always had to examine the PHP C source code to figure out streams; this helps. I personally disable the ability of file_get_contents to request remote files in my php.ini, because this is the number one way blog-propagating malware works.

    Instead I opt for raw socket-level calls with fsockopen, which is surprisingly easier/faster than streams. See http://www.askapache.com/php/fsockopen-socket.html and also check out the wp_http class in WordPress.
