HTML Entity Boundaries – Zero Padding

I cannot say why, but the HTML specification allows numeric entities to be zero-padded [Reference needed]. That sounds fair per se, but it does not set a limit. So strictly speaking, you can pad your numeric entities with gigabytes of 0s …

This does not make much sense, because the authors should have been aware that documents need to be interpreted by systems with limited resources. Technically, a parser could simply drop those zeros from the input stream, but that might not be considered safe.
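To make the "just drop the zeros" idea concrete, here is a minimal sketch. The function name strip_entity_zero_padding is made up for this example, it only looks at decimal references (not the &#x… hex form), and it says nothing about how any real browser handles the case.

```php
<?php
// Hypothetical helper, not taken from any spec or browser:
// collapse leading zeros in decimal numeric character references.
function strip_entity_zero_padding(string $html): string
{
    // "&#" followed by one or more zeros, but only while another digit
    // follows, so that "&#0;" keeps its single digit.
    return preg_replace('/&#0+(?=[0-9])/', '&#', $html) ?? $html;
}

echo strip_entity_zero_padding('&#00065;&#0;&#000;'), "\n"; // &#65;&#0;&#0;
```

Whether such a filter counts as safe is exactly the open question: it rewrites the input before the real parser ever sees it.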

The good thing is, this leaves some room to play. While testing various HTML-encoding-related routines with miqrogroove this morning (who has more and more cool plugins, by the way), I needed to find out about the limitations (a related ticket is #12284). I started the game by sending an entity of 1 MB to Firefox; just imagine the number of zeros in there: &#000...00065;

Padding Zeroes

So first I wanted to know the maximum number of zeros that internet browsers accept in a simple A written as &#65;. If there are no limits in the specs, let’s see which limits can be found in the implementations.

A second test was an entity that isn’t one, &#65b (there is a ‘b’ at the end), to see how many padding zeros are possible there, plus a counter-test.
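The two test inputs can be generated with one small function. This is only a sketch: the name padded_entity and its parameters are invented for illustration, and it assumes PHP 7 or later for the type declarations. The counter-test is not described in detail, so it is not sketched here.

```php
<?php
// Hypothetical helper: build a decimal character reference with $zeros
// leading zeros. $terminator is ';' for a valid reference; anything else
// (e.g. 'b') gives the deliberately malformed variant.
function padded_entity(int $codePoint, int $zeros, string $terminator = ';'): string
{
    return '&#' . str_repeat('0', $zeros) . $codePoint . $terminator;
}

echo padded_entity(65, 3), "\n";            // &#00065;
echo padded_entity(65, 3, 'b'), "\n";       // &#00065b
echo strlen(padded_entity(65, 1024)), "\n"; // 1029
```

The last line shows the bookkeeping used throughout: 2 characters for &#, the zeros, the digits of the value, and 1 for the terminator.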

So normally I would expect there to be a limit in the implementation. The question is how large it is. First of all, as a web developer you might want to have fun with Mozilla Firefox. And indeed, it looks like the hotty foxy really tries to eat the pain. I was able to send a fairly large zero-padded numeric entity into it. Let’s measure the number of zeros in bytes, where one byte is one zero, a kilobyte is 1024 bytes and a megabyte is 1024 kilobytes. I had no problem at all sending 1 MB or even 4 MB. The more megabytes I sent, the slower the browser became. This finally resulted in a denial of service for the browser while loading an 80 MB zero-padded entity: you couldn’t use it any longer. There is a limit everywhere. You will find the PHP test script at the end of the article.
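For padding in the megabyte range it helps not to build the whole entity as one string. The following is a rough sketch under that assumption; write_padded_entity and its chunk size are hypothetical choices for this example, not the script used for the measurements.

```php
<?php
// Hypothetical helper: write "&#", then $zeroBytes zeros, then "65;" to a
// stream in fixed-size chunks, so the entity never sits in a single string.
// Returns the total number of bytes written.
function write_padded_entity($stream, int $zeroBytes, int $chunkSize = 65536): int
{
    $written = fwrite($stream, '&#');
    $chunk = str_repeat('0', $chunkSize);
    for ($left = $zeroBytes; $left > 0; $left -= $chunkSize) {
        // The last chunk may be shorter than $chunkSize.
        $written += fwrite($stream, $left >= $chunkSize ? $chunk : substr($chunk, 0, $left));
    }
    return $written + fwrite($stream, '65;');
}

$megabyte = 1024 * 1024; // units as defined in the text
$tmp = fopen('php://temp', 'w+');
echo write_padded_entity($tmp, 4 * $megabyte), "\n"; // 4194309
```

Opening php://output instead of php://temp would send the same bytes to the browser.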

Then we have Opera. Opera limits more proactively than Firefox. Strictly speaking that breaks with the specs, but you immediately see what counts: display performance is way better than with large chunks in Firefox. It looks like Opera limits near the maximum of a signed, positive word (32,767). 32k is large enough in the real world for those. I was able to display an entity with a length of 32,766 characters incl. &# and ;. Anything longer gets cut and the closing ; is assumed.

Internet Explorer

So now a candidate that should not be missed here: Internet Exploiter Numba Six. It’s very straightforward: the maximum number of digits incl. preceding zeros is 7. Anything more than this will just not work. For IE 7 it’s the same. IE 8 was not tested here.

A quick run against Lynx revealed something similar to Opera. There is a limit related to the range of a signed double-byte value. The total length of the whole entity incl. &# and ; can be 32,768, so the decimal part can be 32,765 characters long.

Then I ran some tests with Chrome. Chrome is strict about the length of entities: the number of allowed zeros is limited to 8 minus the number of digits the decimal value itself consumes. I.e. 65 for A can be prefixed with at most 6 zeros. That is comparable to Internet Explorer.

The counter test revealed nothing notable.

Results

Maximum length of entities in chars incl. &# and ;.

Firefox ........... Unlimited
Opera ................ 32,766
Internet Explorer ........ 10
Lynx ................. 32,768
Google Chrome ............ 11
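
The total lengths in the table translate into a number of padding zeros by subtracting the fixed parts. A small sketch of that arithmetic follows; the array and the function max_zero_padding are illustrative names only, null stands for "no limit found", and nullable types assume PHP 7.1 or later.

```php
<?php
// Measured maximum total lengths from the table above, incl. "&#" and ";".
$maxLength = [
    'Firefox'           => null, // no limit found
    'Opera'             => 32766,
    'Internet Explorer' => 10,
    'Lynx'              => 32768,
    'Google Chrome'     => 11,
];

// Hypothetical helper: how many leading zeros fit in front of $codePoint.
function max_zero_padding(?int $limit, int $codePoint): ?int
{
    if ($limit === null) {
        return null;
    }
    // Subtract "&#" (2), ";" (1) and the digits of the value itself.
    return max(0, $limit - 3 - strlen((string) $codePoint));
}

foreach ($maxLength as $browser => $limit) {
    echo $browser, ': ', max_zero_padding($limit, 65) ?? 'unlimited', "\n";
}
```

For the value 65 this gives 6 zeros for Chrome, matching the observation above, and 5 for Internet Explorer.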

Test System

A Windows XP Professional based computer system with about 4 GB RAM hosted the browser instances. The server software was Apache HTTPD on a remote system.

  • Firefox 3.6 Windows NT 5.1; de; Gecko/20100115 Firefox/3.6
  • Opera v10.10 Build 1893 Win32 Windows XP
  • IE 6 and 7.
  • Lynx Win32 2.86rel 5 from 09 may 2007
  • Google Chrome 4.0.249.89 (38071)

Testpage

Linked

PHP Testscript

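<?php
// Number of zeros per chunk and number of chunks to send.
$step  = 32745;
$count = 1;

// Build one chunk of zeros once and reuse it.
$step_str = str_repeat('0', $step);

// Emit the entity: "&#", then $count chunks of zeros, then the value and ";".
print '&#';
for ($i = 0; $i < $count; $i++) {
	print $step_str;
}
print '65;';
?>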
