The problem

For this blog we needed to make some adjustments to the code. One of them was the summary. Because we don't want our bloggers to write excerpts themselves, we decided to do it for them (mainly tarwn).

To do this we take the original blog post and cut it off at 600 characters. This works pretty well in PHP, since there are built-in functions for this. This is what we have so far.

PHP
function closetags($html, $blockquote = true)
{
    // Let DOMDocument repair the truncated markup; @ silences warnings about malformed HTML.
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $text = $doc->saveXML();

    // Strip the declaration, doctype and wrapper tags that surround the fragment.
    $text = str_replace('<?xml version="1.0" standalone="yes"?>', "", $text);
    $text = str_replace('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">', "", $text);
    $text = str_replace('<body>', "", $text);
    $text = str_replace('</body>', "", $text);
    $text = str_replace('<html>', "", $text);
    $text = str_replace('</html>', "", $text);

    // "#" is the delimiter here because the pattern itself contains "/" in its closing tags.
    $text = preg_replace('#<div class="codeheader">(<span>[^<]+</span>).*</div>#', '<div class="codeheader">1 Sample Code (See Article for Rest)</div></div>', $text);

    // Optionally wrap blockquote contents in a paragraph.
    if ($blockquote) {
        $text = str_replace('<blockquote>', "<blockquote><p>", $text);
        $text = str_replace('</blockquote>', "</p></blockquote>", $text);
    }
    return $text;
}
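
The truncation step that feeds closetags is not shown in this post. As a rough sketch only, it could look like the following; the name make_excerpt and its parameters are hypothetical, and only the 600-character limit comes from the text above.

```php
<?php
// Hypothetical helper, not the blog's actual code: cut the raw post HTML
// at a fixed number of characters without looking at the markup.
function make_excerpt($html, $limit = 600)
{
    // Prefer mb_substr so multibyte characters are not split in half;
    // fall back to substr when the mbstring extension is unavailable.
    if (function_exists('mb_substr')) {
        return mb_substr($html, 0, $limit, 'UTF-8');
    }
    return substr($html, 0, $limit);
}

// A tiny limit makes the effect visible: the cut lands inside a tag.
$post = '<p>Intro</p><div class="codeheader">x</div>';
echo make_excerpt($post, 24), "\n"; // <p>Intro</p><div class="
```

Because the cut ignores tag boundaries, the excerpt can end in the middle of a tag or attribute, which is why the result is passed through closetags afterwards.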

This worked pretty well until I had a post that was cut off like this.

<div class="

The closetags function made it into this.

<div class=""/>

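Browsers do not interpret that correctly: when the page is parsed as HTML, the trailing slash on a div is ignored, so the tag is treated as an opening tag that never gets closed. But the fix was simple and nice, just the way we like them. Of course, it involved some RTFMing for a while.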

The solution

PHP
$text = $doc->saveXML(null, LIBXML_NOEMPTYTAG);

Now saveXML no longer collapses empty elements into self-closing tags. The truncated div is written out as <div class=""></div> instead, which works better for us.
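
To see the difference in isolation, here is a small sketch that compares the two calls. It builds the empty div directly instead of going through loadHTML, which is an assumption made to keep the example independent of how truncated markup gets repaired.

```php
<?php
// Build a document that contains a single empty <div class="">.
$doc = new DOMDocument();
$div = $doc->createElement('div');
$div->setAttribute('class', '');
$doc->appendChild($div);

// Default serialization: the empty element is collapsed to <div class=""/>.
$collapsed = $doc->saveXML();

// LIBXML_NOEMPTYTAG: the element is written as <div class=""></div>.
$expanded = $doc->saveXML(null, LIBXML_NOEMPTYTAG);

echo $collapsed;
echo $expanded;
```

Both strings still start with the XML declaration, so the str_replace cleanup in closetags is needed either way.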

Sites used.