HTML::TagFilter

A fine-grained html-filter, xss-blocker and mailto-obfuscator

HTML::TagFilter Ranking & Summary

  • License: Perl Artistic License
  • Price: FREE
  • Publisher Name: William Ross
  • Publisher web site: http://search.cpan.org/~wross/


HTML::TagFilter Description

HTML::TagFilter is a fine-grained HTML filter, XSS blocker and mailto obfuscator.

SYNOPSIS

    use HTML::TagFilter;
    my $tf = new HTML::TagFilter;
    my $clean_html = $tf->filter($dirty_html);

    # or

    my $tf = HTML::TagFilter->new(
        allow => {...},
        deny => {...},
        log_rejects => 1,
        strip_comments => 1,
        echo => 1,
        verbose => 1,
        skip_xss_protection => 1,
        skip_entification => 1,
        skip_mailto_entification => 1,
        xss_risky_attributes => [...],
        xss_permitted_protocols => [...],
        xss_allow_local_links => 1,
    );

    # or

    my $tf = HTML::TagFilter->new(
        on_finish_document => sub {
            return " " . $self->report . " ";
        },
    );
    $tf->parse($some_html);
    $tf->parse($more_html);
    my $clean_html = $tf->output;
    my $cleaning_summary = $tf->report;
    my @tags_removed = $tf->report;
    my $error_log = $tf->error;

HTML::TagFilter is a subclass of HTML::Parser with a single purpose: it removes unwanted HTML tags and attributes from a piece of text. It can act in a more or less fine-grained way: you can specify permitted tags, permitted attributes of each tag, and permitted values for each attribute in as much detail as you like.

Tags which are not allowed are removed. Tags which are allowed are trimmed down to only the attributes which are allowed for each tag. It is possible to allow all or no attributes from a tag, or to allow all or no values for an attribute, and so on.

The filter will also guard against cross-site scripting attacks and obfuscate any mailto: email addresses, unless you tell it not to.

The original purpose for this was to screen user input. In that setting you'll often find that just using:

    my $tf = new HTML::TagFilter;
    put_in_database($tf->filter($my_text));

will do. However, it can also be used for display processes (e.g. text-only translation) or cleanup (e.g. removal of old JavaScript). In those cases you'll probably want to override the default rule set with a small number of denial rules.

    my $tf = HTML::TagFilter->new(deny => {img => {'all'}});
    print $tf->filter($my_text);

will strip out all images, for example, but leave everything else untouched.

NB (FAQ #1): the filter only removes the tags themselves. All it does to text which is not part of a tag is to escape the < and > characters, to guard against false negatives and some common cross-site attacks.

Requirements:

  • Perl
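The note about text outside tags can be illustrated with a small standalone sketch. It assumes the characters being escaped are the angle brackets, and the helper name escape_angle_brackets is hypothetical: it is not part of HTML::TagFilter and is not how the module is implemented, only a picture of what this kind of escaping does to plain text.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only (not an HTML::TagFilter method):
# replaces angle brackets in plain text with their HTML entities, so that
# stray "<" or ">" characters cannot be read as the start or end of a tag.
sub escape_angle_brackets {
    my ($text) = @_;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

print escape_angle_brackets('if a < b and b > c'), "\n";
# prints: if a &lt; b and b &gt; c
```

Text that contains no angle brackets passes through this helper unchanged, which matches the description of the filter leaving non-tag text otherwise untouched.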

