htmlcxx

htmlcxx is a simple non-validating CSS1 and HTML parser for C++.
htmlcxx Ranking & Summary

  • License: LGPL
  • Price: Free
  • Publisher Name: Davi de Castro Reis and Robson Braga Ara

htmlcxx Description

htmlcxx is a simple non-validating CSS1 and HTML parser for C++. Although several other HTML parsers are available, htmlcxx has some characteristics that make it unique:

  • STL-like navigation of the DOM tree, using the excellent tree.hh library by Kasper Peeters
  • The original document can be reproduced exactly, character by character, from the parse tree
  • Bundled CSS parser
  • Optional parsing of attributes
  • C++ code that looks like C++ (not so true anymore)
  • Offsets of tags/elements in the original document are stored in the nodes of the DOM tree

The parsing policies of htmlcxx were designed to mimic the behavior of Mozilla Firefox (http://www.mozilla.org), so you should expect parse trees similar to those created by Firefox. Unlike Firefox, however, htmlcxx does not insert non-existent elements into your HTML. Therefore, serializing the DOM tree gives exactly the same bytes contained in the original HTML document.

Examples:

Using htmlcxx is quite simple. Take a look at this example.

    #include <iostream>
    #include <string>
    #include <htmlcxx/html/ParserDom.h>

    using namespace std;
    using namespace htmlcxx;

    int main()
    {
        //Parse some html code
        string html = "<html><body>hey</body></html>";
        HTML::ParserDom parser;
        tree<HTML::Node> dom = parser.parseTree(html);

        //Print whole DOM tree
        cout << dom << endl;

        //Dump all links in the tree
        tree<HTML::Node>::iterator it = dom.begin();
        tree<HTML::Node>::iterator end = dom.end();
        for (; it != end; ++it)
        {
            //tagName() is compared as-is; adjust the case to match your input
            if (it->tagName() == "A")
            {
                //Attributes are parsed only on request
                it->parseAttributes();
                //attribute() returns a (found, value) pair
                cout << it->attribute("href").second << endl;
            }
        }

        //Dump all text of the document
        it = dom.begin();
        end = dom.end();
        for (; it != end; ++it)
        {
            if ((!it->isTag()) && (!it->isComment()))
            {
                cout << it->text();
            }
        }
        return 0;
    }

What's New in This Release:

  • Compilation fixes for gcc 4.3.
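The description above says that nodes store their offsets in the original document and that serializing the tree returns exactly the original bytes. The sketch below illustrates why those two properties go together. It is not htmlcxx code: `Span`, `split_spans`, and `serialize` are hypothetical names invented for this illustration, and the splitter is a deliberately naive stand-in for a real parser. The idea is only that if every piece of the input is recorded as an (offset, length) pair and nothing is inserted or normalized, copying the pieces back in order reproduces the input.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for illustration only; this is NOT htmlcxx's HTML::Node.
struct Span {
    std::size_t offset;  // index of the first character in the source
    std::size_t length;  // number of characters covered
    bool is_tag;         // true for a "<...>" run, false for text
};

// Split the input into consecutive tag and text spans without altering it.
std::vector<Span> split_spans(const std::string& src) {
    std::vector<Span> spans;
    std::size_t pos = 0;
    while (pos < src.size()) {
        std::size_t end = src.size();
        bool is_tag = false;
        if (src[pos] == '<') {
            std::size_t close = src.find('>', pos);
            if (close != std::string::npos) {
                end = close + 1;
                is_tag = true;
            }
            // An unterminated '<' is kept as text up to the end of input.
        } else {
            std::size_t open = src.find('<', pos);
            if (open != std::string::npos) end = open;
        }
        spans.push_back({pos, end - pos, is_tag});
        pos = end;
    }
    return spans;
}

// Rebuild the document by copying each span's characters back in order.
std::string serialize(const std::string& src, const std::vector<Span>& spans) {
    std::string out;
    for (const Span& s : spans) out.append(src, s.offset, s.length);
    return out;
}
```

Because the spans are contiguous and cover the whole input, `serialize(doc, split_spans(doc))` equals `doc` even for malformed markup; a parser that repaired or inserted elements could not make that guarantee.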