[IMP] tools.html2plaintext: consistent use of lxml.etree.HTMLParser to convert HTML to plaintext

We used to switch to using BeautifulSoup when available, but that lead to inconsistent behavior depending on the installed Python packages, and sometimes lead to bad surprises. There is no advantage in using BeautifulSoup rather than HTMLParser, and the latter is always available. bzr revid: odo@openerp.com-20121015120934-njaylf99dc5zekfw
2012-10-15 14:09:34 +02:00 · 2012-10-15 14:09:34 +02:00 · e06e3aad4a
parent 99c4f31111
commit e06e3aad4a
1 changed files with 2 additions and 10 deletions
--- a/openerp/tools/misc.py
+++ b/openerp/tools/misc.py
@ -312,16 +312,8 @@ def html2plaintext(html, body_id=None, encoding='utf-8'):
    html = ustr(html)
-    from lxml.etree import tostring
+    from lxml.etree import tostring, fromstring, HTMLParser
-    try:
+    tree = fromstring(html, parser=HTMLParser())
        from lxml.html.soupparser import fromstring
        kwargs = {}
    except ImportError:
        _logger.debug('tools.misc.html2plaintext: cannot use BeautifulSoup, fallback to lxml.etree.HTMLParser')
        from lxml.etree import fromstring, HTMLParser
        kwargs = dict(parser=HTMLParser())
    tree = fromstring(html, **kwargs)
    if body_id is not None:
        source = tree.xpath('//*[@id=%s]'%(body_id,))