[FIX] fields.function: type=binary: workaround for the low byte values (<=0x1f) unsupported in XML

We have a workaround in place for fields.function of binary type
that may return values that are invalid in XML documents, and thus
in our XML-RPC protocol. But our workaround failed to handle the
codepoints that are invalid in XML (control characters up to 0x1f,
other than tab, LF and CR) yet perfectly valid in UTF-8 encoding.
Added a sanity check for those as well, using a terrible workaround
for this last-resort case: b64-encode the bytes, to avoid crashing
the request.
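
For illustration only, here is a minimal Python 3 sketch of the idea described above. It is not the code from this commit: the commit targets Python 2 and relies on tools.ustr() and xmlrpclib. The name sanitize_for_xmlrpc is hypothetical, and unlike the committed code the sketch checks the raw bytes before decoding them, so the base64 fallback always encodes the original bytes.

```python
import base64
import re

# Same character class as the commit's invalid_xml_low_bytes pattern,
# written here as a bytes pattern: control bytes that XML 1.0 forbids
# (tab 0x09, LF 0x0a and CR 0x0d are allowed and therefore excluded).
INVALID_XML_LOW_BYTES = re.compile(rb'[\x00-\x08\x0b-\x0c\x0e-\x1f]')


def sanitize_for_xmlrpc(raw):
    """Return a str that is safe to embed in an XML-RPC payload.

    Hypothetical helper, assuming `raw` is a bytes object.
    """
    if INVALID_XML_LOW_BYTES.search(raw):
        # Last resort: these bytes are valid UTF-8 but not valid XML,
        # so ship them as base64 text instead of crashing the request.
        return base64.b64encode(raw).decode('ascii')
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Latin-1 maps every byte value to a codepoint, so it never fails.
        return raw.decode('latin-1')


if __name__ == '__main__':
    print(repr(sanitize_for_xmlrpc(b'plain text\n')))
    print(repr(sanitize_for_xmlrpc(b'\x01abc')))
```

As in the commit, the receiving side cannot tell a base64 fallback apart from ordinary text, so this remains a safety net, not a proper transport for binary data.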

bzr revid: odo@openerp.com-20110906173140-vc4tl6wstzt8h06o
Olivier Dony 2011-09-06 19:31:40 +02:00
parent 448a016824
commit 7d3d3a6aba
1 changed files with 28 additions and 19 deletions


@@ -32,7 +32,9 @@
* size
"""
import base64
import datetime as DT
import re
import string
import sys
import warnings
@@ -727,34 +729,41 @@ def get_nice_size(value):
size = len(value)
return tools.human_size(size)
# See http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char
# and http://bugs.python.org/issue10066
invalid_xml_low_bytes = re.compile(r'[\x00-\x08\x0b-\x0c\x0e-\x1f]')
def sanitize_binary_value(value):
# binary fields should be 7-bit ASCII base64-encoded data,
# but we do additional sanity checks to make sure the values
# are not something else that won't pass via xmlrpc
# are not something else that won't pass via XML-RPC
if isinstance(value, (xmlrpclib.Binary, tuple, list, dict)):
# these builtin types are meant to pass untouched
return value
# For all other cases, handle the value as a binary string:
# it could be a 7-bit ASCII string (e.g. base64 data), but also
# any 8-bit content from files, with byte values that cannot
# be passed inside XML!
# See for more info:
# Handle invalid bytes values that will cause problems
# for XML-RPC. See for more info:
# - http://bugs.python.org/issue10066
# - http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char
#
# One solution is to convert the byte-string to unicode,
# so it gets serialized as utf-8 encoded data (always valid XML)
# If invalid XML byte values were present, tools.ustr() uses
# the Latin-1 codec as fallback, which converts any 8-bit
# byte value, resulting in valid utf-8-encoded bytes
# in the end:
# >>> unicode('\xe1','latin1').encode('utf8') == '\xc3\xa1'
# Note: when this happens, decoding on the other endpoint
# is not likely to produce the expected output, but this is
# just a safety mechanism (in these cases base64 data or
# xmlrpc.Binary values should be used instead)
return tools.ustr(value)
# Coercing to unicode would normally allow it to properly pass via
# XML-RPC, transparently encoded as UTF-8 by xmlrpclib.
# (this works for _any_ byte values, thanks to the fallback
# to latin-1 passthrough encoding when decoding to unicode)
value = tools.ustr(value)
# Due to Python bug #10066 this could still yield invalid XML
# bytes, specifically in the low byte range, that will crash
# the decoding side: [\x00-\x08\x0b-\x0c\x0e-\x1f]
# So check for low byte values, and if any, perform
# base64 encoding - not very smart or useful, but this is
# our last resort to avoid crashing the request.
if invalid_xml_low_bytes.search(value):
# b64-encode after restoring the pure bytes with latin-1
# passthrough encoding
value = base64.b64encode(value.encode('latin-1'))
return value
# ---------------------------------------------------------