The concatenator tries to work only with bytes, never wondering
what is in the byte bucket: files are read to `str`, concatenated with
`str` (via join) and returned as `str`, usually assumed to be utf-8
encoded. It's the author's job to correctly encode files to utf-8.
So far so good.
On the runbot, some CSS files apparently trigger an issue in some
cases: `web_dir` ends up typed `unicode` (because it contains
non-ascii characters? Not sure at all). As a result, `re.sub`
decodes the corresponding file data when injecting the dir as
replacement, and the CSS reader returns a `unicode` object.
Then, when concat_files tries to compute the checksum, it needs bytes
and thus re-encodes everything using the default codec (ascii); the
non-ascii character(s) blow up the encoding with a
UnicodeEncodeError.
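
The last step of that failure can be reproduced in isolation. The sketch below is written for Python 3, where the implicit `str`/`unicode` coercions of Python 2 no longer exist, so the implicit ascii encoding is spelled out by hand. The sample CSS, the `checksum_ascii` helper and the choice of SHA-1 are assumptions made for illustration, not taken from the actual code.

```python
import hashlib

# Hypothetical CSS content that ended up as text (`unicode` in Python 2)
# instead of bytes, and that contains one non-ascii character.
css_text = u'a:after { content: "\u00e9"; }'


def checksum_ascii(text):
    # Hashing needs bytes. Python 2 would implicitly encode the text with
    # the default codec (ascii); here that step is written explicitly.
    return hashlib.sha1(text.encode('ascii')).hexdigest()


try:
    checksum_ascii(css_text)
    failed = False
except UnicodeEncodeError:
    # The non-ascii character cannot be represented in ascii.
    failed = True
```
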
Solution:
* Assume CSS files can contain non-ascii characters (they can, and
do), decode them using `utf-8` to get `unicode` strings in the CSS
reader
* Inject web_dir as usual via replacement, this still yields a
`unicode` object (a `str` web_dir will simply be decoded using the
ASCII codec, a non-ascii web_dir should have been decoded to
`unicode` using sys.getfilesystemencoding)
* Cleanly re-encode everything to utf-8, so that the code outside the
reader only ever manipulates 8-bit "byte" strings
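
The three steps above can be sketched roughly as follows. This is a minimal Python 3 illustration, not the actual reader: the `read_css` name, its signature and the URL-rewriting pattern are hypothetical, and the pattern only handles unquoted relative `url(...)` references.

```python
import re
import sys


def read_css(raw, web_dir):
    """Hypothetical CSS reader: takes utf-8 bytes, returns utf-8 bytes."""
    # 1. Decode the file content to text, assuming it is utf-8 encoded.
    text = raw.decode('utf-8')

    # A byte-string web_dir is decoded with the filesystem encoding so
    # that the substitution below only ever mixes text with text.
    if isinstance(web_dir, bytes):
        web_dir = web_dir.decode(sys.getfilesystemencoding())

    # 2. Inject web_dir in front of relative URLs (illustrative pattern).
    #    A function replacement avoids backslash escapes in web_dir being
    #    interpreted by re.sub.
    text = re.sub(r'url\((?!/|https?:)',
                  lambda match: 'url(' + web_dir + '/',
                  text)

    # 3. Re-encode, so callers only ever see byte strings.
    return text.encode('utf-8')


raw = u'a { background: url(img/\u00e9.png); }'.encode('utf-8')
out = read_css(raw, u'/w\u00e9b/static')
```

Because the reader hands back bytes, a caller that joins the results and hashes them never has to encode anything itself.
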
bzr revid: xmo@openerp.com-20120405070711-vjyw8g4mge2goyik