Solution provided by bruno costa: Your header looks very similar to ours; we just have a few extra parameters.
header("Content-Type: application/vnd.ms-excel");
header("Expires: 0");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header("Content-Disposition: attachment;filename=keywords.csv");
The key seems to be the encoding: MS Excel expects Unicode UTF-16LE, and with any other encoding the results become unreliable. It also expects the byte-order mark to be present as the first two bytes of the file, 0xFF 0xFE, which marks the file as UTF-16LE.
This is how we're doing it. After sending the headers, and with $content holding the CSV lines in UTF-8, we send the byte-order mark followed by the text converted to UTF-16LE:
echo chr(255) . chr(254) . mb_convert_encoding($content, 'UTF-16LE', 'UTF-8');
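For anyone who wants to reuse this, the same step can be wrapped in a small function. This is only a sketch under a few assumptions: the function name csv_to_excel_bytes(), its $sourceEncoding parameter, and the sample CSV text are made up for illustration and are not part of our code. It returns the bytes instead of echoing them so the result can be checked before sending.

```php
<?php
// Hypothetical helper (name and signature are assumptions, not from the
// original code): prefix the UTF-16LE byte-order mark and convert the CSV
// text from its source encoding to UTF-16LE.
function csv_to_excel_bytes(string $content, string $sourceEncoding = 'UTF-8'): string
{
    // "\xFF\xFE" is the same two bytes as chr(255) . chr(254).
    return "\xFF\xFE" . mb_convert_encoding($content, 'UTF-16LE', $sourceEncoding);
}

// Illustrative sample data only; \u{E9} is "é" written as an escape so the
// example does not depend on how this script file itself is saved.
$content = "keyword,hits\r\ncaf\u{E9},3\r\n";
$bytes = csv_to_excel_bytes($content);

// Sanity checks: the BOM comes first, and the rest converts back unchanged.
$hasBom = substr($bytes, 0, 2) === "\xFF\xFE";
$roundTrip = mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16LE');
```

In the export script this would be used as echo csv_to_excel_bytes($content); right after the header() calls.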
Another fundamental point is that the CSV content has to be held in an encoding that can actually represent the text: UTF-8 if we want to support any alphabet, or GB2312 if only Chinese is needed, as could happen in our case. ISO-8859-15, for instance, could not represent such text. In the end we just need to use mb_convert_encoding() to convert from our internal representation to what MS Excel expects.
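To illustrate why the internal encoding matters, here is a small sketch that round-trips the same text through UTF-16LE and through ISO-8859-15. The sample string and variable names are assumptions for illustration only. The point is that the first round trip preserves the text, while the second cannot, because ISO-8859-15 has no way to represent the Chinese characters.

```php
<?php
// Illustrative sample: "keyword," followed by two Chinese characters,
// written with \u{...} escapes so the script's own file encoding is irrelevant.
$text = "keyword,\u{4E2D}\u{6587}\r\n";

// Round trip through UTF-16LE, the encoding sent to MS Excel.
$viaUtf16 = mb_convert_encoding(
    mb_convert_encoding($text, 'UTF-16LE', 'UTF-8'),
    'UTF-8',
    'UTF-16LE'
);

// Round trip through ISO-8859-15, which cannot hold Chinese characters,
// so mbstring replaces them with a substitute character.
$viaLatin9 = mb_convert_encoding(
    mb_convert_encoding($text, 'ISO-8859-15', 'UTF-8'),
    'UTF-8',
    'ISO-8859-15'
);

$utf16IsLossless = ($viaUtf16 === $text);
$latin9IsLossless = ($viaLatin9 === $text);
```

If the data were stored in ISO-8859-15 to begin with, no later conversion could bring the lost characters back, which is why the source has to be UTF-8 or another encoding that covers the text.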
In case it is useful, I'm also sending a few links we gathered while studying the issue; the 4th one explains the key BOM issue:
UTF BOM
http://www.opentag.com/xfaq_enc.htm
http://en.wikipedia.org/wiki/Byte_Order_Mark
http://www.unicode.org/unicode/faq/utf_bom.html

PHP Multibyte String Functions
http://www.php.net/manual/en/ref.mbstring.php#50298
fixed in #309