1 25th March 23:36
php-bugs
Default #32880 : Unable to properly convert from ISO-8859-1 to UTF-8



From: Diomedes_01 at yahoo dot com
Operating system: Solaris 9
PHP version: 5.0.4
PHP Bug Type: Strings related
Bug description: Unable to properly convert from ISO-8859-1 to UTF-8

Description:
------------
I am unable to properly encode certain strings from ISO-8859-1 to UTF-8. I
have tried using utf8_encode, mb_convert_encoding and iconv with no
success. The code I am attempting this on is as follows:

Reproduce code:
---------------
<?php
$main_test_string = "référendum sur la Constitution européenne";
$string_test = mb_detect_encoding($main_test_string, 'UTF-8, ISO-8859-1');
echo "Encoding used: $string_test<br>"; // Properly displays ISO-8859-1

// First try converting with iconv
$iconv_test = iconv("ISO-8859-1", "UTF-8", $main_test_string);
echo "Iconv test: $iconv_test<br>"; // Displays nothing. No data whatsoever

// Now try converting with mb_convert_encoding
$mb_test = mb_convert_encoding($main_test_string, "UTF-8", "ISO-8859-1");
$string_test2 = mb_detect_encoding($mb_test, 'UTF-8, ISO-8859-1');
// Indicates string is now UTF-8 encoded (which is wrong)
echo "Encoding used: $string_test2<br>";
// Displays: rÃ©fÃ©rendum sur la Constitution europÃ©enne; doesn't look like UTF-8 to me
echo "MB Test convert value: $mb_test<br>";

// Finally try utf8_encode
$utf8_encode_test = utf8_encode($main_test_string);
// Detect on the converted string (the submitted script passed an undefined
// variable, $textfieldabstract, here)
$string_test3 = mb_detect_encoding($utf8_encode_test, 'UTF-8, ISO-8859-1');
// Indicates string is now UTF-8 encoded (which is wrong)
echo "Encoding used: $string_test3<br>";
// Same as before, displays: rÃ©fÃ©rendum sur la Constitution europÃ©enne
echo "Abstract post conversion: $utf8_encode_test<br>";
?>

Expected result:
----------------
I should be seeing UTF-8 (Unicode) translated text of the style:
'Ελληνι'

Note that the above does work for non-Latin character sets such as
Chinese, Japanese, Russian, Greek, etc.
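
A note on the expected string above: text of the form '&#917;...' is a run of HTML numeric character references, not raw UTF-8 bytes, so none of the three conversion functions in the script would be expected to produce it. As a minimal sketch of how such references can be generated from a UTF-8 string, assuming a current PHP with the mbstring extension (the variable names and sample strings here are illustrative, not from the report):

```php
<?php
// "Ελληνι" and "référendum" written as explicit UTF-8 bytes, so the example
// does not depend on the encoding this file happens to be saved in.
$greek  = "\xCE\x95\xCE\xBB\xCE\xBB\xCE\xB7\xCE\xBD\xCE\xB9";
$french = "r\xC3\xA9f\xC3\xA9rendum";

// Conversion map: [first code point, last code point, offset, mask].
// This map escapes every non-ASCII character and leaves ASCII untouched.
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$greek_refs  = mb_encode_numericentity($greek, $map, 'UTF-8');
$french_refs = mb_encode_numericentity($french, $map, 'UTF-8');

echo $greek_refs, "\n";  // &#917;&#955;&#955;&#951;&#957;&#953;
echo $french_refs, "\n"; // r&#233;f&#233;rendum
```

The references are plain ASCII, which is why they display the same whatever charset the browser assumes; raw UTF-8 bytes only display correctly when the page is declared as UTF-8.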

Actual result:
--------------
What I am seeing is the following string:

référendum sur la Constitution européenne

Definitely not UTF-8. Could be Klingon. :-)

I will admit I am not a Unicode master, but this is certainly quite
puzzling. According to the documentation, iconv is supposed to work in
this case, but it is not displaying any data. I am running PHP 5.0.4 with
iconv enabled (I see it in my phpinfo output).

Please advise.

2 25th March 23:36
Default #32880 : Unable to properly convert from ISO-8859-1 to UTF-8



ID: 32880
Updated by: sniper@php.net
Reported By: Diomedes_01 at yahoo dot com
-Status: Open
+Status: Bogus
Bug Type: Strings related
Operating System: Solaris 9
PHP Version: 5.0.4
New Comment:

Sorry, but your problem does not imply a bug in PHP itself. For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions. Due to the volume
of reports we can not explain in detail here why your report is not
a bug. The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.

What you're getting really is a UTF-8 encoded string.
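
One way to check this is to look at the bytes rather than at what the browser renders. The following is only a sketch, assuming a current PHP with the mbstring extension; the shortened sample string and the variable names are illustrative. It converts an ISO-8859-1 string to UTF-8, dumps both as hex, and then deliberately misreads the UTF-8 bytes as ISO-8859-1 to reproduce the "Ã©" seen in the report:

```php
<?php
// Build the ISO-8859-1 input from explicit bytes so the example does not
// depend on how this file is saved. 0xE9 is "é" in ISO-8859-1.
$latin1 = "r\xE9f\xE9rendum";

$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// In UTF-8, "é" (U+00E9) is the two-byte sequence C3 A9.
echo bin2hex($latin1), "\n"; // 72e966e972656e64756d
echo bin2hex($utf8), "\n";   // 72c3a966c3a972656e64756d

// Strict check that the converted string is well-formed UTF-8.
var_dump(mb_check_encoding($utf8, 'UTF-8')); // bool(true)

// Treating the UTF-8 bytes as ISO-8859-1 turns each C3 A9 pair into the
// two characters "Ã" (C3) and "©" (A9), which is the reported output.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $misread, "\n"; // shows rÃ©fÃ©rendum on a UTF-8 terminal
```

If that is what is happening, the data is fine and only the display is off: a page that declares its charset, for example with header('Content-Type: text/html; charset=UTF-8') before any output, should show the accents correctly. The empty iconv result is a separate matter; iconv() returns false on failure, so checking its return value would be a reasonable first step there.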

Previous Comments:
------------------------------------------------------------------------

[2005-04-28 23:46:20] Diomedes_01 at yahoo dot com



------------------------------------------------------------------------

