Mombu the Programming Forum

1 4th September 15:40
tom
Encoding of the perl output (á é í ó)

Hi,

My Perl script generates strange output for accented characters when
the result is opened in a browser (Mozilla, IE).

#!/usr/bin/perl
print "a e i o u á é í ó u â î ô û";

The code above outputs the right string to the console (or to a file, if
I open it with an editor), but if I write it to an HTML file (with
open FILE,output.html...) and open that in a browser, look at the result:
a e i o u Ã¡ Ã© Ã­ Ã³ u Ã¢ Ã® Ã´ Ã»

The generated file is of type 'UTF-8 Unicode English text, with
very long lines', according to the "file" command on Linux.

Any hint about the solution?


Information about my system:
System Linux Ubuntu 6.06
perl, v5.8.7 built for i486-linux-gnu-thread-multi
/etc/environment sets:
LANG="pt_BR.UTF-8"
LANGUAGE="pt_BR:pt_PT"


Thank you
Tom
2 4th September 15:41
sherm pendley
Encoding of the perl output (á é í ó)

"Tom" <tomlobato@gmail.com> writes:


First off, you shouldn't simply assume that the output to FILE will be UTF-8.
Instead, use the appropriate IO layer, like this:

open(F, ">:utf8", "data.txt") or die "Error opening data.txt: $!";

That way, if you have data that hasn't yet been converted to UTF-8 - say it's
been input from another file that used a different :encoding layer - you can


Have you told your browser it's UTF-8? If your HTML is delivered over the
web, you'd do that with the appropriate HTTP headers. If it's loaded from
a local file, you could use a meta element in the HTML itself.

The question of what HTTP headers and/or meta element you need to use should
be directed to a more appropriate forum - likely one in the
comp.infosystems.www.* hierarchy.
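
To illustrate the two suggestions above (an explicit IO layer plus a meta element for a locally opened file), here is a minimal sketch. The file name output.html and the surrounding HTML skeleton are assumptions made for the example, and it uses the :encoding(UTF-8) spelling of the layer rather than :utf8; it is not a definitive fix for the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # the string literals in this source file are saved as UTF-8

# Hypothetical output file name, chosen only for this sketch.
my $path = 'output.html';

my $text = 'a e i o u á é í ó u â î ô û';

# The :encoding(UTF-8) layer turns Perl's character strings into
# UTF-8 bytes as they are written.
open(my $out, '>:encoding(UTF-8)', $path)
    or die "Error opening $path: $!";

# The meta element tells a browser that loads the file locally
# (no HTTP headers available) which encoding to use.
print {$out} <<"HTML";
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Encoding test</title>
</head>
<body><p>$text</p></body>
</html>
HTML

close($out) or die "Error closing $path: $!";
```

Opening the resulting file in a browser should then not depend on the browser's default encoding, since the document declares it itself.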

sherm--

--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
3 8th September 22:09
reto
Encoding of the perl output (á é í ó)

Hi Tom,

a few points to consider...

1) You should be aware of which character encoding your
*web server* is using. If you're running Apache, check for this
directive in your httpd.conf:

AddDefaultCharset UTF-8

A quick note: the HTML itself can also declare an encoding in its
head section, as below, although a charset sent in the HTTP header
(see also point 2) takes precedence over the meta element:

<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1"> or

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


2) You can have Perl put the UTF-8 encoding information in the HTTP
header by adding the charset:

print $q->header('text/html; charset=UTF-8');


3) As the safest approach, I suggest keeping your server settings on
UTF-8 and transforming your output to a web-encoded format.
See an example on this page:

http://www.infocopter.com/perl/web-encoding-decoding.html
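
As a rough illustration of point 2 without relying on CGI.pm, the sketch below builds the Content-Type header by hand and makes STDOUT encode as UTF-8 so that the body matches what the header claims. The helper name http_header is hypothetical and exists only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # the string literals in this source file are saved as UTF-8

# Hypothetical helper: build a minimal CGI response header. The charset
# parameter tells the browser how to decode the bytes that follow.
sub http_header {
    my ($charset) = @_;
    return "Content-Type: text/html; charset=$charset\r\n\r\n";
}

# Encode everything printed to STDOUT as UTF-8, matching the header.
binmode(STDOUT, ':encoding(UTF-8)');

print http_header('UTF-8');
print "<html><body><p>a e i o u á é í ó u â î ô û</p></body></html>\n";
```

The important part is that the declared charset and the encoding actually applied to the output agree; declaring UTF-8 while writing ISO-8859-1 bytes (or the reverse) produces the same kind of garbage.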

--Reto
4 13th September 17:15
ben bullock
Encoding of the perl output (á é í ó)

Switching the encoding of your post to utf8, I get the above results. In
case you can't see this utf8 post: the characters in the second line that
you complained about are all displayed correctly when the post encoding
is utf8. It looks to me as if Perl is working properly, and the problem is
not Perl but that the browser encoding is not set properly.
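
The effect described here can be reproduced with the core Encode module. The following is only a sketch of the mechanism, assuming the browser fell back to ISO-8859-1: the same bytes are decoded once with the wrong encoding and once with the right one.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # the string literals in this source file are saved as UTF-8
use Encode qw(encode decode);

my $original = 'á é í ó';

# What Perl writes to the file: the UTF-8 byte sequence for the string.
my $bytes = encode('UTF-8', $original);

# What a browser shows if it wrongly assumes ISO-8859-1: every byte
# becomes one character, so each accented letter turns into two.
my $misread = decode('ISO-8859-1', $bytes);

# What a browser shows once it is told the bytes are UTF-8.
my $correct = decode('UTF-8', $bytes);

binmode(STDOUT, ':encoding(UTF-8)');
print "misread as ISO-8859-1: $misread\n";
print "read as UTF-8:         $correct\n";
```

The bytes are identical in both cases; only the interpretation changes, which is why the file itself is reported as valid UTF-8.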
5 30th September 05:51
alan j. flavell
Encoding of the perl output (á é í ó)

Well, the proper place to set the character encoding for a browser is
from the document source - rather than being set in the browser by the
recipient, which is at best a workaround or repair technique.

Just exactly how to set the character encoding from the source has
become a bit complex with the various ifs and buts that have crept in,
but to summarise:

* if the document is arriving via HTTP then the ideal place is
on the HTTP protocol header (Content-type header, charset
attribute[1])

* if the document is browsed locally, or via a protocol which has no
content-type header (such as FTP), then...

+ for HTML: meta http-equiv
+ for XML-based content types: <?xml ... encoding=... ?>

+ for Appendix-C XHTML/1.0: both of the above

+ under some circumstances: a Unicode BOM.

[1] By the way, don't be misled by the name of the MIME "charset"
attribute. It specifies what in current terminology is more
accurately called a "character encoding scheme" (such as utf-8). It
does not specify a "character set" in the sense that term is currently
meant - the "Document Character Set" for HTML/4, as also for all
XML-based markups, is in principle Unicode, even when the encoding
scheme is only us-ascii.

That wasn't meant as a criticism of your posting, just an attempt to
clarify a few points which often seem to get misunderstood.
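
For the locally browsed case, the source-level declarations listed above can be looked for in the document's own bytes. The sketch below is a rough regex scan, not a real HTML or XML parser, and it deliberately says nothing about which declaration wins when several are present; the helper name source_declarations and the sample document are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given the raw bytes of a document, list the
# encoding declarations it carries in the source itself.
sub source_declarations {
    my ($bytes) = @_;
    my %found;

    # UTF-8 byte order mark: the three bytes EF BB BF at the very start.
    $found{bom} = 'UTF-8' if $bytes =~ /\A\xEF\xBB\xBF/;

    # XML declaration: <?xml version="1.0" encoding="..."?>
    if ($bytes =~ /<\?xml[^>]*\bencoding\s*=\s*["']([^"']+)["']/) {
        $found{xml} = $1;
    }

    # HTML: <meta http-equiv="Content-Type" content="...; charset=...">
    if ($bytes =~ /<meta[^>]*\bcharset\s*=\s*["']?([A-Za-z0-9._:-]+)/i) {
        $found{meta} = $1;
    }

    return \%found;
}

# Sample document in the Appendix-C XHTML style: both declarations present.
my $doc = qq{<?xml version="1.0" encoding="utf-8"?>\n}
        . qq{<html><head><meta http-equiv="Content-Type" }
        . qq{content="text/html; charset=utf-8"></head></html>\n};

my $decl = source_declarations($doc);
print "$_: $decl->{$_}\n" for sort keys %$decl;
```

A document served over HTTP would still be better off with the charset on the Content-type header, as described above; this scan only covers what the file can say about itself.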

thanks

--

If the crash doesn't occur immediately, the [development] cycle is broken,
and the result is called a release. -- detha, in the monastery.
6 5th October 08:13
ben bullock
Re: Encoding of the perl output (á é í ó)

Some other people (I see the names Sherm Pendley and Reto) already made
extensive posts about how to fix the encoding, but no-one seemed to have
mentioned that the supposedly incorrect output from Perl was actually
correct utf8.