My problem with this approach is that data is reformatted before being used,
while my philosophy has always been to store data in its raw, original
form and to format it when outputting (which would always be
consistent). So in this case, if someone (say in the forum of a website)
starts using html, I would store the raw data in some database table.
This makes it easy for certain users to say "I want to see html code in
other people's posts" and others to say "I don't want to see their html
code". In the first case I will output the text in a pretty raw form, and in
the second case I will pass the data through htmlentities(). Personally, I
think it's a bad idea to alter data before storing, because you can never go
back to the way it was. If I store certain information, I store it raw. When
I output it, I can choose how to reformat the data, because it's not always
an HTML-based situation I'm going to be in.
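
To make that concrete, here is a minimal sketch of what I mean. The names
render_post() and $showHtml are made up for illustration; they are not part
of PHP or of anyone's proposal, and the database side is left out entirely.

```php
<?php
// Hypothetical helper: the post is stored raw and only formatted here, on output.
function render_post(string $raw, bool $showHtml): string
{
    if ($showHtml) {
        // This reader asked to see other people's HTML: pass the stored text through.
        return $raw;
    }
    // This reader does not want HTML: escape it at output time only.
    return htmlentities($raw, ENT_QUOTES, 'UTF-8');
}

$stored = '<b>hello</b>'; // what the database table would hold, unchanged

echo render_post($stored, true), "\n";  // markup goes out as stored
echo render_post($stored, false), "\n"; // markup goes out escaped
```

The stored value never changes, so a different output context (plain-text
mail, a log file) could get its own formatting function in the same way.
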
So yes, your approach is faster, but less flexible. My approach is
consistent. I always store and handle my data raw, and when I output it, I
consider reformatting. In your case you create exceptional situations where
the default filter (which is server based, not application or website based)
is not applicable. The problem, though, is that many people won't be able to
rely on a default filter and therefore have to filter everything on input. And
with the way I want to handle my data (only reformatting on output) I don't
want to do any filtering at all on the input. Only on the output.
It's a very weird idea to me to filter out HTML on input, because the only
place where HTML tags could be abused is in the output. So that's where
filtering should take place, imho. Maybe it's hard to find an easy way to do
this, but failing to come up with an output filtering idea
should not result in input filtering "just because it's easier" (which, I'm
very sorry to say, reminds me once again of magic_quotes_gpc... it's much
easier to define such a rule globally, but you end up with a lot of crap).
And I don't mind writing "htmlentities()" all the time when I output data
from my databases to a browser. You talk about a global policy, but a
developer's policy should always include good security. So said developer
will never have to go over all his code to add "htmlentities" afterwards. He
has already done that while coding. Maybe if the name of "htmlentities" was only
4 characters like "echo", some people would be more eager to do output
filtering from the start?
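
Nothing stops a developer from defining such a short name himself. A rough
sketch, assuming a user-defined wrapper called h() (a hypothetical name, not
a PHP built-in) and UTF-8 data:

```php
<?php
// Hypothetical short alias for output escaping; h() is not a built-in PHP function.
function h(string $text): string
{
    // ENT_QUOTES escapes both double and single quotes, so the result is
    // also usable inside quoted HTML attribute values.
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

$comment = '<script>alert("x")</script>'; // raw value as it was stored

// Escaping happens at the point of output, not at the point of storage.
echo '<p>', h($comment), '</p>', "\n";
```

With that in place, writing h($value) at every echo is hardly more typing
than the echo itself.
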
By the way, I use PHP for software development and I'm never in the position
where a webserver admin would control what I can and cannot do, but I'm just
anticipating trouble for people who are in that position.
Ron
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php