Kenj 2006-08-30 21:10:54
We just completed the conversion of one text column in our database to
the XML datatype (about 15 million rows):
ALTER TABLE xmlTable ALTER COLUMN xmlColumn XML
Our database has gone from 36GB to 54GB with this single change (the
36GB and 54GB both indicate post-shrink utilized space).
I had read in another thread that XML was stored in a more efficient
binary format so, I was rather suprised at the growth of the database.
Is this to be expected or is there something I can configure to reduce
the storage requirement?
We’re going to test a text to varchar(max) conversion next, to see how
that stacks up against the XML conversion but would love to just store
XML as XML.
Assuming I can link directly so the google group aggregator, the thread
with the more efficient reference can be viewed at
Kent tegels 2006-08-30 21:11:11
I’m not too surprised, actually. The TEXT datatype was storing the XML which
is essentially one byte of storage for one byte of lexical XML. My guess
is that you only stored UTF-8 encoded XML in that type instance. The internal
storage of XML datatype is actually UTF-16 (as SQL Server understands collations,
not encodings). So in effect, you’ve likely doubled the amount of space consumed
to store the instances. Why not exactly double? SQL Server does do some tokenizing
of entity names, so we do see something less than 2x in most cases.
Expect it, and no, there’s only ways to increase the space impact, not really
The critical question comes downt to this: are you validating or other processing
the XML with the SQL context or not? If not, don’t use the XML type for exactly
this reason. If SQL Server is just secured storage of the XML for some other
application, consider using varbinary(max) instead so you don’t have to worry
about the encoding issues.
Kent Tegels, DevelopMentor
Dan at plurals 2006-08-30 21:11:20
I have found that it’s not obvious what the most compact way to save xml data is. I blogged the tests I did and how did them here:
Dan AT Pluralsight DOT COM
Kenj 2006-08-31 05:07:07
Great reply from both you and Dan Sullivan. The UTF-16 encoding was
what caught us! We would have really been in trouble had we converted
both TEXT columns; the high-water mark for converting the one column
was about 215GB 🙂
Since the SQL 2000 database we’re looking to upgrade just uses the text
columns as dumb storage for XML documents, we can definitely get by
without using the XML datatype. Assuming nobody is interested in
having the database enforce valid XML (it has to make it through MS
bizTalk before it gets to the DB), I’ll take a closer look at
varbinary(max); I had never even considered that one.
FYI: As I mentioned, we did run a varchar(max) conversion and it was
virtually instantaneous. I’m guessing this is handled internally in
almost the exact same manner as the text data type.
Thanks again for the helpful information.