Mombu the Programming Forum

1 16th September 01:53
marc dubois
External User
 
Posts: 1
XML file parsing in C



Hi,
Is it possible to parse an XML file in C so that I can fulfill these
requirements:
1) Replace every "<" and ">" sign inside the body of a tag with a space, e.g.:
Example 1:
<foo> blabla < bla </foo>

becomes

<foo> blabla bla </foo>

Example 2:

<foo>> blablabla </foo>

becomes


<foo> blablabla </foo>


2) Remove all extra spaces at the end of every line of the XML file
3) Replace all special characters (Unicode or hexadecimal characters) with a
space
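
Requirements 2 and 3 can be handled line by line without any XML knowledge. Here is a minimal sketch in C, assuming that "special characters" means any byte outside printable ASCII (so a multi-byte UTF-8 character becomes several spaces) and that each line fits in a buffer. The function names are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Requirement 3, under the assumption stated above: overwrite every byte
   that is not printable ASCII, a tab, or a newline with a space. */
static void blank_special_chars(char *line)
{
    for (unsigned char *p = (unsigned char *)line; *p != '\0'; p++) {
        if (*p == '\t' || *p == '\n')
            continue;
        if (*p < 0x20 || *p > 0x7E)
            *p = ' ';
    }
}

/* Requirement 2: drop spaces, tabs and carriage returns at the end of the
   line, keeping the final newline if there is one. */
static void trim_trailing_space(char *line)
{
    size_t len = strlen(line);
    int had_newline = (len > 0 && line[len - 1] == '\n');

    if (had_newline)
        len--;
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t' ||
                       line[len - 1] == '\r'))
        len--;
    if (had_newline)
        line[len++] = '\n';
    line[len] = '\0';
}

/* Hypothetical helper: clean one line in place. Blanking runs first so that
   special characters at the end of a line are trimmed as well. */
void clean_line(char *line)
{
    blank_special_chars(line);
    trim_trailing_space(line);
}
```

A driver would read the file with fgets(), call clean_line() on each buffer and write it back out with fputs(). If "hexadecimal characters" instead means character references such as &#x41;, this sketch does not touch them.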


The XML file is not well formed if there are "<" and ">" signs scattered
throughout it, and so it is not a valid file either. I therefore do not
think a parser would be appropriate here. (How would the parser react when
it encounters a "<" that does not correspond to the beginning of a tag?)

Do you have an idea of how I can write a program to deal with these
requirements?
The technical environment is: Unix, KSH, and C (gcc).

I am thinking of using the "sed" command instead. With it I can get rid of
the extra spaces and replace the special characters, but I still do not know
how to deal with the extra ">" and "<" signs.
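
One possible way to deal with the extra signs is a small heuristic scanner. The sketch below is only that, a heuristic under stated assumptions: a "<" is treated as the start of a tag only when the next character could begin a tag name, a closing tag, a comment or a processing instruction, and a ">" follows before the next "<". Everything else is blanked. The function names are hypothetical, and tags that span several lines, attribute values containing ">", comments and CDATA sections are not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the '>' that closes the tag starting at lt, or NULL
   if lt does not look like the start of a tag (heuristic, see above). */
static char *find_tag_end(char *lt)
{
    unsigned char next = (unsigned char)lt[1];

    if (!(isalpha(next) || next == '_' || next == '/' ||
          next == '?' || next == '!'))
        return NULL;

    /* Scan to the next bracket of either kind, or to the end of string. */
    char *end = lt + 1 + strcspn(lt + 1, "<>");
    return (*end == '>') ? end : NULL;
}

/* Replace, in place, every '<' and '>' that is not part of something that
   looks like a tag with a space. */
void blank_stray_brackets(char *s)
{
    while (*s != '\0') {
        if (*s == '<') {
            char *end = find_tag_end(s);
            if (end != NULL) {
                s = end + 1;   /* skip over the whole tag untouched */
                continue;
            }
            *s = ' ';          /* stray '<' */
        } else if (*s == '>') {
            *s = ' ';          /* '>' seen outside any tag */
        }
        s++;
    }
}
```

On the two examples above, the stray sign is replaced by one space, so the result keeps the original spacing around it. Text such as "a <b and c>" would still be mistaken for a tag, which is the kind of ambiguity that cannot be resolved in general.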

Thanks for your help.
--
comp.lang.c.moderated - moderation address: clcm@plethora.net -- you must
have an appropriate newsgroups line in your header for your mail to be seen,
or the newsgroup name in square brackets in the subject line. Sorry.


 


2 16th September 01:54
willerz
External User
 
Posts: 1



It's not generally possible, which is why generalised XML parsers make
no attempt to handle it.
3 16th September 01:54
xian
External User
 
Posts: 1


In XML, < and > chars are not supposed to appear except as the start and end
of a tag (you have to use the &lt; and &gt; entities if you want them
somewhere else; strictly speaking only < must be escaped in text, but > is
usually escaped too). So just replacing all < and > chars you come across is
fine for all valid XML files, but can you guarantee that all the XML you
will come across is valid?
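
If whoever produces the files can be changed, the text can be escaped before it is written, so the stray signs never reach the file. A minimal sketch in C, with a hypothetical function name; it also escapes &, which was not mentioned above but has to be escaped in XML text as well:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy text into out, replacing <, > and & with their
   entities. Returns the length the full result needs (without the
   terminator), like snprintf. If out is too small, the output stops before
   the first replacement that does not fit and is still NUL-terminated. */
size_t xml_escape_text(const char *text, char *out, size_t out_size)
{
    size_t needed = 0;   /* length of the complete escaped string */
    size_t written = 0;  /* bytes actually stored in out */
    int full = 0;        /* set once something did not fit */

    for (; *text != '\0'; text++) {
        const char *rep;
        size_t rep_len;

        switch (*text) {
        case '<': rep = "&lt;";  rep_len = 4; break;
        case '>': rep = "&gt;";  rep_len = 4; break;
        case '&': rep = "&amp;"; rep_len = 5; break;
        default:  rep = text;    rep_len = 1; break;
        }

        /* Keep one byte free for the terminator. */
        if (!full && written + rep_len < out_size) {
            memcpy(out + written, rep, rep_len);
            written += rep_len;
        } else {
            full = 1;
        }
        needed += rep_len;
    }

    if (out_size > 0)
        out[written] = '\0';
    return needed;
}
```

This only applies to text that goes between tags. It does not help with files that are already broken, because at that point a stray < can no longer be told apart from a real tag with certainty.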

--
/Xian

"Television is the first truly democratic culture - the first culture
available to everybody and entirely governed by what the people want. The
most terrifying thing is what people do want."
Clive Barnes