1
16th September 01:53
External User
Posts: 1
Hi,

Is it possible to parse an XML file in C so that I can fulfill these requirements:

1) Replace all "<" and ">" signs inside the body of a tag by a space, e.g.:
   Example 1: <foo> blabla < bla </foo> becomes <foo> blabla bla </foo>
   Example 2: <foo>> blablabla </foo> becomes <foo> blablabla </foo>
2) Remove all extra spaces at the end of every line of the XML file.
3) Replace all special characters (Unicode or hexadecimal characters) by a space.

The XML file is not well formed if there are stray "<" and ">" signs scattered through it, and it is not a valid file in that case, so I do not think a parser would be appropriate here. (How would a parser react when it encounters a "<" that does not correspond to the beginning of a tag?)

Do you have an idea of how I can write a program to deal with these requirements? The technical environment is Unix, KSH, and C (gcc).

I am thinking of using the "sed" command instead: I can get rid of the extra spaces and replace the special characters, but I still do not know how to deal with the extra ">" and "<" signs.

Thanks for your help.
--
comp.lang.c.moderated - moderation address: clcm@plethora.net -- you must have an appropriate newsgroups line in your header for your mail to be seen, or the newsgroup name in square brackets in the subject line. Sorry.
2
16th September 01:54
External User
Posts: 1
It's not generally possible, which is why generalised XML parsers make no attempt to handle it.
3
16th September 01:54
External User
Posts: 1
In XML, < and > chars are not allowed to appear except as the start and end of a tag (you have to use the &gt; and &lt; entities if you want them somewhere else). So just replacing all < and > chars you come across is fine for all valid XML files, but can you guarantee that all the XML you will come across is valid?
--
/Xian
"Television is the first truly democratic culture - the first culture available to everybody and entirely governed by what the people want. The most terrifying thing is what people do want." Clive Barnes
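
For what it's worth, the three requirements from the first post can be approximated with a line-by-line heuristic in C. The sketch below is only one possible approach and was not posted by anyone in the thread. The function names `looks_like_tag` and `clean_line` are made up for illustration. It assumes that a "<" starts a real tag only when it is immediately followed by a letter, "_", "/", "?" or "!" and is closed by a ">" before the next "<" on the same line, and that "special characters" means any byte outside printable ASCII (tabs are kept). Tags that span several lines, comments, and CDATA sections are not handled, and text such as `x <y> z` will be mistaken for a tag, which is the ambiguity the replies above point out.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Heuristic (an assumption, not an XML rule): s points at '<'. Treat it as a
   tag start only if the next char could begin a tag and a '>' closes it
   before any further '<' on the same line. */
static int looks_like_tag(const char *s)
{
    unsigned char next = (unsigned char)s[1];
    if (!(isalpha(next) || next == '_' || next == '/' ||
          next == '?' || next == '!'))
        return 0;
    const char *close = strchr(s + 1, '>');
    const char *open = strchr(s + 1, '<');
    return close != NULL && (open == NULL || close < open);
}

/* Cleans one line in place:
   - bytes outside printable ASCII (except tab) become a space,
   - '<' and '>' that are not part of a plausible tag become a space,
   - trailing whitespace, including the newline, is removed. */
void clean_line(char *line)
{
    assert(line != NULL);
    int in_tag = 0;

    for (char *p = line; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;

        if (c > 126 || (c < 32 && c != '\t')) {
            *p = ' ';               /* requirement 3 */
        } else if (in_tag) {
            if (c == '>')
                in_tag = 0;         /* end of the tag */
        } else if (c == '<') {
            if (looks_like_tag(p))
                in_tag = 1;
            else
                *p = ' ';           /* stray '<' in the body */
        } else if (c == '>') {
            *p = ' ';               /* stray '>' in the body */
        }
    }

    /* requirement 2: strip trailing whitespace */
    size_t n = strlen(line);
    while (n > 0 && isspace((unsigned char)line[n - 1]))
        line[--n] = '\0';
}
```

A caller would typically read the file with `fgets`, pass each buffer to `clean_line`, and write it back with `puts`, which restores the newline that the function strips. Each stray sign is replaced by exactly one space, so the output keeps the surrounding spaces rather than collapsing them as in the examples from the first post.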