[SQL Server] Errors when using ASCII control characters (1-31) in SQL Server 2005 XML
1. Code to reproduce the error
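DECLARE @v_xml XML
DECLARE @v_looper INT
SELECT @v_looper = 1
-- The counter is incremented before each test, so codes 2 through 31 are tried.
WHILE (@v_looper < 31)
BEGIN
    SELECT @v_looper = @v_looper + 1
    BEGIN TRY
        -- Embed the character in an attribute value and convert the string to XML.
        SELECT @v_xml = N'<rows>
<row attr="' + CHAR(@v_looper) + '" />
</rows>'
        PRINT 'Success! ASCII Code:' + CAST(@v_looper AS VARCHAR(10))
    END TRY
    BEGIN CATCH
        -- The conversion failed, so this character is not accepted in XML.
        PRINT 'Error! ASCII Code:' + CAST(@v_looper AS VARCHAR(10))
        --PRINT ERROR_NUMBER()
        --PRINT ERROR_MESSAGE()
    END CATCH
END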
Execution result
Error! ASCII Code:2
Error! ASCII Code:3
Error! ASCII Code:4
Error! ASCII Code:5
Error! ASCII Code:6
Error! ASCII Code:7
Error! ASCII Code:8
Success! ASCII Code:9
Success! ASCII Code:10
Error! ASCII Code:11
Error! ASCII Code:12
Success! ASCII Code:13
Error! ASCII Code:14
Error! ASCII Code:15
Error! ASCII Code:16
Error! ASCII Code:17
Error! ASCII Code:18
Error! ASCII Code:19
Error! ASCII Code:20
Error! ASCII Code:21
Error! ASCII Code:22
Error! ASCII Code:23
Error! ASCII Code:24
Error! ASCII Code:25
Error! ASCII Code:26
Error! ASCII Code:27
Error! ASCII Code:28
Error! ASCII Code:29
Error! ASCII Code:30
Error! ASCII Code:31
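As the output above shows, only three characters in the ASCII code range 1 to 31 can be used:
9 (TAB : horizontal tab, XML #x9)
10 (LF : line feed, new line, XML #xA)
13 (CR : carriage return, XML #xD)
Code 1, which the loop skips, falls in the rejected range #x0 - #x8 listed in the Microsoft article quoted below. All the other characters raise the following error: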
Msg 9420, Level 16, State 1, Line 9
XML parsing: line 1, character 18, illegal xml character
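The reproduction code leaves its ERROR_NUMBER() and ERROR_MESSAGE() calls commented out. A minimal sketch of how the error details could be printed from the CATCH block follows. The variable names and the choice of code 2 are only for illustration, and the exact message text may differ by server language and version.

```sql
-- Hypothetical example: try a single control character and report the error details.
DECLARE @v_xml XML
DECLARE @v_code INT
SET @v_code = 2   -- one of the codes reported as failing in the output above

BEGIN TRY
    -- The string is converted to XML on assignment; an illegal character makes this fail.
    SELECT @v_xml = N'<rows><row attr="' + CHAR(@v_code) + '" /></rows>'
    PRINT 'Success! ASCII Code:' + CAST(@v_code AS VARCHAR(10))
END TRY
BEGIN CATCH
    -- ERROR_NUMBER() and ERROR_MESSAGE() are only meaningful inside a CATCH block.
    PRINT 'Msg ' + CAST(ERROR_NUMBER() AS VARCHAR(10)) + ': ' + ERROR_MESSAGE()
END CATCH
```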
Microsoft's position on this problem is that the XML specification prohibits these characters, so they should not be used.
PRB: Error Message When an XML Document Contains Low-Order ASCII Characters
SYMPTOMS
When you attempt to use versions 3.0 or later of the MSXML parser to parse XML
documents that contain certain low-order non-printable ASCII characters (that
is, characters below ASCII 32), you may receive the following error message:
An Invalid character was found in text content.
CAUSE
Versions 3.0 and later of the MSXML parser strictly enforce the valid XML
character ranges that are defined by the World Wide Web Consortium (W3C) XML
language specification. XML documents that are parsed using versions 3.0 or
later of MSXML cannot contain characters that fall outside the defined valid XML
character ranges. The low-order non-printable ASCII characters in the ranges
that are listed in the "More Information" section are not valid XML characters.
An XML document that contains instances of these characters is not conformant
with the W3C specifications and cannot be parsed successfully with versions 3.0
and later of MSXML.
RESOLUTION
To resolve this problem, either remove instances of the low-order non-printable
ASCII characters, or replace the characters with an alternate valid character
such as the space character (ASCII 32, hex #x20). This solution makes the XML
document compliant with the W3C specifications. However, removing or replacing
instances of these characters may affect other applications that use the data
and to which the characters are significant. Such additional impact can only be
identified by testing and will need to be addressed by implementing a fix or
workaround that is appropriate for a specific situation.
STATUS
This behavior is by design.
MORE INFORMATION
Versions 2.6 and earlier of the MSXML parser permit XML documents to contain
low-order non-printable ASCII characters that fall outside the W3C valid XML
character ranges. However, the design of versions 3.0 and later of the MSXML
parser has been changed to strictly enforce the valid XML character ranges that
are defined in the W3C XML language specification. This design change is
required to be able to identify non-conformant XML documents.
The following are the valid XML characters and character ranges (hex values) as
defined by the W3C XML language specifications 1.0:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
The following are the character ranges for low-order non-printable ASCII
characters that are rejected by MSXML versions 3.0 and later:
#x0 - #x8 (ASCII 0 - 8)
#xB - #xC (ASCII 11 - 12)
#xE - #x1F (ASCII 14 - 31)
This design change may affect the following users and applications:
- Internet Explorer users: Users who have been using Internet Explorer
versions 5.5 and earlier (and who did not install MSXML 3.0 in Replace mode) to
browse and view XML documents that contain one or more instances of the
specified low-order non-printable ASCII characters encounter the error message
after upgrading to Internet Explorer 6.0 because Internet Explorer 6.0 installs
MSXML 3.0 SP2 in Replace mode and uses it to parse XML documents.
- MDAC and ADO users: Developers and users who load ADO-persisted XML
documents that contain one or more instances of the specified low-order
non-printable ASCII characters into ADO Recordset objects encounter the error
message after upgrading to MDAC 2.7 because MDAC 2.7 installs MSXML 3.0 SP2,
which is the version of the MSXML parser that the ADO 2.7 Recordset object uses.
- Applications that use the MSXML Document Object Model (DOM):
Applications that use version independent PROGIDs to instantiate MSXML DOM
objects that are used to parse XML documents generate the specified error when
MSXML 3.0 or one of its service packs is installed in Replace mode or when the
code is modified to use the MSXML 3.0 or 4.0 version specific PROGIDs.
REFERENCES
For additional information on other known causes and workarounds for the error
message that is specified in the 'Symptoms' section, click the article numbers
below to view the articles in the Microsoft Knowledge Base:
238833 (http://support.microsoft.com/kb/238833/EN-US/) PRB: XML Parser: Invalid Character Was Found in Text Content
275883 (http://support.microsoft.com/kb/275883/EN-US/) INFO: XML Encoding and DOM Interface Methods
APPLIES TO
- Microsoft XML Parser 3.0
- Microsoft XML Parser 3.0 Service Pack 1
- Microsoft XML Parser 3.0 Service Pack 2
- Microsoft XML Core Services 4.0
- Microsoft Data Access Components 2.8
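The RESOLUTION section of the article above suggests replacing the offending characters with a valid one such as a space. In T-SQL that might look like the following minimal sketch. The variable names and the sample string are hypothetical. Forcing the binary collation Latin1_General_BIN is a precaution assumed here so that control characters are compared by code point; the article does not prescribe it. NUL (code 0) is left out because this post only covers codes 1 to 31.

```sql
-- Hypothetical input containing BEL (code 7), which is not a legal XML character.
DECLARE @v_raw NVARCHAR(MAX)
DECLARE @v_code INT
DECLARE @v_xml XML
SET @v_raw = N'<rows><row attr="a' + NCHAR(7) + N'b" /></rows>'
SET @v_code = 1

WHILE (@v_code <= 31)
BEGIN
    -- Keep TAB (9), LF (10) and CR (13); replace every other control character with a space.
    IF @v_code NOT IN (9, 10, 13)
        SET @v_raw = REPLACE(@v_raw COLLATE Latin1_General_BIN, NCHAR(@v_code), N' ')
    SET @v_code = @v_code + 1
END

-- The cleaned string should now convert without the illegal-character error.
SET @v_xml = @v_raw
SELECT @v_xml
```

As the article warns, replacing characters changes the data, so it is worth checking whether any other application depends on them.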
On this point, the W3C's Extensible Markup Language (XML) 1.0 (Fifth Edition) gives the following definition. (Only part of section "2.2 Characters" is copied here.)
Extensible Markup Language (XML) 1.0 (Fifth Edition)
W3C Recommendation 26 November 2008
2.2 Characters
[Definition: A parsed entity contains text, a sequence of characters, which may represent markup or character data.] [Definition: A character is an atomic unit of text as specified by ISO/IEC 10646:2000 [ISO/IEC 10646]. Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646. The versions of these standards cited in A.1 Normative References were current at the time this document was prepared. New characters may be added to these standards by amendments or new editions. Consequently, XML processors MUST accept any character in the range specified for Char.]
Character Range
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
The mechanism for encoding character code points into bit patterns may vary from entity to entity. All XML processors MUST accept the UTF-8 and UTF-16 encodings of Unicode [Unicode]; the mechanisms for signaling which of the two is in use, or for bringing other encodings into play, are discussed later, in 4.3.3 Character Encoding in Entities.
Note:
Document authors are encouraged to avoid "compatibility characters", as defined in section 2.3 of [Unicode]. The characters defined in the following ranges are also discouraged. They are either control characters or permanently undefined Unicode characters:
[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDEF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF].
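Production [2] above boils down to a handful of range checks. As an illustration only, the sketch below classifies a single code point against those ranges using their decimal equivalents. @v_cp and its sample value are hypothetical, and the check covers only the Char production, not the discouraged ranges in the Note.

```sql
-- Hypothetical code point to classify (11 = vertical tab).
DECLARE @v_cp INT
SET @v_cp = 11

IF @v_cp IN (9, 10, 13)                   -- #x9 | #xA | #xD
   OR @v_cp BETWEEN 32 AND 55295          -- [#x20-#xD7FF]
   OR @v_cp BETWEEN 57344 AND 65533       -- [#xE000-#xFFFD]
   OR @v_cp BETWEEN 65536 AND 1114111     -- [#x10000-#x10FFFF]
    PRINT 'Legal XML 1.0 character: ' + CAST(@v_cp AS VARCHAR(10))
ELSE
    PRINT 'Illegal XML 1.0 character: ' + CAST(@v_cp AS VARCHAR(10))
```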
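My conclusion: until now I knew nothing about this problem. In all my time using XML I had never read the W3C XML specification closely, and none of the XML books I have read discussed the specification either. When adopting a new technology, one should stick to the fundamentals and know the relevant standard thoroughly. One more thing: since specification documents are so often written only in English, improving one's English matters too. This reminds me once again of the saying that English is the foundation of all learning... so why does that leave such a bitter taste?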

Posted by 좐군