Re: Web Services/XML Parser Error



LX-i wrote:
let me know and I'll dig it up...

I needed to dig through my old files for something else, so here it is. It may not be the most elegant, but it has the really cool feature of being known to work. :)

The comments reference IMDS, which is the name of the program on which I worked at my last assignment. There is also a copybook referenced in the linkage section - it will be posted after the subprogram. Also note the date written - I was still grasping with the meaning of "class", and trying to make procedural code do something it wasn't quite meant to do. :)

=-=-=-=-= NFSXEN.cob =-=-=-=-=

Identification Division.
Program-ID. IMDS-XML-Class.
*>________________________________________________________________
*>
*> IMDS XML/HTML ENTITY ENCODING / DECODING SUBROUTINE
*>
*> This subroutine accepts a 500-character text field and
*> either replaces characters which XML finds offensive with their
*> entity equivalents, or replaces these entities with their
*> characters.
*>
*> WARNING - The same parameter is used for input and output, so
*> truncation will occur if the expanded text is
*> greater than 500 characters.
*>________________________________________________________________
*>
*> INPUT REQUIRED
*> --------------
*> - Input Parameter packet, consisting of...
*> - Encode / Decode switch - "E" for encode, "D" for decode
*> - Text - 500 position alphanumeric field
*> - Length - 3 position numeric field (output only)
*>
*> OUTPUT EXPECTED
*> ---------------
*> - The text parameter will be converted according to the
*> encode/decode switch
*> - The length parameter will be filled with the length of the
*> converted field
*>________________________________________________________________
*>
*> Author: SSgt Daniel J. Summers
*> Date Written: July 2003
*> Security: Unclassified
*>________________________________________________________________

Data Division.
Working-Storage Section.

*> Hold areas for strings
01 WS-In-String is Global Pic X(500) Value Spaces.
01 WS-Work-String is Global Pic X(500) Value Spaces.

*> Indices used to step through the strings
01 WS-Input-Idx is Global Pic 9(10) Binary Value 0.
01 WS-Work-Idx is Global Pic 9(10) Binary Value 0.

*> Length of the input string
01 WS-Input-Length is Global Pic 9(10) Binary Value 0.
*>________________________________________________________________

Linkage Section.
Copy W405-XML-Parameters-WSA.
*>________________________________________________________________

Procedure Division Using W405-XML-Class-Parameters

. IMDS-XML-Class.
*>***************************************************************
*>* Process this string
*>***************************************************************

*> Move the input to global WS, find the length of the input
Move W405-String to WS-In-String
Move 0 to WS-Input-Length
Move 1 to WS-Input-Idx WS-Work-Idx
Move Spaces to WS-Work-String

Inspect Function Reverse (WS-In-String)
Tallying WS-Input-Length for Leading Space

Compute WS-Input-Length = 500 - WS-Input-Length

*> Which method do we choose?
Evaluate True
When W405-Encode Call "XML-Encode"
When W405-Decode Call "XML-Decode"
End-Evaluate

*> Move the string back to the linkage section
Move WS-Work-String to W405-String

*> Compute the length of the output
Compute W405-Length = WS-Work-Idx - 1

Exit Program.
*>________________________________________________________________
*>________________________________________________________________

Identification Division.
Program-ID. XML-Encode.
*>________________________________________________________________
*>
*> This subprogram codes necessary entities for XML and HTML
*> documents.
*>________________________________________________________________

Data Division.
Working-Storage Section.

*> Hold area for character code
01 WS-Char-Code Pic 9(03) Value 0.
01 WS-Char-Code-X Redefines WS-Char-Code.
12 Pic 9(01).
12 WS-Char-Code-LT-100 Pic 9(02).
*>________________________________________________________________

Procedure Division

. XML-Encode.
*>***************************************************************
*>* Encode the input string with XML character entities
*>***************************************************************

Perform Until WS-Input-Idx > WS-Input-Length
Or WS-Work-Idx > 500

*> This function returns a number that represents a
*> character's ordinal position in the character set.
*> However, the character set begins with 0, and this
*> function returns at least 1. For example, the space
*> character is 33 from Function Ord, but 32 in the ASCII
*> character set. This is why we then subtract 1 from the
*> code this function returns.
Move Function Ord (WS-In-String (WS-Input-Idx:1))
to WS-Char-Code

Subtract 1 from WS-Char-Code

Evaluate WS-Char-Code

*> The following When clause catches leading and trailing
*> special characters, quote (34), ampersand (38),
*> apostrophy (39), less than (60), and greater than (62)
When 0 Thru 31
When 34
When 38
When 39
When 60
When 62
When 127 Thru 255

Move "&#" to WS-Work-String (WS-Work-Idx:2)
Add 2 to WS-Work-Idx

If WS-Char-Code <= 100
Move WS-Char-Code-LT-100
to WS-Work-String (WS-Work-Idx:2)
Add 2 to WS-Work-Idx
Else
Move WS-Char-Code
to WS-Work-String (WS-Work-Idx:3)
Add 3 to WS-Work-Idx
End-If

Move ";" to WS-Work-String (WS-Work-Idx:1)

*> This character is OK
When Other
Move WS-In-String (WS-Input-Idx:1)
to WS-Work-String (WS-Work-Idx:1)

End-Evaluate

Add 1 to WS-Input-Idx WS-Work-Idx

End-Perform

Exit Program
*>________________________________________________________________

. End Program XML-Encode.
*>________________________________________________________________
*>________________________________________________________________

Identification Division.
Program-ID. XML-Decode.
*>________________________________________________________________
*>
*> This subprogram decodes entities encoded by the above
*> subroutine from XML / HTML to regular text documents.
*>________________________________________________________________

Data Division.
Working-Storage Section.

*> The length of the ASCII code in the line
77 WS-Code-Length Pic 9(01) Value 0.

*> The character code that we'll convert
77 WS-Char-Code Pic 9(03) Value 0.
*>________________________________________________________________

Procedure Division

. XML-Decode.
*>***************************************************************
*>* Replace XML/HTML entity codes w/ their character equivalents
*>***************************************************************

Perform Until WS-Input-Idx > WS-Input-Length
Or WS-Work-Idx > 500

If WS-In-String (WS-Input-Idx:2) = "&#"
*> This is a character entity - determine if it's 2
*> or 3 digits
If WS-In-String (WS-Input-Idx + 4:1) = ";"
*> Entity is 0 - 99
Move Function
NumVal (WS-In-String (WS-Input-Idx + 2:2))
to WS-Char-Code
Add 1 to WS-Char-Code
Add 4 to WS-Input-Idx
Else
*> Entity is >= 100
If WS-In-String (WS-Input-Idx + 5:1) = ";"
Move Function
NumVal (WS-In-String (WS-Input-Idx + 2:3))
to WS-Char-Code
Add 1 to WS-Char-Code
Add 5 to WS-Input-Idx
Else
*> This will not handle character codes over 3
*> positions, as these codes aren't part of the
*> ASCII character set, and Function Char
*> doesn't support Unicode
Move Function
Ord (WS-In-String (WS-Input-Idx:1))
to WS-Char-Code
End-If
End-If

*> At this point, WS-Char-Code has the proper code for
*> the character we need to output
Move Function Char (WS-Char-Code)
to WS-Work-String (WS-Work-Idx:1)

Else
*> Just move the character - no entity found
Move WS-In-String (WS-Input-Idx:1)
to WS-Work-String (WS-Work-Idx:1)
End-If

Add 1 to WS-Input-Idx WS-Work-Idx

End-Perform

Exit Program
*>________________________________________________________________

. End Program XML-Decode.
*>________________________________________________________________
*>________________________________________________________________

End Program IMDS-XML-Class.

=-=-=-=-= W405.cob =-=-=-=-=

W405-XML-Parameters-WSA Proc.
*>________________________________________________________________
*>
*> IMDS XML Class Parameters
*>
*> These parameters are used to call the IMDS XML Encode/Decode
*> Class. See comments in NFSXEN.cob for details on how to
*> utilize this class.
*>________________________________________________________________

01 W405-XML-Class-Parameters.
*> METHOD
*> This tells the class what method to perform.
12 Pic X(01).
88 W405-Encode Value "E".
88 W405-Decode Value "D".
*> STRING
*> This is the text which is given on input, and returned,
*> converted, on output
12 W405-String Pic X(500).
*> OUTPUT LENGTH
*> This is filled with the length of the converted string
12 W405-Length Pic 9(03).

End W405-XML-Parameters-WSA.

=-=-=-=-= End of Source Code =-=-=-=-=

Since this was written for the Unisys 2200, here are a few notes about things that may or may not be standard COBOL...

- It is designed to work with ASCII only, characters 0-255.

- "is Global" makes the variables visible from the embedded subprograms.

- "Pic 9(10) Binary" is a 4x9-bit word. Defining variables this way is efficient in that environment, but may need some tweaks for a 2x8-bit structure.

- The "Proc" and "End" statements in W405 are needed to allow the copybook to be catalogued. If they don't work in your environment, they can be removed.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ / \/ _ o ~ Live from Albuquerque, NM! ~
~ _ /\ | ~ ~
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
~ Business E-mail ~ daniel @ "Business Website" below ~
~ Business Website ~ http://www.djs-consulting.com ~
~ Tech Blog ~ http://www.djs-consulting.com/linux/blog ~
~ Personal E-mail ~ "Personal Blog" as e-mail address ~
~ Personal Blog ~ http://daniel.summershome.org ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

GEEKCODE 3.12 GCS/IT d s-:+ a C++ L++ E--- W++ N++ o? K- w$ !O M--
V PS+ PE++ Y? !PGP t+ 5? X+ R* tv b+ DI++ D+ G- e h---- r+++ z++++

"Who is more irrational? A man who believes in a God he doesn't see,
or a man who's offended by a God he doesn't believe in?" - Brad Stine
.



Relevant Pages

  • Re: Web Services/XML Parser Error
    ... *> Length of the input string ... *>* Encode the input string with XML character entities ... Subtract 1 from WS-Char-Code ... Add 2 to WS-Work-Idx ...
    (comp.lang.cobol)
  • RfD: Escaped Strings version 4
    ... the S" string can only contain printable characters, ... the S" string cannot contain the '"' character, ... as an escape character for the entry of characters that cannot be ... \b BS (backspace, ASCII 8) ...
    (comp.lang.forth)
  • RfD: Escaped Strings version 4
    ... the S" string can only contain printable characters, ... the S" string cannot contain the '"' character, ... as an escape character for the entry of characters that cannot be ... \b BS (backspace, ASCII 8) ...
    (comp.lang.forth)
  • Re: RfD: Escaped Strings
    ... the S" string can only contain printable characters, ... the S" string cannot contain the '"' character, ... \b BS (backspace, ASCII 8) ... \ ** escapes to characters much as C does. ...
    (comp.lang.forth)
  • Re: A note on computing thugs and coding bums
    ... code is valid for any character set that is legal in C (which is a ... characters in the required source character set ... A String, in C Sharp or Java, can be redefined. ... allow programmers to handle some other data format, ...
    (comp.programming)