Structures in Assembly Language

From: Randall Hyde (randyhyde_at_earthlink.net)
Date: 08/18/04


Date: Wed, 18 Aug 2004 15:24:33 GMT

Hi All,

I've read several posts concerning structures and their
implementation in assembly language. Given some
misconceptions about structures in assembly language,
I pieced together the following article about structures
in assembly.
Cheers,
Randy Hyde

Structures in Assembly Language Programs

Structures, or records, are an abstract data type
that allows a programmer to collect different
objects together into a single, com- posite,
object. Structures can help make programs easier
to read, write, modify, and maintain. Used
appropriately, they can also help your programs
run faster. Despite the advantages that structures
offer, their appearance in assembly language is a
relatively recent phenomenon (in the past two
decades, or so), and many assemblers still do not
support this facility. Furthermore, many
"old-timer" assembly language programmers attempt
to argue that the appear- ance of records violates
the whole principle of "assembly language
programming." This article will certain refute
such arguments and describe the benefits of using
structures in an assembly language program.

Despite the fact that records have been available
in various assembly languages for years (e.g.,
Microsoft's MASM assembler introduced structures
in 80x86 assembly language in the 1980s), the
"lack of support for structures" is a common
argument against assembly language by HLL
programmers who don't know much about assembly.
In some respects, their ignorance is justified --
many assemblers don't support structures or
records. A second goal of this article is to
educate assembly language programmers to counter
claims like "assembly language doesn't support
struc- tures." Hopefully, that same education will
convince those assem- bly language programmers
who've never bothered to use structures, to
consider their use.

This article will use the term "record" to denote
a struc- ture/record to avoid confusion with the
more general term "data structure". Note,
however, that the terms "record" and "structure"
are synonymous in this article.

 What is a Record (Structure)?

The whole purpose of a record is to let you
encapsulate differ- ent, but logically related,
data into a single package. Here is a typi- cal
record declaration, in HLA using the RECORD /
ENDRECORD declaration:

type

   student:

            record

               Name: string;
               Major: int16;
               SSN: char[12];
               Midterm1: int16;
               Midterm2: int16;
               Final: int16;
               Homework: int16;
               Projects: int16;

            endrecord;

The field names within the record must be unique.
That is, the same name may not appear two or more
times in the same record. However, in reasonable
assemblers (like HLA) that support true
structures, all the field names are local to that
record. With such assemblers, you may reuse those
field names elsewhere in the pro- gram.

The RECORD/ENDRECORD type declaration may appear
in a variable declaration section (e.g., an HLA
STATIC or VAR section) or in a TYPE declaration
section. In the previous example the Stu- dent
declaration appears in an HLA TYPE section, so
this does not actually allocate any storage for a
Student variable. Instead, you have to
explicitly declare a variable of type Student.
The following example demonstrates how to do
this:

var

 John: Student;

This allocates 28 bytes of storage: four bytes for
the Name field (HLA strings are four-byte
pointers to character data found else- where in
memory), 12 bytes for the SSN field, and two bytes
for each of the other six fields.

If the label John corresponds to the base address
of this record, then the Name field is at offset
John+0, the Major field is at offset John+4, the
SSN field is at offset John+6, etc.

To access an element of a structure you need to
know the offset from the beginning of the
structure to the desired field. For exam- ple, the
Major field in the variable John is at offset 4
from the base address of John. Therefore, you
could store the value in AX into this field using
the instruction

mov( ax, (type word John[4]) );

Unfortunately, memorizing all the offsets to
fields in a record defeats the whole purpose of
using them in the first place. After all, if
you've got to deal with these numeric offsets why
not just use an array of bytes instead of a
record?

Well, as it turns out, assemblers like HLA that
support true records commonly let you refer to
field names in a record using the same mechanism
C/C++ and Pascal use: the dot operator. To store
AX into the Major field, you could use "mov( ax,
John.Major );" instead of the previous
instruction. This is much more readable and
certainly easier to use.

 Record Constants

HLA lets you define record constants. In fact,
HLA is probably unique among x86 assemblers
insofar as it supports both symbolic record
constants and literal record constants. Record
constants are useful as initializers for static
record variables. They are also quite useful as
compile-time data structures when using the HLA
com- pile-time language (that is, the macro
processor language). This section discusses how
to create record constants.

A record literal constant takes the following form:

RecordTypeName:[ List_of_comma_separated_constants ]

The RecordTypeName is the name of a record data
type you've defined in an HLA TYPE section prior
to this point. To create a record constant you
must have previously defined the record type in a
TYPE section of your program.

The constant list appearing between the brackets
are the data items for each of the fields in the
specified record. The first item in the list
corresponds to the first field of the record, the
second item in the list corresponds to the second
field, etc. The data types of each of the
constants appearing in this list must match their
respective field types. The following example
demonstrates how to use a lit- eral record
constant to initialize a record variable:

type

   point:
         record

            x:int32;
            y:int32;
            z:int32;

         endrecord;

static

   Vector: point := point:[ 1, -2, 3 ];

This declaration initializes Vector.x with 1,
Vector.y with -2, and Vector.z with 3.

You can also create symbolic record constants by
declaring record objects in the CONST or VAL
sections of an HLA program. You access fields of
these symbolic record constants just as you would
access the field of a record variable, using the
dot operator. Since the object is a constant,
you can specify the field of a record constant
anywhere a constant of that field's type is legal.
You can also employ symbolic record constants as
record variable initializ- ers. The following
example demonstrates this:

type

   point:

         record

            x:int32;
            y:int32;
            z:int32;

         endrecord;

const

   PointInSpace: point := point:[ 1, 2, 3 ];

static

   Vector: point := PointInSpace;
   XCoord: int32 := PointInSpace.x;

 Arrays of Records

It is a perfectly reasonable operation to create
an array of records. To do so, you simply create
a record type and then use the standard array
declaration syntax when declaring an array of that
 record type. The following example demonstrates
 how you could do this:

type

   recElement:

      record

         << fields for this record >>

      endrecord;

      .
      .
      .

static

   recArray: recElement[4];

Naturally, you can create multidimensional arrays
of records as well. You would use the standard
row or column major order func- tions to compute
the address of an element within such records.
The only thing that really changes (from the
discussion of arrays) is that the size of each
element is the size of the record object.

static

   rec2D: recElement[ 4, 6 ];

 Arrays and Records as Record Fields

Records may contain other records or arrays as
fields. Consider the following definition:

type

   Pixel:

      record

         Pt: point;
         color: dword;

      endrecord;

The definition above defines a single point with a
32 bit color component. When initializing an
object of type Pixel, the first ini- tializer
corresponds to the Pt field, not the x-coordinate
field. The following definition is incorrect:

static

    ThisPt: Pixel := Pixel:[ 5, 10 ]; // Syntactically incorrect!

The value of the first field ('5') is not an
object of type point. Therefore, the assembler
generates an error when encountering this
statement. HLA will allow you to initialize the
fields of Pixel using declarations like the
following:

static

    ThisPt: Pixel := Pixel:[ point:[ 1, 2, 3 ], 10 ];
    ThatPt: Pixel := Pixel:[ point:[ 0, 0, 0 ], 5 ];

Accessing Pixel fields is very easy. Like a high
level language you use a single period to
reference the Pt field and a second period to
access the x, y, and z fields of point:

      stdout.put( "ThisPt.Pt.x = ", ThisPt.Pt.x,
   nl );

      stdout.put( "ThisPt.Pt.y = ", ThisPt.Pt.y,
   nl );

      stdout.put( "ThisPt.Pt.z = ", ThisPt.Pt.z,
   nl );

       .

       .

       .

   mov( eax, ThisPt.Color );

You can also declare arrays as record fields. The
following record creates a data type capable of
representing an object with eight points (e.g., a
cube):

type

   Object8:

      record

         Pts: point[8];
         Color: dword;

      endrecord;

There are two common ways to nest record
definitions. As noted earlier in this section,
you can create a record type in a TYPE section
and then use that type name as the data type of
some field within a record (e.g., the Pt:point
field in the Pixel data type above). It is also
possible to declare a record directly within
another record without creating a separate data
type for that record; the following example
demonstrates this:

type

   NestedRecs:

      record

         iField: int32;
         sField: string;
         rField:
            record
               i:int32;
               u:uns32;
            endrecord;
         cField:char;

      endrecord;

Generally, it's a better idea to create a separate
type rather than embed records directly in other
records, but nesting them is per- fectly legal and
a reasonable thing to do on occasion.

 Controlling Field Offsets Within a Record

By default, whenever you create a record, most
assemblers automatically assign the offset zero
to the first field of that record. This
corresponds to records in a high level language
and is the intu- itive default condition. In some
instances, however, you may want to assign a
different starting offset to the first field of
the record. The HLA assembler provides a
mechanism that lets you set the starting offset
of the first field in the record.

The syntax to set the first offset is

name:

    record := startingOffset;

        << Record Field Declarations >>

    endrecord;

Using the syntax above, the first field will have
the starting off- set specified by the
startingOffset int32 constant expression. Since
this is an int32 value, the starting offset value
can be positive, zero, or negative.

One circumstance where this feature is invaluable
is when you have a record whose base address is
actually somewhere within the data structure.
The classic example is an HLA string. An HLA
string uses a record declaration similar to the
following:

   record

      MaxStrLen: dword;
      length: dword;
      charData: char[xxxx];

   endrecord;

However, HLA string pointers do not contain the
address of the MaxStrLen field; they point at
the charData field. The str.strRec record type
found in the HLA Standard Library Strings module
uses a record declaration similar to the
following:

type

   strRec:
      record := -8;
         MaxStrLen: dword;
         length: dword;
         charData: char;
      endrecord;

The starting offset for the MaxStrLen field is -8.
Therefore, the offset for the length field is -4
(four bytes later) and the offset for the
charData field is zero. Therefore, if EBX points
at some string data, then "(type str.strRec
[ebx]).length" is equivalent to "[ebx-4]" since
the length field has an offset of -4.

 Aligning Fields Within a Record

To achieve maximum performance in your programs,
or to ensure that your records properly map to
records or structures in some high level
language, you will often need to be able to
control the alignment of fields within a record.
For example, you might want to ensure that a
dword field's offset is an even multiple of four.
You use the ALIGN directive in a record
declaration to do this. The following example
shows how to align some fields on important
boundaries:

type

   PaddedRecord:
      record
         c: char;
            align(4);
         d: dword;
         b: boolean;
            align(2);
         w: word;
      endrecord;

Whenever HLA encounters the ALIGN directive within
a record declaration, it automatically adjusts
the following field's off- set so that it is an
even multiple of the value the ALIGN directive
specifies. It accomplishes this by increasing the
offset of that field, if necessary. In the
example above, the fields would have the fol-
lowing offsets: c:0, d:4, b:8, w:10.

If you want to ensure that the record's size is a
multiple of some value, then simply stick an ALIGN
directive as the last item in the record
declaration. HLA will emit an appropriate number
of bytes of padding at the end of the record to
fill it in to the appropriate size. The
following example demonstrates how to ensure that
the record's size is a multiple of four bytes:

type

   PaddedRec:
      record
         << some field declarations >>

         align(4);

      endrecord;

Be aware of the fact that the ALIGN directive in a
RECORD only aligns fields in memory if the record
object itself is aligned on an appropriate
boundary. Therefore, you must ensure appropriate
 alignment of any record variable whose fields
 you're assuming are aligned.

If you want to ensure that all fields are
appropriately aligned on some boundary within a
record, but you don't want to have to man- ually
insert ALIGN directives throughout the record, HLA
provides a second alignment option to solve your
problem. Consider the fol- lowing syntax:

type

  alignedRecord3 :
    record[4]

      << Set of fields >>

    endrecord;

The "[4]" immediately following the RECORD
reserved word tells HLA to start all fields in
the record at offsets that are multiples of four,
regardless of the object's size (and the size of
the objects preceeding the field). HLA allows any
integer expression that pro- duces a value in the
range 1..4096 inside these parenthesis. If you
specify the value one (which is the default), then
all fields are packed (aligned on a byte
boundary). For values greater than one, HLA will
align each field of the record on the specified
boundary. For arrays, HLA will align the field
on a boundary that is a multiple of the array
element's size. The maximum boundary HLA will
round any field to is a multiple of 4096 bytes.

Note that if you set the record alignment using
this syntactical form, any ALIGN directive you
supply in the record may not pro- duce the desired
results. When HLA sees an ALIGN directive in a
record that is using field alignment, HLA will
first align the current offset to the value
specified by ALIGN and then align the next
field's offset to the global record align value.

Nested record declarations may specify a different
alignment value than the enclosing record, e.g.,

type

   alignedRecord4 : record[4]
      a:byte;
      b:byte;
      c:record[8]
         d:byte;
         e:byte;
      endrecord;
      f:byte;
      g:byte;
   endrecord;

In this example, HLA aligns fields a, b, f, and g
on dword bound- aries, it aligns d and e (within
c) on eight-byte boundaries. Note that the
alignment of the fields in the nested record is
true only within that nested record. That is, if
c turns out to be aligned on some boundary other
than an eight-byte boundary, then d and e will
not actually be on eight-byte boundaries; they
will, however be on eight-byte boundaries
relative to the start of c.

In addition to letting you specify a fixed
alignment value, HLA also lets you specify a
minimum and maximum alignment value for a record.
The syntax for this is the following:

type

   recordname : record[maximum : minimum]

      << fields >>

   endrecord;

Whenever you specify a maximum and minimum value
as above, HLA will align all fields on a boundary
that is at least the minimum alignment value.
However, if the object's size is greater than the
minimum value but less than or equal to the
maximum value, then HLA will align that
particular field on a boundary that is a multiple
of the object's size. If the object's size is
greater than the maximum size, then HLA will align
the object on a boundary that is a multiple of
the maximum size. As an example, consider the
fol- lowing record:

type

   r: record[ 4:1 ];
      a:byte; // offset 0
      b:word; // offset 2
      c:byte; // offset 4
      d:dword[2]; // offset 8
      e:byte; // offset 16
      f:byte; // offset 17
      g:qword; // offset 20
   endrecord;

Note that HLA aligns g on a dword boundary (not
qword, which would be offset 24) since the
maximum alignment size is four. Note that since
the minimum size is one, HLA allows the f field to
 be aligned on an odd boundary (since it's a
 byte).

If an array, record, or union field appears within
a record, then HLA uses the size of an array
element or the largest field of the record or
union to determine the alignment size. That is,
HLA will align the field without the outermost
record on a boundary that is compatible with the
size of the largest element of the nested array,
union, or record.

HLA sophisticated record alignment facilities let
you specify record field alignments that match
that used by most major high level language
compilers. This lets you easily access data types
 used in those HLLs without resorting to inserting
 lots of ALIGN directives inside the record.

 Using Records/Structures in an Assembly
Language Program

In the "good old days" assembly language
programmers typi- cally ignored records. Records
and structures were treated as unwanted
stepchildren from high-level languages, that
weren't nec- essary in "real" assembly language
programs. Manually counting offsets and
hand-coding literal constant offsets from a base
address was the way "real" programmers wrote code
in early PC applica- tions. Unfortunately for
those "real programmers", the advent of
sophisticated operating systems like Windows and
Linux put an end to that nonsense. Today, it is
very difficult to avoid using records in modern
applications because too many API functions
require their use. If you look at typical Windows
and Linux include files for C or assembly
language, you'll find hundreds of different
structure dec- larations, many of which have
dozens of different members. Attempting to keep
track of all the field offsets in all of these
struc- tures is out of the question. Worse,
between various releases of an operating system
(e.g., Linux), some structures have been known to
change, thus exacerbating the problem. Today, it's
unreasonable to expect an assembly language
programmer to manually track such offsets - most
programmers have the reasonable expectation that
the assembler will provide this facility for
them.

 Implementing Structures in an Assembler

Unfortunately, properly implementing structures in
an assem- bler takes considerable effort. A large
number of the "hobby" (i.e., non-commercial)
assemblers were not designed from the start to
support sophisticated features such as
records/structures. The sym- bol table management
routines in most assemblers use a "flat" lay- out,
with all of the symbols appearing at the same
level in the symbol table database. To properly
support structures or records, you need a
hierarchical structure in your symbol table
database. The bad news is that it's quite
difficult to retrofit a hierarchical structure
over the top of a flat database (i.e., the symbol
"hobby assembler" symbol table). Therefore,
unless the assembler was originally designed to
handle structures properly, the result is usu-
ally a major hacked-up kludge.

Four assemblers I'm aware of, MASM, TASM, OPTASM,
and HLA, handle structures well. Most other
assemblers are still trying to simulate
structures using a flat symbol table database,
with vary- ing results.

Probably the first attempt people make at records,
when their assembler doesn't support them
properly, is to create a list of con- stant
symbols that specify the offsets into the record.
Returning to our first example (in HLA):

type

   student:
            record
               Name: string;
               Major: int16;
               SSN: char[12];
               Midterm1: int16;
               Midterm2: int16;
               Final: int16;
               Homework: int16;
               Projects: int16;
            endrecord;

One attempt might be the following:

const
    Name := 0;
    Major := 4;
    SSN := 6;
    Midterm1 := 18;
    Midterm2 := 20;
    Final := 22;
    Homework := 24;
    Projects := 26;
    size_student := 28;

With such a set of declarations, you could reserve
space for a student "record" by reserving
"size_student" bytes of storage (which almost all
assemblers handle okay) and then you can access
fields of the record by adding the constant offset
to your base address, e.g.,

static

    John : byte[ size_student ];
      .
      .
      .
    mov( John[Midterm1], ax );

There are several problems with this approach.
First of all, the field names are global and must
be globally unique. That is, you cannot have two
record types that have the same fieldname (as is
possible with the assembler supports true
records). The second problem, which is
fundamentally more problematic, is the fact that
you can attach these constant offsets to any
object, not just a "stu- dent record" type object.
For example, suppose "ClassAverage" is an array
of words, there is nothing stopping you from
writing the following when using constant equate
values to simulate record off- sets:

 mov( ClassAverage[ Midterm1 ], ax );

Finally, and probably the most damning criticism
of this approach, is that it is very difficult to
maintain code that accesses structures in this
manner. Inserting fields into the middle of a
record, changing data types, and coming up with
globally unique names can create all sorts of
problems. Many high-level language programmers
who've tried to learn assembly language have given
up after discovering that they had to maintain
records in this fashion in an assembly language
program (too bad they didn't start off with a
reasonable assembler that properly supports
structures).

Manually maintaining all the constant offsets is a
maintenance nightmare. So somewhere along the
way, some assembly language programmers figured
out that they could write macros to handle the
declaration of constant offsets for them. For
example, here's how you could do this in an HLA
program:

program t;

#macro struct( _structName_, _dcls_[] ):
    _dcl_, _id_, _type_, _colon_, _offset_;

    ?_offset_ := 0;
    ?_dcl_:string;
    #for( _dcl_ in _dcls_ )

        ?_colon_ := @index( _dcl_ , 0, ":" );
        #if( _colon_ = -1 )

            #error
            (
                "Expected <id>:<type> in struct "
                "definition, encountered: ",
                _dcl_
            )

        #else

            ?_id_ := @substr( _dcl_, 0, _colon_ );
            ?_type_ :=
                  @substr
                   (
                        _dcl_,
                        @length( _dcl_ ) - _colon_
                   );
            ?@text( _id_ ) := _offset_;
            ?_offset_ :=
                 _offset_ + @size( @text( _type_ ));

        #endif;

    #endfor
    ?_structName_:text :=
           "byte[" + @string( _offset_ ) + "]";

#endmacro

struct( threeItems, i:byte, j:word, k:dword )

static
    aStruct: threeItems;

begin t;

    mov( (type byte aStruct[i]), al );
    mov( (type word aStruct[j]), ax );
    mov( (type dword aStruct[k]), eax );

end t;

The "struct" macro expects a set of valid HLA
variable declara- tions supplied as macro
arguments. It generates a set of constants using
the supplied variable names whose offsets are
adjusted according to the size of the objects
previously appearing in the list. In this
example, HLA creates the following equates:

  i = 0
  j = 1
  k = 3

This declaration also creates a "data type" named
"threeItems" which is equivalent to "byte[7]"
(since there are seven bytes in this record) that
you may use to create variables of type
"threeItems", as is done in this example.

Creating structures with macros solves one of the
three major problems: it makes it easier to
maintain the constant equates list, as you do not
have to manually adjust all the constants when
inserting and removing fields in a record. This
does not, however, solve the other problems
(particularly, the global identifier problem).

While fancier macros could be written, macros that
generate identifiers like "objectname_fieldName"
that help solve the globally unique problem, the
bottom line is that these hacks begin to fail
when you attempt to declare nested records, arrays
within records, and arrays of records (possibly
containing nested records and arrays of records).
The bottom line is this: assemblers that don't
properly support structures are going to have
problems when you've got to work with data
structures from high-level languages (e.g., OS API
 calls, where the OS is written in C, such as
 Windows and Linux). You're much better off using
 an assembler that fully supports struc- tures
 (and other advanced data types) if you need to
 use structures in your programs.



Relevant Pages

  • Why people perceive ASM to be hard to read
    ... > Its much the same comment as I made to Robert, horses for courses, ... because most people perceive that assembly language programs are *hard* ... assemblers that attempt to simulate these features. ... existed because *most* assembly language programmers weren't using ...
    (alt.lang.asm)
  • Re: Hey Mr. Hyde!
    ... fewer and fewer students get introduced to assembly language ... programming and, in fact, wind up with four years' of instructors telling ... It really isn't Randy Hyde, Universities, OSes, or high-level assemblers ...
    (alt.lang.asm)
  • Re: Ten years later
    ... assembly language source code should be written in a completely ... would hope that the influence of a HLL-like programming style will ... assemblers around here. ... The same can be said for FASM. ...
    (alt.lang.asm)
  • Re: Ten years later
    ... assembly language source code should be written in a completely ... Most students give up ASM right after the complete the course. ... assemblers around here. ... HLA supports), then I guess we've got to strike NASM, FASM, and Gas ...
    (alt.lang.asm)