Re: String filtering



David Trudgett wrote:


Good.  It makes the String and Unbounded_String versions practically
equivalent - probably both in CPU and memory use.


Much of a muchness, I would guess. Profiling particular applications
on particular compilers is the only way to tell for sure, though.

I got some figures. As expected, String is always faster than Unbounded_String. Perhaps surprisingly, Vector is somewhat faster than Unbounded_String in all cases, provided inlining is enabled (-gnatn); without it, Vector is slower than Unbounded_String in the two larger tests. "Heap" means the String objects were allocated using new. The compiler is GCC 4.1 on GNU/Linux x86.

-O2 -gnatn -gnato:

1. iteration, 10 chars, 1000000 runs.
Fixed:      2.084710000
Heap:       1.498224000
Unbounded:  7.608056000
Vector:     5.686385000
2. iteration, 10000 chars, 1000 runs.
Fixed:      0.421747000
Heap:       0.477814000
Unbounded:  0.787875000
Vector:     0.515643000
3. iteration, 1000000 chars, 10 runs.
Fixed:      0.560290000
Heap:       0.622039000
Unbounded:  1.137758000
Vector:     0.917281000

-O2 -gnato:

1. iteration, 10 chars, 1000000 runs.
Fixed:      1.730108000
Heap:       1.604875000
Unbounded:  7.659804000
Vector:     6.483596000
2. iteration, 10000 chars, 1000 runs.
Fixed:      0.510872000
Heap:       0.566339000
Unbounded:  0.872703000
Vector:     1.044757000
3. iteration, 1000000 chars, 10 runs.
Fixed:      0.650525000
Heap:       0.710203000
Unbounded:  1.213516000
Vector:     1.437887000
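
The timing harness itself was not posted. As a rough sketch only, figures like the "Fixed" row of test 1 could be collected along the following lines. Everything here is an assumption for illustration: the procedure name Time_Fixed, the stand-in filter Strip, the set Keep, the sample input, and the choice of Ada.Real_Time for the clock.

```ada
with Ada.Real_Time;
with Ada.Strings.Maps;
with Ada.Strings.Maps.Constants;
with Ada.Text_IO;

procedure Time_Fixed is

   use Ada.Real_Time, Ada.Strings.Maps;

   --  Assumed filter set: Latin-1 letters and digits, plus space.
   Keep : constant Character_Set :=
     Ada.Strings.Maps.Constants.Alphanumeric_Set or To_Set (' ');

   --  Hypothetical stand-in for the fixed String filter under test.
   function Strip (Str : in String) return String is
      Result : String (1 .. Str'Length);
      Last   : Natural := 0;
   begin
      for I in Str'Range loop
         if Is_In (Str (I), Keep) then
            Last := Last + 1;
            Result (Last) := Str (I);
         end if;
      end loop;
      return Result (1 .. Last);
   end Strip;

   Runs  : constant := 1_000_000;
   Input : constant String := "ab, cd! 12";   --  10 characters
   Total : Natural := 0;                      --  keeps the calls live
   Start : Time;

begin
   Start := Clock;
   for Run in 1 .. Runs loop
      declare
         Result : constant String := Strip (Input);
      begin
         Total := Total + Result'Length;
      end;
   end loop;
   Ada.Text_IO.Put_Line
     ("Fixed:" & Duration'Image (To_Duration (Clock - Start)));
   Ada.Text_IO.Put_Line ("Kept characters:" & Natural'Image (Total));
end Time_Fixed;
```

Accumulating the result lengths in Total is there so that the optimizer cannot discard the calls at -O2. The other rows would need the same loop around the heap, Unbounded_String and Vector variants.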


The Vector function uses Vec_String in place of Unbounded_String, where subtype Vec_String is Character_Vectors.Vector:


function Strip_Non_Alphanumeric (Str: in Vec_String) return Vec_String is
   use Character_Vectors, Ada.Containers;

   Dest_Char: Index_Subtype'Base := 0;   -- index of the last element written
   New_Str: Vec_String;
   Dest_Size: constant Count_Type := Length(Str);

begin
   if Dest_Size > 0 then
      -- Preallocate room for the worst case: every character is kept.
      New_Str := To_Vector(Dest_Size);
      for Src_Char in 1 .. Last_Index(Str) loop
         if Is_In(Element(Str, Src_Char), Alpha_Num_Space_Set) then
            Dest_Char := Dest_Char + 1;
            Replace_Element
              (New_Str, Dest_Char, Element(Str, Src_Char));
         end if;
      end loop;
      -- Drop the unused tail; otherwise the result keeps the source length.
      Set_Length(New_Str, Count_Type(Dest_Char));
   end if;
   return New_Str;
end Strip_Non_Alphanumeric;
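
The function relies on declarations that are not shown in the post. A minimal, self-contained sketch of what they might look like follows. The instantiation with Positive as the index type, the definition of Alpha_Num_Space_Set, and the helpers To_Vec and To_Str are assumptions made for illustration. The function body here uses the standard Extended_Index and No_Index in place of Index_Subtype, and trims the result with Set_Length so that the returned vector holds only the kept characters.

```ada
with Ada.Containers.Vectors;
with Ada.Strings.Maps;
with Ada.Strings.Maps.Constants;
with Ada.Text_IO;

procedure Vec_String_Demo is

   use Ada.Strings.Maps;

   --  Assumption: Character elements indexed by Positive.
   package Character_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Character);

   subtype Vec_String is Character_Vectors.Vector;

   --  Assumption: Latin-1 letters and digits, plus the space character.
   Alpha_Num_Space_Set : constant Character_Set :=
     Ada.Strings.Maps.Constants.Alphanumeric_Set or To_Set (' ');

   function Strip_Non_Alphanumeric (Str : in Vec_String) return Vec_String is
      use Character_Vectors, Ada.Containers;
      Dest_Char : Extended_Index := No_Index;   --  last index written
      New_Str   : Vec_String;
      Dest_Size : constant Count_Type := Length (Str);
   begin
      if Dest_Size > 0 then
         New_Str := To_Vector (Dest_Size);
         for Src_Char in First_Index (Str) .. Last_Index (Str) loop
            if Is_In (Element (Str, Src_Char), Alpha_Num_Space_Set) then
               Dest_Char := Dest_Char + 1;
               Replace_Element (New_Str, Dest_Char, Element (Str, Src_Char));
            end if;
         end loop;
         --  Keep only the elements that were actually written.
         Set_Length (New_Str, Count_Type (Dest_Char));
      end if;
      return New_Str;
   end Strip_Non_Alphanumeric;

   --  Hypothetical helper: copy a String into a Vec_String.
   function To_Vec (S : String) return Vec_String is
      V : Vec_String;
   begin
      for I in S'Range loop
         Character_Vectors.Append (V, S (I));
      end loop;
      return V;
   end To_Vec;

   --  Hypothetical helper: copy a Vec_String back into a String.
   function To_Str (V : Vec_String) return String is
      S : String (1 .. Natural (Character_Vectors.Length (V)));
   begin
      for I in S'Range loop
         S (I) := Character_Vectors.Element (V, I);
      end loop;
      return S;
   end To_Str;

begin
   Ada.Text_IO.Put_Line
     (To_Str (Strip_Non_Alphanumeric (To_Vec ("Hello, World! 123"))));
end Vec_String_Demo;
```

Under these assumptions the program prints Hello World 123: the comma and the exclamation mark are removed, and the spaces stay because they are in the set.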