Re: Substring



On Mon, 18 May 2009 18:40:11 -0700, Mark Space <markspace@xxxxxxxxxxxxxx> wrote:

> Lew wrote:
>
>> But it's the exact same array as the one pointed to by 'test', so it's not doing any harm, and it's apparently faster in that it saves allocation of a new array.
>> Let's see, no harm, some benefit. I don't see a problem.


> In this brief example, sure. But what if the "test" string is large and acquired some other way besides a program constant? Let's say read in as part of parsing a large text file, or downloaded from the network?
>
> I don't think it takes too much imagination to see the behavior of substring() in this case as a memory leak. [...]

I do.

I doubt Java chooses this implementation arbitrarily. A common scenario is in fact taking an initial string and chopping it into smaller pieces. It's more efficient to do this using a single common source array and just maintaining indices into the array, both in terms of memory usage and in terms of speed.
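To make the "single source plus indices" idea concrete, here is a minimal sketch of my own (not how java.lang.String is written): a tokenizer that records each token as a start/end pair over one shared source string and copies characters only when a caller asks for them. The class `IndexTokenizer` and its `Span` and `split` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexTokenizer {
    // A token is only a pair of indices into the shared source string.
    static final class Span {
        final int start; // inclusive
        final int end;   // exclusive

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        // Materialize the characters only when a caller actually needs them.
        String text(String source) {
            return source.substring(start, end);
        }
    }

    // Split on a single delimiter, skipping empty tokens, without creating
    // any per-token String objects.
    static List<Span> split(String source, char delim) {
        List<Span> spans = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= source.length(); i++) {
            if (i == source.length() || source.charAt(i) == delim) {
                if (i > start) {
                    spans.add(new Span(start, i));
                }
                start = i + 1;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String line = "GET /index.html HTTP/1.1";
        for (Span s : split(line, ' ')) {
            System.out.println(s.start + ".." + s.end + " " + s.text(line));
        }
    }
}
```

Chopping the line produces three small index pairs and no copies of the text. That is the memory-and-speed argument in miniature.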

There may well be scenarios in which one starts with a large string, and then retains just a single tiny subset of that string, but I doubt they are all that common. And even there, in most cases the retention of the original string isn't going to be a problem.
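A rough model of the layout under discussion, assuming the pre-Java 7u6 design in which a String held a shared char[] plus an offset and a count. `Slice`, `sub()` and `retainedChars()` are hypothetical names, not JDK API. The sketch shows the "large string, tiny retained subset" case: a six-character slice keeps the whole million-element array reachable. (Since Java 7 update 6, the real substring() copies the characters, so java.lang.String no longer behaves this way.)

```java
import java.util.Arrays;

public class SliceRetention {
    // Hypothetical model of the pre-Java 7u6 String layout: a shared
    // char[] plus an offset and a count. This is NOT java.lang.String.
    static final class Slice {
        private final char[] value; // backing array, shared by all slices
        private final int offset;   // index of the first visible char in value
        private final int count;    // number of visible chars

        private Slice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        static Slice of(String s) {
            return new Slice(s.toCharArray(), 0, s.length());
        }

        // O(1) "substring": new indices, same backing array, nothing copied.
        Slice sub(int begin, int end) {
            if (begin < 0 || end > count || begin > end) {
                throw new IndexOutOfBoundsException(begin + ".." + end);
            }
            return new Slice(value, offset + begin, end - begin);
        }

        int length() { return count; }

        // Chars that stay reachable for as long as this slice is reachable.
        int retainedChars() { return value.length; }

        @Override
        public String toString() { return new String(value, offset, count); }
    }

    public static void main(String[] args) {
        // A large "document" whose only interesting part is a short header.
        char[] data = new char[1_000_000];
        Arrays.fill(data, 'x');
        "HEADER".getChars(0, 6, data, 0);

        Slice header = Slice.of(new String(data)).sub(0, 6);
        System.out.println(header + " " + header.length() + " " + header.retainedChars());
        // prints: HEADER 6 1000000
    }
}
```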

In the scenario where it _is_ a problem, someone will have determined that through proper measurement, and then can state unequivocally that a work-around is needed. But, as you yourself pointed out, Java does in fact offer a reasonable work-around: instantiate a new String instance, passing the substring to the constructor.
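The work-around described above looks like this in practice. `keepPrefix` is a hypothetical helper; `substring()` and the `String(String)` constructor are the standard JDK methods. As I understand it, the JDKs of that era had `String(String)` copy just the needed characters when its argument was a slice of a larger array, which let the big array be collected. On JDKs since 7u6, substring() already returns a copy, so the extra `new String(...)` is harmless but redundant.

```java
public class DetachSubstring {
    // Hypothetical helper: keep only the first n chars of a possibly huge string.
    static String keepPrefix(String big, int n) {
        // Copy the substring into a fresh String so the result no longer
        // depends on big's storage (relevant on pre-7u6 JDKs only).
        return new String(big.substring(0, n));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("ID=42;");
        for (int i = 0; i < 100_000; i++) {
            sb.append('x');
        }
        String big = sb.toString();
        String id = keepPrefix(big, 5);
        System.out.println(id); // prints: ID=42
    }
}
```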

Worrying about the shared array is a premature optimization at best, and ignores an important "common case" optimization at worst.

[...]
> Obviously, one doesn't want to create new strings for no good reason. Computer programmers often have to choose between fast algorithms that use more memory (as substring() does) and slower algorithms that conserve memory (as String(String) does).

That's a false dichotomy, since most of the time, the fast algorithm also uses less memory.

> Selecting which one to use can be a dark art, but one should be aware that there is a choice and that different optimizations are available (and I'm thinking primarily of the OP here, who still seems a little unclear about basics like references and how they work).

No disrespect intended to the OP, but a person who is still at this phase in their learning is a long way off from caring about this level of performance optimization, even in the situations where it might be applicable.

Pete