TCollection_AsciiString memory leaks

With next code I have 60 Mb of licked memory:
TCollection_AsciiString aString;

for (int i = 0; i {

aString += "Tested value";
}

but if I'm use with next, there is no any problem

for (int i = 0; i {
TCollection_AsciiString aString;
aString += "Tested value";
}

Also I'm test next with std::string and QString and they working without leaks, it's a problem of TCollection_AsciiString

JuryS's picture

Here is some big problem with XmlMDataStd_ByteArrayDriver, where use TCollection_AsciiString. And in all another classes

Roman Lygin's picture

Hi Yuriy,

What makes you believe there is a leak ?
Can this help to double-check - http://opencascade.blogspot.com/2011/06/is-my-memory-leaking-part3.html ?

AFAIR, operator+=() uses Standard::Reallocate(), so unlikely are there any leaks. But when using custom allocators (e.g. OCC or TBB), the memory is hoarded by the allocator and will be reused in the future. When using standard allocator (MMGT_OPT=0) the memory is immediately returned back to the system.

In your second case, as all the strings have the same lengths, the same chunk of memory (freed by a destructor) is used over and over again.

HTH,
Roman

JuryS's picture

Hi, I don't try to find any memory leaks, but when I input preview image with XmlMDataStd_ByteArrayDriver I have over 100 Mb memory not released after.... Also in this driver need some functions like itoa, atoi to save byte array data in 16 radix, it's see before:
>137 80 78 71 13 10 26 10 0 0 0 13 73 ...
and after:
>89504e470d0a1a0a0d49484452 ...

I'm now replace all TCollection_AsciiString with char* / std::string, also replace Quantity_Color float/double values with const unsigned char*(0...255 values).

I see that this engine classes is not correct worked, not optimized and that is problem.

Denis Barbier's picture

Hi Yuriy, your code may consume more memory than expected, but your diagnostic looks wrong, it seems very unlikely that TCollection_AsciiString is the culprit. Maybe it could help if you can provide a simple self-contained test case.

That said, TCollection_AsciiString is awful; a less intrusive solution is to change its private member mystring from Standard_PCharacter to std::string and rewrite internal code to drop all hand-crafted hacks and use only std::string functions, that could be useful.

Forum supervisor's picture

Dear Yuriy,
Probably you don't try to find memory leaks, but you report it.
I hope you understand that without publishing of self-contained test
and a method of leaks measuring this information is useless.
Pay your attention, please to the post of Roman above.
It perfectly answer to your case and I hope may help you very much.
Regards

JuryS's picture

Ok !

Try next code:

TCollection_AsciiString aString;

for (int i = 0; i < 100000; ++i)
{

aString += "Tested value";
}

Result : 129.625 seconds.

This will suspend your application for some minutes. And also memory leaks with 60 Mb
I don't need any debugger to see that this instruction is work not correctly.

QTime myTime;
std::string aString;

myTime.start();

for (int i = 0; i < 100000; ++i)
{

aString += "Tested value";
}

double myMlsec = myTime.elapsed(); myMlsec /= 1000;
qDebug() << myMlsec;

And some test with std::string: (!) 0.015 sec

!!! It works faster in 8600 times

JuryS's picture

And it's no user interface, it's Engine classes. Also without any debugger and TBB and other:

qDebug() << sizeof(double);
qDebug() << sizeof(float);
qDebug() << sizeof(unsigned char);

Result:

8
4
1

Which type more effectly to use ???
Which function ? glColor3fv or glColor3ubv ?

It's cool have integer value with Quantity_Color.Name(), but it's distortion original color and materials.

Denis Barbier's picture

Yuriy, please send a full test case to reproduce your problems, not an excerpt. Here is mine, taken from your code, it runs on 0.014s on my machine (of course, it runs faster with std::string), I doubt that my machine is 9000 times faster than yours. And valgrind does not report any memory leak.

Attachments: 
JuryS's picture

Denis, you are really don't have problems with next ?
I make single project with your main code and wait for 3 minutes before the application complete.
Visual C++ 9.0
Now I tried this under Linux and MacOS and reply here.

Thanks.

Denis Barbier's picture

My apologies, you were right. It worked for me on Linux because I have MMGT_OPT set to 0 in my environment. After unsetting this environment variable, my test case takes 32 seconds instead of 0.014s. Now I know why this environment variable is set ;-)

Roman Lygin's picture

Of course, as in this scenario the memory is never released.
The whole point in custom memory allocator is when you regularly reuse deallocated memory. In this isolated scenario this is not the case.

Denis Barbier's picture

I do not get your point. Yuriy did nothing wrong with the memory allocator; his only error was to rely on the standard behaviour instead of disabling it in order to use the system memory allocator.

And this is not only about memory, but also performance. His test case takes 32s on my machine when MMGT_OPT is unset, 15s with "optimized" memory allocator (MMGT_OPT=1) and 0.01s (or less) when using the system memory allocator (MMGT_OPT=0).

I am more and more annoyed by developers who pretend that "my custom code is optimized". Almost always, someone is bitten by a corner case which had not been taken into account, and performances become dramatic. Have a look at Standard_String.hxx for instance. In the 90's it may have been a good idea to write all those macros (STRCPY, EXTSTRINGCOPY, EXTENDEDSTRINGCOPY, etc, with or without OptJr) to speed up string processing, but nowadays strcpy is written in assembly and is much more optimized. I am 100% sure that this hand-crafted code is slower than a simple strcpy in *all* cases. And of course, this is less readable and error-prone.

I do not use OCAF; so Yuriy, if you can provide a simple self-contained test case which uses XmlMDataStd_ByteArrayDriver, I am willing to play with this idea and replace custom code in TCollection_*String classes by std::string to prove my point.

JuryS's picture

Thank you

Denis Barbier's picture

Replying to myself: on his blog, Roman gave a link to an article he wrote on Intel's blog http://software.intel.com/en-us/blogs/2011/04/04/tbb-adoption-in-cad-tec...

I do not doubt that newer OCCT releases fix some performance problems exhibited by these graphs, and one could write another article which explains how the new optimized memory allocator performs much better thanks to benchmarking and microoptimizations.
But my point is that, without any code change, performances are much higher on Windows Server 2008 than on Windows Server 2003. So it is much better for the long term to rely on improvements of standard libraries, rather than writing complicated code which becomes obsolete in few years (and which also exhibits dramatic performances in unexpected corner cases).

Roman Lygin's picture

Denis, Yuriy, all,

I believe we are mixing several issues here and this gets us distracted. So I'd like to clarify. Let's also stick to the problematic test case as posted by Denis.

1. Memory leak as initially claimed. This is a false statement, and everybody agrees, right? No memory is leaked.

2. TCollection_AsciiString to use std::string vs current OCC implementations of 1990-es. That could make sense, indeed. The only thing to keep in mind is that it must rather be basic_string that would accept allocator object. That allocator object shall implement C++ allocator using Standard::Allocate(), Standard::Free(). This is important to preserve flexibility of using arbitrary memory allocator.
Performance testing must be applied to confirm this new implementation will be any more efficient.

3. Performance.
a. With MMGT_OPT=0 (i.e. system allocator) the test behaves 100-1000x faster than with MMGT_OPT=1 (OCC allocator).
I used Intel Parallel Amplifier to understand the root-cause of this huge slow-down. The root-cause is that as of i=3333 (i.e. ~97% of time!) special branch in OCC allocator is used, which is use of mmap() on Linux and CreateFileMapping() on Windows (see Standard_MMgrOpt::AllocMemory()). This imposes huge overhead being called for each call of operator+=().
This could be hint to the OCC team to revisit the work with large blocks.

b. With MMGT_OPT=2 (i.e. TBB allocator) the code behaves ~2x faster than with MMGT_OPT=0. It makes sense to increase n to 1000000 to have meaningful times (mine were ~13sec vs ~25sec). Thus, TBB is a winner in this case ;-).
Windows7, VS2005 SP1.
AFAICT, the reason is efficient work of TBB with large-blocks and their caching, taking benefits of hoarding large blocks and reusing them for numerous iterations before a next larger block is allocated.

4. Custom allocators vs existing system allocators.
I perfectly agree with Denis' vision here in long-term. At some point there must be system allocators that would make attempts to have custom allocators worthless. The only exception would likely be memory regions (allocation of a large block and then deallocation at once). Windows 2008 vs 2003 prove the trend (I intentionally left both charts in the quoted blog post to show the difference). The problem however is that we are far from there yet. OS vendors still do not offer optimal allocators, especially for *threaded* scenarios. Therefore TBB allocator, for instance, is often a winner here. In recent experiments with Salome, TBB gave extra 3x speed-up vs system allocator, especially on large blocks. So there is a lot of demand for that.

5. Comments on Quantity_Color, sizeof (double), etc are fully out of my outstanding, so I have no comments on that.

Thanks for a good discussion.

HTH,
Roman

Denis Barbier's picture

Thanks Roman, I fully agree with everything you wrote here. For the record, I also found TBB to be faster than system allocator on Linux ;-)

JuryS's picture

+1

Forum supervisor's picture

Dear Roman,
Thanks for your accurate and detailed analysis.
We will take into account your ideas and will revisit as the work of the memory manager as the TCollection_AsciiString at the nearest time (as soon as we will have resources for this).
Regards

Roman Lygin's picture

Guys, thanks for confirmations, and to the OCC team for willingness to look into the issue.
Thanks again for a good discussion.

JuryS's picture

0.25 ms my computer need upgrade (((
MMGT_OPT = 0, then this problem is gone, but this option must be set before application started, not with qputenv.
And by default this option is set to 1, need compile without MMGT_OPT use.
I exclude from the working out:
TCollection_AsciiString
Quantity_Color

Here is some way to store ByteArray data in XML with FF encoding/decoding.
Also if your system don't contains itoa function there is myitoa function, that maybe uncommented.
There is no problem to save ByteArray in bin format, but for XML I may create plugin with assembler to use, for the sample, to view image, extracted from FF string under MS Exporer or Debian gnome.

Denis Barbier's picture

Sorry Yuriy, there is some misunderstanding between us. I need a C++ source file to compile a program which I can run (with a 'main' method), like my test_string.cpp

JuryS's picture

ok, need some time

JuryS's picture

also some think about base64 encode/decode for byte array data, but result is more that use my FF algo.
Image size: 15 384
XML without image: 104 332
XML with image no encoding: 159 624
XML with image encoded with FF: 139 264
XML with image encoded with base: 178 011 (!)
XML with image encoded with default ANSI string: 124 116 but that XML can't be read because contains some operating symbols like <>[]/\n\b and other

I don't understand why mail programs use this base64 algoritm.
It's strange.