What does multithreading with "HAVE_TBB = true" speed up?

I have been experimenting with multithreading in OpenCascade, by setting the environment variable HAVE_TBB to true (and rebuilding!).
However, it doesn't seem to speed anything up.
This is with OpenCascade 7.0.0.

What operations should "HAVE_TBB = true" speed up?

One of the things I am doing is BRepMesh_IncrementalMesh, e.g.:

    const Standard_Real    theLinDeflection = 1e-5; 
    const Standard_Boolean isRelative = Standard_True;
    const Standard_Real    theAngDeflection = 0.5;
    const Standard_Boolean isInParallel = Standard_True;
    const Standard_Boolean adaptiveMin = Standard_True;

    //const TopoDS_Shape aShape already defined
    
    BRepMesh_IncrementalMesh m(aShape, theLinDeflection, isRelative, theAngDeflection, isInParallel, adaptiveMin);

The speed of this seems unaffected by "HAVE_TBB = true", but it is affected by the isInParallel flag.
How does BRepMesh_IncrementalMesh's isInParallel flag relate to "HAVE_TBB = true"?

Thanks

Jim

Roman Lygin

Jim,
 

HAVE_TBB is merely a preprocessor macro that specifies whether code that *can potentially* run in parallel will be compiled to *potentially* run in parallel with TBB. Whether it *will* run in parallel, and how efficiently (in terms of speedup and scalability), depends strongly on at least:

1. speeding up the relevant code (refer to Amdahl's law - if you parallelize code that takes only 10% of your run time, your maximum theoretical speedup is 1 / (1 - 0.1) ≈ 1.11)

2. implementation efficiency (how efficiently you take advantage of the parallel programming model and reduce parallelism overhead; for instance, if you create and destroy a new thread to process a particular vertex you will obviously observe loss of performance instead of any speed up)

3. a particular workload (whether you have enough "parallel slack", i.e. excess work that can run in parallel - for instance, if you triangulate a box of 6 faces on an 8-core machine you obviously do not have enough slack)
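
To put numbers on points 1 and 3, here is a minimal sketch of Amdahl's law and of the slack limit. The function names are made up for illustration and are not part of OCCT or TBB; the model assumes perfect scaling of the parallel part and one task per face.

```cpp
#include <algorithm>
#include <cassert>

// Amdahl's law: speedup when a fraction `parallelFraction` of the run time
// is spread perfectly over `workers` threads and the rest stays serial.
// (Hypothetical helper for illustration only.)
double amdahlSpeedup(double parallelFraction, int workers)
{
    assert(parallelFraction >= 0.0 && parallelFraction <= 1.0);
    assert(workers >= 1);
    return 1.0 / ((1.0 - parallelFraction) + parallelFraction / workers);
}

// Upper bound on the speedup as the number of workers grows without limit.
double amdahlLimit(double parallelFraction)
{
    assert(parallelFraction >= 0.0 && parallelFraction < 1.0);
    return 1.0 / (1.0 - parallelFraction);
}

// Parallel slack: with one task per face, no more than `tasks` workers
// can be busy at once, however many cores the machine has.
int usableWorkers(int tasks, int cores)
{
    assert(tasks >= 1 && cores >= 1);
    return std::min(tasks, cores);
}
```

With these, amdahlLimit(0.1) gives about 1.11, and usableWorkers(6, 8) gives 6 for the box example: two of the eight cores have nothing to do.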
 

For a typical mesher workflow one would parallelize discretization of edges, then internal triangulation of faces. The latter is usually the greatest bottleneck and thus the first candidate to parallelize. I don't know if BRepMesher really does this. A quick search reveals that BRepMesh_FastDiscret::Process(shape) processes faces independently, which might cause a data race on partner faces, so perhaps some pre-processing is done upstream.
 

Also, BRepMesher uses the tbb::parallel_for_each algorithm instead of tbb::parallel_for, which can be less efficient than the latter for this particular use case.

Again, these are just general thoughts. "Your mileage can vary".

Thanks,

Roman

Forum supervisor

Hello Jim,

there are several aspects related to multi-threading in OCCT:

  • HAVE_TBB does NOT enable multi-threading; this option enables use of the Intel TBB library.
    In the past, TBB was the only way to run multi-threaded algorithms in OCCT, but multi-threading is now available even without Intel TBB (see OSD_Parallel), although it might be less efficient.
  • Multi-threading optimizations are available in various OCCT algorithms like:
  • Multi-threading is NOT enabled by default (regardless of the HAVE_TBB macro); the user has to enable it explicitly where required, within the specific algorithm.
  • The efficiency of multi-threading (compared to single-threaded execution) depends on the nature of the input data
    (e.g. multi-threading in TKMesh is better when shape contains many Faces).
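
The split described in the points above (a build-time macro picks the backend, a run-time flag decides whether to go parallel at all) can be sketched roughly as follows. This is an illustration only, not OCCT's actual implementation: forEachIndex and its runInParallel argument are hypothetical names, and the std::thread fallback is just one simple way to split a range.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>
#ifdef HAVE_TBB
  #include <tbb/parallel_for.h>
#endif

// Hypothetical helper, not OCCT code: calls body(i) for every i in [first, last).
// The run-time flag decides WHETHER to run in parallel; the HAVE_TBB macro
// only decides WHICH backend does the parallel work.
template <typename Body>
void forEachIndex(int first, int last, const Body& body, bool runInParallel)
{
    assert(first <= last);
    if (!runInParallel || last - first < 2)
    {
        for (int i = first; i < last; ++i) body(i);   // serial path, the default
        return;
    }
#ifdef HAVE_TBB
    tbb::parallel_for(first, last, body);             // TBB schedules the range
#else
    // Fallback: split the range into contiguous chunks, one std::thread each.
    const int count = last - first;
    const int hw = static_cast<int>(std::thread::hardware_concurrency());
    const int nThreads = std::max(1, std::min(count, hw));
    const int chunk = (count + nThreads - 1) / nThreads;
    std::vector<std::thread> pool;
    for (int begin = first; begin < last; begin += chunk)
    {
        const int end = std::min(last, begin + chunk);
        pool.emplace_back([begin, end, &body] {
            for (int i = begin; i < end; ++i) body(i);
        });
    }
    for (std::thread& t : pool) t.join();
#endif
}
```

Rebuilding with or without HAVE_TBB changes nothing for a caller who passes false for the flag, which matches the observation that only isInParallel affected the meshing time. The body must be safe to call concurrently for different indices.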

Best regards,
Forum supervisor

Jim Williams

Thanks for both those replies.

Looks like BRepMesh_IncrementalMesh's isInParallel parameter is all I need to use, for now at least.

Jim