[Scummvm-devel] Wintermute profiling

Willem Jan Palenstijn wjp at usecode.org
Sun Aug 25 21:32:43 CEST 2013


On Sun, Aug 25, 2013 at 10:38:21AM +0000, Willem Jan Palenstijn wrote:
> > Let me get done with UITiledImage - which is the only thing that, I've
> > found, hinders the effectiveness of said rect changes -, reprofile the
> > whole deal and optimize when needed (I may have been misled by said
> > UITiledImage when working on the rect system) and then we can start
> > having fun.
> 
> How does it hinder the effectiveness exactly? Too many rectangles causing the
> new rectangle merge/split code to be slow? Or something more subtle?
> 
> Would it work to just batch the dirty rectangle updates? I.e., on startBatch
> start a new temporary rectangle and just naively (and cheaply) extend that
> with new rectangles. Then when endBatch is called, send the new combined
> rectangle for the entire batch to the main multi-dirty-rect system?



Let me expand on this a bit more. This is how I think the rendering behaves,
performance-wise. Please correct or amend if your experience is different.  The
conclusions are based on fairly fine-grained profiling of the current master
branch, including differentials after some experimental patches.


All the text below is about the situation with _tempDisableDirtyRects disabled.


For scenes with few tickets and relatively many changing areas, actual blitting
is the bottleneck. This is why the dirty rect system was introduced, and this
is also where a more advanced dirty rect system helps.

Once the number of tickets grows (for example due to a large UITiledImage), the
bottleneck shifts to ticket management. There are two hotspots:

1. The compare loop suddenly yielded far fewer hits than before, causing it to
loop much longer. This turned out to be caused by broken batching, and is fixed
with a quick hack in this commit:
https://github.com/wjp/scummvm/commit/fdf4237a51a5680e3aa9872dfd55a629394fefbf

2. The loops that keep ticket->_drawNum up to date and compute
_renderQueue.size() became very costly. This is fixed by the logic refactoring
in the wme_lineartickets branch, which removes the need for these loops:
https://github.com/wjp/scummvm/commits/wme_lineartickets

After that, the bottleneck is back where it was expected to be, at blitting,
which is the area where the multi dirty rect system improves things.


With the single dirty rect updates, this branch makes menu rendering (with
UITiledImage windows) very fast again, as there is almost nothing to
update/render per frame.



The next bit is speculation, as I didn't do any profiling for this:

I'm assuming your current multi-rect code has quadratic complexity in the
number of rectangles. This would mean that it becomes expensive with the
number of tickets involved in UITiledImage (1000+).

That very likely means that the main benefit of batching a UITiledImage
into a single update is that it reduces the number of dirty rectangles.
This effect can be reached much more simply by just batching the dirty
rectangles instead of the rendering.



As I said, this is partly speculation, so please correct any misconceptions if
your profiling data points at other problematic areas.


-Willem Jan



