There is some concern about whether the Runtime's performance is sufficient, and one identifiable cost is clearing the framebuffers now that we have double buffering.
One possible solution is to use DMA or another HW-accelerated memory operation block in the SoC to clear the framebuffers. This would require creating an API and a callback hook for the clear operation (or, if suitable, for clearBitmap on individual bitmaps that are big enough to warrant the overhead). To really gain from HW acceleration, it should run asynchronously to the CPU, because the SW clear is currently a tight loop that is most probably memory-bandwidth constrained anyway.
Asynchronous operation, on the other hand, requires a synchronization call or callback, implemented by polling or in an interrupt handler, so that we know when the operation is done and the surface is safe to render into.
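
To make the API and callback hook idea concrete, here is a minimal sketch in C. All names (clear_hooks_t, clear_begin, clear_end, sim_start, sim_wait) are hypothetical and do not exist in the Runtime; the "DMA engine" is simulated so that the flow can be exercised on a host. The assumption is that the platform port supplies a start hook and a wait hook, and that small clears fall back to the SW path because the HW setup overhead would not pay off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook table that a platform port would fill in. */
typedef struct {
    /* Start an asynchronous clear; return true if HW accepted the request. */
    bool (*start)(void *dst, size_t bytes, void *ctx);
    /* Block (by polling or waiting on an interrupt) until the clear is done. */
    void (*wait)(void *ctx);
    void *ctx;
    size_t min_bytes; /* below this size the HW overhead is not worth it */
} clear_hooks_t;

typedef struct {
    const clear_hooks_t *hooks;
    bool pending; /* true while a HW clear is still in flight */
} clear_job_t;

/* Begin clearing: use HW if hooks exist and the area is big enough,
 * otherwise clear synchronously in SW. */
void clear_begin(clear_job_t *job, const clear_hooks_t *hooks,
                 void *dst, size_t bytes)
{
    assert(job != NULL && dst != NULL);
    job->hooks = hooks;
    job->pending = false;
    if (hooks != NULL && hooks->start != NULL && hooks->wait != NULL &&
        bytes >= hooks->min_bytes &&
        hooks->start(dst, bytes, hooks->ctx)) {
        job->pending = true; /* CPU is free to do other work now */
        return;
    }
    memset(dst, 0, bytes); /* SW fallback */
}

/* Synchronization point: after this returns the surface is safe to render into. */
void clear_end(clear_job_t *job)
{
    assert(job != NULL);
    if (job->pending) {
        job->hooks->wait(job->hooks->ctx);
        job->pending = false;
    }
}

/* Simulated engine for host testing: records the request in start and
 * performs it in wait, standing in for real HW completion. */
typedef struct {
    void *dst;
    size_t bytes;
    bool busy;
} sim_dma_t;

bool sim_start(void *dst, size_t bytes, void *ctx)
{
    sim_dma_t *s = ctx;
    if (s->busy)
        return false; /* engine occupied: caller falls back to SW */
    s->dst = dst;
    s->bytes = bytes;
    s->busy = true;
    return true;
}

void sim_wait(void *ctx)
{
    sim_dma_t *s = ctx;
    if (s->busy) {
        memset(s->dst, 0, s->bytes);
        s->busy = false;
    }
}
```

The renderer would call clear_begin right after swapping buffers, carry on with work that does not touch the surface, and call clear_end just before the first draw into it. On real HW, wait would poll a status register or block on a semaphore signalled from the interrupt handler.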
Chained DMA lists, if available on the platform, could be filled while clearing bitmaps and building the list of dirty areas. It would then be pretty efficient to just kick off the operation and do something else with the CPU while the DMA goes through the programmed transfers.
HW setup time should be low, and with DMA lists it could be amortized further by running several operations back-to-back without CPU intervention.
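
The chained-list idea could look roughly like the sketch below. The descriptor layout, DMA_CHAIN_MAX, and the functions chain_init, chain_add_clear and sim_dma_run are all assumptions for illustration; a real engine defines its own descriptor format and is started by writing the address of the first descriptor to a register. Here a simulated engine walks the chain so that the logic can be checked on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DMA_CHAIN_MAX 8 /* hypothetical limit on queued transfers */

/* Hypothetical descriptor: one zero-fill transfer plus a link to the next. */
typedef struct dma_desc {
    void *dst;
    size_t bytes;
    struct dma_desc *next; /* NULL terminates the chain */
} dma_desc_t;

typedef struct {
    dma_desc_t desc[DMA_CHAIN_MAX];
    size_t count;
} dma_chain_t;

void chain_init(dma_chain_t *c)
{
    assert(c != NULL);
    c->count = 0;
}

/* Append one clear while walking the dirty areas.
 * Returns 0 on success, -1 if the chain is full. */
int chain_add_clear(dma_chain_t *c, void *dst, size_t bytes)
{
    assert(c != NULL && dst != NULL);
    if (c->count == DMA_CHAIN_MAX)
        return -1;
    dma_desc_t *d = &c->desc[c->count];
    d->dst = dst;
    d->bytes = bytes;
    d->next = NULL;
    if (c->count > 0)
        c->desc[c->count - 1].next = d; /* link the previous descriptor */
    c->count++;
    return 0;
}

/* Stand-in for the engine: follows the links and zero-fills each target.
 * Returns the number of transfers performed. */
size_t sim_dma_run(const dma_chain_t *c)
{
    assert(c != NULL);
    size_t done = 0;
    const dma_desc_t *d = (c->count > 0) ? &c->desc[0] : NULL;
    for (; d != NULL; d = d->next) {
        memset(d->dst, 0, d->bytes);
        done++;
    }
    return done;
}
```

With this shape, the setup cost is paid once per chain rather than once per bitmap: the descriptors are filled in as dirty areas are discovered, and a single kick starts all of them. If chain_add_clear reports a full chain, the caller could either kick the current chain early or clear that area in SW.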