Closed Bug 424715 Opened 16 years ago Closed 15 years ago

Takes excessively long to render raytracing benchmark

Categories

(Core :: Layout, defect, P1)

RESOLVED FIXED

People

(Reporter: zurtex, Unassigned)

References

(Depends on 3 open bugs)

Details

(Keywords: perf, Whiteboard: Full Render on DOM Raytracer freezes Firefox for a while, be prepared!!)

Attachments

(2 files, 1 obsolete file)

Run Full Render benchmark on this site:

http://nontroppo.org/timer/progressive_raytracer.html

Be prepared for Firefox to more or less freeze for up to 10 mins on a good CPU.

The latest Opera snapshot runs it about 10x faster (12x faster when I'm not using a clean profile for Firefox). Something is clearly going very wrong with Firefox rendering. I don't think it matters if other browsers run a little faster, but a gap this large indicates something that needs fixing.
Product: Firefox → Core
QA Contact: general → general
Assignee: nobody → general
Component: General → JavaScript Engine
QA Contact: general → general
Keywords: perf
It was discussed on IRC that this might be due to the DOM tree being rebuilt on every pass.
It was also suggested that it might be the way Firefox renders in the first place, using continuations, as discussed by roc: http://weblogs.mozillazine.org/roc/archives/2007/10/if_i_did_it.html
Extremely unlikely to be a JS engine issue here, from reading the code and light experimentation.

Basic render shows the effects clearly enough:

Basic render: 2.74 sec
Make drawBlock a no-op: 0.3 sec
Just remove appendChild from drawBlock: 0.97 sec
Make javaSphereColour always return 'rgb(0,0,0)': 2.43 sec

So about 90% of the time is spent in the DOM, 65% in node creation and the property setting.
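
For reference, the shares can be re-derived from the four timings above. This is only a sketch of the arithmetic: the helper name `shareSaved` is made up for illustration, and each share is simply the fraction of the basic render time that disappears when that piece of work is removed. Read this way, the roughly 65% figure corresponds to the time saved by removing appendChild.

```javascript
// Timings in seconds, copied from the measurements quoted above.
const timings = {
  basic: 2.74,          // unmodified basic render
  noDrawBlock: 0.3,     // drawBlock made a no-op
  noAppendChild: 0.97,  // appendChild removed from drawBlock
  constantColour: 2.43, // javaSphereColour always returns 'rgb(0,0,0)'
};

// Fraction of the basic render that goes away when a piece of work is removed.
// (Hypothetical helper, not part of the benchmark.)
function shareSaved(withoutIt, total = timings.basic) {
  return (total - withoutIt) / total;
}

const pct = (x) => (100 * x).toFixed(1) + '%';

console.log('all of drawBlock (DOM work):', pct(shareSaved(timings.noDrawBlock)));
console.log('appendChild alone:          ', pct(shareSaved(timings.noAppendChild)));
console.log('colour computation:         ', pct(shareSaved(timings.constantColour)));
```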

If I make drawBlock a no-op, a full render completes in 1.2 seconds on today's trunk, Vista, 2.mumble GHz CPU.

So I think the thesis of the page is incorrect, to the extent that the test actually demonstrates it. :)

(If you switch to a canvas rather than using the <div>s, and have "javaSphereColour" return [r,g,b] instead of the string, I suspect you will find that a Firefox nightly or even beta4 is pretty competitive with Opera.  Maybe even faster enough that you'll want to file a bug on them. :) )
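
A rough sketch of what that suggestion could look like, written without any DOM so it runs anywhere. Everything here is an assumption for illustration: `sphereColour` is a stand-in for a colour function returning [r,g,b], `fillBlock` is a hypothetical helper, and the buffer uses the same RGBA layout as a canvas ImageData. It is not the code from either attachment on this bug.

```javascript
// Hypothetical stand-in for the benchmark's colour function: returns numeric
// [r, g, b] components instead of building an 'rgb(r,g,b)' string.
function sphereColour(x, y) {
  return [x % 256, y % 256, 128];
}

// Fill a size-by-size block in an RGBA byte buffer (4 bytes per pixel,
// rows stored one after another, as in ImageData.data).
function fillBlock(data, width, x0, y0, size, [r, g, b]) {
  for (let y = y0; y < y0 + size; y++) {
    for (let x = x0; x < x0 + size; x++) {
      const i = (y * width + x) * 4;
      data[i] = r;
      data[i + 1] = g;
      data[i + 2] = b;
      data[i + 3] = 255; // fully opaque
    }
  }
}

const width = 8;
const height = 8;
const pixels = new Uint8ClampedArray(width * height * 4);
fillBlock(pixels, width, 2, 2, 4, sphereColour(10, 20));
```

In a page, a buffer like this could be handed to a 2D canvas context with putImageData, or the block could be drawn directly with fillRect. Either way no element is created per block.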
Assignee: general → nobody
Component: JavaScript Engine → DOM
QA Contact: general → general
Summary: JavaScript takes excessively long to render raytracing benchmark → Takes excessively long to render raytracing benchmark
Attached file "Raytracer" using canvas (obsolete) —
I didn't know there was a bug on this. I rewrote this a while ago using canvas, and performance on FF and Safari at least (Opera doesn't work(?)) is about equal. I did note that if you force FF to resize the canvas, it gets significantly slower (but nowhere near the DOM-version slowness).

This has some small rendering errors, but they're due to the implementation. I got rid of them once, but forget how.
I posted this over on the opera forum and they fixed the HTML in the attachment here:

http://paste.css-standards.org/36923/view

It now works on Opera (something about a forgotten closing tag, I'm not sure). Opera still runs this faster, but only by 10 - 40%, so within the same order of magnitude.

The problem here is Firefox's DOM performance, not its JavaScript performance, as Shaver correctly postulated.
Depends on: 233463
The other bug had more useful information, but this bug was already confirmed, so it seemed the right choice. I'll reverse the duplicate if anyone strongly disagrees.

Please see attachment 280318 [details] and attachment 280327 [details] for the test and the Jprof Profile Report, respectively. Also quoting bug 395635 comment 2:

> jprof
> 
> Flat Profile
> 
> Total hit count: 13621
> Count %Total  Function Name
> 3146   23.1     nsLineBox::LastChild() const
> 3085   22.6     nsLayoutUtils::GetLastSibling(nsIFrame*)
> 3031   22.3     nsLineBox::RFindLineContaining(nsIFrame*, nsLineList_iterator const&, nsLineList_iterator&, int*)
> 3021   22.2     nsFrameList::AppendFrames(nsIFrame*, nsIFrame*)
> ...
> 

And bug 395635 comment 3:

> nsLineBox::LastChild() is mentioned by BZ in bug 40988 comment #62 and also in
> bug 237735 comment #4, but there isn't probably any bug for it.
> 
> nsLayoutUtils::GetLastSibling is probably bug 233463 - adding it to
> dependencies
> 
> nsLineBox::RFindLineContaining is mentioned by BZ in bug 304598 comment #19

(In reply to comment #3)
> nsLineBox::LastChild() is mentioned by BZ in bug 40988 comment #62 and also in
> bug 237735 comment #4, but there isn't probably any bug for it.
> 
> nsLayoutUtils::GetLastSibling is probably bug 233463 - adding it to
> dependencies
> 
> nsLineBox::RFindLineContaining is mentioned by BZ in bug 304598 comment #19
The equivalent WebKit bug was https://bugs.webkit.org/show_bug.cgi?id=15148 which turned out to be n^2 behavior due to checking the list of floating objects for duplicates on each append. Switching to a data structure with O(1) membership testing eliminated that. Not sure if the situation is similar for Gecko, but hopefully that's useful information. Sorry for the bugspam if not.
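
To illustrate the kind of change described for the WebKit bug, here is a minimal sketch. It is not WebKit's code: `appendUniqueScan` and `UniqueList` are hypothetical names, and a JavaScript Set stands in for whatever constant-time structure was actually used.

```javascript
// Quadratic overall: every append scans the whole list for a duplicate.
function appendUniqueScan(list, item) {
  if (!list.includes(item)) list.push(item);
}

// Linear overall: a Set answers "already present?" in (amortised) constant
// time, while the array keeps the original insertion order.
class UniqueList {
  constructor() {
    this.items = [];
    this.seen = new Set();
  }

  append(item) {
    if (this.seen.has(item)) return false; // duplicate, nothing to do
    this.seen.add(item);
    this.items.push(item);
    return true;
  }
}

const floats = new UniqueList();
for (const id of [1, 2, 2, 3, 1]) floats.append(id);
console.log(floats.items); // [ 1, 2, 3 ]
```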
Flags: blocking1.9.1?
Thanks David. Asa, I'm not sure it deserves blocking, maybe wanted, and while it is an obscure benchmark I can't help but think improving it will affect real world performance somewhere.

I thought I'd run the benchmark again, given there have been no updated figures since the Firefox 3.0 betas. I used Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.1b1pre) Gecko/20080907032646 Minefield/3.1b1pre ID:20080907032646 on an AMD Athlon 3800+ X2 (2GHz). A clean profile was used for every browser.

So in order of fastest to slowest on browsers I could test (I couldn't get the webkit nightlies to work):

Chrome - 29.69 seconds
Opera 9.6RC - 31.609 seconds
Safari 3.1.2 - 38.734 seconds
Firefox Nightly Trace Enabled - 537.907 seconds
Firefox Nightly - 538.344 seconds
IE8 Beta 2 - CRASH - Pass 84/120 - 2269.468 seconds

Given the increasing time of each pass, I would surmise that if IE8 hadn't crashed it would have taken about 3 hours. It also used significantly more memory on each pass: it was at about 900 MB when it crashed, and it probably would have grown to about 2 - 3 GB. But the other figures seem to show that finishing in under 1 minute is more than reasonable.
Flags: wanted1.9.1?
Not blocking on this, but giving this to bent to get a profile for what's actually slow here.
Assignee: nobody → bent.mozilla
Flags: wanted1.9.1?
Flags: wanted1.9.1+
Flags: blocking1.9.1?
Flags: blocking1.9.1-
Priority: -- → P1
Shark results:

27.5%  nsLayoutUtils::GetLastSibling(nsIFrame*)	
23.9%  nsFrameList::LastChild() const	
17.0%  nsLineBox::LastChild() const	
15.5%  nsLineBox::IndexOf(nsIFrame*) const	

Seems like we're spending all of our time walking through linked lists. Over to layout.
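
A toy model of why walking to the end of a sibling list on every append adds up. This is only an illustration with made-up names (`FrameList`, `appendWalking`, `appendWithTail`); it is not Gecko code, and this bug does not say what form the eventual fixes took.

```javascript
// Singly linked list of "frames" that counts how many links it walks.
class FrameList {
  constructor() {
    this.first = null;
    this.last = null;
    this.steps = 0;
  }

  // Find the last sibling by walking from the head: O(n) per call.
  lastByWalking() {
    let frame = this.first;
    while (frame && frame.next) {
      frame = frame.next;
      this.steps++;
    }
    return frame;
  }

  // Append by first walking to the tail, so n appends cost O(n^2) in total.
  appendWalking(frame) {
    const tail = this.lastByWalking();
    if (tail) tail.next = frame; else this.first = frame;
    this.last = frame;
  }

  // Append using a cached tail pointer: O(1) per append.
  appendWithTail(frame) {
    if (this.last) this.last.next = frame; else this.first = frame;
    this.last = frame;
  }
}

// Build a list of n frames and report how many links were walked.
function countSteps(n, useTail) {
  const list = new FrameList();
  for (let i = 0; i < n; i++) {
    const frame = { id: i, next: null };
    if (useTail) list.appendWithTail(frame); else list.appendWalking(frame);
  }
  return list.steps;
}

console.log(countSteps(1000, false)); // grows with n squared
console.log(countSteps(1000, true));  // no walking at all
```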
Assignee: bent.mozilla → nobody
Component: DOM → Layout: Misc Code
QA Contact: general → layout.misc-code
OK.  So the obvious thing there is of course bug 233463.  But as comment 7 says, this situation is similar to that in bug 304598...  Except that I thought we fast-pathed appends in the abs pos processing in frame constructor.  Is that not working?  Or are things just slow even with it working?
I was discussing this benchmark on the IE blog and multiple other people managed to pass the whole test in less than a minute on IE 8 Beta 2, though I'm still not able to replicate this result on any computer (looking into it). But assuming it's common, it puts Firefox squarely at 8-12x slower than any other browser.
OK, to answer my own question from comment 11, here's where time is spent "late" in the benchmark:

26% nsFrameList::LastChild() called from FindAppendPrevSibling
23% in nsLineBox::RFindLineContaining called from nsBlockFrame::InsertFrames

Those are presumably for the placeholder.

25% nsLayoutUtils::GetLastSibling called from nsFrameConstructorState::ProcessFrameInsertions.
23% nsFrameList::AppendFrames called from nsAbsoluteContainingBlock::AppendFrames.

Those are for the abs pos frame.  So yes, this is bug 233463 in spades.

That said, doing lazy frame construction might help somewhat.  I filed bug 502937 on that.

And it also looks like that AppendFrames call from ProcessFrameInsertions is uncalled-for.  Filed bug 502941 on that.
Depends on: 502941, lazyfc
Attached file Raytracer in canvas [fast] —
I hope the Author doesn't mind. I'm re-uploading his raytracing tests so they're both on bugzilla. I've fixed his canvas demo to remove the visual artifacts and make it work in Opera.
Attachment #312760 - Attachment is obsolete: true
Attached file Raytracer in DOM [slow] —
Whiteboard: Full Render on DOM Raytracer freezes Firefox for a while, be prepared!!
But I thought Firefox was supposed to be fairly good with DOM manipulation. Is it the way this thing keeps updating? For example, is it possible to generate the same image as a full array of objects, and render the full list of <div>s all at once for comparison? Would Firefox be faster than the current score if it did not individually render each DOM update, but only had to make one pass at the end?
With the fixes for bug 512471, bug 512336, bug 512470, bug 233463 I see Firefox render this testcase in about 63 seconds.  That's about 2x slower than Safari 4 on the same hardware; 1.2x slower than Opera 10.

That said, with those patches the frametree bottlenecks seem to be gone: 80% of that time is painting.  I'll reprofile once the patches land.

As far as comment 18 goes, the answer is yes.  If you generate the DOM with a display:none parent and then show it all at once, Firefox without the above patches would be a lot faster than it is on the repeated "live" DOM updates the testcase does.
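
A sketch of the "hidden parent, then show it all at once" approach described above. The function name `buildBlocksHidden` and the block fields `x`, `y`, `size` and `colour` are hypothetical; the document is passed in as a parameter so the function can also be exercised outside a browser.

```javascript
// Build all blocks under a display:none parent, then reveal them in one step.
// `doc` is a Document (or any object with a compatible createElement).
function buildBlocksHidden(doc, container, blocks) {
  const holder = doc.createElement('div');
  holder.style.display = 'none'; // nothing inside is laid out while hidden
  container.appendChild(holder);

  for (const block of blocks) {
    const div = doc.createElement('div');
    div.style.position = 'absolute';
    div.style.left = block.x + 'px';
    div.style.top = block.y + 'px';
    div.style.width = block.size + 'px';
    div.style.height = block.size + 'px';
    div.style.backgroundColor = block.colour;
    holder.appendChild(div);
  }

  holder.style.display = ''; // show everything at once
  return holder;
}

// In a page this would be called as, for example:
//   buildBlocksHidden(document, document.body, blocks);
```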
Depends on: 512471, 512336, 512470
Yeah, that's because I'm still cleaning them up and try-servering them and running local tests in a debug build and such minor things before attaching them.
Depends on: 516732
Depends on: 516740
Depends on: 516742
s/bug 512470/bug 516742/ in the above.

With those patches applied, bug 516732 and bug 516740 cover possible issues in the painting.  The second is particularly interesting: we're 2x slower than Safari 4 on the attached testcase, but 2x faster (we get 4x faster, their performance doesn't change) if the innerHTML update is taken out.
Would it matter that WebKit uses CGContextFillRect whereas we use CGContextFillPath? 

I have a hunch that the former is much faster. I'll see if I can switch Cairo to CGContextFillRect and time it.
I don't see WebKit filling paths at all, e.g.

WebKit:

1) WebCore::RenderBox::paintFillLayer() -> CGContextFillRect

	0.0%	42.6%	WebCore	                                        WebCore::RenderBox::paintFillLayer(WebCore::RenderObject::PaintInfo const&, WebCore::Color const&, WebCore::FillLayer const*, int, int, int, int, WebCore::CompositeOperator)
	1.0%	42.4%	WebCore	                                         WebCore::RenderBoxModelObject::paintFillLayerExtended(WebCore::RenderObject::PaintInfo const&, WebCore::Color const&, WebCore::FillLayer const*, int, int, int, int, WebCore::InlineFlowBox*, WebCore::CompositeOperator)
	0.3%	40.1%	WebCore	                                          WebCore::GraphicsContext::fillRect(WebCore::FloatRect const&, WebCore::Color const&)
	0.2%	27.9%	CoreGraphics	                                           CGContextFillRect

Firefox:

1) nsCSSRendering::PaintBackground() results in a CGContextFillPath
 
	0.0%	16.5%	XUL	                                                     nsCSSRendering::PaintBackground(nsPresContext*, nsIRenderingContext&, nsIFrame*, nsRect const&, nsRect const&, unsigned int, nsRect*)
	0.6%	16.0%	XUL	                                                      nsCSSRendering::PaintBackgroundWithSC(nsPresContext*, nsIRenderingContext&, nsIFrame*, nsRect const&, nsRect const&, nsStyleBackground const&, nsStyleBorder const&, unsigned int, nsRect*)
	0.0%	12.6%	XUL	                                                       _moz_cairo_fill_preserve
	0.1%	12.6%	XUL	                                                        _cairo_gstate_fill
	0.0%	12.4%	XUL	                                                         _cairo_surface_fill
	0.1%	12.1%	XUL	                                                          _cairo_quartz_surface_fill
	0.0%	5.8%	CoreGraphics	                                                           CGContextFillPath

2) PaintBackgroundLayer seems to trigger a path fill as well.

	0.0%	0.0%	XUL	                                                      PaintBackgroundLayer(nsPresContext*, nsIRenderingContext&, nsIFrame*, unsigned int, nsRect const&, nsRect const&, nsRect const&, nsStyleBackground const&, nsStyleBackground::Layer const&)
	0.0%	0.0%	XUL	                                                      _cairo_path_fixed_fini
	0.0%	0.0%	XUL	                                                      gfxContext::Rectangle(gfxRect const&, int)
	0.0%	0.0%	XUL	                                                      gfxContext::NewPath()
	0.0%	0.0%	XUL	                                                      gfxContext::Fill()
Filed bug 516931 for the fill rect vs fill path issue.
Depends on: 516931
Depends on: 516924
With the patches for bug 516740 and bug 516924 applied in addition to the ones in comment 20 and comment 22, we're about 1.5x faster than webkit.  With those patches, the approximate time breakdown is:

Painting: 42% (25% is building the display list, marking all those abs pos frames
               with properties, sorting it, etc).
Reflow: 29% (about 2/3 of this is reflowing the placeholders!)
Actually running the JS (setting style, creating nodes, etc, etc): 25%

Will dig into the JS part.  roc, do we want a bug on the display list stuff?  Is there something we can do to avoid doing quite so much work for reflowing placeholders?  That part is still ending up O(N^2) overall, since we're reflowing every placeholder each time through...
Depends on: 517038
This is a super-bad case for display lists: we have thousands of visible display items, whereas most pages have on the order of a hundred display items visible at any one time. They're not killing us here, so I don't want to do a lot of work to restructure things. However, if we can find some simple local optimizations that help significantly, I guess that'd be worth doing, and we can have a bug on that. But we should wait for bug 513082 to land because it will change this code significantly.
One other thing to note here is that the script renders via a series of setTimeout(0)s, 3 rows of blocks at a time. I believe that adds 10ms of latency per 3 rows, guaranteed. I believe trunk Webkit (or at least Chrome) only adds 4ms there. At the moment we're probably covering up that latency because we're reflowing and painting during that time, but if we get fast enough, we'll stop getting faster because the setTimeout latency will dominate.

This would also be affected by compositor-phase-2. Once we start reflowing and painting each 3-rows-of-blocks in less than 1/60s (or thereabouts), compositor-phase-2 would help by coalescing paints, and possibly reflows, to limit the rate to 60Hz or whatever your screen refresh rate is. (Mmmm, refresh-rate-dependent performance will make for fun benchmarking times!)
> But we should wait for bug 513082 to land

OK.

> I believe trunk Webkit (or at least Chrome) only adds 4ms there.

Just chrome, not webkit.  The latency the timeouts introduce into this testcase is 1200ms for us.  We'd have to get about 20x faster than we are with all the patches we have in flight here before we run up against that limit.  ;)

> Once we start reflowing and painting each 3-rows-of-blocks in less than 1/60s
> (or thereabouts)

So in 16ms.  Right now we're closer to averaging 160ms per line, more toward the end.  So no danger of that biting us any time too soon.
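
The numbers in this exchange can be checked with a few lines of arithmetic. The sketch assumes one setTimeout per pass and 120 passes for a full render (the "Pass 84/120" figure reported earlier in this bug); `timeoutLatencyMs` is a made-up helper name.

```javascript
// Assumption: a full render is 120 passes, each scheduled by one setTimeout(0).
const PASSES = 120;

// Total latency added purely by the minimum timeout delay.
function timeoutLatencyMs(passes, minDelayMs) {
  return passes * minDelayMs;
}

console.log(timeoutLatencyMs(PASSES, 10)); // 1200 ms with a 10 ms minimum delay
console.log(timeoutLatencyMs(PASSES, 4));  // 480 ms with a 4 ms minimum delay

// One frame at 60Hz, compared with the roughly 160 ms per line quoted above.
const frameBudgetMs = 1000 / 60;
const currentMsPerLine = 160;
console.log((currentMsPerLine / frameBudgetMs).toFixed(1) + 'x over a 60Hz frame budget');
```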
Another surge of progress, I see. Great to see how many effects are showing up here, and how thoroughly this is being explored. I am about to receive my first netbook, the highest-end ASUS 1005HA (N280/1GB/XP-SP3), and will be checking some of this on such low-end hardware.

Many of these small, individual issues uncovered by such an extreme test case are probably related to the numerous claims that Firefox is too "heavy" for Atom processors. I think there has been too much testing on full-power multicore systems with plenty of memory. Unfortunately netbooks are popular, and doubly unfortunately they are MUCH weaker than the hardware that Mozilla has come to expect. This situation is expected to improve once Windows 7 ships and multiprocessor Atom-based netbooks come out, yet seriously addressing these small hits to performance may substantially future-proof the Firefox core.

But I digress. When my netbook hits, I'll come back with what I'm seeing. Unless anyone else has already tried it and not said anything...?
I see some weird behaviour on this. I tried running the basic render consecutively a few times, and the rendering times are increasing for each run. These are the times I got:
5.867 s
21.741 s
36.955 s
50.264 s
63.921 s
77.695 s
so it seems the times are linearly increasing for each consecutive run. 

Chrome and Safari seem to be exhibiting similar behaviour, though, so maybe it's nothing to worry about.
The JavaScript adds DIV tags to the page and the tags aren't removed on successive runs, so the page's complexity grows linearly with each run. Finding that run t takes roughly a + b*t is therefore perfectly expected.
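
As a quick check of the linear reading, the run-to-run differences of the times reported above can be computed directly. This is only a sketch; the variable names are arbitrary.

```javascript
// Basic render times in seconds for six consecutive runs, as reported above.
const runs = [5.867, 21.741, 36.955, 50.264, 63.921, 77.695];

// Increase from each run to the next; roughly constant if growth is linear.
const deltas = runs.slice(1).map((t, i) => t - runs[i]);
const meanDelta = deltas.reduce((sum, d) => sum + d, 0) / deltas.length;

console.log(deltas.map((d) => d.toFixed(3)).join(', '));
console.log('mean increase per run: ' + meanDelta.toFixed(3) + ' s');
```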
Depends on: 518114
Depends on: 518115
On my old hardware (2 GHz AMD 64 X2 + 2 GB of RAM) on XP x64 I now get these results for full render:

Firefox Trunk: 25.992 Seconds
Opera 10: 29.359 Seconds
Chrome Dev Build: 37.673 Seconds
I think the scope of this bug is well and truly fixed: in every test I've done, Firefox narrowly outperforms the competition.

Anyone feel free to re-open if you think this bug has more life to it. But I'd imagine it'd be better to create a new bug on benchmarking specific DOM performance cases or other related issues.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
There seems to have been a MASSIVE performance regression on this. Using much faster hardware than in my previous tests I now see:

Firefox (Trunk): 286.257 Seconds
Chrome 5: 12.754 Seconds
Opera 10.6: 7.774 seconds

I guess I'll try and narrow down a regression window later and file a new bug.
Result of regression window testing on Mozilla Central:

Gecko/20100422 Minefield/3.7a5pre: 14.062 Seconds
Gecko/20100619 Minefield/3.7a6pre: 13.954 Seconds
Gecko/20100702 Minefield/4.0b2pre: 14.207 Seconds
Gecko/20100710 Minefield/4.0b2pre: 14.196 Seconds
Gecko/20100714 Minefield/4.0b2pre: 14.363 Seconds
Gecko/20100715 Minefield/4.0b2pre: 15.408 Seconds
Gecko/20100716 Minefield/4.0b2pre: 358.548 Seconds
Gecko/20100717 Minefield/4.0b2pre: > 100 Seconds (crashed)

This gives this regression window: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5fda39cd703c&tochange=96de199027d7

I would hazard a guess that it was one of roc's many check-ins that caused this performance regression.
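
The window can also be picked out mechanically from the timings listed above. A minimal sketch: `findRegressionWindow` is a hypothetical helper, the 2x threshold is an arbitrary choice, and the 20100717 build is left out because it crashed without producing a time.

```javascript
// [build date, full render time in seconds], from the list above.
const builds = [
  ['20100422', 14.062],
  ['20100619', 13.954],
  ['20100702', 14.207],
  ['20100710', 14.196],
  ['20100714', 14.363],
  ['20100715', 15.408],
  ['20100716', 358.548],
];

// Return the first adjacent pair of builds where the time jumps by more than
// `factor`, or null if there is no such jump.
function findRegressionWindow(results, factor = 2) {
  for (let i = 1; i < results.length; i++) {
    const [prevId, prevTime] = results[i - 1];
    const [id, time] = results[i];
    if (time > prevTime * factor) {
      return { lastGood: prevId, firstBad: id };
    }
  }
  return null;
}

const regression = findRegressionWindow(builds);
console.log(regression); // { lastGood: '20100715', firstBad: '20100716' }
```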
Other possibility that stands out to me: http://hg.mozilla.org/mozilla-central/rev/d5bc811bad0a
Damian, I filed bug 585258 on the obvious issue a profile shows.  We should remeasure once that's fixed.
Thanks for the info! Been out of the Mozilla/bug loop for a long while.
Depends on: 705561
Product: Core → Core Graveyard
Component: Layout: Misc Code → Layout
Product: Core Graveyard → Core