424715 - Takes excessively long to render raytracing benchmark

Reporter

Description

•

16 years ago

Run Full Render benchmark on this site:

http://nontroppo.org/timer/progressive_raytracer.html

Be prepared for Firefox to more or less freeze for up to 10 mins on a good CPU.

The latest snapshot of Opera runs this about 10x faster (12x faster when I'm not using a clean profile for Firefox). Something is clearly going very wrong with Firefox rendering. I don't think it matters if other browsers run a little faster, but a gap this large indicates something that needs fixing.

Damian Shaw [Quan]

Reporter

Updated

•

16 years ago

Product: Firefox → Core

QA Contact: general → general

Damian Shaw [Quan]

Reporter

Updated

•

16 years ago

Assignee: nobody → general

Component: General → JavaScript Engine

QA Contact: general → general

Damian Shaw [Quan]

Reporter

Updated

•

16 years ago

Keywords: perf

Damian Shaw [Quan]

Reporter

Comment 1

•

16 years ago

It was discussed in IRC this might be due to the DOM tree rebuilding every time it does a pass.

Damian Shaw [Quan]

Reporter

Comment 2

•

16 years ago

Also discussed it might be the way Firefox renders in the first place, using Continuations as discussed by ROC: http://weblogs.mozillazine.org/roc/archives/2007/10/if_i_did_it.html

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 3

•

16 years ago

Extremely unlikely to be a JS engine issue here, from reading the code and light experimentation.

Basic render shows the effects clearly enough:

Basic render: 2.74 sec
Make drawBlock a no-op: 0.3 sec
Just remove appendChild from drawBlock: 0.97 sec
Make javaSphereColour always return 'rgb(0,0,0)': 2.43 sec

So about 90% of the time is spent in the DOM, 65% in node creation and the property setting.

If I make drawBlock a no-op, a full render completes in 1.2 seconds on today's trunk, Vista, 2.mumble GHz CPU.

So I think the thesis of the page is incorrect, to the extent that the test actually demonstrates it. :)

(If you switch to a canvas rather than using the <div>s, and have "javaSphereColour" return [r,g,b] instead of the string, I suspect you will find that a Firefox nightly or even beta4 is pretty competitive with Opera.  Maybe even faster enough that you'll want to file a bug on them. :) )
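
A minimal sketch of the two drawing strategies compared above. The benchmark's actual code is not quoted in this bug, so the `drawBlock` signatures here are assumptions made for illustration. `drawBlockDom` creates and appends one absolutely positioned <div> per block; `drawBlockCanvas` fills a rectangle on a 2D canvas context and takes the colour as an [r, g, b] array instead of a string.

```javascript
// Hypothetical helpers; names and signatures are illustrative only,
// not taken from the benchmark page.

// DOM variant: one node creation, several style sets and one append per block.
function drawBlockDom(doc, container, x, y, size, colour) {
  const div = doc.createElement('div');
  div.style.position = 'absolute';
  div.style.left = x + 'px';
  div.style.top = y + 'px';
  div.style.width = size + 'px';
  div.style.height = size + 'px';
  div.style.backgroundColor = colour; // e.g. 'rgb(0,0,0)'
  container.appendChild(div);
  return div;
}

// Canvas variant: no nodes are created; colour arrives as [r, g, b].
function drawBlockCanvas(ctx, x, y, size, rgb) {
  const [r, g, b] = rgb;
  ctx.fillStyle = 'rgb(' + r + ',' + g + ',' + b + ')';
  ctx.fillRect(x, y, size, size);
}
```

In a page, `doc` would be `document` and `ctx` would come from `canvas.getContext('2d')`.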

Assignee: general → nobody

Component: JavaScript Engine → DOM

QA Contact: general → general

Damian Shaw [Quan]

Reporter

Updated

•

16 years ago

Summary: JavaScript takes excessively long to render raytracing benchmark → Takes excessively long to render raytracing benchmark

Wesley Johnston (:wesj)

Comment 4

•

16 years ago

Attached file "Raytracer" using canvas (obsolete) — Details

I didn't know there was a bug on this. I rewrote this a while ago using canvas, and performance on FF and Safari at least (Opera doesn't work(?)) is about equal. I did note that if you force FF to resize the canvas, it gets significantly slower (but nowhere near the DOM-version slowness).

This has some small rendering errors, but they're due to the implementation. I got rid of them once, but forget how.

Damian Shaw [Quan]

Reporter

Comment 5

•

16 years ago

I posted this over on the opera forum and they fixed the HTML in the attachment here:

http://paste.css-standards.org/36923/view

So that it now works on Opera (apparently a tag was not closed; I'm not sure of the details). Opera still runs this faster, but only by 10 - 40%, so within the same order of magnitude.

The problem here is Firefox's DOM performance, not its JavaScript performance, as Shaver correctly postulated.

Damian Shaw [Quan]

Reporter

Updated

•

16 years ago

Depends on: 233463

Damian Shaw [Quan]

Reporter

Comment 7

•

16 years ago

The other bug had more useful information, but this bug was already confirmed, so it seemed the right choice. I'll reverse the duplicate if anyone strongly disagrees.

Please see attachment 280318 [details] and attachment 280327 [details] for the test and the Jprof Profile Report, respectively. Also quoting bug 395635 comment 2:

> jprof
> 
> Flat Profile
> 
> Total hit count: 13621
> Count %Total  Function Name
> 3146   23.1     nsLineBox::LastChild() const
> 3085   22.6     nsLayoutUtils::GetLastSibling(nsIFrame*)
> 3031   22.3     nsLineBox::RFindLineContaining(nsIFrame*, nsLineList_iterator const&, nsLineList_iterator&, int*)
> 3021   22.2     nsFrameList::AppendFrames(nsIFrame*, nsIFrame*)
> ...
> 

And bug 395635 comment 3:

> nsLineBox::LastChild() is mentioned by BZ in bug 40988 comment #62 and also in
> bug 237735 comment #4, but there isn't probably any bug for it.
> 
> nsLayoutUtils::GetLastSibling is probably bug 233463 - adding it to
> dependencies
> 
> nsLineBox::RFindLineContaining is mentioned by BZ in bug 304598 comment #19

David Smith

Comment 8

•

16 years ago

The equivalent WebKit bug was https://bugs.webkit.org/show_bug.cgi?id=15148 which turned out to be n^2 behavior due to checking the list of floating objects for duplicates on each append. Switching to a data structure with O(1) membership testing eliminated that. Not sure if the situation is similar for Gecko, but hopefully that's useful information. Sorry for the bugspam if not.
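
The quadratic pattern described above can be sketched in a few lines. This illustrates the general technique only; it is not WebKit's or Gecko's code, and the function names are hypothetical.

```javascript
// Append with a duplicate check that scans the whole list: O(n) per append,
// so n appends cost O(n^2) comparisons overall.
function appendUniqueScan(list, item) {
  if (!list.includes(item)) {
    list.push(item);
  }
}

// Same behaviour, with a Set kept alongside the list so that the
// membership test is O(1) on average.
function appendUniqueSet(list, seen, item) {
  if (!seen.has(item)) {
    seen.add(item);
    list.push(item);
  }
}
```

Both functions produce the same list; only the cost of the membership test differs.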

Asa Dotzler [:asa]

Updated

•

16 years ago

Flags: blocking1.9.1?

Damian Shaw [Quan]

Reporter

Comment 9

•

16 years ago

Thanks David. Asa, I'm not sure it deserves blocking, maybe wanted, and while it is an obscure benchmark I can't help but think improving it will affect real world performance somewhere.

I thought I'd run the benchmark again, given there have been no updated figures since the Firefox 3.0 betas. I used Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.1b1pre) Gecko/20080907032646 Minefield/3.1b1pre ID:20080907032646 on an AMD Athlon 3800+ X2 (2GHz). A clean profile was used for every browser.

So in order of fastest to slowest on browsers I could test (I couldn't get the webkit nightlies to work):

Chrome - 29.69 seconds
Opera 9.6RC - 31.609 seconds
Safari 3.1.2 - 38.734 seconds
Firefox Nightly Trace Enabled - 537.907 seconds
Firefox Nightly - 538.344 seconds
IE8 Beta 2 - CRASH - Pass 84/120 - 2269.468 seconds

Given the increasing time of each pass, I would surmise that if IE8 hadn't crashed it would have taken about 3 hours. It also showed a significant memory increase on each pass: it was at about 900 MB when it crashed and probably would have grown to about 2 - 3 GB. But the other figures seem to show that finishing in under 1 minute is more than reasonable.

Flags: wanted1.9.1?

Johnny Stenback (:jst)

Comment 10

•

16 years ago

Not blocking on this, but giving this to bent to get a profile for what's actually slow here.

Assignee: nobody → bent.mozilla

Flags: wanted1.9.1?

Flags: wanted1.9.1+

Flags: blocking1.9.1?

Flags: blocking1.9.1-

Priority: -- → P1

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 11

•

16 years ago

Shark results:

27.5%  nsLayoutUtils::GetLastSibling(nsIFrame*)	
23.9%  nsFrameList::LastChild() const	
17.0%  nsLineBox::LastChild() const	
15.5%  nsLineBox::IndexOf(nsIFrame*) const	

Seems like we're spending all of our time walking through linked lists. Over to layout.
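
To illustrate why repeated list walks hurt here, a sketch (not Gecko code; all names are hypothetical) of appending to a singly linked sibling list. Finding the last sibling by walking from the head visits every existing node, so appending n nodes one at a time costs on the order of n^2/2 steps in total, while a cached tail pointer makes each append constant time.

```javascript
// Walks from the head to find the last sibling; returns the number of
// links followed so the cost can be measured.
function appendByWalking(list, node) {
  let steps = 0;
  if (list.head === null) {
    list.head = node;
    return steps;
  }
  let last = list.head;
  while (last.next !== null) {
    last = last.next;
    steps++;
  }
  last.next = node;
  return steps;
}

// Keeps a tail pointer, so no walk is needed.
function appendWithTail(list, node) {
  if (list.tail === null) {
    list.head = node;
  } else {
    list.tail.next = node;
  }
  list.tail = node;
}

// Total links followed when appending n nodes one at a time by walking.
function countWalkSteps(n) {
  const list = { head: null };
  let total = 0;
  for (let i = 0; i < n; i++) {
    total += appendByWalking(list, { id: i, next: null });
  }
  return total;
}

console.log(countWalkSteps(1000)); // 498501, i.e. (n-1)(n-2)/2
```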

Assignee: bent.mozilla → nobody

Component: DOM → Layout: Misc Code

QA Contact: general → layout.misc-code

Boris Zbarsky [:bzbarsky]

Comment 12

•

16 years ago

OK.  So the obvious thing there is of course bug 233463.  But as comment 7 says, this situation is similar to that in bug 304598...  Except that I thought we fast-pathed appends in the abs pos processing in frame constructor.  Is that not working?  Or are things just slow even with it working?

Damian Shaw [Quan]

Reporter

Comment 13

•

16 years ago

I was discussing this benchmark on the IE blog and multiple other people managed to pass the whole test in less than a minute on IE 8 Beta 2, though I'm still not able to replicate this result on any computer (looking into it). But assuming it's common, then it puts Firefox squarely 8-12x slower than any other browser.

Boris Zbarsky [:bzbarsky]

Comment 14

•

15 years ago

OK, to answer my own question from comment 11, here's where time is spent "late" in the benchmark:

26% nsFrameList::LastChild() called from FindAppendPrevSibling
23% in nsLineBox::RFindLineContaining called from nsBlockFrame::InsertFrames

Those are presumably for the placeholder.

25% nsLayoutUtils::GetLastSibling called from nsFrameConstructorState::ProcessFrameInsertions.
23% nsFrameList::AppendFrames called from nsAbsoluteContainingBlock::AppendFrames.

Those are for the abs pos frame.  So yes, this is bug 233463 in spades.

That said, doing lazy frame construction might help somewhat.  I filed bug 502937 on that.

And it also looks like that AppendFrames call from ProcessFrameInsertions is uncalled-for.  Filed bug 502941 on that.

Depends on: 502941, lazyfc

Damian Shaw [Quan]

Reporter

Comment 16

•

15 years ago

Attached file Raytracer in canvas [fast] — Details

I hope the Author doesn't mind. I'm re-uploading his raytracing tests so they're both on bugzilla. I've fixed his canvas demo to remove the visual artifacts and make it work in Opera.

Attachment #312760 - Attachment is obsolete: true

Damian Shaw [Quan]

Reporter

Comment 17

•

15 years ago

Attached file Raytracer in DOM [slow] — Details

Damian Shaw [Quan]

Reporter

Updated

•

15 years ago

Whiteboard: Full Render on DOM Raytracer freezes Firefox for a while, be prepared!!

Calc-Yolatuh

Comment 18

•

15 years ago

But I thought Firefox was supposed to be fairly good with DOM manipulation. Is it the way this thing keeps updating? For example, is it possible to generate the same image as a full array of objects, and render the full list of <div>s all at once for comparison? Would Firefox be faster than the current score if it did not individually render each DOM update, but only had to make one pass at the end?

Boris Zbarsky [:bzbarsky]

Comment 19

•

15 years ago

With the fixes for bug 512471, bug 512336, bug 512470, bug 233463 I see Firefox render this testcase in about 63 seconds.  That's about 2x slower than Safari 4 on the same hardware; 1.2x slower than Opera 10.

That said, with those patches the frametree bottlenecks seem to be gone: 80% of that time is painting.  I'll reprofile once the patches land.

As far as comment 18 goes, the answer is yes.  If you generate the DOM with a display:none parent and then show it all at once, Firefox without the above patches would be a lot faster than it is on the repeated "live" DOM updates the testcase does.
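
A sketch of the batching approach described above. The `renderBatched` helper and the block format are assumptions made for illustration; neither comes from the testcase. The blocks are built under a parent with display:none and the parent is shown once at the end, instead of updating the live DOM on every append.

```javascript
// Hypothetical helper: `blocks` is an array of {x, y, size, colour} objects.
function renderBatched(doc, target, blocks) {
  const holder = doc.createElement('div');
  holder.style.display = 'none'; // build the subtree while it is hidden
  target.appendChild(holder);
  for (const b of blocks) {
    const div = doc.createElement('div');
    div.style.cssText =
      'position:absolute;left:' + b.x + 'px;top:' + b.y + 'px;' +
      'width:' + b.size + 'px;height:' + b.size + 'px;' +
      'background-color:' + b.colour;
    holder.appendChild(div);
  }
  holder.style.display = ''; // show everything at once
  return holder;
}
```

In a page this would be called as `renderBatched(document, document.body, blocks)`.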

Depends on: 512471, 512336, 512470

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 20

•

15 years ago

bug 512470, bug 512471 and bug 512336 don't have patches in them?

Boris Zbarsky [:bzbarsky]

Comment 21

•

15 years ago

Yeah, that's because I'm still cleaning them up and try-servering them and running local tests in a debug build and such minor things before attaching them.

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Depends on: 516732

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Depends on: 516740

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Depends on: 516742

Boris Zbarsky [:bzbarsky]

Comment 22

•

15 years ago

s/bug 512470/bug 516742/ in the above.

With those patches applied, bug 516732 and bug 516740 cover possible issues in the painting.  The second is particularly interesting: we're 2x slower than Safari 4 on the attached testcase, but 2x faster (we get 4x faster, their performance doesn't change) if the innerHTML update is taken out.

Joel Reymont (:joelr)

Comment 23

•

15 years ago

Would it matter that WebKit uses CGContextFillRect whereas we use CGContextFillPath? 

I have a hunch that the former is much faster. I'll see if I can switch Cairo to CGContextFillRect and time it.

Joel Reymont (:joelr)

Comment 24

•

15 years ago

I don't see WebKit filling paths at all, e.g.

WebKit:

1) WebCore::RenderBox::paintFillLayer() -> CGContextFillRect

	0.0%	42.6%	WebCore	                                        WebCore::RenderBox::paintFillLayer(WebCore::RenderObject::PaintInfo const&, WebCore::Color const&, WebCore::FillLayer const*, int, int, int, int, WebCore::CompositeOperator)
	1.0%	42.4%	WebCore	                                         WebCore::RenderBoxModelObject::paintFillLayerExtended(WebCore::RenderObject::PaintInfo const&, WebCore::Color const&, WebCore::FillLayer const*, int, int, int, int, WebCore::InlineFlowBox*, WebCore::CompositeOperator)
	0.3%	40.1%	WebCore	                                          WebCore::GraphicsContext::fillRect(WebCore::FloatRect const&, WebCore::Color const&)
	0.2%	27.9%	CoreGraphics	                                           CGContextFillRect

Firefox:

1) nsCSSRendering::PaintBackground() results in a CGContextFillPath
 
	0.0%	16.5%	XUL	                                                     nsCSSRendering::PaintBackground(nsPresContext*, nsIRenderingContext&, nsIFrame*, nsRect const&, nsRect const&, unsigned int, nsRect*)
	0.6%	16.0%	XUL	                                                      nsCSSRendering::PaintBackgroundWithSC(nsPresContext*, nsIRenderingContext&, nsIFrame*, nsRect const&, nsRect const&, nsStyleBackground const&, nsStyleBorder const&, unsigned int, nsRect*)
	0.0%	12.6%	XUL	                                                       _moz_cairo_fill_preserve
	0.1%	12.6%	XUL	                                                        _cairo_gstate_fill
	0.0%	12.4%	XUL	                                                         _cairo_surface_fill
	0.1%	12.1%	XUL	                                                          _cairo_quartz_surface_fill
	0.0%	5.8%	CoreGraphics	                                                           CGContextFillPath

2) PaintBackgroundLayer seems to trigger a path fill as well.

	0.0%	0.0%	XUL	                                                      PaintBackgroundLayer(nsPresContext*, nsIRenderingContext&, nsIFrame*, unsigned int, nsRect const&, nsRect const&, nsRect const&, nsStyleBackground const&, nsStyleBackground::Layer const&)
	0.0%	0.0%	XUL	                                                      _cairo_path_fixed_fini
	0.0%	0.0%	XUL	                                                      gfxContext::Rectangle(gfxRect const&, int)
	0.0%	0.0%	XUL	                                                      gfxContext::NewPath()
	0.0%	0.0%	XUL	                                                      gfxContext::Fill()

Joel Reymont (:joelr)

Comment 25

•

15 years ago

Filed bug 516931 for the fill rect vs fill path issue.

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Depends on: 516931

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Depends on: 516924

Boris Zbarsky [:bzbarsky]

Comment 26

•

15 years ago

With the patches for bug 516740 and bug 516924 applied in addition to the ones in comment 20 and comment 22, we're about 1.5x faster than webkit.  With those patches, the approximate time breakdown is:

Painting: 42% (25% is building the display list, marking all those abs pos frames
               with properties, sorting it, etc).
Reflow: 29% (about 2/3 of this is reflowing the placeholders!)
Actually running the JS (setting style, creating nodes, etc, etc): 25%

Will dig into the JS part.  roc, do we want a bug on the display list stuff?  Is there something we can do to avoid doing quite so much work for reflowing placeholders?  That part is still ending up O(N^2) overall, since we're reflowing every placeholder each time through...

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Depends on: 517038

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 27

•

15 years ago

This is a super-bad case for display lists, we have thousands of visible display items, whereas most pages have on the order of a hundred display items visible at any one time. They're not killing us here, so I don't want to do a lot of work to restructure things. However, if we can find some simple local optimizations that help significantly, I guess that'd be worth doing, and we can have a bug on that. But we should wait for bug 513082 to land because it will change this code significantly.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 28

•

15 years ago

One other thing to note here is that the script renders via a series of setTimeout(0)s, 3 rows of blocks at a time. I believe that adds 10ms of latency per 3 rows, guaranteed. I believe trunk Webkit (or at least Chrome) only adds 4ms there. At the moment we're probably covering up that latency because we're reflowing and painting during that time, but if we get fast enough, we'll stop getting faster because the setTimeout latency will dominate.

This would also be affected by compositor-phase-2. Once we start reflowing and painting each 3-rows-of-blocks in less than 1/60s (or thereabouts), compositor-phase-2 would help by coalescing paints, and possibly reflows, to limit the rate to 60Hz or whatever your screen refresh rate is. (Mmmm, refresh-rate-dependent performance will make for fun benchmarking times!)

Boris Zbarsky [:bzbarsky]

Comment 29

•

15 years ago

> But we should wait for bug 513082 to land

OK.

> I believe trunk Webkit (or at least Chrome) only adds 4ms there.

Just chrome, not webkit.  The latency the timeouts introduce into this testcase is 1200ms for us.  We'd have to get about 20x faster than we are with all the patches we have in flight here before we run up against that limit.  ;)

> Once we start reflowing and painting each 3-rows-of-blocks in less than 1/60s
> (or thereabouts)

So in 16ms.  Right now we're closer to averaging 160ms per line, more toward the end.  So no danger of that biting us any time too soon.
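
The arithmetic behind these figures, as a sketch. It assumes the full render takes 120 timer-driven passes (the "Pass 84/120" counter quoted in comment 9) and uses the 10ms and 4ms minimum timeout delays mentioned in comment 28; the function names are made up for illustration.

```javascript
// Minimum total latency added by the timers alone, in milliseconds.
function timerLatencyMs(passes, minDelayMs) {
  return passes * minDelayMs;
}

// Time available per paint at a given refresh rate, in milliseconds.
function frameBudgetMs(refreshHz) {
  return 1000 / refreshHz;
}

console.log(timerLatencyMs(120, 10));      // 1200
console.log(timerLatencyMs(120, 4));       // 480
console.log(frameBudgetMs(60).toFixed(1)); // "16.7"
```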

Calc-Yolatuh

Comment 30

•

15 years ago

Another surge of progress, I see. Great to see how many effects show up here, and how thoroughly this is being explored. I am soon to receive my first netbook purchase, the highest-end ASUS 1005HA (N280/1GB/XP-SP3). I will be checking some of this on such low-end hardware.

Many of these small, individual issues uncovered by such an extreme test case, should be related to numerous claims that Firefox is too "heavy" for Atom processors. I think there has been too much testing on full-power multicore systems with plenty of memory. Unfortunately netbooks are popular, and doubly unfortunately they are MUCH weaker than the hardware that Mozilla has come to expect. This situation is anticipated to improve once Windows 7 ships and multiprocessor Atom-based netbooks come out, yet seriously addressing these small hits to performance may substantially future-proof the Firefox core.

But I digress. When my netbook hits, I'll come back with what I'm seeing. Unless anyone else has already tried it and not said anything...?

Petter M

Comment 31

•

15 years ago

I see some weird behaviour on this. I tried running the basic render consecutively a few times, and the rendering times are increasing for each run. These are the times I got:
5.867 s
21.741 s
36.955 s
50.264 s
63.921 s
77.695 s
so it seems the times are linearly increasing for each consecutive run. 

Chrome and Safari seem to be exhibiting similar behaviour, though, so maybe it's nothing to worry about.

Damian Shaw [Quan]

Reporter

Comment 32

•

15 years ago

The JavaScript adds DIV tags to the page, and the tags aren't removed on successive runs, so the complexity increases linearly. Finding that the time for run t takes some a + b*t is perfectly expected.
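
As a quick check of that linear model against the times reported in comment 31, the run-to-run increase can be computed directly; a roughly constant increase is what a linear model predicts. A sketch, with a hypothetical function name:

```javascript
// Returns the increase from each run to the next.
function successiveDifferences(times) {
  const diffs = [];
  for (let i = 1; i < times.length; i++) {
    diffs.push(times[i] - times[i - 1]);
  }
  return diffs;
}

// Basic render times in seconds, copied from comment 31.
const runTimes = [5.867, 21.741, 36.955, 50.264, 63.921, 77.695];
console.log(successiveDifferences(runTimes).map((d) => d.toFixed(3)));
```

Each increase falls between roughly 13 and 16 seconds, which is consistent with a fixed extra cost per run.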

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Depends on: 518114

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Depends on: 518115

Damian Shaw [Quan]

Reporter

Comment 33

•

15 years ago

On my old hardware (2 GHz AMD 64 X2 + 2 GB of RAM) on XP x64 I now get these results for full render:

Firefox Trunk: 25.992 Seconds
Opera 10: 29.359 Seconds
Chrome Dev Build: 37.673 Seconds

Damian Shaw [Quan]

Reporter

Comment 34

•

15 years ago

I think the scope of this bug is well and truly fixed: in every test I've done, Firefox narrowly outperforms the competition.

Anyone feel free to re-open if you think this bug has more life to it. But I'd imagine it'd be better to create a new bug on benchmarking specific DOM performance cases or other related issues.

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Damian Shaw [Quan]

Reporter

Comment 35

•

14 years ago

There seems to have been a MASSIVE performance regression on this. Using much faster hardware than in my previous tests I now see:

Firefox (Trunk): 286.257 Seconds
Chrome 5: 12.754 Seconds
Opera 10.6: 7.774 seconds

I guess I'll try and narrow down a regression window later and file a new bug.

Damian Shaw [Quan]

Reporter

Comment 36

•

14 years ago

Result of regression window testing on Mozilla Central:

Gecko/20100422 Minefield/3.7a5pre: 14.062 Seconds
Gecko/20100619 Minefield/3.7a6pre: 13.954 Seconds
Gecko/20100702 Minefield/4.0b2pre: 14.207 Seconds
Gecko/20100710 Minefield/4.0b2pre: 14.196 Seconds
Gecko/20100714 Minefield/4.0b2pre: 14.363 Seconds
Gecko/20100715 Minefield/4.0b2pre: 15.408 Seconds
Gecko/20100716 Minefield/4.0b2pre: 358.548 Seconds
Gecko/20100717 Minefield/4.0b2pre: > 100 Seconds (crashed)

This gives this window: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5fda39cd703c&tochange=96de199027d7

I would hazard a guess that it was one of roc's many check-ins that caused this performance regression.

Damian Shaw [Quan]

Reporter

Comment 37

•

14 years ago

Other possibility that stands out to me: http://hg.mozilla.org/mozilla-central/rev/d5bc811bad0a

Boris Zbarsky [:bzbarsky]

Comment 38

•

14 years ago

Damian, I filed bug 585258 on the obvious issue a profile shows.  We should remeasure once that's fixed.

Damian Shaw [Quan]

Reporter

Comment 39

•

14 years ago

Thanks for the info! Been out of the Mozilla/bug loop for a long while.

Damian Shaw [Quan]

Reporter

Updated

•

13 years ago

Depends on: 705561

BMO Automation

Updated

•

6 years ago

Product: Core → Core Graveyard

Nobody; OK to take it and work on it

Assignee

Updated

•

6 years ago

Component: Layout: Misc Code → Layout

Product: Core Graveyard → Core

"Raytracer" using canvas 16 years ago Wesley Johnston (:wesj) 3.12 KB, text/html		Details
Raytracer in canvas [fast] 15 years ago Damian Shaw [Quan] 3.06 KB, text/html		Details
Raytracer in DOM [slow] 15 years ago Damian Shaw [Quan] 3.42 KB, text/html		Details