
Scenegraphs: Past, Present and Future

By Avi Bar-Zeev

Disclaimer: I have restricted my discussion to certain well-known scenegraphs and "scene


graphs" (I prefer it without the wordbreak). Please comment on any oversights or errors I
may have made, but be aware that my goal is to generalize and draw conclusions about
scenegraph technology, not to form a comprehensive archive (though I'd be happy to add
your link to the "scenegraphs today" section).

If you have any questions or comments, please feel free to email them to me.
I'd be happy to work up a version 2 of this if I get enough feedback on this draft.

This work is licensed under a Creative Commons License, free to use, copy, and link (but not
sell or modify)
as long as the attribution remains.

Updated: 4/8/2003 for spelling, headers, and added links to "Scenegraphs Today" section

Updated: 9/13/2005 updated bio, added link to scenegraphs

Sections:

1. Scenegraphs: a brief history and evolution,


2. Scenegraphs Today
3. Scenegraphs Future

About the Author

In the Beginning…
To help understand where scenegraphs came from, it's useful to take a quick look at the
evolution of graphics languages like OpenGL and DirectX. Early on, real-time graphics
existed on special image generation (IG) hardware that contained entire visual databases in
closed proprietary form. Modellers created their databases and loaded them onto the
hardware IG. Programmers were generally limited to modifying elements of these
databases, like the position and rotation of a helicopter or setting the time-of-day.

SGI introduced a more open and programmable option for image generation hardware and
along with it, graphical languages that allowed more direct programmability of the image
pipeline. OpenGL (from SGI's original "GL") consists of a stream of primitive drawing
commands (draw polygon, line, point, etc.), state settings (set color, texture, etc.), and
matrix manipulations (push/pop the model-view or perspective matrix, etc.). But it contains
very little information that allows the system to self-optimize and improve performance.
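For flavor, here is a minimal sketch of that style of programming, using fixed-function OpenGL 1.x calls (it assumes a GL context has already been created by some windowing layer; the triangle and colors are purely illustrative):

    // Classic immediate-mode OpenGL: the application streams state settings,
    // matrix manipulations, and primitive drawing commands every frame, and
    // the driver has no higher-level knowledge of the scene.
    #include <GL/gl.h>

    void drawFrame()
    {
        glEnable(GL_DEPTH_TEST);                  // state setting
        glMatrixMode(GL_MODELVIEW);
        glPushMatrix();                           // matrix manipulation
        glTranslatef(0.0f, 0.0f, -5.0f);

        glBegin(GL_TRIANGLES);                    // primitive drawing commands
        glColor3f(1.0f, 0.0f, 0.0f); glVertex3f(-1.0f, -1.0f, 0.0f);
        glColor3f(0.0f, 1.0f, 0.0f); glVertex3f( 1.0f, -1.0f, 0.0f);
        glColor3f(0.0f, 0.0f, 1.0f); glVertex3f( 0.0f,  1.0f, 0.0f);
        glEnd();

        glPopMatrix();
    }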

This was fine for drawing all sorts of scenes. But polygons that are out of view do consume
resources – the hardware doesn't even know they’re out of view until very late in the
rendering pipeline. Unnecessary state changes, extra texture loads, and other common
graphics procedures are best avoided if they don't contribute to the final image.

Culling

Culling is the process of removing everything from a scene that will not contribute to the
final image, including things that are behind the observer, off-screen, or, in more advanced
systems, hidden behind other objects (i.e., occluded). Generalized frustum culling works by
comparing each object's spatial boundaries with a viewing frustum – a truncated pyramid
that represents the visible volume of space. OpenGL does this implicitly when you send it
polygons – by default, it transforms and clips all polygons to the edges of the viewing
volume (most hardware uses a combination of gross clipping and 2D scissoring, but that's a
bit too detailed for this section of the article).

Rather than do the heavy work at the OpenGL and polygon level, scenegraph architects
realized they could perform culling at higher levels of abstraction for greater efficiency.
If we can remove the invisible objects first, then we make the hardware do less work and
generally improve performance and the all-important frame-rate.

The way it works is fairly straightforward. Any object that is entirely within the viewing
frustum is sent on down to the hardware. For objects that are part in/part out, we usually
don't bother checking individual polygons on the CPU, but we might break a very complex
object into several simpler ones so some of them may be culled in or out individually. Of
course, any object that is entirely outside the culling volume is rejected early on.
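As a concrete illustration (a sketch, not taken from any particular scenegraph), here is a minimal bounding-sphere-versus-frustum test; the frustum is six inward-facing planes, and the three possible results drive exactly the decisions described above:

    #include <array>

    struct Vec3   { float x, y, z; };
    struct Plane  { Vec3 n; float d; };        // inward-facing: n.p + d >= 0 means inside
    struct Sphere { Vec3 center; float radius; };

    enum class CullResult { Outside, Intersects, Inside };

    // Test a bounding sphere against the six planes of a viewing frustum.
    CullResult cullSphere(const Sphere& s, const std::array<Plane, 6>& frustum)
    {
        bool fullyInside = true;
        for (const Plane& p : frustum) {
            float dist = p.n.x * s.center.x + p.n.y * s.center.y +
                         p.n.z * s.center.z + p.d;
            if (dist < -s.radius) return CullResult::Outside;   // reject early
            if (dist <  s.radius) fullyInside = false;          // straddles this plane
        }
        return fullyInside ? CullResult::Inside : CullResult::Intersects;
    }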

Hierarchy

To efficiently perform this calculation, it's beneficial to organize the objects into a hierarchy
or tree, propagating any shared information towards the root of the tree. There are many
kinds of trees we could use, but let's keep it to a simple one-parent, N-child hierarchy -- a
tree, or, once nodes can be shared among parents, a directed acyclic graph (DAG).

Such a basic scenegraph will have a root node, with one or more children. Each child node
can in turn contain zero or more children, some of which will be the graphical objects we
want to draw. The other nodes are there for structural purposes and can get quite complex,
as we'll see later on.

For example, if a building was composed of rooms, a group node of the scenegraph (call it
"Building") might contain several nodes (called "room-0" "room-1" and so on). The bounding
box of the "Building" node would be defined such that it contains the bounding boxes of all
of the rooms. So if the building node was determined to be invisible, then there would be no
need to check the child nodes since they would also be invisible.
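A hypothetical traversal over such a hierarchy might look like the sketch below (it reuses the Sphere, Plane, and cullSphere definitions from the earlier culling sketch); the point to notice is that an Outside result at the "Building" level skips every room beneath it:

    #include <array>
    #include <vector>

    struct Node {
        Sphere bound;                  // bounding volume enclosing all children
        std::vector<Node*> children;
        bool isDrawable = false;       // leaf geometry vs. purely structural group
    };

    // Gather every drawable in a subtree without further testing.
    void collectAll(const Node* node, std::vector<const Node*>& visible)
    {
        if (node->isDrawable) visible.push_back(node);
        for (const Node* child : node->children) collectAll(child, visible);
    }

    // Recursively cull the hierarchy: an invisible parent prunes its whole subtree,
    // and a fully visible parent spares its children any further tests.
    void cullNode(const Node* node, const std::array<Plane, 6>& frustum,
                  std::vector<const Node*>& visible)
    {
        CullResult r = cullSphere(node->bound, frustum);
        if (r == CullResult::Outside) return;

        if (node->isDrawable) visible.push_back(node);
        for (const Node* child : node->children) {
            if (r == CullResult::Inside) collectAll(child, visible);
            else                         cullNode(child, frustum, visible);
        }
    }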

Another benefit of hierarchy was ease of manipulation. Given a car containing doors and
wheels, it was much easier to move the "car" node and have the child nodes (doors and
wheels) follow automatically. Without a hierarchy, one would have to move each of
these sub-objects synchronously each time the car moved. Of course, that could be solved
with some clever back-pointers among dependent matrices, but that's exactly what
scenegraphs do in a more formal fashion.

So for example, consider a tank. It might have the following hierarchical representation:

By splitting the object into "nodes" and representing the connectivity between these nodes,
we can better manipulate the final polygons of the tank. We can animate pieces separately:
we can rotate the turret, fire the gun, and open the hatch. We can animate the left and right
treads to simulate turning.
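As a sketch of how such a transform hierarchy composes (the 4x4 matrix type and the update function here are illustrative, not any particular scenegraph's API): each node stores a transform relative to its parent, and a traversal concatenates them so that moving the "tank" node carries the turret, gun, and hatch along with it.

    #include <array>
    #include <string>
    #include <utility>
    #include <vector>

    using Mat4 = std::array<float, 16>;    // column-major 4x4 matrix

    Mat4 multiply(const Mat4& a, const Mat4& b)
    {
        Mat4 r{};                          // zero-initialized accumulator
        for (int col = 0; col < 4; ++col)
            for (int row = 0; row < 4; ++row)
                for (int k = 0; k < 4; ++k)
                    r[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
        return r;
    }

    struct XformNode {
        std::string name;                  // e.g., "tank", "turret", "gun"
        Mat4 local;                        // transform relative to the parent node
        std::vector<XformNode*> children;
    };

    // Walk the hierarchy, composing parent and local transforms into world matrices.
    void updateWorld(const XformNode* node, const Mat4& parentWorld,
                     std::vector<std::pair<std::string, Mat4>>& out)
    {
        Mat4 world = multiply(parentWorld, node->local);
        out.emplace_back(node->name, world);
        for (const XformNode* child : node->children)
            updateWorld(child, world, out);
    }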

Rendering Advantages -- State Sorting

Scenegraphs showed clear benefits for improving rendering performance and making more
optimal use of the available hardware resources. By keeping a "retained" model of the
virtual world, scenegraphs could make additional optimizations, such as culling and drawing
in parallel processes and, most importantly, state sorting.

State sorting is a concept whereby all of the objects being rendered are sorted by
similarities in state (texture map, lighting values, transparency, and so on). Since changing
state is often an expensive operation due to hardware implementations, this is usually a big
performance win, even on the newest hardware. A good example of this is turning lighting
on and off -- imagine a generic SIMD hardware architecture, executing the same code over
four parallel geometry processors. There may be one version of the code for "lit" objects
and one version for "unlit." Changing from lit to unlit state can cause all four processors to
flush and reload. But if we can try to turn lighting on or off only once per frame instead of
once per object, we can improve performance.
For an even stronger example, imagine we are drawing 100 cars, each containing some
polygons in metal (state 1), rubber (state 2), and glass (state 3). It might be beneficial to
draw all of the metal objects first, then the rubber ones, and then the glass. We can have 3
state changes, or we can have 300. And at least some state sorting is already required if
we're depth-sorting the windows for correct blending results.
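A minimal sketch of the idea (the state fields and their cost ordering are assumptions for illustration): gather the visible drawables, then sort by a state key so identical states land next to each other in the draw list, with blended geometry pushed to the end and depth-sorted.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct Drawable {
        uint32_t textureId;    // assume texture binds are the most expensive change
        uint32_t materialId;   // e.g., metal, rubber, glass
        bool     transparent;  // blended geometry must be drawn last
        float    depth;        // eye-space depth, used only for transparent objects
        // ... geometry handle, transform, etc.
    };

    // Sort so that expensive state changes happen as rarely as possible; transparent
    // objects go last, back to front, since blending needs that order anyway.
    void sortForRendering(std::vector<Drawable>& drawList)
    {
        std::sort(drawList.begin(), drawList.end(),
            [](const Drawable& a, const Drawable& b) {
                if (a.transparent != b.transparent) return !a.transparent;
                if (a.transparent)                  return a.depth > b.depth;
                if (a.textureId  != b.textureId)    return a.textureId < b.textureId;
                return a.materialId < b.materialId;
            });
    }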

However, early state sorting was hampered by the fact that if two objects had very different
transformations (for example two windshields on two cars in different locations), it was
costly to sort these objects by state alone because changing the viewing matrices was also
a fairly expensive operation. Today, however, it is usually much cheaper to sort by state
first, though exactly which state is the most expensive (and therefore the most important
sort key) varies from platform to platform. We might even want our engine to be able to vary
how it state sorts depending on the hardware. As we'll see later in the article, this is where
scenegraphs can excel.

State Encapsulation

Early scenegraphs employed the concept of state encapsulation to facilitate state sorting.
This meant each object in the scenegraph would point to a separate state structure--a set of
material colors, texture, lighting, transparency, and so on. The scenegraph could then
compare these state objects for similarities or just sort by the pointers. Even so, when
switching from one state set to another, the system tried to change only the relevant
differences rather than blindly apply all state parameters, some of which, like texture loads
and binds, could be very expensive time-wise.

In these systems, state sharing was achieved by having two graphical objects point to the
same state set. This had other advantages, such as being able to quickly switch from
“visible light” states to “infrared” states using simple pointer swaps.
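A sketch of that sharing (the StateSet fields and the visible/infrared split are illustrative, not any particular scenegraph's types):

    #include <vector>

    struct StateSet {
        unsigned textureId;
        float    diffuse[4];
        bool     lightingEnabled;
    };

    struct GeoSet {
        const StateSet* state;   // many GeoSets share one StateSet by pointer
        // ... vertex data, etc.
    };

    // Switch a whole batch of geometry from visible-light materials to an
    // infrared-style state with simple pointer swaps -- no per-object edits.
    void switchToInfrared(std::vector<GeoSet>& geometry,
                          const StateSet& visibleState,
                          const StateSet& infraredState)
    {
        for (GeoSet& g : geometry)
            if (g.state == &visibleState)
                g.state = &infraredState;
    }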

In this example, many of the nodes (rectangles) in the Tank hierarchy are assigned states
(ovals). When the tank is drawn, we can sort the objects by state and try to minimize the
number of state changes. For example, we can draw the left and right tread at the same time
and only set the "rubber" state once. Since depth-first traversal would visit these in that
order anyway, we haven't gained much. But we'd want to draw the base and turret at the
same time too; so state encapsulation sorting can provide the needed information to make
this possible.

Transform Graphs
Early scenegraphs were primarily transform graphs, representing object hierarchies in terms
of inherited parent/child transformation relationships. For example, a car node might have
four wheel-nodes that would be specified relative to axle and steering nodes (their center of
rotation), which would in turn be specified relative to the car. Or, perhaps, a building might
contain walls, floors, windows, and interior rooms, which might contain desks and chairs
and so on.

Dynamic Coordinate Systems

Dynamic Coordinate Systems (DCS) were added for things like our tank, where we wanted
the tank to be able to move around from frame to frame and the turret to rotate
independently. DCS nodes were originally more expensive, mainly because there was extra
bookkeeping information that could not be pre-computed, but instead needed to be
re-computed when the object moved, or at worst each frame.

What bookkeeping? Take culling, for example. It often uses bounding boxes or spheres to
contain all of a node’s children and their bounding boxes, recursively. If the node’s
bounding volume is invisible, all of the children are therefore invisible. When a child moves,
the bounding box needs to be re-computed. So we might write the logic as: re-compute the
bounding box only when a child moves. But what happens when all of the children move?
Do we re-compute the bounding box each time or wait till they’re all done moving? In that
case, it might be better to re-compute the bounding box once per frame, or better yet, store
a flag that says if any of the children changed that frame and then re-compute the box at
most once per frame. This sort of tradeoff is the kind of thing scenegraphs excel at, where
immediate mode rendering does little to help.
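One way to write that last tradeoff down is with a dirty flag (a sketch under assumed types, not any particular scenegraph's API): children mark the parent dirty when they move, and the parent recomputes its bound at most once per frame, just before culling.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct Bounds {
        float min[3], max[3];
        void expandToInclude(const Bounds& b) {
            for (int i = 0; i < 3; ++i) {
                min[i] = std::min(min[i], b.min[i]);
                max[i] = std::max(max[i], b.max[i]);
            }
        }
    };

    struct GroupNode {
        std::vector<Bounds> childBounds;
        Bounds bound{};
        bool   boundsDirty = false;

        // Cheap; may be called many times per frame as children move.
        void onChildMoved(std::size_t index, const Bounds& newBounds) {
            childBounds[index] = newBounds;
            boundsDirty = true;
        }

        // Called once per frame before culling; recomputes the box at most once,
        // no matter how many children moved this frame.
        void updateBoundsIfDirty() {
            if (!boundsDirty || childBounds.empty()) return;
            bound = childBounds[0];
            for (std::size_t i = 1; i < childBounds.size(); ++i)
                bound.expandToInclude(childBounds[i]);
            boundsDirty = false;
        }
    };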

Static Coordinate Systems

In the case of buildings, since they don't move, we could use static coordinate systems
(called SCS in Performer). These were simple matrix transformations without a lot of
overhead. The main difference was that SCS nodes could pre-compute important
information, like bounding boxes and collision information. More importantly, in an MP
(multi-process) system, SCS nodes are guaranteed to remain the same from process to
process, whereas DCS nodes need to be buffered so that changes in one process don't
have immediate effects in another.

Aside: for a quick example of the sort of MP problems that arise, consider two cubes that
are being manipulated in one process and drawn in another. If the first process modifies
both cubes before either is drawn, things are happy. If the first process moves the cubes
after they’re drawn, things are okay, but you won’t see the change until the next rendered
frame, by which time something else might have happened. But if the first process modifies
one cube and then both are drawn before it can modify the other, you can see strange
artifacts that make the cubes appear to oscillate with respect to each other. Worse still, in a
true MP system, the first process can be in the middle of updating one cube while the other
is drawn, causing unpredictable results.

We may not be used to using multi-threading or multi-processing on Wintel boxes, but it's
becoming more and more important, even on single-CPU machines. With hyper-threading,
AGP bottlenecks, and consoles that contain many independent processors, synchronizing a
dedicated "draw" process with a main application, possibly running at a different frame-rate,
is going to be a challenge more and more people will become familiar with.
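A minimal sketch of the buffering idea for a DCS shared between an app process and a draw process (it assumes a single writer, a single reader, and a swap performed at the frame boundary; real MP scenegraphs are considerably more careful than this):

    #include <array>
    #include <atomic>

    using Mat4x4 = std::array<float, 16>;

    // Two copies of the matrix: the app writes into one while the draw process
    // reads the other, and the read index flips only at the frame boundary, so
    // the drawer never sees a half-updated transform.
    class DoubleBufferedDCS {
    public:
        void setMatrix(const Mat4x4& m) {                 // app process
            buffers_[readIndex_.load() ^ 1] = m;
        }
        const Mat4x4& drawMatrix() const {                // draw process
            return buffers_[readIndex_.load()];
        }
        void swapAtFrameBoundary() {                      // the one sync point
            readIndex_.store(readIndex_.load() ^ 1);
        }
    private:
        std::array<Mat4x4, 2> buffers_{};
        std::atomic<int> readIndex_{0};
    };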

Adding Groups, LOD, and other useful nodes


In addition to coordinate system nodes and basic graphical objects, scenegraphs added
other types of nodes to take advantage of the "retained mode" and frame-to-frame
coherence optimizations. Most of these node types derive from the basic group node, which
acts as a simple container for any number of children, spatially proximate or not, and
imposes no restrictions on them.

Level of Detail nodes use computations about how far an object is from the observer to
“dial in” the amount of detail shown or switch between two or more child nodes which
represent an object at various fidelities. The basic idea is that a far-away object can be
rendered at lower fidelity (fewer polygons, smaller textures, etc.). Many schemes have been
invented to deal with object switching or fading between LOD states, and the state of the art
lies in various so-called continuous level of detail schemes.

Switch nodes are a form of group node that sets the active child node (zero or one out of N
children) based on some key value (e.g., 0 to N-1). Sequence nodes are a form of switch
where the key value cycles based on time. Animations can be made with sequence nodes –
each frame of animation is stored as a unique child object and the parent sequence node
controls the active frame. A DCS-Sequence is useful for motion-captured joint animation, for
example, where an array of transformations is applied in the same way a sequence node
iterates through its list of children (it used to require having N SCS nodes under a
Sequence, which was wasteful). DCS-Sequences can, for example, be efficiently
compressed and stored, and take very little CPU time to play back (though their interactivity
leaves something to be desired).
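A sketch of the distance-based selection an LOD node performs (the switch ranges and child handles are illustrative):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Vec3f { float x, y, z; };

    struct LODNode {
        Vec3f center;                      // representative point of the object
        std::vector<float> switchRanges;   // switchRanges[i] = max distance for child i
        std::vector<int>   children;       // handles to child geometry, finest first

        // Pick which child (if any) to draw for the current eye position.
        // Returns -1 when the object is farther than the last range (switched out).
        int selectChild(const Vec3f& eye) const {
            float dx = eye.x - center.x, dy = eye.y - center.y, dz = eye.z - center.z;
            float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
            for (std::size_t i = 0; i < switchRanges.size(); ++i)
                if (dist <= switchRanges[i]) return children[i];
            return -1;
        }
    };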

Performer

SGI’s Performer was an early example of a scenegraph that was primarily a multi-process
transformation graph. Performer had state objects which did not exist in the hierarchy per
se, but were referenced by graphical objects. Performer made many advances in the use of
MP programming techniques to optimize performance on SGI’s multi processor systems.
Performer did a great job of state sorting, though an early design decision limited state
sorting to only under individual DCS nodes – in other words, objects could not be grouped
for similar-state rendering if they had different DCS nodes above them. Performer also
made extensive use of traversal masks and per-node callbacks for special effects.

Adding State Nodes to the Tree


Later scenegraphs added the notion of state as an actual node type. This had some
advantages, especially in terms of being able to aggregate common state. For example, if
there were 100 brick objects, we could insert a "brick" material node as parent to those 100
objects and the scenegraph render process would implicitly render these together. In fact,
one of the principal benefits of state nodes is that explicit control over state sorting is given
to the scenegraph modeler. For skilled modelers, this provides more control and more
potential for optimization than automatic state sorting. But in the general case, it probably is
not a win.

Why? An illustrative example takes 100 tank objects, each with three states (say tread,
metal, and camo). But since we want the tanks to each be independently movable, they
would be grouped with each tank having its own parent DCS node, plus some more DCS
nodes for the turret and tread wheels if desired. Below that top DCS, we’d see the three
state nodes and below those, the individual geometry (shared or instanced). This means, in
practical terms, that we’d have 100 tread, metal, and camo nodes and that we’d change
state at least 300 times during the rendering of the scene. A better scheme might group the
graphical objects by the three common states, but that would require each geometry object
to have its own DCS and we’d run the risk of a turret forgetting to drive on when the base of
the tank does.

VisKit

Paradigm's VisKit is a good example of this approach. It also added other useful node types
like "cameras" (representing the observer in the scenegraph, rather than implicitly at
0,0,0 in modelview space). But in other ways, VisKit was very similar to early versions of
Performer (not surprisingly, since its designer was the person who had managed the early
Performer team at SGI).

Adding Action or Event Nodes


Many scenegraphs had the notion of per-node callbacks that the programmer could specify.
In Performer, each node could have multiple callbacks, depending on the context. In Cull
processing, any cull callbacks (if present) would be invoked to affect the culling result. In
Draw processing, any draw callbacks would similarly be invoked for drawing special effects.
Since these processes worked in a hierarchical depth-first traversal fashion, pre- and post-
traversal callbacks were often provided to let things be done before and/or after traversal of
child nodes. Application-side callbacks were also provided to do computation or automation
on a node once each frame (e.g., for conditional logic, for animation, to move a DCS, collect
statistics, and so on).
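Generically, the pattern looks something like the sketch below (this is not Performer's actual API; the callback slots and the traversal are illustrative):

    #include <functional>
    #include <vector>

    struct CallbackNode {
        std::function<bool(CallbackNode&)> cullCallback;   // return false to cull out
        std::function<void(CallbackNode&)> preDraw;        // set up a special effect
        std::function<void(CallbackNode&)> postDraw;       // restore state afterwards
        std::function<void(CallbackNode&)> appCallback;    // once-per-frame app logic
        std::vector<CallbackNode*> children;
    };

    // Depth-first draw traversal honoring the optional per-node callbacks.
    void drawTraverse(CallbackNode& node)
    {
        if (node.cullCallback && !node.cullCallback(node)) return;
        if (node.preDraw) node.preDraw(node);
        for (CallbackNode* child : node.children) drawTraverse(*child);
        if (node.postDraw) node.postDraw(node);
    }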

However, the main drawbacks of such automatic actions per node are twofold. First, they
are very difficult to schedule efficiently, since the application does not know in advance
which nodes will be visible or how much time any given callback might consume. They can
take an arbitrary amount of time to execute, and generally block further processing of
culling or drawing (blocking on draw can cause “bubbles” or stalls in hardware queues).
They are also somewhat scattered in terms of cache coherence and branch
prediction—similar operations are almost never performed in repeated series. In Performer
apps, for example, callbacks were sometimes found to cause CPU bottlenecks and
non-deterministic behaviors.

The second drawback of callbacks is more complicated. Since app-side callbacks need to
be invoked before the culling or drawing traversals begin (since the app can change the
positions of objects, moving them in and out of view, for example), the app traversal
generally visits every object in the scenegraph, even those that are way off screen. This can
be very costly and ultimately defeats the advantages that culling gives over a brute-force
immediate mode implementation.

A better system might do some culling first and then do per-node processing based on how
close an object was to being in view. Far away objects usually need limited processing,
usually just to determine when they will enter the view. And the app process may move an
object. So there’s still a cyclic dependency between this optimization and culling which
needs to be addressed.

Inventor

Inventor existed at SGI at roughly the same time as Performer, with a very different
approach. The goal there was usability over performance. The result was a very elaborate
and highly re-usable set of scenegraph nodes, but at the cost of performance -- so much so
that Inventor was relegated to academic projects and rapid prototyping and, to my
knowledge, no serious (i.e., high-performance) real-time efforts. Many people tried to mix
Performer and Inventor to get the best of both worlds, but this was almost always a dead
end.

Adding Event Nodes


Event nodes were a later addition to systems like Inventor and its descendant, VRML. The
idea behind a scenegraph event system is fairly clever in theory. If the camera or observer
is an object in the scenegraph, we can test to see when this object collides with one or more
invisible “trigger” volumes also in the scenegraph. A trigger or sensor object could be linked
to an effector or action object that would animate a node, for example. Events could be
mouse or keyboard based too, so if you click on a 3D button, something else happens in the
virtual world.

In this way, one could write an entire user-interactive program in a scenegraph. Doors could
be opened, lights turned on by flicking virtual switches, and so on. All data driven.
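A sketch of the trigger half of that idea (the box-shaped volume and the onEnter hook are assumptions for illustration; a real system would route the event on to an effector node):

    #include <functional>
    #include <vector>

    struct Point3 { float x, y, z; };

    struct TriggerVolume {
        Point3 minCorner, maxCorner;          // invisible axis-aligned box
        bool   wasInside = false;
        std::function<void()> onEnter;        // the "effector": open a door, flip a switch

        bool contains(const Point3& p) const {
            return p.x >= minCorner.x && p.x <= maxCorner.x &&
                   p.y >= minCorner.y && p.y <= maxCorner.y &&
                   p.z >= minCorner.z && p.z <= maxCorner.z;
        }
    };

    // Run once per frame with the observer's position: fire each trigger's
    // effector on the frame the observer enters its volume.
    void updateTriggers(std::vector<TriggerVolume>& triggers, const Point3& observer)
    {
        for (TriggerVolume& t : triggers) {
            bool inside = t.contains(observer);
            if (inside && !t.wasInside && t.onEnter) t.onEnter();
            t.wasInside = inside;
        }
    }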

VRML

Virtual Reality Modeling/Markup Language was the extension of Inventor, drafted after
many competing forces finally came together (led by SGI at the time). It was very similar to
Inventor in form and function and suffered from many of the same performance disabilities.
But the main benefit was that it was highly self-contained and simple to transport across
network connections. It also added concepts for extensibility and portability that Inventor
largely lacked (being SGI-specific), and it is now being further revised in something called
Web3D or X3D or VRML200x.

Special Nodes
Body and Facial Animation

X3D and MPEG-4 add special node types for Body and Facial Animations, since for
humans, there are some clever ways to extract differences from a standard (implicit) model
for better compression. We can encode phonetic visual expressions (visemes) as well as
joint animations for elbows and wrists using many fewer bits than if we were coding these
things generically.

GeoSpatial

GeoSpatial problems (like drawing the entire earth) require some special nodes to deal with
the inherent precision limitations of graphics hardware, namely single-precision floating
point. True geospatial coordinates need more than the 23 bits of mantissa a 32-bit float
provides, and scenegraphs generally store everything as 32-bit floats, so we add some new
GeoNode types to various scenegraph schemes. GeoVRML is one such approach, driven
largely by the folks at SRI. Keyhole used its own approach for EarthViewer.
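One common workaround, shown as a sketch below, is to keep authoritative coordinates in double precision and re-base them to a local origin near the viewer, so the values handed to the 32-bit float pipeline stay small (this is a generic illustration, not necessarily how GeoVRML or Keyhole implemented it):

    #include <cstdio>

    struct Vec3d { double x, y, z; };   // authoritative, double-precision position
    struct Vec3f { float  x, y, z; };   // what the float-based rendering pipeline sees

    // Subtract a local origin in double precision *before* converting to float, so
    // the 23-bit mantissa is spent on the few kilometers around the viewer rather
    // than on millions of meters of planet-scale coordinates.
    Vec3f toLocalFloat(const Vec3d& world, const Vec3d& localOrigin)
    {
        return { static_cast<float>(world.x - localOrigin.x),
                 static_cast<float>(world.y - localOrigin.y),
                 static_cast<float>(world.z - localOrigin.z) };
    }

    int main()
    {
        Vec3d origin { 4517981.0, 835432.0, 4401322.0 };     // picked near the viewer
        Vec3d vertex { 4517983.25, 835433.5, 4401320.75 };   // a few meters away
        Vec3f local = toLocalFloat(vertex, origin);
        std::printf("%.3f %.3f %.3f\n", local.x, local.y, local.z);  // small, precise values
        return 0;
    }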

PVS

"Potentially Visible Set" is a broad term for a sort of generalized culling technique. In basic
culling, we take the entire scene and recursively find which objects fall on or within some
bounding volume, usually a frustum (a truncated pyramid, approximating the viewing
volume). In generalized culling, we might have pre-computed lists of objects that are
spatially grouped (like “group” nodes, only they need not be hierarchically associated) and
probably visible at the same time. Other techniques might make use of shadow or blocker
objects that rule out certain regions of space.

The "Cell and Portal" approach, for example, usually groups the world into rooms or cells,
with each cell having a list of objects in it and a list of portals, doors, or other connections to
the adjacent (or even distant but connected) cells. When a portal is deemed visible, the
culling routine looks at the portal's connected cell and checks all of its portals, and so on
recursively, each time adding (ORing) to the overall set of visible objects and each time
reducing (ANDing) the frustum to the portal (door) we can see through. In simpler
implementations, objects within a single cell are considered visible whenever their cell is
culled in. Often traditional spatial culling is used to further narrow the visible set.
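A heavily simplified sketch of that recursion (cells, portals, and the frustum-narrowing step are abstracted to 2D rectangles here; a real implementation clips the view frustum to the portal polygon):

    #include <algorithm>
    #include <set>
    #include <vector>

    struct Rect { float minX, minY, maxX, maxY; };   // stand-in for a clipped frustum

    // Shrink the current view region to the part that passes through a portal.
    // Returns false when the portal lies entirely outside the current region.
    bool narrowTo(const Rect& portal, Rect& region)
    {
        Rect r { std::max(region.minX, portal.minX), std::max(region.minY, portal.minY),
                 std::min(region.maxX, portal.maxX), std::min(region.maxY, portal.maxY) };
        if (r.minX >= r.maxX || r.minY >= r.maxY) return false;
        region = r;
        return true;
    }

    struct Cell;
    struct Portal { Rect bounds; Cell* target; };
    struct Cell   { std::vector<int> objects; std::vector<Portal> portals; };

    // OR each reachable cell's objects into the visible set, ANDing (narrowing)
    // the view region at every portal we pass through.
    void cullCell(const Cell& cell, Rect region,
                  std::set<int>& visibleObjects, std::set<const Cell*>& visited)
    {
        if (!visited.insert(&cell).second) return;   // don't loop through portal cycles
        visibleObjects.insert(cell.objects.begin(), cell.objects.end());
        for (const Portal& p : cell.portals) {
            Rect narrowed = region;
            if (p.target && narrowTo(p.bounds, narrowed))
                cullCell(*p.target, narrowed, visibleObjects, visited);
        }
    }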

What’s most interesting about Cells and Portals is that it can also generalize the notion of
rendering to framebuffers and destinations and make use of standins or impostors. A
doorway can be a portal to another room, or it can be implemented as a textured polygon,
pre-rendered from an image of that room from the correct perspective. If it’s done right,
there’s no way to tell the difference. Mirrors are implemented in much the same way. A
mirror can be rendered by inverting the view matrix and projecting the camera through the
mirror, then drawing normally into the framebuffer. Or it can be rendered by projecting the
camera through the mirror, rendering the scene to a texture, and applying the texture to the
mirror as a painting.

The downside of PVS techniques is that they're usually added to scenegraphs as an
afterthought and not built in from the ground up. NetImmerse is/was a game engine that
made extensive use of Cells and Portals.

Inventor Revisited
Inventor is easy to use. It provides a rich set of node types which make it easy to get
something up and running quickly. And it adds some nice 3D GUI types too, which make
producing a finished application that much quicker.

However, Inventor is a poor performer. It suffers from some critical design flaws, such as
virtualizing all interfaces, even to atomic data members, which doesn't help performance
any (even COM objects try not to virtualize member getters and setters). But the biggest
flaw is in the execution model: the active nodes in the scenegraph consume CPU time while
the scene is being rendered. And since all nodes must be visited, view frustum culling is not
common, even at the rendering stage. So richly immersive scenes will be slow unless the
programmer makes the effort to optimize the scene by hand.

VRML Revisited
VRML suffers from many of the same performance limitations as Inventor. It's nice to be
able to specify what are essentially dataflow programs right in the scenegraph by hooking
sensors to effectors using routes or linkages, and to place active, clickable objects in the
world with a few lines of text. But VRML suffers from a severe namespace problem, where
declared objects can be ambiguously or incompletely defined (via dangling external
references) and so on.

Just looking at the dataflow problem gives some sense of how buggy a VRML system can
be. If a scenegraph finds an effector node first and then finds a sensor node that drives the
effector, what is the proper way to process this? Do we normally process the effector node
first, then the sensor, thereby potentially computing the effector again this frame (risking an
infinite loop or at least a performance hit to fix the problem)? Or do we wait till next frame
where it may be too late? Or, perhaps, do we sort the entire scenegraph to make sure all
sensors come before their down-wind effectors (if that is even possible given the cyclic
possibilities)? This could bring up problems with state and transform dependencies and
make objects go haywire.

Given global DEF/USE semantics, can we have two objects using the same global name, or
is this an error? It could be accidental. If so, did we mean to use the first one or the last
one? If we try to use the hierarchy to segment the namespace (as is done in Java, for
example), what happens when we subtly reorganize the scenegraph because two objects
that had been attached can now move independently (for example, a car riding on a moving
flatbed train now drives off at the station)? What if we want to reorganize the scenegraph for
better state-based performance on different target hardware configurations? We could
easily break our nice scenegraph-based program in the process.

Scenegraphs Today
Scenegraphs today are quite sophisticated and quite readily available, even free and open
sourced. They're generally well suited for cross-platform game development. But current
scenegraphs do have some important weaknesses. One is an overloading of the tree
concept with all sorts of bells and whistles that slow things down. Another is that, without
structural changes, coordinating changes in distributed systems is difficult. Very few of the
current crop of scenegraphs were designed with MMOGs in mind.

Sidebar: Scenegraphs and "Scene Graphs" I'm aware of:

OpenSG - Features (recommended for price/performance)
Open Scenegraph - Features and Goals
X3D - Overview
Java3D - Overview (PDF)
Gizmo3D - Overview
RenderWare - Main Site
NetImmerse/Gamebryo - Main Site
Proto Games / Wild Magic - Homepage
OpenPerformer - Overview (recommended for Vis-Sim)

Note: if a scenegraph is not recommended, it may simply mean I haven't evaluated it yet.

The heart of the problem is an overloading of what was once a nice, straightforward
performance improvement over immediate mode OpenGL. We moved to hierarchies so we
could cull and draw more efficiently. Then we added in all this extra stuff, like hanging
ornaments on a Christmas tree, except that some of the ornaments are nice juicy steaks
and some are whole live cows. They simply don't belong.

Put another way, the original transform-graph concept sought to organize the visual
database spatially to take advantage of grouping proximate or linked objects. We
propagated shared spatial information up the tree, where we could make earlier traversal
decisions and save time in true log-n tree fashion.

But we have more than one way of organizing our visual database. Culling and PVS
techniques want spatially organized databases for optimum performance. If the scenegraph
is instead organized largely by state, then we might need to cull each 3-state tank (in the
tank example) three times, once for each articulated part, instead of being able to cull each
tank once and only once. But if we want to get the best hardware performance, we really do
want to sort the visible set by the most expensive state changes first. Moreover, since
states don't change that often, we don't want to re-sort the scene every frame. But if we
start from the spatial view each time, sorting only the visible objects seems to be what we're
stuck with (as was the case with Performer, believe it or not). If we re-sort the whole
scenegraph for state optimization (only once, hopefully), we lose the nice spatial coherence
we count on for fast culling.

Given that we want to hook some nodes up to other nodes to enable event processing, we'd
also like a guaranteed, consistent way of naming objects that doesn't change after spatial or
state sorting, and doesn't even change if parts of the scene are currently loaded or not
(early scenegraphs were entirely memory resident). We want a logical or semantic naming
scheme, like in namespaces. We want handles that persist and reflect structures that may
not even be local.

By executing actions at each node during a depth-first traversal, we are most likely invoking
bits of code in an arbitrary (almost random) order. This runs counter to the advanced
scheduling many compilers try to do to take advantage of CPU branch prediction and
pipelining, instruction pre-fetch, and high-speed local caching, to name a few. Instruction
and data cache misses can affect performance by up to 10x on many systems. So doesn't it
make sense that if we have 100 physics nodes and 100 inverse kinematics animation nodes,
we try to process those nodes together, just like we tried to do for state (especially on
systems with special vectorizing or SIMD capabilities)? So this gives yet another competing
organizational approach to how to optimize the scenegraph.

Put all of these together and it's easy to see that the current evolution of scenegraphs has
taken a wrong turn somewhere. And it will require a change in approach to move past the
roadblock.

Scenegraph's Tomorrow
Granted, it is probably impossible to find a single perfect organization for a scenegraph that
simultaneously optimizes for spatial, state, semantic, and CPU considerations. Some
people try to hand-design theirs to straddle the fence and make the best of what they have.
But a better idea is to remove one of the fundamental constraints: that there need be a
single scenegraph organization for a given visual database.

It is entirely possible that we can have a single set of objects, call it an object soup, but
have two, three, four, or more hierarchies linking these objects into independent and
complementary organizations. It's been on the wish list of a number of scenegraph
designers for years, though it was never a requirement before distributed databases came
along.

But how to implement this is another matter. The solution, it seems, lies in the separation of
concepts of scenegraph “nodes” from the “objects” they represent. By making shared
objects live in a soup, we minimize the amount of waste and miscoordination we might see
with four or more simultaneous object hierarchies. This way the “node” part of an object is
just a few bytes – just enough to point to the object in the soup and to the
parent/child/sibling relationships in this particular view. All of the real “meat” is kept once in
the object, which ideally contains back pointers to each node in each graph, limited to a
small number like four.
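A bare-bones sketch of that separation (the field names and the fixed count of four views are assumptions for illustration): the heavyweight data lives once in the soup, while each view holds only lightweight nodes pointing back into it.

    #include <array>
    #include <string>
    #include <vector>

    constexpr int kMaxViews = 4;         // e.g., spatial, state, semantic, application

    struct ViewNode;                     // lightweight: a few pointers per view

    struct SceneObject {                 // the "meat", stored once in the object soup
        std::string name;
        // geometry, materials, bounds, etc. would live here
        std::array<ViewNode*, kMaxViews> nodeInView{};   // back pointers, one per view
    };

    struct ViewNode {
        SceneObject* object = nullptr;   // shared payload in the soup
        ViewNode* parent = nullptr;
        std::vector<ViewNode*> children; // structure specific to this one view
    };

    // Attach an object into one particular view under a given parent node.
    // (Ownership and cleanup are elided for brevity.)
    ViewNode* attach(SceneObject& obj, int viewIndex, ViewNode& parent)
    {
        ViewNode* node = new ViewNode{ &obj, &parent, {} };
        parent.children.push_back(node);
        obj.nodeInView[viewIndex] = node;  // lets us hop from any view to any other
        return node;
    }

In use, a node found by the spatial cull can hop to the same object's node in the state-sorted view via object->nodeInView[...]; that back-pointer hop is the correlation discussed below.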

Is this rocket science? Not really. Relational databases have separated indices from data
since the dawn of time. And scenegraphs are just one way of indexing into big visual
databases. Once scenegraph designers come to grips with that, the rest is downhill.

The second problem is how to correlate among multiple database views (i.e., sets of
indices). Since lightweight nodes in two views point back to the same object, it's easy to see
how, given a node in one database view, we could find the corresponding node in another
view -- just follow the back pointers. This lets us cull using the optimized spatial view and
render using the hardware-optimized state view.

The heart of an efficient distributed database implementation, then, is using the spatial view
to limit what happens in the other views (rendering, culling, physics, animation, and so on)
and distributing changes in the spatial view among disparate systems. The state, semantic,
and application views do not generally change, except for visibility and priority per time
interval, so the real meat of the task is in synchronizing the spatial views.

Semantic View
The semantic or logical view of a visual database is just a convenient way of accessing
objects in the object soup. Think of it as the google (albeit local, not web-wide) of visual
databases. The organization is arbitrary and entirely up to the developer. A developer might
use the semantic view as a large dictionary of objects, organized by object type, subtype
and so on. Or a game may divide objects up by their role in game play. But the main idea is
that the leaves of this tree are the actual objects in the world.

What’s important is that the logical/semantic structure is well known (published) for all
concurrent developers to use. It is a rendezvous point, as well as a convenience.

But it can be used for more elaborate schemes as well. For example, if the semantic view is
organized into “vehicles” and then “cars” under that, we could perform some operation on
all of the game universe’s cars at once (perhaps, proximity tracking).

And there is no reason why objects could not be located under more than one branch of the
semantic tree. There could be a branch called "physical objects" as well as the "vehicle/car"
branch. One could set the physics computation to process everything under the "physical
objects" branch automatically.

State View
As discussed earlier, the State View is intended to be a platform-specific state sort and
state aggregation view. For a platform on which texture fetching is very expensive, we might
see textureIDs as the most significant branches in the tree, thereby minimizing the number
of textureID changes. On another platform, lighting mode might be more expensive to
change. The State View can generally be computed on the client at load-time and does not
change much. Which objects are on or off does change, but their fundamental draw order does not.

One exception to that rule is depth-sorted objects, like transparent polygons. Here, we
might have a branch of the state view that is somewhat dynamic without slowing down the
rest of the system.

Shading Languages

One of the latest buzzwords in modern computer graphics is Shading Languages. The main
idea is that complex images can be constructed by mathematically combining (adding,
subtracting, multiplying, etc.) many simpler images, often through a small assembly
language program instead of using actual framebuffer operations. For example, a nice 3D
bump mapped brick texture (where the bump mapping provides nice light and shadow cues
to make the brick seem more 3D) might be described as a combination of a flat red texture,
two or three rendering stages of bump mapping (rendering light and shadows), a light map
for global shadows, and perhaps a specular highlight map if the object has little glass or
metal bits.

Shaders can be expressed as programs, algorithms, or as a "shading tree," where the
constituent sub-shaders are broken down in hierarchical fashion, like we see for spatial
transformations. This shading tree might be explicit, if the underlying scenegraph supports
such advanced concepts, or it might be implicit, as an abstract representation (for purposes
of understanding) of some pre-compiled code.

It's important to realize that the shading tree we see could easily vary from hardware
platform to hardware platform, depending on the graphics capabilities and other factors. For
example, some hardware supports advanced bump mapping in a single operation – so the
shading tree in that case would be a single node. Other hardware might not support bump
mapping at all, but we can still achieve bump mapping effects by making multiple simpler
rendering passes (one for the texture, one for light areas, and one for dark areas), so the
shading tree in that case might have a parent node with three children, representing the
three passes. (In the original figure, boxes are states and circles are rendering objects.)
Shading trees will also vary from software API to software API. But since we've separated
out the notion of our Spatial View (see below) from the State View, this affords us a good
place to handle the interface with the underlying graphics APIs we might want to use (such
as OpenGL or DirectX).

Application View
The presence of an Application View is not a strict requirement. In fact, it is the least useful
view of the bunch, mainly because compilers are so much better at scheduling code on
CPUs. And a big problem with data-driven programs is that they can be very hard to debug
and prone to pseudo race conditions. But, on the other hand, they're very nice for rapid
prototyping and platform-neutral abstraction. They're also quite useful for giving game
players the ability to dynamically change game behaviors (e.g., mod programming or simple
tunability).

Spatial View
The Spatial View, on the other hand, is the most important view from a distributed database
point of view (and with all of the MMOG pushes out there, who isn't building a distributed
database these days?). By organizing the world into spaces and sub-spaces, we can
efficiently decide how to route messages, prioritize computations, and cull the database to
minimize rendering time and network traffic.

The subdivision of the world into hierarchical spaces is not arbitrary, but there are a number
of valid schemes for doing so. What is important is that the subdivision scheme be fairly
well tuned to the culling procedure, and that no node has too many or too few children (i.e.,
the tree is neither too wide nor too tall). In other words, the same rules that apply to
well-balanced trees in general.

The choice of whether spaces are static or dynamic is also open. For quad tree schemes,
the subdivision is relatively static. If an object moves, it might cause new quad cells to be
created or destroyed, but no quad cells ever move. For sphere or bounding-box trees, the
bounding volumes will likely move as the objects they contain move. Rules for stretching
volumes and forcing children in or out of them are also flexible and fairly easy to implement
as iterative solutions (with local, not global, optimization). In this scheme, bounding volumes
can overlap, but they need not do so.
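For the static case, a small quadtree sketch (the cell capacity, maximum depth, and lazy subdivision policy are all illustrative choices): cell bounds never move, but new cells are created as objects are inserted.

    #include <cstddef>
    #include <memory>
    #include <vector>

    struct Item { float x, y; int id; };

    // A simple quadtree cell: its bounds are fixed for its lifetime, matching the
    // "no quad cell ever moves" property, and children are created lazily.
    struct QuadCell {
        float minX, minY, maxX, maxY;
        int   depth;
        std::vector<Item> items;
        std::unique_ptr<QuadCell> child[4];

        void insert(const Item& item, int maxDepth = 6, std::size_t capacity = 8) {
            if (depth >= maxDepth || (items.size() < capacity && !child[0])) {
                items.push_back(item);           // room here (or we can't go deeper)
                return;
            }
            if (!child[0]) subdivide();
            float midX = 0.5f * (minX + maxX), midY = 0.5f * (minY + maxY);
            int index = (item.x >= midX ? 1 : 0) + (item.y >= midY ? 2 : 0);
            child[index]->insert(item, maxDepth, capacity);
        }

        void subdivide() {
            float midX = 0.5f * (minX + maxX), midY = 0.5f * (minY + maxY);
            child[0].reset(new QuadCell{ minX, minY, midX, midY, depth + 1 });
            child[1].reset(new QuadCell{ midX, minY, maxX, midY, depth + 1 });
            child[2].reset(new QuadCell{ minX, midY, midX, maxY, depth + 1 });
            child[3].reset(new QuadCell{ midX, midY, maxX, maxY, depth + 1 });
            // Items already stored here stay put; a fuller version would push them down.
        }
    };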

It’s not even a problem for an object to be contained in multiple bounding volumes as long
as it's not culled in or out more than once per frame. I've played with systems with "floating"
spaces that group objects for lighting purposes (e.g., all objects that are affected by a light
are in one space). Grouping objects in formation is another useful extension. A group of
tanks or fighters can be dynamically gathered by their proximity and culled as a group, even
if there isn't a single "parent" node in the traditional scenegraph sense.

Summary
I've covered the basics of scenegraphs, where they came from, where they are, and where I
think they're going, at least from one point of view. Much of this work is related to on-going
development of a so-called "multi-view" scenegraph. The ultimate goal of this work is to
come up with a simple, light-weight system for optimizing rendering across many platforms.
Look for future articles on my progress with this work.

This document intentionally doesn't directly address whether you should or shouldn't use a
scenegraph in your 3D app. I trust that given the full facts you'll know best what you need.
But for those people who dismiss scenegraphs out of hand, I hope this article does at least
shed some light on the likelihood that you are using a scenegraph in one way or another,
whether you call it "portals," "bones," "linked matrices," or anything else. Because when it
comes down to it, this is all just common sense and experience put to work.

About the Author

Avi Bar-Zeev has over twelve years of experience building applications in 3D
entertainment and visual simulation. He is a co-founder of Keyhole, Inc.
(developer of EarthViewer, now called "Google Earth") and was an early
employee of Intrinsic Graphics. Early in his career, he helped develop Disney's
Aladdin's Magic Carpet VR Ride and has since developed novel visibility culling,
scenegraph, rendering, database, and network coordination technologies for
clients and past employers. He is currently working as an independent
consultant.

Feel free to contact him if you have questions or comments on this work.


This work is licensed under a Creative Commons License.
