This article was written in 2000 and will no longer be updated.



3D Computer Graphics


There's an old story about the person who wished his computer were as easy to
use as his telephone. That wish has come true, since I no longer
know how to use my telephone.
- Bjarne Stroustrup -

Using images to carry information has many advantages over text-based information. First, anyone who can see can interpret a picture, whereas text is written in a particular language. A single picture can represent a great deal of information and show complex relations. The human brain has several ways to accumulate information (associative, spatial, episodic, declarative or procedural), but memorizing context through images (spatial) is one of the best ways to store information.

Of course, there is a trade-off in using images, because this form of representation also has disadvantages. Images cannot be sorted, so it is difficult to find specific information in a large collection of them. The visualization of information is not standardized, so the same picture can be interpreted differently. Finally, a picture can contain a lot of redundant information, and the relevant part is not automatically conveyed to the viewer. Nevertheless, images are a great supplement to text or audio information.

For many applications, images have to contain 3D information. Computer-generated special effects are used in movies and advertisements, scientific visualization aims to represent high-dimensional information, not to mention computer games and applications based on Virtual Reality. The main task is to reduce 3D information (or even a higher dimension!) to 2D, as if you were viewing a 3D world through a 2D window. There are many forms of depth cueing that allow the viewer to recognize the "lost" dimension. Size and brightness can be reduced with distance, and linear perspective, shadows and the occlusion of distant objects by closer ones round out the impression of volume. Many of the methods used in three-dimensional graphics are less than fifteen years old. In the following chapters, I will describe the basics of these techniques, without the depth needed by someone who wants to program rendering software. It is just an introduction to the most frequently used terms.

Three-dimensional space has length, width and depth. These can be described with a coordinate system containing (x, y, z) values corresponding to each dimension. The most important tool for the generation of a 3D scene is the transformation, which means the translation, rotation and scaling of an object. Rotation and scaling can each be represented by a 3x3 matrix, and a sequence of such transformations (without translation) can be combined into a single matrix by multiplying the individual matrices together.
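As a rough illustration of combining transformations, the following Python sketch builds a rotation and a scaling as 3x3 matrices and multiplies them into one. The helper names (`mat_mul`, `mat_vec`, `rotation_z`, `scaling`) and the column-vector convention are assumptions made for this example only.

```python
import math

def mat_mul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_vec(m, v):
    """Apply a 3x3 matrix to a column vector (x, y, z)."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))

def rotation_z(angle):
    """Rotation about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def scaling(sx, sy, sz):
    """Scaling along the three coordinate axes."""
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, sz]]

# Scale first, then rotate: the matrix that is applied first stands on the right.
combined = mat_mul(rotation_z(math.pi / 2), scaling(2.0, 1.0, 1.0))
point = mat_vec(combined, (1.0, 0.0, 0.0))  # approximately (0, 2, 0)
```

With column vectors, the order of multiplication matters: swapping the two factors would rotate first and scale afterwards, which in general gives a different result.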

A translation is technically not a linear transformation of three-dimensional space, because it moves the origin, so it cannot be written as a 3x3 matrix. By increasing the dimensionality of the space, it is possible to make the translation a linear transformation. This can be compared with the introduction of imaginary numbers to the number system, which makes a lot of difficult operations easier. The resulting coordinates are called homogeneous. A transformation matrix now contains 4x4 entries, which also allows the perspective division (see below) to be expressed as a matrix. A vertex (a corner point on an object's surface) with coordinates (x, y, z) is represented as (X, Y, Z, w) = (wx, wy, wz, w) for any scale factor w not equal to 0.
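The idea can be sketched in a few lines of Python. The function names (`translation`, `transform`, `to_cartesian`) are hypothetical and chosen for this example; the sketch only shows that a translation becomes a matrix product once a fourth coordinate is added.

```python
def translation(tx, ty, tz):
    """4x4 homogeneous matrix that moves a point by (tx, ty, tz)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def transform(m, v):
    """Multiply a 4x4 matrix with a homogeneous vector (X, Y, Z, w)."""
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(4))

def to_cartesian(v):
    """Divide by w to get back ordinary (x, y, z) coordinates."""
    X, Y, Z, w = v
    return (X / w, Y / w, Z / w)

move = translation(5.0, 0.0, -2.0)
# (1, 2, 3, 1) and (2, 4, 6, 2) describe the same point in space.
a = to_cartesian(transform(move, (1.0, 2.0, 3.0, 1.0)))
b = to_cartesian(transform(move, (2.0, 4.0, 6.0, 2.0)))
```

Both `a` and `b` end up at the same translated position, which illustrates that the scale factor w does not change the point being described.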

When our goal is to approximate a single rotation by incremental rotations, we get into trouble. Rotations do not commute: rotating about X and then Y does not give the same result as rotating about Y and then X, unless the angles are infinitesimally small. An alternative is to use quaternions. Like homogeneous coordinates, they use four components rather than three in order to avoid singularities. Quaternions are more than just a compact description of a rotation or an orientation. In their raw form they can be quickly multiplied together and converted to a rotation matrix in one step, which are important qualities when creating a real-time application. With unit quaternions, all rotations can be described on a sphere in four-dimensional space, where each position on the surface of the sphere represents a rotation or orientation. By following the shortest path between two points on the surface, a rotation can be approximated through increments.
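A minimal Python sketch of these operations follows, assuming quaternions stored as (w, x, y, z) tuples and unit-length rotation axes. The function names are illustrative, and `slerp` is the usual spherical linear interpolation along the shortest arc, not a method taken from this article.

```python
import math

def from_axis_angle(axis, angle):
    """Unit quaternion for a rotation of `angle` radians about a unit axis."""
    ax, ay, az = axis
    s = math.sin(angle / 2)
    return (math.cos(angle / 2), ax * s, ay * s, az * s)

def multiply(q, r):
    """Quaternion product: the rotation r followed by the rotation q."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2)

def rotate(q, v):
    """Rotate vector v by unit quaternion q via q * v * conjugate(q)."""
    w, x, y, z = q
    conjugate = (w, -x, -y, -z)
    _, rx, ry, rz = multiply(multiply(q, (0.0,) + tuple(v)), conjugate)
    return (rx, ry, rz)

def slerp(q, r, t):
    """Interpolate from q (t = 0) to r (t = 1) along the shortest arc."""
    dot = sum(a * b for a, b in zip(q, r))
    if dot < 0.0:  # q and -q are the same rotation; pick the nearer one
        r = tuple(-c for c in r)
        dot = -dot
    theta = math.acos(min(1.0, dot))
    if theta < 1e-9:
        return q
    a = math.sin((1 - t) * theta) / math.sin(theta)
    b = math.sin(t * theta) / math.sin(theta)
    return tuple(a * c + b * d for c, d in zip(q, r))
```

Calling `slerp` repeatedly with increasing `t` yields the small incremental rotations mentioned above, each one an equal step along the arc between the two orientations.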

Another important entity in 3D graphics is the vector. It is described by three scalar components and can be thought of as an arrow running from the origin of the coordinate system (0, 0, 0) to the specified coordinate. Processing is often carried out using normal vectors. These are vectors that are perpendicular to the plane of a polygon (a polygon is a two-dimensional, flat shape bounded by straight lines going from one vertex to the next). The normal vector can be found by taking the cross product of two edge vectors of the polygon.
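The cross product construction might look like this in Python. The function names are assumptions for illustration, and the sketch expects three vertices that do not lie on one line.

```python
import math

def subtract(a, b):
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def polygon_normal(v0, v1, v2):
    """Unit normal of the plane through three polygon vertices."""
    n = cross(subtract(v1, v0), subtract(v2, v0))
    length = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    return (n[0] / length, n[1] / length, n[2] / length)

# A triangle lying in the xy plane has a normal along the z axis.
normal = polygon_normal((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

The direction of the result depends on the order of the vertices: listing them in the opposite order flips the normal.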

[Figure: Perspective projection.]

The viewing surface in computer graphics is flat, so we need to create the illusion of a third dimension. Different methods have been used, but the most realistic one consists of nothing more than a simple division. It is called perspective projection and is characterized by a center of projection, which lies on the normal through the center of the screen.

A computation goes like this:
new_vertex_on_screen[X] = screen_center[X] + focal_distance * (old_vertex_in_space[X] / old_vertex_in_space[Z])
new_vertex_on_screen[Y] = screen_center[Y] - focal_distance * (old_vertex_in_space[Y] / old_vertex_in_space[Z])

In this formula, 'focal_distance' is a constant that specifies how far the imaginary viewpoint is from the screen.
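The formula above can be written as a small Python function. This is only a sketch: the name `project` is made up for the example, and returning `None` for points at or behind the viewer is an assumption about how such points are handled (in practice they are removed by clipping).

```python
def project(vertex, screen_center, focal_distance):
    """Perspective-project a 3D vertex (x, y, z) onto the 2D screen."""
    x, y, z = vertex
    if z <= 0.0:
        return None  # at or behind the viewpoint: cannot be projected
    center_x, center_y = screen_center
    screen_x = center_x + focal_distance * (x / z)
    screen_y = center_y - focal_distance * (y / z)  # screen y grows downwards
    return (screen_x, screen_y)

near = project((1.0, 1.0, 2.0), (320.0, 240.0), 100.0)  # (370.0, 190.0)
far = project((1.0, 1.0, 4.0), (320.0, 240.0), 100.0)   # (345.0, 215.0)
```

The same point moved twice as far away lands half as far from the screen center, which is exactly the size reduction with distance that perspective is meant to convey.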

Another projection, which is a lot easier to perform, is the orthogonal projection. We simply scale two of the axes to a suitable size for the screen and discard the remaining axis. Orthogonal projection is quite commonly used in design applications, where it is useful for the designer not to be confused by changing scale. However, depth cannot be conveyed this way, so perspective projection is used more often, since it gives a more realistic impression of the third dimension.

An obvious criterion any object has to meet is to be in the field of view of the viewer (the viewing pyramid). This field of view is represented by the view volume in 3D space, also called the viewing frustum. It is bounded by the planes that run from the viewpoint through the sides of the screen. Each polygon in the scene has to be tested for being outside the viewer's field of view, including behind the viewer. A good optimization is the use of bounding volumes. These are simple objects (boxes, spheres) that completely enclose all of the object's vertices. If the bounding volume lies entirely outside the viewing frustum, the whole object can be discarded, and if it lies entirely inside, the object can be drawn without further tests. Only if the bounding volume is partly outside does the object itself have to be tested. The parts outside the view have to be cut off from the polygon. This operation is called clipping. Polygon clipping is always performed before the objects are projected to the screen.
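A bounding-sphere test against the view volume could be sketched as follows. The representation of each plane as a unit normal pointing into the view volume plus an offset, and the name `classify_sphere`, are assumptions made for this example.

```python
def classify_sphere(center, radius, planes):
    """Classify a bounding sphere against a list of view-volume planes.

    Each plane is (normal, offset) with a unit normal pointing inwards,
    so normal . p + offset >= 0 holds for points on the visible side.
    """
    result = "inside"
    for normal, offset in planes:
        distance = sum(n * c for n, c in zip(normal, center)) + offset
        if distance < -radius:
            return "outside"         # completely behind one plane: discard
        if distance < radius:
            result = "intersecting"  # straddles this plane: test the object
    return result

# A single near plane at z = 1 stands in for the full set of planes.
planes = [((0.0, 0.0, 1.0), -1.0)]
visible = classify_sphere((0.0, 0.0, 5.0), 1.0, planes)
behind = classify_sphere((0.0, 0.0, -5.0), 1.0, planes)
partial = classify_sphere((0.0, 0.0, 1.5), 1.0, planes)
```

The test is conservative: near the corners of the view volume a sphere may be reported as intersecting although it is actually outside, which costs some extra work but never hides a visible object.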

If polygons are drawn in arbitrary order, closer objects can be overwritten by more distant ones, so surfaces that are hidden from the viewer have to be identified and removed. Doing so also reduces the number of polygons that have to be drawn. The simplest method is called backface culling or backface removal: polygons that face away from the viewer are not drawn. This reduces the number of faces to be processed by almost half and saves a lot of computing time.
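One common way to decide whether a polygon faces away from the viewer is to compare its normal with the direction from the eye to the polygon. The sketch below assumes that the vertex order defines the outward normal via the cross product of two edges; the function names are hypothetical.

```python
def subtract(a, b):
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def is_backface(v0, v1, v2, eye=(0.0, 0.0, 0.0)):
    """True if the polygon's normal points away from the eye."""
    normal = cross(subtract(v1, v0), subtract(v2, v0))
    view = subtract(v0, eye)  # direction from the eye to the polygon
    return dot(normal, view) >= 0.0

# The same triangle with opposite vertex order faces the other way.
front = is_backface((0.0, 0.0, 5.0), (0.0, 1.0, 5.0), (1.0, 0.0, 5.0))
back = is_backface((0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0))
```

Backface culling only works for closed, opaque objects, where the back-facing polygons are guaranteed to be hidden by the front-facing ones.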

Another method is Z-buffering, which comes down to a point-by-point depth comparison. It consists of keeping the z coordinate of every point we put on the screen in a huge array. Points that have to be drawn at the same (x, y) position are compared, and only the one with the minimum z value is kept. The required computing time increases linearly with the number of polygons. In software rendering, Z-buffer techniques are slow and use a large amount of memory. But if Z-buffering is supported by the graphics hardware, it is superior to other techniques because the cost in time can be almost zero. The only penalty is that it requires additional memory, either on-board memory on the graphics hardware or system memory.
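The depth comparison itself is very short. Here is a minimal software sketch, assuming that smaller z means closer to the viewer and that colors are plain values; `make_buffers` and `plot` are names invented for the example.

```python
def make_buffers(width, height, background=0):
    """Create a depth buffer filled with 'infinitely far' and a color buffer."""
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[background] * width for _ in range(height)]
    return depth, color

def plot(depth, color, x, y, z, value):
    """Draw a point only if it is closer than what is already stored."""
    if z < depth[y][x]:
        depth[y][x] = z
        color[y][x] = value
        return True
    return False

depth, color = make_buffers(4, 4)
plot(depth, color, 1, 2, 10.0, "far")
plot(depth, color, 1, 2, 3.0, "near")       # closer: overwrites "far"
plot(depth, color, 1, 2, 7.0, "middle")     # further than "near": rejected
```

Because every point is tested individually, the polygons can be drawn in any order and still produce the correct image.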

An elegant and efficient alternative is to sort the polygons by arranging them in a Binary Space Partition tree (BSP tree). Each node in the tree represents a division of space. These partitions are recursively subdivided until each node contains only a single polygon. Most frequently, however, Z-buffering is used for the removal of hidden surfaces, because it is very simple to implement.

[Figure: Sphere under diffuse reflection.]

A reflection model describes the interaction of light with a surface. It is important because we can only see objects through these reflections. Calculating the light is a complex process that varies with the type of light source. Directional light is assumed to be parallel, originating from an infinitely distant source. Light from a positional source diverges from a certain position. Many factors are needed for the light calculation: the polygon normal, the vertex position, the position and direction of the light source, and the position of the viewer. Local lighting models calculate only single reflections, while global models also take multiple reflections into account.

The reflected light is calculated as several components, which are linearly combined at the end of the process. The ambient component is the simplest to deal with. It is essentially global light that illuminates all surfaces evenly, regardless of viewing position or direction. In a local reflection model, the ambient component is modeled as a constant.

When light hits a surface, some of it is scattered in all directions, and not all of it reaches the eye. This is called diffuse reflection, and all objects behave this way to some degree. The smaller the angle between the light vector and the normal vector, the more light is reflected and the brighter the surface appears. The viewing direction plays no part in determining the light level. Objects tend to appear flat when this method is used alone.
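The ambient and diffuse terms are often combined using the cosine of the angle between normal and light vector (Lambert's law). The following sketch assumes that model; the coefficients `ambient` and `k_diffuse` are arbitrary example values.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def ambient_diffuse(normal, to_light, ambient=0.1, k_diffuse=0.9):
    """Intensity from a constant ambient term plus a cosine-weighted diffuse term."""
    n = normalize(normal)
    l = normalize(to_light)
    cos_angle = sum(a * b for a, b in zip(n, l))
    # Light arriving from behind the surface contributes nothing.
    return ambient + k_diffuse * max(0.0, cos_angle)

head_on = ambient_diffuse((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
from_behind = ambient_diffuse((0.0, 0.0, 1.0), (0.0, 0.0, -1.0))
```

Note that the viewer does not appear anywhere in the function, in line with the statement that the viewing direction plays no part in the diffuse light level.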

[Figure: Sphere under specular reflection.]

But smoother surfaces also have another property: they reflect part of the light directly toward the eye, the way a perfect mirror does. The amount of light that reaches the eye falls off rapidly as the angle between the reflection vector and the eye vector increases. This is seen as a sharp highlight, which characterizes specular reflection and is commonly modeled with the Phong reflection model. Only in an ideal specular reflection is the total intensity of the incoming light reflected, since the light rays do not scatter.
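The rapid fall-off is commonly modeled by raising the cosine of the angle between reflection vector and eye vector to a power. The sketch below assumes this Phong-style term; `k_specular` and `shininess` are example parameters, not values from the article.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    """Dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def specular(normal, to_light, to_eye, k_specular=1.0, shininess=20):
    """Specular intensity from the angle between reflection and eye vector."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_eye)
    n_dot_l = dot(n, l)
    if n_dot_l <= 0.0:
        return 0.0  # the light is behind the surface
    # Mirror the light vector about the normal.
    reflection = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))
    return k_specular * max(0.0, dot(reflection, v)) ** shininess

# Eye exactly in the mirror direction: full highlight.
aligned = specular((0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
# Eye 45 degrees away from the mirror direction: almost nothing.
off_axis = specular((0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 0.0, 1.0))
```

A larger `shininess` exponent gives a smaller, sharper highlight, which corresponds to a smoother surface.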

Incremental shading techniques apply simple reflection models to polygons. Light intensities are calculated at the vertices and then interpolated for interior points. A well-known standard is Gouraud shading, which is simple and economical. It is normally restricted to the diffuse component, and thanks to the interpolation the polygon does not appear to be flat. Specular reflection can also be incorporated in the scheme, although a highlight that falls in the middle of a polygon is reproduced poorly, because intensities are only calculated at the vertices.
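The interpolation step can be sketched with barycentric weights on a screen-space triangle. Real implementations usually interpolate incrementally along scanlines; the barycentric form here is a swapped-in, equivalent way to show the idea, and the function names are assumptions.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle a, b, c."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w0 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w1 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w0, w1, 1.0 - w0 - w1

def gouraud_intensity(p, triangle, intensities):
    """Interpolate the vertex intensities at an interior point p."""
    w0, w1, w2 = barycentric(p, *triangle)
    i0, i1, i2 = intensities
    return w0 * i0 + w1 * i1 + w2 * i2

triangle = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
vertex_intensities = (0.0, 1.0, 0.4)  # e.g. results of a diffuse calculation
inside = gouraud_intensity((1.0, 1.0), triangle, vertex_intensities)
```

At a vertex the result equals that vertex's intensity, and in between it changes smoothly, which is what hides the flat appearance of the individual polygons.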

Ray Tracing is a superior technique for dealing with illumination and reflection models. Its main difference from the simple local reflection models is the depth to which the light interaction is examined. Very realistic images can be produced, but it takes a lot of computing time to process the complete path of the light rays through the scene. The principle of Ray Tracing is based on the fact that an observer sees a point on a surface as the result of a light ray that has traveled from a source to the eye. For each pixel of the image, a ray is traced backwards from the observer into the scene to determine which object is visible at this pixel and how that object is lit or shadowed. The color of the pixel is the result of an analysis of the interaction at the first object encountered during the backwards trace of the ray. Thus, hidden surface removal and shading are combined in this model. Spheres are used frequently in Ray Tracing scenes, because their intersections with rays are easy to calculate.
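To show why spheres are so convenient, here is a sketch of the ray-sphere intersection, which reduces to solving a quadratic equation. The function name and the small epsilon used to ignore hits at the ray origin are assumptions for this example.

```python
import math

def intersect_sphere(origin, direction, center, radius):
    """Distance along the ray to the nearest hit on the sphere, or None."""
    oc = tuple(o - c for o, c in zip(origin, center))
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    discriminant = b * b - 4.0 * a * c
    if discriminant < 0.0:
        return None  # the ray misses the sphere
    root = math.sqrt(discriminant)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > 1e-9:  # ignore hits behind or at the ray origin
            return t
    return None

hit = intersect_sphere((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 5.0), 1.0)
miss = intersect_sphere((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 5.0), 1.0)
```

A ray tracer would run such a test against every object for every ray and keep the smallest distance, which is the first object encountered along the ray.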

The result of this rendering technique is often too good (super-real): edges and shadows seem to be too sharp. This can be solved by increasing the number of rays computed at these intersections. This refinement is called Distributed Ray Tracing. The rays also produce blurred reflections by following additional paths besides the 'exact' predicted one. Depth of field and penumbrae can be modeled, but the overhead often becomes impractical. Computing time can be cut down by decreasing the depth to which rays are traced, but the possible reduction depends on the scene. Highly reflective surfaces and transparent objects require a high depth. Ray-traced images can take days to compute, so this method is excluded from the domain of real-time animation for the near future. But it is possible that some day computer systems will be able to do simple ray tracing in real time.

In Volume Rendering, the data representation is based on voxels. A voxel is the 3D (spatial) equivalent of the pixel. Numerous small cubes serve as the basic elements that form the represented objects. Medical imaging is one of the most popular applications: image planes of x-ray data are reconstructed into a 3D model of the scanned object. This method is called computed tomography (CT). In contrast to the rendering of surfaces, we want to view the volume of the body, so we are dealing with many nested surfaces, which have to be rendered with different levels of opacity. Thus, each voxel has to contain the extra information of color and opacity.

Basically, these values are accumulated along the viewing direction, which means that for each displayed pixel a ray is 'fired' and all voxels through which it passes are processed. Multiple reflection is not taken into account. Color is used to indicate different objects or tissue types as well as the shape of an object. There are two different methods for processing the final image. In the first, each voxel is considered to absorb and emit light. Images of this type appear nebulous. In the second, light is considered to be transmitted through volume elements and reflected by surface-like planes. This method creates images that show surfaces with even small details. Both methods can be combined to get the desired result.
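The accumulation along one ray can be sketched as front-to-back compositing. The sketch assumes a single gray value per voxel instead of a full color, samples ordered from the viewer into the volume, and an arbitrary cutoff at which the ray is considered fully opaque.

```python
def composite(samples, opacity_cutoff=0.999):
    """Accumulate (color, opacity) samples along a ray, nearest sample first."""
    color, opacity = 0.0, 0.0
    for sample_color, sample_opacity in samples:
        # Only the light not yet blocked by closer voxels can contribute.
        weight = (1.0 - opacity) * sample_opacity
        color += weight * sample_color
        opacity += weight
        if opacity >= opacity_cutoff:
            break  # everything further back is hidden
    return color, opacity

# A half-transparent bright voxel in front of an opaque dark one.
pixel = composite([(1.0, 0.5), (0.0, 1.0)])
# An opaque voxel at the front hides whatever lies behind it.
blocked = composite([(0.2, 1.0), (0.9, 1.0)])
```

Stopping early once the accumulated opacity is close to one is a common optimization, since voxels behind that point cannot change the pixel any more.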

One big problem of Volume Rendering is aliasing, because the ray usually does not pass directly through the middle of the voxels. This causes jagged silhouette edges, which can be reduced by supersampling. This means that the image is rendered at a higher resolution, so more rays are processed than the final image has pixels. At the end of the process, the image is resampled to the desired resolution. This does not always solve the problem, because the effect is just shifted up the frequency spectrum, but it is the easiest way to reduce aliasing artefacts.
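The final resampling step could look like the following sketch, which averages square blocks of the high-resolution image. A simple box filter is assumed here; the name `downsample` and the gray-value image format are chosen for the example.

```python
def downsample(image, factor):
    """Average factor x factor blocks of a supersampled gray-value image."""
    height = len(image) // factor
    width = len(image[0]) // factor
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            block = [image[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result

# A 4x2 image rendered at twice the target resolution becomes 2x1.
supersampled = [[0.0, 1.0, 1.0, 1.0],
                [1.0, 0.0, 1.0, 1.0]]
final = downsample(supersampled, 2)
```

Pixels on a jagged edge end up with intermediate values, so the edge looks smoother at the cost of tracing four times as many rays in this example.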

Ray Tracing is an elegant technique, but it is based only on specular reflection. Tracing diffuse interaction would need a very large number of rays at each point of a surface and is thus very difficult to process and render. One solution to this is called Radiosity. The environment is divided into large elements over which the illumination is constant. Of course, this is only useful in scenes with mostly non-specular objects. Since the Radiosity method takes the interaction of diffuse light between elements in a scene into account, it is excellent for generating pictures of interior environments. In addition, the calculation of the resulting illumination does not depend on the viewer position. The form factors, which characterize the effect of the geometry of two surfaces on the radiative exchange between them, have to be computed only once. Thus, different views are obtained easily from the general solution. A combination of Radiosity and Ray Tracing is often used to include the modeling of specular phenomena. Although Radiosity is a method for obtaining very realistic images, it should be mentioned that the calculation of the form factors takes a long time, and the processor demands are therefore at least as heavy as for Ray Tracing.
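Once the form factors are known, the illumination of each element can be found by repeatedly exchanging light between the elements. The sketch below assumes the classic formulation B_i = E_i + rho_i * sum_j F_ij * B_j and solves it by simple iteration; the form factors in the example are made-up values, not computed from real geometry.

```python
def solve_radiosity(emission, reflectivity, form_factors, iterations=50):
    """Iteratively solve B_i = E_i + rho_i * sum_j F_ij * B_j."""
    count = len(emission)
    radiosity = list(emission)
    for _ in range(iterations):
        radiosity = [
            emission[i] + reflectivity[i] * sum(
                form_factors[i][j] * radiosity[j] for j in range(count))
            for i in range(count)
        ]
    return radiosity

# Two facing patches: the first one emits light, the second only reflects it.
emission = [1.0, 0.0]
reflectivity = [0.5, 0.5]
form_factors = [[0.0, 0.5],   # hypothetical values for illustration
                [0.5, 0.0]]
result = solve_radiosity(emission, reflectivity, form_factors)
```

The second patch ends up lit although it emits nothing itself, and the solution contains no reference to a viewer, so it can be reused for any camera position.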

Texture Mapping is used to add realism to computer graphics images, because it makes objects seem more complex than they really are in their data representation. When an image is mapped onto an object, the color of the object at each pixel is modified by a corresponding color from the image. This removes the plastic look of flat surfaces that results from the simple color variation produced by the reflection models. Since the surface of an object is usually not a flat rectangle like the image, the image must be warped to match the surface. By using transparency values in the texture image as well as color values, transparent or semi-transparent images can be laid over objects. This technique is useful for simulating clouds, through which the background shows. Texture Mapping can even be used to control shading across a surface (bump mapping).
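A texture lookup with transparency might be sketched like this. It assumes texture coordinates (u, v) in the range 0 to 1, nearest-neighbor sampling, and texels stored as (red, green, blue, alpha) tuples; the names `sample` and `blend` are illustrative.

```python
def sample(texture, u, v):
    """Return the texel nearest to texture coordinates (u, v) in [0, 1]."""
    height = len(texture)
    width = len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]

def blend(texel, background):
    """Lay a (r, g, b, alpha) texel over an (r, g, b) background color."""
    red, green, blue, alpha = texel
    return tuple(alpha * t + (1.0 - alpha) * b
                 for t, b in zip((red, green, blue), background))

# A 2x2 texture: opaque red, half-transparent white, and two clear texels.
texture = [[(1.0, 0.0, 0.0, 1.0), (1.0, 1.0, 1.0, 0.5)],
           [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]]
cloud = blend(sample(texture, 0.75, 0.25), (0.0, 0.0, 1.0))
```

With an alpha of 0.5, the white texel and the blue background each contribute half of the final color, so the background shows through.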

Another useful form of Texture Mapping is called environment mapping. It refers to the process of reflecting the surrounding environment in a shiny object and can also be considered a simple form of Ray Tracing. In practice, four rays through a pixel point on the surface define a reflection cone, which gives an additional shading attribute for the pixel. This way, a texture that approximately represents the environment is mapped onto the object. A very interesting use for environment mapping is the faking of Phong shading: you simply create a map with a blob on it that looks like a Phong highlight.

Many graphics systems nowadays provide hardware that supports texture mapping, even environment mapping. As a result, generating a texture mapped scene does not need to take longer than generating a scene without texture mapping.

Last revised on March 27, 2007