3D Computer Graphics
use as his telephone. That wish has come true, since I no longer
know how to use my telephone.
- Bjarne Stroustrup -
- Images as information carrier
- Three-dimensional geometry
- The creation of perspective
- Removal of hidden surfaces
- Reflection and illumination
- Ray Tracing
- Volume Rendering
- Radiosity
- Textures
The use of images as carriers of information has many advantages over text-based
information. First, anyone who can see can interpret
a picture, whereas text is tied to a particular language. A great deal of
information can be represented in one picture, and complex relations can
be shown. The human brain has several ways to store information
(associative, spatial, episodic, declarative or procedural), and
memorizing context through images (spatial) is one of the most effective ways to store
information.
Of course, there is a trade-off in using images, because
this form of representation also has disadvantages. Images are not
sortable, so it is difficult to find specific information in a huge
collection of them. The visualization of information is not standardized, so the
same picture can be interpreted differently. Finally, a picture can contain a
lot of redundant information, and the relevant part is not automatically
conveyed to the viewer. Still, images are a great supplement to text or
audio information.
For many applications, images have to contain 3D
information. Computer-generated special effects are used in movies and
advertisements, scientific visualization has to represent high-dimensional
information, and then there are computer games and applications based on Virtual
Reality. The main task is to reduce 3D information (or even a higher
dimension!) to 2D, as if you were viewing a 3D world through a 2D window. There
are many forms of depth cueing that let the viewer recognize the
"lost" dimension: size and brightness decrease with distance, and linear
perspective, shadows and distant objects being blocked from view by closer ones
round out the impression of volume. Many of the methods used in
three-dimensional graphics are less than fifteen years old. In the following
chapters, I will describe the basics of these techniques without the depth
needed by someone who wants to program rendering software. It is just an
introduction to the most frequently used terms.
Three-dimensional space has length, width and depth. These can be described with a
coordinate system containing (x, y, z) values corresponding to each dimension.
The most important tool for the generation of a 3D scene is the
transformation, which means translation, rotation and scaling of an
object. Rotation and scaling can each be represented by a 3x3 matrix, and a sequence of
such transformations (without translation) can be combined into a single matrix by
matrix multiplication.
A translation, however, is not a linear transformation in three dimensions,
because it moves the origin, so it cannot be written as a 3x3 matrix. By increasing
the dimensionality of the space, it is
possible to make the translation a linear transformation. That can be compared
with the introduction of imaginary numbers to the number system, which makes
a lot of difficult operations easier. These coordinates are called
homogeneous. A transformation matrix now contains 4x4 entries, and the same
matrix form can also express the perspective projection (see below). A
vertex (corner point in space on an object's surface) at (x, y, z) is represented as
(wx, wy, wz, w) for any scale factor w not equal to 0.
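The idea of combining transformations in homogeneous coordinates can be sketched in a few lines of Python. This is only an illustration under simple assumptions: matrices are nested lists, points are column vectors, and the helper names (`mat_mul`, `translation`, `rotation_z`, `transform`) are hypothetical and not taken from any library.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    """4x4 homogeneous matrix that moves a point by (tx, ty, tz)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def rotation_z(angle):
    """4x4 homogeneous matrix that rotates about the z axis (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0, 0],
            [s,  c, 0, 0],
            [0,  0, 1, 0],
            [0,  0, 0, 1]]

def transform(m, point):
    """Apply a 4x4 matrix to a 3D point via homogeneous coordinates."""
    h = [point[0], point[1], point[2], 1.0]
    out = [sum(m[i][k] * h[k] for k in range(4)) for i in range(4)]
    w = out[3]
    # Divide by w to get back to ordinary 3D coordinates.
    return (out[0] / w, out[1] / w, out[2] / w)

# Rotate 90 degrees about z first, then translate by (5, 0, 0).
# With column vectors, the matrix applied first stands rightmost.
combined = mat_mul(translation(5, 0, 0), rotation_z(math.pi / 2))
moved = transform(combined, (1.0, 0.0, 0.0))

# Swapping the order gives a different result.
swapped = mat_mul(rotation_z(math.pi / 2), translation(5, 0, 0))
other = transform(swapped, (1.0, 0.0, 0.0))
```

Here `moved` ends up near (5, 1, 0), while `other` ends up near (0, 6, 0), which shows that the order of the matrices in the product matters.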
When our goal is to
approximate a single rotation by incremental rotations, we get into trouble.
Rotations do not commute: rotating about X and then Y doesn't give the
same result as rotating about Y and then X, unless the angles are infinitesimally small.
An alternative is to use quaternions. Like homogeneous coordinates, they
use four coordinates rather than three, which avoids singularities.
Quaternions are more useful than just a compact description of a rotation or an
orientation. In their raw form they can be quickly multiplied together and
converted to a rotation matrix in one step: important qualities when creating a
real-time application. With unit quaternions, all rotations can be described on a
four-dimensional sphere: each position on the surface of the sphere
represents a rotation or orientation. By following the shortest path between two
points on the surface, a rotation can be approximated through
increments.
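Following the shortest path on the sphere is commonly called spherical linear interpolation (slerp). The sketch below assumes unit quaternions stored as (w, x, y, z) tuples; the function names are hypothetical, and the threshold 0.9995 for nearly identical rotations is an arbitrary choice for this illustration.

```python
import math

def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` about `axis`."""
    ax, ay, az = axis
    length = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2) / length
    return (math.cos(angle / 2), ax * s, ay * s, az * s)

def quat_mul(a, b):
    """Hamilton product of two quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def slerp(q0, q1, t):
    """Interpolate along the shortest arc between two unit quaternions."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q describe the same rotation; pick the shorter arc.
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical: fall back to normalized linear interpolation.
        q = tuple(a + t * (b - a) for a, b in zip(q0, q1))
        norm = math.sqrt(sum(c * c for c in q))
        return tuple(c / norm for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))

identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn = quat_from_axis_angle((0, 0, 1), math.pi / 2)
halfway = slerp(identity, quarter_turn, 0.5)  # an eighth of a turn about z

# Composing two quarter turns in the two possible orders gives
# different quaternions, because rotations do not commute.
qx = quat_from_axis_angle((1, 0, 0), math.pi / 2)
qy = quat_from_axis_angle((0, 1, 0), math.pi / 2)
xy = quat_mul(qx, qy)
yx = quat_mul(qy, qx)
```

Calling `slerp` with t = 0.1, 0.2, and so on yields the incremental orientations between the two end points.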
Another important entity in 3D graphics is the vector. It
is described by three scalar components and can be thought of as an arrow running from
the origin of the coordinate system (0, 0, 0) to the specified coordinate.
Processing is often carried out using normal vectors. These are vectors
that are perpendicular to the plane of a polygon (a polygon is a
two-dimensional, flat shape bounded by straight lines going from one vertex to
the next). The normal vector can be found by taking the cross product of two
edge vectors of the polygon.
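As a small illustration, the normal of a polygon can be computed from three of its vertices. The sketch assumes the vertices are given in counter-clockwise order as seen from the side the normal should point to; the function names are hypothetical.

```python
import math

def subtract(a, b):
    """Component-wise difference a - b of two 3D vectors."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / length, v[1] / length, v[2] / length)

def polygon_normal(v0, v1, v2):
    """Unit normal from two edge vectors that share the vertex v0."""
    return normalize(cross(subtract(v1, v0), subtract(v2, v0)))

# A triangle lying in the xy plane has a normal along the z axis.
normal = polygon_normal((0, 0, 0), (1, 0, 0), (0, 1, 0))
```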
The viewing surface in computer graphics is flat, so we need to
create the illusion of a third dimension. Different methods have been used, but
the most realistic one requires only a simple division. It is called
perspective projection and is characterized by a center of projection,
which lies on the line through the center of the screen perpendicular to it.
The computation goes like this:
new_vertex_on_screen[X] = screen_center[X] + focal_distance *
    (old_vertex_in_space[X] / old_vertex_in_space[Z])
new_vertex_on_screen[Y] = screen_center[Y] - focal_distance *
    (old_vertex_in_space[Y] / old_vertex_in_space[Z])
In this formula, 'focal_distance' is a constant that
specifies how far the imaginary viewpoint lies from the screen.
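The projection formula can be written as a small Python function. This is a sketch that assumes the viewer looks along the positive Z axis and that screen Y coordinates grow downward (which would explain the minus sign in the Y term); the function name `project` and the example numbers are made up for illustration.

```python
def project(vertex, screen_center, focal_distance):
    """Perspective-project a 3D vertex (x, y, z) onto 2D screen coordinates."""
    x, y, z = vertex
    if z <= 0:
        # Points at or behind the viewer cannot be projected this way;
        # they have to be clipped away beforehand.
        raise ValueError("vertex must lie in front of the viewer (z > 0)")
    center_x, center_y = screen_center
    screen_x = center_x + focal_distance * (x / z)
    screen_y = center_y - focal_distance * (y / z)
    return (screen_x, screen_y)

# The same (x, y) offset appears closer to the screen center
# when the vertex is twice as far away.
near = project((1.0, 1.0, 2.0), (160, 100), 256)
far = project((1.0, 1.0, 4.0), (160, 100), 256)
```

Doubling the distance halves the offset from the screen center, which is exactly the size reduction that creates the impression of depth.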
Another projection, which is a lot easier to perform, is the orthogonal
projection. We simply scale two of the axes to a suitable size for the
screen and discard the remaining axis. Orthogonal projection is quite commonly
used in design applications, where it's useful for the designer not to be
confused by changing scale. But depth cannot be
conveyed this way, so perspective projection is most commonly used, since it gives a more
realistic impression of the third dimension.
An obvious criterion any
object has to meet is to be in the field of view of the viewer (the viewing
pyramid). This field of view is represented by the view volume in 3D space. It
is defined by the planes projected from the screen's sides. Each polygon in the
scene has to be tested for being outside the viewer's field of view, including
behind the viewer. A good optimization is the use of bounding volumes. These are
simple shapes (boxes, spheres) that completely enclose all of the object's
vertices. If the bounding volume lies entirely outside the viewing frustum, the
object can be skipped; if it lies entirely inside, the object can be drawn
without further tests. Only if the bounding volume is partly outside does
the object itself have to be tested. The parts outside the view have to be cut
off from the polygon. This operation is called clipping. Polygon clipping
is performed before the objects are projected to the screen.
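A bounding-sphere test against the planes of the view volume might look like the sketch below. It assumes each plane is given as a unit normal pointing into the view volume plus an offset d, so that a point p is inside when normal . p + d >= 0. The function name and the two-plane example volume are hypothetical; a real frustum would have six planes.

```python
def classify_sphere(center, radius, planes):
    """Classify a bounding sphere as 'inside', 'outside' or 'intersecting'.

    planes: list of ((a, b, c), d) with unit normals pointing inward.
    """
    result = "inside"
    for (a, b, c), d in planes:
        distance = a * center[0] + b * center[1] + c * center[2] + d
        if distance < -radius:
            # Completely on the outer side of one plane: reject the object.
            return "outside"
        if distance < radius:
            # The sphere straddles this plane: the polygons need testing.
            result = "intersecting"
    return result

# Simplified view volume with only a near plane (z >= 1)
# and a far plane (z <= 100).
planes = [((0.0, 0.0, 1.0), -1.0),
          ((0.0, 0.0, -1.0), 100.0)]

fully_visible = classify_sphere((0.0, 0.0, 50.0), 1.0, planes)
needs_clipping = classify_sphere((0.0, 0.0, 1.0), 1.0, planes)
behind_viewer = classify_sphere((0.0, 0.0, -5.0), 1.0, planes)
```

The test is conservative: a sphere near a corner of the volume may be reported as intersecting although it is actually outside, which costs some extra work but never hides a visible object.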
If polygons are drawn in arbitrary order, closer objects may be overwritten by more distant ones,
so hidden surfaces have to be removed. Polygons that cannot be seen at all need not be processed,
which reduces the number of polygons in a scene. The simplest method is called
backface culling or backface removal. Polygons that are facing away from the
viewer are not drawn. This reduces the number of faces to be processed by
roughly half and saves a lot of computing time.
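Backface culling reduces to the sign of a dot product. The following sketch assumes that polygon normals point outward from the object and that the viewer sits at the origin unless stated otherwise; the function name is hypothetical.

```python
def dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def is_backface(normal, vertex, eye=(0.0, 0.0, 0.0)):
    """Return True if the polygon faces away from the eye.

    normal: outward-pointing normal of the polygon
    vertex: any vertex of the polygon
    """
    # Vector from the eye to the polygon.
    view = (vertex[0] - eye[0], vertex[1] - eye[1], vertex[2] - eye[2])
    # If the normal points in the same general direction as the view
    # vector, the polygon shows its back to the viewer.
    return dot(normal, view) >= 0.0

front = is_backface((0.0, 0.0, -1.0), (0.0, 0.0, 5.0))  # faces the viewer
back = is_backface((0.0, 0.0, 1.0), (0.0, 0.0, 5.0))    # faces away
```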
Another method is
Z-buffering, which comes down to a point-by-point depth comparison. It
consists of keeping the z coordinate of every point we put on the screen in a
large array. Points that have to be drawn at the same (x, y) position are compared,
and only the one with the minimum z value is kept. The required computing time increases linearly
with the number of polygons. In software rendering, z-buffer techniques are slow
and use a large amount of memory. But if it is supported by the graphics
hardware, z-buffering is superior to other techniques because the cost in time can be
almost zero. The only penalty is that it requires additional memory, either
on-board hardware memory or system memory.
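The depth comparison itself is short. The sketch below assumes that smaller z means closer to the viewer and that colors are arbitrary values; the class name `ZBuffer` and its method are hypothetical.

```python
class ZBuffer:
    """Frame buffer with a per-pixel depth value for hidden surface removal."""

    def __init__(self, width, height, background=None):
        # Start with every pixel infinitely far away.
        self.depth = [[float("inf")] * width for _ in range(height)]
        self.color = [[background] * width for _ in range(height)]

    def plot(self, x, y, z, color):
        """Draw a point only if it is closer than what is already stored."""
        if z < self.depth[y][x]:
            self.depth[y][x] = z
            self.color[y][x] = color
            return True
        return False

# Points may arrive in any order; the closest one wins.
buffer = ZBuffer(4, 4)
buffer.plot(1, 1, 10.0, "far")
buffer.plot(1, 1, 5.0, "near")
buffer.plot(1, 1, 8.0, "middle")
visible = buffer.color[1][1]
```

Because every point is compared independently, the polygons do not have to be sorted, at the price of one stored depth value per pixel.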
An elegant and efficient
alternative is to sort the polygons by arranging them in a Binary Space
Partition tree (BSP tree). Each node in the tree represents a division of
space. These partitions are recursively subdivided until each node consists
of only a single polygon. Most frequently, however, Z-buffering is used for the removal
of hidden surfaces, because it can be implemented very simply.
A reflection model describes the interaction of light with a surface. It is
important because we can only see objects through these reflections.
Calculating the light is a complex process that varies with the type of light
source. Directional light is assumed to be
parallel, originating from an infinitely distant source. Rays from positional
sources diverge from a certain position. Several quantities are needed for the
light calculation: the polygon normal, the vertex position, and the positions and
directions of the light source and the viewer. Local lighting models
calculate only single reflections, while global models also take multiple
reflections into account.
The reflected light is calculated as several
components, which are linearly combined at the end of the process. Ambient
lighting is the simplest to deal with. The ambient term is essentially
global light that illuminates completely evenly, regardless of viewing position or
direction. In a local reflection model, the ambient component is modeled as a
constant.
When light hits a surface, some of the light is scattered in all
directions, and not all of it reaches the eye. This is called diffuse
reflection, and all objects behave this way. The smaller the angle between the light
vector and the normal vector, the more light is reflected and the brighter the surface
appears. The viewing direction plays no part in determining the light level.
Objects tend to appear flat when this method is used alone.
But smoother surfaces also have another property: they reflect light mainly in the mirror
direction, just the way a perfect mirror works. The amount of light which
reaches the eye falls off rapidly as the angle between the reflection vector and the eye
vector increases. This is seen as a sharp highlight, which characterizes
specular reflection; a common way to model it is the Phong reflection model. Only in an ideal
specular reflection is the total intensity of the incoming
light reflected in a single direction, since the light rays don't scatter.
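The three components (ambient, diffuse and specular) can be combined as in the sketch below, which uses a Phong-style specular term. The coefficient values and the shininess exponent are arbitrary assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(v):
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)

def light_intensity(normal, to_light, to_eye,
                    ambient=0.1, k_diffuse=0.6, k_specular=0.3, shininess=20):
    """Sum of ambient, diffuse and specular terms at a surface point.

    to_light and to_eye point away from the surface.
    """
    n = normalize(normal)
    l = normalize(to_light)
    v = normalize(to_eye)
    n_dot_l = dot(n, l)
    if n_dot_l <= 0.0:
        # The light is behind the surface: only ambient light remains.
        return ambient
    # Diffuse term: depends only on the angle between light and normal.
    diffuse = k_diffuse * n_dot_l
    # Mirror direction of the incoming light: R = 2 (N . L) N - L
    r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))
    # Specular term: falls off rapidly as the eye leaves the mirror direction.
    specular = k_specular * max(dot(r, v), 0.0) ** shininess
    return ambient + diffuse + specular

head_on = light_intensity((0, 0, 1), (0, 0, 1), (0, 0, 1))
off_axis = light_intensity((0, 0, 1), (0, 0, 1), (1, 0, 1))
shadowed = light_intensity((0, 0, 1), (0, 0, -1), (0, 0, 1))
```

With the eye in the mirror direction (`head_on`) the highlight is at full strength; moving the eye 45 degrees to the side (`off_axis`) leaves almost only the ambient and diffuse parts.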
Incremental shading techniques apply simple reflection models to polygons. Light intensities are
calculated at the vertices and then interpolated for interior points. A
well-known standard is Gouraud shading, which is simple and economical. It
is normally restricted to the diffuse component, but the polygon no longer appears
to be flat. Specular reflection can also be incorporated in the scheme, but a
specular highlight that falls in the middle of a polygon is lost, because intensities are
only calculated at the vertices.
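The interpolation step of Gouraud shading can be illustrated with barycentric weights on a 2D screen triangle. This is a sketch of the principle only (a real renderer would interpolate incrementally along scan lines), and the function names are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle (a, b, c)."""
    area = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    weight_a = ((b[0] - p[0]) * (c[1] - p[1])
                - (c[0] - p[0]) * (b[1] - p[1])) / area
    weight_b = ((c[0] - p[0]) * (a[1] - p[1])
                - (a[0] - p[0]) * (c[1] - p[1])) / area
    return weight_a, weight_b, 1.0 - weight_a - weight_b

def gouraud(p, vertices, intensities):
    """Interpolate vertex intensities for an interior point p."""
    weights = barycentric(p, *vertices)
    return sum(w * i for w, i in zip(weights, intensities))

triangle = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
vertex_light = (0.0, 1.0, 0.5)  # intensities computed at the three vertices

inside = gouraud((1.0, 1.0), triangle, vertex_light)
at_vertex = gouraud((4.0, 0.0), triangle, vertex_light)
```

The interpolated value can never exceed the largest vertex intensity, which is why a highlight that lies strictly inside a polygon cannot be reproduced this way.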
Ray Tracing is a superior technique for dealing with illumination and reflection. Its main
difference from the simple local reflection models is the depth to which the
light interaction is examined. Very realistic images can be produced, but it
also takes a lot of computing time to follow the complete path of
the light rays through the scene. The principle of Ray Tracing is based on the fact
that an observer sees a point on a surface as a result of a light ray
which has traveled from a source to the eye. For each pixel of the image, a ray is traced
backwards from the observer into the scene to determine which object is
visible at this pixel and how that object is lit or shadowed. The color of this
pixel is the result of an analysis of the interaction at the first object
encountered during the backwards trace of the ray. Thus, hidden surface removal
and shading are combined in this model. Spheres are used frequently in Ray
Tracing scenes, because their intersections with a ray are easily calculated.
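Why spheres are convenient becomes clear from the intersection test, which is just a quadratic equation. The sketch below assumes that the ray direction has unit length; the function name is hypothetical.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def ray_sphere_intersection(origin, direction, center, radius):
    """Distance along the ray to the nearest hit, or None if the ray misses.

    direction must be a unit vector.
    """
    # Vector from the sphere center to the ray origin.
    oc = (origin[0] - center[0], origin[1] - center[1], origin[2] - center[2])
    # Solve t^2 + b t + c = 0 for points origin + t * direction on the sphere.
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    discriminant = b * b - 4.0 * c
    if discriminant < 0.0:
        return None  # the ray passes beside the sphere
    root = math.sqrt(discriminant)
    for t in ((-b - root) / 2.0, (-b + root) / 2.0):
        if t > 1e-9:
            return t  # first intersection in front of the ray origin
    return None  # the sphere lies entirely behind the ray origin

hit = ray_sphere_intersection((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0)
miss = ray_sphere_intersection((0, 0, 0), (0, 0, 1), (0, 3, 5), 1.0)
behind = ray_sphere_intersection((0, 0, 0), (0, 0, 1), (0, 0, -5), 1.0)
```

In a ray tracer this test would be run against every object for every ray, and the smallest positive distance determines the first object encountered.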
The result of this
rendering technique often looks too perfect (super-real): edges and shadows
seem to be too sharp. This can be solved by increasing the number of
rays computed at these intersections. This refinement is called Distributed Ray
Tracing. The rays also produce blurred reflections by following additional
paths besides the exactly predicted one. Depth of field and penumbrae can be modeled,
but the overhead often becomes impractical. By decreasing the depth to
which rays are traced, computing time can be cut down, but the
possible decrease depends on the scene. Highly reflective surfaces and
transparent objects require a high depth. Ray-traced images can take days
to compute, so this method is excluded from the domain of real-time animation
for the near future. But it may be possible that some day computer systems
will be able to do simple ray tracing in real time.
In Volume Rendering, the data representation is based on voxels. A voxel is the 3D
(spatial) equivalent of the pixel. Numerous small cubes serve as basic elements that form
the represented objects. Medical imaging is one of the most popular
applications: image planes of x-ray data are reconstructed into a 3D model of the
scanned object. This method is called computed tomography (CT). In
contrast to the rendering of surfaces, we want to view the volume of the
body, so we are dealing with many nested surfaces, which have to be rendered
with different levels of opacity. Thus, each voxel has to contain the extra
information of color and opacity.
Basically, these values are
accumulated along the viewing direction, which means that for each displayed pixel a
ray is 'fired' and all voxels it passes through are processed.
Multiple reflection is not taken into account. Color is used to indicate
different objects or tissue types as well as the shape of an object. There are
two different methods for producing the final image. In the first, each voxel is
considered to absorb and emit light. Images of this type appear
nebulous. In the second, light is considered to be transmitted through volume
elements and reflected by surface-like planes. This method creates images
which show surfaces with even small details. Both methods can be combined to get
the desired result.
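The accumulation along a ray can be sketched as front-to-back compositing. The example assumes gray-scale colors, opacities between 0 and 1, and samples that are already sorted from the viewer into the volume; the function name and the cutoff value are hypothetical choices.

```python
def composite_ray(samples, opacity_cutoff=0.99):
    """Accumulate (color, opacity) samples along one ray, front to back."""
    color = 0.0
    alpha = 0.0
    for voxel_color, voxel_opacity in samples:
        # Only the light not yet blocked by earlier voxels gets through.
        weight = (1.0 - alpha) * voxel_opacity
        color += weight * voxel_color
        alpha += weight
        if alpha >= opacity_cutoff:
            break  # everything behind is practically invisible
    return color, alpha

# A half-transparent bright voxel, a half-transparent dark voxel,
# and an opaque bright voxel behind them.
pixel = composite_ray([(1.0, 0.5), (0.0, 0.5), (1.0, 1.0)])

# An opaque voxel in front hides everything behind it.
blocked = composite_ray([(0.2, 1.0), (1.0, 1.0)])
```

Stopping the loop once the accumulated opacity is close to 1 is a common shortcut, since later voxels can no longer change the pixel noticeably.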
One big problem of Volume Rendering is
aliasing, because the ray usually does not pass directly through
the middle of the voxels. This causes jagged silhouette edges, which
can be reduced by supersampling. This means that the resolution of the
image is increased, so more rays are processed than the final image strictly needs. At the end of the
process, the image is resampled to the desired resolution. This does not always
solve the problem, because the effect is just shifted up the frequency spectrum,
but it is the easiest way to reduce aliasing artefacts.
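The final resampling step can be as simple as averaging blocks of pixels. The sketch assumes a gray-scale image stored as a list of rows whose size is a multiple of the supersampling factor; the function name is hypothetical, and a plain box filter is only one of several possible resampling filters.

```python
def downsample(image, factor):
    """Average factor x factor blocks of a supersampled gray-scale image."""
    height = len(image) // factor
    width = len(image[0]) // factor
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            total = sum(image[y * factor + dy][x * factor + dx]
                        for dy in range(factor)
                        for dx in range(factor))
            row.append(total / (factor * factor))
        result.append(row)
    return result

# A hard diagonal edge rendered at twice the target resolution.
supersampled = [[0, 0, 0, 1],
                [0, 0, 1, 1],
                [0, 1, 1, 1],
                [1, 1, 1, 1]]
smooth = downsample(supersampled, 2)
```

The pixels along the edge receive intermediate values instead of a hard step, which makes the silhouette look less jagged.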
Ray Tracing is an elegant technique, but it is based only on specular reflection. Tracing
diffuse interaction would need a very large number of rays at each point of a
surface, and is thus very expensive to process and render. One solution to
this is called Radiosity. The environment is divided into discrete elements
over which the illumination is assumed to be constant. Of course, this is only useful in
scenes with mostly non-specular objects. Since the Radiosity method takes the
interaction of diffuse light between elements in a scene into account, it is
excellent for generating pictures of interior environments. In addition, the
calculation of the resulting illumination does not depend on the viewer
position. The form factors, which characterize the effect of
the geometry of two surfaces on the radiative exchange between them, have to be
computed only once. Thus, different views are obtained easily from the general
solution. A combination of Radiosity and Ray Tracing is often used to include the modelling
of specular phenomena. Although Radiosity is a method for obtaining very realistic images, it should be
mentioned that the calculation of the form factors takes a long time, and the processor demands are
therefore at least as heavy as for Ray Tracing.
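Once the form factors are known, the illumination of each element follows from a system of linear equations: the radiosity of an element is its own emission plus the reflected share of what arrives from all other elements. The sketch below solves this system by simple repeated substitution. The two-patch scene, its form factors and the function name are invented for illustration; computing real form factors is the expensive part and is not shown.

```python
def solve_radiosity(emission, reflectance, form_factors, iterations=50):
    """Iteratively solve B[i] = E[i] + rho[i] * sum_j F[i][j] * B[j]."""
    count = len(emission)
    radiosity = list(emission)
    for _ in range(iterations):
        # Each pass adds one more bounce of diffuse light.
        radiosity = [
            emission[i] + reflectance[i] * sum(form_factors[i][j] * radiosity[j]
                                               for j in range(count))
            for i in range(count)
        ]
    return radiosity

# Two patches facing each other; only the first one emits light.
emission = [1.0, 0.0]
reflectance = [0.5, 0.5]
form_factors = [[0.0, 0.5],
                [0.5, 0.0]]
radiosity = solve_radiosity(emission, reflectance, form_factors)
```

The second patch emits nothing but still ends up with a radiosity of about 0.27, purely from light reflected between the two patches. The result does not depend on any viewer position.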
Texture Mapping is used to add realism to computer graphics images, because it makes objects
seem more complex than they really are in their data representation. When mapping an image
onto an object, the color of the object at each pixel is modified by a corresponding color from
the image. This removes the plastic look of the flat surface which results from the simple
use of colour in the reflection models. Since the surface of an object is usually curved
or seen at an angle, the image must be warped to match any distortion. By using transparency values in
the texture image as well as color values, transparent or semi-transparent images can be laid over
objects. This technique is useful for simulating clouds, through which the background shows.
Texture Mapping can even be used to control shading across a surface (bump mapping).
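The lookup of a texture color and the blending with transparency can be sketched as follows. The example assumes texture coordinates (u, v) in the range 0 to 1, nearest-neighbor sampling, and texels stored as (red, green, blue, opacity) tuples; all names and values are hypothetical.

```python
def sample_texture(texture, u, v):
    """Nearest-neighbor lookup of the texel at coordinates (u, v) in [0, 1]."""
    height = len(texture)
    width = len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]

def blend(texel, background):
    """Lay a semi-transparent texel (r, g, b, opacity) over a background color."""
    r, g, b, opacity = texel
    return tuple(opacity * t + (1.0 - opacity) * bg
                 for t, bg in zip((r, g, b), background))

# A 2x2 "cloud" texture: white with varying opacity.
cloud = [[(1.0, 1.0, 1.0, 0.0), (1.0, 1.0, 1.0, 0.5)],
         [(1.0, 1.0, 1.0, 0.5), (1.0, 1.0, 1.0, 1.0)]]
sky = (0.0, 0.0, 1.0)

texel = sample_texture(cloud, 0.75, 0.25)  # upper right texel
pixel = blend(texel, sky)                  # half cloud, half sky
```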
Another useful form of Texture Mapping is called environment mapping. It refers to
the process of reflecting the surrounding environment in a shiny object and can also be considered
a simple form of Ray Tracing. In practice, four rays through a pixel point on the surface define
a reflection cone which gives an additional shading attribute for the pixel. This way, a texture
which approximately represents the environment is mapped onto the object. A very interesting use
for environment mapping is the faking of Phong shading. You simply create a map with a blob on it
which looks like a Phong highlight.
Many graphics systems nowadays provide hardware that supports texture mapping, even
environment mapping. As a result, generating a texture mapped scene does not need to take longer
than generating a scene without texture mapping.