Computer Graphics

 
Points and vectors are represented using homogeneous coordinates in computer graphics.   
This allows affine transformations in 3D (e.g. rotations and translations) to be represented as a single matrix multiplication.   
While a rotation can be represented as a 3x3 matrix multiplication, a translation requires a [[wikipedia:Shear mapping | ''shear'']] in 4D.


Points are <math>(x,y,z,1)</math> and vectors are <math>(x,y,z,0)</math>.   
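As a small sketch (using NumPy and a column-vector convention, both assumptions not stated above), the same 4x4 translation matrix moves points (<math>w=1</math>) but leaves vectors (<math>w=0</math>) unchanged:

```python
import numpy as np

# Homogeneous translation by (5, 0, 0).
T = np.array([
    [1, 0, 0, 5],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
], dtype=float)

point = np.array([1.0, 2.0, 3.0, 1.0])   # w = 1: a position
vector = np.array([1.0, 2.0, 3.0, 0.0])  # w = 0: a direction

print(T @ point)   # translation applies: [6. 2. 3. 1.]
print(T @ vector)  # translation ignored: [1. 2. 3. 0.]
```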
\end{bmatrix}
</math>
===Transformation matrix===
<math>
L = T * R * S
</math>
Depending on implementation, it may be more memory-efficient or compute-efficient to represent affine transformations as their own class rather than 4x4 matrices. For example, a rotation can be represented with 3 floats in angle-axis or 4 floats in quaternion coordinates rather than a 3x3 rotation matrix.
For example, see
* [https://eigen.tuxfamily.org/dox/classEigen_1_1Transform.html Eigen::Transform]
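A minimal NumPy sketch of composing <math>L = T * R * S</math> (the helper names below are illustrative, not from any library): with column vectors, the scale is applied first, then the rotation, then the translation.

```python
import numpy as np

def translation(tx, ty, tz):
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]
    return M

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    M = np.eye(4)
    M[:2, :2] = [[c, -s], [s, c]]
    return M

def scale(sx, sy, sz):
    return np.diag([sx, sy, sz, 1.0])

L = translation(1, 0, 0) @ rotation_z(np.pi / 2) @ scale(2, 2, 2)
p = np.array([1.0, 0.0, 0.0, 1.0])
# Scale to (2,0,0), rotate to (0,2,0), then translate to (1,2,0).
print(L @ p)
```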
===Barycentric Coordinates===


==MVP Matrices==
To convert from model coordinates <math>v</math> to screen coordinates <math>w</math>, you multiply by the MVP matrices: <math>w=P*V*M*v</math>
* The model matrix <math>M</math> applies the transform of your object. This includes the position and rotation. <math>M*v</math> is in world coordinates.
* The view matrix <math>V</math> applies the inverse transform of your camera. <math>V*M*v</math> is in camera or view coordinates (i.e. coordinates relative to the camera).
* The projection matrix <math>P</math> applies the projection of your camera, typically an orthographic or a perspective camera. The perspective camera shrinks objects in the distance.
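The chain above can be sketched with translations only (NumPy, column vectors; the numbers are made up for illustration). Note how the view matrix is the ''inverse'' of the camera's world transform:

```python
import numpy as np

def translation(tx, ty, tz):
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]
    return M

model = translation(0, 0, -5)           # object placed 5 units ahead
camera_to_world = translation(0, 1, 0)  # camera raised 1 unit
view = np.linalg.inv(camera_to_world)   # world -> camera coordinates
proj = np.eye(4)                        # identity stand-in for P

v = np.array([0.0, 0.0, 0.0, 1.0])      # model-space origin
w = proj @ view @ model @ v
print(w)  # [0. -1. -5. 1.]: the object sits below and ahead of the camera
```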


[https://www.scratchapixel.com/lessons/3d-basic-rendering/perspective-and-orthographic-projection-matrix/building-basic-perspective-projection-matrix Scratchapixel: Building a Basic Perspective Projection Matrix]


https://www.songho.ca/opengl/gl_projectionmatrix.html
 
The projection matrix applies a perspective projection based on the field of view of the camera. This is done by dividing the x,y view coordinates by the z-coordinate so that farther objects appear closer to the center. Note that the output is typically in normalized device coordinates (NDC) <math>[-1, 1]\times[-1, 1]</math> rather than image coordinates <math>\{0, \ldots, W-1\} \times \{0, \ldots, H-1\}</math>. Additionally, in NDC, the y-coordinate typically points upwards, unlike image coordinates.
 
The z-coordinate output by the projection matrix is a remapped version of the z-depth, i.e. the depth along the camera's forward axis. In OpenGL, this maps z=-n to -1 and z=-f to 1, where -z is forward.
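This depth remapping can be checked numerically. Below is a sketch of a gluPerspective-style matrix (right-handed, camera looking down -z; the `perspective` and `project` helpers are illustrative names): after the perspective divide, the near plane lands at NDC depth -1 and the far plane at +1.

```python
import numpy as np

def perspective(fov_y, aspect, n, f):
    t = 1.0 / np.tan(fov_y / 2.0)
    return np.array([
        [t / aspect, 0, 0,                 0],
        [0,          t, 0,                 0],
        [0,          0, (n + f) / (n - f), 2 * f * n / (n - f)],
        [0,          0, -1,                0],
    ])

def project(P, p):
    clip = P @ p
    return clip[:3] / clip[3]  # perspective divide -> NDC

P = perspective(np.pi / 2, 1.0, n=0.1, f=100.0)
near_pt = np.array([0.0, 0.0, -0.1, 1.0])   # on the near plane
far_pt = np.array([0.0, 0.0, -100.0, 1.0])  # on the far plane
print(project(P, near_pt)[2])  # -1.0
print(project(P, far_pt)[2])   #  1.0
```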
 
Notes: In computer vision, this is analogous to the calibration matrix <math>K</math>.  
It contains the intrinsic parameters of your pinhole camera such as field of view and focal length. The focal length determines the resolution of your output.


===Inverting the projection===
==Shading==
{{main | Wikipedia:Shading}}
===Interpolation===
* Flat shading - color is computed for each face/triangle.
* Gouraud shading - color is computed for each vertex and interpolated.
* Phong shading - color is computed for each pixel with the normal vector interpolated from each vertex.
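The interpolation in Gouraud and Phong shading is typically done with barycentric weights across the triangle. A minimal sketch (the `interpolate` helper is a made-up name):

```python
import numpy as np

# Blend three per-vertex attributes with barycentric weights (w sums to 1).
def interpolate(c0, c1, c2, w):
    return w[0] * np.asarray(c0) + w[1] * np.asarray(c1) + w[2] * np.asarray(c2)

red, green, blue = (1, 0, 0), (0, 1, 0), (0, 0, 1)
center = np.array([1/3, 1/3, 1/3])  # barycenter of the triangle
print(interpolate(red, green, blue, center))  # equal mix of the three colors
```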
 
===Lambert reflectance===
{{main | Wikipedia: Lambertian reflectance}}
This is a way to model diffuse (matte) materials.
 
<math>I_D = (\mathbf{L} \cdot \mathbf{N}) * C * I_{L}</math>
* <math>\mathbf{N}</math> is the normal vector.
* <math>\mathbf{L}</math> is the vector to the light.
* <math>C</math> is the color.
* <math>I_{L}</math> is the intensity of light.
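The formula above can be sketched directly (NumPy; the clamp to zero for surfaces facing away from the light is a standard addition not written in the formula):

```python
import numpy as np

# I_D = max(L . N, 0) * C * I_L
def lambert(normal, to_light, color, light_intensity):
    n = np.asarray(normal, float)
    l = np.asarray(to_light, float)
    n, l = n / np.linalg.norm(n), l / np.linalg.norm(l)
    return max(n @ l, 0.0) * np.asarray(color) * light_intensity

color = np.array([1.0, 0.5, 0.0])
print(lambert([0, 0, 1], [0, 0, 1], color, 1.0))  # light overhead: fully lit
print(lambert([0, 0, 1], [1, 0, 0], color, 1.0))  # grazing light: black
```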
 
===Phong reflection model===
{{main | Wikipedia: Phong reflection model}}
See [https://www.scratchapixel.com/lessons/3d-basic-rendering/phong-shader-BRDF scratchapixel phong shader BRDF]. 
This is a way to model specular (shiny) materials.
 
Here, the image is a linear combination of ambient, diffuse, and specular colors. 
 
If <math>\mathbf{N}</math> is the normal vector, <math>\mathbf{V}</math> is a vector from the vertex to the viewer, <math>\mathbf{L}</math> a vector from the vertex to the light, and <math>\mathbf{R}</math> the reflection of <math>\mathbf{L}</math> about <math>\mathbf{N}</math> (i.e. the mirrored light direction, <math>\mathbf{R} = 2(\mathbf{N} \cdot \mathbf{L})\mathbf{N} - \mathbf{L}</math>) then
* Ambient is a constant color for every pixel.
* The diffuse coefficient is <math>\mathbf{N} \cdot \mathbf{L}</math>.
* The specular coefficient is <math>(\mathbf{R} \cdot \mathbf{V})^n</math> where <math>n</math> is the ''shininess''.
The final color is <math>k_{ambient} * ambientColor + k_{diffuse} * (\mathbf{N} \cdot \mathbf{L}) * diffuseColor + k_{specular} * (\mathbf{R} \cdot \mathbf{V})^n * specularColor</math>.
 
;Notes
* The diffuse and specular components need to be computed for every visible light source.
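The final-color formula above can be sketched for a single light (NumPy; unit-length inputs assumed, and the coefficient values below are arbitrary for illustration):

```python
import numpy as np

def reflect(l, n):
    return 2.0 * (n @ l) * n - l  # L mirrored about the normal

def phong(n, l, v, shininess, k_a, k_d, k_s, ambient, diffuse, specular):
    r = reflect(l, n)
    diff = max(n @ l, 0.0)                 # N . L
    spec = max(r @ v, 0.0) ** shininess    # (R . V)^n
    return (k_a * np.asarray(ambient)
            + k_d * diff * np.asarray(diffuse)
            + k_s * spec * np.asarray(specular))

n = np.array([0.0, 0.0, 1.0])
l = np.array([0.0, 0.0, 1.0])  # light straight above the surface
v = np.array([0.0, 0.0, 1.0])  # viewer straight above as well
c = phong(n, l, v, shininess=32, k_a=0.1, k_d=0.7, k_s=0.2,
          ambient=[1, 1, 1], diffuse=[1, 0, 0], specular=[1, 1, 1])
print(c)  # ambient + full diffuse + full specular highlight
```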
 
===Physically Based===
See the [https://static1.squarespace.com/static/58586fa5ebbd1a60e7d76d3e/t/593a3afa46c3c4a376d779f6/1496988449807/s2012_pbs_disney_brdf_notes_v2.pdf Disney BRDF notes] and the [http://www.pbr-book.org/ pbr-book]. 
In frameworks and libraries, these are often referred to as ''standard materials'' or, in Blender, the ''Principled BSDF''.
 
==Blending and Pixel Formats==
 
===Pixel Formats===
 
===Blending===
 
To output transparent images, i.e. images with alpha, you'll generally want to blend using [[Premultiplied Alpha]]. Rendering in premultiplied alpha prevents your RGB color values from getting mixed with the background color of empty pixels.
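A sketch of the standard "over" compositing operator with premultiplied alpha (the `over` name is illustrative): since RGB is already scaled by alpha, the blend is simply <math>out = src + (1 - src_\alpha) * dst</math>.

```python
import numpy as np

def over(src, dst):
    # src and dst are premultiplied RGBA.
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    return src + (1.0 - src[3]) * dst

# 50% opaque red (RGB pre-scaled by alpha) over opaque green.
src = [0.5, 0.0, 0.0, 0.5]
dst = [0.0, 1.0, 0.0, 1.0]
print(over(src, dst))  # [0.5, 0.5, 0.0, 1.0]
```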
 
===Rendering===
 
For rasterization, the render loop typically consists of:
# Render the shadow map.
# Render all opaque objects front-to-back.
## Opaque objects write to the depth buffer.
# Render all transparent objects back-to-front.
## Transparent objects do not write to the depth buffer.
 
Rendering opaque objects front to back minimizes overdraw, where a pixel gets drawn to multiple times in a single frame.
Rendering transparent objects back to front is needed for proper blending of transparent materials.
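The sorting in the loop above can be sketched on the CPU side (plain Python; the object records and depth values are made up for illustration):

```python
# Opaque objects sorted front-to-back (nearest first) to minimize overdraw;
# transparent objects sorted back-to-front for correct blending.
objects = [
    {"name": "wall", "depth": 10.0, "transparent": False},
    {"name": "crate", "depth": 2.0, "transparent": False},
    {"name": "glass", "depth": 5.0, "transparent": True},
    {"name": "smoke", "depth": 8.0, "transparent": True},
]

opaque = sorted((o for o in objects if not o["transparent"]),
                key=lambda o: o["depth"])        # front-to-back
transparent = sorted((o for o in objects if o["transparent"]),
                     key=lambda o: -o["depth"])  # back-to-front
order = [o["name"] for o in opaque + transparent]
print(order)  # ['crate', 'wall', 'smoke', 'glass']
```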
 
==Anti-aliasing==
For high-quality anti-aliasing, you'll generally want to use multi-sample anti-aliasing (MSAA).
This causes the GPU to store depth and coverage at a higher subsample resolution, which determines how much your fragment shader's color contributes to each final pixel.
 
See https://learnopengl.com/Advanced-OpenGL/Anti-Aliasing for more details.


==More Terms==