Trying out Voxel Cone Tracing...

It has been a long time since my last post, so this is just a quick update on what I have been working on over the past two months: implementing sparse voxel octree global illumination using cone tracing. Below are some screenshots showing my current progress:

Scene with direct lighting only
Scene with indirect lighting

Voxelized scene
Building the sparse octree

Injecting direct lighting into the voxelized scene
Another view of the voxelized scene

Creating mip-maps of the direct lighting
mip-map resolution 256x256x256
mip-map resolution 128x128x128
mip-map resolution 64x64x64

AO computed from the voxelized scene
Another angle to view the scene

Scene lit by both direct and indirect light; notice
the shadowed floor reflects the red and blue curtains
The back of the knight is illuminated by
 indirect lighting only
Another camera angle
Light buffer showing only indirect lighting
Light buffer with both direct and indirect lighting

The above screenshots were captured at a 512x512x512 voxel resolution, illuminated by one directional light. The indirect illumination shows some flickering when the camera moves if the shadow map resolution is not high. Also, performance is not good on my GTX 460; performing the anisotropic mip-mapping takes up most of the time... I will write more posts on it after tidying everything up; hopefully it can all be finished by the end of this year...

Angle based SSAO

SSAO (screen space ambient occlusion) is a common post-processing effect that approximates how much light reaching a given surface is occluded by the surrounding objects. At this year's SIGGRAPH, a few slides in "The Technology behind the Unreal Engine 4 Elemental Demo" described how they implement SSAO. Their technique can use either the depth buffer alone or the depth buffer plus per-pixel normals. I tried to implement both versions with a slight modification:

Using only the depth buffer
The definition of ambient occlusion is the visibility integral over the hemisphere of a given surface point:

To approximate this in screen space, we design our sampling pattern as paired samples:
paired sample pattern
So for each pair of samples, we can approximate how much the shading point is occluded in 2D instead of integrating over the hemisphere:

The AO term for each pair of samples is min( (θleft + θright)/π, 1). Then, by averaging the AO terms of all the sample pairs (in my case, 6 pairs), we achieve the following result:

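The per-pair averaging above can be sketched in Python. This is a minimal sketch: computing the horizon angles from the depth buffer is omitted, and the pair list is assumed to come from the sampling pattern described above.

```python
import math

def ao_from_pairs(pairs):
    """Average the per-pair occlusion terms min((theta_l + theta_r) / pi, 1).

    `pairs` holds (theta_left, theta_right) tuples in radians: the angles
    subtended by each sample of a pair above the shading point. A fully
    open pair contributes 0 occlusion, a fully occluded pair contributes 1.
    """
    total = 0.0
    for theta_l, theta_r in pairs:
        total += min((theta_l + theta_r) / math.pi, 1.0)
    return total / len(pairs)
```

With 6 pairs as in the post, this is a sum of 6 clamped angle terms divided by 6.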
Dealing with large depth differences
As seen in the above screenshot, there are dark halos around the knight. But the knight should not contribute AO to the castle, as he is too far away. So, to deal with large depth differences, I adopted the approach used in Toy Story 3: if one of the paired samples is too far away from the shading point, say the red point in the following figure, it is replaced by the pink point, which lies on the same plane as the other valid paired sample:

So we can interpolate between the red point and the pink point to deal with the large depth difference. Now the dark halo is gone:

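One way to implement the substitution can be sketched as follows. This is my own sketch, not the exact Toy Story 3 code: since the paired samples sit at symmetric screen offsets around the shading point, a point lying on the same plane as the valid sample is its point-reflection through the shading point, and the blend factor on depth difference is an assumed linear falloff.

```python
def resolve_pair_sample(p, valid_s, invalid_s, depth_limit):
    """Replace a too-deep sample by the point-reflection of the valid
    sample through the shading point `p`, blending by how far past the
    depth limit the real sample is. Positions are (x, y, z) view-space
    tuples with z as depth."""
    def lerp(a, b, t):
        return tuple(a[i] + (b[i] - a[i]) * t for i in range(3))

    # Point-reflection of the valid sample through p: same plane,
    # symmetric screen offset (the "pink point").
    substitute = tuple(2.0 * p[i] - valid_s[i] for i in range(3))
    depth_diff = abs(invalid_s[2] - p[2])
    t = min(max((depth_diff - depth_limit) / depth_limit, 0.0), 1.0)
    return lerp(invalid_s, substitute, t)
```

A sample within the depth limit is returned unchanged; one far beyond it is fully replaced by the substitute point.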
The above treatment only handles the case where one of the paired samples is far from the shading point. What if both samples have large depth differences?

dark halo artifact shown around the sword
the AO strength of this picture is increased to highlight the artifact

In this case, it results in the dark halo around the sword in the above screenshot. Remember that we are averaging all the paired samples to compute the final AO value. So, to deal with this artifact, we just assign a weight to each sample pair and then re-normalize the final result: if both samples of a pair are within a small depth difference, that pair gets a weight of 1; if only 1 sample is far away, the pair gets a weight of 0.5; and if both samples are far away, the weight is 0. This eliminates most (but not all) of the artifacts:

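The weighting and re-normalization step can be sketched as follows (a minimal sketch; the depth test that classifies each sample as valid or invalid is omitted):

```python
def weighted_ao(samples):
    """`samples` is a list of (ao_term, n_valid) tuples, where n_valid is
    how many of the two paired samples passed the depth test (0, 1 or 2).
    Weights follow the scheme in the post: 2 valid -> 1.0, 1 valid -> 0.5,
    0 valid -> 0.0, then the weighted sum is re-normalized."""
    weight_for = {2: 1.0, 1: 0.5, 0: 0.0}
    total, total_w = 0.0, 0.0
    for ao, n_valid in samples:
        w = weight_for[n_valid]
        total += w * ao
        total_w += w
    return total / total_w if total_w > 0.0 else 0.0
```

If every pair is rejected the function falls back to 0 (no occlusion), which is one possible choice for that edge case.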
Approximating arc-cos function
In this approach, the AO is calculated from the angle between the paired samples, which requires evaluating the arc-cos function, which is a bit expensive. We can approximate acos(x) with a linear function: π(1-x)/2.

And the resulting AO looks much darker with this approximation:

computed with the arc-cos function
computed with the linear approximation

Note that the maximum error between the two functions is around 18.946 degrees.

This may affect the AO in areas of a curved surface with low tessellation. You may either need to increase the bias angle threshold or switch to a more accurate function. So my second attempt is to approximate it with a quadratic function: π(1 - sign(x) * x * x)/2.

And this approximation gives a result much more similar to the one using the arc-cos function.
computed with the arc-cos function
computed with the quadratic approximation

And the maximum error of this function is around 9.473 degrees.
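Both approximations, and their maximum errors, can be checked numerically with a small Python sketch (a shader would of course use the closed-form expressions directly):

```python
import math

def acos_linear(x):
    # linear approximation: pi * (1 - x) / 2
    return math.pi * (1.0 - x) / 2.0

def acos_quadratic(x):
    # quadratic approximation: pi * (1 - sign(x) * x * x) / 2
    return math.pi * (1.0 - math.copysign(x * x, x)) / 2.0

def max_error_deg(approx, steps=200000):
    # brute-force scan of |acos(x) - approx(x)| over [-1, 1]
    worst = 0.0
    for i in range(steps + 1):
        x = -1.0 + 2.0 * i / steps
        worst = max(worst, abs(math.acos(x) - approx(x)))
    return math.degrees(worst)
```

Scanning [-1, 1] reproduces the errors quoted above: about 18.95 degrees for the linear fit and about 9.47 degrees for the quadratic one.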

Using per-pixel normal
We can enhance the detail of the AO by making use of the per-pixel normal. The per-pixel normal further restricts the angles used to compute the AO: the angles θleft and θright are clamped to the tangent plane:

And here is the final result:

The result of this AO is pleasant, taking a total of 12 samples per pixel with 16 rotations in a 4x4 pixel block at half resolution. I did not apply a bilateral blur to the AO result, but doing so may give a softer AO look. Also, although approximating the arc-cos function with a linear function is not accurate, it gives a good enough result for me. Finally, more time needs to be spent on generating the sampling pattern in the future, as the pattern I currently use is nearly uniformly distributed (with some jittering).

[1] The Technology behind the Unreal Engine 4 Elemental Demo
[2] Rendering techniques in Toy Story 3 (SIGGRAPH 2010 Advanced Real-Time Rendering Course)
[3] Image-Space Horizon-Based Ambient Occlusion
[5] The models are exported from UDK and extracted from Infinity Blade using umodel.exe

Implementing Client/Server Tools Architecture

In the past few weeks, I have been rewriting my toy engine. During the rewrite of the tools, I wanted to try out the client/server tools architecture used at Insomniac Games, because it is crash-proof and gives "free" undo/redo. My tools are built using C# instead of as web apps like Insomniac Games' because, as they said, there are not many resources about that approach on the internet. I don't want to spend lots of time solving tricky problems in my hobby project; I would rather focus on graphics programming after the tools are done. Currently only a material editor, a model editor and an asset browser are implemented.

Screen shot showing all the editors

In the client/server architecture for tools, all the editors are clients and need to connect to the server to perform editing. All the editing state about a file is maintained in the server. The client and server communicate by sending/receiving JSON messages (which are easy and fast to parse) such as:

                 {
                     "message type" : "load file",
                     "file path" : "package/default.texture"
                 }

The client sends requests to the server, such as loading a file (all assets are in JSON format, such as meshes exported from Maya), and the server replies with the appropriate response, such as the loaded file or whether the requested operation was successful. If there is no client request, the server does nothing; the clients keep polling for changes to the editing state. My tools are designed so that both the client and server run on the same machine and can run without an internet connection.

The Server
The server is responsible for handling a set of client requests such as creating a new JSON file, loading a file, saving a file, changing a file... The server maintains the opened JSON files that were loaded/created by the clients. An undo/redo queue is associated with each opened JSON file on the server. When a client sends a request to modify a JSON file, the inverse change operation is computed and stored in the undo/redo queue. So if the client program crashes, none of the editing state, including the undo/redo queue, is lost, as it is all stored in the server.

The editor launcher which runs
the server in the background
The above picture shows my server program, a simple C# application. The server is presented as an editor launcher: the user can open the other tools (e.g. the asset browser or material editor) through it. The launcher only has a few functions, as shown in the UI, which fire up the .exe of the appropriate editor, while in the background it runs the JSON file server listening for client requests.

The Clients
All the editors are implemented as clients which need to connect to the server to perform editing. Each editor is a small application that performs only a specific task, and each keeps polling changes from the server. Below is the asset browser of the editor, which is also a client program. The asset browser is used to import assets such as meshes, textures and surface shaders. It can also change the loaded asset package directory and the current working directory by modifying the corresponding JSON file on the server, so that the other editors know about the change. Besides, assets can be dragged and dropped onto other editors; the drop specifies the relative path of the asset inside a package, and the other editors can then get the asset from the server.

The asset browser, importing a texture
For tools that have a 3D viewport, such as the material editor, the viewport is implemented as an unmanaged C++ .dll (which uses the same code as the engine) that can be called by the C# program. During initialization of the tool, the C# program passes the window handle of a Control (e.g. a picture box) to the C++ .dll to create the D3D swap chain. The C++ .dll also provides several hook-up functions to the C# program, such as mouse up/down, viewport resize and timer update callbacks. After start-up, the C++ .dll polls changes from the server inside the update function and logically acts like a separate program, independent of the C# program.

The 3D viewport is logically separate from the user interface
As mentioned before, the client/server architecture is crash-proof: if the client program crashes, no data is lost because all the file changes are maintained in the server. But during the development of the tools, I encountered situations where a crashed client program could not restart. This was because the server was still maintaining the file that caused the crash, so every time the client program restarted, it crashed again... To avoid this problem, I added an application exception handler in the C# form so that every time it crashes, a dialog box appears asking whether to undo the last operation before the crash. This feature also helps debugging a bit, because every time the client crashes I can undo the changes and then reproduce the same crash easily by repeating the last editing step.

User is offered a last chance to undo the operation before the crash
However, not all crashes caused in the unmanaged C++ .dll can be caught by the application exception handler. Only some of them can be handled in C# by using [HandleProcessCorruptedStateExceptions].

Delta JSON
For the client to communicate with the server about file changes (polling changes / making changes), delta JSON is used. The delta JSON I use is similar to the one described in "A Client/Server Tools Architecture". For example, say we have a material file "package/default.material" like:
     {
         "diffuse color" : "red",
         "specular color" : "white",
         "glossiness" : 0.5
     }
To change the diffuse color to blue in the above file, a delta JSON message is sent to the server:

     {
         "message type" : "delta changes",
         "file" : "package/default.material",
         "delta changes" :
         [
             {
                 "key" : "diffuse color",
                 "value" : "blue"
             }
         ]
     }
When the server receives the above message, it changes the "package/default.material" file, and at the same time it computes the inverse of the delta change, i.e.

     {
         "key" : "diffuse color",
         "value" : "red"
     }
so that undo can be performed. Note that the delta changes are contained in an array in the JSON message, since more than one change operation can be performed, and all the changes in a single message are treated as one atomic operation. The server rolls back the previous changes if the current change is invalid (such as changing a value of an array at an index greater than the length of the array).
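The apply-with-inverse step on the server can be sketched in Python. This is a simplified sketch for flat key/value documents only; the real messages also carry array operations, and the names here are hypothetical:

```python
def apply_delta(doc, changes):
    """Apply a list of {"key", "value"} changes to `doc` as one atomic
    operation. Returns the inverse change list (the old values, in reverse
    order) for the undo queue; rolls back and raises if a change is invalid."""
    inverse = []
    for change in changes:
        key = change["key"]
        if key not in doc:
            # Invalid change: roll back everything applied so far, then fail.
            for undo in reversed(inverse):
                doc[undo["key"]] = undo["value"]
            raise KeyError(key)
        inverse.append({"key": key, "value": doc[key]})
        doc[key] = change["value"]
    inverse.reverse()  # undo must replay the old values in reverse order
    return inverse
```

Applying the returned inverse list with the same function performs the undo, which is where the "free" undo/redo comes from.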

After such an update, other clients may poll for the file change, and they need to tell the server which version of the file they hold so that the server can send them the correct delta changes. But instead of sending the whole JSON file to the server to compute the delta changes, an 'editing step' counter is stored in the server for each opened file. This counter is increased for each delta change (and decreased after an undo). So a client only needs to send its 'editing step' counter to the server, and the server knows how many delta changes to send back. This counter can also be used to ensure that if 2 clients make delta change requests to the server at the same time, only 1 client succeeds while the other fails. However, there is one pitfall: the same 'editing step' can correspond to different file states if the file has had an undo performed and the document is changed afterwards. So, to uniquely identify the 'editing step', a GUID is generated with each editing step. If the server finds that the GUID does not match, it cannot compute the delta changes for the client, and the client may simply need to reload the whole document.
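The counter-plus-GUID synchronization can be sketched as follows (hypothetical names; the real server also stores the undo/redo queue, which is omitted here):

```python
import uuid

class OpenedFile:
    """Sketch of the per-file sync state kept on the server."""

    def __init__(self, doc):
        self.doc = doc
        self.deltas = []                  # deltas[i] moves step i -> i + 1
        self.guids = [uuid.uuid4().hex]   # guids[i] identifies the state at step i

    @property
    def step(self):
        return len(self.deltas)          # the 'editing step' counter

    def apply(self, client_step, delta):
        # Only the client holding the latest step wins; a concurrent
        # client with a stale counter gets a failure and must re-poll.
        if client_step != self.step:
            return False
        self.doc.update(delta)
        self.deltas.append(delta)
        self.guids.append(uuid.uuid4().hex)
        return True

    def changes_since(self, client_step, client_guid):
        # A GUID mismatch means same counter but different state (e.g.
        # after an undo followed by new edits): the client must reload.
        if client_guid != self.guids[client_step]:
            return None
        return self.deltas[client_step:]
```

An undo would truncate `deltas` and mint a fresh GUID for the restored step, which is exactly what makes the GUID check necessary.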

The client/server tools architecture is really a smart idea that provides a crash-proof editor by taking advantage of the fact that the server software matures much earlier than the client editors. I really like this approach and am glad I gave it a try. It gives my tools a 'free' undo/redo function, multiple viewports and easier debugging, and splitting the editors into smaller client applications makes the code easier to maintain. I guess this approach could also enable something interesting such as doing the rigs/animation in Maya, live-synced to the editors just like this CryEngine trailer.

[1] A Client/Server Tools Architecture
[2] Developing Imperfect Software: How to prepare for development failure
[3] Developing Imperfect Software: The Movie
[4] New generation of @insomniacgames tools as webapp
[5] bitsquid: Our Tool Architecture
[6] Assets are exported from UDK

Shader Generator

In the last few weeks, I have been busy rewriting my iPhone engine so that it can also run on the Windows platform (so that I can use Visual Studio instead of Xcode~) and, most importantly, so that I can play around with D3D11. During the rewrite, I wanted to improve the process of writing shaders so that I don't need to write similar shaders multiple times for each shader permutation (say, for each surface, a shader for static meshes, skinned meshes, instanced static meshes... multiplied by the number of render passes), and can instead focus on coding how the surface looks. So I decided to write a shader generator that generates those shaders, similar to the surface shaders in Unity. I chose the surface shader approach instead of a graph-based approach like Unreal Engine's because, being a programmer, I feel more comfortable (and faster) writing code than dragging tree nodes in a GUI. In its current implementation, the shader generator can only generate vertex and pixel shaders for the light pre-pass renderer, which is the lighting model I used before.

Defining the surface
To generate the target vertex and pixel shaders with the shader generator, we need to define how the surface looks by writing a surface shader. In my version of surface shaders, I need to define 3 functions: a vertex function, a surface function and a lighting function. The vertex function defines the vertex properties, such as position and texture coordinates:
VTX_FUNC_OUTPUT vtxFunc(VTX_FUNC_INPUT input)
{
    VTX_FUNC_OUTPUT output;
    output.position = mul( float4(input.position, 1), worldViewProj  );
    output.normal = mul( worldInv, float4(input.normal, 0) ).xyz;
    output.uv0 = input.uv0;
    return output;
}
The surface function describes how the surface looks by defining the diffuse color, glossiness and normal of the surface:
SUF_FUNC_OUTPUT sufFunc(SUF_FUNC_INPUT input)
{
    SUF_FUNC_OUTPUT output;
    output.normal = input.normal;
    output.diffuse = diffuseTex.Sample( samplerLinear, input.uv0 ).rgb;
    output.glossiness = glossiness;
    return output;
}
Finally, the lighting function decides which lighting model is used to calculate the reflected color of the surface:
LIGHT_FUNC_OUTPUT lightFunc(LIGHT_FUNC_INPUT input)
{
    LIGHT_FUNC_OUTPUT output;
    float4 lightColor = lightBuffer.Sample(samplerLinear, input.pxPos.xy * renderTargetSizeInv.xy );
    output.color = float4(input.diffuse * lightColor.rgb, 1);
    return output;
}
By defining the above functions, the writer of the surface shader only needs to fill in the output structure of each function, using the input structure plus some auxiliary functions and shader constants provided by the engine.

Generating the shaders
As you can see in the above code snippets, my surface shader just defines normal HLSL functions with fixed input and output structures. So, to generate the vertex and pixel shaders, we just need to copy these functions into the target shader code, which invokes the functions defined in the surface shader. Taking the above vertex function as an example, the generated vertex shader looks like:
#include "include.h"
struct VS_INPUT
{
    float3 position : POSITION0;
    float3 normal : NORMAL0;
    float2 uv0 : UV0;
};
struct VS_OUTPUT
{
    float4 position : SV_POSITION0;
    float3 normal : NORMAL0;
    float2 uv0 : UV0;
};
typedef VS_INPUT VTX_FUNC_INPUT;
typedef VS_OUTPUT VTX_FUNC_OUTPUT;
/********************* User Defined Content ********************/
VTX_FUNC_OUTPUT vtxFunc(VTX_FUNC_INPUT input)
{
    VTX_FUNC_OUTPUT output;
    output.position = mul( float4(input.position, 1), worldViewProj  );
    output.normal = mul( worldInv, float4(input.normal, 0) ).xyz;
    output.uv0 = input.uv0;
    return output;
}
/******************** End User Defined Content *****************/
VS_OUTPUT main(VS_INPUT input)
{
    return vtxFunc(input);
}
During code generation, the shader generator needs to figure out which input and output structures to feed into the user defined functions. This task is simple and can be accomplished with some string functions.
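The struct-plus-wrapper generation can be sketched with plain string handling. This is a Python sketch with hypothetical helper names; in the real generator, the struct fields would come from analysing the surface shader rather than being passed in by hand:

```python
def hlsl_struct(name, fields):
    """Emit an HLSL struct from (type, name, semantic) tuples."""
    body = "\n".join("    %s %s : %s;" % f for f in fields)
    return "struct %s\n{\n%s\n};" % (name, body)

def generate_vertex_shader(user_vtx_func, vs_input, vs_output):
    """Wrap the user's vertex function with the generated structs,
    typedefs and entry point, mirroring the generated shader above."""
    return "\n".join([
        '#include "include.h"',
        hlsl_struct("VS_INPUT", vs_input),
        hlsl_struct("VS_OUTPUT", vs_output),
        "typedef VS_INPUT VTX_FUNC_INPUT;",
        "typedef VS_OUTPUT VTX_FUNC_OUTPUT;",
        "/********************* User Defined Content ********************/",
        user_vtx_func,
        "/******************** End User Defined Content *****************/",
        "VS_OUTPUT main(VS_INPUT input)",
        "{",
        "    return vtxFunc(input);",
        "}",
    ])
```

The user-defined function body is pasted between the marker comments, and the fixed entry point simply forwards to it.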

Simplifying the shader
As I mentioned before, my shader generator is used for generating shaders for the light pre-pass renderer. There are 2 passes in a light pre-pass renderer, which need different shader inputs and outputs. For example, the G-buffer pass is only interested in the surface normal data but not the diffuse color, while the data needed by the second geometry pass is the opposite. However, all the surface information (surface normal and diffuse color) is defined in the surface function inside the surface shader. If we simply generate shaders as in the last section, we will generate some redundant code that cannot be optimized away by the shader compiler. For example, the pixel shader in the G-buffer pass may sample the diffuse texture, which requires the texture coordinate input from the vertex shader; but since the diffuse color is not actually needed in this pass, the compiler may not be able to figure out that the texture coordinate output in the vertex shader is unnecessary. Of course, we could force the writer to add #if preprocessor blocks inside the surface function for each particular render pass to eliminate the useless outputs, but this would complicate surface shader authoring: writing a surface shader is about describing how the surface looks and, ideally, should not involve worrying about the outputs of a render pass.

So the problem is to figure out which output data is actually needed in a given pass and eliminate the outputs that are not. For example, suppose we are generating shaders for the G-buffer pass from this surface function:

SUF_FUNC_OUTPUT sufFunc(SUF_FUNC_INPUT input)
{
    SUF_FUNC_OUTPUT output;
    output.normal = input.normal;
    output.diffuse = diffuseTex.Sample( samplerLinear, input.uv0 ).rgb;
    output.glossiness = glossiness;
    return output;
}
We only want to keep the variables output.normal and output.glossiness. The variable output.diffuse, and the variables referenced only by output.diffuse (diffuseTex, samplerLinear, input.uv0), are to be eliminated. To find such variable dependencies, we need to teach the shader generator to understand HLSL grammar and find all the assignment statements and branching conditions from which to derive the variable dependencies.
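The dependency-driven elimination can be sketched abstractly: given each assignment's target and the variables it references (as recovered from the syntax tree), keep only the assignments that feed the outputs the pass needs. A minimal Python sketch, with the data shapes being my own simplification:

```python
def live_assignments(assignments, needed_outputs):
    """`assignments` is an ordered list of (target, referenced_vars) pairs,
    e.g. ("output.diffuse", {"diffuseTex", "samplerLinear", "input.uv0"}).
    Returns the targets to keep for this render pass; a stripped target
    takes its private dependencies (textures, vertex inputs) with it."""
    needed = set(needed_outputs)
    keep = []
    # Walk backwards so keeping an assignment also marks the variables it
    # reads as needed for any earlier assignments that produce them.
    for target, refs in reversed(assignments):
        if target in needed:
            keep.append(target)
            needed |= refs
    return list(reversed(keep))
```

For the G-buffer pass example, asking for output.normal and output.glossiness drops the output.diffuse assignment, and with it the only uses of diffuseTex, samplerLinear and input.uv0.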

To do this, we need to generate an abstract syntax tree from the shader source code. Of course we could write our own LALR parser for this, but I chose to use lex & yacc (or flex & bison) to generate the parse tree. Luckily, we are working with a subset of the HLSL syntax (we only need to define functions and don't use pointers), and HLSL syntax is similar to the C language, so modifying the ANSI C grammar rules for lex & yacc does the job. Here is my modified grammar rule used to generate the parse tree. By traversing the parse tree, the variable dependencies can be obtained; hence we know which variables to eliminate, and we eliminate them by taking out their assignment statements. The compiler then does the rest. Below is the simplified pixel shader generated for the previous example:
#include "include.h"
cbuffer _materialParam : register( MATERIAL_CONSTANT_BUFFER_SLOT_0 )
{
    float glossiness;
};
Texture2D diffuseTex : register( MATERIAL_SHADER_RESOURCE_SLOT_0 );
struct PS_INPUT
{
    float4 position : SV_POSITION0;
    float3 normal : NORMAL0;
};
struct PS_OUTPUT
{
    float4 gBuffer : SV_Target0;
};
struct SUF_FUNC_OUTPUT
{
    float3 normal;
    float glossiness;
};
typedef PS_INPUT SUF_FUNC_INPUT;
/********************* User Defined Content ********************/
SUF_FUNC_OUTPUT sufFunc(SUF_FUNC_INPUT input)
{
    SUF_FUNC_OUTPUT output;
    output.normal = input.normal;
                                                                 ;
    output.glossiness = glossiness;
    return output;
}
/******************** End User Defined Content *****************/
PS_OUTPUT main(PS_INPUT input)
{
    SUF_FUNC_OUTPUT sufOut= sufFunc(input);
    PS_OUTPUT output;
    output.gBuffer= normalToGBuffer(sufOut.normal, sufOut.glossiness);
    return output;
}
Extending the surface shader syntax
As I use lex & yacc to parse the surface shader, I can extend the surface shader syntax by adding more grammar rules, so that the writer of the surface shader can declare which shader constants and textures are needed by their surface function to generate the constant buffer and shader resources in the source code. My surface shader syntax also permits users to define their own structs and functions besides the 3 main functions (vertex, surface and lighting), and these are also copied into the generated source code. Here is a sample of how my surface shader looks:

RenderType{
    opaque;
};
ShaderConstant
{
    float glossiness : ui_slider_0_255_Glossiness;
};
TextureResource
{
    Texture2D diffuseTex;
};
VTX_FUNC_OUTPUT vtxFunc(VTX_FUNC_INPUT input)
{
    VTX_FUNC_OUTPUT output;
    output.position = mul( float4(input.position, 1), worldViewProj  );
    output.normal = mul( worldInv, float4(input.normal, 0) ).xyz;
    output.uv0 = input.uv0;
    return output;
}
SUF_FUNC_OUTPUT sufFunc(SUF_FUNC_INPUT input)
{
    SUF_FUNC_OUTPUT output;
    output.normal = input.normal;
    output.diffuse = diffuseTex.Sample( samplerLinear, input.uv0 ).rgb;
    output.glossiness = glossiness;
    return output;
}
LIGHT_FUNC_OUTPUT lightFunc(LIGHT_FUNC_INPUT input)
{
    LIGHT_FUNC_OUTPUT output;
    float4 lightColor = lightBuffer.Sample(samplerLinear, input.pxPos.xy * renderTargetSizeInv.xy );
    output.color = float4(input.diffuse * lightColor.rgb, 1);
    return output;
}
This post described how I generate vertex and pixel shader source code for different render passes from a surface shader definition, which saves me from writing similar shaders multiple times and from worrying about the particular shader inputs and outputs of each render pass. Currently, the shader generator can only generate HLSL vertex and pixel shaders for static meshes in the light pre-pass renderer. It is still a work in progress: generating shader source code for the forward pass is not done yet, and domain, hull and geometry shaders are not implemented. GLSL support is also missing, but it could (in theory...) be added by building a more sophisticated abstract syntax tree while parsing the surface shader, or by defining some new grammar rules in the surface shader (using lex & yacc) to make generating both HLSL and GLSL source code easier. But these are left for the future, as I still need to rewrite my engine and get it running again...

[1] Unity - Surface Shader Examples
[2] Lex & Yacc Tutorial
[3] ANSI C grammar, Lex specification
[4] ANSI C Yacc grammar

Photon Mapping Part 2

Continuing from the previous post, this post describes how the light map is calculated from the photon map. My light map stores the incoming radiance of indirect lighting on a surface, projected into the Spherical Harmonics (SH) basis. 4 SH coefficients are used for each color channel, so 3 textures are used for the RGB channels (12 coefficients in total).

Baking the light map
To bake the light map, the scene must have a set of unique, non-overlapping texture coordinates (UVs), where each UV corresponds to a unique world space position, so that the incoming radiance at a world position can be represented. This set of UVs can be generated inside a modeling package or using UVAtlas. In my simple case, the UVs are mapped manually.
To generate the light map, given a mesh with unique UVs and the light map resolution, we rasterize the mesh (using scan-line or half-space rasterization) into texture space, with the world space position interpolated across the triangles. This associates a world space position with each light map texel. Then, for each texel, we sample the photon map at the corresponding world space position by performing a final gather step, just like in the previous post for offline rendering. This gives the incoming radiance at that world space position, and hence at that texel of the light map. The data is then projected into SH coefficients and stored in 3 16-bit floating point textures. Below is a light map showing the dominant light color extracted from the SH coefficients:

The baked light map showing the dominant
light color from SH coefficients
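The projection into 4 SH coefficients per channel can be sketched as a Monte Carlo sum over the final gather directions. This is a Python sketch for one color channel, assuming uniform sphere sampling; the basis constants are the standard band-0 and band-1 SH values:

```python
import math

def project_to_sh4(samples):
    """`samples` is a list of (radiance, direction) pairs, with unit-length
    (x, y, z) directions from final gathering. Returns the 4 SH
    coefficients (band 0 plus the three band-1 terms) for one channel."""
    Y0 = 0.282095                       # band-0 basis constant
    Y1 = 0.488603                       # band-1 basis constant
    c = [0.0, 0.0, 0.0, 0.0]
    for radiance, (x, y, z) in samples:
        c[0] += radiance * Y0
        c[1] += radiance * Y1 * y
        c[2] += radiance * Y1 * z
        c[3] += radiance * Y1 * x
    w = 4.0 * math.pi / len(samples)    # Monte Carlo weight, uniform sphere
    return [ci * w for ci in c]
```

Running this once per color channel yields the 12 coefficients, which are then packed into the 3 floating point textures.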

Using the light map
After baking the light map, at run-time the direct lighting is rendered in the usual way. A point light is used to approximate the area light of the ray traced version, so the difference is most noticeable at the shadow edges.

direct lighting only, real time version
direct lighting only, ray traced version

Then we sample the SH coefficients from the light map to calculate the indirect lighting:
indirect lighting only, real time version
indirect lighting only, ray traced version

Combining the direct and indirect lighting, the final result becomes:
direct + indirect lighting, real time version
direct + indirect lighting, ray traced version

As we store the light map in SH, we can apply a normal map to the mesh to change the reflected radiance.
Rendered with normal map
Indirect lighting with normal map
We can also apply some tessellation and add some ambient occlusion (AO) to make the result more interesting:
Rendered with light map, normal map, tessellation and AO
Rendered with light map, normal map, tessellation and AO
This post gave an overview of how to bake a light map of indirect lighting data by sampling the photon map. I use SH to store the incoming radiance, but other data can be stored instead, such as the reflected diffuse radiance of the surface, which reduces texture storage and does not require a floating point texture. The SH coefficients can also be stored per vertex in the static mesh instead of in a light map. Lastly, by sampling the photon map with final gather rays, light probes for dynamic objects can be baked using similar methods.

[1] March of the Froblins
[2] Lighting and Material of HALO 3