Pablo Zurita’s Blog

October 4, 2006

Tools

Filed under: General — Pablo Zurita @ 2:48 am

With this generation of real-time engines it has become clear that we can no longer differentiate ourselves with raw graphics alone. The arrival of programmable GPUs puts us all in the same place. By writing shaders and manipulating their parameters we can achieve the same visuals as any game written by famous programmers such as John Carmack or Tim Sweeney. There is no longer any need to create complex data structures, complex hacks, and so on just to achieve a visual look. This tells us two things. First, it has never been so easy to work on computer graphics. Second, in order to differentiate our products we have to rely on our artists and, as a consequence, on the tools that we provide to them. That’s why it’s extremely important to provide flexible, easy-to-use tools, and to deliver them in a reasonable time.

In the past we were used to writing very limited tools because the resources available were limited. For example, in the Quake 2 era you had simple editors such as Qoole for geometry and Wally to create 256-color paletted textures, and the look of the game depended on simple diffuse textures with lightmaps generated by q2rad. Now, with programmable GPUs and their speed, there is no need to limit the artists. Slowly, with the progress of GPUs, we replaced Wally with Photoshop, the geometry editors became more complex, and the material and illumination systems became bigger and included more variables such as the specular factor. Now we can show tons of geometry and high-resolution textures (or even synthesize textures if we desire), and the lighting and material systems can be as complex as the shaders we can write.

That’s why we need to create tools that are flexible enough for the artists to use all the resources available. These tools need to be as flexible as the shaders we can execute and the textures and geometry we can render. The tools also need to be really well integrated with the engine so that the artists can see exactly what the end user will see. That rules out standalone shader editors such as FX Composer and RenderMonkey. As for materials, we can’t just give the artists a text file to edit shaders and material properties, since that takes time away from them. We need to create an environment where the artists can leverage the power of shaders through a visual tool that they can understand. A possible solution is what Epic Games did with their material editor. The materials and shaders in Unreal Engine 3 are built from modules, in a similar way to how Native Instruments’ Reaktor uses modules to create synthesizers. Each module is written by a programmer, and the artist creates materials by connecting those modules. This is really valuable because artists can do what they want, and the programmers have more time to work on other things.

Reaktor
Sound synthesis with modules in Reaktor.

Unreal material editor.
Material editor in Unreal Engine 3. Different modules are connected in order to obtain the final material.
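
To make the module idea more concrete, here is a minimal C++ sketch of how connected modules could be turned into shader code. It is only an illustration under my own assumptions: the type names (Module, TextureSample, Constant, Multiply) and the buildFragmentShader function are hypothetical and are not taken from Unreal Engine 3 or Reaktor.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// A material module: written once by a programmer, wired up by an artist.
// All names here are hypothetical; this is not Unreal Engine 3's API.
struct Module {
    virtual ~Module() = default;
    // Returns the shader expression this module contributes.
    virtual std::string emit() const = 0;
};

using ModulePtr = std::shared_ptr<const Module>;

// Leaf module: samples a named texture at the interpolated coordinates "uv".
struct TextureSample : Module {
    explicit TextureSample(std::string sampler) : sampler_(std::move(sampler)) {}
    std::string emit() const override {
        return "texture2D(" + sampler_ + ", uv)";
    }
    std::string sampler_;
};

// Leaf module: a constant value the artist can type in.
struct Constant : Module {
    explicit Constant(std::string value) : value_(std::move(value)) {}
    std::string emit() const override { return value_; }
    std::string value_;
};

// Combiner module: multiplies the outputs of two connected modules.
struct Multiply : Module {
    Multiply(ModulePtr a, ModulePtr b) : a_(std::move(a)), b_(std::move(b)) {
        assert(a_ && b_ && "both inputs must be connected");
    }
    std::string emit() const override {
        return "(" + a_->emit() + " * " + b_->emit() + ")";
    }
    ModulePtr a_, b_;
};

// Walks the graph from the output module and produces the shader statement
// that writes the final color.
std::string buildFragmentShader(const Module& output) {
    return "gl_FragColor = " + output.emit() + ";";
}
```

Connecting a "diffuseMap" texture module and a "lightMap" texture module to a Multiply module would produce the classic diffuse * lightmap material without the artist ever touching shader text.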

It is also extremely important to develop tools that are easy to use. Just like in any application, usability matters. If we create a complex tool (such as Bungie’s Sapien) we are going to lose productivity, frustrate the artists, and make it really complicated to bring new artists into our project. The reason is that we usually need to add artists in the middle of a project, and there is usually no time to teach them a complex tool. It is very important to maintain the quality of the usability throughout the whole project, because the chances of a great paradigm shift in content creation are fairly low for the near future. I can’t see any change myself, at least until we move from polygons to voxels.

Being aware of this exposes a general problem in most current engines: studios aren’t taking advantage of the applications that most artists already know. For example, instead of taking advantage of geometry editing applications such as 3ds Max or Maya, there are studios that still write editors such as D3Radiant or UnrealEd. A generation ago those editors were necessary because the engines and the technology they ran on were limited, but now the engines are far more flexible, and we end up duplicating functionality that is already available in tools such as 3ds Max, Softimage or Maya. This means not only that the programmers waste time duplicating that functionality, but also that the artists waste time getting used to these new tools. Instead we should create plug-ins for these standard tools to fit our needs. With respect to usability, I recommend the book “About Face 2.0: The Essentials of Interaction Design” by Alan Cooper and Robert Reimann (ISBN 0764526413).

Bungie's Sapien
Sapien by Bungie. A simple usability test would show that the user is flooded with information among other problems.

The need to deliver tools on time is another good reason to extend existing editors. Creating even the most basic features takes a fair amount of time, and more complex editing features take even longer. With a plug-in, all those features are already available, which means we only need to focus on the features that are unique to our technology. Some could say that this isn’t good because you don’t have full control, but you also have to realize what your priorities are. Another complaint is the quality of the SDKs of the different products (for example, the 3ds Max SDK is pretty bad), but you need to remember the benefits I mentioned. It will take us some time to adapt to an SDK and an environment we are not used to, but that cost will always be lower than creating a new tool and teaching the artists how to use it. The only real problem is the cost of those applications.

June 12, 2006

Tone mapping and blooms

Filed under: General — Pablo Zurita @ 11:56 pm

Two of the latest additions to real-time simulation are tone mapping and blooming of scenes. But for various reasons they have often been done incorrectly from a psychophysical point of view. The idea of this article is to see how these effects occur in real life and then how they can be simulated in real time.

The human eye has two kinds of photosensitive cells in the retina, the cones and the rods. Even though the two are similar in structure, they are fairly different and equally important for human vision. The rods are highly sensitive to light, which allows them to respond in low lighting situations. The cones need more light in order to generate a response, but they cover a higher range of lighting (among other things, they allow us to see colors). Their combined response allows us to see a high range of light intensities. This range can be divided into three regions: the photopic range, from 10^1 cd/m^2 to 10^8 cd/m^2, which stimulates the cones; the scotopic range, from 10^-6 cd/m^2 to 10^-1 cd/m^2, which stimulates the rods; and the mesopic range in between, which stimulates both the cones and the rods.

We also cannot forget about the pupil, which adjusts its size constantly in order to maintain a more or less constant flow of light into the eye: when the intensity of incoming light is too high the pupil shrinks, and in low lighting situations it widens. But sometimes, no matter what the pupil does, there is no way to generate the expected response: either the intensity is so low that there is no response, or the intensity is so high that the sensitivity decreases. This is known as response compression, and it is a vital part of designing an algorithm that does tone mapping correctly. There is also another factor: even though the sensitivity of the eye changes according to the lighting conditions, this process isn’t instantaneous. This mechanism is called adaptation. After a severe change of illumination, complete adaptation can take as long as 15 minutes for the cones and 40 minutes for the rods. Now that this is out of the way, we can start talking about the tone mapping itself.

The goal of a tone mapping operator is to convert a high dynamic range source into a low dynamic range output. These operators can be put in different categories depending on how they go from HDR to LDR. The operators that apply a single function over the whole image are known as global operators. They can provide good performance because of their simplicity, but they usually lose a lot of information in the process. A few examples are Ward’s operator, Schlick’s operator, Tumblin’s operator and Drago’s operator. The operators that depend on the neighboring pixels are known as local operators. These tend to give the best results, since they provide great HDR compression without losing much detail. Some examples of local operators are Pattanaik’s operator, Johnson’s operator, etc. The best solution should be an operator that deals with the psychophysical aspects of human vision, such as Pattanaik’s operator.
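
To show what "a single function over the whole image" looks like in practice, here is a small C++ sketch of a global operator. Note that the curve is Reinhard's global operator, which is not one of the operators listed above; I picked it only because it is short. The function names, the key parameter and the default delta are assumptions of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Log-average ("global") luminance of an image given as a list of luminances.
// delta avoids log(0) on black pixels.
double logAverageLuminance(const std::vector<double>& lum, double delta = 1e-4) {
    assert(!lum.empty());
    double sum = 0.0;
    for (double l : lum) sum += std::log(delta + l);
    return std::exp(sum / static_cast<double>(lum.size()));
}

// Global operator (Reinhard-style curve, an assumption of this sketch):
// every pixel goes through the same function, no neighbors involved.
// key        - target middle-grey value chosen by the user
// whitePoint - scaled luminance that should map to pure white (1.0)
double toneMapGlobal(double lum, double avgLum, double key, double whitePoint) {
    assert(avgLum > 0.0 && whitePoint > 0.0);
    const double scaled = key / avgLum * lum;
    return scaled * (1.0 + scaled / (whitePoint * whitePoint)) / (1.0 + scaled);
}
```

The two inputs that are shared by all pixels, the global luminance and the white point (often derived from the maximum luminance), are exactly the kind of information that has to be gathered before the operator can run.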

Whichever operator you decide to use, there is a certain amount of information that you have to gather, such as the global intensity. The problem is that most of the operators are meant to be run on a CPU, and when you have to run them on a GPU the whole thing is different because different restrictions apply. For example, to obtain the global intensity on the CPU it’s just a matter of looping through all the pixels, gathering the intensity and then averaging it with a constant. On the GPU the same can’t be done because you can’t access the information as easily. Due to these restrictions, to obtain the maximum luminance and the global luminance you have to use several framebuffer objects of decreasing resolution and execute a shader on each of them. You can start with a 512×512 FBO and then keep going down in resolution until you have the information. Here is a sample shader:

// Source framebuffer and its width in texels (a square FBO is assumed).
uniform sampler2D alb_screenfbo;
uniform float alb_screenfbo_width;
varying vec2 vTexCoord;

// Luminance of the texel at the given coordinates.
float luminance (vec2 uvCoords)
{
    return dot(texture2D(alb_screenfbo, uvCoords).xyz, vec3(0.30, 0.59, 0.11));
}

// This version fits the first pass, which reads scene colors. Later passes
// would combine the red and green channels written by the previous pass
// instead of recomputing the luminance from RGB.
void main (void)
{
    float halfTexel = 0.5 / alb_screenfbo_width;
    float currentLum;
    float avLum;
    float maxLum;

    // Accumulate the log-luminance of the four neighbors and track the maximum.
    currentLum = luminance(vTexCoord + vec2(-halfTexel, -halfTexel));
    avLum = log(currentLum + 1e-4);
    maxLum = currentLum;

    currentLum = luminance(vTexCoord + vec2(-halfTexel, halfTexel));
    avLum += log(currentLum + 1e-4);
    maxLum = max(maxLum, currentLum);

    currentLum = luminance(vTexCoord + vec2(halfTexel, -halfTexel));
    avLum += log(currentLum + 1e-4);
    maxLum = max(maxLum, currentLum);

    currentLum = luminance(vTexCoord + vec2(halfTexel, halfTexel));
    avLum += log(currentLum + 1e-4);
    maxLum = max(maxLum, currentLum);

    // Log-average of the four samples in red, maximum in green.
    avLum = exp(avLum / 4.0);
    gl_FragColor = vec4(avLum, maxLum, 0.0, 1.0);
}

As you can see in the shader, what we do is obtain the luminance from the four neighboring fragments around the sample. We do this several times until we get to a 1×1 framebuffer object, which contains the global luminance in the red channel and the maximum luminance in the green channel. You can see part of the process in the following diagram.

Tonemapping diagram
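
When debugging the GPU passes it helps to have a CPU reference that performs the same kind of 2×2 reduction. The following C++ sketch is one way to organize it, under my own assumptions: the image is square with a power-of-two side, the logarithm is kept between passes and the exponential is applied once at the end, and the names (LumStats, GlobalLum, reduceOnce, reduceToGlobal) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-texel data carried between passes.
struct LumStats {
    float logLum;  // mean of log(luminance + delta) over the covered area
    float maxLum;  // maximum luminance over the covered area
};

// Final result, equivalent to the contents of the 1x1 target.
struct GlobalLum {
    float average;  // log-average (global) luminance
    float maximum;  // maximum luminance
};

// One reduction pass: every output texel summarizes a 2x2 block of the input.
std::vector<LumStats> reduceOnce(const std::vector<LumStats>& src, std::size_t side) {
    assert(side >= 2 && side % 2 == 0 && src.size() == side * side);
    const std::size_t half = side / 2;
    std::vector<LumStats> dst(half * half);
    for (std::size_t y = 0; y < half; ++y) {
        for (std::size_t x = 0; x < half; ++x) {
            float logSum = 0.0f;
            float maxLum = 0.0f;
            for (std::size_t dy = 0; dy < 2; ++dy) {
                for (std::size_t dx = 0; dx < 2; ++dx) {
                    const LumStats& s = src[(2 * y + dy) * side + (2 * x + dx)];
                    logSum += s.logLum;
                    maxLum = std::max(maxLum, s.maxLum);
                }
            }
            dst[y * half + x] = {logSum * 0.25f, maxLum};
        }
    }
    return dst;
}

// Runs passes until a single texel is left, then converts the mean log
// back to a luminance. The side must be a power of two.
GlobalLum reduceToGlobal(const std::vector<float>& luminance, std::size_t side) {
    assert(side >= 1 && luminance.size() == side * side);
    std::vector<LumStats> level(luminance.size());
    for (std::size_t i = 0; i < luminance.size(); ++i) {
        level[i] = {std::log(luminance[i] + 1e-4f), luminance[i]};
    }
    while (side > 1) {
        level = reduceOnce(level, side);
        side /= 2;
    }
    return {std::exp(level[0].logLum), level[0].maxLum};
}
```

Comparing the output of reduceToGlobal against the values read back from the 1×1 framebuffer object is a quick way to catch mistakes in the texture coordinate offsets or in the accumulation.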

Now that we have our tone mapped scene we can move on to the blooming. Blooming is usually done incorrectly, not because of the blooming algorithms themselves but because people decide to bloom before doing the tone mapping. The right thing to do is to take our tone mapped scene and, if there are values that fall outside the LDR range, bloom those pixels. You should avoid the bright-pass that most applications do when they implement blooming, because that blooms everything and it’s not how the eye reacts in a real situation.
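
The ordering can be sketched on the CPU with a few lines of C++. This is only an illustration of the idea of selecting the bloom source after tone mapping; the BloomSplit type and the splitForBloom function are hypothetical names, and the blur that would follow is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct BloomSplit {
    std::vector<float> display;  // tone mapped values clamped to [0, 1]
    std::vector<float> excess;   // what did not fit in LDR; the bloom source
};

// Splits already tone mapped values into the displayable part and the
// overflow above 1.0. Only the overflow would be blurred and added back.
BloomSplit splitForBloom(const std::vector<float>& toneMapped) {
    BloomSplit out;
    out.display.reserve(toneMapped.size());
    out.excess.reserve(toneMapped.size());
    for (float v : toneMapped) {
        assert(v >= 0.0f && "tone mapped values are expected to be non-negative");
        out.display.push_back(std::min(v, 1.0f));
        out.excess.push_back(std::max(v - 1.0f, 0.0f));
    }
    return out;
}
```

Pixels that the tone mapping operator already brought inside the LDR range contribute nothing to the excess buffer, so they are never bloomed.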

If there are any questions please write a comment and I will answer.

April 8, 2006

Adding shaders to engine

Filed under: General — Pablo Zurita @ 10:35 pm

With the arrival of programmable GPUs, we got involved in a shaders and tools frenzy. Everybody downloads tools such as RenderMonkey, FX Composer, etc. to be able to create shaders, and some make small demos showing the latest features of their video cards. All that is fine, but what most programmers don’t realize is that one of the biggest problems that comes from the flexibility of shaders is not writing the shaders themselves, but designing and writing an engine that supports those shaders and their flexibility. So this is what I want to talk about in this article.

Introduction
After the lessons learned in the creation of the ^FishEngine (an engine visually similar to the Doom 3 engine), in late February 2005 I decided to write a new 3D engine. From the first day I knew that one of the main differences between the Albedo Engine and the old ^FishEngine would be the use of glSlang, which meant that I was going to leave behind extensions such as register combiners. This change comes directly from the need to provide more flexibility to programmers and artists. You just can’t keep working with proprietary extensions with limited features, and you can’t maintain hundreds of shaders with a different version for every extension. Now you just write the shaders in a high-level shading language.

Beginnings
Creating an engine isn’t a task that you complete in a week, and an engine isn’t something you replace every other week. Engines are complex systems that take a lot of time and effort to create. Because of this, the Albedo Engine has to be an engine that I can use in any 3D project, so flexibility and ease of use are extremely important. And seeing how graphics programming evolves, it is important to recognize that shaders will get more complex over time. After a few months of work, in the middle of April 2005, I started adding shader support. It was a complicated situation at the time because it was a completely different approach from the one in ^FishEngine (which supported glSlang, but with a single shader applied to all the geometry, in the same way that Doom 3 did it). But after much investigation, implementation time and testing I arrived at a fairly good solution.

Shaders in the Albedo Engine
A shader in the Albedo Engine isn’t just the glSlang shader; it refers to the glSlang shader plus everything necessary to render with it correctly. This involves keeping track of blending states, dealing with the parameters sent to the vertex and fragment shaders, dealing with the textures for the shaders, and more. For every shader it’s possible to specify two types of parameters. Local parameters don’t change per material but per shader, for example a constant. Global parameters can be changed with the material, for example the specular value of a surface. Either kind of parameter can refer to a value provided by the engine itself (such as the position of a light) or to a value provided by the user. In the engine there are never multiple instances of the same shader, even if the parameters are completely different: as long as the glSlang code is the same, there is a single shader object compiled and executed. There is also a shader manager that deals with basic tasks such as creating instances of the shaders, loading the text into the shader, dealing with the parameters, etc.
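
The sharing rule described above can be sketched in C++ as follows. This is not the Albedo Engine's actual code; the types (CompiledShader, ShaderParam, Shader, ShaderManager) and their members are hypothetical, and the compilation step is replaced by a counter so that the example stays self-contained.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Stand-in for a compiled glSlang program object (hypothetical type).
struct CompiledShader {
    std::string source;
};

// Where a parameter value comes from.
enum class ParamSource { Engine, User };

struct ShaderParam {
    ParamSource source;
    float value;  // unused for Engine parameters, which the engine fills in
};

// The engine-level "shader": shared compiled code plus its own parameters.
struct Shader {
    std::shared_ptr<const CompiledShader> program;
    std::map<std::string, ShaderParam> localParams;   // fixed per shader
    std::map<std::string, ShaderParam> globalParams;  // changeable per material
};

class ShaderManager {
public:
    // Returns a Shader whose program is shared by every user of the same source.
    Shader create(const std::string& source) {
        assert(!source.empty());
        std::shared_ptr<const CompiledShader>& slot = cache_[source];
        if (!slot) {
            // A real engine would compile and link here; this sketch only counts.
            slot = std::make_shared<CompiledShader>(CompiledShader{source});
            ++compileCount_;
        }
        Shader shader;
        shader.program = slot;
        return shader;
    }

    int compileCount() const { return compileCount_; }

private:
    std::map<std::string, std::shared_ptr<const CompiledShader>> cache_;
    int compileCount_ = 0;
};
```

Two materials that use the same glSlang text with different specular values end up with two Shader entries pointing at one compiled program, which is the behavior described above.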

Shaders and geometry
Something valuable to recognize is the relation between geometry and shaders. From the point of view of a shader, a vertex shader applied to geometry can cause side effects that we have to be aware of. For example, if you cull a sphere against the frustum but the vertex shader then doubles the size of the sphere, the sphere gets culled even though after the vertex shader it should be visible. That’s why it is important to tag some of the geometry so that it isn’t culled by frustum culling alone. From the performance point of view, it is important to sort the rendering of geometry by material in order to decrease the number of state changes per frame. In the Albedo Engine every node has its geometry organized per material to reduce the number of state changes.

One last important thing to recognize is that shaders need to be applied in different ways in order to get to the final frame. For example, if you want to do tone mapping in a shader then the entire frame needs to be in a floating-point framebuffer object. There are global shaders that apply to all the geometry, for example a diffuse * lightmap shader. But there are also shaders that need to be applied based on a volume: if a given volume intersects some geometry, then that geometry needs to have the shader applied. Any kind of light with a falloff is a good example. So an engine should support all these ways of applying shaders in order to provide good flexibility and performance.
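
The culling tag and the sort by material can be illustrated with a short C++ sketch. The Drawable structure and the three functions are hypothetical; in particular, the frustum test is reduced to a precomputed flag so that the example stays small.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Drawable {
    int materialId;
    bool insideFrustum;       // result of the usual bounding-volume test
    bool skipFrustumCulling;  // tag: the vertex shader may move or grow the mesh
};

// Keeps everything that passed the frustum test, plus the tagged geometry.
std::vector<Drawable> cull(const std::vector<Drawable>& scene) {
    std::vector<Drawable> visible;
    for (const Drawable& d : scene) {
        if (d.insideFrustum || d.skipFrustumCulling) visible.push_back(d);
    }
    return visible;
}

// Orders the draws so that equal materials are adjacent.
void sortByMaterial(std::vector<Drawable>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const Drawable& a, const Drawable& b) {
                         return a.materialId < b.materialId;
                     });
}

// Number of material switches needed to render the list in order.
std::size_t countStateChanges(const std::vector<Drawable>& draws) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        if (i == 0 || draws[i].materialId != draws[i - 1].materialId) ++changes;
    }
    return changes;
}
```

Calling countStateChanges before and after sortByMaterial on the visible list gives a quick measure of how much the sort saves for a given scene.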

Shaders and their parameters
The most important thing for a shader is its parameters. This can be seen in shader creation tools such as RenderMonkey and FX Composer. Even though they can be useful for previewing shaders, working inside the engine itself is much more valuable, since it allows the shaders to get parameters from the artists and from the engine that are hard to simulate in a shader previewer. The difference between creating an illumination shader in RenderMonkey and doing it in the engine, with all the geometry and engine tools, is huge. Shader previewers let you see a specific shader in a very simple and constrained environment, but the most important thing is to get the parameters from the engine in order to see how all the shaders interact to create the visual look that the artist desires. From this point of view it’s important to provide the artist with easy-to-use tools with a GUI to tweak parameters in real time, or at least a console command to change the value of a parameter. All this has to be done inside the engine while the artist is looking at his or her scene.
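
As an example of the simplest option, a console command, here is a C++ sketch. The command syntax ("set <material> <parameter> <value>"), the MaterialParams alias and the runConsoleCommand function are all assumptions made for this example and do not describe the Albedo Engine's real console.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parameter values for one material, keyed by parameter name.
using MaterialParams = std::map<std::string, float>;

// Handles a console line of the hypothetical form:
//     set <material> <parameter> <value>
// Returns true when the line was understood and the value was stored.
bool runConsoleCommand(const std::string& line,
                       std::map<std::string, MaterialParams>& materials) {
    std::istringstream in(line);
    std::string command, material, parameter;
    float value = 0.0f;
    if (!(in >> command >> material >> parameter >> value)) return false;
    if (command != "set") return false;
    std::string extra;
    if (in >> extra) return false;  // reject trailing text after the value
    materials[material][parameter] = value;
    return true;
}
```

The renderer would read the stored value the next time it binds the material, so the artist sees the change on the following frame.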

Conclusion
I hope this little article helps people implement shaders in their engines after seeing the requirements of a good shading system and the potential pitfalls. If there are any questions please post a comment and I will answer.

February 14, 2006

Scripting in Albedo Engine.

Filed under: General — Pablo Zurita @ 4:55 pm

Since last week I have been working on the basic integration of a scripting system, and to tell you the truth it has been very interesting. In the past I have always chosen to use DLLs to favor performance, but now that we have more power to do more on the CPU, and given that the Albedo Engine won’t use stencil shadows, there is CPU time to add a scripting system and make the user’s life easier. When I was thinking about using a scripting framework, the first thing that came to my mind was Lua. After all, it’s one of the oldest scripting frameworks, and because of that it’s one of the most solid ones. But after doing many tests I realized that it didn’t fit my needs. First of all, I didn’t like the syntax at all; it doesn’t look like the syntax used in the engine (which is written in C++). Working with Lua and C++ is also very annoying, even if you use tools such as toLua or libraries such as Luabind (which isn’t good either, since I don’t want fifteen layers between the script and the native code). And even though I wasn’t looking for the best possible performance (I would use a DLL for that), I certainly wanted to keep good performance. So I started looking for alternatives and I found AngelScript.

First of all, it was really easy to register classes, with no need to modify the classes to fit the scripting framework. The syntax is very similar to C++, which allows me to port code from the engine to a script very fast. AngelScript is the fastest framework I tried, and it is very stable, with no memory leaks (at least none so far). It was also very easy to extend; for example, AngelScript itself doesn’t support strings by default, so what I did was integrate the STL’s strings. I’m very happy with the framework even if it has been only a week. I just don’t see any problem with it, and it should be just fine for the future.

February 6, 2006

High Dynamic Range.

Filed under: General — Pablo Zurita @ 8:03 pm

With the arrival of video cards that support HDR (High Dynamic Range) formats, and with the release of HDR monitors such as the BrightSide DR37-P, it’s important to analyze why doing everything in HDR is important, even when the final output is made to an LDR (Low Dynamic Range) device or format. But first let’s see the basics: what low and high dynamic range mean.
Dynamic range is a term used in different research fields to describe the ratio between the lowest and highest possible value in a changeable quantity. If the dynamic range is low then it means the ratio between the lowest and highest possible values is low, and if the dynamic range is high then the ratio between the lowest and highest possible value is high.

In the computer graphics field, and mostly in real-time simulations, the use of LDR formats was predominant; the values were usually stored with 8 bits per component and clamped to values between 0.0 and 1.0. For example, the pixel of a lightmap would be stored as [1.0 1.0 0.5], and that value was modulated with the diffuse texture pixel, for example [0.5 1.0 1.0], so the final value is [0.5 1.0 0.5].
Now while that seems fine, what happens when we want to define in the lightmap that the lighting is twice as bright as in the case I just showed? That’s when you realize that LDR values limit the possible lighting situations in a scene. Instead, if the lightmap was stored in an HDR format it would be possible to store and read the value [2.0 2.0 1.0], and when you modulate that with the diffuse texture the final value is [1.0 2.0 1.0]. And even when the benefits of using HDR formats aren’t so clear, it’s important to see that even when the output is made to an LDR medium, all the intermediate calculations should be in HDR. What we have been doing in the past with LDR is approximating what we really wanted to do. When technology comes along that lets us do the right thing without making approximations, we should (if possible) use it to the maximum extent, because by doing all the intermediate steps properly we benefit our product in general.
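
The arithmetic of the lightmap example can be checked with a few lines of C++. The Color alias and the two helper functions are hypothetical names used only for this illustration; clampToLdr models what storing a value in an LDR format does to it.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

using Color = std::array<float, 3>;

// Component-wise product of a lightmap texel and a diffuse texel.
Color modulate(const Color& light, const Color& diffuse) {
    Color out{};
    for (std::size_t i = 0; i < 3; ++i) out[i] = light[i] * diffuse[i];
    return out;
}

// What an LDR format does on storage: anything outside [0, 1] is lost.
Color clampToLdr(const Color& c) {
    Color out{};
    for (std::size_t i = 0; i < 3; ++i) {
        out[i] = std::min(std::max(c[i], 0.0f), 1.0f);
    }
    return out;
}
```

With the diffuse texel [0.5 1.0 1.0], the HDR lightmap value [2.0 2.0 1.0] modulates to [1.0 2.0 1.0]. If the same lightmap value is first clamped to LDR it becomes [1.0 1.0 1.0] and the result is only [0.5 1.0 1.0], so the extra brightness is gone before any further math can use it.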

To see the impact of being able to use HDR formats, look at how the GPU is now being used for general-purpose computation without using the CPU. While the most important part of GPGPU is the use of fragment programs, without HDR support the chances of implementing even the most basic algorithms are slim, since LDR formats are too limited to do any useful math. Now with HDR it’s possible to store unclamped 16-bit and 32-bit values per component, which gives much more flexibility. For more information please go to the GPGPU site.

If you have any doubt on this topic leave a comment and I will answer as soon as I see it.

November 7, 2005

Deployment of C++ applications compiled with Visual Studio 2005

Filed under: General — Pablo Zurita @ 12:31 pm

These last three days I have been wasting a good amount of time on this problem, so I decided to make a post about it. I hope this information reaches everybody through Google and the other search engines. The problem is the following: Visual Studio 2005 has a new system for the deployment of applications to clients, built around “isolated applications” and “side-by-side assemblies”. Libraries such as ATL, MFC and the CRT are now side-by-side assemblies that are used by the isolated applications compiled with Visual Studio 2005. The benefits of this system are that the deployment of updates is less painful, and that it is harder to run into problems with multiple versions of a DLL. If you want to know more about this please read “About Isolated Applications and Side-by-side Assemblies”.

Now the issue is the following: when you have Visual Studio 2005 installed everything works great; you can copy your application to any directory and everything works just right. The problem happens when you zip your application and try to run it on a system that doesn’t have Visual Studio 2005. This causes the following error: “The application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem.” That doesn’t say much about the actual problem, so you end up trying a million different things to fix it. I searched on Google and some hits came up about a similar problem in PuTTY, but nothing really helped. So I had to do more and more research, and the first problem I found with my application was that I needed a manifest file. In Visual Studio 2005 you can embed the manifest in the executable or you can keep it in a separate file. It doesn’t matter which you choose, but the information must be available when you try to run your application.

But this doesn’t really fix the problem; the second part of the solution is that you need to install the Visual Studio 2005 redistributable runtime on the client that will run the application. You can download it from here. Once you have that, your application compiled with Visual Studio 2005 should work just right on any client (I’m aware that some things are different in Windows 98 and Windows 2000, since neither of those two operating systems supports this deployment model). I hope this helps you, and I hope you didn’t waste as much time as I did.

October 22, 2005

Performance.

Filed under: General — Pablo Zurita @ 3:52 pm

If there is something that all developers want, and graphics programmers even more so, it is good performance. Greater performance means more chances to add more features, faster feedback to the user’s actions, and so on. But lately I have been noticing a mistake that many developers make when they think about performance and optimizing their code. Many developers ask themselves things such as “should I unroll my loops?” or “should I use a two-dimensional array or a one-dimensional array?”. And the answer is, maybe. But often those changes are not really that important, and usually they are not the real cause of a performance bottleneck. Basically, the problem is that many developers try to solve performance issues blindfolded. If you don’t know where the problem really is, and what the cause of the bottleneck is, then the chances of fixing it are very low. So what I do to optimize all my software is follow six steps that help me a lot in fixing the bottlenecks.

1. Preliminary view of the problem.
The first thing to do is get a preview of the problem. What’s the main issue? How much faster should the program be if the bottleneck wasn’t there? These simple questions allow you to determine what it is that you want to improve, and they give you the first hints about what the real problem is. You could, for example, begin to suspect that your application is GPU bound, CPU bound, or something else. What matters is that you start to form your first ideas.

2. Identify the bottlenecks.
This is one of the most important steps. You have to determine what and where the bottlenecks are. What we want to do is look at different performance counters to find the hotspots where most of the performance is lost. For example, I use Compuware’s DevPartner, which comes with a performance analyzer, or you could use Intel’s VTune. Look at the CPU usage, memory usage and disk usage (and, depending on the type of application you are working on, you might also want to look at the network usage or GPU usage, for example). Once you have that data you will begin to see where the problems are. Now, you have to be careful with the conclusions of steps one and two. When there are big differences between the two, it’s possible that you are forgetting some important factor, so you might want to go back to step one.

3. Analysis of the bottleneck/s.
Here you will have problems if you didn’t follow steps one and two. If you skipped them, there is a good chance that you will waste time analyzing something that’s not really the cause of the bottleneck. You could be focused on the CPU usage when in fact the problem is the disk usage. So, to avoid wasting time, please take the time to do steps one and two. Now that you know what you should really pay attention to, it’s time to analyze the data further and collect more of it. Investigate further with the tools from step two; you can also use a debugger such as Compuware’s SoftICE to set system breakpoints and more. This will help you detect where the problems are, function by function and line by line.

4. Hypothesis and verification of hypothesis.
At this point you have a good idea of what could be the root of the whole problem. But to be sure, it’s important that you make a hypothesis and that you verify it. For example, if I now know that the problem is in my shader loader, I can create the hypothesis “If instead of having 10 shaders I have 100 shaders with 1000 instructions each, then the performance should fall even more”. When you verify the hypothesis you can see whether you in fact identified the problem correctly, or whether you have to go back to step three. If you don’t verify your hypothesis, it’s possible that you will waste a lot of time trying to improve a bottleneck that doesn’t even exist.

5. Fix the bottleneck.
This is the step that many developers take as the first and only step to improve performance. You could waste a lot of time changing structures and designs, unrolling loops, and making that kind of change when in fact in step two you could have determined that the problem was the disk usage. But if you took all the steps correctly, now it’s time to make the changes to fix the bottleneck, with the confidence that you have a great chance of actually fixing it because you have all the data and proof you need.

6. Test.
Now it’s time to use the tools of step two again and see if you really fixed the bottleneck. You also need to test with different scenarios to see if you need to make more improvements. If the changes fixed it then congratulations, but if they didn’t, don’t worry and go back as many steps as you need to fix the problem. You may even have to go back to step one.
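
When a full profiler isn’t at hand, the measuring in steps two and six can be approximated with a small timer inside the application. The following C++ sketch is not a replacement for tools such as DevPartner or VTune, and all the names in it (SectionStats, Profile, ScopedTimer, slowestSection) are hypothetical.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulated wall-clock time and call count for one named section.
struct SectionStats {
    std::chrono::steady_clock::duration total{};
    int calls = 0;
};

using Profile = std::map<std::string, SectionStats>;

// Measures its own lifetime and adds it to a named section of the profile.
class ScopedTimer {
public:
    ScopedTimer(Profile& profile, std::string name)
        : stats_(profile[std::move(name)]),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        stats_.total += std::chrono::steady_clock::now() - start_;
        ++stats_.calls;
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    SectionStats& stats_;
    std::chrono::steady_clock::time_point start_;
};

// Name of the section with the largest accumulated time, or "" if empty.
std::string slowestSection(const Profile& profile) {
    std::string worst;
    std::chrono::steady_clock::duration worstTime{};
    for (const auto& entry : profile) {
        if (worst.empty() || entry.second.total > worstTime) {
            worst = entry.first;
            worstTime = entry.second.total;
        }
    }
    return worst;
}
```

Placing a ScopedTimer at the top of the disk read, the parsing and the upload code of a loader, for example, shows which of the three dominates, and running the same measurement after the fix covers step six.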

Now, to show an example, I took the steps I just showed to get a 20% improvement in my mesh loader. Here is a screenshot of the before and after.

Before.

After.

I hope this helps you use your time more wisely, and helps you fix the performance bottlenecks in your software. If you have any comment, don’t forget to let me know through the Comment feature in the site.

October 9, 2005

Welcome.

Filed under: General — Pablo Zurita @ 3:53 pm

Welcome to the new version of my blog. The decision to update my blog and change its design was caused by the problems of PHP-Nuke. PHP-Nuke was simply way too big for a blog, it has many security problems, and it was torture to deal with even though I only updated my blog every two months. So here is the new blog. I hope you enjoy it.
