Don't you just hate this one? You have just implemented your mega-advanced 3D-vector rotation algorithm, and you even managed to add some perspective-like effect to your coordinates. Ready to start drawing, and then... Aargh! You need to hide those faces as well. :( This article presents a neat algorithm for deciding whether your faces are going to be visible or not, using no tables or additional memory, and performing quite well on average.
Hiding faces is not always an issue, and there are ways to *avoid* the problem. E.g. if you are not using any perspective in your 3D-calculations, you can often prove visibility by just comparing the z-value of a well chosen coordinate representing the surface. This method doesn't take the 2D-graphical aspect into account at all, and if your calculations are not too accurate (which is often the case on 8-bit machines) you may end up with erroneous hiding of faces, and that looks just *horrible*. The algorithm I'm going to present is based on the actual 2D-coordinates that are used to draw the object, so bad accuracy in your 3D-rotation will not affect the hiding or showing of faces.
I don't think that I'm really the first one to invent this kind of algorithm on the C64, however as far as I know no one has yet written about it. It's possible that Glasnost of Camelot had a similar algorithm in the TV-box part of "One Year Camelot 3". However, that assumption is based on the fact that I didn't understand what it was doing when I looked at it some 10 years ago. :)
Before we dive into the algorithm, let's just state some assumptions. Your 3D-coordinates are represented as signed bytes, meaning that values $01-$7F are positive 1-7F, and $FF-$81 are negative 1-7F. $00 is 0 and $80 is not used. E.g. $FA equals -6. In order to represent a 2D-surface, you need three coordinates from the surface. Normally you would use the corner vertices, the ones you probably calculated anyway. Please note that the X- and Y-coordinates are sufficient for the algorithm. From the three coordinates you form two vectors, originating from the same coordinate. E.g. we have the coordinates p0, p1 and p2. Build the vectors v1 = p1 - p0, v2 = p2 - p0. Recalling some vector theory tells us that the statement "(v1 x v2) > 0" (the z-component of the cross product) distinguishes whether the face is visible or not. If the statement is true the face is visible, if it's false the face is not visible. It may be totally the opposite, depending on the orientation of the face, but then just flip the > to a <.
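As a reference point, here is a minimal C sketch of the plain cross-product test described above. The `point` type and the function name are mine, not from the original code example, and this is the brute-force version that the rest of the article sets out to avoid:

```c
#include <stdint.h>

/* Signed 8-bit screen coordinates, as in the article. */
typedef struct { int8_t x, y; } point;

/* Back-face test from three corner points of a face.
   Flip the comparison if your faces are wound the other way. */
int face_visible(point p0, point p1, point p2)
{
    /* Build the two edge vectors originating from p0. */
    int v1x = p1.x - p0.x, v1y = p1.y - p0.y;
    int v2x = p2.x - p0.x, v2y = p2.y - p0.y;
    /* z-component of v1 x v2. */
    return v1x * v2y - v2x * v1y > 0;
}
```

This is what the multiplications in equation 1 compute directly; the algorithm below reaches the same yes/no answer while usually touching far fewer bits.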
So, the algorithm is basically about calculating the statement v1 x v2, which is nothing new if you've been into this stuff before. Expanding the statement brings us: (v1x * v2y) > (v2x * v1y), which could of course be calculated right away, but that's not what we'd like to do today. Multiplying two 8-bit numbers for each face-check wastes quite some time, unless you fill up the memory with precalculated tables. It can be done more efficiently; more precisely, in many cases you don't have to use all eight bits to decide if the left side of the equation is larger or smaller than the right side. This is what the algorithm takes advantage of, and we'll go through what it is doing step by step.
Equation 1: (v1x * v2y) > (v2x * v1y)
Step one: sign-check. Depending on the signs (+/-) of the four factors, we can sometimes tell the answer to equation 1 directly. E.g. if v1x is negative and all the others are positive, we know the answer is "false". Four signs give us 2**4 = 16 possibilities, of which "step one" answers 8 (editor's note: within this article the "**" symbol is used in the context of raising a number to the specified "power"). The rest have to go further to "step two". Note that after step one, all factors need to be converted to positive values. Here it may be necessary to switch the > to a <, in order to keep the equation correct.
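Step one could be sketched in C like this. This is an illustration of the logic, not the actual 6502 implementation, and it assumes zeros have been filtered out beforehand (the extreme cases are discussed near the end of the article):

```c
/* Sign-check on the four factors of (v1x * v2y) > (v2x * v1y).
   Returns 1 ("true") or 0 ("false") when the signs alone decide,
   or -1 when both products share a sign and step two must run on
   the absolute values. *flip is only meaningful in the -1 case:
   if both products were negative, taking absolute values reverses
   the inequality, so > must become <. Zeros are assumed to be
   handled separately before calling this. */
int sign_check(int v1x, int v2y, int v2x, int v1y, int *flip)
{
    int left_neg  = (v1x < 0) ^ (v2y < 0);  /* sign of v1x * v2y */
    int right_neg = (v2x < 0) ^ (v1y < 0);  /* sign of v2x * v1y */
    if (!left_neg &&  right_neg) return 1;  /* (+) > (-) : true  */
    if ( left_neg && !right_neg) return 0;  /* (-) > (+) : false */
    *flip = left_neg;   /* both negative: remember the sign-flip */
    return -1;          /* undecided, continue with step two */
}
```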
Step two: division check. Since we now know that all factors are positive, equation 1 can be developed into equation 2.
Equation 2: (v1x / v1y) > (v2x / v2y)
Check if the left side is > 1, by comparing v1x with v1y, and do the same thing with the right side. E.g. if the left side is < 1 and the right side is > 1, we're done! Good, again we can sort out two of four cases with a fast check. If we have to go further, we can simply perform the division, but not the way you think. We're gonna continue doing it step by step. Note that after step two, we need to transform the divisions so that the numerator is always smaller than the denominator (if both quotients are > 1, just swap numerator and denominator on both sides). Again this may cause a sign-flip from > to <, or the opposite.
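A rough C illustration of step two, assuming all four values have already been made positive by step one. Note that this sketch treats a quotient of exactly 1 (numerator equal to denominator) as being on the ">= 1" side; the real code handles equality as one of the extreme cases:

```c
/* Decide equation 2, (v1x / v1y) > (v2x / v2y), by checking on
   which side of 1 each quotient lies. All inputs must be positive.
   Returns 1 or 0 when decided, -1 when both quotients lie on the
   same side of 1 and the parallel division (steps 3-8) must run. */
int division_check(int v1x, int v1y, int v2x, int v2y)
{
    int left_ge1  = v1x >= v1y;   /* v1x / v1y >= 1 ? */
    int right_ge1 = v2x >= v2y;   /* v2x / v2y >= 1 ? */
    if ( left_ge1 && !right_ge1) return 1;  /* left > 1 > right */
    if (!left_ge1 &&  right_ge1) return 0;  /* left < 1 < right */
    return -1;  /* same side of 1: continue with steps 3-8 */
}
```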
Steps 3-8: Parallel division. The idea is to perform an incremental division, so that we don't calculate more than we need to get an answer. The equation now looks like this...
Equation 3: n1/d1 > n2/d2
...where we know that n1 < d1 and n2 < d2. Now we just perform the division like we learned in school, but in parallel (two at the same time). Since we get the result of each division bit by bit (luckily starting with the most significant bit), we can just compare the results after each step and see whether we're done, or if we have to give it another try. Look at the code example if you wonder how to implement steps 3-8.
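For readers without the code example at hand, the parallel division can be sketched in C as follows. This is my own illustration of the idea, not a transcription of the original 6502 routine; it assumes 0 <= n < d for both fractions, as guaranteed by step two:

```c
/* Decide n1/d1 > n2/d2 by extracting both quotients one bit at a
   time, most significant bit first, and stopping as soon as the
   bits differ. Requires 0 <= n1 < d1 and 0 <= n2 < d2. */
int compare_fractions(int n1, int d1, int n2, int d2)
{
    for (int step = 0; step < 8; step++) {
        n1 <<= 1;                 /* bring down the next bit... */
        n2 <<= 1;
        int b1 = n1 >= d1;        /* next quotient bit of n1/d1 */
        int b2 = n2 >= d2;        /* next quotient bit of n2/d2 */
        if (b1 != b2) return b1;  /* bits differ: decided! */
        if (b1) n1 -= d1;         /* keep the remainders going */
        if (b2) n2 -= d2;
    }
    return 0;  /* equal to 8 bits of precision: not greater */
}
```

Each loop iteration is one of steps 3-8; most faces are decided after the first iteration or two, which is where the speed comes from.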
If you examine the code example, you will find some additional code that has not been mentioned yet. This is code to handle the extremes: some variable equals $00 or $80, or the numerator and denominator are equal. E.g. when we do the sign-check in step one, the instructions used are LDA, BPL. In this case $00 is treated as positive, which it's not, it's zero. Furthermore, we may later on be using that number as a denominator, which is a bad idea. The value $80 should simply be avoided.
The behaviour resembles a binary search: in each of the steps 1-8, roughly 50% of the remaining cases can be sorted out, and the other half has to be examined further. Best case lets you decide in just one step, worst case takes eight steps. The weighted average becomes 1.96 steps. Not bad. :) The algorithm scales to 16 bits or more, no problem, but that exercise I leave to you.
It's hard to compare this algorithm theoretically with performing the two multiplications, since the amount of cycles is data dependent. Sadly enough I didn't have the energy to do one more implementation to be able to compare in practice either. You may think that performing a multiplication using the a*b = ((a+b)**2 - (a-b)**2)/4 substitution would do the job in no time, but if you look closer into it you still have to perform sign-checks on all the scalars. Also you would need a 512-byte table of 16-bit values (= 1024 bytes) in order to get the same accuracy, and that would not be totally quick to handle either. It's still possible that you may end up with a faster algorithm that way, but then this one still has its beauty. :)
HCL/BOOZE DESIGN |