The asset creation documentation for GeomCaches can be found here.
Reads the Alembic file format (currently HDF5-based, up to version 1.1) and outputs compiled frame data into FIFOs for the encoder. TODO: Alembic 1.5 introduced a new data format which should greatly improve read speed.
We support static and dynamic transforms with static or homogeneous poly meshes as leaves. Heterogeneous meshes (with changing topology) are currently not supported, as they are not suitable for our compression scheme.
Alembic supports different frame spacing per object in the scene tree: e.g. a mesh can have frames at irregular intervals while a transform in the same file has a fixed 1/30s time step. CryEngine geometry caches do not support this; each frame contains data for all objects in the scene to keep realtime decoding simple. To achieve this, the Alembic compiler first builds the superset of all frame times of the objects in the scene (AlembicCompiler::CheckTimeSampling). For each unique frame time found, a frame will be output.
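The superset step can be sketched roughly as follows (a minimal illustration; function names, parameters and the epsilon handling are assumptions, not the actual RC code):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Merge the sample times of all objects in the scene, sort them and remove
// near-duplicates, so that each unique frame time yields exactly one output
// frame. Epsilon tolerance is an illustrative detail.
std::vector<double> BuildFrameTimeSuperset(
    const std::vector<std::vector<double>>& perObjectTimes,
    double epsilon = 1e-6)
{
    std::vector<double> merged;
    for (const auto& times : perObjectTimes)
        merged.insert(merged.end(), times.begin(), times.end());

    std::sort(merged.begin(), merged.end());
    merged.erase(std::unique(merged.begin(), merged.end(),
                     [=](double a, double b) { return std::fabs(a - b) < epsilon; }),
                 merged.end());
    return merged;
}
```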
Alembic has separate arrays for each vertex data element. This allows, for example, two triangles to share the position data of a vertex but not its texture coordinates. GPUs only support a single index buffer for all elements, so in this case we also need to duplicate the position data. This is true even if the UVs differ in only one frame of the entire animation, because the topology must stay the same for all frames.
To achieve this we first calculate a 64 bit hash for each vertex over all frames of the mesh's animation (AlembicCompiler::ComputeVertexHashes). If the hash of a vertex differs between two triangles, it needs to be split. Note that there is a chance of a hash collision, but this is very unlikely with 64 bits; checking for collisions would be quite slow, so we don't do it at the moment.
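A 64 bit hash accumulated over per-frame vertex data could look like the FNV-1a sketch below (illustrative only; the actual hash function used by AlembicCompiler::ComputeVertexHashes may differ):

```cpp
#include <cstddef>
#include <cstdint>

// FNV-1a offset basis, the starting value for each vertex's hash.
const uint64_t kFnvOffsetBasis = 14695981039346656037ull;

// Fold a chunk of vertex data (position, UV, color bytes of one frame) into
// the running 64 bit hash. Calling this for every frame of a vertex yields a
// hash over the entire animation; two triangle corners referencing the same
// position but producing different hashes force a vertex split.
uint64_t HashCombine(uint64_t hash, const void* data, size_t size)
{
    const uint8_t* bytes = static_cast<const uint8_t*>(data);
    for (size_t i = 0; i < size; ++i)
    {
        hash ^= bytes[i];
        hash *= 1099511628211ull; // FNV-1a prime
    }
    return hash;
}
```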
If the mesh does not contain normals, we calculate smooth normals for it (AlembicCompiler::CalculateSmoothNormals). We can't drop support for meshes without normals because Maya doesn't store any normal information for a mesh that is completely smooth. The algorithm averages the normals of the adjacent triangles, each weighted by the angle of the respective corner.
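The angle-weighted averaging can be sketched as follows (a self-contained illustration, not the actual engine code; the Vec3 helpers stand in for the engine's math types):

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 Sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 Cross(Vec3 a, Vec3 b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 Normalize(Vec3 v)
{
    float l = std::sqrt(Dot(v, v));
    return {v.x / l, v.y / l, v.z / l};
}

// Each triangle adds its face normal to its three vertices, weighted by the
// angle at the respective corner; the sums are normalized at the end.
std::vector<Vec3> CalculateSmoothNormals(const std::vector<Vec3>& positions,
                                         const std::vector<uint32_t>& indices)
{
    std::vector<Vec3> normals(positions.size(), {0, 0, 0});
    for (size_t t = 0; t < indices.size(); t += 3)
    {
        const uint32_t i[3] = {indices[t], indices[t + 1], indices[t + 2]};
        for (int c = 0; c < 3; ++c)
        {
            Vec3 e0 = Normalize(Sub(positions[i[(c + 1) % 3]], positions[i[c]]));
            Vec3 e1 = Normalize(Sub(positions[i[(c + 2) % 3]], positions[i[c]]));
            float angle = std::acos(std::fmax(-1.0f, std::fmin(1.0f, Dot(e0, e1))));
            Vec3 faceNormal = Normalize(Cross(e0, e1));
            normals[i[c]].x += faceNormal.x * angle;
            normals[i[c]].y += faceNormal.y * angle;
            normals[i[c]].z += faceNormal.z * angle;
        }
    }
    for (Vec3& n : normals)
        n = Normalize(n);
    return normals;
}
```

The angle weighting makes the result independent of how a surface is triangulated: a corner subtended by many thin triangles contributes no more than one subtended by a single wide triangle.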
We use Mikkelsen's tangent space algorithm to calculate a tangent space for each frame of the animation from the normals and UVs of the mesh.
Meshes are optimized with Forsyth's face reorder algorithm to better utilize the GPU's transform cache. This is also necessary as a preparation for the parallelogram prediction, which assumes the index order follows strips of triangles over the mesh.
The engine format supports multiple materials per mesh. For this the index buffer will be split into chunks for each material. Currently this is done by parsing the name of the face sets of the mesh in the alembic file where the first number in the name is interpreted as the material ID (AlembicCompiler::GetMeshFaceSets).
For compression purposes all data is quantized to integer values (AlembicCompiler::CompileVertices).
Positions are first normalized to [0, 1] per axis, where 0 and 1 represent the minimum and maximum extent of that axis of the AABB of the mesh's entire animation. The position is then quantized to uint16. The value representing 1 is chosen so that enough bits are allocated to at least match the specified position precision, which can be set on the RC command line. Smaller meshes therefore use a smaller range of values and compress better.
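The mapping can be sketched like this (an illustrative version assuming a per-axis step count derived from the precision; the actual compiler code may choose the scale differently):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Quantize one axis of a position. The AABB covers the entire animation, so
// the mapping is constant over all frames. The step count is chosen so each
// quantization step is at most `precision` units wide, clamped to uint16.
uint16_t QuantizePosition(float value, float aabbMin, float aabbMax, float precision)
{
    float extent = aabbMax - aabbMin;
    uint32_t steps = std::min<uint32_t>(65535u,
        static_cast<uint32_t>(std::ceil(extent / precision)));
    float normalized = (value - aabbMin) / extent; // [0, 1]
    return static_cast<uint16_t>(std::lround(normalized * steps));
}
```

A 1m mesh at 1mm precision thus only emits values 0-1000 instead of the full uint16 range, which leaves the residuals small and compresses better.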
Texcoords are quantized to uint16 over the range 0-63; values outside that range wrap.
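A hedged sketch of the wrap-and-quantize idea (the exact mapping in the compiler may differ; this only illustrates wrapping into [0, 64) and spreading that over the uint16 range):

```cpp
#include <cmath>
#include <cstdint>

// Wrap a texture coordinate into [0, 64) and quantize it to uint16.
// 65536 / 64 = 1024 quantization steps per UV unit (an assumed scale).
uint16_t QuantizeTexcoord(float u)
{
    float wrapped = u - 64.0f * std::floor(u / 64.0f); // [0, 64)
    return static_cast<uint16_t>(wrapped * (65536.0f / 64.0f));
}
```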
Colors are stored as four individual streams, one per color channel, where each value is coded as one uint8 representing the range 0-1.
Tangent frames, consisting of normal, bitangent and tangent, are converted to QTangents and each component is then quantized to int16, where 10 bits are used for the range [-1, 1]. The reflection of the first frame is enforced for all successive frames to avoid interpolation issues; this only matters in edge cases where the tangent frame calculation produces different orientations for different frames (AlembicCompiler::EncodeQTangent).
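The sign handling and quantization can be sketched as follows (illustrative, not the actual EncodeQTangent code): since q and -q describe the same rotation, each frame's quaternion is flipped into the hemisphere of the first frame's before quantizing, which keeps successive per-frame values close together.

```cpp
#include <cmath>
#include <cstdint>

struct Quat { float x, y, z, w; };

// Flip q so it lies in the same hemisphere as the reference (first-frame)
// quaternion; q and -q represent the same tangent frame orientation.
Quat AlignToReference(Quat q, const Quat& reference)
{
    float dot = q.x * reference.x + q.y * reference.y
              + q.z * reference.z + q.w * reference.w;
    if (dot < 0.0f)
        return {-q.x, -q.y, -q.z, -q.w};
    return q;
}

// Quantize one component to int16, with 10 bits for the range [-1, 1]
// (i.e. a scale of 512), as described above.
int16_t QuantizeComponent(float c)
{
    return static_cast<int16_t>(std::lround(c * 512.0f));
}
```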
Each vertex channel of a mesh (positions, UVs, colors, QTangents) can either be static or dynamic. Static data is stored only once at the beginning of the file, so only animated channels are stored per frame.
As an optimization the alembic transform hierarchy is flattened as far as possible. All unnecessary transforms in the hierarchy are removed, e.g. a mesh with two dynamic transforms and no siblings will only have a single dynamic transform in the engine format (AlembicCompiler::CompileStaticDataRec). RC with verbose output will print the resulting tree for debugging purposes.
Transforms are converted to QuatTNS (Quaternion with translation and non-uniform scale) and stored per frame if they are dynamic. This is not optimized yet because the data rate was not an issue compared to vertex animation.
Dynamic meshes and transforms are updated for each frame. For transforms, the influences for that transform are read from the Alembic scene each frame and the final result is computed. For meshes, the vertex data is updated, smooth normals are recomputed if required and the tangent space is recalculated. Finally the vertices are converted to the engine format again (see "Vertex quantization and conversion").
For output there is one FIFO buffer per mesh and transform which is written by the compiler and read from the encoder in frame order.
As this is an expensive operation, the updates are run as jobs: one job per mesh and one job for all frame transform updates. To get enough parallelism, data for multiple frames is processed in parallel. As the output must be in order, a reservation buffer is used for each mesh and transform, and data is only passed to the encoder once the next frame is completed. If the reservation buffer runs full, the compiler stalls.
The encoder's job is to take the quantized data and try to predict it so the resulting residual is as small as possible; smaller residual frames result in a better compression ratio. The geometry cache format uses intra prediction (predicting data with references only within the same frame) and inter prediction (predicting with references to other frames) to accomplish this. There is currently no prediction for animated transforms, which are stored as full QuatTNS values in float format; there is a lot of room for optimization there if necessary.
Index frames can only use intra prediction (references within the same frame), as they must be decodable without information from any other frame. For vertex positions and texcoords we use parallelogram prediction.
The next vertex value is predicted from the previous triangle with P = C + B - A. The stored value is r = V - P, the residual of the prediction. For this prediction the encoder and the decoder need to know the adjacent triangle, so we store the distances to the vertices that form the triangle we predict from as three uint16 values. The maximum look-back distance is currently 4096. If no match can be found, only one value is stored (0xFFFF) and we simply use the previous vertex value as the prediction.
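On quantized values the scheme is exactly reversible, because uint16 arithmetic wraps. A minimal sketch (helper names are illustrative):

```cpp
#include <cstdint>

// Parallelogram rule: predict V from the adjacent triangle's vertices A, B, C.
uint16_t PredictParallelogram(uint16_t a, uint16_t b, uint16_t c)
{
    return static_cast<uint16_t>(c + b - a); // wraps modulo 2^16
}

// The residual r = V - P is what gets stored in the frame.
uint16_t EncodeResidual(uint16_t value, uint16_t predicted)
{
    return static_cast<uint16_t>(value - predicted);
}

// The decoder recovers V = r + P with the same wrapping arithmetic.
uint16_t DecodeValue(uint16_t residual, uint16_t predicted)
{
    return static_cast<uint16_t>(residual + predicted);
}
```

Good predictions cluster the residuals around zero, which is what makes the block compressor's job easy.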
To maximize the chances for prediction, we rearrange the vertices based on first use in the index array (GeomCacheEncoder::OptimizeMeshForCompression). This results in a vertex array where the distance to the last triangle's vertices is minimized.
For colors and QTangents we don't use parallelogram prediction and instead just average B and C, which results in slightly better compression ratios.
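The averaging variant is trivial (illustrative sketch; the widening to uint32 avoids overflow before dividing):

```cpp
#include <cstdint>

// Predict a color/QTangent component as the mean of the two remaining
// triangle vertices B and C, instead of the parallelogram rule.
uint16_t PredictAverage(uint16_t b, uint16_t c)
{
    return static_cast<uint16_t>((static_cast<uint32_t>(b) + c) / 2);
}
```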
For an even better compression ratio, by default only every 10th frame is an index frame; the frames in between are predicted from the two frames before them and from the previous and next index frames (InterpolateMotionDeltaPredictor).
The formulas for the predictor are:

1. P_motion = V(t-1) + a * (V(t-1) - V(t-2))
2. P_interp = I_prev + b * (I_next - I_prev)
3. P = P_motion + c * (P_interp - P_motion)
The motion predictor (1) assumes that the vertex's motion continues in the same direction: it predicts the value as the previous frame's value plus the direction of motion derived from the last two frames. The factor a is the acceleration, which can be in the range [0, 2] and is encoded as a uint8. For the first B frame there is no motion prediction, as we can't access a B frame before the I frame. The interpolation predictor (2) simply does a linear interpolation between the previous and next I frame with the factor b, which is stored as a uint8. Finally both predictors are combined in another linear interpolation (3) with the factor c, also stored as a uint8. This means that for each frame and mesh we store three uint8 values for the predictor.
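Put together, the combined predictor looks roughly like this (an illustrative float version; the actual code works on quantized data, and here a, b and c are already mapped from their uint8 encodings to their float ranges):

```cpp
// Combined B-frame predictor: blend of a motion extrapolation from the two
// previous frames and a linear interpolation between the surrounding I frames.
float PredictBFrame(
    float prevPrev, float prev,       // values from the two previous frames
    float prevIndex, float nextIndex, // values from the surrounding index frames
    float a,                          // acceleration, [0, 2]
    float b,                          // interpolation factor, [0, 1]
    float c)                          // blend between both predictors, [0, 1]
{
    float motion = prev + a * (prev - prevPrev);                   // (1)
    float interpolation = prevIndex + b * (nextIndex - prevIndex); // (2)
    return motion + c * (interpolation - motion);                  // (3)
}
```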
To minimize the search space for a, b and c the encoder first searches for the optimal entropy for (1) and (2) with a binary search and finally does another binary search with the found values for (3).
The writer's job is to take data from the compiler and encoder and store it in the final disk format with the chosen block compression.
First, after compiling the static data, the header is written. It contains the 64 bit file signature, a version number, the block compression format, a flags field, the number of frames, the total uncompressed animation size and the AABB of the entire geom cache animation. Note that the signature is initially written as zeroes to prevent the cache from being loaded if the conversion fails; only at the very end, if everything else was successful, do we write the final signature ("CRYCACHE").
The writer then leaves room for a frame info struct (SFrameInfo) for each frame, which contains the frame type (I or B frame, 32 bit), the frame size (32 bit) and the frame offset (64 bit). The frame sizes and offsets can only be known after compilation because the compressed size varies, so they are also written in at the very end. The file header and frame infos are stored uncompressed.
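The layout described above can be sketched as structs (field order, names and padding are illustrative; see the engine headers for the authoritative definitions):

```cpp
#include <cstdint>

// Per-frame table entry; sizes and offsets are patched in at the very end
// once the compressed sizes are known.
struct SFrameInfo
{
    uint32_t frameType;   // I or B frame (32 bit)
    uint32_t frameSize;   // compressed size (32 bit)
    uint64_t frameOffset; // absolute file offset (64 bit)
};

// Uncompressed file header written before the frame info table.
struct SGeomCacheFileHeader
{
    char     signature[8];     // "CRYCACHE", zeroed until conversion succeeds
    uint32_t version;
    uint32_t blockCompression; // store, deflate or LZ4HC
    uint32_t flags;
    uint32_t numFrames;
    uint64_t totalUncompressedAnimationSize;
    float    aabbMin[3];       // AABB of the entire animation
    float    aabbMax[3];
};
```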
Next in the file is the first compressed block, which contains the static data. For all meshes this includes the index data (we don't support topology changes, remember), the static vertex channels and, for meshes with at least one animated channel, the uint16 adjacent triangle array for the parallelogram predictor. For transforms it contains the transform type and, for static transforms, a QuatTNS.
Each frame is then stored as one compressed block, which contains the residual data after prediction for all animated channels and the QuatTNS values of animated transforms.
We support three different options for block compression: store, deflate and LZ4HC. Store just writes the raw data, which is good for decoding speed but mostly implemented as a debugging feature. Deflate gives the best compression ratio but is quite slow to decode. LZ4HC is very fast to decode (about an order of magnitude faster than deflate) but results in roughly 20% bigger files.
In general LZ4HC seems to be the best option if the scene where the cache is used is heavy on CPU usage from other systems; otherwise, if CPU time isn't a concern, use deflate.
Each block from the encoder is pushed into a queue, which compression jobs pick up. After compression the blocks are handed to a write thread which asynchronously writes them to disk.
CVar | Description |
---|---|
e_GeomCaches | Disables drawing of geometry caches. Also disables launching of new async reads and launching of decompress & decode jobs in the manager. |
e_GeomCacheBufferSize | Sets the size in MB for the streaming buffer heap. Default is 128MB, which is enough for ~10MB/s. |
e_GeomCacheMaxPlaybackFromMemorySize | Limits "playback from memory" to caches with less than the specified size in MB (default 16). |
e_GeomCachePreferredDiskRequestSize | The preferred size for one read in KB. Currently set to 1MB. The manager will try to read as many frames as possible until it hits this limit or the max buffer ahead time. |
e_GeomCacheMinBufferAheadTime | The minimum amount of time the manager will always try to keep in memory for each stream. |
e_GeomCacheMaxBufferAheadTime | The max amount of time it will buffer ahead. This allows the manager to reach the preferred disk request size more often. |
e_GeomCacheDecodeAheadTime | The amount of time when the manager will start creating decode buffers and launch decompress and decode jobs for the frames in them. |
e_GeomCacheDebug | Shows a debug overlay which represents the buffers in the heap and shows all active streams |
e_GeomCacheDebugFilter | Filters the active streams by name when e_GeomCacheDebug is active |
e_GeomCacheDebugDrawMode | 1: Only animated meshes. Note that modes 1 and 2 only have an effect while the cache is currently playing. |
e_GeomCacheLerpBetweenFrames | If set to 0, disables interpolation between frames. Used for debugging; it has no performance advantage. |