nvtt
NVTT 3 is a library that can be used to compress image data and files into compressed texture formats, and to handle compressed and uncompressed images.
In NVTT 3, most compression algorithms and image processing algorithms can be accelerated by the GPU. These have CPU fallbacks for systems without CUDA-capable GPUs.
The NVTT 3 C++ APIs consist of two headers. The high-level APIs are defined in nvtt/nvtt.h. These APIs include the main functionality: image I/O, image processing, and general interfaces for image compression.
The low-level APIs are exposed through nvtt/nvtt_lowlevel.h. This includes individual functions for compressing to each of the supported texture formats. In addition, these allow inputs and outputs to each be located on the CPU or GPU, and give more freedom with regard to image layouts. The high-level APIs are based on the low-level APIs.
In addition, a C wrapper for other compilers and programming languages is available through nvtt/nvtt_wrapper.h.
Here we give some examples, with reference code, of using the high-level and low-level APIs. More samples covering a range of features will be available at https://github.com/nvpro-samples/nvtt_samples.
First, create an nvtt::Context. Contexts are used both for global settings and for controlling the compression process:
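A minimal sketch is shown below; the constructor's CUDA flag is used as we understand it, and CUDA acceleration can also be toggled later with nvtt::Context::enableCudaAcceleration().

```cpp
#include <nvtt/nvtt.h>

// Create a compression context. Passing true requests CUDA acceleration;
// NVTT falls back to the CPU path if no compatible GPU is available.
nvtt::Context context(true);
```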
In NVTT, we use nvtt::Surface to store a single uncompressed image. nvtt::Surface has a method nvtt::Surface::load(), which can be used to load an image file. A typical image loading process looks like this:
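For instance (a sketch; the file name and the error handling are placeholders):

```cpp
nvtt::Surface image;
if (!image.load("input.png"))
{
    // The file could not be read or decoded; report the error and stop.
}
```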
Then, we set up compression options using nvtt::CompressionOptions:
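For example, selecting BC7 at the default quality (a sketch; choose the format and quality your application needs):

```cpp
nvtt::CompressionOptions compressionOptions;
compressionOptions.setFormat(nvtt::Format_BC7);      // target compressed format
compressionOptions.setQuality(nvtt::Quality_Normal); // speed/quality trade-off
```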
See nvtt::Format for all compression formats.
Next, we say how to write the compressed data using nvtt::OutputOptions. The simplest case is to assign a filename directly:
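For example (the file name "output.dds" is a placeholder):

```cpp
nvtt::OutputOptions outputOptions;
outputOptions.setFileName("output.dds");
```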
For more dedicated control of the output stream, you may want to derive a subclass of nvtt::OutputHandler, then use nvtt::OutputOptions::setOutputHandler to redirect the output:
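A sketch of such a handler follows; the three virtual methods are the ones declared by nvtt::OutputHandler, and the buffering logic is left as a placeholder:

```cpp
struct MyOutputHandler : public nvtt::OutputHandler
{
    // Called before NVTT writes each image (face/mip level), with its size.
    void beginImage(int size, int width, int height, int depth,
                    int face, int miplevel) override {}

    // Receives chunks of compressed data; return false to abort compression.
    bool writeData(const void* data, int size) override
    {
        // e.g. append `size` bytes from `data` to an in-memory buffer
        return true;
    }

    // Called after each image has been written.
    void endImage() override {}
};

MyOutputHandler handler;
outputOptions.setOutputHandler(&handler);
```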
When the above setup is complete, we compress the image using nvtt::Context.
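For a single 2D image with one mip level, this can look like the following sketch, using the objects set up above; nvtt::Context::outputHeader() writes the DDS header before the compressed data:

```cpp
// Write the DDS header for one mip level, then compress face 0, mip 0.
context.outputHeader(image, 1, compressionOptions, outputOptions);
context.compress(image, 0 /* face */, 0 /* mipmap */, compressionOptions, outputOptions);
```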
DDS files can contain multiple cube map faces and mipmap levels. To load these surfaces, you may want to use nvtt::SurfaceSet as the loader:
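For example (a sketch; the GetFaceCount()/GetMipmapCount() accessors are used as we understand them from nvtt.h):

```cpp
nvtt::SurfaceSet images;
if (!images.loadDDS("input.dds"))
{
    // Handle the error
}
const int faceCount   = images.GetFaceCount();
const int mipmapCount = images.GetMipmapCount();
```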
Then you can use nvtt::SurfaceSet::GetSurface() to extract each individual nvtt::Surface, either by returning a new surface or by filling in a surface passed by reference:
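A sketch of both uses; we assume two overloads, one returning a Surface and one filling a Surface reference, so check nvtt.h for the exact signatures:

```cpp
// Return a new Surface for face 0, mip 0...
nvtt::Surface surface = images.GetSurface(0, 0);

// ...or fill in an existing Surface.
nvtt::Surface surface2;
images.GetSurface(0, 0, surface2);
```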
For batch processing of multiple files, we use nvtt::BatchList.
Instead of compressing each image one by one, we first append all images to the nvtt::BatchList container. Each image has its own output handler (e.g. to write to multiple files):
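A sketch, assuming std::vector containers named images and outputOptionsList that stay alive until compression finishes, since the batch list stores pointers to them:

```cpp
#include <vector>

std::vector<nvtt::Surface>       images;            // filled in elsewhere
std::vector<nvtt::OutputOptions> outputOptionsList; // one per output file

nvtt::BatchList batchList;
for (size_t i = 0; i < images.size(); i++)
{
    // Append face 0, mip 0 of each surface together with its output options.
    batchList.Append(&images[i], 0, 0, &outputOptionsList[i]);
}
```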
Then we issue a single nvtt::Context::compress command to compress all the inputs:
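For example, using the batch list and compression options prepared above:

```cpp
context.compress(batchList, compressionOptions);
```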
When we do it this way, NVTT 3 will restructure the input images and exploit parallelism to the maximum extent when GPU compression is used, without needing to synchronize with the CPU between images. When there are a large number of small textures to compress with the same nvtt::CompressionOptions, batch processing can dramatically increase the performance.
The decompression routine is relatively straightforward. The function nvtt::SurfaceSet::loadDDS() can be used to decode a DDS file. After the file is loaded, you can use nvtt::SurfaceSet::saveImage() to output the result:
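For example (a sketch; the file names are placeholders, and we assume saveImage() takes the file name, face index, and mip index):

```cpp
nvtt::SurfaceSet images;
if (!images.loadDDS("compressed.dds"))
{
    // Handle the error
}
// Decode and save face 0, mip 0 as a PNG.
images.saveImage("decoded.png", 0 /* face */, 0 /* mip */);
```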
If you only have compressed data, but know the size and format, you can use nvtt::Surface::setImage2D() or nvtt::Surface::setImage3D():
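A sketch for a 2D BC1 image; we assume the (format, width, height, data) overload of setImage2D() and the Surface::save() convenience method, so check nvtt.h for the exact signatures in your version:

```cpp
// `bc1Data` is a hypothetical pointer to raw BC1 blocks for a 256x256 image.
nvtt::Surface decoded;
if (decoded.setImage2D(nvtt::Format_BC1, 256, 256, bc1Data))
{
    decoded.save("decoded.png");
}
```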
Note that decompression is currently not GPU accelerated.
Some useful image processing routines are provided in nvtt::Surface. For example, a downsampling process looks like this:
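For example, a gamma-correct halving of the resolution might look like this sketch (the filter choice and gamma value are illustrative):

```cpp
image.toLinear(2.2f);                             // filter in linear space
image.buildNextMipmap(nvtt::MipmapFilter_Kaiser); // halve the resolution
image.toGamma(2.2f);                              // convert back for storage
```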
The low-level APIs are for compression only.
To use the low-level APIs, there is no need to create an nvtt::Context first, and images do not need to come from nvtt::Surface.
At the center of the low-level APIs, there are two buffer structs for storing the input image data: nvtt::CPUInputBuffer stores the data in host memory, and nvtt::GPUInputBuffer stores the data in device memory.
For each of the supported texture formats, there are two functions that compress textures into that format, each handling one buffer type. For functions using nvtt::CPUInputBuffer, there is a useGpu parameter to choose whether to use the GPU routine to compress it. For the function using nvtt::GPUInputBuffer, the compression is always done by the GPU. In both cases, the user can choose whether the output goes to host memory or device memory using the to_device_mem parameter.
A buffer can be created from one or more images in host memory or device memory. The user must reference each of the input images using the nvtt::RefImage structure.
Here's an example.
First, we use an external tool like stb_image to load an image file:
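For instance (the file name is a placeholder):

```cpp
#include <stb_image.h>

int width = 0, height = 0, channels = 0;
// Request 4 channels so the data is tightly packed 8-bit RGBA.
unsigned char* data = stbi_load("input.png", &width, &height, &channels, 4);
if (!data)
{
    // Handle the error
}
```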
Second, we use nvtt::RefImage to reference this image:
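Continuing the example (a sketch; only the fields we rely on are set here, so check nvtt_lowlevel.h for the full nvtt::RefImage definition):

```cpp
nvtt::RefImage refImage{};
refImage.data         = data;   // RGBA pixels from the previous step
refImage.width        = width;
refImage.height       = height;
refImage.depth        = 1;      // a single 2D image
refImage.num_channels = 4;
```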
Third, we create an nvtt::CPUInputBuffer using the single nvtt::RefImage:
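Continuing (the value-type enumerator name is taken from nvtt_lowlevel.h as we understand it):

```cpp
// One RefImage with 8-bit unsigned channel values.
nvtt::CPUInputBuffer inputBuffer(&refImage, nvtt::UINT8);
```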
We then prepare a buffer for receiving compressed BC1 data:
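BC1 stores each 4x4 block of pixels in 8 bytes, so the output size can be computed from the image dimensions:

```cpp
#include <vector>

const int blocksX = (width + 3) / 4;
const int blocksY = (height + 3) / 4;
std::vector<unsigned char> outbuf(blocksX * blocksY * 8); // 8 bytes per BC1 block
```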
Finally, call the BC1 compression function:
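The call below is a sketch; we assume the parameter order (input, fast_mode, slow_mode, output, useGpu, to_device_mem), so check nvtt_lowlevel.h for the exact signature of nvtt::nvtt_encode_bc1 in your version:

```cpp
// CPU input, high-quality (slow) mode, CPU compression, output in host memory.
nvtt::nvtt_encode_bc1(inputBuffer, /*fast_mode=*/false, /*slow_mode=*/true,
                      outbuf.data(), /*useGpu=*/false, /*to_device_mem=*/false);

stbi_image_free(data); // the source pixels are no longer needed
```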
The raw compressed blocks will then be stored in outbuf. The low-level APIs do not generate file headers (such as DDS file headers).
To add the dynamic build of NVTT 3 to a C++ application, link with nvtt.lib in the lib/ folder, include nvtt/nvtt.h, and copy nvtt.dll to the application output directory.
Example CMake applications using this process can be found at the online samples repository; most of the work in the CMake file there is in locating the system's NVTT distribution.
The C++ API should be compatible with any MSVC 14x toolset. For other toolsets and other programming languages, please use the C wrapper in nvtt/nvtt_wrapper.h.
The dynamic build can also be delay-loaded.
The following notes are important for apps using CUDA elsewhere.
NVTT 3 uses the CUDA Runtime API, and certain functions such as nvtt::Context::enableCudaAcceleration() and nvtt::isCudaSupported() can choose a device and call cudaSetDevice() unless nvtt::useCurrentDevice() has been called first. When using NVTT 3 with other CUDA functionality, we recommend doing two things: first select the device your application uses with cudaSetDevice(), then call nvtt::useCurrentDevice() before any other NVTT 3 functions. Calling useCurrentDevice() will prevent NVTT from choosing and changing the device. It only needs to be called once during the lifetime of the application.
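A minimal sketch of this pattern (device index 0 is just an example):

```cpp
#include <cuda_runtime.h>
#include <nvtt/nvtt.h>

int main()
{
    cudaSetDevice(0);          // the device the rest of the application uses
    nvtt::useCurrentDevice();  // keep NVTT on this device; call once

    nvtt::Context context(true); // CUDA-accelerated context on the same device
    // ... compression work ...
    return 0;
}
```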