nvtt
NVTT 3 - API Introduction

NVTT 3 is a library that can be used to compress image data and files into compressed texture formats, and to handle compressed and uncompressed images.

In NVTT 3, most compression and image processing algorithms can be accelerated on the GPU, with CPU fallbacks for systems without a CUDA-capable GPU.

The NVTT 3 C++ APIs consist of 2 headers. The high-level APIs, defined in nvtt/nvtt.h, provide the main functionality: image I/O, image processing, and general interfaces for image compression.

The low-level APIs are exposed through nvtt/nvtt_lowlevel.h. This includes individual functions for compressing to each of the supported texture formats. In addition, these allow inputs and outputs to each be located on the CPU or GPU, and give more freedom with regard to image layouts. The high-level APIs are based on the low-level APIs.

In addition, a C wrapper for other compilers and programming languages is available through nvtt/nvtt_wrapper.h.

Below are examples, with reference code, of using the high-level and low-level APIs. More samples covering a range of features will be available at https://github.com/nvpro-samples/nvtt_samples.

Using the high-level APIs

Compressing a single file

First, create an nvtt::Context. Contexts are used both for global settings and for controlling the compression process:

nvtt::Context context;
context.enableCudaAcceleration(true);
// Now all context compression will be CUDA-accelerated if any system GPU supports it.

In NVTT, we use nvtt::Surface to store a single uncompressed image. nvtt::Surface has a method nvtt::Surface::load(), which can be used to load an image file. A typical image loading process looks like this:

nvtt::Surface image;
image.load(inputFileName); // returns false if the file could not be loaded

Then, we set up compression options using nvtt::CompressionOptions:

nvtt::CompressionOptions compressionOptions;
// Compress to 4-channel, 8-bit-per-pixel BC3:
compressionOptions.setFormat(nvtt::Format_BC3);

See nvtt::Format for all compression formats.
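As a quick check of the "8 bits per pixel" figure: block-compressed formats such as BC1, BC3, and BC7 encode every 4×4 pixel block into a fixed number of bytes (8 for BC1, 16 for BC3 and BC7), so the bit rate and the output size follow directly. The 1024×1024 texture below is only an illustrative size.

```latex
\begin{aligned}
\text{bits per pixel} &= \frac{8 \cdot \text{bytes per block}}{4 \times 4} \\
\text{BC1: } \frac{8 \cdot 8}{16} &= 4, \qquad \text{BC3, BC7: } \frac{8 \cdot 16}{16} = 8 \\
\text{BC3, } 1024 \times 1024\text{: } \left(\tfrac{1024}{4}\right)^2 \cdot 16\ \text{B} &= 65536 \cdot 16\ \text{B} = 1\ \text{MiB} \\
\text{RGBA8, } 1024 \times 1024\text{: } 1024^2 \cdot 4\ \text{B} &= 4\ \text{MiB}
\end{aligned}
```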

Next, we specify how to write the compressed data using nvtt::OutputOptions. The simplest case is to assign a filename directly:

nvtt::OutputOptions outputOptions;
outputOptions.setFileName(outputFileName);

For finer control of the output stream, derive a subclass of nvtt::OutputHandler, then use nvtt::OutputOptions::setOutputHandler() to redirect the output:

MyOutputHandler outputHandler;
outputOptions.setOutputHandler(&outputHandler);
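A minimal sketch of an in-memory handler follows. So that it compiles without NVTT, it derives from a hypothetical stand-in base class, OutputHandlerStandIn, whose three callbacks (beginImage, writeData, endImage) are assumed to mirror those of nvtt::OutputHandler; check nvtt/nvtt.h for the exact virtual signatures before deriving from the real class. MemoryOutputHandler is likewise a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for nvtt::OutputHandler so this sketch compiles on its own.
// The callback names are assumed to match NVTT's; verify them in nvtt/nvtt.h.
struct OutputHandlerStandIn {
    virtual ~OutputHandlerStandIn() = default;
    virtual void beginImage(int size, int width, int height, int depth, int face, int miplevel) = 0;
    virtual bool writeData(const void* data, int size) = 0;
    virtual void endImage() = 0;
};

// Hypothetical handler that collects all output bytes in memory instead of a file.
class MemoryOutputHandler : public OutputHandlerStandIn {
public:
    void beginImage(int size, int /*width*/, int /*height*/, int /*depth*/,
                    int /*face*/, int /*miplevel*/) override {
        ++imageCount;
        // Reserve room for the announced payload size of this image.
        buffer.reserve(buffer.size() + static_cast<std::size_t>(size));
    }

    bool writeData(const void* data, int size) override {
        const auto* bytes = static_cast<const std::uint8_t*>(data);
        buffer.insert(buffer.end(), bytes, bytes + size);
        return true; // report success; a real handler might return false on I/O errors
    }

    void endImage() override {}

    std::vector<std::uint8_t> buffer; // everything passed to writeData(), in call order
    int imageCount = 0;               // number of beginImage() calls seen so far
};
```

Once it derives from the real nvtt::OutputHandler, an instance would be passed to outputOptions.setOutputHandler() exactly like MyOutputHandler in the snippet above.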

When the above setup is complete, we compress the image using the nvtt::Context:

context.outputHeader(image, 1, compressionOptions, outputOptions); // output DDS header
context.compress(image, 0, 0, compressionOptions, outputOptions); // output compressed image
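The example writes only the base level, so it passes 1 as mipmapCount to outputHeader(). When writing a full mip chain instead, the header's level count would presumably need to match the number of levels passed to compress() (one call per mip). The hypothetical helper below computes that count for a full chain, assuming the usual convention that each level halves every dimension (rounding down, clamped at 1) until 1×1×1.

```cpp
#include <algorithm>

// Number of levels in a full mip chain for a w x h x d image,
// assuming each level halves every dimension (rounded down, clamped to 1).
int fullMipCount(int w, int h, int d = 1) {
    int largest = std::max({w, h, d});
    int count = 1;           // the base level
    while (largest > 1) {    // keep halving until the largest side reaches 1
        largest /= 2;
        ++count;
    }
    return count;
}
```

For example, a 256×256 image has 9 levels (256, 128, ..., 1), and a 300×100 image also has 9, because only the largest side determines the chain length.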

Loading SurfaceSets from DDS files

DDS files can contain multiple cube map faces and mipmap levels. To load all of these surfaces, use nvtt::SurfaceSet as the loader:

nvtt::SurfaceSet images;
images.loadDDS(ddsFileName);

Then you can use nvtt::SurfaceSet::GetSurface() to extract each individual nvtt::Surface:

nvtt::Surface image = images.GetSurface(face, mip);

or:

images.GetSurface(face, mip, image);
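A DDS cube map stores 6 faces, each with the same number of mip levels, so code that processes a SurfaceSet typically walks every (face, mip) pair and calls GetSurface(face, mip) on each. The self-contained sketch below only enumerates those pairs together with each level's size; the MipLevel struct and the mipSize() and listLevels() helpers are hypothetical, and the size rule (halve per level, clamp to 1) is the usual mip convention, assumed here rather than taken from NVTT.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical description of one surface in a face/mip set.
struct MipLevel {
    int face;
    int mip;
    int width;
    int height;
};

// Size of mip level `mip` along one axis: halve per level, never below 1.
int mipSize(int baseSize, int mip) {
    return std::max(1, baseSize >> mip);
}

// Lists every (face, mip) pair in the same order as a nested face/mip loop
// that would call images.GetSurface(face, mip) on each entry.
std::vector<MipLevel> listLevels(int faceCount, int mipCount, int baseWidth, int baseHeight) {
    std::vector<MipLevel> levels;
    for (int face = 0; face < faceCount; ++face) {
        for (int mip = 0; mip < mipCount; ++mip) {
            levels.push_back({face, mip, mipSize(baseWidth, mip), mipSize(baseHeight, mip)});
        }
    }
    return levels;
}
```

For a 64×64 cube map with 7 levels, listLevels(6, 7, 64, 64) yields 42 entries, and mip 3 of each face is 8×8.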

Compressing multiple files faster

For batch processing of multiple files, we use nvtt::BatchList.

nvtt::BatchList batchList;

Instead of compressing each image one by one, we first append all images to the batchList container. Each entry has its own nvtt::OutputOptions (for example, to write to separate files):

batchList.Append(&image, 0 /*face*/, 0 /*mip*/, &outputOptions1);
batchList.Append(&image2, 0 /*face*/, 0 /*mip*/, &outputOptions2);
// ...

Then we issue a single nvtt::Context::compress command to compress all the inputs:

context.compress(batchList, compressionOptions);

When we do it this way, NVTT 3 will restructure the input images and exploit parallelism to the maximum extent when GPU compression is used, without needing to synchronize with the CPU between images. When there are a large number of small textures to compress with the same nvtt::CompressionOptions, batch processing can dramatically increase the performance.
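Because BatchList::Append() takes pointers to the surface and to the output options, every nvtt::Surface and nvtt::OutputOptions that is appended must stay alive, at a stable address, until the batch has been compressed. The sketch below models only that ownership rule with hypothetical stand-in types (FakeSurface, FakeOutput, FakeBatchList) so that it compiles without NVTT; it does not compress anything.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for nvtt::Surface and nvtt::OutputOptions.
struct FakeSurface { int width; int height; };
struct FakeOutput { std::string fileName; };

// One batch entry: only pointers are stored, as with BatchList::Append().
struct BatchEntry {
    const FakeSurface* img;
    int face;
    int mip;
    const FakeOutput* out;
};

// Hypothetical batch list with the same Append() shape as nvtt::BatchList.
class FakeBatchList {
public:
    void Append(const FakeSurface* img, int face, int mip, const FakeOutput* out) {
        entries_.push_back({img, face, mip, out});
    }
    std::size_t Size() const { return entries_.size(); }
    const BatchEntry& operator[](std::size_t i) const { return entries_[i]; }

private:
    std::vector<BatchEntry> entries_;
};

// Builds a batch over already-populated containers. The containers must not be
// resized afterwards, or the stored pointers would dangle.
FakeBatchList makeBatch(const std::vector<FakeSurface>& surfaces,
                        const std::vector<FakeOutput>& outputs) {
    FakeBatchList batch;
    for (std::size_t i = 0; i < surfaces.size() && i < outputs.size(); ++i) {
        batch.Append(&surfaces[i], 0 /*face*/, 0 /*mip*/, &outputs[i]);
    }
    return batch;
}
```

The pitfall this illustrates is taking addresses of elements in a std::vector that is still growing: a later push_back can reallocate the vector and leave earlier pointers dangling, so fill the containers completely before appending.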

Decompression

The decompression routine is relatively straightforward. The function nvtt::SurfaceSet::loadDDS() can be used to decode a DDS file. After the file is loaded, you can use nvtt::SurfaceSet::saveImage() to output the result:

nvtt::SurfaceSet images;
images.loadDDS(ddsFileName);
images.saveImage(outFileName, face, mip);

If you only have compressed data, but know the size and format, you can use nvtt::Surface::setImage2D() or nvtt::Surface::setImage3D():

image.setImage2D(nvtt::Format_BC7, width, height, data);

Note that decompression is currently not GPU accelerated.

Image processing with nvtt::Surface

Some useful image processing routines are provided in nvtt::Surface. For example, a downsampling process looks like this:

nvtt::Surface image;
image.load(inputFileName);
image.ToGPU(); // run the following operations on the GPU
image.toLinearFromSrgb(); // resize in linear space, not in sRGB
image.resize(newWidth, newHeight, newDepth, nvtt::ResizeFilter_Box); // newDepth is 1 for 2D images
image.toSrgb(); // convert back to sRGB
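To see why the sample converts to linear before resizing, here is a small single-channel sketch of a 2×2 box downsample performed in linear space with the standard sRGB transfer functions. It illustrates the technique and is not NVTT's implementation; boxDownsampleSrgb() is a hypothetical helper that assumes even dimensions and channel values in [0, 1].

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Standard sRGB transfer functions for one channel value in [0, 1].
float srgbToLinear(float c) {
    return c <= 0.04045f ? c / 12.92f : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

float linearToSrgb(float l) {
    return l <= 0.0031308f ? l * 12.92f : 1.055f * std::pow(l, 1.0f / 2.4f) - 0.055f;
}

// 2x box downsample of a single-channel, row-major sRGB image (w and h even):
// decode to linear, average each 2x2 block, then re-encode to sRGB.
std::vector<float> boxDownsampleSrgb(const std::vector<float>& src, int w, int h) {
    const int dw = w / 2;
    const int dh = h / 2;
    std::vector<float> dst(static_cast<std::size_t>(dw) * dh);
    for (int y = 0; y < dh; ++y) {
        for (int x = 0; x < dw; ++x) {
            float sum = 0.0f;
            for (int dy = 0; dy < 2; ++dy)
                for (int dx = 0; dx < 2; ++dx)
                    sum += srgbToLinear(src[(2 * y + dy) * w + (2 * x + dx)]);
            dst[y * dw + x] = linearToSrgb(sum / 4.0f);
        }
    }
    return dst;
}
```

Downsampling a 2×2 black-and-white checkerboard this way gives about 0.735 in sRGB, whereas averaging the sRGB values directly would give 0.5, a visibly darker result.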

Using the low-level APIs

The low-level APIs are for compression only.

To use the low-level APIs, there is no need to create an nvtt::Context first, and images do not need to come from nvtt::Surface.

The low-level APIs center on two buffer structs for storing input image data: nvtt::CPUInputBuffer stores the data in host memory, and nvtt::GPUInputBuffer stores the data in device memory.

For each supported texture format, there are two functions that compress into that format, one per buffer type. For the function taking an nvtt::CPUInputBuffer, a useGpu parameter chooses whether to compress with the GPU routine; for the function taking an nvtt::GPUInputBuffer, compression is always done on the GPU. In both cases, the to_device_mem parameter chooses whether the output goes to host memory or device memory. With the nvtt::EncodeSettings interface used in the example below, these options correspond to SetUseGPU() and SetOutputToGPUMem().

A buffer can be created from one or more images in host memory or device memory. The user must reference each of the input images using the nvtt::RefImage structure.

Here's an example.

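First, we use an external library such as stb_image to load an image file:

nvtt::RefImage img_in;
int chn; // channel count of the file; stbi_load converts to the 4 channels requested
void *p_img = stbi_load("my_texture.png", &img_in.width, &img_in.height, &chn, 4);
if (!p_img) { /* handle the load failure */ }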

Second, we use nvtt::RefImage to reference this image:

img_in.data = p_img;
img_in.num_channels = 4; // we requested 4 channels from stbi_load
if (chn == 3)
    img_in.channel_swizzle[3] = nvtt::One; // force alpha to 1 for opaque (RGB) images

Third, we create an nvtt::CPUInputBuffer from the single nvtt::RefImage:

nvtt::CPUInputBuffer input_buf(&img_in, nvtt::UINT8);
stbi_image_free(p_img); // the data has been copied and reordered, so the original can be freed

We then prepare a buffer for receiving compressed BC1 data:

void* outbuf = malloc(input_buf.NumTiles() * 8); // BC1 uses 8 bytes/tile
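The 8 bytes per tile in the allocation above come from BC1 encoding each 4×4 tile in 64 bits. Assuming NumTiles() counts the 4×4 tiles covering a single 2D image, with partial tiles at the right and bottom edges counted as whole tiles, the expected sizes can be computed as below. numTiles4x4() and compressedSize() are hypothetical helpers, not NVTT functions.

```cpp
#include <cstddef>

// Number of 4x4 tiles covering a w x h image; partial edge tiles count as whole tiles.
std::size_t numTiles4x4(int w, int h) {
    return static_cast<std::size_t>((w + 3) / 4) * static_cast<std::size_t>((h + 3) / 4);
}

constexpr std::size_t kBC1BytesPerTile = 8;   // BC1: 64 bits per 4x4 tile
constexpr std::size_t kBC7BytesPerTile = 16;  // BC3 and BC7: 128 bits per 4x4 tile

// Size in bytes of the raw compressed blocks for one 2D image.
std::size_t compressedSize(int w, int h, std::size_t bytesPerTile) {
    return numTiles4x4(w, h) * bytesPerTile;
}
```

For example, a 5×3 image still needs 2 tiles (two columns, one row), so its BC1 output is 16 bytes.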

Finally, we describe the encoding with nvtt::EncodeSettings and call nvtt_encode() to compress to BC1:

nvtt::EncodeSettings settings = nvtt::EncodeSettings()
    .SetFormat(nvtt::Format_BC1)
    .SetQuality(nvtt::Quality_Normal)
    .SetUseGPU(true)              // compress on the GPU
    .SetOutputToGPUMem(false);    // outbuf is in host memory
bool ret = nvtt_encode(input_buf, outbuf, settings);

The raw compressed blocks will then be stored in outbuf. The low-level APIs do not generate file headers (such as DDS file headers).

Building with NVTT 3

To add the dynamic build of NVTT 3 to a C++ application, link with nvtt.lib in the lib/ folder, include nvtt/nvtt.h, and copy nvtt.dll to the application output directory.

Example CMake applications using this process can be found at the online samples repository; most of the work in the CMake file there is in locating the system's NVTT distribution.

The C++ API should be compatible with any MSVC 14x toolset. For other toolsets and other programming languages, please use the C wrapper in nvtt/nvtt_wrapper.h.

The dynamic build can also be delay-loaded.

Considerations for CUDA compatibility

The following notes are important for apps using CUDA elsewhere.

NVTT 3 uses the CUDA Runtime API, and certain functions such as nvtt::Context::enableCudaAcceleration() and nvtt::isCudaSupported() can choose a device and call cudaSetDevice() unless nvtt::useCurrentDevice() has been called first. When using NVTT 3 with other CUDA functionality, we recommend doing two things:

  1. Call nvtt::useCurrentDevice() and cudaSetDevice() before any other NVTT 3 functions. Calling useCurrentDevice() will prevent NVTT from choosing and changing the device. It only needs to be called once during the lifetime of the application.
  2. When passing device pointers to NVTT, make sure the pointer refers to memory that NVTT's device (and runtime API context, if using the CUDA Driver API) can access. Similarly, when accessing data through device pointers returned by NVTT, make sure the current device (and context, if using the CUDA Driver API) can access allocations made by NVTT's device through the CUDA Runtime API. Device pointers are returned from nvtt::Surface::gpuData() and are used by the low-level GPU compression functions in nvtt_lowlevel.h.