torchjpeg.dct

DCT

The torchjpeg.dct package provides utilities for performing forward and inverse discrete cosine transforms on images. The DCT routines are implemented in PyTorch, so they can be GPU accelerated and differentiated. While the routines here are restricted to two-dimensional signals, the block size is configurable, i.e. the DCT does not need to be performed only on the \(8 \times 8\) block size used by JPEG. This package also includes utilities for splitting images into non-overlapping blocks, performing fast color transforms on Tensors, and normalizing DCT coefficients as preparation for input to a CNN.

torchjpeg.dct.blockify(im: torch.Tensor, size: int) → torch.Tensor[source]

Breaks an image into non-overlapping blocks of equal size.

Parameters:
  • im (Tensor) – The image to break into blocks, must be in \((N, C, H, W)\) format.

  • size (int) – The side length of the square blocks, as given in the signature; each block is \((size, size)\) in \((H, W)\) format.

Returns:

A tensor containing the non-overlapping blocks in \((N, C, L, H, W)\) format, where \(L\) is the number of non-overlapping blocks in the image channel indexed by \((N, C)\) and \((H, W)\) matches the block size.

Note

If the image does not split evenly into blocks of the given size, the result will have some overlap. It is the caller's responsibility to pad the input to a multiple of the block size; no error will be thrown in this case.
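The splitting step can be illustrated with a minimal plain-Python sketch on a single channel stored as nested lists. This is not the library's implementation: the helper name `split_into_blocks` is hypothetical, the row-major ordering of the blocks along \(L\) is an assumption, and unlike the behaviour described in the note this sketch rejects unpadded inputs rather than returning overlapping blocks.

```python
def split_into_blocks(image, size):
    """Split a 2D list (H x W) into non-overlapping size x size blocks."""
    height, width = len(image), len(image[0])
    if height % size or width % size:
        # Assumption of this sketch only: refuse inputs that were not
        # padded to a multiple of the block size.
        raise ValueError("pad the image to a multiple of the block size first")
    blocks = []
    # Assumed ordering: blocks are listed row by row, left to right.
    for top in range(0, height, size):
        for left in range(0, width, size):
            blocks.append([row[left:left + size] for row in image[top:top + size]])
    return blocks


# A 4 x 6 single-channel image whose pixel values equal their flat index.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
blocks = split_into_blocks(image, 2)
print(len(blocks))  # L = (4 / 2) * (6 / 2) = 6
print(blocks[0])    # [[0, 1], [6, 7]]
```

Here \(L = (H / size) \cdot (W / size)\), which is why the input has to be padded to a multiple of the block size beforehand.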

torchjpeg.dct.deblockify()[source]

Reconstructs an image given non-overlapping blocks of equal size.

Parameters:
  • blocks (Tensor) – The non-overlapping blocks in \((N, C, L, H, W)\) format.

  • size (Tuple[int, int]) – The dimensions of the original image (i.e. the desired output) in \((H, W)\) format.

Returns:

The image in \((N, C, H, W)\) format.

Note

If the blocks have some overlap, or if the output size cannot be constructed from the given number of non-overlapping blocks, this function will raise an exception, unlike blockify().
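As a rough illustration of the reconstruction and of the size check mentioned in the note, the following plain-Python sketch reassembles one channel from its blocks. The helper name `merge_blocks` is hypothetical, the blocks are assumed to be ordered row by row, and the exception type is an assumption of this sketch, not a statement about what the library raises.

```python
def merge_blocks(blocks, height, width):
    """Reassemble an H x W image (nested lists) from square blocks."""
    size = len(blocks[0])
    per_row = width // size
    expected = (height // size) * per_row
    if height % size or width % size or len(blocks) != expected:
        # The requested output size cannot be tiled by these blocks.
        raise ValueError("block count does not match the requested output size")
    image = [[0] * width for _ in range(height)]
    for index, block in enumerate(blocks):
        # Assumed ordering: blocks are listed row by row, left to right.
        top = (index // per_row) * size
        left = (index % per_row) * size
        for r in range(size):
            image[top + r][left:left + size] = block[r]
    return image


# Four 2 x 2 blocks that tile a 4 x 4 image.
blocks = [
    [[0, 1], [4, 5]],
    [[2, 3], [6, 7]],
    [[8, 9], [12, 13]],
    [[10, 11], [14, 15]],
]
image = merge_blocks(blocks, 4, 4)
print(image[0])  # [0, 1, 2, 3]
```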

torchjpeg.dct.block_dct(blocks: torch.Tensor) → torch.Tensor[source]

Computes the DCT of image blocks

Parameters:

blocks (Tensor) – Non-overlapping blocks to perform the DCT on in \((N, C, L, H, W)\) format.

Returns:

The DCT coefficients of each block in the same shape as the input.

Return type:

Tensor

Note

The function computes the forward DCT on each block given by

\[D_{i,j}={\frac {1}{\sqrt{2N}}}\alpha (i)\alpha (j)\sum _{x=0}^{N-1}\sum _{y=0}^{N-1}I_{x,y}\cos \left[{\frac {(2x+1)i\pi }{2N}}\right]\cos \left[{\frac {(2y+1)j\pi }{2N}}\right]\]

Where \(i,j\) are the spatial frequency indices, \(N\) is the block size and \(I\) is the image with pixel positions \(x, y\).

\(\alpha\) is a scale factor that ensures the transform is orthonormal, given by

\[\alpha(u) = \begin{cases}{ \frac{1}{\sqrt{2}}} &{\text{if }}u=0 \\ 1 &{\text{otherwise}} \end{cases}\]

There is technically no restriction on the range of pixel values, but to match JPEG it is recommended to use the range [-128, 127].
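The forward transform can be checked numerically with a direct, unoptimized evaluation of the double sum. The sketch below is plain Python, not the library's matrix-based code. It sums over pixel positions \(0\) to \(N-1\) and is exercised only for \(N = 8\), where the prefactor \(1/\sqrt{2N}\) equals \(1/4\). The function names `alpha` and `dct_block` are hypothetical.

```python
import math


def alpha(u):
    """Scale factor: 1/sqrt(2) for the zero frequency, 1 otherwise."""
    return 1 / math.sqrt(2) if u == 0 else 1.0


def dct_block(block):
    """Evaluate the forward DCT formula directly on one N x N block."""
    n = len(block)
    scale = 1 / math.sqrt(2 * n)
    coeff = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * i * math.pi / (2 * n))
                        * math.cos((2 * y + 1) * j * math.pi / (2 * n))
                    )
            coeff[i][j] = scale * alpha(i) * alpha(j) * total
    return coeff


# A constant block has all of its energy in the (0, 0) coefficient.
flat = [[1.0] * 8 for _ in range(8)]
coeff = dct_block(flat)
print(round(coeff[0][0], 6))  # 8.0
```

For the constant block, \(D_{0,0} = \frac{1}{4} \cdot \frac{1}{2} \cdot 64 = 8\), and every other coefficient vanishes up to floating point error.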

torchjpeg.dct.block_idct(coeff: torch.Tensor) → torch.Tensor[source]

Computes the inverse DCT of non-overlapping blocks

Parameters:

coeff (Tensor) – The blockwise DCT coefficients in the format \((N, C, L, H, W)\)

Returns:

The pixels for each block in the same format as the input.

Return type:

Tensor

Note

This function computes the inverse DCT given by

\[I_{x,y}={\frac {1}{\sqrt{2N}}}\sum _{i=0}^{N-1}\sum _{j=0}^{N-1}\alpha (i)\alpha (j)D_{i,j}\cos \left[{\frac {(2x+1)i\pi }{2N}}\right]\cos \left[{\frac {(2y+1)j\pi }{2N}}\right] \]

See block_dct() for further details.
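A quick way to see that the two formulas invert each other is a round trip on one block. The following plain-Python sketch evaluates both sums directly for \(N = 8\), with indices running from \(0\) to \(N-1\). The names `basis`, `forward`, and `inverse` are hypothetical and the test block is arbitrary.

```python
import math

N = 8
SCALE = 1 / math.sqrt(2 * N)


def basis(i, x):
    """alpha(i) * cos((2x + 1) * i * pi / (2N)) for frequency i, position x."""
    a = 1 / math.sqrt(2) if i == 0 else 1.0
    return a * math.cos((2 * x + 1) * i * math.pi / (2 * N))


def forward(block):
    """Forward DCT: sum over pixel positions (x, y)."""
    return [[SCALE * sum(block[x][y] * basis(i, x) * basis(j, y)
                         for x in range(N) for y in range(N))
             for j in range(N)] for i in range(N)]


def inverse(coeff):
    """Inverse DCT: sum over frequency indices (i, j)."""
    return [[SCALE * sum(coeff[i][j] * basis(i, x) * basis(j, y)
                         for i in range(N) for j in range(N))
             for y in range(N)] for x in range(N)]


# Arbitrary test block with values inside [-128, 127].
block = [[float((3 * x + 5 * y) % 17 - 8) for y in range(N)] for x in range(N)]
restored = inverse(forward(block))
error = max(abs(restored[x][y] - block[x][y]) for x in range(N) for y in range(N))
print(error < 1e-9)  # True
```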

torchjpeg.dct.batch_dct(batch: torch.Tensor) → torch.Tensor[source]

Computes the DCT of a batch of images. See block_dct() for more details. This function takes care of splitting the images into blocks for block_dct() and restoring the original shape of the input after the DCT.

Parameters:

batch (Tensor) – A batch of images of format \((N, C, H, W)\).

Returns:

A batch of DCT coefficients of the same format as the input.

Return type:

Tensor

Note

This function uses a block size of 8 to match the JPEG algorithm.

torchjpeg.dct.batch_idct(coeff: torch.Tensor) → torch.Tensor[source]

Computes the inverse DCT of a batch of coefficients. See block_idct() for more details. This function takes care of splitting the coefficients into blocks for block_idct() and restoring the original shape of the input after the inverse DCT.

Parameters:

coeff (Tensor) – A batch of coefficients of format \((N, C, H, W)\).

Returns:

A batch of images of the same format as the input.

Return type:

Tensor

Note

This function uses a block size of 8 to match the JPEG algorithm.

torchjpeg.dct.fdct(im: torch.Tensor) → torch.Tensor[source]

Convenience function for taking the DCT of a single image

Parameters:

im (Tensor) – A single image of format \((C, H, W)\)

Returns:

The DCT coefficients of the input in the same format.

Return type:

Tensor

Note

This function simply expands the input in the batch dimension, calls batch_dct(), and then removes the added batch dimension from the result.

torchjpeg.dct.idct(coeff: torch.Tensor) → torch.Tensor[source]

Convenience function for taking the inverse DCT of a single image

Parameters:

coeff (Tensor) – DCT coefficients of format \((C, H, W)\)

Returns:

The image pixels of the input in the same format.

Return type:

Tensor

Note

This function simply expands the input in the batch dimension, calls batch_idct(), and then removes the added batch dimension from the result.

torchjpeg.dct.to_ycbcr(x: torch.Tensor, data_range: float = 255) → torch.Tensor[source]

Converts a Tensor from RGB color space to YCbCr color space

Parameters:
  • x (Tensor) – The input Tensor holding an RGB image in \((\ldots, C, H ,W)\) format (where \(\ldots\) indicates an arbitrary number of dimensions).

  • data_range (float) – The range of the input/output data: 255 indicates pixels in [0, 255] and 1.0 indicates pixels in [0, 1]. Only 1.0 and 255 are supported.

Returns:

The YCbCr result of the same shape as the input and with the same data range.

Return type:

Tensor

Note

This function implements the “full range” conversion used by JPEG, i.e. it does not implement the ITU-R BT.601 standard, which many libraries (excluding PIL) use as the default definition of YCbCr. This conversion (for [0, 255]) is given by:

\[\begin{aligned} Y &= 0 + 0.299 \cdot R + 0.587 \cdot G + 0.114 \cdot B \\ C_{B} &= 128 - 0.168736 \cdot R - 0.331264 \cdot G + 0.5 \cdot B \\ C_{R} &= 128 + 0.5 \cdot R - 0.418688 \cdot G - 0.081312 \cdot B \end{aligned}\]

torchjpeg.dct.to_rgb(x: torch.Tensor, data_range: float = 255) → torch.Tensor[source]

Converts a Tensor from YCbCr color space to RGB color space

Parameters:
  • x (Tensor) – The input Tensor holding a YCbCr image in \((\ldots, C, H ,W)\) format (where \(\ldots\) indicates an arbitrary number of dimensions).

  • data_range (float) – The range of the input/output data: 255 indicates pixels in [0, 255] and 1.0 indicates pixels in [0, 1]. Only 1.0 and 255 are supported.

Returns:

The RGB result of the same shape as the input and with the same data range.

Return type:

Tensor

Note

This function expects the input to follow the “full range” conversion used by JPEG, i.e. it does not implement the ITU-R BT.601 standard, which many libraries (excluding PIL) use as the default definition of YCbCr. If the input came from this library or from PIL, it should be fine. The conversion (for [0, 255]) is given by:

\[\begin{aligned} R &= Y + 1.402 \cdot (C_{R}-128) \\ G &= Y - 0.344136 \cdot (C_{B}-128) - 0.714136 \cdot (C_{R}-128) \\ B &= Y + 1.772 \cdot (C_{B}-128) \end{aligned}\]
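To make the two full range conversions concrete, here is a minimal per-pixel sketch in plain Python for the [0, 255] data range. It only transcribes the coefficients stated for to_ycbcr() and to_rgb(). The function names `rgb_to_ycbcr` and `ycbcr_to_rgb` are hypothetical, and the library itself operates on whole Tensors rather than single pixels.

```python
def rgb_to_ycbcr(r, g, b):
    """Full range RGB -> YCbCr for values in [0, 255]."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Full range YCbCr -> RGB for values in [0, 255]."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b


# White maps to maximum luma and neutral chroma (Cb = Cr = 128).
white = rgb_to_ycbcr(255.0, 255.0, 255.0)

# A round trip recovers the pixel up to the rounding of the coefficients.
pixel = (200.0, 100.0, 50.0)
restored = ycbcr_to_rgb(*rgb_to_ycbcr(*pixel))
print(all(abs(a - b) < 1e-2 for a, b in zip(pixel, restored)))  # True
```

The round trip is not bit exact because the published coefficients are rounded to six decimal places.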
torchjpeg.dct.normalize()[source]

Normalizes DCT coefficients using pre-computed stats

This function wraps the DCTStats class to allow easy normalization of multichannel images

Parameters:
  • dct (Tensor) – The DCT coefficients to normalize in the format \((N, C, H, W)\)

  • stats (DCTStats) – Precomputed DCT statistics

  • channel (Optional[str]) – If the input coefficients are single channel, the Y channel is assumed, use this parameter to override that.

Returns:

Normalized DCT coefficients in the format \((N, C, H, W)\)

torchjpeg.dct.denormalize()[source]

Denormalizes DCT coefficients using pre-computed stats

This function wraps the DCTStats class to allow easy denormalization of multichannel images

Parameters:
  • dct (Tensor) – The DCT normalized coefficients in the format \((N, C, H, W)\)

  • stats (DCTStats) – Precomputed DCT statistics

  • channel (Optional[str]) – If the input coefficients are single channel, the Y channel is assumed, use this parameter to override that.

Returns:

Denormalized DCT coefficients in the format \((N, C, H, W)\)

torchjpeg.dct.batch_to_images()[source]

Converts a batch of DCT coefficients to a batch of images.

This high level convenience function wraps several operations. If stats are given, the coefficients are assumed to have been channel-wise and frequency-wise normalized and are denormalized. The coefficients are transformed to pixels and uncentered (converted from [-128, 127] to [0, 255]). If the input is multichannel, it is converted from YCbCr to RGB. The image is then optionally cropped to remove padding that may have been added to make the coefficients. The output is then rescaled to [0, 1] to match PyTorch conventions.

Parameters:
  • dct (Tensor) – A batch of DCT coefficients in \((N, C, H, W)\) format.

  • stats (Optional[DCTStats]) – Optional DCT per-channel and per-frequency statistics to denormalize the coefficients.

  • crop (Optional[Tensor]) – Optional cropping dimensions. If this tensor has more than a single dimension, only the last dimension is used.

  • channel (Optional[str]) – One of ‘Y’, ‘Cb’, ‘Cr’. Denormalization of a single channel input assumes ‘Y’ channel by default, use this parameter to override that.

Returns:

The batch of images computed from the given coefficients.

Return type:

Tensor

torchjpeg.dct.images_to_batch()[source]

Converts a batch of images to a batch of DCT coefficients.

This high level convenience function wraps several operations. The input images are assumed to follow the PyTorch convention of being in [0, 1] and are rescaled to [0, 255]. If the images are multichannel, they are converted to YCbCr and then centered (in [-128, 127]). The DCT is taken and, if stats are given, the coefficients are normalized.

Parameters:
  • spatial (Tensor) – A batch of images in \((N, C, H, W)\) format.

  • stats (Optional[DCTStats]) – Optional DCT per-channel and per-frequency statistics to normalize the coefficients.

  • channel (Optional[str]) – One of ‘Y’, ‘Cb’, ‘Cr’. Normalization of a single channel input assumes ‘Y’ channel by default, use this parameter to override that.

Returns:

A batch of DCT coefficients computed from the input images.

torchjpeg.dct.double_nn_dct(input_dct: Tensor, op: Tensor = block_doubler) → Tensor[source]

DCT domain nearest neighbor doubling

The function computes a 2x nearest neighbor upsampling on DCT coefficients without converting them to pixels. It is equivalent to the following procedure: IDCT -> 2x upsampling -> DCT

Parameters:
  • input_dct (Tensor) – The input DCT coefficients in the format \((N, C, H, W)\)

  • op (Tensor) – The doubling operation tensor, mostly used to satisfy torchscript. Should be of shape \(8 \times 8 \times 16 \times 16\). Leave as default unless you know what you’re doing.

Returns:

The coefficients of the resized image, double the height and width of the input.

Return type:

Tensor

torchjpeg.dct.half_nn_dct(input_dct: Tensor, op: Tensor = block_halver) → Tensor[source]

DCT domain nearest neighbor half-sizing

The function computes a 2x nearest neighbor downsampling on DCT coefficients without converting them to pixels. It is equivalent to the following procedure: IDCT -> 2x downsampling -> DCT

Parameters:
  • input_dct (Tensor) – The input DCT coefficients in the format \((N, C, H, W)\)

  • op (Tensor) – The halving operation tensor, mostly used to satisfy torchscript. Should be of shape \(16 \times 16 \times 8 \times 8\). Leave as default unless you know what you’re doing.

Returns:

The coefficients of the resized image, half the height and width of the input.

Return type:

Tensor

class torchjpeg.dct.Stats(root: Union[str, pathlib.Path], normtype: str = 'ms')[source]

This class holds pre-computed per-channel and per-frequency DCT coefficient stats.

The stats are loaded from a file, which can be written using torch.save(). The file should contain a single dictionary with string keys containing channel names. The value of each entry should be a dictionary with the keys “mean”, “variance”, “min”, and “max”, with the corresponding statistics as Tensors.

Pre-computed stats are available for color or grayscale images (pass “color” or “grayscale” respectively for the root argument). These stats were computed from the Flickr 2k dataset, a large corpus of high quality images, and are suitable for general use.

Parameters:
  • root (pathlib.Path, string, or literals “color”, “grayscale”) – The path to load the statistics from or “color” to use built in color stats or “grayscale” to use built in grayscale stats.

  • normtype (str) – Either “ms” for mean-variance normalization or “01” for zero-one normalization.

denormalize(blocks: torch.Tensor, normtype: str = 'y') → torch.Tensor[source]

Denormalizes blocks of coefficients.

Parameters:
  • blocks (Tensor) – a Tensor containing blocks of normalized DCT coefficients in the format \((N, C, L, H, W)\).

  • normtype (str) – Which channel to denormalize, “y” by default.

Returns:

The denormalized coefficients.

Return type:

Tensor

normalize(blocks: torch.Tensor, normtype: str = 'y') → torch.Tensor[source]

Normalizes blocks of coefficients.

Parameters:
  • blocks (Tensor) – a Tensor containing blocks of DCT coefficients in the format \((N, C, L, H, W)\).

  • normtype (str) – Which channel to normalize, “y” by default.

Returns:

The normalized coefficients.

Return type:

Tensor
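The two normalization modes named by the constructor's normtype argument can be sketched for a single coefficient as follows. This is an illustration under assumptions, not the class's actual arithmetic: it assumes that “ms” subtracts the mean and divides by the standard deviation (the square root of the stored variance) and that “01” maps the stored [min, max] interval onto [0, 1]. All function names and the statistics below are made up for the example.

```python
import math


def normalize_ms(value, mean, variance):
    """Assumed "ms" mode: zero mean, unit variance."""
    return (value - mean) / math.sqrt(variance)


def denormalize_ms(value, mean, variance):
    return value * math.sqrt(variance) + mean


def normalize_01(value, minimum, maximum):
    """Assumed "01" mode: rescale [min, max] to [0, 1]."""
    return (value - minimum) / (maximum - minimum)


def denormalize_01(value, minimum, maximum):
    return value * (maximum - minimum) + minimum


# Hypothetical statistics for one frequency of one channel.
stats = {"mean": 2.0, "variance": 16.0, "min": -10.0, "max": 30.0}
coefficient = 10.0

ms = normalize_ms(coefficient, stats["mean"], stats["variance"])
zero_one = normalize_01(coefficient, stats["min"], stats["max"])
print(ms, zero_one)  # 2.0 0.5
```

In the class, such statistics are stored per channel and per frequency, so the same arithmetic would be applied elementwise across each block.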

torchjpeg.dct.pad_to_block_multiple(im: torch.Tensor, macroblock_size: int = 16) → torch.Tensor[source]

Pads an image so that it fits a whole number of blocks

Parameters:
  • im (Tensor) – the input image to pad, can be of shape \((N, C, H, W)\) or \((C, H, W)\).

  • macroblock_size (int) – The size of a macroblock, sometimes referred to as a minimum coded unit (MCU), default 16.

Note

As in the JPEG standard, the padding is applied to the right and bottom edges and is replicate padding. Note that the default macroblock is of size 16 (meaning a \(16 \times 16\) block). This ensures that if chroma subsampling is used on the color channels, after half-sizing they will still fit a whole number of \(8 \times 8\) blocks, as required by the JPEG DCT.
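The arithmetic behind this note can be sketched in plain Python on one channel stored as nested lists. The helper names `padded_size` and `replicate_pad` are hypothetical and the sketch is only meant to show the rounding and the right/bottom replicate padding, not the library's Tensor implementation.

```python
def padded_size(height, width, macroblock_size=16):
    """Round both dimensions up to the next multiple of the macroblock size."""
    pad_h = -height % macroblock_size
    pad_w = -width % macroblock_size
    return height + pad_h, width + pad_w


def replicate_pad(image, macroblock_size=16):
    """Pad on the right and bottom by repeating the edge pixels."""
    height, width = len(image), len(image[0])
    new_h, new_w = padded_size(height, width, macroblock_size)
    # Extend each row to the right with its last pixel.
    rows = [row + [row[-1]] * (new_w - width) for row in image]
    # Extend downwards by repeating the (already widened) last row.
    rows += [list(rows[-1]) for _ in range(new_h - height)]
    return rows


print(padded_size(100, 75))  # (112, 80)

# After 2x chroma subsampling, 112 x 80 becomes 56 x 40,
# and both dimensions are still multiples of 8.
small = replicate_pad([[1, 2, 3], [4, 5, 6]], macroblock_size=4)
print(small)  # [[1, 2, 3, 3], [4, 5, 6, 6], [4, 5, 6, 6], [4, 5, 6, 6]]
```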

torchjpeg.dct.zigzag(coefficients: torch.Tensor) → torch.Tensor[source]

Vectorizes DCT coefficients in JPEG zigzag order

Parameters:

coefficients (Tensor) – DCT coefficients of shape \((N, C, H, W)\) or \((C, H, W)\).

Returns:

A batch of vectorized coefficients of shape \((N, C, L, 64)\) or \((C, L, 64)\)

Return type:

Tensor

Note

For a visual representation of JPEG zigzag order see https://en.wikipedia.org/wiki/JPEG#/media/File:JPEG_ZigZag.svg.
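The standard JPEG zigzag scan of an \(8 \times 8\) block can also be generated programmatically. The plain-Python sketch below walks the anti-diagonals in alternating directions. The names `zigzag_order` and `zigzag_block` are hypothetical, and whether the library produces exactly this element order has not been verified here.

```python
def zigzag_order(n=8):
    """Flat (row-major) indices of an n x n block in zigzag scan order."""
    positions = [(row, col) for row in range(n) for col in range(n)]

    def key(pos):
        row, col = pos
        diagonal = row + col
        # Odd diagonals are walked top to bottom, even ones bottom to top.
        return (diagonal, row if diagonal % 2 else -row)

    return [row * n + col for row, col in sorted(positions, key=key)]


def zigzag_block(block):
    """Flatten one square block (nested lists) into a zigzag vector."""
    n = len(block)
    return [block[index // n][index % n] for index in zigzag_order(n)]


order = zigzag_order()
print(order[:10])  # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```

Low frequency coefficients end up at the start of the vector and the highest frequency coefficient, at flat index 63, comes last.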