torchjpeg.data

Dataset and Data Processing

class torchjpeg.data.ImageList[source]

Bases: object

Structure that holds a list of images (of possibly varying sizes) as a single tensor. This works by padding the images to the same size and storing the original size of each image in a field.

image_sizes

each tuple is (h, w)

Type:

list[tuple[int, int]]

Note

This class was taken from detectron2 (https://github.com/facebookresearch/detectron2/blob/master/detectron2/structures/image_list.py) with a small modification to the padding function which preserves gradients and some small fixes for linting and type checking. Otherwise the class and its documentation are unchanged.

property device

Implements the device API for ImageLists by returning the underlying device

Returns:

The device that the underlying tensor storage resides on

Return type:

torch.device

static from_tensors()[source]
Parameters:
  • tensors – a tuple or list of torch.Tensors, each of shape (Hi, Wi) or (C_1, …, C_K, Hi, Wi) where K >= 1. The Tensors will be padded to the same shape with pad_value.

  • size_divisibility (int) – If size_divisibility > 0, add padding to ensure the common height and width is divisible by size_divisibility. This depends on the model and many models need a divisibility of 32.

  • pad_value (float) – value to pad

Returns:

an ImageList.
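
As a rough illustration of how the common padded shape could be derived, the sketch below takes the largest height and width and rounds each up to a multiple of size_divisibility. The helper name padded_shape is hypothetical and operates on plain (h, w) tuples rather than tensors; it is a sketch of the idea, not the library's implementation.

```python
def padded_shape(image_sizes, size_divisibility=0):
    """Return the common (height, width) every image would be padded to.

    Hypothetical helper: image_sizes is a list of (h, w) tuples.
    """
    max_h = max(h for h, _ in image_sizes)
    max_w = max(w for _, w in image_sizes)
    if size_divisibility > 0:
        # Round each dimension up to the next multiple of size_divisibility.
        max_h = (max_h + size_divisibility - 1) // size_divisibility * size_divisibility
        max_w = (max_w + size_divisibility - 1) // size_divisibility * size_divisibility
    return max_h, max_w


sizes = [(30, 50), (40, 45)]
print(padded_shape(sizes))      # (40, 50)
print(padded_shape(sizes, 32))  # (64, 64)
```

The original sizes are kept alongside the padded result (as in image_sizes) so the padding can be undone later.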

to(*args: Any, **kwargs: Any) → torchjpeg.data.image_list.ImageList[source]

Implements the device API for ImageLists by copying the underlying storage to the target device

Parameters:

All arguments are forwarded to torch.Tensor.to() on the underlying tensor storage.

Returns:

The ImageList object on the new device

Return type:

ImageList

torchjpeg.data.crop_batch()[source]

Crops a batch of images to their original size, removing any padding

Parameters:
  • batch (Tensor) – A batch of shape \((N, C, H, W)\) of images which may have been padded either by JPEG or to make them the same size

  • sizes (Tensor) – A tensor of shape \((N, M)\) where the height and width of image i, respectively, are stored at position [i, -1] and [i, -2].

Returns:

A list of the cropped images, potentially all with different sizes.

Return type:

Sequence of Tensors
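
The cropping step can be pictured as keeping only the top-left region of each padded image. The sketch below uses nested Python lists in place of tensors, and the helper names crop_image and crop_all are hypothetical; it takes explicit (height, width) pairs and makes no claim about how the library indexes its sizes tensor.

```python
def crop_image(image, height, width):
    """Crop one image stored as a list of channels, each a list of rows."""
    # Keep the first `height` rows and the first `width` columns of every channel.
    return [[row[:width] for row in channel[:height]] for channel in image]


def crop_all(batch, sizes):
    """Crop every image in the batch to its own (height, width)."""
    return [crop_image(image, h, w) for image, (h, w) in zip(batch, sizes)]


# Two single-channel images padded to 2x3; zeros mark the padding.
batch = [
    [[[1, 2, 0], [3, 4, 0]]],  # real content is 2x2
    [[[5, 6, 7], [0, 0, 0]]],  # real content is 1x3
]
cropped = crop_all(batch, [(2, 2), (1, 3)])
```

Because each image may end up with a different size, the result is a list rather than a single stacked array.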

class torchjpeg.data.JPEGQuantizedDataset[source]

Bases: torch.utils.data.dataset.Dataset

Wraps an arbitrary image dataset to return JPEG quantized versions of the image. The amount of quantization is defined using IJG quality settings. If the underlying dataset returns a sequence, the first element of the sequence is taken to be the image which is quantized and the remaining elements are returned as the last element of the batch. If the underlying dataset returns a mapping, set image_key to the key of the image to be quantized. The original dictionary, including the image before quantization, will be returned as the last element of the batch.

Since the primary return values are all DCT coefficients, padding will be added to the images to make their dimensions a whole multiple of the MCU size. Following JPEG conventions, this is replicate padding added to the bottom and right edges. The original size of the image is returned so that the images can be correctly cropped after processing.
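
To make the padding rule concrete, the sketch below pads a single-channel image (a list of rows) on the right and bottom by repeating edge values until both dimensions are multiples of the MCU size. The helper name replicate_pad is hypothetical and uses plain lists; it is a sketch of replicate padding in general, not the dataset's internal code.

```python
def replicate_pad(rows, mcu):
    """Pad a 2D list to a multiple of `mcu` by repeating edge values.

    Returns the padded image and the original (height, width).
    """
    height, width = len(rows), len(rows[0])
    padded_h = (height + mcu - 1) // mcu * mcu
    padded_w = (width + mcu - 1) // mcu * mcu
    # Extend each row to the right by repeating its last value.
    padded = [row + [row[-1]] * (padded_w - width) for row in rows]
    # Extend downward by repeating the last (already widened) row.
    padded += [list(padded[-1]) for _ in range(padded_h - height)]
    return padded, (height, width)


padded, original_size = replicate_pad([[1, 2, 3], [4, 5, 6]], mcu=4)
# padded is 4x4; original_size is (2, 3)
```

Keeping original_size next to the padded data is what allows the image to be cropped back afterwards.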

The format returned by this dataset is:

Y Channel Coefficients, CbCr Coefficients, Y Quantization Matrix, CbCr Quantization Matrix, Pre-quantization YCbCr Coefficients, Original Image Size, Optional rest of the batch from the underlying dataset

If the image is grayscale, the CbCr coefficients and quantization matrix will be empty tensors. If the underlying dataset returns an image with no additional data, the final return value will be an empty tensor. This is to avoid issues with the default collate function. An empty tensor is one initialized with 0 size using torch.empty(). It is detectable as tensor.numel() == 0.

Parameters:
  • data (torch.utils.data.Dataset) – The dataset to wrap

  • quality (int, tuple of two or three ints) – The quality range (min 0, max 100) to draw from, inclusive on both ends. If this is a single integer, only that quality is used. If it is three integers, the last one defines a step size.

  • stats (torchjpeg.dct.Stats) – Statistics to use for per-frequency per-channel coefficient normalization

  • mcu (int) – The size of the minimum coded unit, use 16 for 4:2:0 chroma subsampling.

  • image_key (optional str) – The key to use to extract the image from a dataset which returns a mapping.

  • deterministic_quality (bool) – False by default, set to True to include the quality range in the dataset size. In other words, the length of this dataset will be len(quality_range) * len(dataset) and all the qualities in the range will be represented for every image by iterating this dataset.
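
One way to read the quality and deterministic_quality parameters is sketched below: the quality argument expands to an inclusive list of settings, and the deterministic length is that list's size times the number of images. The helper names expand_quality and deterministic_length are hypothetical, and the assumption that a two-element tuple implies a step of 1 is mine; the library's internals may differ.

```python
def expand_quality(quality):
    """Expand an int, (low, high), or (low, high, step) into a list of qualities."""
    if isinstance(quality, int):
        return [quality]
    if len(quality) == 2:
        low, high = quality
        step = 1  # assumed default step for a two-element range
    else:
        low, high, step = quality
    # range() excludes its stop value, so add 1 to make the upper end inclusive.
    return list(range(low, high + 1, step))


def deterministic_length(num_images, quality):
    """Dataset length when every quality is paired with every image."""
    return len(expand_quality(quality)) * num_images


print(expand_quality(50))                      # [50]
print(expand_quality((10, 30, 10)))            # [10, 20, 30]
print(deterministic_length(100, (10, 30, 10))) # 300
```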

Warning

The images returned from this dataset may be of differing sizes. Use the static torchjpeg.data.JPEGQuantizedDataset.collate() to collate them into a batch with padding. Use torchjpeg.data.crop_batch() to crop them back to the correct sizes (this will also remove JPEG padding).

static collate()[source]

Custom collate function which works for return values from this dataset. Adds padding to the images so that they can be stored in a single tensor

Parameters:

batch_list – Output from this dataset

Returns:

Batch with each input collated into single tensors

class torchjpeg.data.FolderOfJpegDataset[source]

Bases: torch.utils.data.dataset.Dataset

Loads coefficients from a folder of JPEGs without any labels. For each image, it returns the format of torchjpeg.codec.read_coefficients(). The images must be actual JPEG files (stored as JPEGs) for this to work. The relative path to the JPEG file will be returned along with the coefficients. The coefficients themselves are not guaranteed to be the same size. Use the collate function to collate these into a batched Tensor by adding padding.

Parameters:
  • path (Path) – The path to load images from

  • stats (Stats) – DCT stats to use to normalize the coefficients

  • extensions (List[str]) – The JPEG file extensions to search for

static collate()[source]

Custom collate function which works for return values from this dataset. Adds padding to the images so that they can be stored in a single tensor

Parameters:

batch_list – Output from this dataset

Returns:

Batch with each input collated into single tensors

class torchjpeg.data.UnlabeledImageFolder[source]

Bases: torch.utils.data.dataset.Dataset

Dataset loading a folder of unlabeled images recursively. The images are loaded using PIL and are otherwise unchanged. Add a transform to turn them into Tensors.

Parameters:
  • path (Path) – The path to load recursively from

  • extensions (List[str]) – The image extensions to look for

  • transform – Any transform to apply to the images after loading them
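
The recursive search described above might look roughly like the following. The helper name find_images is hypothetical, and the assumptions that extensions include the leading dot and are matched case-insensitively are mine rather than something the documentation states.

```python
from pathlib import Path


def find_images(root, extensions):
    """Return sorted paths (relative to root) of files with a matching extension."""
    root = Path(root)
    # Assumed convention: extensions include the dot, e.g. ".jpg".
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        p.relative_to(root)
        for p in root.rglob("*")  # rglob walks all subdirectories
        if p.is_file() and p.suffix.lower() in wanted
    )
```

A call such as find_images("photos", [".jpg", ".png"]) would then give the list of files a dataset could index into, with each file opened lazily when requested.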

Transforms

class torchjpeg.data.transforms.RandomJPEG[source]

Bases: object

Applies JPEG compression on a PIL image at a random quality.

Parameters:

quality_range (Tuple[int, int]) – The quality range to choose from, inclusive on both ends. An integer in this range will be chosen at random and will be used as the compression quality setting.

class torchjpeg.data.transforms.YCbCr[source]

Bases: object

Converts a PIL image to YCbCr color space

Note

PIL follows the JPEG YCbCr color conversion giving a result in [0, 255].

class torchjpeg.data.transforms.YChannel[source]

Bases: object

Converts a tensor with a color image in [0, 1] to the Y channel using ITU-R BT.601 conversion

Warning

This is not equivalent to the Y channel of a color image that would be used by JPEG. The result is in [16, 240] following the ITU-R BT.601 standard before normalization. This is useful for certain JPEG artifact correction algorithms due to some questionable evaluation choices by that community. The result is normalized to \(\left[\frac{16}{255},\frac{240}{255}\right]\) before being returned.
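
For orientation, a per-pixel version of a BT.601 studio-range luma conversion is sketched below. The coefficients are the commonly used BT.601 values for inputs in [0, 1]; that the transform uses exactly these constants is an assumption, and the function name bt601_luma is hypothetical.

```python
def bt601_luma(r, g, b):
    """Map RGB values in [0, 1] to studio-range luma, then normalize by 255."""
    # Commonly used BT.601 studio-swing coefficients (assumed, not taken from the library).
    y = 16.0 + 65.481 * r + 128.553 * g + 24.966 * b
    return y / 255.0


black = bt601_luma(0.0, 0.0, 0.0)  # 16 / 255
white = bt601_luma(1.0, 1.0, 1.0)  # approximately 235 / 255 with these coefficients
```

With these particular coefficients the luma tops out near 235 rather than 240, so the sketch illustrates the offset-and-scale form of the conversion and should not be read as reproducing the exact range quoted above.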