Implementing RLE in Python: A Step-by-Step Guide with Examples
Run-Length Encoding (RLE) is a simple lossless compression technique that replaces consecutive repeated values with a single value and a count. It’s especially effective on data with long runs of repeated elements (e.g., simple images, binary masks, or repetitive text). This guide shows a clear, step-by-step implementation in Python with examples for strings, lists, and binary image masks.
When to use RLE
- Effective: data with long repeated runs (e.g., black/white images, simple sensor streams).
- Not effective: high-entropy data (compressed files, encrypted streams, photos with lots of detail).
1. RLE for strings (basic)
Python implementation that compresses a string of characters into (char,count) pairs encoded as a compact string.
Code (compress and decompress):
python
def rle_compress_string(s: str) -> str: if not s: return ”” result = [] prev = s[0] count = 1 for ch in s[1:]: if ch == prev: count += 1 else: result.append(f”{prev}{count}“) prev = ch count = 1 result.append(f”{prev}{count}“) return ””.join(result) def rle_decompressstring(rle: str) -> str: if not rle: return ”” import re pairs = re.findall(r’(.)(\d+)’, rle) return ””.join(ch * int(cnt) for ch, cnt in pairs)
Example:
python
s = “AAAABBBCCDAA” c = rle_compress_string(s) # “A4B3C2D1A2” orig = rle_decompressstring(c) # “AAAABBBCCDAA”
Notes:
- This encoding assumes the character set excludes digits as separate tokens or uses a clear parsing scheme. For general use, separate value and count with a delimiter or use structured storage.
2. RLE for lists (generic values)
Handles arbitrary hashable values (ints, tuples). Output is list of (value, count) tuples.
Code:
python
from typing import List, Tuple, TypeVar T = TypeVar(’T’) def rle_compress_list(data: List[T]) -> List[Tuple[T, int]]: if not data: return [] out = [] prev = data[0] count = 1 for x in data[1:]: if x == prev: count += 1 else: out.append((prev, count)) prev = x count = 1 out.append((prev, count)) return out def rle_decompresslist(rle: List[Tuple[T, int]]) -> List[T]: out = [] for val, cnt in rle: out.extend([val] * cnt) return out
Example:
python
data = [1,1,1,2,2,3,3,3,3] c = rle_compress_list(data) # [(1,3),(2,2),(3,4)] orig = rle_decompresslist(c) # [1,1,1,2,2,3,3,3,3]
3. RLE for binary image masks (2D arrays)
Common for compressing binary masks (0/1). Flatten the array (row-major) and compress runs. Store width/height or use scanline boundaries if needed.
Code using numpy:
python
import numpy as np from typing import Tuple, List def rle_compress_mask(mask: np.ndarray) -> List[int]: # mask: 2D binary array (0/1) flat = mask.ravel(order=‘C’) # row-major if flat.size == 0: return [] runs = [] prev = int(flat[0]) count = 1 for b in flat[1:]: b = int(b) if b == prev: count += 1 else: runs.append(prev) runs.append(count) prev = b count = 1 runs.append(prev) runs.append(count) return runs def rle_decompressmask(runs: List[int], shape: Tuple[int, int]) -> np.ndarray: flat = [] it = iter(runs) for val, cnt in zip(it, it): flat.extend([val] * cnt) arr = np.array(flat, dtype=np.uint8) return arr.reshape(shape, order=‘C’)
Example:
python
mask = np.array([[1,1,0],[0,0,0],[1,1,1]]) runs = rle_compress_mask(mask) # 1,2,0,1,0,3,1,3 recon = rle_decompressmask(runs, mask.shape)
Optimization note:
- Often store only lengths of 1-runs or 0-runs depending on convention (e.g., COCO run-length uses start positions and lengths). Choose representation that suits your storage/interop needs.
4. Handling edge cases
- Large counts: use integers (no overflow in Python). If serializing to text, ensure counts are unambiguous (use delimiters or fixed-width).
- Mixed data types: serialize values before compression (e.g., JSON, bytes).
- Streaming data: emit runs as they complete; keep current run state between chunks.
5. Performance tips
- For large numeric arrays, use numpy vectorized methods to find run boundaries (diff != 0) for faster compression.
- For bytes, using Python’s bytes/bytearray and memoryviews is faster than lists of ints.
- When storing compressed data, consider binary formats (struct, protobuf) rather than string concatenation.
6. Summary example: compact byte-oriented RLE
Simple bytes compressor/decompressor:
python
def rle_compress_bytes(data: bytes) -> bytes: if not data: return b” out = bytearray() prev = data[0] count = 1 for b in data[1:]: if b == prev and count < 255: count += 1 else: out.extend([prev, count]) prev = b count = 1 out.extend([prev, count]) return bytes(out) def rle_decompress_bytes(r: bytes) -> bytes: it = iter(r) out = bytearray() for val, cnt in zip(it, it): out.extend([val] * cnt) return bytes(out)
Use-case: small embedded systems or simple image formats; note the 255-count cap — extend with multi-byte counts if needed.
Further reading: look up run-length encoding variations (packbits, TIFF RLE, COCO mask RLE) when targeting specific file formats.
Leave a Reply