Hi! I am Haoxuan Xu, an incoming Ph.D. student in the Department of Computer Science at the University of Southern California, where I will be advised by Asst. Prof. Mengyuan Li. I am currently completing my Master's degree in Computer Science at ETH Zurich, with a major in Secure and Reliable Systems. My research focuses on side-channel attacks and on securing AI through confidential computing and other hardware-based approaches, with the goal of making modern AI systems both verifiable and efficient.
Large Language Model (LLM) inference increasingly requires mechanisms that provide runtime visibility into what is actually executing, without exposing model weights or code. We present WAVE, a hardware-grounded monitoring framework that leverages GPU performance counters (PMCs) to observe LLM inference. WAVE is built on the insight that legitimate executions of a given model must satisfy hardware-level invariants, such as memory-access patterns, instruction mix, and tensor-core utilization, induced by the model's linear-algebraic structure. WAVE collects lightweight PMC traces and applies a two-stage pipeline: (1) inferring architectural properties (e.g., parameter count, layer depth, hidden dimension, batch size) from the observed traces; and (2) using an SMT-based consistency checker to assess whether the execution aligns with the provisioned compute and the claimed model's constraints. We evaluate WAVE on common open-source LLM architectures, such as LLaMA, GPT, and Qwen, across multiple GPU architectures, including NVIDIA Ada Lovelace, Hopper, and Blackwell. Results show that WAVE recovers key model parameters with an average error of 6.8% and identifies disguised executions under realistic perturbations. By grounding oversight in hardware invariants, WAVE provides a practical avenue for continuous, privacy-preserving runtime monitoring of LLM services.
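To give a flavor of the first stage, here is a minimal sketch, not WAVE's actual implementation: it uses only the standard rule of thumb that decoder-only inference costs roughly 2N FLOPs per token for a model with N parameters, so a FLOP total aggregated from tensor-core counters constrains the parameter count a claimed model must have. All function names and numbers are illustrative assumptions.

```python
# Hypothetical sketch: back out an approximate parameter count from an
# observed FLOP total (e.g., aggregated from tensor-core PMC readings),
# then check it against the provider's claimed model size.

def estimate_param_count(observed_flops: float, tokens_processed: int) -> float:
    """Decoder-only inference performs ~2 * N FLOPs per token for an
    N-parameter model, so N ~= FLOPs / (2 * tokens)."""
    return observed_flops / (2.0 * tokens_processed)

def within_tolerance(estimate: float, claimed: float, tol: float = 0.10) -> bool:
    """Flag executions whose inferred size deviates from the claim."""
    return abs(estimate - claimed) / claimed <= tol

# Illustrative numbers: a 7B-parameter model processing 1024 tokens should
# perform roughly 2 * 7e9 * 1024 FLOPs.
flops = 2 * 7e9 * 1024
est = estimate_param_count(flops, 1024)
print(within_tolerance(est, 7e9))    # True: consistent with the claim
print(within_tolerance(est, 13e9))   # False: a 13B claim would be flagged
```

WAVE's real checker reasons over several such invariants jointly (layer depth, hidden dimension, batch size) with an SMT solver; this sketch shows only the single-constraint intuition.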
EuroS&P 2025
Latte: Layered Attestation for Portable Enclaved Applications
Trusted Execution Environments (TEEs) have become increasingly popular in privacy-preserving cloud computing, and their rapid development has led to the availability of various heterogeneous TEE platforms on cloud servers. To facilitate portable TEE applications on heterogeneous TEE platforms, portable languages or intermediate representations (IRs) with platform-dependent TEE runtimes are adopted. However, existing remote attestation solutions for portable TEE applications follow a nested attestation pattern, i.e., attesting only the TEE runtime and relying on the TEE runtime to measure the loaded portable application, which leads to potential security issues. On the other hand, directly packing the TEE runtime and the portable application into an enclave for secure attestation undermines portability. In this paper, we introduce the concept of portable identities to identify portable TEE applications, and propose a layered attestation framework, Latte, that achieves both security and portability in attesting portable TEE applications. We provide a prototype implementation of Latte to validate its practicality, with WebAssembly as the portable IR, and Intel SGX and RISC-V Penglai as the exemplar heterogeneous TEEs. The evaluation demonstrates that Latte introduces minimal performance overhead compared with the nested attestation pattern.
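The layered-identity idea can be sketched in a few lines. This is my own simplified illustration, not Latte's implementation: the final attested identity binds a platform-dependent runtime measurement to a platform-independent "portable identity" of the Wasm module, instead of trusting the runtime alone to measure what it loads. The digest-extension scheme below (TPM-PCR style) is an assumption for illustration.

```python
# Hypothetical sketch of a layered measurement: the platform-independent
# identity of the portable application is hashed separately from the
# platform-dependent runtime, then the two are bound together.
import hashlib

def measure(blob: bytes) -> bytes:
    return hashlib.sha256(blob).digest()

def layered_identity(runtime_binary: bytes, wasm_module: bytes) -> bytes:
    """Extend the runtime measurement with the portable application's
    identity: H(runtime_digest || portable_identity)."""
    runtime_digest = measure(runtime_binary)   # platform-dependent layer
    portable_identity = measure(wasm_module)   # platform-independent layer
    return hashlib.sha256(runtime_digest + portable_identity).digest()

# The same Wasm module yields the same portable identity whether it runs
# on an SGX or a Penglai runtime, while the combined identity still
# reflects which runtime actually loaded it.
wasm = b"\x00asm..."  # placeholder module bytes
sgx_report = layered_identity(b"sgx-wasm-runtime", wasm)
penglai_report = layered_identity(b"penglai-wasm-runtime", wasm)
print(sgx_report != penglai_report)              # True: runtimes differ
print(measure(wasm) == measure(wasm))            # True: identity is stable
```

A verifier holding the expected portable identity can thus check the application layer across heterogeneous TEEs without repacking the application into each platform's enclave image.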