Welcome to Ning Yang (杨宁)’s Homepage

I am a fifth-year Ph.D. student in Computer Science at Shanghai Jiao Tong University (SJTU), in the Intelligent Memory & Processor Architecture & Computing Lab (IMPACT Lab), where I am very fortunate to be advised by Prof. Li Jiang and Assistant Prof. Fangxin Liu.

I received my Bachelor’s degree from the ACM Honors Class at Shanghai Jiao Tong University, where I worked under the supervision of Prof. Weinan Zhang and Prof. Yong Yu.

My research focuses on efficient systems for large language models (LLMs), with an emphasis on reducing data movement and improving real-world performance under hardware constraints. My work lies at the intersection of systems, machine learning, and architecture, including:

  • Model compression, quantization, and encoding
  • Memory-efficient execution and dataflow optimization
  • Mixture-of-Experts (MoE) inference and scheduling
  • Hardware–software co-design for LLM deployment

I am particularly interested in co-designing algorithms and systems to reduce memory movement and improve real-world performance.


🔥 Selected Work

My recent work explores this direction through several system–algorithm co-designed frameworks:

  • STEP: A system-aware expert prefetching architecture for MoE inference. ISCA 2026 (Accepted!)
  • EARTH: Entropy-aware MoE system with speculative prefetch and result reuse, reducing memory traffic and improving efficiency. ASPLOS 2026 [Paper]
  • NICE: Index-assisted LUT-based execution framework that transforms MAC operations into efficient lookup mechanisms. TACO 2026 [Paper]
  • SPARK/INSPIRE/SPADE: A series of works exploring hybrid-precision encoding for hardware-efficient LLM acceleration. HPCA 2024 / HOLES 2024 / TACO 2026 [Paper]

🎯 Research Vision

My long-term goal is to build efficient and deployable AI systems by bridging the gap between model design and system-level execution. I aim to develop techniques that jointly optimize representation, execution, and hardware interaction to enable scalable and efficient deployment of large models.

In addition to efficiency, I am also interested in the intersection of AI systems and security, particularly in understanding how the design of compression techniques impacts the robustness, reliability, and security of modern AI systems. I am interested in developing secure and trustworthy AI systems that remain efficient under real-world constraints.