Jiashuo Liu's Website

Research interests

Next-Gen LLM Evaluation: Design (1) algorithms that efficiently and reliably assess LLMs' true capabilities, and (2) benchmarks free of data contamination for reliable LLM evaluation
Data-Centric AI: Design scalable algorithms to understand data properties for RL
Previous Interests: Out-of-Distribution Generalization, Distributionally Robust Optimization

Recent Highlights

Seed1.8 Model Card: Towards Generalized Real-World Agency
ByteDance Seed (core contributor to the Seed Evaluation System)
Technical Report, 2025.
paper
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
Project Lead. ByteDance Seed, Fudan University, Stanford University, Princeton University.
Technical Report, 2025.
ICLR'26: International Conference on Learning Representations, 2026.
paper data website
LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics
Jiashuo Liu, Jiayun Wu, Chunjie Wu, Jingkai Liu, Zaiyuan Wang, Huan Zhou, Wenhao Huang, Hongseok Namkoong
paper
Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning
Jiayun Wu, Jiashuo Liu†, Zhiyuan Zeng, Tianyang Zhan, Tianle Cai, Wenhao Huang
paper
FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
Core Contributor. ByteDance Seed, Columbia Business School.
Technical Report, 2025.
ICLR'26: International Conference on Learning Representations, 2026.
Reported in Minimax-M2 and Kimi-K2-thinking
paper data website

Recent Preprints

CryptoBench: A Dynamic Benchmark for Expert-Level Evaluation of LLM Agents in Cryptocurrency
Core Contributor. Princeton University, etc.
Technical Report, 2025.
paper
RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization
Zhiyuan Zeng, Jiashuo Liu, Zhangyue Yin, Ge Zhang, Wenhao Huang, Xipeng Qiu
paper code
AInsteinBench: Benchmarking Coding Agents on Scientific Repositories
Contributor. ByteDance Seed, Princeton University.
paper website
LPFQA: A Long-Tail Professional Forum-based Benchmark for LLM Evaluation
Liya Zhu, Peizhuang Cong, Aowei Ji, Wenya Wu, Jiani Hou, Chunjie Wu, Xiang Gao, Jingkai Liu, Zhou Huan, Xuelei Sun, Yang Yang, Jianpeng Jiao, Liang Hu, Xinjie Chen, Jiashuo Liu, Jingzhe Ding, Tong Yang, Zaiyuan Wang, Ge Zhang, Wenhao Huang
paper

Publications

FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
Project Lead. ByteDance Seed, Fudan University, Stanford University, Princeton University.
Technical Report, 2025.
ICLR'26: International Conference on Learning Representations, 2026.
paper data website
FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
Core Contributor. ByteDance Seed, Columbia Business School.
Technical Report, 2025.
ICLR'26: International Conference on Learning Representations, 2026.
Reported in Minimax-M2 and Kimi-K2-thinking
paper data website
DiscoX: Benchmarking Discourse-Level Translation task in Expert Domains
Contributor. ByteDance Seed, Peking University.
ICLR'26: International Conference on Learning Representations, 2026.
paper data
DRO: A Python Library for Distributionally Robust Optimization in Machine Learning
Jiashuo Liu*, Tianyu Wang*, Henry Lam, Hongseok Namkoong, Jose Blanchet.
OPT'25: Optimization for Machine Learning; R&R at JMLR, 2025.
paper code
Towards Human-Guided, Data-Centric LLM Co-Pilots
Evgeny Saveliev*, Jiashuo Liu*, Nabeel Seedat*, Anders Boyd, Mihaela van der Schaar.
ICLR'25 Workshop on Navigating and Addressing Data Problems for Foundation Models.
DMLR'25: Data-centric Machine Learning Research, 2025.
paper
Data Heterogeneity Modeling for Trustworthy Machine Learning
Jiashuo Liu, Peng Cui.
KDD'25: SIGKDD Conference on Knowledge Discovery and Data Mining, 2025.
paper
Topology-Aware Dynamic Reweighting for Distribution Shifts on Graph
Weihuang Zheng*, Jiashuo Liu*, Jiaxing Li, Jiayun Wu, Peng Cui, Youyong Kong.
ICML'25: International Conference on Machine Learning, 2025.
paper
Exploring and Exploiting Data Heterogeneity in Recommendation
Zimu Wang, Jiashuo Liu, Hao Zou, Xingxuan Zhang, Yue He, Dongxu Liang, Peng Cui.
TKDD'25: ACM Transactions on Knowledge Discovery from Data, 2025.
paper
Going Beyond Static: Understanding Shifts with Time-Series Attribution
Jiashuo Liu, Nabeel Seedat, Peng Cui, Mihaela van der Schaar.
ICLR'25: International Conference on Learning Representations, 2025.
paper
Position: What's the next frontier for Data-centric AI? Data Savvy Agents!
Nabeel Seedat*, Jiashuo Liu*, Mihaela van der Schaar.
ICLR'25 Workshop on Navigating and Addressing Data Problems for Foundation Models, 2025.
paper
Towards Out-of-Distribution Generalization: A Survey
Jiashuo Liu*, Zheyan Shen*, Yue He, Xingxuan Zhang, Renzhe Xu, Han Yu, Peng Cui.
Survey Paper, 2021.
paper
AdaptSel: Adaptive Selection of Biased and Debiased Recommendation Models for Varying Test Environments
Zimu Wang, Hao Zou, Jiashuo Liu, Jiayun Wu, Pengfei Tian, Yue He, Peng Cui.
TKDD'24: ACM Transactions on Knowledge Discovery from Data, 2024.
paper
Bridging Multicalibration and Out-of-distribution Generalization Beyond Covariate Shift
Jiayun Wu, Jiashuo Liu, Peng Cui, Zhiwei Steven Wu.
NeurIPS'24: Neural Information Processing Systems, 2024.
paper code
LLM Embeddings Improve Test-time Adaptation to Tabular Y|X-Shifts
Yibo Zeng*, Jiashuo Liu*, Henry Lam, Hongseok Namkoong.
NeurIPS'24 Workshop on Table Representation Learning, 2024.
paper website code
Stability Evaluation of Large Language Models via Distributional Perturbation Analysis
Jiashuo Liu, Jiajin Li, Peng Cui, Jose Blanchet.
NeurIPS'24 Workshop on Red Teaming GenAI, 2024.
paper
Stability Evaluation via Distributional Perturbation Analysis
(α-β order) Jose Blanchet*, Peng Cui*, Jiajin Li*, Jiashuo Liu*.
ICML'24: International Conference on Machine Learning; invited talk at INFORMS'24 DRO Workshop, 2024.
paper
Geometry-Calibrated DRO: Combating Over-Pessimism with Free Energy Implications
Jiashuo Liu, Jiayun Wu, Tianyu Wang, Hao Zou, Bo Li, Peng Cui.
ICML'24: International Conference on Machine Learning; short version at NeurIPS'23 (DS), 2024.
paper
Enhancing Distributional Stability among Sub-Populations
Jiashuo Liu, Jiayun Wu, Jie Peng, Xiaoyu Wu, Yang Zheng, Bo Li, Peng Cui.
AISTATS'24: International Conference on Artificial Intelligence and Statistics, 2024.
paper
Domain-wise Data Acquisition to Improve Performance under Distribution Shift
Yue He, Dongbai Li, Pengfei Tian, Han Yu, Jiashuo Liu, Hao Zou, Peng Cui.
ICML'24: International Conference on Machine Learning, 2024.
paper
Distributionally Generative Augmentation for Fair Facial Attribute Classification
Fengda Zhang, Qianpei He, Kun Kuang, Jiashuo Liu, Long Chen, Chao Wu, Jun Xiao, Hanwang Zhang.
CVPR'24: Conference on Computer Vision and Pattern Recognition, 2024.
paper
Rethinking the Evaluation Protocol of Domain Generalization
Han Yu, Xingxuan Zhang, Renzhe Xu, Jiashuo Liu, Yue He, Peng Cui.
CVPR'24: Conference on Computer Vision and Pattern Recognition, 2024.
paper
Towards Robust Out-of-Distribution Generalization Bounds via Sharpness
Yingtian Zou, Kenji Kawaguchi, Yingnan Liu, Jiashuo Liu, Mong-Li Lee, Wynne Hsu.
ICLR'24 (Spotlight): International Conference on Learning Representations, 2024.
paper
On the Need of a Modeling Language for Distribution Shifts: Illustrations on Tabular Datasets
Jiashuo Liu*, Tianyu Wang*, Peng Cui, Hongseok Namkoong.
INFORMS'24 Workshop on Data Science (full paper, 2024); NeurIPS'23 Datasets & Benchmarks (2023); Major Revision at Management Science.
Selected as the Favorite Paper by Two Sigma (9/3500)
paper code website
Offline Policy Evaluation in Large Action Spaces via Outcome-Oriented Action Grouping
Jie Peng, Hao Zou, Jiashuo Liu, Shaoming Li, Yibao Jiang, Jian Pei, Peng Cui.
WWW'23: The ACM Web Conference, 2023.
paper
Measure the Predictive Heterogeneity
Jiashuo Liu*, Jiayun Wu*, Renjie Pi, Renzhe Xu, Xingxuan Zhang, Bo Li, Peng Cui.
ICLR'23: International Conference on Learning Representations, 2023.
paper
Distributionally Robust Learning with Stable Adversarial Training
Jiashuo Liu, Zheyan Shen, Peng Cui, Linjun Zhou, Kun Kuang, Bo Li.
TKDE'22: IEEE Transactions on Knowledge and Data Engineering, 2022.
paper code
Distributionally Robust Optimization with Data Geometry
Jiashuo Liu*, Jiayun Wu*, Bo Li, Peng Cui.
NeurIPS'22 (Spotlight): Neural Information Processing Systems, 2022.
paper
Towards the ultimate PMT waveform analysis for neutrino and dark matter experiments
Dacheng Xu, Benda Xu, Erjin Bao, Yiyang Wu, Aiqiang Zhang, Yuyi Wang, Geliang Zhang, Yu Xu, Ziyi Guo, Jihui Pei, Hanyang Mao, Jiashuo Liu, Zhe Wang, Shaomin Chen.
JINST'22: Journal of Instrumentation, 2022.
Invariant Preference Learning for General Debiasing in Recommendation
Zimu Wang, Yue He, Jiashuo Liu, Wenchao Zou, Philip Yu, Peng Cui.
KDD'22: SIGKDD Conference on Knowledge Discovery and Data Mining, 2022.
paper
Kernelized Heterogeneous Risk Minimization
Jiashuo Liu*, Zheyuan Hu*, Peng Cui, Bo Li, Zheyan Shen.
NeurIPS'21: Neural Information Processing Systems, 2021.
paper code
Heterogeneous Risk Minimization
Jiashuo Liu, Zheyuan Hu, Peng Cui, Bo Li, Zheyan Shen.
ICML'21: International Conference on Machine Learning, 2021.
paper code
Stable Adversarial Learning under Distributional Shifts
Jiashuo Liu, Zheyan Shen, Peng Cui, Linjun Zhou, Kun Kuang, Bo Li, Yishi Lin.
AAAI'21: AAAI Conference on Artificial Intelligence, 2021.
paper code
Triple Generative Adversarial Networks
Chongxuan Li, Kun Xu, Jun Zhu, Jiashuo Liu, Bo Zhang.
TPAMI'21: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
Signed Graph Neural Network with Latent Groups
Haoxin Liu, Ziwei Zhang, Peng Cui, Yafeng Zhang, Qiang Cui, Jiashuo Liu, Wenwu Zhu.
KDD'21: SIGKDD Conference on Knowledge Discovery and Data Mining, 2021.
paper
Stable Learning via Differentiated Variable Decorrelation
Zheyan Shen, Peng Cui, Jiashuo Liu, Tong Zhang, Bo Li, Zhitang Chen.
KDD'20: SIGKDD Conference on Knowledge Discovery and Data Mining, 2020.
paper