Posts by Collection

publications

A Unified Framework for Multi-Domain CTR Prediction via Large Language Models

Published in TOIS, ACM Transactions on Information Systems, 2024

Uni-CTR leverages Large Language Models and pluggable domain networks to address the seesaw phenomenon and scalability challenges in multi-domain CTR prediction, achieving SOTA performance across various scenarios.

Citation: Zichuan Fu, Xiangyang Li, Chuhan Wu, Yichao Wang, Kuicai Dong, Xiangyu Zhao, Mengchen Zhao, Huifeng Guo, and Ruiming Tang. 2024. A Unified Framework for Multi-Domain CTR Prediction via Large Language Models. ACM Trans. Inf. Syst. Just Accepted (October 2024). https://doi.org/10.1145/3698878

LLM4MSR: An LLM-Enhanced Paradigm for Multi-Scenario Recommendation

Published in CIKM’24 (Full Research Paper track), Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024

LLM4MSR enhances multi-scenario recommendation by leveraging an LLM for knowledge extraction alongside hierarchical meta networks, improving performance and interpretability without LLM fine-tuning while maintaining deployment efficiency.

Citation: Yuhao Wang, Yichao Wang, Zichuan Fu, Xiangyang Li, Wanyu Wang, Yuyang Ye, Xiangyu Zhao, Huifeng Guo, and Ruiming Tang. 2024. LLM4MSR: An LLM-Enhanced Paradigm for Multi-Scenario Recommendation. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM '24). Association for Computing Machinery, New York, NY, USA, 2472–2481. https://doi.org/10.1145/3627673.3679743

Sliding Window Attention Training for Efficient Large Language Models

Published in arXiv preprint arXiv:2502.18845, 2025

SWAT enables efficient long-context handling via Sliding Window Attention Training, replacing softmax with sigmoid and combining balanced ALiBi with Rotary Position Embedding to retain information.

Citation: Zichuan Fu, Wentao Song, Yejing Wang, Xian Wu, Yefeng Zheng, Yingying Zhang, Derong Xu, Xuetao Wei, Tong Xu, and Xiangyu Zhao. 2025. Sliding Window Attention Training for Efficient Large Language Models. arXiv preprint arXiv:2502.18845.

Model Merging for Knowledge Editing

Published in ACL’25 (Industry Track), Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, 2025

A two-stage framework combining robust supervised fine-tuning with model merging for efficient knowledge editing in LLMs that preserves general capabilities while outperforming existing methods.


Training-free LLM Merging for Multi-task Learning

Published in ACL’25, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, 2025

Hi-Merging: a training-free method that merges specialized LLMs into a unified multi-task model using hierarchical pruning and scaling, preserving individual strengths while minimizing parameter conflicts across languages and tasks.


AnchorCoT: Anchors Pave the Way for Multi-hop Reasoning

Published in ACL’25 Findings, Findings of the Association for Computational Linguistics, 2025

AnchorCoT predicts key entities as “anchors” to guide multi-hop reasoning and uses a ranking algorithm to ensure logical answer sequences, improving LLM performance on multi-hop QA.

Citation: Tianshi Ming, Xian Wu, Yingying Zhang, Zichuan Fu, and Dawei Cheng. 2025. AnchorCoT: Anchors Pave the Way for Multi-hop Reasoning. In Findings of the Association for Computational Linguistics: ACL 2025, pages 15522–15536. https://doi.org/10.18653/v1/2025.findings-acl.801

A Multi-Expert Structural-Semantic Hybrid Framework for Unveiling Historical Patterns in Temporal Knowledge Graphs

Published in ACL’25 Findings, Findings of the Association for Computational Linguistics, 2025

MESH employs three kinds of expert modules to integrate structural and semantic information for temporal knowledge graph reasoning, capturing differences between historical and non-historical events.

Citation: Yimin Deng, Yuxia Wu, Yejing Wang, Guoshuai Zhao, Li Zhu, Qidong Liu, Derong Xu, Zichuan Fu, Xian Wu, Yefeng Zheng, Xiangyu Zhao, and Xueming Qian. 2025. A Multi-Expert Structural-Semantic Hybrid Framework for Unveiling Historical Patterns in Temporal Knowledge Graphs. In Findings of the Association for Computational Linguistics: ACL 2025. https://doi.org/10.18653/v1/2025.findings-acl.1056

Attention Needs to Focus: A Unified Perspective on Attention Allocation

Published in arXiv preprint arXiv:2601.00919, 2026

A unified perspective tracing representational collapse and attention sink to improper attention allocation, introducing Lazy Attention with positional discrimination and Elastic-Softmax for focused attention.

Citation: Zichuan Fu, Wentao Song, Guojing Li, Yejing Wang, Xian Wu, Yimin Deng, Hanyu Yan, Yefeng Zheng, and Xiangyu Zhao. 2026. Attention Needs to Focus: A Unified Perspective on Attention Allocation. arXiv preprint arXiv:2601.00919.

AdapTime: Enabling Adaptive Temporal Reasoning in Large Language Models

Published in ACL’26 Findings, Findings of the Association for Computational Linguistics, 2026

AdapTime is an adaptive temporal reasoning method that dynamically executes reformulate, rewrite, and review actions guided by an LLM planner, enhancing temporal reasoning without external tools.

Citation: Yimin Deng, Yejing Wang, Zhenxi Lin, Zichuan Fu, Guoshuai Zhao, Derong Xu, Yefeng Zheng, Xiangyu Zhao, Xian Wu, Li Zhu, and Xueming Qian. 2026. AdapTime: Enabling Adaptive Temporal Reasoning in Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026. arXiv:2604.24175

MultiDx: A Multi-Source Knowledge Integration Framework towards Diagnostic Reasoning

Published in ACL’26 Findings, Findings of the Association for Computational Linguistics, 2026

MultiDx is a two-stage diagnostic reasoning framework that performs differential diagnosis by integrating multi-perspective evidence from web search, SOAP-formatted cases, and a clinical case database.

Citation: Yimin Deng, Zhenxi Lin, Yejing Wang, Guoshuai Zhao, Pengyue Jia, Zichuan Fu, Derong Xu, Yefeng Zheng, Xiangyu Zhao, Li Zhu, Xian Wu, and Xueming Qian. 2026. MultiDx: A Multi-Source Knowledge Integration Framework towards Diagnostic Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026. arXiv:2604.24186

Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning

Published in ACL’26 Findings, Findings of the Association for Computational Linguistics, 2026

Tandem is a collaborative framework where an LLM provides strategic reasoning insights to guide an efficient SLM, reducing computational costs by ~40% while maintaining or improving reasoning performance.

Citation: Zichuan Fu, Xian Wu, Guojing Li, Yejing Wang, Yijun Chen, Zihao Zhao, Yixuan Luo, Hanyu Yan, Yefeng Zheng, and Xiangyu Zhao. 2026. Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026. arXiv:2604.23623

talks

Ensemble-Hub — Tandem: Collaborative LLM–SLM Reasoning

Published: April 14, 2026

Reference implementation of Tandem (ACL 2026 Findings): a mentor–intern framework where an LLM emits compact GPRA reasoning insights to guide an efficient SLM, with cost-aware termination and progressive effort levels. ~40% cost reduction; +2.56% accuracy over a standalone 32B LLM at 59% of its compute on MATH.

GUI Agent Harness

Published: June 02, 2026

Observe → plan → click → verify by vision; drives desktop apps and OSWorld VMs. Runs on macOS, Windows, and Linux (perception is tuned for macOS). Track record: 79.8% on OSWorld Multi-Apps (72.6 / 91).

Research Agent Harness

Published: June 02, 2026

Literature survey → idea → experiments → paper draft → cross-model review. Track record: turns a topic into a submission-ready draft.

Wiki Agent Harness

Published: June 02, 2026

Ingests notes / docs / chats into an Obsidian-compatible vault with [[wikilinks]]. Track record: Obsidian vault output.

OpenProgram — An Agentic Programming Framework

Published: June 03, 2026

Agentic Programming framework: Python drives the deterministic control flow, the LLM reasons only when asked. Automatic context threading over a DAG, terminal + web UIs with live execution visualization, self-evolving workflows, and any LLM provider (Claude / GPT / Gemini). Runs natively on macOS, Linux, and Windows.

Fu Zichuan

teaching

Thesis Supervision — MSc and Undergraduate Final-Year Projects

Research Project Mentor — SDSC6002 (MSDS Capstone), LLM Job Recommendation Team

Research Project Mentor — SDSC6002 (MSDS Capstone), TravelAgent Team