I am currently researching deep learning and large language model adaptation to tabular data. If you are interested in any form of academic discussion or collaboration, please feel free to email me at jyansir@zju.edu.cn.

I graduated with a bachelor's degree from Chu Kochen Honors College, Zhejiang University (浙江大学竺可桢学院) and am a full-time PhD student in the College of Computer Science and Technology, Zhejiang University (浙江大学计算机科学与技术学院), advised by Jian Wu (吴健). I also collaborate closely with Jintai Chen (陈晋泰), a researcher at the University of Illinois Urbana-Champaign.

My research interests include tabular language model pre-training, relational deep learning, and neural architecture engineering.

I am also an amateur photographer and ACG enthusiast, and am always happy to join relevant offline activities in my spare time.

📝 Publications

ICLR 2024

Making Pre-trained Language Models Great on Tabular Prediction

Jiahuan Yan, Bo Zheng, Hongxia Xu, Yiheng Zhu, Danny Z. Chen, Jimeng Sun, Jian Wu, Jintai Chen (Spotlight, Notable Top 5%)

repo

  • TL;DR: This work proposed relative magnitude tokenization, a distributed numerical feature embedding technique, and intra-feature attention, a feature-wise contextualization mechanism, both for adapting tabular features to the modern Transformer-based LM architecture.
  • Academic Impact: The resulting pre-trained LM, TP-BERTa, surpasses non-LM baselines on 145 downstream tabular prediction datasets, and pivot analysis shows even larger improvements when discrete features dominate. This work is included in many tabular LLM paper repositories, such as Awesome-LLM-Tabular, and has been covered by well-known media such as 新智元.
KDD 2024

Team up GBDTs and DNNs: Advancing Efficient and Effective Tabular Prediction with Tree-hybrid MLPs

Jiahuan Yan, Jintai Chen, Qianxing Wang, Danny Z. Chen, Jian Wu (Oral)

repo

  • TL;DR: This work explored a GBDT-DNN hybrid framework to address the model selection dilemma in tabular prediction, introducing T-MLP, a tree-hybrid MLP architecture combining the strengths of both GBDTs and DNNs. T-MLP uses DNN capacity to emulate the GBDT development process, i.e., an entropy-driven feature gate, tree pruning, and model ensembling. Experiments on 88 datasets from 4 benchmarks (covering both DNN- and GBDT-favored ones) show that T-MLP is competitive with extensively tuned SOTA DNNs and GBDTs, all achieved with fixed hyperparameters, a compact model size, and reduced training time.
AAAI 2023

T2G-Former: Organizing tabular features into relation graphs promotes heterogeneous feature interaction

Jiahuan Yan, Jintai Chen, Yixuan Wu, Danny Z. Chen, Jian Wu (Oral, Top 20%)

repo

  • TL;DR: This work introduced the T2G-Former architecture, a feature relation (FR) graph guided Transformer for selective feature interaction. In each basic T2G block, a Graph Estimator automatically organizes an FR graph in a data-driven manner to guide feature fusion and suppress noisy signals, making feature interaction sparse and interpretable. The tuned model outperforms various DNNs and is comparable with XGBoost.
EMNLP 2023

Text2Tree: Aligning Text Representation to the Label Tree Hierarchy for Imbalanced Medical Classification

Jiahuan Yan, Haojun Gao, Kai Zhang, Weize Liu, Danny Z. Chen, Jian Wu, Jintai Chen (EMNLP-Findings)

repo

  • TL;DR: This work proposed Text2Tree, a label-tree-guided algorithm for imbalanced text classification, comprising a cascade attention module for structure-based label embedding, similarity-based surrogate learning (a generalized form of supervised contrastive learning), and dissimilarity-based MixUp. Text2Tree outperforms baselines on ICD coding and serves as a supplementary technique for imbalanced classification scenarios.

🛠️ Open Source Projects

  • Contributor of deep tabular models and data preprocessing modules to the popular tabular deep learning library PyTorch Frame (created by the PyG team).

🔎 Field Activities

  • Reviewer for Conferences: NeurIPS 2024, ACL 2024, KDD 2024, IJCAI 2024, ACM MM 2024.

📖 Education

  • 2020.09 - now, PhD, Zhejiang University, Hangzhou.
  • 2016.09 - 2020.06, Undergraduate, Chu Kochen Honors College, Zhejiang University, Hangzhou.