Me: A software engineer interested in Math and Computation.


English Articles

Do We Really Need the KVCache for All Large Language Models?

Inconsistency in GEMM Performance

Performance Optimization of Embedding Computation on GPU Part 1: GPU Occupancy Optimization

Performance Optimization of torch.sort on GPU

How to Perform Multi-GPU Parallelization for DeepSeek MLA Attention During Decoding

DeltaNet from the Inference Framework Perspective

Beating Wave Quantization: Sequence Split Optimization for FlashAttention

Chinese Articles

Optimizing the KVCache of Large Language Models with Linear Algebra

Performance Consistency and Anomalies in GPU Matrix Multiplication

Performance Optimization of Embedding Computation on GPU Part 1: GPU Occupancy Optimization

Performance Optimization of torch.sort on GPU

DeltaNet from the Inference Framework Perspective

FlashAttention Performance Optimization: Optimizing the Sequence Split Strategy for the Decode Stage

How to Apply Sequence Parallelism to DeltaNet