Me: A software engineer interested in Math and Computation.
Contact: [email protected]
Do We Really Need the KVCache for All Large Language Models?
Inconsistency in GEMM Performance
Performance Optimization of Embedding Computation on GPU Part 1: GPU Occupancy Optimization
Performance Optimization of torch.sort on GPU
How to Perform Multi-GPU Parallelization for DeepSeek MLA Attention during Decoding
DeltaNet from the Inference Framework Perspective
Beating Wave Quantization: Sequence Split Optimization for FlashAttention
Performance Optimization of Embedding Computation on GPU Part 1: GPU Occupancy Optimization (Chinese version)