While large language models (LLMs) excel at generating coherent and contextually rich outputs, their capacity to handle long-form contexts efficiently is limited by fixed-length position embeddings. In addition, the computational cost of processing long sequences grows quadratically with sequence length, making it challenging to extend the context length.
To address these challenges, we propose Long-form Context Injection with Recurrent Compression (LCIRC), a method that enables efficient processing of long-form sequences beyond the model's length limit through recurrent compression, without retraining the entire model.
We further introduce query dependent context modeling, which selectively compresses query-relevant information, ensuring that the model retains the most pertinent content. Our empirical results demonstrate that Query Dependent LCIRC (QD-LCIRC) significantly improves the LLM's ability to manage extended contexts, making it well-suited for tasks that require both comprehensive context understanding and query relevance.
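To make the idea of query dependent compression concrete, here is one plausible, purely illustrative way to condition the compression on the query: pool the query tokens into a single embedding and add a projection of it to the compression query features, so the compressor attends to query-relevant parts of each segment. The module name `QueryConditioner`, the mean pooling, and the additive conditioning are assumptions for the sake of the sketch, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn


class QueryConditioner(nn.Module):
    """Bias the compression query features toward the user query (illustrative assumption)."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, h_prev: torch.Tensor, query_tokens: torch.Tensor) -> torch.Tensor:
        # h_prev: (B, num_latents, dim) compression query features
        # query_tokens: (B, q_len, dim) embeddings of the user query
        pooled = query_tokens.mean(dim=1, keepdim=True)  # (B, 1, dim) pooled query embedding
        return h_prev + self.proj(pooled)                # broadcast-added query bias
```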
The overall process of the proposed Long-form Context Injection with Recurrent Compression (LCIRC). LCIRC comprises two components: Recurrent Context Compression (left) and Compressed Context Injection (right). In the $i$-th step of Recurrent Context Compression, the previously compressed features $\mathbf{h}^{(i-1)}$ and the segment embeddings $\mathbf{s}_{i}$ are fed into the Perceiver module as query and input features, respectively. The compressed features $\mathbf{h}^{(i)}$ are then generated and reinjected as query features for the subsequent recurrence step. The initial query features $\mathbf{h}^{(0)}$ are learnable parameters. In Compressed Context Injection, the concatenated compressed features $\mathbf{h}$ serve as input to the Gated Cross Attention layer. Layers indicated with a fire symbol represent trained layers, while layers marked with a snow symbol denote frozen layers.
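As a rough illustration of the two components above, the following PyTorch sketch pairs a Perceiver-style compressor with a gated cross-attention injection layer. The class names, layer sizes, single-block compressor, and tanh gate are simplifying assumptions rather than the paper's exact implementation; in LCIRC only these added modules are trained while the backbone LLM remains frozen.

```python
import torch
import torch.nn as nn


class PerceiverCompressor(nn.Module):
    """Compresses a segment into a fixed number of features via cross-attention (illustrative)."""

    def __init__(self, dim: int, num_latents: int = 64, num_heads: int = 8):
        super().__init__()
        self.init_query = nn.Parameter(torch.randn(num_latents, dim))  # learnable h^(0)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, h_prev, segment):
        # h_prev: (B, num_latents, dim) compressed features from the previous step
        # segment: (B, seg_len, dim) embeddings s_i of the current segment
        attn_out, _ = self.cross_attn(query=h_prev, key=segment, value=segment)
        h = h_prev + attn_out
        return h + self.ffn(h)  # h^(i), reused as query features at the next recurrence step


class GatedCrossAttention(nn.Module):
    """Injects compressed context into a frozen LLM layer through a zero-initialized gate."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # gate starts closed, leaving the frozen LLM unchanged

    def forward(self, hidden, compressed):
        attn_out, _ = self.cross_attn(query=hidden, key=compressed, value=compressed)
        return hidden + torch.tanh(self.gate) * attn_out


def compress_context(compressor, segments):
    """Recurrent Context Compression: fold segments s_1..s_N into concatenated features h."""
    h = compressor.init_query.unsqueeze(0).expand(segments[0].size(0), -1, -1)  # h^(0)
    states = []
    for seg in segments:
        h = compressor(h, seg)  # h^(i) = Perceiver(h^(i-1), s_i)
        states.append(h)
    return torch.cat(states, dim=1)  # h, the input to the Gated Cross Attention layers
```

Initializing the gate at zero keeps the frozen backbone's outputs unchanged at the start of training, a common design choice for gated cross-attention adapters.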
Comparisons of the proposed Selective State BPTT with vanilla and truncated BPTT. Green boxes represent timesteps at which gradients are computed, whereas light green boxes indicate timesteps without gradient computation. Dotted red lines illustrate the gradient flows. (a) Vanilla BPTT computes full gradients through all timesteps of the recurrence but is computationally infeasible for large $N$. Each $\mathbf{h}^{(i)}$ receives upstream gradients both through the recurrent connection and through the direct connection from $\mathbf{h}$. (b) Truncated BPTT backpropagates gradients through the last $T$ timesteps only, significantly reducing computational cost. However, it does not propagate gradients to timesteps earlier than the last $T$ (marked in light green) and therefore fails to learn long-term QD modeling. (c) Our proposed Selective State BPTT selects several random timesteps and transfers gradients to them directly through the direct connection from $\mathbf{h}$, enabling efficient learning of long-term QD modeling.
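The selective scheme in (c) can be sketched as follows, reusing the hypothetical `PerceiverCompressor` from the previous sketch. The assumptions here: the recurrent input is detached at every step, a uniformly random subset of timesteps keeps its computation graph (so those compressed features receive gradients through the direct connection from $\mathbf{h}$), and any additional truncated window over the final steps is omitted; none of these details should be read as the authors' exact training recipe.

```python
import random

import torch


def selective_state_compress(compressor, segments, num_selected: int = 4) -> torch.Tensor:
    """Run the compression recurrence, keeping gradients only at randomly chosen timesteps."""
    selected = set(random.sample(range(len(segments)), min(num_selected, len(segments))))
    # h^(0): learnable initial query features, broadcast to the batch size.
    h = compressor.init_query.unsqueeze(0).expand(segments[0].size(0), -1, -1)
    states = []
    for i, seg in enumerate(segments):
        if i in selected:
            # Keep this step's graph: gradients reach the compressor through the
            # direct connection from h, but not through earlier recurrence steps.
            h_i = compressor(h, seg)
        else:
            with torch.no_grad():  # skip gradient computation at unselected timesteps
                h_i = compressor(h, seg)
        states.append(h_i)
        h = h_i.detach()  # cut the gradient path along the recurrent connection
    return torch.cat(states, dim=1)  # concatenated h, consumed by the gated cross-attention layers
```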
Per-task performance on InfiniteBench and LongBench. The following abbreviations are used: NQA denotes NarrativeQA, MFQA represents MultiFieldQA-en, HQA refers to HotpotQA, 2WQA to 2WikiMQA, and MSQ to MuSiQue. Avg indicates the average score across all subtasks within respective benchmarks. FW-LQA indicates whether the model is fine-tuned on FineWeb-LQA. Our QD-LCIRC consistently outperforms competing methods, achieving the highest average score by incorporating query dependent modeling, as indicated in the QD column.
Per-task performance on L-Eval. The following abbreviations are used: CS denotes Coursera, QALIT refers to QuALITY, SF represents SFiction, LFQA refers to LongFQA, and NQA to NarrativeQA. Avg indicates the mean performance score across all subtasks within the respective benchmark. FW-LQA indicates whether the model has been fine-tuned on FineWeb-LQA, while QD denotes whether query dependent modeling is applied.
@article{an2025lcirc,
title={LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs},
author={An, Sumin and Sung, Junyoung and Park, Wonpyo and Park, Chanjun and Seo, Paul Hongsuck},
journal={arXiv preprint arXiv:2502.06139},
year={2025}
}