SPARE: Training-Free Representation Engineering for Managing Knowledge Conflicts in Large Language Models


Large Language Models (LLMs) have demonstrated impressive capabilities in handling knowledge-intensive tasks through their parametric knowledge stored within model parameters. However, the stored knowledge can become inaccurate or outdated, leading to the adoption of retrieval and tool-augmented methods that provide external contextual knowledge. A critical challenge emerges when this contextual knowledge conflicts with the model’s parametric knowledge, causing undesired behaviors and incorrect outputs. LLMs prefer contextual knowledge over their parametric knowledge, but during conflicts, existing solutions that need additional model interactions result in high latency times, making them impractical for real-world applications.

Existing methods to understand and control LLM behavior have followed several key directions, including Representation engineering, Knowledge Conflicts, and Sparse Auto-Encoder (SAEs). Representation engineering emerged as a higher-level framework for understanding LLM behavior at scale. It includes Mechanistic interpretability that analyzes individual network components like circuits and neurons but struggles with complex phenomena. Further, there are three types of knowledge conflicts: inter-context, context-memory, and intra-memory conflicts. Moreover, SAEs have been developed as post-hoc analysis tools to identify disentangled features within LLM representations, showing promise in identifying sparse circuits and enabling controlled text generation through monosemantic features.

Researchers from the University of Edinburgh, The Chinese University of Hong Kong, Sapienza University of Rome, University College London, and Miniml.AI have proposed SPARE (Sparse Auto-Encoder-based Representation Engineering), a novel training-free representation engineering method. The method utilizes pre-trained sparse auto-encoders to control knowledge selection behavior in LLMs. It effectively resolves knowledge conflicts in open-domain question-answering tasks by identifying functional features that govern knowledge selection and editing internal activations during inference. SPARE outperforms existing representation engineering methods by 10% and contrastive decoding methods by 15%.

SPARE’s effectiveness is evaluated using multiple models, including Llama3-8B, Gemma2-9B with public pre-trained SAEs, and Llama2-7B with custom pre-trained SAEs. The method is tested on two prominent open-domain question-answering datasets featuring knowledge conflicts: NQSwap and Macnoise. The evaluation uses greedy decoding for open-ended generation settings. Performance comparisons are conducted against various inference-time representation engineering methods, including TaskVec, ActAdd, SEA (both linear and non-linear versions), and contrastive decoding methods like DoLa and CAD. Moreover, researchers also compared using in-context learning (ICL) to steer the knowledge selection.

SPARE outperforms existing representation engineering methods TaskVec, ActAdd, and SEA, showing superior performance in controlling both contextual and parametric knowledge usage compared to existing methods. Also, it outperforms Contrastive decoding strategies like DoLa and CAD that demonstrate effectiveness by enhancing contextual knowledge use but they face challenges with parametric knowledge control. SPARE’s ability to add and remove specific functional features results in more precise control over both knowledge types. Further, SPARE outperforms non-inference-time controlling approaches like ICL, highlighting its efficiency and effectiveness. These results underscore SPARE’s potential for practical applications requiring real-time control over LLM behavior.

In conclusion, researchers introduced SPARE which addresses the challenge of context-memory knowledge conflicts in LLMs by examining the model’s residual stream and implementing training-free representation engineering. The method’s effectiveness in controlling knowledge selection behavior without computational overhead represents a significant advancement in LLM knowledge management. However, some limitations exist, including the method’s dependency on pre-trained SAEs and the current focus on specific ODQA tasks. Despite these constraints, SPARE’s ability to enhance knowledge selection accuracy while maintaining efficiency makes it a promising solution for managing knowledge conflicts in practical LLM applications. 


Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.. Don’t Forget to join our 55k+ ML SubReddit.

[Upcoming Live Webinar- Oct 29, 2024] The Best Platform for Serving Fine-Tuned Models: Predibase Inference Engine (Promoted)


Sajjad Ansari is a final year undergraduate from IIT Kharagpur. As a Tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.

Listen to our latest AI podcasts and AI research videos here ➡️