Quantization reduces the models' memory footprint. Both models are post-trained with quantization of the Mixture-of-Experts (MoE) weights, which are quantized to 4.25 bits per parameter. Since the MoE weights account for the bulk of the total parameter count, quantizing them enables the larger model to fit on a single 80GB GPU, while the smaller model can run on systems with as little as 16GB of memory.
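As a rough sanity check, the memory figures follow from simple arithmetic: parameters × bits per parameter. The 4.25 figure is consistent with 4-bit weights plus block-wise scale overhead (e.g., one 8-bit scale shared across 32 weights adds 0.25 bits per parameter). The sketch below is illustrative only: the parameter counts are hypothetical (not stated above), and it ignores non-MoE weights, activations, and KV cache, which add to the total.

```python
def quantized_weight_bytes(num_params: float, bits_per_param: float = 4.25) -> float:
    """Bytes needed to store weights quantized at `bits_per_param` bits each."""
    return num_params * bits_per_param / 8

# Hypothetical parameter counts for illustration only; they are not given above.
models = [
    ("larger model (assumed ~120B MoE params)", 120e9),
    ("smaller model (assumed ~20B MoE params)", 20e9),
]

for name, params in models:
    gib = quantized_weight_bytes(params) / 2**30
    # Quantized MoE weights only; other weights and runtime buffers are extra.
    print(f"{name}: ~{gib:.0f} GiB for quantized MoE weights")
```

Under these assumed sizes, the quantized MoE weights come to roughly 60 GiB and 10 GiB respectively, which is how the models can fit within the 80GB and 16GB budgets mentioned above.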
Let's look at alternatives: