Prompt Engineering Guide
Table of Contents
- Introduction
- Prompt Engineering Overview
- LLM Output Configuration
- Zero-shot Prompting
- One-shot & Few-shot Prompting
- Step-back Prompting
- Chain-of-Thought (CoT) Prompting
- Self-Consistency
- Tree-of-Thoughts (ToT)
- ReAct (Reason + Act)
- Automatic Prompt Engineering (APE)
- Code Prompting
- Multimodal Prompting
- Best Practices
- Provide Examples
- Design with Simplicity
- Be Specific about the Output
- Use Instructions over Constraints
- Control the Max Token Length
- Use Variables in Prompts
- Experiment with Input Formats and Writing Styles
- Mix Up Few-shot Example Order
- Adapt to Model Updates
- Experiment with Output Formats
- JSON Repair
- Working with Schemas
- Collaborate with Other Prompt Engineers
- CoT Best Practices
- Document Prompt Attempts
- Summary
- Endnotes
Introduction
The guide begins by noting that anyone can write a prompt, but crafting an effective prompt can be complex. Many factors influence a prompt's success: the choice of model, the model's configuration, the training data it has seen, and the wording, style, structure, and context of the prompt all play a role. Prompt engineering is presented as an iterative process of refining prompts to avoid ambiguous or inaccurate outputs. The introduction explains that the guide will focus on using Google's Gemini language model (via Vertex AI or its API) directly, allowing fine control of settings like temperature. It outlines that the document will explore various prompting techniques, share best practices, and discuss common challenges in prompt design.
Prompt Engineering Overview
This section explains what prompt engineering is and how large language models (LLMs) operate. An LLM is essentially a next-word prediction engine: it takes an input sequence (the prompt) and predicts subsequent tokens based on patterns learned from training data. Prompt engineering is defined as the craft of designing high-quality prompts that guide the model toward producing accurate and relevant outputs. It involves experimenting with the wording, length, format, and context of the prompt to best suit the task. Well-engineered prompts enable LLMs to perform a wide range of tasks, from summarization and question-answering to code generation and translation, without needing additional training. In short, this overview emphasizes that a prompt is the user's tool to steer the model's generative process effectively.
LLM Output Configuration
Large language models offer various settings that control the behavior of their outputs. After choosing a model, a prompt engineer should also tune these parameters to fit the task at hand. Key configuration options include the maximum output length (how many tokens the model should generate) and the sampling settings that affect randomness and creativity in the output. Properly adjusting these settings is an important part of prompt engineering, as they can influence the detail, style, and reliability of the model's responses.
Temperature
Temperature controls the degree of randomness in the model's token selection. A low temperature (near 0) makes the output more deterministic and focused: the model will consistently choose the highest-probability next token, yielding a precise but potentially plain response. In contrast, a higher temperature allows more randomness, leading to more varied or creative outputs (the model is more likely to pick less probable words occasionally). Extremely high temperatures can result in incoherent or unpredictable text, while a temperature of 0 (greedy decoding) means the model always picks the top prediction (though if two tokens are equally likely, results might still vary on those ties). In practice, for tasks that require factual accuracy or consistency, a lower temperature is used, whereas for open-ended creative tasks, a moderate increase in temperature can produce more interesting results.
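To make the mechanism concrete, here is a minimal sketch (not Gemini's actual decoder) of how temperature reshapes a toy distribution over three candidate tokens before sampling; the logit values are made up for illustration.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=np.random.default_rng(0)):
    """Pick a next-token index from raw scores, illustrating temperature."""
    logits = np.asarray(logits, dtype=float)
    if temperature == 0.0:
        # Greedy decoding: always take the single most likely token.
        return int(np.argmax(logits))
    # Dividing by the temperature sharpens (T < 1) or flattens (T > 1) the distribution.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.2]                            # toy scores for three candidate tokens
print(sample_next_token(logits, temperature=0.0))   # deterministic: always index 0
print(sample_next_token(logits, temperature=1.5))   # sampled; may pick a less likely token
```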
Top-K and Top-P Sampling
Top-K and Top-P (nucleus) sampling are techniques to control output randomness by limiting the pool of candidate tokens. With Top-K sampling, at each step the model considers only the K most likely token options (with K=1 meaning it always takes the single most likely token). A smaller K makes the output more focused and repeatable, while a larger K allows more diversity. Top-P sampling, on the other hand, keeps the smallest set of top tokens whose cumulative probability reaches P (for example, P=0.9 includes a varying number of top tokens until their probabilities sum to 90%). This means the number of candidates can vary, but the idea is similar: a lower P (closer to 0) acts like a stricter filter for likely tokens (more conservative output), whereas P closer to 1 gives the model free rein (more creative output). These settings can be used alone or together with temperature. Prompt engineers often experiment with combinations: for instance, setting a moderately high Top-K or Top-P to allow some variety, then using a temperature to shuffle among those choices. The goal is to find a balance where the model's output is neither too random nor too repetitive, aligning with the desired style of response.
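The filtering step itself is easy to picture in code. The sketch below, which assumes an already-normalized probability vector and is not tied to any particular API, applies Top-K and/or Top-P to a toy distribution and renormalizes whatever survives.

```python
import numpy as np

def filter_top_k_top_p(probs, k=None, p=None):
    """Keep only the Top-K tokens and/or the Top-P (nucleus) mass, then renormalize."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]                   # indices from most to least likely
    keep = np.zeros_like(probs, dtype=bool)
    if k is not None:
        keep[order[:k]] = True                        # Top-K: the k most likely tokens
    if p is not None:
        cumulative = np.cumsum(probs[order])
        cutoff = int(np.searchsorted(cumulative, p)) + 1
        nucleus = np.zeros_like(probs, dtype=bool)
        nucleus[order[:cutoff]] = True                # Top-P: smallest set reaching mass p
        keep = nucleus if k is None else (keep & nucleus)
    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()

probs = [0.5, 0.3, 0.15, 0.05]
print(filter_top_k_top_p(probs, k=2))     # only the two most likely tokens remain
print(filter_top_k_top_p(probs, p=0.9))   # the top tokens covering 90% of the mass
```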
Zero-shot Prompting
Zero-shot prompting is the simplest prompting approach: the prompt contains only a description of the task or a question, with no examples provided. The model must respond based solely on its understanding of the instruction and its learned knowledge. The term "zero-shot" implies that the model is given zero examples of what the output should look like. This method relies on the model's ability to generalize from the task description alone. Zero-shot prompts are useful for straightforward queries or when example data isn't available. However, if the model's output is unsatisfactory or ambiguous, the guide suggests moving to one-shot or few-shot prompting (adding examples) to better illustrate the desired response or format.
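As a concrete illustration, here is a hypothetical zero-shot prompt for sentiment classification (the review text and label set are made up): an instruction and the input, and nothing else.

```python
# Zero-shot: instruction plus input, no worked examples.
review = "The plot was predictable, but the performances kept me watching."
prompt = (
    "Classify the following movie review as POSITIVE, NEUTRAL or NEGATIVE.\n"
    f"Review: {review}\n"
    "Sentiment:"
)
print(prompt)  # send this string to whichever model you are using
```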
One-shot & Few-shot Prompting
In one-shot prompting and few-shot prompting, the prompt includes one or more examples of the task to guide the model. Instead of just giving an instruction, you show the model how it should respond by providing sample input-output pairs. In a one-shot prompt, a single example is given, while a few-shot prompt provides multiple examples (few could be 3, 5, or any small number). These examples are typically presented in the prompt before the actual question or input that the user wants answered. By doing this, the model can infer the pattern or style of the desired output from the examples. For instance, if the task is sentiment analysis, a few-shot prompt might show a couple of movie reviews along with the correct sentiment for each, and then ask the model to classify a new review. Providing these demonstrations often makes the model's responses more accurate and aligned with the user's expectations. The guide notes that the optimal number of examples can depend on the complexity of the task and the model's capacity, but including even a handful of diverse examples can significantly improve performance over zero-shot prompting.
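Building on the zero-shot sketch above, a few-shot version of the same task might be assembled like this (the example reviews and labels are invented for illustration):

```python
# Few-shot: labelled demonstrations first, then the new input to classify.
examples = [
    ("A disappointing sequel with no new ideas.", "NEGATIVE"),
    ("An absolute delight from start to finish.", "POSITIVE"),
    ("It was fine; I'd probably watch it again on a long flight.", "NEUTRAL"),
]
new_review = "The soundtrack alone makes this worth seeing."

lines = ["Classify the movie review as POSITIVE, NEUTRAL or NEGATIVE.", ""]
for text, label in examples:
    lines += [f"Review: {text}", f"Sentiment: {label}", ""]
lines += [f"Review: {new_review}", "Sentiment:"]
prompt = "\n".join(lines)
print(prompt)
```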
Step-back Prompting
Step-back prompting is a technique where the prompt is structured in two stages: first asking a broad or related question, then using that answer to inform the final request. In other words, you prompt the model to think about a general principle or context before tackling the specific problem. The approach leverages the model's ability to retrieve and articulate relevant background knowledge. For example, before asking the model to solve a particular problem, you might ask it a more general question like "What are some general strategies for solving this type of problem?" The model's answer (the general strategies) can then be included or used implicitly when asking the specific question. By "stepping back" in this way, the model often produces more insightful and accurate results because it has had a chance to consider the big picture or underlying concepts first. The guide points out that step-back prompting encourages critical thinking and can draw out knowledge that wouldn't surface if the model jumped straight into the narrow task. It also can help reduce bias, since the model starts from a neutral, general perspective before dealing with potentially bias-laden specifics.
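A minimal two-call sketch of this pattern is shown below; the generate helper is a stand-in for a real model call (for example a Gemini request), and both questions are invented for illustration.

```python
def generate(prompt):
    """Placeholder for a real model call; returns canned text so the sketch runs."""
    return f"(model output for: {prompt[:60]}...)"

specific_question = "How should I prepare my vegetable garden for an unusually dry summer?"

# Step 1: step back and ask the broader question first.
background = generate("What general principles help plants survive drought conditions?")

# Step 2: reuse the general answer as context for the specific request.
answer = generate(
    f"Background knowledge:\n{background}\n\n"
    f"Using the background above, answer this question: {specific_question}"
)
print(answer)
```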
Chain-of-Thought (CoT) Prompting
Chain-of-Thought prompting is a method that gets the model to produce intermediate reasoning steps or explanations as part of its answer. Instead of answering directly, the model is guided (through the prompt) to "think step by step" and lay out its reasoning process. This can be done implicitly by adding something like "Let's think step by step" to the prompt or explicitly by providing an example where a question is answered with a reasoning trail. The advantage of CoT prompting is that it often leads to more accurate answers on complex problems, because the model essentially breaks the task down and works through it systematically. It also provides transparency: you can see the model's line of thought, which can be useful for diagnosing errors or verifying the logic. The guide emphasizes that chain-of-thought doesn't require special model training; it is a prompt technique that works with off-the-shelf LLMs, and it tends to significantly improve performance on tasks that require reasoning, arithmetic, or multi-step inference. One important aspect is to ensure the model does eventually output a final answer after the reasoning steps. (For example, the prompt or few-shot examples might be designed so that the last line of the model's answer is "Therefore, the answer is X.") That way, you can easily extract the final result from the detailed reasoning the model provides.
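A minimal zero-shot CoT prompt might look like the sketch below; the word problem is a stand-in, and the fixed closing line makes the final answer easy to pull out later.

```python
# Zero-shot chain-of-thought: the trigger phrase plus a fixed answer format.
problem = (
    "When I was 3 years old, my partner was 3 times my age. "
    "I am now 20 years old. How old is my partner?"
)
prompt = (
    f"{problem}\n"
    "Let's think step by step, and finish with a final line of the form "
    "'The answer is <number>.'"
)
print(prompt)
```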
Self-Consistency
Self-consistency is an extension of chain-of-thought prompting aimed at increasing the reliability of the final answer. The idea is to have the model generate multiple reasoning paths and answers (by running the CoT prompt several times, possibly with slight randomness each time), and then see which answer appears most frequently. In practice, this might involve sampling the model's chain-of-thought answers at a higher temperature or otherwise inducing varied reasoning. You could ask the model the same CoT-styled question, say, 5 or 10 times. Each run, the model might produce a different chain of reasoning and possibly a different answer. Once you have these multiple answers, you select the answer that is most common among them (or you could have a rule to choose the best reasoned one). The underlying assumption is that the correct answer, supported by sound reasoning, will show up more often than any particular incorrect answer if the model is nudged to explore different reasoning paths. By using a majority vote or consistency check, you filter out outlier responses and reduce the chance of a wrong answer due to a fluke reasoning error. The guide notes that self-consistency makes chain-of-thought more robust: it leverages the ensemble of the model's "thoughts" to converge on a trustworthy answer.
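The voting loop itself is small. In the sketch below, generate is a placeholder for a real model call sampled with some randomness (here it returns a canned toy reply so the script runs), and the answer-extraction regex assumes the CoT prompt format shown earlier.

```python
import re
from collections import Counter

def generate(prompt, temperature=0.7):
    """Placeholder for a real model call at temperature > 0; canned toy reply."""
    return "120 - 45 = 75, then 75 + 30 = 105.\nThe answer is 105."

def extract_answer(completion):
    """Pull the value out of a trailing 'The answer is <number>.' line, if present."""
    match = re.search(r"The answer is\s*([-\d.,]+)", completion)
    return match.group(1).rstrip(".,") if match else None

prompt = (
    "A store had 120 apples, sold 45, and then received a delivery of 30. "
    "How many apples does it have now?\n"
    "Let's think step by step, and finish with 'The answer is <number>.'"
)

votes = Counter()
for _ in range(10):                          # several independent reasoning paths
    answer = extract_answer(generate(prompt, temperature=0.7))
    if answer is not None:
        votes[answer] += 1

print(votes.most_common(1))                  # the majority-vote answer
```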
Tree-of-Thoughts (ToT)
Tree-of-Thoughts is a prompting approach that generalizes the chain-of-thought idea by allowing the model to branch out into multiple possibilities at each step of reasoning. Instead of producing one linear sequence of thoughts, the model explores a tree of possible thoughts. For example, after an initial thought, it might consider two or three different continuations or approaches to the problem, then expand each of those in further steps, and so on. This method is akin to a search or brainstorming process, where the model isn't committing to a single line of reasoning but is investigating many. The guide explains that Tree-of-Thoughts is particularly useful for very complex tasks, where a single chain-of-thought might get stuck or miss the solution. By keeping track of a "tree" of states (partial reasoning sequences) and exploring multiple branches, the model has a better chance to find a correct or creative solution. Intermediate branches can be evaluated and pruned, focusing the model's effort on the most promising paths. Essentially, ToT prompting turns the problem-solving process into a search through the space of possible reasonings. It's more complex to implement (since you need to manage the branching and selection of thoughts), but it can outperform standard CoT on problems that benefit from exploration and deliberation. The guide references research (the "LLM Guided Tree-of-Thought" paper) that this technique is based on.
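One way to picture the bookkeeping is a small beam search over partial reasoning paths. The sketch below is only a skeleton: propose and score stand in for prompts that ask the model for candidate next steps and for a rating of a partial solution, and the search strategy is a simplification of what the referenced research explores.

```python
# Placeholders for model-backed helpers; both would be implemented as prompts.
def propose(problem, thoughts_so_far, n=3):
    """Ask the model for up to n candidate next reasoning steps."""
    raise NotImplementedError

def score(problem, thoughts_so_far):
    """Ask the model (or a heuristic) to rate a partial reasoning path; higher is better."""
    raise NotImplementedError

def tree_of_thoughts(problem, depth=3, beam_width=2):
    frontier = [[]]                                   # start from an empty reasoning path
    for _ in range(depth):
        candidates = []
        for path in frontier:
            for thought in propose(problem, path):
                candidates.append(path + [thought])   # branch: extend each path several ways
        candidates.sort(key=lambda path: score(problem, path), reverse=True)
        frontier = candidates[:beam_width]            # prune: keep only the best branches
    return frontier[0]                                # most promising complete path

# best_path = tree_of_thoughts("Use the digits 4, 7, 8, 8 once each to make 24.")
```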
ReAct (Reason + Act)
ReAct prompting, short for "Reason and Act," is a paradigm where the model not only reasons in natural language but also produces action commands that can interact with external tools or environments. In a ReAct prompt, the model's response alternates between thoughts (the reasoning steps, typically in natural language) and actions (special outputs that might trigger an external API call, a web search, a calculator, etc.). This technique enables the LLM to solve more complex queries by retrieving information or performing operations during its reasoning process.
The guide gives an example using an agent with the LangChain framework: the model is asked a question about members of a band, and it responds by issuing search actions (like queries to a search engine) and then reasoning based on the results. Essentially, ReAct combines the analytical approach of chain-of-thought with the capability to gather new evidence. The model might say (as a thought) it needs a certain piece of information, then (as an action) request that information via a tool, then incorporate what it finds into the next thought, and so on. This loop continues until the model has enough information to answer the question. The key benefit is that the model is not limited to its static training data; it can actively fetch data or use tools to perform calculations, which is a step toward more dynamic "agent-like" behavior. The guide points out that ReAct was able to carry out a chain of multiple web searches to arrive at the final answer in the example, demonstrating a successful interplay of reasoning and acting.
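Stripped of framework details, the loop looks roughly like the sketch below. The Thought/Action/Observation format, the generate helper, and the web_search tool are all assumptions here; in practice a framework such as LangChain supplies this plumbing.

```python
import re

# Placeholders for the model call and a search tool.
def generate(transcript):
    raise NotImplementedError("model call: returns the next Thought/Action block")

def web_search(query):
    raise NotImplementedError("tool call: returns a short text result")

def react(question, max_steps=5):
    transcript = (
        "Answer the question by alternating Thought, Action and Observation lines.\n"
        "Use 'Action: Search[<query>]' to look something up, and\n"
        "'Action: Finish[<answer>]' when you have the answer.\n\n"
        f"Question: {question}\n"
    )
    for _ in range(max_steps):
        step = generate(transcript)                   # the model emits a Thought and an Action
        transcript += step + "\n"
        done = re.search(r"Action:\s*Finish\[(.*?)\]", step)
        if done:
            return done.group(1)                      # the loop ends with a final answer
        search = re.search(r"Action:\s*Search\[(.*?)\]", step)
        if search:
            transcript += f"Observation: {web_search(search.group(1))}\n"
    return None                                       # gave up after max_steps

# answer = react("How many studio albums has the band released?")
```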
Automatic Prompt Engineering (APE)
Automatic Prompt Engineering refers to automating the creation and refinement of prompts using the model itself. Writing an optimal prompt by hand can be challenging, so APE proposes to have the model do it: essentially, "ask the model to create a prompt for your task." In the guide, this is illustrated by generating multiple candidate prompts and then selecting the best one. For example, if the task is to get a chatbot to handle band t-shirt orders, you might prompt the model with something like: "We have a band merchandise t-shirt webshop; generate 10 different ways a customer might phrase an order for a Metallica t-shirt in size small." The model then outputs a list of variant prompts (different phrasings a customer might use). These can subsequently be evaluated for quality or tested to see which prompt yields the best results. The highest-performing prompt can be chosen as the one to actually use in the system, and possibly tweaked further. In short, APE turns prompt generation into an iterative loop: use the model to brainstorm prompt ideas, evaluate them (using metrics or human judgment), and refine as needed. This reduces the human guesswork in prompt design and can reveal inventive prompt formulations that a person might not have considered. The guide notes that after selecting a good prompt, you can still tweak it and re-evaluate, and that this method can improve performance while saving time once set up.
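The loop reduces to three steps: brainstorm, evaluate, select. In the sketch below, generate returns canned text so the script runs end to end, and evaluate is a toy stand-in for whichever scoring method (automatic metric, test cases, or human review) fits your task.

```python
def generate(prompt):
    """Placeholder model call; canned variants so the sketch runs end to end."""
    return ("I'd like a Metallica t-shirt in size S.\n"
            "One small Metallica tee, please.\n"
            "Can I order a Metallica shirt, size small?")

def evaluate(candidate):
    """Toy stand-in for real evaluation; replace with a meaningful score."""
    return len(candidate)

# Step 1: have the model brainstorm prompt variants, one per line.
brainstorm = (
    "We have a band merchandise t-shirt webshop. Generate 10 different ways a "
    "customer might phrase an order for a Metallica t-shirt in size S. "
    "Return one variant per line."
)
candidates = [line.strip() for line in generate(brainstorm).splitlines() if line.strip()]

# Steps 2 and 3: score each variant and keep the best one for further tweaking.
best = max(candidates, key=evaluate)
print(best)
```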
Code Prompting
The guide devotes a section to prompting for code-related tasks, acknowledging that code generation and understanding are a major use case for LLMs like Gemini. Even though the model's interface is pure text, it can produce and interpret code when asked properly. Code prompting scenarios include asking the model to write code, explain code, translate code between languages, or debug code. In all cases, the prompts are written in natural language (possibly with code snippets included when explaining or debugging); no special programming API is needed, but careful phrasing and formatting can help. One general tip noted is to ensure the model's output preserves proper formatting (especially for languages like Python where indentation is crucial). The following sub-sections summarize specific examples from the guide on how to prompt for various coding tasks.
Prompts for Writing Code
To get an LLM to generate code, you provide a prompt that describes the programming task or problem. The guide's example shows a prompt where the user asks for a Bash script that renames files in a folder by adding a prefix to each filename. The model (Gemini) is able to produce the requested script in Bash code. Key practices for code generation prompts include specifying the language and the requirements clearly. For instance, the prompt explicitly said "Write a code snippet in Bash that…" and described the desired behavior. The model output a Bash script, complete with comments and proper syntax. This demonstrates how an LLM can serve as a coding assistant: you describe what you want in plain English, and the model translates that into code. The guide also emphasizes that while the code may look correct (and even include comments explaining itself), you should test it. LLMs can sometimes produce subtle bugs or assume things that aren't true, so running the generated code in a safe environment is an important step. In the example, they did execute the script and found it worked correctly, but caution is advised in general.
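A code-generation prompt in this spirit might be built as follows; the exact wording and the "draft_" prefix are illustrative rather than taken verbatim from the guide.

```python
# A hypothetical code-writing prompt: name the language and describe the behavior.
prompt = (
    "Write a code snippet in Bash that asks for a folder name, then renames every "
    "file in that folder by prepending the prefix 'draft_' to its filename."
)
print(prompt)
# Send the prompt to the model, then run the returned script in a disposable
# test directory before trusting it with real files.
```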
Prompts for Explaining Code
This use case involves giving the model some code and asking it to explain the code's function or logic. The guide continues the Bash script example: after obtaining the script, they removed the comments and fed the code back into the model with a prompt like "Explain to me the below Bash code: …" The model's response was a structured explanation, breaking down the script into steps (for example, explaining how it reads the folder name, checks if the folder exists, loops through files, renames them, etc.). This showcases the model's ability to act as a code interpreter or tutor. For someone inheriting code or trying to understand code written by someone else (or by the model itself in a prior step), this approach can save time. The explanations the model provides might sometimes be verbose, but they are useful for comprehending what each part of the code does. The guide's example output explained the Bash script in sections (input, folder existence check, listing files, renaming loop, success message), which is exactly the kind of breakdown a human might do when commenting code. This confirms that prompting an LLM to explain code can yield clear, human-readable insights into the code's behavior.
Prompts for Translating Code
LLMs can also translate code from one programming language to another by prompt. In the guide, after working with the Bash script example, they decided to translate that script into Python. The prompt given was essentially: "Translate the below Bash code to a Python snippet," followed by the Bash code. The model produced a Python version of the file-renaming script. Code translation prompts should specify the source and target languages. The result is that the model acts like a code converter, rewriting logic in a different syntax. This can be very useful when you have legacy code or examples in one language and you want to see how they would look in another language. The guide notes a practical detail: when using the Vertex AI interface, one should toggle the output to "markdown" mode so that the model's response retains proper code formatting (indentation in Python, for example, which might otherwise be lost if the output is treated as plain text). Once the Python code was generated, they ran it to verify it works. This demonstrates how an LLM can accelerate porting solutions between languages or provide a quick draft in a more convenient language.
Prompts for Debugging & Reviewing Code
The guide shows that you can ask the model to debug code by providing the code and an error message or scenario. In the example, they intentionally introduced a bug into the Python script (for instance, calling a toUpperCase() function that does not exist in Python instead of the correct upper() method). The prompt then included the traceback error and the faulty code, with an instruction to "Debug what's wrong and explain how to improve the code." Gemini responded by pinpointing the bug (the undefined function) and suggesting a fix (using prefix.upper() instead).
Importantly, the model didn't stop at the first error; it went on to suggest further improvements to the code: preserving file extensions, handling spaces in file names, using try-except for file operations, etc. This turns the debugging prompt into a mini code review. The model is not only fixing the immediate problem but also offering best practices to enhance the code. This highlights the potential of LLMs to assist in code quality assurance. A developer can use such prompts to get quick feedback on what might be wrong with their code and what could be improved. However, as always, any changes should be reviewed and tested by a human, since the model might suggest fixes that are not entirely context-appropriate. Still, as the guide's outcome shows, the model identified the error correctly and its suggestions aligned well with typical improvements a human reviewer might propose.
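A debugging prompt along these lines can be assembled by pasting both the error and the code into one message; the snippet and error text below are invented stand-ins, not the guide's exact example.

```python
# A hypothetical debugging prompt: include the error output and the full code.
error_text = "NameError: name 'toUpperCase' is not defined"
faulty_code = '''\
prefix = input("Enter a filename prefix: ")
print("Prefix in upper case:", toUpperCase(prefix))  # bug: not a Python function
'''

prompt = (
    "The below Python code gives an error:\n"
    f"{error_text}\n\n"
    "Debug what's wrong and explain how I can improve the code.\n\n"
    f"{faulty_code}"
)
print(prompt)
```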
Multimodal Prompting
Multimodal prompting means giving the model inputs beyond just text (for instance, images, audio, or structured data along with text). The guide briefly addresses this concept to clarify that it is a separate domain from normal text prompting. In the context of Google's models like Gemini, the current focus is on text-based prompts (even code is treated as text). If a model is multimodal, you might be able to provide an image and a question about that image together, or a piece of audio with a text instruction, etc. This approach can lead to richer interactions; for example, asking "Here is a photo of a plant [image]. Will it thrive in low light conditions?" combines visual and text information. The guide notes that such capabilities depend on the model: a multimodal model can accept multiple input formats, whereas Gemini primarily expects text. Because the whitepaper is focused on text prompting for Gemini, it doesn't delve deeply into multimodal examples, but it acknowledges that prompting strategies could similarly be applied when multiple input types are involved. In summary, multimodal prompting expands the input channels to an LLM, but the underlying idea remains the same: you provide context (be it text or otherwise) and instructions, and the model uses all provided information to generate a response.
Best Practices
Drawing from the techniques and examples covered, the guide offers a collection of best practice guidelines for prompt engineering. These are general tips to keep in mind when writing prompts, aimed at improving clarity, relevance, and effectiveness. Prompt engineering often requires iterative experimentation, and these best practices help ensure each iteration is informed by sound principles. Below is a summary of the key best practices highlighted:
Provide Examples
Provide examples in your prompts whenever possible. Demonstrations (one-shot or few-shot examples) within the prompt can significantly improve the model's performance. By showing the model an example of the task with the correct output, you give it a pattern to follow. For instance, if you want a certain format for answers, include a sample question and answer in that format in your prompt. Examples act as a guide or template, reducing ambiguity about what the model should do. This is often the single most effective way to get better results from an LLM on a new task.
Design with Simplicity
Keep prompts clear and simple. Avoid unnecessary complexity in your instructions. A convoluted or wordy prompt can confuse the model, just as it might confuse a human reader. The guide suggests using straightforward language and even enumerating steps or using bullet points if a task has multiple parts (though here we remain in plain text). It also recommends using strong directive verbs to start your prompt (e.g., "List," "Describe," "Translate," "Summarize") so the model immediately knows the action expected. If you find your prompt is lengthy or contains extraneous details, try simplifying it; often a shorter, more direct prompt yields a better response.
Be Specific about the Output
Specify the desired output format, content, or style. The more explicit you are about what you want in the answer, the less guesswork the model has to do. If you need a list of items, say so. If you want the answer in JSON format or as an outline, mention that. If the response should be concise or, conversely, very detailed, include those details in the prompt. For example, telling the model "Give me three bullet points about X" or "Respond in a formal tone" guides it to meet those requirements. The guide's examples illustrate that a prompt with clear instructions ("Generate a three-paragraph article with an introduction, body, and conclusion about …") tends to produce a well-structured result, whereas a vague prompt would make the model unsure how much to write or which aspects to focus on.
Use Instructions over Constraints
Favor instructive prompts over purely restrictive prompts. This means it is better to tell the model what you want it to do than to only tell it what not to do. For instance, saying "Explain this concept in simple terms" is usually more effective than saying "Don't use jargon or complex language." Humans respond well to positive instructions, and LLMs often do too. Overloading a prompt with a long list of "don'ts" can inadvertently confuse the model or even cause it to fixate on the forbidden topics. That is not to say constraints aren't useful; sometimes you must specify things the model should avoid (for safety or clarity). But those should accompany clear directions about what the model should accomplish. The guide notes that a big list of prohibitions might lead the model to focus on avoiding those points at the expense of actually answering the question. In summary, clearly instructing what to do is usually more effective, using constraints only as necessary to set boundaries.
Control the Max Token Length
Limit the length of the output when appropriate. Many models let you set a maximum number of tokens for the response. Using this wisely can prevent the model from rambling or producing more text than you need. You can also encourage brevity through the prompt itself (e.g., "in one sentence" or "no more than 100 words"). This is especially important for tasks where a short answer is expected or when you want to conserve tokens for cost and speed reasons. However, keep in mind that if a response is cut off due to a token limit, you might lose important content, so choose a limit that's sufficient for a complete answer. The guide specifically mentions that in certain prompting strategies like ReAct, without a token limit the model might continue producing unnecessary text after completing the task, hence a limit ensures it stops at the right point. Always align the token limit with the task's needs: for instance, generating a brief summary might warrant a strict limit, whereas writing a detailed article should allow a larger limit to avoid truncation.
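Both levers can be applied together, as in the hedged sketch below; the configuration keys mirror common LLM APIs but exact field names vary by provider, so treat them as illustrative rather than authoritative.

```python
# Illustrative request settings: a hard token cap plus a prompt-level length hint.
generation_config = {
    "max_output_tokens": 256,   # hard stop so the model cannot ramble past the answer
    "temperature": 0.2,
    "top_p": 0.95,
    "top_k": 40,
}
prompt = "Summarize the incident report below in no more than 100 words:\n<report text here>"
# response = client.generate(prompt, **generation_config)   # hypothetical client call
```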
Use Variables in Prompts
Incorporate variables or placeholders into prompts for reusability. If you find yourself writing similar prompts repeatedly (e.g., asking for information about different items), it is more efficient to draft a single prompt template and substitute the changing pieces. For example, "Provide a brief overview of {city}" can serve as a template where {city} is replaced with whatever subject or name you need. This not only saves time but also ensures consistency in the way you ask questions. In programming or API use, you can maintain these prompts as strings with slots and fill in the slots programmatically. The guide highlights that using variables makes maintenance easier: if you need to tweak the wording of the prompt, you do it in one place instead of in every instance. Essentially, treat prompts like code: avoid hard-coding specific values if those values will change, and keep a single source of truth for the prompt format.
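In Python this is just a format string kept in one place; the template text below mirrors the example above, and the city names are arbitrary.

```python
# A single prompt template with a {city} slot, filled in programmatically.
TEMPLATE = "Provide a brief overview of {city}."

def build_prompt(city):
    return TEMPLATE.format(city=city)

print(build_prompt("Amsterdam"))
print(build_prompt("Kyoto"))
```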
Experiment with Input Formats and Writing Styles
Try different prompt phrasings and styles to see what works best. The same request can often be worded in various ways (as a direct command, an open-ended question, a role-play scenario, etc.), and these might yield different qualities of response. The guide encourages prompt engineers to explore these alternatives. For example, if you want a historical explanation, you could prompt with "Explain the event as if I'm new to the subject" versus "Give a detailed historical account of the event." You might even frame one prompt in a casual tone and another in a formal style to see which aligns better with your needs. Similarly, giving context like "You are an expert in X" at the start of a prompt can sometimes influence the model's tone and detail level. In the guide's case, they tried framing a question about a game console in multiple ways (question form, statement form, and instruction form), each producing a slightly different output. By comparing outputs, you can choose the format that best meets your criteria (accuracy, clarity, creativity, etc.). This trial-and-error process is a key part of prompt engineering: small changes in wording can have noticeable effects on the results.
Mix Up Few-shot Example Order
Vary the order of examples in few-shot prompts, especially for classification tasks. When providing several examples, be mindful of unintentional patterns. For instance, if all your provided examples of emails labeled "spam" come first followed by all "not spam" examples, the model might pick up a position bias (thinking the first position is always spam). A better approach is to intermix examples of different classes or outcomes. By shuffling or rotating the order of example demonstrations (and even using different examples on different attempts), you ensure the model is learning the concept rather than the sequence. The guide suggests that overfitting to example order is something to watch out for; mixing classes helps the model generalize. They also provide a heuristic: you might start with around 6 examples in a few-shot prompt as a baseline, and then test to see if adding more or fewer changes the outcome. In any case, diversifying how examples are presented can make the prompt more robust and prevent the model from relying on superficial patterns.
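If the few-shot examples live in a list, as in the earlier few-shot sketch, interleaving the classes can be as simple as shuffling before the prompt is assembled (the example emails and the seed below are arbitrary; the seed only keeps the run reproducible).

```python
import random

examples = [
    ("Win a free cruise, click now!", "SPAM"),
    ("Lunch at noon tomorrow?", "NOT SPAM"),
    ("Your account has been suspended, verify here.", "SPAM"),
    ("Here are the meeting notes from Tuesday.", "NOT SPAM"),
]

random.seed(7)             # arbitrary seed, only for reproducibility
random.shuffle(examples)   # mix the classes so order carries no signal
for text, label in examples:
    print(f"Email: {text}\nLabel: {label}\n")
```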
Adapt to Model Updates
Be prepared to update your prompts when the model changes. AI models are frequently updated, and their behavior can shift with new versions. A prompt that worked perfectly with one version might produce slightly different results with the next, due to changes in the model's knowledge or tuning. The guide advises prompt engineers to keep an eye on release notes of models and retest important prompts when an update occurs. It is a good practice to periodically re-evaluate your prompts even if you haven't changed them, just to ensure they still perform as expected. Sometimes model updates can actually allow you to simplify a prompt (because the model has become better at understanding instructions), or might necessitate adding an extra clarification if the model's focus has changed. Using a prompt development environment like Vertex AI Studio can help track and compare prompt performance across model versions. In short, don't assume prompts are one-and-done; maintain them as living artifacts that might need adjustment over time.
Experiment with Output Formats
Consider requesting structured output to make results easier to use. Depending on your use case, you might ask the model to return answers in a structured format such as JSON or XML or as a well-formatted list. Structured outputs are useful because they can be programmatically parsed. For example, if you ask for a JSON object with specific fields, you can directly feed the model's response into a downstream system. The guide explains additional benefits: a structured format often forces the model to stay on topic (since it has to fit its answer into a given schema), which can reduce irrelevant rambling and even curb tendencies to hallucinate. If the model knows it must produce, say, a list of 3 items with certain subfields, it is less likely to stray into unrelated content. However, the guide also cautions that while structured output is powerful, it can increase the token count significantly. A detailed JSON answer uses a lot of characters for brackets and field names that a plain English answer wouldn't, which can be inefficient. You also run the risk of the model's answer getting cut off if it is very long, which could lead to invalid or incomplete JSON. Therefore, use structured output when it adds clear value, and be ready to handle any formatting issues that come with it.
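A prompt that pins down the structure might look like the sketch below; the field names and the sample text are invented for illustration, and the parse step fails loudly if the model strays from the requested format.

```python
import json

prompt = (
    "Extract the people mentioned in the text below. Return only valid JSON: "
    'a list of objects, each with the keys "name" and "role".\n\n'
    "Text: Ada Lovelace wrote the first published algorithm, and Charles Babbage "
    "designed the Analytical Engine."
)
# reply = generate(prompt)        # hypothetical model call
# people = json.loads(reply)      # raises ValueError if the output is not valid JSON
```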
JSON Repair
Handle JSON (or other structured output) errors when they occur. If you instruct the model to output in JSON, there is a chance the model might produce JSON that isn't perfectly valid, perhaps a missing quote or a truncated list if the output was too long. Instead of discarding such results, a practical solution highlighted in the guide is to use a JSON repair tool. These tools can automatically detect and fix common JSON issues (like adding a missing curly brace or comma). By running the model's output through a repair step, you can often salvage a usable JSON from a nearly-correct response. The guide specifically mentions a Python library called json-repair that can patch up incomplete JSON. The bigger point here is to anticipate that strict format constraints might not always be met 100% by the model due to its token limit or other quirks, and to have a fallback method to clean the output. This way, you don't lose the information the model provided, and you ensure your application can handle the output gracefully.
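A repair pass can sit between the model call and the parser, roughly as below; this assumes the json-repair package (pip install json-repair) and its repair_json helper, so check the library's documentation for the exact API before relying on it.

```python
import json
from json_repair import repair_json   # assumes the json-repair package is installed

truncated = '{"items": ["t-shirt", "hoodie", "poster"'   # output cut off mid-list
fixed = repair_json(truncated)                           # best-effort completion of the JSON
data = json.loads(fixed)                                 # now parses cleanly
print(data["items"])
```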
Working with Schemas
Provide schemas to guide the model's output structure. A schema is like a blueprint for data: it defines what fields or elements are expected and what form they should take. In prompt engineering, you can include a schema (for example, a JSON schema) in your prompt to tell the model exactly what the output should look like. The guide demonstrates this with an e-commerce example, where a JSON schema for a product description is given (fields like name, category, price, features, release_date, etc.). By showing the model this schema before asking for output, you set a clear expectation for the format and types of information required. This can significantly focus the model's response. The model will try to fill in the schema with appropriate values or text, rather than generating a free-form paragraph. Using schemas in prompts is especially useful for complex or data-intensive tasks; it helps the model not only format the answer correctly but also consider each aspect of the schema (ensuring it doesn't forget to mention something important like the price or release date in the example). It effectively reduces the model's decision space to what fits the schema. The guide notes that schemas can also make a model "time-aware" or context-aware by explicitly including fields like dates or context parameters that you want the model to pay attention to. This technique requires a bit more prompt length (since you have to include the schema text in your prompt), but can pay off in the quality and consistency of the outputs.
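The sketch below embeds a small JSON schema in the prompt, loosely modeled on the e-commerce fields mentioned above; the exact schema shape and the product notes are assumptions for illustration.

```python
import json

schema = {
    "type": "object",
    "properties": {
        "name":         {"type": "string"},
        "category":     {"type": "string"},
        "price":        {"type": "number"},
        "features":     {"type": "array", "items": {"type": "string"}},
        "release_date": {"type": "string", "format": "date"},
    },
    "required": ["name", "category", "price"],
}

prompt = (
    "Write a product description as a single JSON object that conforms to this schema:\n"
    f"{json.dumps(schema, indent=2)}\n\n"
    "Product notes: lightweight trail-running shoe, launched March 2024, priced at "
    "129.99, breathable mesh upper, recycled rubber outsole."
)
print(prompt)
```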
Collaborate with Other Prompt Engineers
Don't prompt-engineer in isolation; collaborate and compare. The guide suggests that if possible, having multiple people attempt to craft a prompt for the same task can yield better results. Each person might approach the wording and strategy differently (one might try a storytelling angle, another a bullet list of instructions, etc.). By comparing outcomes, you can identify which approach was most effective. In a team setting, you might conduct prompt reviews or even "prompt hackathons" where several variants are tested side by side. This not only speeds up finding a good solution, but also helps spread knowledge of what works and what doesn't. Everyone can learn from the collective trial and error. Moreover, combining ideas from multiple prompts might produce an even stronger prompt. The guide underscores that prompt engineering, like other forms of engineering, can benefit from teamwork: leveraging diverse perspectives to cover blind spots and spark creativity. Additionally, documenting these team findings (e.g., in a shared document or spreadsheet, as discussed below under documenting prompt attempts) ensures the whole team benefits from individual discoveries.
CoT Best Practices
Best practices specific to Chain-of-Thought prompting include providing the answer last and using a deterministic setup. The guide emphasizes that when you are using CoT, you should structure the model's output to end with the final answer clearly separate from the reasoning. For example, you might prompt the model with an instruction or example format like: "Think step by step, then conclude with the final answer on a new line." This way, no matter how lengthy or complex the reasoning is, you can easily identify the answer at the end (and perhaps automatically extract it). Additionally, it is advised to use a low temperature (ideally 0) for chain-of-thought prompts. The rationale is that logical reasoning should be consistent; randomness can introduce incorrect or divergent steps. Since typically there is one correct answer to a well-defined problem, you want the model's reasoning to converge on that answer reliably. By setting temperature to 0, you eliminate the stochastic aspect of generation, making the chain-of-thought repeatable if run again. These tips ensure CoT outputs are easier to parse and trust: you get a clear final answer and a stable reasoning process leading up to it.
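Putting those two tips together: request the completion at temperature 0 and extract the final line mechanically. The completion text below is a fabricated example of what a CoT answer might look like, shown only so the extraction step is concrete.

```python
import re

# A made-up chain-of-thought completion, generated (in practice) at temperature 0.
completion = (
    "When I was 3, my partner was 9, so they are 6 years older than me.\n"
    "Now that I am 20, my partner is 26.\n"
    "The answer is 26."
)

# The prompt asked for a closing line of the form "The answer is <value>."
match = re.search(r"The answer is\s*(.+?)\.?\s*$", completion.strip())
print(match.group(1) if match else "no final answer found")   # -> 26
```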
Document Prompt Attempts
Keep a detailed record of your prompt experiments and their results. The guide's final piece of advice is to treat prompt engineering with the rigor of an experiment-driven process. Whenever you try a new prompt or tweak an existing one, note down what you changed and what the outcome was. Useful information to record includes the prompt text (with version numbers if you iterate), the model used (and its version), the parameter settings (like temperature), and whether the result was satisfactory (possibly rating the output or noting errors). By keeping such records, you won't lose track of what you have attempted, you can revert to earlier prompts if needed, and you build a knowledge base of what works and what doesn't for future reference.
The guide suggests using a structured template (for example, a spreadsheet) to log each prompt trial. Fields in this template might include: prompt name, goal, prompt text, model/version, settings, outcome, and any feedback or notes. If you are using a platform like Vertex AI Studio, it also helps to save your prompts within the tool (with clear names and versions) and link those in your documentation. That way, you can easily re-run a specific prompt from your records with a click.
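One lightweight way to keep such a log is a CSV that mirrors those template fields; the column names and the sample entry below are illustrative, not a prescribed format.

```python
import csv
from datetime import date

FIELDS = ["name", "goal", "model", "temperature", "max_tokens", "top_k", "top_p",
          "prompt", "output", "verdict", "notes", "date"]

attempt = {
    "name": "ticket_summarizer_v3",
    "goal": "One-paragraph summary of a support ticket",
    "model": "gemini (version as deployed)",
    "temperature": 0.1, "max_tokens": 256, "top_k": 40, "top_p": 0.95,
    "prompt": "Summarize the support ticket below in one paragraph: ...",
    "output": "(paste the model output here)",
    "verdict": "OK",
    "notes": "Misses the customer's requested deadline; try adding it to the instruction.",
    "date": date.today().isoformat(),
}

with open("prompt_log.csv", "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    if f.tell() == 0:                  # brand-new file: write the header row once
        writer.writeheader()
    writer.writerow(attempt)
```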
Once you have refined a prompt to near perfection, integrate it into your application codebase carefully. Store the prompt text separately from code (so it is easy to update without altering program logic), and continue to evaluate it over time. Ideally, prompts that are in production should have automated tests or evaluation scripts to regularly check their performance, ensuring they still produce the desired output as the model or usage conditions change. In summary, prompt engineering is an iterative process: always be prepared to adjust and improve your prompts based on new results and developments.
Summary
The Google Prompt Engineering Guide concludes that effective prompting is achievable by anyone willing to apply these techniques and iterate on their approach. It reiterates the core idea that an LLM is a powerful predictive engine which can perform astonishingly well on many tasks if guided properly. The various prompting strategies (from basic zero-shot to advanced methods like CoT, ToT, ReAct, etc.) and best practices discussed in the guide form a toolkit for users to elicit the best possible responses from the model. The summary encourages prompt engineers to be creative yet methodical: try different approaches, observe the model's behavior, and refine the prompt. By combining thoughtful prompt design with careful tuning of model settings (and leveraging tools like prompt libraries and collaboration), users can substantially improve the quality of LLM outputs. Ultimately, prompt engineering is presented as an evolving discipline: as models improve and new techniques emerge, the guide suggests staying curious and continuing to experiment, since there will always be more to learn and optimize.
Endnotes
The guide includes endnotes that reference additional resources and research behind the techniques covered. For example, it cites earlier Google prompting guides and specific academic papers for methods like Chain-of-Thought, Tree-of-Thoughts, and ReAct. These endnotes provide readers with sources for further reading and to credit the original authors of some techniques. They serve as a helpful pointer to dig deeper into prompt engineering topics beyond the scope of the guide itself.