
Optimizing Prompts for Language Model Pipelines: DSPy MIPROv2

2024-10-29


I’ve been diving deep into prompt optimization for large language models (LLMs) lately, especially as we build more complex NLP pipelines — or Language Model Programs.


These are workflows that chain together multiple LLM calls to tackle sophisticated tasks.


While powerful, these pipelines aren’t straightforward to design: each module needs its own prompt, those prompts have to work well together, and crafting them by hand is both time-consuming and inefficient.


Recently, I came across a workflow by Karthik Kalyanaraman that offers a practical approach to prompt optimization for multi-stage LLM programs.


But before diving into that, I want to quickly summarize the 2 main challenges:


The proposal challenge: the space of possible instructions and demonstrations for each module is huge, so we need a way to propose a small number of strong candidates.

The credit assignment challenge: we only get a score for the pipeline’s final output, with no module-level labels or gradients, so it is hard to tell which module’s prompt helped or hurt.


MIPROv2 tackles both the proposal and credit assignment challenges by:


Bootstrapping demonstration sets from a labeled training set and generating instructions grounded in the dataset and the program’s code (proposal).

Searching over combinations of instructions and demos with Bayesian Optimization, scored only by the end-to-end metric (credit assignment).


Here’s a breakdown of how it works.



Step 1: Generating Effective Demos


The first step is to create a solid set of demonstration examples — or “demos” — that showcase what ideal input-output pairs look like.


Using a labeled training dataset, we generate multiple sets of demos.


Each set includes:


Labeled demos: examples taken directly from the training data.

Bootstrapped demos: examples generated by running the program itself, where we keep only the outputs that pass our metric.

For instance, if we want to generate two sets of ten demos each, we might select five labeled examples and create five bootstrapped ones for each set.


This mix helps the model see both real and generated examples that meet our standards.

import dspy
import random
from loguru import logger
# Signatures used for bootstrapping demos (implementation elided, see source code)
from .signatures import Step1BootstrapFewShot, GenerateExampleResponse

class Step1BootstrapFewShotModule(dspy.Module):
    # ... check implementation in source code

NUM_INSTRUCTIONS = 2
NUM_SETS = 2

# Generate few-shot demo sets from the first 20 labeled training examples
# (trainset holds labeled examples, e.g. GSM8K question/answer pairs)
demo_generator = Step1BootstrapFewShotModule(
    trainset=trainset[:20],
    num_sets=NUM_SETS,        # number of demo sets to produce
    num_labeled_shots=5,      # labeled examples per set
    num_shuffled_shots=3,
    metric="accuracy"         # metric used to accept bootstrapped demos
)
bootstrap_few_shot_examples = demo_generator()
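
The module’s internals are elided above, but the core bootstrapping idea is simple: run the current program on a labeled input and keep the resulting trace as a demo only if it passes the metric. Here is a minimal, hypothetical sketch of that idea (not the actual Step1BootstrapFewShotModule implementation):

import dspy

def bootstrap_one_demo(program, example, metric):
    """Hypothetical helper: turn one labeled example into a bootstrapped demo.

    program: any dspy.Module that answers questions
    example: a labeled item with .question and .answer fields
    metric:  callable(example, prediction) -> bool
    """
    prediction = program(question=example.question)
    if metric(example, prediction):
        # Keep the model-generated answer as a demo that meets our standards
        return dspy.Example(question=example.question, answer=prediction.answer)
    return None  # discard traces that fail the metric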


Step 2: Crafting the Instructions


Next, we aim to generate instructions that will guide the model to produce the desired outputs. We use two main inputs:


The demo sets from Step 1, which capture the intent of the dataset and the kinds of problems it contains.

The program’s source code, which summarizes what the pipeline is actually trying to accomplish.

Using these inputs, we generate a set of instructions that reflect both the nature of the problems illustrated by the demos and the overall goal of our program.

import dspy
from loguru import logger
# pylint: disable=relative-beyond-top-level
# Signatures for summarizing the dataset and program and for proposing
# instructions (implementations elided, see source code)
from .signatures import (
    Step2GenerateDatasetIntent,
    Step2GenerateProgramSummary,
    Step2GenerateInstruction
)


class Step2GenerateInstructionModule(dspy.Module):
    # ... check implementation in source code

# Generate candidate instructions from the Step 1 demos and the program's code
instruction_generator = Step2GenerateInstructionModule(
    few_shot_prompts=bootstrap_few_shot_examples,  # demo sets from Step 1
    program_code=str(program_code),                # source code of the LM program
    num_instructions=NUM_INSTRUCTIONS              # how many candidates to propose
)
instructions = instruction_generator()
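
The signature classes are elided above, but to make the idea concrete, here is a hypothetical sketch of what an instruction-proposing signature could look like in DSPy (the name and field descriptions are illustrative, not the actual Step2GenerateInstruction):

import dspy

class ProposeInstruction(dspy.Signature):
    """Propose a task instruction grounded in the dataset and the program."""
    dataset_summary = dspy.InputField(desc="what the training examples are about")
    program_summary = dspy.InputField(desc="what the LM program's code does")
    few_shot_examples = dspy.InputField(desc="demo sets bootstrapped in Step 1")
    proposed_instruction = dspy.OutputField(desc="a candidate instruction for the module")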


Step 3: Optimizing the Prompt


Finally, we use a Bayesian Optimization approach to find the best combination of demos and instructions. This involves running several evaluation trials where we:


Sample a candidate combination of one instruction and one demo set.

Build a prompt from that combination and evaluate it on validation examples with our metric.

Use the observed scores to decide which combinations to try next, and keep the best-performing prompt.


The snippet below assembles the candidate prompts; a sketch of the trial loop itself follows after the code.

import dspy  
from loguru import logger  
from dspy.datasets.gsm8k import GSM8K  
from dotenv import find_dotenv, load_dotenv  
from src.simple_miprov2.programs.step1_bootstrap_few_shot.program import (  
    Step1BootstrapFewShotModule  
)  
from src.simple_miprov2.programs.step2_bootstrap_instruction.program import (  
    Step2GenerateInstructionModule  
)  
from src.simple_miprov2.programs.step3_generate_final_prompt.program import (  
    Step3GenerateFinalPromptModule  
)  

# ... check implementation in source code  

lm = dspy.LM(model='gpt-4', max_tokens=250, cache=False)  
dspy.settings.configure(lm=lm)  

if __name__ == "__main__":  

    # ... check implementation in source code  

    # Run the generate final prompt program to generate a final prompt  
    logger.info("Step 3: Running generate final prompt program")  
    final_prompts = []  
    for instruction, few_shot_examples in zip(  
        instructions, bootstrap_few_shot_examples  
    ):  
        # convert few_shot_examples to a string  
        few_shot_examples_str = ""  
        for example in few_shot_examples:  
            try:  
                input_str = example["question"]  
                output_str = example["answer"]  
                few_shot_examples_str += (  
                    f"Question: {input_str}\nExpected Answer: {output_str}\n\n"  
                )  
            # pylint: disable=broad-exception-caught  
            except Exception as e:  
                logger.error(f"Error: {e}")  
        generate_final_prompt_program = Step3GenerateFinalPromptModule(  
            instruction=instruction,  
            few_shot_examples=few_shot_examples_str  
        )  
        final_prompt = generate_final_prompt_program()  
        final_prompts.append(final_prompt["final_prompt"])  

    logger.info("Final prompts:")  
    for i, prompt in enumerate(final_prompts, 1):  
        logger.info(f"  Prompt {i}: {prompt}")


By the end of this process, we have prompts that are optimized to guide the model effectively, based purely on our initial labeled dataset and without requiring module-level labels or gradients.


For more information, please check the source code, the original blog post, and the MIPRO docs.


If you have any questions, please leave a comment!


Bonus Content: Building with AI


And don’t forget to have a look at some practitioner resources that we published recently:


Llama 3.2-Vision for High-Precision OCR with Ollama

LitServe: FastAPI on Steroids for Serving AI Models — Tutorial with Llama 3.2 Vision

Run FLUX Models Locally on Your Mac!

GOT-OCR2.0 in Action: Optical Character Recognition Applications and Code Examples


Thank you for stopping by and for being an integral part of our community.


Happy building!