How to Optimize Your LangChain Workflow with Parallel Execution

To reduce response time when using LangChain for multi-step workflows, you can apply several optimization strategies that minimize redundant processing, cut unnecessary API calls, and streamline the logic. Here are a few strategies with example code:

  1. Batch Processing Multiple Prompts
    Rather than running multiple sequential prompts (which can be slow), you can batch them together or optimize the process by handling multiple inputs at once in one prompt, which reduces the number of calls to the language model.

  2. Use Custom Caching
    LangChain supports caching intermediate results. This way, if a step is repeated with the same input, the result is retrieved from cache instead of running the model again, thus saving time.

  3. Minimize API Calls
    Instead of creating separate chains for each part of the workflow, you can pass information between steps more efficiently within a single chain, reducing the number of interactions with the language model.

  4. Parallelize Independent Tasks
    If you have multiple independent tasks (e.g., technical vs managerial decisions), you can run them in parallel rather than sequentially, thus speeding up the process.

  5. Optimize Prompt Templates
    Instead of creating separate prompts for each conditional route, combine similar steps into a single, more concise template, reducing complexity and execution time (see the sketch after this list).
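
As a quick illustration of strategy 5, the separate technical and managerial templates used later in this post can be folded into one parameterized template. This is only a minimal sketch: the role input variable and the template wording are illustrative choices, not something defined elsewhere in this post.

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.llms import OpenAI

llm = OpenAI(model="text-davinci-003", temperature=0.7)

# One template covers every conditional route; the "role" input selects the
# perspective, so there is no separate prompt to maintain per branch.
routed_prompt = PromptTemplate(
    input_variables=["role", "context"],
    template="Acting as a {role} lead, provide a solution for the context: {context}",
)

routed_chain = LLMChain(llm=llm, prompt=routed_prompt)

# The same chain serves both routes
technical = routed_chain.run({"role": "technical", "context": "Server downtime"})
managerial = routed_chain.run({"role": "managerial", "context": "Server downtime"})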

Batch Processing Multiple Prompts
In this approach, we’ll combine multiple prompts into a single request, reducing redundant API calls. This works well when you have related tasks (e.g., generating both technical and managerial solutions for the same context).

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.llms import OpenAI

# Setup LLM (e.g., GPT model)
llm = OpenAI(model="text-davinci-003", temperature=0.7)

# Define the template for both technical and managerial solutions
batch_prompt = PromptTemplate(
    input_variables=["context"],
    template="""
    Given the context: {context},
    1. Provide a **technical solution** to address the problem.
    2. Provide a **managerial solution** to handle the situation.
    """
)

# Create the chain with a batch prompt
batch_chain = LLMChain(llm=llm, prompt=batch_prompt)

# Function to process multiple tasks in batch
def process_batch(context):
    # Call the chain with the context
    result = batch_chain.run({"context": context})
    return result

# Example input context
context = "The team is facing issues with server downtime and performance degradation."

# Run the batch processing
result = process_batch(context)
print(result)

How It Works:
Prompt Template: We create a single prompt template that asks the model to provide both a technical solution and a managerial solution based on the same context, combining both tasks into one request to avoid multiple calls to the language model.

Single Chain: The batch_chain processes both tasks in a single call, improving efficiency and reducing response time.

Batch Processing: The process_batch function sends the context to the model, which processes both the technical and managerial solutions in one go.

Result: The output is returned as a single response with both solutions, saving time compared to handling them separately.
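
If you also have several different contexts to handle, the same chain can process them as a true batch instead of being called in a loop. A minimal sketch, assuming the legacy LLMChain.apply method (it takes a list of input dicts and returns one result dict per input); the example contexts are made up:

# Several contexts handled in one batch call to the chain
contexts = [
    {"context": "The team is facing issues with server downtime."},
    {"context": "Customer complaints about slow page load times are increasing."},
]

# apply runs the chain over the whole list and returns results in order,
# each as a dict whose "text" key holds the generated answer
batch_results = batch_chain.apply(contexts)

for inputs, output in zip(contexts, batch_results):
    print(inputs["context"])
    print(output["text"])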

Custom Caching with a Dictionary
In this example, we will use a Python dictionary to implement a simple in-memory cache for caching the results of prompts. When a result is generated for a specific context, it is stored in the cache. On subsequent runs with the same context, the result is fetched from the cache, thus reducing response time.

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.llms import OpenAI
import hashlib

# Setup the LLM (e.g., GPT model)
llm = OpenAI(model="text-davinci-003", temperature=0.7)

# Define the prompt template
prompt_template = PromptTemplate(
    input_variables=["context"],
    template="Given the context: {context}, provide a detailed solution.",
)

# Create the chain
chain = LLMChain(llm=llm, prompt=prompt_template)

# Simple in-memory cache (dictionary) to store results
cache = {}

# Function to generate a unique key for caching based on the context
def generate_cache_key(context):
    return hashlib.md5(context.encode()).hexdigest()

# Function to process and use cache
def process_with_cache(context):
    cache_key = generate_cache_key(context)

    # Check if result is already in cache
    if cache_key in cache:
        print("Fetching result from cache...")
        return cache[cache_key]  # Return cached result
    else:
        # If not cached, call the LLM and cache the result
        print("Generating new result...")
        result = chain.run({"context": context})
        cache[cache_key] = result  # Store the result in cache
        return result

# Example context input
context = "The project deadline is approaching, and the team is falling behind schedule."

# Run the process with cache
result = process_with_cache(context)
print(result)

# Run again to see if the result is fetched from cache
result2 = process_with_cache(context)
print(result2)

How It Works:
Cache Key Generation: We generate a unique cache key based on the input context using hashlib.md5(). This ensures that even small variations in input will lead to different cache keys.
Cache Check: Before executing the chain, the process_with_cache() function checks if the result for the given context is already stored in the cache.
Store and Retrieve: If the result is cached, it is retrieved and returned directly. Otherwise, the LLM chain is executed, and the result is stored in the cache for future use.
Benefits:
Faster Responses: On repeated calls with the same context, results are fetched from the cache instead of making an API call.
Efficient API Calls: Reduces the number of calls to the language model, which can save both time and cost (if using a paid API).
Customizable Cache: The caching logic can be extended to use databases (e.g., Redis, SQLite) for more advanced scenarios, or you can implement cache expiration for stale data; a sketch of LangChain's built-in cache follows below.
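
As noted in strategy 2, LangChain also ships a built-in LLM cache that sits in front of every model call and can replace the hand-rolled dictionary entirely. A minimal sketch, assuming the legacy langchain.cache module that matches the imports used above, and continuing with the chain and context already defined; SQLiteCache is the drop-in choice when you want the cache persisted to disk:

import langchain
from langchain.cache import InMemoryCache  # or: from langchain.cache import SQLiteCache

# Every identical (prompt, LLM parameters) pair is now answered from the cache,
# so a repeated chain.run call with the same context skips the API entirely.
langchain.llm_cache = InMemoryCache()
# langchain.llm_cache = SQLiteCache(database_path=".langchain.db")  # persistent variant

first = chain.run({"context": context})   # hits the API
second = chain.run({"context": context})  # served from the cache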

Minimize API Calls by Combining Related Tasks
Here’s how to combine multiple steps in a single prompt, thus minimizing the number of API calls:

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.llms import OpenAI

# Setup LLM (e.g., GPT model)
llm = OpenAI(model="text-davinci-003", temperature=0.7)

# Define a single prompt template that combines multiple tasks (technical and managerial solutions)
combined_prompt = PromptTemplate(
    input_variables=["context"],
    template="""
    Given the following context: {context},
    1. Provide a **technical solution** to address the issue.
    2. Provide a **managerial solution** to handle the situation.
    3. Suggest potential **next steps** after implementing the solutions.
    """
)

# Create the chain using the combined prompt
combined_chain = LLMChain(llm=llm, prompt=combined_prompt)

# Function to process the context and generate results for all tasks in one call
def process_with_combined_prompt(context):
    # Run the combined chain, which will process all tasks in one API call
    result = combined_chain.run({"context": context})
    return result

# Example input context
context = "The team is facing issues with server downtime and performance degradation. The issue needs immediate resolution."

# Run the combined prompt processing
result = process_with_combined_prompt(context)
print(result)

How It Works:
Single Prompt Template: Instead of calling the LLM with separate prompts for technical solutions, managerial solutions, and next steps, all the tasks are combined into one template.
One API Call: The LLMChain only makes one API call to process the entire set of tasks, drastically reducing the number of API calls.
Efficiency: This approach minimizes the overhead of making multiple requests and leverages a single, efficient API call to accomplish multiple tasks.
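
Because the three answers come back as one block of text, you may still want to split them apart afterwards. The sketch below uses plain string handling and assumes the model follows the numbered format requested in the prompt; for anything more robust, a dedicated output parser would be the better choice.

import re

def split_numbered_sections(response):
    # Split on lines that start with "1.", "2.", "3.", ...
    # (assumes the model keeps the numbered format asked for in the prompt)
    parts = re.split(r"\n\s*\d+\.\s*", "\n" + response.strip())
    return [part.strip() for part in parts if part.strip()]

sections = split_numbered_sections(result)
for number, section in enumerate(sections, start=1):
    print(f"Part {number}: {section}")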

Parallel Execution Strategy
Instead of processing each task sequentially (which can take time), we will:

Identify independent tasks (e.g., technical vs. managerial tasks).
Use asyncio to run these tasks in parallel.
Combine the results once all tasks are completed.
Here's example code that parallelizes the technical and managerial solution-generation tasks using LangChain with asyncio:

Code Example: Parallelizing Independent Tasks with asyncio

import asyncio
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.llms import OpenAI

# Setup the LLM (e.g., GPT model)
llm = OpenAI(model="text-davinci-003", temperature=0.7)

# Define Prompt Templates
technical_prompt = PromptTemplate(
    input_variables=["context"],
    template="Provide a technical solution for the context: {context}",
)

managerial_prompt = PromptTemplate(
    input_variables=["context"],
    template="Provide a managerial solution for the context: {context}",
)

# Create individual chains
technical_chain = LLMChain(llm=llm, prompt=technical_prompt)
managerial_chain = LLMChain(llm=llm, prompt=managerial_prompt)

# Function to handle technical solution
async def get_technical_solution(context):
    # acall returns a dict; the generated text sits under the chain's "text" output key
    result = await technical_chain.acall({"context": context})
    return result["text"]

# Function to handle managerial solution
async def get_managerial_solution(context):
    result = await managerial_chain.acall({"context": context})
    return result["text"]

# Main function to execute parallel tasks
async def parallel_execution(context):
    # Run both tasks in parallel using asyncio.gather
    technical_task = get_technical_solution(context)
    managerial_task = get_managerial_solution(context)

    # Wait for both tasks to finish and collect the results
    technical_result, managerial_result = await asyncio.gather(technical_task, managerial_task)

    # Return the results
    return {
        "technical_solution": technical_result,
        "managerial_solution": managerial_result
    }

# Example Context
context = "The company is facing a server crash and downtime."

# Run the parallel tasks and get the results
async def main():
    results = await parallel_execution(context)
    print("Technical Solution:", results["technical_solution"])
    print("Managerial Solution:", results["managerial_solution"])

# Run the main function
asyncio.run(main())
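
Note that asyncio.run(main()) only works from a plain Python script. In an environment that already runs an event loop (a Jupyter notebook, for example), call await main() directly instead.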
