Krrish Dholakia
Ishaan Jaffer

alerting, prometheus, secret management, management endpoints, ui, prompt management, finetuning, batch

note

v1.57.8-stable is currently being tested. It will be released on 2025-01-12.

New / Updated Models​

  1. Mistral large pricing - https://github.com/BerriAI/litellm/pull/7452
  2. Cohere command-r7b-12-2024 pricing - https://github.com/BerriAI/litellm/pull/7553/files
  3. Voyage - new models, prices and context window information - https://github.com/BerriAI/litellm/pull/7472
  4. Anthropic - bump Bedrock claude-3-5-haiku max_output_tokens to 8192

General Proxy Improvements​

  1. Health check support for realtime models
  2. Support calling Azure realtime routes via virtual keys
  3. Support custom tokenizer on /utils/token_counter - useful when checking token count for self-hosted models (see the sketch after this list)
  4. Request Prioritization - support on /v1/completion endpoint as well
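
For example, a token-count request against the proxy looks like the sketch below. Whether a custom tokenizer is applied is decided by your proxy configuration for that model; the model name and key here are illustrative.

curl -X POST 'http://0.0.0.0:4000/utils/token_counter' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "my-self-hosted-llama",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
  }'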

LLM Translation Improvements​

  1. Deepgram STT support. Start Here
  2. OpenAI Moderations - omni-moderation-latest support (example request after this list). Start Here
  3. Azure O1 - fake streaming support. This ensures that if stream=true is passed, the response is returned as a stream. Start Here
  4. Anthropic - non-whitespace char stop sequence handling - PR
  5. Azure OpenAI - support Entra ID username + password based auth. Start Here
  6. LM Studio - embedding route support. Start Here
  7. WatsonX - ZenAPIKeyAuth support. Start Here
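
For example, the new moderation model can be called through the proxy's OpenAI-compatible moderations route - a minimal sketch, assuming the model is configured on your proxy (the key is illustrative):

curl -X POST 'http://0.0.0.0:4000/v1/moderations' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "omni-moderation-latest",
    "input": "sample text to classify"
  }'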

Prompt Management Improvements​

  1. Langfuse integration
  2. HumanLoop integration
  3. Support for using load balanced models
  4. Support for loading optional params from prompt manager

Start Here

Finetuning + Batch APIs Improvements​

  1. Improved unified endpoint support for Vertex AI finetuning - PR
  2. Add support for retrieving Vertex AI batch jobs - PR

NEW Alerting Integration​

PagerDuty Alerting Integration.

Handles two types of alerts:

  • High LLM API Failure Rate. Configure X failures in Y seconds to trigger an alert.
  • High Number of Hanging LLM Requests. Configure X hanging requests in Y seconds to trigger an alert.

Start Here
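
A minimal setup sketch - this assumes PagerDuty is enabled like other alerting integrations (via general_settings.alerting) and that the integration key is read from a PAGERDUTY_API_KEY environment variable; check the linked docs for the exact threshold settings:

export PAGERDUTY_API_KEY="<your-pagerduty-integration-key>"

cat >> config.yaml <<'EOF'
general_settings:
  alerting: ["pagerduty"]
EOF

litellm --config config.yaml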

Prometheus Improvements​

Added support for tracking latency/spend/tokens based on custom metrics. Start Here

NEW Hashicorp Secret Manager Support​

Support for reading credentials + writing LLM API keys. Start Here

Management Endpoints / UI Improvements​

  1. Create and view organizations + assign org admins on the Proxy UI
  2. Support deleting keys by key_alias (see the sketch after this list)
  3. Allow assigning teams to org on UI
  4. Disable using ui session token for 'test key' pane
  5. Show model used in 'test key' pane
  6. Support markdown output in 'test key' pane
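
Deleting by alias could look like the sketch below - this assumes /key/delete accepts a key_aliases list alongside the existing keys field:

curl -X POST 'http://0.0.0.0:4000/key/delete' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "key_aliases": ["my-test-key-alias"]
  }'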

Helm Improvements​

  1. Prevent istio injection for db migrations cron job
  2. Allow using the migrationJob.enabled variable within the job

Logging Improvements​

  1. Braintrust logging - respect project_id, add more metrics - https://github.com/BerriAI/litellm/pull/7613
  2. Athina - support base url - ATHINA_BASE_URL
  3. Lunary - Allow passing custom parent run id to LLM Calls

Git Diff​

This is the diff between v1.56.3-stable and v1.57.8-stable.

Use this to see the changes in the codebase.

Git Diff

Krrish Dholakia
Ishaan Jaffer

langfuse, management endpoints, ui, prometheus, secret management

Langfuse Prompt Management​

Langfuse Prompt Management is being labelled as BETA. This allows us to iterate quickly on the feedback we're receiving, and makes the status clearer to users. We expect this feature to be stable by next month (February 2025).

Changes:

  • Include the client message in the LLM API Request. (Previously only the prompt template was sent, and the client message was ignored).
  • Log the prompt template in the logged request (e.g. to s3/langfuse).
  • Log the 'prompt_id' and 'prompt_variables' in the logged request (e.g. to s3/langfuse).

Start Here
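
As a rough sketch of a request that uses these fields (the model alias and prompt name are illustrative, and the exact request shape may differ - see the docs linked above):

curl -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "my-langfuse-model",
    "prompt_id": "jokes-prompt",
    "prompt_variables": {"topic": "databases"},
    "messages": [{"role": "user", "content": "tell me a joke"}]
  }'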

Team/Organization Management + UI Improvements​

Managing teams and organizations on the UI is now easier.

Changes:

  • Support for editing user role within team on UI.
  • Support updating team member role to admin via API - /team/member_update (see the sketch after this list)
  • Show team admins all keys for their team.
  • Add organizations with budgets
  • Assign teams to orgs on the UI
  • Auto-assign SSO users to teams

Start Here
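
For the /team/member_update route, a request might look like this sketch (field names are assumed from the endpoint's purpose - check the linked docs for the exact schema):

curl -X POST 'http://0.0.0.0:4000/team/member_update' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "team_id": "my-team-id",
    "user_id": "my-user-id",
    "role": "admin"
  }'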

Hashicorp Vault Support​

We now support writing LiteLLM Virtual API keys to Hashicorp Vault.

Start Here

Custom Prometheus Metrics​

Define custom Prometheus metrics, and track usage/latency/number of requests against them.

This allows for more fine-grained tracking - e.g. on the prompt template passed in request metadata.

Start Here
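
For instance, assuming the proxy is configured to track a custom label like prompt_template (configuration details are in the linked docs), requests that carry it in their metadata get broken out in the emitted metrics:

curl -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "hi"}],
    "metadata": {"prompt_template": "rag-answer-v2"}
  }'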

Krrish Dholakia
Ishaan Jaffer

docker image, security, vulnerability

0 Critical/High Vulnerabilities

What changed?​

  • LiteLLMBase image now uses cgr.dev/chainguard/python:latest-dev

Why the change?​

To ensure there are 0 critical/high vulnerabilities in the LiteLLM Docker image

Migration Guide​

  • If you use a custom Dockerfile with litellm as a base image and rely on apt-get

Use apk instead of apt-get; the base litellm image no longer has apt-get installed.

You are only impacted if you use apt-get in your Dockerfile.

# Use the provided base image
FROM ghcr.io/berriai/litellm:main-latest

# Set the working directory
WORKDIR /app

# Install dependencies - CHANGE THIS to `apk`
RUN apt-get update && apt-get install -y dumb-init

Before Change

RUN apt-get update && apt-get install -y dumb-init

After Change

RUN apk update && apk add --no-cache dumb-init

Krrish Dholakia
Ishaan Jaffer

deepgram, fireworks ai, vision, admin ui, dependency upgrades

New Models​

Deepgram Speech to Text​

New Speech to Text support for Deepgram models. Start Here

from litellm import transcription
import os

# set api keys
os.environ["DEEPGRAM_API_KEY"] = ""
audio_file = open("/path/to/audio.mp3", "rb")

response = transcription(model="deepgram/nova-2", file=audio_file)

print(f"response: {response}")

Fireworks AI - Vision support for all models​

LiteLLM supports document inlining for Fireworks AI models. This is useful for models that are not vision models, but still need to parse documents/images/etc. LiteLLM will add #transform=inline to the url of the image_url, if the model is not a vision model. See Code
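
For example, sending an image/document URL to a non-vision Fireworks model through the proxy might look like the sketch below (the model alias is illustrative - use whatever name you configured for a Fireworks AI model); LiteLLM appends the #transform=inline fragment for you:

curl -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "my-fireworks-llama",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What does this document say?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/invoice.png"}}
      ]
    }]
  }'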

Proxy Admin UI​

  • Test Key Tab displays model used in response
  • Test Key Tab renders content in .md, .py (any code/markdown format)

Dependency Upgrades​

Bug Fixes​

Krrish Dholakia
Ishaan Jaffer

guardrails, logging, virtual key management, new models

info

Get a 7 day free trial for LiteLLM Enterprise here.

No call needed

New Features​

✨ Log Guardrail Traces​

Track the guardrail failure rate and see if a guardrail is going rogue and failing requests. Start here

Traced Guardrail Success​

Traced Guardrail Failure​

/guardrails/list​

/guardrails/list allows clients to view available guardrails + supported guardrail params

curl -X GET 'http://0.0.0.0:4000/guardrails/list'

Expected response

{
  "guardrails": [
    {
      "guardrail_name": "aporia-post-guard",
      "guardrail_info": {
        "params": [
          {
            "name": "toxicity_score",
            "type": "float",
            "description": "Score between 0-1 indicating content toxicity level"
          },
          {
            "name": "pii_detection",
            "type": "boolean"
          }
        ]
      }
    }
  ]
}

✨ Guardrails with Mock LLM​

Send mock_response to test guardrails without making an LLM call. More info on mock_response here

curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-npnwjPQciVRok5yNZgKmFQ" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "hi my email is ishaan@berri.ai"}
    ],
    "mock_response": "This is a mock response",
    "guardrails": ["aporia-pre-guard", "aporia-post-guard"]
  }'

Assign Keys to Users​

You can now assign keys to users via the Proxy UI

New Models​

  • openrouter/openai/o1
  • vertex_ai/mistral-large@2411

Fixes​

Krrish Dholakia
Ishaan Jaffer

key management, budgets/rate limits, logging, guardrails

info

Get a 7 day free trial for LiteLLM Enterprise here.

No call needed

✨ Budget / Rate Limit Tiers​

Define tiers with rate limits. Assign them to keys.

Use this to control access and budgets across a large number of keys.

Start here

curl -L -X POST 'http://0.0.0.0:4000/budget/new' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "budget_id": "high-usage-tier",
    "model_max_budget": {
      "gpt-4o": {"rpm_limit": 1000000}
    }
  }'
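
Then assign the tier when creating a key - a sketch, assuming /key/generate accepts the budget_id of the tier you created above:

curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "budget_id": "high-usage-tier"
  }'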

OTEL Bug Fix​

LiteLLM was double-logging the litellm_request span. This is now fixed.

Relevant PR

Logging for Finetuning Endpoints​

Logs for finetuning requests are now available on all logging providers (e.g. Datadog).

What's logged per request:

  • file_id
  • finetuning_job_id
  • any key/team metadata

Start Here:

Dynamic Params for Guardrails​

You can now set custom parameters (like success threshold) for your guardrails in each request.

See guardrails spec for more details

Krrish Dholakia
Ishaan Jaffer

batches, guardrails, team management, custom auth


info

Get a free 7-day LiteLLM Enterprise trial here.

No call needed

✨ Cost Tracking, Logging for Batches API (/batches)​

Track cost, usage for Batch Creation Jobs. Start here

✨ /guardrails/list endpoint​

Show available guardrails to users. Start here

✨ Allow teams to add models​

This enables team admins to call their own finetuned models via litellm proxy. Start here

✨ Common checks for custom auth​

Calling the internal common_checks function in custom auth is now enforced as an enterprise feature. This allows admins to use litellm's default budget/auth checks within their custom auth implementation. Start here

✨ Assigning team admins​

Assigning team admins is graduating from beta and moving to our enterprise tier. This lets proxy admins delegate key/model management for a team to that team's admins (useful for projects in production). Start here

Krrish Dholakia
Ishaan Jaffer

A new LiteLLM Stable release just went out. Here are 5 updates since v1.52.2-stable.

langfuse, fallbacks, new models, azure_storage

Langfuse Prompt Management​

This makes it easy to run experiments or swap specific models (e.g. gpt-4o to gpt-4o-mini) on Langfuse, instead of making changes in your application. Start here

Control fallback prompts client-side​

Claude prompts are different from OpenAI prompts.

Pass in prompts specific to each model when doing fallbacks. Start here
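
A sketch of what this can look like against the proxy - the fallbacks shape here (a per-model entry with its own messages) is an assumption, so check the linked docs for the exact format:

curl -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize our Q3 results in one paragraph."}],
    "fallbacks": [{
      "model": "claude-3-haiku-20240307",
      "messages": [{"role": "user", "content": "Summarize our Q3 results in one short paragraph. Answer directly, with no preamble."}]
    }]
  }'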

New Providers / Models​

✨ Azure Data Lake Storage Support​

Send LLM usage (spend, tokens) data to Azure Data Lake. This makes it easy to consume usage data on other services (e.g. Databricks). Start here

Docker Run LiteLLM​

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.55.8-stable

Get Daily Updates​

LiteLLM ships new releases every day. Follow us on LinkedIn to get daily updates.