Supported models
Below is a selection of advanced large language models (LLMs) from industry-leading AI providers. These models offer a range of capabilities to enhance your applications, from text analysis to natural language understanding. Explore the options to find the ideal model that fits your project's needs.
Language Models
claude-3-haiku
Claude 3 Haiku, Anthropic's quickest and smallest AI model, is ideal for high-volume deployments requiring fast responses like knowledge retrieval, sales automation, and real-time customer service.
codellama-70b
The CodeLlama-70B-Instruct model understands natural language instructions and generates code from them, handling tasks such as data manipulation, searching, sorting, filtering, and algorithm implementation.
dbrx
DBRX outperforms models like GPT-3.5 and rivals Gemini 1.0 Pro. Thanks to its mixture-of-experts (MoE) architecture, it excels at coding tasks, surpassing specialized models like CodeLlama-70B.

firefunction-v1
FireFunction is up to 4x faster than GPT-4, supports an 'any' function option for forced routing, and is API compatible with OpenAI. It excels at request routing and structured information extraction, with near GPT-4 quality.
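Because FireFunction is OpenAI API compatible, a function-calling request can be sketched with the standard tools schema. The function name and parameters below are illustrative assumptions, not part of the FireFunction API itself.

```python
import json

def build_routing_request(user_message: str) -> dict:
    """Build an OpenAI-style function-calling request body."""
    return {
        "model": "firefunction-v1",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_order_status",  # hypothetical tool
                    "description": "Look up the status of a customer order",
                    "parameters": {
                        "type": "object",
                        "properties": {"order_id": {"type": "string"}},
                        "required": ["order_id"],
                    },
                },
            }
        ],
        # 'any' asks the model to always route to some function.
        "tool_choice": "any",
    }

request = build_routing_request("Where is order 1234?")
print(json.dumps(request, indent=2))
```

Sending this payload requires an API key and an HTTP client, which are omitted here.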

firellava-13b
FireLLaVA-13B builds on LLaVA, a large multimodal model that pairs a vision encoder with a language model (Vicuna) to provide comprehensive visual and language understanding, emulating the multimodal skills of GPT-4.

gemma-7b
Google DeepMind's Gemma 7B, a lighter model in the Gemini family, excels on academic benchmarks covering math, science, code, reasoning, dialogue, and instruction following.
gpt-3.5-turbo
GPT-3.5 Turbo, optimized for chat, utilizes the Chat Completions API. It's ideal for driving chatbots, grammar checkers, spam filters, and code generators.
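A Chat Completions request for gpt-3.5-turbo can be sketched as a plain request body. The payload shape follows the OpenAI Chat Completions API; the grammar-checker prompt is just an example, and actually sending the request (API key, HTTP client) is not shown.

```python
def build_chat_request(system_prompt: str, user_prompt: str) -> dict:
    """Build a Chat Completions request body for gpt-3.5-turbo."""
    return {
        "model": "gpt-3.5-turbo",
        "messages": [
            # The system message sets the assistant's behavior;
            # the user message carries the actual input.
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.7,
    }

req = build_chat_request(
    "You are a grammar checker. Return the corrected sentence only.",
    "Their going to the park tomorow.",
)
print(req["messages"][1]["content"])
```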
hermes-2-pro
Hermes 2 Pro, an upgrade to OpenHermes 2.5, improves on the original in task handling, conversation, and output quality, surpassing it on benchmarks like AGIEval and TruthfulQA. Its efficiency and cost-effectiveness make it ideal for text generation, automation, and coding.

llama-3-70b
Llama-3-70b excels at generating high-quality dialogue output and demonstrates top-tier performance across various industry benchmarks. It also offers new capabilities, such as enhanced reasoning.
llama-3-8b
The llama-3-8b model uses a prompt format with distinct sections for system instructions, user input, and assistant output, separated by special tokens. It's versatile and can be used for text tasks like summarization, classification, and dialogue.
mistral-7b
With fewer parameters, Mistral-7b outperforms larger models in numerous benchmarks. It excels in English and coding tasks, using Grouped-Query Attention and Sliding Window Attention for speed and efficiency.
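Sliding Window Attention, mentioned above, restricts each token to attending over a fixed-size window of recent positions instead of the full sequence. A toy mask illustrates the idea (Mistral-7B's real window is 4096; 3 here):

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j:
    causal (j <= i) and within the last `window` positions."""
    return [
        [i - window < j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(5, 3)
for row in mask:
    print(["x" if m else "." for m in row])
```

Capping attention to a fixed window keeps per-token cost constant as the sequence grows, which is where the speed and memory savings come from.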

mixtral-8x7b
Mixtral 8x7B, an 8-expert Mixture of Experts (MoE) model, uses a router to select two experts per token, so only about 13 billion of its parameters are active for any given token. This gives it the inference speed of a 13-billion-parameter dense model, in which every parameter is active for every token.
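The top-2 routing described above can be sketched in a few lines: the router scores every expert for a token, the two highest-scoring experts run, and their outputs are mixed by renormalized router weights. The logits below are made-up numbers for illustration.

```python
import math

def top2_route(router_logits: list[float]) -> list[tuple[int, float]]:
    """Return the two selected (expert_index, mixing_weight) pairs."""
    # Softmax over all expert logits.
    mx = max(router_logits)
    exps = [math.exp(x - mx) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the two most probable experts and renormalize their weights
    # so the mixing weights sum to 1.
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    weight_sum = probs[top2[0]] + probs[top2[1]]
    return [(i, probs[i] / weight_sum) for i in top2]

# 8 experts, as in Mixtral 8x7B.
selected = top2_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3])
print(selected)  # two (expert_index, weight) pairs
```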

neuralhermes
NeuralHermes, an enhanced version of the teknium/OpenHermes-2.5-Mistral-7B model, uses Direct Preference Optimization (DPO) with the mlabonne/chatml_dpo_pairs dataset for fine-tuning and outperforms the original in most benchmarks.

Multi-Modal Models
gemini-1.5-pro
Gemini 1.5 Pro provides long-context understanding across modalities. It can handle up to 1 million tokens, suitable for processing large text and codebases, all while maintaining efficiency due to its Mixture-of-Experts (MoE) architecture.
Text Embedding Models
text-embedding-ada-002
text-embedding-ada-002, an embedding model developed by OpenAI, replaces five earlier specialized models for text search, text similarity, and code search.
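Embeddings from a model like text-embedding-ada-002 are typically compared with cosine similarity. The short vectors below are made-up stand-ins for the model's real 1536-dimensional outputs.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.2, 0.8, 0.1]
doc_similar = [0.25, 0.75, 0.05]   # nearly the same direction as query
doc_unrelated = [0.9, -0.3, 0.4]   # points elsewhere
print(cosine_similarity(query, doc_similar))
print(cosine_similarity(query, doc_unrelated))
```

In a search application, the query embedding is compared against stored document embeddings this way, and the highest-scoring documents are returned.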