Supported models
Below is a selection of advanced large language models (LLMs) from industry-leading AI providers. These models offer a range of capabilities to enhance your applications, from text analysis to natural language understanding. Explore the options to find the ideal model that fits your project's needs.
Language Models
claude-3-haiku
Claude 3 Haiku, Anthropic's quickest and smallest AI model, is ideal for high-volume deployments requiring fast responses like knowledge retrieval, sales automation, and real-time customer service.
codellama-70b
The CodeLlama-70B-Instruct model understands natural language instructions and generates code from them, handling tasks such as data manipulation, searching, sorting, filtering, and algorithm implementation.
dbrx
DBRX outperforms models like GPT-3.5 and rivals Gemini 1.0 Pro. Thanks to its mixture-of-experts (MoE) architecture, it excels at coding tasks, surpassing specialized models like CodeLlama-70B.

firefunction-v1
FireFunction is up to 4x faster than GPT-4, supports an 'any' function option for forced routing, and is API compatible with OpenAI. It excels at request routing and structured information extraction, with near GPT-4 quality.
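Because FireFunction is OpenAI API compatible, a function-calling request can be sketched with the standard tools schema. The function name and parameters below are illustrative assumptions, not part of the FireFunction API itself.

```python
import json

def build_routing_request(user_message: str) -> dict:
    """Build an OpenAI-style function-calling request body."""
    return {
        "model": "firefunction-v1",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_order_status",  # hypothetical tool
                    "description": "Look up the status of a customer order",
                    "parameters": {
                        "type": "object",
                        "properties": {"order_id": {"type": "string"}},
                        "required": ["order_id"],
                    },
                },
            }
        ],
        # 'any' asks the model to always route to some function.
        "tool_choice": "any",
    }

request = build_routing_request("Where is order 1234?")
print(json.dumps(request, indent=2))
```

Sending this payload requires an API key and an HTTP client, which are omitted here.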

firellava-13b
FireLLaVA-13B builds on LLaVA, a large multimodal model that pairs a vision encoder with a language model (Vicuna) to provide comprehensive visual and language understanding, emulating the multimodal skills of GPT-4.

gemma-7b
Google DeepMind's Gemma 7B, a lighter model in the Gemini family, excels on academic benchmarks covering math, science, code, reasoning, dialogue, and instruction following.
gpt-3.5-turbo
GPT-3.5 Turbo, optimized for chat, utilizes the Chat Completions API. It's ideal for driving chatbots, grammar checkers, spam filters, and code generators.
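A Chat Completions request for gpt-3.5-turbo can be sketched as a plain request body. The payload shape follows the OpenAI Chat Completions API; the grammar-checker prompt is just an example, and actually sending the request (API key, HTTP client) is not shown.

```python
def build_chat_request(system_prompt: str, user_prompt: str) -> dict:
    """Build a Chat Completions request body for gpt-3.5-turbo."""
    return {
        "model": "gpt-3.5-turbo",
        "messages": [
            # The system message sets the assistant's behavior;
            # the user message carries the actual input.
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.7,
    }

req = build_chat_request(
    "You are a grammar checker. Return the corrected sentence only.",
    "Their going to the park tomorow.",
)
print(req["messages"][1]["content"])
```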
hermes-2-pro
Hermes 2 Pro, an upgrade to OpenHermes 2.5, improves on the original in task handling, conversation, and output quality, surpassing it on benchmarks like AGIEval and TruthfulQA. Its efficiency and cost-effectiveness make it ideal for text generation, automation, and coding.

llama-3-70b
Llama-3-70b excels at generating high-quality dialogue output and demonstrates top-tier performance across various industry benchmarks. It also offers new capabilities, such as enhanced reasoning.
llama-3-8b
The llama-3-8b model uses a prompt format with distinct sections for system instructions, user input, and assistant output, separated by special tokens. It's versatile and can be used for text tasks like summarization, classification, and dialogue.
mistral-7b
With fewer parameters, Mistral-7b outperforms larger models in numerous benchmarks. It excels in English and coding tasks, using Grouped-Query Attention and Sliding Window Attention for speed and efficiency.
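Sliding Window Attention, mentioned above, restricts each token to attending over a fixed-size window of recent positions instead of the full sequence. A toy mask illustrates the idea (Mistral-7B's real window is 4096; 3 here):

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j:
    causal (j <= i) and within the last `window` positions."""
    return [
        [i - window < j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(5, 3)
for row in mask:
    print(["x" if m else "." for m in row])
```

Capping attention to a fixed window keeps per-token cost constant as the sequence grows, which is where the speed and memory savings come from.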

mixtral-8x7b
Mixtral 8x7B, an 8-expert Mixture of Experts (MoE) model, uses a router to select two experts per token, so only about 13 billion of its parameters are active for any given token. This gives it the inference speed of a 13-billion-parameter dense model, in which every parameter is active for every token.
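The top-2 routing described above can be sketched in a few lines: the router scores every expert for a token, the two highest-scoring experts run, and their outputs are mixed by renormalized router weights. The logits below are made-up numbers for illustration.

```python
import math

def top2_route(router_logits: list[float]) -> list[tuple[int, float]]:
    """Return the two selected (expert_index, mixing_weight) pairs."""
    # Softmax over all expert logits.
    mx = max(router_logits)
    exps = [math.exp(x - mx) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the two most probable experts and renormalize their weights
    # so the mixing weights sum to 1.
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    weight_sum = probs[top2[0]] + probs[top2[1]]
    return [(i, probs[i] / weight_sum) for i in top2]

# 8 experts, as in Mixtral 8x7B.
selected = top2_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3])
print(selected)  # two (expert_index, weight) pairs
```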

neuralhermes
NeuralHermes, an enhanced version of the teknium/OpenHermes-2.5-Mistral-7B model, uses Direct Preference Optimization (DPO) with the mlabonne/chatml_dpo_pairs dataset for fine-tuning and outperforms the original in most benchmarks.

Multi-Modal Models
gemini-1.5-pro
Gemini 1.5 Pro provides long-context understanding across modalities. It can handle up to 1 million tokens, suitable for processing large text and codebases, all while maintaining efficiency due to its Mixture-of-Experts (MoE) architecture.
Text Embedding Models
text-embedding-ada-002
text-embedding-ada-002, an embedding model developed by OpenAI, replaces five earlier specialized models for text search, text similarity, and code search.
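Embeddings from a model like text-embedding-ada-002 are typically compared with cosine similarity. The short vectors below are made-up stand-ins for the model's real 1536-dimensional outputs.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.2, 0.8, 0.1]
doc_similar = [0.25, 0.75, 0.05]   # nearly the same direction as query
doc_unrelated = [0.9, -0.3, 0.4]   # points elsewhere
print(cosine_similarity(query, doc_similar))
print(cosine_similarity(query, doc_unrelated))
```

In a search application, the query embedding is compared against stored document embeddings this way, and the highest-scoring documents are returned.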