AI21 Labs Unveils Jamba 1.5 LLMs with Hybrid Architecture for Enhanced Reasoning

Jessie A Ellis
Aug 23, 2024 01:33

AI21 Labs introduces Jamba 1.5, a new family of large language models that uses a hybrid architecture to improve reasoning and long-context handling.

AI21 Labs has introduced the Jamba 1.5 model family, a state-of-the-art collection of large language models (LLMs) engineered to excel in a variety of generative AI tasks, according to the NVIDIA Technical Blog.

Hybrid Architecture Delivers Superior Performance

The Jamba 1.5 family combines Mamba and transformer architectures with a mixture of experts (MoE) module. This hybrid design is intended to handle long contexts with low computational overhead while maintaining accuracy on reasoning tasks. The MoE module increases the model’s capacity without a matching increase in compute, because only a subset of the available parameters is used when each token is generated.
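To illustrate how an MoE layer can use only a subset of its parameters per token, here is a minimal sketch of top-k expert routing. It is not AI21's implementation: the expert count, the value of k, the toy experts, and the function names (`route_top_k`, `moe_forward`) are all assumptions chosen for illustration.

```python
import math
import random


def softmax(scores):
    """Convert raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so they sum to 1 over the selected experts.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k=2):
    """Run only the k selected experts and mix their outputs."""
    selected = route_top_k(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in selected)
    return output, [i for i, _ in selected]


# Hypothetical toy experts: each one simply scales its input.
NUM_EXPERTS = 16  # illustrative value, not taken from the article
experts = [lambda x, s=s: x * s for s in range(1, NUM_EXPERTS + 1)]

random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
output, used = moe_forward(3.0, experts, scores, k=2)
print(f"experts evaluated: {len(used)} of {NUM_EXPERTS}")
```

In this sketch the other fourteen experts are never called for the token, which is the sense in which capacity grows without a matching growth in per-token compute.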

Each Jamba block, configured with eight layers and an attention-to-Mamba ratio of 1:7, fits into a single NVIDIA H100 80 GB GPU. The model’s architecture balances memory usage and computational efficiency, making it suitable for various enterprise applications.
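The 1:7 ratio means that an eight-layer block holds one attention layer and seven Mamba layers. A small sketch of that layout follows. The position of the attention layer within the block is not stated here, so the middle position used below is an assumption, and `block_layout` and `model_layout` are hypothetical helper names.

```python
from collections import Counter

LAYERS_PER_BLOCK = 8      # from the article: eight layers per Jamba block
ATTENTION_PER_BLOCK = 1   # from the article: 1:7 attention-to-Mamba ratio


def block_layout(attention_position=LAYERS_PER_BLOCK // 2):
    """Return the layer types of one block as a list of strings.

    attention_position is an assumption; the article gives only the ratio.
    """
    if not 0 <= attention_position < LAYERS_PER_BLOCK:
        raise ValueError("attention_position must fall inside the block")
    return [
        "attention" if i == attention_position else "mamba"
        for i in range(LAYERS_PER_BLOCK)
    ]


def model_layout(num_blocks):
    """Stack num_blocks identical blocks into one flat list of layer types."""
    return [layer for _ in range(num_blocks) for layer in block_layout()]


layout = block_layout()
counts = Counter(layout)
print(layout)
print(f"attention: {counts['attention']}, mamba: {counts['mamba']}")
```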

The Jamba 1.5 models also offer a 256K-token context window, enough to process approximately 800 pages of text. Retaining more relevant information over longer contexts can improve the accuracy of responses.

Enhancing AI Interactivity with Function Calling and JSON Support

One of the standout features of the Jamba 1.5 models is their robust function calling capability with JSON data interchange support. This functionality allows the models to execute complex actions and handle sophisticated queries, enhancing the interactivity and relevance of AI applications.

For instance, businesses can deploy these models for real-time, high-precision tasks such as generating loan term sheets for financial services or acting as shopping assistants in retail environments.
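As a rough illustration of function calling with JSON, the sketch below shows an application receiving a JSON tool call, running the matching function, and returning a JSON result. The message shape (`name` and `arguments` keys), the `monthly_payment` tool, and the `dispatch` helper are hypothetical and do not reproduce the Jamba 1.5 or NVIDIA NIM API format.

```python
import json


def monthly_payment(principal, annual_rate, years):
    """Fixed monthly payment for a fully amortized loan."""
    months = years * 12
    rate = annual_rate / 12
    if rate == 0:
        return principal / months
    return principal * rate / (1 - (1 + rate) ** -months)


# Registry of functions the application exposes to the model (hypothetical).
TOOLS = {"monthly_payment": monthly_payment}


def dispatch(tool_call_json):
    """Parse a JSON tool call, run the named tool, and return a JSON result."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    result = TOOLS[name](**call["arguments"])
    return json.dumps({"name": name, "result": round(result, 2)})


# Stand-in for the JSON a model might emit when asked for loan terms.
model_output = json.dumps({
    "name": "monthly_payment",
    "arguments": {"principal": 250000, "annual_rate": 0.06, "years": 30},
})
print(dispatch(model_output))
```

Keeping the tool registry explicit means the application, not the model, decides which functions can run, and unknown tool names are rejected before any code executes.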

Maximizing Accuracy with Retrieval-Augmented Generation

The Jamba 1.5 models are optimized for retrieval-augmented generation (RAG), which improves their ability to deliver contextually relevant responses. The 256K-token context window allows large volumes of information to be handled without continuous chunking, which suits scenarios that require comprehensive data analysis.

RAG is particularly beneficial in environments with extensive and scattered knowledge bases, enabling the models to retrieve and provide more relevant information efficiently.
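One way to picture RAG with a long context window is to place whole retrieved documents into the prompt until a token budget is reached, rather than splitting them into small chunks. The sketch below assumes that retrieval scores are already available, approximates token counts by counting words, and treats "256K" as 256,000 tokens. The names `pack_context` and `build_prompt` and the reserved answer budget are illustrative assumptions.

```python
CONTEXT_WINDOW = 256_000   # assumption: "256K" read as 256,000 tokens
RESERVED_FOR_ANSWER = 4_000  # hypothetical headroom for the generated answer


def estimate_tokens(text):
    """Crude word-count estimate; a real tokenizer would give other numbers."""
    return len(text.split())


def pack_context(scored_docs, budget_tokens):
    """Greedily keep whole documents, best score first, within the budget.

    scored_docs is a list of (score, text) pairs from some retriever.
    """
    chosen, used = [], 0
    for _, doc in sorted(scored_docs, key=lambda pair: pair[0], reverse=True):
        cost = estimate_tokens(doc)
        if used + cost > budget_tokens:
            continue  # skip documents that do not fit; smaller ones may
        chosen.append(doc)
        used += cost
    return chosen, used


def build_prompt(question, docs):
    """Join the selected documents and the question into one prompt string."""
    context = "\n\n".join(docs)
    return f"Context:\n{context}\n\nQuestion: {question}"


retrieved = [
    (0.91, "Policy manual section on loan eligibility and required documents."),
    (0.42, "Unrelated newsletter about the office relocation schedule."),
    (0.78, "Rate sheet listing current interest rates by loan product."),
]
docs, used = pack_context(retrieved, CONTEXT_WINDOW - RESERVED_FOR_ANSWER)
print(build_prompt("Which documents are needed for a loan?", docs))
```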

Get Started

The Jamba 1.5 models are now available on the NVIDIA API catalog, joining over 100 popular AI models supported by NVIDIA NIM microservices. These microservices simplify the deployment of performance-optimized models for various enterprise applications.

NVIDIA collaborates with leading model builders to support a wide range of models, including Llama 3.1 405B, Mixtral 8x22B, Phi-3, and Nemotron 340B Reward. For more information and to explore these models, visit ai.nvidia.com.
