Databricks, the leading Data and AI company, made several significant announcements at the Data + AI Summit. The newly introduced Lakehouse AI enables customers to develop generative AI applications, including those built on large language models (LLMs), directly within the Databricks Lakehouse Platform. With Lakehouse AI, Databricks aims to unify the data and AI platform so organizations can accelerate their generative AI journey.
Here are the key announcements made by Databricks:
Data-Driven Approach to AI
Maintaining clean, high-quality data is difficult when the data and AI platforms are separate. Databricks addresses this by bringing data, AI models, LLM operations (LLMOps), monitoring, and governance together on the Lakehouse Platform. With that foundation unified, customers can develop generative AI solutions rapidly, whether they start from foundational SaaS models or securely train their own custom models on enterprise data, while keeping everything governed in one place.
Key Capabilities of Lakehouse AI
Vector Search: Vector databases are one of the key pillars of LLM-based applications. They store embeddings and support semantic search, retrieving sentences and phrases with similar meaning rather than exact keyword matches. Databricks Vector Search improves the accuracy of LLM responses by letting developers run semantic searches, and it automatically creates and manages vector embeddings from files in Unity Catalog, Databricks’ flagship solution for unified search and governance. Through integration with Databricks Model Serving, developers can further refine model responses by adding query filters to the search.
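To make the idea concrete, here is a minimal sketch of semantic search over embeddings using the open source sentence-transformers library. It illustrates the concept behind vector search rather than the Databricks Vector Search API itself; the model name and sample documents are placeholders.

```python
# Conceptual illustration of semantic search over embeddings.
# This is NOT the Databricks Vector Search API; it uses the open source
# sentence-transformers library to show the underlying idea.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "How do I reset my account password?",
    "Quarterly revenue grew 12% year over year.",
    "Steps to recover a forgotten login credential.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # small open source embedding model

# Embed the corpus once; a vector database would store and index these vectors.
doc_vectors = model.encode(documents, normalize_embeddings=True)

def semantic_search(query: str, top_k: int = 2):
    """Return the documents whose embeddings are closest to the query embedding."""
    query_vector = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ query_vector  # cosine similarity (vectors are normalized)
    best = np.argsort(scores)[::-1][:top_k]
    return [(documents[i], float(scores[i])) for i in best]

print(semantic_search("I forgot my password"))
# The password-related documents rank highest even without shared keywords.
```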
Fine-tuning in AutoML: Databricks AutoML introduces a low-code approach to fine-tuning LLMs. Customers can securely fine-tune LLMs using their enterprise data, and they retain ownership of the resulting model. Integration with MLflow, Unity Catalog, and Model Serving enables easy sharing, governance, serving, and monitoring of fine-tuned models within the organization.
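For readers unfamiliar with what fine-tuning involves under the hood, the sketch below uses the open source Hugging Face Transformers stack to fine-tune a tiny causal LLM on a toy instruction dataset. It is a generic illustration of the workflow AutoML automates, not the Databricks AutoML API; the model name and training data are placeholders.

```python
# Generic illustration of fine-tuning a small open source causal LLM on
# instruction-style text with Hugging Face Transformers. This sketches the
# kind of work AutoML automates; it is not the Databricks AutoML API.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "EleutherAI/pythia-70m"  # tiny model so the example stays cheap
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A toy instruction-response dataset; in practice this would be enterprise data.
examples = Dataset.from_dict({
    "text": [
        "Instruction: Summarize the ticket.\nResponse: Customer cannot log in.",
        "Instruction: Classify the sentiment.\nResponse: Positive.",
    ]
})
tokenized = examples.map(
    lambda row: tokenizer(row["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("finetuned-model")  # the organization retains the resulting weights
```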
Curated Open Source Models: The Databricks Marketplace offers a curated collection of open source models covering common generative AI use cases, including instruction following, summarization, and image generation. Databricks Model Serving hosts these models with optimized configurations for performance and cost.
MLflow 2.5 Supports LLMs
Databricks introduced MLflow 2.5, an update to its popular open source project for managing the machine learning lifecycle. MLflow AI Gateway provides centralized management of credentials for SaaS models and model APIs, along with access-controlled routes for querying them. It makes it easy to swap backend models for better cost or quality and to switch across LLM providers. MLflow Prompt Tools, a no-code visual tool, lets users compare the outputs of different models against a set of prompts. Integration with Databricks Model Serving streamlines deployment to production.
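As a rough illustration, the snippet below queries an LLM route through the experimental MLflow AI Gateway client. The gateway URI and the route name "completions" are assumptions for this example; the route itself is defined in the gateway's configuration file, which is where provider credentials live, so application code never handles API keys and the backend model can be swapped without code changes.

```python
# Minimal sketch of querying a SaaS LLM through the (experimental) MLflow AI Gateway.
# The route name "completions" and the gateway URI are illustrative; the gateway is
# configured separately with a file that holds the provider credentials.
import mlflow.gateway

# Point the client at a running gateway server.
mlflow.gateway.set_gateway_uri("http://localhost:5000")

response = mlflow.gateway.query(
    route="completions",  # access-controlled route defined in the gateway config
    data={"prompt": "Summarize the Lakehouse AI announcements in one sentence."},
)
print(response)
```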
Intelligent Monitoring with Databricks Lakehouse Monitoring
Databricks expanded its monitoring capabilities with Databricks Lakehouse Monitoring. This feature offers end-to-end visibility into data pipelines, empowering users to continuously monitor, tune, and improve performance without additional tools or complexity. Leveraging the Unity Catalog, Lakehouse Monitoring provides deep insights into the lineage of data and AI assets, ensuring high quality, accuracy, and reliability. Proactive error detection and reporting simplify root cause analysis and provide recommended solutions across the data lifecycle.
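As a conceptual example of what continuous monitoring checks for, the sketch below compares a null rate and a mean shift between a baseline window and the latest window of a table using plain pandas. It illustrates the kind of signal a monitoring layer surfaces; it is not the Lakehouse Monitoring API, and the column name and thresholds are arbitrary.

```python
# Conceptual example of the kind of check a monitoring layer runs continuously:
# compare simple statistics between a baseline window and the latest window and
# flag regressions. This is plain pandas, not the Lakehouse Monitoring API.
import pandas as pd

baseline = pd.DataFrame({"amount": [10.0, 12.5, 11.2, 9.8, 10.4]})
latest = pd.DataFrame({"amount": [10.1, None, None, 55.0, 10.9]})

def null_rate(df: pd.DataFrame, column: str) -> float:
    return float(df[column].isna().mean())

def mean_shift(base: pd.DataFrame, new: pd.DataFrame, column: str) -> float:
    return abs(new[column].mean() - base[column].mean()) / abs(base[column].mean())

alerts = []
if null_rate(latest, "amount") > null_rate(baseline, "amount") + 0.05:
    alerts.append("null rate increased")
if mean_shift(baseline, latest, "amount") > 0.25:
    alerts.append("mean shifted by more than 25%")

print(alerts or ["no issues detected"])
```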
Databricks has augmented its core offerings, including the Lakehouse Platform, MLflow, Unity Catalog, and Model Serving, to support the full LLM lifecycle.
Databricks is strengthening its position in the generative AI market through investments in open source foundation models such as Dolly, its recent acquisition of MosaicML, and the product enhancements announced at the Data + AI Summit.