Data is multifarious. Information is used in so many places for so many technology applications and use cases that it takes on many forms, it runs at many speeds, it resides in many types of storage structures and it represents different levels of mission-critical value. Because there is now such an (almost) infinite variety of data spread across the enterprise, organizations need a way to now perform many different types of queries across data-in-motion and data-at-rest.
Even when we think we have a handle on the ‘shape’ of the information wave that travels across our new digital workflows, firms can experience unexpected spikes, outages and disruptions that throw the best-laid plans out of kilter. These home truths have shaped how the technology industry has been working to adapt the way databases work; when we also factor in the new reality of generative AI, there’s a lot to ingest – and we haven’t even started on data ingestion challenges.
It’s all very well if the database vendors tell us they can now offer more speed, performance and agility (and they do – yawn), but what we now need is deeper, more specific insight into the way the engine room functions, what kind of gearing ratios we can make use of and whether or not we’re kitted out with all-terrain road tires. Developer data platform company MongoDB, Inc. says it has watched all the above factors play out as it extends its core technology proposition this year.
The company talks about its focus on operational data across an entire organization, for any workload or use case. That means working data, live data, real-time data and every information stream that exists all the way through to the fragmented unstructured data that resides in the murky waters of the data lake, or perhaps (certainly in MongoDB’s case) the more ordered confines of the data lakehouse.
A developer data platform
As noted above, MongoDB generally doesn’t refer to its technology as a database; the company calls MongoDB Atlas a developer data platform. This means it is a technology built from database foundations, but now significantly more directly engineered for so-called modern applications i.e. ones that feature potentially massive data flows, ones that make heavy use of automation and autonomous management where possible, ones that are eminently scalable, flexible, resilient and agile when the need for impactful change arises (remember Covid-19?), ones that embrace microservices and containerization for granular control, ones that have the ‘luxury’ of serverless provisioning and – basically – ones that don’t look like the monolithic applications we built before 1999.
“When we talk about MongoDB Atlas as a developer data platform, we mean that we offer developers everything they need to work with data – a task which typically takes up some three-quarters of their time when working to provision and manage the infrastructure for the enterprise applications that we all depend upon,” said Dev Ittycheria, president and CEO at MongoDB.
So that’s a developer data platform then and not a data developer platform, the latter (if it were to exist) presumably being something that Database Administrators (DBAs) might use to perform the Ops-operations roles that they fulfil in DevOps teams. If anything, says Ittycheria, MongoDB has ‘obviated’ (in a positive way) the need for DBAs to focus on many of the traditional grunt work jobs associated with operations so they can instead turn their attention to higher value tasks such as identifying bottlenecks or looking for operational efficiencies and so on.
“Cloud has enabled such a wide variety of use cases across increasingly diverse application structures, but there have inevitably been trade-offs with an explosion of ‘point tools’ designed to handle specific functions – such as text search, graph queries, time-series functions, vector-based search and more – a reality that always means there will be diminishing returns in terms of what a developer can do with a tool in any one deployment scenario. By offering essential data developer functions in MongoDB Atlas built natively [such as vector-search, explained later in this story], we can offer all the benefits of cloud in a radically simplified and unified model,” clarified Ittycheria, with a degree of quiet confidence when speaking to press in summer 2023.
New from MongoDB this summer are products and features including generative AI capabilities with MongoDB Atlas Vector Search for ‘highly’ relevant information retrieval and personalization, MongoDB Atlas Search Nodes for dedicated resources with search workloads at enterprise scale, MongoDB Atlas Stream Processing for high-velocity streams of complex data, significant scaling and efficiency improvements for MongoDB Time Series collections and new capabilities using MongoDB Atlas Data Federation for querying data and isolating workloads on Microsoft Azure.
If that penny didn’t drop there… this is all about standardizing many types of workloads on a single developer data platform across the enterprise. As we said at the start, data is multifarious.
“The new MongoDB Atlas capabilities are an answer to the feedback we get from customers every day; they love that their teams are able to quickly build and innovate with MongoDB Atlas and want to be able to do even more with it across the enterprise,” said CEO Ittycheria. “With the [platform’s] new features, we’re further supporting customers running the largest, most demanding mission-critical workloads that require continually increasing scalability and flexibility, so they can unleash the power of software and data with next-generation applications.”
What’s all the fuss with data?
To the casual non-techie not particularly interested in data science who just wants their business applications to work, it might well sound like the technology industry is being a bit histrionic over the ‘new’ data streams that we need to deal with. After all, this isn’t the 1950s and we’ve long moved on from the era of punch cards, mainframes (apart from their use in financials) and the early clunky use of data systems with basic storage and retrieval functionalities and only limited scope for analytics.
Much of the answer comes down to the widely touted inflexion point that we now find ourselves at. There’s the rise (some would say explosion) of new technology like generative AI and Large Language Models (LLMs) and the widespread (some would say exponential) growth of different types of data being generated in real-time. Combine that data back-end reality with the front-end demands of users who have gotten well-used to instant application functions and updates, which are largely driven by ubiquitous connectivity, the always-on model of the cloud and the ability for software developers to deliver continuous integration and continuous deployment – and you can see why we need more than just a database these days.
To be fair to MongoDB, every other data vendor out there refers to their technology as a platform, but it’s worth knowing why this progression to a more unified and fully managed developer data platform has taken place.
The integrated functionality factor
According to the company’s engineering team, MongoDB Atlas today represents a multi-cloud developer data platform that provides an integrated set of data and application services in a unified environment to enable developer teams to quickly build with the capabilities, performance and scale modern applications require. CEO Ittycheria notes that this essentially ‘integrated functionality factor’ is what more companies today have been asking for.
Core product updates for MongoDB this year include a number of key cornerstone technologies.
To integrate AI-powered search and personalization into applications on MongoDB Atlas, the company has developed MongoDB Atlas Vector Search. Surfacing as a comparatively new term in enterprise technology circles (although stemming from computer science principles that have been around for decades) the world of vector search makes use of Machine Learning (ML) techniques in order to understand the contextual meaning behind more unstructured types of data such as video or images, but often also text. By transforming meaning values into geometric representations in the shape of vectors, machines can plot out our wider but specific human intent when we engage in search and then start analytics procedures to discover results for us.
The previously noted use of Large Language Models (LLMs) requires data in the form of vectors. According to MongoDB, these types of AI models measure the similarity between vectors to probabilistically construct sentences from prompts, generate images from captions, or return search results that are more accurate and contain greater context than traditional search engines.
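To make the idea of ‘measuring similarity between vectors’ a little more concrete, here is a minimal sketch in plain Python. It is not MongoDB Atlas Vector Search and does not use any MongoDB API; the three-number ‘embeddings’ and the product names are invented purely for illustration (real embeddings come from an ML model and typically have hundreds of dimensions), and cosine similarity is just one common way to compare vectors.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, documents, k=2):
    """Rank stored vectors by similarity to the query vector, best first."""
    scored = [(cosine_similarity(query, vec), name)
              for name, vec in documents.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical toy embeddings, made up for this example only.
DOCUMENTS = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.2, 0.1],
    "espresso machine": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    # Imagined embedding of a search such as "jogging footwear".
    query = [0.85, 0.15, 0.05]
    print(nearest(query, DOCUMENTS))
```

The point of the sketch is that the query never has to share a keyword with the results: items are returned because their vectors point in a similar direction, which is how ‘meaning’ rather than exact wording drives the match.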
“To store vectors so LLMs can use them, some organizations have begun using specialized databases. However, single-purpose databases for use cases like vector stores or time-series applications are often bolted on to existing technology stacks, resulting in more administrative complexity, an educational burden on developers, and longer time to value. With MongoDB Atlas Vector Search, customers can power a range of new workloads from semantic search with text to image search and comparison to highly personalized product recommendations using a single, familiar, unified platform across an entire organization – all with minimal developer friction,” notes the company, in a technical statement.
MongoDB Atlas Vector Search also allows customers to securely augment the capabilities of pre-trained generative AI models with their own data to provide memory that creates more accurate and relevant results for specific domains or use cases.
Other base-level developments from MongoDB for this year’s platform update also include the ability to scale search workloads independent of their database – remember how ‘scale’ was one of the defining factors of a modern application? The company has also engineered for the ability to process high-velocity streams of complex data with MongoDB Atlas, so with data stream processing also being fundamental to the way data platforms must now develop, this is arguably expected but also welcome.
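For readers wondering what ‘processing a stream’ looks like in practice, the following is a deliberately simple Python sketch of one common pattern, the tumbling (fixed-size, non-overlapping) time window. It is not MongoDB Atlas Stream Processing and makes no claim about how that product works; the function name, the event format of (timestamp, value) pairs and the five-second window are all assumptions chosen for illustration, and a real stream processor would compute these results incrementally as events arrive rather than over a finished list.

```python
from collections import defaultdict


def tumbling_window_averages(events, window_seconds):
    """Group (timestamp, value) events into fixed windows and average each.

    Each event lands in the window whose start time is the timestamp
    rounded down to a multiple of window_seconds.
    """
    buckets = defaultdict(list)
    for timestamp, value in events:
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: sum(values) / len(values)
            for start, values in sorted(buckets.items())}


if __name__ == "__main__":
    # Hypothetical sensor readings: (seconds since start, reading).
    readings = [(0, 10.0), (3, 20.0), (5, 30.0), (9, 50.0), (12, 40.0)]
    print(tumbling_window_averages(readings, window_seconds=5))
```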
Data tiering, without tears
It’s also worth noting that MongoDB has enabled ‘tier and query data’ on Microsoft Azure with MongoDB Atlas Online Archive and Atlas Data Federation. New multi-cloud options bring Microsoft Azure support to MongoDB Atlas Online Archive and Atlas Data Federation in addition to Amazon Web Services (AWS).
As one definition puts it, “A tiered [data] storage architecture categorizes data hierarchically based on its business value, with data ranked by how often it’s accessed by users and applications.”
Often classified by whether data is ‘hot’ (frequently used, mission-critical and essential for modern application processing) or ‘cold’ (less used, so suitable for cheaper forms of slower data storage services with more latency overhead and so, therefore, slower recovery), data tiering enables us to work with more types of information more cost-effectively. Given the different types of data that we have been talking about here and the fact that many different types of cloud services are available in the multi-cloud world, data tiering has evolved from being a useful spanner to a keenly sharpened tool in the cloud data engineer’s toolbox these days.
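The hot-versus-cold distinction can be sketched in a few lines of Python. This is an illustration of the general tiering idea only, not of how MongoDB Atlas Online Archive decides what to archive; the 30-day threshold, the function names and the sample collection names are assumptions made up for the example.

```python
from datetime import datetime, timedelta


def assign_tier(last_accessed, now, hot_window=timedelta(days=30)):
    """Label data 'hot' if it was accessed within hot_window, else 'cold'."""
    return "hot" if now - last_accessed <= hot_window else "cold"


def partition_by_tier(records, now, hot_window=timedelta(days=30)):
    """Split (name, last_accessed) records into hot and cold lists."""
    tiers = {"hot": [], "cold": []}
    for name, last_accessed in records:
        tiers[assign_tier(last_accessed, now, hot_window)].append(name)
    return tiers


if __name__ == "__main__":
    now = datetime(2023, 8, 1)
    # Hypothetical datasets with the date each was last read.
    records = [
        ("orders_2023_07", datetime(2023, 7, 20)),
        ("orders_2021", datetime(2021, 12, 31)),
    ]
    print(partition_by_tier(records, now))
```

In a real deployment the ‘cold’ list would be moved to cheaper, slower storage while staying queryable, which is the trade-off the paragraph above describes.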
As per the organization’s announcement statement for this news, “Customers today use MongoDB Atlas Online Archive to automatically tier Atlas databases to the most cost-effective cloud object storage option while retaining the ability to query with high performance. By adding support for Microsoft Azure, customers can now more easily keep their entire workloads in the same cloud. Atlas Data Federation provides a way to read and write data from Atlas databases and cloud object stores to simplify how customers can generate datasets from Atlas to feed downstream applications and systems that use cloud storage.”
Data trends to take away
As we said at the start, data is multifarious. There’s a lot more data in a lot more places in a lot more ‘form factors’ today, all tasked with doing a lot more jobs across many more clouds, applications and ancillary services. To reflect and ‘serve’ our user needs across this seemingly vast information topography, MongoDB has perhaps described a more diverse set of platform tools, functions and abilities.
What matters now is whether these tools are usable by the data developers who will need to get their hands dirty with them – but there’s integrated platform-wide automation now, so it’s a much less dirty job. The world of data is certainly humongous; how on earth could MongoDB have come up with that name, right?