Develop solutions that use Cosmos DB storage

"Develop solutions that use Cosmos DB storage" is part of the Develop for Azure storage topics. The total weight of this in the exam will be 10-15%. This post is designed to give readers a better understanding of the topic.

Disclaimer: This is not a training article for completing the Microsoft Azure AZ-204 exam, but it provides good insight into the areas within these topics. Labs and hands-on work are essential to passing most Microsoft Azure exams.

Develop solutions that use Cosmos DB storage:
Azure Cosmos DB

Today’s applications must be highly responsive and always online.

One of the most obvious challenges when maintaining a relational database system is that most relational engines apply locks and latches to enforce strict atomicity, consistency, isolation, durability (ACID) semantics. This approach provides benefits in terms of ensuring a consistent data state within the database. However, there are heavy tradeoffs with respect to concurrency, latency, and availability.

To achieve low latency and high availability, you must deploy instances of a database engine in datacenters that are close to their users. Databases then need to respond in real time to significant amounts of changes in usage at peak hours, store ever-increasing volumes of data, and make this data available to users in milliseconds.

Microsoft Azure Cosmos DB is a database service native to Azure that focuses on providing a high-performance database regardless of your selected API or data model.

You can deploy Azure Cosmos DB worldwide across all Azure regions, which helps you overcome the previously mentioned challenges of maintaining a relational database system. Partition ranges can be dynamically subdivided to seamlessly grow a database in line with an application while simultaneously maintaining high availability.

Core functionality

Azure Cosmos DB has four core features that are the same regardless of which API you use.

  • Global replication
  • Varied consistency levels
  • Low latency
  • Elastic scale-out

Global Replication

Azure Cosmos DB has a feature referred to as turnkey global distribution that automatically replicates data to other Azure datacenters across the globe without the need to manually write code or build a replication infrastructure.

Develop solutions that use Cosmos DB storage:
Implement scaling (partitions, containers)

Containers

In the Azure Cosmos DB SQL API, databases are essentially containers for collections. Collections are where you place individual documents. A collection is intrinsically elastic—it automatically grows and shrinks as you add or remove documents.

Each collection is assigned a throughput value, and that value dictates the maximum throughput for that collection and its corresponding documents. Alternatively, you can assign the throughput at the database level and share the throughput values among the collections in the database. If you have a set of documents that needs throughput beyond the limits of an individual collection, you can distribute the documents among multiple collections. Each collection has its own distinct throughput level.

If a particular collection is seeing spikes in throughput, you can manage its throughput level in isolation by increasing or decreasing the value. This change to the throughput level of a particular collection will not cause side effects for the other collections. This allows you to adjust to meet the performance needs of any workload in isolation.

You can also scale workloads across collections. If you have a workload that needs to be partitioned, you can scale it by distributing its associated documents across multiple collections. The SQL API for Azure Cosmos DB includes a client-side partition resolver that allows you to manage transactions and point them in code to the correct partition based on a partition key field.
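
The routing idea behind a client-side partition resolver can be sketched in plain Python. This is only an illustration, not the SDK's resolver: the collection names, the `customerId` key field, and the MD5-based hash are assumptions made for the example.

```python
import hashlib

# Hypothetical collection names; a real deployment would use its own.
COLLECTIONS = ["orders-0", "orders-1", "orders-2"]


def resolve_collection(document: dict, key_field: str = "customerId") -> str:
    """Pick the target collection for a document from its partition key field."""
    key = str(document[key_field])
    # A stable hash (unlike the built-in hash()) gives the same answer in every process.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(COLLECTIONS)
    return COLLECTIONS[index]


doc = {"id": "1", "customerId": "contoso", "total": 42}
target = resolve_collection(doc)
```

Because the hash depends only on the key value, every document for the same customer is routed to the same collection.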

Partitioning

Azure Cosmos DB provides containers for storing data called collections (for documents), graphs, or tables. Containers are logical resources and can span one or more physical partitions or servers. The number of partitions is determined by Azure Cosmos DB based on the storage size and throughput provisioned for a container or set of containers.

If you are already familiar with the sharding pattern, the idea of dynamic partitioning is not very different.

A physical partition is a fixed amount of reserved solid-state drive (SSD) back-end storage combined with a variable amount of compute resources (CPU and memory). Each physical partition is replicated for high availability. A physical partition is an internal concept of Azure Cosmos DB, and physical partitions are transient. Azure Cosmos DB will automatically scale the number of physical partitions based on your workload.
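
To make the "storage size and throughput" rule concrete, here is a rough back-of-the-envelope sketch. The per-partition limits (50 GB and 10,000 RU/s) are assumptions for illustration; the service decides the real partition count internally, so treat the result as an estimate only.

```python
import math

# Assumed per-partition limits for illustration; check the current Azure
# documentation for the real values, which can change over time.
MAX_STORAGE_GB_PER_PARTITION = 50
MAX_RU_PER_PARTITION = 10_000


def estimate_physical_partitions(storage_gb: float, provisioned_ru: int) -> int:
    """Estimate how many physical partitions a container would need."""
    by_storage = math.ceil(storage_gb / MAX_STORAGE_GB_PER_PARTITION)
    by_throughput = math.ceil(provisioned_ru / MAX_RU_PER_PARTITION)
    # Whichever dimension needs more partitions wins; there is always at least one.
    return max(1, by_storage, by_throughput)
```

For example, under these assumptions a container holding 120 GB at 4,000 RU/s is bound by storage, while one holding 10 GB at 25,000 RU/s is bound by throughput.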

Develop solutions that use Cosmos DB storage:
Select the appropriate API for your solution

APIs

MongoDB API

Acts as a massively scalable MongoDB service powered by the Azure Cosmos DB platform. Compatible with existing MongoDB libraries, drivers, tools, and applications.

Table API

A key-value database service built to provide premium capabilities to existing Azure Table storage applications without making any app changes.

Gremlin API

The Gremlin API in Azure Cosmos DB is a fully managed, horizontally scalable graph database service that makes it easy to build and run applications that work with highly connected datasets supporting Open Graph APIs (based on the Apache TinkerPop specification, Apache Gremlin).

Apache Cassandra API

The Cassandra API in Azure Cosmos DB is a globally distributed Apache Cassandra service powered by the Azure Cosmos DB platform. Compatible with existing Apache Cassandra libraries, drivers, tools, and applications.

SQL API

The SQL API in Azure Cosmos DB is a JavaScript and JavaScript Object Notation (JSON) native API based on the Azure Cosmos DB database engine. The SQL API also provides query capabilities rooted in the familiar SQL query language. By using SQL, you can query for documents based on their identifiers or make deeper queries based on properties of the document, complex objects, or even the existence of specific properties. The SQL API supports the execution of JavaScript logic within the database in the form of stored procedures, triggers, and user-defined functions.
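
As a small illustration of the query style described above, the sketch below builds a parameterized query that filters on a nested property and on the existence of another property. The property names (`address.city`, `email`) and the helper function are hypothetical; the query text plus name/value parameter list is the general shape that Cosmos DB SDKs accept, but check your SDK's documentation before relying on it.

```python
def build_city_query(city: str) -> dict:
    """Build a parameterized SQL API query for documents in a given city."""
    return {
        "query": (
            "SELECT c.id, c.name FROM c "
            "WHERE c.address.city = @city AND IS_DEFINED(c.email)"
        ),
        # Passing values as parameters avoids building SQL by string concatenation.
        "parameters": [{"name": "@city", "value": city}],
    }


spec = build_city_query("Seattle")
```

Here `IS_DEFINED(c.email)` keeps only documents that actually contain an `email` property.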

Migrating from NoSQL

Many NoSQL database engines are simple to get started with, but they might cause problems as you scale, including:

  • Tedious setup and maintenance requirements for a multiple-server database cluster
  • Expensive and complex high-availability solutions
  • Challenges in achieving end-to-end security, including encryption at rest and in flight
  • Required resource overprovisioning and unpredictable costs to achieve scale

Azure Cosmos DB provides NoSQL-as-a-service for:

  • MongoDB
  • Cassandra
  • Gremlin

To achieve a successful migration, it is important to keep a few tips in mind:

  • Instead of writing custom code, you should use native tools, such as the Cassandra shell, mongodump, and mongoexport.
  • Azure Cosmos DB containers should be allocated prior to the migration with the appropriate throughput levels set. Many of the tools will create containers for you with default settings that are not ideal.
  • Prior to migrating, you should increase the container’s throughput to at least 1,000 Request Units (RUs) per second so that the import tools are not throttled. The throughput can be reverted to its typical value after the import is complete.

Azure Cosmos DB has a MongoDB API and a Cassandra API to provide a NoSQL service offering for two of the most popular NoSQL database platforms. Both APIs are protocol compatible: the Cassandra API supports CQLv4 and the MongoDB API supports MongoDB v5. Many applications can be “lifted and shifted” to Azure Cosmos DB without the need to rewrite code.
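
The reason for raising throughput before an import can be seen with simple arithmetic. The sketch below is a rough estimate only: the cost of 5 RUs per write is an assumed figure for small documents, and real costs depend on document size and indexing.

```python
import math


def estimate_import_seconds(document_count: int,
                            ru_per_write: float,
                            provisioned_ru_per_second: int) -> int:
    """Estimate the minimum import time if throughput is the only bottleneck."""
    total_ru = document_count * ru_per_write
    return math.ceil(total_ru / provisioned_ru_per_second)


# One million documents at an assumed 5 RUs per write.
slow = estimate_import_seconds(1_000_000, 5, 400)
fast = estimate_import_seconds(1_000_000, 5, 1_000)
```

Under these assumptions the same import takes 12,500 seconds at 400 RU/s but 5,000 seconds at 1,000 RU/s, before any throttling retries are counted.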

Develop solutions that use Cosmos DB storage:
Implement partitioning schemes

Partitioning implementation

Azure Cosmos DB allocates a frugal number of partitions based on actual storage and throughput needs.

A logical partition is a partition within a physical partition that stores all the data associated with a single partition key value. Partition ranges can be dynamically subdivided to seamlessly grow the database as the application grows while simultaneously maintaining high availability. When a container meets the partitioning prerequisites, partitioning is completely transparent to your application. Azure Cosmos DB handles distributing data across physical and logical partitions and routing query requests to the right partition.
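
The relationship between partition key values, logical partitions, and physical partitions can be sketched as a toy model. This is not how the service is implemented: the `city` key, the SHA-256 hash, and the count of four physical partitions are assumptions chosen for the example.

```python
import hashlib
from collections import defaultdict


def logical_partitions(documents, key_field):
    """Group documents so each partition key value forms one logical partition."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc[key_field]].append(doc)
    return dict(groups)


def place_on_physical(key_value, physical_count):
    """Map a whole logical partition to one physical partition (toy hash)."""
    digest = hashlib.sha256(str(key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_count


docs = [
    {"id": "1", "city": "Oslo"},
    {"id": "2", "city": "Lima"},
    {"id": "3", "city": "Oslo"},
]
groups = logical_partitions(docs, "city")
placement = {key: place_on_physical(key, 4) for key in groups}
```

Both "Oslo" documents end up in the same logical partition, and that logical partition is placed as a unit on a single physical partition.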

Develop solutions that use Cosmos DB storage:
Set the appropriate consistency level for operations

Consistency levels

Azure Cosmos DB provides five consistency levels:

  • Strong: When a write operation is performed on your primary database, the write operation is replicated to the replica instances. The write operation is committed (and visible) on the primary only after it has been committed and confirmed by all replicas.
  • Bounded Staleness: This level is similar to the Strong level, with the major difference that you can configure how stale documents can be within replicas. Staleness refers to the quantity of time (or the version count) a replica document can be behind the primary document.
  • Session: This level guarantees that all read and write operations are consistent within a user session. Within the user session, all reads and writes are monotonic and guaranteed to be consistent across primary and replica instances.
  • Consistent Prefix: This level has loose consistency but guarantees that when updates show up in replicas, they will show up in the correct order (that is, as prefixes of other updates) without any gaps.
  • Eventual: This level has the loosest consistency and essentially commits any write operation against the primary immediately. Replica transactions are asynchronously handled and will eventually (over time) be consistent with the primary. This tier has the best performance, because the primary database does not need to wait for replicas to commit to finalize its transactions.
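
The difference between an eventual read and a session read can be illustrated with a toy model. This is a simplified sketch, not the service's actual mechanism: the `Primary` and `Replica` classes and the use of a log position as a session token are assumptions made for the example.

```python
class Replica:
    """A toy replica that applies the primary's write log in order."""

    def __init__(self):
        self.applied = []  # always a prefix of the primary's log

    def catch_up(self, log, upto):
        # Applying entries strictly in log order preserves the prefix guarantee.
        self.applied = list(log[:upto])


class Primary:
    def __init__(self):
        self.log = []

    def write(self, value):
        self.log.append(value)
        return len(self.log)  # session token: position of this write in the log


def session_read(primary, replica, session_token):
    """Serve from the replica only if it has seen the session's last write."""
    if len(replica.applied) >= session_token:
        return replica.applied[-1] if replica.applied else None
    return primary.log[-1]  # fall back so the client reads its own write


primary, replica = Primary(), Replica()
primary.write("v1")
token = primary.write("v2")
replica.catch_up(primary.log, upto=1)  # the replica lags by one write

eventual = replica.applied[-1]
session = session_read(primary, replica, token)
```

In this model the eventual read returns the stale value `v1` from the lagging replica, while the session read returns `v2` because the client's token shows the replica has not yet caught up.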
