Azure Data Engineering
Workload Management Summary
Workload management is the process of controlling and allocating resources to different workloads in a data solution to meet Service Level Agreements (SLAs)—ensuring tasks like data loading, transformation, and querying finish within set timeframes.
In distributed systems, workloads run in parallel and share resources, so workload management must isolate and control allocations to maintain performance.
Workload in Data Warehouses
- Database: Stores data from a single source or application.
- Data Warehouse: Combines data from multiple sources for complex reporting and analysis.
Typical workload tasks:
- Loading data (directly or via intermediary processes)
- Analysis & reporting (queries, reports)
- Data management (replication, movement)
- Data export (archiving or transfer)
Azure Synapse Analytics Example
Old method – Resource Classes
- Assigns memory based on user role.
- Allows per-query memory tuning via size classes (smallrc, mediumrc, largerc, xlargerc).
- Limitation: No direct control; heavy queries can overuse resources.
New method – Dedicated SQL Pools
- Workload Classification – Assign workloads to groups and set priority.
- Workload Isolation – Reserve resources for critical workloads to meet strict SLAs.
- Fine-grained Resource Control – Reserve or cap usage, set timeouts, and prevent resource starvation.
Key Benefit: Dedicated SQL Pools offer guaranteed resource allocation, predictable performance, and protection for critical workloads.
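In a dedicated SQL pool, classification and isolation are configured with T-SQL. A minimal sketch, assuming hypothetical names (`wgReports`, `wcNightlyReports`, `ReportingUser`):

```sql
-- Reserve resources for reporting (isolation) and cap its total share.
CREATE WORKLOAD GROUP wgReports
WITH (
    MIN_PERCENTAGE_RESOURCE = 25,              -- guaranteed share for this group
    CAP_PERCENTAGE_RESOURCE = 50,              -- upper bound to protect other work
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5     -- per-request minimum grant
);

-- Route a user's queries into the group and raise their priority.
CREATE WORKLOAD CLASSIFIER wcNightlyReports
WITH (
    WORKLOAD_GROUP = 'wgReports',
    MEMBERNAME = 'ReportingUser',
    IMPORTANCE = ABOVE_NORMAL
);
```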
Identifying the Need for Data Solutions
Overview
Data is central to modern cloud solutions, enabling organizations to uncover valuable business insights. Cloud providers offer a wide range of services to:
- Collect data from diverse sources (e.g., IoT devices)
- Analyze data to extract insights
- Store massive amounts of data (big data storage) for analysis
A data solution is needed when an organization wants to effectively manage, store, and analyze various types of data.
Types of Data
- Structured Data
  - Definition: Organized in predefined schemas (rows & columns).
  - Example: Relational databases like SQL Server.
  - Characteristics:
    - Schema is known ahead of time.
    - Data must be transformed to fit the schema.
  - Azure Solution: Azure SQL Database.
- Semi-structured Data
  - Definition: Has meaningful fields but no strict relational schema.
  - Examples:
    - Key-value stores (e.g., Azure Cosmos DB)
      - Records as JSON objects, each with a unique key.
      - Fields can vary between records.
    - Graph databases
      - Store nodes and relationships (dynamic associations).
      - Ideal for modeling complex, interconnected data (e.g., industrial plant models).
  - Challenges: Schema varies between records, making analysis more complex.
  - Azure Solutions: Azure Table Storage, Azure Cosmos DB.
- Unstructured Data
  - Definition: No predefined structure.
  - Examples: PDFs, images, sound files.
  - Azure Solution: Azure Blob Storage.
Access Control in Azure SQL Database
- Read-only: View data without modification (e.g., reporting accounts).
- Read/Write: View and modify data (e.g., data collection services).
- Owner: Full administrative control (schema changes, user management).
Access can be managed via:
- Azure Active Directory
- Database roles (read, read/write, owner)
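A minimal T-SQL sketch of mapping these levels to built-in database roles (hypothetical user names; `FROM EXTERNAL PROVIDER` assumes Azure AD identities):

```sql
-- Read-only: a reporting account sees data but cannot change it.
CREATE USER report_reader FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER report_reader;

-- Read/write: a data collection service can read and modify rows.
CREATE USER collector_svc FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER collector_svc;
ALTER ROLE db_datawriter ADD MEMBER collector_svc;

-- Owner: full administrative control over the database.
CREATE USER dba_admin FROM EXTERNAL PROVIDER;
ALTER ROLE db_owner ADD MEMBER dba_admin;
```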
Key Indicators for Needing a Data Solution
- Handling large volumes of structured, semi-structured, or unstructured data.
- Need for centralized storage and management in the cloud.
- Requirement for secure and role-based access to data.
- Desire to analyze and visualize data to gain insights.
- Managing data from multiple, diverse sources.
Identifying Types of Data Storage
Overview
Cloud services like Azure offer flexible storage options based on how data is stored and how it’s used. Data storage falls into two main categories:
1. Relational Data Storage
- Structure: Organized into schemas, databases, tables, and fields with predefined relationships.
- Advantages:
- Predefined schema ensures query reliability.
- Normalization reduces duplication (e.g., using lookup tables for repeated values).
- Disadvantages:
- Schema rigidity — cannot store data outside defined structure.
- Example: SQL Server.
2. Non-relational Data Storage
- Structure: Does not follow relational table structure.
- Example: NoSQL databases (JSON objects, dynamic fields).
- Advantages:
- Flexible; can store irregular or unstructured data without prior transformation.
- Often faster for certain workloads.
- Disadvantages:
- Querying is harder due to unpredictable fields.
- Uses data duplication to speed queries.
- Example: Azure Table Storage, Cosmos DB.
Core Azure Storage Services (All Non-relational)
- Azure Blob Storage
  - Big data storage for any file type (text, binary, images, video).
  - Stores data “as is” for later analysis.
  - Supports tagging for searchability.
- Azure Files
  - Cloud-based file sharing for VMs or on-prem systems.
- Azure Queues
  - Message queue service for decoupling services.
  - Messages are small text payloads (commonly JSON).
- Azure Tables
  - NoSQL key-value store for JSON objects.
- Azure Disks
  - Block-level storage for Azure virtual machines.
Benefits of Azure Storage Services
- High availability with redundancy and regional replication.
- Scalability (e.g., 500 TB per Azure Table, with option to add more tables).
- Fully managed infrastructure.
- Global access via web, SDKs, CLI, or PowerShell.
Security in Azure Storage
- Authentication & Authorization:
  - Azure Active Directory (AAD) with Role-Based Access Control (RBAC).
  - Shared Key (account-level key, less secure).
  - Shared Access Signature (SAS) for time-limited or restricted access.
  - Public access (Blob containers) for non-sensitive content.
- Encryption:
  - At rest: Data stored encrypted in Azure.
  - Client-side: Data encrypted before sending to Azure for added security.
Recognizing Characteristics of Transactional Workloads
Overview
A transactional workload involves one or more operations that must be treated as a single, atomic unit. Either all operations succeed or none do, ensuring the system is never left in an incomplete state.
Example: A bank transfer (withdraw from one account, deposit to another) must complete fully or fail entirely—never leaving money “in limbo.”
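A minimal T-SQL sketch of that transfer, assuming a hypothetical `Accounts` table with `AccountID` and `Balance` columns:

```sql
BEGIN TRANSACTION;
BEGIN TRY
    -- Both updates form one atomic unit of work.
    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
    COMMIT TRANSACTION;   -- both changes become durable together
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION; -- on any failure, neither change is applied
END CATCH;
```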
Core Properties (ACID)
Transactional databases, especially relational databases like SQL Server, follow ACID properties:
- Atomicity
  - All steps in a transaction succeed or fail as one unit.
  - Prevents partial changes to the system.
- Consistency
  - Ensures that transactions move the system from one valid state to another.
  - Prevents data anomalies (e.g., missing funds).
- Isolation
  - Transactions running concurrently cannot interfere with each other.
  - Results are the same as if transactions were run sequentially.
- Durability
  - Once committed, a transaction’s results are permanent—even if a system failure occurs.
  - Achieved through robust storage and backups.
Implementation Considerations
- Locks: Used to prevent conflicting access during transactions.
- Too many locks can hurt performance.
- Error Handling: Systems must be able to rollback changes if failures occur.
- Concurrency Management: Supports multiple users modifying the same data without breaking ACID rules.
Challenges in Distributed Databases
- Geographic distribution adds network latency and failure risks.
- Eventual Consistency:
- Updates in one region are propagated to others over time.
- Acceptable for unordered processes (e.g., sending emails).
- Not ideal for ordered or time-sensitive operations.
- Rollback Complexity: Failures during rollback require additional handling logic.
- Locks in Distributed Systems: May ensure synchronous processing but reduce parallel performance.
Key Takeaways
Transactional workloads require strict guarantees to maintain data integrity, especially in scenarios involving financial transactions, inventory management, or other mission-critical processes. In distributed cloud systems, eventual consistency is often used as a trade-off between speed and absolute synchronization.
Analytical Workloads
Definition
Analytical workloads perform data analysis on read-only, large sets of data. These datasets are typically aggregated from raw data to highlight important business factors, often displayed as visualizations such as tables and charts.
Example: A chart showing quarterly earnings trends in a business report.
Characteristics
- Used primarily for business decision-making.
- Show summaries or snapshots, not raw data.
- Can vary in importance depending on deadlines or impact.
Importance Levels in Azure Synapse
| Importance Level | Example | Notes |
|---|---|---|
| Low | Company website showing current weather | Least important, no serious consequences if delayed |
| Below Normal | Website count of oil facilities with issues | Can be up to a day behind, relatively important |
| Normal | Ad hoc queries by data scientists | Standard importance |
| Above Normal | Summaries needing accuracy within 10 minutes | Not critical but higher priority than normal |
| Highest | Yearly sales report for stakeholders | Must meet specific deadline (SLA) |
How Importance Affects Execution
- Higher importance workloads run before lower importance workloads.
- Within the same importance level, workloads compete for resources.
- Non-locking workloads are prioritized over locking workloads.
Locking Workloads
Some operations cause read/write locks, delaying other workloads:
- Partition switching (e.g., moving data to archive store)
- RENAME OBJECT (renaming a table)
Example: If a rename operation is queued but a non-locking query on the same table is submitted, the query runs first unless the rename is set to a higher importance level.
Efficiency Considerations
Importance is not the only factor—workload efficiency matters.
Strategies for Efficiency:
- Pre-aggregate roll-up data for set date ranges (weekly, monthly, quarterly, yearly) instead of allowing ad hoc ranges.
- Limit displayed data points to what’s readable and useful.
- Example: Avoid displaying tens of thousands of data points on a small trend chart; show only a few hundred key points.
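A hedged sketch of the pre-aggregation strategy in a dedicated SQL pool, assuming hypothetical `dbo.RawEvents` and `dbo.MonthlyRollup` tables:

```sql
-- Build a monthly roll-up once (CTAS), so dashboards read a few hundred
-- summary rows instead of scanning millions of raw events ad hoc.
CREATE TABLE dbo.MonthlyRollup
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT
    DATEFROMPARTS(YEAR(EventTime), MONTH(EventTime), 1) AS MonthStart,
    COUNT(*)     AS EventCount,
    AVG(Reading) AS AvgReading
FROM dbo.RawEvents
GROUP BY DATEFROMPARTS(YEAR(EventTime), MONTH(EventTime), 1);
```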
Key Takeaways
- Analytical workloads are about summarizing and visualizing data for decision-making.
- Importance levels in Azure Synapse determine execution order and resource allocation.
- Avoid unnecessary locks; design workloads for maximum efficiency.
- Preprocessing and limiting data improve both speed and usability.
Real-Time Processing in Azure
Purpose
Real-time processing analyzes streaming data as it enters a system, before it is stored. It delivers insights and actions within seconds or milliseconds, making it ideal for scenarios where low latency is critical.
Example: Monitoring temperature readings from IoT thermostats and triggering alerts immediately when a threshold is exceeded.
Key Characteristics
- Low Latency: Processes data almost instantly upon arrival.
- Streaming Data: Works on sequentially incoming data, not static datasets.
- Parallelism: Splits incoming data into independent sub-streams for simultaneous processing.
- Immediate Output: Results can be streamed directly to dashboards or stored for later use.
Common Use Cases
- Industrial monitoring (e.g., gas plant alerts)
- IoT device telemetry analysis
- Fraud detection
- Real-time analytics dashboards
- AI-driven live decision systems
Real-Time Processing Architecture
1. Ingestion
Technologies for high-throughput, ordered event ingestion:
- Azure Event Hubs: Ingests millions of events per second, preserving order within each partition.
- Azure IoT Hub: Bi-directional communication with millions of IoT devices.
- Apache Kafka: Open-source, high-performance stream processing.
2. Processing
Options for analyzing data streams in motion:
- Azure Stream Analytics: SQL-like query language, extendable with JavaScript.
- Apache Storm: Uses "spouts" (ingress) and "bolts" (processing/output) with C# or Java.
- Apache Spark Streaming: Supports Java, Python, Scala.
3. Storage
High-frequency write-optimized data stores:
- Azure Synapse Analytics
- Azure Data Explorer (fast ad hoc queries)
- Azure Cosmos DB
- Temporary real-time stores like Power BI live datasets
4. Analytics & Visualization
- Power BI (direct real-time streaming or historical)
- Azure Analysis Services (historical with near real-time simulation)
- Excel (connected to real-time sources)
Performance Considerations
- Avoiding Ingestion Backlogs: Ingestion pipelines must keep pace with data arrival.
- Horizontal Scaling: Spawning multiple processing instances for independent sub-streams (e.g., one per device).
- Optimized Storage Writes: Data stores must handle constant, rapid updates without bottlenecks.
Summary
Real-time processing in Azure enables instant insights from live data streams, making it essential for time-critical applications. By combining fast ingestion, massively parallel processing, and optimized storage/analytics tools, it supports scenarios from IoT monitoring to financial fraud detection with minimal delay.
Batch vs. Streaming Data Processing
Batch Data Processing
- Data Scope: Large historical datasets (hours, days, or years).
- Data Size: Very large — millions or billions of records.
- Performance Expectations: Not time-sensitive; can take minutes, hours, or days.
- Complexity: Supports deep analytics, complex queries, and machine learning.
- Example: Analyzing a year of HVAC data to determine optimal system settings.
Streaming Data Processing
- Data Scope: Small, recent "windows" of data (seconds or milliseconds).
- Data Size: Small — just a few seconds’ worth of incoming records at a time.
- Performance Expectations: Real-time or near real-time (milliseconds or seconds).
- Complexity: Typically simpler analytics (e.g., rolling averages, anomaly detection).
- Example: Triggering an alert within seconds when a gas plant’s pressure exceeds a safe threshold.
Key Differences
| Feature | Batch Processing | Streaming Processing |
|---|---|---|
| Data Scope | Historical data | Recent/live data |
| Data Size | Very large datasets | Small time windows |
| Speed | Minutes to days | Milliseconds to seconds |
| Complexity | Complex analytics and models | Simple, fast calculations |
| Use Case | Trend analysis, reporting, forecasting | Alerts, monitoring, real-time decision-making |
| Example | Yearly energy usage analysis | IoT temperature alert |
Relational Databases
Definition
A relational database stores and provides access to data points that have relationships to one another, defined by the relational model.
- Data is organized into tables (rows = records, columns = attributes).
- Relationships within and between tables are represented by keys (unique IDs).
- The schema (rigid structure) must be defined before data storage.
Structure
- Tables: Store data points as rows; attributes are columns.
- Keys:
- Primary key: Unique identifier for each row (e.g., username).
- Can have multiple keys for indexing and quick lookups.
- Views: Predefined queries that present data in different formats.
- Indexes: Improve data retrieval speed.
- Schema vs. Physical Storage: Logical structure is separate from physical storage (allows re-indexing without changing schema).
Benefits
- Intuitive organization — easy to read like a 2D table.
- Efficient read/write operations via keys and indexes.
- Flexibility — model can be tailored to the data.
- Data consistency — always returns the most up-to-date data.
- Concurrency control — database locking at table or row level.
- Programmable — supports stored procedures for automation.
Use Cases
- Inventory tracking
- Transaction processing
- Customer information management
Relational vs. Non-Relational
| Feature | Relational (RDBMS) | Non-Relational (NoSQL) |
|---|---|---|
| Structure | Rigid schema (tables, rows, columns) | Flexible, schema-less |
| Consistency | Strong consistency | Eventual consistency |
| Scalability | Vertical scaling | Horizontal scaling (elastic) |
| Performance | Optimized for transactions | Optimized for high-speed, large-scale data |
| Best For | Banking, inventory, customer data | Social media analytics, big unstructured data |
Key Considerations When Choosing a Database
- Data accuracy needs — Strong vs. eventual consistency.
- Scalability — Can it handle future data growth?
- Concurrency — Will it maintain accuracy with many users?
- Performance & Reliability — Required uptime and speed.
Azure Data Explorer
Overview
Azure Data Explorer (ADX) is a big data service for collecting, storing, and analyzing data from a wide range of sources. Common sources include:
- Websites (via APIs)
- Applications (aggregated or processed data)
- IoT devices (streaming telemetry)
- Virtually any other programmatic data source
Key Features
- Highly scalable — handles terabytes of data quickly.
- Fast analysis with Kusto Query Language (KQL).
- NoSQL architecture — supports heterogeneous data.
- Extensible — integrates with Microsoft services (e.g., Power BI) and custom architectures.
- Real-time indexing for both structured and unstructured data.
- Visualization — integrates with tools like Power BI for reporting.
Workflow
- Data Collection
  - Sources: IoT Hub, Event Hub, custom services, etc.
  - Temporary storage: Blob Storage, Table Storage, Cosmos DB.
- Data Ingestion
  - Built-in connectors for common sources.
  - Custom programmatic ingestion possible.
- Data Storage
  - Stored internally with indexes for fast querying.
- Data Querying
  - Read-only queries (data isn’t modified in ADX).
  - Queries run via Kusto Query Language.
  - Data deletion/archiving done via services like Azure Data Factory.
Kusto Query Language (KQL) Capabilities
- Column-based querying (column store).
- Fast text indexing for pattern search.
- Built-in time series analysis (ideal for IoT telemetry).
- Real-time analytics.
Common Use Cases
- IoT telemetry analysis
- Website/app event analytics
- Operational monitoring
- Log and telemetry querying
Microsoft Services Using ADX Internally
- Azure Application Insights — Performance monitoring and debugging.
- Azure Monitor Logs — Usage and performance tracking.
- Time Series Insights — Specialized time series analysis.
- Windows Defender Advanced Threat Protection — Real-time security analytics.
Important Notes
- Clusters host databases that store tables.
- Ingestion can be automated with mappings for incoming data.
- Data is not updated or deleted inside ADX (only archived externally).
Using Azure Data Explorer for Data Visualization
Overview
Azure Data Explorer (ADX) provides built-in capabilities to query and visualize large datasets directly within its web interface at dataexplorer.azure.com.
Steps to Visualize Data in ADX
1. Connect to a Cluster
- Go to dataexplorer.azure.com.
- Click Add Cluster.
- Enter the Connection URI (for the demo: `https://help.kusto.windows.net`).
- Enter a display name (e.g., `Test Data`).
- Click Add — your cluster appears in the left pane.
2. Select a Database and Table
- Expand the cluster and choose a database (`Samples` in the demo).
- Select a table (`StormEvents` in the demo).
3. Run a Basic Query
Kusto queries start with the table name (no SELECT keyword needed).

```kusto
StormEvents
```

Run the query to retrieve data. For large datasets, limit results:

```kusto
StormEvents
| take 10000
```

4. Filter Data
Use `where` to filter specific records. Example: Only show heavy rain events.

```kusto
StormEvents
| where EventType == "Heavy Rain"
```

5. Summarize Data
Aggregate results using `summarize`.

```kusto
StormEvents
| where EventType == "Heavy Rain"
| summarize event_count = count() by State
```

This returns each state with the count of heavy rain events.

6. Render a Visualization
Use `render` to create a chart.

```kusto
StormEvents
| where EventType == "Heavy Rain"
| summarize event_count = count() by State
| render columnchart
```

- X-axis → First column (State)
- Y-axis → Second column (event_count)
- Hover to see exact values.

Key Points
- `take` limits result size.
- `where` filters rows.
- `summarize` aggregates data.
- `render` creates visualizations (column charts, pie charts, time charts, etc.).
- All done directly inside the ADX web UI — no external tools required (but can integrate with Power BI for more advanced visuals).
Choosing a Data Analytics Technology for Reporting
Purpose
Selecting the right data reporting technology ensures that reports meet the needs of their primary users—often data analysts—who require specific capabilities for data analysis, visualization, and insight generation.
Microsoft Azure Reporting Technologies
Power BI
- Focus: Business reporting and visualization.
- Data Sources: Connects to hundreds of sources (Excel, Azure Blob/Table Storage, SQL Database, etc.).
- Strengths: Built-in visualizations, ad hoc queries, Azure AD authentication, embeddable in other apps.
- Use Case Example: Sales trends dashboard with drill-down by region and product.
Jupyter Notebooks
- Focus: Programmatic data analysis using Python, Scala, or R.
- Data Sources: Commonly used with HDInsight clusters.
- Strengths: Custom visualizations, advanced statistical analysis, machine learning integration.
- Use Case Example: Predictive maintenance model for IoT sensor data.
Zeppelin Notebooks
- Focus: Hive queries and big data exploration.
- Data Sources: Works with HDInsight Hive; supports multiple user logins tied to cluster domains.
- Strengths: Browser-based, collaborative big data querying.
- Use Case Example: Querying terabytes of transaction logs for fraud pattern detection.
Azure Notebooks
- Focus: Cloud-based Jupyter environment (no local install).
- Data Sources: Flexible—connects to Blob/Table Storage, SQL Database, and more.
- Strengths: Free, easily accessible, shareable via Azure AD, no local setup.
- Use Case Example: Collaborative data science project analyzing climate datasets.
Data Analyst Goals
- Identify Trends: Combine multiple data sources to detect seasonal, behavioral, or operational patterns.
- Spot Issues: Discover anomalies or inefficiencies, sometimes unexpectedly.
- Integrate Data: Aggregate and compare diverse datasets for richer insights.
- Visualize Findings: Present results in meaningful, business-friendly formats.
Key Criteria for Choosing a Technology
- Multiple Data Source Support: Can the tool integrate data from diverse locations?
- Embedding Capabilities: Do you need visualizations embedded in your applications?
- Online vs. Offline Reporting: Will reports be consumed in real-time, offline, or both?
- Consider securing offline reports against unauthorized access.
- Data Volume Handling: Do you require big data clusters or large-scale processing?
- Evolving Needs: Periodically re-evaluate tools as business requirements change.
Comparison Table
| Feature | Power BI | Jupyter Notebooks | Zeppelin Notebooks | Azure Notebooks |
|---|---|---|---|---|
| Focus | Business reporting & visuals | Programmatic analysis | Hive query-based analysis | Cloud-hosted programmatic analysis |
| Data Sources | Hundreds, including Azure & on-prem | Typically HDInsight | HDInsight Hive | Multiple Azure-connected sources |
| Best For | Dashboards, KPIs, quick insights | Advanced analytics, machine learning | Big data SQL/Hive exploration | Collaborative cloud data science |
| Complexity | Low–Medium | Medium–High | Medium | Medium–High |
| Example | Sales trends dashboard | Predictive maintenance | Fraud detection | Climate data modeling |
SQL Server Reporting Services (SSRS)
SQL Server Reporting Services (SSRS) is a collection of on-premises tools and services for creating, deploying, and managing reports based on SQL Server data. It supports multiple platforms, including web and mobile, with modern pagination and responsive layouts.
Key Features
- Report Creation
  - Build rich, customized reports from SQL Server databases.
  - Utilize modern pagination tools and templates for multi-page reports.
  - Choose from updated built-in styles for quick customization.
- Report Deployment
  - Distribute reports across web browsers, mobile devices, and hybrid platforms (SSRS + Power BI).
  - Responsive layouts for optimal mobile viewing.
  - Mobile Report Publisher for creating mobile-optimized reports.
- Report Management
  - Manage reports, schedules, permissions, and history via the SSRS web portal.
  - Store datasets and data sources (e.g., Excel, external databases).
  - Remote administration capabilities for authorized users.
Modern Enhancements
- Updated Formats
  - Export to PDF, PowerPoint (with interactivity), and other formats.
  - PowerPoint exports allow filter modifications and dynamic views.
- Mobile Reporting
  - HTML5 and responsive designs remove the need for separate desktop/mobile versions.
  - SQL Server migration assistant converts Datazen reports for mobile.
  - Reports consumable through Power BI mobile apps or web browsers.
Extensibility
- APIs and Customization
- Programmatically control data sources, aggregation, and workflows.
- Extend SSRS to integrate with custom systems and data sources.
- Enable advanced, highly tailored reporting experiences.
In summary, SSRS is more than just a reporting tool—it’s a flexible, extensible reporting platform designed to deliver interactive, paginated, and mobile-friendly reports while giving administrators and developers powerful customization options.
SSRS Chart Types Summary
Area Chart
- Shows values over time with filled areas under lines.
- Best for aggregated data and stacked category comparisons.
Bar Chart
- Horizontal bars for comparing categories.
- Best for short, finite time periods or discrete comparisons.
Line Chart
- Displays trends over time with lines for each category.
- Best for summarizing large, time-based datasets.
Sparklines
- Mini charts without axes for quick trend visuals.
- Best for adding trend context to individual values.
Pie Chart
- Displays category percentages of a whole.
- Best for non-time-based aggregate breakdowns.
Scatter Chart
- Plots points to show relationships between two variables.
- Best for analyzing variable interactions.
Gauge
- Shows a value relative to an expected range.
- Best for at-a-glance status or performance indicators.
Stock Chart
- Displays data range, start, and end for time intervals.
- Best for tracking performance changes over time.
Tree Map
- Nested rectangles showing category proportions by size and color.
- Best for hierarchical data visualization.
Map
- Visualizes geospatial data with overlays.
- Best for location-based analysis.
| Chart Type | Best For | Data Focus | Visual Style | Example Use Case |
|---|---|---|---|---|
| Area Chart | Aggregates over time | Time series, categories | Filled areas under lines | Showing sales volume stacked by product categories over months |
| Bar Chart | Comparing categories | Discrete categories | Horizontal bars | Comparing revenue by region |
| Line Chart | Trends over time | Time series | Lines per category | Tracking stock prices quarterly |
| Sparklines | Mini trend indicators | Short time ranges | Tiny line/area without axes | Showing weekly sales trends per salesperson |
| Pie Chart | Percentage breakdown of whole | Categorical totals | Circular slices | Lunch menu sales distribution |
| Scatter Chart | Relationship between two variables | Two continuous variables | Points on XY axes | Correlation between temperature and ice cream sales |
| Gauge | Current value vs. target | Single metric | Dial or bar style | Fuel level indicator |
| Stock Chart | Range + start/end over time | Time intervals | Candlestick-like with ticks | Hourly stock price fluctuations |
| Tree Map | Hierarchical data proportions | Categories & subcategories | Nested colored rectangles | Bike sales by territory and type |
| Map | Geospatial data visualization | Location-based data | Geographic overlays + bubbles | Household counts by county |
Analytic Techniques — Summary
Analytic techniques help uncover insights from data rather than focusing solely on the data itself. The main types include:
1. Descriptive Analytics
- Focuses on what happened in the past.
- Summarizes and presents historical data without drawing conclusions.
- Uses reports, dashboards, graphs, and charts (e.g., line chart of last year’s sales).
- Helps assess business performance.
2. Diagnostic Analytics
- Explains why past patterns occurred (root cause analysis).
- Involves:
- Data discovery — gathering data from various sources.
- Data mining — identifying patterns and insights.
- Drill-down — moving from general to specific views in reports.
- Drill-through — switching between visualizations of the same dataset.
3. Predictive Analytics
- Uses historical data patterns to forecast future outcomes.
- Based on the idea that past trends can indicate future trends.
4. Prescriptive Analytics
- Builds on predictive analytics to suggest actions to influence future outcomes.
- Evaluates potential implications of each action.
- Supports decision-making for improving future performance.
5. Cognitive Analytics
- Uses AI systems to analyze data in a human-like way.
- Learns from past data to make more effective future decisions.
- Recognizes patterns, understands context, and uncovers previously unseen relationships.
- Enables personalized, data-driven decisions rather than relying on intuition.
Stream Analytics — Summary
Stream analytics enables real-time data processing by analyzing data as it flows into a system, rather than only after storage. A common example is Microsoft Azure Stream Analytics, which processes inputs from sources like IoT sensors, social media feeds, and applications.
Uses
- Geospatial analytics — e.g., tracking delivery trucks in real time for optimal routing.
- Asset monitoring — detecting anomalies in facilities like gas plants.
- Real-time business insights — analyzing sales data for personalized advertising.
- Telemetry monitoring — checking IoT sensor readings for out-of-bounds values.
- Keyword tracking — extracting insights from live weblog streams.
How It Works
Stream analytics is composed of three stages:
- Input
  - Data can come from Azure Event Hubs, Azure IoT Hub, or Azure Blob Storage.
- Query
  - Uses an SQL-like query language for analysis while data flows in.
  - Supports user-defined functions (JavaScript, C#).
  - Handles event order and time windows for correct sequencing.
  - Can reorder events, aggregate over time, detect anomalies, match patterns, and process geospatial data.
- Output
  - May produce no output, pass data through unchanged, or generate a transformed stream.
  - Can send results to visualization tools (e.g., Power BI) or storage systems (Azure SQL Database, Blob Storage, Data Lake, Cosmos DB).
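A minimal sketch of the query stage in that SQL-like language, with hypothetical input/output names (`sensorinput`, `alertsoutput`) and fields:

```sql
-- Average temperature per device over 30-second tumbling windows;
-- only windows whose average exceeds a threshold reach the output.
SELECT
    deviceId,
    AVG(temperature) AS avgTemperature,
    System.Timestamp() AS windowEnd
INTO alertsoutput
FROM sensorinput TIMESTAMP BY eventTime
GROUP BY deviceId, TumblingWindow(second, 30)
HAVING AVG(temperature) > 75
```

`TIMESTAMP BY` tells the job which field carries event time, which is how it handles ordering and time windows.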
Key Features
- Fully managed PaaS — no hardware provisioning required.
- Scalable — processes millions of events per second.
- Edge computing support — ultra-low latency processing closer to data sources.
- Reliability — guarantees event processing without loss.
- Security — all incoming/outgoing data and checkpoints are encrypted.
- Performance — built on the Trill engine for high-speed stream processing.
- Cost-optimized — pay only for the processing used.
- In-memory processing — data is never written to a store during processing.
Essence: Stream analytics transforms live data into actionable insights instantly, with flexibility to process at scale in the cloud or at the edge.
Text Analytics — Summary
Text analytics uses natural language processing (NLP), statistics, and machine learning to extract meaning from unstructured text data.
Key Capabilities
- Key phrase extraction — identifies the most important words/phrases in text.
- Entity recognition — detects and categorizes names of people, places, organizations, dates, etc.
- Sentiment analysis — determines whether a text expresses positive, negative, neutral, or mixed sentiment.
- Language detection — automatically identifies the language of the text.
- Named entity linking — maps recognized entities to specific entries in knowledge bases.
- PII detection — finds and classifies personally identifiable information in text.
- Healthcare text analysis — extracts medical information from unstructured clinical documents.
Applications
- Customer feedback analysis — extracting sentiment from reviews or surveys.
- Social media monitoring — tracking public opinion on brands, products, or events.
- Content classification — organizing large volumes of documents or messages.
- Compliance monitoring — detecting sensitive or regulated information in communications.
Essence: Text analytics turns raw text into structured, actionable insights that can drive decisions in customer engagement, compliance, and operational efficiency.
Batch Analytics — Summary
Batch analytics processes large volumes of stored data in scheduled runs, rather than in real time.
How It Works
- Operates on historical data stored in data lakes, databases, or files.
- Processes datasets in chunks or batches on a set schedule or on-demand.
- Can use tools like Azure Synapse Analytics, Hadoop, or Spark.
Applications
- Financial reporting — generating monthly or quarterly summaries.
- Data transformation — cleaning and structuring raw datasets for analysis.
- Trend analysis — identifying long-term patterns in sales, traffic, or production.
- Machine learning training — preparing and processing large datasets for model building.
Key Points
- Not real-time — results are available only after the batch completes.
- Cost-effective — efficient for periodic large-scale processing.
- Scalable — can handle terabytes or petabytes of data.
- Reliable — well-suited for processes that can tolerate latency.
Essence: Batch analytics transforms large sets of historical data into insights on a scheduled or on-demand basis, trading speed for scale and cost efficiency.
Azure Core Storage Services – Detailed Overview
This session explains Azure’s five core storage services and demonstrates creating and managing them in the Azure Portal.
1. Azure Blobs
- Object storage for unstructured data (PDFs, images, logs, big data).
- Accessible via cloud URIs; access can be public or private.
- Example: Created a `myblobs` container, uploaded a JPEG, accessed via public URI.
2. Azure Files
- Managed SMB-based file shares for both Azure VMs and on-premises systems.
- Supports performance tiers (Transaction optimized, Cool) and quotas.
- Example: Created `myfileshare` with a 5 GB quota, showing upload and VM connection options.
3. Azure Queues
- Message queue service for asynchronous communication between services.
- Generally FIFO (First-In-First-Out) message processing, though strict ordering is not guaranteed.
- Example: Created `myqueue`, added two messages, demonstrated dequeuing.
4. Azure Tables
- NoSQL storage for semi-structured data, supports dynamic fields per entity.
- Entities must have `PartitionKey` and `RowKey`; other properties are flexible.
- Example: Created `mytable`, added two entities with different properties to demonstrate schema-less design.
5. Azure Disks
- Persistent, attachable storage for virtual machines in Azure.
- Not demoed in this session.
Portal Walkthrough:
- Created a storage account (`test226`) with unique naming and linked to an existing resource group.
- Demonstrated file uploads, public access configuration, queue message handling, and flexible NoSQL table creation.
Key Takeaways:
- Choose storage type based on workload needs:
- Blobs for large-scale file/object storage.
- Files for shared drives across systems.
- Queues for decoupling service communication.
- Tables for flexible, schema-less data.
- Disks for VM-attached persistent volumes.
- Azure Portal provides a straightforward interface for setup and testing.
ELT Pipeline in Azure – Blob Storage to Azure Data Explorer
The session demonstrates building an ELT pipeline in Azure:
- Extract: Source data stored as a JSON file in Azure Blob Storage.
- Load: Data is ingested directly into Azure Data Explorer without modification.
- Transform: Changes are applied at query time instead of during ingestion.
Process Overview:
- Create Blob Storage container and upload a JSON file containing event data.
- In Azure Data Explorer, create a table schema that matches the JSON structure.
- Set up a data connection between Blob Storage and Data Explorer to automatically pull in the file.
- Verify data ingestion by running a basic `SELECT *` query.
- Apply transformations directly in queries (see the sketch after this list):
- Renaming columns.
- Dropping unused fields.
- Filtering and aggregating data.
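A minimal KQL sketch of such query-time transformations, assuming a hypothetical `TestEvents` table with `Timestamp`, `Value1`, and `Value2` columns:

```kusto
TestEvents
| project-rename EventTime = Timestamp        // rename a column at read time
| project-away Value2                         // drop an unused field
| where isnotempty(Value1)                    // filter records
| summarize event_count = count() by bin(EventTime, 1h)   // aggregate hourly
```

The stored data is untouched; each consumer can apply a different shape to the same raw table.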
Advantages of ELT:
- Original source data remains unchanged.
- Enables flexible, dynamic transformations tailored to different use cases.
- Reduces the need for upfront data wrangling.
Drawbacks:
- Transformations happen at query time, potentially increasing query latency and resource usage.
ETL Pipeline in Azure – Blob Storage to Azure Data Explorer
This session covers creating an Extract–Transform–Load (ETL) process where transformations occur before data is stored in Azure Data Explorer (ADX).
Scenario:
JSON data with multiple fields (Timestamp, Value1, Value2) is uploaded to Azure Blob Storage. The pipeline should store only Timestamp and Value1 in ADX, renaming Value1 to Value and ignoring Value2 entirely.
Process Steps:
1. Create Table in ADX
   - Table: `TestTable`
   - Columns: `Timestamp: datetime`, `Value: string`.
2. Define JSON Ingestion Mapping (`TestMapping`)
   - Map `$.Timestamp` → `Timestamp`.
   - Map `$.Value1` → `Value`.
   - `Value2` is omitted from the mapping, so it is excluded.
3. Configure Data Connection
   - Event trigger: When a `.json` file is created in container `myblobs`.
   - Ingestion mapping applied automatically during load.
4. Run the Pipeline
   - Upload a JSON file to Blob Storage.
   - Blob trigger sends the file to ADX via the defined mapping.
   - Data arrives in final transformed form—no extra query transformations needed.
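A hedged sketch of the ADX control commands behind the table and mapping steps, using the names from the demo:

```kusto
// Step 1: destination table with only the two columns to keep.
.create table TestTable (Timestamp: datetime, Value: string)

// Step 2: JSON ingestion mapping; Value2 has no entry, so it is dropped.
.create table TestTable ingestion json mapping 'TestMapping' '[{"column":"Timestamp","path":"$.Timestamp","datatype":"datetime"},{"column":"Value","path":"$.Value1","datatype":"string"}]'
```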
Key Takeaways:
- ETL vs ELT: In ETL, transformations happen before storage, meaning faster queries but loss of raw data.
- Useful when you know exactly which fields are needed long-term and want minimal query overhead.
- Reduces storage costs by excluding unnecessary fields up front.
Characteristics of Relational Data
1. Definition and Purpose
A relational database (RDB) is a system that stores data in structured tables and allows you to retrieve, manipulate, and manage it efficiently using SQL (Structured Query Language). Relational databases provide both storage and querying capabilities, supporting operations like filtering, updating, and aggregating data. They are widely used in applications ranging from financial transactions to customer management because they provide consistency, reliability, and a structured approach to data.
2. Structure of Relational Databases
- Tables: Fundamental unit of storage; each table represents an entity (e.g., Employees, Departments) and contains rows (records) and columns (attributes).
- Rows: Represent individual instances of an entity.
- Columns: Define the type of data stored for each row; consistent across all rows.
- Schema: Defines the table structure, including column names, data types, and constraints.
Example:
- Departments Table: DepartmentID, DepartmentName
- Employees Table: EmployeeID, Name, DepartmentID
- Managers Table: EmployeeID, DepartmentID
3. Keys and Relationships
- Primary Key (PK): Uniquely identifies each row in a table. Only one PK per table.
- Foreign Key (FK): References the PK of another table to establish a relationship.
- Joins: Use FKs to combine data from multiple tables, enabling complex queries and relational views.
- Relational databases are based on relational algebra, a mathematical foundation ensuring consistency.
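For the example tables above, a join through the `DepartmentID` foreign key looks like this:

```sql
-- List each employee with the name of their department.
SELECT e.Name, d.DepartmentName
FROM Employees AS e
JOIN Departments AS d
    ON e.DepartmentID = d.DepartmentID;
```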
4. Normalization & Data Integrity
- Normalization: Organizes tables to minimize redundancy and maintain integrity.
- Denormalization: Sometimes used in analytical or reporting scenarios to improve query performance.
- Constraints: Ensure valid data (e.g., unique PKs, valid FK references, non-nullable columns).
5. SQL Operations
- Data Definition Language (DDL):
CREATE TABLE,ALTER TABLE,DROP TABLE - Data Manipulation Language (DML):
INSERT,UPDATE,DELETE - Querying:
SELECT,WHERE,JOIN,GROUP BY,ORDER BY - Indexes: Enhance query performance by providing efficient access paths to data.
6. Use Cases / Workloads
- Online Transaction Processing (OLTP)
  - Supports real-time transactions (e.g., retail, banking, order management).
  - Emphasizes ACID properties: Atomicity, Consistency, Isolation, Durability.
- Internet of Things (IoT)
  - Embedded relational databases in gateway devices can store sensor data locally.
  - Allows quick and reliable aggregation and querying of edge-collected data.
- Online Analytical Processing (OLAP) / Data Warehousing
  - Optimized for querying historical data from multiple sources.
  - Supports dimensional modeling, aggregations, and business intelligence analysis.
  - Enables stakeholders to derive insights and summaries efficiently.
7. Advantages
- Strong data consistency and integrity.
- Supports complex queries across related tables.
- Mature ecosystem (SQL support, drivers, tools).
- Scalable from small embedded systems to enterprise-grade warehouses.
8. Limitations
- Rigid schemas make evolving data models harder.
- Performance may degrade with extremely large datasets or high write loads without tuning.
- Less suited for unstructured or semi-structured data compared to NoSQL.
9. Best Practices
- Carefully design tables, PKs, FKs, and relationships.
- Use indexes for frequently queried columns.
- Balance normalization and denormalization according to workload requirements.
- Monitor performance and optimize queries, especially joins and aggregations.
Data Store Models (DSM)
1. Introduction
A Data Store Model (DSM) defines how data is organized, stored, and accessed in a system. Modern business systems ingest heterogeneous data—data with high variability in types and formats. Different types of data require different storage approaches, which is where polyglot persistence comes in: using multiple data store techniques within the same system to optimize for various workloads.
When selecting a DSM, consider:
- Schema requirements: SQL vs NoSQL
- Query patterns: simple lookup vs complex relationships
- Performance: speed, scale, latency
- Management and cost: ease of administration and operational costs
2. Relational Database Management Systems (RDBMS)
- Stores data in tables (rows and columns).
- Provides ACID transactions:
- Atomicity: operations are all-or-nothing
- Consistency: data remains valid across transactions
- Isolation: concurrent operations do not interfere
- Durability: data persists after transactions
- Supports SQL for creating tables, inserting, updating, deleting, and querying data.
- Can run across multiple regions for high availability and durability.
Best for: Structured data, complex queries, OLTP (Online Transaction Processing), business applications.
3. Key-Value Stores
- Each data item is stored as a value associated with a unique key.
- Supports simple operations: insert, delete, and retrieve by key.
- Highly optimized for fast lookups.
- Not optimized for queries based on values.
Best for: Caching, session storage, real-time applications requiring high-speed access.
4. Document Databases
- Stores data as JSON-like documents, which can have nested or varying structures.
- No fixed schema; documents can evolve over time.
- Documents often represent entities (e.g., a customer or order) and may include what would normally require multiple tables in a relational database.
- Example: Azure Cosmos DB SQL API.
Best for: Semi-structured data, flexible schema requirements, rapid application development.
5. Graph Databases
- Focus on relationships between data as much as the data itself.
- Composed of nodes (entities) and edges (relationships).
- Properties can exist on both nodes and edges.
- Supports complex traversals and queries over large, interconnected datasets.
- Example: Azure Cosmos DB Gremlin API.
Best for: Social networks, recommendation engines, fraud detection.
6. Data Analytics Stores
- Designed for large-scale data analysis across distributed systems.
- Can handle historical or streaming data.
- File formats: CSV, ORC, Parquet.
- Services: Azure Data Lake Storage Gen2, Synapse Analytics, Data Explorer, Databricks.
Best for: OLAP, business intelligence, data warehousing, big data analytics.
7. Column-Family Databases
- Conceptually similar to relational databases: tables with rows and columns.
- Columns are grouped into column families that store logically related data.
- Supports sparse data and dynamic columns.
- Data stored in key order; indexing on specific columns possible.
Best for: Analytical workloads with denormalized, sparse, or wide datasets.
8. Search Engine Databases
- Indexes large datasets for near real-time search and retrieval.
- Optimized for querying external data stores quickly.
- Example use cases: e-commerce search, document search engines.
Best for: Full-text search, indexing large document repositories.
9. Time Series Databases
- Optimized for time-indexed data.
- Stores sequences of measurements or events over time (IoT sensors, financial tick data).
- Records can grow rapidly due to frequent updates.
- Example: Azure Time Series Insights.
Best for: IoT telemetry, monitoring, performance metrics.
10. Object Storage
- Stores unstructured or semi-structured data as objects.
- Optimized for large files such as images, video, audio, or backups.
- Examples: Azure Blob Storage, Azure Data Lake Storage Gen2.
- Can store VM disk images for rapid deployment.
Best for: Large files, backups, cloud-native applications.
11. Shared File Storage
- Cloud-based file shares accessible across networks.
- Supports standard file system operations with access controls.
- Highly scalable for growing storage needs.
- Example: Azure Files.
Best for: Enterprise file sharing, collaborative workspaces, applications requiring SMB/NFS access.
12. Summary Table
| DSM Type | Structure / Key Feature | Best Use Case / Workload |
|---|---|---|
| RDBMS | Tables, ACID, SQL | OLTP, structured business applications |
| Key-Value Store | Unique key → value | Caching, real-time lookups |
| Document Database | JSON-like documents, schema-less | Semi-structured data, flexible schema apps |
| Graph Database | Nodes & edges, relationship-centric | Social networks, recommendation engines |
| Data Analytics Store | Distributed, large-scale | OLAP, data warehouses, big data analytics |
| Column-Family Database | Rows & column families, sparse data | Analytical workloads, denormalized datasets |
| Search Engine Database | Indexing for fast retrieval | Full-text search, e-commerce |
| Time Series Database | Time-indexed measurements | IoT telemetry, monitoring |
| Object Storage | Unstructured objects | Large files, backups, media storage |
| Shared File Storage | Network-accessible file shares | Enterprise file sharing, collaborative apps |
Using Indexes in an Azure SQL Relational Database
1. Introduction
Indexes are on-disk structures associated with tables or views that speed up data retrieval. Without proper indexing, queries can be slow, especially on large tables. In Azure SQL, design indexes around the access patterns your queries need, ideally before those queries reach production.
Key points:
- Indexes contain keys from one or more columns.
- Keys are stored in a B-Tree structure, which allows SQL Server to locate rows quickly.
- Indexes improve query performance but may slightly impact write performance (INSERT, UPDATE, DELETE).
2. Types of Indexes
2.1 Clustered Index
- Sorts and stores the actual data rows in the table or view based on the key values.
- Only one clustered index per table because data rows can be sorted in only one order.
- Ideal for columns that are frequently used in range queries or as primary keys.
2.2 Non-Clustered Index
- Separate structure from the data rows.
- Contains key values with a pointer to the data row.
- Can improve performance of queries that search for specific columns.
- Multiple non-clustered indexes can exist per table.
2.3 Unique Index
- Can be clustered or non-clustered.
- Ensures no two rows share the same value for the indexed column(s).
- Useful for enforcing uniqueness constraints on business-critical columns.
3. Creating Indexes in Azure SQL
3.1 Non-Clustered Index Example
```sql
CREATE INDEX index1
ON dbo.Persons (PersonID);
```
- Creates a basic non-clustered index on the PersonID column.
- Index definitions can also combine multiple columns, enforce uniqueness, and set a sort order per column.
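A hedged sketch of those richer options on the same table (hypothetical column names):

```sql
-- Unique, multi-column index with explicit per-column sort orders;
-- INCLUDE stores City in the leaf level so matching queries avoid lookups.
CREATE UNIQUE NONCLUSTERED INDEX IX_Persons_Name
ON dbo.Persons (LastName ASC, FirstName DESC)
INCLUDE (City);
```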
4. Index Properties in SQL Server
After creating an index, you can inspect its properties:
- Index key columns: Columns used for the index.
- Included columns: Additional columns stored in the index for query coverage.
- Options: Locking behavior (row/page locks), fill factor, etc.
- Storage: Physical location and filegroup.
You can access index properties in SQL Server Management Studio (SSMS):
- Navigate to the table (`dbo.Persons` → Indexes).
- Right-click the index → Properties.
- Review and modify options if needed.
5. Benefits of Indexing
- Speeds up SELECT queries by allowing SQL Server to quickly locate rows.
- Optimizes JOIN operations by providing fast access to key columns.
- Can enforce uniqueness for critical columns.
- Helps maintain efficient data access patterns for OLTP and OLAP workloads.
6. Best Practices
- Index columns that are frequently used in WHERE, JOIN, ORDER BY, and GROUP BY clauses.
- Avoid excessive indexing on frequently updated tables, as it may slow down write operations.
- Monitor and rebuild or reorganize indexes periodically to maintain performance.
- Consider covering indexes (including additional columns) to avoid extra lookups.
7. Summary
| Index Type | Key Feature | Best Use Case |
|---|---|---|
| Clustered Index | Sorts & stores data rows | Primary key, range queries |
| Non-Clustered Index | Separate structure, points to data rows | Queries on specific columns |
| Unique Index | Ensures no duplicate values, can be clustered/non-clustered | Enforcing uniqueness constraints |
Using Views in an Azure SQL Relational Database
1. Introduction
A view is a virtual table in a SQL database that is defined by a query. Views do not store data themselves, but they allow you to query data as if it were a table. They are useful for:
- Abstracting complex queries.
- Hiding certain columns or sensitive data.
- Combining data from multiple tables using joins.
- Simplifying access to frequently queried datasets.
Views are queryable objects and can be used in SELECT statements like normal tables.
2. Characteristics of Views
- Virtual table: Stores the query definition, not the data.
- Readability: Simplifies complex SQL queries for developers or analysts.
- Reusability: You can query a view repeatedly without rewriting the underlying SQL.
- Abstraction: Can hide table structure complexity and sensitive information.
- Dynamic data: Always returns current data from the underlying tables.
3. System vs. User Views
- System Views: Pre-created views in a database for metadata, configuration, or server properties. Usually start with `sys`.
- User Views: Views created by developers to represent specific subsets of data or simplified queries.
4. Creating Views
4.1 Basic Syntax
```sql
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
```
4.2 Example: Canadian Cities
```sql
CREATE VIEW canadiancities AS
SELECT FirstName, LastName
FROM dbo.Persons
WHERE City IN ('Montreal', 'Calgary', 'Toronto');
```
4.3 Example: American Cities
```sql
CREATE VIEW americancities AS
SELECT FirstName, LastName
FROM dbo.Persons
WHERE City IN ('New York', 'Chicago', 'Boston');
```
After executing the query, these views can be queried like tables:
```sql
SELECT * FROM canadiancities;
SELECT * FROM americancities;
```
5. Benefits of Using Views
- Simplifies complex queries by encapsulating joins and filters.
- Improves maintainability of SQL code.
- Provides security by exposing only the necessary columns.
- Enhances consistency by using the same view in multiple queries.
6. Best Practices
- Use views for repeatedly used queries to reduce duplication.
- Avoid overusing views for write-intensive operations; they may add overhead.
- Use indexed views (materialized views) if you need faster performance on large datasets.
- Keep view definitions clear and simple to avoid complex maintenance.
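For the indexed-view practice above, a minimal sketch (hypothetical view name; indexed views require `SCHEMABINDING`, and grouped views need `COUNT_BIG(*)`):

```sql
CREATE VIEW dbo.PersonsByCity
WITH SCHEMABINDING
AS
SELECT City, COUNT_BIG(*) AS PersonCount
FROM dbo.Persons
GROUP BY City;
GO
-- Materialize the view by giving it a unique clustered index.
CREATE UNIQUE CLUSTERED INDEX IX_PersonsByCity ON dbo.PersonsByCity (City);
```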
7. Summary
| Feature | Description |
|---|---|
| Virtual Table | Defined by a query, stores no data |
| Queryable Like a Table | Can be used in SELECT statements |
| Combines Multiple Tables | Supports joins and aggregations |
| Abstraction & Security | Can hide complex queries or sensitive columns |
| Dynamic Data | Always reflects current data in underlying tables |
Azure Relational Delivery Models
1. Introduction
Microsoft Azure offers multiple fully managed relational database options to support modern cloud applications. These services are:
- Intelligent: AI-driven performance optimization and automation.
- Scalable: Can handle workloads from small apps to enterprise-level solutions.
- Flexible: Supports multiple database engines, deployment models, and hybrid connectivity.
- Managed: Reduces operational overhead like patching, backups, and scaling.
Understanding the differences between these delivery models helps you choose the right database for your workload.
2. Azure SQL Database
- Type: Fully managed relational database.
- Key Features:
- Serverless compute: No need to manage servers, automatic scaling on demand.
- Hyperscale storage: Scales storage dynamically.
- AI-powered: Performance monitoring and automated resource scaling.
- High availability and automated backups.
- Use Case: Cloud-native applications that require intelligent, scalable SQL storage.
3. Azure SQL Managed Instance
- Type: Fully managed SQL instance with full SQL Server compatibility.
- Key Features:
- Migration of existing SQL workloads to Azure.
- Maintains full SQL Server compatibility.
- Azure Arc support for hybrid deployments.
- Elastic scale and unified management.
- Use Case: Modernizing on-premises SQL Server applications to the cloud.
4. SQL Server on Virtual Machines (VMs)
- Type: Infrastructure-as-a-Service (IaaS) relational database.
- Key Features:
- Complete control over SQL Server instance and VM configuration.
- Full compatibility with SQL Server features.
- Hybrid connectivity options.
- Access to latest SQL Server releases (e.g., SQL Server 2019).
- Use Case: Lift-and-shift migrations of existing SQL Server workloads to Azure with high control and performance.
5. Azure Database for PostgreSQL
- Type: Managed relational PostgreSQL service.
- Key Features:
- Single-zone or zone-redundant high availability.
- AI-powered performance optimization.
- Advanced security features.
- Scalable and fully managed.
- Use Case: Open-source workloads, enterprise applications, and migration from on-prem PostgreSQL.
6. Azure Database for MySQL
- Type: Managed relational MySQL service.
- Key Features:
- High availability with up to 99.99% uptime.
- Elastic scaling for compute and storage.
- Open-source compatibility.
- Easy setup and operation.
- Use Case: Web and mobile applications, open-source app development.
7. Azure Database for MariaDB
- Type: Managed relational MariaDB service.
- Key Features:
- Fully managed database engine.
- Predictable performance and dynamic scalability.
- High availability for mission-critical workloads.
- Support for open-source development tools.
- Use Case: Mission-critical workloads requiring open-source relational database solutions.
8. Summary Table
| Database Type | Delivery Model | Key Benefits | Best Use Case |
|---|---|---|---|
| Azure SQL Database | PaaS (Serverless) | Intelligent, scalable, automated | Cloud-native SQL apps |
| Azure SQL Managed Instance | PaaS (Instance) | Full SQL Server compatibility, hybrid support | Modernizing on-prem SQL workloads |
| SQL Server on VMs | IaaS | Full control, hybrid connectivity | Lift-and-shift migrations, high performance needs |
| Azure Database for PostgreSQL | PaaS | Open-source, AI optimization, secure | PostgreSQL workloads, cloud migration |
| Azure Database for MySQL | PaaS | Open-source, easy scale, high availability | Web/mobile apps, open-source development |
| Azure Database for MariaDB | PaaS | Fully managed, dynamic scalability, HA | Mission-critical workloads, open-source apps |
9. Key Considerations for Choosing a Delivery Model
- Performance & Scale: How large is your workload, and how dynamic are your resource needs?
- Compatibility: Do you need full SQL Server features or open-source engine support?
- Management Overhead: Do you want fully managed PaaS or more control via IaaS?
- High Availability: How critical is uptime and redundancy across regions?
- Cost: Balance between upfront cost, scalability, and operational cost.
Microsoft Azure Platform as a Service (PaaS)
1. Introduction
Platform as a Service (PaaS) provides a fully managed cloud environment for building, testing, deploying, and managing applications without the need to manage the underlying infrastructure. PaaS allows developers to focus on application development rather than hardware, networking, or operating systems.
Key benefits include:
- Reduced operational overhead
- Faster application deployment
- Integrated development tools
- Scalability and high availability built-in
- Pay-as-you-go cost model
2. Core Features of Azure PaaS
2.1 Development Tools
- Integrated software development kits (SDKs).
- Monitoring and analytics tools.
- Built-in support for application lifecycle management:
- Build
- Test
- Deploy
- Manage
- Update
2.2 Operating Systems
- Choice of Windows or Linux.
- Specialized OS options for machine learning or other workloads.
- No need to install or maintain the OS manually.
2.3 Servers and Storage
- Servers and storage are provisioned and managed automatically.
- Developers avoid:
- Configuring hardware
- Applying patches or updates
- Managing backups
- Scalable storage and compute resources on-demand.
2.4 Networking
- Networking infrastructure (firewalls, routing, load balancing) is fully managed by Azure.
- Enables global application deployment without managing VPNs or physical network devices.
2.5 Data Center Management
- Azure handles all physical infrastructure, including:
- Power and cooling
- Security
- Maintenance
- Developers do not manage the data center.
3. Use Cases for Azure PaaS
- Development Framework: Quickly build and deploy web or mobile applications.
- Cloud Migration: Move in-house apps to the cloud, reducing operational overhead.
- Edge Deployment: Deploy applications closer to users worldwide for improved performance.
- Integrated Analytics & Business Intelligence: Connect applications directly to cloud analytics services.
- Rapid Prototyping: Launch applications quickly without upfront infrastructure investment.
4. Advantages of PaaS
- Reduced coding time: Pre-coded templates and starter applications.
- Directory services integration: Built-in integration with authentication and identity services.
- Enhanced development capabilities: Scale development without adding staff.
- Sophisticated tools: Access to advanced BI and development tools.
- Global collaboration: Teams across the world can work in a unified environment.
- Application lifecycle support: Full support from development to deployment.
5. Cost and Ownership
| Aspect | Description |
|---|---|
| Upfront Costs | Minimal; pay only for resources used. |
| User Ownership | Focus solely on developing and managing applications. |
| Cloud Provider Ownership | Manages servers, OS, networking, and overall infrastructure. |
| Pay-as-you-go | Resources can be scaled up or down; idle resources do not incur costs. |
6. Key Takeaways
- PaaS eliminates the need to manage infrastructure, servers, and operating systems.
- It provides a ready-to-use development environment in the cloud.
- Ideal for scenarios where developers want to focus on applications and code.
- Offers cost efficiency, scalability, and global accessibility.
Microsoft Azure Infrastructure as a Service (IaaS)
1. Introduction
Infrastructure as a Service (IaaS) provides fully provisioned computing infrastructure in the cloud, giving users virtualized servers, storage, and networking. Unlike PaaS, IaaS gives you full control over the operating system, applications, and runtime, while the cloud provider manages the physical data center.
Key benefits:
- Quick provisioning of infrastructure
- No need to manage physical data centers
- Flexible, scalable, and cost-effective
- Pay-as-you-go model
2. Core Features of Azure IaaS
2.1 Servers and Storage
- Virtual machines (VMs) with customizable CPU, RAM, and storage.
- Storage options include:
- Object storage
- File storage
- Attached block storage
- Scalable on-demand without physical procurement.
2.2 Networking
- Virtual networks with firewalls, NAT, Internet gateways, and ACLs.
- Enables full replication of on-premise network infrastructure.
- Supports hybrid environments connecting on-premise and cloud resources.
2.3 User Management
- Users manage:
- Applications
- Data within applications
- Runtime libraries
- Middleware
- Operating systems
- Full control over configuration and deployment.
2.4 Cloud Provider Responsibilities
- Manages physical servers, storage, and networking.
- Ensures data center security, power, cooling, and availability.
- Provides Service Level Agreements (SLAs) for uptime.
3. Common Use Cases for IaaS
- Workload Migration: Move existing on-premise workloads to the cloud.
- Deployment & Testing: Spin up and tear down test environments quickly using Infrastructure as Code.
- High Performance Computing (HPC): Run computationally intensive tasks like simulations, weather prediction, or financial modeling.
- Web Hosting: Host websites or applications without maintaining physical infrastructure.
- Big Data & Analytics: Provision compute and storage for analytics workloads.
- Remote Work: Deploy virtual desktops for distributed teams.
- Cloud Backup & Disaster Recovery: Backup data and replicate infrastructure across regions.
- Hybrid Environments: Combine on-premise infrastructure with cloud resources.
4. Advantages of IaaS
- Availability: Providers maintain infrastructure with SLAs.
- Disaster Recovery: Easier and faster replication and recovery.
- Reduced Costs: Eliminates upfront capital expenses and lowers operational costs.
- Scalability: Quickly scale up or down based on demand.
- Stability: Reliable infrastructure maintained by cloud providers.
- Security: Built-in security features and customizable controls.
5. Cost and Ownership
| Aspect | Description |
|---|---|
| Upfront Costs | Minimal; pay only for resources used. |
| User Ownership | Responsible for software, OS, middleware, applications, and data management. |
| Provider Ownership | Responsible for hardware, networking, storage, and overall infrastructure. |
6. Key Takeaways
- IaaS allows full control over infrastructure without the burden of maintaining physical hardware.
- Best suited for scenarios where you need customizable environments, high performance, or hybrid deployments.
- Offers flexible, scalable, and cost-effective solutions for enterprise workloads, testing, and web hosting.
- The division of responsibilities ensures users manage the software stack while providers manage the underlying infrastructure.
Microsoft Azure Software as a Service (SaaS)
1. Introduction
Software as a Service (SaaS) delivers cloud-hosted applications to end-users over the internet. Users access the software via a web browser without installing or maintaining the application locally. The cloud provider manages all underlying infrastructure, servers, operating systems, and application updates.
Key benefits:
- Minimal management for users
- Subscription-based pricing (monthly or annually)
- Instant access from anywhere
- No need for local installation or infrastructure setup
2. Core Features of Azure SaaS
2.1 Centralized Management
- Applications are hosted and managed by the cloud provider.
- Providers handle:
- Servers and storage
- Networking and security
- Software updates and patches
- Users only interact with the application.
2.2 Accessibility
- Access applications via web browsers from any location.
- Supports remote work and globally distributed teams.
- No installation required on individual devices.
2.3 Subscription Pricing
- Pay for what you use, e.g., number of licenses or users.
- Eliminates upfront costs for software purchase.
- Scales easily as your team grows.
3. Common Use Cases
- Email: Gmail, Outlook, Yahoo Mail.
- Customer Relationship Management (CRM): Salesforce, Microsoft Dynamics 365.
- Collaboration and Calendaring: Microsoft Teams, Office 365.
- Enterprise Resource Planning (ERP): SAP, PeopleSoft.
- Accounting and Business Software: Simply Accounting, Sage.
- Remote Workforce Mobilization: Allows employees to use the same software regardless of location.
4. Advantages of SaaS
- Sophisticated Applications: Advanced tools like CRM and ERP without local setup.
- Reduced Management Overhead: No installation, maintenance, or infrastructure concerns.
- Workforce Mobility: Easy access for distributed teams.
- Cloud Integration: Automatic backups, monitoring, and updates.
- Flexible Pricing: Subscription-based, pay-as-you-go model.
5. Cost and Ownership
| Aspect | Description |
|---|---|
| Upfront Costs | None; pay a subscription fee based on usage. |
| User Ownership | Owns the data in the application; responsible for using the software effectively. |
| Provider Ownership | Responsible for the application, servers, networking, security, and infrastructure. |
6. Key Takeaways
- SaaS provides the least management overhead compared to PaaS and IaaS.
- Best suited for businesses that want ready-to-use applications without maintaining hardware or software.
- Users focus solely on application data and usage, while providers handle everything else.
- Ideal for email, CRM, collaboration tools, ERP, and remote workforce applications.
Online Transaction Processing (OLTP) in Microsoft Azure
1. Introduction
Online Transaction Processing (OLTP) refers to managing transactional data in relational databases. It is optimized for real-time transactional operations and supports high-volume, short, and atomic transactions.
Transactional data characteristics:
- Atomic: smallest possible unit of data.
- Consistent: follows data integrity rules.
- Timestamped: usually includes time/date of the transaction.
- Referential: often links to other data (e.g., products, customers, inventory).
Typical scenarios:
- Business transactions
- Payments and invoices
- Orders and services
- Inventory tracking
2. Core Features of OLTP
- Efficient transaction processing: Insert, update, delete, or query small units of data quickly.
- Data normalization: Minimizes redundancy by splitting data into multiple tables.
- High throughput: Supports many concurrent transactions.
- Transactional integrity: Ensures atomicity, consistency, isolation, and durability (the ACID properties). See the sketch below.
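To ground the transactional-integrity point above, here is a minimal T-SQL sketch (table and column names are illustrative, not from the source): an order insert and its matching inventory update either both commit or both roll back.
```sql
SET XACT_ABORT ON;  -- any runtime error aborts the batch and rolls back the open transaction

BEGIN TRANSACTION;

-- Record the order (one atomic unit of work starts here)
INSERT INTO dbo.Orders (order_id, product_id, quantity, order_date)
VALUES (1001, 42, 2, SYSUTCDATETIME());

-- Decrement inventory for the same product
UPDATE dbo.Inventory
SET units_in_stock = units_in_stock - 2
WHERE product_id = 42;

COMMIT TRANSACTION;  -- both changes become durable together, or neither does
```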
3. Common Challenges
- Aggregate calculations: OLTP is not optimized for heavy analytics or summarization.
- Complex queries: Denormalizing normalized data for analytics can require multiple joins.
- Succinct naming conventions: Designed for speed and efficiency, not necessarily readability.
- Indefinite transaction history: Large tables or long-term storage may impact performance.
4. OLTP in Azure
Azure provides fully-managed, scalable OLTP solutions suitable for modern applications:
| Service | Description |
|---|---|
| Azure SQL Database | Managed relational database service for transactional workloads. |
| SQL Server on Virtual Machines | Full SQL Server control in a cloud VM; good for lift-and-shift. |
| MySQL / PostgreSQL | Fully-managed open-source database services with OLTP support. |
Applications that interact with OLTP systems:
- Websites (via App Service)
- REST APIs
- Mobile or desktop applications
5. Key Considerations When Choosing an OLTP System
- Managed vs. unmanaged: Do you want the provider to handle infrastructure, scaling, and high availability?
- Database engine compatibility: SQL Server, MySQL, PostgreSQL.
- Throughput: High-volume write operations may require in-memory tables.
- Low latency: Consider elastic pools for multi-tenant environments.
- High availability: Multi-region support and uptime guarantees.
- Security: Protect sensitive transactional data (e.g., payments, customer info).
6. Advantages of OLTP
- Real-time transaction tracking
- Scalable infrastructure for peak loads
- Secure and reliable storage of transactions
- Supports hybrid transactional and analytical processing (HTAP) for reporting
- Integration with Azure services for analytics and monitoring
7. Use Cases
- Financial transactions (credit card processing, payment gateways)
- E-commerce orders and inventory tracking
- Customer interactions and CRM updates
- Web and mobile applications requiring fast, reliable data updates
Online Analytical Processing (OLAP) in Microsoft Azure
1. Introduction
Online Analytical Processing (OLAP) is a technology for organizing large business databases and performing complex analyses. It is the core of business intelligence (BI) in modern organizations.
Key characteristics:
- Optimized for heavy reads and low writes
- Integrates data from multiple sources
- Supports complex calculations, aggregations, and analytics
- Enables high-performance data extraction for real-time insights
2. OLAP vs. OLTP
| Feature | OLTP | OLAP |
|---|---|---|
| Purpose | Transaction processing | Analytical processing |
| Data volume | Many small transactions | Large datasets from multiple sources |
| Operations | Insert, update, delete | Read-heavy queries, aggregations |
| Data model | Normalized tables | Multidimensional, star/snowflake schema |
| Usage | Operational systems (e.g., e-commerce orders) | Business intelligence, reporting, decision-making |
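As a concrete contrast to OLTP's many small writes, the sketch below shows the read-heavy, aggregating query style OLAP favors, assuming a hypothetical star schema with a FactSales fact table and DimDate/DimProduct dimension tables (none of these names come from the source):
```sql
-- Aggregate revenue by year and product category across the star schema
SELECT d.calendar_year,
       p.category,
       SUM(f.sales_amount) AS total_revenue,
       COUNT_BIG(*)        AS sale_count
FROM dbo.FactSales AS f
JOIN dbo.DimDate AS d
    ON f.date_key = d.date_key
JOIN dbo.DimProduct AS p
    ON f.product_key = p.product_key
GROUP BY d.calendar_year, p.category
ORDER BY d.calendar_year, total_revenue DESC;
```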
3. OLAP Semantic Models
OLAP provides abstraction over underlying database structures to simplify analytics:
- Tabular Models
- Uses relational concepts (tables, columns)
- Combines benefits of relational and multidimensional modeling
- Easier integration with relational data sources
- Multidimensional Models
- Traditional OLAP structure using cubes, dimensions, and measures
- Optimized for pre-aggregated, multidimensional analysis
- Ideal for legacy or complex BI scenarios
Semantic modeling features:
- Aggregation behaviors for reporting
- Business logic and calculations
- Time-oriented calculations
- Integrated data from multiple sources
4. When to Use OLAP
- Complex analytics across large datasets
- Report generation for executives and decision-makers
- Consistent results across multiple users
- Scenarios where read-heavy access is required and write frequency is low
5. Challenges of OLAP
- Data store refreshes are slower than OLTP due to low-write optimization
- Mapping relational/ER models to multidimensional models can be complex
- Requires star or snowflake schema instead of fully normalized tables
- Not ideal for real-time transactional operations
6. OLAP in Microsoft Azure
Azure provides managed services and tools for OLAP workloads:
| Service | Purpose |
|---|---|
| Azure Analysis Services | Fully-managed OLAP engine for tabular and multidimensional models |
| SQL Server Analysis Services (SSAS) | Supports cubes, dimensions, measures; can be hosted on VMs |
| SQL Server Integration Services (SSIS) | Data integration, ETL from multiple sources |
| Azure SQL Database / SQL Server | Source OLTP systems feeding OLAP workloads |
Key capabilities:
- Managed service, reducing infrastructure overhead
- Supports multidimensional cubes and tabular semantic models
- Easy integration with multiple data sources
- Scalable for high availability and query performance
- Dynamic query scale-out and real-time analytics (when supported)
7. Considerations for Choosing OLAP
- Managed service vs. self-hosted: Do you want to manage infrastructure?
- Authentication & security: Integration with Azure Active Directory
- Pre-aggregated vs. real-time data: Match OLAP refresh cycles to business needs
- Integration: Connect multiple Azure services and external data sources
- Scalability: Support for high availability and elastic scaling
8. Advantages of OLAP
- Efficient, high-performance analysis of large, complex datasets
- Supports strategic decision-making using BI tools
- Handles aggregated and multidimensional data
- Reduces complexity for end users via semantic models
- Seamless integration with Azure analytics and visualization tools
Deploying Relational Data
Deploying Relational Data in Azure
This session demonstrates three distinct methods for deploying a SQL Server database in Azure: using the Azure Portal, Azure CLI, and PowerShell. All steps are performed via Microsoft Cloud Shell when applicable.
1. Deploying via Azure Portal
- Create a Resource Group
- Navigate to Resource Groups in Azure Portal.
- Click Add → fill in the subscription and a unique resource group name (e.g., Azure26) → Review + Create → Create.
- Create SQL Database and Server
- Within the resource group, click Add → search for SQL Database → Create.
- Fill in:
  - Database name (e.g., mydb)
  - Server details via Create new:
    - Server name (e.g., azureserver26)
    - Admin login (e.g., bbrooks26)
    - Password
    - Location
  - SQL elastic pool: No
- Click Review + Create → Create.
- Configure Firewall
- Attempting to access Query Editor may prompt an IP restriction error.
- Click Set server firewall → add a rule with:
  - Rule name (e.g., ClientIP)
  - Start IP / End IP (can be the same for a single client)
- Click Save.
- Querying the Database
- Log in via Query Editor using admin credentials.
- Database is ready for queries.
2. Deploying via Azure CLI
- Open Azure Cloud Shell
  - Can be accessed via the Try It link in the Azure Docs Quickstart.
- Create Resource Group
```bash
az group create --name AzureCLI --location eastus
```
- Create SQL Server
```bash
az sql server create --name cliserver2626 --resource-group AzureCLI --location eastus --admin-user bbrooks26 --admin-password <password>
```
- Create Firewall Rule (note: the 0.0.0.0–255.255.255.255 range below allows connections from any IP and is for demonstration only; restrict it to known client IPs in practice)
```bash
az sql server firewall-rule create --resource-group AzureCLI --server cliserver2626 --name IpRange --start-ip-address 0.0.0.0 --end-ip-address 255.255.255.255
```
- Create SQL Database
```bash
az sql db create --resource-group AzureCLI --server cliserver2626 --name mydb --edition GeneralPurpose --compute-model Serverless --family Gen5 --capacity 2
```
- Verify in Azure Portal
  - Resource group AzureCLI now contains the SQL server and database.
  - Access Query Editor to run queries.
3. Deploying via PowerShell
- Open Azure Cloud Shell with PowerShell
- Create Resource Group
New-AzResourceGroup -Name PowerShell -Location eastus
- Create SQL Server
New-AzSqlServer -ResourceGroupName PowerShell -ServerName powershellserver26 -Location eastus -SqlAdministratorCredentials (New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList "bbrooks26",(ConvertTo-SecureString "<password>" -AsPlainText -Force))
- Create Firewall Rule
New-AzSqlServerFirewallRule -ResourceGroupName PowerShell -ServerName powershellserver26 -FirewallRuleName IpRange -StartIpAddress 0.0.0.0 -EndIpAddress 255.255.255.255
- Create SQL Database
New-AzSqlDatabase -ResourceGroupName PowerShell -ServerName powershellserver26 -DatabaseName mydb -Edition GeneralPurpose -VCore 2 -ComputeGeneration Gen5 -MinimumCapacity 2
- Verify in Azure Portal
  - Resource group PowerShell contains the SQL server and database.
  - Access Query Editor to run queries.
Key Takeaways
- Azure Portal: Provides a GUI wizard for creating SQL servers and databases in one flow.
- Azure CLI & PowerShell: Scriptable, repeatable, and automatable methods for provisioning resources.
- Firewall rules must be set to allow access from client IPs.
- Using scripts is advantageous for automation, reproducibility, and integration in CI/CD pipelines.
Azure SQL Database – Uses, Features, and Deployment Models
Azure SQL Database is Microsoft’s fully managed, cloud-based relational database service built on SQL Server technology. It is delivered as Platform as a Service (PaaS), meaning Microsoft handles all infrastructure, maintenance, and updates, freeing organizations from managing servers, backups, and patches.
Key Features and Advantages
- Fully Managed – Microsoft automates backups, patching, monitoring, and upgrades.
- Always Updated – Always runs the latest SQL Server engine version; cloud gets patches before on-prem SQL Server.
- High Availability – 99.99% uptime SLA.
- Performance Enhancements –
- In-memory technologies for faster queries.
- Intelligent Query Processing for optimized workloads.
- Built-in Monitoring – Performance insights and tuning tools available out-of-the-box.
Deployment Models
- Single Database
- Dedicated resources (best for predictable, low-latency needs).
- Easier performance guarantees.
- Elastic Pool
- Multiple databases share a pool of compute/storage.
- Ideal for workloads with varying usage patterns.
- Can set caps, minimums, and maximums per database to control costs and performance.
- Databases can be moved between single and pool configurations.
Purchasing Models
- DTU-based (Database Transaction Units)
- Bundled measure of CPU, memory, and I/O.
- Charged per service tier (e.g., Basic, Standard, Premium).
- vCore-based (Virtual Cores)
- More granular control over compute, memory, and storage.
- Supports Azure Hybrid Benefit to reuse existing on-prem SQL Server licenses.
- Serverless
- Auto-scales resources up and down.
- Pay per second of actual usage.
- Great for unpredictable workloads.
Service Tiers
- General Purpose – For most standard workloads, budget-friendly.
- Business Critical – For OLTP (online transaction processing) systems needing low latency and failover replicas.
- Hyperscale – For large, dynamic workloads requiring rapid scaling of storage and compute.
High Availability & Disaster Recovery
- Automatic Backups – Disaster recovery ready.
- Geo-Replication – Copies databases across regions.
- Auto-Failover Groups – Seamless failover between replicas.
- Point-in-Time Restore – Roll back to a previous state.
- Zone Redundancy – Enhanced availability in business-critical and elastic pool setups.
Monitoring & Tuning
- Performance Insights – Detects performance issues and suggests improvements.
- Intelligent Insights – Learns workload patterns, identifies root causes.
- Automatic Tuning –
- Automatic plan correction.
- Index optimization.
- Adaptive query processing.
- Rolls back changes if performance declines.
Key Takeaways
- Best for organizations wanting SQL Server functionality without the overhead of infrastructure management.
- Scales up or down depending on performance needs and budget.
- Highly integrated with Azure ecosystem (Event Hubs, Monitor Logs, Storage).
- Mix & match single databases and elastic pools to optimize cost and performance.
Azure SQL Database Security Features
Overview
Azure SQL Database includes multiple layers of security beyond the built-in protections of the Azure platform. These features address both external threats (e.g., cyberattacks, unauthorized access) and internal risks (e.g., misconfigurations, improper change management).
Network & Protocol Security
- Protocol Restriction: Only supports the Tabular Data Stream (TDS) protocol, restricting communication to TCP port 1433.
- Firewall Protection: By default, blocks all IP addresses; access must be explicitly granted to trusted IPs (see the T-SQL sketch after this list).
- SQL Database Gateway: Enforces encrypted connections and uses Denial of Service (DoS) Guard to block malicious IPs after detecting attack patterns.
- Isolation Layers:
  - Hypervisor isolates the root OS from guest VMs and separates customer environments from each other.
  - Virtual machines perform packet filtering to prevent unauthorized traffic.
  - VLAN isolation restricts lateral movement within the network.
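Beyond the server-level firewall noted above, Azure SQL Database also supports database-level firewall rules managed directly in T-SQL. A minimal sketch (the rule name and IP address are placeholders):
```sql
-- Run inside the user database: allow a single client IP at the database level
EXECUTE sp_set_database_firewall_rule
    @name = N'ClientIP',
    @start_ip_address = '203.0.113.42',
    @end_ip_address   = '203.0.113.42';

-- Inspect and remove database-level rules
SELECT * FROM sys.database_firewall_rules;
EXECUTE sp_delete_database_firewall_rule @name = N'ClientIP';
```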
Configuration Management & Internal Security
- Strict change control processes: all production changes are:
  - Tracked in a central system.
  - Tested in test and staging environments before deployment.
  - Reviewed for security and privacy compliance.
  - Approved by authorized personnel before going live.
- Changes are monitored and rolled back quickly if failures occur.
- Tools Used:
  - Azure Security Monitoring (checks for vulnerabilities).
  - Git (version control).
  - Master Data Services (MDS) for data management.
  - Automated test runners for QA.
  - Fabric Controller (FC) for secure Azure service management.
  - Source Depot (versioning).
  - WinFabric (service management).
  - Team Foundation Server (TFS) for project management.
Security Compliance & Advanced Protection
- Azure Defender for SQL:
  - Vulnerability Assessment: Identifies weaknesses and recommends actions.
  - Threat Detection: Monitors for suspicious activity (e.g., SQL injection, unauthorized access) and provides mitigation steps.
- Auditing Features:
  - Integration with Azure Active Directory for authentication and authorization.
  - Data Encryption: At rest, in motion, and in use.
  - Data Discovery & Classification: Labels sensitive data and audits access, even outside the database.
  - Multi-Factor Authentication (MFA): Adds extra identity verification steps beyond username/password (e.g., passcodes via email/SMS).
Key Takeaways
- Azure SQL Database combines network security, strict operational controls, and continuous monitoring.
- Strong protections against both external attacks and internal mismanagement.
- Compliance-ready through auditing, encryption, and identity management.
Azure SQL Database – Using Azure AD Authentication (with full code + context)
Goal: Set up and use three authentication/admin paths for Azure SQL Database, with working code and practical context so you can reproduce and adapt safely.
Overview: Auth vs. AuthZ (what & why)
- Authentication (AuthN) = who you are. In Azure SQL Database, this can be:
- SQL authentication (username/password in SQL)
- Azure AD authentication (identity from Azure Active Directory)
- Authorization (AuthZ) = what you can do. Granted via database roles (e.g., db_owner, db_datareader, db_datawriter) and granular GRANT permissions.
Best practice: Prefer Azure AD for centralized identity, MFA, conditional access; then apply least privilege roles/permissions.
Method 1 — Built-in SQL admin created at server provisioning
When you create an Azure SQL server, you are prompted for a Server admin login and password. You can use that admin to sign in (e.g., Portal Query editor).
Verification (Portal ⇒ Query editor)
-- Demonstration of admin rights
CREATE TABLE dbo.admintest (id INT);
-- Result in the video: "Query succeeded", dbo.admintest appears under Tables
Notes
- This admin credential is a SQL auth principal (not Azure AD).
- Keep and rotate it securely; prefer day-to-day work via Azure AD identities.
Method 2 — Use Azure AD as the server Active Directory admin
Steps (Portal)
- Create an Azure AD user (e.g., aaduser) in Azure Active Directory → Users → New user.
- Assign as AD admin at the SQL server: SQL server → Active Directory admin → Set admin → select aaduser → Save.
- First login password change: Sign in to the Azure portal as aaduser to change the initial password (required).
- Connect in SSMS (SQL Server Management Studio):
  - Server name: your_server.database.windows.net
  - Authentication: Azure Active Directory – Password
  - User name: aaduser@yourtenant.onmicrosoft.com (or your domain UPN)
  - Password: (the updated one)
Result (from the video demo)
- Able to connect as the Azure AD admin and create a table (via designer).
- T-SQL equivalent of that simple table could be:
CREATE TABLE dbo.Table_1
(
id NCHAR(10) NULL
);
Why this is good:
- Centralized identity & lifecycle (disable/lock users in one place).
- Stronger security with MFA and Conditional Access.
- Easier compliance & auditing.
Pro tip: Instead of making a user server AD admin, create database users from Azure AD and grant roles minimally (see “Least privilege” below).
Method 3 — Create an internal SQL login + user + role assignment
The video shows creating a SQL login and mapping it to a user, then assigning a role so it can create tables.
T-SQL shown in the video
-- Create a SQL login (server-level) with a password
CREATE LOGIN [sqladmin] WITH PASSWORD = '!qazXsw2#edc';
GO
-- In the target database (e.g., mydb), create a user for that login
CREATE USER [sqladmin] FOR LOGIN [sqladmin];
-- Attempt to create a table as [sqladmin] (will initially fail: insufficient permission)
CREATE TABLE [sqltable] (id INT);
-- Error observed in video: "permission denied in database 'mydb'"
-- Grant broad admin rights (demonstration only)
ALTER ROLE [db_owner] ADD MEMBER [sqladmin];
-- Retry creating the table (succeeds now)
CREATE TABLE [sqltable] (id INT);
Important: db_owner is very powerful. Prefer least privilege:
-- Safer alternatives (grant only what’s needed)
ALTER ROLE [db_datareader] ADD MEMBER [sqladmin];
ALTER ROLE [db_datawriter] ADD MEMBER [sqladmin];
GRANT CREATE TABLE TO [sqladmin]; -- if they truly need to create tables
GRANT ALTER, REFERENCES, SELECT ON SCHEMA::dbo TO [sqladmin]; -- fine-grained options
Azure SQL nuances you should know (added context)
1) Contained database users (Single Database)
For Azure SQL Database (single DB), the recommended model is contained users (no server-level login needed):
-- Create a contained SQL user with password (in the user database)
CREATE USER [sqluser] WITH PASSWORD = 'A-Strong_Pwd_That_Meets_Policy!';
-- Grant minimal rights
ALTER ROLE [db_datareader] ADD MEMBER [sqluser];
ALTER ROLE [db_datawriter] ADD MEMBER [sqluser];
2) Azure AD users in the database (least privilege, recommended)
-- Create a database user from Azure AD (in the user database)
CREATE USER [alice@contoso.com] FROM EXTERNAL PROVIDER;
-- Grant minimal rights
ALTER ROLE [db_datareader] ADD MEMBER [alice@contoso.com];
-- Optional, if needed
ALTER ROLE [db_datawriter] ADD MEMBER [alice@contoso.com];
3) Setting Azure AD admin via CLI (optional but handy)
# Get Azure AD user's objectId (replace UPN)
az ad user show --id alice@contoso.com --query id --output tsv
# Set as AD admin on the SQL server
az sql server ad-admin create \
--resource-group <rg-name> \
--server <server-name> \
--display-name alice@contoso.com \
--object-id <objectId>
Troubleshooting & gotchas
- First login password change (Azure AD): New cloud users must update their password on first sign-in (Portal/Microsoft 365).
- Firewall / IP rules: Make sure client IP is allowed (server → Set server firewall). Azure SQL listens on TCP 1433.
- Auth mismatch in SSMS: Choose Azure Active Directory – Password when using Azure AD credentials; SQL Server Authentication for SQL logins.
- Permission denied: If CREATE TABLE fails, verify role membership and specific GRANTs; avoid knee-jerk db_owner.
- MFA with tools: If you require MFA, consider Azure AD Integrated or token-based flows where supported.
Security best practices (quick checklist)
- Prefer Azure AD identities for humans; use managed identities for apps where possible.
- Enforce MFA and Conditional Access for all admins.
- Use least privilege: roles + granular GRANT.
- Rotate and vault any SQL passwords; monitor sign-ins and anomalies.
- Audit activity (Azure SQL Auditing, Defender for SQL) and alert on threats.
Using Azure Data Studio as a Query Tool
Azure Data Studio (ADS) is a cross-platform database management and development tool from Microsoft, capable of connecting to both local and cloud-based databases (including Azure SQL Database). It provides features such as IntelliSense, smart code snippets, connection management, Jupyter notebooks, and extension support.
Installation
- Navigate to the Microsoft Docs page "What is Azure Data Studio".
- Use the Download and Install link to get the installer.
- For Windows, download the MSI package and proceed with the default installation settings.
Connecting to an Azure SQL Database
Method 1: SQL Login Connection
- In ADS, under the Connections pane, click New Connection.
- Set:
  - Connection type: Microsoft SQL Server
  - Authentication type: SQL Login
  - Server: (copied from Azure Portal → SQL Database → Overview → Server name)
  - Username: (admin username set during database creation)
  - Password: (admin password)
- (Optional) Check Remember password.
- Click Connect.
Once connected, expand the server → Databases folder to view available databases (e.g., master, mydb).
Running SQL Queries
Example Workflow:
-- 1. Create a new table
CREATE TABLE DemoUserTable(
first_name NVARCHAR(255),
last_name NVARCHAR(255),
age INT
);
-- 2. Insert sample records
INSERT INTO DemoUserTable(first_name, last_name, age) VALUES
('Joe', 'Smith', 56),
('Barbara', 'Young', 12),
('Edna', 'White', 88),
('Larry', 'Bligh', 38);
-- 3. Select all rows from the table
SELECT * FROM DemoUserTable;
-- 4. Drop the table
DROP TABLE DemoUserTable;
Notes:
- You can highlight specific lines to run only part of a query.
- The Messages pane confirms execution success and row counts.
- Refreshing the Tables folder lets you verify creation or deletion.
Method 2: Azure Subscription Connection
- Under Azure in ADS, click + to add an account.
- Sign in using the Azure account associated with your databases.
- Once added, expand your subscription to view all associated databases.
- Connect to a database (e.g., mydb) and run the same workflow as above. This method is especially useful for managing multiple Azure SQL databases within the same subscription.
Additional Azure Data Studio Features
1. Jupyter Notebooks
- Accessible via the Notebooks tab in the left navigation.
- Supports interactive data processing workflows, often used by data scientists.
2. Source Control
- Accessible via the Source Control tab.
- Can connect to Git repositories for version control of SQL scripts and notebooks.
3. Extensions
- Extend ADS functionality by installing from the Extensions marketplace.
- Examples:
- PostgreSQL support
- Azure CLI integration
- Machine Learning extensions
- PowerShell scripting
Summary
Azure Data Studio provides:
- Multiple connection methods to Azure SQL Databases (SQL Login or Azure Subscription).
- A powerful SQL editor with partial query execution.
- Support for Jupyter notebooks, Git integration, and third-party extensions.
- Flexibility to manage both local and cloud databases in a single tool.
Using SQL Server Management Studio (SSMS) to Query Azure SQL Database
Overview
SQL Server Management Studio (SSMS) is a long-standing database management tool originally built for on-premises SQL Server. Because Azure SQL Database is built on the same underlying SQL Server engine, SSMS can also connect to cloud-based Azure instances. This makes it useful for administrators and developers already familiar with the tool.
Installing SSMS
- Visit the official Microsoft Docs page: Download SQL Server Management Studio (SSMS).
- Download the Windows installer (SSMS-Setup-ENU.exe).
- Run the installer and accept default settings.
- Once installed, launch SSMS.
Connecting to an Azure SQL Database
Steps:
- In SSMS, go to File → Connect Object Explorer.
- In the Connect to Server dialog:
  - Server type: Database Engine
  - Server name: Copy from Azure Portal (Azure Portal → SQL Database → Overview → "Server name")
  - Authentication: SQL Server Authentication
  - Login: Username from the Azure SQL creation process
  - Password: Corresponding password
- Click Connect.
After connection, the server appears in Object Explorer, showing folders for Databases, Security, and more.
Running Queries in SSMS
Right-click the target database (e.g., mydb) → New Query. This opens a SQL editor.
Example Workflow:
-- 1. Create a new table
CREATE TABLE DemoUserTable (
first_name NVARCHAR(255),
last_name NVARCHAR(255),
age INT
);
-- 2. Insert sample rows
INSERT INTO DemoUserTable (first_name, last_name, age) VALUES
('Joe', 'Smith', 56),
('Barbara', 'Young', 12),
('Edna', 'White', 88),
('Larry', 'Bligh', 38);
-- 3. Query all rows
SELECT * FROM DemoUserTable;
-- 4. Drop the table
DROP TABLE DemoUserTable;
How SSMS Handles Execution:
- You can highlight only the SQL statement you want to execute and run it with Execute.
- Results appear in the Results pane, while execution messages (e.g., "4 rows affected") appear in the Messages pane.
- Refreshing the Tables folder in Object Explorer updates the view to reflect created or dropped tables.
Verifying in Azure Portal
After creating the table in SSMS, you can check it directly in Azure:
- In Azure Portal, go to your SQL Database.
- Click Query editor (preview).
- Log in with SQL Server authentication credentials.
- Expand the Tables folder to confirm your new table exists and view its columns (first_name, last_name, age).
Why Use SSMS for Azure SQL?
- Mature and Stable: SSMS has been refined over many years, making it reliable and predictable.
- Feature-Rich: Full control over security, queries, and database objects.
- Familiar UI: Useful for those transitioning from on-premises SQL Server to Azure.
- Direct Integration: Changes made in SSMS are immediately reflected in Azure.
Summary
SSMS allows seamless management of Azure SQL Database:
- Connect using SQL Server Authentication.
- Create, insert, query, and drop tables from a familiar interface.
- Verify results in Azure’s Query Editor for consistency.
This workflow is ideal for administrators who need a robust, full-featured SQL client while working with Azure-hosted databases.
Using the sqlcmd Utility to Query Azure SQL Database
Overview
sqlcmd is a command-line utility for running SQL queries against SQL Server, including Azure SQL Database. It is useful for automating tasks, running queries from files, using variables, and outputting results to files.
Installing sqlcmd
- Visit the Microsoft Docs page: sqlcmd Utility.
- Download the installer for your OS (64-bit or 32-bit Windows).
- Run the installer and follow the defaults.
- Once installed, open a Command Prompt to start using sqlcmd.
Connecting to Azure SQL Database
Command Syntax:
sqlcmd -S <server_name> -d <database_name> -U <username> -P <password>
Example
sqlcmd -S azureserver26.database.windows.net -d mydb -U bbrooks26 -P !qazXsw2#edc
- -S: Server name
- -d: Database name
- -U: Username
- -P: Password
Once connected, the prompt changes to 1> indicating it is ready for input. Lines are counted sequentially, and commands execute only when the keyword GO is entered.
Running Queries
Example Workflow:
-- 1. Create a new table
CREATE TABLE DemoUserTable (
first_name CHAR(20),
last_name CHAR(20),
age INT
);
GO
-- 2. Insert multiple rows
INSERT INTO DemoUserTable (first_name, last_name, age) VALUES
('Joe', 'Smith', 56),
('Barbara', 'Young', 12),
('Edna', 'White', 88),
('Larry', 'Bligh', 38);
GO
-- 3. Query all rows
SELECT * FROM DemoUserTable;
GO
-- 4. Drop the table
DROP TABLE DemoUserTable;
GO
Notes:
- Multi-line queries are supported; execution occurs only after typing GO.
- Successful statements do not always return a message; errors are displayed immediately.
Running a Query from a File
- Create a file, e.g., C:\sqlcmd\select.sql, with the following content:
SELECT * FROM $(table);
GO
- Run the file with a variable substitution:
sqlcmd -S azureserver26.database.windows.net -d mydb -U bbrooks26 -P !qazXsw2#edc -i "C:\sqlcmd\select.sql" -v table=DemoUserTable
- -i: Specifies the input file containing SQL commands.
- -v: Passes variables to replace placeholders in the file.
Output displays results directly in the command prompt.
Outputting Query Results to a File
To save the output to a file, use the -o option:
sqlcmd -S azureserver26.database.windows.net -d mydb -U bbrooks26 -P !qazXsw2#edc -i "C:\sqlcmd\select.sql" -v table=DemoUserTable -o "C:\sqlcmd\output.txt"
- -o: Path to the output file.
- The file now contains a formatted table with columns and data.
Verifying in Azure Portal
After creating and populating a table with sqlcmd:
- Open Azure SQL Database → Query editor (preview).
- Log in using SQL Server authentication.
- Check Tables folder to verify table creation and data.
Summary
With sqlcmd, you can:
- Connect to Azure SQL Database from the command line.
- Create, populate, query, and drop tables.
- Run queries from files with variable substitution.
- Output results to files for automation purposes.
This makes sqlcmd ideal for scripting, automated database tasks, and quick access without a GUI.
SQL Server on Azure Virtual Machines
Overview
Azure provides three main options for hosting SQL Server in the cloud:
- Azure SQL Database – A fully managed PaaS database service.
- Azure SQL Managed Instance – A managed instance of SQL Server with more compatibility with on-premises SQL Server.
- SQL Server on Azure Virtual Machines (SQL VMs) – A full SQL Server installation running on an Azure-hosted VM.
This section focuses on SQL Server on Azure Virtual Machines — its setup, benefits, configurations, and migration methods.
Benefits of SQL Server on Azure Virtual Machines
- Simplified Licensing Costs:
- Azure uses a pay-as-you-go licensing model (no large upfront payments like on-prem).
- You avoid hardware costs and Microsoft maintains the underlying infrastructure.
- Global Deployment:
- Deploy VMs in multiple geographical regions to reduce latency for customers worldwide.
- Flexible Machine Sizes:
- Select a VM size based on workload needs.
- Choose an image with the desired SQL Server version, edition, and operating system.
Key Configurations for SQL VMs
- Backups:
- Use Azure Backup for SQL VMs — supports automated/manual backups.
- Enterprise-class backup with point-in-time restore.
- Define retention policies (potentially years of data storage).
- High Availability:
- Configure SQL Server Availability Groups.
- Azure treats multiple SQL VMs as a virtual network with automatic traffic management.
- Automated Patching:
- Schedule regular SQL Server updates.
- Performance Tuning:
- Choose the right VM tier for your workload.
- Connectivity Options:
- Public: SQL Server is accessible over the internet.
- Private: SQL Server is only accessible within the virtual network.
SQL Connectivity Setup
When provisioning, Azure automatically:
- Configures firewall rules to allow SQL traffic on TCP port 1433.
- Enables SQL Server authentication for logins.
- Enables TCP/IP protocol in SQL Server.
- If public connectivity is chosen:
- Sets up a Network Security Group rule to allow SQL port traffic.
- If private connectivity is chosen:
- Network Security Group step is skipped.
Migration Methods to SQL Server on Azure VMs
Several ways to migrate from on-premises SQL Server to an Azure SQL VM:
- Detach/Attach via Blob Storage (see the T-SQL sketch after this list):
  - Detach the .mdf (data) and .ldf (log) files.
  - Upload them to Azure Blob Storage.
  - Attach them to the Azure SQL VM using blob URLs.
- Always On Availability Groups:
- Use Add Azure Replica Wizard to replicate to Azure.
- Perform a failover to the Azure instance.
- Backup and Restore:
- Create a compressed backup on-prem.
- Upload to Azure VM, decompress, and restore.
- Hyper-V VHD Upload:
- Convert on-prem VM to a Hyper-V VHD.
- Upload to Azure Blob Storage.
- Create a VM from this VHD in Azure.
- Transaction Replication:
- Set the Azure SQL VM as the subscriber.
- Replicate from on-prem.
- Disable replication when ready for cutover.
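For the detach/attach method flagged above, the attach step can reference the uploaded files directly by their blob URLs. A hedged sketch (storage account, container, file names, and the SAS token are placeholders, not values from the source):
```sql
-- Credential named after the container URL, authorized with a Shared Access Signature
CREATE CREDENTIAL [https://mystorageacct.blob.core.windows.net/datafiles]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<sas-token>';  -- SAS generated for the container (placeholder)

-- Attach the uploaded .mdf and .ldf files from Blob Storage
CREATE DATABASE MigratedDb
ON (NAME = MigratedDb_data,
    FILENAME = 'https://mystorageacct.blob.core.windows.net/datafiles/MigratedDb.mdf')
LOG ON (NAME = MigratedDb_log,
        FILENAME = 'https://mystorageacct.blob.core.windows.net/datafiles/MigratedDb.ldf')
FOR ATTACH;
```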
Centralized Management
- Azure offers a unified "Azure SQL" view in the portal.
- Shows all SQL resources in your subscription:
- Elastic pools
- Managed instances
- Azure SQL databases
- SQL Virtual Machines
- Logical SQL servers
- From here, you can add, delete, or configure resources.
Deployment Considerations
- VM Image Selection:
- Must match your required OS, SQL version, and edition.
- Deprecated images are removed from the portal but remain deployable via PowerShell for 3 months after removal.
- Customer Experience Improvement Program:
- Enabled by default — sends periodic usage reports to Microsoft.
- To disable:
- Connect to the SQL VM via Remote Desktop.
- Run the SQL Server Error and Usage Reporting Utility.
- Opt out of report sending.
Default SQL Server Port
- TCP 1433 — The standard port for SQL Server communication.
Summary
SQL Server on Azure Virtual Machines offers:
- Quick deployment with no upfront infrastructure costs.
- Multiple configuration options for performance, availability, and backups.
- Flexible migration paths from on-prem.
- Centralized management through the Azure portal.
- The ability to choose between public or private connectivity depending on security needs.
Azure SQL Managed Instance — Comprehensive Overview
Introduction
Azure SQL Managed Instance (MI) is a Platform as a Service (PaaS) offering that delivers almost full compatibility with the latest Enterprise Edition SQL Server Database Engine. It is optimized for lift-and-shift scenarios, allowing organizations to migrate existing on-premises SQL Server instances to Azure with minimal changes.
Key Characteristics
- Managed by Microsoft: No hardware purchase, patching, or infrastructure maintenance required.
- PaaS Benefits: Automatic backups, monitoring, auditing, scalability, and integration with other Azure services.
- VNet Integration: Can be deployed into an Azure Virtual Network (VNet), enabling isolation and hybrid cloud setups that extend existing on-prem networks.
- High Availability: 99.99% SLA with both automated and configurable backups and point-in-time restore.
Feature Comparisons
- Versus Azure SQL Database:
- Azure SQL Database = General-purpose, fewer features (e.g., cannot perform cross-database queries).
- Managed Instance = Supports cross-database queries and advanced SQL Server features.
- Versus Azure SQL Virtual Machine:
- SQL VM (IaaS) = Full infrastructure control, manual patching, and management.
- Managed Instance (PaaS) = Limited infrastructure control, but no management overhead.
Security and Compliance
- Authentication:
- Azure Active Directory (AAD) authentication.
- Supports both traditional SQL logins and AAD logins.
- AAD Server Principal for full usage of Azure AD users and groups in authorization.
- Supports Multi-Factor Authentication (MFA).
- Data Protection:
- Encryption at rest and in transit.
- Comprehensive SQL auditing and threat detection.
- Isolation:
- Runs on single-tenant infrastructure.
- Private IP address by default—accessible only from private or hybrid networks.
- VNet isolation with optional VPN Gateway for secure hybrid connectivity.
Tiers
- General Purpose:
- Suitable for typical workloads.
- Uses Azure Blob Storage with built-in redundancy.
- Ideal for non-business-critical apps.
- Business Critical:
- SSD storage for low latency.
- In-Memory OLTP for faster transaction processing.
- Read-only replicas for better performance and data integrity.
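To illustrate the In-Memory OLTP support mentioned for the Business Critical tier, a memory-optimized table is declared in plain T-SQL. A minimal sketch with illustrative names:
```sql
-- Rows live in memory; changes are logged so data survives restarts
CREATE TABLE dbo.SessionState
(
    session_id INT NOT NULL PRIMARY KEY NONCLUSTERED,  -- memory-optimized tables require a nonclustered key
    payload    NVARCHAR(4000) NULL,
    last_seen  DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```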
Management Operations via API
Azure provides API-based operations for:
- Instance Deployment: Creating a new Managed Instance in a VNet.
- Instance Update: Changing configuration (e.g., storage, CPU).
- Instance Deletion: Removing an instance.
Note: These operations may modify the underlying virtual cluster (isolated group of VMs), which can take time to complete.
Backup and Migration
- Backup:
- Supports backing up to Azure Blob Storage.
- Uses standard database backup format.
- Migration:
- From on-premises SQL Server or Azure SQL Virtual Machine.
- Supports large data migrations.
Summary
Azure SQL Managed Instance is best for enterprises needing:
- High compatibility with existing SQL Server features.
- Minimal administration with strong built-in security.
- Seamless integration into a hybrid cloud environment.
- Business continuity with high availability and disaster recovery options.
Azure Synapse Analytics — Components, Features, and Best Practices
Azure Synapse Analytics is a cloud-based data warehousing and analytics solution designed to integrate and analyze data from multiple sources in a unified environment.
Key Components
- Apache Spark Integration: Fully integrated to allow distributed big data processing, machine learning, and advanced analytics.
- Synapse Studio: A graphical interface to:
- Prepare and manage datasets
- Query data
- Run AI/ML processes
- Build ETL/ELT pipelines
- Synapse SQL: T-SQL-based query language for data processing and analytics.
- Synapse Pipelines: Native data integration capabilities for ingestion from various sources.
- SQL Pool Types
- Dedicated SQL Pool: Pre-allocated compute resources, measured in DWUs (Data Warehouse Units).
- Serverless/On-Demand SQL Pool: Pay-as-you-go query execution.
DWUs (Data Warehouse Units)
- A standardized measure of compute and performance capacity.
- Adjustable based on workload requirements.
- Directly impact query speed and parallelism.
Data Flow in Synapse Analytics
- Data Ingestion: Use PolyBase T-SQL queries (see the sketch after this list) to load large datasets from:
- Relational databases
- NoSQL stores
- File-based sources (e.g., CSV, Parquet)
- Data Storage: Data resides in scalable stores like:
- Azure Data Lake
- Azure Blob Storage
- Data Preparation & Training
- Preprocessing via Hadoop, Spark, or Azure Machine Learning.
- Cleaning, transforming, and enriching data.
- Modeling: PolyBase imports data into internal tables defined by the analyst.
- Analysis & Querying
- Query via Synapse SQL.
- Serve outputs through reporting tools or ad-hoc analysis.
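The PolyBase ingestion step referenced above typically defines an external table over files in the lake and then materializes it into an internal table with CTAS. A hedged sketch for a dedicated SQL pool (the data source, file format, path, and table names are assumptions; secured storage would also need a database-scoped credential):
```sql
-- External definition over CSV files in a data lake (illustrative names)
CREATE EXTERNAL DATA SOURCE LakeSource
WITH (TYPE = HADOOP,
      LOCATION = 'abfss://raw@mydatalake.dfs.core.windows.net');

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2));

CREATE EXTERNAL TABLE dbo.SalesExternal
(
    sale_id   INT,
    sale_date DATE,
    amount    DECIMAL(10, 2)
)
WITH (LOCATION = '/sales/', DATA_SOURCE = LakeSource, FILE_FORMAT = CsvFormat);

-- CTAS materializes the external rows into an internal, distributed table
CREATE TABLE dbo.Sales
WITH (DISTRIBUTION = HASH(sale_id))
AS SELECT * FROM dbo.SalesExternal;
```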
Best Practices for Performance & Cost Optimization
Data Migration
- Load all necessary data early in the process to avoid runtime delays.
- Use distributed tables to parallelize storage and querying.
- Use replicated tables to avoid costly data shuffling during joins/aggregations.
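To make the distributed/replicated choice above concrete, here is a sketch of the two table shapes in a dedicated SQL pool (names are illustrative):
```sql
-- Large fact table: hash-distribute on the join key so rows spread across nodes
CREATE TABLE dbo.FactOrders
(
    order_id   BIGINT NOT NULL,
    product_id INT NOT NULL,
    amount     DECIMAL(10, 2)
)
WITH (DISTRIBUTION = HASH(product_id), CLUSTERED COLUMNSTORE INDEX);

-- Small dimension table: replicate a full copy to every compute node
CREATE TABLE dbo.DimProduct
(
    product_id INT NOT NULL,
    category   NVARCHAR(100)
)
WITH (DISTRIBUTION = REPLICATE);
```
With this layout, joins between FactOrders and DimProduct on product_id can run locally on each node instead of shuffling rows over the network.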
Table Design
- Indexing: Apply meaningful indexes to optimize reads.
- Partitioning: Improve filter performance and manage large datasets.
- Incremental Loads: Only load changes (delta) instead of full datasets.
Query Optimization
- Statistics Maintenance: Keep statistics updated to enable the query optimizer to make better execution plans (see the sketch below).
- Auto-statistics mode for ongoing updates.
- Manual updates after significant data changes.
- Resource Classes: Control memory allocation per query to balance performance.
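A minimal sketch of the statistics maintenance described above (table and column names are illustrative):
```sql
-- Create statistics on a commonly filtered column so the optimizer can estimate row counts
CREATE STATISTICS stat_product_id ON dbo.FactOrders (product_id);

-- Refresh statistics after a significant data change, e.g., a large incremental load
UPDATE STATISTICS dbo.FactOrders;
```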
Resource Management
- Scale DWUs based on workload.
- Pause compute during idle periods to reduce cost.
- Automate scaling using Azure APIs and scripts.
Cost Control Strategies
- Regularly review query patterns and storage usage.
- Match compute allocation to actual usage.
- Minimize unnecessary resource consumption while meeting performance needs.
Azure SQL Connectivity Errors — Detailed Guide
Overview
This guide explains common Azure SQL connectivity errors, their causes, and solutions. It also covers transient fault errors, how to design applications to handle them, and a list of common Azure SQL error codes with resolutions.
1. Common Causes of Azure SQL Connectivity Issues
When connecting to Azure SQL Database or SQL Managed Instance, you may encounter different error types. The main causes include:
1.1 Connection Timeout
- Definition: The client sends a request to SQL Server but doesn't receive a response within the configured timeout.
- Causes:
- Slow network traffic.
- Server performance issues.
- Solution:
- Check network performance.
- Retry the connection.
- Investigate possible server-side slowdowns.
1.2 Firewall Misconfiguration
- Definition: Azure SQL uses firewalls to block unwanted traffic.
- Common Problem: Incorrect firewall rules block valid traffic.
- Solution:
- Ensure client IP address is added to the Azure SQL firewall rules.
- Verify port 1433 is open for SQL Server.
1.3 Incorrect Login
- Cause: Invalid username/password.
- Error Message: Authentication failed.
- Solution: Double-check login credentials and authentication method.
1.4 Bad Design
- Definition: Poor architecture or not following best practices can lead to errors.
- Examples:
- Unoptimized queries causing server overload.
- Applications that can’t handle connection drops.
- Solution: Review database design and optimize application queries.
1.5 Reconfiguration & Transient Fault Errors
- Definition: Short-lived errors due to Azure performing maintenance or configuration changes.
- Causes:
- Planned updates (e.g., index rebuilds).
- Unplanned issues (e.g., network hiccups).
- Solution: Retry after a short wait (e.g., 60 seconds).
2. Handling Transient Fault Errors
Transient errors are unavoidable in cloud systems. To mitigate:
- Implement Retry Logic:
- First retry after ~5 seconds.
- Increase delay between retries (exponential backoff).
- Limit retries to prevent infinite loops.
- Benefit: User may never notice the issue because the retry succeeds.
3. Administrator Actions for Troubleshooting
If a transient connectivity issue is reported:
- Check Azure Service Health Dashboard for known outages.
- Verify if Azure is performing periodic reconfiguration.
- Check resource limits (e.g., storage, DTU).
- If persistent (>60 seconds), contact Azure Support.
4. Common Azure SQL Error Codes
Connection or Server Not Found
- Error 26: Server not found.
- Check server name, connection string, and remote connection settings.
- Error 40: Cannot open a connection.
- Same checks as Error 26.
Network Transport Issues
- Error 10053: Transport-level error during data transfer.
- Indicates possible network interruptions.
Firewall-Related Errors
- Error 40615: Cannot connect — firewall rejected connection.
- Verify client IP is allowed.
- Error 5: Cannot connect — check firewall rules and ensure port 1433 is open.
Timeout & Server Response Issues
- Connection Timeout Expired (during login):
- Pre-login handshake timed out.
- Underlying Provider Failed on Open:
- SQL Server is not responding to open requests.
- Timeout Expired (during request):
- Server latency too high.
Permission Errors
- Cannot open database <name> requested by the login:
  - The user does not have permission.
  - If connecting via SSMS, the default database is master — make sure the user has access or change the target database.
5. Best Practices for Application Design
- Always handle transient errors gracefully.
- Use retry logic with exponential backoff.
- Monitor resource limits to prevent unexpected failures.
- Keep firewall rules updated.
- Design database schema and queries for performance.
Quick Reference Table
| Error Code | Meaning | Common Fix |
|---|---|---|
| 26 | Server not found | Check server name, remote access |
| 40 | Cannot open connection | Same as 26 |
| 10053 | Transport-level network error | Check network stability |
| 40615 | Firewall rejection | Add client IP, open port 1433 |
| 5 | Firewall/Port issue | Same as 40615 |
| Timeout Expired | Server not responding fast enough | Investigate performance |
| Cannot open database | No permissions | Grant access or change DB |
Azure SQL Connectivity Issues — Detailed Guide
This guide explains how to identify and handle Azure SQL connectivity issues, focusing on transient errors and retry logic. It also includes best practices, built-in .NET features, and troubleshooting steps.
1. Understanding Transient Errors
A transient connection error (transient fault) is:
- Temporary and short-lived.
- Often recoverable by retrying the operation.
- Typically caused by brief network interruptions, resource contention, or failovers in the cloud.
Key point: Your application should be able to detect transient errors and retry them automatically—without user intervention.
Why it matters: Retrying non-transient errors is useless and wastes resources. You must distinguish between transient and permanent errors (often via error codes or context).
2. Common Scenarios Where Transient Errors Occur
Transient errors happen:
- During connection to the database.
- During execution of a SQL command/query.
If error occurs during connection:
- Wait a few seconds.
- Retry the connection.
If error occurs during a command:
- Waiting and retrying the same connection often won’t work.
- Instead:
- Wait a few seconds.
- Drop the old connection.
- Open a new connection.
- Retry the command.
3. Example: Transaction Recovery After a Transient Error
Assume a transaction fails midway due to a transient error:
- Open a new connection.
- Rerun the last successful command (e.g., SELECT).
- Continue to the next commands (e.g., UPDATE).
- If any command fails:
  - Roll back the transaction to restore the database to its previous state.
⚠ Caution: In distributed systems, rollbacks may also fail if connectivity is lost for too long. Still, you should always attempt rollback and log high-priority errors if rollback fails.
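On the database side, a TRY/CATCH wrapper keeps the rollback path explicit while the client owns the reconnect-and-retry loop. A minimal sketch with illustrative names:
```sql
BEGIN TRY
    BEGIN TRANSACTION;

    UPDATE dbo.Accounts SET balance = balance - 100 WHERE account_id = 1;
    UPDATE dbo.Accounts SET balance = balance + 100 WHERE account_id = 2;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Roll back if a transaction is still open, then surface the error
    IF XACT_STATE() <> 0
        ROLLBACK TRANSACTION;
    THROW;  -- the client decides whether the error is transient and worth retrying
END CATCH;
```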
4. Retry Strategy Guidelines
For batch processes (long-running, non-interactive jobs):
- Space out retries generously.
For interactive applications (UI-driven):
- Wait 5 seconds before first retry.
- Use exponential backoff until 60 seconds total has passed.
- If still failing after 1 minute → show an error.
Note: Many client libraries (especially .NET for Azure) have built-in retry logic. Check if it’s already implemented before writing your own.
5. Built-in .NET Retry Logic
The .NET System.Data.SqlClient.SqlConnection class provides retry support.
Key properties:
// How many retry attempts before failing
sqlConnection.ConnectRetryCount = 3; // Default: 1 (Range: 0–255)
// Time (seconds) between retries
sqlConnection.ConnectRetryInterval = 5; // Default: 10 (Range: 1–60)
// Timeout for the whole connection attempt
sqlConnection.ConnectionTimeout = 30; // Default: 15 (can be large)
Important:
Set ConnectionTimeout >= ConnectRetryCount * ConnectRetryInterval
to ensure all retries complete before timeout.
6. When Retries Happen in SqlConnection
Retries occur in two cases:
- When opening a connection (SqlConnection.Open).
- Before executing a command:
  - If the pre-execution connection test fails, the connection is retried.
  - If a query itself fails mid-execution, the query is not retried.
7. Diagnosing Connectivity Errors
Steps:
- Check ports — Ensure the correct ports are open (firewall rules, network configs).
- Review server logs — Look for clues in OS, network, and SQL logs.
- Reproduce with a utility — Tools like SQL Server Management Studio (SSMS) or sqlcmd help verify the issue.
- Enable error logging — Essential for diagnosing intermittent issues.
8. Common Fixes for Connectivity Issues
- Increase the connection timeout (the default is 15 seconds; 30 is a common starting point).
- Validate configuration — Ensure correct server name, database name, and ports.
- Test with SSMS — Quick way to verify if the problem is in your code or the network.
- Use alternative connections — Try a different network or region to rule out ISP/routing issues.
9. Quick Checklist
- Differentiate transient vs permanent errors.
- Implement retry logic (or confirm library does it for you).
- Use exponential backoff in retries.
- Always log errors for debugging.
- Ensure rollback logic is robust in transactions.
- Keep ports open and firewalls properly configured.
- Test connections outside your application (SSMS, ping, telnet).
Provisioning Relational Data Services in Azure
What is Provisioning?
Provisioning is the process of creating and configuring resources so they are ready for use. In Azure, much of this work is handled behind the scenes by the platform — you, as the end user, mainly need to specify:
- Disk Space: The amount of storage allocated for your database or service.
- Memory: RAM required to handle workloads.
- CPU (vCPUs): Processing power assigned to the resource.
- Network Bandwidth: Data transfer capacity for applications, scaled according to workload demands.
This means you don’t manually configure the underlying disks, CPUs, or networking — Azure automatically provisions and configures these resources based on the parameters you supply.
Azure Provisioning Methods
Azure offers multiple methods to provision relational data services:
- Azure Portal
- A web-based GUI accessible through any modern browser.
- Allows management of subscriptions, services, and resources.
- Offers customizable dashboards for metrics visualization (e.g., monitoring database performance).
- Good for users who prefer visual controls over command-line tools.
- Azure CLI (Command-Line Interface)
- Cross-platform (Windows, macOS, Linux) and supports Docker & Azure Cloud Shell.
- Provides command-line flexibility:
- Copy/paste commands
- Use history navigation (↑/↓ keys)
- Avoid unnecessary UI clutter
- Useful for automation and faster configuration compared to the portal.
- Example:
az sql db create \
  --name MyDatabase \
  --resource-group MyResourceGroup \
  --server MyServer \
  --service-objective S0 \
  --max-size 5GB
- Azure PowerShell
- A cross-platform task automation and configuration management framework.
- Works on Windows, macOS, and Linux.
- Built on .NET CLR (Common Language Runtime).
- Great for configuration management via scripting.
- Example:
New-AzSqlDatabase `
  -ResourceGroupName "MyResourceGroup" `
  -ServerName "MyServer" `
  -DatabaseName "MyDatabase" `
  -RequestedServiceObjectiveName "S0" `
  -MaxSizeBytes 5GB
- Azure Resource Manager (ARM) Templates
- Infrastructure as Code (IaC) approach.
- Templates are JSON files that declare the infrastructure and configuration to deploy.
- Highly reusable — ideal for agile teams that frequently deploy similar environments.
- Example structure:
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.Sql/servers/databases",
      "apiVersion": "2021-02-01-preview",
      "name": "[concat(parameters('serverName'), '/', parameters('databaseName'))]",
      "properties": {
        "requestedServiceObjectiveName": "S0",
        "maxSizeBytes": "5368709120"
      }
    }
  ]
}
Why Use Templates and Automation?
- Consistency: Deploy identical environments quickly.
- Speed: Faster than manual configuration.
- Scalability: Automate large-scale deployments.
- Version Control: Templates can be stored in Git for history and rollbacks.
Key Takeaways:
- Azure handles the heavy lifting of provisioning; you focus on specifying what you need.
- Use the Azure Portal for visual control, CLI/PowerShell for speed and automation, and ARM templates for repeatable deployments.
- Infrastructure as Code allows faster, more reliable, and scalable provisioning of relational data services.
Provisioning a Microsoft Azure SQL Database
In this session, we demonstrate the complete process of provisioning a Microsoft Azure SQL Database using the Azure Portal, covering each configuration step, available options, and best practices for setup.
Step 1 — Navigating to SQL Databases
- From the Azure Portal Dashboard, locate Azure services.
- Click SQL databases (one of the many managed databases Microsoft Azure offers).
Portal Elements:
- Buttons: Add, Reservations, Edit columns, Refresh
- Dropdowns: Resource group, Location, Tags
- Table columns (if databases exist): Name, Status, Replication role
- Action button: Create SQL database (center of the screen)
If no databases exist, the table is empty — we proceed by clicking Create SQL database.
Step 2 — Basics Tab
Azure uses a five-tab wizard for SQL Database creation. The first tab, Basics, is split into:
Project details
- Subscription: Choose your subscription (e.g., Free trial, corporate).
- Resource group: Logical container for resources with shared permissions and lifecycle policies.
- If none exists, click Create new → name it (e.g., Databases) → click OK.
Database details
- Database name: e.g., demodatabase
- Server: If none exists, click Create new.
- New server panel fields:
  - Server name (must be globally unique, e.g., skillsoftdb)
  - Server admin login (e.g., your name)
  - Password + Confirm password (meeting Azure password requirements)
  - Location (choose region, e.g., (US) East US)
- Click OK after entering details.
SQL Elastic Pool
- Option to pool compute (eDTUs) and storage among multiple databases on the server.
- Leave set to No if you have only one database.
Compute + Storage Defaults
- Standard: 10 DTUs, 250 GB storage.
Step 3 — Networking Tab
Network connectivity options:
- No access — database not exposed externally.
- Public endpoint — accessible over the public internet (requires firewall rules).
- Private endpoint — accessible only within a private network.
When choosing Public endpoint, additional firewall rule options appear:
- Allow Azure services/resources to access the server
- Add current client IP address
For Private endpoint, you must already have a private endpoint configured.
In this demo, No access is chosen for initial setup.
Step 4 — Additional Settings Tab
Data source options:
- None — creates an empty database.
- Backup — restores from a provided backup.
- Sample — provisions database with sample data (ideal for testing/demo).
Here, Sample is selected.
Database collation — configure collation rules if required.
Azure Defender for SQL — offers advanced threat protection (free 30-day trial, ~$20 CAD/month). Disabled for this demo.
Step 5 — Tags Tab
Tags are key-value pairs for resource organization and billing. Example:
- Name: CreatedBy
- Value: Richard
Step 6 — Review + Create Tab
Displays a summary of:
- Subscription
- Resource group
- Database/server details
- Networking configuration
- Pricing estimate
- Option to Download template for automation (ARM template for future quick deployments)
Click Create to begin provisioning.
Step 7 — Deployment Process
- Azure displays Deployment in progress status.
- Deployment view includes:
- Overview
- Inputs
- Outputs (e.g., connection strings, if applicable)
- Template (JSON definition of the deployment)
When finished:
- Notification: Deployment succeeded
- Button: Go to resource
Step 8 — Post-Provisioning
Clicking the server resource (e.g., skillsoftdb) shows:
- Overview with:
- Resource group
- Server admin
- Status
- Location
- Backup management options
- DTU quotas
If you click the specific database (e.g., demodatabase under skillsoftdb), you can perform additional configurations and start working with your SQL instance.
Summary
This walkthrough covered:
- Accessing SQL databases from Azure Portal
- Creating a resource group and server
- Choosing compute/storage tiers
- Configuring networking and security
- Using sample data for quick start
- Tagging resources for management
- Viewing and downloading deployment templates
With these steps, you can quickly provision and begin using a fully managed Microsoft Azure SQL Database.
Provisioning an Azure Database for PostgreSQL
Topic: Provisioning PostgreSQL
Host: Richard Spencer
Overview
This session covers how to provision an Azure Database for PostgreSQL via the Azure Portal. PostgreSQL is an open-source relational database system, and Azure offers it as a fully managed database service with different deployment modes to suit performance, scalability, and management needs.
Step 1 — Navigating to Azure Database for PostgreSQL
- From the Azure Portal Dashboard, locate Azure services.
- Click Azure Database for PostgreSQL servers.
- Options shown:
- Create
- View (for existing servers)
- If no servers exist, the page will display an empty table with columns:
- Name
- Type
- Status
- Resource group
- (Others depending on configuration)
- Click Create Azure Database for PostgreSQL server.
Step 2 — Choosing Deployment Option
Azure offers four deployment models for PostgreSQL:
- Single server
- For a broad range of transactional workloads.
- Enterprise-ready, fully managed.
- Up to 64 vCores.
- Features like full-text search.
- Low admin overhead.
- Flexible server (Preview)
- More customization and control (ideal for developers).
- Manage maintenance windows, zone redundancy, and cost optimization.
- Hyperscale (server group)
- Ultra high-performance.
- Suitable for data needs beyond 100 GB.
- Supports multi-tenant apps and real-time analytics with sub-second latency.
- Azure Arc-enabled PostgreSQL Hyperscale
- Same as Hyperscale but deployed on your own infrastructure (on-premises, edge, or multi-cloud).
- Supports hybrid cloud strategies.
For this demo, Single server is selected for simplicity.
Step 3 — Configuring the Single Server
The creation process uses four tabs:
- Basics
- Additional settings
- Tags
- Review + Create
Basics Tab
Project Details
- Subscription: Select your Azure subscription (Free Trial in this example).
- Resource group: Logical grouping for resources.
Example: Create a new group named PostgreSQL for all related servers.
Server Details
- Server name: Must be globally unique. Example: demopostgresql.
- Data source:
- Blank (new empty DB)
- Backup (requires pre-existing backup)
- Location: (US) East US
- Version: PostgreSQL 11 (other options: 10, 9.6, 9.5).
- Compute + Storage:
- General Purpose
- 4 vCores
- 100 GB storage
Administrator Account
- Admin username: e.g., adminuser
- Password: Must meet complexity requirements (entered twice to confirm).
Step 4 — Additional Settings Tab
Encryption
- Azure encrypts data at rest by default with service-managed keys.
- Optionally enable Infrastructure double encryption for extra protection:
- Adds another layer with a secondary service-managed key.
- Can slightly reduce performance.
- For this demo, the default encryption is kept.
Step 5 — Tags Tab
Tags are metadata for resource organization and billing. Example:
- Name: CreatedBy
- Value: Richard
Step 6 — Review + Create Tab
Displays all configured details:
- Subscription, resource group, server name
- Version and compute/storage configuration
- Backup retention: 7 days, locally redundant storage
- Pricing estimate (shows Free Trial in this case)
Options:
- Download template for automation (ARM template for repeat deployments).
- Create to begin provisioning.
Step 7 — Deployment
- Azure shows Deployment in progress with a real-time log.
- Tabs available during deployment:
- Overview (current status)
- Inputs (settings entered)
- Outputs (if applicable)
- Template (JSON definition)
You can also:
- Download deployment details (ZIP format) — useful for storing in a version control system or deployment repository.
When complete:
- Notification: Deployment succeeded
- Options:
- Go to resource
- Pin to dashboard
Step 8 — Post-Provisioning
Clicking Go to resource opens the PostgreSQL server’s management pane:
- Overview (status, location, resource group)
- Activity log
- Settings
- Security configuration
- Connection endpoints
- Backup and restore controls
Key Takeaways
- Azure offers multiple deployment options for PostgreSQL:
- Single server: Easiest, most managed.
- Flexible server: More control for developers.
- Hyperscale: For very large, high-performance workloads.
- Arc-enabled Hyperscale: Run Azure-managed PostgreSQL on your own infrastructure.
- Resource groups help organize and manage related resources.
- ARM templates allow repeatable, automated deployments.
- Default encryption is always on, with optional double encryption for compliance.
- Deployment details can be saved for documentation or CI/CD pipelines.
Provisioning an Azure Database for MySQL
Topic: Provisioning MySQL
Host: Richard Spencer
Overview
This session explains how to provision a MySQL database in Microsoft Azure using the Azure Portal. MySQL is a widely used open-source relational database, and Azure offers it as a fully managed database service with deployment options for different workloads and levels of administrative control.
Step 1 — Accessing Azure Database for MySQL
- From the Azure Portal Dashboard, use the search bar to type mysql.
- Two services appear:
- Azure Database for MySQL servers (fully managed single-server option)
- Azure Database for MySQL flexible server (more customization and developer control)
- Select Azure Database for MySQL servers.
Step 2 — MySQL Console
- The MySQL servers page displays:
- Buttons: Add, Manage view, Refresh, etc.
- Table columns: Name, Type, Status, Resource group.
- If no databases exist, the table is empty.
- Click Create Azure Database for MySQL server to begin provisioning.
Step 3 — Choosing Deployment Option
Azure offers two deployment options for MySQL:
- Single server
- Fully managed by Azure.
- Best for general-purpose transactional workloads.
- Lower administrative overhead.
- Flexible server
- More developer control.
- Configure maintenance windows, zone redundancy, custom performance tuning.
- Less “hands-off” than single server.
For this demonstration: Single server is selected.
Step 4 — Configuring the MySQL Server
The Create MySQL server page has four tabs:
- Basics
- Additional settings
- Tags
- Review + Create
Basics Tab
Project Details
- Subscription: Free Trial subscription is used in this demo.
- Resource group: Logical container for resources. Example: Create new resource group MySQLDBs for all MySQL databases. This allows:
- Centralized lifecycle management.
- Shared permissions and policies for all grouped resources.
Server Details
- Server name: Must be globally unique. Example: demomysqldatabase.
- Data source:
- None — start with an empty database.
- Backup — restore from an existing backup (requires one to be available).
- Location: Choose Azure region (e.g., East US).
- Version: Select supported MySQL version (not explicitly mentioned here).
- Compute + Storage:
- General Purpose tier
- 4 vCores
- 100 GB storage
Administrator Account
- Admin username: e.g., rspencer
- Password: Must meet Azure’s password complexity rules.
- Confirm password: Same as above.
Additional Settings Tab
Encryption
- By default, Azure encrypts data at rest with a service-managed key.
- Option: Enable Infrastructure double encryption:
- Adds an additional encryption layer with a second service-managed key.
- May slightly impact performance.
- Demo choice: Leave double encryption disabled.
Tags Tab
Tags are key–value pairs for metadata, organization, and cost tracking.
Example:
- Name: CreatedBy
- Value: Richard
Review + Create Tab
- Displays all configured settings.
- Shows estimated monthly cost.
- Option to Download template for automation:
- An ARM template that allows automated redeployment of the same configuration in future.
Step 5 — Deployment
When Create is clicked:
- Azure begins deployment initialization.
- The deployment pane shows:
- Overview (status)
- Inputs (settings)
- Outputs
- Template (JSON definition of deployment)
- Download deployment details as a .zip for:
  - Revision history
  - CI/CD processes
  - Documentation
Azure displays a Deployment succeeded notification upon completion.
Step 6 — Post-Provisioning
- Go to resource: Opens the MySQL server’s management blade.
- Management options include:
- Overview
- Activity log
- Settings
- Security configuration
- Connection strings
- Backup and restore
From here:
- Fine-tune configuration.
- Add data.
- Begin using MySQL in your applications.
Key Takeaways
- Azure offers Single server (fully managed) and Flexible server (more control) deployment models for MySQL.
- Resource groups are critical for organization, lifecycle management, and permission control.
- ARM templates enable automation and repeatable deployments.
- Default encryption is always enabled, with optional double encryption for compliance.
- Deployment logs and details can be exported for governance and DevOps workflows.
Provisioning Azure Database for MariaDB — Step-by-Step with Context
Overview
This section explains how to provision an Azure Database for MariaDB using the Azure Portal. The steps include:
- Searching for MariaDB in the Azure portal
- Creating a new MariaDB server
- Configuring basic settings
- Adding tags for organization and billing
- Reviewing and creating the server
- Monitoring deployment and accessing the new database
MariaDB is an open-source relational database management system (RDBMS) compatible with MySQL. Azure Database for MariaDB is a managed PaaS (Platform as a Service) offering, meaning:
- Microsoft handles server maintenance, backups, patching, and availability.
- You focus on schema, queries, and application logic.
1. Accessing the MariaDB Creation Menu
- Open Azure Portal.
- In the search bar, type Maria.
- Click Azure Database for MariaDB servers.
📌 Tip: The search bar is often the fastest way to locate Azure resources, especially when you don’t remember the exact menu path.
2. Creating a New MariaDB Server
Once on the MariaDB servers page:
- Click Create Azure Database for MariaDB server.
This starts the provisioning wizard with multiple tabs:
- Basics
- Tags
- Review + create
3. Basics Tab — Project & Server Details
Project Details
- Subscription: Choose your Azure subscription (e.g., Free Trial).
- Resource Group: A logical container for Azure resources with shared lifecycle and permissions.
💡 Best practice: Create resource groups per environment or application to make management easier. Example here: Resource Group Name: MariaDB
Server Details
- Server Name: Must be globally unique (e.g., demorichardmariadb).
- Data Source: Choose None for a new empty database or restore from backup.
- Location: Select the region closest to your app or users to minimize latency (e.g., (US) East US).
- Version: Choose MariaDB version (e.g., 10.2 or 10.3).
- 📌 Note: Pick the version that matches your development/test environments for compatibility.
- Compute + Storage:
- Pricing tier (e.g., General Purpose).
- Storage size (e.g., 100 GB).
- These can be scaled later depending on performance needs.
Administrator Account
- Admin username: Avoid “root” or common names for security.
- Password: Must meet Azure’s complexity requirements.
- Confirm password.
Example:
- Admin username: rspencer
- Password: ********
4. Tags Tab — Resource Tracking
Tags are key-value pairs for metadata:
- Can be used for billing, ownership, or environment categorization.
Example:
- Name: CreatedBy
- Value: Richard
5. Review + Create Tab
Here you:
- Review all settings.
- Optionally download a template for automation (ARM template) to repeat deployments programmatically.
- Click Create.
6. Deployment Process
- A notification appears: “Deployment in progress.”
- Deployment can take 4–5 minutes.
- You can:
- Download deployment details for documentation or auditing.
- Monitor deployment status under Notifications.
7. Accessing the New MariaDB Server
Once the deployment succeeds:
- Click Go to resource.
- Explore Overview, Activity log, Settings, Security, etc.
Best Practices & Additional Context
- Security: After provisioning, configure firewall rules and SSL settings to control database access.
- Scaling: Azure Database for MariaDB supports vertical scaling of compute and storage without downtime.
- Backups: Automatic backups are enabled; verify retention period meets compliance needs.
- Monitoring: Use Azure Monitor to track performance metrics like CPU, storage, and connections.
Quick Reference: Steps in Order
- Search "Maria" in Azure Portal → Select Azure Database for MariaDB servers.
- Click Create Azure Database for MariaDB server.
- Configure Basics:
- Subscription, Resource Group, Server Name, Data Source, Location, Version, Compute & Storage.
- Set Admin username/password.
- Add Tags for tracking.
- Review + Create → Confirm settings.
- Wait for Deployment succeeded notification.
- Click Go to resource and start configuring.
Configuring a Microsoft Azure SQL Database
Overview
This guide walks you through configuring a SQL Database in Microsoft Azure after it has been provisioned. You will learn how to:
- Access your database from the Azure portal
- Adjust compute resources (DTUs)
- Configure firewall settings
- Enable geo-replication
- Retrieve connection strings
- Apply security measures
- Set up monitoring and alerts
1. Accessing Your SQL Database
Methods to access your database:
- From the Dashboard – Click SQL Databases from the Azure Portal dashboard.
- From the “Recent Resources” dropdown – Quick access to the most recent database you worked on.
- From the Left Navigation Menu – Navigate to SQL databases.
SQL Databases Console
- Displays all your databases in a table with columns:
- Name
- Status
- Replication role
- Server
- Pricing tier
- Location
- Subscription
2. Configuring Firewall Settings
Steps:
- In the database view, click Set Server Firewall.
- Configure:
- Minimum TLS version for secure connections.
- Allow Azure services to access server toggle.
- Firewall rules for specific IP ranges.
Note: Firewall settings control which IP addresses or Azure resources can access the database.
3. Understanding DTUs and Scaling
DTU (Database Transaction Unit) is a blended measure of:
- CPU
- Memory
- Data I/O
- Transactional I/O
It’s based on an OLTP (Online Transaction Processing) benchmark to simulate real-world workloads.
DTU Tiers
| Tier | DTU Range | CPU | IOPS | Latency (ms) |
|---|---|---|---|---|
| Basic | 5 DTUs | Low | Low | 5–10 |
| Standard | 10–3,000 DTUs | Low–High | Medium | 5–10 |
| Premium | 125+ DTUs | Medium–High | 25 per DTU | ~1–2 |
Free Trial Limitation: Only up to S1 (20 DTUs).
Resizing DTUs:
- Click Configure in the SQL Database page.
- Choose between Basic, Standard, and Premium tabs.
- Adjust DTU level.
- Click Apply – database will be resized automatically.
Use Microsoft’s DTU Calculator (via PowerShell) to estimate the right service tier.
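The portal resize also has a T-SQL equivalent. A hedged C# sketch that issues ALTER DATABASE ... MODIFY (SERVICE_OBJECTIVE = ...); the server, credentials, and database names are placeholders from this demo, and the resize completes asynchronously on the Azure side:
using System;
using System.Data.SqlClient;

// Placeholder connection to the logical server's master database.
const string master =
    "Server=tcp:skillsoftdb.database.windows.net,1433;" +
    "Initial Catalog=master;User ID=<admin>;Password=<password>;Encrypt=True;";

using (var connection = new SqlConnection(master))
using (var command = new SqlCommand(
    "ALTER DATABASE [demodatabase] MODIFY (SERVICE_OBJECTIVE = 'S1');",
    connection))
{
    connection.Open();
    command.ExecuteNonQuery(); // returns quickly; the resize finishes in the background
}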
4. Geo-Replication
Allows creation of a secondary database in another region for:
- Disaster recovery
- Load balancing
- Global application reach
Steps:
- Click Geo-Replication.
- Select target region (e.g., West US).
- Create a secondary server in that region.
- Specify:
- Region
- Secondary type (Readable/Non-readable)
- Target server name (e.g., databasewestdemo)
- Admin login and password
- Confirm and deploy – status will show as “Initializing.”
5. Connection Strings
Azure provides prebuilt connection strings for multiple frameworks:
- ADO.NET
- JDBC
- ODBC
- PHP
- Go
Steps:
- Click Connection strings in the navigation pane.
- Copy the appropriate string for your application.
- Replace placeholder values with your database credentials (see the sketch below).
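For instance, the copied ADO.NET string drops straight into application code. A minimal sketch assuming System.Data.SqlClient, with the server, database, and credentials as placeholders:
using System;
using System.Data.SqlClient;

// Shape of the portal's ADO.NET connection string (values are placeholders).
const string connectionString =
    "Server=tcp:skillsoftdb.database.windows.net,1433;" +
    "Initial Catalog=demodatabase;User ID=<admin>;Password=<password>;" +
    "Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;";

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("SELECT @@VERSION;", connection))
{
    connection.Open();
    Console.WriteLine(command.ExecuteScalar()); // prints the engine version string
}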
6. Database Synchronization
- You can create a Sync Group for multiple databases.
- The Sync Agent manages the synchronization process.
7. Security Configuration
In Security Center, you can:
- Enable Azure Defender for advanced threat protection.
- Perform Vulnerability Assessments.
- Configure Data Encryption (enabled by default).
- Set up Data Masking rules to hide sensitive information (e.g., credit card numbers).
8. Monitoring and Alerts
Use Metrics to:
- View CPU %, DTU usage, and storage space.
- Create alerts for performance thresholds.
Steps:
- Click Metrics.
- Select:
- Scope (your database)
- Metric namespace
- Metric type
- Aggregation method
- Create rules to trigger notifications when thresholds are exceeded.
Summary
By following this configuration process, you can:
- Optimize DTU usage to match performance needs.
- Secure your database with firewall rules, encryption, and masking.
- Ensure high availability with geo-replication.
- Integrate applications easily using provided connection strings.
- Monitor and proactively respond to database performance issues.
Next Steps:
- Test connection from your app.
- Run performance tests and adjust DTUs.
- Implement replication and failover testing.
Configuring Azure Database for PostgreSQL
Overview
This guide explains how to configure a PostgreSQL database provisioned in Microsoft Azure. You will learn to:
- Access the PostgreSQL database console
- Configure connection security and firewall rules
- Enable query performance insights
- Adjust pricing tiers, compute, and storage
- Configure replication and read replicas
- Manage monitoring, metrics, and access control
1. Accessing the PostgreSQL Database
Steps:
- Open Azure Portal.
- Navigate via:
- Top search bar → type PostgreSQL
- Top-left services dropdown → select Azure Database for PostgreSQL
- Click View to see existing servers.
- Select your database (e.g., demopostgreSQL) from the table:
  - Columns include Name, Type, Status, Subscription, Resource Group.
2. Overview and Settings
- Navigation pane includes:
- Overview
- Activity log
- Access control
- Tags
- Diagnose and solve problems
- Settings
- Security
- From Overview, you can see:
- Resource group
- Server name
- Admin username
- Version
- Performance configuration
- Resource utilization (CPU %, storage %)
3. Intelligent Performance
- Click Intelligent Performance → Query Performance Insight
- Tabs: Long running queries, Wait statistics
- Helps identify queries needing indexing or optimization
4. Read Replicas
- Click Replication → Add Replica:
- Enter server name (globally unique)
- Select Location (can be same or different region)
- Choose encryption type (single or double)
- Pricing tier matches master database
- Click OK to submit deployment
- Replica helps offload read operations from master
- Deployment may take several minutes
5. Connection Security
- Configure firewall rules and access control lists (ACLs)
- Deny public network access if needed
- Add current IP or custom IP ranges
- Optionally configure private endpoint connections
6. Connection Strings
- Available for frameworks like:
- ADO.NET
- C++
- JDBC, ODBC, PHP, Go
- Copy the string for integration with your application
7. Server Parameters
- Click Server parameters:
- Modify database server parameters such as:
- Checkpoint warnings
- Commit delay
- Mostly for advanced database administration
- Modify database server parameters such as:
8. Pricing Tier and Compute
- Choose Basic, General Purpose, or Memory Optimized:
- Basic: Entry-level, low vCores
- General Purpose: Moderate compute and memory
- Memory Optimized: High performance, high memory
- Adjust virtual cores (vCores) → increases CPU & memory
- Adjust storage and enable Auto-growth if needed
- Monitor estimated monthly cost before applying changes
9. Backup and Redundancy
- Configure backup retention period (e.g., 7–35 days)
- Choose backup redundancy:
- Locally Redundant → recover within the region
- Geo-Redundant → recover from regional outage
10. Security Features
- Advanced Threat Protection → detect harmful attempts
- Private endpoint connections
- Encryption settings (Azure-managed or customer-managed)
11. Summary
With these configurations, you can:
- Optimize database performance using pricing tiers and vCores
- Offload reads using read replicas
- Secure your database with firewall rules, TLS, and encryption
- Monitor and analyze performance with metrics and Query Performance Insight
- Scale storage and backups according to application needs
Tip: Focus on Pricing tier, replication, connection strings, and firewall settings as the main configuration areas for Azure PostgreSQL databases.
Configuring Azure Database for MySQL
Overview
This guide covers the configuration of a MySQL database provisioned in Microsoft Azure. You will learn to:
- Access the database console
- Configure connection security and firewall rules
- Enable query performance insights
- Adjust pricing tiers, compute, and storage
- Configure replication and read replicas
- Set up monitoring, metrics, and access control
1. Accessing the MySQL Database
Steps:
- Open Azure Portal.
- Navigate via:
- Top search bar → type MySQL
- Top-left services dropdown → select Azure Database for MySQL
- Open Azure Database for MySQL servers.
- Select your database (e.g., demomysqldatabase) from the table:
  - Columns include Name, Type, Status, Subscription, Resource Group.
2. Overview and Settings
- Navigation pane includes:
- Overview
- Activity log
- Access control
- Tags
- Diagnose and solve problems
- Settings
- Security
- From Overview, you can view:
- Resource group
- Server name
- Status
- Resource utilization
- Intelligent Performance options
- Read replicas
3. Intelligent Performance
- Click Intelligent Performance → Query Performance Insight
- Provides insight into long-running queries and wait statistics.
- Useful for troubleshooting performance issues.
4. Connection Security
- Click Connection security:
- Deny public network access if needed
- Add firewall rules (current IP or custom)
- Configure virtual network rules
- Set minimum TLS version
5. Connection Strings
- Click Connection strings:
- Prebuilt strings for:
- ADO.NET
- JDBC
- ODBC
- PHP
- Go
- Copy and use in your application code
- Prebuilt strings for:
6. Server Parameters
- Click Server parameters:
- View and modify parameters such as:
- Max dirty pages %
- Purge batch size
- Use caution – only modify if you understand implications
- View and modify parameters such as:
7. Replication and Read Replicas
- Click Replication → Add Replica:
- Enter server name (must be globally unique)
- Select Location (can be same region as master)
- Set pricing tier to match master
- Optionally download template for automation
- Deployment will take several minutes, after which the replica will be ready
- Helps offload read traffic from master database
8. Pricing Tier and Compute
- Click Pricing tier:
- Basic → up to 2 vCores, variable I/O
- General Purpose → up to 64 vCores, predictable I/O
- Memory Optimized → up to 32 memory-optimized vCores, predictable I/O
- Scale vCores and storage according to workload
- Configure storage auto-scale
- Configure backup retention (7–35 days) and backup redundancy:
- Local → recover within the region
- Geo-redundant → recover from regional outage
9. Security Features
- Advanced Threat Protection → on/off for detecting exploits
- Private endpoint connections
- Data encryption → Azure-managed or customer-managed keys
10. Monitoring and Metrics
- Click Metrics:
- Monitor metrics like Connections, Memory %, Network In/Out, Backup storage
- Create alerts for proactive database management
11. Access Control
- Click Access control:
- Check access level
- Add or deny role assignments
- Manage permissions for subscriptions and users
Summary
With these configurations, you can:
- Optimize database performance using vCores and pricing tiers
- Enhance security with firewall rules, TLS, and encryption
- Offload read traffic with replication
- Monitor database health with metrics and alerts
- Control user access effectively
Tip: Always monitor replication progress and validate configuration settings to ensure your MySQL database meets performance and security requirements.
Configuring Azure Database for MariaDB
Overview
This guide explains how to configure a MariaDB database provisioned in Microsoft Azure. You will learn to:
- Access the MariaDB database console
- Configure connection security and firewall rules
- Adjust server parameters
- Scale compute and storage
- Configure replication and read replicas
- Monitor metrics and alerts
- Manage access control and security
1. Accessing the MariaDB Database
Steps:
- Open Azure Portal.
- Navigate via:
- Azure Services → Azure Database for MariaDB servers
- Search bar → type MariaDB
- Click View to see existing servers.
- Select your database (e.g., demorichardmariadb) from the table:
  - Columns include Name, Type, Status, Subscription, Resource Group.
2. Overview and Settings
- Navigation pane includes:
- Overview
- Activity log
- Access control
- Tags
- Diagnose and solve problems
- Settings
- Security
- From Overview, you can see:
- Resource group
- Server name
- Status
- Subscription
- Admin username
- Version
- Performance configuration
3. Access Control
- Click Access control:
  - Check current access (View my access)
  - Add role assignments to grant access
  - Deny role assignments if necessary
- Follow the principle of least privilege.
4. Connection Security
- Configure firewall rules and ACLs:
- Deny public network access
- Allow Azure services
- Add client IP addresses
- Configure virtual network rules
- Enforce SSL and minimum TLS version for security
5. Connection Strings
- Copy built-in connection strings for frameworks like:
- ADO.NET
- C++, JDBC, PHP, Go
- Enables easy integration with applications
6. Server Parameters
- Click Server parameters:
- Modify advanced settings such as:
- Audited logs
- Character set
- Host cache size
- Event scheduler
- Modify advanced settings such as:
- Caution: only adjust parameters if you understand the impact
7. Replication / Read Replicas
- Click Replication → Add Replica:
- Provide a globally unique server name
- Choose location (same or different region)
- Select encryption type (single/double)
- Match pricing tier with master database
- Click OK to deploy replica
- Replica helps offload read operations
- Deployment may take 6–7 minutes; notifications indicate progress
8. Pricing Tier and Compute
- Pricing tiers:
- Basic: Entry-level, low vCores
- General Purpose: Moderate compute and memory (2–64 vCores)
- Memory Optimized: High performance (2–32 memory-optimized vCores)
- Adjust virtual cores → increases CPU & memory proportionally
- Adjust storage and enable Auto-growth if needed
- Monitor estimated monthly cost before applying changes
9. Backup and Redundancy
- Configure backup retention period: 7–35 days
- Choose redundancy:
- Locally Redundant → recover within the region
- Geo-Redundant → recover from regional outage
10. Security Features
- Advanced Threat Protection → detect harmful attempts
- Private endpoint connections → allow connection via virtual network
- Encryption settings (Azure-managed or customer-managed)
11. Monitoring and Metrics
- Click Metrics:
- Monitor Active Connections, CPU %, storage, etc.
- Create alerts to proactively respond to issues
12. Summary
With these configurations, you can:
- Fine-tune database performance using pricing tiers and vCores
- Offload reads using replicas
- Secure your database with firewall rules, TLS, and encryption
- Monitor and analyze performance with metrics and alerts
- Scale storage and backups according to application needs
Tip: Focus on Pricing tier, replication, connection strings, and security settings as the main areas for MariaDB configuration.
Structured Query Language (SQL)
Overview
SQL (Structured Query Language) is a standardized language for communicating with relational databases. It allows developers to query, manipulate, and manage data across multiple database systems.
Common relational databases supported by SQL:
- Microsoft SQL Server (enterprise-level)
- MySQL (open source, easy to use)
- PostgreSQL (open source, good for complex queries, extensible)
- MariaDB (open source, often faster than MySQL)
- Oracle (enterprise-level)
Features of SQL
- Standardized Commands – Core SQL statements work across most databases:
  - DML (Data Manipulation Language):
    - SELECT – Read data
    - INSERT – Add records
    - UPDATE – Modify records
    - DELETE – Remove records
  - DDL (Data Definition Language):
    - CREATE – Create tables/databases
    - ALTER – Modify database objects
    - DROP – Delete objects
- Extensible via Dialects – Databases often extend SQL with custom features:
- T-SQL – Microsoft SQL Server
- PL/pgSQL – PostgreSQL
- PL/SQL – Oracle
- Cloud Support – SQL can be used in cloud databases with added benefits:
- Microsoft Azure SQL Database
- Azure SQL Managed Instance
- SQL Server on Azure VMs ("lift and shift" strategy)
- Cloud Advantages:
- Scalable resources, pay-as-you-use
- High availability & disaster recovery
- Automatic backups
- Performance tuning
- Advanced security & compliance
- Reduced management overhead
Uses of SQL
- Query and retrieve data efficiently
- Insert, update, and delete records
- Create and manage databases and tables
- Support enterprise and cloud applications
- Enable analytics and reporting
Limitations of SQL
- Platform Variations – Slight differences across SQL dialects may require learning specific features.
- Resource Constraints On-Premises – Managing large databases in-house can be expensive and inefficient.
- Security Concerns – Perceived risk when migrating to cloud, though modern cloud offerings are highly secure.
- Complexity with Large Data Sets – Handling very large or unstructured datasets may require additional tools (e.g., NoSQL, big data platforms).
Summary: SQL is a powerful, standardized language for relational databases, widely used both on-premises and in the cloud. It provides core functionality for data manipulation, definition, and management, but users should be aware of dialect differences, resource management, and security considerations.
Using Data Definition Language (DDL) in SSMS
Overview
Data Definition Language (DDL) is a subset of SQL used to define and manage the structure of database objects. It allows you to create, modify, and delete schemas, tables, and other database objects.
Key DDL Commands Demonstrated
1. Create a Schema
A schema organizes database objects such as tables and views.
CREATE SCHEMA [test_schema];
- Use square brackets for object names with special characters.
- After execution, the schema appears under Security → Schemas in SSMS.
- Requires elevated permissions.
2. Create a Table
Tables store the actual data. You can define columns, types, and constraints.
CREATE TABLE [test_schema].[users] (
ID INT NOT NULL,
FIRST_NAME VARCHAR(255) NOT NULL,
LAST_NAME VARCHAR(255) NOT NULL,
AGE INT NOT NULL,
HOBBY VARCHAR(255),
PRIMARY KEY (ID)
);
- NOT NULL makes a field required.
- Fields without NOT NULL are optional (can be null).
- PRIMARY KEY defines the unique identifier for each row.
- After execution, the table appears under Tables → Columns in SSMS.
3. Drop a Table
To delete a table:
DROP TABLE [test_schema].[users];
- All dependent objects must be removed before dropping the schema.
- Refresh the Tables folder in SSMS to see changes.
4. Drop a Schema
To delete a schema:
DROP SCHEMA [test_schema];
- Will fail if the schema contains any objects.
- Only works after all tables, views, or other objects in the schema are dropped.
Important Notes
- DDL commands require elevated permissions or ownership of the database.
- Regular users can be granted specific DDL permissions.
- Always drop dependent objects before dropping a schema to avoid errors.
- Using SSMS, execute DDL commands via the Query window and check the Messages pane for success or errors.
Summary: DDL in SSMS allows you to create, modify, and delete database structures such as schemas and tables. Proper permissions and object dependencies must be managed to successfully execute these commands.
Using Data Manipulation Language (DML) in SSMS
Overview
Data Manipulation Language (DML) is a subset of SQL used to manipulate the data stored in database tables. Common DML operations include SELECT, INSERT, UPDATE, and DELETE.
Table Setup
Example table: test_schema.users
| Column | Type | Nullable | Notes |
|---|---|---|---|
| ID | INT | NO | Primary key |
| FIRST_NAME | VARCHAR(255) | NO | Required |
| LAST_NAME | VARCHAR(255) | NO | Required |
| AGE | INT | NO | Required |
| HOBBY | VARCHAR(255) | YES | Optional |
1. SELECT
Read data from the table.
SELECT * FROM [test_schema].[users]
ORDER BY age;
- * selects all columns.
- ORDER BY can sort ascending (default) or descending (DESC).
2. INSERT
Add new records to the table.
INSERT INTO test_schema.users (ID, FIRST_NAME, LAST_NAME, AGE)
VALUES (0, 'Bob', 'Barker', 78),
(1, 'Alice', 'Brown', 45),
(2, 'Nancy', 'Drew', 66),
(3, 'Steve', 'Smith', 12);
- Order of fields must match order of values.
- Primary key values must be unique.
- Optional columns (like HOBBY) can be omitted.
3. UPDATE
Modify existing records.
-- Update a single record
UPDATE [test_schema].[users]
SET [HOBBY] = 'Horseback Riding'
WHERE [FIRST_NAME] = 'Steve';
-- Update all records
UPDATE [test_schema].[users]
SET [HOBBY] = 'Cross Country Skiing';
- WHERE clause specifies which records to update.
- Omitting WHERE updates all rows.
4. DELETE
Remove records from the table.
-- Delete specific records
DELETE FROM [test_schema].[users]
WHERE [AGE] > 45;
-- Delete all records
DELETE FROM [test_schema].[users];
- WHERE clause controls which rows are deleted.
- Omitting WHERE deletes all rows in the table.
Key Points
- SELECT reads data without changing it.
- INSERT adds new rows.
- UPDATE modifies existing rows (use WHERE to avoid affecting all rows unintentionally).
- DELETE removes rows (use WHERE to avoid removing all rows accidentally).
- DML statements operate on data, not on the structure of tables (that's DDL).
Summary:
DML in SSMS is used to retrieve, insert, update, and delete data in tables. Proper use of the WHERE clause is essential to control which records are affected.
Querying Azure SQL Database Using the Azure Portal
Overview
The Azure portal provides a Query Editor to interact with Azure SQL Databases without needing an external tool like SSMS. You can create databases, log in, and run SQL queries directly in the portal.
1. Creating an Azure SQL Database
- Go to Azure portal → Click Create a resource → Select SQL Database.
- Configure Basics:
- Select a Resource Group or create a new one.
- Give the Database Name (e.g., testdb).
- Create a SQL Server if you don’t have one:
  - Server Name (globally unique, e.g., test-server26)
  - Server Admin Login (e.g., bbrooks26)
  - Password
  - Location (e.g., Canada Central)
- Click Review + Create → Create.
- Wait for deployment to complete → Click Go to Resource.
2. Using Query Editor in Azure Portal
- On the left-hand menu of your database page, click Query Editor (preview).
- Log in using SQL Server Authentication:
- Login: SQL Admin Username (e.g., bbrooks26)
- Password: Password set during creation
- Access the tree structure:
- Tables
- Views
- Stored Procedures
- Query window is on the right with options:
- Run
- Cancel query
- Save query
- Results / Messages
3. Example: Creating a Table
CREATE TABLE [test_table] (
id INT NOT NULL,
name CHAR(255) NOT NULL
);
- Run the query → Query succeeded.
- Refresh Tables in the tree → Your table appears.
Key Points
- Query Editor allows testing SQL commands directly in the Azure portal.
- Supports DDL (create tables) and DML (insert, select, update, delete) queries.
- Good for quick testing or minor modifications without external tools.
- Refresh the tree after creating objects to see them in the portal.
Querying Azure SQL Database Using SSMS
Overview
Microsoft SQL Server Management Studio (SSMS) allows you to connect to Azure SQL Databases in the cloud to manage and query data just like on an on-premises SQL Server.
1. Connect to Azure SQL Database
- Open SSMS → File → Connect Object Explorer.
- Obtain your Server name from the Azure portal:
- Go to your database (e.g., testdb) → Overview → Server name (e.g., test-server26.database.windows.net).
- Enter the following in SSMS:
- Server type: Database Engine
- Server name: Paste the Azure SQL server name
- Authentication: SQL Server Authentication
- Login: Administrator username created during Azure SQL setup (e.g., bbrooks26)
- Password: Corresponding password
- Click Connect → SSMS connects to the cloud database over the Internet.
- Optional: Disconnect any unused servers from Object Explorer.
2. Explore Database in SSMS
- In Object Explorer, expand your Azure SQL Server → Databases → Select your database (e.g., testdb).
- Database folders:
- Tables
- Views
- Security
- Server Objects
- Management
- Initially, there may be no custom tables.
3. Create a Table
- Click New Query in the toolbar → Open a query window.
- Example SQL query to create a table:
CREATE TABLE test_table (
id INT NOT NULL,
name CHAR(255) NOT NULL
);
- Click Execute → Messages confirm: Commands completed successfully.
- In Object Explorer, right-click Tables → Refresh → The new table appears.
- Expand the table to see columns: id, name.
Key Points
- You need server name and credentials to connect to Azure SQL Database from SSMS.
- SSMS allows full DDL and DML operations on Azure SQL Database.
- Once connected, querying and managing data is identical to an on-premises SQL Server.
Querying Azure SQL Database Using Azure Data Studio
Overview
Azure Data Studio (ADS) is a lightweight, cross-platform client for connecting to and querying SQL databases, including Azure SQL Database. It provides multiple ways to connect: directly to a server or through an Azure account.
1. Install Azure Data Studio
- Go to Microsoft Docs → Download and install Azure Data Studio.
- Choose the appropriate installer for your OS (e.g., Windows User Installer .exe).
2. Connect to an Azure SQL Database
Option A: Direct Server Connection
- Open Azure Data Studio → Connections pane → SERVERS → Add Connection.
- In Connection Details:
- Server: Copy the server name from Azure portal → SQL Database → Overview.
- Authentication type: SQL Login
- User name: SQL admin username
- Password: SQL admin password
- Click Connect → Your server appears under SERVERS in the left pane.
- Expand Databases → Select your database (e.g., testdb).
Option B: Connect via Azure Account
- In Connections pane → AZURE → Add Account.
- Sign in with your Azure account credentials.
- After signing in, linked subscriptions and databases appear.
- Select your SQL Database → Enter SQL login if prompted.
3. Query a Table
- Expand your database → Tables → Select table (e.g., test_table).
- Click New Query → Choose the database from the dropdown at the top.
- Example SQL query:
SELECT * FROM [test_table];
- Click Run → Results appear at the bottom with columns and data.
4. Key Points
- Server Connection: Connects to a single server using credentials.
- Azure Account Connection: Provides access to all databases under that Azure account.
- Queries and operations in ADS are similar to SSMS.
- You can disconnect or delete connections when finished.
Querying Azure SQL Database Using sqlcmd Utility
Overview
sqlcmd is a command-line utility for connecting to SQL Server and Azure SQL Databases. It is useful for running queries interactively or automating database operations through scripts or batch files.
1. Install sqlcmd
- Go to Microsoft Docs: sqlcmd Utility → Download the latest version.
- Choose x64 or x86 MSI for Windows, depending on your system.
- Run the MSI to install sqlcmd.
2. Connect to Azure SQL Database
Use the following command structure in Command Prompt:
sqlcmd -S <ServerName> -d <DatabaseName> -U <Username> -P <Password>
- -S: Server name (found in Azure Portal → SQL Database → Overview → Server name)
- -d: Database name (e.g., testdb)
- -U: SQL login username
- -P: Password for the SQL login
Example:
sqlcmd -S test-server26.database.windows.net -d testdb -U bbrooks26 -P YourPassword
Once connected, the prompt changes to a numbered line (e.g., 1>), indicating readiness for SQL commands.
3. Run Queries
- Type your SQL statement. Example:
SELECT * FROM test_table
GO
- GO executes the batch of statements entered.
- Results display in the command-line window, including column headers and rows. Example output:
id name
0 Bob Barker
1 Nancy Drew
2 Alice Cooper
- Exit the session
exit
4. Key Advantages
- Works in scripts and batch files for automation.
- Can run any SQL commands as you would in SSMS or Azure Data Studio.
- Useful for scheduled tasks, e.g., deleting old records weekly.
Querying Relational Data in PostgreSQL Using Azure Data Studio
Overview
PostgreSQL is an open-source relational database that supports complex queries on large datasets. This guide demonstrates how to deploy a PostgreSQL database in Azure and query it using Azure Data Studio.
1. Deploy PostgreSQL in Azure
- Open Azure Portal → Click Create a resource.
- Search for Azure Database for PostgreSQL → Click Create.
- Choose Single Server (for this demo).
- Fill in the Basics:
- Resource group: e.g., Database
- Server name: must be unique (e.g., testdb266)
- Administrator username: e.g., bbrooks26
- Password: set a secure password
- Click Review + Create → Create.
2. Configure Firewall & SSL
- In the PostgreSQL server overview:
- Ensure SSL enforce is ENABLED.
- Go to Connection security → Firewall rules → Click Add current client IP → Save.
- Allows Azure Data Studio to access your database from your machine.
3. Connect Using Azure Data Studio
- Install Azure Data Studio (if not already installed).
- Install PostgreSQL extension:
- Open Extensions → Search PostgreSQL → Click Install.
- Add a connection:
- Connection type: PostgreSQL
- Server name: copy from Azure Portal
- Authentication type: Password
- Username: <username>@<servername> (e.g., bbrooks26@testdb266)
- Password: your database password
- Database name: default postgres or any specific database
- Click Connect.
4. Create Database & Table
Create a database:
CREATE DATABASE test;
Switch to the new database:
- Use the database selector in Azure Data Studio.
Create a table users:
CREATE TABLE users (
id SERIAL NOT NULL,
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL
);
5. Insert Data
INSERT INTO users (first_name, last_name) VALUES
('Bob', 'Barker'),
('Alice', 'Smith'),
('Susan', 'Reynolds');
6. Query Data
SELECT * FROM users;
Result:
| id | first_name | last_name |
|---|---|---|
| 1 | Bob | Barker |
| 2 | Alice | Smith |
| 3 | Susan | Reynolds |
Summary
- Set up PostgreSQL in Azure.
- Configure firewall and SSL for connectivity.
- Connect using Azure Data Studio with PostgreSQL extension.
- Create databases and tables, insert data, and run queries.
Querying Relational Data in MySQL Using MySQL Workbench
Overview
MySQL is a popular open-source relational database, widely used for application backends. This guide demonstrates how to deploy a MySQL database in Azure and query it using MySQL Workbench.
1. Deploy MySQL in Azure
- Open Azure Portal → Click Create a resource.
- Search for Azure Database for MySQL → Click Create.
- Choose Single Server → Click Create.
- Fill in the Basics:
- Resource group: e.g., Database
- Server name: must be unique (e.g., test-database26)
- Administrator username: e.g., bbrooks26
- Password: set a secure password
- Click Review + Create → Create.
2. Configure Firewall & SSL
- In the MySQL server overview:
- Ensure SSL enforce is ENABLED.
- Go to Connection security → Firewall rules → Click Add current client IP address → Save.
- Allows MySQL Workbench to access the database from your machine.
3. Connect Using MySQL Workbench
- Install MySQL Workbench from dev.mysql.com/downloads/workbench.
- Open MySQL Workbench → Database → Connect to Database.
- Enter connection parameters:
- Hostname: copy the server name from Azure
- Username: <username>@<servername> (e.g., bbrooks26@test-database26)
- Password: your database password
- Click OK → Successfully connected.
4. Create Database & Table
Create a database:
CREATE DATABASE test;
- Refresh Schemas → Right-click test → Set as Default Schema.
Create a table users:
CREATE TABLE users (
    id SERIAL NOT NULL,
    first_name VARCHAR(255) NOT NULL,
    last_name VARCHAR(255) NOT NULL,
    age INT NOT NULL
);
5. Insert Data
INSERT INTO users (first_name, last_name, age) VALUES
('Bob', 'Barker', 45),
('Alice', 'Smith', 16),
('Susan', 'Reynolds', 54);
6. Query Data
SELECT * FROM users;
Result:
| id | first_name | last_name | age |
|---|---|---|---|
| 1 | Bob | Barker | 45 |
| 2 | Alice | Smith | 16 |
| 3 | Susan | Reynolds | 54 |
Summary
- Set up MySQL in Azure.
- Configure firewall and SSL for connectivity.
- Connect using MySQL Workbench.
- Create databases and tables, insert data, and query the contents.
Querying Relational Data in MariaDB Using MySQL Workbench
Overview
MariaDB is a free, community-developed database engine fully compatible with MySQL. This guide demonstrates how to deploy MariaDB in Azure and query it using MySQL Workbench.
1. Deploy MariaDB in Azure
- Open Azure Portal → Click Create a resource.
- Search for Azure Database for MariaDB → Click Create.
- Fill in Basics:
- Resource group: e.g., Database
- Server name: must be unique (e.g., test-server26)
- Administrator username: e.g., bbrooks26
- Password: set a secure password
- Click Review + Create → Create.
- Wait for deployment to complete → Click Go to resource.
2. Configure Firewall & SSL
- Under Overview, ensure SSL enforce is ENABLED.
- Go to Connection security → Firewall rules → Click Add current client IP address → Save.
3. Connect Using MySQL Workbench
Note: MySQL Workbench version 8.0 has a known bug with direct MariaDB connections. Use Manage Connections or the + sign in Connections to add a new connection.
- Download and install MySQL Workbench from dev.mysql.com/downloads/workbench.
- Add a new connection:
- Connection Name: e.g., MariaDB
- Hostname: copy server name from Azure
- Username: server admin login (e.g., bbrooks26)
- Password: your database password
- Click OK → Enter password when prompted → Click Continue Anyway if connection warning appears.
4. Create Database & Table
Create a database:
CREATE DATABASE test;
Create a table users:
CREATE TABLE users (
id SERIAL NOT NULL,
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL
);
- Refresh Tables → Confirm the users table appears with columns: id, first_name, last_name.
5. Insert Data
INSERT INTO users (first_name, last_name) VALUES
('Joe', 'Smith'),
('Susan', 'Redding'),
('Stanley', 'Hudson');
6. Query Data
SELECT * FROM users;
Result:
| id | first_name | last_name |
|---|---|---|
| 1 | Joe | Smith |
| 2 | Susan | Redding |
| 3 | Stanley | Hudson |
Summary:
- Set up MariaDB in Azure
- Configure firewall and SSL
- Connect using MySQL Workbench
- Create databases and tables, insert data and query the table contents
Characteristics of Non-relational Data (NoSQL)
Overview
- Relational DBs: Well-defined schema, common for on-premises and small-to-medium datasets.
- Non-relational (NoSQL) DBs: Schema-less, flexible, scalable, well-suited for distributed cloud systems and big data.
Evolution
- Historically: Relational databases (SQL Server, MySQL, etc.) dominated before widespread cloud adoption.
- Now: Distributed, mobile, and large-scale apps (social media, IoT, analytics) have made NoSQL the standard in many scenarios.
Key Differences: Relational vs Non-relational
Relational (SQL)
- Schema: Fixed and must be defined in advance.
- Data integrity: Enforced strictly — non-conforming data is rejected.
- Examples: SQL Server, PostgreSQL, MySQL.
Non-relational (NoSQL)
- Schema: No fixed schema; flexible records.
- Storage: Can store binary files (blobs), JSON documents, etc.
- Examples: Blob storage, Azure Table Storage, Cosmos DB, MongoDB.
Benefits of NoSQL's Schema-less Approach
- Agility: Easy to adapt data structures between development cycles.
- Flexibility: Developers can change "agreed-upon" schema without disrupting stored data.
- Performance:
- Faster writes (no schema validation overhead).
- Can decouple writes from reads (reducing bottlenecks).
- High Availability:
- Easier replication (even encouraged for performance).
- Eventual consistency rather than immediate consistency.
Eventual Consistency
- Relational DBs: Immediate consistency — updates instantly visible.
- NoSQL DBs: Eventual consistency — updates become visible after a delay.
- Trade-off: Speed & scalability vs real-time accuracy.
Common Applications of NoSQL
- Mobile apps: Handle schema changes and app updates smoothly.
- Real-time analytics: Quick writes & large data lookups.
- Content management systems: Support various media formats.
- Personalization systems: Store varied user profile data for targeted experiences.
- IoT applications: Handle rapid and varied incoming device data.
- Database migration: Simplifies moving from one storage system to another.
Scaling in NoSQL
- Horizontal scaling via sharding: Distributes data across multiple servers.
- Easier due to lack of rigid schema rules.
- Relational DBs: More complex to shard due to strict schema constraints.
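A minimal sketch of the core sharding idea, with hypothetical shard names: hash each record's key so the same key always routes to the same server.

```python
# Minimal hash-based sharding sketch; shard names are hypothetical.
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]

def shard_for(key: str) -> str:
    # A stable hash keeps a given key on the same shard across calls.
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SHARDS[digest % len(SHARDS)]

print(shard_for("user:123"))  # always the same shard for this key
```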
Types of NoSQL Databases
- Key-Value Stores:
- Key acts as a unique identifier (similar to primary key).
- Example: Azure Table Storage, Redis.
- Columnar Stores:
- Optimized for reading columns instead of rows.
- Example: Apache Cassandra.
- Graph Databases:
- Store data as nodes and relationships as edges.
- Example: Neo4j.
Choosing a NoSQL Database
Consider:
- Data model:
- Key-value: If data can be accessed by a single identifier.
- Graph: If relationships and traversal queries are important.
- Consistency requirements:
- Can your application tolerate eventual consistency?
- Deployment context:
- Cloud, on-premises, or hybrid?
- Ease of deployment & operations:
- How quickly can you provision and manage it?
- Data integration:
- How does it handle distributed databases and replication?
- Data model support:
- Does it support all formats you plan to store?
- Geographic presence:
- Does the provider operate in all your required regions?
- Service levels:
- Latency and availability guarantees.
Ideal NoSQL Use Cases
- Distributed gaming
- E-commerce
- Social media platforms
- IoT sensor data
- Personalization systems
- Real-time analytics
Shared traits:
- Big data with high throughput
- Need for low-latency performance
- Geographically distributed users
- Frequent schema changes or varied data formats
Summary
NoSQL databases:
- Offer flexibility by removing strict schema constraints.
- Support big data with fast read/write operations.
- Handle geographic replication and disaster recovery efficiently.
- Trade real-time consistency for scalability and performance.
Azure Data Fundamentals – Relational vs Non-relational Data
Characteristics of Non-relational Data
Non-relational databases (often referred to as NoSQL) differ fundamentally from traditional relational databases like SQL Server. While relational systems require a rigid schema before data can be stored, NoSQL databases are schema-less, allowing them to store a variety of data formats without enforcing predefined table structures.
Key points:
- No fixed schema – flexible and adaptable to changing data structures.
- Data types supported – can store binary files, JSON documents, images, videos, logs, etc.
- Examples in Azure – Blob Storage, Azure Table Storage, Cosmos DB.
Benefits:
- Agility in development – schema changes don’t disrupt operations.
- Scalability – horizontal scaling via sharding across multiple servers.
- Faster writes – fewer schema checks during ingestion.
- Resilience to format changes – handles variations in incoming data gracefully.
- High availability – easy replication across regions.
- Eventual consistency – trades immediate consistency for performance and scalability.
Common use cases:
- Mobile apps with evolving data structures.
- Real-time analytics.
- Content management systems.
- IoT applications with varying data formats.
- Personalization systems (e.g., targeted advertising).
- Database migrations with varied data types.
Types of NoSQL databases:
- Key-value – data retrieved via unique keys (e.g., Redis, Azure Table Storage).
- Columnar – optimized for column-based queries.
- Graph – stores data as nodes and edges to represent complex relationships.
When choosing NoSQL:
- Match your data model to the database type.
- Consider eventual consistency implications.
- Assess cloud vs hybrid deployment needs.
- Review geographic availability and service-level guarantees.
Identifying When to Use Non-relational vs Relational Data
Both relational and non-relational databases have strengths and trade-offs, and cloud-native apps may use either — or both — depending on requirements.
Relational databases (SQL Server, PostgreSQL, MySQL):
- Fixed schema – suitable for well-defined, structured data.
- ACID compliance – ensures atomicity, consistency, isolation, and durability.
- Best for:
- Complex relationships between data.
- Transaction-heavy applications.
- Applications requiring immediate consistency.
- Centralized deployments without geographic distribution.
- Scaling – typically scales vertically (adding resources to a single server) though limited sharding is possible.
- Language – SQL, mature and powerful for complex queries.
Non-relational databases (NoSQL: Cosmos DB, MongoDB, Cassandra):
- Schema flexibility – can start storing data without predefined structures.
- High performance – excels in fast writes and reads.
- Best for:
- Big data workloads with high throughput.
- Geographically distributed systems.
- Applications tolerating eventual consistency.
- Dynamic or semi-structured data.
- Simple queries without complex joins.
- Scaling – horizontal scaling across servers and regions.
- Replication – built-in, supports redundancy and fault tolerance.
CAP Theorem Considerations
- Relational DBs – excel in Consistency and Availability, weaker in Partition Tolerance.
- NoSQL DBs – excel in Availability and Partition Tolerance, trade-off is eventual Consistency.
Practical Guidelines
Choose Relational when:
- Schema is stable and well-defined.
- Strong transactional guarantees are required.
- Data relationships are complex and queries involve many joins.
- Write safety with concurrent writes is critical.
Choose Non-relational when:
- Data is large, fast-changing, or unstructured.
- Geographic distribution is needed.
- Simple, high-volume queries dominate.
- The system must handle failures gracefully with minimal downtime.
Database as a Service (DBaaS)
- Available for both relational and non-relational databases in the cloud.
- Features:
- Security, scalability, monitoring handled by the provider.
- Serverless options – no server management needed.
- Pay-as-you-go – only pay for the storage and capacity used.
- Automatic scaling and just-in-time capacity adjustments.
Types of Non-relational Data
Relational databases are general-purpose, using fixed tabular schemas with rows and columns, and are queried using SQL-like languages. Non-relational databases, in contrast, are requirement-specific — designed for particular types of data and use cases, without rigid tables. They store data in models optimized for their domain, each with its own querying mechanism.
1. Document Data Stores
- Definition: Store records as documents, often JSON files, with named fields and embedded objects.
- Example: Azure Cosmos DB.
- Querying: Based on property values inside documents.
- Use case: Dynamic, semi-structured data.
- Structure example:
{ "id": "123", "name": "Product A", "price": 29.99, "specs": { "color": "red", "weight": "1.5kg" } }
2. Columnar Data Stores
- Definition: Store data in columns instead of rows, with a denormalized approach (duplicate data for faster retrieval).
- Example: Apache Cassandra (non-Azure example).
- Key Concepts:
  - Column families: Groups of columns queried together.
  - Sparse population: Only some rows have values for certain columns.
  - Dynamic schemas: Columns can be added without predefined structure.
- Optimization: Fast retrieval of entire columns.
- Storage: Data stored in key order for fast range queries.
3. Key-Value Data Stores
- Definition: Data stored as `{ key: value }` pairs in a large hash table.
- Example: Azure Table Storage.
- Advantages:
  - Single unique key for each record.
  - Extremely scalable.
  - High-speed lookups.
- Example structure:
"user:123" → { "name": "Alice", "email": "alice@example.com" }
4. Graph Data Stores
- Definition: Store nodes (objects with properties) and edges (relationships between nodes).
- Model: Mathematical graph theory.
- Querying: Traversing edges to find related nodes.
- Example: Azure Cosmos DB with Gremlin API.
- Use case: Social networks, recommendation engines, complex relationship data.
- Example:
Node: Person { name: "George" }
Edge: ParentOf
Query: Find all children of people whose grandparents are named George.
5. Object Data Stores
- Definition: Store binary objects (images, videos, files, VM disk images).
- Example: Azure Blob Storage.
- Querying: Retrieve the entire object, often via a URL.
- Use cases:
- Serving static web assets.
- Storing backups or large datasets.
- AI/ML image/video analysis.
- Example:
https://myblobstorage.blob.core.windows.net/images/product123.jpg
6. External Index Data Stores
- Definition: Store indexes rather than actual data.
- Example: Azure Search.
- Function:
- Maintain metadata pointing to real data.
- Enable near real-time search and retrieval.
- Can index data across multiple dimensions.
- Support free-text searches.
- Benefit: Searching an index is far faster than scanning all source data.
- Example use:
- Pre-index millions of documents so keyword search is instantaneous.
CSV and JSON Files for Non-relational Data
Overview
In distributed systems, CSV (Comma-Separated Values) and JSON (JavaScript Object Notation) are among the most popular formats for ingesting, exchanging, and storing non-relational data. They are both used for:
- Unstructured data: Fields are unknown or inconsistent.
- Semi-structured data: Fields follow a pattern but can be flexible.
Both are widely supported and often serve different purposes in analytics, APIs, and NoSQL systems.
CSV (Comma-Separated Values)
- Structure: A plain-text format where each row represents a record and each column is separated by a delimiter (usually a comma, but `;` or `\t` are also common).
- Compatibility: Readable by spreadsheet software like Excel and most database import tools.
- Best Use Cases:
- Bulk imports into tabular storage.
- Data exchange for analytics and reporting.
- Simple datasets without hierarchical relationships.
- Azure Integration Example:
- Azure Table Storage can import directly from CSV for batch operations.
Example CSV file:
id,name,temperature
1,Sensor_A,23.5
2,Sensor_B,25.1
3,Sensor_C,22.8
JSON (JavaScript Object Notation)
- Structure: Key-value pairs; supports atomic types (string, number, boolean) and nested structures.
- Hierarchical Support: Can store arrays and complex, nested objects.
- Serialization:
- Widely supported across programming languages.
- Most libraries offer built-in JSON serialization (convert objects to JSON strings) and deserialization (parse JSON strings into objects).
- Use Cases:
- NoSQL Databases: e.g., Azure Cosmos DB stores its documents as JSON.
- Microservices Communication: APIs often exchange JSON payloads.
- Logical Data Structures: Used to store and represent configuration or data models.
- Azure Integration Examples:
- Azure Data Lake Storage stores JSON as native files and allows querying JSON as structured records.
- Azure NoSQL Table Storage supports JSON-formatted documents.
Example JSON file:
[
{
"id": 1,
"name": "Sensor_A",
"temperature": 23.5,
"metadata": { "location": "Warehouse 1" }
},
{
"id": 2,
"name": "Sensor_B",
"temperature": 25.1,
"metadata": { "location": "Warehouse 2" }
}
]
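As a quick illustration, both example files above can be parsed with Python's standard library alone; the file names `sensors.csv` and `sensors.json` are assumptions for this sketch.

```python
# Parse the example CSV and JSON files shown above (stdlib only).
import csv
import json

with open("sensors.csv", newline="") as f:
    for row in csv.DictReader(f):            # each row -> flat dict of strings
        print(row["name"], float(row["temperature"]))

with open("sensors.json") as f:
    for record in json.load(f):              # nested objects survive intact
        print(record["name"], record["metadata"]["location"])
```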
Comparing CSV and JSON
| Feature | CSV | JSON |
|---|---|---|
| Data Structure | Flat/tabular | Hierarchical (supports nesting) |
| Human Readability | Easy | Easy (but can be verbose) |
| Parsing Complexity | Simple parsers available everywhere | Requires structured parsing |
| Storage Efficiency | Generally smaller for flat data | More compact for hierarchical structures |
| Best For | Analytics, spreadsheets | APIs, NoSQL, inter-service communication |
Limitations & Considerations
- Data Corruption: Neither CSV nor JSON has built-in integrity checks (e.g., checksums). This makes them vulnerable to corruption without detection.
- Scalability: Not ideal for extremely large datasets in cold storage; formats like Parquet are more efficient for big data.
- Schema by Convention: Even though both formats are schemaless, consumers (like reporting tools) benefit from a consistent field naming convention.
- Example: A temperature charting app must know the property name (temperature) to plot data correctly.
Best Practices
- Choose CSV for flat, tabular data and easy integration with spreadsheet and BI tools.
- Choose JSON for complex, hierarchical, or API-exchanged data.
- Store large-scale cold data in more efficient formats like Parquet or Avro.
- Define a schema by convention to ensure compatibility with downstream tools.
- Use Azure Storage or Azure Data Lake for scalable, cloud-native storage with native CSV/JSON support.
Requirements of Non-relational Data Stores
Non-relational data stores (NoSQL databases) are optimized for scalability, performance, and flexibility, often at the expense of strict transaction guarantees and complex indexing. They typically have minimal or no fixed schema, which makes them highly adaptable for diverse data types.
Common Characteristics
- Minimal/No Schema:
- Some may have simple fixed keys, but most avoid rigid schemas.
- This enables horizontal scaling and high performance.
- Non-transactional by Default:
- Operations are processed individually, without batch rollback support.
- Limited transactionality may exist, but it’s less robust than in relational systems.
- Best suited for distributed environments, not high-transaction workloads like banking.
- Simple Indexing:
- Usually just a primary index, with minimal secondary indexes.
- Keeps queries fast and scalable.
Data Store Types
1. Document Data Store
- Data Format: JSON-like documents.
- Denormalized: Duplicated values are stored independently, no reference compression.
- Tunable Consistency: Adjust trade-off between consistency and scalability.
- Indexes:
- Primary key for each document.
- Properties within the document can also be indexed.
- Typical Data Size: KB to MB per document.
- Scale: Petabytes.
- Use Case Example: MongoDB storing product catalogs with all details in each record.
2. Columnar Data Store
- Structure: Groups data into column families instead of rows.
- Denormalized: Duplicate values remain to optimize read performance.
- Indexes: Both primary and secondary indexes allowed.
- Data Size: Medium to large — a single column can store GBs of data.
- Scale: Large, suitable for analytics at scale.
- Use Case Example: Apache Cassandra for fast retrieval of aggregated metrics.
3. Key-Value Data Store
- Structure: One unique key per value.
- Denormalized: Identical values are duplicated.
- Indexing: Only one primary index for very fast lookups.
- Data Size: Small, simple values.
- Scale: Petabyte range possible.
- Use Case Example: Redis caching user sessions.
4. Graph Data Store
- Structure: Nodes (entities) connected by edges (relationships).
- Normalized: Shared nodes are not duplicated; they exist as single entities.
- Indexing:
- Primary index for each node.
- Secondary indexes for node properties.
- Data Size: Small (KB range) per node.
- Scale: Large (TB range), but not petabyte due to complexity.
- Performance Note: Read latency increases with overly large graphs.
- Use Case Example: Neo4j for social network relationships.
5. Object Data Store
- Structure: Stores binary objects (BLOBs).
- Denormalized: Each object is independent, no transaction consistency.
- Operations: Create/update are atomic at object level.
- Data Size: Large to extremely large (up to TB per object).
- Scale: Petabytes possible.
- Use Case Example: Azure Blob Storage for storing raw video files.
6. External Index Data Store
- Purpose: Stores only indexes to other data sources.
- Denormalized: Indexes are not normalized.
- Transactions: Not applicable.
- Data Size: Small per index (KB range).
- Scale: Can grow large, but excessive size reduces efficiency.
- Use Case Example: Elasticsearch indexing external relational and NoSQL databases.
Summary Table
| Data Store Type | Normalization | Indexes | Data Size | Scale | Common Use Case |
|---|---|---|---|---|---|
| Document | No | Primary + optional sec. | KB–MB | PB | JSON product catalogs |
| Columnar | No | Primary + secondary | MB–GB | Large | Analytical queries |
| Key-Value | No | Primary only | Small | PB | Session caching |
| Graph | Yes | Primary + secondary | KB | TB | Social networks |
| Object (Blob) | No | Primary only | Large–TB | PB | Media storage |
| External Index | No | N/A (secondary to other) | KB | Large | Search index |
Free-form Text for Search — Architecture, Challenges, and Implementation
Free-form text search is the capability to input arbitrary text (words, phrases, sentences) into a system and retrieve items that best match the input, regardless of how the query is structured. This is the type of search most people are familiar with from using internet search engines like Google.
Core Concept
In a free-form search:
- User Input — The search term(s) are provided without strict formatting rules.
- Query Processing — The system matches these terms against a prebuilt search index.
- Results — The system returns relevant items, often with keyword highlights.
- Fuzzy Matching — Systems may find related or similar words (e.g., `bake` matches `baker`, `baking`) and handle typos through suggestions.
Why Indexing is Critical
Free-form searches often deal with extremely large unstructured datasets. For example:
- Google Search indexes over 30 trillion individual web pages.
- Searching that raw data directly would be far too slow; therefore, most of the work happens before a user types anything, via indexing.
Indexing Process:
- Runs offline or incrementally ("piecemeal") to update in the background.
- Stores keywords and document references in an optimized structure for quick lookups.
- Enables near real-time search performance when the query is made.
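The data structure behind this is typically an inverted index. A minimal sketch with made-up documents:

```python
# Minimal inverted index: map each keyword to the documents containing it,
# built ahead of time so a query is a cheap lookup, not a full scan.
from collections import defaultdict

docs = {
    1: "baking bread at home",
    2: "the baker bakes fresh bread",
    3: "stock market report",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)        # the "offline" indexing work

print(sorted(index["bread"]))          # [1, 2] -- instant at query time
```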
Architecture Overview
A typical free-form text search architecture has these components:
- Data Storage — Where the raw content lives (e.g., Azure Data Lake, SQL tables).
- Index Builder — Creates searchable indexes from the stored data (Azure Cognitive Search, Elasticsearch, Solr).
- Search Engine / API — Accepts queries, looks up matching items in the index, and returns results.
- Optional Post-processing — Highlights matches, ranks results, or applies filters.
Flow: Data Sources → Indexing Service → Search Index Storage → Search Query → Results
Implementation Options in Azure and Beyond
You can implement your own free-form text search using various tools:
-
Azure Cognitive Search
- Managed cloud service from Microsoft (formerly Azure Search).
- Handles multiple file types: Excel, PDF, Word, JSON, etc.
- Can automatically index from sources like Azure Blob Storage or Data Lake.
- Provides an API for search queries.
-
Elasticsearch
- Open-source, built on the Lucene search library.
- Available on Azure, Google Cloud, AWS.
- Widely used for scalable, distributed search.
-
HDInsight Apache Solr
- Enterprise search platform also based on Lucene.
- Can index many file types and provides advanced search features.
-
SQL Server / Azure SQL Database Full-Text Search
- Allows storing documents in database tables.
- Generates a full-text index for keyword-based search queries.
- Integrated into database engine.
Index Storage Options
Most indexing tools can store the index in the same system that created it:
- Azure Cognitive Search
- HDInsight Apache Solr
- Elasticsearch
- Azure SQL Database (using built-in full-text search)
Key Considerations for Choosing a Solution
When selecting a free-form text search system, consider:
-
Hosting Model
- Managed cloud service (e.g., Azure Cognitive Search) — less maintenance, potentially lower cost over time.
- Self-hosted (e.g., Elasticsearch on your servers) — more control, but requires more resources and upkeep.
-
Schema Design
- Will you define the index schema ahead of time?
- Some services allow flexible schema changes; others require predefinition.
-
Feature Set
- Do you only need full-text search, or also aggregations, analytics, and filtering?
- Ensure the solution supports your reporting and search complexity needs.
-
Security
- Does the data store meet your compliance and data protection requirements?
-
Scalability
- Consider query load, dataset growth, and re-indexing performance.
Example Azure Cognitive Search API Query Flow:
POST https://<search_service>.search.windows.net/indexes/<index_name>/docs/search?api-version=2021-04-30-Preview
Content-Type: application/json
api-key: <your_api_key>
{
"search": "baking",
"highlight": "content",
"top": 10
}
- search: Term or phrase to search for.
- highlight: Field(s) to highlight matches in.
- top: Limit number of results.
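The same request could be issued from application code, for example with Python's `requests` package (a sketch only; the service name, index name, and key remain placeholders):

```python
# Sketch of the Azure Cognitive Search query above, issued over HTTP.
import requests

url = ("https://<search_service>.search.windows.net"
       "/indexes/<index_name>/docs/search?api-version=2021-04-30-Preview")
headers = {"Content-Type": "application/json", "api-key": "<your_api_key>"}
payload = {"search": "baking", "highlight": "content", "top": 10}

response = requests.post(url, json=payload, headers=headers)
for doc in response.json().get("value", []):   # matching documents
    print(doc)
```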
Key Takeaways
Free-form text search is only as fast and accurate as its index. Most of the real "work" happens during index creation, not during the actual query, allowing large-scale systems to return results in milliseconds despite massive underlying datasets.
Time Series Solutions – Benefits, Challenges, and Architecture
Overview
Time Series Solutions are systems designed to store, process, and analyze data that is organized by time. Every piece of data (often called a datum) has a timestamp, which determines when that measurement or event occurred. These solutions have grown rapidly with the rise of:
- Internet of Things (IoT)
- Industrial Internet of Things (IIoT)
Example sources include:
- IoT sensors in manufacturing
- Stock market price feeds
- Autonomous vehicle telemetry
- Smart home devices
Characteristics of Time Series Data
- Timestamped: Each record has a precise timestamp from the data source.
- Ordered: Ideally received in chronological order, though network delays can cause out-of-order arrivals.
- Insert-Only: Data is rarely updated after being stored; it becomes an immutable historical record.
- High Volume, Small Size: Many small data points (a few KB each) arriving rapidly.
- Many Sources: Hundreds or thousands of devices feeding data simultaneously.
- Telemetry-Focused: Commonly measures metrics like temperature, pressure, speed, or energy output.
Applications
- Industrial Monitoring: Detect equipment issues before failures occur.
- Financial Analysis: Stock market price tracking over time.
- IoT Device Tracking: Usage patterns from consumer smart devices.
- Predictive Maintenance: Using historical trends to forecast failures or degradation.
- Real-Time Alerts: Detecting anomalies and notifying operators.
Benefits
- Trend Analysis: Understand how metrics change over time to optimize operations.
- Anomaly Detection: Identify sudden abnormal events (e.g., pressure spikes).
- Predictive Modeling: Machine learning can forecast future behavior from historical patterns.
- Operational Efficiency: Improve resource usage and reduce downtime.
- Immediate Alerting: Detect problems in real time to trigger automated or manual interventions.
Challenges
- Out-of-Order Data: Caused by network delays or device processing time differences.
- Volume Handling: Systems must handle potentially millions of data points per second.
- Time Sensitivity: Alerts and analyses must be generated with minimal delay.
- Data Normalization: Schema consistency ensures smooth querying and analysis.
- Storage Scale: Time series databases often need terabytes of capacity.
Common Architecture for Time Series Solutions
1. Data Ingress
- Data flows from devices or sensors into the system.
- Often uses IoT Hubs, Event Hubs, or similar for ingestion.
- Challenge: Incoming data can arrive in bursts, not at a steady rate.
2. Stream Buffering Layer
- Temporarily stores incoming bursts to prevent downstream overload.
- Examples: Azure IoT Hub, Apache Kafka.
- Smooths ingestion rate for consistent processing.
3. Stream Processing Layer
- Real-time anomaly detection (e.g., sliding time window calculations).
- Optional integration with machine learning models to detect patterns.
- Generates immediate alerts when thresholds are crossed.
- Can also enrich or transform data before storage.
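A minimal sketch of the sliding-window idea, with an illustrative window size and threshold: flag a reading that strays far from the rolling statistics of recent values.

```python
# Sliding-window anomaly detection sketch; the window size and 3-sigma
# threshold are illustrative choices, not a prescribed configuration.
from collections import deque
from statistics import mean, stdev

window = deque(maxlen=60)              # e.g., the last 60 readings

def process(value: float) -> None:
    if len(window) >= 2:
        mu, sigma = mean(window), stdev(window)
        if sigma > 0 and abs(value - mu) > 3 * sigma:
            print(f"ALERT: {value} deviates from rolling mean {mu:.1f}")
    window.append(value)

for reading in [20.1, 20.3, 19.9, 20.2, 55.0]:   # simulated telemetry
    process(reading)
```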
4. Analytical Data Store
- Stores both raw time series data and processed results.
- Examples: Azure Data Explorer, TimescaleDB, InfluxDB.
- Optimized for time-based queries and aggregations.
5. Analysis & Visualization
- Queries run against historical and/or real-time data.
- Dashboards, reports, and alerting systems built on top.
- Tools: Grafana, Power BI, custom web dashboards.
Data Store Requirements for Time Series Data
- Normalized Schema: The shape of the data is usually known in advance.
- Sequential Storage: Stored in time order for efficient time range queries.
- Multiple Indexes: For filtering by time and other dimensions (e.g., location, sensor type).
- Consistency Model: Immediate write consistency not a major concern since writes are append-only.
- Large Capacity: Capable of storing terabytes of small telemetry points.
- Fast Query Performance: Particularly for recent data and rolling time windows.
Example Use Case – Industrial Wind Farm
- Each turbine sends telemetry (rotation speed, power output, vibration levels) every second.
- Data flows into an Azure Event Hub for buffering.
- Stream processing checks for anomalies (e.g., zero output for more than 60 seconds).
- Alerts sent to the plant manager via SMS.
- All data stored in Azure Data Explorer for historical trend analysis.
- Dashboard shows performance trends, predicting maintenance needs before failures occur.
Key Takeaways
- Time Series Solutions are crucial for IoT, finance, and industrial analytics.
- They rely on time-based indexing, stream processing, and scalable storage.
- The architecture is optimized for high-throughput, real-time, and historical workloads.
- Benefits include early anomaly detection, trend analysis, and predictive forecasting.
- Challenges include data bursts, out-of-order arrivals, and scaling to massive volumes.
Provisioning Non-relational Data Services
What is Provisioning?
- Provisioning = the process of setting up IT resources so they are available to users and systems.
- Goes beyond configuration: it includes allocating compute, storage, and networking resources that power the service.
- Example: Spinning up Azure Cosmos DB requires Azure to provision:
- Disks (storage for data)
- Memory (RAM for query processing)
- CPU (compute resources for workloads)
- Networking (bandwidth, IP allocations, etc.)
Think of it like leasing a car:
- You specify how much “horsepower” you need.
- Azure takes care of everything “under the hood” (resource allocation, scaling, fault tolerance).
Tools for Provisioning in Azure
1. Azure Portal (GUI-based)
- User-friendly graphical interface.
- Ideal for beginners and non-technical users.
- Provides service-specific pages that prompt for required details.
- Performs validation before committing provisioning.
- Accessible from any web browser.
2. Azure Command Line Interface (CLI)
- Cross-platform tool (Windows, macOS, Linux).
- Syntax is simple and easy to discover.
- Can be run from:
- Local terminal
- Cloud Shell in the Azure Portal (no installation required).
- Great for automation:
- Commands can be stored in scripts.
- Scripts can be executed on-demand or scheduled.
Example command to provision a Cosmos DB account:
az cosmosdb create \
--name MyCosmosDB \
--resource-group MyResourceGroup \
--locations regionName=eastus \
--default-consistency-level Session
3. Azure PowerShell
- Extension of Windows PowerShell, but also available on macOS and Linux.
- Uses cmdlets (predefined scripts) for tasks like provisioning VMs or databases.
- Ideal for Windows administrators already familiar with PowerShell.
- Integrates seamlessly into existing automation workflows.
Example PowerShell command:
New-AzCosmosDBAccount `
-ResourceGroupName "MyResourceGroup" `
-Name "MyCosmosDB" `
-Location "East US" `
-DefaultConsistencyLevel "Session"
4. Azure Resource Manager (ARM) Templates
- Infrastructure as Code (IaC): Services are described in JSON files.
- Allows repeatable, version-controlled deployments.
- Supports complex multi-resource provisioning.
- Deployable via:
- Azure CLI:
az deployment group create \
--resource-group MyResourceGroup \
--template-file template.json
- Azure PowerShell:
New-AzResourceGroupDeployment `
-ResourceGroupName "MyResourceGroup" `
-TemplateFile "template.json"
Why Multiple Provisioning Options?
- Portal → Best for one-time setup, demos, beginners.
- CLI → Cross-platform automation, scriptable.
- PowerShell → Powerful for Windows-centric environments.
- ARM Templates → Best for enterprise-scale deployments, reproducibility, and DevOps pipelines.
Key Takeaways
Provisioning = allocating and configuring resources (storage, compute, network). Azure offers flexible provisioning tools: the Portal (GUI), the CLI (cross-platform automation), PowerShell (cmdlets for scripting), and ARM templates (infrastructure as code). Choosing a method depends on skill set, automation needs, and environment.
Configuring Non-relational Data Services
Overview
After provisioning a resource (e.g., Cosmos DB), it must be configured to meet application requirements. This includes adjusting network connectivity, firewall rules, authentication, access control, and optionally enabling advanced security.
Configuring Connectivity
- By default, some services (like Cosmos DB) allow open access from anywhere, which is insecure.
- To control access:
- Firewalls and Virtual Networks Page:
- Exceptions: Allow access for other Azure services in the same subscription.
- Firewall: Add IP addresses (e.g., office PCs, on-premise servers) to limit access.
- Virtual Networks: Restrict traffic to specific VNets. Example: A web server in a VNet uploading photos to storage.
- Firewalls and Virtual Networks Page:
- Private Endpoints:
- Use Azure Private Endpoint to secure communications with a private IP from your VNet.
- This keeps traffic on the Azure backbone instead of the public Internet.
- Configured in the Private Endpoint Connections Page, used together with firewall/VNet settings.
Configuring Authentication
- Many services (e.g., Storage Account) use access keys:
- Each account has two keys (key1, key2).
- Any user or app with the key can connect.
- Weak, coarse-grained security. If a key is leaked, all apps must be updated after regeneration.
- Azure Active Directory (AAD) is recommended:
- Provides stronger, more granular security.
- Works across Azure and on-premise apps.
- Minimizes vulnerabilities compared to static keys.
Configuring Access Control
- Role-Based Access Control (RBAC): Assigns access based on job roles.
- Defines permissions for tasks like read, write, modify, delete.
- Examples:
- Let a team manage all resources in a resource group (VMs, websites, subnets).
- Allow a DBA group to manage SQL databases.
- Give an app access to all resources in a resource group.
RBAC Role Assignments
- Three elements:
- Security Principal → who (user, group, or app).
- Role Definition → what (permissions: read, write, delete).
- Scope → where (applies at management group, subscription, resource group, or resource level).
- Inheritance: Permissions at parent scopes are inherited by children.
- Configured in the Access Control (IAM) Page under the Role Assignments tab.
Configuring Advanced Security
- Provides assessment and threat protection:
- Assessment: Identifies security issues and suggests fixes.
- Threat Protection: Uses adaptive intelligence to monitor users, devices, and resources for anomalies.
- Benefits:
- Helps detect suspicious behavior quickly.
- Offers insights into network traffic.
- Note: This feature incurs additional cost but adds significant protection.
Provisioning Other Non-relational Data Services
Overview
Besides Cosmos DB, Azure provides other non-relational data services. These are optimized for handling unstructured or semi-structured data rather than structured relational data. The three main services are:
-
Azure Blob Storage
- Stores large amounts of unstructured data (e.g., images, video, documents).
- Data type: Binary Large Objects (BLOBs).
-
Azure Data Lake Storage
- Stores large-scale structured and unstructured data in native formats.
- Designed for big data analytics.
-
Azure File Storage
- Provides shared file storage accessible via Server Message Block (SMB) protocol.
- Can be mounted on Windows, Linux, and macOS in Azure.
All three services require the creation of an Azure Storage Account.
Creating a Storage Account in Azure Portal
Steps when provisioning any storage service:
- Subscription – Select your Azure subscription.
- Resource Group – Logical grouping of resources with shared lifecycle and permissions.
- Storage Account Name – Must be globally unique.
- Location – Choose the region closest to your users or development environment.
- Performance Options:
- Standard (HDD-based):
- Cheapest.
- Higher latency.
- Best for infrequent, bulk storage.
- Premium (SSD-based):
- Lower latency, high I/O performance.
- Best for databases, VM disks, or intensive workloads.
- More expensive.
- ⚠️ Data Lake Storage only supports Standard accounts (HDD).
- Standard (HDD-based):
Replication Options
Storage accounts also allow different redundancy models (configured as SKUs):
- LRS (Locally Redundant Storage): Keeps 3 copies in a single data center.
- ZRS (Zone-Redundant Storage): Replicates across 3 different availability zones in one region. Ensures availability even if a zone goes down.
- GRS (Geo-Redundant Storage): Copies data to another region, enabling disaster recovery.
Provisioning via CLI & PowerShell
Azure provides multiple ways to provision services:
- Azure CLI (Command Line Interface):
az storage account create \
  --name mystorageaccount \
  --resource-group MyResourceGroup \
  --location eastus \
  --sku Standard_ZRS
- Azure PowerShell:
New-AzStorageAccount `
  -ResourceGroupName "MyResourceGroup" `
  -Name "mystorageaccount" `
  -Location "EastUS" `
  -SkuName "Standard_ZRS"
Provisioning Data Lake Storage
- Must configure at account creation (cannot enable later).
- In Azure Portal → Advanced tab → Data Lake Storage Gen2 → Enable Hierarchical Namespace.
CLI:
az storage account create \
--name mydatalakeaccount \
--resource-group MyResourceGroup \
--location eastus \
--sku Standard_LRS \
--enable-hierarchical-namespace true
PowerShell:
New-AzStorageAccount `
-ResourceGroupName "MyResourceGroup" `
-Name "mydatalakeaccount" `
-Location "EastUS" `
-SkuName "Standard_LRS" `
-EnableHierarchicalNamespace $true
Provisioning Blob Storage
- Blobs are stored inside containers (like folders for binary files).
- Each container requires a unique name.
- Access options:
  - Private (default).
  - Public read (container-level).
  - Role-based access control (RBAC) for fine-grained permissions.
Portal:
- Go to Storage Account → Containers → Add Container.
CLI:
az storage container create \
--account-name mystorageaccount \
--name mycontainer
PowerShell:
$account = Get-AzStorageAccount -ResourceGroupName "MyResourceGroup" -Name "mystorageaccount"
New-AzStorageContainer -Name "mycontainer" -Context $account.Context
Provisioning File Storage
File storage is provisioned via File Shares in the storage account.
Portal:
- Navigate to Storage Account → File Shares → Add File Share.
CLI:
az storage share create \
--account-name mystorageaccount \
--name myfileshare
PowerShell:
New-AzStorageShare -Name "myfileshare" -Context $account.Context
Key Takeaways
- Blob Storage: Best for images, videos, backups.
- Data Lake Storage: Big data & analytics.
- File Storage: Shared access with SMB protocol.
- Storage Accounts: Foundation for all services; must define redundancy, performance, and configuration at creation.
- Tools: Provisioning possible via Portal, CLI, PowerShell, ARM templates.
- Critical: Some settings (like hierarchical namespace for Data Lake) are immutable after creation.
Azure Table Storage
Overview
Azure Table Storage is a NoSQL key-attribute store designed for storing large amounts of structured data. It uses a schemaless design, which means tables do not enforce a fixed structure on the data. This flexibility makes it adaptable to application and data changes, and it is typically more cost-effective than traditional relational databases.
Key benefits include:
- Schemaless design → simplifies adaptation and eliminates the need for foreign keys, stored procedures, or complex joins.
- Denormalization → supports fast access by storing data in a query-optimized form.
- Clustered indexing → allows efficient querying.
- OData protocol support → enables querying and updates using a REST-based API.
Core Concepts
Storage Accounts
- All access to Azure Storage is done through a storage account.
- The account is the globally unique namespace for your tables.
- Example format:
https://<storage-account-name>.table.core.windows.net/<table-name>
Tables
- A table is a collection of entities.
- Tables do not enforce schema, allowing each entity to have different sets of properties.
- You can create unlimited tables, as long as each is uniquely named.
Entities
- An entity is equivalent to a row in relational databases.
- Each entity can be up to 1 MB in size (2 MB in Cosmos DB).
- Entities are made up of properties.
Properties
- A property is a name–value pair.
- Each entity can have up to 252 properties.
- Additionally, there are three system properties:
- PartitionKey
- RowKey
- Timestamp
Table Service Data Model
- Storage Account → Parent namespace and basis for authorization.
- Tables → Containers for entities.
- Entities → Data rows with user-defined and system properties.
- Properties → Attributes stored as name–value pairs.
System Properties
PartitionKey
- Groups entities into partitions for load balancing across multiple storage nodes.
- Acts as the first part of an entity’s primary key.
- Must be specified for every insert, update, or delete operation.
- Partitions enable atomic operations (batch insert/update).
RowKey
- Unique identifier for an entity within a partition.
- Combined with the PartitionKey, forms the entity’s unique primary key.
- Must also be specified in every insert, update, or delete operation.
Timestamp
- A system-managed property that records when an entity was last modified.
- Automatically updated by Azure on every change.
- Cannot be set manually — any user-provided value is ignored.
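Putting the system properties together, here is a minimal sketch of inserting an entity with the `azure-data-tables` Python SDK (`pip install azure-data-tables`); the table name, key values, and connection string are placeholders.

```python
# Sketch: insert an entity whose PartitionKey + RowKey form its primary key.
from azure.data.tables import TableServiceClient

service = TableServiceClient.from_connection_string("<connection_string>")
table = service.get_table_client("customers")   # hypothetical table name

table.create_entity({
    "PartitionKey": "Seattle",     # groups entities for load balancing
    "RowKey": "cust-001",          # unique within the partition
    "name": "Alice",               # user-defined properties can vary per entity
    "email": "alice@example.com",
})
# Timestamp is system-managed and set automatically on insert/update.
```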
Example Request URI
When working with Azure Table Storage, every request must specify the storage account and table:
https://<account-name>.table.core.windows.net/<table-name>
Summary
Azure Table Storage provides a highly scalable, cost-effective solution for structured NoSQL storage. Its schemaless design and REST-based OData API make it adaptable and easy to integrate into cloud-native applications. By understanding the PartitionKey, RowKey, and Timestamp system properties, you can design efficient and scalable data storage solutions in Azure.
Azure Blob Storage Overview
Video Context
- Topic Title: Azure Blob Storage
- Host: Richard Spencer
Features of Azure Blob Storage
-
Object storage solution for the cloud, optimized for massive amounts of unstructured data (text, binary, images, audio, video, logs, backups).
-
Common use cases:
- Serving images/documents directly to browsers
- Distributed access file storage
- Streaming video/audio
- Logging
- Backup and disaster recovery
- Big data analytics
-
Supports Azure Data Lake Storage Gen2, providing:
- Hierarchical file system
- Enterprise big data analytics support
- High availability:
- LRS (Locally Redundant Storage) → 3 copies in one location
- ZRS (Zone Redundant Storage) → 3 copies across zones in a region
- Strong consistency → every read after write reflects latest updates
- Low-cost tiered storage with disaster recovery capabilities
- Geo-redundant storage (GRS) and Geo-zone redundant storage (GZRS) for cross-region replication
Accessing Storage Resources
-
Access methods:
- Azure PowerShell
- Azure CLI
- Azure Storage REST API
- Azure Storage client libraries (.NET, Java, Node.js, Python, Go, PHP, Ruby)
-
Blob storage resources hierarchy:
- Storage account → Unique namespace in Azure, base address for objects
- Container → Organizes a set of blobs, similar to a file system directory
- Blob → Actual object (file: text, image, video, etc.)
-
Unlimited containers per storage account, unlimited blobs per container.
Migrating Data to Blob Storage
Several options exist depending on data size, performance needs, and network availability:
Tools & Libraries
- AzCopy: Command-line tool (Windows/Linux) to copy data across containers/storage accounts.
- Azure Storage Data Movement Library: .NET library used by AzCopy for bulk transfers.
- Azure Data Factory: Supports ingestion/transfer with authentication (account key, SAS, service principal, managed identities).
Large-Scale Migration Solutions
- Azure Data Box: Physical appliance to transfer large on-prem datasets when bandwidth is limited.
- Azure Import/Export Service: Use customer-provided hard drives to move large datasets in/out of Azure storage accounts.
- Blobfuse: Virtual file system driver to mount and access blob data via Linux file system.
Managing Azure Blob Storage
Video Context
- Topic Title: Managing Azure Blob Storage
- Host: Richard Spencer
Blob Storage Types
Azure Blob Storage is designed for large binary objects (images, audio, video, logs, etc.). It supports three blob types:
-
Block blobs
- Best for static data (e.g., images, text, binary files).
- Made up of blocks that can be uploaded/managed individually.
- Optimized for large sequential writes.
-
Page blobs
- Optimized for random read/write access.
- Can store files up to 8 TB.
- Commonly used for virtual hard drive (VHD) files → serve as disks for Azure VMs.
-
Append blobs
- Optimized for data that grows over time.
- Great for logs, telemetry, audit trails.
- Data can only be appended (not modified).
Creating an Azure Storage Container
A container organizes blobs in a storage account.
Using Azure Portal
- Navigate to your Storage Account → Containers.
- Click + Container.
- Provide a name and set access level (Private, Blob, or Container).
- Blob-level access: allows anonymous read-only access for blobs.
Using Azure CLI
az storage container create \
--account-name <storage_account> \
--name <container_name> \
--auth-mode login
Using Azure PowerShell
# $ctx is a storage context, e.g.:
# $ctx = (Get-AzStorageAccount -ResourceGroupName "<resource_group>" -Name "<storage_account>").Context
New-AzStorageContainer -Name "<container_name>" -Context $ctx
Uploading a Blob
Azure Portal
- Go to Storage Account → Containers → [your container] → Upload.
Azure CLI
az storage blob upload \
--account-name <storage_account> \
--container-name <container_name> \
--name <blob_name> \
--file <local_file_path>
Azure PowerShell
Set-AzStorageBlobContent -File "<local_file_path>" -Container "<container_name>" -Blob "<blob_name>"
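For application code, the same upload can be done with the `azure-storage-blob` Python SDK (`pip install azure-storage-blob`); a sketch with placeholder names, where the connection string comes from the storage account's Access keys page:

```python
# Sketch: upload a local file as a blob via the Python SDK.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection_string>")
container = service.get_container_client("<container_name>")

with open("<local_file_path>", "rb") as data:
    container.upload_blob(name="<blob_name>", data=data)  # creates the blob
```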
Listing Blobs in a Container
Azure Portal
- Navigate to Containers → Select Container to view blobs.
Azure CLI
az storage blob list \
--account-name <storage_account> \
--container-name <container_name> \
--output table
Azure PowerShell
Get-AzStorageBlob -Container "<container_name>" -Context $ctx
Downloading a Blob
Azure Portal
- Navigate to container → Select blob → Download.
Azure CLI
az storage blob download \
--account-name <storage_account> \
--container-name <container_name> \
--name <blob_name> \
--file <local_download_path>
Azure PowerShell
Get-AzStorageBlobContent -Blob "<blob_name>" -Container "<container_name>" -Destination "<local_download_path>" -Context $ctx
Deleting a Blob
Azure Portal
- Navigate to container → Select blob → Delete.
Azure CLI
az storage blob delete \
--account-name <storage_account> \
--container-name <container_name> \
--name <blob_name>
Azure PowerShell
Remove-AzStorageBlob -Container "<container_name>" -Blob "<blob_name>" -Context $ctx
⚠️ Note: If soft delete is enabled, blobs are recoverable for a retention period instead of being permanently removed.
Deleting an Entire Container
Deleting a container removes all blobs within it.
Azure Portal
- Navigate to Containers → Select Container → Delete.
Azure CLI
az storage container delete \
--account-name <storage_account> \
--name <container_name>
Azure PowerShell
Remove-AzStorageContainer -Name "<container_name>" -Context $ctx
⚠️ Be careful: This action permanently deletes all blobs inside the container unless soft delete is enabled.
Azure File Storage
What is Azure File Storage?
Azure File Storage is a fully managed file share service in the cloud.
- Uses the Server Message Block (SMB) protocol, enabling remote and on-premise access via APIs over encrypted communications.
- Compatible with Windows, Linux, and macOS.
- Designed for sharing files, development tools, and applications that depend on native file systems.
- Also supports Network File System (NFS) for Linux and macOS clients, offering POSIX compliance and tighter integration with Unix-like systems.
Supported Protocols
Azure Files supports two distinct protocols for mounting file shares:
- SMB (Server Message Block)
- More mature integration.
- No feature restrictions.
- Works across Windows, Linux, and macOS.
- NFS (Network File System)
- POSIX-compliant.
- Standard for Unix and Linux environments.
- Cannot mix with SMB: a share must be either SMB or NFS.
Key Features (Especially with SMB Shares)
- Azure File Sync
- Syncs files from on-premises storage to Azure cloud.
- Identity-based Authentication
- Integrates with role-based access control (RBAC).
- Full Backup Support
- Native Azure backup integration.
- Soft Delete
- Deleted files are retained and can be recovered.
- Encryption
- In-transit (SSL/TLS).
- At rest (stored securely in Azure).
- Snapshots
- Enables restoring files to previous points in time.
Common Use Cases
-
Containerization
- Prevents container data loss by storing persistent data in an Azure File share volume instead of inside containers.
- Supports Azure Container Instances.
-
Lift-and-Shift Applications
- Eases migration of apps relying on file shares into the cloud.
- Supports both full lift-and-shift (app + data in the cloud) and hybrid lift-and-shift (app remains on-premises, data moves to Azure).
-
File Server Replacement / Supplement
- Replace or extend traditional on-premises file servers.
- Azure File Sync supports hybrid scenarios.
-
Cloud Development Simplification
- Diagnostic Share: Centralized logging, crash dumps, metrics storage.
- Dev/Test/Debug Tools: Share common toolkits across cloud VMs without copying.
- Shared Application Settings: Store config files centrally and allow applications to load them via REST APIs or SMB mounts.
Benefits of Azure File Storage
-
Familiar Programmability
- Applications use existing file system I/O APIs.
- Developers can reuse existing code and skills.
-
Fully Managed
- No need to manage or scale hardware manually.
-
Shared Access
- Seamlessly replaces on-premises file shares using SMB or NFS.
-
Scripting and Tooling
- Managed via PowerShell cmdlets and Azure CLI.
- Works from local PCs, on-premises, or in the cloud.
-
Resiliency
- Built-in high availability.
- No downtime concerns about power or networking failures — Microsoft Azure ensures resilience at the infrastructure level.
Managing Azure File Storage
Overview
Azure File Storage provides shared cloud file storage that behaves much like a local network share.
- Users can create file shares (with proper permissions).
- They can read, write, upload, and download files as if they were on a traditional local drive.
- Management can be done via Azure Storage Explorer, Azure Portal, desktop integration, or command-line tools like AzCopy.
Azure Storage Explorer
Azure Storage Explorer is a desktop utility (downloadable from Microsoft) for managing Azure Storage Accounts.
Capabilities:
- Create blob containers and file shares.
- Upload and download files.
- Manage storage resources graphically.
📌 Note: A lightweight version of Storage Explorer is also available directly inside the Azure Portal (on the storage account Overview page).
Uploading and Downloading Files
Methods:
-
Manual Upload/Download
- Use Storage Explorer, the Azure Portal, or desktop-mounted shares.
- Supports simple drag-and-drop operations.
- Works best for a small number of files.
-
AzCopy (Recommended for Large Transfers)
- Command-line utility optimized for bulk transfers.
- Reliable: supports failure detection and resumes from failure point.
- Faster and more efficient than manual methods.
Using Shared Access Signature (SAS) Tokens
What is a SAS Token?
- A time-limited, credential-free way to grant access to Azure Storage resources.
- Allows anonymous but controlled access without needing account keys.
- Useful in scenarios where you don’t know in advance who will need access.
Creating a SAS Token (via Azure Portal):
- Navigate to your Storage Account → Settings → Shared access signature.
- Configure the following options:
  - Allowed services → File.
  - Allowed resource types → Container and Object.
  - Permissions → Select desired privileges (e.g., read, write, list, delete).
  - Start/End Time → Define validity window.
  - IP Address Range → Restrict access to specific IPs (optional).
- Click Generate SAS → Copy the SAS Token and Connection String.
- Store the SAS token securely (avoid exposure to unauthorized users).
AzCopy for File Operations
AzCopy is the primary command-line tool for managing large-scale file transfers with Azure File Storage. It works in combination with SAS tokens to authenticate and perform operations.
Key Operations:
- Upload Files
- Download Files
- Delete Files
- List Files
- Create Directories
Example Commands:
Upload a file:
azcopy copy "C:\local\path\to\file.txt" "https://<storageaccount>.file.core.windows.net/<sharename>?<SAS-token>"
Download a file:
azcopy copy "https://<storageaccount>.file.core.windows.net/<sharename>/file.txt?<SAS-token>" "C:\local\download\path"
Upload a folder recursively:
azcopy copy "C:\local\folder" "https://<storageaccount>.file.core.windows.net/<sharename>?<SAS-token>" --recursive
Summary
- Storage Explorer: Great for GUI-based management and simple tasks.
- Manual Upload/Download: Works for a few files but not efficient at scale.
- AzCopy: Best for bulk, automated, and resilient file transfers.
- SAS Tokens: Enable secure, time-bound, and permission-specific access without exposing account keys.
Together, these tools provide flexible ways to manage Azure File Storage for both small-scale and enterprise-scale operations.
Azure Cosmos DB
Azure Cosmos DB is Microsoft’s fully managed, globally distributed NoSQL database service designed for modern applications that require high scalability, flexibility, and low-latency data access.
Key Features
-
Fully Managed Service No infrastructure management or maintenance required. Microsoft handles patching, scaling, and availability.
-
Global Distribution & Low Latency Data is replicated across multiple Azure regions, ensuring 99.999% availability SLA with multi-region writes and near real-time global data access.
-
Schemaless & Flexible Unlike relational databases, Cosmos DB does not enforce a rigid schema, making it ideal for dynamic, unstructured, or semi-structured data (e.g., IoT, retail, gaming, mobile apps).
-
Elastic Scalability
- Serverless: Handles spiky workloads without manual intervention.
- Automatic Scaling: Instantly adjusts throughput and storage to match unpredictable traffic bursts.
- Cost-Effective: Pay only for what you use, no need for idle servers.
-
API Support Multiple database APIs available, enabling compatibility with popular ecosystems:
- Core (SQL) API – Native query language support
- MongoDB API – For Mongo-based apps
- Cassandra API – Wide-column workloads
- Gremlin API – Graph database use cases
- Table API – Key-value storage
-
Integration with Azure Services Works seamlessly with Azure Functions, IoT Hub, Kubernetes, App Services, and more.
-
Analytics Support Enables No-ETL analytics on near real-time operational data.
-
Security
- Enterprise-grade encryption at rest with support for customer-managed keys.
- Role-based access control (RBAC) for fine-grained access management.
Service Level Agreements (SLAs)
- Availability: 99.999% uptime guarantee.
- Continuity: Automatic multi-region replication ensures zero downtime and strong consistency options (RPO = 0).
Common Use Cases
-
IoT (Internet of Things) Collects and stores vast amounts of dynamic, non-uniform data from distributed devices (e.g., oil pipelines, traffic sensors).
-
Retail Handles inconsistent customer data and supports personalized shopping experiences.
-
Gaming Supports real-time leaderboards, session data, and fast in-game interactions.
-
Mobile & Web Applications Ideal for modern apps requiring schemaless, flexible storage with instant global access.
In summary: Azure Cosmos DB is a serverless, globally distributed NoSQL database that provides high availability, scalability, and flexibility. It eliminates the need for rigid schemas and infrastructure management, making it well-suited for IoT, retail, gaming, and mobile/web applications.
Azure Cosmos DB Resource Model
In this section, we explore the Azure Cosmos DB resource model, which defines how accounts, databases, containers, and items are organized and managed. Understanding this hierarchy is critical for designing scalable, globally distributed applications on Cosmos DB.
1. Cosmos DB Overview
- Service Type: Fully managed Platform as a Service (PaaS).
- Global Distribution: Supports multi-region writes and reads for maximum uptime and performance. Data can be replicated worldwide to reduce latency for users across continents.
- Unique DNS Name: Each Cosmos DB account is assigned a unique DNS endpoint, accessible via:
- Azure Portal
- Command-line tools
- Language-specific SDKs (e.g., .NET, Java, Python, Node.js)
2. Accounts
- An account is the top-level resource in Cosmos DB.
- Each Azure subscription supports up to 50 Cosmos DB accounts by default.
- Provides:
- Global distribution
- Multiple APIs (SQL, MongoDB, Cassandra, Gremlin, Table)
- Single or Multi-Region Writes:
- Single Write Region: Cheaper, but limited availability.
- Multi-Region Writes: Higher availability and faster performance for globally distributed workloads.
3. Databases
- A database is a unit of management within an account.
- Contains one or more containers.
- Each database can be created using one of several APIs:
- SQL (Core) API – Default, most common.
- Cassandra API – Wide-column workloads.
- MongoDB API – MongoDB compatibility.
- Gremlin API – Graph database use cases.
- Table API – Key-value storage.
Database Operations
- Create: Spin up a new database.
- Enumerate: List all databases within an account.
- Read: Fetch details of a specific database.
- Update: Modify properties (e.g., throughput).
- Delete: Remove the database and its containers.
4. Containers
- A container is the fundamental unit of scalability in Cosmos DB.
- All throughput (RU/s) and storage are provisioned at the container level.
- Cosmos DB automatically partitions containers using the logical partition key for horizontal scaling.
Provisioning Models
-
Dedicated Provisioned Throughput
- Reserved exclusively for that container.
- Backed by an SLA.
- Ideal for production workloads.
-
Shared Provisioned Throughput
- Throughput is shared among multiple containers in the same database.
- Cost-effective for development or small workloads.
5. Items
- Items are the actual data entities stored inside containers.
- Schema-agnostic: Different entities (e.g., `Customer`, `Product`) can coexist in the same container.
- Automatic Indexing:
- Every item is indexed by default.
- No need to manage schemas or indexes manually.
Additional Features
-
Time-to-Live (TTL):
- Can be set on items or containers.
- Expired items are automatically deleted.
- Helps control data lifecycle without manual cleanup.
-
Consistency Guarantee:
- Ensures queries never return stale data, even in globally distributed scenarios.
6. Resource Model Hierarchy
Azure Subscription
└── Cosmos DB Account
    └── Database(s)
        └── Container(s)
            └── Item(s)
7. Example: SQL API Resource Creation
The SQL (Core) API is a query language, not a DDL language; databases and containers are created through the portal, an SDK, or the Azure CLI. For example:
# Create a new database
az cosmosdb sql database create \
  --account-name MyCosmosDB \
  --resource-group MyResourceGroup \
  --name eCommerce
# Create a container with a partition key
az cosmosdb sql container create \
  --account-name MyCosmosDB \
  --resource-group MyResourceGroup \
  --database-name eCommerce \
  --name Products \
  --partition-key-path "/category"
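Items are then inserted through an SDK. A minimal sketch with the `azure-cosmos` Python package (`pip install azure-cosmos`), reusing the names above; the account endpoint and key are placeholders:

```python
# Sketch: insert an example item into the Products container.
from azure.cosmos import CosmosClient

client = CosmosClient("<account-endpoint>", credential="<account-key>")
container = client.get_database_client("eCommerce").get_container_client("Products")

container.create_item({
    "id": "p1",
    "name": "Laptop",
    "price": 1200,
    "category": "Electronics",   # value of the /category partition key
})
```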
Key Takeaways
- Account: Top-level resource, provides global distribution and API choice.
- Database: Management unit containing containers.
- Container: Unit of scalability (throughput + storage).
- Items: Schema-agnostic data records, automatically indexed.
- Partitioning & TTL: Enable scalability and lifecycle management.
Provisioning an Azure Cosmos DB
Learning objective: By the end of this video, you will be able to provision an Azure Cosmos database in the Azure portal.
Steps to Provision Cosmos DB
-
Navigate to Cosmos DB in the Azure Portal
- Use the search bar or menu to find Azure Cosmos DB.
- Click Create Azure Cosmos DB account.
-
Basics Configuration
- Subscription: Select your subscription (e.g., Pay-As-You-Go).
- Resource Group: Create or select an existing group (e.g., CosmosDB).
- Account Name: Provide a unique name (e.g., richardcosmosdemo, which becomes the endpoint richardcosmosdemo.documents.azure.com).
- API Selection: Choose from:
- Core (SQL) – document data with SQL queries
- MongoDB – document database
- Gremlin – graph data
- Azure Table – table-based data
- Cassandra – wide-column data
- Location: Choose region (e.g., US West).
- Capacity Mode:
- Provisioned throughput (specify RU/s)
- Serverless (auto-scaled, no throughput config)
- Free Tier: Optionally apply free 400 RU/s and 5 GB storage.
- Account Type: Production or Non-production (tag only).
- Geo-Redundancy & Multi-Region Writes: Optional.
-
Networking
- Options:
- All networks (default)
- Public endpoint (selected networks, firewall rules, virtual networks)
- Private endpoint (for private access)
-
Backup Policy
- Periodic (default): Every 240 mins, 8-hour retention, 2 copies.
- Continuous (must be enabled separately).
-
Encryption
- Default: Service-managed keys.
- Option: Customer-managed key.
-
Tags
- Add key-value pairs for resource tracking (e.g., CreatedBy=Richard).
-
Review + Create
- Review settings, check estimated creation time (~15 mins).
- Download template for Infrastructure as Code automation.
- Click Create to begin deployment.
After Deployment
- Monitor progress (initializing → submitting → deploying).
- Once complete, click Go to resource.
- Database is ready for further configuration.
✅ You have successfully provisioned an Azure Cosmos DB account.
Configuring an Azure Cosmos Database
Learning objective: By the end of this video, you will be able to configure an Azure Cosmos database within the Azure portal.
Navigating to the Cosmos DB Console
- Go to the Azure portal and select Azure Cosmos DB from the left-hand dropdown.
- Select the database you want to configure.
- Collapse the navigation bar for a larger view of the configuration screen.
- From here, you can view key details such as:
- Resource Group
- Subscription
- Database URI (for external access)
- Read and Write locations
Key Configuration Areas
1. Containers
- Add new containers for storing and scaling data.
- Containers are the unit of scalability for throughput and storage.
- View and manage monitoring data across 1 hour, 24 hours, 7 days, or 30 days.
2. Access Control
- Use Role-Based Access Control (RBAC) to grant permissions.
- View and adjust access levels to ensure least privilege is applied.
3. Settings
Features
- Toggle features such as:
- Azure Synapse Link for analytics.
- Replication settings for global distribution.
Global Replication
- Add/remove regions for read/write replication.
- Multi-region writes improve availability and disaster recovery, but increase costs.
- Example setup: US West, UK South, and Canada East as write and read regions.
- Each region incurs additional billing for throughput and storage.
Default Consistency Models
Choose one of five consistency levels:
- Strong: Reads always reflect the latest writes. Highest cost.
- Bounded Staleness: Guarantees global ordering with configurable lag (e.g., 5 seconds).
- Session (default): Guarantees consistency per user session. Balanced cost and performance.
- Consistent Prefix: Ensures reads never see out-of-order writes.
- Eventual: Least cost; replicas eventually converge but no ordering guarantees.
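The default consistency level can also be changed outside the portal. A minimal Azure CLI sketch (account and group names are placeholders) that switches an account to Bounded Staleness with the 5-second lag mentioned above:
# BoundedStaleness requires the staleness window: max lag in seconds and in operations
az cosmosdb update \
  --name richardcosmosdemo \
  --resource-group CosmosDB \
  --default-consistency-level BoundedStaleness \
  --max-interval 5 \
  --max-staleness-prefix 100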
4. Backup & Restore
- Configured during provisioning.
- Default: Backups every 240 minutes, retaining 8 hours with 2 copies.
- Can be adjusted as needed.
5. Networking
- Configure firewall settings, private endpoints, and CORS rules.
- CORS (Cross-Origin Resource Sharing): Required if a web app in one domain accesses resources in another. Requests must be authenticated.
6. Keys & Security
- Use read/write keys or read-only keys for API calls.
- Enable Advanced Security to defend against threats like DDoS attacks.
- Manage locks to prevent accidental deletions or modifications.
7. Containers & Data Explorer
- Create new containers directly:
  - Example:
    - Database ID: richardcontainer
    - Container ID: containerdemo
    - Partition Key: /user/id
    - Throughput: 400 RU/s (minimum)
    - Automatic Indexing: Enabled
  - Analytics store can be optionally enabled.
- Once created, you can:
- Insert sample data
- Add or upload items (e.g., JSON files)
- Manage stored procedures, triggers, and functions
8. Monitoring & Alerts
- Track throughput, storage, and latency metrics.
- Create alerts to proactively respond to issues (e.g., high usage or latency).
Summary
Configuring Cosmos DB involves tuning settings to balance performance, availability, cost, and consistency:
- Manage replication across regions.
- Select the right consistency model.
- Configure backup, security, and networking.
- Create and manage containers for your workloads.
- Use monitoring and alerts to keep systems healthy.
With these configurations, you can tailor Cosmos DB to support your application’s specific needs while maintaining control over performance and cost.
Querying Data in Azure Cosmos DB
Learning objective: By the end of this video, you will be able to query data in Azure Cosmos DB using SQL-like queries within the Azure portal.
Accessing Data Explorer
- In the Azure portal, go to Azure Cosmos DB.
- Select the database you have already created.
- Collapse the left-hand navigation bar for a cleaner view.
- Navigate to Data Explorer.
Here, you can:
- Start with a Sample Database (e.g., SampleDB → Persons).
- Or create a new container to load your own data.
Exploring Sample Data
- When you expand Persons → Items, you'll see pre-populated documents.
- Each entry can be clicked to view its details (e.g., firstname, age, id).
- This works for small datasets but becomes inefficient with large numbers of entries.
Using Filters and Queries
Cosmos DB supports SQL-like queries for filtering and ordering data.
Default Query
SELECT * FROM c
- c is the alias for the container.
- Retrieves all items in the container.
Ordering Results
SELECT * FROM c ORDER BY c.id DESC;
- Returns results in descending order by id.
- Replace DESC with ASC for ascending order.
Filtering by Field Values
Query by First Name
SELECT * FROM c WHERE c.firstname = "John";
- Returns only entries where firstname equals John.
Query by Age Greater Than 30
SELECT * FROM c WHERE c.age > 30;
- Returns only entries with age greater than 30.
- Example result: Véronique (50), Eva (44)
Practical Usage
- Combine filters and ordering for precise queries.
- Example:
SELECT * FROM c WHERE c.age > 30 ORDER BY c.firstname ASC
- Useful for retrieving targeted subsets instead of browsing entries manually.
Summary
To query data in Azure Cosmos DB:
- Use Data Explorer in the portal.
- Start with sample data or your own containers.
- Apply SQL-like queries for filtering and ordering.
- Example queries:
  - SELECT * FROM c → return all data
  - SELECT * FROM c WHERE c.firstname = 'John' → filter by name
  - SELECT * FROM c ORDER BY c.id DESC → sort by ID
This allows you to efficiently pull only the data your application needs without scrolling through every entry.
Azure Cosmos DB APIs
Speaker: Richard Spencer
Topic: Azure Cosmos DB API Overview
Azure Cosmos DB is a globally distributed, elastically scalable, and low-latency database service backed by strong service-level agreements (SLAs). It’s designed to handle massive workloads across different geographies with predictable performance.
Key Features
- Global distribution → Deploy databases close to users worldwide.
- Elastic scalability → Scale up or down automatically depending on demand.
- Low latency → Guaranteed by SLAs.
- Consistency levels → Five options allow trade-offs between performance and correctness.
The Five Consistency Levels
- Strong consistency → Always returns the most recent committed version of data (highest guarantee, but more expensive).
- Bounded staleness → Guarantees reads lag behind writes by at most "K" versions or "T" time.
- Session → Within a session, reads are consistent; different sessions may see slight delays.
- Consistent prefix → Guarantees that reads never see out-of-order writes (but may see partial history).
- Eventual consistency → Lowest latency and cost; replicas eventually converge without immediate consistency.
Cosmos DB APIs
Cosmos DB offers multiple APIs so developers can use familiar query languages and SDKs while Cosmos handles the storage backend in its Atom-Record-Sequence (ARS) format.
1. Core (SQL) API
- Default API for Cosmos DB.
- Treats data as JSON documents.
- Query language is SQL-like (supports SELECT, FROM, WHERE, COUNT, BETWEEN, etc.).
- Supports JavaScript type system, expression evaluation, and function invocation.
- Example query:
SELECT * FROM c WHERE c.age > 30
2. MongoDB API
- Implements MongoDB wire protocol.
- Allows MongoDB clients, SDKs, and drivers to connect without code changes.
- Stores data in document format.
- Makes Cosmos DB behave like a native MongoDB database.
- Best for teams already using MongoDB tooling.
3. Cassandra API
- Allows querying using CQL (Cassandra Query Language).
- Data appears as a partitioned row store.
- Supports familiar commands like:
CREATE KEYSPACE MyKeyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE Users (id UUID PRIMARY KEY, name TEXT, age INT);
ALTER TABLE Users ADD email TEXT;
- Provides transparent integration for teams with Cassandra experience.
- Only connection settings need updating when migrating.
4. Azure Table API
- Provides high availability and global distribution.
- Automatically indexes all properties → no need to manage indexes manually.
- Originally limited to partition key + row key indexing, now supports more.
- Supports querying via:
- OData
- LINQ queries
- REST API (GET operations)
- Best suited for applications already using Azure Table Storage.
5. Gremlin API
- Provides a graph-based view of data.
- Uses Apache TinkerPop’s Gremlin query language for graph traversal.
- Data model:
- Vertices → individual items (nodes).
- Edges → relationships between items.
- Example Gremlin query
g.V().has('person', 'name', 'John').out('knows').values('name')
- Ideal for social networks, recommendation systems, fraud detection, or any scenario where relationships matter.
Summary
Azure Cosmos DB is not just a database—it’s a multi-model platform that adapts to various developer needs through multiple APIs:
- Use Core SQL for JSON + SQL-like querying.
- Use MongoDB API for Mongo-native applications.
- Use Cassandra API if your team already uses CQL.
- Use Azure Table API for existing Table storage workloads.
- Use Gremlin API for graph-based data and complex relationships.
Each API maps onto the same underlying globally distributed ARS data model, giving flexibility without sacrificing scalability or performance.
Using Core (SQL) API for Azure Cosmos DB
This section explains how to use the Core (SQL) API with Azure Cosmos DB through the Azure Cloud Shell (Bash environment). The commands also work in the Azure CLI. This approach is useful for automation of database accounts, containers, and databases.
1. Accessing Cloud Shell
- In the Azure Portal, click on Cloud Shell in the top menu bar.
- Enlarge the panel if needed to get more workspace.
- Choose the Bash Shell environment.
2. Creating a Cosmos DB Account
You must specify:
- --name (account name)
- --resource-group (must already exist)
- --default-consistency-level
- --locations (region)
Example:
az cosmosdb create \
--name richardspencer \
--resource-group CosmosDB \
--default-consistency-level Session \
--locations regionName=WestUS
- This command can take 5–10 minutes to run.
- The output will include:
- Document endpoints
- Failover priority
- IDs
- Resource group
3. Enabling Multi-Region Writes
Cosmos DB accounts start with a single write region. To enable multiple write regions:
az cosmosdb update \
--name richardspencer \
--resource-group CosmosDB \
--enable-multiple-write-locations true
- This command runs faster than initial creation.
- Once successful, endpoints and status are updated.
4. Adding Multiple Regions
You can expand the Cosmos DB account across regions with failover priorities:
az cosmosdb update \
--name richardspencer \
--resource-group CosmosDB \
--locations regionName=WestUS failoverPriority=0 \
--locations regionName=EastUS failoverPriority=1 \
--locations regionName=SouthCentralUS failoverPriority=2
- Failover priority 0 = primary region.
- The output confirms success and lists new endpoints.
5. Listing Connection Strings
To retrieve connection strings for applications:
az cosmosdb keys list \
--name richardspencer \
--resource-group CosmosDB \
--type connection-strings
- The result includes:
- Primary & secondary keys
- Primary read-only & secondary read-only keys
- These connection strings can be used in applications or SDKs.
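In a script, one string can be extracted from that JSON output with a JMESPath --query filter; for example (index 0 is typically the primary connection string, but verify against your own output):
# Capture the first (primary) connection string into a shell variable
primaryConn=$(az cosmosdb keys list \
  --name richardspencer \
  --resource-group CosmosDB \
  --type connection-strings \
  --query "connectionStrings[0].connectionString" \
  --output tsv)
echo "$primaryConn"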
6. Creating a Database
To create a new SQL database inside Cosmos DB:
az cosmosdb sql database create \
--account-name richardspencer \
--resource-group CosmosDB \
--name database1
- The output confirms:
- Database name
- Resource group
- Type
7. Verifying in the Azure Portal
- Open Azure Cosmos DB from the Portal homepage.
- Select the created account (e.g., richardspencer).
- Open Data Explorer → you should see database1.
- Under Overview, confirm:
- Read/write regions: West US, East US, South Central US
- Account properties
Key Takeaways
- Automation: These CLI commands enable scripting of Cosmos DB account setup, databases, and failover policies.
- Multi-region availability: Cosmos DB supports automatic failover and multiple write regions.
- Security: Connection strings retrieved via CLI can be used for secure application integration.
- Consistency: Session consistency is a good default for balancing performance and correctness.
Using Azure Cosmos DB API for MongoDB: Creating a Database and Collection
Overview
In this session, you will learn how to create a database and a collection for the MongoDB API using Azure Cosmos DB. The process can be done through:
- Azure CLI (Command Line Interface)
- Azure Cloud Shell (integrated in the Azure portal)
The demo uses Cloud Shell with a Bash environment.
Steps to Create MongoDB Database & Collection
1. Define Variables
First, set up variables for resources:
uniqueId=$RANDOM
resourceGroupName="Group-$uniqueId"
location="westus2"
accountName="cosmos-$uniqueId"
serverVersion="4.0"
databaseName="database1"
collectionName="collection1"
2. Create a Resource Group
Resource groups organize Azure resources with shared lifecycles and permissions.
az group create \
--name $resourceGroupName \
--location $location
✅ Output confirms resource group creation, e.g., Group-7089.
3. Create a Cosmos DB Account (MongoDB API)
This step provisions the Cosmos DB account with MongoDB API:
az cosmosdb create \
--name $accountName \
--resource-group $resourceGroupName \
--kind MongoDB \
--server-version $serverVersion \
--default-consistency-level Eventual \
--locations regionName=westus2 failoverPriority=0 \
--locations regionName=eastus2 failoverPriority=1
⚠️ Takes 15–20 minutes.
4. Create a MongoDB Database
After the account is ready, create the MongoDB database:
az cosmosdb mongodb database create \
--account-name $accountName \
--resource-group $resourceGroupName \
--name $databaseName
5. Define Index Policy
Example: unique index with 30-day TTL (time-to-live).
# MongoDB index specification: unique index on user_id, TTL index on _ts (30 days)
idxpolicy='[
  {"key": {"keys": ["user_id"]}, "options": {"unique": "true"}},
  {"key": {"keys": ["_ts"]}, "options": {"expireAfterSeconds": 2592000}}
]'
6. Create a MongoDB Collection
Finally, create the collection with sharding, throughput, and index policy:
az cosmosdb mongodb collection create \
--account-name $accountName \
--resource-group $resourceGroupName \
--database-name $databaseName \
--name $collectionName \
--shard "user_id" \
--throughput 400 \
--idx "$idxpolicy"
Validation in Azure Portal
- Navigate to Azure Cosmos DB in the portal.
- Select your account (e.g., cosmos-7089).
- Confirm:
- MongoDB database
- Collection
- Throughput settings
- URI endpoint
Key Takeaways
- Azure Cosmos DB allows easy integration with MongoDB applications.
- The process involves creating:
- Resource group
- Cosmos account (MongoDB API)
- Database
- Collection with indexes & throughput
This setup supports global distribution, scalability, and fine-tuned policies for applications.
Using Cassandra API in Azure Cosmos DB
Overview
This session demonstrates how to create and configure an Azure Cosmos DB account using the Cassandra API, set up tables, insert data, query records, and manage replication and consistency.
Steps to Create Cosmos DB with Cassandra API
-
Create a Resource
- Go to Create a resource in the Azure portal.
- Navigate to Databases → Azure Cosmos DB.
- Select it to begin account creation.
-
Fill in Basics (Account Creation Wizard)
- Subscription: Choose Pay-As-You-Go.
- Resource Group: Select or create one (e.g., CosmosDB).
- Account Name: Example: richardkspencercassandra.
- API: Select Cassandra API.
- Notebooks (Preview): Leave Off.
- Location: West US.
- Provisioning Model: Provisioned throughput (400 RU/s + 5GB free tier).
- Discounts: Apply Free Tier discount.
-
Other Settings
- Environment: Non-Production.
- Geo-Redundancy & Multi-region Writes: Disabled.
- Networking: Public access for all networks (options: public, private).
- Backup Policy: Backup interval = 240 min, backup retention = 8 hours.
- Encryption: Service-managed key.
- Tags: Optional metadata for tracking.
-
Review + Create
- Validate settings.
- Click Create (takes up to ~15 min).
- Once deployment is complete, click Go to Resource.
Data Explorer Setup
-
Open Data Explorer
- Navigate to Data Explorer in the Cosmos DB Overview.
- Click New Table.
-
Create Table with CQL
- Provide Keyspace name (e.g., table1).
- Enter CQL commands:
CREATE TABLE user (
  userid int PRIMARY KEY,
  name text,
  email text,
  address text
);
- Set throughput (e.g., 400 RU/s, or Autoscale).
- Click OK.
-
Add Rows
- Use the Add Row option:
  - Example row: userid = 10, name = Richard, email = richard@spencer.com, address = 34 sunset drive
  - Example row: userid = 14, name = Rick, email = richard@sunset.ca, address = 1 sunset drive
-
Query Data
- Use the CQL Query Builder or Query Text.
- Example query:
SELECT * FROM user WHERE userid = 10;
- Options available: Add Row, Edit Row, Delete Rows, Run Query.
Advanced Settings
- Replication
- Under Replicate Data Globally, enable multiple-region reads and optional multi-region writes.
- Consistency Levels
- Available modes: STRONG, BOUNDED STALENESS, SESSION, CONSISTENT PREFIX, EVENTUAL.
- Default: SESSION consistency.
- Monitoring & Logs
- Access logs, long-running queries, and firewall settings.
- Similar management options across all Cosmos DB APIs.
- Connection Strings
- Critical for SDK integrations (Java, Python, etc.).
- Connection string includes host, port, credentials.
Key Takeaways
- Cosmos DB with Cassandra API enables familiar Cassandra Query Language (CQL) on top of scalable Azure infrastructure.
- Setup involves configuring account basics, enabling Cassandra API, creating keyspaces & tables, and performing CQL operations.
- Supports global replication, consistency control, autoscaling throughput, and integration with multiple programming SDKs.
- The Data Explorer provides an easy interface for table creation, data insertion, and query execution.
Using Azure Table API in Cosmos DB
Overview
This session demonstrates how to create and manage an Azure Cosmos DB account using the Azure Table API, create tables, add entities, query data, and configure throughput, replication, and consistency.
Steps to Create Cosmos DB with Table API
-
Create a Resource
- Navigate to Create a resource in the Azure portal.
- Go to Databases → Azure Cosmos DB.
- Select Azure Table API.
-
Fill in Account Basics
- Subscription: Pay-As-You-Go
- Resource Group: Example: CosmosDB
- Account Name: Example: richardkspencertableapi
- API: Azure Table
- Notebooks (Preview): Off
- Location: US West
- Provisioning Model: Provisioned throughput (400 RU/s + 5 GB free tier)
- Environment: Non-Production
- Geo-Redundancy & Multi-region Writes: Disabled
-
Networking & Backup
- Networking: All networks (public access)
- Backup Interval: 240 minutes, retain 8 hours of data
- Encryption: Service-managed keys
- Tags: Optional
-
Review + Create
- Validate settings.
- Click Create (≈15 min deployment).
- Once complete, click Go to Resource.
- Azure Table Endpoint will be available for SDK connection.
Data Explorer Setup
-
Create a Table
- Navigate to Data Explorer.
- Click New Table.
- Provide Table ID (e.g., container1).
- Set throughput (400 RU/s) and storage.
- Click OK to provision.
-
Add Entities
- Click Add Entity.
- Required columns: PartitionKey and RowKey, plus any custom properties.
- Example entities:
| PartitionKey | RowKey | Task | AssignedTo | Product |
|---|---|---|---|---|
| HouseholdTasks | 001 | take out garbage | Richard | |
| HouseholdTasks | 002 | scrub the floors | Sean | Mr. Clean |
- PartitionKey + RowKey form the primary key for each entity.
- Query Entities
-
Add clauses to filter entities.
-
Examples:
WHERE RowKey = '002';
WHERE AssignedTo = 'Sean';
-
Advanced Settings
-
Throughput
- Enable/disable provisioning throughput.
- Scale throughput per application needs.
-
Replication
- Enable multi-region writes.
- Specify additional regions for global distribution.
-
Consistency Levels
- Available: STRONG, BOUNDED STALENESS, SESSION, CONSISTENT PREFIX, EVENTUAL
- Example in video: BOUNDED STALENESS
-
Backup & Restore
- Configurable through the Cosmos DB account panel.
-
Connection Strings
- Required for SDKs (Python, Java, .NET) to connect to Table API.
Key Takeaways
- Azure Table API offers a schemaless, globally distributed NoSQL table service.
- Tables are defined by PartitionKey and RowKey.
- Data can be queried, added, updated, or deleted through Data Explorer or SDKs.
- Supports throughput provisioning, multi-region replication, and consistency settings.
- Integrates seamlessly with multiple programming SDKs for application use.
Using Gremlin API in Azure Cosmos DB
Overview
This session demonstrates how to provision an Azure Cosmos DB account with the Gremlin API, configure a Gremlin console, create vertices and edges, and query a graph database.
Steps to Provision Cosmos DB with Gremlin API
-
Prerequisites
- Install the Gremlin Console (recommended version 3.4.3).
- If on Windows, install Java Runtime.
-
Create Cosmos DB Account
- Navigate to Create a resource → Azure Cosmos DB.
- Set the following:
- Subscription: Pay-As-You-Go
- Resource Group: Example: Gremlin
- Account Name: Example: richardkspencer
- API: Gremlin
- Capacity Mode: Provisioned throughput (Free Tier discount applied)
- Environment: Non-production
- Geo-Redundancy & Multi-region Writes: Disabled
- Networking: All networks
- Backup Policy: Every 240 minutes, retain 8 hours
- Encryption: Service-managed key
- Review & create (≈15 min).
-
Retrieve Connection Information
- Gremlin Endpoint: richardkspencer.gremlin.cosmos.azure.com
- Primary Key: Found under Keys
- Use these to configure Gremlin Console.
- Gremlin Endpoint:
Configure Gremlin Console
- Edit remote-secure.yaml in apache-tinkerpop-gremlin-console/conf:
hosts: [ "richardkspencer.gremlin.cosmos.azure.com" ]
port: 443
username: /dbs/sample-database/colls/sample-graph
password: <PRIMARY_KEY>
connectionPool:
enableSsl: true
serializer:
  className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV2d0
  config: { serializeResultToString: true }
- Run Gremlin Console:
cd apache-tinkerpop-gremlin-console/bin
gremlin.bat # Windows
# OR
./gremlin.sh # Linux/Mac
- Connect to Cosmos DB:
:remote connect tinkerpop.server conf/remote-secure.yaml
:remote console
- Test connection:
g.V().count()
# Should return 0 initially
Creating Vertices and Edges
Add Vertices
g.addV('person')
.property('firstName','Thomas')
.property('lastName','Anderson')
.property('age', 45)
.property('userid', 1)
- Repeat for additional vertices: Mary Kay, Robin, Ben, Jack.
Add Edges
g.V().has('firstName','Thomas').addE('knows').to(g.V().has('firstName','Mary Kay'))
g.V().has('firstName','Thomas').addE('knows').to(g.V().has('firstName','Robin'))
g.V().has('firstName','Robin').addE('knows').to(g.V().has('firstName','Ben'))
Querying the Graph
- Filter by property
g.V().hasLabel('person').has('age', gt(40))
# Returns Thomas Anderson
- Traverse relationships
g.V().has('firstName','Thomas').out('knows').values('firstName')
# Returns the people Thomas knows (e.g., Mary Kay, Robin)
Visualizing in Azure Portal
- Open Data Explorer → Graph.
- Execute the same Gremlin queries.
- Visualize vertices and edges graphically.
Key Takeaways
- Gremlin API in Cosmos DB allows creating and managing graph databases.
- Supports vertices (nodes) and edges (relationships).
- Query with Gremlin syntax: traversal, filtering, and graph exploration.
- Integrates with Gremlin Console and Azure Data Explorer for management and visualization.
- Useful for applications like IoT networks, social graphs, or gaming.
Using the Azure Portal to Manage Cosmos DB
Overview
This session demonstrates how to create and manage a Cosmos DB instance using the Azure Portal, including configuring account settings, networking, backup, encryption, and replication.
Steps to Create a Cosmos DB Instance
-
Start Creation
- Go to Create a resource → Azure Cosmos DB → Create.
-
Basics Tab
- Subscription: Select your subscription (e.g., Pay-As-You-Go).
- Resource Group: Choose an existing group or create a new one.
- Account Name: Must be globally unique (e.g., democosmosdb1234).
- API: Options include SQL (Core), MongoDB, Cassandra, Azure Table, and Gremlin.
- Default example: SQL API
- Location: Choose the data center (e.g., West US).
- Capacity Mode:
- Provisioned throughput: Specify throughput per container.
- Serverless: Automatically scales; pay per usage.
- Account Type: Production or Non-production (affects interface only).
- Geo-Redundancy & Multi-region Writes: Can be enabled now or later.
-
Networking Tab
- All networks: Publicly accessible.
- Public endpoint (selected networks): Limit access to specific virtual networks.
- Private endpoint: Secure endpoint within a virtual network (not Internet-routable).
-
Backup Policy
- Backup interval: Default 240 minutes.
- Retention: Default 8 hours.
- Adjust according to business needs.
-
Encryption
- Cosmos DB encrypts data in transit and at rest.
- Options:
- Service-managed key (default, simplest)
- Customer-managed key (bring your own key)
-
Tags
- Optional: Add tags to manage and categorize resources.
-
Review + Create
- Validate all settings.
- Click Create.
- Deployment takes a few minutes.
- Click Go to resource when complete.
Managing Cosmos DB After Creation
Replicate Data Globally
- Navigate to Settings → Replicate data globally.
- A world map displays available regions.
- Options:
- Enable Multi-region Writes for active-active replication.
- Click on regions to add/remove replication targets.
- Click Save to apply changes.
Key Takeaways
- Azure Portal provides a graphical interface to create, configure, and manage Cosmos DB.
- API choice determines database behavior (SQL, MongoDB, Cassandra, Table, Gremlin).
- Networking, backup, and encryption settings are configurable during or after deployment.
- Global replication and multi-region writes allow scaling for worldwide applications.
- Management tasks can be performed without writing code, making Azure Portal user-friendly for database administration.
Using Azure Resource Manager Templates to Manage Cosmos DB
This video demonstrates how to manage Cosmos DB using Azure Resource Manager (ARM) templates.
Key Steps and Concepts
1. Access Custom Deployment
- In the Azure portal, search for Deploy and select Deploy a custom template.
- Opens the Custom Deployment window with three tabs: Select a template, Basics, and Review + Create.
2. Build or Edit a Template
- Click Build your own template in the editor.
- Paste or write a JSON ARM template.
- Key components of a Cosmos DB template:
  - "type": "Microsoft.DocumentDB/databaseAccounts" specifies the resource type.
  - "apiVersion" sets the API version for deployment.
  - "name" must be globally unique.
  - "location" defines the deployment region.
  - "properties" includes:
    - "databaseAccountOfferType" (e.g., Standard)
    - "consistencyPolicy" (e.g., defaultConsistencyLevel = Session)
    - "locations" for multi-region setup (can be a single region).
3. Save and Deploy Template
- Click Save to store the template.
- Fill in subscription and resource group details on the Basics tab.
- Click Review + Create, validate, and then Create.
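For reference, a minimal template of the shape described above, deployed from the CLI instead of the portal (the account name, resource group, and apiVersion are illustrative placeholders):
cat > cosmos-template.json <<'EOF'
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.DocumentDB/databaseAccounts",
      "apiVersion": "2021-04-15",
      "name": "demo-cosmos-arm-001",
      "location": "West US",
      "properties": {
        "databaseAccountOfferType": "Standard",
        "consistencyPolicy": { "defaultConsistencyLevel": "Session" },
        "locations": [ { "locationName": "West US", "failoverPriority": 0 } ]
      }
    }
  ]
}
EOF
# Deploy the saved template into an existing resource group
az deployment group create \
  --resource-group CosmosDB \
  --template-file cosmos-template.json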
4. Result
- The ARM template provisions a Cosmos DB account automatically.
- The deployed account appears in the selected resource group and region.
- ARM templates can also be run via PowerShell, Azure CLI, or Visual Studio.
Summary
ARM templates allow automated, repeatable, and flexible provisioning of Cosmos DB resources. Microsoft uses ARM templates under the hood for all portal-created resources, making them a powerful tool for infrastructure management.
Using Azure PowerShell to Manage Cosmos DB
This video demonstrates how to create and manage an Azure Cosmos DB instance using Azure PowerShell.
Key Steps and Concepts
1. Launch Azure Cloud Shell
- Open the Azure portal and click the Cloud Shell button.
- Cloud Shell requires a mounted storage account; create one if prompted.
2. Install Cosmos DB Module
- Run the command:
Install-Module Az.CosmosDB
- Confirm installation from PSGallery when prompted.
3. Define Locations for Multi-Region Setup
- Use New-AzCosmosDBLocationObject to define regions and failover priorities:
$locations = @(
New-AzCosmosDBLocationObject -LocationName "East US" -FailoverPriority 0 -IsZoneRedundant $false
New-AzCosmosDBLocationObject -LocationName "West US" -FailoverPriority 1 -IsZoneRedundant $false
New-AzCosmosDBLocationObject -LocationName "Canada Central" -FailoverPriority 2 -IsZoneRedundant $false
)
4. Create Cosmos DB Account
- Use New-AzCosmosDBAccount to provision the database:
New-AzCosmosDBAccount `
-ResourceGroupName "YourResourceGroup" `
-LocationObject $locations `
-Name "demo-cosmos-ps-267" `
-ApiKind "Sql" `
-EnableAutomaticFailover $true `
-DefaultConsistencyLevel "Session"
- The Name must be globally unique.
- ApiKind can be Sql, MongoDB, Gremlin, Cassandra, or Table.
- EnableAutomaticFailover allows failover across regions.
- DefaultConsistencyLevel can be set to Session, Strong, BoundedStaleness, ConsistentPrefix, or Eventual.
5. Verify Deployment
- Once the command completes, check the Cosmos DB account in your resource group.
- Under Replicate data globally, confirm the write and read regions.
Summary
Azure PowerShell allows fully scripted provisioning and management of Cosmos DB. This can be done via Cloud Shell or a local PowerShell environment, enabling automation and consistent deployments.
Using Azure CLI to Manage Cosmos DB
This video demonstrates how to create and manage an Azure Cosmos DB instance using Azure CLI.
Key Steps and Concepts
1. Launch Azure Cloud Shell
- Open the Azure portal and click the Cloud Shell button.
- If this is your first time, create a storage account when prompted.
- Switch the shell to Bash (from PowerShell) to use Azure CLI commands.
2. Create Cosmos DB Instance
- Use the az cosmosdb create command:
az cosmosdb create \
  -n demo-cosmos-cli-436 \
  -g YourResourceGroup \
  --default-consistency-level Session \
  --locations regionName="West US 2" failoverPriority=0 \
  --locations regionName="East US 2" failoverPriority=1 \
  --locations regionName="Canada Central" failoverPriority=2
- `-n` specifies a globally unique instance name.
- `-g` specifies the existing resource group.
- `--default-consistency-level` can be `Session`, `Strong`, `BoundedStaleness`, or `Eventual`.
- `--locations` sets multiple regions and failover priorities.
- `failoverPriority=0` is the primary read/write region.
- Subsequent priorities define failover order.
3. Verify Deployment
- The command returns a JSON object showing the provisioning state of each region.
- Check in the Azure portal under your resource group.
- Under Replicate data globally, confirm your write region and additional read regions.
Summary
Azure CLI allows complete management of Cosmos DB instances, including creation, replication, and reconfiguration.
CLI scripts can be run via Azure Cloud Shell, locally on your machine, or integrated into Visual Studio for automation and flexibility.
# Cosmos DB Firewall
This video explains how to configure firewall settings for an Azure Cosmos DB account.
## Firewall Settings
Navigate to **Firewall and virtual networks** under **Settings** in your Cosmos DB account. There are three main ways to control access:
### 1. All Networks
- Allows access from anywhere, including the public Internet.
- Essentially disables the firewall.
- Useful for quick testing but **not secure** for production.
### 2. Selected Networks
- Restrict access to specific IP addresses or CIDR ranges.
- Only resources with IPs in this list can connect.
- **Limitations**:
- IP addresses must be static.
- Dynamic IPs cannot reliably access the database.
- Example: Add your current IP to allow access.
### 3. Virtual Networks
- Grant access to entire virtual networks instead of individual IPs.
- New resources added to these virtual networks automatically inherit access.
- Useful for managing multiple Azure services programmatically.
## Exceptions
- **Allow access from Azure Portal**: Required to access the database via the portal.
- **Accept connections from public Azure datacenters**: Opens access to all public Azure IP ranges—less secure.
## Demonstration
- Attempting to create a new item in a database without proper firewall rules results in a block.
- Adding the IP address to the firewall allows access after a short propagation delay.
- Virtual networks simplify management for multiple resources.
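The same firewall rules can be scripted; a hedged Azure CLI sketch with placeholder names and documentation IP ranges:

```bash
# Allow one client IP plus one CIDR range; everything else is blocked
az cosmosdb update \
  --name richardcosmosdemo \
  --resource-group CosmosDB \
  --ip-range-filter "203.0.113.25,198.51.100.0/24"
```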
## Summary
- **All networks**: open access, least secure.
- **Selected IPs**: precise control, requires static IPs.
- **Virtual networks**: scalable and flexible for multiple resources.
- Always configure exceptions carefully to balance access needs and security.
# Cosmos DB Authentication
This video explains the authentication methods available in Azure Cosmos DB.
## 1. Authorization Key
- Simplest and quickest way to access a Cosmos DB account programmatically.
- Requires two pieces of information:
- `endpointUrl`: The URI of the Cosmos DB account.
- `authorizationKey`: A read-write or read-only key from the account.
- Limitations:
- Anyone with the key can access the database.
- Not suitable for production secrets management.
- Example use:
- Connect using the Cosmos client in Visual Studio.
- Create databases and containers programmatically.
## 2. Cosmos DB Users and Permissions
- Provides fine-grained access control on a database level.
- Cosmos DB users are specific to the database; they are not Azure Active Directory users.
- Permissions are assigned per container and can be restricted to specific partitions.
- `permissionMode`: `Read` or `All`.
- `PartitionKey`: Restricts access to specific records in the container.
- Requires generating resource tokens for the user to access data programmatically.
## Summary
- **Authorization key**: quick setup, broad access, less secure.
- **Cosmos DB users**: granular access control, suitable for production scenarios requiring limited permissions.
- Fine-grained permissions can target specific containers and partitions.
# Cosmos DB Security
This video explains the security features and best practices for Azure Cosmos DB.
## 1. Encryption
- Cosmos DB encrypts data **in transit** and **at rest**.
- Ensures that only authorized users with valid keys or certificates can read the data.
- Protects sensitive information even if the database is compromised.
## 2. Access Control Principles
- **Minimum authority**: Users should only have access to what they need.
- **Authentication & authorization**: Only verified users can access data.
- **Consider administrators and database providers**: Security depends on whether you host your own database or use a cloud provider like Azure.
## 3. Best Practices
- Keep software up to date with patches and security updates.
- Always use **HTTPS** for exposed URLs.
- Enforce **strong passwords** and consider **multifactor authentication**.
## 4. Azure Cosmos DB Security Features
- **Firewall**: Restricts access to authorized IP addresses.
- **Azure Active Directory integration & RBAC**: Control permissions for Azure users using roles.
- **High availability & backups**: Automatic replication and backup across regions.
- **Container-level isolation**: Store sensitive data in isolated containers with specific authorization.
## 5. Access Keys
- Auto-generated primary and secondary keys during database creation.
- Support **key regeneration** to rotate keys without disrupting clients.
- Grant or deny access to the **entire Cosmos DB instance**.
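Rotation is typically scripted as: regenerate the key clients are *not* using, switch clients over, then regenerate the other. A hedged sketch with placeholder names:

```bash
# Regenerate the secondary key first, then repoint applications at it
az cosmosdb keys regenerate \
  --name richardcosmosdemo \
  --resource-group CosmosDB \
  --key-kind secondary

# Once clients are on the secondary key, regenerate the primary
az cosmosdb keys regenerate \
  --name richardcosmosdemo \
  --resource-group CosmosDB \
  --key-kind primary
```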
## 6. Granular Access Control
- Use **RBAC** to assign specific permissions to users or services.
- Use **resource tokens** to grant access to subsets of data within containers.
## Summary
- Cosmos DB ensures strong security with encryption, controlled access, backups, and isolation.
- Access keys provide easy setup but broad access.
- RBAC and resource tokens allow fine-grained control for production scenarios.
# Using Azure Cosmos DB Explorer
This video demonstrates how to use **Azure Cosmos DB Data Explorer** to create, manage, and query data.
## 1. Accessing Data Explorer
- Navigate to your Cosmos DB account in the Azure portal.
- Click **Data Explorer** on the left-hand menu.
- Data Explorer provides a basic UI for interacting with your databases and containers.
## 2. Creating a Database and Container
- Click **New Container** → option to create a new database as well.
- **Database ID**: e.g., `DemoDB`.
- **Provisioned Throughput**: default 400 Request Units per second.
- **Container ID**: e.g., `Persons`.
- **Partition Key**: choose a property in your records (e.g., `/name`).
- Indexing is automatic by default, indexing all fields for queries.
## 3. Adding Data
- Select **Items** under the container and click **New Item**.
- JSON records can include:
- Strings, numbers, arrays, and nested objects.
- Different properties for each record are allowed (NoSQL flexibility).
- Cosmos DB automatically adds an `id` property if not specified.
- Save the item to store it in the container.
### Example Records
```json
[
{
"name": "John Doe",
"age": 45,
"hobbies": ["car restoration", "skiing"]
},
{
"name": "Ruth Bitterman",
"age": 65,
"address": "123 Easy Street"
},
{
"name": "Alice Nelson",
"hobbies": ["skiing", "sky diving"],
"address": "123 Easy Street"
}
]
```
4. Querying Data
- Use Edit Filter or New SQL Query.
- Example: Filter by age:
SELECT * FROM c WHERE c.age < 50;
- Example: Filter by array content:
SELECT * FROM c WHERE ARRAY_CONTAINS(c.hobbies, "skiing")
5. Advanced Options
- Upload Items from files.
- Update/Delete existing items.
- Stored Procedures, User Defined Functions (UDFs), and Triggers:
  - Trigger Types: Pre or Post.
  - Trigger Operations: Create, Delete, Replace, or All.
- Link to Azure Synapse for advanced analytics.
Summary
- Azure Cosmos DB Data Explorer allows quick interaction with your databases:
- Create databases and containers.
- Add and manage JSON data.
- Filter and query data using simple or advanced queries.
- Implement stored procedures, UDFs, and triggers for automated logic.
Azure Cosmos DB Geo-replication
This video explains how Azure Cosmos DB achieves global reliability, low latency, and high availability using geo-replication.
1. Global Distribution
- Distributed databases face challenges serving clients worldwide due to network latency.
- Cosmos DB solves this by replicating databases across multiple Azure regions.
- Read and write requests are routed to the closest regional replica:
- Australian clients → Australian replica
- Japanese clients → Japanese replica
- From the client’s perspective, the system behaves as a single global database.
2. Multi-region Read and Write
- Reads: Served from the nearest replica.
- Writes: Replicated across all regions using eventual consistency:
- Writes may not appear immediately in all regions.
- All replicas eventually become consistent.
- Client requests are managed through multi-homing APIs, which determine the closest region automatically.
3. High Availability and Low Latency
- Cosmos DB guarantees:
- 99.999% read/write availability globally.
- 99% of read/write requests return within 10 milliseconds.
- Supports elastic scalability:
- Add/remove regions as needed without downtime.
- Write requests are replicated across all regions.
4. Business Continuity
- Automatic Failover:
- If a regional replica fails, traffic is routed to other regions seamlessly.
- Customers experience no service interruption.
- Manual failover testing:
- Cosmos DB provides APIs to simulate failover for testing preparedness.
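One way to exercise such a failover is to change which region holds failover priority 0; a hedged Azure CLI sketch with placeholder account, group, and regions:
# Promote East US to the write region; West US becomes the first failover target
az cosmosdb failover-priority-change \
  --name richardcosmosdemo \
  --resource-group CosmosDB \
  --failover-policies "eastus=0" "westus=1"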
5. Resource Management
- Data is stored in containers with configurable throughput measured in Request Units per second (RUs/sec).
- Cosmos DB handles partitioning and indexing automatically to optimize performance across regions.
6. Consistency Models
- Cosmos DB allows tuning the trade-off between latency/performance and data consistency.
- Options range from strong consistency to eventual consistency, depending on application needs.
7. Summary
- Geo-replication enables:
- High availability with multiple regional replicas.
- Low-latency reads and writes worldwide.
- Business continuity through automatic failover.
- Eventual consistency ensures replicas converge over time.
- With multi-region setup, Cosmos DB can achieve 99.999% global read/write availability while supporting scalable and responsive applications.
Cosmos DB IP Access Control Policy Issues
This video discusses potential issues and best practices for managing IP-based access control in Azure Cosmos DB.
1. Purpose of IP Access Control
- Globally distributed databases like Cosmos DB should not be openly accessible to the Internet.
- Access should be limited to a small set of IP addresses representing trusted clients or services.
2. Cosmos DB Firewall
- Cosmos DB includes a built-in firewall:
- Blocks requests from IP addresses outside allowed ranges.
- Provides granularity down to individual machines.
- Static IP addresses of trusted clients should be added to the firewall configuration.
3. Challenges with Azure Internal Resources
- Some Azure services do not have static IP addresses.
- Cosmos DB allows an option to permit all internal Azure IP addresses, but:
- This opens access to other subscriptions in Azure.
- Static IP limitations are more secure if possible.
4. Logging and Troubleshooting
- Diagnostic logging helps track rejected requests:
- Logs include 403 forbidden errors and source IP addresses.
- Useful for identifying misconfigured clients or potential malicious activity.
- SDK behavior:
- Cosmos DB SDK returns a generic 403 response for blocked IPs.
- Applications must ensure their IPs are in the allowed ranges.
5. Virtual Network Integration
- Instead of listing public IP addresses, you can:
- Place clients in subnets within a virtual network.
- Configure Cosmos DB to allow access only from selected virtual networks.
- Subnet-based access automatically grants access to new resources in the subnet.
6. Private IP Addresses
- Private Link allows Cosmos DB access via private IP addresses.
- Advantages:
- More secure because private IPs are not routable via the Internet.
- Reduces the security risk footprint.
- Important: Do not list private IP addresses in the main firewall; use private endpoint configuration instead.
7. Key Takeaways
- IP-based access control is effective only for static IP addresses.
- Virtual network and subnet-based access is more scalable and secure than public IP whitelisting.
- Private IP access via Private Link is the most secure method.
- IP access control is one layer of security, complementing:
- Resource-based access control (RBAC)
- Resource tokens
Cosmos DB .NET SDK Issues
This video discusses common issues and best practices when using the Cosmos DB .NET SDK in production applications.
1. Always Use the Latest SDK
- Keep your SDK up-to-date to benefit from the latest:
- Network features
- Retry logic
- Stability improvements
- "Evergreen" your software regularly, ideally whenever you modify it.
2. SDK Logging and Diagnostics
- Enable logging to diagnose networking and application issues:
- Can log SQL query metrics and diagnostic strings from SDK calls.
- Use configuration to toggle logging on/off to avoid performance impacts.
- Portal metrics:
- Helps distinguish client-side vs server-side issues.
3. Understanding Retry Logic
- Cosmos DB SDK includes built-in retry logic for transient errors:
- Automatically retries failed requests until success or max attempts.
- Transparent to the application—success responses may hide transient failures.
- Avoid implementing custom retry logic that conflicts with SDK retry behavior.
4. Common Error Codes
| Status Code | Meaning | Notes |
|---|---|---|
| 400 | Bad Request | Client request invalid (e.g., field validation failed) |
| 408 | Request Timeout | Network idle too long or slow server response |
| 410 | Gone | Resource no longer exists |
| 429 | Too Many Requests | Request rate exceeds configured throughput (RU/s) |
| 503 | Service Unavailable | Service busy or down for maintenance |
- 4xx errors: usually client-side, fixable by adjusting the request.
- 5xx errors: server-side, cannot be corrected by the client.
5. Virtual Machine Considerations
- VMs without public IPs use SNAT ports to access Cosmos DB:
- Limited number of ports can cause throttling or request timeouts.
- Solutions:
- Place Cosmos DB and VM in the same subnet, enable service endpoints.
- Assign a public IP to the VM (less secure, but avoids SNAT exhaustion).
6. Diagnostics for Latency
- V3 SDK provides diagnostic objects with request and response timings.
- Use diagnostics to identify:
- Network congestion
- Geographic latency (keep app & database in same region)
- Query inefficiencies
- Optimize by batching requests where possible to reduce server processing time.
7. Summary Best Practices
- Keep SDK up-to-date.
- Enable logging and metrics for diagnostics.
- Understand and leverage SDK retry logic.
- Handle HTTP error codes correctly.
- Be aware of SNAT limitations for VMs without public IPs.
- Use diagnostic metrics to optimize queries and reduce latency.
Cosmos DB Java SDK Issues
This video covers network issues and troubleshooting tips when using the Azure Cosmos DB Java SDK.
1. CPU and Load Considerations
- High CPU usage on the application host can degrade network performance.
- Recommendation: Keep CPU utilization below 50% and distribute load across multiple hosts if necessary.
2. Region and Latency
- Minimize latency by running your application and Cosmos DB in the same Azure region.
- Applications in different regions incur higher latency due to cross-region communication.
3. Connection Throttling and Port Limitations
- Be aware of host system connection limitations (Linux or VM-based hosts).
- SNAT/PAT port exhaustion occurs when VMs without public IPs use limited outbound ports.
- Use the Java SDK's connection policy to manage the number of connections allowed.
- Ensure connection counts match the HTTP proxy limits if using one.
4. Request Rate Too Large Errors
- Occurs when request rate exceeds provisioned throughput on Cosmos DB.
- Solutions:
- Implement backoff retry using the server-specified interval (getRetryAfterInMilliseconds).
- Increase database throughput if errors persist under expected loads.
- Test with expected production request rates to validate configuration.
- Implement backoff retry using the server-specified interval (
5. Logging and Diagnostics
- Log network errors for debugging, but allow debug-level logging to be toggled off in production to avoid performance impact.
- Java SDK v4 uses SLF4j logging facade:
- Supports frameworks like Log4j and Logback.
- Log SQL metrics, diagnostics, and retry information to identify network issues.
6. Network Connection Monitoring
- Use netstat to monitor open connections:
- Linux:
netstat -nap - Windows:
netstat -abn
- Linux:
- Check Cosmos DB connections:
- Established connections should not exceed the connection pool size.
- Many CLOSE_WAIT connections indicate too frequent open/close operations.
7. Summary Best Practices
- Keep CPU load under control and distribute application load.
- Run application and Cosmos DB in the same region to minimize latency.
- Match SDK connection policy to host and proxy limits.
- Handle request rate too large errors with retry/backoff logic.
- Enable configurable logging and monitor connection states.
- Validate production throughput and connection limits during testing.
Using Azure Synapse Analytics Service
This video demonstrates how to provision and use Azure Synapse Analytics, a limitless analytics service for data integration, preparation, management, and serving for BI and ML.
1. Accessing Synapse Analytics
- In the Azure Portal, search for Synapse Analytics.
- Click to open the Azure Synapse Analytics console.
- From the console, you can create a Synapse workspace.
2. Creating a Synapse Workspace
-
Basics Tab:
- Select Subscription and Resource Group (create a new one if needed).
- Set a Workspace Name and Region (e.g., East US).
- Specify Data Lake Storage Gen2 Account Name and File System Name.
- Assign yourself the Storage Blob Data Contributor role.
-
Security Tab:
- Set Admin Username and Password.
- Optionally enable/disable Double Encryption.
- Allow pipelines to access SQL pools.
-
Networking Tab:
- Choose to allow connections from all IP addresses or configure specific IP ranges.
-
Tags Tab:
- Optionally add tags for resource management (e.g., Created by Richard Spencer).
-
Summary Tab:
- Review all configurations and estimated cost (e.g., serverless SQL $6.40 CAD per TB).
- Click Create to deploy the workspace (~5–10 minutes).
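The same workspace can be provisioned from the command line; a minimal Azure CLI sketch, assuming the resource group and Data Lake Storage Gen2 account already exist (all names and the password are placeholders):
az synapse workspace create \
  --name demosynapsews \
  --resource-group SynapseRG \
  --storage-account demodatalakegen2 \
  --file-system demofs \
  --sql-admin-login-user sqladminuser \
  --sql-admin-login-password 'Str0ng!Passw0rd' \
  --location eastus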
3. Accessing the Workspace
- After deployment, click Go to Resource.
- Open the Workspace Web URL to access the Synapse Analytics portal.
- Accept cookies and enter the workspace.
4. Workspace Hubs
The left-hand pane contains several hubs:
Data Hub
- View workspace data and linked data sources (like the Data Lake storage created).
Develop Hub
- Create:
- SQL Scripts
- Spark Notebooks
- Data Flows Mapping
- Power BI Reports
Integrate Hub
- Build pipelines for data ingestion and orchestration.
- Copy data from other sources and schedule data flows.
Monitor Hub
- Monitor jobs and pipeline execution.
Manage Hub
- Manage security, SQL pools, and Apache Spark pools.
5. Summary Best Practices
- Azure Synapse enables end-to-end analytics: ingest, prepare, manage, and serve data.
- Provisioning a workspace takes just a few minutes.
- Explore all hubs to develop scripts, orchestrate pipelines, and monitor/manage your analytics environment.
- Use linked data sources to integrate external storage or databases.
Best Practices for Azure Synapse SQL Pool
This video covers best practices for managing dedicated SQL pools in Azure Synapse Analytics, focusing on cost optimization, scaling, and monitoring.
1. Accessing SQL Pools
- Open the Dedicated SQL pools console in your Synapse workspace.
- Select the SQL pool you want to manage.
- Best practice: Always access SQL pools through a Synapse workspace for full integration.
2. Cost Optimization: Pause and Resume
- Pause your SQL pool when not in use to save costs.
- Ensure there are no active user queries before pausing.
- Pausing temporarily limits connectivity.
- Resume your SQL pool when needed.
- Connectivity will return once the pool is fully resumed.
3. Scaling the SQL Pool
- Use the Scale button to adjust the Data Warehouse Units (DWUs).
- Slide to increase or decrease DWUs according to your workload.
- Example ranges: DW100c → DW30000c.
- Best practice: Start small, monitor workload, and find the optimal DWU level for performance vs. cost.
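Both the pause/resume and scaling practices above can be scripted; a hedged Azure CLI sketch with placeholder pool, workspace, and group names:
# Pause the dedicated SQL pool to stop compute billing
az synapse sql pool pause \
  --name demopool \
  --workspace-name demosynapsews \
  --resource-group SynapseRG

# Resume when the workload returns
az synapse sql pool resume \
  --name demopool \
  --workspace-name demosynapsews \
  --resource-group SynapseRG

# Scale to a different DWU level
az synapse sql pool update \
  --name demopool \
  --workspace-name demosynapsews \
  --resource-group SynapseRG \
  --performance-level "DW200c"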
4. Monitoring Query Activity
- Use the View Queries option to inspect current queries.
- Monitor long-running queries for optimization opportunities.
- Optionally export query activity to Excel for deeper analysis.
5. Maintenance and Scheduling
- Define a maintenance window to ensure SQL pool updates and operations do not disrupt workloads.
6. Alerts and Metrics
- Set up alerts for metrics like:
- Long-running queries
- Resource usage
- Query performance issues
- Use these metrics to optimize queries and performance over time.
Summary
Best practices for Azure Synapse SQL pools include:
- Pause and resume to save costs.
- Scale the pool based on workload needs.
- Monitor queries to optimize performance.
- Schedule maintenance during low-impact times.
- Use alerts and metrics for proactive management.
Following these practices ensures cost efficiency, high performance, and optimized query execution in Synapse SQL pools.
Understanding Data Warehouse Units (DWU) in Azure Synapse Analytics
Data Warehouse Units (DWUs) are a measure of compute resources in Azure SQL Data Warehouse / Synapse SQL pools, including CPU, memory, and I/O capabilities. They determine how quickly your queries, data ingestion, and table operations can be processed.
1. What DWUs Are
- DWUs abstract underlying hardware, allowing Microsoft to optimize or move resources without affecting your workload.
- Increasing DWUs:
- Improves query performance by allowing parallel execution.
- Accelerates data ingestion and table copying.
- Decreasing DWUs:
- Reduces costs by scaling down unused resources.
2. Scaling DWUs
You can scale your DWUs up or down based on workload needs:
Methods:
- Azure Portal – Slider in the database properties page.
- SQL Server Management Studio (SSMS) – Database properties.
- PowerShell – the Set-AzSqlDatabase cmdlet.
- REST API – Send requests to Azure.
- T-SQL – Use ALTER DATABASE commands.
Scaling Workflow:
- Scale Up: Pause or complete active transactions, provision additional nodes, reattach.
- Scale Down: Ensure transactions are complete, release excess nodes, maintain data consistency.
3. Metrics to Monitor DWU Performance
- Query Performance: How fast can large datasets be scanned and aggregated? (CPU & I/O intensive)
- Data Ingestion: Speed of loading data from Azure Storage or Data Lake. (CPU & network intensive)
- Table Copying: Reading and writing data across nodes. (CPU, I/O, network intensive)
4. Service Level Objectives (SLOs)
- SLOs define performance and cost targets.
- Increasing DWUs:
- Boosts performance
- Increases cost
- Decreasing DWUs:
- Lowers cost
- Reduces performance
- Optimize DWUs to balance cost vs. workload needs.
5. Performance Tiers
- Generation 1: Uses DWUs.
- Generation 2: Uses Compute DWUs (cDWUs) with local disk-based caching for faster access.
- Both generations support:
- On-demand scaling
- Pausing when compute is not required
- Generation 2 offers faster performance due to caching but may require cache warming after scaling.
6. Optimizing DWUs
- Start with a smaller DWU.
- Monitor performance during test and production loads.
- Adjust DWUs up or down based on:
- Query performance
- Data ingestion speed
- Table operations
- Goal: Achieve high performance at minimal cost.
7. Permissions for Changing DWUs
- Requires ALTER DATABASE permissions.
- Built-in Azure roles:
- SQL DB Contributor
- SQL Server Contributor
Summary
DWUs are the fundamental measure of compute resources in Synapse SQL pools. Properly managing and scaling DWUs allows you to optimize performance, cost, and workload efficiency. Always monitor metrics and adjust DWUs to match your workload, ensuring optimal resource utilization.
Comparing Transactional (OLTP) and Analytical (OLAP) Workloads
Understanding the differences between OLTP and OLAP systems is critical for designing effective databases and data warehouses. These two types of workloads serve very different purposes in an organization.
1. OLTP: Online Transaction Processing
Purpose: Manage day-to-day business operations and transactions.
Key Features:
- Simple queries: Typically INSERT, UPDATE, and DELETE operations.
- Fast response times: Needed for end-user productivity (e.g., point-of-sale systems, ATMs).
- High concurrency: Supports thousands of users performing small transactions simultaneously.
- ACID-compliant: Ensures data integrity.
- Frequent backups: Data is backed up regularly to prevent loss.
- Smaller storage requirements: Handles current transactional data, not historical aggregates.
- Lifecycle patterns: May experience varying access patterns (e.g., month-end vs. year-end processing).
Examples:
- Cashier systems at supermarkets
- Bank ATMs
- Telephone survey data entry
Environment Considerations:
- Must be highly available.
- Handles small, simple transactions.
- Background jobs may run data-intensive processes (e.g., batch updates).
2. OLAP: Online Analytical Processing
Purpose: Consolidate and analyze historical data from multiple sources for reporting and decision-making.
Key Features:
- Complex queries: Primarily SELECT statements that aggregate and analyze data.
- Multi-dimensional view: Provides a consolidated view of organizational data across time, regions, and other dimensions.
- Used by analysts: Supports business intelligence and complex decision-making.
- Less frequent backups: Data is preprocessed before entering the warehouse.
- Larger storage requirements: Stores large historical datasets.
- Complex processing: Aggregates data from multiple sources, often with heavy CPU, I/O, and network use.
Examples:
- Sales analysis across years and regions
- Business intelligence dashboards
- Historical trend analysis
Environment Considerations:
- Optimized for read-heavy workloads.
- Complex queries can require significant compute resources.
- Supports fewer concurrent users compared to OLTP.
3. Key Differences: OLTP vs. OLAP
| Feature | OLTP | OLAP |
|---|---|---|
| Purpose | Day-to-day transactions | Data analysis & reporting |
| Query Type | Simple, transactional (INSERT/UPDATE/DELETE) | Complex, analytical (SELECT) |
| Response Time | Short, interactive | Longer, batch-oriented |
| Users | Employees, end-users | Business analysts |
| Data Integrity | High (ACID) | Moderate (preprocessed data) |
| Storage Requirement | Smaller | Larger (historical data) |
| Backup Frequency | Frequent | Less frequent |
| Transactions | Small | Large, aggregated |
| Concurrency | High (thousands of users) | Lower |
4. Summary
- OLTP systems are designed for speed, concurrency, and transactional integrity, supporting day-to-day business operations.
- OLAP systems are designed for complex analysis, historical data aggregation, and decision support, providing a multi-dimensional view of the enterprise.
- Together, OLTP and OLAP form the two sides of a data warehousing architecture: OLTP as the source of data, OLAP as the analytical layer.
Comparing Batch and Real-Time Data Processing
Data processing approaches can be broadly classified into batch processing and real-time processing, each serving different business needs.
1. Batch Processing
Definition: Processing large volumes of data collected over a period of time as a single group or “batch.”
Characteristics:
- Periodic execution: Data is collected, processed, and output at scheduled intervals.
- Separate stages: Input, processing, and output are handled in distinct programs or steps.
- Latency: Results are not immediate; output may be delayed by hours or more.
- Best suited for: Operations where immediate action is not critical.
Examples:
- Payroll systems
- Billing systems
- Monthly or daily reporting
Advantages:
- Efficient for processing massive datasets.
- Can optimize resource usage by grouping tasks.
- Easier to manage for predictable workloads.
Limitations:
- Not suitable for applications that require immediate insights or responses.
- Cannot react to events in real-time.
2. Real-Time Data Processing
Definition: Continuous input and processing of data to produce immediate or near-immediate results.
Characteristics:
- Continuous execution: Data is processed as soon as it is generated.
- Low latency: Provides actionable insights immediately.
- Best suited for: Time-sensitive operations where immediate decisions are required.
Examples:
- ATM transactions
- Radar and monitoring systems
- Fraud detection systems
- Live sales or operational analytics
Advantages:
- Enables quick decision-making.
- Supports complex event processing (CEP) for pattern detection and threat identification.
- Powers operational intelligence (OI) to identify and address problems immediately.
- Enhances customer experience by resolving issues proactively.
Limitations:
- Requires more computational resources and advanced architecture.
- Can be complex to implement and maintain.
3. Operational Intelligence vs. Operational Business Intelligence
| Type | Data Focus | Processing Type | Purpose |
|---|---|---|---|
| Operational Intelligence (OI) | Real-time operational data | Continuous / near real-time | Detect and remedy issues immediately; supports quick decisions |
| Operational Business Intelligence (OBI) | Historical operational data | Batch / periodic | Descriptive or historical analysis for reporting and trend analysis |
4. Summary
- Batch processing is ideal for non-time-sensitive tasks with large datasets, such as payroll or monthly reporting.
- Real-time processing is necessary for immediate decision-making, pattern detection, and operational insights, such as fraud detection or live analytics.
- Organizations often use a combination of both approaches depending on the nature of the workload and business requirements.
Using Azure Portal to Create a Synapse SQL Pool
This guide explains how to create a dedicated SQL pool in Azure Synapse Analytics using the Azure Portal.
1. Navigate to Synapse Workspace
- Log in to the Microsoft Azure Portal.
- In the search bar, type Synapse Analytics and select it.
- Click on the Synapse workspace you want to use (e.g., synapseworkspacerks).
- Collapse the left-hand menu to see more workspace options if needed.
2. Create a Dedicated SQL Pool
- In the workspace console, locate Analytics pools → SQL pools.
- Click New dedicated SQL pool.
- Provide the following details:
  - Dedicated SQL pool name: for example, rkssqlpool.
  - Performance level: use the slider to select compute resources:
    - Left: lower resources, lower cost.
    - Right: higher resources, higher cost (e.g., DW30000c ≈ $579/hr).
    - For demo purposes, select DW100c.
- Choose a data source:
  - Blank dedicated SQL pool (default)
  - Restore from a backup
  - Restore from a restore point
- Keep Collation as default.
- Add a tag (optional), for example CreatedBy: RichardSpencer.
- Click Next: Review + create.
3. Review and Deploy
- Review your configuration:
- Performance level
- Cost estimate
- Tags
- Click Create to start the deployment.
- Deployment typically completes in ~5 minutes.
- Once deployed, click Go to resource.
4. Access Your SQL Pool
- In the Synapse workspace, go to Analytics pools → SQL pools.
- Locate the dedicated SQL pool you created (e.g., rkssqlpool).
- Click on it to access the pool for:
- Loading data
- Processing streams
- Reading data from your Data Lake
Notes
- Dedicated SQL pools were formerly known as SQL Data Warehouse (SQL DW).
- Compute resources define the performance and cost of your pool.
- You can scale the pool later if needed to adjust performance or cost.
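If you prefer scripting, the same workspace-attached pool can be created with the Az.Synapse PowerShell module (distinct from the server-based approach in the next section) — a sketch reusing the names from this walkthrough, with a placeholder resource group:
# Create a DW100c dedicated SQL pool inside an existing Synapse workspace.
New-AzSynapseSqlPool -ResourceGroupName "myResourceGroup" -WorkspaceName "synapseworkspacerks" `
    -Name "rkssqlpool" -PerformanceLevel "DW100c"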
Using Azure PowerShell to Create a Synapse SQL Pool
This guide explains how to create a dedicated SQL pool in Azure Synapse Analytics using Azure PowerShell.
1. Install Azure PowerShell
- Open Windows PowerShell ISE or any PowerShell console.
- Run the following script to install the Az module:
if ($PSVersionTable.PSEdition -eq 'Desktop' -and (Get-Module -Name AzureRM -ListAvailable)) {
Write-Warning -Message ('Az module not installed. Having both the AzureRM and Az modules installed at the same time is not supported.')
} else {
Install-Module -Name Az -AllowClobber -Scope CurrentUser
}
- Confirm any prompts to complete the installation.
2. Connect to Your Azure Account
Connect-AzAccount
- Sign in with your Azure credentials.
- If multiple subscriptions exist, the first one will be used by default.
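If you need a different subscription, select it explicitly (substitute your own subscription name or ID):
Set-AzContext -Subscription "<subscription-name-or-id>"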
3. Set Variables
Define variables for resource group, server, database and credentials:
$resourceGroupName = "myResourceGroup"
$location = "West US"
$serverName = "server$(Get-Random)"
$adminLogin = "sqladmin"
$adminPassword = "YourSecurePassword123!"
$startIp = "0.0.0.0"
$endIp = "0.0.0.0"
$databaseName = "mySampleDataWarehouse"
4. Create a Resource Group
New-AzResourceGroup -Name $resourceGroupName -Location $location
5. Create a SQL Server
New-AzSqlServer `
-ResourceGroupName $resourceGroupName `
-ServerName $serverName `
-Location $location `
-SqlAdministratorCredentials (New-Object -TypeName PSCredential -ArgumentList $adminLogin, (ConvertTo-SecureString $adminPassword -AsPlainText -Force))
6. Configure Firewall Rules
Create a server-level firewall rule. Note that the 0.0.0.0–0.0.0.0 range used here only admits connections from other Azure services; substitute your own client IP range to connect from outside Azure:
New-AzSqlServerFirewallRule `
-ResourceGroupName $resourceGroupName `
-ServerName $serverName `
-FirewallRuleName "AllowAll" `
-StartIpAddress $startIp `
-EndIpAddress $endIp
7. Create the Dedicated SQL Pool
New-AzSqlDatabase `
-ResourceGroupName $resourceGroupName `
-ServerName $serverName `
-DatabaseName $databaseName `
-Edition "DataWarehouse" `
-RequestedServiceObjectiveName "DW100c" `
-CollationName "SQL_Latin1_General_CP1_CI_AS" `
-MaxSizeBytes 1099511627776
- RequestedServiceObjectiveName defines the data warehouse units (DWUs) for performance.
- The creation may take up to 10 minutes.
8. Verify the SQL Pool
- In the Azure portal, search for SQL pools → Dedicated SQL pools.
- Select your new SQL pool.
- You can now access this pool from a Synapse workspace for data loading, stream processing, and analytics.
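You can also verify from the same PowerShell session, reusing the variables defined earlier:
# Returns the pool's edition, service objective, and current status.
Get-AzSqlDatabase -ResourceGroupName $resourceGroupName -ServerName $serverName -DatabaseName $databaseName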
Notes
- Ensure you have appropriate permissions to create servers and databases.
- Firewall rules are necessary to allow access to the server from your IP range.
- You can scale the pool later to adjust performance or cost.
Data Warehousing Workloads
Data warehousing workloads are the various types of tasks and processing activities that a data warehouse is designed to handle. Understanding these workloads helps in designing, optimizing, and maintaining an efficient data warehouse.
1. Continuous Data Loading
- Description: Also called near real-time data loading, it involves constantly ingesting new data into the warehouse.
- Characteristics:
- Indexes and optimization structures must be continuously updated.
- Impacts aggregate and summary data used in reports and dashboards.
- Example: Real-time customer order processing.
2. Batch Data Loading
- Description: Processes large amounts of data periodically instead of continuously.
- Characteristics:
- Suitable for data that doesn’t require real-time updates.
- Less impact on indexes and optimization structures compared to continuous loading.
- Examples: Payroll processing, nightly ETL jobs.
3. Reporting
- Description: Generating structured reports from warehouse data for business insights.
- Characteristics:
- Requires indexing and query optimization.
- Often involves storage partitioning and pre-aggregated data.
- Fast query performance is critical.
- Examples: Daily financial reports, operational dashboards.
4. Business Analytics
- Description: Analytical applications that allow users to explore data and perform multidimensional analysis.
- Characteristics:
- Users can drill down into data cubes and perform tactical analysis.
- Business Intelligence (BI) architects often build reusable cubes for analysts.
- Users may have limited programming experience but require actionable insights.
- Examples: Market trend analysis, sales performance by region.
5. Ad Hoc Queries
- Description: Unpredictable queries generated as business questions arise.
- Characteristics:
- Data miners or analysts access data in random ways.
- Hard to optimize since queries are unpredictable.
- Data warehouses must support flexible and responsive querying.
- Examples: One-off analysis for a specific business decision.
6. OLTP Database Integration
- Description: Traditional Online Transaction Processing (OLTP) systems provide source data to the warehouse.
- Characteristics:
- Data warehouses exist on top of OLTP systems to facilitate analysis.
- OLAP (Online Analytical Processing) systems analyze the data for reporting and BI.
- Example: Feeding transaction data from a retail POS system into a warehouse for sales analysis.
Summary
Data warehouse workloads include:
| Workload | Key Purpose | Example |
|---|---|---|
| Continuous Data Loading | Real-time ingestion | Customer orders |
| Batch Data Loading | Periodic data updates | Payroll |
| Reporting | Generate business reports | Financial dashboards |
| Business Analytics | Multidimensional analysis | Sales trend analysis |
| Ad Hoc Queries | Random query support | Data mining |
| OLTP Integration | Provide source data | Retail transaction data |
These workloads collectively enable businesses to make informed decisions, optimize processes, and gain strategic advantages.
When to Use a Data Warehouse Solution
A data warehouse is a specialized database system designed to collect, integrate, and analyze data from multiple sources. It is primarily used for reporting, analysis, and business intelligence (BI), rather than for real-time transaction processing.
Key Characteristics of a Data Warehouse
- Purpose-built: Optimized for complex queries and analysis rather than transactional operations.
- Historical data: Stores large volumes of historical data for trend analysis.
- Structured repository: Holds data in a consistent, structured format.
- Integrates multiple sources: Combines data from different systems into a unified schema.
Data Warehouse vs. Data Mart
| Feature | Data Warehouse | Data Mart |
|---|---|---|
| Scope | Enterprise-wide, multiple subject areas | Subset of warehouse, specific to a business line or team |
| Purpose | Centralized storage, analysis, and reporting | User-focused access layer for analysis |
| Data Volume | Large, detailed historical data | Smaller, often summarized data |
Typical Workloads and Use Cases
- Integration of multiple data sources
- Combines sales, marketing, finance, and other departmental data for unified reporting.
- Historical analysis
- Enables multi-year trend analysis and business intelligence.
- Data restructuring
- Standardizes formats from diverse sources for consistent analysis.
- Reporting
- Generates end-user reports, dashboards, and automated insights.
- Business Intelligence (BI)
- Supports OLAP cubes, multidimensional analysis, and interactive dashboards.
- Data mining
- Identifies hidden patterns and trends through complex ad hoc queries.
- Secure access
- Provides controlled access to sensitive data for authorized users.
- Support for analytical tools
- Optimized for BI and analytics tools to extract insights efficiently.
Example Scenarios
- An online retailer analyzing how product reviews correlate with sales trends.
- A company evaluating historical marketing campaign performance to plan future campaigns.
- A business generating daily executive dashboards summarizing key metrics like revenue, profit, and sales.
Summary
You should consider a data warehouse solution when you need to:
- Consolidate and integrate data from multiple sources.
- Perform complex queries or trend analysis on historical data.
- Generate reports, dashboards, or BI insights.
- Support data mining and ad hoc analytical queries.
- Provide secure and structured access to analytical data.
A data warehouse acts as a central platform for informed decision-making and strategic business insights.
Using Azure Data Lake Analytics
Azure Data Lake Analytics (ADLA) is a distributed analytics service built on Azure Data Lake Storage that enables you to process and analyze massive amounts of data efficiently. This video demonstrates how to create a Data Lake Analytics account, load sample data, and run a simple query.
Step 1: Access Data Lake Analytics
- Go to the Azure Portal.
- In the search bar, type Data Lake Analytics and select it from the search results.
- Click Create Data Lake Analytics to start the setup.
Step 2: Create a Data Lake Analytics Account
- Provide your subscription and resource group.
- A resource group groups related resources with the same lifecycle and permissions.
- Specify service details:
  - Service name (e.g., datalakeservicerks)
  - Location (e.g., East US 2)
  - Storage account: create a new Azure Data Lake Storage Gen1 account (e.g., gen1rks)
  - Pricing plan: Pay-as-You-Go
- Review and click Create to deploy the account.
Step 3: Load Sample Data
- Open your Data Lake Analytics account after deployment.
- Go to Sample Scripts.
- Click Copy sample data to copy 50 MB of sample data into your Data Lake Storage Gen1 account.
Optional: Advanced U-SQL extensions (2.5 GB) can be installed for cognitive analytics and advanced machine learning.
Step 4: Run a Simple Query
- In Sample Scripts, select Query a TSV file.
- Review the script:
- Define schema.
- Extract data from the file.
- Output results.
- Set AUs (Analytics Units) to control compute resources:
- More AUs = higher performance, higher cost.
- For small queries, 1 AU is sufficient.
- Click Submit to run the query.
The query progress will show stages: Extract → Aggregation → Output. You can view the results in the specified output file.
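The same submission can be scripted with the Az.DataLakeAnalytics PowerShell module — a sketch in which the script path is a placeholder and -DegreeOfParallelism corresponds to the AU count:
# Submit a U-SQL script to the account created above, using 1 AU.
Submit-AzDataLakeAnalyticsJob -Account "datalakeservicerks" -Name "QueryTsvSample" `
    -ScriptPath "C:\scripts\QueryTsv.usql" -DegreeOfParallelism 1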
Step 5: Explore Your Data
- Use the Data Explorer in your Data Lake Analytics account to browse data.
- Drill down to view the sample data loaded from your TSV files.
Summary
Azure Data Lake Analytics allows you to:
- Create a scalable analytics environment over Azure Data Lake Storage.
- Load and manage large datasets easily.
- Execute distributed queries using U-SQL scripts.
- Scale compute resources with Analytics Units for performance optimization.
- Explore and visualize output data interactively.
This setup provides a fast and flexible way for developers and analysts to process big data in Azure.
Modern Data Warehousing
What is a Data Warehouse?
A Data Warehouse is a centralized environment where large volumes of data are stored in a format that is easily accessible for analysts and business users. Unlike transactional databases, data warehouses are optimized for analytics, reporting, and OLAP (Online Analytical Processing).
Modern Data Warehouses handle big data, which is characterized by the five V’s:
- Velocity (speed of data generation)
- Volume (amount of data)
- Value (usefulness of data)
- Variety (different types of data)
- Veracity (accuracy and reliability)
They integrate data from multiple sources (structured and unstructured), such as:
- Relational databases
- Images, videos, and IoT sensor data
- Event and streaming sources
Key Features of Modern Data Warehouses
- Supports both batch and stream processing:
  - Batch processing: Scheduled ETL jobs that run periodically (e.g., every 24 hours). Suitable for historical data.
  - Stream processing: Real-time analytics with low latency (milliseconds to minutes). Essential for IoT devices, transactions, and live monitoring.
- Tooling and services:
  - Azure Data Factory – ETL/ELT orchestration
  - Azure Data Lake Storage – Scalable storage
  - Azure Databricks – Data engineering & machine learning
  - Power BI – Data visualization and reporting
Data Warehouse Workloads
According to Gartner (2007), there are six common workload types:
- Continuous data loading – Similar to OLTP, continuously updates indexes and aggregates.
- Batch data loading – Periodic ingestion of non-real-time data.
- Reporting systems – Optimized for pre-built reports with heavy indexing and tuning.
- Business analytics – Enables tactical decision-making and multi-dimensional drill-downs.
- Ad-hoc queries – Used by data miners; unpredictable and hard to optimize.
- Analytic & BI functions – Advanced analysis and business intelligence tasks.
Types of Data Warehouses
- Traditional Data Warehouse – Batch-oriented, daily ETL jobs, data is typically a day old.
- Data Mart – Department-specific subset of the warehouse, smaller and more flexible.
- Real-time Data Warehouse – Low latency, continuously updated, reflects live business state.
- Historical Data Warehouse – Stores decades of data, used for compliance and long-term trend analysis.
- Analytic OLTP – Attempts to combine transactional and analytical workloads (considered bad practice due to performance issues).
Performance Considerations
- Mixed workloads (transactional vs. analytical) often cause bottlenecks.
- Best practice: separate transactional databases from analytical data warehouses.
- If workloads cannot be split, performance trade-offs may be necessary.
- Optimizations include: indexing strategies, storage partitioning, and tuning for specific workloads.
Design Considerations
When building a Modern Data Warehouse, organizations must balance:
- Long-term business strategy vs. current technical needs.
- User requirements (reporting, analytics, self-service BI).
- Infrastructure planning (hardware, storage, testing, training).
- Scalability and performance optimization based on workloads.
Benefits of Modern Data Warehousing
Why Do We Need a Data Warehouse?
A Data Warehouse provides a permanent storage space for data required to support:
- Reporting
- Analysis
- Business Intelligence (BI) functions
Without a warehouse, organizations risk making decisions based on limited or outdated data. By consolidating data from multiple sources, a Data Warehouse ensures consistent, accurate, and timely insights.
Key Applications:
- Market segmentation – Better targeting of customer groups
- Inventory management – Optimized stock levels and supply chain decisions
- Financial management – Unified financial reporting across divisions
- Sales insights – Analysis of trends and forecasting future opportunities
Historical Data Storage
- Stores large volumes of historical data.
- Enables analysis across different time periods.
- Supports trend identification and future prediction.
- Unlike transactional databases, data warehouses are optimized for reporting and analytics, not daily transactions.
Business Intelligence Benefits
Once populated, a Data Warehouse integrates into BI systems, unlocking powerful benefits:
- Self-service reporting:
  - End users can create their own reports without IT bottlenecks.
  - Increases consistency and accuracy of reports.
  - Reduces operational costs by removing duplicate reporting systems.
- Ad-hoc analysis:
  - Users can run flexible, on-demand queries.
  - Enables deeper exploration of business problems.
- Faster analysis:
  - BI tools no longer compete with transactional systems.
  - Queries run more efficiently on optimized warehouse structures.
Advanced Analytics Capabilities
- Data Mining – Identifies hidden patterns and predictive insights.
- Dashboards – Centralized, real-time views of business health.
- Drill-down analysis – Click into summary metrics to explore detailed records.
- Dynamic visualization – Graphical, user-friendly presentations of data.
Regulatory and Compliance Support
- Warehouses can retain data for legally required periods, unlike transactional systems.
- Simplifies compliance reporting by keeping all historical data in one secure place.
Metadata & Documentation
- Stores descriptions of data (metadata) alongside the warehouse.
- Improves data transparency and helps users understand what metrics mean.
- Makes report and dashboard creation simpler and more accurate.
Operational Process Support
- Consolidates data from different systems across multiple divisions.
- Enables a single view of enterprise-wide financials.
- Supports operational alignment across varied tools and platforms.
Security & Controlled Access
- Security is centralized in the Data Warehouse.
- Instead of securing multiple disparate systems, organizations only secure the warehouse.
- Provides role-based access control ensuring only authorized users can see sensitive data.
Analytical Tool Integration
- Works seamlessly with modern analytics and visualization tools such as:
- Tableau
- Power BI
- Qlik
- These tools perform best when extracting structured, curated data from a warehouse.
Summary of Benefits
- Better decision-making through accurate, timely insights
- Historical trend analysis for predictive forecasting
- Faster and self-service reporting
- Regulatory compliance and long-term storage
- Secure and centralized data governance
- Integration with BI and analytics tools
Modern Data Warehousing transforms data from a passive asset into an active driver of business strategy.
Modern Data Warehousing Architecture
Introduction
Modern data warehousing enables organizations to bring data together at any scale and deliver insights through:
- Analytical dashboards
- Operational reports
- Advanced analytics for all users
By combining structured and unstructured data, businesses can unlock unified insights across the enterprise.
Key Components of Modern Data Warehousing
1. Data Ingestion
- Azure Data Factory
- A hybrid data integration service.
- Allows you to Create, Schedule, and Orchestrate ETL/ELT workflows.
- Can run workflows multiple times a day or on-demand.
- Moves both structured and unstructured data into the data pipeline.
2. Data Storage
- Azure Blob Storage
- Massively scalable object storage.
- Handles unstructured data such as images, audio, video, and documents.
- Ideal for distributed access, streaming, and log file storage.
- Acts as the staging area for raw ingested data.
3. Data Processing & Transformation
- Azure Databricks
- Fast, collaborative analytics platform based on Apache Spark.
- Performs data cleansing, transformation, and scalable analytics.
- Supports two main environments:
- SQL Analytics – optimized for querying and reporting.
- Workspace – for collaborative data science and machine learning.
- Can run ad hoc queries directly on ingested data.
4. Data Warehousing
- Azure Synapse Analytics
- A cloud data warehouse with decoupled compute and storage.
- Supports elastic scaling:
- Scale compute up/down independently of storage.
- Pause/resume compute capacity to optimize cost.
- Massively Parallel Processing (MPP) architecture distributes workloads across nodes.
- Two types of SQL pools:
- Dedicated SQL Pool (Data Warehouse Units – DWUs define compute scale).
- Serverless SQL Pool (auto-scales based on query needs).
- Provides a single repository by integrating structured and transformed data.
5. Data Modeling & Semantics
- Azure Analysis Services
- Enterprise-grade analytics-as-a-service.
- Provides semantic data models for consistent reporting.
- Supports governance, deployment, testing, and delivery of BI solutions.
- Enables ad hoc analysis across cloud and on-premises data sources.
6. Data Visualization & Reporting
- Power BI
- A suite of business analytics tools.
- Connects to hundreds of data sources.
- Provides ad hoc analysis, beautiful reports, and dashboards.
- Supports sharing insights across web, mobile, and enterprise systems.
- Enables self-service BI for thousands of users.
How the Architecture Works Together
- Ingest structured and unstructured data via Azure Data Factory.
- Store raw data in Azure Blob Storage.
- Cleanse, transform, and analyze data at scale using Azure Databricks.
- Load processed data into Azure Synapse Analytics for warehousing.
- Define semantic models and enterprise BI solutions with Azure Analysis Services.
- Present insights via Power BI dashboards, reports, and mobile apps.
Benefits of Modern Data Warehousing
- Unified repository for structured + unstructured data.
- Elastic scaling of compute and storage.
- Optimized cost management (pause/resume compute).
- Massively parallel processing for performance.
- Enterprise-grade BI through Analysis Services + Power BI.
- Self-service analytics enabling thousands of users to gain insights.
Summary
Modern Data Warehousing in Azure combines ingestion, storage, processing, analytics, and visualization into a scalable, cloud-first architecture. By integrating services such as Azure Data Factory, Blob Storage, Databricks, Synapse Analytics, Analysis Services, and Power BI, organizations can transform raw data into actionable intelligence for decision-making.
Azure Data Factory Overview
What is Azure Data Factory?
Azure Data Factory (ADF) is a fully managed cloud-based data integration and migration service. It enables users to bring together data from:
- On-premises sources: e.g., Microsoft SQL Server.
- Cloud sources: e.g., Azure SQL Database, Azure Blob Storage, and Azure Table Storage.
ADF orchestrates and operationalizes processes that refine raw data into actionable insights. It supports:
- Hybrid ETL (Extract, Transform, Load) jobs.
- ELT (Extract, Load, Transform) jobs.
- Data integration projects at scale.
How Azure Data Factory Works
ADF automates and orchestrates the end-to-end data integration process through pipelines that provide a “single pane of glass” for managing ETL workflows.
Data Sources
- On-premises: Physical servers, SQL Server.
- Cloud: Azure SQL, Blob Storage, Table Storage.
- Structured: SQL databases, Excel, OLTP systems.
- Unstructured: NoSQL databases, documents.
- Semi-structured: XML, emails.
Data Processing and Transformation
- Mapping Data Flows
  - Visual, no-code data transformation logic.
  - Designed for users who prefer a drag-and-drop interface.
- Code-based transformations
  - External compute services such as HDInsight (Hadoop/Spark clusters) can be used.
  - Suitable for big data frameworks and complex transformations.
- Azure Data Lake Analytics
  - Serverless service for parallel data processing without needing infrastructure setup.
- Machine learning
  - Data can be fed into ML pipelines for predictions and trend analysis.
Monitoring and Management
ADF offers built-in monitoring features:
- Azure Monitor: Centralized logs, visibility, and alerts.
- APIs: Check pipeline status programmatically.
- Azure PowerShell: Command-line management.
- Azure Portal Health Panels: Graphical interface for pipeline health dashboards.
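As an illustration of the PowerShell option, a sketch that triggers a run and then polls it; the factory and pipeline names are hypothetical:
# Start a pipeline run; the cmdlet returns the run ID.
$runId = Invoke-AzDataFactoryV2Pipeline -ResourceGroupName "myResourceGroup" `
    -DataFactoryName "myDataFactory" -PipelineName "CopyPipeline"
# Check the run's status (Queued, InProgress, Succeeded, or Failed).
Get-AzDataFactoryV2PipelineRun -ResourceGroupName "myResourceGroup" `
    -DataFactoryName "myDataFactory" -PipelineRunId $runId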
Key Components of Azure Data Factory
- Activities
  - A single step in a pipeline (e.g., a copy activity to move data).
- Data Flows
  - Visual logic for scalable transformations.
  - Can be reused as part of multiple pipelines.
- Datasets
  - Represent data structures within data stores.
  - Reference input/output data used in activities.
- Integration Runtimes (IR)
  - Compute infrastructure for executing pipelines.
  - Supports data movement across private/public networks.
- Linked Services
  - Define connection information for external resources (like connection strings).
  - Example:
    - Linked Service = Azure Storage connection string.
    - Dataset = specific blob container/folder.
- Pipelines
  - Logical grouping of activities to perform a unit of work.
  - Example pipeline:
    - Copy data from Azure Blob.
    - Run a Hive query on HDInsight to partition data.
  - Activities can run sequentially or in parallel.
Why Azure Data Factory?
- Enables seamless integration of diverse data sources.
- Provides both no-code and code-based options.
- Scales to handle enterprise-level big data scenarios.
- Offers flexible monitoring and hybrid connectivity (on-prem + cloud).
Azure Data Lake Storage Generation 2 (ADLS Gen2)
Introduction
Azure Data Lake Storage Generation 2 (ADLS Gen2) is Microsoft’s latest version of cloud-based big data storage. It is built on top of Azure Blob Storage and enhances performance, management, and security for analytics workloads.
- Performance: Optimized for analytics without needing to copy/transfer data.
- Management: Supports directories and subdirectories for easier organization.
- Security: Enforces POSIX-compliant permissions at directory or file level.
ADLS Gen2 is designed for big data analytics at scale, combining low cost, scalability, and integration with Azure services.
Key Features of ADLS Gen2
1. Performance Enhancements
- Hierarchical namespace: Unlike Blob Storage's flat namespace, ADLS Gen2 supports directories and subdirectories, which:
  - Greatly improve the performance of directory management operations.
  - Reduce the need to duplicate/move data before analysis.
  - Increase overall job performance.
- Optimized driver:
  - Uses the ABFS (Azure Blob File System) driver.
  - Optimized specifically for big data workloads.
  - REST APIs are exposed through a dedicated endpoint.
- Reduced compute costs:
  - Improved performance means less computational power is required, reducing costs.
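The hierarchical namespace is chosen when the storage account is created. A minimal PowerShell sketch with placeholder names, assuming the Az.Storage module:
# A StorageV2 account with the hierarchical namespace enabled is an ADLS Gen2 account.
New-AzStorageAccount -ResourceGroupName "myResourceGroup" -Name "adlsgen2rks" `
    -Location "eastus2" -SkuName "Standard_LRS" -Kind "StorageV2" `
    -EnableHierarchicalNamespace $true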
2. Cost-Effectiveness
- Built on Azure Blob Storage, making it low-cost for both:
  - Capacity (storage size).
  - Transactions (read/write operations).
- Flexible compute models:
  - On-demand clusters.
  - Pay-per-job execution model.
- No need to move or transform data before analysis, unlike some other cloud providers.
3. Security
- Supports Access Control Lists (ACLs).
- Enforces POSIX-style permissions (read/write/execute) with added granularity specific to ADLS Gen2.
- Provides fine-grained security for directories and individual files.
4. Hadoop & Open-Source Compatibility
- Provides Hadoop-compatible access, allowing data to be managed just like in HDFS (Hadoop Distributed File System).
- Supported by open-source big data frameworks:
- Hadoop
- Spark
- Other analytics engines that rely on HDFS APIs.
5. Scalability
- Designed to handle exabytes of data.
- High throughput:
- Gigabytes per second of throughput.
- High IOPS (Input/Output Operations per Second).
- Near-constant request latency across service, account, and file levels.
Data Lake Entities (Conceptual Mapping with Blob Storage)
Since ADLS Gen2 is built on Blob Storage, some concepts overlap:
| Concept | Blob Storage Equivalent | ADLS Gen2 Equivalent |
|---|---|---|
| Top-level organization | Container | Container |
| Lower-level org | Virtual Directory | Directory |
| Data container | Blob | File |
| Storage purpose | General-purpose storage | Analytics storage |
Supported Features from Blob Storage
ADLS Gen2 inherits many features from Blob Storage, such as:
- Diagnostic logging
- Access tiers (hot, cool, archive for cost optimization)
- Lifecycle management (automated rules for data retention and movement)
Azure Integration
- Works seamlessly with Azure services for:
  - Data ingestion.
  - Analytics pipelines.
  - Visualization and reporting.
- Supports open-source platforms like Hadoop and Spark.
Why Use ADLS Gen2?
- Cost-effective storage built on Blob architecture.
- Enhanced performance with hierarchical namespaces.
- POSIX-based security for fine-grained access control.
- Seamless compatibility with Azure ecosystem and open-source tools.
- Highly scalable for enterprise-level big data workloads.
Azure Databricks
Overview
Azure Databricks is a cloud-based data engineering and analytics platform built on Apache Spark. It is designed for processing and transforming massive quantities of data, supporting machine learning workflows, and enabling real-time or batch analytics pipelines. The platform is tightly integrated with other Azure services and offers enterprise-grade security.
Azure Databricks SQL Analytics
SQL Analytics provides a user-friendly platform for data analysts to query data lakes using SQL.
Key Features:
- Run ad hoc SQL queries on data lakes.
- Create multiple visualization types (graphs, charts, dashboards).
- Combine visualizations and text into interactive dashboards.
- Configure alerts that notify when query results cross defined thresholds.
- Use fully managed SQL endpoints that scale based on query latency and concurrent user load.
- Enterprise-grade Azure security:
- Integration with Azure Active Directory (AAD).
- Role-based access controls for queries, dashboards, endpoints, and alerts.
- Enterprise SLAs for reliability and protection.
- All communication occurs within the Azure backbone network, not over the public internet.
Azure Databricks Workspace
The workspace is an interactive environment designed for collaboration between data engineers, data scientists, and machine learning engineers.
Key Capabilities:
- Integration with Azure for one-click setup and streamlined workflows.
- Data ingestion from Azure Data Factory (batch or near real-time streaming).
- Storage in Azure Blob Storage or Data Lake Store.
- Exploration and transformation of data using Spark-based tools.
- Collaboration through shared notebooks, visualization, and reporting.
Apache Spark in Databricks
Databricks builds on Apache Spark, offering a unified analytics engine with distributed processing.
Core Components in Spark (available in Databricks):
- MLlib: A library of machine learning algorithms (classification, regression, clustering).
- GraphX: For graph analytics and computations.
- Spark Core API: Supports SQL, Python, Java.
- Spark SQL and DataFrames:
- Structured data processing.
- DataFrames act like distributed tables (similar to R or Pandas DataFrames).
- Streaming: Real-time data processing integrated with sources like Kafka, Flume, and HDFS.
Databricks enhances Spark with:
- Fully managed Spark clusters (no infrastructure setup required).
- Interactive workspace for exploration, visualization, and teamwork.
- Zero management cloud platform with automatic scaling and updates.
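Provisioning the managed workspace itself can be scripted as well — a sketch assuming the Az.Databricks module, with placeholder names:
# Create a standard-tier Azure Databricks workspace.
New-AzDatabricksWorkspace -Name "databricksrks" -ResourceGroupName "myResourceGroup" `
    -Location "eastus2" -Sku "standard"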
Apache Spark Clusters in Azure Databricks
Clusters in Databricks provide scalable computing for big data.
Features:
- Rapid cluster creation in seconds.
- Autoscaling clusters up and down based on demand.
- REST API support for programmatic management.
- Secure data integration to unify data without centralization.
- Instant access to new Spark features with each release.
Azure Databricks Runtime
The Databricks Runtime is built on Apache Spark but optimized for the Azure cloud.
Advantages:
- Abstracts away infrastructure complexity.
- Optimized Spark engine with performance improvements in I/O and processing layers.
- High-speed performance suitable for production jobs.
Collaborative Environment
The Databricks workspace enhances collaboration and productivity.
Capabilities:
- Exploration and prototyping of data workflows.
- Use notebooks to document processes and run code interactively.
- Visualization tools (e.g., Matplotlib) and interactive dashboards.
- Combine analysis, documentation, and visualization in one environment.
Enterprise Security
Security is built into Databricks through Azure’s enterprise-grade services.
- Azure Active Directory (AAD) for identity management.
- Role-Based Access Control (RBAC) for fine-grained permissions (clusters, jobs, notebooks, data).
- Enterprise-grade SLAs for reliability and data protection.
Summary
Azure Databricks is a powerful, collaborative, and secure analytics environment. It combines the scalability of Apache Spark with Azure’s ecosystem and security features, making it suitable for:
- Big data pipelines.
- Machine learning workflows.
- Real-time or batch analytics.
- Enterprise-scale collaborative data projects.
Azure Analysis Services
Overview
Azure Analysis Services is a fully managed Platform as a Service (PaaS) that provides enterprise-grade data models in the cloud. These models make it easier and faster for users to perform ad hoc data analysis using tools such as Power BI and Excel.
It supports advanced features including:
- Combining data from multiple sources.
- Defining metrics in a single semantic data model.
- Applying role-based security controls.
The service integrates deeply with Azure, including Azure Active Directory, Azure Data Factory, Azure Automation, and Azure Functions.
Service Tiers
Azure Analysis Services is available in three tiers, each with different performance and feature sets:
- Developer Tier
  - Intended for evaluation, development, and testing.
  - Includes the same functionality as the Standard tier.
  - Limited in query processing units (QPUs) and memory.
  - Cannot downgrade after moving up to higher tiers.
- Basic Tier
  - Suitable for smaller production solutions.
  - Works with smaller tabular models.
  - Limited concurrency and refresh capabilities.
  - Does not support scale-out features.
- Standard Tier
  - For mission-critical production systems.
  - Supports elastic user concurrency and rapidly growing data models.
  - Provides advanced data refresh for near real-time updates.
  - Supports all tabular modeling features.
Scaling and Performance
- Scale-up/scale-down: Adjust resources as needed.
- Pause/Resume: Suspend servers when not in use to save cost.
- Scale-out: Use multiple query replicas for faster query responses and higher availability. Replicas synchronize copies of the tabular model across regions.
⚠️ Availability of features and replicas may depend on the region selected during deployment.
Compatibility with SQL Server Analysis Services
Azure Analysis Services is based on SQL Server Analysis Services and supports:
- Tabular models
- Partitioning (enables incremental loads, parallelization, and reduces memory consumption)
- DirectQuery mode (delegates queries directly to the relational backend, supporting very large datasets).
Supported sources include:
- SQL Server Data Warehouse
- Azure SQL Database
- Synapse Analytics
- Oracle and others
Security Features
Azure Analysis Services provides multiple layers of security:
Infrastructure-Level
- DDoS protection by default.
- Firewall rules to block unauthorized IPs.
Authentication & Authorization
- Integration with Azure Active Directory for organizational login.
- Role-based access control (RBAC) for datasets.
- Server administrator roles restricted to admins only.
Encryption
- Data encryption at rest in Azure.
Data Model-Level Security
- Row-Level Security (RLS): Defined with DAX expressions to filter rows per user.
- Object-Level Security (OLS): Restrict access at the table or column level (supported at compatibility level 1400 and higher).
Service Principals
- Azure AD applications that allow automated or service-level operations without needing user credentials.
Governance
Azure Analysis Services complies with:
- Microsoft Online Services Terms
- Microsoft Privacy Statement
This ensures enterprise-grade compliance and governance.
Tools for Interaction
You can use familiar tools and frameworks to work with Azure Analysis Services:
- Tabular Object Model (TOM)
  - Exposed via JSON and the Tabular Model Scripting Language (TMSL).
  - Used for scripting and customization.
- SQL Server Management Studio (SSMS)
  - Manage servers and databases directly.
  - Run scripts, monitor queries, and administer models.
- Open-source tools
  - Supported by a large developer community.
- Visual Studio
  - Develop and deploy analysis models using Analysis Services projects.
- PowerShell
  - Automate tasks such as creating servers, resuming/pausing operations, or changing service tiers, as shown below.
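For example, a PowerShell sketch covering all three tasks, with placeholder names and assuming the Az.AnalysisServices module:
# Create a server in the Standard S0 tier.
New-AzAnalysisServicesServer -ResourceGroupName "myResourceGroup" -Name "rksasserver" `
    -Location "West US" -Sku "S0" -Administrator "admin@contoso.com"
# Pause the server to stop compute billing.
Suspend-AzAnalysisServicesServer -ResourceGroupName "myResourceGroup" -Name "rksasserver"
# Move to a larger tier when concurrency grows.
Set-AzAnalysisServicesServer -ResourceGroupName "myResourceGroup" -Name "rksasserver" -Sku "S1"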
Key Takeaways
- Azure Analysis Services enables enterprise-grade analytics in the cloud with high scalability and deep Azure integration.
- Choose the correct tier based on workload: Developer (testing), Basic (small production), or Standard (mission-critical).
- Supports scaling, partitioning, DirectQuery, and security at multiple levels.
- Integrates seamlessly with Power BI, Excel, and Azure services, making it a flexible analytics solution.
- Provides familiar tools (SSMS, Visual Studio, PowerShell) and strong governance/security for enterprise use.
Azure HDInsight
Overview
Azure HDInsight is a managed, full-spectrum, open-source analytics service designed to provide fast, easy, and cost-effective processing of massive amounts of data. It supports popular open-source frameworks such as:
- Hadoop
- Spark
- Hive & LLAP
- Kafka
- Storm
- HBase
- R
- And more
These frameworks enable a wide range of scenarios, including ETL (Extract, Transform, Load), data warehousing, machine learning, and IoT.
What is Big Data?
Big data is characterized by:
- Volume: Massive amounts of data
- Velocity: High-speed data ingestion
- Variety: Many formats (structured, semi-structured, unstructured)
It can include historical stored data or real-time streaming data, enabling insights at scale.
Benefits of Azure HDInsight
- Cloud-native: Optimized clusters for Hadoop, Spark, LLAP, Kafka, Storm, HBase, and ML.
- Extensibility: Extend clusters with tools like Hue, Presto, or certified applications.
- Global availability: Available in more regions than competitors, including Azure Government, China, and Germany.
- Cost efficiency & scalability: Scale up/down and pay only for what you use.
- Monitoring: Integrated with Azure Monitor Logs.
- Productivity: Rich developer tools for Hadoop/Spark in Visual Studio, Eclipse, IntelliJ, with support for Scala, Python, R, Java, and .NET.
- Security & compliance: Integration with Azure AD, VNET, encryption.
Use Cases
- Batch processing (ETL): Extract, transform, and load structured/unstructured data for downstream analytics.
- Data science: Build apps that extract insights; integrate with Azure ML for predictive analytics.
- Data warehousing: Perform interactive queries at petabyte scale.
- Hybrid environments: Extend on-premises big data systems into Azure for advanced analytics.
- IoT: Real-time data ingestion and analysis from thousands of IoT devices.
Supported Cluster Types
- Apache Hadoop: HDFS + YARN + MapReduce for batch parallel processing.
- Apache Spark: In-memory distributed processing for faster analytics.
- Apache HBase: NoSQL database for unstructured/semi-structured data.
- Machine Learning Services: Distributed R processes for scalable analytics.
- Apache Storm: Real-time computation on data streams.
- Interactive Query (LLAP/Hive): In-memory caching for faster Hive queries.
- Apache Kafka: Stream processing and message queuing.
Technology Foundation
- Built on the Hortonworks Data Platform (HDP), leveraging Hadoop, Spark, Hive, Kafka, Storm, HBase, and LLAP.
- Supports distributed data storage and large-scale processing.
Programming Language Support
- Default: Java, Python, .NET, Go
- JVM-based: Clojure, Jython, Scala
- Hadoop-specific: Pig Latin, HiveQL, Spark SQL
- Extendable: Custom libraries and modules via script actions
Development Tools
- IDEs: IntelliJ, Eclipse, Visual Studio, VS Code
- Azure Toolkits: for IntelliJ, Eclipse, VS Code, Visual Studio
- BI Integration: Excel with Power Query or Microsoft Hive ODBC Driver
Data Residency
- Do not store data: Spark, Hadoop, LLAP, Storm, ML Services
- Store customer data: Kafka, HBase (within a single region)
This ensures in-region compliance with requirements such as those in the Azure Trust Center.
Azure Distributed Data Engineering Toolkit (AZTK)
Overview
The Azure Distributed Data Engineering Toolkit (AZTK) is a client-side tool for provisioning on-demand Spark clusters on Docker environments. It is not a server-side Azure service. AZTK provides full control over your infrastructure for Spark deployments.
Key features include:
- Command-line interface (CLI) and Python SDK for managing clusters.
- Built on Azure Batch for job scheduling and VM provisioning.
- Ability to bring your own Docker image, making setup reproducible.
- Cost-efficient: pay only for the cores you consume, with low-priority VMs providing up to an 80% discount.
- Mixed-mode clusters: supports Azure Blob Storage and Azure Data Lake.
Creating and Using Spark Clusters
Cluster Creation
To create a Spark cluster:
aztk spark cluster create --id <spark_cluster_id> --size <number_of_nodes> --vm-size <vm_type>
- spark_cluster_id: unique identifier for the cluster
- size: number of nodes
- vm-size: type of virtual machine
Submitting Jobs
Submit jobs to a cluster:
aztk spark cluster submit --id <spark_cluster_id> --name <job_name> --app <app_file> --app-parameters <params>
- Specify Python files, JARs, or other application artifacts.
- Pass parameters to your Spark job as needed.
Interactive Mode
Work interactively with Spark:
aztk spark cluster ssh --id <spark_cluster_id>
- Default port forwards:
  - Spark Web UI → localhost:8080
  - Jupyter → localhost:8888
- Ports are configurable via thunderbolt/ssh.yaml.
Managing Clusters
- Get cluster details:
aztk spark cluster get --id <spark_cluster_id>
- Delete a cluster:
aztk spark cluster delete --id <spark_cluster_id>
Summary
AZTK is a flexible and cost-effective toolkit for client-managed Spark deployments:
- Provides full infrastructure control on Docker and Azure Batch.
- Supports interactive workflows, batch jobs, and reproducible environments.
- Ideal for data engineers who need scalable, on-demand Spark clusters without relying on a managed service.
A data lake stores data without the structure or query optimization of a data warehouse. It typically holds a high volume of data spanning many types: a single data lake might contain blog posts stored as text files, flat-file extracts from a relational database, and JSON objects containing events generated by sensors in an industrial system. It can even hold structured data like a standard database, though it is not optimized for querying that data for reporting and analysis. There is a place for both data warehouses and data lakes in the same data ecosystem, and data pipelines often move data between the two.
