Microsoft Azure Data Factory

Azure Data Factory is Azure’s ETL cloud service for serverless data integration and scale-out data transformation. The service provides a code-free user interface for intuitive creation, monitoring, and management from a single console. You can also lift and shift existing SSIS packages to Azure and run them in ADF with full compatibility.


What Is Azure Data Factory?

In times of big data, raw, unorganized data is often stored in relational, non-relational, and other storage systems. On its own, however, this raw data lacks the context and meaning that analysts, data specialists, and business decision-makers need to use it effectively.

Turning these vast amounts of raw data into actionable business insights requires a service that orchestrates and operationalizes the underlying processes.

Azure Data Factory is a managed cloud service built for exactly these complex hybrid ETL (extract, transform, and load), ELT (extract, load, and transform), and data integration projects.

Azure Data Factory is a cloud-based ETL and data integration service for creating data-driven workflows that orchestrate data movement and transformation on demand. These workflows, called pipelines, can be scheduled and can ingest data from many different data stores. Users can build complex ETL processes that transform data visually with data flows or with compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.

Users can also publish the transformed data to data stores such as Azure Synapse Analytics, where business intelligence (BI) applications can consume it. In this way, Azure Data Factory helps organize raw data into meaningful data stores and data lakes that support better business decisions.
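To make "scheduled pipelines" more concrete, here is a minimal Python sketch that builds the JSON body of a schedule trigger running one pipeline daily. It assumes the general shape of ADF's `ScheduleTrigger` definition; the trigger name, pipeline name, and start time are hypothetical placeholders. In practice you would usually create triggers in the ADF UI or deploy them through ARM templates or an SDK rather than by hand.

```python
import json


def build_daily_trigger(trigger_name: str, pipeline_name: str, start_time: str) -> dict:
    """Return a dict shaped like an ADF schedule trigger that runs one pipeline daily."""
    return {
        "name": trigger_name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",  # recurrence unit
                    "interval": 1,  # every 1 day
                    "startTime": start_time,  # ISO 8601 timestamp
                    "timeZone": "UTC",
                }
            },
            # A trigger references the pipeline(s) it starts by name.
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": pipeline_name,
                        "type": "PipelineReference",
                    },
                    "parameters": {},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    trigger = build_daily_trigger(
        "DailyIngestTrigger", "IngestSalesPipeline", "2024-01-01T02:00:00Z"
    )
    print(json.dumps(trigger, indent=2))
```

Keeping the trigger separate from the pipeline mirrors how ADF decouples "what runs" from "when it runs", so one pipeline can be started by several triggers.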

How Does Azure Data Factory Work?

Data Factory consists of a set of interconnected systems that together form an end-to-end platform for data engineers. This visual guide provides an overview of the Data Factory architecture.

Azure Data Factory Accelerates Data Transformation


Data Factory provides a code-free data integration and transformation layer that supports all digital transformation initiatives.

  • With Azure Data Factory, both users without programming experience and data engineers can drive business-led and IT-led analytics and business intelligence.
  • In Azure Data Factory, you prepare data, build and run ETL processes, and orchestrate and monitor pipelines without writing code. The managed Apache Spark™ service takes care of code generation and maintenance.
  • Speed up transformation with intelligent, target-based mapping that automates copy activities.

Modernize SSIS with Azure Data Factory

Azure Data Factory helps organizations modernize SSIS.

  • Save up to 88 percent on costs with the Azure Hybrid Benefit.
  • Azure Data Factory is the only fully compatible service for easily moving all SSIS packages to the cloud.
  • The Deployment Wizard and detailed step-by-step documentation make migration easy.
  • Realize your vision for hybrid big data and data warehousing initiatives by combining them with cloud data pipelines in Azure Data Factory.

Azure Data Factory - Connectors


Collecting data from many different sources can be costly and time-consuming, and it sometimes requires multiple solutions. Azure Data Factory offers a single pay-as-you-go service instead:

  • Choose from more than 90 built-in connectors to collect data from big data sources such as Amazon Redshift, Google BigQuery, and HDFS; enterprise data warehouses such as Oracle Exadata and Teradata; SaaS applications such as Salesforce, Marketo, and ServiceNow; and all Azure data services.
  • Use the full capacity of the underlying network bandwidth, with throughput of up to 5 GB/s.
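The 5 GB/s figure is a ceiling, not a guarantee: actual throughput depends on the source, the sink, the network, and the integration runtime. A back-of-the-envelope calculation still puts it in perspective. This Python sketch assumes decimal units (1 TB = 1,000 GB) and a throughput sustained at its peak, which real copy jobs rarely achieve, so the result is only a lower bound.

```python
def min_copy_seconds(data_gb: float, throughput_gb_per_s: float = 5.0) -> float:
    """Lower bound on copy time, assuming the peak throughput is sustained throughout."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return data_gb / throughput_gb_per_s


if __name__ == "__main__":
    one_tb_in_gb = 1_000  # decimal units
    seconds = min_copy_seconds(one_tb_in_gb)
    print(f"1 TB at 5 GB/s takes at least {seconds:.0f} s (~{seconds / 60:.1f} min)")
```

Under these assumptions, copying 1 TB takes at least 200 seconds; any throttling at the source or sink pushes the real figure higher.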

Azure Synapse Analytics and Azure Data Factory

With Azure Data Factory, collect data from on-premises, hybrid, and multicloud sources, then transform it in Azure Synapse Analytics.

  • Integrate data in the familiar Data Factory interface within Azure Synapse pipelines.
  • Transform and analyze data with data flows in Azure Synapse Studio without any programming.

Azure Data Factory - Global Cloud Centers

  • Access Azure Data Factory in more than 25 regions worldwide to meet data compliance requirements, improve efficiency, and keep outbound network traffic costs low.
  • Azure Data Factory has been certified for HIPAA, HITECH, ISO/IEC 27001, ISO/IEC 27018, and CSA STAR.
  • Connect securely to Azure data services using a managed identity or a service principal.
  • Store credentials in Azure Key Vault. A managed virtual network provides an isolated, highly secure environment for running data integration pipelines.
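As a sketch of how Key Vault keeps credentials out of pipeline definitions, the following Python builds two linked-service definitions: one for the vault itself and one for an Azure SQL Database whose connection string is only a reference to a Key Vault secret. It assumes the general shape of ADF's `AzureKeyVaultSecret` reference; the vault URL, linked-service names, and secret name are hypothetical, and the factory's managed identity would still need to be granted access to the vault.

```python
import json


def key_vault_linked_service(name: str, vault_url: str) -> dict:
    # ADF reads secrets from this vault using its managed identity
    # (access to the vault must be granted separately).
    return {
        "name": name,
        "properties": {
            "type": "AzureKeyVault",
            "typeProperties": {"baseUrl": vault_url},
        },
    }


def sql_linked_service_with_secret(name: str, vault_ls_name: str, secret_name: str) -> dict:
    # The connection string itself is never stored in ADF, only a pointer to the secret.
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlDatabase",
            "typeProperties": {
                "connectionString": {
                    "type": "AzureKeyVaultSecret",
                    "store": {"referenceName": vault_ls_name, "type": "LinkedServiceReference"},
                    "secretName": secret_name,
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    kv = key_vault_linked_service("KeyVaultLS", "https://contoso-vault.vault.azure.net/")
    sql = sql_linked_service_with_secret("SalesSqlLS", "KeyVaultLS", "sales-sql-connection-string")
    print(json.dumps([kv, sql], indent=2))
```

Because the SQL linked service only names the secret, rotating the password in Key Vault does not require redeploying the factory.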

The Benefits of Azure Data Factory

User-friendly

With Azure Data Factory, users can rehost SQL Server Integration Services (SSIS) with a few clicks, and create code-free ETL/ELT pipelines with built-in Git and CI/CD support.

Cost-effective

With Azure Data Factory, use a fully managed, serverless cloud service that scales on demand and bills on a pay-as-you-go basis.

Powerful

Azure Data Factory provides more than 90 built-in connectors to capture all on-premises and software-as-a-service (SaaS) data. Leverage on-demand orchestration and monitoring.

Intelligent

Azure Data Factory provides autonomous ETL to increase operational efficiency and support integrators without programming experience.

areto’s Microsoft Azure Reference Architecture

The reference architecture developed by areto offers many advantages.

areto’s reference architecture gives customers architectural best practices for developing and operating reliable, secure, efficient, and cost-effective systems in the cloud. areto’s architectural solutions are consistently measured against Microsoft best practices to deliver the greatest benefit to customers.

The areto reference architecture is based on five pillars: operational excellence, security, reliability, performance efficiency, and cost optimization.

Operational Excellence
Optimal design of operation and monitoring of the systems as well as continuous improvement of supporting processes and procedures

Security
Protection of information, systems, and assets through risk assessments and risk mitigation strategies

Cost optimization 
Maximizing ROI through the continuous process of improving the system throughout its lifecycle.  

Reliability
Disaster recovery and business continuity, ensured by mirroring data in multiple redundant locations.

Performance efficiency
Efficient use of computing resources, scalability to handle short-term peaks in demand, and sustainability

Why Microsoft?


Gartner, Magic Quadrant for Cloud Infrastructure & Platform Services, Raj Bala, Bob Gill, Dennis Smith, Kevin Ji, David Wright, 27 July 2021. Gartner and Magic Quadrant are registered trademarks of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.


This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Microsoft. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Become a data-driven company with areto’s Microsoft expert team!

Overtake the competition by making faster and better decisions!

Find out where your company currently stands on the way to becoming a data-driven company.
We analyze the status quo and show you what potential exists.
How do you want to get started?

Free consultation & demo appointments

Do you already have a strategy for your future Microsoft data analytics solution? Are you already taking advantage of modern cloud platforms and automation? We would be happy to show you examples of how our customers are already using areto’s agile and scalable Microsoft solutions.

Workshops / Coachings

In our Microsoft workshops and coaching sessions, you will gain the necessary know-how, e.g. for setting up a modern cloud strategy or IBCS-compliant reporting with Power BI. The areto Microsoft TrainingCenter offers a wide range of learning content.

Proofs of Concept

Is Azure right for us? Are the conditions right for it? What prerequisites need to be in place? Proofs of concept (POCs) answer these and other questions, so you are well prepared for your project.

Microsoft Azure Data Factory Know-How Video Library

Azure Data Factory and Customer Churn Story

Azure Data Factory Stringify data flow transformation

How to delete missing source rows from your target database using data flows

Azure Data Factory Data Flows for SQL Developers

Data Wrangling in Microsoft Azure Data Factory

PaaSport to Paradise: Azure SQL Database + SSIS in Azure Data Factory

Leverage your data. Discover opportunities. Gain new insights.

We look forward to speaking with you!

Florian Grell areto

Short introduction to Azure Data Factory

Establishing a connection / collecting data

Companies store many types of data in different sources (on-premises and in the cloud; structured, unstructured, and semi-structured), and this data typically arrives at different intervals and speeds.

The first step in building a production information system is connecting to all the required data and processing sources, such as SaaS (software-as-a-service) services, databases, file shares, and FTP web services. The next step is moving the data to a central location for further processing. Without Data Factory, organizations must build custom data movement components or write custom services to integrate these data sources and processing. Integrating and maintaining such systems is expensive and time-consuming, and these custom solutions often lack the monitoring, alerting, and control capabilities that a fully managed service provides.

With Data Factory, you can use the Copy activity in a data pipeline to move data from both on-premises and cloud source data stores to a central data store in the cloud for further analysis. For example, you can collect data in Azure Data Lake Storage and later transform it with the Azure Data Lake Analytics compute service, or collect data in Azure Blob Storage and later transform it with Azure HDInsight Hadoop clusters.
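A pipeline with a single Copy activity can be sketched as a Python dict mirroring ADF's JSON. The sketch assumes delimited-text files as the source and Parquet as the sink (so it uses the `DelimitedTextSource` and `ParquetSink` types); the pipeline and dataset names are hypothetical, and real source and sink settings depend on the connectors you choose.

```python
import json


def dataset_ref(name: str) -> dict:
    # Activities point at datasets by name, not by embedding them.
    return {"referenceName": name, "type": "DatasetReference"}


def copy_pipeline(pipeline_name: str, source_ds: str, sink_ds: str) -> dict:
    """Return a dict shaped like an ADF pipeline containing one Copy activity."""
    copy_activity = {
        "name": f"Copy_{source_ds}_to_{sink_ds}",
        "type": "Copy",
        "inputs": [dataset_ref(source_ds)],
        "outputs": [dataset_ref(sink_ds)],
        "typeProperties": {
            # Assumed formats: CSV-style files in, Parquet files out.
            "source": {"type": "DelimitedTextSource"},
            "sink": {"type": "ParquetSink"},
        },
    }
    return {"name": pipeline_name, "properties": {"activities": [copy_activity]}}


if __name__ == "__main__":
    # Hypothetical names: raw CSV exports in Blob Storage copied to a data lake.
    pipeline = copy_pipeline("IngestSalesPipeline", "SalesCsvInBlob", "SalesParquetInLake")
    print(json.dumps(pipeline, indent=2))
```

The copy step only moves data into the central store; transformation happens in later activities, which keeps ingestion reusable across downstream processing.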

Transform / Enrich

Once the data sits in a centralized data store in the cloud, you can process or transform it with ADF mapping data flows. Data flows let data engineers build and maintain data transformation graphs that run on Spark, without needing to understand Spark clusters or Spark programming.
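Inside a pipeline, a mapping data flow runs through an Execute Data Flow activity. This Python sketch builds such an activity and chains it after an upstream step so it only runs once that step has succeeded. It assumes ADF's `ExecuteDataFlow` activity shape; the activity names, the data flow name, and the compute sizing are hypothetical example values.

```python
import json
from typing import Iterable, Optional


def execute_data_flow_activity(
    activity_name: str,
    data_flow_name: str,
    depends_on: Optional[Iterable[str]] = None,
) -> dict:
    """Return a dict shaped like an ADF Execute Data Flow activity (sketch)."""
    activity = {
        "name": activity_name,
        "type": "ExecuteDataFlow",
        "typeProperties": {
            # The data flow itself is authored visually and referenced by name.
            "dataflow": {"referenceName": data_flow_name, "type": "DataFlowReference"},
            # ADF provisions the Spark cluster; this sizing is an assumed example.
            "compute": {"computeType": "General", "coreCount": 8},
        },
    }
    if depends_on:
        # Run only after each upstream activity has succeeded.
        activity["dependsOn"] = [
            {"activity": upstream, "dependencyConditions": ["Succeeded"]}
            for upstream in depends_on
        ]
    return activity


if __name__ == "__main__":
    # Hypothetical names: clean the sales data after the raw copy finishes.
    activity = execute_data_flow_activity(
        "CleanSales", "CleanSalesDataFlow", depends_on=["CopyRawSales"]
    )
    print(json.dumps(activity, indent=2))
```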

If you prefer to code transformations by hand, ADF supports external activities that run your transformations on compute services such as HDInsight Hadoop, Spark, Data Lake Analytics, and Machine Learning.
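As one example of an external activity, the sketch below builds a Databricks Notebook activity in Python, using Azure Databricks (one of the Spark-based compute services mentioned earlier). It assumes ADF's `DatabricksNotebook` activity shape; the linked-service name, notebook path, and parameter names are hypothetical. The transformation logic lives in the notebook, and ADF only orchestrates when and with which parameters it runs.

```python
import json
from typing import Mapping, Optional


def databricks_notebook_activity(
    activity_name: str,
    databricks_linked_service: str,
    notebook_path: str,
    parameters: Optional[Mapping[str, str]] = None,
) -> dict:
    """Return a dict shaped like an ADF Databricks Notebook activity (sketch)."""
    return {
        "name": activity_name,
        "type": "DatabricksNotebook",
        # The linked service holds the workspace connection and cluster settings.
        "linkedServiceName": {
            "referenceName": databricks_linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,  # path inside the Databricks workspace
            # Values passed to the notebook as widget parameters.
            "baseParameters": dict(parameters or {}),
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    activity = databricks_notebook_activity(
        "TransformSales", "DatabricksLS", "/Shared/clean_sales", {"run_date": "2024-01-01"}
    )
    print(json.dumps(activity, indent=2))
```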

CI/CD and publishing

Data Factory offers full CI/CD support for your data pipelines through Azure DevOps and GitHub, so you can develop and deploy your ETL processes incrementally before publishing the finished product. Once the raw data is in an enterprise-ready format, load it into Azure Synapse Analytics (formerly Azure SQL Data Warehouse), Azure SQL Database, Azure Cosmos DB, or another analytics engine that your users can query from their business intelligence tools.
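In a typical CI/CD setup, the factory is promoted from development to test and production as an ARM template, with environment-specific values supplied through a parameters file. The Python sketch below generates such a file in the standard ARM deployment-parameters format from shared defaults plus per-environment overrides, and rejects overrides for parameters that do not exist. `factoryName` is the parameter ADF-generated templates use for the factory name; the other parameter name and all values are hypothetical.

```python
import json

ARM_PARAMS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#"
)


def parameter_file(values: dict) -> dict:
    """Wrap plain values in the ARM deployment-parameters file format."""
    return {
        "$schema": ARM_PARAMS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in values.items()},
    }


def environment_parameters(base: dict, overrides: dict) -> dict:
    """Merge environment-specific overrides over shared defaults.

    Unknown keys are rejected so a typo cannot silently leave a dev value in prod.
    """
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return parameter_file({**base, **overrides})


if __name__ == "__main__":
    # Hypothetical values for a development factory.
    base = {
        "factoryName": "adf-sales-dev",
        "KeyVaultLS_properties_typeProperties_baseUrl": "https://kv-sales-dev.vault.azure.net/",
    }
    prod = environment_parameters(
        base,
        {
            "factoryName": "adf-sales-prod",
            "KeyVaultLS_properties_typeProperties_baseUrl": "https://kv-sales-prod.vault.azure.net/",
        },
    )
    print(json.dumps(prod, indent=2))
```

Generating one file per environment from a single base keeps the pipeline definitions identical across stages, so only connection details change on promotion.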

Monitor

After you have created and deployed your data integration pipeline and are getting business value from the refined data, you can monitor scheduled activities and pipelines for success and failure rates. Azure Data Factory provides built-in pipeline monitoring through Azure Monitor, the API, PowerShell, Azure Monitor logs, and health panels in the Azure portal.
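Once pipeline-run records have been exported (for example from Azure Monitor logs or the API), computing failure rates per pipeline is simple. This Python sketch assumes each record is a dict with `pipelineName` and `status` keys, loosely modeled on ADF pipeline-run records; the sample pipeline names are hypothetical. Only finished runs (`Succeeded` or `Failed`) count, so queued, in-progress, and cancelled runs do not distort the rate.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def failure_rates(runs: Iterable[dict]) -> Dict[str, float]:
    """Return {pipeline_name: failed / (succeeded + failed)} over finished runs."""
    finished: Dict[str, Counter] = defaultdict(Counter)
    for run in runs:
        status = run["status"]
        if status in ("Succeeded", "Failed"):
            finished[run["pipelineName"]][status] += 1
    # Every pipeline in `finished` has at least one counted run, so no division by zero.
    return {
        name: counts["Failed"] / (counts["Succeeded"] + counts["Failed"])
        for name, counts in finished.items()
    }


if __name__ == "__main__":
    # Hypothetical run history.
    sample = [
        {"pipelineName": "IngestSales", "status": "Succeeded"},
        {"pipelineName": "IngestSales", "status": "Failed"},
        {"pipelineName": "IngestSales", "status": "Succeeded"},
        {"pipelineName": "IngestSales", "status": "InProgress"},
        {"pipelineName": "CleanSales", "status": "Succeeded"},
    ]
    for name, rate in failure_rates(sample).items():
        print(f"{name}: {rate:.0%} failed")
```

With this sample, `IngestSales` has one failure out of three finished runs and `CleanSales` has none.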

General concepts

An Azure subscription can contain one or more Azure Data Factory instances (data factories). Azure Data Factory consists of the following main components:

  • Pipelines
  • Activities
  • Datasets
  • Linked services
  • Data flows
  • Integration Runtimes

Together, these components provide the platform on which you assemble data-driven workflows with steps for moving and transforming data.
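The components reference one another in a chain: a pipeline groups activities, activities read and write datasets, each dataset points at a linked service that holds the connection, and a linked service can name the integration runtime it runs on. The Python sketch below checks that every such reference resolves, which shows how the pieces fit together. The dict shapes loosely follow ADF's JSON (data flows are not modeled), and all component names in the example are hypothetical.

```python
from typing import Iterable, List


def find_broken_references(
    linked_services: List[dict],
    datasets: List[dict],
    pipelines: List[dict],
    integration_runtimes: Iterable[str] = (),
) -> List[str]:
    """Return a message for every reference that does not resolve to a defined component."""
    ls_names = {ls["name"] for ls in linked_services}
    ds_names = {ds["name"] for ds in datasets}
    ir_names = set(integration_runtimes)
    problems = []

    # Linked services may name an integration runtime via connectVia;
    # if omitted, the default Azure integration runtime is used.
    for ls in linked_services:
        ir = ls["properties"].get("connectVia")
        if ir and ir["referenceName"] not in ir_names:
            problems.append(f"linked service {ls['name']}: unknown IR {ir['referenceName']}")

    # Each dataset points at the linked service that holds its connection.
    for ds in datasets:
        target = ds["properties"]["linkedServiceName"]["referenceName"]
        if target not in ls_names:
            problems.append(f"dataset {ds['name']}: unknown linked service {target}")

    # Activities inside a pipeline consume (inputs) and produce (outputs) datasets.
    for pl in pipelines:
        for act in pl["properties"]["activities"]:
            for ref in act.get("inputs", []) + act.get("outputs", []):
                if ref["referenceName"] not in ds_names:
                    problems.append(
                        f"activity {act['name']}: unknown dataset {ref['referenceName']}"
                    )
    return problems


if __name__ == "__main__":
    # Hypothetical factory: copy an on-premises SQL Server table to Parquet in a data lake.
    ir_ref = {"referenceName": "SelfHostedIR", "type": "IntegrationRuntimeReference"}
    linked = [
        {"name": "OnPremSqlLS", "properties": {"type": "SqlServer", "connectVia": ir_ref}},
        {"name": "LakeLS", "properties": {"type": "AzureBlobFS"}},
    ]
    datasets = [
        {"name": "OrdersTable", "properties": {
            "type": "SqlServerTable",
            "linkedServiceName": {"referenceName": "OnPremSqlLS", "type": "LinkedServiceReference"}}},
        {"name": "OrdersParquet", "properties": {
            "type": "Parquet",
            "linkedServiceName": {"referenceName": "LakeLS", "type": "LinkedServiceReference"}}},
    ]
    pipelines = [
        {"name": "CopyOrders", "properties": {"activities": [{
            "name": "CopyOrdersToLake",
            "type": "Copy",
            "inputs": [{"referenceName": "OrdersTable", "type": "DatasetReference"}],
            "outputs": [{"referenceName": "OrdersParquet", "type": "DatasetReference"}],
        }]}},
    ]
    print(find_broken_references(linked, datasets, pipelines, ["SelfHostedIR"]) or "all references resolve")
```

Here the self-hosted integration runtime is what lets the cloud service reach the on-premises SQL Server, while the data lake connection uses the default Azure runtime.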