Cloud usage continues to grow as agile development, rapid deployment, and unlimited scale become vital competitive factors for businesses. areto helps you build an optimal cloud architecture and select from more than 200 AWS cloud services, such as AWS Lambda, Amazon S3, Amazon Kinesis, and Amazon Redshift. We identify and develop use cases that help organizations respond faster and reduce their costs.

Amazon Web Services
(AWS)

What is AWS (Amazon Web Services)?

Amazon Web Services (AWS) is the world’s most comprehensive cloud platform with more than 200 services. Extensive features, globally distributed data centers, and high reliability make AWS the market leader. Our customers rely on AWS to become more agile, reduce costs, and innovate faster.

AWS - The Way to the Cloud

The path to the cloud is an individual challenge for every company. Success depends (as always) on careful preparation: an accurate, realistic inventory of the organization’s current state, and a common understanding of the goal and of the path required to achieve it. With this knowledge, organizations can set goals and create workflows that enable employees, and therefore the company, to succeed in the cloud.

areto’s AWS strategy workshops involve all relevant stakeholders. Companies can leverage these workshops to start or complement their successful “digital journeys”. In the first phase of the workshop, we examine the awareness of what is possible with AWS. We highlight the analytics capabilities, examine existing processes, define necessary work streams, and identify dependencies between these work streams. This allows us to optimize both the structure of the AWS architecture and the underlying data. The areto AWS workshops show you which organizational skills you need to update, how to change existing processes, and how to introduce new ones.

AWS Well-Architected Framework

areto follows AWS’s Well-Architected Framework.

Following the AWS Well-Architected Framework, areto applies architectural best practices to develop and operate reliable, secure, efficient, and cost-effective systems in the cloud. areto’s architectures are consistently measured against AWS best practices to deliver the highest value to customers.

The AWS Well-Architected Framework is based on five pillars: operational excellence, security, reliability, performance efficiency, and cost optimization.

Operational excellence
optimal design of the operation and monitoring of systems, and continuous improvement of supporting processes and procedures

Security
protection of information, systems, and assets, including risk assessments and risk mitigation strategies

Reliability
ensuring recovery from disruptions, disaster recovery, and business continuity, with data mirrored in multiple redundant locations

Performance efficiency
efficient use of computing resources, scalability to meet short-term demand peaks, and sustainability

Cost optimization
maximizing ROI through continuous improvement of the system throughout its lifecycle

AWS - Architecture Best Practices

For analytics workloads and environments, AWS offers several core components that areto uses to develop robust architectures for customers’ analytics applications. Guided by use cases and AWS architecture blueprints, we select the AWS services that have proven their value in individual customer projects.

areto’s approach of building AWS architectures in conceptual layers enables us to design individually suitable access controls, pipelines, and ELT flows (extract, load, transform) for data integration. Below, we use our AWS standard architectures to illustrate areto’s procedure for setting up an analytics workload on AWS services.

Data Ingestion Layer

The Data Ingestion Layer is responsible for ingesting data into the central storage for analysis, i.e. the data lake. It consists of services that ingest records in batch or real-time streaming mode from external sources such as website clickstreams, database event streams, social media feeds, and on-premises or cloud-native data stores. AWS provides services for collecting real-time data, as well as features for securely loading and analyzing streaming data. AWS also provides services for streaming data to Amazon Simple Storage Service (Amazon S3) for long-term storage. Alternatively, Amazon Managed Streaming for Apache Kafka (Amazon MSK), a fully managed service, lets users run highly available, secure Apache Kafka clusters to process streaming data without modifying their existing code base.
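As a sketch of how such streaming ingestion might look in code, the snippet below builds a batch of Kinesis `PutRecords` entries from raw clickstream events. The stream name and event fields are hypothetical examples, not part of the original text:

```python
import json


def build_put_records_entries(events, partition_key_field="user_id"):
    """Convert raw clickstream events into Kinesis PutRecords entries.

    Each entry needs a binary Data blob and a PartitionKey that
    distributes records across the stream's shards.
    """
    return [
        {
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event[partition_key_field]),
        }
        for event in events
    ]


# Sending the batch requires AWS credentials and an existing stream
# ("clickstream-ingest" is a placeholder name):
# import boto3
# boto3.client("kinesis").put_records(
#     StreamName="clickstream-ingest",
#     Records=build_put_records_entries(events),
# )
```

Partitioning by a user identifier keeps events for the same user on the same shard, which preserves their relative ordering downstream.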

AWS Database Migration Service (AWS DMS) allows users to replicate and ingest existing databases while keeping the source databases fully operational. The service supports multiple database sources and destinations, including writing data directly to Amazon S3.


Streaming Data Analytics Reference Architecture; Source AWS


Data Access and Security Layer

The Data Access and Security Layer grants access to data assets while protecting them, ensuring that all data is stored securely and that only authorized people have access. This layer allows:

  • Secure data access to the central data repository (i.e., the data lake)
  • Secure access to the central data catalog
  • Fine-grained access control to Data Catalog databases, tables, and columns
  • Encryption of data sets

AWS Identity and Access Management (IAM) securely manages access to AWS services and resources. With IAM, you can create and manage AWS users and groups, and use permissions to allow or deny their access to AWS resources.
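To illustrate the kind of fine-grained permission IAM manages, the sketch below builds a minimal least-privilege policy document granting read-only access to one S3 prefix. The bucket and prefix names are hypothetical:

```python
import json


def s3_read_only_policy(bucket, prefix):
    """Least-privilege IAM policy: list and read only one S3 prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                # Restrict listing to the chosen prefix only.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Registering the policy would use the IAM API (requires credentials):
# import boto3
# boto3.client("iam").create_policy(
#     PolicyName="analytics-read-raw",
#     PolicyDocument=json.dumps(s3_read_only_policy("my-data-lake", "raw")),
# )
```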

With AWS CloudTrail, organizations can log, continuously monitor, and retain account activity related to data access actions of users and roles in their AWS infrastructure.

AWS Lake Formation is an integrated data lake service that makes it easy to cleanse, catalog, transform, secure, and make data available for analytics or machine learning (ML). Lake Formation provides its own permissions model that extends the AWS IAM permissions model to configure data access and security policies for data lakes, and to audit and control access from AWS analytics and ML services. This centrally defined authorization model enables fine-grained access to data stored in data lakes through a simple grant/revoke mechanism.
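A sketch of such a fine-grained grant, expressed as parameters for Lake Formation’s `grant_permissions` API. The principal ARN, database, table, and column names are hypothetical examples:

```python
def lake_formation_grant(principal_arn, database, table, columns):
    """Parameters for lakeformation.grant_permissions: column-level
    SELECT for one principal on one Data Catalog table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                # Only these columns become visible to the principal.
                "ColumnNames": columns,
            }
        },
        "Permissions": ["SELECT"],
    }


# Applying the grant (requires credentials and a registered data lake):
# import boto3
# boto3.client("lakeformation").grant_permissions(
#     **lake_formation_grant(
#         "arn:aws:iam::123456789012:role/analyst",
#         "sales", "orders", ["order_id", "amount"],
#     )
# )
```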

Data Catalog and Search layer

The Catalog and Search Layer manages the discovery and cataloging of the metadata of the analytics workload’s data assets. This layer also provides search capabilities as the data sets grow in size. Scenarios in which users want to find a table based on individually defined criteria and extract subsets of data are quite common in analytics applications.

AWS Glue is a fully managed extract, transform, load (ETL) service that is often used in projects to prepare and load data for analysis. With AWS Glue, customers can reference their data stored in AWS. AWS Glue discovers this data and stores associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once this data is cataloged, it is immediately searchable, queryable and available for ETL.
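For illustration, the parameters for a Glue crawler that discovers data in an S3 path and registers the resulting tables in the Data Catalog might look like the sketch below. The crawler name, role ARN, path, database, and schedule are hypothetical choices:

```python
def glue_crawler_config(name, role_arn, s3_path, database):
    """Parameters for glue.create_crawler: crawl an S3 path and
    register discovered tables in the given Data Catalog database."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Re-crawl nightly so new partitions are picked up.
        "Schedule": "cron(0 3 * * ? *)",
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "DEPRECATE_IN_DATABASE",
        },
    }


# Creating and starting the crawler (requires credentials):
# import boto3
# glue = boto3.client("glue")
# glue.create_crawler(**glue_crawler_config(
#     "raw-crawler", "arn:aws:iam::123456789012:role/glue-crawler",
#     "s3://my-data-lake/raw/", "raw_db"))
# glue.start_crawler(Name="raw-crawler")
```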

With Amazon Elasticsearch Service, fully managed Elasticsearch clusters can be deployed to the AWS Cloud to explore data assets.


Central Storage Layer

The central Storage Layer – the data lake – supports the storage of all data types (unstructured, semi-structured, and structured) and makes them available to applications. As the amount of data grows over time, this layer should scale elastically in a secure yet cost-efficient manner.

When data is processed, intermediate results are stored to avoid unnecessary duplication of work in the processing pipeline, and users can reuse these intermediate results. Depending on the application, data can be updated frequently and stored either temporarily or long-term.

Amazon S3 provides an optimal foundation for centralized storage with virtually unlimited scalability, native encryption, and access control capabilities. As users’ data storage requirements grow over time, selected data can be moved via lifecycle policies to lower-cost tiers such as S3 Standard-Infrequent Access or Amazon S3 Glacier. This reduces storage costs while preserving the original raw data. In our projects, we also use S3 Intelligent-Tiering, which automatically optimizes storage costs as data access patterns change, without performance impact or operational overhead.
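A minimal sketch of such a lifecycle policy for an S3 bucket. The `raw/` prefix and the 30- and 180-day thresholds are illustrative assumptions, not values prescribed by the text:

```python
def lifecycle_configuration(prefix="raw/"):
    """S3 lifecycle rules: move aging objects to cheaper storage
    tiers while keeping the original raw data available."""
    return {
        "Rules": [
            {
                "ID": "tier-down-raw-data",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    # After 30 days: infrequent-access tier.
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # After 180 days: archival tier.
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


# Applying it to a bucket (requires credentials):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-data-lake",
#     LifecycleConfiguration=lifecycle_configuration(),
# )
```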


Amazon S3 makes it easy to build a multi-tenant environment where many users can apply their own data analysis tools to a common data set. This reduces costs while improving data management compared to traditional solutions, which usually require multiple, distributed copies of the data. To provide easy access, Amazon S3 provides RESTful APIs.

With Amazon S3, your data lake can decouple storage from computing power and data processing. In traditional Hadoop or data warehouse solutions, storage and computing power are tightly coupled, which makes it difficult to reduce costs and to optimize data processing. With Amazon S3, you can store all data types in their native formats and use as many or as few virtual servers for data processing as you want. Users can also integrate serverless solutions such as AWS Lambda, Amazon Athena, Amazon Redshift Spectrum, Amazon Rekognition, and AWS Glue, which allow them to process data without provisioning or managing servers.


Processing and Analytics Layer

The Processing and Analytics Layer contains the services for querying and processing (i.e. cleansing, validating, transforming, enriching, and normalizing) the records.

Below are examples of the services we use in our standard architectures. Several other services can also be used for processing or analysis, including Amazon Kinesis, Amazon RDS, Apache Kafka, and AWS Glue.

Amazon Redshift is a fully managed data warehouse that simplifies the analysis of data using standard SQL and existing business intelligence (BI) tools. Redshift Spectrum is a feature of Amazon Redshift that allows you to run queries against exabytes of unstructured data in Amazon S3 without the need for loading or ETL.

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, meaning there is no need to manage infrastructure, users only pay for executed queries. Athena integrates with AWS Glue Data Catalog, allowing users to create a unified metadata repository for different services, crawl data sources to discover schemas, populate the Data Catalog with new or modified table/partition definitions, and maintain schema versioning.
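As a sketch of how such a query might be launched via Athena’s `start_query_execution` API: Athena is asynchronous, so the caller then polls `get_query_execution` until the query succeeds and reads the results from S3. The database and results-bucket names are hypothetical:

```python
def athena_query_params(sql, database, results_bucket):
    """Parameters for athena.start_query_execution. Results land
    as CSV files in the given S3 output location."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {
            "OutputLocation": f"s3://{results_bucket}/athena-results/"
        },
    }


# Running the query (requires credentials and a cataloged table):
# import boto3
# athena = boto3.client("athena")
# execution = athena.start_query_execution(**athena_query_params(
#     "SELECT page, COUNT(*) FROM clicks GROUP BY page",
#     "raw_db", "my-results-bucket"))
# # ...then poll athena.get_query_execution(
# #        QueryExecutionId=execution["QueryExecutionId"])
```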

With Amazon Neptune, fast, reliable, fully managed graph databases can be created. Graph databases make it easier to build and run applications that work with highly connected records.

Amazon SageMaker is a fully managed machine learning platform that enables developers and data scientists to build, train, and deploy machine learning models at any scale quickly and easily.

User Access and Interface Layer

The User Access and Interface Layer provides secure user access and an administrative interface for managing users.

With AWS Lambda, you can run serverless applications on a managed platform that supports microservices architectures as well as function-level deployment and management.

Amazon API Gateway enables organizations to run a fully managed REST API that integrates with AWS Lambda. It includes traffic management, authorization and access control, monitoring, and API versioning. For example, you can create a data lake API with API Gateway that receives requests over HTTPS. When an API request is made, Amazon API Gateway uses a custom authorizer (a Lambda function) to ensure that all requests are authorized before the data is delivered.
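As an illustrative sketch of such a custom authorizer, the Lambda handler below checks a static bearer token and returns the IAM policy document that API Gateway expects. The token and principal are placeholder values for demonstration, not a production pattern:

```python
def authorizer_handler(event, context):
    """Minimal Lambda token authorizer: allow the request only when
    the caller presents the expected bearer token."""
    token = event.get("authorizationToken", "")
    effect = "Allow" if token == "Bearer secret-demo-token" else "Deny"
    return {
        "principalId": "data-lake-user",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    # Scope the decision to the method being invoked.
                    "Resource": event["methodArn"],
                }
            ],
        },
    }
```

API Gateway invokes this function before the backend integration; a `Deny` policy causes the request to be rejected with a 403 before any data is touched.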

Amazon Cognito adds user sign-up, sign-in, and data synchronization to serverless applications. Amazon Cognito user pools provide built-in sign-in screens and federation with Facebook, Google, and Amazon, as well as SAML identity providers.


Image source: AWS “Unite Real-Time and Batch Analytics Using the Big Data Lambda Architecture, Without Servers!”

 

Sports Analytics with AWS

AWS Sports Analytics Solutions for the Bundesliga

The German Bundesliga (DFL) uses AWS services for artificial intelligence (AI), machine learning (ML), analytics, database administration, and storage to provide real-time statistics, predict future match outcomes, and recommend personalized game footage to fans on mobile, online, streaming, and television broadcasts.

Beat the Bookie – the blog of our colleague André Doerr.

After almost five years of sports betting without any profit, André started reading articles about sports betting. He quickly realized that he had no chance of beating the bookie and making a profit without statistical analysis.

As a DWH architect as well as a certified Data Vault 2.0 Practitioner, he has developed a concept for an analytics system that automates all these statistical tasks.

In his blog, he shows ways to build such an analytical system. He explains the technical part, how to collect, model and analyze data from the web. In the process, he takes a look at various aspects of sports betting.

Automate your betting models with AWS

What does my typical betting weekend look like when I start checking whether there are any interesting matches? I start my laptop, open the browser, start my Python program, start the database, and after a few minutes I can start my data processing, which collects all the data and calculates the predictions. That’s already great, but wouldn’t it be even better if all predictions were always up to date? This blog shows you how to set up and run a small automated data pipeline in AWS that extracts all stats from Understat.com. more…

SAP-Integration with Theobald Software

Xtract Universal from Theobald Software is a flexible stand-alone solution for integrating SAP data.

With the Xtract Universal Designer, users can connect to one or more SAP systems and configure SAP extractions with just a few mouse clicks – without programming or scripting. Preview and log functionality supports the development of the various SAP data extracts.

Once the data is extracted from SAP, it can be integrated directly into one of over 20 supported target environments, including Amazon Redshift and Amazon S3.

Your SAP extractions can easily be modified to include new data elements or directed to other destinations to meet your changing business needs. This eliminates the need for expensive and time-consuming ABAP programming.

More about Theobald Software.

Theobald Software SAP Amazon S3 connector

Xtract Universal supports both bulk and incremental extraction of your SAP data. Multiple extractions can run simultaneously and can be fully automated and monitored. With built-in security, companies retain complete control over who can access their sensitive SAP data.

Become a data driven company with areto cloud experts!

Overtake the competition by making faster and better decisions!

Find out where your company currently stands on the way to becoming a data-driven company.
We analyze the status quo and show you what potential exists.
How do you want to get started?

Free consultation & demo appointments

Do you already have a strategy for your future cloud solution? Are you already taking advantage of modern cloud platforms and automation? We would be happy to show you examples of how our customers are already using areto’s agile and scalable cloud computing solutions.

Workshops / Coachings

Our cloud workshops and coaching sessions provide you with the know-how needed to build a modern cloud strategy. The areto Cloud Training Center offers a wide range of learning content.

Proof of Concepts

Which cloud architecture is right for us? Are the general conditions suitable? Which prerequisites need to be in place? Proofs of concept (PoCs) answer these and other questions, so you start your project well prepared.

Why AWS ?


Gartner, Magic Quadrant for Cloud Infrastructure & Platform Services, Raj Bala, Bob Gill, Dennis Smith, Kevin Ji, David Wright, 27 July 2021. Gartner and Magic Quadrant are registered trademarks of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.


This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from AWS. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Help for self-help -
areto consultation hours

Book a support appointment with one of our AWS experts! Get quick solution approaches and best practices for your concrete problems with the market-leading AWS services!

Costs

0.5 hours – 110 €
1.0 hours – 200 €
2.0 hours – 350 €

Amazon Web Services (AWS) consultation hour

The AWS consultation hour gives you the opportunity to get support for small and large questions at short notice. Benefit from our experts’ experience in solving your problem, so you can quickly get back to your actual work.

AWS Know-How Video Library

Introduction to Amazon S3 (Amazon Simple Storage Service)

Introduction to the AWS Database Migration Service

Introduction to AWS Identity and Access Management (IAM)

Introduction to Amazon CloudTrail

Introduction to AWS Lake Formation

Introduction to AWS Glue

Introduction to Amazon Elasticsearch Service

Introduction to Amazon Neptune

Introduction to Amazon EMR

Introduction to AWS Lambda

Changing the Game with Machine Learning

Leverage your data. Discover opportunities. Gain new insights.

We look forward to hearing from you


Till Sander
CTO
Phone: +49 221 66 95 75-0
E-mail: till.sander@areto.de