Data vault modeling is quickly becoming the standard approach to modeling a data warehouse. Compared to other popular approaches, Data Vault modeling represents a paradigm shift – a new way of thinking.

Data Vault

What is Data Vault?

Data Vault is a modeling technique for data warehouses that is particularly suitable for agile data warehouses. It offers a high degree of flexibility for extensions and complete unitemporal historicization of the data, and it allows the data loading processes to be heavily parallelized. Data Vault modeling was developed in the 1990s by Dan Linstedt. After its first implementations in 2000, it gained greater attention from 2002 onwards through a series of articles. In 2007, Linstedt won the support of Bill Inmon, who called it the “optimal choice” for his DW 2.0 architecture.

areto specializes in the package of modeling, architecture and methodology approaches that Linstedt has promoted since 2013 under the name Data Vault 2.0. Also noteworthy are the publications by Hans Hultgren on Data Vault modeling and by John Giles on creating Data Vault models using patterns.

Starting Point: Classic Modeling vs. Data Vault

Established Data Warehousing with dimensional modeling

Ralph Kimball’s dimensional modeling focuses on simple data analysis and is optimal for the access layer of a data warehouse. 

Bill Inmon advocated an enterprise integration layer in third normal form, which transforms all source systems into a uniform, historicized departmental model. Modeling in third normal form is optimized for operational systems and quickly reaches its limits when it comes to data integration.

ARCHITECTURE AND MODELING

  • Development of a central data model
  • Low disk space consumption
  • Optimal support of the business units
  • Mapping of logical dimensions based on technical keys

Agile BI made easy with Data Vault 2.0

ARCHITECTURE AND MODELING

  • Optimal support for agile development
  • A reduction of effort in modeling
  • Strong standardization of processes
  • Simplification of loading processes
  • Effort reduction in testing
  • Decoupling of dependencies between source systems and processing pipelines
  • For a scalable, flexible and consistent warehouse

Why should companies use Data Vault?

Data Vault allows the data warehouse to be adapted flexibly and quickly, which is a real advantage for companies. Static data warehouses become increasingly complex over time. This inevitably leads to higher costs for the continuous extensions and changes the data warehouse requires. The extensive implementation and test cycles not only increase costs, but also often lead to personnel bottlenecks, innovation backlogs and an exhausting search for ETL and modeling experts.

Companies that want to survive in today’s competition cannot afford these waiting times. They need to respond quickly to ever-changing market needs, and the data warehouse implementation must reflect this. Data Vault is the solution.

Modern data warehouses are agile!

Modern modularly scalable data warehousing

Modern

Data Vault combines the best of the dimensional and normalized modeling worlds. It is specifically designed to solve agility, flexibility, and scalability issues. It was developed as a granular, non-volatile, auditable, historical repository for enterprise data from multiple operational systems.

Modular

Changes extend the model without altering existing structures. As a result, there is hardly any impact on existing processes, and only minimal testing effort (regression tests) is required.

Scalable

Loading can be fully parallelized: different interfaces can be loaded independently of each other. The approach is incremental. Content is insert-only and historicized in the style of Slowly Changing Dimension Type 2 (SCD2). ETL or ELT can, and should, be automated.
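The insert-only idea with SCD2-style history can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `history` rows, the column names and the helpers `scd2_view` and `as_of` are hypothetical and not part of any Data Vault tool. Validity ranges are derived when reading, so stored rows are never updated.

```python
from datetime import date

# Hypothetical insert-only history: rows are appended, never updated or deleted.
history = [
    {"product_number": "P-100", "name": "Widget", "load_date": date(2024, 1, 1)},
    {"product_number": "P-100", "name": "Widget Pro", "load_date": date(2024, 3, 1)},
    {"product_number": "P-200", "name": "Gadget", "load_date": date(2024, 2, 1)},
]


def scd2_view(rows, key_column):
    """Derive valid_from / valid_to per key at read time; stored rows stay untouched."""
    by_key = {}
    for row in sorted(rows, key=lambda r: r["load_date"]):
        by_key.setdefault(row[key_column], []).append(row)
    view = []
    for versions in by_key.values():
        # Pair each version with its successor; the latest version has no successor.
        for current, following in zip(versions, versions[1:] + [None]):
            view.append({
                **current,
                "valid_from": current["load_date"],
                "valid_to": following["load_date"] if following else None,
            })
    return view


def as_of(rows, key_column, business_key, day):
    """Return the version of a record that was valid on the given day, or None."""
    for row in scd2_view(rows, key_column):
        still_open = row["valid_to"] is None or day < row["valid_to"]
        if row[key_column] == business_key and row["valid_from"] <= day and still_open:
            return row
    return None


print(as_of(history, "product_number", "P-100", date(2024, 2, 15))["name"])  # Widget
```

Because a new version is only ever appended, loads from different interfaces do not need to lock or rewrite each other's rows.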

What are the benefits of Data Vault?

Advantages for the specialist departments

  • Faster access to new data sources
  • Reduction of waiting times for important analysis results (Time to Insight)
  • Massive reduction of development time when implementing business requirements
  • Fulfillment of compliance requirements (e.g. Basel II, BCBS 239)
  • Identification of new opportunities and risks
  • Faster return on investment (ROI)
  • Scalability of the data warehouse
  • Documentation and traceability of all data up to the source system

Technical advantages

  • Near-Real-Time Loading
  • Big Data Processing
  • Seamless integration of a wide variety of data sources (e.g. NoSQL/unstructured data)
  • Agile, iterative development cycles with incremental expansion of the DWH
  • Automatable ETL patterns

How does the Data Vault 2.0 architecture and modeling approach work?

Data Vault - a holistic solution

Data Vault was not developed as a pure data model, but rather as an all-encompassing collection of methods:

  • Process model
  • Modeling
  • Data processing
  • Architecture

With Data Vault, you model an additive and agile data warehouse!

Data Modeling Methods 

  • Conceptual elements of modeling
  • Hub & Spoke
  • Auditable design rules

Methods of data processing 

  • Standardization approach for integration logic
  • Realtime & Batch Support
  • ETL templates and automation approaches

Architectural Principles 

  • Separation of integration/historicization logic and business logic 
  • Prerequisites for virtualizing the BI layer 
  • Integration of big data scenarios and NoSQL databases

Agile development process 

  • Support of agile procedures (Scrum-based)
  • Iterative, incremental approach to development
  • Encapsulation and decoupling of changes

Data Vault Architecture

The Data Vault architecture essentially consists of three layers:

  • Staging Layer: collects the raw data from the data source systems
  • Data Vault Layer: contains the
    • Raw Vault: Storage of raw data
    • Business Vault: Contains harmonized and transformed data based on business logic
  • Business Intelligence Layer: Accesses predominantly the Business Vault and provides information for analysis and reporting
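The flow through these layers can be illustrated with a deliberately simplified Python sketch. All function names, the sample `crm_rows` and the country-code rule are assumptions made for this example; a real implementation would use database tables rather than in-memory lists.

```python
from datetime import datetime, timezone


def stage(source_rows, source_name):
    """Staging layer: collect raw rows and tag them with audit metadata."""
    loaded_at = datetime.now(timezone.utc)
    return [
        {**row, "record_source": source_name, "load_date": loaded_at}
        for row in source_rows
    ]


def to_raw_vault(staged_rows):
    """Raw Vault: store the data unchanged, without applying business rules."""
    return [dict(row) for row in staged_rows]


def to_business_vault(raw_rows):
    """Business Vault: apply a hypothetical business rule (harmonize country codes)."""
    country_map = {"Germany": "DE", "Deutschland": "DE"}
    return [
        {**row, "country": country_map.get(row["country"], row["country"])}
        for row in raw_rows
    ]


def report(business_rows):
    """Business Intelligence layer: aggregate for reporting (customers per country)."""
    counts = {}
    for row in business_rows:
        counts[row["country"]] = counts.get(row["country"], 0) + 1
    return counts


crm_rows = [
    {"customer_no": "C1", "country": "Germany"},
    {"customer_no": "C2", "country": "Deutschland"},
    {"customer_no": "C3", "country": "FR"},
]
raw = to_raw_vault(stage(crm_rows, "CRM"))
customers_per_country = report(to_business_vault(raw))
print(customers_per_country)  # {'DE': 2, 'FR': 1}
```

The raw rows keep the original values ("Germany", "Deutschland"), so the business rule can be changed later without reloading the sources.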

Data Vault Components

Data Vault 2.0 offers a high degree of flexibility for extensions of the DWH and complete historicization of the data, and it allows the data loading processes to be heavily parallelized. In modeling, all information belonging to an object is divided into three categories that are strictly separated from each other.

The first category, the “Hub”, includes information that uniquely identifies an object, i.e. gives it its identity (e.g. the product number for a product).

Hub – Is the “root” of an entity (integration anchor):

  • Hash/Surrogate Key (SK)
  • Business Key (BK)
  • Audit information (source, creation date)
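As a rough illustration, a hub load could look like the following Python sketch. It assumes a commonly used convention of deriving the hash key as an MD5 digest of the trimmed, upper-cased business key; the table `hub_product`, the column names and the function `load_hub` are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_keys):
    """Assumed convention: MD5 over trimmed, upper-cased business keys joined by ';'."""
    normalized = ";".join(str(key).strip().upper() for key in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


hub_product = {}  # hypothetical hub table, keyed by hash key


def load_hub(product_number, record_source):
    """Insert the business key once; existing hub rows are never updated."""
    hk = hash_key(product_number)
    if hk not in hub_product:
        hub_product[hk] = {
            "product_hk": hk,                         # hash/surrogate key
            "product_number": product_number,         # business key
            "record_source": record_source,           # audit: source
            "load_date": datetime.now(timezone.utc),  # audit: creation date
        }
    return hk


load_hub("p-100", "ERP")
load_hub(" P-100 ", "WEBSHOP")  # same product after normalization, no new row
print(len(hub_product))  # 1
```

Because the key is computed from the business key alone, every source system arrives at the same hub row without a lookup against a central sequence.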

The second category “Link” describes relationships between objects (e.g. assignment of a product to a sales channel).

Link – Maps the relationships between hubs:

  • Hash/Surrogate Key (SK)
  • Hash/surrogate keys of the connected hubs (FKs)
  • Audit information (source, creation date)
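A link load can be sketched along the same lines. The example assumes that hash keys are MD5 digests of normalized business keys and that the link's own key is derived from the combined business keys of the connected hubs; `link_product_channel`, `load_link` and the sample values are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_keys):
    """Assumed convention: MD5 over trimmed, upper-cased business keys joined by ';'."""
    normalized = ";".join(str(key).strip().upper() for key in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


link_product_channel = {}  # hypothetical link table, keyed by the link hash key


def load_link(product_number, channel_code, record_source):
    """Record the relationship between a product and a sales channel exactly once."""
    link_hk = hash_key(product_number, channel_code)
    if link_hk not in link_product_channel:
        link_product_channel[link_hk] = {
            "link_hk": link_hk,                       # hash key of the link itself
            "product_hk": hash_key(product_number),   # reference to the product hub
            "channel_hk": hash_key(channel_code),     # reference to the channel hub
            "record_source": record_source,           # audit: source
            "load_date": datetime.now(timezone.utc),  # audit: creation date
        }
    return link_hk


load_link("P-100", "ONLINE", "ERP")
load_link("P-100", "ONLINE", "WEBSHOP")  # same relationship, no new row
load_link("P-100", "RETAIL", "ERP")
print(len(link_product_channel))  # 2
```

Since all keys are derived from business keys, the link can be loaded in parallel with the hubs it connects.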

Attributes that describe an object (e.g. the product name) belong to the third category, the “Satellite”.

Satellite – Stores the detailed data from hubs and links:

  • Hash/surrogate key of the hubs or links
  • Detail attributes and history
  • Audit information (source, creation date)
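The following sketch shows one possible way to load a satellite so that a new row is written only when the descriptive attributes have changed. Comparing a hash over the attributes (often called a hash diff) is a common technique, but the concrete hashing scheme, the table `sat_product` and the function names here are assumptions for illustration.

```python
import hashlib
import json
from datetime import date


def hash_key(business_key):
    """Assumed convention: MD5 of the trimmed, upper-cased business key."""
    return hashlib.md5(str(business_key).strip().upper().encode("utf-8")).hexdigest()


def hash_diff(attributes):
    """Fingerprint of the descriptive attributes, independent of key order."""
    payload = json.dumps(attributes, sort_keys=True)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


sat_product = []  # hypothetical satellite table: append-only list of rows


def load_satellite(product_number, attributes, load_date, record_source):
    """Append a new version only if the attributes differ from the latest version."""
    hk = hash_key(product_number)
    diff = hash_diff(attributes)
    versions = [row for row in sat_product if row["product_hk"] == hk]
    if versions and versions[-1]["hash_diff"] == diff:
        return False  # unchanged since the last load, nothing to insert
    sat_product.append({
        "product_hk": hk,                # key of the parent hub
        "hash_diff": diff,
        **attributes,                    # detail attributes
        "load_date": load_date,          # audit: creation date
        "record_source": record_source,  # audit: source
    })
    return True


load_satellite("P-100", {"name": "Widget", "color": "red"}, date(2024, 1, 1), "ERP")
load_satellite("P-100", {"color": "red", "name": "Widget"}, date(2024, 1, 2), "ERP")
load_satellite("P-100", {"name": "Widget Pro", "color": "red"}, date(2024, 1, 3), "ERP")
print(len(sat_product))  # 2
```

The sketch assumes loads arrive in chronological order; the history of an object is simply the sequence of its satellite rows.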

This type of modeling allows changes to be made flexibly, so that no existing tables need to be adapted; new tables are simply added. Because the data loading processes follow a strongly standardized pattern, templates can, and should, be used. As a result, a change or extension of the data loading process can usually be made simply by adjusting the configuration.
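To illustrate the template idea, the sketch below generates hub load statements from a small configuration using Python's standard `string.Template`. The SQL is generic and not tied to a specific database (the `MD5`, `UPPER` and `TRIM` functions are assumed to exist in the target dialect), and all table and column names in `config` are hypothetical.

```python
from string import Template

# One template for every hub load; only the configuration differs per hub.
HUB_TEMPLATE = Template(
    "INSERT INTO $hub ($hash_key, $business_key, load_date, record_source)\n"
    "SELECT DISTINCT MD5(UPPER(TRIM(s.$business_key))), s.$business_key,\n"
    "       CURRENT_TIMESTAMP, '$source'\n"
    "FROM $staging s\n"
    "WHERE NOT EXISTS (\n"
    "    SELECT 1 FROM $hub h\n"
    "    WHERE h.$hash_key = MD5(UPPER(TRIM(s.$business_key)))\n"
    ")"
)

# Hypothetical configuration: adding a hub means adding one entry here.
config = [
    {"hub": "hub_product", "hash_key": "product_hk", "business_key": "product_number",
     "staging": "stg_erp_product", "source": "ERP"},
    {"hub": "hub_customer", "hash_key": "customer_hk", "business_key": "customer_no",
     "staging": "stg_crm_customer", "source": "CRM"},
]

statements = [HUB_TEMPLATE.substitute(entry) for entry in config]
print(statements[1])
```

Extending the warehouse with a further hub then requires no new loading code, only another configuration entry.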

Data Warehouse Automation Solutions

In the interest of our customers, areto ensures that data integration is as standardized as possible. The spread of Data Vault as a data modeling method for the data warehouse has led to the development of numerous Data Warehouse Automation (DWA) solutions. Combining leading DWA tools, analytical databases such as Exasol or Snowflake, and areto’s technical expertise yields substantial time and cost savings. areto offers market-leading solutions from our partners WhereScape, Datavault Builder and Matillion, as well as our own open source solution, areto Data Chef, which we use successfully in many customer projects.

Conclusion: What does Data Vault 2.0 offer?

The Data Vault architecture and modeling approach, with its simple and understandable modeling paradigms and naming conventions, enables a quick understanding of both the source and the transformed data. Data Vault combines the best of the dimensional and normalized modeling worlds. This makes modeling scalable, flexible and internally consistent. It can be adapted to individual company needs and offers optimal support for agile process models.

Data Vault is revolutionizing the architecture of the data warehouse with its new way of data integration and data delivery. Because of the strong standardization of processes, it is possible to automate the data provisioning to a very high degree.

With Data Vault, you create new opportunities and perspectives to grow your business and lead it into the future.

Data Vault Know-how Video Library

Making strategic decisions faster - Datavault Builder - Exasol - areto

Data Vault automation with Matillion and areto

Snowflake Cloud DWH - data provisioning with Kafka and the areto Data Chef

Help for self-help: areto consultation hours

Book a support appointment with one of our Data Vault 2.0 experts! Get quick solutions and best practices for your specific problems in working with the innovative modeling and architecture approach to agile data warehousing!

Costs

0.5 hours – 110 €
1.0 hours – 200 €
2.0 hours – 350 €

Data Vault 2.0 consultation

The Data Vault 2.0 consultation hour offers you the opportunity to receive support for small and large questions at short notice. Benefit from the experience of our experts in solving your problem. This way, you can quickly get back to your actual work.

Becoming a data-driven company with the areto Data Vault experts!

Overtake the competition by making faster and better decisions!

Find out where your company currently stands on the way to becoming a data-driven company.
We analyze the status quo and show you what potential exists.
How do you want to get started?

Free consulting & demo appointments

Do you already have a strategy for your future DWH solution? Are you already taking advantage of modern cloud platforms and automation? We would be happy to show you examples of how our customers are already using areto’s agile and scalable DWH solutions.

Workshops / Coachings

Our workshops and coaching sessions provide you with the know-how you need to set up a modern DWH. The areto DWH TrainingCenter offers a wide range of learning content.

Proofs of Concept

Which DWH architecture is right for us? Are the framework conditions suitable for it? Which prerequisites must be created? Proofs of concept (PoCs) answer these and other questions so that you can make the right investment decisions. This way, you start your project well prepared.

areto Data Vault Customers

Leverage your data. Discover opportunities. Gain new insights.
We look forward to hearing from you!
Till Sander
CTO
Phone: +49 221 66 95 75-0
E-mail: till.sander@areto.de