Setup and operation of a high-availability big data analytics platform

The task

To provide a central analytics platform for a wide variety of use cases from different departments, a big data environment must be designed, implemented and put into operation. To supply it with data, this environment should be connected via data loading paths to various source systems, both on-premises and in the cloud.


The challenge

For productive use, all relevant cluster services must be multi-tenant capable and highly available, and for data security reasons the data must not be stored in the cloud. Personal data requires rules for storage, use and deletion that can be defined flexibly.
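One way to keep rules for storage, use and deletion flexible is to express them as data rather than hard-coding them. The following is a minimal sketch under that assumption; the names `RetentionRule`, `may_use` and `must_delete`, as well as the example category, purposes and retention period, are hypothetical and not taken from the project.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    """Hypothetical rule for one category of personal data."""
    category: str
    max_age_days: int                 # records older than this must be deleted
    allowed_purposes: frozenset[str]  # purposes the data may be used for


def may_use(rule: RetentionRule, purpose: str) -> bool:
    """Return True if the rule permits using the data for this purpose."""
    return purpose in rule.allowed_purposes


def must_delete(rule: RetentionRule, stored_on: date, today: date) -> bool:
    """Return True once a record has exceeded its retention period."""
    return today - stored_on > timedelta(days=rule.max_age_days)


# Example values are illustrative assumptions, not project settings.
rule = RetentionRule("customer_contact", 365, frozenset({"billing", "support"}))
print(may_use(rule, "marketing"))
print(must_delete(rule, date(2023, 1, 1), date(2024, 6, 1)))
```

Because the rules are plain values, they could be changed per data category without touching the code that enforces them.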


Our solution

A Hadoop-based big data environment was designed, implemented and put into operation. For the go-live, all relevant cluster services were kerberized and configured for high availability. An additional test cluster was set up so that frameworks and processes tested in the development environment could be transferred into regular operation, and a mirror environment was defined to ensure business continuity. For advanced analytics on large data sets, GPU resources were integrated into the Hadoop cluster. The specialist departments were introduced to the new platform in several innovation workshops.


The customer benefit

Thanks to the use of big data, a wide variety of data sources can be analyzed comprehensively for the first time. Automated mechanisms ensure the quality of the data. Thanks to high availability, evaluations are available around the clock. Transparent documentation of the architecture and processes enables the customer to solve even complex problems independently.



Our role

  • Consulting / DevOps / System Administration

Our activities

  • Planning, installation and operation of HDP (Hortonworks) cluster environments

  • Ensuring high availability of all relevant systems (Hadoop, Postgres)

  • Automatic mirroring of important data between Hadoop clusters

  • Identity management integrated with Hadoop (Kerberos)

  • Advice on technology stack

Technologies & methods

  • Applications: Hadoop, Hive LLAP, NiFi, Power BI, DaSense, Oozie, Ranger, Ambari, YARN, IPA, HAProxy, Keepalived, Postgres, PgBouncer

  • Databases: Hive, Postgres

  • Languages / frameworks: Python, Shell, SQL / Docker, CUDA, MapReduce, Tez, Spark, Kerberos, Jira, Git, UML, Jenkins

  • Methods: Agile, ITIL, DevOps
