SAP HANA Interview Questions & Answers

1. What is SAP HANA?

SAP HANA stands for High-Performance Analytic Appliance.

SAP HANA is SAP AG’s implementation of in-memory database technology.

There are four components within the software group:

SAP HANA DB (or HANA DB) refers to the database technology itself.

SAP HANA Studio refers to the suite of tools provided by SAP for modeling.

SAP HANA Appliance refers to HANA DB as delivered on partner-certified hardware (see below) as an appliance. It also includes the modeling tools from HANA Studio as well as replication and data transformation tools to move data into HANA DB.

SAP HANA Application Cloud refers to the cloud-based infrastructure for delivery of applications (typically existing SAP applications rewritten to run on HANA).

2. What is SAP HANA Appliance 1.0?

SAP HANA 1.0 is an analytics appliance that consists of certified hardware, an In-Memory DataBase (IMDB), an Analytics Engine and some tooling for getting data in and out of HANA. You build the logic and structures yourself, and use a tool, e.g. SAP BusinessObjects, to visualise or analyse data.

3. What are the product names?

The short answer is: it's a mystery. SAP has changed them around a lot and now they call it SAP HANA Appliance, SAP HANA Database and SAP HANA Studio. Applications built on HANA will be marked "powered by SAP HANA". Probably they will change it all again.

4. What additional limitations does Sybase Replication Server present?

SRS has additional restrictions which are worth bearing in mind. It can only replicate Unicode data and does not support IBM DB2 compressed tables.

5. What source databases does HANA support for batch loads?

If you use SAP BusinessObjects Data Services 4.0 for bulk loads then pretty much anything. BO-DS is a very flexible Extract, Transform & Load tool that supports many databases - check out the specs for more details.

6. What source databases does HANA support in real-time?

If you use Sybase Replication Server (SRS) for near real-time data then you need to watch out for licensing still (SAP have license deals pending). If you run DB2 then you're fine but with Oracle and Microsoft SQL Server there are some license challenges if you buy your license through SAP, because you may have a limited license that does not allow extraction. Talk to SAP for further information on this.

7. What storage subsystem does HANA use?

This varies from vendor to vendor but it is shared network attached storage (NAS). Both regular magnetic disks and SSD storage can be used for the backup of the database (HANA runs in memory, remember, so disk storage is just for backup and, later, for data ageing). Note that you require twice as much storage as you have RAM, and RAM is itself 2x the database size - i.e. storage size = 4x database size. In most cases there is additional ultra-high speed SSD storage for log files.
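The sizing rule of thumb above works out as follows. This is a minimal sketch: the function name and the 500 GB example are illustrative assumptions, and the 2x factors are simply the ones quoted in the answer, not official sizing guidance.

```python
def hana_sizing(database_size_gb):
    """Apply the rule of thumb quoted above: RAM = 2x database, storage = 2x RAM."""
    ram_gb = 2 * database_size_gb   # RAM is twice the database size
    storage_gb = 2 * ram_gb         # disk storage is twice the RAM
    return {"ram_gb": ram_gb, "storage_gb": storage_gb}


if __name__ == "__main__":
    # Hypothetical 500 GB database
    print(hana_sizing(500))  # {'ram_gb': 1000, 'storage_gb': 2000}
```

So a 500 GB database would call for roughly 1 TB of RAM and 2 TB of disk under this rule.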

8. How big does HANA scale?

Theoretically at least - very well. The biggest single-server HANA hardware will run most mid-size workloads - 2TB of in-memory storage is equivalent to 5-20TB of Oracle storage. The way that HANA works means that it is possible to chain multiple systems together - meaning that scalability has thus far been determined by the size of customers' wallets. Do note that whilst SAP talk up "Big Data" quite a lot, HANA currently only scales to the small end of Big Data, which refers to the kind of huge datasets that Facebook or Google have to store - not Terabytes, but rather Petabytes.

9. Does SAP make their own IMDB/HANA hardware?

Yes, but only in the labs so far. There are no public plans to compete against IBM/HP/Dell in this space, but it may make sense for SAP to enter the appliance market, especially in the context of Data Centres and even more so in the context of the SAP Business ByDesign cloud offering, which will run on IMDB.

10. Why doesn't HANA run on blades?

It's unclear, but probably because blades don't yet offer the same performance. HANA is optimized for the Intel X7560 CPU and will run fastest on this. For instance, the Dell M910 blade can only run 2x X7560 CPUs and 512GB RAM in this configuration, which probably explains the limitation. What's certain is that HANA will eventually run on blades - it's born to run on blade technology!

11. What hardware is supported right now?

Talk to your hardware vendor - all of the major vendors e.g. HP, IBM, Dell, have HANA offerings now. Technically HANA will run on any Intel x64 based system from your laptop through to the big 40-core, 2TB RAM servers. It is however only supported on a small number of big rack-mount servers like the Dell R910 and HP DL980.

12. What's the wider market opportunity for IMDB?

This is the interesting thing - no one knows yet, and few analysts seem to have cottoned on that the wider market opportunity might be huge. Think not just SAP applications but any third party that requires ultra-high speed. Think not just an appliance but a development platform. Time will tell.

13. You mean I have to buy a HANA only 2.5x smaller than my big Oracle RDBMS? What about archiving and data ageing?

Yes, in some instances you may have to buy a HANA appliance that is only 2.5x smaller than it would be under Oracle. And data ageing isn't part of the 1.0 release, but SAP is certainly working on it pretty hard. Let's hope they release something before you need to buy a bigger HANA appliance!

14. Does HANA/IMDB replace Oracle?

It's the elephant in the room, but once the Business Suite runs on IMDB, Oracle won't be needed any more by SAP customers who purchase HANA. This doesn't affect anything in the short term because those people buying HANA today will still need an Oracle ERP system.

15. Why is HANA so fast?

Regular RDBMS technologies put the information on spinning plates of iron (hard disks) from which the information is retrieved. HANA stores information in electronic memory, which is some 50x faster (depending on how you calculate). HANA stores a copy on magnetic disk, in case of power failure or the like. In addition, most SAP systems have the database on one system and a calculation engine on another, and they pass information between them. With HANA, this all happens within the same machine.

16. What does HANA cost?

SAP hasn't entirely confirmed HANA licensing costs but the hardware is somewhere around $1-200k per TB. Add to this licensing costs, which are still being set on a per-customer basis.

17. What is HANA bad at?

There are some current issues around HANA when delivering ad-hoc analytics, especially when using the SAP BusinessObjects Webi tool. Essentially the problem is that you can ask computationally very difficult questions with Webi, which can cause very long response times with HANA. SAP will need to build optimization for both Webi and HANA to reduce the computational complexity of these questions, but they're not there yet.

What's more, it's worth noting that HANA 1.0 is not a Data Warehouse and it is more of a Data Mart - that is, suited to point applications where there is a clear use case.

18. What is HANA great at?

The best thing that HANA brings to the table is the ability to aggregate large data volumes in near real-time - and to have the data updated in near real-time. SAP's demos show hundreds of billions of records of data being aggregated in a matter of seconds. SAP has built a set of Analytics Apps on top of HANA and these are set to be great point use cases to get customers up and running quickly.

19. If I can run NetWeaver BW on IMDB/HANA, why can't I run the Business Suite/ERP 6.0?

Simply because it's not mature enough yet to support business critical applications. From a technology perspective, it is already possible to run the Business Suite on IMDB and SAP has trialled moving some large databases into IMDB already.

20. What's the difference between HANA and IMDB?

HANA is the name for the current BI appliance (HANA 1.0) and the BW Data Warehouse appliance (HANA 1.0 SP03). Both of these use the SAP IMDB Database Technology (SAP HANA Database) as their underlying RDBMS. Expect SAP to start to differentiate this more clearly as they start to position the technology for use cases other than Analytics.

21. What is SAP HANA 1.5, 1.2 or 1.0 SP03?

These are all the same thing, and 1.0 SP03 is touted to be the final name for what should go into RampUp (beta) in Q4 2011. This will allow any SAP NetWeaver BW 7.3 Data Warehouse to be migrated into a HANA appliance. HANA 1.0 SP03 specifically also accelerates BW calculations and planning, which means you get even more performance gains.

22. What are the limitations of HANA 1.0?

Quite a few so far - it can only replicate certain data, from certain databases, in certain formats, using the Sybase Replication Server. Batch loading is done using SAP BusinessObjects Data Services 4.0 and is optimised only for SAP BusinessObjects BI 4.0 reporting.

24. Define Five-minute rule?

It is a rule of thumb for deciding whether a data item should be kept in memory, or stored on disk and read back into memory when required. The rule is: “cache randomly accessed disk pages that are re-used every 5 minutes”.
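The break-even point behind such a rule of thumb can be estimated from hardware prices. The sketch below uses the commonly cited formulation from Gray and Graefe's revisit of the rule (pages per MB of RAM divided by disk accesses per second, multiplied by the price of a disk drive divided by the price of a MB of RAM). The function names and all price and throughput figures are illustrative assumptions, not values from this document.

```python
def break_even_interval_seconds(pages_per_mb_ram, accesses_per_sec_per_disk,
                                price_per_disk, price_per_mb_ram):
    """Re-use interval at which caching a page costs the same as re-reading it from disk."""
    return (pages_per_mb_ram / accesses_per_sec_per_disk) * (price_per_disk / price_per_mb_ram)


def keep_in_memory(reuse_interval_seconds, break_even_seconds):
    """A page re-used at least as often as the break-even interval is worth caching."""
    return reuse_interval_seconds <= break_even_seconds


if __name__ == "__main__":
    # Illustrative figures only: 8 KB pages (128 per MB), 64 random accesses/s per disk,
    # a 2000-dollar disk drive and RAM at 15 dollars per MB.
    interval = break_even_interval_seconds(128, 64, 2000, 15)
    print(round(interval))                 # about 267 seconds, i.e. roughly 5 minutes
    print(keep_in_memory(60, interval))    # True: re-used every minute, so cache it
    print(keep_in_memory(3600, interval))  # False: re-used hourly, so leave it on disk
```

As RAM gets cheaper relative to disk accesses, the interval grows, which is part of the argument for keeping whole databases in memory.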

25. Define multi-core CPU?

Multiple CPUs (cores) on one chip or in one package are called a multi-core CPU.

26. Define Stall?

Waiting for data to be loaded from main memory into the CPU cache is called a stall.

27. What is SAP In-Memory Appliance (SAP HANA)?

HANA is an in-memory technique to store data that is particularly suited for handling very large amounts of tabular, or relational, data with extraordinary performance. Common databases store tabular data row-wise. Reorganizing the data in memory column-wise brings a tremendous speed increase when accessing a subset of the data in each table row.

28. What are the components or products of HANA?

SAP HANA contains the following components.

- SAP HANA Database

- SAP HANA Studio

- SAP HANA Client

- SAP HANA Information Composer

- Diagnostics Agent 7.3

- SAP HANA client package for MS Excel

- SAP HANA UI for Information Access (INA)

- SAP HANA AFL 1.0

- Software Update Manager for SAP HANA

- SAP LT Replication Add-On

- SAP LT Replication Server

- SAP HANA Direct Extractor Connection (DXC)

- SAP Data Services 4.0

29. What are the different editions available in HANA appliance software?

Platform Edition:

Platform edition is intended for customers who want to use ETL-based replication and already have a license for SAP BO Data Services.

Enterprise Edition:

Enterprise edition is intended for customers who want to use either trigger-based replication or ETL-based replication and do not already have all of the necessary licenses for SAP BO Data Services.

30. What is columnar and row-based data storage?

A database table contains data in the form of rows and columns. However, computer memory is organized as a linear structure. To store a table in linear memory, there are two options. Row-based storage stores a table as a sequence of records, each of which contains the fields of one row. In columnar storage, the entries of a column are stored in contiguous memory locations. The SAP HANA database allows you to specify whether a table is to be stored column-wise or row-wise. It is also possible to alter an existing table from columnar to row-based and vice versa. Search operations in tabular data can be accelerated by organizing data in columns instead of in rows.
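The two layouts can be illustrated with a tiny table flattened into a linear list. This is a minimal sketch, not how HANA stores data internally; the table contents and function names are made up for illustration.

```python
# A small hypothetical table with columns (id, country, amount)
TABLE = [
    (1, "DE", 100),
    (2, "US", 250),
    (3, "DE", 75),
]


def to_row_store(table):
    """Row-based layout: the fields of each record sit next to each other."""
    flat = []
    for record in table:
        flat.extend(record)
    return flat


def to_column_store(table):
    """Columnar layout: all entries of one column sit next to each other."""
    flat = []
    for column in zip(*table):
        flat.extend(column)
    return flat


def sum_amount_from_column_store(flat, row_count, column_index=2):
    """Aggregating one column only touches one contiguous slice of the list."""
    start = column_index * row_count
    return sum(flat[start:start + row_count])


if __name__ == "__main__":
    print(to_row_store(TABLE))     # [1, 'DE', 100, 2, 'US', 250, 3, 'DE', 75]
    column_store = to_column_store(TABLE)
    print(column_store)            # [1, 2, 3, 'DE', 'US', 'DE', 100, 250, 75]
    print(sum_amount_from_column_store(column_store, len(TABLE)))  # 425
```

In the row layout the three amounts are scattered across the list, whereas in the columnar layout they form one contiguous run, which is what makes scans and aggregations over a single column cheap.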

31. What are the advantages of Column based tables?

Column-based tables are advantageous when:

Calculations are typically executed on a single column or only a few columns.

The table is searched based on the values of a few columns.

The table has a large number of columns.

The table has a large number of rows and columnar operations are required (aggregate, scan, etc.).

High compression rates can be achieved because the majority of the columns contain only a few distinct values (compared to the number of rows).

32. What are the advantages of Row-based tables?

Row-based tables are advantageous when:

The application needs to process only a single record at a time (many selects and/or updates of single records).

The application typically needs to access a complete record (or row).

The columns contain mainly distinct values so that the compression rate would be low.

Neither aggregations nor fast searching are required.

The table has a small number of rows (e.g. configuration tables).

33. In which case should data be stored in columnar storage?

To enable fast on-the-fly aggregations and ad-hoc reporting, and to benefit from compression mechanisms, it is recommended that transaction data be stored in a column-based table.

34. What is parallelization?

Column-based storage makes it easy to execute operations in parallel using multiple processor cores. In a column store, data is already vertically partitioned, which means that operations on different columns can easily be processed in parallel. If multiple columns need to be searched or aggregated, each of these operations can be assigned to a different processor core. In addition, operations on one column can be parallelized by partitioning the column into multiple sections that can be processed by different processor cores.
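Both forms of parallelism described above (one task per column, and one task per section of a single column) can be sketched with Python's standard thread pool. This only illustrates the structure of the work split: the function names are hypothetical, and threads in CPython do not give true multi-core speedups for this kind of computation.

```python
from concurrent.futures import ThreadPoolExecutor


def aggregate_columns(columns, workers=4):
    """Sum every column as its own task (parallelism across columns)."""
    names = list(columns)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        totals = pool.map(sum, (columns[name] for name in names))
        return dict(zip(names, totals))


def aggregate_partitioned(column, sections=4, workers=4):
    """Split one column into sections, sum each section as a task, then combine."""
    size = max(1, -(-len(column) // sections))  # ceiling division
    chunks = [column[i:i + size] for i in range(0, len(column), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))


if __name__ == "__main__":
    table = {"amount": [100, 250, 75], "quantity": [1, 2, 3]}
    print(aggregate_columns(table))                       # {'amount': 425, 'quantity': 6}
    print(aggregate_partitioned(list(range(1, 101)), 3))  # 5050
```

The second function works because a sum can be computed per section and the partial results combined afterwards; the same holds for counts, minima and maxima.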

35. What are the different Compression Techniques?

1. Run-length encoding

2. Cluster encoding

3. Dictionary encoding
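Two of these techniques are easy to show on a small column. The sketch below is a simplified illustration of dictionary encoding followed by run-length encoding, not HANA's actual implementation; cluster encoding is left out, and the sample column and function names are assumptions.

```python
def dictionary_encode(column):
    """Replace each value with an integer ID pointing into a sorted dictionary."""
    dictionary = sorted(set(column))
    position = {value: index for index, value in enumerate(dictionary)}
    return dictionary, [position[value] for value in column]


def run_length_encode(values):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def run_length_decode(runs):
    """Expand [value, run_length] pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


if __name__ == "__main__":
    country = ["DE", "DE", "DE", "US", "US", "FR"]
    dictionary, ids = dictionary_encode(country)
    print(dictionary)              # ['DE', 'FR', 'US']
    print(ids)                     # [0, 0, 0, 2, 2, 1]
    print(run_length_encode(ids))  # [[0, 3], [2, 2], [1, 1]]
```

Dictionary encoding pays off when a column has few distinct values, and run-length encoding pays off when equal values sit next to each other, for example in a sorted column.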

36. Why are materialized aggregates not required?

With a scanning speed of several gigabytes per millisecond, in-memory column stores make it possible to calculate aggregates on large amounts of data on the fly with high performance. This is expected to eliminate the need for materialized aggregates in many cases.

37. What are the advantages of eliminating materialized aggregates?

- Simplified data model

- Simplified application logic

- Higher level of concurrency

- Aggregated values are always up to date, because they are calculated on the fly

38. What are the different types of replication techniques?

- ETL based replication using BODS

- Trigger based replication using SLT

- Extractor based data acquisition using DXC.

39. Define SLT?

SLT stands for SAP Landscape Transformation, which provides trigger-based replication. The SLT replication server is the replication technology used to pass data from the source system to the target system. The source can be either SAP or non-SAP. The target system is the SAP HANA system, which contains the HANA database.

40. What is Configuration in SLT?

The information to create the connection between the source system, SLT system, and the SAP HANA system is specified within the SLT system as a Configuration. You can define a new configuration in Configuration & Monitoring Dashboard (transaction LTR).

41. What is Configuration and Monitoring Dashboard?

It is an application that runs on the SLT replication server to specify configuration information (such as source system, target system, and relevant connections) so that data can be replicated. You can also use it to monitor the replication status (transaction LTR).

Status Yellow: it may occur due to triggers which are not yet created successfully.

Status Red: it may occur if master job is aborted (manually in transaction SM37).

42. What are advanced replication settings?

A transaction that runs on the SLT replication server to specify advanced replication settings, such as:

a. Modifying target table structures

b. Specifying performance optimization settings

c. Defining transformation rules

43. Define Latency?

It is the length of time to replicate data (a table entry) from the source system to the target system.

44. Define logging table?

A table in the source system that records any changes to a table that is being replicated. This ensures that SLT replication server can replicate these changes to the target system.
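The general idea of a trigger-filled logging table can be tried out with SQLite from the Python standard library. This is only a conceptual sketch: the table, trigger and column names are made up, and SLT's real logging tables and triggers are generated by the replication server and look different.

```python
import sqlite3


def demo_logging_table():
    """Create a table, attach change-recording triggers, and return the recorded changes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER);

        -- Hypothetical logging table: which row changed, and how
        CREATE TABLE orders_log (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            order_id INTEGER,
            operation TEXT
        );

        CREATE TRIGGER orders_ins AFTER INSERT ON orders
        BEGIN
            INSERT INTO orders_log (order_id, operation) VALUES (NEW.id, 'I');
        END;

        CREATE TRIGGER orders_upd AFTER UPDATE ON orders
        BEGIN
            INSERT INTO orders_log (order_id, operation) VALUES (NEW.id, 'U');
        END;

        CREATE TRIGGER orders_del AFTER DELETE ON orders
        BEGIN
            INSERT INTO orders_log (order_id, operation) VALUES (OLD.id, 'D');
        END;
    """)
    conn.execute("INSERT INTO orders VALUES (1, 100)")
    conn.execute("UPDATE orders SET amount = 120 WHERE id = 1")
    conn.execute("DELETE FROM orders WHERE id = 1")
    changes = conn.execute(
        "SELECT order_id, operation FROM orders_log ORDER BY seq"
    ).fetchall()
    conn.close()
    return changes


if __name__ == "__main__":
    print(demo_logging_table())  # [(1, 'I'), (1, 'U'), (1, 'D')]
```

A replication process can then read the logging table in sequence order and apply the same changes to the target system.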

45. What are Transformation rules?

A rule specified in the Advanced Replication settings transaction for source tables such that data is transformed during the replication process.

For example, you can specify a rule to:

i. Convert fields

ii. Fill empty fields

iii. Skip records
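The three kinds of rule listed above can be sketched as small functions applied to each record during transfer. This is a conceptual illustration in Python, not how rules are written in the SLT Advanced Replication settings; all field names and rule functions are hypothetical.

```python
def apply_rules(records, rules):
    """Apply rules in order; a rule returns the record, or None to skip it."""
    for record in records:
        current = dict(record)  # work on a copy so the source record is untouched
        for rule in rules:
            current = rule(current)
            if current is None:
                break
        if current is not None:
            yield current


def convert_amount_to_cents(record):
    """Convert a field: store the amount as an integer number of cents."""
    record["amount"] = int(round(record["amount"] * 100))
    return record


def fill_empty_country(record):
    """Fill an empty field with a default value."""
    if not record.get("country"):
        record["country"] = "UNKNOWN"
    return record


def skip_test_records(record):
    """Skip records flagged as test data."""
    return None if record.get("is_test") else record


if __name__ == "__main__":
    source = [
        {"amount": 1.5, "country": "", "is_test": False},
        {"amount": 2.0, "country": "DE", "is_test": True},
    ]
    rules = [skip_test_records, convert_amount_to_cents, fill_empty_country]
    print(list(apply_rules(source, rules)))
    # [{'amount': 150, 'country': 'UNKNOWN', 'is_test': False}]
```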

46. When to change the number of Data Transfer jobs?

If the speed of the initial load or the replication latency time is not satisfactory, or if the SLT replication server has more resources than initially available, we can increase the number of data transfer and/or initial load jobs. After the completion of the initial load, we may want to reduce the number of initial load jobs.

47. When to go for table partitioning?

If the table size in the HANA database exceeds 2 billion records, split the table using the partitioning features in “Advanced replication settings” (transaction IUUC_REPLCONT, tab page IUUC_REPLTABSTG).
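As a rough planning aid, the minimum number of partitions for a given row count can be estimated as below. This sketch assumes that rows are spread evenly and that each partition should stay at or below the same 2 billion record threshold quoted above; the function name and example figures are illustrative.

```python
MAX_ROWS_PER_PARTITION = 2_000_000_000  # threshold quoted in the answer above


def partitions_needed(row_count):
    """Smallest number of evenly filled partitions that keeps each one within the limit."""
    return max(1, -(-row_count // MAX_ROWS_PER_PARTITION))  # ceiling division


if __name__ == "__main__":
    print(partitions_needed(1_500_000_000))  # 1
    print(partitions_needed(5_000_000_000))  # 3
```

In practice you would plan for more partitions than this minimum, since tables grow and rows are rarely distributed perfectly evenly.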

48. What are the jobs involved in replication process?

- Master Job (IUUC_MONITOR_<MT_ID>)

- Master Controlling Job (IUUC_REPLIC_CNTR_<MT_ID>)

- Data Load Job (DTL_MT_DATA_LOAD_<MT_ID>_<2digits>)

- Migration Object Definition Job (IUUC_DEF_MIG_OBJ_<2digits>)

- Access Plan Calculation Job (ACC_PLAN_CALC_<MT_ID>_<2digits>)