
Effects of changing data in HANA's column store tables


I was curious about the inner workings of HANA. When you change data in column store tables, what is the effect? How much is written to the data files, how much change happens to the backup dumps, how much is written to the log files? With a real SAP system, you can measure the activity of the HANA database, but it is hard to measure the actual amount of changed data in some kind of controlled way. As usual, SAP provides some documentation, but not in the area I am interested in (with the usual disclaimer "as far as I can see"). The real fun with SAP always starts with reverse engineering, so I decided to create my own lab experiment.

 

My lab consists of:

- HANA 1.0 revision 70

- 7.4 GB or 49.108.572 rows of real-life ASCII data

- resulting in a 960 MB column store table

 

I wanted to measure in a controlled way the reaction of the database to changes in this table. First I delete some data, then I insert the same amount of (but different) data. These two steps should simulate an update. I had a look at the in-memory table size. I created a backup dump and measured how much of the data file and backup dump file had changed. I tried to identify the written log files. This table summarizes my measurements:

 

Description | Delete #1 | Insert #1 | Delete #2 | Insert #2 | Delete #3 | Insert #3 | Delete #4 | Insert #4 | Delete #5 | Insert #5
MEMORY_SIZE_IN_MAIN | 998.832.085 | 1.049.845.069 | 1.041.197.293 | 1.071.021.253 | 1.075.204.349 | 1.086.465.005 | 1.089.642.421 | 1.117.598.501 | 1.124.042.589 | 1.146.048.837
RECORD_COUNT | 47.359.893 | 49.187.726 | 47.620.728 | 49.179.120 | 47.542.719 | 49.117.838 | 45.957.846 | 49.701.911 | 43.134.825 | 49.712.183
delta rows | 1.748.679 | 1.827.833 | 1.566.998 | 1.558.392 | 1.636.401 | 1.575.119 | 3.159.992 | 3.744.065 | 6.567.086 | 6.577.358
delta size (raw ASCII Bytes) | 276.500.919 | 288.948.984 | 247.715.063 | 246.271.066 | 258.70.592 | 249.046.220 | 499.568.790 | 591.892.341 | 1.038.293.459 | 1.039.845.787
delta size (column store Bytes) | 9.232.860 | 51.012.984 | 8.647.776 | 29.823.960 | 4.183.096 | 11.260.656 | 3.177.416 | 27.956.080 | 6.444.088 | 22.006.248
changed datafile (Bytes) | 135.090.176 | 1.318.801.408 | 1.112.358.912 | 1.369.174.016 | 1.119.375.360 | 1.366.949.888 | 149.934.080 | 3.299.258.368 | 91.615.232 | 3.882.795.008
ratio of changed datafile | 0,036 | 0,351 | 0,296 | 0,364 | 0,298 | 0,364 | 0,040 | 0,819 | 0,023 | 0,851
delta backupfile (Bytes) | 48.971.776 | 1.066.991.616 | 1.060.003.840 | 1.117.069.312 | 1.073.582.080 | 1.098.489.856 | 62.418.944 | 1.202.388.992 | 90.517.504 | 1.284.259.840
ratio of changed backupfile | 0,037 | 0,780 | 0,786 | 0,790 | 0,793 | 0,767 | 0,043 | 0,835 | 0,061 | 0,844
logfiles written (Bytes) | 16.314.368 | 239.538.176 | 14.798.848 | 204.611.584 | 14.266.368 | 206.983.168 | 25.640.960 | 486.739.968 | 55.025.664 | 856.743.936
Annotations | 3,2% changes | | delta merge, 3,2% changes | | delta merge, 3,2% changes | | 6,5% changes | | 13% changes |

 

 

  

The table contains a lot of information, so let me summarize my most interesting findings:

 

  1. After delete #2 and delete #3 I manually performed a delta merge on the table (the statements involved are sketched after this list), which is of course unnecessary, if not outright useless. Surprisingly, this delta merge had a huge effect on the data file and the backup dump file. Delete #1 was comparable in size, but was done without a manual delta merge and shows only very small changes to the data file and the backup dump file.
  2. Delta merges after inserts always have significant effects on the data file and backup dump file. The changes to the data file are typically several times higher than the actual amount of changed data. This is no issue for HANA since these writes happen asynchronously, but it is important to know that the amount of change to the data files is not representative of how much change actually happened to the data.
  3. Even though the delta merge causes significant changes to the table representation both in the data file and in the backup dump file, it is still possible for most real-life SAP systems to use backup deduplication because only a tiny percentage of the overall data is changed per day. (I verified that on an actual BW on HANA system.) However, I predict that there is some critical threshold: if too many tables (or partitions) are changed and therefore reorganized via a delta merge, then it won't be possible to benefit from backup deduplication any more.
  4. Only delete #1 and delete #2 showed a reduction in the memory consumption of the table. Delete #3, delete #4 and delete #5 increased the memory consumption of my sample table! Also, the data file and the backup dump file size increased over time by ~20%. In contrast, the amount of data was the same at the beginning and at the end of my experiment.
  5. The amount of log files written by HANA correlates very well with the amount of (raw ASCII) data inserted into the database. So as a rule of thumb, if you want to know how much data has been added to or updated in your HANA instance, have a look at the log files.
  6. Deletes are very efficient in terms of log files; only little data needs to be written to the logs. Don't expect any significant reduction of the in-memory space requirements, however.
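For reference, the statements involved in this kind of experiment look roughly like the sketch below. The schema and table names are placeholders, not the ones from my lab, and the monitoring columns are those exposed by the M_CS_TABLES view:

SELECT table_name, memory_size_in_main, record_count
  FROM m_cs_tables
 WHERE schema_name = 'MYSCHEMA' AND table_name = 'MYTABLE';  -- in-memory size and row count after each step

MERGE DELTA OF "MYSCHEMA"."MYTABLE";  -- the manual delta merge performed after delete #2 and delete #3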

 

In case of questions or if something was unclear, please ask.


New to SAP HANA


Hello Community,

 

I am new to SAP HANA and I am searching for some expert knowledge here on SCN.

 

Last year I passed the SAP courses HA100 and HA200 and now I want to try the certification. I have found 2 certifications:

 

  • C_HANATEC131
  • C_HANASUP_1

 

But I have some trouble understanding the difference between these 2 certifications ...

Both are possible with the courses I have passed.

 

Please see: Curriculum | SAP HANA - Administration

 

Could anyone tell me the difference between them?

 

Markus

SAP River: Enabling the Development Environment


Want to:

 

Know how to use the system administrator to add a new system?

 

See a detailed walkthrough of installing the delivery unit to the system?

 

Learn how to simply install the SAP HANA Studio plug-in?


(A separate SAP HANA Studio plug-in must be installed for every copy of Studio and for every developer that is using Studio)

 

Access the SQL code needed to set up a developer role and create the three essential stored procedures?

 

Code located on SAP River Developer Guide:

 

Developer Role Code: Section 2.2

 

Stored Procedures Code: Section 2.3

 



View other tutorials on SAP River at the SAP HANA Academy.


SAP HANA Academy - over 500 free tutorial technical videos on using SAP HANA.


-Tom Flanagan

SAP HANA Academy

Follow @saphanaacademy

SAP River: Creating the Development Environment


Want to:

 

Set up a uniquely named environment configured for SAP River?

 

(Tip: Name of SAP River environment must not be the name of a user or a schema)

 

Create a new user for a specific SAP River development environment?

 

Simply allocate the privilege rights for each created user?

 

Store work locally with a repository workspace?

 

(Pro Tip courtesy of John Appleby: Configure your SAP River repository workspace to co-locate with a GitHub repository so you can share your projects with friends and colleagues in the cloud)



View other tutorials on SAP River at the SAP HANA Academy.


SAP HANA Academy - over 500 free tutorial technical videos on using SAP HANA.


-Tom Flanagan

SAP HANA Academy

Follow @saphanaacademy

Enter the next level: Reality in Real-time


May I ask what you are doing all day long? Don't worry, I am not interested in the details, but my guess is that most of you, regardless of your actual position and whatever may be printed on your business card, are to some degree acting as a manager today:

You have to plan and organize your job, you receive and give feedback, you have to react to it, and you try to make ends meet wherever possible. And this does not end at the borders of the job, as balancing family and business life is a special challenge for many of us these days. So step back for a moment and think about what you are doing day in, day out. At a higher level of aggregation, the usual tasks, e.g. in managing projects, can be described as planning, execution and controlling. Sounds fairly reasonable?

 

Yes, and of course it should be done in the right sequence, e.g. making up your mind before you start and having a look at the results afterwards. And you can repeat this cycle forever and ever: based on results and growing experience, the next planning will be much better than before. If the results were not as good as expected, your planning simply was not good enough. The PMI methodology, as an example, also follows that basic phased pattern, while putting great additional emphasis on the importance of communication. So planning is king, don't forget to talk about it, and everything is fine. Simple and straightforward, brilliant theory, bullet-proof…

 

But in practice?

What about all these ever-changing conditions where you have to react constantly and ASAP? Of course early-alert systems are a good idea, but they do not cure the basic problem that you have to take corrective action and forget about your well-defined phase plans here. And if you have a closer look, it is even less than that: you can't even separate the tasks that clearly when you ask yourself at the end of a day what you have done. And by the way, in the heat of the moment you had better not ask a manager under pressure whether he is currently in controlling mode or already starting to plan again…

So let's face reality: in practice, management cannot be divided into clearly separated tasks that follow each other phase by phase. And the management process is only circular in the sense that you sometimes have to start all over again all of a sudden, and that usually everything happens at the same time!

 

Seems like a no-brainer?

 

Yes! I really cannot claim any credit for this finding at all. The empirical research of Henry Mintzberg in the late sixties showed clearly that even back then the activities of managers could not be aligned with that circular management theory or divided into clear phases. And hence I learned back at university (also already some years ago…) that the daily work of a manager is characterized by open cycles, a fragmented work day, and lots of ambiguity.

All fair, but my point is that although these facts have been known for quite a long time, they are in general very poorly reflected in IT systems!

 

What are the two main reasons for this?

 

Up to now, technology and systems simply were not ready for this! Since the beginning of the computer age, capacity in storage and memory was scarce, or at least expensive, and hence basically never available in the desired quantity. Remember Moore's Law: the IT price-performance ratio doubles roughly every 18 months, and this rule has proven constant over quite a number of years already. The price of sizing of course still matters, but it is no longer the critical factor or even the bottleneck that it used to be. Back in the old days, however, the computing power sometimes had to be split between different systems, and this limitation was also the reason for the separation between Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP). At the center of OLAP are complex multidimensional and multi-relational data analyses on high data volumes, supporting controlling and planning processes. Not being as time-critical as, e.g., the business transaction in front of a customer, it was a logical step to separate it from the daily business that runs in real time, e.g. in ERP systems. So the aggregated data is collected separately in data warehouses, with a delay due to batch uploads. These are not in direct touch with the operational systems and are also technically strictly separated, building up multidimensional info cubes rather than the transactional row-based structures used in OLTP.

The second reason, in my opinion, is the ambition of software design in general, and of ERP systems in particular, to cover as many different use cases as possible. This ends up in an enormous wealth of data and functions. In an attempt to bring structure to things, it was a quite logical step for developers and solution architects to do this by different user roles. This split into managerial, transactional and analytical roles was, partly for performance reasons, also taken to different systems, but the main focus was to use different roles as the basis for the use cases. Of course this separation by roles is not a bad idea at all, sometimes even being a direct translation from the business world. However, it leads to a problem, as developers were not actually thinking from the user's perspective in the first place. They rather came from the other end of the scope and ended up clustering the functionality into specific roles afterwards, instead of starting from the requirements of the particular user. As a result, an average ERP user sees quite a lot of information and interaction points on his screen which he will probably never use, require or even understand. At the same time, much space on the user interface is wasted on unnecessary information, while other information, important to the user, can only be accessed by entering a different transaction with a different screen.

It got better with the invention of portals, which combine the access to relevant transactions. But the basic idea that every transaction reflects a specific role is still alive. In the times of mobile apps, as self-explanatory and specific as it gets, this makes the classical approach look somewhat outdated. And it is also not flexible enough, especially given the requirement of doing everything in real time on the spot, while the distinctions between roles are vanishing more and more these days.

 

So let us call it a day regarding the challenges for the user. But what can be done? How easily can a user solve his management tasks with current IT systems? Will this situation go on forever?

 

The good news is: separating OLAP from OLTP is a work-around for too-slow processors that we no longer need. SAP started the development of HANA two years ago, which is, together with mobile and cloud solutions, the basis for management in real time that deserves this name for the first time in the computer industry.

 

What is HANA? Starting as an analytics platform, HANA has in the meantime grown into a flexible deployment environment for managing OLTP, analytical and big data usage types, in-memory and on-disk, on-premise and in the cloud, in one system. And these options are not separated like before: HANA supports advanced applications that can mix OLTP, analytics, and big data in real time.

In times where everything can be recorded, data volume is growing enormously. HANA provides adequate acceleration for big data and fast movement of data to match this. It is incredible in terms of speed; for instance, I saw an operational CRM system that was migrated to the HANA database deliver response times that were up to a hundred times faster. But that is not the main feature for me: the new infrastructure changes software architecture, blurring the boundaries between systems and between transactional and analytical applications, and allowing for a completely new user experience in true real time. And this supports the needs of today's business life. Remember that we are all managers today?

 

What is the bottom line?

 

Globalization offers new business opportunities. Small and midsize companies, too, have become international players in recent years. But in order to stay or become successful in this environment, you have to be fast and cost-effective. You have to be available 24 hours a day, 365 days a year, and immediate access to information has become a critical success factor – absolutely vital to making the right decisions at the right time, to providing excellent customer service, and to taking timely cost-reduction measures.

Things are happening at the same time, and IT systems have to reflect this by being true real-time systems. Firstly, this means getting rid of the boundaries between systems as well as of the distinctions between different application types. Secondly, it means making business processes and transactions accessible to the user as simply and effectively as possible. So if you tear down the boundaries in the system landscape, you must not keep them up in the user transactions! Radically rethink what really makes sense for this particular user, offer as much flexibility as possible, and keep everything away from the user that he doesn't need and that hinders him in his business tasks.

It is about reducing complexity!

And by the way, that is exactly where it comes back to management theory: today, systems theory is used as a framework. In a complex world it is about managing complexity, so you have to reduce it in order to concentrate on the important things. So the reduction of complexity is the most important challenge of a manager, and systems should help him in doing so, not add to it!

SAP River: Hello World


Want to:

 

Start creating SAP River applications using SAP HANA Studio?

 

Share your repository workspace projects with the SAP HANA server?

 

Know how to input RDL (River Definition Language) code to activate an application on the SAP HANA server?

 

Efficiently activate and inactivate objects?



 

View other tutorials on SAP River at the SAP HANA Academy.


SAP HANA Academy - over 500 free tutorial technical videos on using SAP HANA.


-Tom Flanagan

SAP HANA Academy

Follow @saphanaacademy

HANA vs Hadoop - showdown


HANA and Hadoop are very good friends. HANA is a great place to store high-value, often used data, and Hadoop is a great place to persist information for archival and retrieval in new ways - especially information which you don't want to structure in advance, like web logs or other large information sources. Holding this stuff in an in-memory database has relatively little value.

 

As of HANA SP06 you can connect HANA to Hadoop and run batch jobs in Hadoop to load more information into HANA, which you can then perform super-fast aggregations on within HANA. This is a very co-operative existence.

 

However, Hadoop is capable - in theory - of handling analytic queries. If you look at documentation from Hadoop distributions like Hortonworks or Cloudera, they suggest that this isn't the primary purpose of Hadoop, but it's clear that Hadoop is headed in this direction. Paradoxically, as it does so, Hadoop has evolved to contain structured tables using Hive or Impala. And with the ORC and Parquet file formats within the HDFS filesystem, Hadoop also uses columnar storage.

 

So in some sense Hadoop and HANA are converging. I was interested to see from an aggregation perspective, how Hadoop and HANA compare. With HANA, we get very good parallelization even across a very large system and near-linear scalability. This translates to between 9 and 30m aggregations/sec/core depending on query complexity. For most of my test examples, I expect to get around 14m - with a moderate amount of grouping, say 1000 groups. On my 40-core HANA system that means that I get about 500m aggregations/second.

 

My research appears to show that Cloudera Impala has the best aggregation engine, so I've started with that. I'd like to know your feedback.

 

Setup Environment

 

I'm using one 32-core AWS EC2 Compute Optimized C3.8xlarge 60GB instance. In practice this is about 40% faster core for core than my 40-core HANA system. Yes, that's a nice secret - HANA One uses the same tech, and HANA One is also 40% faster core for core than on-premise HANA systems.

 

I've decked it out with RedHat Enterprise Linux 6.4 and the default options. A few notes on configuring Cloudera:

 

- Make sure you set an Elastic IP for your box and bind it to the primary interface

- Ensure that port 8080 is open in your security group

- Disable selinux by editing /etc/selinux/config and setting SELINUX to disabled

- Make sure you configure a fully qualified hostname in files /etc/sysconfig/network and /etc/hosts

- Reboot after the last two steps

- Disable iptables during installation using chkconfig iptables off && /etc/init.d/iptables stop

 

Installation is straightforward - just login as root and run the following:

 

wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin && chmod +x cloudera-manager-installer.bin && ./cloudera-manager-installer.bin

 

The only thing to note during the installation is to use fully qualified hostnames, login to all hosts as ec2-user, and use your AWS Private Key as the Authentication Method. This works for Cloudera and Hortonworks alike.

 

Testing using Hive

 

The first thing I did was to benchmark my test using Hive. My test data is some Financial Services market data and I'm using 28m rows for initial testing. With HANA, we get 100ms response times when aggregating this data, but let's start small and work up.

 

I can load data quickly enough - 5-10 seconds. We can't compare this to HANA (which takes a similar time) because HANA also orders, compresses and dictionary-encodes the data when it loads. Hadoop just dumps it into a filesystem. A simple aggregation using TEXTFILE storage on Hive runs in around a minute - 600x slower than HANA.


That's roughly what we would expect, because Hive isn't optimized in any way.

 

CREATE TABLE trades ( tradetime TIMESTAMP, exch STRING, symbol STRING, cond STRING, volume INT, price DOUBLE ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/var/load/trades.csv' INTO TABLE trades;

select symbol, sum(price*volume)/sum(volume) from trades group by symbol;
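For comparison, the HANA side of this test is conceptually just a column table of the same shape. The following is only a sketch under assumed column lengths and load options, not the exact statements used in my benchmark:

CREATE COLUMN TABLE trades (
  tradetime TIMESTAMP, exch VARCHAR(10), symbol VARCHAR(20),
  cond VARCHAR(10), volume INTEGER, price DOUBLE
);

-- bulk load; HANA sorts, compresses and dictionary-encodes as part of the load
IMPORT FROM CSV FILE '/var/load/trades.csv' INTO trades
  WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY ',';

-- the same aggregation as in Hive/Impala
SELECT symbol, SUM(price*volume)/SUM(volume) FROM trades GROUP BY symbol;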

 

Moving from Hive to Impala

 

I struggled a bit here because Cloudera 5.0 Beta is more than a little buggy. Sometimes you could see the Hive tables from Parquet, sometimes not. Sometimes it would throw up random errors. This is definitely not software you could use in production.

 

I used Parquet Snappy compression which should provide a blend of performance and compression. You can't load tables directly into Impala - instead, you have to load into Hive and then Impala. That's quite frustrating.

 

create table trades_parquet like trades;

set PARQUET_COMPRESSION_CODEC=snappy;

insert into trades_parquet select * from trades;

 

Query: insert into trades_parquet select * from trades

Inserted 28573433 rows in 127.11s

 

So now we are loading at around 220k rows/sec - on equivalent hardware we could expect nearer 5m from HANA. This appears to be because Impala doesn't parallelize loading so we are CPU bound in one thread. I've read that they didn't optimize writes for Impala yet so that makes sense.

 

select symbol, sum(price*volume)/sum(volume) from trades group by symbol;

 

Now the first time this runs, it takes 40 seconds. However, the next time it runs it takes just 7 seconds (still 70x slower than HANA). I see 4 active CPUs, and so we have 10x less parallelization than HANA, and around 7x less efficiency, which translates to 7x less throughput in multi-user scenarios, at a minimum.

 

Final Words

 

For me, this confirms what I already suspected to be the case - Hadoop is pretty good at consuming data (and I'm sure with more nodes it would be even better) and good at batch jobs to process data. It's not any better than HANA in this respect, but its $/GB is much lower of course, and if your data isn't that valuable to you and isn't accessed often, storing it in HANA will be cost-prohibitive.

 

But when it comes to aggregating, even in the best-case scenario, Hadoop is 7x less efficient on the same hardware. Add to that the number of features that HANA has, the simplicity of operation and the fact that data is stored only once: if your data is hot, and is accessed and aggregated often in different ways, HANA is the king.

 

And we didn't even cover in this blog the incredible maturity of HANA's SQL and OLAP engines compared to what is in Hadoop, plus the fact that Impala, while the fastest Hadoop engine, is only supported by Cloudera and is still very immature.

 

Since with Smart Data Access, we can store our hot data in HANA and cold data in Hadoop, this makes HANA and Hadoop very good friends, rather than competition.

 

What do you think?

10 Golden Rules for SAP HANA Project Managers


One of the things I do in my day job is to oversee the HANA projects that we have going on. I usually provide some advice at the beginning, have a QA role and if the project needs a little help at critical points, I'll get stuck in. I've done this on all the 32 HANA projects we had going on over the last year and so I've got a few battle scars. I thought I'd share my experiences, which relate to any SAP upgrade or migration project but which are particularly important for SAP HANA Migrations.

 

Why are they particularly important? Well, businesses have high expectations of HANA and it's not just a "database" shift to the business. They have been promised amazing things, and it is important that you set up the project to succeed. Here are my rules.

 

1) Governance is by far the most important thing

 

I learnt everything I know about this from master project manager Kiran Patel. When Kiran is managing the project, I don't even need to check on anything. With a good project manager, you can have bad infrastructure, business delays, a flood or even bad technical resources. They will ensure that the business is kept in check, timelines are assured and acts of god are avoided. OK, maybe not the act of god. They will get the right resource model both with the business, customer IT and third party suppliers and manage the relationship between customer, SAP and consulting.

 

They will also know the rest of this list. It is critical that either your project manager, or someone that the project manager has access to (often the role that I play) have previous experience of upgrades/migrations. There is specific language to learn and specific things that can go wrong. A project manager managing a migration for the first time is likely to fail.

 

2) Separate out project communications and business communications

 

Normally this is done by separating the project leadership team from the steering group. This is important because it creates a firewall between the business and the project team and allows the business to be specifically informed on how the project is going and what the major blockers are. If you don't do this then the internal IT team will often mask things from the business, and there is a risk that bad decisions will be made.

 

I had a situation some years back where a customer refused to listen to our advice. We documented it, they agreed to take responsibility for any side-effects (which they claimed did not represent a risk), created an exception and when the project went wrong and we lost 2 weeks of time, we created a change request. The client-side team refused the change request but when the steering group met, this was clearly explained to them and they approved the change request without delay.

 

3) Create full project plans at the right level of detail, with specific timings

 

I hate long project plans because they are hard to read, and I refuse to install MS Project on my machine (I'm not a project manager!). So I want to see a high-level project plan which shows week-by-week activities and commitments and allows you to see where you are in the macro scheme of things. This is the plan you should use when communicating with the steering group. It should fit on one page of A4 and anyone should be able to understand it.

 

Then you need a detailed project plan, which lists individual tasks. I like one line to take 1-6 hours to complete and to be at a level that a project team can understand. It should keep the steps of an upgrade (pre, main, post) separate, and separate the actions of different teams, like upgrade, backup etc. But don't go into too much detail - the project manager should be able to ask "did you do this?" and understand the response.

 

4) Build in metrics and success criteria, and measure them at every stage

 

This is important for any project and critical for a HANA project. What did the sales team promise to the business? The business team will have an implicit set of expectations from the sales process, and you have to deal with this. The way you do this is to ask them for their implicit expectations, and turn them into an explicit set of expectations.

 

What process, what data load, what report? What timings, what change in look and feel? Document these, push back and try to get them reduced, and get the business to sign off on them as success criteria. Make sure they are empirically measurable. And now measure them - before project start, and at each phase. Analyze the risk of not meeting expectations and add additional work to tune systems if necessary.

 

I had one project where the business had a set of expectations and the project was long and complex (6-7 months). We knew that we couldn't meet the expectations with the project alone, and we noted there were a set of infrastructure changes coming with the project. So what we did was to synchronize the improvements with the go-live by making sure that the system didn't get slowly faster month after month, but rather we got all the improvements in one go at go-live.

 

5) Ensure you have an impartial advisor on the project

 

This is the role that I usually play on a project. This allows the project to be set up to succeed and to push the team to do things the "right" way in the first place. It's important that they know enough about the project to see the major risks, and advise the project team. In addition, and this is super-important, the project manager can call them in at short notice at critical times to make decisions.


For instance, I have been involved in a project where there were unexpected infrastructure problems and the project team had been working for 18 hours and was exhausted and was making mistakes. I calculated the time to restore versus the time to continue, analyzed the risk, and pulled the go-live. The project team was emotionally involved and couldn't have made that decision for themselves, but it was the right call. We took the time to understand the problem, got some sleep, regrouped and went successfully live two weeks later.

 

6) Create a living Run Book, review and revise it at every stage

 

I am a big believer in the run book, and different people do it different ways. I try to make it as concise as possible, without missing information. It must never miss steps because they are too small or "assumed". It should allow a sick project member to drop out at the last minute and be replaced by an expert who hasn't been involved (key that you use an expert in this situation) and it should allow an additional member to join the project if there are delays and you need to shift the team to work 24/7.

 

I've had both of those situations happen many times and I'm always relieved when I see a good run book. But remember to get the book reviewed by your impartial advisor because they will be critical. Both the content and the style of the book matter. Also make sure that if you hit an issue on a dry run, you mark a big red X on the run book and figure out what that issue was, and modify the run book. It is a living document.

 

7) Deal with performance by design, and early

 

My first ever serious SAP escalation was on the first BW 7.0 Integrated Planning Project back in 2006-2007. The project had signed up to 5-10 second response times and we were up at 5 minutes for individual planning steps. The project team hadn't understood the impact of having the wrong infrastructure (Sun T2000) and not dealing with production data volumes during the design phase.

 

By the time I got involved it was late in the day and we had to make some fast and drastic changes. We re-platformed to Sun X4600, patched the system, installed a bunch of performance fixes for the SAP code, optimized the application and rewrote a lot of code. The performance problems weren't just in one place - they were everywhere. We also negotiated with the business and reset the expectations on performance to 10-20 seconds, which we met.

 

Design performance into your project from day one.

 

8) Don't trust your team when they are under pressure

 

If you ask the team how long something will take, you won't get the right answer. So you need to physically measure how long each step in the project plan takes and put that into your timeline. If it takes too long, find out why, and rerun the step and update the run book and timeline.

 

If you ask the team how it is going, they will say "OK" until it is too late. So you need to specifically find out where they are compared to the plan and document that against the timeline.

 

If the timing slips and you ask the team what happened, they will often try to baffle you with technology speak. So you need to coax them to talk in your language, which is plan vs actual, and understand what slipped, how you will change it in the next run, and how to update the run book.

 

It doesn't matter how senior your team is, this is human nature.

 

9) Create a sleep plan, and stick to it

 

I don't know how many projects I've been brought into where the whole project team has been working on it for 18 hours, they are tired, making mistakes and don't want to be helped because there's "just 5 minutes left". There is never just 5 minutes left!

 

So it is the project manager's job to create a sleep plan and rotation. The rule of thumb is that technical resources start making mistakes after 12 hours. They are usually useless after around 18 hours and need 9 hours to refresh. You need to look at your team and understand what their dynamic is and plan sleep and rest accordingly.


At Bluefin we have a top class team around the world - in the US, EMEA and Asia for this, and I split the work into 8 hour segments and follow the sun. For critical tasks we wake the local team so they can do the things we are most familiar with. But the important thing is that tired people make bad decisions.

 

10) You need to deliver the pizza and bring the coffee

 

I'm a huge believer that leadership is 10% strategy and 90% carrying the water. This doesn't mean that you should undermine your team, or micromanage - though it is often necessary to micromanage during parts of a project, as you will see from the above points.

 

What it means is that your team have to know that their leadership team is invested. If you're the lead for the customer or project, or even the CIO, you need to be there and you need to buy the pizza and make the coffee. Leadership is about being humble and serving your team and they will respect you for it.

 

Final Words

 

I have a bunch of technical advice too, but I wanted to separate it out into a separate article because this one is getting too long and this feels relevant as it is. These rules have served me well for scores of successful SAP projects and I hope they serve you well.

 

Good luck in your project! Do you have any non-technical rules you'd like to add?


Distribute the SAP HANA system landscape using "System Archive Link" function


In SAP HANA Studio SPS07 there is a new function called “Add System Archive Link...”.

SystemArchiveLink_01.png

This function allows linking your SAP HANA Studio to a centrally stored SAP HANA Systems Archive file. The SAP HANA System Administrator only needs to maintain this central file once. All the linked SAP HANA Studios will display the SAP HANA Systems maintained in this central file.

 

Configuring SAP HANA Studio using "System Archive Link"

 

Maintain all SAP HANA systems centrally in your central SAP HANA Studio. Don't use the studio on your own laptop as the central studio, because as a system administrator you might have SAP HANA systems in your list that shouldn't be distributed to other people yet. I use the SAP HANA Studio on the SAP HANA server, because it is available and it is central for all the studios in your company's network.

 

After you have maintained the system landscape in the central studio you need to export it. In SAP HANA Studio go to File -> Export. In the pop-up window select SAP HANA => Landscape.

SystemArchiveLink_02.png

In the next screen (shown below) you can select which systems should be exported and you can select the location where the landscape.xml needs to be stored. Click Finish to export the landscape.xml.

SystemArchiveLink_03.png

Copy the landscape.xml file to a location that can be shared with all SAP HANA Studios.


Link the non-central SAP HANA Studio to the central landscape.xml file


Now that the landscape.xml file is in a central place, I start my local SAP HANA Studio. In the Systems view I open the context menu and select "Add System Archive Link...".

SystemArchiveLink_05.png

In the pop-up fill in the field as shown below.

SystemArchiveLink_04.png

Click Finish to complete the action. In my local SAP HANA Studio I now see the system landscape that I exported from the central studio. The little chain icon shows that this is a linked system landscape.

SystemArchiveLink_06.png

This feature makes my life as SAP HANA System Administrator a bit easier, but exporting the system landscape every time I change something in my setup is too much work.

 

Making it all even easier

 

I think this whole setup can be done without exporting the landscape.xml file from the central studio every time I change the landscape. So I took a look at the central studio and found that SAP HANA Studio stores its system landscape in the file hdbstudio.xml located in the directory <workspace>/.metadata/.plugins/com.sap.ndb.studio.datamodel

 

It is also possible to start SAP HANA Studio with a different workspace using the -data <new workspace location> argument on the command line. With this knowledge I set up a new workspace in the shared /hana file system on my server. I call this the SharedWorkspace.

 

As of now I start my central SAP HANA Studio with the command /hana/shared/HAY/hdbstudio/hdbstudio -data /hana/SharedWorkspace/ to make sure that the file hdbstudio.xml is on my shared /hana file system.

 

On my Linux Mint laptop I use "Add System Archive Link..." and link to the file /hana/SharedWorkspace/.metadata/.plugins/com.sap.ndb.studio.datamodel/hdbstudio.xml. Using the System Archive Link function to link to the always up-to-date central hdbstudio.xml makes sure that I don't need to export the landscape.xml file anymore. Every time I start my local SAP HANA Studio it automatically picks up the changes from the central setup in the hdbstudio.xml file. Now my life really is easier!

 

To wrap things up

 

This worked for me in my test landscape, but your mileage may vary depending on your local SAP HANA system landscape and the security settings on the use of NFS between systems. I hope this also helps to make your life, as an SAP HANA System Administrator, a bit easier.

Physics Analysis in SAP HANA -- Simple PoC


I wanted to share with the community one of my personal projects. I had been thinking about how to do physics analysis on SAP HANA. To start off, I did a simple PoC, following these steps.

 

 

1. Data

  I need High Energy Particle Physics data. I wish I could get real data from a collider, but for a simple PoC I settled for data from a physics event generator. I chose Pythia 8.1 as it is one of the most widely used in the physics community. I set the Pythia command file to collide p+ with p+ at a CM energy of 1.400e+04 GeV, 100,000 times (or 'events' in physicists' terms). I first tried with 1M events but that totally bombed my machine.

 

2. Interface with SAP HANA

  I modified a C++ example analysis program (main03.cc) included in the Pythia 8.1 package to output the generated events to data.csv on a HANA server, which is then imported into a table for analysis.

 

3. Analysis in SAP HANA

  I wrote a procedure that is equivalent to the C++ code in main03.cc but written in SQLScript. The SQL is shown below.

 

create procedure testPythia ( out epDev doubleTab ) language sqlscript reads sql data as
begin
  t_main = select * from "SYSTEM"."Pythia";
  t_main_with_rownum   = CE_PROJECTION(:t_main, [CE_CALC('rownum()', integer) as rownum, "tt" as e]);
  t_ievent_with_rownum = CE_PROJECTION(:t_main, [CE_CALC('rownum()', integer) as rownum, "ievent"]);
  t_rownum_of_firstrow_in_ievent = select "ievent", min(rownum) as firstrownum
                                   from :t_ievent_with_rownum group by "ievent";
  t_first_e_of_each_event = select a."ievent", b.e as e
                            from :t_rownum_of_firstrow_in_ievent as a
                            inner join :t_main_with_rownum as b on a.firstrownum = b.rownum;
  t_ievent_sumpxyz = select "ievent", sum("px") as sumpx, sum("py") as sumpy, sum("pz") as sumpz, sum("tt") as sume
                     from :t_main where "isfinal" = 1 group by "ievent";
  t_ievent_sumpxyz_with_first_e = select a."ievent", a.e as "firstE", b.sumpx, b.sumpy, b.sumpz, b.sume
                                  from :t_first_e_of_each_event as a
                                  inner join :t_ievent_sumpxyz as b on a."ievent" = b."ievent";
  t_ievent_sumxyz_div_by_first_e = select "ievent", (sumpx/"firstE") as pxdiv, (sumpy/"firstE") as pydiv,
                                          (sumpz/"firstE") as pzdiv, (sume/"firstE") as ediv
                                   from :t_ievent_sumpxyz_with_first_e;
  t_epdev = select abs(ediv - 1) as e, abs(pxdiv) as px, abs(pydiv) as py, abs(pzdiv) as pz
            from :t_ievent_sumxyz_div_by_first_e;
  epDev   = select e+px+py+pz as epDev from :t_epdev;
end;
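To call the procedure from the SQL console, a table type for the OUT parameter is needed. The definition below is an assumption inferred from the signature above (the type I actually used may differ):

CREATE TYPE doubleTab AS TABLE (epDev DOUBLE);  -- assumed shape of the output table type

CALL testPythia(?);  -- returns the epDev result set in SAP HANA Studio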

 

The calculation is a simple one but it's good enough for illustration purposes. I did not attempt to optimize the procedure. The server processing time is 2:00.711 minutes for this analysis of 100,000 events and 189,505,757 particle records.

 

 

Using this same setup, the next steps would be to change the beam energy in Pythia to 6.5 TeV per beam to match the planned beam energy of the LHC for 2015. More importantly, I need to find a more meaningful analysis that also better showcases the power of SAP HANA. I welcome your suggestion.

 

 

References

 

1. Pythia

  http://home.thep.lu.se/~torbjorn/Pythia.html

 

2. LHC

  http://en.wikipedia.org/wiki/Large_Hadron_Collider

SAP River: Hello World Using OData


Want to:

 

Know how to use OData as an endpoint for an SAP River application?

 

(Pro Tip courtesy of John Appleby: OData documentation works much better in a browser – Postman plug-in for Google Chrome)

 

Create a URL to access your application?

 

Get a token from the server?

 

Input form data using JSON to a SAP HANA data table?

 

Call an action using standard OData syntax?


 

View other tutorials on SAP River at the SAP HANA Academy.


SAP HANA Academy - over 500 free tutorial technical videos on using SAP HANA.


-Tom Flanagan

SAP HANA Academy

Follow @saphanaacademy

Reverse Geocode your HANA Data with this XS JavaScript Utility

$
0
0

Geocoding is the process of taking address information and turning it into geographic coordinates that can be used to view that location on a map.  Reverse geocoding is the opposite process where you start with a point on the globe (perhaps a latitude longitude coordinate) and convert it into a textual address.  This blog explains how to use an XS JavaScript utility to reverse geocode data in your HANA tables.  This utility currently makes use of the Google Geocoding API but could easily be extended to use other service providers.  All the source code is available on GitHub, all feedback welcome!

 

Example

Here is an example of the utility in use.  I start off with table data like the below, where I have known latitude and longitude coordinates:

01 before v2.png

Then I run the XS JS service with parameters specifying the source table, source fields and target fields.  The URL looks somewhat like this (the parameters are explained more fully below):

 

http://<server>:80<instance>/<package path>/geodataEnrich.xsjs?schema=GEODATAENRICH&table=
testtable03&maxrecs=10&log=active&fldblank=COUNTRY&fldstat=STATUS&fldlat=LATITUDE&fldlon=
LONGITUDE&fldcty=COUNTRY&fldad1=REGION&fldad2=SUBREGION&fldad3=CITY&fldpost=POSTCODE

 

Here is the same table after the service runs, complete with textual address data.  Easy peasy!

02 after v2.png

The "STATUS" column holds the results of the geocode API call.  You can see "ZERO_RECORDS" for a point given in the middle of the Atlantic Ocean.

 

The motivation for this utility came from some work I did analysing UK traffic accident data where the raw data contained latitude and longitude but not much else useful about the geographic location of accidents.  Then I read John Appleby's blog about analysing weather data, where he made a suggestion about a re-usable service to reverse geocode data.  I realised that I'd already done a lot of what was mentioned there and making it more generic was not a big step.  Hence this utility.

 

Goals

My goals were to produce something that was production ready, not just a proof of concept or a demo, but an immediately usable XS JavaScript service.  This means supporting exception handling, trace files, simulation mode and giving due consideration to security.  It also means supporting "throttling" so that the API calls are not made to Google's service too quickly (something mentioned in their terms of service).  I also wanted something that would be as easy to install as possible, so that means the very fewest files that could be cut-and-pasted and no need for package imports or configuration.  Finally I wanted something that would be easily extensible, so that more features could be added easily if it proved useful.

 

How to install and use

Get the source code from GitHub and create the 3 files in any XS project like this (the folder name does not matter):

03 project.png

The three files are as follows:

  • geocodeApiGoogle.xsjslib: XSJS Library file that wraps the Google Geocode API (and the destination file below) and provides formatted reverse geocode results from a simple JS function call.
  • geocodeApiGoogleDest.xshttpdest: HTTP destination file, this is required by the XS engine to make calls to external URLs.
  • geodataEnrich.xsjs: the main XS service, this is what is called to do the work of reading and writing to your tables, making use of the XSJS library above.

 

You have to make one tiny edit to get everything plumbed in correctly: in the geocodeApiGoogle.xsjslib file, go to line 22 and edit this line to contain your project path as its first parameter (so you will replace "geodataenrich.services" with your project name and path):

 

var dest = $.net.http.readDestination("geodataenrich.services", "geocodeApiGoogleDest");

 

Ok, with that done we're ready to go.

 

Assumptions

I have made a bunch of assumptions about the tables and fields that can be used.  Firstly, the fields you write the resulting address data to are all strings.  The fields you write address data to must also be long enough to hold the resulting address data.  Rather than just blindly truncate, which could mess up subsequent analysis you may want to do, the program throws an error if any target field is too small.  In testing I've found 50 characters is sufficient to hold any result.  The final assumption is that the table being read from and written to must have a primary key.  The primary key doesn't need to be specified, but it must exist to allow the SQL UPDATE calls to work.

 

Suggested Usage

Rather than just fire off the service on a mass of data, I've found it safer to do this in a series of steps. Before going through these steps, let's look at the complete set of URL parameters and what they do:

 

URL Parameter | Sample Value | Explanation of Parameter
maxrecs | 10 | The maximum records to update, e.g. 1000. At the time of writing the free Google service has a limit of 2500 calls per day, or 100000 calls for business accounts. Defaults to 2500.
mindelayms | 500 | The minimum delay in milliseconds between each API call to the Google service. Defaults to 500, meaning 500 milliseconds.
log | active | Controls logging to screen or trace file. Omit the parameter entirely for no logging, use log=active to see details on screen when the URL finishes, and use log=hana to write to the HANA trace file only (as an information-level trace record).
simulate | active | Controls whether table UPDATEs are done. Omit the parameter entirely to update records, use simulate=active to not do any update.
schema | TESTSCHEMA | Source schema name (required).
table | testtable01 | Source table name (required).
fldlat | LATITUDE | Source field holding latitude (required).
fldlon | LONGITUDE | Source field holding longitude (required).
fldcty | COUNTRY | Name of field in source table that will receive the country address information (optional).
fldad1 | REGION | Name of field in source table that will receive the admin level 1 information, like a region (optional).
fldad2 | SUBREGION | Name of field in source table that will receive the admin level 2 information, like a sub-region (optional).
fldad3 | SUBSUBREG | Name of field in source table that will receive the admin level 3 information, like a sub-sub-region or a city (optional).
fldpost | POSTCODE | Name of field in source table that will receive the post code or zip code information (optional).
fldblank | COUNTRY | Name of field in source table that is used to identify records you want to write to; this is to prevent the same records being written to over and over again. If a record contains a value of NULL in this field, the service will attempt to write to all target fields. If this field is filled with something not NULL, the record will not be selected. This field is required.
fldstat | STATUS | Name of field in source table that will receive the status of the geocode API call (optional). A successful geocode result will produce a value of "OK" in this field. If you go over your daily limit this field will start holding "OVER_QUERY_LIMIT". The complete list is available in the Google documentation.

 

Ok, so now let's work through the steps to safely reverse geocode your data.

 

Step 1

First run the service on a small sample of your data (parameter maxrecs=10), in simulate mode (parameter simulate=active) with logging to the screen (parameter log=active), to make sure the parameters in the URL are ok.  The processing rate defaults to being limited to 2 external API calls per second, which means a 500 millisecond delay between API calls, but you can change this with the parameter mindelayms.  The XS service sleeps until the requisite interval has elapsed.  Since we have logging on, we get a detailed view of what is going to happen:

04 logging.png

 

Step 2

Next run the service in live mode (omit the simulate parameter) with logging to screen for 10 records.  If this completes ok without errors, then we're good to go for a larger run.

 

Step 3

Finally run the service in live mode without logging (omit the log parameter) for the full 2500 records, or more.

 

Here are some sample URLs.  All URLs will start as usual:

 

http://<server>:80<instance>/<package path>/

The following URL will simulate the update of 10 records to table "GEODATA"."testtable01", with 400ms delay between calls, logging to screen, and storing the result of the geocode API call in the field STATUS. The field to select on is COUNTRY (i.e. search for records with COUNTRY=NULL) and the fields to write to are ZIP and COUNTRY:

 

geodataEnrich.xsjs?schema=GEODATAENRICH&table=testtable01&maxrecs=10&mindelayms=400&log=
active&simulate=active&fldblank=COUNTRY&fldstat=STATUS&fldpost=ZIP&fldcty=COUNTRY

The following URL will do a live update of 2000 records, with 100ms delay between calls, with no logging.  The field to select on is COUNTRY and the fields to write to are POSTCODE, REGION and SUBREGION:


geodataEnrich.xsjs?schema=GEODATAENRICH&table=testtable01&maxrecs=2000&mindelayms=100&
fldblank=COUNTRY&fldpost=POSTCODE&fldad1=REGION&fldad2=SUBREGION


Google API Limitations

At the time of writing Google place some restrictions on their service.  Unless you have a business license, you need to limit to 2500 calls per day.  Plus you are not supposed to make them too frequently (although they don't specify precisely how frequently is allowed).  They also state that the results should be destined for use on a Google map.  Read the API documentation to get current limitations, which may well have changed since this blog was written.

 

Security

As I mentioned earlier, I wanted to give some consideration to security.  This utility contains UPDATE SQL calls and receives parameters from a URL (when you run the service) and parameters from an external 3rd party service (when you receive geocode results).  These introduce some risks, and to mitigate them, firstly you should run the service logged in as a user with rights to update the tables you want, but ideally not every table.  Secondly, when you're running the service, make sure the URL you write is correct by testing in simulate mode and checking the logs written to the screen.  These logs show the SQL statements (or rather, templates for the SQL statements) that will be executed.  These risks are well inside your control, but one aspect that is not in your control is the results of the geocode calls.  What would happen if some nefarious 3rd party service returned address results that were designed to cause trouble when used in an SQL call?  I am reminded here of the xkcd cartoon about a kid called Bobby Drop Tables:

 

http://imgs.xkcd.com/comics/exploits_of_a_mom.png

The joke is of course about SQL injection, and in a similar way to the cartoon, you could imagine a deliberately evil address being written in a similar style to the kid's name and causing trouble.  If we were to blindly write "UPDATE TABLE SET FIELD = <whatever we got from the 3rd party service>" we are at risk.  This risk goes away by parameterising the SQL statement like this "UPDATE TABLE SET FIELD  = ?" and then replacing the '?' by using XS API calls to insert properly escaped string values to the SQL.  This has been done throughout the code for all the values that are retrieved from 3rd party services.  So we're protected against that risk well.
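To make that concrete, the prepared statement the utility builds looks conceptually like the sketch below. The field names come from the URL parameters, and "ID" is a hypothetical key column used here only for illustration:

UPDATE "GEODATAENRICH"."testtable03"
   SET "COUNTRY" = ?, "REGION" = ?, "STATUS" = ?  -- values bound via the database API, never concatenated into the SQL string
 WHERE "ID" = ?;                                  -- key of the row being enriched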

 

Improvements

There are a few obvious improvements that could be made.  More address fields could be added (e.g. locality, neighbourhood); there are many listed in the Google API documentation.  A UI might be nice, instead of having to craft long URLs.  At present only the Google Geocode API is supported, but it is very easy to extend (just add a new pair of files like geocodeApiGoogle.xsjslib and geocodeApiGoogleDest.xshttpdest that return similar objects with the address data).  Also, only reverse geocoding is supported as that is all I required, but it would be moderately easy to support forward geocoding as it is just "going the opposite way".  The code does not pass JSLint (I found JSLint pretty harsh), but it has been pretty well tested.  There are plenty of performance improvements that could be made (caching geocode API results and re-using them if subsequent points are very close) and it would also be possible to remove the current restriction that the table must have a primary key.

 

Well, that's it. The source code is there for anyone to use for any purpose; the only restriction is that I'm not held liable if it does not work as intended. Let me know if anyone finds a use for this in a project.

How to access SAP River using Python


Inspired by the excellent blog from John Appleby introducing Top 10 tips for SAP River developers, we've added three new video tutorials to the SAP HANA Academy that show how to combine Python and SAP River.

 

Python is great for prototyping where, for example, you'd like to verify specific functionality of your SAP River application - but without first having to write a proper UI. You can also use Python to quickly and easily load test data.

 

In the videos I cover the following:

  • Getting Started: How to install and configure Python - including the HTTP Requests package & a very simple example of how to read data from SAP River using OData
  • Data and Actions: How to perform OData requests from Python - CRUD (create, read, update, delete) and call SAP River actions
  • Bulk Loading: How to bulk load data from a CSV file into SAP River

 

Each tutorial includes downloadable code snippets of the examples presented to help you get up and running quickly.

 

Happy viewing!

 

For more SAP River tutorials visit the SAP HANA Academy

Deciphering the New Generation Database Market Landscape


SAP and SAPinsider cordially invite you to attend the third session in our Understanding the SAP HANA Difference Webinar Series. Watch this informative Webinar featuring Gigaom Research Analyst William McKnight to catch a glimpse of the future of database platforms and discover where SAP HANA fits in today's market landscape.

Date: Tuesday February 20, 2014

Time: 11:00 am EST

Speaker: William McKnight, Analyst, Gigaom Research

 

For the third Webinar in our series, “Deciphering the New Generation Database Market Landscape,” featured speaker William McKnight, from Gigaom Research, will discuss:

 

  • The brief history of the data platform
  • The current operational and analytical divide
  • Storage alternatives and the advent of in-memory systems
  • Scale-out architectures
  • The information architecture to pursue
  • Platform categories of the practical future
  • Platform support
  • The future of the database market landscape

 

The five-part SAP HANA Difference Webinar series is devoted to demonstrating the differentiating technical features of SAP HANA and highlighting what makes SAP HANA the pre-eminent in-memory database management system for conducting real-time business.

 

SAP HANA – with its unique ability to perform advanced, real-time analytics while simultaneously handling real-time transaction workloads – is both the present and future of in-memory database management systems. Explore the many features and capabilities of SAP HANA today and discover what makes this innovative solution different.

 

Register Today!

SAP Lumira: Getting Started


Want To:

 

Download a free version of SAP Lumira?

 

Store your created content and easily share it with colleagues?

 

Quickly access the data sets your co-workers are using and view the visualizations they are creating?

 

Set up a SAP Lumira Cloud account?

Screen Shot 2014-02-04 at 7.34.07 AM.png

 

View other tutorials on SAP Lumira at the SAP HANA Academy.


SAP HANA Academy - over 500 free tutorial technical videos on using SAP HANA.


-Tom Flanagan

SAP HANA Academy

Follow @saphanaacademy


SAP River: Requesting sap-river.com Sandbox


Want to:

 

Familiarize yourself with River Definition Language using a free 7-day trial of a SAP HANA landscape Sandbox?

 

Simply focus on coding in RDL?

 

Quickly access the SAP River developer guide and tutorials?

 

Use Explorer to examine and work with existing data and generate new data to work with?

 

See the OData Calls that you can use to call data from any UI or mobile application?

 

Easily use a URL to pick out a specific piece of data?

Screen Shot 2014-02-04 at 3.28.58 PM.png

 

View other tutorials on SAP River at the SAP HANA Academy.


SAP HANA Academy - over 500 free tutorial technical videos on using SAP HANA.


-Tom Flanagan

SAP HANA Academy

Follow @saphanaacademy

Code snippet to quickly view SQLscript trace variables


A useful way to debug SQLscript code is to add TRACE statements. When you add TRACE statements to SQLscript code the trace results are stored in temp tables. The temp table names can be retrieved from table SYS.SQLSCRIPT_TRACE.

If, like me, you use this a lot to debug, you may find it tedious to select from the SYS.SQLSCRIPT_TRACE table each time to get the temp table for the table variable output and then select from that temp table.
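For reference, the manual route looks something like this (a sketch; the temp table name in the second statement is just a placeholder, use the TABLE_NAME value the first query returns on your system):

-- 1. find the temp table behind each traced table variable
select variable_name, table_name, trace_timestamp
from SYS.SQLSCRIPT_TRACE
order by trace_timestamp desc;

-- 2. select from the temp table found above (hypothetical name shown)
select * from "SQLSCRIPT_TRACE_TMP_1";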

 

I meant to put this up a while back: I created a quick and dirty procedure to display the results of all the trace statements at once. You may find it useful when analyzing table variable data.

 

CREATE PROCEDURE TRACE_RESULTS_P(IN result_type NVARCHAR(10), IN filter_criteria NVARCHAR(200))
LANGUAGE SQLSCRIPT AS
  CURSOR c_trace FOR
    select statement_id, variable_name, table_name, trace_timestamp
    from SYS.SQLSCRIPT_TRACE
    order by trace_timestamp desc;
  sql_string NVARCHAR(20000) := '';
  trace_count INT := 0;
BEGIN
  IF result_type = 'count' THEN
    FOR trace_row AS c_trace DO
      IF :trace_count > 0 THEN
        sql_string := :sql_string || ' UNION ';
      END IF;
      sql_string := :sql_string || ' select '''||trace_row.variable_name||''' as trace_name, '''||trace_row.trace_timestamp||''' as timestamp, count(*) as count from '||trace_row.table_name||' '||filter_criteria;
      trace_count := trace_count + 1;
    END FOR;
    EXECUTE IMMEDIATE (:sql_string);
  ELSEIF result_type = 'all' THEN
    FOR trace_row AS c_trace DO
      sql_string := 'select '''||trace_row.variable_name||''' as trace_name, * from ';
      sql_string := :sql_string || trace_row.table_name||' '||filter_criteria;
      EXECUTE IMMEDIATE (:sql_string);
    END FOR;
  END IF;
END;

 

As you can see, it takes two parameters:

- result_type: specifies whether to just get the row count of each table variable ('count') or to display the full contents of the table variables ('all')

- filter_criteria: specifies filter criteria to be applied to the query on the table variables (useful if you are looking for results for one document number, for example)

 

It loops through the temp tables and uses dynamic SQL to execute the selects against them.

 

 

To demonstrate I created a small procedure with a couple of trace statements:

 

CREATE PROCEDURE TRACE_TEST_P(OUT t_vbap ecc_reporting.vbap)
LANGUAGE SQLSCRIPT AS
BEGIN
  t_vbak = select vbeln
           from ecc_reporting.vbak
           where vbeln IN ('0030059722','0030136383');
  t_vbak = trace(:t_vbak);
  t_vbap = select vbap.*
           from ecc_reporting.vbap as vbap, :t_vbak as vbak
           where vbap.vbeln = vbak.vbeln;
  t_vbap = trace(:t_vbap);
END;

 

After calling the procedure, there are two rows in SYS.SQLSCRIPT_TRACE for the two table variables that were traced:

 

SQLSCRIPT_TRACE.PNG

 

 

So now by calling my TRACE_RESULTS_P procedure I can quickly display the data in the temp tables:

 

call TRACE_RESULTS_P('all','');

trace_all_1.PNG

TRACE_RESULTS_P

trace_all_2.PNG

 

 

With filter criteria:

 

call TRACE_RESULTS_P('all',' where vbeln = ''0030059722''');

trace_all_filter_1.PNG

trace_all_filter_2.PNG

 

Getting the count:

 

call TRACE_RESULTS_P('count','');

trace_count.PNG

 

Count with filter criteria:

 

call TRACE_RESULTS_P('count',' where vbeln = ''0030059722''');

trace_count_filter.PNG

 

 

The above can be very handy if you have trace statements on multiple table variables.

HANA and HA/DR


Recently, another large database vendor took the opportunity to rework and republish their view of SAP HANA's HA/DR capabilities.  It is amazing that a vendor is writing customer evaluation materials for one of its competitors, but we will set that discussion aside.  While the publish date is February 2014, the "whitepaper" is just a rehash of what was written in 2013, including many of the same inaccuracies and false claims.  Instead of responding to the inaccuracies, let me point you to a great blog SAP wrote last year that lays out the truth about HANA's HA/DR (and backup/restore) capabilities.

 

The blog is titled "The Truth about SAP HANA and HA/DR", and you can find the link below.  The information is very topical, helpful and, most of all, accurate, unlike some other publications.

 

http://www.saphana.com/community/blogs/blog/2013/06/27/the-truth-about-sap-hana-and-hadr 

Move HANA data and log files to different mount point


I decided to write this because I ran into a space problem while trying to upgrade a HANA test system. Of course, it isn’t a certified appliance, but I had to move the data and log files to another mount point to free up enough space to complete the upgrade.

 

Moving data files to a different mount point is a common procedure for many databases, but it typically isn’t an issue with HANA’s appliance model, since HANA appliances come with a very specific hardware configuration with the persistence mount point sized at 4 times RAM. The introduction of Tailored Datacenter Integration (TDI) may make this process necessary a little more often. I’ve outlined the steps below to move the $(DIR_GLOBAL)/hdb/data and $(DIR_GLOBAL)/hdb/log directories to a different location.

 

MAKE A BACKUP OF THE DATABASE BEFORE DOING ANYTHING.

 

  • Make a new mount point
  • Log on to the appliance as root or a sudo user
    • Create a new directory where the data and log files will be stored
      • e.g. mkdir /hanadata
    • Mount the new drive, NFS mount or LUN at /hanadata. This is typically done in /etc/fstab, but I won’t go into details here
  • Create data and log subdirectories under the new mount point
    • cd /hanadata
      • mkdir data
      • mkdir log
    • mkdir.png
  • Change ownership of the new data directory to match the ownership of /usr/sap/<SID>
    • chown <sid>adm:sapsys /<newmount> -R
      • e.g. chown anaadm:sapsys /hanadata/ -R
    • chown.png
  • Change the location of the persistence in HANA studio – HANA must be running (an SQL alternative is sketched after this list)
    • Browse to global.ini > persistence
      • Update basepath_datavolumes to point to /<newmount>/data
      • Update basepath_logvolumes to point to /<newmount>/log
    • basepaths.png
  • Stop HANA
    • su - <sid>adm
    • HDB stop
    • hdbstop.png
    • Verify the HANA services are stopped
    • hdbinfo.png
  • Copy the files to the new location
    • I suggest copying, instead of moving, in case there are any problems. The original files can be removed/deleted after verifying that HANA is up and running using the new file location.
    • cp /usr/sap/<SID>/global/hdb/data/* /hanadata/data -rp
    • cp /usr/sap/<SID>/global/hdb/log/* /hanadata/log -rp
      • The r is recursive, to include all sub-directories
      • The p is preserve mode, to keep all permissions
  • Start DB
    • hdbstart.png
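
As an aside, instead of editing global.ini in HANA studio, the same persistence parameters can also be set with SQL – a sketch, assuming the /hanadata paths used above:

ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('persistence', 'basepath_datavolumes') = '/hanadata/data',
      ('persistence', 'basepath_logvolumes') = '/hanadata/log'
  WITH RECONFIGURE;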

Verify that HANA starts and check the nameserver trace file to confirm that HANA is using the correct data and log files:

 

  • nameserver.png

 

 

Troubleshooting

 

I ran into the issue below when I tried to start HANA for the first time after copying the files. This occurred because I didn’t use the -p switch in my copy command, so the copied files kept root ownership. I am including it to show what will happen if <sid>adm doesn’t have write permissions on the data or log files.

  • nameserver2.png
    • I easily fixed this issue by running the command below again
      • chown <sid>adm:sapsys /<newmount> -R

Learning from SAP HANA Boot Camp


HANA BOOT CAMP

It was a great day for me; I got to participate in the HANA Boot Camp for the SAP Startup Focus program.

It was a great experience with HANA use cases and building capabilities in HANA. I am sharing my understanding of it here.

SAP HANA is a platform for real-time analytics and applications that helps customers manage and analyze massive volumes of data instantly and cost-effectively, while reducing IT complexity.

SAP HANA helps customers:

  • Find valuable and actionable information from their mass amounts of data
  • Accelerate business processes with rapid analysis and reporting
  • Invent new business models and processes
  • Reduce total cost of ownership (TCO) with less hardware and maintenance

At the end of this course, you should be able to explain how SAP HANA:

  • Supports analytical applications that work with big data
  • Delivers real-time decision support in complex data environments
  • Supplies effective predictive capabilities
  • Benefits from advances in hardware and software technology that provide revolutionary processing speeds while reducing administrative costs

The SAP HANA Appliance

SAP HANA is a combined hardware and software solution that includes server technology from one of multiple hardware partners, paired with the SAP HANA appliance software. HANA stands for High-Performance Analytic Appliance. The in-memory technology employed with this appliance is the next wave of technology innovation, serving up processing power and speeds that allow customers to take massive quantities of real-time data and analyze it on the spot.

What is In-Memory Computing?

In-memory computing is an innovative technology that improves business processes by combining rapid database transactions with real-time data analytics. This database system relies on main memory for data storage instead of a disk storage mechanism, which makes for faster access to the database and all associated information. The SAP HANA platform leverages in-memory computing to speed up data-intensive processes, reduce complexity, and accelerate the value delivered from IT.

Our World - SAP HANA

The Your World application is built on SAP HANA and demonstrates real-time queries against a mid-sized data set (70 million rows).  The geographic area displayed on the map provides the first criterion for aggregation - the population totals shown at the bottom of the browser.  You can layer on additional criteria in an ad-hoc fashion, for example, the number or percentage of people in the displayed area that hold college degrees.  All aggregations are done on the fly, in real time.

These are the topics that were covered:

  • Technical Overview of SAP HANA
  • Provision HANA servers from Amazon
  • Install client software
  • Configure the HANA Studio development environment
  • Business Story and Data Model
  • SAP HANA Modeling
  • Modeling in SAP HANA
  • Calculation Views and Procedures in SAP HANA
  • Configure Application Development Environment
  • Application Services (ODATA)
  • Application Services (XSJS)

SAP HANA Technical Overview

  • Key SAP HANA Innovations
  • HANA Building Blocks
  • Moving to HANA

Leveraging and Extending Software + Hardware Innovation


What makes HANA the platform of choice, and how does the HANA database work?

 


  1. OLAP
  2. XS App Server
  3. Unstructured (Text)
  4. Scripting Engine
  5. Predictive Analysis Library
  6. Business Function Library

 

What Makes HANA the Platform of Choice?

OData Services

  • HTTP
  • RESTful services
  • JSON / XML

 

MDX

  • ODBO
  • MS Excel
  • 3rd Party OLAP Tools

 

ESP

  • Near Real Time
  • SAP & Non-SAP

Data Services

  • Batch Transfer
  • SAP & Non-SAP
  • Extensive Transformations
  • Structured & Unstructured
  • Hadoop Integration

 

SQL

  • ODBC / JDBC
  • 3rd Party Apps
  • 3rd Party Tools

 

BICS

  • BICS
  • NetWeaver BW
  • SAP BOBJ

 

SAP HANA is an ACID-compliant, enterprise-grade database that can handle massive volumes of data.

SAP HANA is a relational database, like Oracle, DB2, and SQL Server.

  • What’s the Diff? Column Store vs. Row Store?
  • SAP HANA Design Goals
  • Hardware innovation leads to software innovation
  • Multi-engine data platform: beyond SQL
  • Simplified System Architecture
  • Support for up to very large data-sets
  • Failure recovery and High Availability

 

  • The Power of In-memory


  • SAP HANA Building Blocks

  • HANA Deployment Options for Different Use Cases

HANA is at the heart of SAP’s vision and strategy

Configuring the development environment

Switch to the SAP HANA Development Perspective in the SAP HANA Studio

Create a repository by creating a repository workspace (on your local machine) and pointing it to your HANA system.

Create a package for your application under the Content part of the SAP HANA server

Begin Team development of your application in the Project Explorer Tab

Open SAP HANA, Developer Edition, in your Amazon Web Services Account

Install the SAP HANA Client and SAP HANA Studio Software on your Laptop

Open SAP HANA Studio and Connect to SAP HANA on AWS

How to create Attribute, Analytic, and Calculation views

Create a SQL Script Procedure

How to create a SQLScript procedure that calculates sales prices based on the discounts for different product categories, using an input parameter to filter the data by product type.
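A minimal sketch of such a procedure could look like this (the DEMO schema, the PRODUCTS and DISCOUNTS tables and their columns are hypothetical, with PRICE assumed to be DECIMAL(15,2); substitute your own data model):

CREATE TYPE TT_SALES_PRICES AS TABLE (
  product_id  NVARCHAR(10),
  category    NVARCHAR(20),
  list_price  DECIMAL(15,2),
  sales_price DECIMAL(15,2)
);

CREATE PROCEDURE CALC_SALES_PRICES_P (IN iv_product_type NVARCHAR(20), OUT et_prices TT_SALES_PRICES)
LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
  -- apply the category discount to the list price, filtered by product type
  et_prices = select p.product_id,
                     p.category,
                     p.price as list_price,
                     cast(p.price * (1 - d.discount_pct / 100) as DECIMAL(15,2)) as sales_price
              from DEMO.PRODUCTS as p
              join DEMO.DISCOUNTS as d on p.category = d.category
              where p.product_type = :iv_product_type;
END;

-- e.g. call CALC_SALES_PRICES_P('Notebooks', ?);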

ODATA SERVICE:

Creating a simple OData Service:

To create an OData service from an existing HANA table or view, you only need to define an XSODATA service definition file.
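For example, a service definition file (say products.xsodata; the schema and table names are just placeholders) can be as short as this:

service {
  "DEMO"."PRODUCTS" as "Products";
}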

Calling the OData service from the UI:

Creating an XSJS Service:

The XSODATA services are a very fast way to build OData services for existing database artefacts. However, sometimes you need more control over the processing logic of the service handler, or a custom output format. For this reason, we can also write custom service handlers using server-side JavaScript.
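As a rough sketch (not from the boot camp material; the file, schema and table names are invented), a custom handler in an .xsjs file could look like this:

// getProducts.xsjs - returns the products of a given type as JSON
var conn = $.db.getConnection();
var pstmt = conn.prepareStatement('SELECT "PRODUCT_ID", "PRICE" FROM "DEMO"."PRODUCTS" WHERE "PRODUCT_TYPE" = ?');
pstmt.setString(1, $.request.parameters.get("type"));  // URL parameter, bound safely
var rs = pstmt.executeQuery();
var body = [];
while (rs.next()) {
    body.push({ productId: rs.getString(1), price: rs.getDecimal(2) });
}
rs.close();
pstmt.close();
conn.close();
$.response.contentType = "application/json";
$.response.setBody(JSON.stringify(body));
$.response.status = $.net.http.OK;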

I will share some more documents soon.

 

Thanks to KGS.Saravanan, Suneet and Chinmaya

 

I hope this helps beginners.
