Channel: SAP HANA and In-Memory Computing

A REGI lock file ate my HANA Studio


I have wanted to post a blog in the SAP HANA space for some time now, but I struggled initially to identify the right opportunity. The desire was to write something helpful and, until recently, I wasn't sure I possessed the right level of knowledge to contribute in a meaningful fashion. Fortunately, my Analytics background finally paid off, and Lumira Server gave me the shot I was looking for...

 

You may have recently heard about Lumira Server becoming Generally Available, and I would have been remiss had I not taken the opportunity to install it immediately on my HANA SPS07 instance. I had previously installed the Customer Validation (CUV) build at release level 1.14, and the Lumira Server user guide states explicitly that you must uninstall a previous version of the delivery unit before importing the latest version. Personally, I think this is a bit superfluous as HANA gives you the ability to overwrite or reactivate existing delivery units, but I've enough experience with SAP products to follow the directions and not argue.

 

The instructions go something like this:

 


 

  1. On your SAP HANA system, navigate to the hdbclient directory where the regi program is located.
  2. At the command line, enter the following: export REGI_USER=<USERNAME>.

    Replace <USERNAME> with your SAP HANA system administrator name.

  3. Enter the following: export REGI_PASSWD=<PASSWORD>.

    Replace <PASSWORD> with the SAP HANA system administrator password.


  4. Enter the following: export REGI_HOST=<HOST>:3<INSTANCE>15.

    Replace <HOST> with the server name and <INSTANCE> with the instance number (3<INSTANCE>15 is the XS engine port). For example, if you are using instance 27 on myserver, enter: export REGI_HOST=myserver:32715.

  5. Enter the following:

 

  • To undeploy 1.14 or newer, enter: regi undeploy LUMIRA_SERVER --vendor=sap.com
  • To undeploy versions earlier than 1.14 enter: regi undeploy HANA_EXP_WEB sap.com

 

The delivery unit is undeployed and you can install the latest support package.

 

I ran this on Windows, though, so the commands were a bit different. Specifically, I ran:

 

set REGI_USER=SYSTEM

set REGI_PASSWD=PASSWORD

set REGI_HOST=host:30015

 

instead of export, as you would on a Unix-based system.
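
If you want to double-check from the SQL console that the delivery unit really is gone before importing the new version, you can query the repository. This is just a small sanity check, assuming the standard _SYS_REPO.DELIVERY_UNITS repository table; it is not part of the official instructions:

SELECT DELIVERY_UNIT, VENDOR, VERSION
FROM "_SYS_REPO"."DELIVERY_UNITS"
WHERE DELIVERY_UNIT = 'LUMIRA_SERVER';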

 

Once this completed successfully I was able to install version 1.15 of Lumira Server and be on my way ... or so I thought. This process coincided with my updating the HANA server to Revision 72, along with Studio and the corresponding client components. However, shortly thereafter I started noticing a constant issue where activating new code caused HANA Studio to hang. Exiting the application had the same effect...


As a relative newcomer to troubleshooting HANA Studio issues, I assumed it was related to Rev 72. I even tweeted the following (incriminating myself in the process):

 

 

Note the nefarious "Build Workspace" message that never disappeared.

 

I scoured the internet for explanations, even altering some .ini files in the hope of rectifying the situation (note: appending -clean to my eclipse.ini file had no impact on the integrity of my install).

 

 

I was completely hobbled in my ability to activate any new code, and since I was in the midst of consuming the excellent SAP River Tutorials, it was a constant struggle for a few days. In my utter frustration I nearly broke down and uninstalled Studio altogether, but I'd done enough customization (i.e. changing perspectives, adding the River plugin, etc.) that I hoped to avoid it if possible.

 

 

I decided to use the excellent Process Monitor (from Sysinternals, now owned by Microsoft) to examine my problem a bit more deeply. It paid off in spades, and I can't say how pleased I was to have this familiar tool, often used to troubleshoot BI issues, solve a HANA Studio problem just as effectively. Process Monitor identified a .lock file in the C:\eclipse\HANA\LumiraServer\._SYS_REGI_settings directory that HANA Studio couldn't delete. I had tried launching Studio as Administrator as well as with the defaults, so I'm not sure what prevented Studio from deleting the file, but manually removing the lock file restored full functionality to my HANA Studio.

 

Here was the culprit:

 

[Screenshot: the REGI .lock file identified by Process Monitor]

I hope this post helps you if you run into a similar problem, and I couldn't be happier to get such utility out of Process Monitor. At the end of the day, I spent some time living with and then troubleshooting an issue that left me with a much improved understanding of how HANA Studio works, and the satisfaction of solving a problem without uninstalling (and losing) the considerable amount of work I'd already performed.


HANA Monitoring: Handy SQLs


Monitoring Memory Usage

Used Memory

The total amount of memory in use by SAP HANA is referred to as its Used Memory. This is the most precise indicator of the amount of memory that the SAP HANA database uses at any time.

 

When used: To understand the current used memory in HANA when HANA alerts show usage greater than the licensed memory. Understanding memory usage by component helps with troubleshooting and with performing the necessary memory clean-up actions.


To display the current size of Used Memory, you can use the following SQL statement:

 

SELECT ROUND(SUM(TOTAL_MEMORY_USED_SIZE/1024/1024/1024), 2) AS "Used Memory GB"
FROM SYS.M_SERVICE_MEMORY;

 

Display current used memory for Column Store Tables

 

SELECT ROUND(SUM(MEMORY_SIZE_IN_TOTAL)/1024/1024) AS "Column Tables MB Used"
FROM M_CS_TABLES;

Display the current used memory broken down by schema

 

SELECT SCHEMA_NAME AS "Schema",
       ROUND(SUM(MEMORY_SIZE_IN_TOTAL)/1024/1024) AS "MB Used"
FROM M_CS_TABLES
GROUP BY SCHEMA_NAME
ORDER BY "MB Used" DESC;

Display memory usage by components

 

SELECT HOST,
       COMPONENT,
       SUM(USED_MEMORY_SIZE) AS USED_MEM_SIZE
FROM PUBLIC.M_SERVICE_COMPONENT_MEMORY
GROUP BY HOST, COMPONENT
ORDER BY SUM(USED_MEMORY_SIZE) DESC;


Database resident

Resident memory is the physical memory actually in operational use by a process.

SELECT SUM(PHYSICAL_MEMORY_SIZE/1024/1024/1024) AS "Database Resident" FROM M_SERVICE_MEMORY;

 

Find the total resident on each node and physical memory size

 

SELECT HOST,
       ROUND(USED_PHYSICAL_MEMORY/1024/1024/1024, 2) AS "Resident GB",
       ROUND((USED_PHYSICAL_MEMORY + FREE_PHYSICAL_MEMORY)/1024/1024/1024, 2) AS "Physical Memory GB"
FROM PUBLIC.M_HOST_RESOURCE_UTILIZATION;

 

Find total Resident

 

SELECT T1.HOST,
       (T1.USED_PHYSICAL_MEMORY + T2.SHARED_MEMORY_ALLOCATED_SIZE)/1024/1024/1024 AS "Total Resident"
FROM M_HOST_RESOURCE_UTILIZATION AS T1
JOIN (SELECT M_SERVICE_MEMORY.HOST,
             SUM(M_SERVICE_MEMORY.SHARED_MEMORY_ALLOCATED_SIZE) AS SHARED_MEMORY_ALLOCATED_SIZE
      FROM SYS.M_SERVICE_MEMORY
      GROUP BY M_SERVICE_MEMORY.HOST) AS T2
  ON T2.HOST = T1.HOST;

 

Maximum peak used memory

The SAP HANA database tracks the highest-ever value of Used Memory reached since the database was started. This is probably the single most significant memory indicator to monitor as an overall measure of the total amount of memory required to operate the SAP HANA database over a long period of time.

 

SELECT ROUND(SUM("M")/1024/1024/1024, 2) AS "Max Peak Used Memory GB"
FROM (SELECT SUM(CODE_SIZE + SHARED_MEMORY_ALLOCATED_SIZE) AS "M"
      FROM SYS.M_SERVICE_MEMORY
      UNION
      SELECT SUM(INCLUSIVE_PEAK_ALLOCATION_SIZE) AS "M"
      FROM M_HEAP_MEMORY
      WHERE DEPTH = 0);

Peak used memory

SAP HANA maintains a special Used Memory indicator, called the Peak Used Memory. This is useful to keep track of the peak value (the maximum, or “high water mark”) of Used Memory over time. Here is how to read the Peak Used Memory:

 

SELECT ROUND(SUM("M")/1024/1024/1024, 2) AS "Peak Used Memory GB"
FROM (SELECT SUM(CODE_SIZE + SHARED_MEMORY_ALLOCATED_SIZE) AS "M"
      FROM SYS.M_SERVICE_MEMORY
      UNION
      SELECT SUM(INCLUSIVE_PEAK_ALLOCATION_SIZE) AS "M"
      FROM M_HEAP_MEMORY_RESET
      WHERE DEPTH = 0);


Memory usage on the server

 

free -g | awk '/Mem:/ {print "Physical Memory: " $2 " GB."} /cache:/ {print "Resident: " $3 " GB."}'


Memory Cleanup: Forcing Garbage collector from Server

Log in to the HANA server, open HDBAdmin.sh, and navigate to Services -> Console.

Select the node where garbage collection should be triggered and execute the command below:

 

mm gc -f

This triggers the garbage collector and frees up memory. It will not unload the tables.
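
If you additionally want to push a specific table out of memory (a more drastic step, since the next query against it will be slower), that can be done with plain SQL; a minimal sketch with placeholder names:

UNLOAD "MYSCHEMA"."MYTABLE";
-- and to bring it fully back into memory later:
LOAD "MYSCHEMA"."MYTABLE" ALL;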

 

Resetting Monitoring Views

When used: When testing a report, or when you need to monitor the peak memory usage of a SQL statement, I/O, memory object throughput, or statistics about garbage collection jobs. The statements below reset these statistics.

Memory allocator statistics

M_HEAP_MEMORY view contains information about memory consumption of various components in the system.

ALTER SYSTEM RESET MONITORING VIEW SYS.M_HEAP_MEMORY_RESET;

 

M_CONTEXT_MEMORY view contains information about memory consumption grouped by connections and/or users.

 

ALTER SYSTEM RESET MONITORING VIEW SYS.M_CONTEXT_MEMORY_RESET;

 

File access statistics

M_VOLUME_IO_STATISTICS_RESET view shows information about basic I/O operations on I/O subsystems (that is, paths).

 

ALTER SYSTEM RESET MONITORING VIEW SYS.M_VOLUME_IO_STATISTICS_RESET;

Memory object statistics

M_MEMORY_OBJECTS_RESET view provides information about the number and size of resources currently in the resource container and about the throughput of the resource container.

 

ALTER SYSTEM RESET MONITORING VIEW SYS.M_MEMORY_OBJECTS_RESET;

 

Garbage collection/history manager statistics

M_GARBAGE_COLLECTION_STATISTICS_RESET view shows various statistics about garbage collection jobs.

 

ALTER SYSTEM RESET MONITORING VIEW SYS.M_GARBAGE_COLLECTION_STATISTICS_RESET;


Schema/Tables Monitoring

Find Tables loaded into memory & delta records

When used: To see which tables are loaded into memory at any given time. If a report is running slowly, check whether the table is loaded into memory; although tables are loaded lazily, it is best practice to have the relevant tables loaded into memory.

 

SELECT LOADED,
       TABLE_NAME,
       RECORD_COUNT,
       RAW_RECORD_COUNT_IN_DELTA,
       MEMORY_SIZE_IN_TOTAL,
       MEMORY_SIZE_IN_MAIN,
       MEMORY_SIZE_IN_DELTA
FROM M_CS_TABLES
WHERE SCHEMA_NAME = 'SCHEMA'
ORDER BY RAW_RECORD_COUNT_IN_DELTA DESC;

 

To drill down further and see which columns are loaded or not loaded, use the query below:

SELECT TOP 100 LOADED,
       HOST,
       TABLE_NAME,
       COLUMN_NAME,
       MEMORY_SIZE_IN_TOTAL
FROM PUBLIC.M_CS_COLUMNS
WHERE SCHEMA_NAME = 'SCHEMA'
  AND LOADED <> 'TRUE';
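
If a table shows up here as not (fully) loaded and you want to pre-warm it for reporting, it can be loaded explicitly; a small sketch with placeholder names:

LOAD "SCHEMA"."TABLE_NAME" ALL;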

MERGE DELTA

Check whether there is a delta to be merged; RAW_RECORD_COUNT_IN_DELTA provides the delta record count.

SELECT LOADED,
       TABLE_NAME,
       RECORD_COUNT,
       RAW_RECORD_COUNT_IN_DELTA,
       MEMORY_SIZE_IN_TOTAL,
       MEMORY_SIZE_IN_MAIN,
       MEMORY_SIZE_IN_DELTA
FROM M_CS_TABLES
WHERE SCHEMA_NAME = 'SCHEMA'
ORDER BY RAW_RECORD_COUNT_IN_DELTA DESC;

 

Forcing delta Merge

UPDATE SCHEMA.COLUMN_STATISTICS MERGE DELTA INDEX;

Smart merge

UPDATE <table_name> MERGE DELTA INDEX WITH PARAMETERS ('SMART_MERGE' = 'ON');

Find Auto Merge On

SELECT TABLE_NAME, AUTO_MERGE_ON FROM SYS.TABLES;
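
To verify that a forced or smart merge actually ran, you can look at the merge statistics afterwards; a sketch assuming the M_DELTA_MERGE_STATISTICS monitoring view and a placeholder table name:

SELECT TABLE_NAME, TYPE, MOTIVATION, START_TIME, SUCCESS
FROM M_DELTA_MERGE_STATISTICS
WHERE TABLE_NAME = 'MYTABLE'
ORDER BY START_TIME DESC;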


Find Compression

When used: To see the uncompressed size and the compression ratio in HANA for the loaded tables.

 

SELECT TOP 100 "SCHEMA_NAME",
       SUM("DISTINCT_COUNT") AS RECORD_COUNT,
       SUM("MEMORY_SIZE_IN_TOTAL") AS COMPRESSED_SIZE,
       SUM("UNCOMPRESSED_SIZE") AS UNCOMPRESSED_SIZE,
       SUM("UNCOMPRESSED_SIZE")/SUM("MEMORY_SIZE_IN_TOTAL") AS COMPRESSION_RATIO,
       100*(SUM("UNCOMPRESSED_SIZE")/SUM("MEMORY_SIZE_IN_TOTAL")) AS COMPRESSION_PERCENTAGE
FROM "SYS"."M_CS_ALL_COLUMNS"
GROUP BY "SCHEMA_NAME"
HAVING SUM("UNCOMPRESSED_SIZE") > 0
ORDER BY UNCOMPRESSED_SIZE DESC;

To go to a more detailed level and identify which type of compression is applied to each column, along with its ratio, use the query below:

SELECT COLUMN_NAME,
       LOADED,
       COMPRESSION_TYPE,
       MEMORY_SIZE_IN_TOTAL,
       UNCOMPRESSED_SIZE,
       COMPRESSION_RATIO_IN_PERCENTAGE AS COMPRESSION_FACTOR
FROM M_CS_COLUMNS
WHERE SCHEMA_NAME = 'SCHEMA';

Forcing compression on a table

UPDATE SCHEMA.COLUMN_STATISTICS WITH PARAMETERS ('OPTIMIZE_COMPRESSION' = 'TRUE');

Find which node is active

To find which node your session is connected to:

SELECT HOST,
       PORT,
       CONNECTION_ID
FROM M_CONNECTIONS
WHERE OWN = 'TRUE';

Expensive Statements

Ensure the expensive statements trace is switched on.

When used: To troubleshoot a report or SQL failure and understand why it failed, to monitor the expensive SQL statements executed in HANA, and to identify opportunities for performance optimization.
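
If the trace is not yet active, it can be switched on from the SQL console; a sketch assuming the expensive_statement section of global.ini (the threshold is in microseconds, and the value below is only an example):

ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
SET ('expensive_statement', 'enabled') = 'true',
    ('expensive_statement', 'threshold_duration') = '1000000'
WITH RECONFIGURE;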

Find expensive statements for errors

SELECT "HOST", "PORT", "CONNECTION_ID", "TRANSACTION_ID", "STATEMENT_ID",
       "DB_USER", "APP_USER", "START_TIME", "DURATION_MICROSEC", "OBJECT_NAME",
       "OPERATION", "RECORDS", "STATEMENT_STRING", "PARAMETERS", "ERROR_CODE",
       "ERROR_TEXT", "LOCK_WAIT_COUNT", "LOCK_WAIT_DURATION",
       "ALLOC_MEM_SIZE_ROWSTORE", "ALLOC_MEM_SIZE_COLSTORE", "MEMORY_SIZE",
       "REUSED_MEMORY_SIZE", "CPU_TIME"
FROM "PUBLIC"."M_EXPENSIVE_STATEMENTS"
WHERE ERROR_CODE > 0
ORDER BY START_TIME DESC;

Finding expensive statements executed by a user

SELECT "HOST", "PORT", "CONNECTION_ID", "TRANSACTION_ID", "STATEMENT_ID",
       "DB_USER", "APP_USER", "START_TIME", "DURATION_MICROSEC", "OBJECT_NAME",
       "OPERATION", "RECORDS", "STATEMENT_STRING", "PARAMETERS", "ERROR_CODE",
       "ERROR_TEXT", "LOCK_WAIT_COUNT", "LOCK_WAIT_DURATION",
       "ALLOC_MEM_SIZE_ROWSTORE", "ALLOC_MEM_SIZE_COLSTORE", "MEMORY_SIZE",
       "REUSED_MEMORY_SIZE", "CPU_TIME"
FROM "PUBLIC"."M_EXPENSIVE_STATEMENTS"
WHERE STATEMENT_STRING LIKE '%NAIRV%';

 

CONNECTIONS

Find running connections

SELECT "HOST", "PORT", "CONNECTION_ID", "TRANSACTION_ID", "START_TIME", "IDLE_TIME",
       "CONNECTION_STATUS", "CLIENT_HOST", "CLIENT_IP", "CLIENT_PID", "USER_NAME",
       "CONNECTION_TYPE", "OWN", "IS_HISTORY_SAVED", "MEMORY_SIZE_PER_CONNECTION",
       "AUTO_COMMIT", "LAST_ACTION", "CURRENT_STATEMENT_ID", "CURRENT_OPERATOR_NAME",
       "FETCHED_RECORD_COUNT", "AFFECTED_RECORD_COUNT", "SENT_MESSAGE_SIZE",
       "SENT_MESSAGE_COUNT", "RECEIVED_MESSAGE_SIZE", "RECEIVED_MESSAGE_COUNT",
       "CREATOR_THREAD_ID", "CREATED_BY", "IS_ENCRYPTED", "END_TIME",
       "PARENT_CONNECTION_ID", "CLIENT_DISTRIBUTION_MODE", "LOGICAL_CONNECTION_ID",
       "CURRENT_SCHEMA_NAME", "CURRENT_THREAD_ID"
FROM "PUBLIC"."M_CONNECTIONS"
WHERE CONNECTION_STATUS = 'RUNNING'
ORDER BY "START_TIME" DESC;

 

Resetting Connections

Find the connection

SELECT CONNECTION_ID, IDLE_TIME
FROM M_CONNECTIONS
WHERE CONNECTION_STATUS = 'IDLE' AND CONNECTION_TYPE = 'Remote'
ORDER BY IDLE_TIME DESC;

Disconnect Session

ALTER SYSTEM DISCONNECT SESSION '203927';

ALTER SYSTEM CANCEL SESSION '237048';

Find owners of objects

 

SELECT * FROM "PUBLIC"."OWNERSHIP" WHERE SCHEMA_NAME = 'SCHEMA';


Find Granted Privileges for Users

SELECT * FROM PUBLIC.GRANTED_PRIVILEGES
WHERE GRANTEE_TYPE = 'USER' AND GRANTOR = 'NAIRV';

PASSWORD Policy

Disable the password lifetime policy for a user. This is used when you do not want the password expiration policy applied to a particular user; the password then remains valid for the lifetime of the user.

ALTER USER <username> DISABLE PASSWORD LIFETIME;
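
A short illustration with a placeholder user name; the ENABLE variant reverts the user to the normal password expiration policy:

ALTER USER REPORTING_USER DISABLE PASSWORD LIFETIME;
-- to re-enable the normal password expiration later:
ALTER USER REPORTING_USER ENABLE PASSWORD LIFETIME;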

Audit Policy

Configure

Enable global auditing

ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
SET ('auditing configuration', 'global_auditing_state') = 'true'
WITH RECONFIGURE;

Set the auditing file type

ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
SET ('auditing configuration', 'default_audit_trail_type') = 'CSVTEXTFILE'
WITH RECONFIGURE;

Audit target path

ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
SET ('auditing configuration', 'default_audit_trail_path') = 'path'
WITH RECONFIGURE;

Find the policy implemented

SELECT * FROM PUBLIC.AUDIT_POLICIES;

 

To enable or disable global auditing

-- change the configuration to enable ('true') or disable ('false') global auditing
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
SET ('auditing configuration', 'global_auditing_state') = 'true'
WITH RECONFIGURE;

 

Add audit policy

CREATE AUDIT POLICY Audit_EDW_DM_DROPTABLE_H00 AUDITING SUCCESSFUL DROP TABLE LEVEL CRITICAL;

Policy enable/disable

ALTER AUDIT POLICY Audit_EDW_DM_DROPTABLE_H00 ENABLE;

Thoughts on The Future of Hadoop in Enterprise Environments



Between answering questions this week at BI2014 in Orlando and preparing to ask some at Hadoop Summit in Amsterdam next week, here are some thoughts on where we are with Hadoop for the enterprise.

 

The elephant in the room. It's now clear that Hadoop will play a key part in your future enterprise architecture. But how will it integrate with existing infrastructures and other new technologies?

 

Growing pains. It’s teenagers vs parents. Teenagers are justifiably proud of disrupting the way their parents did things. But they have a nasty tendency to assume that their parents are idiots. As they get older, they’re forced to deal with more of the boring necessities of life — and realize their parents knew a thing or two after all.

 

As Hadoop matures beyond pixel-only business models (social websites, online video games) and inexpensive data storage it is being forced to embrace — and learn a few lessons from — the enterprise systems that drive real-world business.

 

Old dogs are learning new tricks. Even without Hadoop, enterprise systems are poised for big disruptive changes. New in-memory systems eliminate the need for separate operational and analytic environments, combining transaction integrity with breakthrough performance. The data is stored once, and every transaction is available for analysis the instant it is written. The increased simplicity and agility of the system means it’s both faster and — despite the higher costs of memory — cheaper to run than traditional architectures.

 

Simplicity and agility. Both Hadoop and in-memory deliver enterprise architects the agility that has been sorely missing in the past. Hadoop doesn’t require upfront definition of data schemas. In-memory systems offer fast analysis without complex caching and aggregate tables.

 

Changes to support the business can be made in metadata, with less need to physically move data around. Updates can be made faster and more iteratively. The future is uncertain - above all, you need an architecture that can change fluidly with new business opportunities.

 

Not just new tech in old ways. Information (not your transactional systems) is the new foundation of your future architecture. The data you are storing now you will still have in 20 years, even though your applications will long since have changed.

 

This means that it's about more than using new technology in old ways, such as replacing part of enterprise data warehouses with more flexible Hadoop "data lakes" or adding faster in-memory data marts. It has to be about supporting the business needs of the future.

 

Enterprises are looking for an “innovation platform” with real-time analysis tightly connected to operations in order to power more personalized customer experiences and flexible business models.

 

Today’s complex choices. Companies want to take advantage of the cost advantages of Hadoop systems, but they realize that Hadoop doesn’t yet do everything they need (for example, Gartner surveys show a steady decline in the proportion of CIOs that believe that NoSQL will replace existing data warehousing rather than augmenting it – now just 3%). And companies see the performance advantages of in-memory processing, but aren’t sure how it can make a difference to their business.

 

The new technologies confound easy classification and the boundaries continue to blur. The elephant is starting to wear a tie, with projects to introduce transaction integrity to Hadoop. Enterprise systems are providing support for things like in-database MapReduce and text analysis.

 

There’s no easy one-size-fits-all answer today. Different “innovation applications” require different tradeoffs, based on the types of data, the ratio of analysis to action, and the need for speed. Here are some examples of how Hadoop can fit in with enterprise architectures today.

 

It’s not just about technology. Organizations shouldn’t work on architecture without also thinking hard about how their business models may look in the future. And the success of Hadoop in the enterprise space depends as much on ecosystems of enterprise-savvy vendors and partners as it does on technology.

 

The future seamless experience. Companies are looking for the “new best practice” of how to put together an end-to-end, enterprise-strength information architecture.

 

Vendors are racing to support that architecture vision with a combination of new and tried-and-true technologies. The goal is to hide complexity by automating as much as possible the handoffs between the different data systems to provide a coherent system rather than companies having to duct tape everything together themselves. This will require enterprise vendors to embrace Hadoop as if it were their own technology — which may require culture changes for some.

 

Vendors are also working on packaged “next generation” applications that combine operations, analytics, enterprise-ready mobile interfaces, links to third-party data, and integration with industry networks.

 

Trust no one. The technology continues to evolve at a rapid rate. There has been lots of enterprise experimentation with Hadoop, but few organizations have yet declared Hadoop a first-tier element of their enterprise architecture. In-memory processing systems are also in their infancy, just now making the transition from fast analytics to all-around business platforms.

 

Best practice is being discovered as we speak — and may change fast in the light of new technology changes. Don’t trust anybody that claims that they can tell you the right way to do things unless they’ve first spent a lot of time understanding your business.

 

Good luck, and join us in Amsterdam for more discussion! (SAP/Hadoop presentation by John Schitka at 15h10 on Wednesday April 2nd)

 


HADOOP installation made easy


Using SAP HANA we can connect to Hadoop via Smart Data Access; the first thing we need to do is install Hadoop.

 

This blog walks through the Hadoop installation.

It takes at most two hours if you are lucky.
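
For orientation, once Hadoop (with Hive and an ODBC DSN on the HANA host) is running, the Smart Data Access side boils down to creating a remote source and virtual tables. The sketch below is only illustrative; the adapter name, DSN, credentials, and the four-part remote table name are assumptions that depend on your setup:

CREATE REMOTE SOURCE "HADOOP_HIVE" ADAPTER "hiveodbc"
  CONFIGURATION 'DSN=hive1'
  WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=hive;password=hive';

CREATE VIRTUAL TABLE "MYSCHEMA"."V_WEBLOGS"
  AT "HADOOP_HIVE"."HIVE"."default"."weblogs";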

 

Please follow the below steps:

 

Step-1:


   1. Download a stable release ending with tar.gz (hadoop-1.2.1.tar.gz)

   2. In Linux, create a new folder “/home/hadoop”

   3. Move the downloaded file to the folder "/home/hadoop" using WinSCP or FileZilla.

   4. In PuTTY, type: cd /home/hadoop

   5. Type: tar xvf hadoop-1.2.1.tar.gz

 

Step-2:


Downloading and setting up java:

 

   1. Check if Java is present

Type: java -version

 

   2. If java is not present, please install it by following the below steps

   3. Make a directory where we can install Java (/usr/local/java)

   4. Download 64-bit Linux Java JDK and JRE ending with tar.gz from the below link:

 

http://oracle.com/technetwork/java/javase/downloads/index.html

 

   5. Copy the downloaded files to the created folder

   6. Extract and install java:

 

Type: cd /usr/local/java

Type: tar xvzf jdk*.tar.gz

Type: tar xvzf jre*.tar.gz

 

   7. Add the PATH and home directory variables at the end of the /etc/profile file:

 

JAVA_HOME=/usr/local/java/jdk1.7.0_40

PATH=$PATH:$JAVA_HOME/bin

JRE_HOME=/usr/local/java/jre1.7.0_40

PATH=$PATH:$JRE_HOME/bin

HADOOP_INSTALL=/home/hadoop/hadoop-1.2.1

PATH=$PATH:$HADOOP_INSTALL/bin

export JAVA_HOME

export JRE_HOME

export HADOOP_INSTALL

 

   8. Run the below commands so that Linux can understand where Java is installed:


sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jre1.7.0_40/bin/java" 1

sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_40/bin/javac" 1

sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/java/jre1.7.0_40/bin/javaws" 1

sudo update-alternatives --set java /usr/local/java/jre1.7.0_40/bin/java

sudo update-alternatives --set javac /usr/local/java/jdk1.7.0_40/bin/javac

sudo update-alternatives --set javaws /usr/local/java/jre1.7.0_40/bin/javaws

 

   9. Test Java by typing: java -version

  10. Check if JAVA_HOME is set by typing: echo $JAVA_HOME

 

Now we are done with the installation of Hadoop (Stand alone mode).

 

Step-3:


We can check if we are successful by running an example.

Go to the Hadoop installation directory.

Type: mkdir input

Type: cp conf/*.xml input

Type: bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Type: ls output/*

The output is displayed with the success.

 

Step-4:


As a next step, change the configuration in the below files:

 

    1. In the Hadoop installation folder change /conf/core-site.xml file to:

<configuration>

          <property>

                    <name>fs.default.name</name>

                    <value>hdfs://localhost:9000</value>

          </property>

</configuration>

 

    2. Change /conf/hdfs-site.xml:

<configuration>

           <property>

                     <name>dfs.replication</name>

                     <value>1</value>

           </property>

</configuration>

 

    3. Change /conf/mapred-site.xml:

<configuration>

           <property>

                     <name>mapred.job.tracker</name>

                     <value>localhost:9001</value>

           </property>

</configuration>

 

    4. Edit /conf/hadoop-env.sh file:

 

export JAVA_HOME=/usr/local/java/jdk1.7.0_40

 

Step-5:

 

   1. Set up passwordless SSH by running the commands below:

 

Type: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

Type: cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

 

   2. To check if the ssh password is disabled

 

Type: ssh localhost  (It should not ask any password)

 

   3. Format the name node:

 

Type: bin/hadoop namenode -format

 

Step-6:

 

To start all the Hadoop services:

 

Type: bin/start-all.sh

 

Now try the same example which we tried earlier:

 

Type: bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

 

It should give the output.

 

To stop all the Hadoop services:

 

Type: bin/stop-all.sh


The Hadoop installation is now complete.

Gain insights, share experiences: SAP HANA Operation Expert Summit


If you are an IT expert working with SAP HANA, we invite you to be part of our inner circle at an exclusive event: the SAP HANA Operation Expert Summit on May 8-9, 2014 in Walldorf, Germany.

 


Don’t expect your standard summit with speakers and coffee breaks – it’s an interactive occasion that welcomes your full participation. Panel discussions and breakout sessions offer unique opportunities to share your experience and ideas with us. We want to hear what you think of SAP HANA operation.

 

  • What are your pain points or challenges?
  • What advice, tips or tricks do you have for other users?
  • What features would you like to see in the future?

 

In addition, speed-networking sessions with SAP experts from the SAP HANA development organization will offer advice on how best to operate SAP HANA - from planning and building all the way to running - giving you knowledge and insights you can start using immediately.

 

Space is limited so register now online. Get the full program agenda and see what you can look forward to.

 

Do you know another SAP HANA operation specialist in your company? Don't hesitate to forward this message.

 

Let’s keep making progress happen! We look forward to welcoming you in Walldorf on May 8-9.

 

Warm regards,

Kathrin

 

p.s. Please join us on the evening of May 8 for a casual meet-up and dinner. A great opportunity to meet new contacts and expand your professional network.

Fail-Safe Operation of SAP HANA®: SUSE Extends Its High-Availability Solution


“SAP customers invest in SAP HANA” is the conclusion reached by a recent market study carried out by Pierre Audoin Consultants (PAC). In Germany alone, half of companies expect SAP HANA to become the dominant database platform in the SAP environment. In many cases, the “SAP Business Suite® powered by SAP HANA” scenario is already being discussed in concrete terms.

Naturally, SUSE is accommodating this development by providing SUSE Linux Enterprise Server for SAP Applications - the recommended and supported operating system for SAP HANA. In close collaboration with SAP and hardware partners, SUSE will therefore provide two resource agents for customers to ensure the high availability of SAP HANA system replications.

Two Replication Scenarios

The current initial phase of the project includes the architecture and development of scale-up scenarios, which will be tested together with SAP in the coming weeks. System replication will help to replicate the database data from one computer to another computer in order to compensate for database failures (single-box replication). This is to be followed by a second project phase involving an extension for scale-out scenarios (multibox replication). With this mode of operation, internal SAP HANA high-availability (HA) mechanisms and the resource agent must work together or be coordinated with each other.

SUSE implements these scenarios with the SAPHana resource agent (RA), which performs the actual check of the SAP HANA database instances and is configured as a master/slave resource. In a scale-up scenario, the master assumes responsibility for the SAP HANA databases running in primary mode, and the slave is responsible for instances that are operated in synchronous (secondary) status.

To make configuring the cluster as simple as possible, SUSE also developed its SAPHanaTopology resource agent. This runs on all nodes of an SLE 11 HAE cluster and gathers information about the statuses and configurations of SAP HANA system replications. It was designed as a normal (stateless) clone.

 

Customers Receive Complete Package

With both the SAPHana and SAPHanaTopology resource agents, customers will therefore be able to integrate SAP HANA system replications in their cluster. This has the advantage of enabling companies to use not only their business-critical SAP systems but also their SAP HANA databases without interruption while noticeably reducing their budgets. SUSE provides the extended solution together with best practices documentation.

SAP and hardware partners who do not have their own SAP HANA high-availability solution will also benefit from this new SUSE Linux development.

Starting with SAP HANA on IBM System eX5


Recently I got the chance to work with SAP HANA on an IBM eX5 scale-out system. In general, a pleasant experience and great performance.

I found this document very helpful

 

IBM Techdocs Technote: IBM Systems eX5 Solution for SAP HANA appliance

 

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/5cb5ed706d254a8186256c71006d2e0a/1b52cc5a84185dc786257ca500213d27/$FILE/IBM%20-%20SAP%20HANA%20Implementation%20Guide-1.6.60-7.pdf

 

It explains how to:

- set up a global_allocation_limit so HANA no longer stops with allocation errors (account for 16 GB for GPFS); see the SQL sketch after this list

- upgrade the appliance to SLES SP2 and SP3 (SP2 is EOL)

- upgrade the kernel (you should be at least at version 3.0.101-0.5.1*)

- upgrade GPFS to the latest version (you should be at least at version 3.5.0-16)

- update the Linux kernel Emulex driver in SLES SP2 (version 4.2.389 is required; you can check with ethtool -u ethX)

- disable CPU C-states (SAP Note 1824819)

- enable the performance CPU governor

- understand why the server restarts itself and what you can do about it (IBM RETAIN tip H207295)

- perform a system check optimized for HANA

- enable memory patrol scrub so that, in case of a memory error, the server doesn't restart but the information is passed to HANA, which can handle it

- configure your network gear
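
As a pointer for the first item, the global allocation limit is an ini parameter and can also be set from the SQL console; a hedged sketch (the value is in MB, and 230000 is purely an example - size it to your box minus the roughly 16 GB GPFS reserve the guide describes):

ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
SET ('memorymanager', 'global_allocation_limit') = '230000'
WITH RECONFIGURE;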

 

It even guides you through the process of installing the HANA appliance from scratch, but for that you will need the IBM DVD 00AK794, SAP HANA FRU Pkg v. 1.6.60-7.

 

As a bonus, there is a section on how to run HANA virtualized on VMware, which parts you need to upgrade the hardware to a larger T-shirt size, and a lot of information about disaster recovery.

 

With the new tailored data center integration initiative, this information is a must-read.

HANA Daily Monitoring Template


This blog intends to give HANA admins a sample template they can use for monitoring their HANA environment on a proactive basis.

You are all invited to add your comments if you feel we should include more steps in the template. Its intention is monitoring, not resolution; each issue reported can be resolved separately. You can also use this template as a pointer for what to monitor if you are monitoring your SAP HANA landscape via SAP Solution Manager.



1. First, check whether all the services are running fine:



 

2. Run the Unique Checker (you can also schedule it in your crontab so that you get updates automatically in your mailbox).

This program helps you find duplicate entries in tables. Reach out to SAP to get the program, or refer to https://help.sap.com/hana/SAP_HANA_Administration_Guide_en.pdf if you do not have it.

 

 

3. Check for crash dumps, if any:

Check in the admin console --> Performance; to see the dumps (OOM dumps as well), enter "dump" as the search text.

If you find any crash dump, analyze whether it was caused by a query, and notify the query owner to optimize that query if it is causing dumps.


4. Check SLT to see whether any table has an error status.





No error so all is good.

 

5. Check LTR as well :

 


 

Also check the traditional T-codes ST22 and SM21; there should not be any critical dumps.




6. Clean up the garbage memory:

The frequency could be every day or once every three days; you can decide after observing the pattern.

 

execute mm gc -f


It triggers the garbage collector and frees up memory without unloading the tables.

Remark: to execute mm gc -f, log in to the HANA server --> HDBAdmin.sh --> Services --> Console --> select the node --> execute the command.

 

 

7. Validate backups - successful backup taken on **/**/**; next backup on **/**/**.

 

Analyze whether the backup failed and take action accordingly.
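
One quick way to validate the recent backups from the SQL console is the backup catalog; a sketch assuming the M_BACKUP_CATALOG monitoring view:

SELECT TOP 10 ENTRY_TYPE_NAME, UTC_START_TIME, UTC_END_TIME, STATE_NAME
FROM M_BACKUP_CATALOG
ORDER BY UTC_START_TIME DESC;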

 

Hope this template helps you keep your HANA environment healthy and running. Happy monitoring!

 

 

Please add any step you feel should be part of the daily monitoring tasks.


BW on HANA - Post Migration ACTIVITY


This blog is a continuation of my previous work, BW on HANA - Pre Migration ACTIVITY.

 

Many of us who have been involved in an OS/DB migration project [including migrating a BW system from XXX (non-HANA DB) to YYY (non-HANA DB)] are already aware of the post-migration activities. So the first question that comes to mind for first-timers (BW on HANA) is: "How different is this migration from the ones I have done before?" In terms of post-migration activity, yes, you will be doing most of the activities that you did in your previous projects. In addition to that, you will have to take extra care of:

 

1. Converting the standard InfoCubes and DataStore Objects.

2. There is no Java support right now on the HANA DB, so configure your Java stack separately (only if you had a dual-stack BW before migration).

3. Be aware that you no longer need BWA or other such accelerators.

4. Validating the activities in HANA Studio and in BW (DBA Cockpit).

5. Delta handling.

 

Other than this (you are invited to add points if I have missed something), at the planning level I feel you will do the same activities that you have done earlier.

So, we can follow the task list below:

 

 

Task: Put a check on the batch jobs. Action: Run BTCTRNS1.
Task: Activate partner profiles. Action: Check WE20/WE21 if you maintain partner profiles and inbound/outbound RFCs.
Task: Convert the logical system. Action: Run BDLS (converts one logical system to another that is not present in your current machine after the system copy) on the BW and source systems.
Task: Restore the source system in BW. Action: Restore it with the standard method.
Task: Reset programs for ODS and DTP. Action: Run RSSGPCLA.
Task: Consistency check. Action: Run report RSUPGRCHECK to check the consistency of generated DDIC objects for BI metadata such as InfoObjects, InfoCubes, and transfer rules. It helps determine whether the DDIC tables required for a BI meta object are active.

Task: Check the correctness of tables. Action: Run RSDU_TABLE_CONSISTENCY to check the correctness of tables in the row/column store.
Task: Adjust DB-specific DDIC entries. Action: Execute RS_BW_POST_MIGRATION and choose the HANA-specific options:

*Choose the variant "SAP&POSTMGRHDB".

 

 


 

 

Once done with it, validate that you can see the following in HANA Studio and the DBA Cockpit:

a. Calculation views for all DSOs and InfoCubes (for the existing ones).

**When you activate newly created DSOs and InfoCubes, the corresponding calculation views are created automatically.

Optimization

Migrating BW to HANA does not change DSOs/InfoCubes into HANA-optimized objects.

Only InfoCubes of type standard can be converted to SAP HANA-optimized InfoCubes.

Use RSMIGRHANADB or RSDRI_CONVERT_CUBE_TO_INMEMORY.

After migration, standard InfoCubes can be found in the SAP HANA database column store.

The same applies to DSOs.


 

 

During conversion, remember to choose:

a. Without change log - if you no longer need the data saved in the change log. After conversion, no rollback or delta update of old requests is possible.

b. Reconstruct change log - this has a longer runtime, but if data has been archived from the DSO, only requests that affect the archive area can be restored.

Remarks:

i. Standard DSOs are converted to SAP HANA-optimized DSOs only if the DSO is not part of a HybridProvider or a semantically partitioned object (SPO).

ii. If you are doing delta handling, you should choose to keep the change log.

Java stack (**dual-stack systems moving to HANA DB)

a. Go to the Config Tool --> UME and manage it.

b. Cross-check the backend parameters such as BACKEND_CLIENT, BACKEND_SID, J2EE_ADMIN, J2EE_SID, etc.

c. Go to NWA --> select the function BW-Java Configuration.

Delta handling

Load the latest delta queues into BW --> activate the InfoProviders --> stop all the process chains --> migrate BW on HANA --> convert the DSOs and cubes to in-memory (if not done above) --> restart the process chains.

SAP Notes to refer to: SAP Notes 1659383 and 1695778.

 

 

Do help me make this checklist better; please give your inputs.

 

Happy migration

Overview: Security Considerations for BI End Users Reporting on SAP HANA


In a discussion related to the How-To Guide with Role Templates for SAP HANA, the question was raised which privileges are needed for BI users to consume data from SAP HANA. Since this is a good question and I'm not really aware of a complete answer in one place, my answer got longer and longer and finally cried to be converted to a blog post. So here we are.

 

In this post I start with the privileges needed to read data from a SAP HANA data model. I then mention SAP solutions that assist you in the process of creating and sometimes also managing these authorizations. Those people who are not using the described mechanisms will need to manage authorizations on their own, and I hint at the best mechanism to define them in HANA. After authorization, I also briefly dive into authentication and user provisioning.

 

Required Privileges

Technically speaking, if an end-user wants to consume the content of a given activated data model in SAP HANA, they need to send an SQL query to the database with a database user that has two privileges: the object privilege "SELECT" for the activated view; and an Analytic Privilege for the activated view. This information is contained in section 5.8 of the guide. Analytic Privileges allow you to define row-level restrictions on activated data models (user <x> may only see entries for cost_center = 100).

 

A few additional privileges are needed, such as SELECT on the _SYS_BI schema (for tools to e.g. generate a list of available data models) and SELECT on tables for currency or unit of measure conversion (if that is used in the data models).
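
Expressed as SQL, a minimal grant for a single BI end user could look like the sketch below. The view, privilege, and user names are invented placeholders, and in practice you would bundle such grants into a role rather than granting them per user:

GRANT SELECT ON "_SYS_BIC"."mypackage/AN_SALES" TO BI_USER_01;
GRANT STRUCTURED PRIVILEGE "AP_SALES_RESTRICTION" TO BI_USER_01;
GRANT SELECT ON SCHEMA "_SYS_BI" TO BI_USER_01;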

 

SAP Solutions that Help You Generate Authorizations

What the guide does not tell you is how to define Analytic Privileges, and how to manage the object and analytic privileges if you have a large number of end users and/or a large number of data models.

 

In some situations, SAP applications offer assistance for these tasks:

 

  • BW 7.30 powered by HANA: you can generate SAP HANA content from the BW system (i.e. SAP HANA data models representing InfoCubes or DSOs) using a wizard in SAP HANA studio. I believe it is described in the SAP HANA Developer Guide. This wizard also allows you to generate Analytic Privileges for users in SAP HANA reflecting the BW Analysis Authorizations of that user in the NetWeaver BW system
  • BW 7.40 powered by HANA: the BW system contains methods to generate SAP HANA native representations of the following ABAP entities: database user corresponding to ABAP application user; data models in SAP HANA representing InfoCubes and DSOs; SAP HANA roles for generated database users; Analytic Privileges representing the BW Analysis Authorizations. Since BW manages these HANA objects, updates to the BW objects and authorizations are immediately reflected in the SAP HANA system. If anyone wants to say that this mechanism is great, I'm sure the BW team will be happy to hear it . It is documented here on scn at https://scn.sap.com/docs/DOC-52790.
  • SAP HANA Live: these content models for the SAP Business Suite come with a wizard that can be installed as an add-on in SAP HANA studio. Similarly to the BW 7.30 situation, this wizard helps you generate SAP HANA authorizations reflecting the original authorizations of the Business Suite application user.

 

Manually Managing Attribute Value Restrictions

If you do not use any of these setups, you are in the world of "create your custom data models in SAP HANA", and you will also have to define your own analytic privileges for attribute value restrictions.

 

If you have a large number of end-users (in my eyes, everything above a dozen), you absolutely want to manage these using so-called dynamic analytic privileges. These things basically allow you to define a lookup table which maps restriction values to database users; and to define a stored procedure that can look up the restriction value for the session user; and to use this stored procedure within the privilege to define the restriction value.

 

The mechanism is described in the SAP HANA Developer Guide http://help.sap.com/hana/SAP_HANA_Developer_Guide_en.pdf, section 11.4.4.
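
As a very rough illustration of that mechanism (all names are invented here, and the exact procedure signature requirements are spelled out in the guide section cited above): a lookup table maps users to their allowed value, and a procedure returns the value for the current session user, which the dynamic analytic privilege then references instead of a fixed literal.

CREATE TABLE AUTH_CC_LOOKUP (USER_NAME NVARCHAR(256), COST_CENTER NVARCHAR(10));

CREATE PROCEDURE GET_CC_FOR_USER (OUT OUT_FILTER NVARCHAR(10))
LANGUAGE SQLSCRIPT SQL SECURITY DEFINER READS SQL DATA AS
BEGIN
  -- return the cost center assigned to the user running the query
  SELECT COST_CENTER INTO OUT_FILTER
  FROM AUTH_CC_LOOKUP
  WHERE USER_NAME = SESSION_USER;
END;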

 

Authentication

So far for authorization: in scenarios with multiple end-users who have individual authorization on the data content, you need named users in the database with privileges as described above (or you need to model authorizations in the BI tools - which is only possible if the BI tool of choice offers a mechanism for that purpose, and it is only viable if all tools you are using can make use of the same authorization objects - in most cases, it is better to go with authorizations defined in SAP HANA).

 

For authentication, you probably want to make use of SSO integration. This topic has been described in many places, e.g. on SCN in these documents for Kerberos: http://scn.sap.com/docs/DOC-36305 and a blog post by Frank Bannert; and an SAP knowledge base article for SAML authentication. The absolutely best reference for implementing Kerberos authentication for SAP HANA is the how-to guide attached to SAP Note 1837331.

 

If you are not using SSO mechanisms, it's name-password authentication. In BI tools using a universe connection, you can make use of BI's credential mapping for some sort of SSO-like functionality. Other tools like SAP Business Objects Edition for Microsoft Office or SAP Lumira Desktop will simply ask you for a database user and password.

 

User Provisioning

Finally, there is the question of user provisioning - typically from a central user repository such as AD. SAP's "IDM tools" NetWeaver Identity Management and GRC Access Control contain such functionality in their current releases; I'm not aware of 3rd-party IDM solutions which support SAP HANA. Solutions which offer a generic database connector should easily be extendable, since the process of creating and managing database users is quite trivial. If you have a home-grown IDM solution, you will usually also be able to extend it to "know" SAP HANA, since the ODBC and JDBC drivers for HANA are available and documented.
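
For a home-grown solution, the statements involved really are basic SQL; a minimal hedged sketch with invented names:

CREATE USER BI_USER_01 PASSWORD Initial1234;
GRANT BI_REPORTING_ROLE TO BI_USER_01;
-- and for de-provisioning:
DROP USER BI_USER_01 CASCADE;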

 

Hope you find this post helpful,

Richard

[CRICKET PREVIEW] Top Batsmen


The 2014 Indian Premier Twenty20 season will begin soon in Abu Dhabi. With top players shuffling sides during an eventful auction, the Emirates should feature some exciting first matches before the tournament moves to the subcontinent.

 

The previous two seasons were packed with intense action and by using the blazing speed of SAP HANA we can quickly visualize historical data in SAP Lumira. First, let’s plot the top 10 individual runs scorers from the previous two seasons:




Jamaican Chris Gayle has been an absolute force for Bangalore over the previous two seasons, scoring 1,528 runs. The West Indian national shattered numerous records in a historic innings last season against Pune by scoring 175 not out in just 66 balls. Gayle smashed 17 sixes and tacked on 13 fours. Gayle scored the most runs ever in a Twenty20 innings and propelled his Bangalore side to 263 runs, an Indian Premier record. Check out some graphics on Gayle's incredible innings.

 

TimeLine of Chris Gayle’s batting on April 23rd 2013.



 

 

Chris Gayle’s batting shot type. Over half of his runs were drives.


Tag charts of key aspects of Chris Gayle’s batting during his historic innings.




Gayle scored 70 of his 175 runs from front foot drives off the middle of the bat.




While the all-rounder Gayle had a tremendous 2013 campaign, he was not the highest-scoring batsman last season. Michael Hussey's 733 runs throughout the tournament garnered the Australian the Orange Cap. Check out the five highest scorers from 2013.




Hussey's Chennai side came up just short in the title match to Mumbai. However, this season Hussey will be joining Mumbai for their championship defense. Hussey will add tremendous power to an already explosive squad led by team captain and highest team scorer, Rohit Sharma. See the individual breakdown of who scored during Mumbai's 2013 championship campaign.



 

Keep your eyes open for more visualizations created in SAP Lumira using historical Indian Twenty20 data from SAP HANA in anticipation of the 2014 season. Please check back regularly during the season for more insights about the action on the pitch.


SAP HANA Academy - over 500 free hands-on tutorial videos on using SAP HANA.


-Tom Flanagan

SAP HANA Academy

Follow @saphanaacademy

Follow @thomflanagan

[Cricket Preview] Indian Batsmen


The Indian Premier Twenty20 Tournament showcases some of the best Indian Cricket talent on a world stage. The lightning speed of SAP HANA empowers us to swiftly build visualizations of historical T20 data in SAP Lumira. With a few simple clicks we can create a stunning visualization of the runs scored by Indian players for each of the nine teams in the 2013 season.




Mumbai led the league with the most runs scored by Indian nationals. The 1,600 runs scored by Indians were instrumental in guiding the "Duniya Hila Denge Hum" ("we will rock the world") club to their first championship trophy. Chennai, the side Mumbai defeated in the championship match, were just six runs shy of the top mark. Meanwhile, the Indian nationals on the New Delhi side only scored 1,193 runs throughout the tournament.

 

SAP Lumira offers multiple options for geographic visualizations. Check out the number of runs scored by Indians per team depicted over a map of India.




Bangalore's Virat Kohli led all Indian nationals by scoring 634 runs throughout the tournament, including 164 runs against the New Delhi side. Check out the top 10 Indian run scorers for 2013. Each bar is split up by the opposition they scored against.



 

By quickly subbing in a different dimension and changing the chart type we can visualize which team each of the top 10 individual Indian run scorers played for in 2013.




Chennai, Juniper, Kolkata, and Mumbai all had dynamic duos of Indian batsmen last season. The 1,048 runs scored by Mumbai’s Rohit Sharma and Dinesh Karthik topped the 1,009 combined runs scored by Chennai’s Suresh Raina and MS Dhoni. Mumbai’s powerful pair of batsmen were yet another reason why Mumbai celebrated in front of a packed house at the Eden Garden in Kolkata to end the tournament.

 

Which Indian batsmen will make a splash on the pitch this season?

 

Keep your eyes peeled for more visualizations created in SAP Lumira using historical Indian Twenty20 data from SAP HANA in anticipation of the 2014 season. Please check back regularly during the season for more insights about the action on the pitch.


SAP HANA Academy - over 500 free hands-on tutorial videos on using SAP HANA.


-Tom Flanagan

SAP HANA Academy

Follow @saphanaacademy

Follow @thomflanagan

Florida Crystals Chose SAP HANA. Find Out Why!


Join us for our April 22 Webcast to learn what the organic sugar company has achieved with the adoption of SAP solutions

 

Date: Tuesday, April 22, 2014

Time: 11:00 am EST

Speaker: Gary Mitchell, VP - IT Operations, Florida Crystals

 

 

SAP cordially invites you to join us for the next session of our "SAP HANA Customer Spotlight" Webcast series on Tuesday, April 22, 2014.

 

Join featured speakers Gary Mitchell, VP, IT operations, and Fernando Fernandez, Manager, BASIS services, both of Florida Crystals Corporation, to learn how the company adopted a strategic and conservative approach to implementing the SAP HANA platform.

Gary and Fernando will:

 

Register Today!

[Cricket Preview] 2013 Top Wicket-Takers


With the 2014 Indian Premier Twenty20 tournament starting in less than a week the attention of the Cricket world will soon be focused on the grounds in the UAE and India. Let’s continue to preview the 7th annual tournament by analyzing SAP Lumira visualizations of bowling data from last year's tournament. The sheer speed of SAP HANA enables us to quickly build these insight-laden graphics.

 

By simply filtering the dismissal type for no outs and run outs we can graph the top wicket-takers from last year's tournament. The 32 wickets Chennai’s Dwayne Bravo took earned the Trinidadian all-rounder the Purple Cap as the tournament's top wicket-taker. Check out the totals for the rest of the top six wicket-takers.




Selecting the dismissal type dimension for legend color slices up the bars to show how each of the top wicket-takers earned their wickets. Rajasthan’s James Faulkner broke eight wickets bowling throughout the season, the highest total of the top six wicket-takers.




The Australian fast bowler was tied for the league lead in wickets bowled with Hyderabad's Dale Steyn. The South African, Steyn, twice delivered balls in excess of 150 km/h during the 2013 tournament. Check out the number of bowled wickets for each bowler last season.




By converting our 2013 Top Six Wicket-Takers By Dismissal Type visualization to a pie chart and trellising it out by bowler name we quickly discover that all of the top six wicket-takers earned a majority of their wickets through catches. Kolkata’s Sunil Narine had the most diverse dismissals of the top six, inducing more leg before wickets (3) dismissals than bowled dismissals (2).




Next, by slicing our top six wicket-takers by bowling type, we see that each bowler took their wickets using a single bowling type. Bravo and Johnson were fast seam bowlers, Faulkner and Kumar utilized a medium seam, whilst Singh and Narine relied on off spin.




To gain further insights we can switch our visualization to an area chart and swap bowling type for bowling delivery. A correlation quickly becomes apparent between fast and medium seam bowlers and their effectiveness at taking wickets with no movement on the ball. Meanwhile, the two off-spin bowlers didn't take a single wicket with a no-movement ball but rather took the vast majority of their wickets with off-break balls.




Which bowling style do you believe is the most effective for taking wickets?

 

Keep your eyes peeled for more visualizations created in SAP Lumira using historical Indian Twenty20 data from SAP HANA in anticipation of the 2014 season. Please check back regularly during the season for more insights about the action on the pitch.


SAP HANA Academy - over 500 free hands-on tutorial videos on using SAP HANA.


-Tom Flanagan

SAP HANA Academy

Follow @saphanaacademy

Follow @thomflanagan

Why SAP HANA has not had the success it deserves (no, I do not work for SAP)


I have been working with HANA and evangelizing its use cases and value with customers and prospects, and yet I have found it an uphill task to get engagement from the business (and in extreme cases, even IT); this despite the fact that SAP HANA is a great product and has lots of uses. So I started researching the issue and came up with some hypotheses, based on conversations and observations.

 

I theorize that IT departments went ahead and acquired SAP HANA for one or more of the following reasons:

 

1. HANA is the newest shiny toy

 

2. The IT department had left over budget and had to go buy something

 

3. The SAP or partner AE was a very good sales person and made the CIO an offer he/she could not refuse

 

4. The consulting partner convinced them SAP HANA would help solve all their problems

 

5. The IT department had to find something new and sexy to work on so the CIO and his/her staff could show they were working on the latest technology, and could put SAP HANA (aka 'Big Data') on their resume (how can you work in IT and not have 'Big Data' on your resume......).

 

Now that IT had acquired the new shiny toy, they had to go find something to fix - create a solution and then go find a problem - we have seen this before, many times. The challenge was twofold: (1) IT could not really articulate how SAP HANA would create value for the business, and (2) because of the lack of understanding of real business issues, IT picked the wrong use case or a very weak use case.

 

Let us now turn to the product itself. SAP HANA is great, and I believe there is huge value to be derived from this product if rightly deployed. So why has HANA not been adopted more readily? Let us look at some of the potential reasons:

 

1. HANA is an SAP product and everyone associates HANA with the core SAP ERP and CRM; so the target customers have been mostly current SAP clients.

 

2. Many believe that you have to be an SAP shop to leverage HANA; I was talking to a friend who is trying to build a custom healthcare app; I told him he should try and leverage HANA to build his app, and his first response was that his product has nothing to do with SAP.

 

2A. Because of the belief that you have to be an SAP shop with significant SAP installations and SAP data volume to be able to use SAP HANA, many customers believe that they do not need HANA because their SAP data footprint is so small. The argument I often hear is, "We have less than 80 GB of total SAP data; why would we need HANA?".

 

3. SAP HANA has been sold as an application, whereas it is more of a fast database and a development kit. When you buy an application, you expect to install it and start getting value out of it - like SAP ERP, Workday, Ariba, etc. HANA is not an application - it is a super-fast database AND an application development kit - and, similar to Apple iOS, you can use the platform to build applications that can then be used to solve business problems. So SAP HANA is an enabler, not the end application itself.

 

4. People do not remember that SAP acquired Sybase a few years back and that a lot of the Sybase IP is being integrated into SAP HANA; old timers like me will remember Sybase as a very robust and stable database with a lot of capabilities (Sybase IQ came with a columnar data store). SAP did not throw away all these features and functionality; instead they are being made part of SAP HANA.

 

A combination of these factors has hindered the adoption of SAP HANA both among SAP shops and non SAP shops.

 

For SAP shops, SAP HANA should be a no-brainer just for HANA Live; the pre-delivered analytics and calculation views alone give someone trying to build a data analytics strategy a head start with SAP data. Here, of course, the value of SAP HANA depends on the source data being SAP (see details about HANA Live in my personal blog).

 

But there is much bigger value in SAP HANA for non-SAP data; you can stage large data sets from the likes of Hadoop, Teradata, and other non-SAP sources into SAP HANA using tools such as SAP SLT* or SAP Data Services, use HANA to combine data from multiple sources in near real time, and render the data using either SAP analytics tools like SAP BusinessObjects or one of the many other analytics tools available in the market. (* SAP SLT does not work for all applications.)

 

SAP is pushing down more and more development capability into SAP HANA and this will allow more application developers to build widgets on HANA and leverage the power and capabilities of HANA.

 

It is incumbent on SAP, SAP AEs, and the consulting firms to educate customers on the real value of SAP HANA and more important, the fact that this is not a simple plug-and-play application- this needs planning, identifying appropriate use cases, and proving the true value of SAP HANA.

 

IT departments should find a good use case, build the application powered by HANA, and show the application and its robust capabilities to the business. At this stage HANA itself should barely be revealed or discussed with the business (how many times do you discuss which database you are using with your business users?). After the use case has been proven, the capability behind it - in this case SAP HANA - can be revealed: lead the discussion with the business value and then talk about the technology enabling it.

 

Until such time, SAP HANA will remain largely a toy for IT departments and CIOs trying to show they are using cutting edge technology, with minimal penetration or adoption in the business.


Installing SAP HANA SPS 7 on AWS


Today, I would like to show you how to perform an SAP HANA SPS 7 installation within the AWS cloud.

 

You should be aware that Amazon currently only provides SAP HANA Revision 68 as a ready-to-go installation image. This tutorial explains in detail how you can install SAP HANA SPS 7 yourself in order to replace that older version.

 

Launching an instance


First, we will launch a SUSE Linux Enterprise Server 11 Service Pack 3 64-bit instance, sized m2.4xlarge with 68.4 GB of memory.


aws1.jpg


aws2.jpg


aws3.jpg


Choose your corresponding Availability Zone. Activating the checkbox Enable termination protection makes it impossible to accidentally delete this instance.


aws4.jpg


Four volumes  (in my case with 68 GB storage space) must be added.


aws5.jpg


Choose the following sizes:


storage.jpg


Name your instance, e.g. “SAP HANA SPS7”.


aws6.jpg


If you already had an AWS instance before, you can use the same Security Group ID as for that one. Otherwise you can create a new one:


aws7.jpg


After reviewing your data one more time, you are ready to launch your instance.


aws8.jpg


For accessing the operating system via SSH, you can keep using your current key pairs. Should you not have one yet, then you must create one now.


aws9.jpg


aws10.jpg


Success! Your instance is now up and running!


aws11.jpg


Preparing an instance


Please log in as root user and create the following directories:

mkdir /hana

mkdir /hana/shared

mkdir /usr/sap

mkdir /hana/data

mkdir /hana/data/HDB

mkdir /hana/log

mkdir /hana/log/HDB
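
Equivalently, the same directory tree can be created with a single command using mkdir -p (a sketch of the step above; the -p flag also creates the parent directories):

mkdir -p /hana/shared /hana/data/HDB /hana/log/HDB /usr/sap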


con1.jpg

 

With the following command, we will format our 4 volumes with the ext3 file system (proceed with "y"):

mkfs -t ext3 /dev/sdb

 

Proceed this way to format the remaining volumes:

mkfs -t ext3 /dev/sdc

mkfs -t ext3 /dev/sdd

mkfs -t ext3 /dev/sde
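
As an alternative sketch, the same four volumes can be formatted in one small shell loop (same device names as above; answer the prompts with "y" as before):

for dev in sdb sdc sdd sde; do mkfs -t ext3 /dev/$dev; done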


Next, we will edit the file /etc/fstab (for example with vi /etc/fstab) and add the following entries so that our volumes are mounted automatically every time the system restarts. This way they will always be available as soon as we log in.

 

/dev/sdb             /usr/sap             ext3       defaults              0 0

/dev/sdc             /hana/data           ext3       defaults              0 0

/dev/sdd             /hana/log            ext3       defaults              0 0

/dev/sde             /hana/shared         ext3       defaults              0 0


con6.jpg

 

With the command mount -a, all volumes listed in fstab will be mounted automatically. Should you need to check which volumes you have created and where they have been mounted, you can use the df -h command:


con7.jpg


For the rest of the process, a Java package will be required. You can easily install it on your Linux machine using the following commands:

wget http://javadl.sun.com/webapps/download/AutoDL?BundleId=83375


mv "FILE_NAME" java-linux-x64.rpm


con8.jpg

 

Next, type in the following command to install the Java package:

rpm -ivh java-linux-x64.rpm


con9.jpg


It is up to you whether you would like to download and decompress the data on a Windows machine or a Linux machine. Keep in mind, though, that the decompressed data must be readily available on your Suse Linux machine.

Within the Marketplace, we download the 10 RAR files – for instance on a Windows machine – and decompress these. For this, you will need a decompression tool such as WinRAR or 7-Zip.


rar1.jpg


The decompressed files can now be transferred to our SUSE server e.g. via SFTP:


rar2.jpg


Switch to the directory *DATA_UNITS/HDB_SERVER_LINUX_X86_64 and make the installation binaries executable with the following commands:

find -name  hdbinst   -exec chmod 744 {} +
find -name  hdbsetup  -exec chmod 744 {} +

find -name  hdbuninst -exec chmod 744 {} +
find -name  sdbrun    -exec chmod 744 {} +


lin1.jpg


In order to prevent the installation from aborting due to a hardware check, we must deactivate this check. Please note that the snippet below requires the Python scripting language on your machine, which should already be available on a standard SLES installation:


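# Setting this variable tells the installer to skip its hardware check (see the note above); the Python lines below only verify that the variable is present in the environment.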
export IDSPISPOPD="1"

python

>>> import os

>>> 'IDSPISPOPD' in os.environ.keys()

True

>>> quit()

 

lin2.jpg


Please make the file hdblcm within the subdirectory /HDB_LCM_LINUX_X86_64 executable:

chmod 744 hdblcm


lin3.jpg

 

Begin installation. Type in:

./hdblcm


lin4.jpg

 

lin5.jpg

 

Please check the generated log file for warnings and errors.

 

Now, we shall rename our instance: for this, you will need to edit the hosts file:

vi /etc/hosts

 

Next, you will need to add the following line:

127.0.0.1 imdbhdb


lin9.jpg

 

To rename the host, we must switch to the following directory:

cd /usr/sap/HDB/SYS/global/hdb/install/bin


Rename the host:

./hdbrename -hostmap ALTERNATE=imdbhdb


lin10.jpg


Congratulations! You now have SAP HANA up and running on your AWS instance!
Should you have any further questions or comments, I would be happy to hear from you.

Loading large CSV files to SAP HANA


SAP HANA makes it very easy to import a CSV file into a database. There are multiple tutorials available on the HANA Academy that detail the process of importing a CSV file into a table in the HANA database.

 

 

Though the process of importing CSV data into HANA is very simple, importing a large CSV file can be a challenge. A large CSV file can't be imported using HANA Studio because the import process is very slow. Loading large CSV files into HANA is normally accomplished either using a control file or by importing the CSV file directly.

 

IMPORT FROM CONTROL FILE '/data/data.ctl';

 

where contents of /data/data.ctl file are

 

IMPORT DATA INTO TABLE "MYTABLE" FROM '/data/data.csv'

RECORD DELIMITED BY '\n'

FIELD DELIMITED BY ',';

 

or

 

IMPORT FROM CSV FILE '/data/data.csv' INTO "MYTABLE"

WITH

RECORD DELIMITED BY '\n'

FIELD DELIMITED BY ',';

 

Both of these methods are similar, and all techniques described in this article apply to both.

 

1. Download/copy the CSV file to an appropriate directory on the HANA appliance. This lets the powerful HANA appliance do the work of processing the CSV file. If you get a "Cannot open CSV file" error during the import and your path is correct, ensure that the UNIX user hdbadm has read permission on the directory. If the user doesn't have read permission, change the permissions on the directory where the CSV and CTL files reside (a quick check and fix is sketched below).
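
For instance, a minimal check and fix from the shell (the path is illustrative; adjust the permissions to your own security policy):

ls -ld /data        # check the owner and permissions of the import directory
chmod 755 /data     # one option: grant read and execute access to all users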

 

2. Create sample data set from large data file and create a target table based on sample data. To create a sample data set of 1000 records, use

head -1000 data.csv > sampledatafile.csv

 

Copy sampledatafile.csv to the HANA Studio machine and import it using HANA Studio. This allows easy creation of the target table, and the column definitions can easily be adjusted. If definitions for the columns are available, edit the table definition in HANA Studio. Alternatively, use ALTER TABLE commands to adjust the table definition once the initial column definitions have been created (a sketch follows below).
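
As a minimal sketch, assuming a column created by the import wizard turned out too narrow (the column name and size here are purely illustrative):

ALTER TABLE "MYTABLE" ALTER ("DESCRIPTION" NVARCHAR(2000));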

 

3. For importing a large number of rows, use the BATCH clause in the IMPORT command to ensure faster performance.

 

IMPORT FROM CSV FILE '/data/data.csv' INTO "MYTABLE"

WITH

RECORD DELIMITED BY '\n'

FIELD DELIMITED BY ','

BATCH 1000

 

 

4. Always include an error file in the IMPORT command and check the error file after the import to ensure no errors were generated. Ensure that the UNIX user hdbadm has write permission on the directory where the error file is written; otherwise no error file will be generated.

 

IMPORT FROM CSV FILE '/data/data.csv' INTO "MYTABLE"

WITH

RECORD DELIMITED BY '\n'

FIELD DELIMITED BY ','

ERROR LOG '/log/error.log'


By default the generated error file is read-only. If the script is executed a second time with the same error file name, the file will not be refreshed. Delete the log file after each execution (see the one-liner below).
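
A minimal way to do that from the shell, using the path from the example above:

rm -f /log/error.log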

 

5. If the CSV load should fail as soon as it encounters any error, use "FAIL ON INVALID DATA". The load will then abort on the first invalid record.

 

IMPORT FROM CSV FILE '/data/data.csv' INTO "MYTABLE"

WITH

RECORD DELIMITED BY '\n'

FIELD DELIMITED BY ','

FAIL ON INVALID DATA

 

6. Before starting the import, ensure that enough memory is available on the HANA appliance and set the AUTOCOMMIT property to false. In HANA Studio, right-click in the SQL editor, choose "Show In Properties", and set AUTOCOMMIT to false.

 

7. While importing data, ensure that the column widths are set correctly for the target table. If definitions for the input data are not available, first import the data into a staging table with large column sizes (a sketch follows below). The CSV import will report the error "inserted value is too large for column" if any column doesn't have sufficient width, and rows containing such oversized values will be rejected during the import.
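
A minimal sketch of such a staging table with generously sized columns (the table and column names are purely illustrative):

CREATE COLUMN TABLE "MYTABLE_STAGE" (
    "COL1" NVARCHAR(1000),
    "COL2" NVARCHAR(1000),
    "COL3" NVARCHAR(1000)
);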

 

8. Manually inspect a sample of the CSV file and check the data format. If there are any dates in the CSV file, verify the date column format. By default HANA expects dates to be in YYYY-MM-DD format. If the dates are in any other format, use DATE FORMAT in the import clause to specify a different date format.

 

IMPORT FROM CSV FILE '/data/data.csv' INTO "MYTABLE"

WITH

RECORD DELIMITED BY '\n'

FIELD DELIMITED BY ','

DATE FORMAT 'MM/DD/YYYY'

 

See this link for more details on DATE FORMAT.

 

9. Ensure that the data doesn't contain stray special characters such as unescaped double quotes (") inside the data set. If such characters are part of the data set, clean the input data using sed scripts (a sketch follows below).

e.g. a description field may contain the value: Length of rod: 56" and further description. In this case the " after 56 needs to be cleaned before the import will work.
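
A minimal sed sketch, assuming every double quote in the file should simply be removed (adjust the pattern if legitimate quoting must be preserved):

sed 's/"//g' /data/data.csv > /data/data_clean.csv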

 

10. After the CSV import, check the table's runtime properties in HANA Studio and ensure that "Number of entries" looks correct. Verify the sizes of main memory and delta memory. Perform a delta merge to move the data from delta storage into main storage and reduce memory usage (see the statement below).
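
The merge can be triggered manually with a single statement (using the table name from the examples above):

MERGE DELTA OF "MYTABLE";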

 

Any other suggestions are welcome.

 


 

Hadoop Summit Amsterdam 2014: Crossing The Chasm


This year’s Hadoop Summit in Amsterdam was even more interesting than last year’s, with more sponsors (including SAP), more functionality for enterprise use cases, and more enterprise examples.

 

The conference was in the historic Beurs Van Berlage conference center. It’s a great venue for out-of-town guests situated just a few minutes’ walk from Amsterdam’s Central Station.

 

After Hortonworks president Herb Cunitz welcomed attendees, CEO Rob Bearden outlined the Big Data opportunity in a market he estimated to be worth $100Bn by 2020.

 

hortonworks big data market growth

 

He believes that Hadoop has now crossed the chasm and is well on its way to mainstream acceptance (and 50% of that 2020 Big Data market).

 

hadoop 2 crossing the chasm

 

Mike Gualtieri, principal analyst at Forrester, went over the latest Forrester survey data. Data-related projects are at the forefront of the minds of today’s IT executives, with 57% of them rating data as one of their top two priorities (followed by Systems of Record and Systems of Engagement applications).

 

Organizations are particularly interested in using new technology to support rising customer expectations: surveys show that 66% of us want to be treated as individuals, and 58% want to enjoy our dealings with companies more.

 

Gualtieri likened Hadoop to a data lake that was created by a dam, with lots of turbines at the bottom to process the data. He called Hadoop a “unique technology” and the “first general-purpose Data Operating System (DataOS)”.

 

With survey results showing that 81% of large enterprises already believe in the technology (16% are using it in production, 45% are running POCs, and 20% plan to use it in the future), Gualtieri believes “a huge production wave is coming for Hadoop”.

 

Martin Wilcox and Chris Twogood of Teradata and Timothy Mallalieu of  Microsoft all talked about the need to integrate Hadoop with traditional data technologies to get the most out of the new opportunities.

 

Hortonworks founder Arun Murthy gave an overview of the key recent advances in Hadoop 2.0, including the benefits of the YARN resource manager, the completion of the Stinger project to improve the performance of Hive queries, and the Storm project for processing streaming data.

 

You can watch the Day 1 keynotes here

 

 

IMG_5383

 

Amsterdam dawn

 

This blog was first posted to the Business Analytics Blog -- you can subscribe or follow me on Twitter for more!

[Cricket Preview] 2013 Team Bowling


The 7th annual Indian Premier Twenty 20 tournament starts in less than 48 hours. We will continue to preview the 2014 season by analyzing historical data processed by the powerful speed of SAP HANA and visualized in SAP Lumira. Today we examine team bowling statistics from 2013.

 

First let’s create a simple visualization of the combined total runs each team’s bowlers allowed during the 2013 season.


Screen Shot 2014-04-14 at 9.46.54 AM.png


Even though Mumbai was victorious at the end of the tournament, their bowling throughout the season was quite shaky at times. Mumbai led the league by allowing 2,832 runs. Kolkata’s 2,316 runs allowed was the best in the league, but their batsmen couldn’t score enough runs to capitalize, and thus the Knight Riders finished in the bottom third of the table.


One factor that caused Mumbai to lead the league in runs conceded was that their bowlers allowed a ton of extra runs. Mumbai led the league with 12 no balls, and their 84 wides were just shy of Rajasthan’s league-worst 92 wides. Check out the rest of the extra runs each team allowed in 2013.


Screen Shot 2014-04-14 at 9.39.03 AM.png


Next, if we add bowling type as a legend color to our runs-conceded-by-team visualization, we can see the type of delivery bowled when runs were conceded throughout 2013. Most teams allowed a majority of their runs off fast seam deliveries. However, Rajasthan conceded a vast amount of their runs off medium seam balls and even allowed 31 runs off unorthodox deliveries.


Screen Shot 2014-04-11 at 11.48.50 AM.png


With SAP Lumira we can drill down further to see who conceded runs off of unorthodox balls. Filtering on the purple slice at the bottom of the Rajasthan Royals’ bar and subbing in bowler name for team name, we discover that all 31 runs allowed from unorthodox deliveries last season were bowled by Brad Hogg. If we change the visualization to a donut chart and select batting shot type as a legend color, we can see that drives, sweeps, and slog-sweeps were the most potent shot types against the left-handed Australian.


Screen Shot 2014-04-11 at 12.51.38 PM.png


An extremely useful feature of SAP Lumira is the ability to quickly visualize data on top of a world map. After intuitively creating a geographic hierarchy we can quickly visualize the number of wickets taken by each team and how they earned them. Mumbai led the league with 117 wickets and, like all of the teams, relied heavily on catches to take wickets. No surprise that table cellar dwellers Delhi took a league-worst 66 wickets. Kings XI Punjab’s 23 bowled wickets were a league best. Check out the map of 2013 team wickets taken by dismissal type.


Screen Shot 2014-04-13 at 1.37.07 PM.png


Will the league’s top wicket taking side win the championship again this season?

 

Keep your eyes peeled for more visualizations created in SAP Lumira using historical Indian Twenty20 data from SAP HANA in anticipation of the 2014 season. Please check back regularly during the season for more insights about the action on the pitch.


SAP HANA Academy - over 500 free hands-on tutorial videos on using SAP HANA.


-Tom Flanagan

SAP HANA Academy

Follow @saphanaacademy

Follow @thomflanagan

[Cricket Match Preview] Mumbai V Kolkata


The first match of the 2014 Indian Premier T20 Season will feature the previous two champions competing in front of a packed Sheikh Zayed Stadium in Abu Dhabi. The 2013 Champions have dominated their Calcuttan opposition in past tournaments, winning 10 out of the 12 previous meetings. With the supersonic speed of SAP HANA we can visualize historical data in SAP Lumira to gain insights on tonight's opening match.

 

In their first match of last season Mumbai completed an amazing comeback in the second innings to steal two points from the Knight Riders. After conceding a disastrous 26 runs in the first over of the match, Mumbai’s Harbhajan Singh redeemed himself with a decisive six in the 20th over to propel Mumbai to a 162 – 159 victory and stun the packed Eden Gardens crowd. With Lumira we can quickly create a Manhattan chart to visualize the number of runs scored by each team per over during the match.


Screen Shot 2014-04-15 at 12.29.45 PM.png


With SAP Lumira we can quickly visualize how each team took wickets during the match. Mumbai caught two-thirds of their six wickets, while KKR earned a leg-before-wicket dismissal. Check out a pair of pie charts for the wickets taken during the April 24th match.


Screen Shot 2014-04-15 at 12.40.33 PM.png


When the clubs clashed again two weeks later in Wankhede Stadium the Indians trounced the Knight Riders by 65 runs. Mumbai’s batsmen were electric, scoring 8.5 runs per over, including 25 in the 20th over. Check out the Manhattan chart for the May 7th match.


Screen Shot 2014-04-15 at 12.31.01 PM.png


On May 7th 104 of Mumbai’s 170 runs were scored by knocking the ball beyond the boundary. Mumbai’s batsmen hit four sixes and 20 fours.


Screen Shot 2014-04-15 at 1.20.11 PM.png


With SAP Lumira we can dive even further into Mumbai’s batting statistics from the match by plotting the runs they scored ball by ball. After suffering a maiden first over, the Indians found their swing and cruised to an impressive total. Note that SR Tendulkar hit five consecutive fours during the 4th over.


Screen Shot 2014-04-15 at 1.32.37 PM.png


The defending champions only got stronger during the auction period acquiring the services of last year’s orange cap winner Michael Hussey. With the addition of Hussey, Kolkata may be even harder pressed to earn two points tonight, as the Australian has scored 263 runs against the Knight Riders in the past two seasons, his highest total against any team over that time frame.


Screen Shot 2014-04-15 at 1.40.30 PM.png


Hussey has faced Knight Rider bowlers five times in the past two tournaments. With SAP Lumira we can quickly create a stunning storyboard containing a radar chart of Hussey’s run production in those five matches. Hussey nearly hit a century in one match last April.


Screen Shot 2014-04-15 at 1.47.00 PM.png


Will Mumbai’s new batsmen help put the defending champions at the top of the table to start the season?

 

Keep your eyes peeled for more visualizations created in SAP Lumira using historical Indian Twenty20 data from SAP HANA in anticipation of the 2014 season. Please check back regularly during the season for more insights about the action on the pitch.


SAP HANA Academy - over 500 free hands-on tutorial videos on using SAP HANA.


-Tom Flanagan

SAP HANA Academy

Follow @saphanaacademy

Follow @thomflanagan
