Data Warehousing and Data Science

27 October 2013

Can I Cluster SSAS? (How to minimize downtime during deployment)

Filed under: Analysis Services — Vincent Rainardi @ 9:37 am

Some people have asked me whether we can cluster SSAS. Yes, we can: SSAS has supported failover clustering natively since 2008 R2. Meaning that if Server1 dies, clients will automatically access Server2; they just need to refresh from Excel, SCA, Tableau, etc. and results will be returned. How does it work in the background? The same as the Database Engine: one set of disks, two servers.

The real question is: how do I handle huge AS cubes without any downtime when I do a deployment? The issue is that while the new version of the cubes is being processed, the cubes are down, i.e. users can’t access them. And a Full Process of the cubes can take many hours.

First, let me correct the popular misconception that “if we Process Full the SSAS database, users can’t access the cube”. That is incorrect. Provided that the structure of the cube is the same, i.e. we don’t have a new version, Process Full on the DB does not bring the cube down. Users can access the cube while the DB is being full-processed, because Process Full on a DB is done in one transaction (read about managing XMLA transactions here). Thank you Darren.
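One way to make that single-transaction behaviour explicit is to wrap the Process command in an XMLA Batch with Transaction="true". A minimal sketch, sent from PowerShell via Invoke-ASCmd; the server and database names are made up:

```powershell
# Requires the sqlps module (SQL Server 2012); names are hypothetical
Import-Module sqlps -DisableNameChecking

$xmla = @"
<Batch Transaction="true" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>Sales</DatabaseID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>
"@

# The old data stays queryable until the transaction commits
Invoke-ASCmd -Server "MyASServer" -Query $xmla
```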

But during a deployment we do have a new version, and the structure of the cube changes, so the above doesn’t apply: we will have downtime while we process the DB after deployment. So here’s the solution:

Microsoft provides a detailed guide on how to minimize downtime during deployment here. However, it requires a lot of extra hardware and software, which means additional investment. Essentially the process is (a sketch of the detach/attach steps in AMO follows the list):

  1. Deploy and full-process the new version of the cube to a separate SSAS instance (“Processing Server”) on a separate LUN, then detach the DB.
  2. Stop NLB traffic and detach the original DB.
  3. Swap the original LUN with the new LUN.
  4. Bring the new LUN online as read-only.
  5. Attach the DB as read-only, re-enable NLB.
  6. Refresh the old LUN from the new LUN.
  7. Attach the old DB on the old LUN to the Processing Server.
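For steps 1 and 5, the detach and read-only attach can be scripted with AMO’s Database.Detach and Server.Attach. A minimal sketch, assuming made-up server names and a made-up data folder path; the LUN swap happens between the two halves:

```powershell
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices") | Out-Null

# Step 1: detach the freshly processed DB on the Processing Server
$procServer = New-Object Microsoft.AnalysisServices.Server
$procServer.Connect("ProcessingServer")
$procServer.Databases.FindByName("Sales").Detach()
$procServer.Disconnect()

# ... steps 2-4: stop NLB traffic, swap the LUNs, bring the new one online ...

# Step 5: attach the DB read-only on the query server
$queryServer = New-Object Microsoft.AnalysisServices.Server
$queryServer.Connect("QueryServer")
$queryServer.Attach("Q:\OLAP\Data\Sales.0.db", [Microsoft.AnalysisServices.ReadWriteMode]::ReadOnly)
$queryServer.Disconnect()
```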

A more economical way (no additional server required) is to deploy the new version as a second database alongside the old one, process it in full, then delete the old database and rename the new one to the old name. This way the downtime decreases from hours to seconds.
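A minimal AMO sketch of that swap, assuming made-up names (“Sales” for the live database, “Sales_v2” for the newly deployed version):

```powershell
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices") | Out-Null
$server = New-Object Microsoft.AnalysisServices.Server
$server.Connect("MyASServer")

# Full-process the new copy while users still query the old one
$newDb = $server.Databases.FindByName("Sales_v2")
$newDb.Process([Microsoft.AnalysisServices.ProcessType]::ProcessFull)

# The only downtime is the few seconds between these statements
$server.Databases.FindByName("Sales").Drop()
$newDb.Name = "Sales"
$newDb.Update()

$server.Disconnect()
```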

Processing SSAS Cubes using PowerShell

Filed under: Analysis Services — Vincent Rainardi @ 12:00 am

There are 3 ways to process SSAS cubes using PowerShell:

  1. Import-Module “sqlps” (prompt mode) and use the Invoke* cmdlets to process SSAS cubes. This is new in SQL Server 2012, a reincarnation of the old ASCmd.
  2. Load the Microsoft.AnalysisServices library and use AMO to interrogate the cube in full programming mode.
  3. Load Microsoft.AnalysisServices.XMLA and use the $Server.Execute method to run an XMLA file.

When I say SSAS cubes I mean SSAS objects, i.e. dimensions, measure groups, partitions, and the DB itself.
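As a flavour of option 3, here is a minimal sketch in the same spirit: it runs a prepared XMLA file through the AMO Server object’s Execute method. The server name and file path are made up, and Get-Content -Raw needs PowerShell 3+:

```powershell
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices") | Out-Null
$server = New-Object Microsoft.AnalysisServices.Server
$server.Connect("MyASServer")

$xmla = Get-Content "C:\XMLA\ProcessSales.xmla" -Raw
$results = $server.Execute($xmla)

# Each result carries the errors/warnings the server returned
foreach ($result in $results) {
    foreach ($message in $result.Messages) { Write-Host $message.Description }
}
$server.Disconnect()
```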

The Verbose parameter is very important (it is a switch, so you can pass -Verbose or -Verbose:$false). Verbose displays detailed information about the operation performed by the command, but it only works when the command contains the Write-Verbose cmdlet. The Verbose parameter overrides the value of the $VerbosePreference variable for the current command; because the default value of $VerbosePreference is SilentlyContinue, verbose messages are not displayed by default. We also have $ErrorActionPreference and $WarningPreference.
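A small sketch of that behaviour, using a made-up function name:

```powershell
function Process-MyCube {
    [CmdletBinding()]   # gives the function the common -Verbose parameter
    param([string]$ServerName)

    # Only shown when -Verbose is passed (or $VerbosePreference = "Continue")
    Write-Verbose "Connecting to $ServerName..."
    # ... processing logic would go here ...
}

Process-MyCube -ServerName "MyASServer"            # silent: default is SilentlyContinue
Process-MyCube -ServerName "MyASServer" -Verbose   # shows the Write-Verbose message
```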

The standard, safe, supported, and recommended way to process SSAS cubes is using SSIS. But its main weakness is that you have to specify each dimension one by one, then the latest partitions one by one. If you have a new partition, you have to update the SSIS package, and that’s a code change in production.

With AMO we can iterate over all dimensions and process them all, either Process Full or Process Update, and we can loop over the measure groups and partitions. If you have a new dimension, or a new MG/partition, it will still work fine. With a Year-Quarter-Month-Week partitioning concept, crossed with a Geographical Region partitioning concept, we could end up with 50 partitions to manage; maybe 300 if we also partition by customer or product. In these situations we need AMO not only for processing this week’s partitions, but also for merging partitions every week end and month end. A minimal sketch of such a processing loop is below.
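The server name, database name and partition naming convention here are all invented for illustration:

```powershell
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices") | Out-Null
$server = New-Object Microsoft.AnalysisServices.Server
$server.Connect("MyASServer")
$db = $server.Databases.FindByName("Sales")

# Process Update every dimension: picks up attribute changes without
# invalidating the cubes, and automatically covers new dimensions
foreach ($dim in $db.Dimensions) {
    $dim.Process([Microsoft.AnalysisServices.ProcessType]::ProcessUpdate)
}

# Full-process only this week's partitions, however many there are
foreach ($cube in $db.Cubes) {
    foreach ($mg in $cube.MeasureGroups) {
        foreach ($partition in ($mg.Partitions | Where-Object { $_.Name -like "*2013 W43*" })) {
            $partition.Process([Microsoft.AnalysisServices.ProcessType]::ProcessFull)
        }
    }
}
$server.Disconnect()
```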

In the above scenario AMO is useful, because we can programmatically interrogate the SSAS databases. A medium-size SSAS implementation probably has 5 AS databases (a small implementation 1-2, a big one 10-20), with database sizes (commonly known as cube sizes) between 10 GB and 200 GB. The full-database processing time (i.e. the entire “cube” as people say it) can easily exceed a 6-hour window and probably go above 12 hours. What I’m getting at is: you would want to process your cube incrementally, only the last few days’ partitions, and therefore use a Year-Quarter-Month-Week-Day partitioning scheme. And that requires AMO.

But in addition, when you deploy cube changes into production, you will have to do a full cube process (most of the time, because you made structural changes). And that is a nightmare if your full-DB processing time is 30 hours. In this case, you will want to have 2 SSAS servers, 1 active and 1 standby, encapsulated by a virtual IP. Your users hit the virtual IP, which resolves to the active SSAS server. At production deployment, you deploy to the standby SSAS server (which has the same name as the active one), process it for 30 hours from Friday night till Monday afternoon, then test it (connecting by physical IP). On Monday night, after all the cubes have been tested satisfactorily, you bring down the active SSAS server and the standby server takes over.

But a lot of companies don’t have a second, identical AS server. So you would need to deploy under a different DB name & ID, then rename it. Renaming an AS DB doesn’t change the ID: whilst SSAS clients (Excel, Tableau, SCA) connect using the DB name, SSIS connects using the DB ID. That is another reason to use AMO for processing SSAS cubes, an advantage over SSIS.
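The name/ID split is easy to see in AMO. A tiny sketch, continuing with the $server object and the made-up names from the sketches above:

```powershell
# After deploying as "Sales_v2" and dropping the old DB
$db = $server.Databases.FindByName("Sales_v2")
$db.Name = "Sales"   # clients that connect by name now find it
$db.Update()
# The ID is unchanged, and the ID is what SSIS binds to:
Write-Host "Name: $($db.Name), ID: $($db.ID)"   # Name: Sales, ID: Sales_v2
```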

The issue with AMO is logging. Whilst in SSIS we get Info, Warnings and Errors, using AMO we get Errors only, unless we use Trace. Trace scares me because of its flakiness: the more complicated the code we write, the more flaky it becomes. We can point the error-configuration log to a file, but what gets logged is very minimal too. This is done either via the <ErrorConfiguration> XMLA element or via an ErrorConfiguration object passed to Object.Process in AMO.
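A minimal sketch of the AMO route, with a made-up log path and the $db object from the sketches above:

```powershell
# Log key errors (e.g. missing keys) to a file during processing
$errConfig = New-Object Microsoft.AnalysisServices.ErrorConfiguration
$errConfig.KeyErrorLogFile = "D:\Logs\SalesProcessing.log"
$db.Process([Microsoft.AnalysisServices.ProcessType]::ProcessFull, $errConfig)
```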

I am a fan of executing XMLA, because it is the de facto standard for communicating with an SSAS server. Even AMO and PowerShell’s Invoke-ASCmd are internally converted into XMLA (I think). But the issue with XMLA is that it is not programmable: you have to prepare the XMLA file to be executed, and if you have a new partition or a new dimension you have to update the XMLA.

These days big companies use SSAS, and they process it from Autosys, Tidal or Active Batch, i.e. an Enterprise Scheduler. Mid-size companies that are Microsoft-centric and don’t have an Enterprise Scheduler (ES) still use SQL Agent to process cubes. If you use an ES, then you would want to use PowerShell to manage the processing, partitioning and security of SSAS.

At the end of a connection session (I’m talking about AMO here), we can access the error log using 2 methods: ExecuteCaptureLog or ConcatenateCaptureLog. This enables us to see the result of the processing, i.e. successful or failed; we iterate over the messages. PowerShell also reports processing errors to the screen. To access these we just need to compare $Error.Count before and after processing: if the count has increased, we have a new error, and $Error[0] gives the last error, which is the one we need.
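A sketch of both patterns, continuing with the $server and $db objects from the sketches above:

```powershell
# Pattern 1: capture log. With CaptureXml on, Process() only records the XMLA;
# ExecuteCaptureLog then runs it (here: in one transaction, in parallel)
$server.CaptureXml = $true
$db.Process([Microsoft.AnalysisServices.ProcessType]::ProcessFull)
$server.CaptureXml = $false
$results = $server.ExecuteCaptureLog($true, $true)
foreach ($result in $results) {
    foreach ($message in $result.Messages) { Write-Host $message.Description }
}

# Pattern 2: compare $Error.Count before and after processing
$errorCountBefore = $Error.Count
try {
    $db.Process([Microsoft.AnalysisServices.ProcessType]::ProcessFull)
} catch { }   # swallow here; we inspect $Error below
if ($Error.Count -gt $errorCountBefore) {
    Write-Host "Processing failed: $($Error[0])"
}
```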

Below are some good references on PowerShell and Analysis Services. These refs are specifically about Analysis Services Processing using PowerShell (ASPuP), be it via XMLA, AMO or sqlps:

  1. Darren Gosbell, Using PowerShell to execute MDX.
    Darren is one of the most knowledgeable experts on ASPuP. He raised a Connect item on xmlaWarningCollection which has sadly been open since 2007, and built a cache warmer using PowerShell (via get-SqlTraceEvent). His PowerSSAS, especially 2 cmdlets, InvokeAsExecute and InvokeASMDX, is extremely useful for ASPuP. Also: Process Full on a DB = Dimension Full + Cube Full in one transaction.
  2. Microsoft, Analysis Services PowerShell
    A must-read primer: authentication & security. If you ever wonder “why does it work on my local machine but not on that server”, read the Remote section. Covers the basics of connecting from PS to AS. The AS PS Reference is the bible of cmdlets, including Invoke-ASCmd and Invoke-ProcessCube/Dimension/Partition (hmm, how similar to PowerSSAS); these Invoke* cmdlets have TraceLevel and Verbose parameters. Scripting Admin Tasks covers both AS PS and ASSL (if you ever need to get at the data).
  3. Microsoft, XMLA in SSAS.
    Another must-read primer, including handling errors & warnings and processing objects. SSAS uses only XMLA when communicating with clients; AMO, MDX etc. are translated into XMLA.
  4. Daniel Calbimonte, Using Powershell to do AS Tasks
    Executing XMLA from PS (Invoke-ASCmd), etc. Part 1 is here, which explains connecting.
  5. The MAK, SQLAS (Analysis Services PowerShell Provider)
    Explains the cmdlets in SQLAS. Processing cubes using PowerShell and AMO is a gold mine; it contains links to the other 8 articles.
  6. Richard Lee, find unprocessed objects using PowerShell
  7. Keith Babinec, PowerShell Error Handling.
    Hugely useful for displaying SSAS processing errors which are reported to PowerShell.
  8. Microsoft, PowerShell User’s Guide
    Also see basics of AMO programming
  9. SSAS-Info, PowerShell AMO
    Don’t forget: if you need any info about SSAS, the second place to search is always Vidas’ site (the 1st one is MSDN).
  10. Sandeep iii, Processing Cubes using AMO PS.
    This is the one, including getting the results. But it’s batched, so you don’t get the status of each object (i.e. each dim and each partition).
