Data Warehousing and Data Science

29 June 2014

SQLBits 12 Telford

Filed under: Event — Vincent Rainardi @ 5:19 pm

The 12th SQLBits will be in Telford, 17th to 19th July.

On Thursday 17th July there will be 14 world class experts providing 11 one-day seminars in their specialised topics:

  1. Brent Ozar: Virtualization, SAN and Hardware for SQL Server
  2. Itzik Ben-Gan: Practical T-SQL – Efficient Solutions
  3. Jennifer Stirrup: The Data Analyst Toolkit: Microsoft Power BI & R
  4. Marco Russo: Creating a Power BI Solution in one day
  5. Brian Knight: SSIS Problem, Design, Solution
  6. Dejan Sarka: Advanced Data Modeling Topics
  7. Adam Jorgensen & John Welch: Real World Big Data – Deploy, Design and Manage Like a Pro
  8. Niko Neugebauer & Andreas Wolter: In-Memory Technologies in SQL Server 2014: CCI & XTP
  9. Allan Hirt: The A to Z of Availability Groups
  10. Dave Ballantyne & David Morrison: SQL Server Query Processing Internals
  11. Simon Sabin: SQL Server Performance Diagnosis and Query Tuning

On Friday 18th July there will be 13 DBA sessions, 10 Database Development sessions, and 12 BI sessions (plus 1 keynote speech). These 35 sessions will be conducted in 8 rooms i.e. 8 one-hour sessions running in parallel.

DBA:

  1. Denny Cherry: Using SQL Server Always On Availability Groups
  2. Brent Ozar: Watch Brent Tune Queries
  3. Klaus Aschenbrenner: Latches, Spinloc and Lock Free Data Structures
  4. Simon Sabin: Achieving PCI With SQL Server – Don’t be the next Sony
  5. Allan Hirt: Patch Management Strategies for SQL Server Deployments
  6. Ola Hallengren: Inside Ola Hallengren Maintenance Solution
  7. Hue Holleran: SAN eye for the SQL girl/guy
  8. Tim Ford: Tim’s Top Ten Templates
  9. Thomas LaRock: Cardinality Estimates in Microsoft SQL Server 2014
  10. Gail Shaw: Do’s and don’ts of database corruption
  11. Grant Fritchey: Monitoring Windows Azure SQL Server VMs and SQL Databases
  12. Gavin Payne: Designing Quality SQL Server 2014 Solutions
  13. Matan Yungman: Query Progress Tracking in SQL Server

Database Development:

  1. Dejan Sarka: Optimizing Temporal Queries
  2. Aaron Bertrand: T-SQL: Bad Habits and Best Practices
  3. Karen Lopez: Database  Design Contention Issues
  4. Benjamin Nevares: Dive into the Query Optimizer – Undocumented Insight
  5. Allan Michell: The secret to pain-free database deployments? Automation.
  6. Mladen Prajdic: Optimizing database access and diving into .Net SqlClient
  7. Christina E. Leo: Building Your T-SQL Toolkit: Window Function Fundamentals
  8. Andreas Wolter: “SQL Attack..ed” – SQL Server under attack: SQL Injection
  9. Itzik Ben-Gan: T-SQL Tips and Tricks
  10. Scott Klein: Patterns & Practices for Scaling Windows Azure SQL Database

BI:

  1. Benjamin Wright-Jones: Microsoft Analytics: the Next Wave
  2. Chris Webb: Power Query: Beyond the Basics
  3. Allan Mitchel: A Look at 3 Big Data Tools
  4. Bob Duffy: Optimizing Cube Processing
  5. Jenifer Stirrup: A One Hour Data Analyst Toolkit: using R and Power BI
  6. David Peter Hansen: SSIS Internals and Performance
  7. Marco Russo: DAX Patterns
  8. Allan Mitchell: Basket Analysis using BI Office and SSAS Tabular edition
  9. John Welch: Continuous Delivery for Data Warehouses and Marts
  10. Marco Russo: DAX Patterns
  11. Brian Knight: Performance Tuning SQL Server Integration Services
  12. Alberto Ferrari: Advanced Modelling with Analysis Services Tabular

On Saturday 19th July there will be 18 DBA sessions, 14 Database Development sessions and 16 BI sessions (plus 1 Career session), starting from 8:10 until 17:00.

DBA:

  1. David Peter Hansen: Integration Services (SSIS) for DBA
  2. Chris Adkin: Column Store Index and Batch Mode Scalability Deep Dive
  3. Richard Douglas: The Day After Tomorrow: Why You Need a Baseline
  4. Katherine Bean: Data Security in a Post Snowden World
  5. Dr Subramani Paravasivam: Advanced Reporting Technique and Managing Reports
  6. Allan Hirt: SQL Server Storage vNext: Welcome to the Future
  7. Grant Finchley: Getting Started Reading Execution Plans
  8. Rob Volk: Revenge: The SQL!
  9. Matan Yungman: Things You Can Find in Plan Cache
  10. Eduard Erwee: Hadoop: Big Data or Big Deal?
  11. Mark Broadbent: The Nicromonicon, SQL Server Book of the Dead
  12. Christian Bolton: Extending Your Data Centre to the Cloud with SQL Server 2014
  13. James Skipwith: SQL 2014 In-Memory Design Pattern #1: The Flaming Partition
  14. Brent Ozar: How to Pick SQL Server Hardware
  15. Scott Klein: Top 5 SQL Server 2014 Hybrid Features
  16. Andre Kamman: How to Write Professional PowerShell Scripts
  17. Thomas LaRock: Monitoring Databases in a Virtual Environment
  18. Neil Hambly: Effective Index Partitioning, Compression Strategy

Database Development:

  1. Benjamin Nevarez: Understanding Parameter Sniffing
  2. Itzik Ben-Gan: T-SQL Tips and Tricks
  3. Geoff Clark: T4 Templating with SSDT using SQL Server 2014
  4. Criag Ottley-Thistlethwaite: The SSDT Way or the Highway (sqlproj)
  5. Brent Ozar: How the SQL Server Engine Thinks
  6. Andrew Whettam: A Comedy of Error
  7. Denny Cherry: I’ve Got a SQL Database, Now What?
  8. Gail Shaw: Bad Plan! Sit!
  9. Klaus Aschenbrenner: The Dangerous Beauty of Bookmark Lookups
  10. Dave Ballantyne: Query Optimizer Internals: Traceflag fun
  11. Karen Lopez: Windows Azure SQL Database Design: Concepts and Trade-offs
  12. David Morrison: SQL Tips, Tricks and Misconceptions
  13. Steve Jones: Continuous Integration for Databases
  14. Bob Duffy: Migrating to the Cloud

BI:

  1. Jean-Pierre Riehl: Fasten Seatbelt and Look at the Data Steward
  2. James Rowland-Jones: Demonstrating PDW Integration with Hadoop & Polybase
  3. John Welch: Practical Unit Testing for SSIS Packages
  4. Carmel Gunn: The Irish Economic Crisis, Visualised with Power BI
  5. Mark Stacey: Is Your Data Big Enough for PDW
  6. Marco Russo: Power Query in Modern Corporate BI
  7. Peter ter Braake: Power BI, Zero to Expert
  8. Alberto Ferrari: Optimizing & Analyzing DAX Query Plan
  9. Niko Neugebauer: ETL Patterns with Clustered Columnstore Indexes
  10. Julie Koesmarno: Building Your Myth Busting Lab with Power BI
  11. Allan Mitchell: Introduction to BI Office
  12. Stephanie Locke: Intro to R
  13. Alex Whittles: MDX 101
  14. Adam Aspin: BI Tips and Tricks with SQL Server Reporting Services
  15. Gerhard Brueckl: Deep-Dive to Analysis Services Security
  16. Stacia Misner: Building a BI Performance Monitoring Solution

Career:

  1. Steve Jones: Branding Yourself for a Dream Job

Source: Retrieved from SQLBits XII Agenda on 29th June 2014 at 16:52

 

24 June 2014

SQL Server User Group Indonesia

Filed under: Event — Vincent Rainardi @ 7:44 am

I just found out today from Julie Koesmarno’s blog (link) that there is a SQL Server User Group in Indonesia. I’m from Indonesia (Surabaya) and it was a nice surprise to hear about this. They have a Yahoo Group: link, where they discussed all things SQL Server, from Configuration, SQL Programming to Data Warehousing. The moderator is Choirul Amri, link. SSUG Indonesia is listed on the SQL PASS Asia Pacific site (link), but the website it is pointing to is still not working: http://sqlserver-indo.org/.

The day after tomorrow, 26th June at 18:45 Jakarta time, Indonesian SSUG is having a gathering as Choirul Amri wrote. The agenda is:

18.15 – 19.00 Registration
19.00 – 19.45 Session 1: Azure Survival Guide for [On Premise] SQL Server DBA, by Choirul Amri, Senior Consultant – Microsoft Services
20.00 – 21.00 Session 2: Building Your Myth Busting Lab With Power BI, by Julie Koesmarno, SQL Server MVP
21.00 – 21.15 Wrap up and closing

19 June 2014

Connecting Fact Tables

Filed under: Data Warehousing — Vincent Rainardi @ 8:33 pm

In some cases, several fact tables need to be joined, for the purpose of reporting. Cubes join different fact tables, but reports can’t do that.

There are 2 approaches to do this:

  1. Create a “fact table bridge” table. The bridge table contains the fact key columns of each fact table, plus any filtering columns required.
  2. Create a new fact table containing all the required measures at the combined grain.

For approach a) some considerations are:

  1. The fact tables have different grains (if they have the same grain they should be merged). If there are 2 fact tables, it is possible that fact table 1 has 1 row but fact table 2 has 2 rows.
  2. The fact tables need to have a fact key column, which is a unique row identifier. This column needs to be indexed to support the join operation.
  3. During DW loading, the bridge table needs to be populated after both fact tables are populated.
  4. If this approach is taken, the downstream reporting system needs to do daily incremental extraction from this bridge table + fact tables, and persist the joined-up fact data in a wide fact table.

For approach b) some considerations are:

  1. It will require extra disk space as fact data is duplicated.
  2. Report will have shorter query time because there is no need to join any tables.
  3. It is possible to create a calculated measure involving facts from different fact table and persist it in the combined fact table.

Retail Example

For example, in retail, we have order fact table and delivery fact table. Order fact table contains the customer order, with the grain of 1 row for each order line. Delivery fact table contains the delivery details. One order can be sent in multiple deliveries, and several orders can be sent in one delivery.

To get all the cost information of an order, we need to join these 2 fact tables. We group by order ID, and sum up the cost of goods sales column (COGS), and the delivery cost column (DC). Joining the 2 fact tables is a lot easier because we have this bridge table, which contains the row identifier for both fact table (the fact key column).

Example of a calculation involving the two fact table is the total cost of an order, which is the sum of COGS and DC. Another example is the delivery lead times, defined as the elapsed days from the order date to the delivery date. Note: it’s the first order date and the last delivery date for that order. Meaning that, if an order is sent in multiple deliveries, then we take the last delivery date. If several orders are sent in one delivery, then we take the first order date.

The bridge table contains the fact key from both fact tables, i.e. OrderFactKey and DeliveryFactKey. The bridge table also contains several filtering columns, such as: OrderDateKey, DeliveryDateKey, CustomerKey, ProductKey, Order Number. These filtering columns allow the reports to filter the big bridge table (could be 500 million rows) to just rows for a particular date or a particular order.

The filtering columns need to be indexed. Alternatively, if all fact tables are periodic snapshot fact tables, the bridge table can be partitioned on the SnapshotDateKey. Giving the bridge table a PK is optional, but the advantage is to be able to uniquely identify a row.

16 June 2014

SSAS Stops Logging

Filed under: Analysis Services — Vincent Rainardi @ 8:17 pm

When the SSAS server is restarted, it stops populating the query log table. To make it starting logging the query again, we need to make a “no change” change on the Query Log connection string. As Mike Diehl suggested here, we could change the Application Name property.

This can be scripted into an XMLA, and executed as part of server reboot procedure. I haven’t found a way to automatically execute an XMLA every time SSAS server started, yet, but we can put this on the cube processing. After the cube processing, warm the cache, then check the maximum value of StartTime column on the OlapQueryLog table. If it is too far behind (e.g. more than 1 hour) than issue Alter xmla statement to change the App Name on the connection string. To make the App Name unique/different from the current one (so it will change), we can use the timestamp, converted to string.

This Process Cube – Warm Cache – Check Query Log – Update Conn String can be done in SSIS. How? Process Cube: using SSAS Processing task, Warm Cache: using SSAS Execute DLL task (XMLA Execute Command), Check Query Log: Execute SQL task, Update Conn String: uisng SSAS Execute DLL task (XMLA Alter instance properties).

As Mike Diehl suggested on his page above, the account used in the Connection String needs more permissions than db_datawriter, but probably less than db_owner. If we only set db_datawriter, Windows event log will produce something like: SSAS can not write into OlapQueryLog table because of insufficient permission, but it managed to create the table. (regardless whether in the SSAS Server Properties you specifies create table = false or true). Giving dbo permission certainly enables SSAS to write into OlapQueryLog.

Blog at WordPress.com.