Data Platform and Data Science

Content of This Blog

Data Platform

  1. History of Data Warehousing
  2. Not a Data Warehouse, but a Data Platform
  3. Enterprise Data Platform (EDP)
  4. Customer Data Platform (CDP)
  5. 3 Things We Need on Data Platform
  6. Components of Data Platform
  7. Data Mesh – JP Morgan Chase and Fannie Mae
  8. Data Product
  9. Cloud Data Warehouse
  10. Good Bye Data Centre – It must be in the Cloud
  11. Do We Still Need a Data Warehouse?
  12. Modern Data Warehouse
  13. Why Data Lake
  14. What is a Data Lake?
  15. What is a Data Lake, physically?
  16. Data Lake vs Data Warehouse
  17. Data Warehouse on Data Lake
  18. Data Lake Architecture
  19. Data Lake on Azure: Databricks or Synapse?
  20. Data Lake on Oracle
  21. Data Lake on Snowflake
  22. Data Lake for Asset Management
  23. Data Lakehouse
  24. Data Lakehouse vs Data Lake+Warehouse
  25. Fact & dim tables in Data Lakehouse
  26. Building Dimensional Model on Data Lakehouse
  27. Data Fabric
  28. Microsoft Fabric as a Data Fabric Platform
  29. Microsoft Fabric as a Data Warehouse Platform
  30. Microsoft Fabric for Machine Learning
  31. Microsoft Fabric vs Databricks
  32. Microsoft Fabric: AI-powered transforms & new Dataflow
  33. Data Sharing
  34. Share it, don’t move it
  35. Azure Big Data Analytics
  36. What about Hadoop?
  37. It’s Data Engineer, not ETL Developer
  38. Data Engineering
  39. Data Engineer
  40. Snowflake SQL
  41. dbt
  42. Definition of Big Data and Data Warehousing
  43. Choosing between Big Data and Data Warehousing
  44. Difference between Extract and Ingest
  45. Productionalisation
  46. What’s in Azure?
  47. Column Name in Google BigQuery
  48. Three Modes of Synapse Analytics
  49. ETL vs ELT (and Ingestion)
  50. Using Informatica to load into Snowflake
  51. Data Migration vs Data Warehousing
  52. Four Biggest Lessons Learned in 2023
  53. Data tables, but not SQL
  54. Buy don’t build
  55. Snowflake for AI: Prediction for a bank
  56. Matillion: SaaS ETL Tool
  57. Data Vault

Data Warehousing

  1. 12 Reasons for Using Dimensional Model
  2. Data Warehouse for Asset Management
  3. Why do we need a data warehouse?
  4. Data Warehousing Interview with Astera
  5. Data Warehouse Data Modelling
  6. Infrastructure in Data Warehousing
  7. Star Schema or Snowflake
  8. Using a Data Warehouse for CRM
  9. NoSQL in Data Warehousing
  10. Using Temporal Tables in Data Warehousing
  11. Loading Late Arriving Dimension Rows
  12. Late Data Warehouse Loading
  13. Foreign Keys in Fact Tables
  14. Big Volume Data Warehouse
  15. Junk Dimension Revisited
  16. Transactional Fact Table
  17. Rating Dimension
  18. Numerical Attributes
  19. Day Measures
  20. One or Two Fact Tables
  21. One or Two Dimensions
  22. A dimension with only one attribute
  23. Storing Percentage Measure in the Fact Table
  24. Primary Key in Accumulating Snapshot Fact Tables
  25. Join Brings More Rows
  26. Normalised Data Warehouse (1)
  27. Normalised Data Warehouse (2)
  28. Disk Space in Data Warehousing
  29. Why a 400 GB Data Warehouse Requires 20 TB of Disks
  30. Reasons for Creating a Data Mart from a Data Warehouse
  31. Creating a Data Mart from a Data Warehouse: Four Questions
  32. Data type of Is_Current Flag column
  33. Transaction Dimension
  34. Data Warehousing and Business Intelligence
  35. SQL 2008 Data Warehousing Features
  36. SQL 2008 DW Features (Presentation at Microsoft)
  37. Parallel Data Warehouse (what a clever name)
  38. Change Tracking in SQL Server Data Warehousing
  39. Change Data Capture in SQL Server Data Warehousing
  40. Merge Statement in SQL Server Data Warehousing
  41. Bitmap Filter (Star Join Query Optimisation)
  42. Primary Key and Clustered Index on the Fact Table
  43. Impact of Clustering the Fact Key to Query Performance
  44. Standardising Entity and Column Names
  45. General Performance Considerations for “INSERT SELECT”
  46. Preserving the History of the Facts
  47. Data Warehousing Interview Questions (2)
  48. Data Warehousing Books
  49. Normalising a Fact Table
  50. Denormalising a Fact Table
  51. Mixed Grain Fact Table
  52. Non Aggregatable Measure
  53. A Measure or An Attribute
  54. Putting Higher Grain Attributes into a Separate Dimension
  55. Time Dimension
  56. The Primary Key of a Fact Table with the Grain the same as a Dimension
  57. When To Snowflake
  58. Dimensions with Multi Value Attributes
  59. The Word “Fact” – Terminology Clarification
  60. Measures on a Dimension Tables
  61. How To Deal With Status
  62. Linking 2 Fact Tables
  63. Data Warehousing Books
  64. Data Architect, Data Warehouse Architect and BI Solution Architect
  65. Storing History on 3rd Normal Form (3NF)
  66. Building a Data Warehouse at the Same Time as the Operational System
  67. Stored Procedure vs ETL Tool
  68. ETL and Data Integration
  69. IIS in Data Warehousing
  70. Where To Put an Attribute (DW)
  71. Reviewing a Dimensional Model
  72. Who’s Who in Data Warehousing
  73. Differences between Data Warehousing and BI
  74. Role of a DW Designer/Dimensional Modeller in the ETL Development Phase of a DW/BI Project
  75. Data Sourcing
  76. Slowly Changing Dimension (SCD) Type 0
  77. Bridge Table with Multiple Instances
  78. Bridge Table with Date
  79. Populating Unknown Measure with a Zero or NULL
  80. Expiry Date column in SCD Type 2 Dimension
  81. Using Accumulated Snapshot Fact Table to Monitor Status
  82. SCD Type 2 – Initialising a New Attribute
  83. A Data Warehouse Must Always Reflect the Values in the Source System
  84. UNK in Data Warehousing
  85. Top 2 things I missed in SQL Server and Oracle
  86. Two Methods of Reporting
  87. Wherescape RED
  88. Testing Your Data Warehouse
  89. The Principle of “Do It Upstream” in Data Warehousing
  90. Updating Past Data
  91. When is a number not a measure?
  92. Parent SK in the Fact Table
  93. City and Rank
  94. Effective and Expiry Dates in Type 2 Dimensions
  95. Adding a New Attribute and Measure
  96. Name of Surrogate Key Columns
  97. Data Type Changes in the Source System
  98. A Data Warehouse with no Surrogate Keys
  99. System Interfaces Contract
  100. Month Attributes in the Date Dimension
  101. Delete Takes A Long Time
  102. Fact or Dimension
  103. Not All Surrogate Keys Define the Fact Table Grain
  104. Hadoop in Data Warehousing
  105. Data Warehouse vs Data Virtualisation
  106. Delete All Rows in Dimension Table
  107. The Main Weakness of Snowflake Schema
  108. Initial Data Load
  109. Why Do We Need a Data Warehouse?
  110. Seven Methods of Data Integration
  111. Building a Data Warehouse and BI on Low Budget
  112. Dedup
  113. Banding and Grouping in Data Warehousing
  114. Connecting Fact Tables
  115. Real Time Data Warehouse
  116. Data Extraction Method
  117. Populating Fact Tables
  118. Estimating the Size of Dimension and Fact Tables
  119. Indexing Fact Tables in SQL Server
  120. Flip Flopping in Dimension Tables
  121. Indexing Fact Tables
  122. Measure or Attribute
  123. DimMonth, DimQuarter and DimYear
  124. The ABC of Data Warehousing
  125. Accumulative Snapshot Fact Table
  126. Six Dimension Types
  127. Dimodelo
  128. Effektor
  129. Instrument Dimension
  130. Investment Performance
  131. Data Sourcing
  132. Watermark in Data Warehousing
  133. Snowflaking on Security Dimension
  134. Position Fact Table – IBOR vs ABOR
  135. Distinct-Attribute Rows in Dimension Tables
  136. 3 Things in Date Dimension
  137. EAV Fact Tables
  138. Different Measures for Different Products
  139. Data Interface (How to Manage a DW Project)
  140. Surrogate Key in Snowflake, Databricks and DbSchema
  141. Metadata Driven ETL
  142. Using Production Data in Dev & Test Environment

Data Architecture

  1. What is Data Architecture?
  2. Elements of Data Architecture
  3. Code Decode Table
  4. Responsibilities of Data Architects
  5. Data Architect jobs in UK
  6. Varchar(255)
  7. Physical Data Modelling (PDM)
  8. Painful-to-Retrieve Data Structures
  9. Data Types for Common Columns
  10. The Problem with Data Quality
  11. Hierarchy with Multiple Parents
  12. Relational vs Non Relational Databases
  13. Data Type
  14. Data Files – Delimiter and Qualifier
  15. Why they are called FLAT files
  16. DbSchema
  17. Architecture Review Board

SQL Server

  1. Find Which Partitioning Function Is Used by a Table
  2. Select count(*) For a Very Big Table
  3. SQL Server Editions
  4. Copy a Table
  5. Informatica Metadata Manager Can Now Read SSAS Cubes and SSRS Reports Metadata
  6. How Long Did It Run?
  7. Class Does Not Support Aggregation
  8. SQL Server 2008 Backup Compression
  9. The Database Does Not Have a Valid Owner
  10. Using Pivot for 2 Value Columns
  11. Q&A on: Resizing TempDB
  12. Editions and Prices of Visual Studio 2010
  13. Creating Test Data on SQL Server 2008
  14. Updating a Table Based on Itself
  15. Collation Conflict When Querying a Linked Server
  16. Inserting Multiple Rows in One SQL Statement
  17. T-SQL: Exists versus In
  18. T-SQL: Exists versus In – Performance Comparison
  19. Null in Date Column
  20. LINQ2SQL and TSQL and CLR
  21. Concat NULL Yields NULL
  22. Make yourself sysadmin on SQL 2008 Express
  23. Formatting Dates in SQL Server 2012
  24. Data Consistency in Oracle and SQL Server
  25. SQL Server 2014 Installation
  26. SQL Server Scripts Library
  27. Temporal Tables in SQL Server 2016
  28. Alt Key in SSMS
  29. U-SQL
  30. About NOLOCK
  31. Column Store Index

Project Management

  1. Team Setup for BI Project
  2. Don’t Worry About the Sign Off
  3. Development Operations (DevOps)
  4. Building Prod & Dev Environment

Data Science

  1. Learning AI – Where to start?
  2. Learning Machine Learning with Upgrad
  3. Andrew Ng’s Machine Learning Course
  4. What Machine Learning Can Be Used For
  5. Talking Avatars
  6. Salesforce Einstein
  7. Can Machine Learning replace BI?
  8. Machine Learning for Asset Management
  9. Machine Learning or Data Science?
  10. What is Data Science?
  11. Interview Questions for Data Scientists
  12. Data Scientist
  13. BI vs Data Science
  14. Why do we use Python in Machine Learning?
  15. Which Machine Learning Algorithm Should I Use?
  16. How to do AI without Machine Learning?
  17. Automating Machine Learning using Azure ML
  18. MLOps (Machine Learning Operations)
  19. Logistic Regression with PCA in Python
  20. Linear Regression in Python
  21. Why Linear Regression is so hard
  22. Feature Importance
  23. Handling Class Imbalance
  24. MCC Formula for Multiclass Classification
  25. Google Colab
  26. Ensembles – Odd and Even
  27. Entropy and Information Gain in Decision Tree
  28. SVM with RBF Kernel
  29. Tuning XGBoost Models
  30. The Trick in Understanding Human Language
  31. Tokeniser
  32. Natural Language Processing (NLP)
  33. What is Convolution?
  34. What is Convolutional Neural Network (CNN)?
  35. What is CNN? (Part 2)
  36. Recurrent Neural Network (RNN) and LSTM
  37. RNN Applications
  38. Reinforcement Learning
  39. Building a Neural Network
  40. Andrew Ng’s Deep Learning Course
  41. Forecasting time series: using statistics vs machine learning
  42. Stock Price Forecasting using LSTM
  43. Stock Price Forecasting Using CNN
  44. Stock Price Forecasting using XGBoost
  45. Managing Investment Portfolios using Machine Learning
  46. Using Reinforcement Learning to Manage Portfolio Allocation

Python

  1. Turtle
  2. Python: String and Array
  3. Python: List, Tuple, Dictionary, Set
  4. Python: If and For
  5. Python: List comprehension
  6. Python: Function, Lambda, Map, Filter, Reduce
  7. Pandas: Column, data type, group
  8. Plot: Boxplot, barplot, pairplot, heatmap

Business Knowledge

  1. Asset Management vs Wealth Management
  2. Buy Side and Sell Side
  3. Layering in Insurance
  4. Bank Data Model
  5. Credit Default Swap (CDS)
  6. Distribution Yield vs Underlying Yield
  7. Treasury in Investment Banking
  8. Off Balance Sheet Items
  9. Securitising Cash Positions
  10. Investment Banking Books for BAs and Developers
  11. Investment Banking
  12. Credit Risk and Market Risk
  13. Asset Management Business Processes and Systems
  14. Domain Knowledge
  15. Historical Portfolio Positioning
  16. Data Warehousing / Business Intelligence for Investment Banking
  17. Swaps and Options
  18. Performance Attribution
  19. Asset Management Companies in the UK
  20. ABOR and IBOR

Business Intelligence

  1. It’s not BI any more, it’s Analytics!
  2. BI Developer or Analytics Engineer?
  3. Reporting and Analytics
  4. Current Trend in Business Intelligence
  5. Where To Store The Business Logic
  6. Microsoft BI Books
  7. Data Warehousing & BI Companies
  8. How to Choose the Right BI Technology to Suit Your Style
  9. SAP Hana, an in-Memory Database
  10. Business Objects Voyager (SAP BO Analysis)
  11. What is Big Data, Data Warehouse, Data Mining
  12. Tibco Spotfire
  13. Composite
  14. The 5 Stages of DWBI Journey
  15. iDashboard
  16. Using BI Tools as ETL Tools

Power BI

  1. Power BI Datamart
  2. Power BI Q&A
  3. Dynamic Difference
  4. DAX Studio and DAX Guide
  5. DAX: Percentage of Total and Filter on 2 Columns
  6. A few DAX functions

Oracle BI

  1. Comparing Oracle 11g OLAP Option and SSAS
  2. Introduction to Oracle BI for MS BI Developer
  3. Oracle ETL Tools
  4. Bit and Pieces on Oracle
  5. Difference Between CURRENT_DATE and SYSDATE

SSIS

  1. SSIS: Looping with ADO Enumerator
  2. SSIS: Debugging a Script Component
  3. SSIS: Importing Files – Read the First N Rows
  4. SSIS: Work Flow vs Stored Procedures
  5. SSIS: SQL Server Destination or OLE DB
  6. SSIS: Importing a File with Dynamic Columns
  7. Coming SSIS Articles
  8. SSIS: Export a Query Result to a File
  9. SSIS 2008 Data Profiler
  10. SSIS: Updating a Variable based on Database
  11. SSIS: Updating a Variable based on a File (Script Component)
  12. SSIS: Updating a Variable based on a File (Script Task)
  13. SSIS: Import/Export Files with Variable File Name Set at Run Time
  14. SSIS: True/False and 1/0 on Bit Columns
  15. SSIS: Loading a Big File Fast
  16. SSIS: How to Convert DateTime to Int
  17. SSIS: How to Populate a Fact Table using SSIS (part 1)
  18. SSIS: How to Populate a Fact Table using SSIS (part 2)
  19. SSIS: SCD Wizard Performance Issue
  20. SSIS: Automating DDL Changes
  21. SSIS 2012: Converting Date from Excel for Lookup
  22. Loading a Dimension Table using SSIS
  23. Six Important Features of an ETL Tool
  24. Choosing an ETL Tool
  25. SSIS: Numeric Column Loaded as NULL
  26. Memory for SSIS
  27. Download files from Azure Blob Storage using Control-M 

Analysis Services

  1. Parent Child Dimension
  2. Top 10 Tips: Building Cubes
  3. Top 10 Tips: Cube Testing
  4. The Most Useful SSAS Book
  5. Who’s Who in Analysis Services
  6. Building Cubes From Operational System (SQLBits presentation)
  7. The Unpopular SELECT Statement
  8. Date Dimension in Analysis Services (Part 1)
  9. Date Dimension in Analysis Services (Part 2)
  10. Many-to-Many in Attribute Relationship
  11. SSAS DMV (Dynamic Management View)
  12. SSAS DMV: Join Using DataSet
  13. SSAS DMV Nugget at SQL Server User Group
  14. Optimising Cube Query Performance and Processing Performance
  15. Creating Many Roles in SSAS Cubes
  16. Vertical Fact Table
  17. Creating a Dimension with Multiple Column Key
  18. Scheduled Deploy in SSAS
  19. AMO: Enumerating DSV Tables and Columns
  20. The 4.2 billion tuples limitation in SSAS
  21. Double Click Property Name
  22. Where To Put An Attribute
  23. Attempted to read or write protected memory
  24. Combining DW and ERP Data in SSAS Cubes
  25. Combining DW and ERP Data in SSAS Cubes – 5th Case
  26. Creating a dimension from a large table
  27. Concatenating attributes to form dimension key
  28. What is a cube?
  29. Cube developer
  30. What are cubes bad at?
  31. Using SSAS cubes for reconciliation
  32. Who Are Using The Cube Now?
  33. Who Uses What Cube and When
  34. Taking a cube offline
  35. Comparing Excel 2007 and ProClarity
  36. Qlikview vs PowerPivot: connecting to SSAS cube
  37. PowerPivot
  38. Multi Language Cubes
  39. Cube Translation in Excel 2007
  40. Q&A on: Browsable Folder When Restoring Cubes
  41. Renaming Attribute & Dimension
  42. Aggregate Not Shown
  43. Ratio in SSAS
  44. Updating Partition Source SQL
  45. Parser: The query contains  … parameter, which is not declared
  46. Tuning Cube Processing Performance
  47. SSAS Developer Interview Questions
  48. Q&A on: Impersonation
  49. Parent Child Dimension: extra child node
  50. Ragged Hierarchy in SSAS
  51. Where To Put an Attribute (SSAS)
  52. Many to Many Is Not Always Right
  53. Many to Many: Which Dimension is Used?
  54. Processing ROLAP Cube and ROLAP Dimension
  55. SSAS Videos
  56. Duplicate Attribute Keys in SSAS
  57. The Trio Maestro’s SSAS 2012 Tabular Book
  58. Processing SSAS Cubes using PowerShell
  59. Can I Cluster SSAS? (Minimizing Downtime at Deployment)
  60. 5 Minutes SSAS DBA
  61. SSAS Stops Logging
  62. Process Update of a Dimension
  63. Analysis Services Tabular Model

MDX

  1. Sorting
  2. Filter
  3. Existing
  4. Except
  5. Filter on Dates Before Today
  6. Tomislav’s MDX Book
  7. Half Year To Date
  8. Setting Default Date to Last Available Date
  9. Holding History – MDX for “From Date” and “To Date”
  10. Sherry Li and Tomislav’s 2012 MDX Cook Book

Events

  1. SQLBits 4 Manchester (Data Warehouse Data Modelling)
  2. SQLBits 5 Wales (Building Cubes from Operational Systems)
  3. SQLBits 7 York
  4. 4 Sessions for SQLBits 8 Brighton
  5. SQLBits 8 Brighton (Advanced Dimensional Modelling)
  6. SQLBits 9 Liverpool
  7. SQLBits 10 London
  8. SQL Server User Group Indonesia
  9. SQLBits 12 Telford
  10. SQLBits 15 Liverpool 4-7th May 2016

24 Comments »

  1. Wooow.
    What a perfect blog about DW/BI.
    I will check it each month for new changes.
    Good luck.

    Comment by khashayar jamsahar — 29 December 2010 @ 11:21 am | Reply

  2. Hiya! I simply want to give a huge thumbs up for the great info you have got here on this post. I might be coming back to your blog for extra soon.

    Comment by Catheryn Wanczyk — 11 April 2011 @ 6:37 pm | Reply

  3. You have got a really informative blog here… thanks and keep writing…
    I was wondering how did you create this page… manually?

    Comment by theSuda — 17 May 2011 @ 6:41 am | Reply

  4. What an amazing repository for DW/BI issues!! I can’t tell you how grateful I am for you putting all this up for everyone. I will definitely be back for updates. I’m especially interested in using AMO with Powershell and your great examples of SSAS DMVs. I had never heard about it before. Thanks so much again.

    Comment by Reiner — 6 September 2011 @ 1:25 pm | Reply

  5. Very Nice collection about BI, help me alot … thanks

    Comment by neeraj1982 — 1 October 2011 @ 6:10 pm | Reply

  6. Thanks for wonderfull collection

    Comment by cemuney — 6 January 2012 @ 6:09 pm | Reply

  7. A great website!

    Comment by Dez — 28 May 2012 @ 3:33 pm | Reply

  8. I have read your book Dataware house with SQL Server. It realy super!. Thanks for sharing the information

    Comment by Uma — 23 August 2012 @ 5:34 am | Reply

  9. Very good set of articles and examples.
    Thank you.

    Comment by matteo montesi — 29 August 2012 @ 10:45 am | Reply

  10. For details DWH & BI concepts and DWH & sql interview related materials please visit on this link http://datawearhousebiworld.blogspot.in/2015/01/hello-all.html

    Comment by bidevloper — 25 January 2015 @ 12:38 pm | Reply

  11. There is lot information available in this blog about BI and data warehousing. Would follow this blog regularly for any update on Oracle BI side.

    Comment by sourishbanerjee — 31 July 2015 @ 8:27 am | Reply

  12. Always a pleasure to read your blog! Very useful information. Please continue your excellent blog many years…

    Comment by Hennie de Nooijer — 16 January 2017 @ 7:59 am | Reply

  13. Waw…

    Comment by yvonne sumilat — 14 July 2018 @ 4:03 am | Reply

  14. Wow, such a great blog! The blog explained best practice loading DW so clear. Really appreciate your efforts and time for writing this blog!

    Comment by Rose — 15 December 2020 @ 6:04 am | Reply

  15. Nice one.

    Comment by Oriyomi — 25 December 2020 @ 9:48 am | Reply

  16. Hi Vincent! This is probably the best nuts-and-bolts data warehousing blog out there, keep up with great work! I do have one question. In the “Populating Fact Tables” post (https://dwbi1.wordpress.com/2015/02/09/populating-fact-tables/) you linked to loading dimension tables blog post but the link is broken (https://www.sqlservercentral.com/columnists/vRainardi/). Do you have a working link, or a backup of the blog post?

    Comment by CuriousOne — 14 August 2021 @ 8:50 pm | Reply

  17. Hi.. Nice informative blog. Best Data Warehousing and Business Intelligence Services Provider in USA

    Comment by Mani.DWP — 12 September 2022 @ 8:46 am | Reply

  18. Great blog 🙂 Can i receive email notifications when there’s new post of yours ?
    Im checking now [✓] Notify me of new posts via email, hope it shall do the trick.
    Emanuel

    Comment by Emanuel Oliveira — 5 August 2023 @ 6:04 pm | Reply

    • I’m not sure how either Emanuel (receive an email every time there’s a new post). But enabling notification seems to be the right trick.

      Comment by Vincent Rainardi — 5 August 2023 @ 7:52 pm | Reply

      • Hey Vincent, thanks. I was having issues like not able to list my wordpress subs, and i found out the reason was I had login via Google SSO and getting errors and options not working. So I
        logged in with username/password and all is normal now.
        Again was surprise finding your blog, and i see theres dozens of DWH posts (and respective comments) to entertrain myself complementing potential issues.
        Myself I been interested in Data Vault (I already worked in Oracle Databases a lot, migrations and data warehouses) but been interested in Data Vault, which will have to deal with typical exceptions when creating star presentation layer anyway.

        Comment by Emanuel Oliveira — 5 August 2023 @ 8:56 pm

      • Ah I see, thanks Emanuel very useful (SSO vs username). Yes Data Vault modelling is interesting.

        Comment by Vincent Rainardi — 6 August 2023 @ 2:53 am

  19. This blog is amazing. Feeling like I have discovered a treasure.

    Comment by Jumana — 23 August 2023 @ 7:41 pm | Reply


RSS feed for comments on this post.

Leave a comment

Blog at WordPress.com.