Data Warehousing and Data Science

2 January 2013

Difference between CURRENT_DATE and SYSDATE

Filed under: Oracle — Vincent Rainardi @ 10:38 am

The difference between CURRENT_DATE and SYSDATE is:
CURRENT_DATE is the client’s time zone
SYSDATE is server’s time zone

i.e. if you are in London and your Oracle database server is in New York:
CURRENT_DATE: 02/01/2013 10:26:25
SYSDATE: 02/01/2013 05:26:25

23 December 2012

Bit and Pieces on Oracle

Filed under: Oracle — Vincent Rainardi @ 6:04 pm

Some things on Oracle proved to be very useful for me. I’ll share them below because I’m sure they are useful for some of you too.

1. Searching for a column name

select * from all_tab_columns where table_name like ‘C%’ and column_name like ‘%ST%’

2. Add/delete a column

alter table account add account_number varchar2(30)

alter table account add (account_number varchar2(30), balance decimal(20,4))

alter table account drop column account_number

3. Change column name / data type

alter table account rename column account_no to account_number

alter table account modify account varchar2(35)

4. Change table name

alter table account rename to customer_account

5. Update a column based on another table

update account set account_number =
( select account_number
from account_mapping
where account_mapping.customer_id = account.customer_id
) where account_number is null

6. Create a table from another table
(this is very useful for backing up before updating)

create table account_bak as
select * from account where 1=2

7. Escaping & (ampersand)

update table1 set column1 = ‘M’ || chr(38) || ‘S’

27 September 2011

Top 2 things I missed in SQL Server and Oracle

Filed under: Data Warehousing,Oracle,SQL Server — Vincent Rainardi @ 6:08 pm

In the last few months I used both SQL Server and Oracle every day. This is because the source system is in SQL Server and the data warehouse is in Oracle. I begin to miss little things on SQL Server that I don’t get in Oracle, and vice versa. I thought I’d write this experience down as it could be useful for others.

In this article, when I say Oracle I mean using Toad, whereas when I say SQL Server I mean using SSMS. You can use Toad with SQL Server, and you can use other UI for Oracle, but that’s for another article. Version? SQL Server: 2008, Oracle: 10g R2 (well doesn’t really matter 10 or 11 as I use Toad as UI).

Oracle: Control Enter

First is the Control Enter to run SQL at the cursor. In SQL Server you need to block the code you want to run before pressing F5. In Oracle if you press Control Enter, you don’t need to block anything. Very useful I use this every day.

Oracle: No need for “top 1000”

You don’ t need to type “top 1000” when selecting large table. It will only bring back 500 rows. Automatically. That can be useful and can be annoying. Strangely, I found it very useful.

Oracle: Column Selection

It is called Intellisense in SSMS 2005/8 but Toad one is better: it enables you to select the column you like. When you type table alias followed by dot, it shows the column names. But you have to alias the table first, i.e. “select from fact_interest_rate_swap S” then you go back, put your cursor between “select” and “from” and type “S.” (s dot) and it will list all columns of fact_interest_rate_swap table.

Oracle: Count(*) Works

In the last 6 months or so, I work in a project where we have a column called countrysomething and every time I type count(*) it shows this country column instead. It is so irritating that I have to carefully press escape after typing count but before typing “(“.

SQL Server: Copy including Header

This SQL Server feature in SSMS is very useful to put data into Excel. In Oracle/Toad we need to Export to clipboard get the header.

SQL Server: Object Explorer

It’s true the saying: you don’t know how valuable it is until you don’t have it. In Toad we have Object Pallette, but it’s not as good as SSMS’ object explorer where you can browse the columns, stored proc, etc. In Toad Object Pallette you can only browse the tables, not the columns.

SQL Server: Top N

We won’t know how useful this top N is until we don’t have it. In Oracle, a SQL guy will go wondering why “select top 100 * from table1” doesn’t work. It’s “select * from table1 where rownum <= 10”.

In Informix/Ingress it’s FIRST: select first 100 * from table1. And it’s very handy in Informix to have SKIP clause: select skip 10 first 100 * from table1.

In DB2 we have FIRST too: select * from table1 fetch first 100 rows only

Ending

Well it’s not 2 things. It’s more like 11 things I know, but who’s counting anyway? 🙂

8 January 2011

Oracle ETL Tools

Filed under: Oracle — Vincent Rainardi @ 8:59 am
Tags:

For years the ETL for Oracle Data Warehousing has been OWB (Oracle Warehouse Builder). But in 2007 Oracle introduced another ETL tool called ODI, Oracle Data Integrator (not to confused with BODI, Business Object Data Integrator). ODI was originally from Sunopsis. Sunopsis had an ETL product called Data Conductor, famous for its ELT approach, i.e. load into the target server first, then use the power of the DB engine to transform and upsert into the target tables. This approach suited Oracle so Oracle bought Sunopsis in 2006 and branded Data Conductor as ODI.

It was (and still is) Oracle’s intention to combine OWB and ODI into 1 product. In January 2009 they bundled ODI and OWB into 1 license and called it ODIEE. Still two separate software, but 1 licence. Some of ODI functionalities were added into OWB and it was called OWBEE. Formal statement here. Formal OWB blog here. ODI product page here. Data Integration blog here. OWB product page here.

So today Oracle has 4 ETL products:

  • ODI: the one from Sunopsis. This is the future.
  • OWB: the old OWB (without the new ODI functions), available with Oracle DB. Aka “Basic ETL”.
  • ODIEE: bundled license of OWB & ODI. EE stands for “Enterprise Edition”.
  • OWBEE: modified OWB, with the new ODI functions. EE stands for “Enterprise ETL”, not “Enterprise Edition”

In the future they will be merged into 1 software: ODI. For now (2011), companies which are new to Oracle ETL should use ODI, because that is the future. ODI is the strategic product. Companies which already use OWB for a long time should use OWBEE. In the end, the old OWB will be gone, replaced by ODI. But they can’t do that now, because there are many companies using OWB at the moment. There must be a smooth way of migrating them to ODI, and there must be support for OWB for many more years. But there’s no escape that they must migrate all to ODI, just like DTS to SSIS.It will probably take 5 years for the world to change. For companies using OWB, this means a big project, and significant costs.

But there’s no “middle way”. Oracle as a company can’t maintain 2 ETL tools, because of the cost. They must have only 1 tool. ODI is the strategic product. In 5 years time this is ETL product that will survive. This is the product that will be enhanced and developed further. ODI is Oracle’s strategic ETL tool, as per the formal statement here.

The alternative is to move all ODI core features to OWB, which in my opinion can’t be done. There is a fundamental difference in principle of how they work. If OWB is like a house, we can’t just add the core ODI features on top that house. But we have to demolish that house, build a new foundation (the ODI way) then rebuild the house.

In addition to OWB & ODI, Oracle also has GoldenGate, a famous CDC (change data capture) tool that Oracle acquired in July 2009. GoldenGate capture the changes in the source systems by reading the transaction log file and apply the data changes to the target system such as DW or ODS using a native DB interface or ODBC. The component that applies the data to target is called Replicat. GoldenGate 11g Admin Guide is here, architecture diagram is on page 12. The real beauty of it is that GoldenGate can operate between various RDBMS, i.e. DB2, MySQL, Oracle, SQL Server, Sybase and Teradata, integrating data between them in real time (synchronous) or in batch using CDC. Using GoldenGate is not easy, there is a lot of command line interface. Not as easy as (and not as modern in UI) for example, DataMirror, DBMoto and Attunity. But functionality wise, GoldenGate is probably the richest.

One word about the term “Data Integration”. In 2006 all ETL companies seem to be in agreement that the word “ETL” had a bad perception. The word “ETL” only covered a small part of the scope. They all rebranded their product as “Data Integrator”. a) they could add/integrate data quality and data profiling into the product, b) they could get into MDM market with their tool, c) they could get into data integration projects (not only data warehousing projects), d) they can get into operational systems such as ERP, business systems, and ODS projects. I totally agreed. It should be called Integration, not ETL. Look at Business Objects. They called it BODI, Business Objects Data Integrator. SSIS: Integration Services. In 2007 in Boston I listened to Michael Gonzales’ ETL session at the TDWI conference. He said that “ETL is dead, it is now called Data Integration”. I totally agreed. But, in 2010, because everybody in the market (and the press) still call it ETL, the word ETL seems revived again. Hence the title of this post 🙂

If you find something incorrect above, I’d be grateful if you could let me know, either via comments or vrainardi@gmail.com, I will correct them.

7 January 2011

Introduction to Oracle BI for MS BI Developer

Filed under: Oracle — Vincent Rainardi @ 8:15 pm
Tags: ,

If you are a MS BI developer or Architect (SS, IS, RS, AS) you need to look at Oracle BI. a) so that we know what’s out there, and b) we increase our understanding of our own area by studying other areas. When I ask people which BI stack did you implement? These days the answer is almost always MS BI. There other BI suites: SAS, SAP (BW & BO), IBM (Cognos + InfoSphere) and Oracle. Out of these, OBI is not the most popular (BO or Cognos probably is), but OBI is the “latest trend”, i.e. with the arrival of OBIEE 11g, in the 2nd half of 2010 people start considering OBI as an alternative to MSBI.

The documentation for OBIEE in general is here. Release 11g is here. Release 10g is here.

History

The OBIEE 11g originated from Siebel Analytics (latest version was 7.8, Jan 2006). Answers, Interactive Dashboards, and Delivers were from Siebel Analytics (read here). Discoverer was not from Siebel Analytics, it was from Fusion Middleware (see this book page 31), along with Forms, Reports and Portal. At that time, OBI was part of “Oracle Fusion Middleware 11g Release 1”, see here. Apart from OBI, OFM 11g R1 consists of Communication Services, Identity Management, SOA Suite, WebCenter, WebLogic, HTTP Server, and Application Development Framework.

So, in Jan 2006, OBI is located in 2 places: a) Siebel Analytics contains Answers, Delivers and Intelligent Dashboards, b) OFM 11g R1 contains Discoverers, Forms, Reports and Portal.

Then OBI 10g was released in June 2006 (10.1.3.2). It consisted of Answers, Delivers, Interactive Dashboards, BI Publisher, Disconnected Analytics. Discoverer 10g was a separate product, released in Dec 2004 (10.1.2, aka Drake), see this book page 4. Discoverer was a very mature product, has been in the market for a long time, first released April 1997. Other Oracle BI tools: Beans, Reports, Data Miner, OWB, Spreadsheet Add-In were all separate tools, not in the “OBI 10g” suite.

OBIEE 11g was released in July 2010. 10g’s Answers is renamed as Analysis. 10g’s Interactive Dashboards is renamed as Dashboards. 10g’s Delivers is renamed as Agents. New features added: Scorecards, KPI, Actions, Conditions, Filters, Prompts and OBI Add-in for MS Office. BI Publisher (Report Designer) and Real-Time Decision are also part of OBIEE 11g.

Components

I’m going to use 11g terms here, instead of 10g.

An Analysis is a DW/BI SQL query presented in the form of tables, pivot tables and charts. These tables and charts can then be included in a Dashboard. We can create a prompt in an Analysis, to allow the users to select a value to filter the result of the Analysis. We can integrate an Analysis with an Excel Internet Query (IQY) to enable us to run the Analysis from Excel. We can create an Agent from an Analysis to schedule the Analysis to run at certain time. We can use variables practically anywhere within an Analysis (title, column, formula, formatting conditions, SQL, header, etc).

A Dashboard is a collection of analysis. A Dashboard consists of one page or several pages (shown as “tabs” on the top of the Dashboard). Apart from displaying Analysis, a Dashboard page can also display images, text, alerts, action links, views, reports, URL links and embedded objects. There is a toolbar on the top right of the Dashboard containing buttons such as: Edit, Print, etc. We can create a Skin to change the appearance of a Dashboard, e.g. background colour, logo and style sheets. A Skin can be assigned to users so each user can have a different look. A Style control how a Dashboard are formatted, e.g. the colour, font & size of the text, table border, graph attributes. Styles are organised into folders that contains Cascading Style Sheets, images and graph templates. We can create a Dashboard Template, which acts as a starting point for users to build their own Dashboard pages.

A Briefing Book is a collection of Dashboards. A Briefing Book can also contain an individual Analysis and a BI Publisher Report. The Dashboards, Analyses and Reports in a Briefing Book can be static (can’t be updated) or updatable (the content is refreshed everytime we open it). We can export a Briefing Book to PDF. We can deliver a Briefing Book to users using an Agent.

BI Publisher is a tool to create and publish reports. We can read data from databases (using SQL), spreadsheets, Analysis and OLAP DB (using MDX). We can also read from LDAP, Web Services, HTTP feeds, and XML files. If the data source is unrelated (such as individual files), we can create relationships between data sources. Using Publisher we can create layouts for our reports and publish the reports to various output format, including: HTML, PDF, Excel, PowerPoint, XML, CSV, and Rich Text. The output can be sent to: printer, email, file, fax, WebDAV or FTP. We can split a single report into multiple sections, each section is sent to different recipients on multiple destinations, using different formats. This is called Bursting. We can create a Template that contains style information that can be applied to RTF layouts. For PDF output, we can split the PDF into several smaller files (called PDFZ). We can schedule a report to produce multiple outputs, each with different layout, and each is sent to different destinations. There is an interactive viewer, where users can sort or filter a table, scroll the table, get a pop-up graph, and do “propagated filtering” across different area of the report.

Real Time Decision (RTD) is a tool to create/model business processes, gather data/statistics, and make recommendations (all that is called an Inline Service). An Inline Service consists of several elements including: Performance Goals, Choices, Rules, Statistics  Collector and Models. A Performance Goal is KPIs used for setting criteria for the scoring of choices. A Rule is used to target a segment of population. A Rule can also be used to decide whether a choice is eligible. A Model is self-learning and predictive, used for optimising decisions and providing real-time analysis. RTD’ is fully integrated with Eclipse development environment. We can deploy an Inline Service, run it to process requests, monitor its performance, gather and view the statistics, and fine-tune/optimise it.

How these components fits together is shown in OBIEE system architecture diagram here.

 

25 December 2010

Comparing Oracle 11g OLAP Option and SSAS

For my next DWBI project, I’m thinking about implementing Oracle 11g OLAP option instead of SSAS, so I’m comparing them.

  1. Browsability: users can use OBI (e.g. Answers), BO (since 9i AW/2003), Excel or other tool to browse Oracle OLAP. Users are not expected to use SQL to query the cube. In AS now we have Crescent to look forward to. In the mean time we could use Tableau (see Jen’s session in SQLBits 8), Strategy Companion, CubePlayer or Excel 2010.
  2. Oracle OLAP cubes are stored as true multidimensional store (I’m interpretting this as compressed MOLAP), not as relational tables with materialized view/MV. This MOLAP storage solves the classic ROLAP problem of “MV aggregates can’t cover every permutation of every member of every attribute against other members of other attributes”.
  3. Oracle mentions compression & cost-based aggregation (see page 7). SSAS is certainly compressed, but I’m not sure if SSAS aggregations are cost-based.
  4. Oracle OLAP has the ability to blend multidimensional and SQL models within the same query (see Bud’s comments below). This “blend concept” is similar to SQL Server’s BISM concept in Denali (2011). BISM can expose itself as multidimensional interface (MDX) or SQL/relational interface (DAX). The difference is that BISM is relational (tables), with “in-memory column” (VertiPaq), where as 11g’s OLAP is compressed MOLAP.
  5. Many-to-many relationship. If you are not familiar with M2M in SSAS, pls read Marco’s paper about M2M Revolution. This is critical for many business situations. Can’t live without it. Have searched the whole library and I couldn’t find M2M in 11g OLAP. Bud said we can do it (see below), and from what I understand here, this is done by simply mapping a base measure to the fact table containing the M2M relationship, just like the way we create any base measure. Not sure how do we hide the measure though (because the measure is dummy, it is only used to establish M2M relationship between 2 dimensions).
  6. Yes 11g OLAP Option can create calculated measures. But as we in SSAS world know, calculated measure such as YTD or growth is not a problem if it’s only 1 or 2 measures, but if we have 50 measures we use date tool aka date calculation dimension (see Chris, Alberto & Marco’s most useful SSAS book here, chapter 6). Seems that we can’t do this in 11g OLAP. Surely we can’t create 50 * 10 calc measures? Well we can, but how about the usability?
    Update 27/12: I have read Bud’s comment below, and I know we can create “measure dim” and “calculation dim”, but in SSAS we need to create a hidden measure and write some “MDX calculations” to establish how the growth/2 yr growth, etc will be calculated. Have read about creating calc measure on 11g here, but couldn’t find how that “MDX calculation” is done in Oracle OLAP. Could anybody point me a direction?
  7. Complex calculation/MDX, i.e. existing (results depends on what’s on the slicer). Hidden named sets which is used by a calc member, etc. How do we do that in 11g OLAP?
    Update 27/11: have explored Simba’s MDX Provider for Oracle OLAP (link, link, link) and could not find comprehensive OLAP/MDX functions such as this. It seems that Simba’s product only provide connectivity, not ablity to do complex calculations. Would be grateful if somebody can enlighten me (list of Simba’s MDX functions for Oracle 11g OLAP). Have also looked at Bizzantz’s DeltaMaster methods, but it’s far below the richness of AS’ MDX e.g. scoping, existing, etc.
  8. MDX interface. 11g OLAP cube does mention MDX interface to OBIEE, but I expect this is basic “select … on columns, … on rows from cube” without the ability to use MS’ MDX functions such as NECJ, parallel period, etc. See page 14.
    Update 27/11: same comment as point 7. Update 28/11: see OLAP DML Functions here.
  9. Measure expression. This is required for example for currency conversion. If we don’t have measure expression in 11g OLAP (and we don’t have M2M either), how do we do currency conversion?
    Update 27/11: we can do this in 11g OLAP using “calculated measure” (see here), just need to specify the correct “calculation type” and “calculation”. It is quite comprehensive functions with regards to period / time series computation, but outside period I’m not sure. Update 28/11: found it. It’s called OLAP DML Functions, see here. It’s very comprehensive.
  10. Dimensional/cell security. In AS I sort of take it for granted that group1 can only see member1 whereas group2 can only see member2, along with giving users different default members based on role. But in 11g OLAP?
    Update 28/11: as Bud said, yes we can do dimensional security in Oracle OLAP, see “Selecting data by criteria” here.
  11. In AS we can control the incremental cube processing, i.e. process changed partition only etc. It is mentioned in 11g OLAP this can be automated (see the end on page 28 re Fast cube refresh), and that they can update incrementally by reading the log files (see last paragraph of page 6). This is a good thing (compared to AS).

Oracle OLAP blog is here. The Wiki is here. 11g OLAP user guide is here. OLAP Option documentation is here.

If you find something incorrect above, I’d be grateful if you could let me know, either via comments or vrainardi@gmail.com. I will correct them. Thanks again for Bud Eggress for your corrections.

Blog at WordPress.com.