Data Warehousing and Data Science

17 January 2022

Machine Learning or Data Science?

Filed under: Data Science,Machine Learning — Vincent Rainardi @ 8:07 am

I’ve just got my post grad diploma in machine learning and all this time I was wondering what data science was. I have written an article about what data science is: link, but now that I understand a bit more about machine learning, I understand there is a lot of overlap between the two (ML and DS).

Last night when I read a Data Science book by Andrew Vermeulen (link) I was wondering which of the things I’ve learned in ML is actually DS. I list the items and label them ML or DS:

Yes, machine learning is definitely part of data science. Strictly speaking, the data cleansing, data analysis, statistics and visualisation are data science but not machine learning. We can see this in this proceedings: link.

So Data Science consists of the followings:

  • Data Cleansing
  • Data Analysis
  • Statistics (including probability, central limit theorem, hypothesis testing)
  • Data Visualisation
  • Machine Learning (including all ML models)

But in my opinion one cannot learn ML without studying statistics, visualisation, data loading, data cleansing and data analysis. In order to understand ML models properly, one must understand all the above fields.

Berkeley School of Information argues that the followings are also included in data science: link

  • Data Warehousing
  • Data Acquisition
  • Data Processing
  • Data Architecture
  • Business Intelligence
  • Data Reporting

I disagree with this opinion. From what I see many companies, Data Warehousing, acquisition/ processing and Data Architecture are part of a role called Data Engineer. A Data Engineer prepare and stores the data, including designing the data models and data ingestion process.

Because Data Visualisation is part of data science, it is tempted to think that Business Intelligence and Data Reporting are part of Data Science. But this is not true. The data visualisation in the data science is more on the data behaviour, such as clustering and statistical analysis, whereas BI is more on the business side, such as portfolio performance or risk reporting. This is only my opinion though, I’m sure other people have different opinions.

So there are 2 fields/roles in the data industry these days:

  • Data Science: data cleansing, data analysis, statistics, machine learning, data visualisation.
  • Data Engineering: data acquisition, data loading/processing, data quality, data architecture.

Whereas in the old days the roles are: business/data analyst, data architect, BI developer, ETL developer.

1 Comment »

  1. I think there is a bit of a push by DS schools to “swallow” as much of the CS field as possible. Hence the annexation of databases and data warehousing.

    Comment by louiedinh — 17 January 2022 @ 4:57 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: