Welcome to the first-ever Data Science column of MAEUR. The column consists of four monthly blogs, to be published in March, April, May and June, each linked to a different Data Science topic. This month’s blog introduces the field of Data Science.
Harvard Business Review named ‘Data Scientist’ the sexiest job of the 21st century, and the demand for data scientists is correspondingly high. But what exactly is Data Science? Its goal is to gain insights from data that can be translated into valuable information for a company’s management, often in the form of predictions on which a new strategy can be based. In practice, it almost always involves a large amount of unstructured data, also known as Big Data. Nowadays, data is used in every field. Think of the public sector, with areas such as healthcare, law enforcement and traffic. Data is just as important in the private sector, for companies and financial institutions. Companies that want to optimize their performance need to know who their customers are and, luckily for them, the quantity of data being stored is growing exponentially. For example, when you do your groceries at Albert Heijn and use your “bonuskaart” (the card for their loyalty program), the store tracks the products you buy in order to offer personal discounts that bring you back to the store. Or think of the cookies you accept whenever you visit a website: once accepted, they save information about your visit, such as clicks or mouse movements. This shows that data is everywhere and that public authorities and companies need to be able to work with it.
A Data Scientist follows several steps in their work. The first step is gathering and cleaning the data. As mentioned, Big Data is often unstructured and mostly consists of information that is directly or indirectly linked to people’s private information. This data is cleaned by detecting and correcting false or incomplete records, so that it represents the population as well as possible. Once the data is cleaned, the second step is to analyze the existing data in order to recognize patterns. Why do I say existing data? Because Data Scientists use the data that already exists to make predictions about the future. This leads to the third step: performing the analyses by means of machine learning, which produces the valuable insights and final predictions. Machine learning helps to identify patterns in the data and to make the predictions. Machine learning methods differ from simpler statistical techniques in that they can be trained on the data to become more accurate in their predictions, often capturing patterns that a simple linear regression, for example, would miss. The last step may seem less important but matters just as much as the others: presenting and visualizing the findings in a way that everyone can understand. The art is to translate findings that only data scientists understand into information that reaches the target group, whether that is a company’s management or people who are further removed from the analysis. As can be seen, Data Science has multiple important aspects, and different jobs focus on different steps within this process.
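For readers who like to see things in code, the four steps can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the customer records are invented, the function names are hypothetical, and the "model" is a very simple nearest-neighbour average rather than anything a real Data Scientist would stop at.

```python
from statistics import mean

# Hypothetical raw records: (weekly store visits, weekly spend in euros).
# None marks a missing value; all numbers are invented for illustration.
raw_records = [
    (1, 20.0),
    (2, 35.0),
    (None, 50.0),   # incomplete: visits missing
    (3, 55.0),
    (4, None),      # incomplete: spend missing
    (5, 90.0),
    (-2, 40.0),     # invalid: negative visit count
]


def clean(records):
    """Step 1: drop incomplete or clearly invalid records."""
    return [
        (visits, spend)
        for visits, spend in records
        if visits is not None and spend is not None and visits >= 0
    ]


def explore(records):
    """Step 2: summarise the existing data to spot patterns."""
    return {
        "count": len(records),
        "mean_visits": mean(v for v, _ in records),
        "mean_spend": mean(s for _, s in records),
    }


def predict_spend(records, visits, k=2):
    """Step 3: predict spend as the average of the k most similar customers."""
    nearest = sorted(records, key=lambda r: abs(r[0] - visits))[:k]
    return mean(spend for _, spend in nearest)


def report(summary, visits, prediction):
    """Step 4: present the finding in plain language."""
    return (
        f"Based on {summary['count']} customers, someone who visits "
        f"{visits} times a week is expected to spend about "
        f"EUR {prediction:.2f}."
    )


records = clean(raw_records)
summary = explore(records)
prediction = predict_spend(records, visits=4)
print(report(summary, 4, prediction))
```

In practice each step is far more involved, with larger datasets, proper machine learning libraries and charts instead of a single sentence, but the order of the steps stays the same.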
Now that you have read this, are you curious what it is like to work in Data Science, and do you want to learn more about the field? Then you can participate in the first-ever Data Conference organized by MAEUR, taking place on 27 May 2021. The theme is ‘dare to data – go beyond ones & zeroes’, and the capabilities, opportunities and possible obstacles of data will be addressed. This is your opportunity to connect with leading companies in the field of Data Science. More information about the event can be found here: https://www.maeur.nl/students/events-activities/95-maeur-data-conference.
Try to figure out whether this field is something for you and what it is about data that attracts you! Next month’s blog will discuss personalization using data and the privacy concerns that come with it.