Thursday, September 7

Data Scientist Skills - Foundation

Data science is currently one of the most sought-after career tracks in the market, and people from many fields are moving into data science roles. It is a broad domain that draws on skills from several areas: a data scientist has to be good with maths, business, and technology. In this blog post, I will try to cover the foundational skills required to be a good data scientist. If you want to become a data scientist and already have these skills, you can move right ahead. If some part of your foundation feels a little weak, work on it, and it will ease your way to becoming a successful data scientist.

Data Scientist Skills

To become a proficient data scientist, four major skills are required: problem-solving skills, mathematical aptitude, analytical skills, and coding skills. Other skills will also help you enhance your ability and add to your charm, but being good at these four should definitely be your first target. Let's go through them one by one.

Problem Solving Skill

The most important skill that will help you become an expert data scientist, business analyst, data engineer, or machine learning consultant is the skill of solving problems. Problems are an unavoidable part of every domain. Finance, insurance, health care, logistics, supply chain: each industry has its own way of functioning, and understanding where and why that functioning breaks down is what makes you a good problem solver. As a data scientist, you can work in any domain. You will pick up domain knowledge gradually with experience, but problem solving is the skill you need to cultivate independently of any domain-specific issues.

Even in interviews, the interviewer tries to get a rough idea of your problem-solving skill by asking hypothetical or unexpected questions. Some interviewers might ask you to guess the number of ATM transactions in New York, some might ask how you would move Mount Fuji, and some might ask for the revenue Walmart earns in a day across the US market. These questions are deliberately odd, and the answers to them are never accurate, but they do help the interviewer analyze the candidate's thinking process. Problem solving should be the first skill you master if you are looking for data scientist job roles.
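To make the thinking process concrete, here is a minimal sketch of how such a guesstimate can be broken into smaller, explainable factors. The function name and every input number below are placeholders I made up for illustration, not real statistics; in an interview, stating and defending these assumptions is the actual point.

```python
def estimate_daily_atm_transactions(
    population,
    share_using_atms,
    withdrawals_per_user_per_month,
    days_per_month=30,
):
    """Break a guesstimate into simple factors and multiply them out.

    All arguments are assumptions supplied by the person estimating.
    """
    atm_users = population * share_using_atms
    monthly_transactions = atm_users * withdrawals_per_user_per_month
    return monthly_transactions / days_per_month


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the mechanics.
    estimate = estimate_daily_atm_transactions(
        population=8_000_000,               # assumed round number for the city
        share_using_atms=0.5,               # assumed: half the residents use ATMs
        withdrawals_per_user_per_month=4,   # assumed: about one withdrawal a week
    )
    print(f"Rough estimate: {estimate:,.0f} ATM transactions per day")
```

Changing any one assumption and watching how the estimate moves is a quick way to see which factor matters most.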

Mathematical Aptitude

The majority of the work a data scientist does involves rigorous use of mathematics, especially statistics, algebra, calculus, and probability. As a data scientist, you will need to understand the business problem, its requirements, and which machine learning model is the best fit for the use case. To understand these models and algorithms, you need to be familiar with the maths behind them: the derivations, assumptions, and equations. If you don't like maths, then it's going to be really challenging.

Analytical Skills

Analytical skill is itself a collection of many sub-skills, but here we are mainly concerned with pattern recognition and identification. You should be able to scan the data and analyze it, identifying patterns and relationships between the entities in the data available for analysis. Take the example of a housing dataset that has attributes such as house size, location, price, number of bedrooms, and so on. Just by going through the data, you should get a rough idea of which attributes affect the price the most, which attributes are directly related to price, and which are inversely related. Later on, you can move ahead with regression analysis and other algorithms to identify the collinearity, variance, and other relationships among the variables more rigorously. Analyzing the data helps a data scientist identify deviations in it and produce insights that can be leveraged for business growth and development.
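As a small illustration of the housing example, the sketch below computes the Pearson correlation of each attribute with price. The dataset is a toy one I invented for this post (the column names `size_sqft`, `bedrooms`, and `distance_km` are assumptions, with distance to the city centre standing in for location), and correlation is only a first look, not a substitute for the regression analysis mentioned above.

```python
from math import sqrt

# Hypothetical toy data, invented purely for illustration (not real listings).
houses = [
    {"size_sqft": 800,  "bedrooms": 2, "distance_km": 12.0, "price": 150_000},
    {"size_sqft": 1000, "bedrooms": 2, "distance_km": 10.0, "price": 180_000},
    {"size_sqft": 1200, "bedrooms": 3, "distance_km": 8.0,  "price": 220_000},
    {"size_sqft": 1500, "bedrooms": 3, "distance_km": 5.0,  "price": 280_000},
    {"size_sqft": 2000, "bedrooms": 4, "distance_km": 3.0,  "price": 360_000},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlations_with_price(rows, target="price"):
    """Correlate every non-target attribute with the target column."""
    target_values = [row[target] for row in rows]
    features = [key for key in rows[0] if key != target]
    return {
        feature: pearson([row[feature] for row in rows], target_values)
        for feature in features
    }


if __name__ == "__main__":
    corr = correlations_with_price(houses)
    # Strongest relationships first, regardless of sign.
    for name, r in sorted(corr.items(), key=lambda kv: -abs(kv[1])):
        direction = "rises" if r > 0 else "falls"
        print(f"{name:12s} r = {r:+.3f}  (price {direction} as {name} increases)")
```

A positive coefficient suggests the attribute moves with price, and a negative one suggests it moves against it. On real data these values would be much noisier than in this toy set.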

Coding Skills

Languages used by data scientists

Like other IT professionals, data scientists use the power of machines to handle big data, complex computations, and visualizations. To interact with machines, you need to know the language of computers, and for a data scientist the most popular languages are R and Python. With either of the two, you can run most common machine learning algorithms on a limited amount of data. If the size of the data grows beyond a certain point, then big data technologies come into the picture. These include Spark, Hive, and HBase, which are commonly used in the Hadoop ecosystem, along with Scala, a language often used to write Spark jobs. All of these come in handy when you are working with data, so you need some coding skills to get comfortable with them. The better a coder you are, the more easily you will be able to pick them up.

These four skills are the foundation that will help you become a good data scientist. If you have all four, congrats! You can move ahead and take the next steps. If you lack a few of them, don't worry. Just start working on them, and in no time you will have a strong foundation of data scientist skills. Best of luck!
