
What Does a Data Scientist Do?

With so much digitalization in recent years, most organizations are in constant need of data science professionals. The rapid growth of big data around 2010 drove the rise of data science, because businesses needed a way to draw insights from vast, unstructured data sets. The abundance of data also makes it possible to train machines with a data-driven approach rather than a knowledge-based one.

Data science covers almost anything related to data, including collecting, analyzing, and modeling it. Its most important part, however, is the wide range of applications it enables, such as machine learning.

The misconception of a data scientist

There is a popular misconception about data scientists: many people think a data scientist only works on AI (Artificial Intelligence) or machine learning. In practice, most organizations hire data scientists as analysts. They can certainly solve technical problems, but companies hire them primarily to solve problems related to data.

So, what does a data scientist do?

  • Data collection

One of the primary duties of a data scientist is collecting data. Business stakeholders are usually involved at this stage because they have domain knowledge about the project and can point to references and sources from which data can be extracted. The data might come from a third party, from web scraping, or from other sources. Note that data collected at this stage is raw and not yet clean.
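
As a rough illustration of the web scraping route, the sketch below pulls table cells out of an HTML fragment using only Python's standard library. The `CellCollector` class, the `scrape_rows` helper, and the sample HTML are all made up for this example; a real scraper would first download the page and would have to respect the site's terms of use.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><td> Alice </td><td>34</td></tr>
  <tr><td>Bob</td><td></td></tr>
</table>
"""


def scrape_rows(html):
    """Return the table rows found in the HTML, exactly as they appear."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


raw_rows = scrape_rows(SAMPLE_HTML)
print(raw_rows)
```

The rows come back with stray whitespace and an empty cell, which is typical of raw collected data before any cleaning.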

  • Preparation of data

After collecting the data, the team starts preparing it: cleaning it and putting it into the proper format. Cleaning is vital because it leads to reliable analytical reports and helps avoid incorrect conclusions. Software tools make it possible to clean large volumes of raw data and put it in the right order.
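
To make the cleaning step more concrete, here is a minimal sketch in plain Python. The record layout (`name` and `age` fields) and the `clean_records` function are assumptions made for illustration; real projects often use a library such as pandas and apply rules specific to their data.

```python
def clean_records(records):
    """Trim whitespace, drop incomplete rows, convert age to int, remove duplicates."""
    cleaned = []
    seen = set()
    for record in records:
        name = (record.get("name") or "").strip().title()
        age_text = (record.get("age") or "").strip()
        # Skip rows with a missing name or an age that is not a whole number.
        if not name or not age_text.isdigit():
            continue
        row = (name, int(age_text))
        # Skip rows that duplicate one already kept.
        if row in seen:
            continue
        seen.add(row)
        cleaned.append({"name": row[0], "age": row[1]})
    return cleaned


# Hypothetical raw input with typical problems: padding, gaps, duplicates.
raw = [
    {"name": "  alice ", "age": "34"},
    {"name": "Bob", "age": ""},
    {"name": "Alice", "age": "34 "},
    {"name": "carol", "age": "n/a"},
    {"name": "Dave", "age": "41"},
]
clean = clean_records(raw)
print(clean)
```

Dropping incomplete rows is only one possible policy; depending on the project, missing values might instead be filled in or flagged for review.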

  • Exploratory data analysis

In exploratory data analysis, a data scientist runs statistical analyses to understand the data, which is very important when solving machine learning use cases. The data scientist also studies how the data behaves by plotting it in charts and other visualizations. This thorough analysis helps companies understand customer behavior and optimize their plans accordingly.
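
A first pass at exploratory analysis can be as simple as summary statistics plus a rough picture of the distribution. The sketch below uses Python's standard `statistics` module and a text histogram in place of a real chart; the `order_values` data and both helper functions are invented for this example.

```python
import statistics
from collections import Counter


def summarize(values):
    """Basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def text_histogram(values, bin_width):
    """One line per non-empty bin, with one '#' per value in that bin."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return [
        f"{start}-{start + bin_width - 1}: {'#' * bins[start]}"
        for start in sorted(bins)
    ]


# Hypothetical order values for a handful of customers.
order_values = [12, 15, 18, 22, 25, 25, 31, 38, 44, 70]
summary = summarize(order_values)
histogram = text_histogram(order_values, bin_width=10)

print(summary)
print("\n".join(histogram))
```

Even this small view shows a mean pulled above the median by one large order, the kind of pattern a plotting library would make visible on real data.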

  • Evaluating and interpreting the exploratory data analysis outcome

After identifying trends and patterns, a data scientist has to present the results to the stakeholders. This can be challenging because the audience, such as marketing professionals, may have limited knowledge of data science; the data scientist therefore has to explain the results in simpler terms.

  • Model testing and building

Once the data is understood, a data scientist chooses candidate models and algorithms. During model building, the data scientist selects an algorithm and performs hyperparameter optimization, using cross-validation to estimate its accuracy.
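
The idea of tuning a hyperparameter with cross-validation can be sketched in plain Python. The example below is an illustration only: it uses a tiny one-feature k-nearest-neighbours classifier, where the hyperparameter is the number of neighbours `k`, and every function name and data point is made up. In practice a library such as scikit-learn would normally handle this.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; fold i tests every n_folds-th sample."""
    for fold in range(n_folds):
        test_idx = [i for i in range(n_samples) if i % n_folds == fold]
        train_idx = [i for i in range(n_samples) if i % n_folds != fold]
        yield train_idx, test_idx


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x (one feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if votes * 2 > k else 0


def cross_val_accuracy(xs, ys, k, n_folds=5):
    """Average accuracy over the folds for a given number of neighbours k."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), n_folds):
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(
            knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical, cleanly separated data: low values are class 0, high are class 1.
xs = [1.0, 1.5, 2.0, 2.5, 3.0, 6.0, 6.5, 7.0, 7.5, 8.0]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Try a few candidate values of k and keep the one with the best score.
results = {k: cross_val_accuracy(xs, ys, k) for k in (1, 3, 5)}
best_k = max(results, key=results.get)
print(results, best_k)
```

Each candidate value is scored only on samples the model did not see during training, which is what makes the comparison between hyperparameter values fair.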

Apart from accuracy, they also look at other measures, such as the confusion matrix and the ROC AUC score, to judge whether the model is good enough. Once the results are satisfactory, they move to the next stage.
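
For readers unfamiliar with these measures, the following sketch computes a confusion matrix, accuracy, and ROC AUC for a binary classifier in plain Python. The labels and scores are invented, and the 0.5 decision threshold is an assumption; libraries such as scikit-learn provide equivalent, better-tested functions.

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for binary labels 0 and 1."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["fn" if actual == 1 else "tn"] += 1
    return counts


def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical true labels and model scores for six samples.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

# Turn scores into hard predictions with an assumed threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

cm = confusion_matrix(y_true, y_pred)
accuracy = (cm["tp"] + cm["tn"]) / len(y_true)
auc = roc_auc(y_true, scores)
print(cm, accuracy, auc)
```

Accuracy depends on the chosen threshold, while ROC AUC looks at how well the scores rank positives above negatives across all thresholds, which is why the two are usually reported together.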

  • Deployment of model

Once the model performs well enough, the next step is deployment. There are various tools for this. One of them is Flask, a Python web framework that can expose the model through a REST API, which any front-end application can then consume.
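
A minimal sketch of such a Flask service might look like the following. The `/predict` route, the JSON payload shape, and the `predict_label` stand-in model are assumptions made for this example; a real deployment would load a trained model and add proper validation, logging, and a production server.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_label(features):
    """Stand-in for a trained model: returns 1 when the feature sum exceeds 10."""
    return 1 if sum(features) > 10 else 0


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"features": [4, 5, 6]}.
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    if not isinstance(features, list) or not all(
        isinstance(v, (int, float)) for v in features
    ):
        return jsonify(error="'features' must be a list of numbers"), 400
    return jsonify(prediction=predict_label(features))
```

During development, a service like this can be started with Flask's built-in server (for example `flask --app <module name> run`), and a front-end application would then send a POST request with a JSON body to `/predict`.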

  • Optimization of the model

Once the model is deployed, the next step is to optimize it. The data scientists set a review period, such as a number of days or a month, and check whether the accuracy holds up on real data. The true accuracy only becomes known after the model has been running in production.

If the model does not perform well, the data scientist has to start the cycle again. The process continues until the model performs well enough.

This is what a data scientist does. They work closely with stakeholders to understand their requirements, then design models or develop algorithms that extract insights from data to meet business needs. The job involves a great deal of collecting and analyzing data.
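
One simple way to picture this review loop is a check that compares each period's predictions with the outcomes observed later. The sketch below is illustrative only: the weekly log, the 0.8 accuracy threshold, and the `needs_retraining` function are all assumptions.

```python
def needs_retraining(predictions, actuals, threshold=0.8):
    """Return (accuracy, flag); flag is True when accuracy falls below threshold."""
    if not predictions or len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must be non-empty and equal length")
    correct = sum(p == a for p, a in zip(predictions, actuals))
    accuracy = correct / len(predictions)
    return accuracy, accuracy < threshold


# Hypothetical production log: (model predictions, outcomes observed later).
weekly_log = {
    "week 1": ([1, 0, 1, 1, 0], [1, 0, 1, 1, 0]),
    "week 2": ([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]),
    "week 3": ([1, 1, 1, 1, 0], [0, 0, 1, 1, 1]),
}

report = {week: needs_retraining(p, a) for week, (p, a) in weekly_log.items()}
for week, (accuracy, flag) in report.items():
    print(week, accuracy, "retrain" if flag else "ok")
```

When a period is flagged, the cycle of collecting data, preparing it, and rebuilding the model starts again.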

Various Databases Explained

Data is information or facts about a subject or object under consideration. People’s names, dates of birth, ages, weights, and heights are all data about those people. An image, a file, or a PDF may also be considered data. Structured systems in which data can be stored (among other operations) are called databases. Databases come in various types, depending on application and usage, and this brief review explains the main ones.

What Is a Database?

Data are symbols, characters, or quantities on which a real or virtual computer system can perform operations, and which can be transmitted and stored as electrical signals and recorded on any storage medium. Data can also be thought of as information or characteristics, generally numerical or symbolic, that can be collected, processed, manipulated, transmitted, or stored. Technically speaking, data is a set of values of quantitative or qualitative variables describing one or more objects or persons.

A database is a collection of data that is organized and stored so that it can be retrieved, transmitted, and manipulated within a computerized system, including the cloud. Databases make managing data easier. For example, a domestic electricity provider uses a database to produce monthly bills, handle customer queries and complaints, and track fault reports and the restoration of power supply. A typical social media platform, such as Instagram or Facebook, needs to methodically store enormous amounts of information about members, their activities, their friends, messages, advertisements, and so on. A dating app has to store the same kinds of information, plus the additional profile data it uses to match users in real time. These are only a few examples. The uses of databases in daily life are too numerous to list here, and they have given rise to a whole discipline built around the DBMS (Database Management System).

Types of Databases

In its simplest form, a database is containerized storage for data; it could also be called a library of organized data. Technically, databases are computerized structures that store, organize, protect, and deliver data. In diagrams, a database is typically drawn as a ‘cylinder’. The main types of databases are:

  • Relational databases: These became dominant in the 1980s. Data items are organized into tables made up of rows and columns, and this structured form allows accurate and fast access to data.
  • Object-oriented databases: These databases are organized around representations of objects.
  • Distributed databases: In this case, the database may be located and stored across scattered physical locations, multiple computers, and different networks.
  • Data warehouses: This form uses large centralized storage for data in order to support extremely fast analysis and querying.
  • NoSQL databases: These emerged as an alternative to relational databases as storing and manipulating structured and unstructured data side by side became both more common and more complex. They are now widely used alongside relational databases.
  • Graph databases: These databases store data as entities and the relationships between them.
  • OLTP databases: OLTP stands for Online Transaction Processing and generally involves inserting, updating, or deleting small amounts of data in a database. OLTP is used to handle large numbers of continuously updated transactions from many users.
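
To make the relational and OLTP entries in the list above more concrete, here is a small sketch using SQLite through Python's standard `sqlite3` module. The `customers` table and its contents are invented for illustration, and an in-memory database is used so that nothing is written to disk.

```python
import sqlite3

# An in-memory relational database: data lives in tables of rows and columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, balance REAL NOT NULL)"
)

# OLTP-style work: short transactions that insert, update, or delete a few rows.
with conn:  # commits on success, rolls back if an exception is raised
    conn.executemany(
        "INSERT INTO customers (name, balance) VALUES (?, ?)",
        [("Alice", 120.0), ("Bob", 45.5), ("Carol", 0.0)],
    )

with conn:
    conn.execute(
        "UPDATE customers SET balance = balance - ? WHERE name = ?",
        (20.0, "Alice"),
    )
    conn.execute("DELETE FROM customers WHERE balance = 0")

# Structured storage makes precise queries straightforward.
rows = conn.execute(
    "SELECT name, balance FROM customers ORDER BY name"
).fetchall()
print(rows)
```

Each `with conn:` block is one transaction, so its changes are applied together or not at all, which is the property OLTP systems rely on when many users update data at the same time.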

Databases are often confused with spreadsheets, but they are not the same thing. Both are convenient ways to store information, but a spreadsheet, such as Microsoft Excel, differs from a database in how data is stored and manipulated. Spreadsheets were originally designed for a single user, so access to the data and the amount of data that can be stored are both limited. A typical spreadsheet cannot serve many users at once or support very complicated manipulations. Databases, on the other hand, can store vast, almost unlimited amounts of data. They allow many users to access the stored data quickly and securely, and to manipulate it with highly complex logic and query languages.