Big data, true to its name, describes amounts of data so large that they need specialised tools to store and analyse. Since the phrase was coined (it is often credited to John Mashey), the explosion of technology has created more opportunities for collecting data. Much of this data comes from the Internet of Things (IoT), and managing it requires big data technology.
Data science, on the other hand, employs machine learning algorithms to analyse data in order to make predictions. Its practitioners, known as data scientists, use the latest technology and scientific methods to spot trends.
So, let's go into more detail.
What is Big Data?
From our smartphones to social media likes, a single person produces large amounts of data every day. Multiply that by the number of people in the world, and you are talking about more data than we can comprehend, and more than conventional computer systems can handle. By some estimates, every minute one million people log into Facebook, 3.8 million Google searches are made and 188 million emails are sent. And with a growing population, these numbers will continue to rise.
Big data technology aims to collect, analyse and use this data through tools such as Cassandra, Hadoop and Spark. Whether they split data into files stored across many machines (distributed file storage), use multiple machines for one task (parallel processing) or analyse a user's journey, these processes have one goal: to make predictions and improvements through collected data.
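To make the idea of parallel processing more concrete, here is a minimal Python sketch. It is not how Hadoop or Spark are implemented: threads on one machine stand in for separate machines, and the names `split_into_chunks`, `count_events` and `parallel_count` are made up for illustration. The data is split into chunks, each chunk is counted independently, and the partial results are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Break one large list of records into smaller chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_events(chunk):
    """Work done by a single worker: count the events in its own chunk."""
    return Counter(chunk)


def parallel_count(records, chunk_size=2, workers=2):
    """Count events chunk by chunk, then merge the partial counts."""
    chunks = split_into_chunks(records, chunk_size)
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_events, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Hypothetical user events, invented for this example.
events = ["like", "search", "email", "like", "search", "like"]
totals = parallel_count(events)
print(totals)
```

Because each chunk is counted on its own, adding more workers (or, in a real cluster, more machines) lets the same job cover more data in the same amount of time.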
Data can be considered "big data" if it fits the following five V's: volume, variety, velocity, veracity and value.
Volume describes the sheer amount of data generated by users. From mobile phones and sensors to credit cards and social media, the first characteristic of big data is the large volume of data produced. The term "big data" applies only when large amounts of data are being analysed.
Variety refers to the many types of data being produced. Since the humble beginnings of structured data, new forms have been created at an unprecedented rate. Modern-day data is significantly more unstructured, with so much variety that there aren't enough categories to organise it all. Innovative big data technology endeavours to process such data.
Velocity is the speed at which data is generated and processed. As more and more data is produced, the need for faster collection and analysis increases. Despite the explosion in data, users continue to expect lightning-fast loading speeds, so businesses need to work faster and harder to keep pace. That is where big data technology comes in, to analyse data while it is being generated, not after it reaches a database.
Veracity is the trustworthiness of data. Data is often assumed to be black and white, but it is only useful when it is accurate, because accurate data leads to accurate predictions. With large volumes of data streaming in, technologists must prioritise the trustworthy data and disregard the inaccurate data.
Value, as distinct from accuracy, describes the worth of data. Information without an end goal is useless. The main objective with big data is to understand the benefits and limitations of data analysis in relation to how much it can be monetised. Big data is commonly repurposed for financial gain.
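Analysing data while it is being generated can be sketched in a few lines of Python. This is an illustration only: `sensor_readings` is a hypothetical stand-in for a live feed, and `running_mean` is a made-up helper that updates an average as each value arrives, without storing the earlier values first.

```python
def running_mean(stream):
    """Yield the average of all values seen so far, one update per value."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental update: no need to keep the earlier values around.
        mean += (value - mean) / count
        yield mean


def sensor_readings():
    """Hypothetical live feed; a real one would arrive over a network."""
    for reading in [10.0, 20.0, 30.0]:
        yield reading


means = list(running_mean(sensor_readings()))
print(means)  # [10.0, 15.0, 20.0]
```

The result is available after every reading, which is the point of stream processing: there is no waiting for the full data set to land in a database.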
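One simple way to prioritise trustworthy data is to reject records that fail basic checks before they are analysed. The Python sketch below assumes hypothetical temperature records with `sensor_id` and `temperature_c` fields, and the plausible range of -90 to 60 degrees Celsius is an assumption chosen for the example. Real pipelines apply many more rules.

```python
def is_trustworthy(record):
    """Accept a record only if it names its sensor and has a plausible reading."""
    temperature = record.get("temperature_c")
    return (
        bool(record.get("sensor_id"))
        and isinstance(temperature, (int, float))
        # Assumed plausible range for an outdoor temperature, in Celsius.
        and -90.0 <= temperature <= 60.0
    )


# Hypothetical records, invented for this example.
records = [
    {"sensor_id": "a1", "temperature_c": 21.5},
    {"sensor_id": "a2", "temperature_c": 999.0},  # implausible value
    {"sensor_id": "", "temperature_c": 18.0},     # unknown source
    {"sensor_id": "a3", "temperature_c": None},   # missing reading
]

trusted = [record for record in records if is_trustworthy(record)]
print(trusted)
```

Only the first record passes, so any prediction built on `trusted` is not skewed by the faulty readings.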
What is Data Science?
Data science, in conjunction with big data, revolves around data management. It is the study and profession of analysis, with the intention of extracting knowledge from data. Data scientists use scientific and mathematical methods to process structured and unstructured data. Often confused with data mining, it is an integral industry in the modern landscape.
Data scientists work with big data to make predictions on future trends. They devise productive ways of managing, analysing and using data to a corporate client's advantage.
Is Big Data Safe?
There are ethical and moral issues with most things in life, technology in particular. Big data as a concept isn't unsafe, but the human motives behind the data can be. As it is reliant on data from us, it can fall foul of human bias and lack of representation. For example, if a study uses big data to decide which characteristics highlight potential CEOs based on existing business leaders, the lack of female representation would sway the results. One study found that only six of the 100 top British firms had female CEOs, so the data would be biased towards masculine characteristics.
Big data can only use what we provide it with. But what do we actually provide it with? Well, everything. Big data is collected from most of the everyday devices and services that we use, from social media to video sharing sites, so we can understand why some people may feel overly monitored.
As George Orwell warned in Nineteen Eighty-Four, modern society is being monitored on a daily basis. However, laws have been put in place to protect technology users. The General Data Protection Regulation (GDPR) was adopted in 2016 and has applied since May 2018, and it gives users more control over their personal data. It regulates how personal data is stored and used, and requires organisations to safeguard it and to report breaches. The rules cannot guarantee that a breach will never happen, but they do provide some peace of mind.
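The effect of that imbalance is easy to see with a little arithmetic. The Python sketch below builds a hypothetical sample that matches the six-in-100 figure above and shows what a naive approach, one that simply picks the most common profile, would conclude. It is an illustration of the bias, not a real model.

```python
from collections import Counter

# Hypothetical sample matching the figure above: 6 female CEOs out of 100.
ceo_sample = ["male"] * 94 + ["female"] * 6

counts = Counter(ceo_sample)
shares = {gender: number / len(ceo_sample) for gender, number in counts.items()}

# A naive "typical CEO" rule just picks whichever profile is most common.
typical_profile = counts.most_common(1)[0][0]

print(shares)
print(typical_profile)
```

With 94% of the sample being male, anything learned from it will mostly describe male CEOs, whatever the abilities of the people left out.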