The management of Big Data: a new strategic challenge

If we sketch a brief history of data since the advent of computers, we see that data collection was originally the job of company employees. These data were, in all honesty, scarce and generally of little relevance. They also lacked interoperability: they could only be processed by the people who had generated them. With the internet, users themselves became a source of data, and thanks to the network those data can be easily transported. Moreover, companies have appeared that offer new services built on these users' data, which are ultimately our data. The most significant cases are certainly Amazon, Google, Facebook, Apple and Twitter. Google knows, for example, which words are most common when users search the web, as well as the contents of the universe of websites it indexes and therefore the services and products they offer.

This reaches extreme lengths, such as the possibility for a company to predict epidemics like influenza. Google can anticipate them by analysing which searches are on the rise. For example, if searches for certain symptoms associated with influenza increase exponentially, it can signal the onset of an outbreak. Facebook, for its part, knows our profiles. It knows where we live, our tastes, friendships, hobbies and so on, and can therefore segment the population into different profiles with high accuracy. But most importantly (and let us not act shocked), we have provided these data freely. Before continuing with the use that companies make of this information, let us introduce the third revolution, in which data are no longer generated by users.
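To make the idea of "searches that are increasing" concrete, here is a minimal sketch in Python. It is not how Google or any real service works: the function name `sustained_growth`, the growth threshold, the three-week window and the weekly counts are all assumptions chosen for illustration. It simply flags a series when every recent week-over-week ratio stays above a threshold, which is what sustained, roughly exponential growth looks like.

```python
from typing import Sequence


def sustained_growth(counts: Sequence[int], min_ratio: float = 1.5, weeks: int = 3) -> bool:
    """Return True if each of the last `weeks` week-over-week ratios is at least `min_ratio`.

    `min_ratio` and `weeks` are illustrative assumptions, not calibrated values.
    """
    if len(counts) < weeks + 1:
        return False  # not enough history to judge
    recent = counts[-(weeks + 1):]
    for previous, current in zip(recent, recent[1:]):
        if previous <= 0 or current / previous < min_ratio:
            return False
    return True


# Hypothetical weekly counts of searches for flu-related symptoms.
quiet_season = [100, 104, 98, 101, 99]
emerging_outbreak = [100, 110, 180, 290, 470]

print(sustained_growth(quiet_season))
print(sustained_growth(emerging_outbreak))
```

With these made-up numbers the quiet series is not flagged and the rapidly growing one is. A real system would also have to separate genuine illness from media-driven curiosity, which a simple ratio test cannot do.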

If up to this point companies and users were the ones flooding the data universe, now comes the revolution of machine-produced data. This is a huge universe of data from temperature sensors, humidity sensors, buildings, cars, electricity, photographs, satellites, domestic appliances and wearables: billions of elements that make up the Internet of Things (IoT) and that measure the behaviour not only of people and their vital signs, but also of social groups, opening up endless commercial possibilities that are still unknown today.

Here are several examples. Learning how people moved around a city used to require surveys asking citizens where they had come from and where they were going. Mobilizing a large number of pollsters provided acceptable results. Today, with mobile phone penetration close to one hundred percent, tracking the paths that phones follow can tell us how people move and how long they stay in certain locations.
Evidence-based medicine increasingly enables us to diagnose diseases based on a sufficiently broad database of symptoms.
Driverless cars are getting ever closer. Thanks to a combination of large amounts of data (Big Data) and accurate sensors, driverless cars are beginning to be used in the United States. As we can see, there is a multitude of new possibilities.
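The mobility example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (an anonymous phone identifier, a minute of the day and a zone name), the function name `dwell_minutes` and the sample pings are all hypothetical, and no real operator's data format is implied. Each interval between two consecutive pings of the same phone is attributed to the zone where the earlier ping was recorded.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record: (anonymous phone id, minute of the day, zone name).
Ping = Tuple[str, int, str]


def dwell_minutes(pings: List[Ping]) -> Dict[str, int]:
    """Total minutes spent per zone, summed over all phones."""
    by_phone = defaultdict(list)
    for phone_id, minute, zone in pings:
        by_phone[phone_id].append((minute, zone))

    totals = defaultdict(int)
    for track in by_phone.values():
        track.sort()  # order each phone's pings by time
        # The time between two consecutive pings counts for the earlier zone;
        # the last ping of a phone has no following ping and adds nothing.
        for (start, zone), (end, _next_zone) in zip(track, track[1:]):
            totals[zone] += end - start
    return dict(totals)


sample = [
    ("a", 480, "home"), ("a", 540, "office"), ("a", 1020, "home"),
    ("b", 500, "home"), ("b", 530, "station"), ("b", 560, "office"),
]
print(dwell_minutes(sample))
```

Summing over all phones, rather than reporting each phone's track, is a deliberate choice here: the aggregate answers the planner's question about stays per location without exposing any individual route.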

At this point, in the third era of data, two important questions arise. The first is the protection of our privacy. Large volumes of aggregate data generally protect our privacy. However, it is always possible to extract personal data from them, violating our privacy. How do we protect ourselves against this eventuality? It is certainly very difficult, by the very nature of innovation. It is very hard to hold back imagination and technology, and new products and services will always be ahead of the laws that should protect us. Nevertheless, legislators should act broadly, creating binding codes of good conduct.

The second big issue is how to manage big data. In addition to companies like Amazon, Google, Facebook, Apple and Twitter, other organizations and companies hold large data repositories. Telephone usage data, energy consumption and supply data, transport data, health data, public administration data and so on allow us to estimate with some accuracy how users behave. Analysing and managing data does not require a company to be as large as those mentioned above. For this reason, every company will in future be able to manage data of this kind, and it is absolutely necessary to know where the data are, how the various data processing technologies are used, how the data are managed, how we can merge them with our own company data and how to benefit from them.

With all this we arrive at the crucial point: now or in the foreseeable future, it will be essential to have professionals with cross-disciplinary knowledge who understand data technologies, businesses and the applications where such technologies and data can be used.
For example, Google Analytics provides an impressive number of records about the users who visit our websites, but we must know how it works and what the data it provides mean. Facebook and LinkedIn give access to millions of users, but we need to know what benefit we will gain from our advertising and whether the target we have chosen is the right one.

Data are to computers what Big Data is to supercomputers. But let’s not forget that a mobile phone today has more computing power than a personal computer from the 1980s, and so companies will soon have real supercomputers able to process today’s Big Data. The question, then, is: will we have enough staff with cross-disciplinary skills to meet these needs?