Data warehouses are large repositories that consolidate data from many sources. For decades, they have been the fundamental building blocks of business intelligence and data discovery systems. Their fixed, predefined structures determine the kinds of analysis the data can support. Data warehouses are widely used in medium and large companies because they let different teams or departments share data and content through common databases, which can help the organization work more efficiently. Organizations usually adopt data warehouses to gain tools that support business decision-making, that is, to enable the data-driven decisions that are so often talked about.
Data lakes, by contrast, store large amounts of raw data in its native format until it is needed. Whereas a hierarchical data store organizes data in files and folders, a data lake uses a flat architecture. In a data lake, each piece of data is assigned a unique identifier and tagged with a set of extended metadata. When a business question arises, the data lake can then be queried for the relevant data, which is analyzed to answer the question.
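The flat, identifier-plus-metadata organization described above can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions: `FlatDataLake`, `put`, `find`, and `get` are hypothetical names, not the API of any particular data lake product, and the sample payloads and tags are invented. Real systems often back this idea with an object store plus a separate metadata catalog.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class LakeObject:
    """A raw payload kept in its native format, plus descriptive metadata tags."""
    payload: bytes
    tags: dict[str, str]


@dataclass
class FlatDataLake:
    """Hypothetical flat store: a single namespace keyed by unique IDs, no folders."""
    objects: dict[str, LakeObject] = field(default_factory=dict)

    def put(self, payload: bytes, **tags: str) -> str:
        # Assign a unique identifier instead of placing the data under a folder path.
        object_id = str(uuid.uuid4())
        self.objects[object_id] = LakeObject(payload, dict(tags))
        return object_id

    def find(self, **criteria: str) -> list[str]:
        # Return the IDs of every object whose tags match all of the given criteria.
        return [
            object_id
            for object_id, obj in self.objects.items()
            if all(obj.tags.get(key) == value for key, value in criteria.items())
        ]

    def get(self, object_id: str) -> bytes:
        # Hand back the raw payload unchanged; interpreting it is left to the analyst.
        return self.objects[object_id].payload


# Illustrative data from two hypothetical sources, stored as-is.
lake = FlatDataLake()
lake.put(b'{"order_id": 17, "total": 42.5}', source="webshop", kind="order")
lake.put(b"sensor,temp\nA1,21.4\n", source="factory", kind="telemetry")
lake.put(b'{"order_id": 18, "total": 9.99}', source="webshop", kind="order")

# A business question about web orders: retrieve only the relevant raw objects.
order_ids = lake.find(source="webshop", kind="order")
print(len(order_ids))  # 2
```

Because there are no folders to browse, retrieval depends entirely on the metadata tags, which is why consistent tagging matters so much in practice.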
Cheaper data storage systems and technologies have multiplied the amount of information available. Newer database technologies dispense with predefined schemas, which makes it possible to apply discovery analytics techniques. With data lakes, companies rely on data scientists who can draw conclusions by analyzing raw data: they detect correlations in the data and refine their findings as they drill down into it.
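To illustrate what dispensing with a predefined schema can look like, the following sketch applies a schema at read time ("schema-on-read") to raw JSON records whose fields vary from source to source. The records, field names, and the `read_with_schema` helper are made up for illustration; real data lake query engines expose this idea through their own interfaces.

```python
import json
import statistics
from collections.abc import Iterable

# Raw records as they might land in a data lake: no schema is enforced at write
# time, so different sources contribute different fields. (Invented sample data.)
RAW_RECORDS = [
    '{"customer": "c1", "channel": "web", "spend": 120.0}',
    '{"customer": "c2", "channel": "store", "spend": 80.0, "coupon": true}',
    '{"customer": "c3", "channel": "web", "spend": 45.5, "coupon": true}',
    '{"customer": "c4", "visits": 3}',
]


def read_with_schema(lines: Iterable[str], schema: dict[str, type]) -> list[dict]:
    """Parse raw lines and project them onto the fields the analyst needs now,
    casting each value to the requested type and using None for missing fields."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for name, cast in schema.items():
            value = record.get(name)
            row[name] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# One exploratory question, decided after the data was stored:
# does spend differ between records with and without a coupon?
rows = read_with_schema(RAW_RECORDS, {"spend": float, "coupon": bool})
with_coupon = [r["spend"] for r in rows if r["coupon"] and r["spend"] is not None]
without_coupon = [r["spend"] for r in rows if not r["coupon"] and r["spend"] is not None]

print(statistics.mean(with_coupon), statistics.mean(without_coupon))  # 62.75 120.0
```

A different question could reuse the same raw records with another schema, for example `{"channel": str}`, without reloading or restructuring anything, which is the flexibility a fixed warehouse schema does not offer.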