Data intake
An easily scalable ingestion layer extracts data from diverse sources, including web pages, mobile applications, social networks, IoT devices and existing data management systems. It must be flexible enough to run in different modes (batch, one-time or real-time) and to accept any type of data as well as new data sources.
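As a rough illustration of the different ingestion modes, the sketch below wraps incoming items in a common envelope and exposes a batch function and a streaming function. The names RawRecord, ingest_batch and ingest_stream are hypothetical and exist only for this example; a real ingestion layer would normally rely on dedicated tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Iterable, Iterator


@dataclass
class RawRecord:
    """Hypothetical envelope: the payload is kept as-is, whatever its type."""
    source: str
    payload: Any
    ingested_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def ingest_batch(source: str, items: Iterable[Any]) -> list[RawRecord]:
    """Batch or one-time mode: consume everything available, then return."""
    return [RawRecord(source, item) for item in items]


def ingest_stream(source: str, items: Iterable[Any]) -> Iterator[RawRecord]:
    """Real-time mode: hand over each record as soon as it arrives."""
    for item in items:
        yield RawRecord(source, item)
```

Because both functions accept any iterable and any payload type, a new source can be added without changing the envelope.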
Data storage
A highly scalable data storage system must be able to store and handle raw, unprocessed data, and must support encryption and compression while remaining cost-efficient.
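A minimal sketch of storing raw data with compression, assuming a local directory stands in for the storage system. The function names and the source/day folder layout are assumptions made for this example, and encryption is left out because it would require a library beyond the Python standard library.

```python
import gzip
import tempfile
from pathlib import Path


def store_raw(root: Path, source: str, day: str, name: str, data: bytes) -> Path:
    """Write the bytes unchanged apart from gzip compression (hypothetical layout)."""
    target = root / source / day / f"{name}.gz"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(gzip.compress(data))
    return target


def load_raw(path: Path) -> bytes:
    """Return the original bytes exactly as they were ingested."""
    return gzip.decompress(path.read_bytes())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = store_raw(Path(tmp), "iot", "2024-01-01", "reading", b'{"t": 21.5}')
        print(path.name, load_raw(path))
```

The data is not parsed or transformed on the way in, which is the point of storing it without processing.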
Data security
Regardless of the type of data processed, data lakes must offer strong security, including multi-factor authentication, authorization, role-based access levels and data protection.
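Role-based access combined with multi-factor authentication can be pictured roughly as follows. The roles, the actions and the is_allowed function are invented for this sketch; an actual data lake would delegate these checks to its platform's identity and access services.

```python
# Hypothetical mapping of roles to the actions they may perform.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
    "admin": {"read", "write", "grant"},
}


def is_allowed(role: str, action: str, mfa_verified: bool) -> bool:
    """Deny unless the user passed MFA and the role includes the action."""
    if not mfa_verified:
        return False
    return action in ROLE_PERMISSIONS.get(role, set())
```

Unknown roles fall back to an empty permission set, so the check denies by default.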
Data analysis
After ingestion, it must be possible to analyze the data quickly and efficiently using data analysis and machine learning tools, in order to extract relevant information and transfer the examined data to a data warehouse.
Data governance
The entire process of data intake, preparation, cataloguing, integration and query acceleration must be simplified to guarantee a level of data quality suitable for business use. It is also important to track changes to key data elements for data audits.
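Tracking changes to key data elements can be sketched as an append-only audit log. The record_change function and the fields of each entry are assumptions for illustration only, not a prescribed audit format.

```python
from datetime import datetime, timezone
from typing import Any


def record_change(
    audit_log: list[dict], element: str, old: Any, new: Any, changed_by: str
) -> bool:
    """Append an entry when a key data element changes; return True if logged."""
    if old == new:
        return False  # nothing changed, so there is nothing to audit
    audit_log.append(
        {
            "element": element,
            "old": old,
            "new": new,
            "changed_by": changed_by,
            "changed_at": datetime.now(timezone.utc).isoformat(),
        }
    )
    return True
```

Entries are only ever appended, so the log keeps the full history of each element for a later audit.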
Data lakes promise to accelerate the extraction of information and knowledge at the business level, avoiding the complexity of data storage processes centered on computer systems.