The biggest challenge for companies today is that more data arrives every day.
The volume is growing tremendously, and most of it, around 80%, is unstructured information (text, video). At least 50% is stored outside the company, where it is unprotected and exposed to security breaches.
A data lake is a centralized repository of raw (unprocessed) data, both structured (conventional databases) and unstructured, described by metadata. Its storage capacity scales easily on platforms such as Hadoop or Apache Spark. The data collected in the lake can then be identified, analyzed, and used to generate reports.
Unlike data warehouses and data marts, a data lake does not define the format of the data (there is no schema) until the data is used.
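Deferring the schema until the data is used is often called schema-on-read. The sketch below is a minimal illustration in plain Python, not a description of any particular data lake product: the sample records, the `read_with_schema` helper, and the `PAYMENTS_SCHEMA` mapping are all hypothetical names made up for this example. Raw lines are kept exactly as they arrived, and a schema is applied only when a consumer reads them.

```python
import json

# Raw events are stored exactly as they arrive: nothing is validated on write.
# (Hypothetical sample data for illustration only.)
RAW_LINES = [
    '{"user": "ana", "amount": "12.50", "ts": "2021-03-01"}',
    '{"user": "luis", "amount": 7, "note": "refund"}',
    'not valid json',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time and yield only the records that fit it.

    `schema` maps a field name to a callable that casts the raw value.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # raw storage may contain junk; skip it on read
        if not isinstance(record, dict):
            continue
        try:
            # Keep only the fields the schema asks for, cast to the right type.
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, TypeError, ValueError):
            continue  # the record does not fit this particular schema


# One consumer's view of the data (hypothetical schema).
PAYMENTS_SCHEMA = {"user": str, "amount": float}

payments = list(read_with_schema(RAW_LINES, PAYMENTS_SCHEMA))
print(payments)
```

A different consumer could read the same raw lines with another schema, for example `{"user": str, "note": str}`, without anything being rewritten in storage. A data warehouse would instead reject or transform non-conforming records at load time.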
Features of a data lake:
To create a data lake, consider the following steps:
Source of information: itainnova.es