DATA LAKE

its functionalities and applications

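Authors

Denis Henrique Pazini da Silva
Miriam Francieli Paes
Eder Carlos Salazar Sotto
Liriane Soares de Araújo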

DOI:

https://doi.org/10.31510/infa.v21i1.1960

Keywords:

Data lake, Database, Raw data

Abstract

In an age when the appetite for data seems insatiable, the concept of the Data Lake emerges as a robust and innovative repository for storing and analyzing information. Inspired by pioneering work from authors such as James Dixon, founder of Pentaho, in his 2010 blog post, and Thomas H. Davenport, a renowned data analytics expert, the Data Lake stands out as a disruptive approach in the data management landscape. This article explores the concept, examining the flexible and scalable architecture proposed by Dixon and the main traditional approaches to preserving the integrity of raw data, regardless of its source or format, in a single place. Because the subject is still new, the literature on it remains scarce. In addressing the Data Lake, the article covers not only its structure but also the implications this raw-data storage environment can have for scientific research, and it explains what a Data Lake is in order to contribute to a better understanding of the concept.
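To make the abstract's central idea concrete (raw data kept intact, whatever its source or format, in a single place), here is a minimal Python sketch of a data lake "raw zone". It assumes a local directory stands in for the lake's storage; the class `RawZone`, its methods, the source/date folder layout, and the `_catalog.jsonl` metadata file are hypothetical names chosen for this illustration, not the authors' design or any product's API. Objects are written byte-for-byte with no parsing, and a SHA-256 hash recorded at ingestion lets their integrity be checked later.

```python
import hashlib
import json
import tempfile
from datetime import date
from pathlib import Path


class RawZone:
    """Hypothetical sketch of a data lake raw zone: every object is kept
    byte-for-byte as received, under one root, whatever its format."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        # Hypothetical metadata catalog: one JSON record per ingested object.
        self.catalog_path = self.root / "_catalog.jsonl"

    def ingest(self, source: str, name: str, payload: bytes, fmt: str) -> Path:
        # Partition by source and ingestion date; no parsing or schema is applied.
        target_dir = self.root / source / date.today().isoformat()
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / name
        target.write_bytes(payload)
        # Record where the object lives, what it is, and a hash of its bytes.
        entry = {
            "path": str(target.relative_to(self.root)),
            "source": source,
            "format": fmt,
            "sha256": hashlib.sha256(payload).hexdigest(),
            "size": len(payload),
        }
        with self.catalog_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
        return target

    def catalog(self) -> list[dict]:
        # Read back all metadata records in ingestion order.
        if not self.catalog_path.exists():
            return []
        with self.catalog_path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def verify(self, entry: dict) -> bool:
        # Integrity check: stored bytes must still match the ingestion-time hash.
        data = (self.root / entry["path"]).read_bytes()
        return hashlib.sha256(data).hexdigest() == entry["sha256"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lake = RawZone(Path(tmp) / "lake")
        # Structured, semi-structured, and binary data land side by side.
        lake.ingest("crm", "customers.csv", b"id,name\n1,Ana\n", "csv")
        lake.ingest("sensors", "reading.json", b'{"temp": 21.5}', "json")
        lake.ingest("cameras", "frame.jpg", b"\xff\xd8\xff\xe0", "jpeg")
        for e in lake.catalog():
            print(e["source"], e["format"], lake.verify(e))
```

In this sketch, interpreting the stored bytes (parsing the CSV, decoding the JSON) is left to whoever reads them later, which is what allows one location to hold data of any format without forcing a schema at ingestion.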

References

Amazon Web Services. Case Study: Coca-Cola. Available at: https://aws.amazon.com/pt/solutions/case-studies/innovators/coca-cola/. Accessed: 27 Feb. 2024.

Amazon Web Services. Case Study: Coca-Cola Andina. Available at: https://aws.amazon.com/pt/solutions/case-studies/coca-cola-andina-case-study/. Accessed: 27 Feb. 2024.

Amazon Web Services. Data Lakes and Analytics: Data Lakes. Available at: https://aws.amazon.com/pt/big-data/datalakes-and-analytics/datalakes/. Accessed: 12 Mar. 2024.

Amazon Web Services. (n.d.). AWS CloudTrail: User Guide. Available at: https://docs.aws.amazon.com/pt_br/aescloudtrail/latest/userguide/cloudtrail-user-guide.html. Accessed: 12 Mar. 2024.

Cutting, D., & Cafarella, M. (2015). Data Lakes: The Definitive Guide. Data Lake Management: Challenges and Opportunities. Available at: http://www.vldb.org/pvldb/vol12/p1986-nargesian.pdf. DOI: https://doi.org/10.14778/3352063.3352116

Dixon, J. (2010). Pentaho, Hadoop, and Data Lakes. Available at: https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes.

Fang, H. (2015). Managing Data Lakes in Big Data Era: What's a data lake and why has it become popular in data management ecosystem. In The 5th Annual IEEE International Conference on Cyber Technology in Automation, Control and Intelligent Systems, June 8-12, 2015, Shenyang, China. DOI: https://doi.org/10.1109/CYBER.2015.7288049

Gil, A. C. (1991). Como elaborar projetos de pesquisa [How to design research projects]. Atlas.

Inmon, B., & Linstedt, D. (2017). Data Lake Architecture: Designing the Data Lake and Avoiding the Garbage Dump. Technics Publications.

IPSense. Case Study: AWS Neighborly Data Lake. Available at: https://www.ipsense.com.br/estudo-de-caso-aws-neighborly-data-lake/. Accessed: 27 Feb. 2024.

Khine, P. P. Data lake: a new ideology in big data era. DOI: https://doi.org/10.1051/itmconf/20181703025. Accessed: 12 Mar. 2024.

Medium. Como Criamos Nosso Data Lake Utilizando a AWS [How we built our data lake using AWS]. Available at: https://medium.com/building-soulkey/como-criamos-nosso-data-lake-utilizando-a-aws-e8cd96618929. Accessed: 12 Mar. 2024.

Miloslavskaya, N., & Tolstoy, A. Application of Big Data, Fast Data and Data Lake Concepts to Information Security Issues. In 2016 4th International Conference on Future Internet of Things and Cloud Workshops. DOI: https://doi.org/10.1109/W-FiCloud.2016.41

Serra, J., & Anton, B. (2018). Data Lake Architecture. Available at: https://www.itm-conferences.org/articles/itmconf/pdf/2018/02/itmconf_wcsn2018_03025.pdf.

Singh, A. (2019). Architecture of Data Lake. International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), 5(2), 411-414. DOI: https://doi.org/10.32628/CSEIT1952121. Accessed: 27 Feb. 2024. Journal URL: http://ijsrcseit.com/CSEIT1952121.

Singh, A., & Ahmad, S. (2019). Architecture of Data Lake. International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 5. DOI: https://doi.org/10.32628/CSEIT1952121. Accessed: 12 Mar. 2024.

Wider, P., & Nolte, H. Toward data lakes as central building blocks for data management and analysis. Available at: https://www.frontiersin.org/articles/. Accessed: 12 Mar. 2024.

Published

2025-01-28

Issue

Vol. 21 No. 1 (2025)

Section

Information Technology (Tecnologia em Informática)

How to Cite

DA SILVA, Denis Henrique Pazini; PAES, Miriam Francieli; SOTTO, Eder Carlos Salazar; DE ARAÚJO, Liriane Soares. DATA LAKE: its functionalities and applications. Revista Interface Tecnológica, Taquaritinga, SP, v. 21, n. 1, p. 233–245, 2025. DOI: 10.31510/infa.v21i1.1960. Available at: https://revista.fatectq.edu.br/interfacetecnologica/article/view/1960. Accessed: 20 Jul. 2025.