Event date: Thursday, April 14, 2022 @ 10 AM PT / 1 PM ET
Speakers: Scott Gay - Solution Architect (LinkedIn) & Preeti Kodikal - Director of Product Marketing (LinkedIn)
Organized by: Dremio
Modern businesses rely heavily on BI tools, dashboards, and many other data-mining tasks. However, running queries on large data lakes can be extremely time-consuming.
This extended query processing time is mainly due to the data storage and retrieval methods used in data lakes. Data in data warehouses and lakes is compressed as much as possible, and the processing layer remains shut down until a query wakes it up.
Thus, data engineers naturally create copies of the data that speed up specific queries.
A few of the motivations for creating such copies are:
- Making performance-optimized (e.g., uncompressed) copies for fast retrieval;
- Making personalized copies that limit the search space for specific users;
- Making exact copies to serve BI dashboard views; and
- Making optimized copies for data mining and machine learning model training.
Yet these copies create data redundancy and partially defeat the purpose of data warehouses and data lakes.
Scott Gay and Preeti Kodikal from Dremio discuss how to eliminate such copies and instead query the data lake directly without compromising performance.