I recently had the opportunity to work with a large dataset (roughly 1 TB), using Python to filter it based on a set of given parameters. In this post, I’ll summarize how I accomplished the task and the measures I took to make the filtering process efficient.
Task Details
We have a large number of gzip files...