Data for Good

Make weblogs safe and valuable for marketing

Telecommunication companies sometimes shy away from a rich yet underutilized dataset that is already in their possession: weblog data. Concerns about privacy, scalability challenges, and fears of potential misuse have created significant hesitation, which has stopped Communication Service Providers (CSPs) from leveraging their weblogs to offer more relevant marketing and from monetizing […]


Dr. Pucketlove – Or, How I Learned to Stop Worrying and Love Parquet (partitioning)

Pucket is a Scala library that provides a simple partitioning system for Parquet. But what is Parquet, and why does it need partitioning when it already supports filtering? In this post I will attempt to explain Parquet, partitioning in Hadoop, and the motivation for and design of Pucket. If you’re not interested in the background, you can skip straight to some simple code examples or go to the GitHub repository.
