3 Kinds of Customer Data: Facts, Generalizations and Inferences



Facts, generalizations, and inferences are defined as personal data whenever they are attributed or attributable to a customer. They are regulated under laws such as GDPR in Europe and CPRA in California, so they are rightly part of any conversation about customer data protection and privacy.

Using the richest data available.

Best practice is to use all three, of course.  Using the richest data available — in a privacy-safe way — is the best path to reach your goals.   But facts, generalizations, and inferences are not all the same, and understanding the differences can help you think more clearly about how to use them.

When we think about personal data, most of us think first of facts: names, addresses, phone numbers, gender, date of birth. Also purchases, payments, and usage statistics. Plus special categories such as health data, religious beliefs, and political opinions.

What are facts?

Facts are data that are true, or at least thought to be true. At first glance, facts seem easy to manage, but they can be difficult. Many facts change over time and need to be updated for accuracy, yet people seldom tell us when a fact changes. Nor do people always provide accurate information in the first place. Facts may also be accurate but hard to interpret when they apply to an account shared by multiple people, such as a household. Meanwhile, under GDPR, if you are the controller of this information you have a duty to keep the facts accurate. Of course, customers have no such duty!

What are generalizations?

Generalizations are usually derived from facts and are used in place of them; some generalizations are also inferences. Common generalizations of consumer data include converting a location into a region, converting an age into an age range, and identifying that a customer is more likely than average to go to a competitor (to “churn”).

Many generalizations are more likely to be true than the facts they are based on. Consider the simple example of an age range. If our systems show the “fact” that a person is 29 years old, it is even more likely to be true that the customer is in the age range 21-35.

Yet “more likely to be true” is not quite the same thing as being “more accurate.”  If the person is actually 30, it is more accurate (in the sense of a smaller amount of error) to say the person is 29 than to say they are in the age range 21-35. 

A “personal data protection” benefit of generalizations is that by giving up some accuracy, generalizations make it harder to identify individuals from their personal data.  By making more people look alike, fewer can be uniquely identified from their data.  Privacy is enhanced when we enable people to “hide in the crowd.”  Well-designed generalizations can retain most of the information we need, at a suitable level of accuracy, while making people and their data more secure.

Generalizations that convert “continuous data,” such as a birth date, into “category data,” such as an age range, can also be designed to make the information easier to interpret. So well-designed generalizations can improve data analysis.
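As a minimal sketch of this kind of generalization (the age buckets and the region mapping below are illustrative assumptions, not standards from this article), converting an exact age and a postal code into coarser categories might look like this:

```python
# A minimal sketch of generalization: replacing precise values with coarser
# categories so that individuals are harder to single out. The age buckets and
# the region mapping are illustrative assumptions.

def age_to_range(age: int) -> str:
    """Convert an exact age into a category (an age range)."""
    if age < 21:
        return "under 21"
    if age <= 35:
        return "21-35"
    if age <= 50:
        return "36-50"
    return "over 50"

def postcode_to_region(postcode: str) -> str:
    """Generalize a precise postal code to a broad region by keeping a prefix."""
    return postcode[:2] + "***"

customer = {"age": 29, "postcode": "94107"}
generalized = {
    "age_range": age_to_range(customer["age"]),
    "region": postcode_to_region(customer["postcode"]),
}
print(generalized)  # {'age_range': '21-35', 'region': '94***'}
```

After such a transformation many customers share identical records, which is the “hide in the crowd” effect described above.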

What are inferences?

Inferences are usually more complicated than generalizations. Computing a statistically based inference can involve sophisticated machine learning, and in today’s AI-driven world more and more inferences come from these kinds of models. Analysts look at data and draw conclusions from the patterns they see; AI lets us do the same with masses of data and calculate the inferences quickly.
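To make that concrete, here is a hedged sketch of a statistically based inference: a simple classifier, trained on a handful of invented rows (the feature names and data are made up for illustration), produces a churn probability for a new customer. That probability is an inference, not a fact.

```python
# A sketch of an inference produced by a model: a classifier trained on past
# behaviour outputs a churn probability. Feature names and rows are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: monthly_spend, support_calls, months_since_last_purchase
X_train = np.array([
    [120, 0, 1],
    [ 40, 3, 6],
    [ 95, 1, 2],
    [ 20, 5, 9],
    [110, 0, 1],
    [ 35, 4, 7],
])
y_train = np.array([0, 1, 0, 1, 0, 1])  # 1 = churned, 0 = stayed

model = LogisticRegression().fit(X_train, y_train)

new_customer = np.array([[50, 2, 5]])
churn_probability = model.predict_proba(new_customer)[0, 1]
print(f"Inferred churn probability: {churn_probability:.2f}")
```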

Inferences are most useful when they provide important information that is not directly available as facts. Inferences can be commercially useful even when, like “facts,” they are not always correct, provided they are correct often enough.

Models can use data from multiple sources to reach useful inferences, often with great accuracy. We can infer gender and age from shopping data, based on historical data showing what men, women, and people of different age groups have purchased in the past. These inferences can sometimes be more statistically accurate than the facts about gender and age in our clients’ databases!

Of course, neither the inferred age nor the inferred gender is a fact. We need to be aware that some people, LGBTQIA people in particular, bring a wonderful complexity that a “binary gender inference” will never capture. Likewise with age stereotyping, where older ages are harder to predict accurately than younger ages. Inferences can be powerful, so they demand care to protect people’s privacy. With thoughtful effort, though, we can use inferences without negative consequences for trust, and avoid using them in ways that don’t belong in an equitable society. The first principle is to remember that inferences may be statistically useful, but they are not “true facts”: don’t pretend you know. The second principle is to be mindful of fairness.

Be kind to people while using their data.

Technical methods have been developed that insulate modelers from the raw personal data used to create their models.  These methods protect customer privacy while enabling accuracy.  So today, richer data can be used with safety, and well-designed models can create more powerful inferences that boost ROI while protecting customer trust.
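One common technique of this kind (a simplified sketch, not necessarily the specific method referred to above) is to pseudonymize direct identifiers before any data reaches the modeling team, for example with a keyed hash:

```python
# A simplified sketch of one insulation technique: direct identifiers are
# replaced with keyed-hash pseudonyms, so modelers see stable tokens rather
# than raw identities. In practice the key lives in a secure secret store.
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-key-from-a-secure-vault"  # illustrative placeholder

def pseudonymize(customer_id: str) -> str:
    """Return a stable pseudonym: the same input always yields the same token."""
    return hmac.new(SECRET_KEY, customer_id.encode(), hashlib.sha256).hexdigest()[:16]

raw_record = {"customer_id": "C-1029384", "age_range": "21-35", "region": "94***"}
safe_record = {**raw_record, "customer_id": pseudonymize(raw_record["customer_id"])}
print(safe_record)
```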

Nobody expects every advertisement they see to be accurately targeted to them, but improving the accuracy of targeted messages improves advertisers’ ROI. Everyone knows an offer for a gender-specific product performs far better when it reaches the right gender, just as a promotion for a tennis event performs better when it reaches a tennis fan. The same principle works in customer service. If we can alert a company’s call center to the task a customer is probably calling about, it becomes easier to improve NPS. And if we can get the right offer to a customer who is probably considering a competitor, we can reduce churn.

So, together with facts and generalizations, inferences can help companies create better customer experiences: stronger brand preference, better NPS scores, lower churn, and stronger, more sustainable revenues.

It isn’t simple to handle all this data in ways that fully respect customers’ privacy.  Major corporations have often found this difficult, and many have chosen to forgo opportunities to serve customers better because it is easier to avoid using customer data than to protect it appropriately. 

Yet today, industry leaders are choosing to be privacy-first while using facts, generalizations, and inferences to power their marketing and create fabulous customer experiences. The benefits of a hyper-personalized approach are compelling: reduced churn, increased upsell, and newly identified audiences.

With a customer-first attitude, privacy can be designed into the business processes that handle customer data. A foundation of cyber security comes first. The next layer is designing methods that are private by default and require customer opt-in wherever data is not fully anonymized; here, it is important to apply differential privacy methods to minimize the chances of re-identification. Then, honoring customer choices about how their data is used, and giving them the choices required by laws such as GDPR and CPRA, is essential to customer trust as well as compliance. Data should be monetized only in ways that customers expect, approve, and can modify. Using data so that it visibly benefits the customer is the key… whether the data is a fact, a generalization, or an inference.
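As a minimal sketch of the differential privacy idea (the epsilon value and the query are illustrative, and production systems should use an audited library rather than hand-rolled noise), calibrated random noise is added to aggregate answers so that no individual customer’s presence can be confidently inferred from the released figures:

```python
# A minimal sketch of the Laplace mechanism from differential privacy: noise
# calibrated to the query's sensitivity and a privacy budget (epsilon) is added
# to an aggregate count before it is released. Values here are illustrative.
import numpy as np

rng = np.random.default_rng()

def private_count(true_count: int, epsilon: float = 0.5, sensitivity: float = 1.0) -> float:
    """Release a noisy count; a smaller epsilon means more noise and more privacy."""
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# e.g. "how many customers in region 94*** are in the 21-35 age range?"
true_answer = 1843  # illustrative figure
print(round(private_count(true_answer)))  # a noisy answer, e.g. 1841
```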

Image Credit: Photo by Alexander Sinn on Unsplash