What is pseudonymization?
Pseudonymization is the data management technique of replacing personal identifiers in data records with pseudonyms or placeholders. It’s often used to protect personal information when sharing data. Pseudonymized data can be re-identified if necessary.
Pseudonymization vs anonymization
Pseudonymization is a GDPR-regulated, reversible practice that replaces identifying information with non-identifying placeholders, while anonymization is an irreversible form of de-identification whose output falls outside the scope of the GDPR.
Both practices protect personal information, but they suit different situations. Anonymization is best when the data will never need to be re-identified, for example, when a marketer wants to analyze trends without risking the exposure of personal information.
Pseudonymization, on the other hand, should be used when the data controller expects to re-identify the data at some point. For example, one hospital may want to compare its patients’ medical records with those of another hospital. In this case, the records would need to conceal personal information until a record is confirmed to match the right individual.
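The hospital scenario above could be approached with keyed hashing, which is one option among several and not something this article prescribes. The sketch below assumes both hospitals have agreed on a secret key (`SHARED_KEY`, a made-up value) and identify patients by a common `patient_id` field. All names and sample records are hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared secret; a real key would be exchanged and stored securely.
SHARED_KEY = b"example-key-agreed-by-both-hospitals"


def pseudonym(patient_id: str, key: bytes = SHARED_KEY) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    normalized = patient_id.strip().upper()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize(records: list, key: bytes = SHARED_KEY) -> dict:
    """Return records keyed by pseudonym, with the direct identifier removed."""
    result = {}
    for record in records:
        payload = {k: v for k, v in record.items() if k != "patient_id"}
        result[pseudonym(record["patient_id"], key)] = payload
    return result


# Illustrative data only.
hospital_a = [
    {"patient_id": "P-1001", "diagnosis": "asthma"},
    {"patient_id": "P-1002", "diagnosis": "diabetes"},
]
hospital_b = [
    {"patient_id": "p-1002 ", "medication": "metformin"},
    {"patient_id": "P-1003", "medication": "ibuprofen"},
]

shared_a = pseudonymize(hospital_a)
shared_b = pseudonymize(hospital_b)

# Pseudonyms present in both data sets indicate the same patient.
matches = sorted(shared_a.keys() & shared_b.keys())
```

A keyed hash cannot be inverted directly, so each hospital would re-identify a match by looking the pseudonym up against its own records. Anyone without the key cannot recompute the pseudonyms from known patient IDs.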
Data controllers can use multiple techniques to pseudonymize their data, and each one comes with its own benefits and drawbacks. Some common methods include:
- Tokenization: Replaces sensitive data with randomly generated tokens or symbols. The original data is securely stored separately and associated with the corresponding token.
- Encryption: Transforms data into an unreadable form using cryptographic algorithms. Pseudonymization through encryption requires a key or password to decrypt and recover the original data.
- Data masking: Modifies sensitive data to protect its confidentiality. This can involve techniques like redaction, where parts of the data are blacked out or obscured.
- Shuffling: Randomizes or reorders data elements within a data set to break the direct link between values and the individuals they belong to. It can help protect individual identities while retaining data utility for analysis.
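Three of the techniques listed above can be sketched with the Python standard library. This is a minimal illustration, not a production design: `TokenVault`, `mask_email`, and `shuffle_column` are hypothetical names, and the in-memory vault stands in for the separate, secured store a real deployment would need. Encryption is left out because it requires a third-party cryptography library.

```python
import random
import secrets
from typing import Optional


class TokenVault:
    """Hypothetical in-memory vault mapping random tokens to original values."""

    def __init__(self) -> None:
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse an existing token so the same value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Re-identification is only possible with access to the vault.
        return self._token_to_value[token]


def mask_email(email: str) -> str:
    """Keep the first character and the domain; obscure the rest of the local part."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


def shuffle_column(rows: list, column: str, seed: Optional[int] = None) -> list:
    """Return copies of the rows with one column's values randomly reordered."""
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


vault = TokenVault()
token = vault.tokenize("alice@example.com")
original = vault.detokenize(token)        # "alice@example.com"
masked = mask_email("alice@example.com")  # "a****@example.com"
```

Of the three, only tokenization is reversible here, because the vault retains the mapping. Masking and shuffling as written discard information, so they would need to be paired with a retained mapping if re-identification is expected.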
As regulation of consumer data use increases, many laws, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA/CPRA), define pseudonymization and offer guidelines on how to use it as a method of data protection.
According to Article 4(5) of the GDPR:
“‘[P]seudonymisation’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.”
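The GDPR allows organizations to use pseudonymization as an acceptable means of data protection. However, the data controller must keep the additional information needed for re-identification, such as a reference table or data map, separate from the pseudonymized data and protect it with technical and organizational measures.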
The CCPA definition of pseudonymization is similar to the GDPR’s:
“‘Pseudonymization’ means the processing of personal information in a manner that renders the personal information no longer attributable to a specific consumer without the use of additional information, provided that the additional information is kept separately and is subject to technical and organizational measures to ensure that the personal information is not attributed to an identified or identifiable consumer.”
The HIPAA Privacy Rule (45 CFR § 164.514(a)) takes a related approach to de-identified health information:
“Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information.”
It also allows the covered entity to assign a code or other means of re-identification, as long as the method doesn’t include any identifiers and isn’t shared outside the organization.
The European Union Agency for Cybersecurity (ENISA) uses the GDPR definition of pseudonymization. However, it also released a report on Pseudonymisation Techniques and Best Practices that goes into further detail about use cases and techniques. The report concludes that there isn’t a single approach that works best. Instead, different methods may be optimal in different scenarios.
Pseudonymisation according to the GDPR
According to the GDPR, pseudonymisation is the processing of data in a way that conceals personal information. This is achieved by using a variety of methods that replace identifiers with placeholders. The data can later be re-identified using a reference table or data map.
What is anonymization?
Anonymization is the process of permanently de-identifying data and is used as a method of protecting consumer privacy. There are several methods of data anonymization, including masking, swapping, and perturbation.
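As a rough illustration of perturbation, the sketch below drops the direct identifier and adds bounded random noise to a numeric field, keeping no mapping back to the original rows. The field names (`name`, `age`, `city`) and the `noise` parameter are hypothetical, and a transformation this simple would not by itself guarantee that data counts as anonymized in a legal sense.

```python
import random
from typing import Optional


def perturb_records(rows: list, noise: int = 2, seed: Optional[int] = None) -> list:
    """Drop the direct identifier and add bounded random noise to each age."""
    rng = random.Random(seed)
    result = []
    for row in rows:
        result.append({
            # "name" is deliberately not copied, and no lookup table is kept.
            "age": row["age"] + rng.randint(-noise, noise),
            "city": row["city"],
        })
    return result


# Illustrative data only.
customers = [
    {"name": "Alice", "age": 34, "city": "Lyon"},
    {"name": "Bob", "age": 51, "city": "Turin"},
]
anonymized = perturb_records(customers, noise=2, seed=7)
```

Unlike pseudonymization, nothing here allows the output rows to be linked back to the people in `customers`, although the remaining fields could still identify someone in a small data set.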
What is a pseudonym?
A pseudonym is any made-up name used to replace a real one. In the case of pseudonymization, it refers to any placeholder that is used to replace identifiable information in a data set in order to protect the privacy of the data subject.
Is pseudonymized data still considered personal data under the GDPR?
Because pseudonymized data can still be attributed to a person by using additional information, it is still considered personal data under Recital 26 of the GDPR.
Is anonymized data still considered personal data under the GDPR?
The GDPR doesn’t classify anonymized data as personal information. This is because anonymization is an irreversible process, meaning that the data can’t be re-identified.
Does the GDPR require data holders to use pseudonymization?
The GDPR offers pseudonymization as a method for protecting consumers’ personal data. However, it does not make pseudonymization a requirement.