How to write a custom de-identification algorithm in Python?

Question

I tried a simple algorithm to anonymize my data using de-identification, but the code doesn't work: the script never finishes. I want to anonymize the data by slightly changing the values. The data sample is available here

import pandas as pd 
import uuid as u 
import datetime as dt 

# Generate a sequence of pseudo-identifiers with Python's uuid library.
def uuid_generator(length):
    # Return a list of `length` random UUID4 values.
    return [u.uuid4() for _ in range(length)]

# Import the original dataset
dataset = pd.read_csv('bankcredit-data.csv')

# Pseudo-identifier: one UUID per row
sLength = len(dataset)
dataset.insert(0, 'uuid', pd.Series(uuid_generator(sLength), index=dataset.index))

# Transaction record attached to the original (now() must be called, not just referenced)
dataset.insert(0, 'transaction_date', pd.Series([dt.datetime.now()] * sLength, index=dataset.index))

# The transaction record is attached to the original data file
dataset.to_csv('bankcredit-data.csv', index=False)

# Delete identifiable columns from the dataset
del dataset['firstname']
del dataset['lastname'] 

# Export the de-identified dataset as CSV to be shared with the user
dataset.to_csv('deidentified-data.csv', index=False)

score 0 · Answer 1 · edited Apr 09 '20 at 23:54

Unless you want to build your own, try the Faker library to anonymize PII (personally identifiable information).

pip install Faker

edited Apr 09 '20 at 23:54

Stephen Rauch

answered Apr 09 '20 at 22:39

Syenix
