Questions tagged [privacy]

For questions related to the intersection of privacy and data science topics. This may include appropriate uses of data, collection techniques, anonymity, etc.

13 questions
15
votes
5 answers

How can I ensure anonymity with queries to small datasets?

I'm building a service that will contain personal data relating to real people. Initially the dataset will be quite small, and as such it may be possible to identify individuals if the search parameters are narrowed sufficiently. An example of a…
mal
6
votes
1 answer

Un-learning a single training example from a trained model

I was going through the paper "The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction" by Google, which suggests best practices for models in production. In a section about privacy controls in the data pipeline it…
bkshi
6
votes
3 answers

Is it possible to use a generative model to "share" private data?

Let's say we have some data set with many instances $X$ and a target $y$. If it matters, you may assume that it is a "real-life" data set: medium-sized, with important correlations, an unbalanced $y$, etc. Let's also say this data…
Lucas Morin
2
votes
3 answers

What is the best way for synthetic data generation while maintaining privacy?

For one of the projects where we are working as third-party contractors, we need a way for the company to share some datasets that can be used for data science. The company cannot share the real data, as that would be a privacy…
2
votes
1 answer

DBMS or Software for privacy sensitive data

We have a dataset of highly privacy-sensitive personal data and want to build a database with it. The data protection department in our company doesn't like the idea that the data scientists are able to see any data specific to a person (even if…
user86825
2
votes
2 answers

Privacy through moving averages?

I am considering the following hypothetical situation: I have a time series of data. In general, 'the public' should have access to features of this data. However, making the time series available would constitute a privacy leak. I am considering…
Elle Najt
2
votes
0 answers

Noisification of categorical data proportions for privacy-preservation

Imagine I'm conducting an ongoing poll asking people to pick their favourite animal from a list of animals: [cat, dog, penguin, chimpanzee, ...]. I want to provide an interface that lets people query this poll data to see the relative popularity of each…
R Hill
2
votes
1 answer

How to give Colab access to only part of my Google Drive?

Is it possible to set up Colab so it can access only one of two folders on my Google Drive? I know how to mount the drive in a Colab notebook, but I'd like my collaborator to be able to read and write only from the folder I shared with them, not from…
Zaq
2
votes
1 answer

Does opting out of having my content used for improvement mean there are no other forms of data retention of my content by OpenAI?

Regarding use of the OpenAI API, OpenAI's Terms of Use state: You can opt out of having Content used for improvement by contacting support@openai.com with your organization ID. Please note that in some cases this may limit the ability of our…
Franck Dernoncourt
1
vote
3 answers

How to anonymize (de-identify) data in Python?

I have tried a simple algorithm to anonymize the data using a de-identification technique, but the code doesn't work for me. I want to anonymize the data by slightly changing the values of strings and integers. The data sample is available…
Muhammad Ali
0
votes
1 answer

The Israeli MOH databases on Covid patient numbers have a lower cutoff value on published numbers of patients. How does this protect privacy?

The Israeli Ministry of Health reports many statistics on Covid patients (numbers of confirmed cases, hospitalizations, and people in quarantine, divided by age and other demographics), available here (albeit only in Hebrew). In any category, numbers…
Student
0
votes
1 answer

How to write custom de-identification algorithm in Python?

I have tried a simple algorithm to anonymize my data using a de-identification technique, but the code doesn't work for me. I want to anonymize the data by slightly changing the values. The data sample is available here. import pandas as pd import…
Muhammad Ali
0
votes
1 answer

Is it unethical to gather data from data leaks about demographics?

Sorry if this is the wrong SE, but in my mind it made the most sense to ask this here. My question is specifically about collecting information about a target demographic, not about individuals, which is obviously unethical. For example, say that…
Justin T