HW 3: Privacy and Obfuscation

Due 11:59pm, Thursday 3/31.
Submission: Submit a zip/tar file to Canvas. Include your write-up (titled FirstName_LastName_hw3.pdf) and a folder containing your code. Code will not be graded.

Goals

Online data has become an essential source of training data for natural language processing and machine learning tools; however, the use of this type of data has raised concerns about privacy. Furthermore, the detection of demographic characteristics is a common component of microtargeting. In this assignment, you will explore how to obfuscate demographic traits, specifically gender. The primary goals are (1) to develop a method for obfuscating an author’s gender and (2) to explore the trade-off between obfuscating an author’s identity and preserving useful information in the data.

Overview

The data for this assignment is available here. Your primary dataset consists of posts from Reddit. Each post is annotated with the gender of the post’s author (op_gender) and the subreddit where the post was made (subreddit). The main text of the post is in the column post_text. The contents of the provided data include:

classify.py: a classifier that predicts the author’s gender and the subreddit for a post (example run: python classify.py --test_file dataset.csv). Note that this file also uses the two provided pickle files.
dataset.csv: your primary data.
background.csv: additional Reddit posts that you may optionally use for training an obfuscation model. A larger version is available here.
female.txt: a list of words commonly used by women.
male.txt: a list of words commonly used by men.

The provided classifier achieves an accuracy of 64.95% at identifying the gender of the poster and an accuracy of 85.85% at identifying a post’s subreddit when tested over dataset.csv. Your goal in this assignment is to obfuscate the data in dataset.csv so that the provided classifier is unable to determine the gender of authors, while still being able to determine the subreddit of the post. Note that in this set-up, we treat the provided classifier as a black-box adversary (please do not try to hack it). This assignment was largely inspired by the paper Obfuscating Gender in Social Media Writing (Reddy & Knight, 2016), which may be a useful reference. One scenario where this obfuscation model might be useful is social media users who want to preserve their privacy by hiding their gender from the adversary without losing the meaning of their posts. You could also imagine that this is a dataset of health records or other sensitive information that needs to be anonymized before it is provided to NLP researchers.

Basic Requirements

Completing the basic requirements will earn a passing (B-range) grade.

First, build a baseline obfuscation model:

For each post in dataset.csv, if the post was written by a man (M) and it contains words from male.txt, replace these words with a random word from female.txt.
Obfuscate posts written by women (W) in the same way (i.e., by replacing words from female.txt with random words from male.txt).
Test classify.py on your obfuscated data and analyze the results.
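The baseline steps above can be sketched as follows. This is only a minimal illustration, not a reference solution: it assumes the op_gender column holds the labels "M" and "W", that each lexicon has been loaded as a list of lowercase words, and that a simple regular expression is an acceptable tokenizer. The function name obfuscate_baseline and the tiny lexicons are hypothetical.

```python
import random
import re

# Assumption: a "word" is a run of letters or apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")

def obfuscate_baseline(text, gender, male_words, female_words, rng=random):
    """Swap lexicon words of the author's gender for random words from the other lexicon."""
    if gender == "M":
        source, target = set(male_words), sorted(female_words)
    else:
        source, target = set(female_words), sorted(male_words)

    def swap(match):
        word = match.group(0)
        # Replace only words found in the author's own-gender lexicon.
        return rng.choice(target) if word.lower() in source else word

    return WORD_RE.sub(swap, text)

# Hypothetical toy lexicons; the real ones would come from male.txt and female.txt.
male = ["bro", "game"]
female = ["cute", "love"]
print(obfuscate_baseline("That game was great, bro!", "M", male, female, random.Random(0)))
```

One way to use this is to apply the function to post_text for every row of dataset.csv, write the result to a new CSV with the same columns, and pass that file to classify.py via --test_file.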

Second, improve your obfuscation model:

Instead of replacing words from male.txt with randomly chosen words from female.txt, choose a semantically similar word from female.txt (use the same metric for replacing words from female.txt with words from male.txt). You may use any metric you choose for identifying semantically similar words. We recommend using cosine distance between pre-trained word embeddings (available here). You can also use spaCy-based similarity here (example 1, example 2).
Test classify.py on data obfuscated using your improved model and analyze the results. The classifier should perform close to random at identifying gender (e.g. <53.5%) and should obtain at least 79% accuracy on classifying the subreddit.
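As a rough sketch of the similarity-based replacement, the snippet below picks the word in the opposite lexicon with the highest cosine similarity to the word being replaced. It assumes the embeddings have already been loaded into a plain dict mapping words to lists of floats; the three-dimensional toy_vectors and the name nearest_in_lexicon are hypothetical stand-ins for real pre-trained vectors and your own helper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_in_lexicon(word, target_lexicon, vectors):
    """Return (best_word, similarity) from target_lexicon, or (None, 0.0) if no vectors exist."""
    if word not in vectors:
        return None, 0.0
    scored = [(cosine(vectors[word], vectors[t]), t) for t in target_lexicon if t in vectors]
    if not scored:
        return None, 0.0
    similarity, best = max(scored)
    return best, similarity

# Hypothetical toy embeddings, for illustration only.
toy_vectors = {
    "dude": [0.9, 0.1, 0.0],
    "girl": [0.8, 0.3, 0.1],
    "makeup": [0.0, 0.2, 0.9],
}
print(nearest_in_lexicon("dude", ["girl", "makeup"], toy_vectors))
```

Returning the similarity alongside the word makes it easy to skip replacements whose best match is still a poor one.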

Third, experiment with some basic modifications to your obfuscation models. For example, what if you randomly decide whether or not to replace words instead of replacing every lexicon word? What if you only replace words that have semantically similar enough counterparts?
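Both modifications mentioned above can be expressed as a small gate around each candidate replacement. The sketch below is one possible formulation; the parameter names min_similarity and replace_prob and their default values are arbitrary assumptions that you would tune experimentally.

```python
import random

def maybe_replace(word, candidate, similarity,
                  min_similarity=0.5, replace_prob=0.7, rng=random):
    """Return candidate only if it is similar enough and a random draw allows it."""
    # Modification 1: skip replacements without a close enough counterpart.
    if candidate is None or similarity < min_similarity:
        return word
    # Modification 2: replace only a random fraction of eligible words.
    if rng.random() >= replace_prob:
        return word
    return candidate

rng = random.Random(0)
print([maybe_replace("dude", "girl", 0.96, rng=rng) for _ in range(5)])
print(maybe_replace("dude", "makeup", 0.02, rng=rng))
```

Sweeping min_similarity and replace_prob and recording both gender and subreddit accuracy for each setting gives a direct view of the privacy/utility trade-off.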

Advanced Analysis

Develop your own obfuscation model. We provide background.csv, a large data set of Reddit posts tagged with gender and subreddit information that you may use to train your obfuscation model. A larger version of the background corpus is available here. Your ultimate goal should be to obfuscate text so that the classifier is unable to determine the gender of an author (no better than random guessing) without compromising the accuracy of the subreddit classification task. However, creative or thorough approaches will receive full credit, even if they do not significantly improve results. Some ideas you may consider:

Develop your own lexicons using pointwise mutual information scores or log odds with a Dirichlet prior.
Follow the procedure described in “Obfuscating Gender in Social Media Writing”.
Use an adversarial objective as described in “Predicting Sales from the Language of Product Descriptions” to train a model that is good at predicting the subreddit but bad at predicting gender. The key idea in this approach is to design a model that does not encode information about protected attributes (in this case, gender).
Use a model for style transfer to generate the text, such as “Style Transfer Through Back-Translation”.
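To illustrate the first idea in the list above, here is a minimal sketch of building a gender lexicon from pointwise mutual information, PMI(w, g) = log(P(w, g) / (P(w) P(g))), estimated from token counts. It assumes posts arrive as (tokens, gender_label) pairs, for example tokenized rows of background.csv; the function name pmi_lexicon, the min_count filter, and the toy posts are hypothetical.

```python
import math
from collections import Counter

def pmi_lexicon(posts, gender, min_count=2, top_k=10):
    """posts: iterable of (tokens, gender_label). Returns top_k (word, pmi) pairs for `gender`."""
    word_counts = Counter()   # count(w) over all posts
    joint_counts = Counter()  # count(w, g) over posts by the given gender
    gender_tokens = 0         # total tokens written by the given gender
    total = 0                 # total tokens overall
    for tokens, label in posts:
        for tok in tokens:
            word_counts[tok] += 1
            total += 1
            if label == gender:
                joint_counts[tok] += 1
                gender_tokens += 1
    scores = {}
    for word, joint in joint_counts.items():
        if word_counts[word] < min_count:
            continue  # rare words give unreliable PMI estimates
        # PMI = log( (joint/total) / ((count(w)/total) * (gender_tokens/total)) )
        scores[word] = math.log((joint * total) / (word_counts[word] * gender_tokens))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Hypothetical toy corpus, for illustration only.
toy_posts = [
    (["bro", "game", "the"], "M"),
    (["bro", "the", "game"], "M"),
    (["cute", "the", "love"], "W"),
    (["cute", "the", "love"], "W"),
]
print(pmi_lexicon(toy_posts, "M"))
```

PMI tends to overweight rare words, which is why the sketch filters by min_count; log odds with a Dirichlet prior is an alternative that smooths these estimates.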

In your report, include a description of your model and results.

Extra Credit!

Perform multiple advanced analyses
Perform a new advanced analysis not suggested above, or perform one of the last 3 advanced analyses. Instead of simply describing your approach, provide a detailed and clear motivation, description, and analysis, including a comparison to the basic analysis.

Write-up

Write a 2-3 page report (ACL format) titled FirstName_LastName_hw3.pdf. Please do not write more than 4 pages. The report should include:

Description of your baseline, improved, and advanced (if completed) obfuscation models.
Description of the experiments you tried with your improved obfuscation model.
Results for your models by using them to obfuscate dataset.csv and running classify.py over your obfuscated test data.
Qualitative examples of text obfuscated with your models.
A brief discussion of the ethical implications of obfuscation and privacy that draws from concepts covered during lecture.

Grading (100 points + up to 10 extra credit)

20 points - Submitting assignment.
40 points - Completing basic requirements.
20 points - Write-up is well-written, presents meaningful analysis, and contains all requested information.
20 points - Advanced analysis.
10 points - Extra credit.