1
00:00:05,280 --> 00:00:08,640
[MUSIC PLAYING]

2
00:00:10,360 --> 00:00:12,530
AUDACE NAKESHIMANA: In our
work on fairness and AI,

3
00:00:12,530 --> 00:00:14,990
we present a case study on
natural language processing

4
00:00:14,990 --> 00:00:17,030
titled "Identifying
and Mitigating

5
00:00:17,030 --> 00:00:20,140
Unintended Demographic
Bias in Machine Learning."

6
00:00:20,140 --> 00:00:23,030
We will break down what each
part of the title means.

7
00:00:23,030 --> 00:00:24,710
This is the work
that was done jointly

8
00:00:24,710 --> 00:00:27,440
by Chris Sweeney
and Maryam Najafian.

9
00:00:27,440 --> 00:00:29,330
My name is Audace Nakeshimana.

10
00:00:29,330 --> 00:00:32,930
I am a Researcher at MIT, and
I'll be presenting their work.

11
00:00:32,930 --> 00:00:35,630
The content of the slides
presents a high-level overview

12
00:00:35,630 --> 00:00:37,970
of a thesis project that
was done over the course

13
00:00:37,970 --> 00:00:38,680
of the year.

14
00:00:38,680 --> 00:00:40,860
It will be released
soon on MIT DSpace.

15
00:00:43,910 --> 00:00:46,360
AI has the power
to impact society

16
00:00:46,360 --> 00:00:48,260
in a vast number of ways.

17
00:00:48,260 --> 00:00:50,710
For example, in the
banking industry,

18
00:00:50,710 --> 00:00:53,530
many companies are trying to use
machine learning to figure out

19
00:00:53,530 --> 00:00:57,280
if someone will default on a
loan given the data about them.

20
00:00:57,280 --> 00:00:59,590
Now, because machine
learning is used

21
00:00:59,590 --> 00:01:01,960
in high-stakes
applications, errors

22
00:01:01,960 --> 00:01:04,900
that cause it to be unfair
could cause discrimination,

23
00:01:04,900 --> 00:01:07,750
preventing certain demographic
groups from gaining access

24
00:01:07,750 --> 00:01:09,160
to fair loans.

25
00:01:09,160 --> 00:01:10,990
This problem is
especially important

26
00:01:10,990 --> 00:01:13,270
to address in
developing nations where

27
00:01:13,270 --> 00:01:17,080
there may not be existing
sophisticated credit systems.

28
00:01:17,080 --> 00:01:20,200
Those nations will have to
rely on machine learning models

29
00:01:20,200 --> 00:01:23,410
to make these high-stakes
decisions, such as alternative

30
00:01:23,410 --> 00:01:26,350
credit scoring mechanisms
that are likely to involve

31
00:01:26,350 --> 00:01:30,440
AI more and more.

32
00:01:30,440 --> 00:01:32,240
This work focuses
on applications

33
00:01:32,240 --> 00:01:35,120
of machine learning in
natural language processing.

34
00:01:35,120 --> 00:01:36,950
NLP is important to
studying fairness

35
00:01:36,950 --> 00:01:39,800
in AI because it is used
in many different domains,

36
00:01:39,800 --> 00:01:42,080
from education to marketing.

37
00:01:42,080 --> 00:01:43,910
Furthermore, there
are many sources

38
00:01:43,910 --> 00:01:46,520
of unintended demographic
bias in the standard natural

39
00:01:46,520 --> 00:01:48,260
language processing pipeline.

40
00:01:48,260 --> 00:01:51,410
Here we define the NLP
pipeline as a combination

41
00:01:51,410 --> 00:01:53,720
of steps involved, from
collecting natural language

42
00:01:53,720 --> 00:01:57,200
data to making decisions
based on the NLP models trained

43
00:01:57,200 --> 00:01:58,950
on the resulting data.

44
00:01:58,950 --> 00:02:01,700
Lastly, data for natural
language processing systems

45
00:02:01,700 --> 00:02:03,110
is easier to get.

46
00:02:03,110 --> 00:02:05,820
Unlike tabular data from
banking or health care,

47
00:02:05,820 --> 00:02:08,280
where companies may be
reluctant to release data

48
00:02:08,280 --> 00:02:10,940
due to privacy
concerns, NLP data,

49
00:02:10,940 --> 00:02:13,610
especially in widely spoken
languages like French

50
00:02:13,610 --> 00:02:16,610
or English, is available from
different sources, including

51
00:02:16,610 --> 00:02:19,580
social media and different
forms of formal and informal

52
00:02:19,580 --> 00:02:23,300
publications, making it more
effective to use in research

53
00:02:23,300 --> 00:02:27,580
on how to make NLP
systems more fair.

54
00:02:27,580 --> 00:02:30,250
We now break down what
unintended demographic bias

55
00:02:30,250 --> 00:02:31,190
means.

56
00:02:31,190 --> 00:02:33,100
The unintended part
means that this bias

57
00:02:33,100 --> 00:02:36,310
comes as an adverse side
effect, not deliberately learned

58
00:02:36,310 --> 00:02:37,870
in a machine learning model.

59
00:02:37,870 --> 00:02:40,390
The demographic part means
that the bias translates

60
00:02:40,390 --> 00:02:43,348
into some sort of inequality
between demographic groups

61
00:02:43,348 --> 00:02:45,640
that could cause discrimination
in a downstream machine

62
00:02:45,640 --> 00:02:47,000
learning model.

63
00:02:47,000 --> 00:02:50,230
And finally, bias is an artifact
of the natural language processing

64
00:02:50,230 --> 00:02:53,350
pipeline that causes
this unfairness.

65
00:02:53,350 --> 00:02:55,420
Bias is a broad term.

66
00:02:55,420 --> 00:02:57,280
Therefore, it is
important that we

67
00:02:57,280 --> 00:02:59,140
center on a specific
form of bias that

68
00:02:59,140 --> 00:03:02,380
causes unfairness in typical
machine learning applications.

69
00:03:02,380 --> 00:03:03,970
In gender-based
demographic bias,

70
00:03:03,970 --> 00:03:05,860
for example, a machine
learning model

71
00:03:05,860 --> 00:03:09,520
might associate specific types
of jobs with a specific gender

72
00:03:09,520 --> 00:03:11,800
just because it's the way
it is in the data used

73
00:03:11,800 --> 00:03:12,730
to train the model.

74
00:03:16,470 --> 00:03:19,110
Within unintended
demographic bias,

75
00:03:19,110 --> 00:03:20,730
there are two
different types of bias

76
00:03:20,730 --> 00:03:22,920
that we will focus on in
natural language processing

77
00:03:22,920 --> 00:03:24,210
applications.

78
00:03:24,210 --> 00:03:26,940
These are bias in
sentiment analysis systems

79
00:03:26,940 --> 00:03:29,550
that analyze positive or
negative feelings associated

80
00:03:29,550 --> 00:03:32,340
with words or phrases,
and bias in toxicity analysis

81
00:03:32,340 --> 00:03:35,820
systems designed to detect
derogatory or offensive terms

82
00:03:35,820 --> 00:03:36,810
in words or phrases.

83
00:03:40,020 --> 00:03:42,120
Sentiment bias
refers to an artifact

84
00:03:42,120 --> 00:03:45,000
of the machine learning
pipeline that causes unfairness

85
00:03:45,000 --> 00:03:47,130
in sentiment analysis systems.

86
00:03:47,130 --> 00:03:50,040
And toxicity bias is an
artifact of the pipeline that

87
00:03:50,040 --> 00:03:53,040
causes unfairness in systems
that try to predict toxicity

88
00:03:53,040 --> 00:03:54,330
from text.

89
00:03:54,330 --> 00:03:57,202
In either sentiment analysis
or toxicity prediction,

90
00:03:57,202 --> 00:03:59,160
it is important that our
machine learning model

91
00:03:59,160 --> 00:04:01,740
doesn't use sensitive
attributes describing

92
00:04:01,740 --> 00:04:04,650
someone's demographic to decide
whether a sentence should

93
00:04:04,650 --> 00:04:07,020
have positive or negative
sentiment, or be toxic

94
00:04:07,020 --> 00:04:09,890
or non-toxic.

95
00:04:09,890 --> 00:04:12,940
Toxicity classification
is used in a wide variety

96
00:04:12,940 --> 00:04:14,390
of applications.

97
00:04:14,390 --> 00:04:17,560
For example, it can be used
to censor online comments that

98
00:04:17,560 --> 00:04:19,600
are too toxic or offensive.

99
00:04:19,600 --> 00:04:23,260
Unfortunately, these
algorithms can be very unfair.

100
00:04:23,260 --> 00:04:25,650
For example, the decision
of whether a sentence

101
00:04:25,650 --> 00:04:28,630
is toxic or non-toxic
can depend solely

102
00:04:28,630 --> 00:04:30,310
on the demographic
identity term,

103
00:04:30,310 --> 00:04:34,420
such as American or Mexican,
that appears in the sentence.

104
00:04:34,420 --> 00:04:36,820
This unfairness can be caused
by many different artifacts

105
00:04:36,820 --> 00:04:39,430
of the natural language
processing pipeline.

106
00:04:39,430 --> 00:04:42,550
For instance, certain
nationalities and ethnic groups

107
00:04:42,550 --> 00:04:45,190
are more frequently
marginalized.

108
00:04:45,190 --> 00:04:47,830
And this is reflected in the
language usually associated

109
00:04:47,830 --> 00:04:48,760
with them.

110
00:04:48,760 --> 00:04:52,050
Therefore, training NLP
algorithms on the resulting data

111
00:04:52,050 --> 00:04:54,040
sets could result
in a certain form

112
00:04:54,040 --> 00:04:55,705
of unintended demographic bias.

113
00:04:58,990 --> 00:05:02,090
We want to drive home the point
of unintended demographic bias

114
00:05:02,090 --> 00:05:03,850
versus unfairness.

115
00:05:03,850 --> 00:05:06,190
Unintended demographic bias
can enter a typical machine

116
00:05:06,190 --> 00:05:08,990
learning pipeline from a
wide variety of sources,

117
00:05:08,990 --> 00:05:11,340
from the word corpus
to the word embedding,

118
00:05:11,340 --> 00:05:14,330
the data set and the algorithm,
and finally from the thresholds

119
00:05:14,330 --> 00:05:15,710
used to make decisions.

120
00:05:15,710 --> 00:05:18,533
The possible unfairness
or the discrimination

121
00:05:18,533 --> 00:05:20,450
comes at the point where
this machine learning

122
00:05:20,450 --> 00:05:23,090
model meets society and
actually causes harm.

123
00:05:23,090 --> 00:05:25,610
This work addresses
mitigating and identifying

124
00:05:25,610 --> 00:05:28,400
unintended demographic
bias at each stage

125
00:05:28,400 --> 00:05:30,290
in the natural language
processing pipeline,

126
00:05:30,290 --> 00:05:32,330
from the word corpus
to the decision level.

127
00:05:35,170 --> 00:05:36,880
Our big goal here
is to find ways

128
00:05:36,880 --> 00:05:38,860
to mitigate the bias
that we might inherently

129
00:05:38,860 --> 00:05:41,320
find in the text corpora
or other types of data

130
00:05:41,320 --> 00:05:45,340
representation that are used
to build NLP applications.

131
00:05:45,340 --> 00:05:47,340
For this module,
we cover measuring

132
00:05:47,340 --> 00:05:49,780
unintended demographic
bias in word embeddings

133
00:05:49,780 --> 00:05:52,450
and using adversarial learning
to mitigate word embedding

134
00:05:52,450 --> 00:05:53,870
bias.

135
00:05:53,870 --> 00:05:55,810
The corresponding
thesis goes further,

136
00:05:55,810 --> 00:05:58,360
and it covers techniques for
identifying and mitigating

137
00:05:58,360 --> 00:06:01,360
unintended demographic bias
at other stages of the NLP

138
00:06:01,360 --> 00:06:03,570
pipeline.

139
00:06:03,570 --> 00:06:06,375
We now cover the work on
measuring word embedding bias.

140
00:06:09,480 --> 00:06:12,150
Word embeddings encode
text into vector spaces

141
00:06:12,150 --> 00:06:15,000
where distances between words
describe a certain semantic

142
00:06:15,000 --> 00:06:16,080
meaning.

143
00:06:16,080 --> 00:06:18,380
This allows one to
complete the analogy of man

144
00:06:18,380 --> 00:06:20,710
is to woman as king is to queen.

145
00:06:20,710 --> 00:06:24,360
Unfortunately, researchers
Tolga Bolukbasi and others

146
00:06:24,360 --> 00:06:26,880
found that even for word
embeddings trained from Google

147
00:06:26,880 --> 00:06:30,910
News articles, there exists bias
in word embedding space, where

148
00:06:30,910 --> 00:06:34,500
the analogy becomes man is to
woman as computer programmer is

149
00:06:34,500 --> 00:06:37,410
to homemaker, another
word for a housewife.
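To make the analogy arithmetic concrete, here is a minimal sketch in Python using plain NumPy. The tiny four-dimensional vectors and the cosine_similarity helper are hypothetical stand-ins for a real pretrained embedding model such as word2vec or GloVe.

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy embeddings; real models use hundreds of dimensions.
embeddings = {
    "man":   np.array([0.8, 0.1, 0.2, 0.0]),
    "woman": np.array([0.8, 0.9, 0.2, 0.0]),
    "king":  np.array([0.7, 0.1, 0.9, 0.3]),
    "queen": np.array([0.7, 0.9, 0.9, 0.3]),
}

# "man is to woman as king is to ?": offset arithmetic in the vector space.
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]

# The nearest remaining word completes the analogy.
best = max(
    (w for w in embeddings if w not in {"man", "woman", "king"}),
    key=lambda w: cosine_similarity(target, embeddings[w]),
)
print(best)  # -> "queen" with these toy vectors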

150
00:06:37,410 --> 00:06:40,170
This is concerning given
that word embeddings could

151
00:06:40,170 --> 00:06:42,690
be used in natural language
processing applications devoted

152
00:06:42,690 --> 00:06:45,570
to predicting whether someone
should get a certain job.

153
00:06:45,570 --> 00:06:48,870
However, it is difficult to
quantify the bias just based

154
00:06:48,870 --> 00:06:52,230
on the vector space analogies.

155
00:06:52,230 --> 00:06:55,740
In this work, researchers
Sweeney and Najafian

156
00:06:55,740 --> 00:06:58,090
develop a system to
measure sentiment bias

157
00:06:58,090 --> 00:07:00,810
in word embeddings
to a specific number.

158
00:07:00,810 --> 00:07:04,230
The way they do this is they
take the biased word embeddings

159
00:07:04,230 --> 00:07:06,210
and use them to represent
the words in an unbiased

160
00:07:06,210 --> 00:07:08,215
labeled word
sentiment data set.

161
00:07:08,215 --> 00:07:12,300
They train a logistic regression
classifier on this data set,

162
00:07:12,300 --> 00:07:14,100
and they predict
negative sentiment

163
00:07:14,100 --> 00:07:16,500
for a set of identity terms.

164
00:07:16,500 --> 00:07:19,160
For example, in
this case, this is

165
00:07:19,160 --> 00:07:21,690
a set of identity terms
describing demographics

166
00:07:21,690 --> 00:07:23,920
from different national origins.

167
00:07:23,920 --> 00:07:25,530
They analyze the
negative sentiment

168
00:07:25,530 --> 00:07:27,720
for each identity
term and compute

169
00:07:27,720 --> 00:07:33,460
a score that describes the
bias in word embeddings.

170
00:07:33,460 --> 00:07:35,950
This score is the divergence
between the predicted

171
00:07:35,950 --> 00:07:37,810
probabilities of
negative sentiment

172
00:07:37,810 --> 00:07:40,010
for the national origin
identity terms

173
00:07:40,010 --> 00:07:41,640
and the uniform distribution.

174
00:07:41,640 --> 00:07:44,320
The uniform distribution
describes a perfectly fair

175
00:07:44,320 --> 00:07:46,690
case, wherein each
demographic receives

176
00:07:46,690 --> 00:07:49,360
an equal amount of sentiment
in the word embedding model.
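A minimal sketch of this measurement, under several assumptions: the embed lookup, the tiny sentiment lexicon, and the identity terms below are hypothetical stand-ins, and the KL divergence against the uniform distribution is just one concrete choice for the divergence mentioned here.

import zlib
import numpy as np
from sklearn.linear_model import LogisticRegression

def embed(word):
    # Hypothetical stand-in for a pretrained word embedding lookup;
    # a deterministic random vector keeps the sketch self-contained.
    rng = np.random.default_rng(zlib.crc32(word.encode()))
    return rng.normal(size=50)

# Hypothetical labeled word sentiment lexicon: 1 = negative, 0 = positive.
lexicon = {"wonderful": 0, "pleasant": 0, "horrible": 1, "awful": 1}
identity_terms = ["american", "mexican", "german", "italian"]

# 1. Train a logistic regression sentiment classifier on word vectors.
X = np.stack([embed(w) for w in lexicon])
y = np.array(list(lexicon.values()))
clf = LogisticRegression().fit(X, y)

# 2. Predicted probability of negative sentiment for each identity term.
neg = clf.predict_proba(np.stack([embed(t) for t in identity_terms]))[:, 1]

# 3. Normalize into a distribution and measure its divergence from uniform;
#    0.0 would mean every demographic receives equal negative sentiment.
p = neg / neg.sum()
u = np.full(len(p), 1.0 / len(p))
bias_score = float(np.sum(p * np.log(p / u)))
print(bias_score)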

177
00:07:52,020 --> 00:07:54,990
Now that we have a grasp
on the word embedding bias,

178
00:07:54,990 --> 00:08:00,030
we can start to figure out how
to mitigate some of this bias.

179
00:08:00,030 --> 00:08:02,820
In the thesis,
Sweeney and Najafian

180
00:08:02,820 --> 00:08:04,800
describe how they use
adversarial learning

181
00:08:04,800 --> 00:08:06,780
to debias word embeddings.

182
00:08:06,780 --> 00:08:09,570
Different identity terms can
be more or less correlated

183
00:08:09,570 --> 00:08:11,880
with positive or
negative sentiment.

184
00:08:11,880 --> 00:08:15,570
For example, words like
American, Mexican, and German

185
00:08:15,570 --> 00:08:18,720
can have more correlations with
negative sentiment subspaces

186
00:08:18,720 --> 00:08:20,400
and positive
sentiment subspaces,

187
00:08:20,400 --> 00:08:22,515
because in the
data sets used, they

188
00:08:22,515 --> 00:08:24,390
might appear to be more
frequently associated

189
00:08:24,390 --> 00:08:26,700
with negative or
positive sentiments.

190
00:08:26,700 --> 00:08:27,690
This is concerning

191
00:08:27,690 --> 00:08:29,273
if a downstream
machine learning model

192
00:08:29,273 --> 00:08:30,990
picks up on these correlations.

193
00:08:30,990 --> 00:08:34,230
Ideally, you want to move
each of those identity terms

194
00:08:34,230 --> 00:08:36,720
to a neutral point between
negative and positive sentiment

195
00:08:36,720 --> 00:08:39,929
subspaces without distorting
their meaning within the vector

196
00:08:39,929 --> 00:08:44,790
space so that the word embedding
model can still be useful.

197
00:08:44,790 --> 00:08:48,240
They use an adversarial learning
algorithm to achieve this.

198
00:08:48,240 --> 00:08:50,610
More details of this
algorithm are described

199
00:08:50,610 --> 00:08:51,780
in the corresponding thesis.
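The sketch below is only a rough illustration of the general adversarial idea, not the authors' exact algorithm, written in Python with PyTorch; the random tensors standing in for identity-term embeddings and their sentiment labels are hypothetical. A small debiasing transform tries to keep the embeddings close to the originals while an adversary tries to recover sentiment from them, and the transform is trained to make the adversary fail.

import torch
import torch.nn as nn

dim = 50
# Hypothetical inputs: identity-term embeddings and the sentiment signal
# (0/1) that the adversary should NOT be able to recover from them.
identity_vecs = torch.randn(32, dim)
sentiment_labels = torch.randint(0, 2, (32, 1)).float()

debiaser = nn.Linear(dim, dim)  # learns a transform of the embeddings
adversary = nn.Sequential(nn.Linear(dim, 16), nn.ReLU(), nn.Linear(16, 1))

opt_d = torch.optim.Adam(debiaser.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
mse = nn.MSELoss()
lam = 1.0  # trade-off: preserve meaning vs. remove the sentiment signal

for step in range(1000):
    # 1. Adversary update: learn to predict sentiment from the transformed
    #    embeddings (detached, so only the adversary is updated here).
    adv_loss = bce(adversary(debiaser(identity_vecs).detach()), sentiment_labels)
    opt_a.zero_grad()
    adv_loss.backward()
    opt_a.step()

    # 2. Debiaser update: stay close to the original embeddings while
    #    making the adversary's prediction as hard as possible.
    transformed = debiaser(identity_vecs)
    d_loss = mse(transformed, identity_vecs) \
        - lam * bce(adversary(transformed), sentiment_labels)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()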

200
00:08:54,860 --> 00:08:57,350
I now present some of
their work in evaluating

201
00:08:57,350 --> 00:08:59,750
how adversarial
learning algorithms can

202
00:08:59,750 --> 00:09:02,420
debias word embeddings and make
the resulting natural language

203
00:09:02,420 --> 00:09:04,900
processing system more fair.

204
00:09:04,900 --> 00:09:08,490
We focus on realistic systems
in both sentiment analysis

205
00:09:08,490 --> 00:09:10,410
and toxicity prediction.

206
00:09:10,410 --> 00:09:13,200
For each application,
Sweeney and Najafian

207
00:09:13,200 --> 00:09:14,880
define fairness
metrics to let us

208
00:09:14,880 --> 00:09:17,370
know whether the debiased
word embeddings are actually

209
00:09:17,370 --> 00:09:19,690
helping.

210
00:09:19,690 --> 00:09:22,180
These fairness metrics
often come in the form

211
00:09:22,180 --> 00:09:24,040
of a template data set.

212
00:09:24,040 --> 00:09:27,070
Researchers have created these
data sets to somewhat tease out

213
00:09:27,070 --> 00:09:30,640
different biases with respect
to different demographic groups.

214
00:09:30,640 --> 00:09:33,550
For example, this set
is meant to tease out

215
00:09:33,550 --> 00:09:36,640
biases between African-American
names and European-American

216
00:09:36,640 --> 00:09:41,590
names when substituting each
name out in the same sentence.

217
00:09:41,590 --> 00:09:43,390
Similar template
data sets have been

218
00:09:43,390 --> 00:09:46,090
created for toxicity
classification algorithms,

219
00:09:46,090 --> 00:09:48,520
where you sub out different
demographic identity

220
00:09:48,520 --> 00:09:51,010
terms within a sentence
and compare differences

221
00:09:51,010 --> 00:09:53,100
in the overall
toxicity predictions.
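A minimal sketch of how such template substitution works, assuming a hypothetical toxicity_score function that wraps whatever classifier is being evaluated; the templates and identity terms below are illustrative only. A fair classifier should give nearly identical average scores for every term, so large gaps point to unintended demographic bias.

# Hypothetical templates and identity terms; real evaluation sets are larger.
templates = [
    "I am a {} person.",
    "My neighbor is {}.",
]
identity_terms = ["american", "mexican", "german", "italian"]

def toxicity_score(text):
    # Hypothetical stand-in for the toxicity classifier under test;
    # replace with a call to the real model. Returns P(toxic).
    return 0.0

def per_term_average(templates, terms):
    # Substitute each identity term into every template and average
    # the classifier's toxicity scores for that term.
    averages = {}
    for term in terms:
        filled = [t.format(term) for t in templates]
        averages[term] = sum(toxicity_score(s) for s in filled) / len(filled)
    return averages

print(per_term_average(templates, identity_terms))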

222
00:09:56,140 --> 00:09:58,590
Sweeney and Najafian
used these template data

223
00:09:58,590 --> 00:10:02,290
sets to compute fairness for a
real-world toxicity classifier.

224
00:10:02,290 --> 00:10:04,900
This graph shows per-term
AUC distributions

225
00:10:04,900 --> 00:10:07,750
for a CNN, a convolutional
neural network,

226
00:10:07,750 --> 00:10:10,960
that was trained on a toxicity
classification data set.

227
00:10:10,960 --> 00:10:13,390
The x-axis represents
each demographic group,

228
00:10:13,390 --> 00:10:16,170
where the template data set
has that identity term subbed

229
00:10:16,170 --> 00:10:17,980
in for each sentence.

230
00:10:17,980 --> 00:10:22,330
Each dot describes a particular
training run of the CNN.

231
00:10:22,330 --> 00:10:25,390
The y-axis describes the
area under the curve accuracy

232
00:10:25,390 --> 00:10:27,430
for this template data set.

233
00:10:27,430 --> 00:10:29,710
One can see that there
is a lot of disparity

234
00:10:29,710 --> 00:10:33,400
between the accuracies for
different demographic groups.

235
00:10:33,400 --> 00:10:36,400
Ideally, you would want the
variance in different training

236
00:10:36,400 --> 00:10:38,980
runs to be compressed as
well as the differences

237
00:10:38,980 --> 00:10:41,440
between each demographic
group in the AUC scores

238
00:10:41,440 --> 00:10:42,670
to be smaller.
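For reference, a short sketch of how a per-term AUC like the one in this graph can be computed, assuming a hypothetical evaluation set of (text, true label, model score) triples: each identity term's AUC is the ROC AUC restricted to the examples that mention that term.

from sklearn.metrics import roc_auc_score

# Hypothetical evaluation examples: (text, true toxicity label, model score).
examples = [
    ("a comment about american people", 0, 0.10),
    ("a hateful comment about american people", 1, 0.85),
    ("a comment about mexican people", 0, 0.40),
    ("a hateful comment about mexican people", 1, 0.60),
]
identity_terms = ["american", "mexican"]

def per_term_auc(examples, terms):
    aucs = {}
    for term in terms:
        subset = [(label, score) for text, label, score in examples
                  if term in text.lower()]
        labels = [label for label, _ in subset]
        scores = [score for _, score in subset]
        aucs[term] = roc_auc_score(labels, scores)
    return aucs

# Large gaps between per-term AUCs (or high variance across training runs)
# indicate that the classifier treats demographic groups unequally.
print(per_term_auc(examples, identity_terms))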

239
00:10:42,670 --> 00:10:45,970
Sweeney and Najafian show that
a toxicity classification

240
00:10:45,970 --> 00:10:48,820
algorithm that uses the
debiased word embeddings

241
00:10:48,820 --> 00:10:52,060
creates better results.

242
00:10:52,060 --> 00:10:55,290
This slide shows results for
per-term AUC distributions

243
00:10:55,290 --> 00:10:58,870
for the CNN with different
debiasing treatments.

244
00:10:58,870 --> 00:11:02,680
Sweeney and Najafian measure how
their word embedding debiasing

245
00:11:02,680 --> 00:11:05,510
compares to other
state-of-the-art techniques.

246
00:11:05,510 --> 00:11:08,500
Further discussion and
evaluation of these graphs

247
00:11:08,500 --> 00:11:12,540
are presented in the
corresponding thesis.

248
00:11:12,540 --> 00:11:16,650
To wrap up, we describe some
key takeaways from this project.

249
00:11:16,650 --> 00:11:18,963
First, there is
no silver bullet.

250
00:11:18,963 --> 00:11:20,880
There are many different
types of applications

251
00:11:20,880 --> 00:11:23,370
and various types of bias
to correct for when trying

252
00:11:23,370 --> 00:11:25,860
to make NLP systems more fair.

253
00:11:25,860 --> 00:11:29,160
Second, bias can emanate
from any stage of the machine

254
00:11:29,160 --> 00:11:30,510
learning pipeline.

255
00:11:30,510 --> 00:11:33,810
Therefore, it is essential to
identify and mitigate bias

256
00:11:33,810 --> 00:11:37,170
at all stages of the machine
learning pipeline.

257
00:11:37,170 --> 00:11:40,800
Finally, we focus on solving
this problem within an academic

258
00:11:40,800 --> 00:11:43,470
context for the natural language
processing pipeline,

259
00:11:43,470 --> 00:11:48,100
but this cannot all
be solved in academia.

260
00:11:48,100 --> 00:11:51,030
For example, much of the
unintended bias in the data

261
00:11:51,030 --> 00:11:54,610
set, like the text corpus,
could come from decisions made

262
00:11:54,610 --> 00:11:56,710
upstream in data collection.

263
00:11:56,710 --> 00:11:58,720
Furthermore,
unintended bias could

264
00:11:58,720 --> 00:12:00,610
come from decisions
made when deploying

265
00:12:00,610 --> 00:12:02,430
the model into society.

266
00:12:02,430 --> 00:12:04,390
When the model is used
in a way that does not

267
00:12:04,390 --> 00:12:07,730
align with how the data was
collected in the first place,

268
00:12:07,730 --> 00:12:09,730
this could cause discrimination.

269
00:12:09,730 --> 00:12:11,860
An example of this is
when the data collected

270
00:12:11,860 --> 00:12:14,110
from a specific
demographic population

271
00:12:14,110 --> 00:12:17,260
is used to make predictions that
affect other demographics that

272
00:12:17,260 --> 00:12:20,050
were not taken into account
during data collection.

273
00:12:20,050 --> 00:12:23,178
Finally, it is important to have
efficient channels of feedback

274
00:12:23,178 --> 00:12:24,595
for these machine
learning models.

275
00:12:27,400 --> 00:12:29,410
The work presented
in this module

276
00:12:29,410 --> 00:12:32,620
highlights why fairness is
a very important concept.

277
00:12:32,620 --> 00:12:35,650
It is therefore critical for
data scientists and engineers

278
00:12:35,650 --> 00:12:38,350
to measure and understand
performance of their models

279
00:12:38,350 --> 00:12:42,490
not just through accuracy,
but also through fairness.

280
00:12:42,490 --> 00:12:45,840
[MUSIC PLAYING]