1
00:00:17,060 --> 00:00:19,580
ADAM MARTIN: And so I just
want to say a couple sentences

2
00:00:19,580 --> 00:00:24,610
about DNA sequencing,
just to finish that up.

3
00:00:24,610 --> 00:00:29,870
And so you'll remember this
slide from last lecture.

4
00:00:29,870 --> 00:00:35,630
And remember, the way this
Sanger technique works

5
00:00:35,630 --> 00:00:38,450
is to set up four
different reactions where

6
00:00:38,450 --> 00:00:41,120
each reaction has
a different one

7
00:00:41,120 --> 00:00:44,970
of these dideoxynucleotides.

8
00:00:44,970 --> 00:00:51,120
OK, so there's four
reactions, each

9
00:00:51,120 --> 00:00:59,760
with a different dideoxy NTP.

10
00:00:59,760 --> 00:01:04,830
And I brought along a gel
that I ran a while ago,

11
00:01:04,830 --> 00:01:09,460
which is basically-- it's from a
sequencing gel, and you can--

12
00:01:09,460 --> 00:01:12,550
I'll pass this around so
you can take a look at it.

13
00:01:12,550 --> 00:01:16,650
So the four different
lanes for each sample

14
00:01:16,650 --> 00:01:21,300
are the different
dideoxynucleotide reactions.

15
00:01:21,300 --> 00:01:24,570
And what I want you to notice
as that's passing around

16
00:01:24,570 --> 00:01:27,780
and you're looking at it is
that the different reactions

17
00:01:27,780 --> 00:01:31,740
with the different
dideoxynucleotides

18
00:01:31,740 --> 00:01:36,500
give different patterns
of DNA fragment lengths.

19
00:01:36,500 --> 00:01:40,140
So there are different
patterns of fragment lengths.

20
00:01:47,580 --> 00:01:52,950
And the different patterns
are based on the fact--

21
00:01:55,690 --> 00:01:59,000
this is based on the sequence,
the sequence of the template,

22
00:01:59,000 --> 00:01:59,500
OK?

23
00:02:06,180 --> 00:02:11,190
And so if we look at the
example up here, what you'll see

24
00:02:11,190 --> 00:02:15,900
is that in this banding
pattern for dideoxy TTP,

25
00:02:15,900 --> 00:02:18,900
you see that there's a really
short fragment at the bottom

26
00:02:18,900 --> 00:02:22,830
there, and so that fragment
indicates that there must be

27
00:02:22,830 --> 00:02:25,620
an A in the template sequence.

28
00:02:25,620 --> 00:02:30,240
The next fragment up would be
this one in the dideoxy GTP

29
00:02:30,240 --> 00:02:35,610
lane, and that indicates that
one nucleotide beyond this A

30
00:02:35,610 --> 00:02:39,120
is a C in the template, and
so on and so forth,

31
00:02:39,120 --> 00:02:42,270
such that you can sort
of order the fragments

32
00:02:42,270 --> 00:02:45,630
and see which reaction
has a fragment

33
00:02:45,630 --> 00:02:49,110
and then read off
a DNA sequence.

34
00:02:49,110 --> 00:02:50,880
OK, so conceptually,
that's how you

35
00:02:50,880 --> 00:02:57,150
would read off the sequence
of a given strand of DNA, OK?
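To make the read-off step concrete, here is a minimal Python sketch. The band lengths are hypothetical (not taken from the gel being passed around), and it assumes each length counts the nucleotides added past the primer. Sorting every band across the four lanes then gives the newly synthesized strand, and pairing each base gives the template, matching the example where the shortest ddTTP band means an A in the template.

```python
# Hypothetical fragment lengths (nucleotides added past the primer) seen in
# each dideoxy lane; these numbers are made up for illustration.
lanes = {
    "ddATP": [4],
    "ddCTP": [3, 6],
    "ddGTP": [2],
    "ddTTP": [1, 5],
}

# Base incorporated by each chain-terminating dideoxynucleotide.
DIDEOXY_BASE = {"ddATP": "A", "ddCTP": "C", "ddGTP": "G", "ddTTP": "T"}
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def read_sanger(lanes):
    """Order every band by length (shortest first) and read off bases.

    Returns (synthesized, template): the newly made strand read 5'->3',
    and the template bases opposite it, read 3'->5'.
    """
    bands = sorted(
        (length, DIDEOXY_BASE[lane])
        for lane, lengths in lanes.items()
        for length in lengths
    )
    lengths = [length for length, _ in bands]
    # Each position should be terminated in exactly one lane.
    if lengths != list(range(1, len(bands) + 1)):
        raise ValueError("missing or duplicated band positions")
    synthesized = "".join(base for _, base in bands)
    template = "".join(PAIR[base] for base in synthesized)
    return synthesized, template


synthesized, template = read_sanger(lanes)
print(synthesized)  # TGCATC
print(template)     # ACGTAG
```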

36
00:02:57,150 --> 00:03:00,720
So you might be wondering,
if nowadays we just read off

37
00:03:00,720 --> 00:03:05,160
sequence as a series of colors,
why am I even introducing

38
00:03:05,160 --> 00:03:06,510
this technique?

39
00:03:06,510 --> 00:03:08,160
And the reason is
because I think

40
00:03:08,160 --> 00:03:14,730
it's important for you as
potentially future scientists

41
00:03:14,730 --> 00:03:18,870
to know, when you're
faced with a problem,

42
00:03:18,870 --> 00:03:21,480
how you might discover
something new.

43
00:03:21,480 --> 00:03:24,540
And I see the
Sanger method of DNA

44
00:03:24,540 --> 00:03:29,250
sequencing as a really
clever and elegant way

45
00:03:29,250 --> 00:03:33,510
in which Fred Sanger solved
the problem of DNA sequencing,

46
00:03:33,510 --> 00:03:37,710
and while we don't necessarily
do it that way today, it still

47
00:03:37,710 --> 00:03:40,500
illustrates a concept
that's important,

48
00:03:40,500 --> 00:03:43,020
the concept of
chain termination,

49
00:03:43,020 --> 00:03:46,260
and I think there is something
to be learned from this older

50
00:03:46,260 --> 00:03:48,900
technique, even if
it's not exactly how we

51
00:03:48,900 --> 00:03:51,750
sequence DNA today.

52
00:03:51,750 --> 00:03:54,020
So for today's
lecture, we're going

53
00:03:54,020 --> 00:04:00,300
to continue on our quest to
basically clone a gene that's

54
00:04:00,300 --> 00:04:01,935
responsible for a disease.

55
00:04:04,920 --> 00:04:09,960
And so we started this
in the last lecture.

56
00:04:09,960 --> 00:04:12,930
And I guess one
thing we would want

57
00:04:12,930 --> 00:04:15,120
to start with is
a disease, so I'm

58
00:04:15,120 --> 00:04:19,860
going to introduce to you now
a disease called aniridia.

59
00:04:19,860 --> 00:04:23,720
And in order to clone
the gene for a disease,

60
00:04:23,720 --> 00:04:25,320
it has to be a
heritable disease,

61
00:04:25,320 --> 00:04:28,590
in this case, because we're
going to use linkage analysis

62
00:04:28,590 --> 00:04:30,420
to identify it.

63
00:04:30,420 --> 00:04:35,250
So aniridia is an
eye disease in humans.

64
00:04:35,250 --> 00:04:37,800
It's a rare eye disease.

65
00:04:37,800 --> 00:04:43,020
So I want to show you a bit of
an example of this eye disease.

66
00:04:43,020 --> 00:04:45,180
The way this disease
manifests itself

67
00:04:45,180 --> 00:04:48,930
is that, basically, the
affected individual

68
00:04:48,930 --> 00:04:52,380
has an eye that is
lacking an iris.

69
00:04:52,380 --> 00:04:55,230
So I'm going to show you
what this looks like.

70
00:04:55,230 --> 00:04:58,950
If you're squeamish or
don't like weird eyes

71
00:04:58,950 --> 00:05:02,320
and you don't want to
look, you can look away.

72
00:05:02,320 --> 00:05:09,150
But I will show you affected
phenotype in 3, 2, 1, OK,

73
00:05:09,150 --> 00:05:11,910
everyone looking who
wants to see weird eyes.

74
00:05:11,910 --> 00:05:13,860
OK, good.

75
00:05:13,860 --> 00:05:17,970
So that is an individual that
has aniridia, and also this one.

76
00:05:17,970 --> 00:05:21,750
So you see there's no
clear iris in these eyes.

77
00:05:21,750 --> 00:05:25,470
And this disease is associated
with other abnormalities

78
00:05:25,470 --> 00:05:29,550
of the eye that
severely impair vision.

79
00:05:29,550 --> 00:05:33,940
And this is an
inherited disease,

80
00:05:33,940 --> 00:05:38,580
and this is a
pedigree from a family

81
00:05:38,580 --> 00:05:42,600
or series of families where
the disease is propagating.

82
00:05:42,600 --> 00:05:46,720
And so, does anyone have a suggestion
as to what mode of inheritance

83
00:05:46,720 --> 00:05:47,220
this is?

84
00:05:54,310 --> 00:05:56,450
Anyone want to rule a mode out?

85
00:05:56,450 --> 00:05:58,010
Rachel, you have an idea?

86
00:05:58,010 --> 00:06:00,416
AUDIENCE: I was going to
say X-linked dominant,

87
00:06:00,416 --> 00:06:07,890
but [INAUDIBLE]

88
00:06:07,890 --> 00:06:11,940
ADAM MARTIN: OK, so let's
take X-linked dominant.

89
00:06:11,940 --> 00:06:16,950
So if it was X-linked
dominant, then this male

90
00:06:16,950 --> 00:06:19,950
would have an X chromosome
with the dominant allele

91
00:06:19,950 --> 00:06:24,840
of the disease and should
pass it only to his daughters.

92
00:06:24,840 --> 00:06:26,880
So I don't think that
it would necessarily

93
00:06:26,880 --> 00:06:28,350
be X-linked dominant.

94
00:06:28,350 --> 00:06:30,967
Anyone else have an idea?

95
00:06:30,967 --> 00:06:31,550
Yeah, Georgia?

96
00:06:31,550 --> 00:06:32,690
AUDIENCE: Autosomal.

97
00:06:32,690 --> 00:06:34,130
ADAM MARTIN: Autosomal dominant.

98
00:06:34,130 --> 00:06:37,200
I like autosomal dominant.

99
00:06:37,200 --> 00:06:39,860
So in this case,
you see you have

100
00:06:39,860 --> 00:06:42,440
an individual with
the disease and they

101
00:06:42,440 --> 00:06:46,262
marry into a family with
no history of the disease.

102
00:06:46,262 --> 00:06:48,470
One thing I'll point out,
for many of these diseases,

103
00:06:48,470 --> 00:06:52,970
they're extremely rare, so if
you see sort of a family tree

104
00:06:52,970 --> 00:06:55,820
where there's no
instance of the disease,

105
00:06:55,820 --> 00:06:57,920
if it's a rare
disease, it's likely

106
00:06:57,920 --> 00:07:01,660
that these individuals
are not carriers.

107
00:07:01,660 --> 00:07:05,270
And so in this case, if you
assume that this person doesn't

108
00:07:05,270 --> 00:07:07,750
have any form of the--

109
00:07:07,750 --> 00:07:10,550
isn't a carrier for
the disease, then this

110
00:07:10,550 --> 00:07:12,980
cross here, which results
in about half

111
00:07:12,980 --> 00:07:15,710
of the individuals affected
with the disease, that

112
00:07:15,710 --> 00:07:19,790
would be a characteristic of
an autosomal dominant disease.
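The two modes of inheritance discussed here can be compared by simply enumerating gametes. This is a sketch that assumes full penetrance and a spouse who is not a carrier (the rare-disease assumption just made); the allele labels D/d and XD/Xd are hypothetical names chosen for illustration.

```python
from itertools import product


def children(mother, father):
    """Equally likely children: one allele (or sex chromosome) from each parent."""
    return list(product(mother, father))


def affected_fraction(kids, dominant_allele):
    """Fraction of kids carrying at least one copy of a dominant disease allele."""
    return sum(dominant_allele in kid for kid in kids) / len(kids)


# Autosomal dominant: affected heterozygote (Dd) x non-carrier (dd).
auto_kids = children(("d", "d"), ("D", "d"))
print(affected_fraction(auto_kids, "D"))  # 0.5

# X-linked dominant: affected father (XD, Y) x unaffected mother (Xd, Xd).
x_kids = children(("Xd", "Xd"), ("XD", "Y"))
daughters = [k for k in x_kids if "Y" not in k]
sons = [k for k in x_kids if "Y" in k]
print(affected_fraction(daughters, "XD"))  # 1.0
print(affected_fraction(sons, "XD"))       # 0.0
```

An affected son of an affected father is therefore hard to explain with X-linked dominance, while roughly half of all children affected, regardless of sex, fits the autosomal dominant expectation.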

113
00:07:19,790 --> 00:07:21,470
So everyone understand my logic?

114
00:07:21,470 --> 00:07:22,190
Yes, Carlos?

115
00:07:22,190 --> 00:07:24,897
AUDIENCE: What are--
why is two and that

116
00:07:24,897 --> 00:07:27,230
looks like three on the slide,
why are they crossed out?

117
00:07:27,230 --> 00:07:29,300
ADAM MARTIN: I think
they're deceased.

118
00:07:29,300 --> 00:07:29,800
Yes.

119
00:07:32,880 --> 00:07:36,120
OK, so let's say
you have a pedigree.

120
00:07:36,120 --> 00:07:42,930
If you have pedigrees and you're able
to try to link this marker to--

121
00:07:42,930 --> 00:07:46,480
or the disease phenotype with
various molecular markers,

122
00:07:46,480 --> 00:07:49,560
which we discussed in
last week's lectures,

123
00:07:49,560 --> 00:07:55,260
then you're on the
way to performing

124
00:07:55,260 --> 00:07:58,740
a process which is known
as positional gene cloning.

125
00:08:01,380 --> 00:08:04,470
And what positional
gene cloning is

126
00:08:04,470 --> 00:08:09,000
is it's basically cloning
a gene and a allele that's

127
00:08:09,000 --> 00:08:13,710
responsible for a disease based
on its position in the genome,

128
00:08:13,710 --> 00:08:18,000
its position in a particular
chromosomal region.

129
00:08:18,000 --> 00:08:25,740
So it's basically
cloning a gene based

130
00:08:25,740 --> 00:08:31,920
on its chromosomal position
or its chromosome position.

131
00:08:38,610 --> 00:08:42,030
And the first step of
positional gene cloning

132
00:08:42,030 --> 00:08:45,900
would be to establish maybe
what chromosome it's on.

133
00:08:45,900 --> 00:08:50,970
And a straightforward way to
do this, as we've basically

134
00:08:50,970 --> 00:08:54,420
been discussing almost from
when I started lecturing,

135
00:08:54,420 --> 00:08:56,700
is to create some
sort of linkage

136
00:08:56,700 --> 00:09:02,220
map or do linkage
mapping to identify,

137
00:09:02,220 --> 00:09:06,160
in the case of humans, molecular
markers that this disease

138
00:09:06,160 --> 00:09:08,160
allele is linked to.

139
00:09:12,780 --> 00:09:15,930
And remember, in
last week's lecture,

140
00:09:15,930 --> 00:09:19,530
we talked about a number
of different polymorphisms

141
00:09:19,530 --> 00:09:22,410
that are present
in the human genome

142
00:09:22,410 --> 00:09:28,050
that we can use to establish
linkage with a given phenotype.

143
00:09:28,050 --> 00:09:30,920
In this case, it's
a human disease.

144
00:09:30,920 --> 00:09:35,280
And we talked about this example
for a microsatellite marker.

145
00:09:35,280 --> 00:09:39,630
And in this case, we
talked through this example

146
00:09:39,630 --> 00:09:43,290
of how this dominant
allele, P, is linked

147
00:09:43,290 --> 00:09:47,010
to this microsatellite
allele m double prime,

148
00:09:47,010 --> 00:09:50,070
because if you look
at the pedigree here,

149
00:09:50,070 --> 00:09:54,000
all of the affected
individuals here

150
00:09:54,000 --> 00:09:57,090
contain this m double
prime sized fragment

151
00:09:57,090 --> 00:10:00,030
for this microsatellite.

152
00:10:00,030 --> 00:10:02,220
Another thing to
notice here is you

153
00:10:02,220 --> 00:10:06,510
can see that this couple has
been faithful to each other,

154
00:10:06,510 --> 00:10:09,240
because basically,
each of the children

155
00:10:09,240 --> 00:10:13,050
has an allele from the father
and an allele from the mother.

156
00:10:13,050 --> 00:10:16,500
So you can see that type of--

157
00:10:16,500 --> 00:10:22,590
you can see that using this type
of molecular marker as well.
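Both checks described for the microsatellite pedigree can be written out as a short sketch. The family and allele labels are hypothetical (m'' stands in for the linked m double prime allele). The code asks whether each child's two alleles can be split into one maternal and one paternal allele, and whether every affected child carries m''.

```python
def consistent_with_parents(child, mother, father):
    """True if one of the child's alleles can come from the mother
    and the other from the father."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


# Hypothetical family: the father is affected and carries m''.
father = ("m''", "m")
mother = ("m'", "m'''")
children = {
    "child1": (("m''", "m'"), True),    # (genotype, affected?)
    "child2": (("m", "m'''"), False),
    "child3": (("m''", "m'''"), True),
}

all_consistent = all(
    consistent_with_parents(g, mother, father) for g, _ in children.values()
)
linked = all("m''" in g for g, affected in children.values() if affected)
print(all_consistent, linked)  # True True
```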

158
00:10:22,590 --> 00:10:24,880
OK, so you establish linkage.

159
00:10:24,880 --> 00:10:30,820
So linkage mapping establishes
the chromosome position

160
00:10:30,820 --> 00:10:32,890
of a given allele and the gene.

161
00:10:37,450 --> 00:10:39,310
And this chromosome
position sort of

162
00:10:39,310 --> 00:10:42,340
gets you maybe in the right
country, but you still

163
00:10:42,340 --> 00:10:47,370
have a long way before you get
to the specific street address.

164
00:10:47,370 --> 00:10:49,720
And so you have to then
sort of narrow it down

165
00:10:49,720 --> 00:10:54,580
to identify a smaller region
of the chromosome that could

166
00:10:54,580 --> 00:10:57,730
possibly contain this gene.

167
00:10:57,730 --> 00:11:02,140
And so what you would do is go
from this linkage map, where

168
00:11:02,140 --> 00:11:06,520
you maybe identify the position
of this gene within a couple

169
00:11:06,520 --> 00:11:10,840
map units, to this
next resolution of map

170
00:11:10,840 --> 00:11:14,390
called a physical map, OK?

171
00:11:14,390 --> 00:11:17,210
So we go from the
linkage position

172
00:11:17,210 --> 00:11:21,430
to the physical map
of the chromosome.

173
00:11:24,930 --> 00:11:27,810
And the physical map,
as the name implies,

174
00:11:27,810 --> 00:11:31,040
is when you have
physical pieces of DNA

175
00:11:31,040 --> 00:11:36,260
that are present in this
region of the chromosome.

176
00:11:36,260 --> 00:11:38,510
So the physical
map means you have

177
00:11:38,510 --> 00:11:42,530
cloned, so recombinant
pieces of DNA,

178
00:11:42,530 --> 00:11:49,220
cloned pieces of DNA which
encompass a given chromosome

179
00:11:49,220 --> 00:11:50,520
region.

180
00:11:50,520 --> 00:11:58,150
So these are encompassing
a chromosome region.

181
00:12:05,900 --> 00:12:10,910
OK, so how would you
get a piece of DNA

182
00:12:10,910 --> 00:12:14,560
that sort of is in this region?

183
00:12:14,560 --> 00:12:15,790
How would you start?

184
00:12:15,790 --> 00:12:17,870
How would you start
fishing for that DNA?

185
00:12:21,490 --> 00:12:24,910
So you've gone through
the process of linkage,

186
00:12:24,910 --> 00:12:29,200
you've identified sort
of a polymorphism that is

187
00:12:29,200 --> 00:12:31,330
linked to the disease allele.

188
00:12:31,330 --> 00:12:35,650
How would you go from there to
getting a physical piece of DNA

189
00:12:35,650 --> 00:12:41,690
that is present in that
region of the chromosome?

190
00:12:41,690 --> 00:12:43,720
So let's think back to--

191
00:12:43,720 --> 00:12:47,760
Jeremy, did you have an idea?

192
00:12:47,760 --> 00:12:52,388
AUDIENCE: Start by using PCR
to just amplify that chunk.

193
00:12:52,388 --> 00:12:55,800
[INAUDIBLE]

194
00:12:55,800 --> 00:12:57,597
ADAM MARTIN: And what
primers, I guess,

195
00:12:57,597 --> 00:12:58,680
would you use for the PCR?

196
00:13:01,902 --> 00:13:05,600
AUDIENCE: Depending on which
chunk you're trying to get,

197
00:13:05,600 --> 00:13:09,740
you'd use [INAUDIBLE]

198
00:13:09,740 --> 00:13:11,240
ADAM MARTIN: OK,
so Jeremy is saying

199
00:13:11,240 --> 00:13:14,150
if you knew the
sequence, and I guess

200
00:13:14,150 --> 00:13:18,800
if you're doing this
microsatellite analysis,

201
00:13:18,800 --> 00:13:22,310
you'd have primers that recognize
a sequence at a given

202
00:13:22,310 --> 00:13:25,100
genomic position, so you
actually know something

203
00:13:25,100 --> 00:13:29,060
about the sequence because
of this polymorphism,

204
00:13:29,060 --> 00:13:34,730
so you can use that knowledge
to then look for this sequence.

205
00:13:34,730 --> 00:13:38,040
And you could even look for
the microsatellite in a DNA

206
00:13:38,040 --> 00:13:38,540
library.

207
00:13:41,360 --> 00:13:44,780
OK, so you have
cloned pieces of DNA,

208
00:13:44,780 --> 00:13:48,650
and you're going to start with--

209
00:13:48,650 --> 00:13:49,880
I'm going to swap this.

210
00:13:54,200 --> 00:13:56,470
Your starting
position could be one

211
00:13:56,470 --> 00:14:00,200
of these polymorphisms and
the sequence around it,

212
00:14:00,200 --> 00:14:02,500
which you already know.

213
00:14:02,500 --> 00:14:05,140
So let's say you had this
microsatellite marker.

214
00:14:05,140 --> 00:14:07,810
You could then--
what I'm drawing here

215
00:14:07,810 --> 00:14:10,610
is a piece of genomic DNA.

216
00:14:10,610 --> 00:14:11,805
So this is genomic DNA.

217
00:14:18,040 --> 00:14:19,480
I'm just drawing the insert.

218
00:14:19,480 --> 00:14:21,430
This would be recombinant DNA.

219
00:14:21,430 --> 00:14:25,150
It would be present in
some vector or plasmid.

220
00:14:25,150 --> 00:14:28,090
But if you can identify
the sequence that

221
00:14:28,090 --> 00:14:32,770
contains this
microsatellite marker,

222
00:14:32,770 --> 00:14:36,610
then you would have the
microsatellite, but also

223
00:14:36,610 --> 00:14:39,190
the surrounding DNA, OK?

224
00:14:39,190 --> 00:14:43,000
So that sort of anchors
you at a given position.

225
00:14:43,000 --> 00:14:46,240
Now, you don't know if your
gene is in that piece of DNA,

226
00:14:46,240 --> 00:14:48,190
but you know that
it's linked, and so it

227
00:14:48,190 --> 00:14:52,270
should be around that
piece of DNA somewhere.

228
00:14:52,270 --> 00:14:55,640
And so it's unlikely
your gene is

229
00:14:55,640 --> 00:14:59,460
going to be on this small
piece of DNA that's cloned.

230
00:14:59,460 --> 00:15:02,350
This is probably just a
few kb, and you could still

231
00:15:02,350 --> 00:15:04,750
be very far away
from this, but that

232
00:15:04,750 --> 00:15:08,230
serves as a starting point
from which you can go

233
00:15:08,230 --> 00:15:10,780
to get more and
more pieces of DNA

234
00:15:10,780 --> 00:15:13,330
such that eventually, you
have a bunch of pieces of DNA

235
00:15:13,330 --> 00:15:16,650
that are going to span
the entire region.

236
00:15:16,650 --> 00:15:21,130
So the way you identify
other pieces of DNA is you

237
00:15:21,130 --> 00:15:26,170
could start with a piece of DNA
maybe at the end of this insert

238
00:15:26,170 --> 00:15:28,510
and look for other
inserts that are not

239
00:15:28,510 --> 00:15:33,800
identical to this piece that
also contain this piece here.

240
00:15:33,800 --> 00:15:35,680
So that might get
you a piece that's

241
00:15:35,680 --> 00:15:41,050
overlapping, but extends
farther than your initial piece.

242
00:15:41,050 --> 00:15:43,600
So now you've moved
slightly farther away

243
00:15:43,600 --> 00:15:49,650
from your starting point, which
is this starting polymorphism.

244
00:15:49,650 --> 00:15:52,950
Then you could choose maybe
another DNA sequence here

245
00:15:52,950 --> 00:15:56,280
and look for a piece
of DNA that, again,

246
00:15:56,280 --> 00:15:58,740
is extending a bit farther out.

247
00:15:58,740 --> 00:16:01,350
And so you can see
how iteratively,

248
00:16:01,350 --> 00:16:04,710
you can get farther and farther
away from this starting point

249
00:16:04,710 --> 00:16:07,540
that you know your
gene is linked to.

250
00:16:07,540 --> 00:16:11,280
And this process of going
sort of piece by piece and clone

251
00:16:11,280 --> 00:16:14,700
by clone away from
a starting position

252
00:16:14,700 --> 00:16:16,530
is known as a chromosome walk.

253
00:16:22,930 --> 00:16:25,570
And you can do this
bidirectionally.

254
00:16:25,570 --> 00:16:28,360
So you could also start
with a sequence of DNA

255
00:16:28,360 --> 00:16:32,230
here and look for a clone
that goes the other way.
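A chromosome walk can be mimicked with clones represented as intervals. This is a hypothetical sketch: the library, its kb coordinates, and the rule of picking whichever overlapping clone reaches farthest are illustrative assumptions, and the test `c[0] < edge < c[1]` stands in for probing the library with a piece from the end of the current insert.

```python
def walk(library, start_clone, target):
    """Walk clone by clone from start_clone until a clone covers target.

    library: list of (start, end) inserts, in hypothetical kb coordinates.
    Walks right if target lies to the right of start_clone, else left.
    Returns the clones visited (stops early if the walk gets stuck).
    """
    rightward = target > start_clone[1]
    path = [start_clone]
    current = start_clone
    while not (current[0] <= target <= current[1]):
        edge = current[1] if rightward else current[0]
        # Clones that share sequence with the current end and extend past it.
        hits = [c for c in library if c[0] < edge < c[1]]
        if not hits:
            break  # gap in the library; the walk stalls here
        if rightward:
            current = max(hits, key=lambda c: c[1])
        else:
            current = min(hits, key=lambda c: c[0])
        path.append(current)
    return path


library = [(60, 110), (100, 140), (130, 175), (135, 160), (170, 210), (200, 250)]
start = (100, 140)  # clone carrying the linked marker (hypothetical)
print(walk(library, start, 240))  # [(100, 140), (130, 175), (170, 210), (200, 250)]
print(walk(library, start, 70))   # [(100, 140), (60, 110)]
```

Because each step must extend strictly past the current end, the walk always makes progress and terminates.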

256
00:16:35,230 --> 00:16:38,350
And you can see on
my slide up there,

257
00:16:38,350 --> 00:16:40,960
you can see that in
this case, they've

258
00:16:40,960 --> 00:16:44,290
taken a one map unit
region of the chromosome

259
00:16:44,290 --> 00:16:48,610
and they're illustrating
physical pieces of DNA that

260
00:16:48,610 --> 00:16:52,750
are overlapping that
encompass this entire region.

261
00:16:52,750 --> 00:16:56,440
So this could be much bigger
than the amount of DNA that

262
00:16:56,440 --> 00:16:59,380
would fit in one of these
clones in the bacteria,

263
00:16:59,380 --> 00:17:02,740
but by sort of identifying
overlapping clones,

264
00:17:02,740 --> 00:17:04,839
you get the entire region.

265
00:17:04,839 --> 00:17:09,880
And here,
because these pieces of DNA

266
00:17:09,880 --> 00:17:13,289
are contiguous with each other,
this is known as a contig.

267
00:17:21,400 --> 00:17:22,900
Yeah, Jeremy?

268
00:17:22,900 --> 00:17:24,790
AUDIENCE: So how
would you get the--

269
00:17:24,790 --> 00:17:27,590
once you find one
of those pieces,

270
00:17:27,590 --> 00:17:29,864
how do you get the primer
for the end of it to start?

271
00:17:29,864 --> 00:17:32,170
Do you actually sequence
each of these pieces of DNA?

272
00:17:32,170 --> 00:17:33,340
ADAM MARTIN: You
could sequence it,

273
00:17:33,340 --> 00:17:34,960
or you could use a
technique that I'm

274
00:17:34,960 --> 00:17:37,300
going to talk about at the
end of my lecture, which

275
00:17:37,300 --> 00:17:38,980
I'll come back to.

276
00:17:38,980 --> 00:17:42,190
So nowadays, you'd probably just
sequence it and then maybe look

277
00:17:42,190 --> 00:17:45,520
for that in another clone.

278
00:17:45,520 --> 00:17:50,260
But even before we could
sequence entire genomes,

279
00:17:50,260 --> 00:17:52,840
you could do that
type of experiment

280
00:17:52,840 --> 00:17:55,040
by using a technique
called hybridization,

281
00:17:55,040 --> 00:17:58,150
which I'll come back to.

282
00:17:58,150 --> 00:18:02,480
OK, so the question in this
chromosome walk then becomes,

283
00:18:02,480 --> 00:18:05,260
how do you know when to stop?

284
00:18:05,260 --> 00:18:08,290
Because you could do this
for a very long time,

285
00:18:08,290 --> 00:18:09,950
but it might not be useful.

286
00:18:09,950 --> 00:18:12,400
So you have to
know when to stop,

287
00:18:12,400 --> 00:18:15,760
and you need to know when
you arrive at the gene

288
00:18:15,760 --> 00:18:18,340
that you're interested in,
which would be the gene that

289
00:18:18,340 --> 00:18:20,155
is responsible for the disease.

290
00:18:22,870 --> 00:18:25,300
So another way to
phrase this question is,

291
00:18:25,300 --> 00:18:28,960
how do you know when you have
an interesting gene on one

292
00:18:28,960 --> 00:18:29,980
of these fragments?

293
00:18:29,980 --> 00:18:34,660
So let's say this is an
interesting gene here.

294
00:18:34,660 --> 00:18:36,760
How do you identify
interesting genes?

295
00:18:41,800 --> 00:18:44,695
So now, let's talk about
identifying interesting genes.

296
00:18:52,990 --> 00:18:55,770
Anyone have an idea
for how they would--

297
00:18:55,770 --> 00:19:00,040
what criteria they would use
to define a gene as being

298
00:19:00,040 --> 00:19:00,910
interesting here?

299
00:19:08,810 --> 00:19:11,780
I mean, one could say that
all genes are interesting.

300
00:19:11,780 --> 00:19:13,450
If it's a gene, it's
interesting, right?

301
00:19:17,230 --> 00:19:20,770
How might we define whether or
not there's a gene even there?

302
00:19:20,770 --> 00:19:23,680
It could be-- there
could be a gene--

303
00:19:23,680 --> 00:19:25,490
how would you define a gene?

304
00:19:25,490 --> 00:19:27,130
Can someone define
for me a gene?

305
00:19:30,660 --> 00:19:32,630
Yeah, Miles?

306
00:19:32,630 --> 00:19:33,940
Is it Miles?

307
00:19:33,940 --> 00:19:35,590
No?

308
00:19:35,590 --> 00:19:37,590
Malik, OK.

309
00:19:37,590 --> 00:19:43,652
AUDIENCE: [INAUDIBLE] that would
create a starting and stopping

310
00:19:43,652 --> 00:19:44,576
point.

311
00:19:44,576 --> 00:19:47,617
So like [INAUDIBLE]

312
00:19:47,617 --> 00:19:49,700
ADAM MARTIN: So you'd look
for a piece of DNA that

313
00:19:49,700 --> 00:19:53,120
has a start and a stop codon?

314
00:19:53,120 --> 00:19:55,970
So you'd look for an open
reading frame, basically.

315
00:19:55,970 --> 00:19:56,840
Yeah.

316
00:19:56,840 --> 00:19:58,970
You could look for an
open reading frame.
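Looking for an open reading frame can be sketched as scanning for an ATG followed, in the same frame, by a stop codon. This simplified version only scans the three forward frames, and the `min_codons` cutoff is a hypothetical parameter. Human genomic DNA also contains introns, so a scan like this on a genomic clone can split or miss real coding sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) of ORFs on the given strand (forward frames only).

    An ORF runs from an ATG to the first in-frame stop codon, stop included;
    min_codons counts codons including the ATG and the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOPS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # ATG with no in-frame stop: no complete ORF
            i += 3
    return sorted(orfs)


seq = "CCATGAAATTTTAGGG"
print([seq[s:e] for s, e in find_orfs(seq)])  # ['ATGAAATTTTAG']
```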

317
00:20:01,860 --> 00:20:05,660
And so I totally agree
with Malik there.

318
00:20:05,660 --> 00:20:08,300
And another criterion
you could use

319
00:20:08,300 --> 00:20:11,450
is if it's encoding a
protein, at some point,

320
00:20:11,450 --> 00:20:15,500
it also must have been
transcribed as an mRNA.

321
00:20:15,500 --> 00:20:20,150
And there are some genes
that are transcribed as RNA

322
00:20:20,150 --> 00:20:21,980
but don't make a protein,
and they're often

323
00:20:21,980 --> 00:20:27,560
involved in translation or in
regulation of gene expression.

324
00:20:27,560 --> 00:20:28,730
So I'm going to--

325
00:20:28,730 --> 00:20:32,460
I'm going to say,
is it transcribed?

326
00:20:32,460 --> 00:20:37,420
So is there some
transcript that's made?

327
00:20:37,420 --> 00:20:40,700
And specifically, is it
transcribed in the tissue

328
00:20:40,700 --> 00:20:41,870
that we're interested in?

329
00:20:46,880 --> 00:20:49,790
So if we're talking
aniridia, we might

330
00:20:49,790 --> 00:20:53,750
be looking for genes that are
being expressed or transcribed

331
00:20:53,750 --> 00:20:57,325
specifically in eyes.

332
00:20:57,325 --> 00:20:58,700
You're looking
for something that

333
00:20:58,700 --> 00:21:00,770
might be expressed in the eye.

334
00:21:00,770 --> 00:21:02,990
If it's not
expressed in the eye,

335
00:21:02,990 --> 00:21:05,390
that gene's going to be
much less interesting to you

336
00:21:05,390 --> 00:21:08,900
because the phenotype of
aniridia is clearly in the eye.

337
00:21:11,690 --> 00:21:15,680
OK, what might be some
other criteria here?

338
00:21:15,680 --> 00:21:20,120
Well, one criterion might be,
is there a conserved gene that

339
00:21:20,120 --> 00:21:22,070
has an interesting
function that's

340
00:21:22,070 --> 00:21:25,750
maybe similar to the
disease-related phenotype?

341
00:21:28,340 --> 00:21:39,535
So is there a conserved gene
with an interesting function?

342
00:21:48,450 --> 00:21:51,600
And to take this
example of aniridia,

343
00:21:51,600 --> 00:21:54,570
let's say you're doing
this chromosome walk,

344
00:21:54,570 --> 00:21:56,580
and you identify
a gene, maybe you

345
00:21:56,580 --> 00:22:00,630
sequence part of this clone,
you get a string of sequence,

346
00:22:00,630 --> 00:22:03,450
and you realize that the
sequence that you get

347
00:22:03,450 --> 00:22:08,280
is related to a gene
from a model organism,

348
00:22:08,280 --> 00:22:12,930
and maybe that gene
is called eyeless.

349
00:22:12,930 --> 00:22:18,000
If you've identified a region
of sequence in a human,

350
00:22:18,000 --> 00:22:22,560
in the human genome that's
mapping to an eye disease gene,

351
00:22:22,560 --> 00:22:24,540
and you find out
that in that region,

352
00:22:24,540 --> 00:22:28,080
there is a conserved
gene related to eyeless,

353
00:22:28,080 --> 00:22:30,840
that might be a very
interesting gene for you.

354
00:22:30,840 --> 00:22:32,670
So eyeless is a gene.

355
00:22:32,670 --> 00:22:34,500
So here's a normal fly.

356
00:22:34,500 --> 00:22:37,670
You see it has that
bright red eye.

357
00:22:37,670 --> 00:22:41,430
The eyeless gene,
when mutated, results

358
00:22:41,430 --> 00:22:44,460
in a fly that doesn't just
have a white eye,

359
00:22:44,460 --> 00:22:48,120
but has no eye altogether.

360
00:22:48,120 --> 00:22:51,810
So it turns out that the
aniridia gene is the homolog

361
00:22:51,810 --> 00:22:54,520
of the eyeless gene in flies.

362
00:22:54,520 --> 00:22:57,240
That's not how it was
identified initially,

363
00:22:57,240 --> 00:23:01,810
but nowadays, there's a lot of
information in model organisms.

364
00:23:01,810 --> 00:23:05,580
And so if you're sort of
trying to identify a gene,

365
00:23:05,580 --> 00:23:08,010
and you see that there's
a gene in the neighborhood

366
00:23:08,010 --> 00:23:11,610
you're looking at
with a function that's

367
00:23:11,610 --> 00:23:15,630
related to a gene
like eyeless, which

368
00:23:15,630 --> 00:23:18,960
has a clear sort of analogy
in terms of phenotypes,

369
00:23:18,960 --> 00:23:21,540
then that's going to increase
your interest in that gene.

370
00:23:24,870 --> 00:23:27,840
So I'm going to come
back to this point

371
00:23:27,840 --> 00:23:31,710
here, which is how
do we determine

372
00:23:31,710 --> 00:23:35,640
whether a piece of DNA that's
on one of these inserts

373
00:23:35,640 --> 00:23:38,340
that we're getting as we
walk across the chromosome,

374
00:23:38,340 --> 00:23:42,990
how do we know whether
it is transcribed or not?

375
00:23:42,990 --> 00:23:44,940
And to get at this, I'm
going to introduce you

376
00:23:44,940 --> 00:23:47,940
to a concept which
is important in

377
00:23:47,940 --> 00:23:54,230
and of itself, which
is the idea of cDNA.

378
00:23:54,230 --> 00:23:55,850
So cDNA.

379
00:23:55,850 --> 00:23:57,650
And specifically,
I'm going to show you

380
00:23:57,650 --> 00:24:02,240
how one would make a cDNA
library, which is basically

381
00:24:02,240 --> 00:24:05,180
a library of different cDNAs.

382
00:24:05,180 --> 00:24:09,640
And so what cDNA is, as
shown up there on my slide,

383
00:24:09,640 --> 00:24:14,165
a cDNA is complementary DNA.

384
00:24:22,920 --> 00:24:27,730
It's complementary DNA,
meaning that it is the complement

385
00:24:27,730 --> 00:24:31,210
of an mRNA transcript.

386
00:24:31,210 --> 00:24:40,990
This DNA is the complement
of an RNA or mRNA transcript.
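The base-pairing rule behind cDNA can be shown in a few lines. In the lab an enzyme (reverse transcriptase) builds this strand; the sketch only shows which DNA sequence is complementary to a given mRNA. The mRNA is a made-up example, and the result is written 5' to 3', so it is reversed as well as complemented because the two strands are antiparallel.

```python
# DNA base that pairs with each RNA base (U in RNA pairs with A in DNA).
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return the DNA strand complementary to an mRNA, written 5'->3'."""
    return "".join(RNA_TO_DNA_PAIR[b] for b in reversed(mrna.upper()))


mrna = "AUGGCCUAAAAAA"  # hypothetical message ending in a short poly-A tail
print(first_strand_cdna(mrna))  # TTTTTTAGGCCAT
```

Note that the poly-A tail at the 3' end of the mRNA becomes a run of T's at the 5' end of the cDNA.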

387
00:24:47,750 --> 00:24:54,170
One thing to watch out for is
it's not complimentary DNA.

388
00:24:54,170 --> 00:24:56,180
So this is MIT.

389
00:24:56,180 --> 00:24:59,780
This is a no compliment
zone, so I don't want

390
00:24:59,780 --> 00:25:01,745
to see any complimentary DNA.

391
00:25:06,010 --> 00:25:10,030
All right, so let's think
about complementary DNA.

392
00:25:10,030 --> 00:25:14,860
So remember, we've talked
about the central dogma

393
00:25:14,860 --> 00:25:20,545
and how DNA encodes for RNA,
which encodes for protein.

394
00:25:25,060 --> 00:25:27,600
And so the information
flows from DNA

395
00:25:27,600 --> 00:25:30,150
through RNA to protein.

396
00:25:30,150 --> 00:25:33,990
But there are some
specialized cases in biology

397
00:25:33,990 --> 00:25:37,950
where this information
flow is reversed.

398
00:25:37,950 --> 00:25:42,210
So there can be a reverse
of information flow

399
00:25:42,210 --> 00:25:45,405
where information
flows from RNA to DNA.

400
00:25:48,340 --> 00:25:50,220
OK, so that's pretty cool.

401
00:25:50,220 --> 00:25:52,090
Where does that happen?

402
00:25:52,090 --> 00:25:56,700
Well, there are viruses,
such as retroviruses,

403
00:25:56,700 --> 00:26:03,990
one example of a retrovirus
is HIV, and the virus life--

404
00:26:03,990 --> 00:26:08,490
the virus genome is a
single-stranded RNA molecule,

405
00:26:08,490 --> 00:26:14,040
and the life cycle of the virus
is that it inserts into the host--

406
00:26:14,040 --> 00:26:17,100
the host genome, which
is double-stranded DNA.

407
00:26:17,100 --> 00:26:21,420
For a retrovirus to do that,
it needs to take its RNA genome

408
00:26:21,420 --> 00:26:25,590
and make double-stranded DNA
in order for it to insert.

409
00:26:25,590 --> 00:26:29,220
So this is an example in
biology, which is basically

410
00:26:29,220 --> 00:26:32,100
breaking the rules that we
talked to you about earlier

411
00:26:32,100 --> 00:26:33,990
in the semester.

412
00:26:33,990 --> 00:26:37,860
Also, there are
retrotransposons which

413
00:26:37,860 --> 00:26:42,630
do a similar process,
going from an RNA molecule

414
00:26:42,630 --> 00:26:44,610
to double-stranded DNA.

415
00:26:44,610 --> 00:26:48,010
So this is a specialized
case, and it's interesting,

416
00:26:48,010 --> 00:26:54,570
and we can take advantage of it
to basically clone and identify

417
00:26:54,570 --> 00:26:56,010
mRNA transcripts.

418
00:26:59,870 --> 00:27:04,110
OK, so I'm going to tell you
how to make complementary DNA,

419
00:27:04,110 --> 00:27:07,060
and I'll go through
a series of steps.

420
00:27:07,060 --> 00:27:13,320
The first step is we want to
make complementary DNA of mRNA,

421
00:27:13,320 --> 00:27:15,405
so we need a way
to purify the mRNA.

422
00:27:18,040 --> 00:27:22,750
So anyone have any idea
how to purify mRNA?

423
00:27:22,750 --> 00:27:26,170
First, we could maybe
draw an RNA molecule here.

424
00:27:26,170 --> 00:27:31,020
What are some salient
features of mature mRNA?

425
00:27:31,020 --> 00:27:31,565
Yeah, Carlos?

426
00:27:31,565 --> 00:27:33,273
AUDIENCE: It'll have
the five-prime cap

427
00:27:33,273 --> 00:27:34,190
[INAUDIBLE] phosphate.

428
00:27:34,190 --> 00:27:36,230
ADAM MARTIN: Yeah, it'll
have a five-prime cap.

429
00:27:36,230 --> 00:27:37,290
Anything else?

430
00:27:37,290 --> 00:27:37,790
Jeremy?

431
00:27:37,790 --> 00:27:38,580
AUDIENCE: Poly-A tail.

432
00:27:38,580 --> 00:27:40,247
ADAM MARTIN: It'll
have a five-prime cap

433
00:27:40,247 --> 00:27:41,480
and a poly-A tail.

434
00:27:41,480 --> 00:27:44,600
I'm going to take advantage
mostly of the poly-A tail here.

435
00:27:47,480 --> 00:27:50,580
So here, we have a poly-A tail.

436
00:27:50,580 --> 00:27:55,660
OK, how might we use that
poly-A tail to purify mRNA?

437
00:27:55,660 --> 00:27:56,160
Natalie?

438
00:27:56,160 --> 00:27:58,663
AUDIENCE: Well, you
can add a [INAUDIBLE]

439
00:27:58,663 --> 00:28:00,507
because you know
they're [INAUDIBLE]

440
00:28:00,507 --> 00:28:01,340
ADAM MARTIN: Mm-hmm.

441
00:28:01,340 --> 00:28:02,975
What sequence would you use?

442
00:28:02,975 --> 00:28:03,850
AUDIENCE: [INAUDIBLE]

443
00:28:03,850 --> 00:28:04,558
ADAM MARTIN: Yes.

444
00:28:04,558 --> 00:28:09,550
So Natalie has suggested
using poly T, which

445
00:28:09,550 --> 00:28:13,480
she said would stick to
this poly A tail because

446
00:28:13,480 --> 00:28:17,050
of base pair hybridization, OK?

447
00:28:17,050 --> 00:28:22,690
So let's say we have a bead
or some type of resin with dTs

448
00:28:22,690 --> 00:28:25,450
hanging off of it.

449
00:28:25,450 --> 00:28:27,190
So I'll draw a few
of them, but you'd

450
00:28:27,190 --> 00:28:34,060
have maybe a lot of
them sticking off, OK?

451
00:28:34,060 --> 00:28:37,360
So you have a bead with
pieces of DNA, all of which

452
00:28:37,360 --> 00:28:39,940
are poly dT hanging off of it.

453
00:28:39,940 --> 00:28:46,120
And then these poly dTs, if
you add cytoplasm from cells,

454
00:28:46,120 --> 00:28:50,710
the mRNA in that cytoplasm is
going to stick to this poly dT

455
00:28:50,710 --> 00:28:55,540
bead, and it will
stick with a higher

456
00:28:55,540 --> 00:28:58,660
affinity than other things that
are non-specifically sticking

457
00:28:58,660 --> 00:29:02,770
to the beads, and you can
wash these beads with buffer

458
00:29:02,770 --> 00:29:05,350
and salt to get rid
of everything that's

459
00:29:05,350 --> 00:29:07,660
non-specifically
sticking to the bead,

460
00:29:07,660 --> 00:29:10,870
and then you're left
with just a bead that's

461
00:29:10,870 --> 00:29:13,750
enriched with mRNA, which
is what was specifically

462
00:29:13,750 --> 00:29:15,760
sticking to this, OK?

463
00:29:15,760 --> 00:29:16,705
So you could purify--

464
00:29:20,260 --> 00:29:36,750
you're purifying the mRNA based
on its affinity for a poly dT,

465
00:29:36,750 --> 00:29:37,250
OK?

466
00:29:37,250 --> 00:29:42,170
So then you're going
to have enrichment

467
00:29:42,170 --> 00:29:45,940
of mRNA in your sample.
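The oligo(dT) purification described above can be sketched as a simple filter in Python. This is a toy model, not a lab protocol: RNAs are plain strings written 5' to 3', and the function names, the example sequences, and the 15-nucleotide minimum tail length are illustrative assumptions rather than measured binding requirements.

```python
# Toy model of oligo(dT) bead purification: only RNAs whose 3' end carries
# a poly(A) run long enough to base-pair with the bead-bound poly(dT) are
# retained after washing; everything else is washed away.
MIN_POLY_A = 15  # illustrative threshold, not a measured binding requirement


def poly_a_length(rna: str) -> int:
    """Count consecutive A's at the 3' end of an RNA written 5'->3'."""
    return len(rna) - len(rna.rstrip("A"))


def oligo_dt_capture(rnas: dict[str, str], min_tail: int = MIN_POLY_A) -> dict[str, str]:
    """Return the RNAs that would stay bound to poly(dT) beads after washing."""
    return {name: seq for name, seq in rnas.items() if poly_a_length(seq) >= min_tail}


# Hypothetical "cytoplasmic extract" with one polyadenylated mRNA.
extract = {
    "mRNA_1": "AUGGCUUCAGGA" + "A" * 30,
    "rRNA_fragment": "GGCUACCUGGUUGAUCCUGCC",
    "tRNA_like": "GCGGAUUUAGCUCAGUUGGGAGAGCGCCA",
}
print(sorted(oligo_dt_capture(extract)))  # ['mRNA_1']
```

The short A at the end of the tRNA-like string shows why a minimum tail length matters: a single A is not enough to hold onto the beads through the washes in this model.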

468
00:29:45,940 --> 00:29:51,270
And so then once
you have your RNA,

469
00:29:51,270 --> 00:29:55,150
you're going to want to
somehow go from RNA to DNA, OK?

470
00:29:58,290 --> 00:30:04,710
So the next step will involve
somehow going from RNA to DNA.

471
00:30:04,710 --> 00:30:07,020
So let's draw our
piece of RNA here.

472
00:30:07,020 --> 00:30:08,150
Here's our RNA.

473
00:30:08,150 --> 00:30:10,320
It has a poly A
tail, so it's mRNA.

474
00:30:14,140 --> 00:30:15,095
Here's the 5 prime end.

475
00:30:18,290 --> 00:30:22,840
OK, so now we need to
take advantage of a trick.

476
00:30:22,840 --> 00:30:26,050
We can still take
advantage of dT

477
00:30:26,050 --> 00:30:29,080
because we can use
this as a primer

478
00:30:29,080 --> 00:30:32,310
because polymerase
usually requires

479
00:30:32,310 --> 00:30:38,410
some primer and a three prime
hydroxyl in order to extend.

480
00:30:38,410 --> 00:30:43,450
Now, can we use DNA polymerase
to extend this primer?

481
00:30:43,450 --> 00:30:44,880
Jeremy is shaking his head no.

482
00:30:44,880 --> 00:30:45,380
Why?

483
00:30:45,380 --> 00:30:51,040
AUDIENCE: Because
DNA [INAUDIBLE]

484
00:30:51,040 --> 00:30:52,460
ADAM MARTIN: Exactly.

485
00:30:52,460 --> 00:30:55,400
So what Jeremy is
saying is DNA polymerase

486
00:30:55,400 --> 00:30:59,120
is a DNA-dependent
DNA polymerase, OK?

487
00:30:59,120 --> 00:31:03,680
DNA polymerase can only use
this if this is DNA here, OK?

488
00:31:03,680 --> 00:31:07,460
So we need a different type
of enzyme, essentially,

489
00:31:07,460 --> 00:31:13,640
in order to make DNA
from RNA, and luckily,

490
00:31:13,640 --> 00:31:15,500
molecular biologists--

491
00:31:15,500 --> 00:31:17,960
actually one of whom
was here at MIT--

492
00:31:17,960 --> 00:31:20,540
discovered this type
of enzyme, and it's

493
00:31:20,540 --> 00:31:22,320
called reverse transcriptase.

494
00:31:25,670 --> 00:31:27,170
Reverse transcriptase.

495
00:31:27,170 --> 00:31:32,420
This is an enzyme that's
encoded by retroviruses in order

496
00:31:32,420 --> 00:31:36,440
to make double-stranded
DNA from RNA,

497
00:31:36,440 --> 00:31:40,130
and that allows the retrovirus
to insert into the host genome,

498
00:31:40,130 --> 00:31:42,240
OK?

499
00:31:42,240 --> 00:31:47,150
And reverse transcriptase
is an RNA-dependent DNA

500
00:31:47,150 --> 00:31:48,140
polymerase, OK?

501
00:31:48,140 --> 00:31:51,060
So it takes RNA
as its substrate,

502
00:31:51,060 --> 00:31:57,020
and then it synthesizes DNA
on the opposite strand, OK?

503
00:31:57,020 --> 00:32:05,222
So this is an
RNA-dependent DNA polymerase.

504
00:32:09,380 --> 00:32:12,950
OK, so if you add
reverse transcriptase

505
00:32:12,950 --> 00:32:18,800
to mRNAs that have these dT
primers, then what you get

506
00:32:18,800 --> 00:32:22,310
is a new strand,
which is DNA here.

507
00:32:22,310 --> 00:32:23,570
This is the strand of DNA.

508
00:32:27,330 --> 00:32:33,800
And then you have a strand
of RNA opposite it, OK?

509
00:32:33,800 --> 00:32:39,290
So at this step, you
have a DNA-RNA hybrid.

510
00:32:39,290 --> 00:32:42,860
So this is a DNA-RNA hybrid.

511
00:32:47,550 --> 00:32:49,590
Let's see.

512
00:32:49,590 --> 00:32:51,140
Reveal some more of this.

513
00:32:51,140 --> 00:32:53,030
This is the process
which I'm basically

514
00:32:53,030 --> 00:32:54,230
outlining on the board.
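As a rough illustration of first-strand synthesis, the sketch below treats reverse transcriptase as an RNA-dependent DNA polymerase that copies an mRNA into complementary DNA, starting from an oligo(dT) primer on the poly-A tail. The function name, the 10-A minimum tail, and the example sequence are hypothetical; real enzyme behavior, priming efficiency, and incomplete extension are not modeled.

```python
# Toy model of first-strand cDNA synthesis by reverse transcriptase
# (an RNA-dependent DNA polymerase). Sequences are written 5'->3'.
RNA_TEMPLATE_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}  # U pairs with A


def reverse_transcribe(mrna: str, min_tail: int = 10) -> str:
    """Return the first-strand cDNA (5'->3') copied from an mRNA (5'->3')."""
    # The oligo(dT) primer needs a poly(A) tail to anneal to.
    if not mrna.endswith("A" * min_tail):
        raise ValueError("no poly(A) tail for the oligo(dT) primer to anneal to")
    # The new strand is antiparallel: read the template 3'->5', pairing each base.
    return "".join(RNA_TEMPLATE_TO_DNA[base] for base in reversed(mrna))


mrna = "AUGGCUUCA" + "A" * 12
first_strand = reverse_transcribe(mrna)
print(first_strand)  # begins with the poly(dT) stretch that served as primer
```

At this point the model holds only the DNA half of the DNA-RNA hybrid; the RNA template is still the `mrna` string.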

515
00:32:57,070 --> 00:32:59,860
So then you want
double-stranded DNA,

516
00:32:59,860 --> 00:33:03,070
so you don't want this strand
of RNA that's down here,

517
00:33:03,070 --> 00:33:05,950
so you have to get rid of it.

518
00:33:05,950 --> 00:33:13,270
So you would degrade
the RNA, and this

519
00:33:13,270 --> 00:33:16,540
is done using another
enzymatic activity, which

520
00:33:16,540 --> 00:33:19,570
is derived from reverse
transcriptase, which

521
00:33:19,570 --> 00:33:22,510
is termed RNase H activity.

522
00:33:22,510 --> 00:33:28,990
So you can add an
enzyme, RNase H, which

523
00:33:28,990 --> 00:33:32,530
takes these
DNA-RNA hybrids

524
00:33:32,530 --> 00:33:35,800
and degrades the
RNA part of it, OK?

525
00:33:35,800 --> 00:33:39,910
So this is going to
degrade the RNA strand.

526
00:33:39,910 --> 00:33:42,170
And if you degrade
the RNA strand,

527
00:33:42,170 --> 00:33:44,440
then you're left with
a single strand of DNA.

528
00:33:48,740 --> 00:33:53,820
So you have a single
strand of DNA here,

529
00:33:53,820 --> 00:33:56,540
and now what you need
to do is to synthesize

530
00:33:56,540 --> 00:33:59,690
the second strand of DNA.

531
00:33:59,690 --> 00:34:01,940
So you need a second
strand synthesis.

532
00:34:07,260 --> 00:34:09,960
And so you need, again,
a primer in order

533
00:34:09,960 --> 00:34:13,260
to prime the synthesis here.

534
00:34:13,260 --> 00:34:15,850
So there are a variety
of ways to do this.

535
00:34:15,850 --> 00:34:18,360
You can add some
type of hairpin,

536
00:34:18,360 --> 00:34:23,340
which is five prime here
and three prime here,

537
00:34:23,340 --> 00:34:26,550
and then you can use
either DNA polymerase,

538
00:34:26,550 --> 00:34:31,590
or reverse transcriptase, which
also can be a DNA-dependent DNA

539
00:34:31,590 --> 00:34:35,389
polymerase to transcribe
this strand here, OK?

540
00:34:38,190 --> 00:34:42,270
So again, you add polymerase,
and now you've gone

541
00:34:42,270 --> 00:34:46,980
and you've generated
double-stranded DNA, OK?

542
00:34:46,980 --> 00:34:51,750
So everyone see how we've
gone from an mRNA transcript,

543
00:34:51,750 --> 00:34:55,889
and we've done the reverse of
everything we just told you

544
00:34:55,889 --> 00:35:00,780
in the first half of the course
because we've gone from RNA

545
00:35:00,780 --> 00:35:02,940
and we've made DNA, OK?
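The remaining steps (RNase H removal of the RNA strand, then second-strand synthesis) can be sketched the same way. This toy model, with hypothetical function names, represents the hybrid as an (RNA, DNA) pair and ignores how the second strand is actually primed; its point is that the new second strand has the same sequence as the original mRNA, with T in place of U.

```python
# Toy model of the steps after first-strand synthesis. Strands are strings
# written 5'->3'; the hybrid is an (rna, dna) pair.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def rnase_h(hybrid: tuple[str, str]) -> str:
    """Degrade the RNA strand of an RNA:DNA hybrid, leaving single-stranded DNA."""
    _rna, dna = hybrid
    return dna


def second_strand(first: str) -> str:
    """Copy a DNA strand into its complement (5'->3'), as a DNA-dependent DNA polymerase would."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(first))


hybrid = ("AUGGCUUCAAAAA", "TTTTTGAAGCCAT")  # mRNA paired with its first-strand cDNA
first = rnase_h(hybrid)
ds_cdna = (first, second_strand(first))
print(ds_cdna[1])  # ATGGCTTCAAAAA, the mRNA sequence with T in place of U
```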

546
00:35:02,940 --> 00:35:05,100
But this will be really
useful because now we

547
00:35:05,100 --> 00:35:09,990
have a stable piece of DNA that
we can clone into a plasmid

548
00:35:09,990 --> 00:35:12,690
and we have a record of
this transcript being

549
00:35:12,690 --> 00:35:17,730
present in our sample, and we
can propagate that on and on,

550
00:35:17,730 --> 00:35:20,310
so we've cloned it, OK?

551
00:35:20,310 --> 00:35:23,760
All right, what's going to be
special about this piece of DNA

552
00:35:23,760 --> 00:35:27,050
versus a piece of genomic DNA?

553
00:35:27,050 --> 00:35:27,550
Natalie?

554
00:35:27,550 --> 00:35:29,340
AUDIENCE: [INAUDIBLE]

555
00:35:29,340 --> 00:35:31,220
ADAM MARTIN: Yes, so
Natalie is suggesting

556
00:35:31,220 --> 00:35:34,950
that it doesn't have introns,
and that's totally right.

557
00:35:34,950 --> 00:35:41,880
So this is not like
genomic DNA, and what

558
00:35:41,880 --> 00:35:47,540
Natalie said is because
mRNA is processed,

559
00:35:47,540 --> 00:35:51,810
the introns are spliced out,
such that the mature mRNA only

560
00:35:51,810 --> 00:35:57,150
has the exons, and so this
piece of complementary DNA, or cDNA,

561
00:35:57,150 --> 00:36:00,220
is going to have no introns.
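To make the contrast with genomic DNA concrete, here is a minimal sketch that represents a gene as labeled segments and keeps only the exons, the way the mature mRNA (and hence its cDNA) would. The segment sequences and the `splice` helper are made up for illustration, and untranslated regions and the poly-A tail are left out for simplicity.

```python
# Toy comparison of a genomic gene and its cDNA. Each segment is (kind, sequence).
gene = [
    ("promoter", "TATAAAAG"),      # not transcribed, so absent from mRNA and cDNA
    ("exon", "ATGGCT"),
    ("intron", "GTAAGTCCTTTCAG"),  # transcribed, then spliced out of the mRNA
    ("exon", "TCAGGA"),
]


def splice(segments: list[tuple[str, str]]) -> str:
    """Join the exons, mimicking the mature mRNA that a cDNA is copied from."""
    return "".join(seq for kind, seq in segments if kind == "exon")


genomic = "".join(seq for _kind, seq in gene)
cdna = splice(gene)
print(len(genomic), len(cdna))  # 34 12
```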

562
00:36:02,850 --> 00:36:04,610
How else is it different?

563
00:36:10,260 --> 00:36:10,878
Yeah, Jeremy?

564
00:36:10,878 --> 00:36:12,670
AUDIENCE: It's not
going to have promoters.

565
00:36:12,670 --> 00:36:14,090
ADAM MARTIN: It's not
going to have a promoter.

566
00:36:14,090 --> 00:36:14,770
Yes, Carmen?

567
00:36:14,770 --> 00:36:19,820
AUDIENCE: It doesn't
have [INAUDIBLE]

568
00:36:19,820 --> 00:36:23,000
ADAM MARTIN: You might
see a poly A and T

569
00:36:23,000 --> 00:36:24,860
sequence in the cDNA.

570
00:36:24,860 --> 00:36:26,710
Yes, that's true.

571
00:36:26,710 --> 00:36:29,600
OK, so you might have
poly A, poly T. I'm

572
00:36:29,600 --> 00:36:31,830
going to focus on
the other part from--

573
00:36:34,340 --> 00:36:41,180
there's going to be no promoter,
enhancer, regulatory sequences.

574
00:36:41,180 --> 00:36:45,950
Basically, it's got no sequence
that's not transcribed, right?

575
00:36:45,950 --> 00:36:49,160
The cDNA is only going to have
the part of the gene that

576
00:36:49,160 --> 00:36:54,180
was physically transcribed by
the RNA polymerase originally.

577
00:36:54,180 --> 00:36:59,115
OK, so no
non-transcribed regions.

578
00:37:02,290 --> 00:37:10,550
No non-transcribed regions,
and Carmen's absolutely right.

579
00:37:10,550 --> 00:37:14,110
You will also have possibly
a poly A or poly T sequence.

580
00:37:21,510 --> 00:37:27,360
OK, so when you get these
cDNAs, you might have--

581
00:37:27,360 --> 00:37:29,910
you have more than
one mRNA in a sample

582
00:37:29,910 --> 00:37:34,350
like a cytoplasmic extract,
so you're going to prime--

583
00:37:34,350 --> 00:37:36,510
you're going to
make multiple cDNAs

584
00:37:36,510 --> 00:37:40,500
and different cDNAs will reflect
different transcripts that

585
00:37:40,500 --> 00:37:43,080
are present in your sample, OK?

586
00:37:43,080 --> 00:37:45,630
So you could have
one clone that's

587
00:37:45,630 --> 00:37:49,350
one gene, another clone
that's a different gene,

588
00:37:49,350 --> 00:37:51,450
and another clone
that's another gene,

589
00:37:51,450 --> 00:37:55,440
and you could have thousands of
clones of these different cDNAs.

590
00:37:55,440 --> 00:38:00,630
What's going to be special
about the types of genes

591
00:38:00,630 --> 00:38:03,810
you're going to get from,
I guess, different tissues?

592
00:38:03,810 --> 00:38:08,070
Are they going to
be the same or not?

593
00:38:08,070 --> 00:38:08,730
Yeah, Carlos?

594
00:38:08,730 --> 00:38:11,815
AUDIENCE: [INAUDIBLE]

595
00:38:11,815 --> 00:38:12,690
ADAM MARTIN: Exactly.

596
00:38:12,690 --> 00:38:15,740
You're not going to see--
if you've prepared a tissue

597
00:38:15,740 --> 00:38:18,440
and there is no gene being--

598
00:38:18,440 --> 00:38:22,260
if one gene was not expressed
or transcribed in that tissue,

599
00:38:22,260 --> 00:38:25,350
you will not get a cDNA
for that particular gene

600
00:38:25,350 --> 00:38:27,630
in your library, OK?

601
00:38:27,630 --> 00:38:30,090
So the representation of genes--

602
00:38:33,630 --> 00:38:42,210
the representation of
genes in a cDNA library

603
00:38:42,210 --> 00:38:47,550
is totally dependent on what
genes are being expressed, OK?

604
00:38:47,550 --> 00:38:49,470
So this representation
is going to be

605
00:38:49,470 --> 00:38:56,130
proportional to the expression
level, and the more genes--

606
00:38:56,130 --> 00:38:59,040
the more a gene is
expressed in a given tissue,

607
00:38:59,040 --> 00:39:02,430
the more copies of
cDNA for that gene

608
00:39:02,430 --> 00:39:04,200
you would see in
the library, OK?

609
00:39:04,200 --> 00:39:06,510
So there's really
a proportionality

610
00:39:06,510 --> 00:39:09,480
between the number of
clones in a library

611
00:39:09,480 --> 00:39:12,120
and the expression
level of a gene,

612
00:39:12,120 --> 00:39:15,210
where in the most extreme
case, if this gene is not

613
00:39:15,210 --> 00:39:16,800
expressed at all,
you're not going

614
00:39:16,800 --> 00:39:21,890
to see it represented at
all in the cDNA library, OK?

615
00:39:21,890 --> 00:39:24,330
And then a corollary
to this statement

616
00:39:24,330 --> 00:39:30,540
is that if you make cDNA
libraries from different cell

617
00:39:30,540 --> 00:39:34,680
types or different tissue
types, the cDNA libraries

618
00:39:34,680 --> 00:39:37,020
are going to be different
between those different types

619
00:39:37,020 --> 00:39:39,780
of sources of mRNA, OK?

620
00:39:39,780 --> 00:39:46,860
So in other words,
different tissues

621
00:39:46,860 --> 00:39:47,960
give you different cDNA.
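The proportionality between clone number and expression level can be illustrated with a small simulation, assuming each clone in the library is drawn at random with probability proportional to its transcript's abundance. The tissue names, gene choices, and abundance values below are invented for illustration only.

```python
import random
from collections import Counter


def make_library(expression: dict[str, int], n_clones: int, seed: int = 0) -> Counter:
    """Draw clones with probability proportional to each transcript's abundance."""
    rng = random.Random(seed)  # seeded so the toy example is reproducible
    genes = list(expression)
    weights = [expression[g] for g in genes]
    # A gene with zero abundance has zero weight and is never drawn.
    return Counter(rng.choices(genes, weights=weights, k=n_clones))


# Invented transcript abundances (arbitrary units) in two tissues.
liver = {"albumin": 900, "actin": 100, "rhodopsin": 0}
retina = {"albumin": 0, "actin": 100, "rhodopsin": 900}

print(make_library(liver, 1000))
print(make_library(retina, 1000))
```

The two libraries share the broadly expressed gene but differ in the tissue-specific ones, and a gene that is not expressed in a tissue never appears in that tissue's library.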

622
00:39:59,550 --> 00:40:01,900
OK, so there is the process.

623
00:40:01,900 --> 00:40:03,580
So I went through
most of the slide.

624
00:40:03,580 --> 00:40:04,540
Yes, Miles?

625
00:40:04,540 --> 00:40:08,415
AUDIENCE: Is this a way you can
determine what gene sequences

626
00:40:08,415 --> 00:40:10,720
are expressed in all cells?

627
00:40:10,720 --> 00:40:16,588
Because if certain mRNA strands are
found across all tissue samples,

628
00:40:16,588 --> 00:40:22,870
those are basic cell functions
and expressed in a [INAUDIBLE]

629
00:40:22,870 --> 00:40:24,283
organism?

630
00:40:24,283 --> 00:40:26,200
ADAM MARTIN: So you're
asking, if you grind up

631
00:40:26,200 --> 00:40:28,630
like an entire
organism and if you

632
00:40:28,630 --> 00:40:31,150
get a cDNA from that
library, could you

633
00:40:31,150 --> 00:40:36,370
tell if it's expressed in
all different cell types?

634
00:40:36,370 --> 00:40:39,970
Even if you have one cell
type that expresses a gene,

635
00:40:39,970 --> 00:40:42,530
if you grind up the
entire organism,

636
00:40:42,530 --> 00:40:45,850
then you're going to have some
mRNA that represents that gene.

637
00:40:45,850 --> 00:40:48,820
So I don't think it would
be as effective a measure

638
00:40:48,820 --> 00:40:52,600
to determine the ubiquity of
expression of a given gene,

639
00:40:52,600 --> 00:40:54,430
but in just a minute,
I'm going to give you

640
00:40:54,430 --> 00:40:57,580
a tool that would allow you
to answer the exact question

641
00:40:57,580 --> 00:41:00,220
that you're asking, OK?

642
00:41:00,220 --> 00:41:04,590
Any other questions
about the cDNA library?

643
00:41:04,590 --> 00:41:06,920
OK.

644
00:41:06,920 --> 00:41:10,850
So I just wanted to
come back to this example I

645
00:41:10,850 --> 00:41:15,000
gave on the identification
of the human CDK gene.

646
00:41:15,000 --> 00:41:20,150
So remember, we started
with yeast that were mutant.

647
00:41:20,150 --> 00:41:22,430
They were
temperature-sensitive mutants,

648
00:41:22,430 --> 00:41:25,040
and we transformed these
mutants with a library,

649
00:41:25,040 --> 00:41:27,140
but I didn't really tell
you what the library was.

650
00:41:27,140 --> 00:41:30,260
It was in fact the cDNA
library from humans that

651
00:41:30,260 --> 00:41:32,740
was transformed into yeast, OK?

652
00:41:32,740 --> 00:41:34,460
And that's because yeast genes--

653
00:41:34,460 --> 00:41:38,150
for the most part, they don't
have a lot of introns, and so

654
00:41:38,150 --> 00:41:39,245
the yeast--

655
00:41:39,245 --> 00:41:41,840
the machinery is not
able to splice out

656
00:41:41,840 --> 00:41:45,290
the human introns
in human genes, OK?

657
00:41:45,290 --> 00:41:47,810
And so this was done
with a human cDNA

658
00:41:47,810 --> 00:41:50,300
library, which then encoded--

659
00:41:50,300 --> 00:41:54,080
one of which encoded the
human CDK gene, and that

660
00:41:54,080 --> 00:41:57,920
allowed Paul Nurse to
discover the piece of DNA that

661
00:41:57,920 --> 00:42:01,190
encoded for the human CDK, OK?

662
00:42:01,190 --> 00:42:03,440
So I just wanted to
kind of retroactively

663
00:42:03,440 --> 00:42:06,440
go back and sort of tell you
how that experiment was done.

664
00:42:09,870 --> 00:42:13,660
OK, so now I'm going to
get to my final point

665
00:42:13,660 --> 00:42:17,980
for this lecture, which is
this final technique, which

666
00:42:17,980 --> 00:42:21,850
will allow us to determine
whether or not a transcript is

667
00:42:21,850 --> 00:42:24,880
expressed in a single
cell type or ubiquitously

668
00:42:24,880 --> 00:42:29,260
throughout an organism, and this
involves a technique, which

669
00:42:29,260 --> 00:42:30,780
is known as hybridization.

670
00:42:35,870 --> 00:42:38,920
And the idea of hybridization
is that if you're

671
00:42:38,920 --> 00:42:41,830
starting with a
piece of DNA, you

672
00:42:41,830 --> 00:42:44,170
don't need to know
its sequence in order

673
00:42:44,170 --> 00:42:46,690
to determine whether
there are sequences that

674
00:42:46,690 --> 00:42:51,220
are similar or identical to
it, because hybridization

675
00:42:51,220 --> 00:42:57,160
is basically if you
have some sequence

676
00:42:57,160 --> 00:43:00,310
and it's single-stranded,
such that you have a DNA

677
00:43:00,310 --> 00:43:03,010
backbone but you
have bases that

678
00:43:03,010 --> 00:43:06,790
are able to pair with
their complementary bases

679
00:43:06,790 --> 00:43:10,840
and you can use a piece of
single-stranded DNA like this

680
00:43:10,840 --> 00:43:15,880
and you can label it such that
if the labeled piece sticks

681
00:43:15,880 --> 00:43:19,960
to another piece that has
identical or similar sequence,

682
00:43:19,960 --> 00:43:23,830
you'll be able to visualize
it in some way, OK?

683
00:43:23,830 --> 00:43:25,360
So this is called--

684
00:43:25,360 --> 00:43:32,770
you're looking for things
that anneal or hybridize

685
00:43:32,770 --> 00:43:37,360
to a particular
specific sequence.

686
00:43:40,270 --> 00:43:43,420
So you don't need to know
the sequence a priori, OK?

687
00:43:43,420 --> 00:43:47,020
You just need to have this
physical piece of DNA,

688
00:43:47,020 --> 00:43:50,350
and you can use this
single-stranded piece of DNA

689
00:43:50,350 --> 00:43:55,510
to then fish for
similar sequences, OK?

690
00:43:55,510 --> 00:43:59,710
So we could take a piece of DNA
here maybe that's in a gene,

691
00:43:59,710 --> 00:44:02,200
and we could fish
through a cDNA library

692
00:44:02,200 --> 00:44:07,600
to try to identify a cDNA clone
that has sequence identity

693
00:44:07,600 --> 00:44:10,480
to that piece of DNA, OK?

694
00:44:10,480 --> 00:44:13,960
And the way this is done
is to take a cDNA library.

695
00:44:13,960 --> 00:44:18,910
So each of these colonies
here would express or have

696
00:44:18,910 --> 00:44:21,910
a different clone of DNA.

697
00:44:21,910 --> 00:44:24,910
You can then take a
nitrocellulose filter, put it

698
00:44:24,910 --> 00:44:28,450
on this plate, which would
stick the bacteria

699
00:44:28,450 --> 00:44:32,470
to that filter, and you
could then lyse the bacteria

700
00:44:32,470 --> 00:44:36,100
and denature the DNA, and then
the DNA is stuck to the filter,

701
00:44:36,100 --> 00:44:39,610
but now it's single-stranded.

702
00:44:39,610 --> 00:44:42,400
You can then add your
probe, which is labeled,

703
00:44:42,400 --> 00:44:46,000
and look for the colonies
that this probe sticks to,

704
00:44:46,000 --> 00:44:49,180
and that would then identify
a particular cDNA, which

705
00:44:49,180 --> 00:44:52,300
would identify whether
or not a piece of DNA

706
00:44:52,300 --> 00:44:55,630
is expressed in a
given tissue type, OK?
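The colony screen can be sketched as a string search: a labeled single-stranded probe "sticks" to a denatured clone if one strand of that clone contains the probe's complementary sequence. This sketch assumes perfect base pairing only (real hybridization can tolerate some mismatches depending on stringency), and the colony names, sequences, and helper functions are hypothetical.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of a DNA sequence, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq))


def hybridizes(probe: str, clone: str) -> bool:
    """True if the probe can base-pair perfectly with either strand of a clone."""
    # After denaturation both strands of the clone are on the filter; the probe
    # pairs with the top strand if its reverse complement appears there, and with
    # the bottom strand if the probe sequence itself appears in the top strand.
    return reverse_complement(probe) in clone or probe in clone


def screen(colonies: dict[str, str], probe: str) -> list[str]:
    """Return the colonies whose DNA the labeled probe would light up."""
    return [name for name, clone in colonies.items() if hybridizes(probe, clone)]


# Hypothetical colonies, each holding one cDNA clone (top strand, 5'->3').
colonies = {
    "colony_1": "ATGGCTTCAGGA",
    "colony_2": "ATGCCCGGGTTT",
    "colony_3": "TTTCCTGAAGCC",
}
probe = "GCTTCAG"
print(screen(colonies, probe))  # ['colony_1', 'colony_3']
```

Note that the probe sequence itself never has to be read out: the screen only asks which clones pair with it, which is the sense in which you don't need to know the sequence a priori.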

707
00:44:55,630 --> 00:44:59,390
So everyone see how
that would work?

708
00:44:59,390 --> 00:45:02,680
So in addition to doing this
on a nitrocellulose filter,

709
00:45:02,680 --> 00:45:06,310
you can also do
this in a tissue,

710
00:45:06,310 --> 00:45:08,860
and that's known as
in situ hybridization.

711
00:45:12,670 --> 00:45:15,820
And in this case, in
situ hybridization,

712
00:45:15,820 --> 00:45:19,150
you're searching for mRNA in
a section of fixed tissue.

713
00:45:30,310 --> 00:45:33,550
OK, and I have an
example from this paper

714
00:45:33,550 --> 00:45:37,000
here, which is the
paper where this gene was cloned.

715
00:45:37,000 --> 00:45:40,420
This paper described the
cloning of the aniridia gene,

716
00:45:40,420 --> 00:45:44,800
and they identified a gene of
interest, which is called Pax6

717
00:45:44,800 --> 00:45:48,400
now, and they basically
used a piece of DNA

718
00:45:48,400 --> 00:45:50,620
that they thought
was interesting,

719
00:45:50,620 --> 00:45:55,180
and they did in situ
hybridization in an organism,

720
00:45:55,180 --> 00:45:56,620
in this case, you see an eye.

721
00:45:56,620 --> 00:45:59,980
This is an eye here,
and Pax6

722
00:45:59,980 --> 00:46:01,810
is labeled in
yellow, and you can

723
00:46:01,810 --> 00:46:04,390
see how this transcript
is present throughout

724
00:46:04,390 --> 00:46:06,430
the entire eye, right?

725
00:46:06,430 --> 00:46:08,840
And the way you would see if
it's tissue specific is you

726
00:46:08,840 --> 00:46:12,820
look in other tissues and you
wouldn't see this yellow label.

727
00:46:12,820 --> 00:46:15,010
So that's how you
would determine

728
00:46:15,010 --> 00:46:18,580
if it's expressed in a
specific tissue or ubiquitously

729
00:46:18,580 --> 00:46:19,540
throughout an organism.

730
00:46:22,400 --> 00:46:24,760
OK, so this Pax6 gene.

731
00:46:24,760 --> 00:46:25,323
Oop.

732
00:46:25,323 --> 00:46:26,990
So I was going to
ask, what do you think

733
00:46:26,990 --> 00:46:32,130
would happen if you
hyperactivate Pax6 in humans,

734
00:46:32,130 --> 00:46:39,380
and this is one idea, but
actually, I just made that up,

735
00:46:39,380 --> 00:46:44,750
or Stan Lee made that up,
but actually, Stan Lee never

736
00:46:44,750 --> 00:46:48,920
in fact mentioned whether or
not Cyclops is a Pax6 mutant,

737
00:46:48,920 --> 00:46:53,810
but we can do a different type
of experiment, which might be

738
00:46:53,810 --> 00:46:58,100
more ethical, which is we
know there's a fly gene that's

739
00:46:58,100 --> 00:47:00,020
homologous to Pax6.

740
00:47:00,020 --> 00:47:04,310
And what we can do in
flies is we can ectopically

741
00:47:04,310 --> 00:47:07,910
express this eyeless
gene in non-eye tissues

742
00:47:07,910 --> 00:47:09,890
and see what happens.

743
00:47:09,890 --> 00:47:11,510
OK, so this is pretty wild.

744
00:47:11,510 --> 00:47:15,990
This is my Halloween
image of the class.

745
00:47:15,990 --> 00:47:18,410
So this is a fly
where eyeless has been

746
00:47:18,410 --> 00:47:21,190
expressed all over its body.

747
00:47:21,190 --> 00:47:22,940
OK, so here you see
there's an eye--

748
00:47:22,940 --> 00:47:25,010
its normal eye-- here.

749
00:47:25,010 --> 00:47:26,660
You can see there's
now another eye

750
00:47:26,660 --> 00:47:29,040
growing in the
front of its head.

751
00:47:29,040 --> 00:47:32,150
You can see here's an eye
growing on this fly's back,

752
00:47:32,150 --> 00:47:34,520
and you can see the legs.

753
00:47:34,520 --> 00:47:38,330
There's eye tissue all over
the legs of this fly, OK?

754
00:47:38,330 --> 00:47:43,130
So this Pax6 gene, which is
conserved from flies to humans,

755
00:47:43,130 --> 00:47:46,430
is the master regulator
of eye development, OK?

756
00:47:46,430 --> 00:47:48,640
And at least in flies,
if you ectopically

757
00:47:48,640 --> 00:47:53,160
express this in other parts
of the body, you get an eye.

758
00:47:53,160 --> 00:47:55,360
I should say these are
not functional.

759
00:47:55,360 --> 00:47:57,650
They don't hook up to
the brain the same way

760
00:47:57,650 --> 00:47:59,510
the normal eye does.

761
00:47:59,510 --> 00:48:01,760
So it's not like
this fly can see out

762
00:48:01,760 --> 00:48:04,460
of the back of its head.

763
00:48:04,460 --> 00:48:06,290
OK, that's it.

764
00:48:06,290 --> 00:48:09,590
I'm done, and good luck
on your exam on Wednesday.

765
00:48:09,590 --> 00:48:11,770
We will see you here.