CNVs vs. SNPs: Understanding Human Structural Variation in Disease
[0:00:00] Sean Sanders: Hello and welcome to today’s Science/AAAS live webinar entitled, “CNVs
and SNPs: Understanding Human Structural Variation in Disease”. My name is Sean Sanders and I’m the commercial editor at Science Magazine.
Slide 1 Exhaustive analysis of human single nucleotide polymorphisms or SNPs
has led to the identification of interesting genetic markers for certain disorders. But these small changes are not the whole picture. Copy number variations or CNVs, which are the gain or loss of segments of genomic DNA relative to a reference, have also been shown to be associated with several complex and common disorders. Using array‐based comparative genomic hybridization techniques, CNVs at multiple loci can be assessed simultaneously allowing for their identification and characterization. CNV microarrays allow exploration of the genome for sources of variability beyond SNPs that could explain the strong genetic component of several of these disorders. Now, advances in microarray probe density have provided more comprehensive coverage of CNVs, enabling more in‐depth genotyping research.
Joining me in the studio today is an exceptional panel of thought leaders
to discuss this subject. To my left, I have Dr. Charles Lee from Harvard Medical School in Boston, Massachusetts. Next to him is Dr. Lars Feuk from Hospital for Sick Children in Toronto, Canada. And finally, we have Dr. Alex Blakemore joining us all the way from Imperial College London in the U.K. Welcome to you all.
Before we begin, a quick reminder, as usual, that if during the
presentations, you wish to see an enlarged version of any of the slides, you can simply click the enlarge slides button located just underneath the slide window of your web console. If you like, you can download a PDF copy of all the slides using the download slides button.
To submit a question to the panel or to a particular speaker, just type it
into the ask‐a‐question box in the bottom left of your viewing console below the video screen and then click submit. I’ll do my best to get to as many of the questions as possible. As always, keeping them short and to the point will give you the best chance of having them asked.
Finally, I’d like to thank Agilent Technologies for their sponsorship of today’s webinar.
Slide 2 Now on to our first speaker. Dr. Charles Lee received his doctoral degree
from the University of Alberta, Canada and completed his Clinical Cytogenetics fellowship at Harvard Medical School. Dr. Lee is currently the director of Cytogenetics for the Harvard Cancer Center, Associate Professor in Pathology at Harvard Medical School, and an Associate Member of the Massachusetts Institute of Technology Broad Institute. In 2008, Dr. Lee became the youngest recipient of the Ho‐Am Prize in Medicine also referred to as the "Korean Nobel Prize" for his pioneering work in this field.
Welcome, Dr. Lee.
Dr. Charles Lee: Thank you, Sean.
Slide 3 So, in 2007, Science Magazine announced its breakthrough of the year as
human genetic variation. Part of the reason for this was the fact that there were a lot of associations being made between these single nucleotide polymorphisms or SNPs with human diseases.
Slide 4 Currently, it’s actually estimated that with respect to SNPs, there are 15
million sites in the human genome where these SNPs can occur. As of the end of 2007, approximately 3.8 million SNPs were catalogued in the HapMap populations of about 270 individuals from four different populations in the world. And it’s currently thought that each person harbors anywhere from 3 to as many as 5 million common SNPs, with about 30 new SNPs arising per generation.
Slide 5 But I think in addition to SNPs, part of the reason why it was announced
that human genetic variation was the breakthrough of the year was the fact that the scientific community was appreciating that there was a lot more genetic variation out there than we had once appreciated. And more specifically, a new form of genetic variation, which we now refer to as structural genomic variation, where you have inversions, insertions, deletions, copy number variation occurring. All these types of variations in addition to SNPs were being uncovered.
Slide 6 I believe that appreciation for the existence of widespread structural
genomic variation in the human genome stemmed from these two initial studies in 2004. Both studies essentially used array‐based comparative hybridization technologies to look at the genomes of healthy individuals.
Slide 7
And shown here is an example of a profile by array CGH. If humans had primarily SNPs as their genetic variation, then when you do array‐based comparative genomic hybridization, you’d expect to see a profile like the one shown here.
Slide 8 However, both groups that did this study found that when they looked at
the genomes of healthy individuals, they actually found profiles that looked like this, where the green gains and the red losses were found on every chromosome, with multiples of these throughout the whole genome. So this basically showed that this was a new type of genetic variation that really was widespread throughout the human genome.
[0:05:22] Slide 9 When we refer to structural genomic variation, some people use the term broadly
for any variants that are essentially non‐SNPs. But among those structural variants that we are currently aware of and have learned about to date, copy number variants or CNVs are probably the largest fraction of this class of variants.
Slide 10 Summarizing what we know so far about copy number variants,
estimates are that 5% to 25% of the human reference genome is copy number variant. In a study in 2006, looking at the same 270 HapMap individuals where SNP data was collected, it was revealed that at least 1447 CNVs could be catalogued, representing about 360 million bases of reference DNA sequence or 12% of the genome.
Now based on unpublished data, it’s estimated that each person could
have approximately 1500 common CNVs in their genomes. And if we believe that the average size of these common CNVs is about 20 kilobases, that equates to about 30 million bases of CNVs that differentiate one person from another.
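The figures quoted above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope illustration: the genome size of roughly 3 billion bases is an assumption that is not stated in the talk, and the function names are hypothetical.

```python
# Back-of-the-envelope check of the CNV figures quoted in the talk.
# Assumption (not from the talk): a haploid reference genome of ~3 billion bases.
GENOME_SIZE_BP = 3_000_000_000


def cnv_burden_bp(n_cnvs: int, mean_size_bp: int) -> int:
    """Total bases covered by n_cnvs variants of a given average size."""
    return n_cnvs * mean_size_bp


def fraction_of_genome(bases: int, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Share of the (assumed) reference genome covered by `bases`."""
    return bases / genome_size_bp


# ~1500 common CNVs per person at ~20 kilobases each
per_person = cnv_burden_bp(1500, 20_000)
print(per_person)  # 30000000, i.e. about 30 million bases

# 360 million bases of CNV sequence catalogued in the 2006 HapMap study
print(round(fraction_of_genome(360_000_000) * 100))  # 12 (percent)
```

Under that assumed genome size, both numbers agree with the ones given in the talk.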
And because the research in this area is still ongoing and this field is still
fairly new, we really don’t have any data on what the mutation rate is for specific CNVs. Therefore, it’s hard to estimate what is the number of new CNVs that arise per individual per generation.
Slide 11 Now, with respect to the genomic impact of CNVs, some of the things
that we do know is that, CNVs do appear to be preferentially located outside of genes and outside of ultra‐conserved elements in the human genome. And those CNVs that do overlap genes when we do gene ontology analysis, they do seem to show an enrichment for what we would now refer to as environmental interaction genes. These are genes
such as genes involved in sensory perception, neurophysiological processes, drug detoxification, immune response, and cell surface antigens, and integrity genes.
Slide 12 And this slide basically shows just how some of these CNVs can affect
gene expression levels, especially when the CNVs lie within or involve genes in the human genome. But it’s important to understand also that CNVs, as I mentioned earlier, can also be found outside of genes. And if they overlap regulatory elements, they can also affect expression levels of genes that they’re associated with. And in fact, some recent studies have shown that a CNV that is outside of a gene can affect the gene’s expression level that’s 2 million bases away or more.
Slide 13‐Slide 15 In forming these copy number variants, we know that there are at least
two major mechanisms that can form CNVs. One is shown here called non allelic homologous recombination or NAHR where essentially homologous DNA segments that are on the same chromosomes misalign. And then you have a crossover event occurring, which essentially results in duplications or deletions of DNA segments that are between those homologous segments.
Slide 16‐Slide 18 A second mechanism for producing CNVs is referred to as non
homologous end joining or NHEJ, which is a repair mechanism whereby when you have double‐strand breaks occurring in the chromosomes during the repair mechanism ‐‐ the mechanism to repair these double‐strand breaks. If the repair doesn’t occur correctly, you can essentially get DNA gains and losses near those breakpoints.
And more recently, it’s been put forward that
DNA replication based mechanisms could also lead to copy number variant formation, such as the FoSTeS (fork stalling and template switching) mechanism that was recently put forward by Jim Lupski’s group at Baylor College of Medicine.
Slide 19 It’s amazing to see that in the last four years, there have already been about
a dozen associations made between specific copy number variants and human diseases. One, which I’ll show here, is the β‐defensin genes and their association with susceptibility to both Crohn’s disease and psoriasis.
[0:10:03] So, β‐defensins are small secreted peptides that are coded for by
the DEFB genes. The β‐defensin genes are copy number variable, with copy numbers ranging from 2 to 12 per cell among Europeans studied, with a modal number of 4, not 2.
Slide 20
So, individuals that have less than 4 copies of β‐defensins, they end up having a breakdown of the anti‐bacterial barrier in the intestinal wall leading to the inflammation that’s associated with Crohn’s. So, lower copy number of β‐defensin leads to increased susceptibility to Crohn’s disease.
Slide 21 So, you would think that having more copies of this would be a beneficial
thing. But, in fact, when you have more copies of β‐defensin, greater than 4 copies, it actually elicits an inflammatory response when you have minor skin injury that increases cytokine, EGF‐R and STAT signaling pathways leading to inflammation. So, basically, too little of β‐defensin is not a good thing and too much of β‐defensin is also not necessarily a good thing as well.
Slide 22 And we’re also finding that copy number variants research is giving us
insights into how our genomes are changing with respect to human behavior and our ever‐changing environment. One study shown here was where we looked at the copy number of the amylase gene, AMY1, which produces the amylase enzyme that is used for starch digestion. And when we looked at over 200 individuals, we found that in fact, those individuals that are from populations with traditionally high‐starch diets tend to have more AMY1 copies than those from populations with low‐starch diets.
Slide 23 And when you look even more closely at the AMY1 CNV locus, using fiber‐
FISH techniques, you can actually appreciate that in the left‐hand panel, there is a Japanese individual with 14 diploid copies of AMY1, 10 on one chromosome and 4 on the other. And on the right‐hand side, there is a Biaka pygmy with 6 diploid copies, 3 of the AMY1 gene copies on each chromosome. But in addition to this copy number variability of AMY1, you can appreciate that there is structural variation on top of structural variation at this locus, because you have inversions of some of these AMY1 genes with respect to one another.
Slide 24 So, in order to detect CNVs in a genome‐wide manner, there are a
number of platforms and several companies that are providing array‐based methodologies to pick up CNVs. Some of them are shown here on this slide.
Slide 25 But we’re also beginning to realize how fast the cost of whole genome
sequencing is decreasing. In fact, when we obtained the first human genome reference sequence, that cost us about $500M. But currently, using next generation sequencing methods, we’re estimating that it’s about $100,000 to $150,000 to sequence a whole genome.
Slide 26 And as a result of that, what we’re finding is that due to this low cost of
DNA sequencing, this has prompted the launch of international efforts, such as the 1000 Genomes Project. Starting January of this year, for a three‐year period, we are whole genome sequencing 1000 individuals to identify both SNPs and copy number variants as well as other structural variation, to understand the total extent of genetic variation in the human genome.
Slide 27 And so, I guess, in conclusion, this understanding of structural variation is
really having broad implications to many areas of biomedical research including cancer research, clinical genetic testing, disease association studies, pharmacogenomic‐based studies, and also human population genetics and evolution.
Slide 28 I guess, our take‐home message based on the last four years of research
is that our previous notion that two normal individuals share about 99.9% genetic similarity really has to be reevaluated.
Slide 29 And based on what we know on structural variants now, this number is
probably more in the order of 99.5% genetic similarity between two normal individuals.
Slide 30 Slide 31 So, I would just like to thank the Structural Genomic Variation
Consortium for some of the unpublished data that I’m presenting here today, as well as others that have helped with this research. Thank you very much.
[0:15:04] Sean Sanders: Great. Thanks very much, Dr. Lee. Excellent introduction to this subject.
Slide 32 We’re going to move right on to our second speaker today and that’s Dr.
Lars Feuk. Dr. Feuk completed his Ph.D. in functional genomics at the Center for Genomics and Bioinformatics at the Karolinska Institute in Stockholm, Sweden. His postdoctoral training was carried out in Dr. Stephen Scherer’s lab at the Hospital for Sick Children in Toronto, Canada. And the current focus of Dr. Feuk’s work is understanding the full spectrum of human genetic variation, particularly as it relates to copy number variation. Since 2004, Dr. Feuk has also been the curator of the Database of Genomic Variants, the most widely used database for information on structural genetic variation. Last year, Dr. Feuk was featured in Genome Technology magazine as one of “Tomorrow’s PIs”,
and in 2008 was selected as one of twenty “Future Research Leaders” by the Swedish Foundation for Strategic Research.
Dr. Feuk, welcome.
Dr. Lars Feuk: Thank you very much. It’s great to be here.
Slide 33 I’d like to give an overview of the current state of the field of copy
number variation. And talk about some of the challenges that we’re facing in describing this type of variation in the human genome.
Slide 34 So, I’m going to start with a historical overview slide. So, if we look at the
type of variation that has been discovered in the human genome, we really started with the big thing. So, to the left on this spectrum, we see the things that we detect cytogenetically; chromosomal rearrangements, large deletions and duplications that are maybe over 5 megabases or so in size.
After detecting this initial variation, we actually moved straight down to
the nucleotide level. Because sequencing was discovered, PCR was discovered and we started looking at a very, very detailed level and could describe point mutations and so on. But in doing this, there was a whole midrange of variation where we simply didn’t look. And it was not until the technologies of microarrays and array CGH were developed that we could start assessing variation in the size range of 1 kb up to 3 to 5 megabases.
Slide 35 So, as Charles Lee mentioned, it was not until 2004 that the first studies
were published where we could describe that these types of variations, copy number variants, are a common feature in the human genome. So, it’s a very, very young field compared even to the SNP field. And it’s important to keep that in mind when we go towards disease that this is a young field where we still have a lot of challenges to face.
Slide 36 So, looking at the increasing copy number variation data since the initial
papers came out in 2004, at that stage, there were only a handful of regions known in the human genome to be copy number variable. Since then, there’s been an explosion of data and studies published. And to date, in the Database of Genomic Variants, which is the major database describing this type of variation, we have approximately 17,000 copy number variants described.
So, if you’re interested in this type of variation, I recommend you take
a look at the website of the Database of Genomic Variants. You see the URL
down to the left here. This is a catalogue of all the published studies describing copy number variation in the human genome. And we are currently describing variants that are a hundred base pairs in size and larger. And at present, there are approximately 29,000 entries and this also includes inversions.
Inversions are less common than copy number variants in the human
genome. To date, there have only been around 500 inversions described to be polymorphic, but copy number variants are very common.
So, if you look at the right side of this slide, just an overview of the
human chromosomes and all the copy number variants described to date. So, each blue bar that you see here is a copy number variant. And as you see from this slide, the copy number variants are distributed all across the genome with a slight bias towards centromeric and telomeric sequences.
On the left side of the chromosomes, you see green bars. These green
bars represent what we call segmental duplications. These are regions in the genome that are present in more than one copy. And there’s a very strong correlation between copy number variation and segmental duplications. And that’s important to keep in mind too when you look at this type of variation.
Slide 37 So, what are the major challenges as I see it? With the current maps we
have, they’re still very, very crude because the technology is still limited on what we can do. So, we still have very little information about variation in the 100 bp to 20 kb range because the resolution of the current high‐resolution technology is still not enough to accurately detect this level of variation.
Another thing is that microarray data tends to be slightly noisy meaning
that sometimes the probe would be included in the copy number variant and sometimes it would not when you score it. Therefore, the boundary information for the current variants that are described in databases is not very accurate.
[0:19:54] So, to the right on this slide, I show an example of a variant that was
found by a BAC array. And you see that even though it’s the same variant in all of these individuals, the algorithms and the approaches used are calling very different boundaries with different start and endpoints for these variants. So, it’s important to keep in mind when you see a variant, it’s an approximation of where a variant exists in the genome and not the exact start and endpoint of the variant you’re looking at.
Another challenge that we currently have is that
frequency estimates are currently insufficient. There are a lot of false‐negatives in the current studies. So, things that are there but are not called, and at the same time, the data is slightly noisy and we have some false‐positives also.
So, these are challenges that we need to deal with moving ahead.
Currently, we cannot really score CNV genotypes commonly for duplications, for example. We just say that there’s a CNV gain at this locus, but we cannot properly differentiate whether it’s three copies, four copies, five copies, or six copies due to the noise in the data.
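One way to see why higher copy numbers are hard to tell apart is to look at the signal an idealized array would give. The sketch below assumes a simple model that is not described in the talk: a two-sample comparison against a diploid reference, where the expected signal is log2(copies / 2), with no noise, dye bias, or ratio compression. The function names are hypothetical.

```python
import math


def expected_log2_ratio(copies: int, reference_copies: int = 2) -> float:
    """Idealized log2 ratio of test vs. reference signal for a given copy number."""
    return math.log2(copies / reference_copies)


def step_between(copies: int) -> float:
    """Extra signal gained by going from `copies` to `copies + 1`."""
    return expected_log2_ratio(copies + 1) - expected_log2_ratio(copies)


# Expected ratios for 1 to 6 copies, and the step up to the next copy number.
for cn in range(1, 7):
    print(cn, round(expected_log2_ratio(cn), 3), round(step_between(cn), 3))
```

In this idealized model the step from 2 to 3 copies is about 0.585 log2 units, while the step from 5 to 6 copies is only about 0.263, so the same amount of noise blurs neighboring copy numbers more and more as the copy number goes up.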
Another part of the genome that we’re not very good at describing
currently is the really complex regions. And by complex regions, I mean regions that are highly duplicated in the genome. Most of the commercial arrays do not have good coverage in these regions and if they do, the data is very complex to interpret such that the analysis tools that are out there will basically fall apart in this region. So, I think, this is an area where we need to do a lot more work also.
Another challenge, again, many different technologies that are out there
give different results. If you would describe the same DNA using two different high resolution platforms, generally today you may get something between 50% and 70% overlap of the CNVs that you’re calling. So, it’s important to keep in mind if you’re trying to compare the data in your lab to the neighboring lab’s data that had used a different platform, that you can’t do a one‐to‐one comparison there because they’re describing very, very different things.
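A rough sketch of how such an overlap figure might be computed is shown here. The talk does not say how overlap was measured, so the 50% reciprocal overlap rule used below is an assumption (it is a common convention for comparing CNV calls), and the call coordinates, type alias, and function names are made up for illustration.

```python
from typing import List, Tuple

Cnv = Tuple[str, int, int]  # (chromosome, start, end), end exclusive


def reciprocal_overlap(a: Cnv, b: Cnv) -> float:
    """Smaller of the two fractions of each call covered by the shared region."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def concordance(calls_a: List[Cnv], calls_b: List[Cnv], min_overlap: float = 0.5) -> float:
    """Fraction of calls in A that have a match in B at the chosen threshold."""
    if not calls_a:
        return 0.0
    matched = sum(
        1
        for a in calls_a
        if any(reciprocal_overlap(a, b) >= min_overlap for b in calls_b)
    )
    return matched / len(calls_a)


# Hypothetical calls for the same DNA sample on two platforms.
platform_a = [("chr1", 100_000, 140_000), ("chr2", 500_000, 520_000), ("chr7", 10_000, 60_000)]
platform_b = [("chr1", 105_000, 150_000), ("chr7", 55_000, 90_000)]
print(concordance(platform_a, platform_b))  # one of three calls matches
```

Note that the result depends on the direction of the comparison and on the threshold, which is one more reason a one‐to‐one comparison between platforms is not straightforward.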
Finally, with the little picture down to the right here, I just want to show a
problem with interpreting the data on what is actually one CNV. So, in the database, for example, we have a lot of situations where from one platform, we would call two different CNVs 40 kb in size with some distance between them. You then use a different SNP platform with maybe a different probe coverage in the region and you call this as one contiguous 120 kb variant. Then the question is what actually is the true underlying variation in this region. And currently, without using high resolution methods, it’s very difficult to answer that question.
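The ambiguity described here can be illustrated with a small sketch of a gap‐based merging rule. This is not the method used by any particular platform or by the database; it is a hypothetical rule with made‐up coordinates, showing that the same two 40 kb calls become either two variants or one 120 kb variant depending on a single parameter.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on one chromosome


def merge_nearby(calls: List[Interval], max_gap: int) -> List[Interval]:
    """Merge calls whose gap to the previous call is at most max_gap bases."""
    merged: List[Interval] = []
    for start, end in sorted(calls):
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Two 40 kb calls separated by a 40 kb gap (hypothetical coordinates).
fragmented = [(1_000_000, 1_040_000), (1_080_000, 1_120_000)]

print(merge_nearby(fragmented, max_gap=10_000))  # stays as two 40 kb calls
print(merge_nearby(fragmented, max_gap=50_000))  # becomes one 120 kb call
```

Neither output is more correct than the other from the array data alone, which is why higher resolution methods are needed to settle what the underlying variant really is.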
Slide 38 However, it is important to point out that the resolution is getting better
and better. So, initially, we used BAC array CGH to describe these types of variants, and the variants described were just the start and endpoint of
a BAC clone. And BAC clones are very large, they’re 150 to 250 kb in size meaning that we often overestimated the size of the actual variant.
So, showing this example here from the database, you see the top orange
bar representing a finding with a BAC array indicating a very, very large region. But then, subsequent studies using higher and higher resolution methods ending down at the bottom with the Levy et al., variant, which is actually a sequencing based approach, you see how the resolution is increasing until we actually get nucleotide resolution data when we can use a sequencing approach to describe the variant.
Slide 39 So, this is just an overview of how the field has advanced historically.
Initially, as I said in the initial papers in 2004, the collaboration between Charles Lee’s lab and our lab, we used a BAC array that contained 3000 BAC clones. With this, we found between 1 and 12 CNVs per genome.
The field then evolved to use tiling path BAC arrays that have
approximately 30,000 clones on an array, and you could describe maybe between 10 and 70 CNVs per genome. And the ranges here are quite large because depending on what analysis tool you use and what thresholds you use with those analysis tools, you will get very different results.
With the 500K SNP array, you might get something between 2 and 25
CNVs per genome. And with the currently highest resolution SNP arrays, the Illumina 1M and Affymetrix 6.0, you can get something in the order of 30 to 150 CNVs per genome depending again on how you analyze the data.
Again, it’s also important to keep in mind that results will differ
depending on input quality of the DNA. So, if you have noisier data, you’re forced to set stricter thresholds or stricter parameters in your analysis. And then you may call fewer CNVs and it makes it difficult to compare between studies.
There are also many different analysis tools for the same platform. For
example, if you’re on Affy 6, there are probably four or five different analysis tools you can use, and each of those again give different results. So, it’s important to keep these things in mind.
Slide 40 So, what are we doing in our lab? So, in a collaboration with Charles Lee
and the Sanger Institute, we are trying to create a really high resolution map of copy number variation in the human genome. And to do this, we’ve created a custom set of NimbleGen arrays. This is a set of 20
arrays with 2.1 million probes on each totaling 42 million probes across the genome.
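To put the probe count in perspective, the sketch below works out the total and a rough average spacing. Two assumptions here are not from the talk: a genome of about 3 billion bases, and probes spread evenly across it. Real array designs skip repeats and vary in density, so the spacing is only an order‐of‐magnitude figure.

```python
# Probe totals for the custom array set described in the talk.
ARRAYS_IN_SET = 20
PROBES_PER_ARRAY = 2_100_000

# Assumptions (not from the talk): ~3 Gb genome, evenly spaced probes.
GENOME_SIZE_BP = 3_000_000_000

total_probes = ARRAYS_IN_SET * PROBES_PER_ARRAY
mean_spacing_bp = GENOME_SIZE_BP / total_probes

print(total_probes)            # 42000000
print(round(mean_spacing_bp))  # 71, under the even-spacing assumption
```

A spacing of that order means that a 1 kb variant would be covered by several probes, which is consistent with the stated goal of capturing common variants over 1 kb in size.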
[0:25:14] Slide 41 So, just as a comparison here, I show a screen grab where we have the
density of probes on our NimbleGen array compared to the Affy 500K and the Illumina 650Y, and you see that there’s a significant increase in the resolution. So, we’re hoping with that that we’ll be able to capture all common variants over 1 kb in size, and with high enough resolution of breakpoints that it could be followed up with a PCR based assay.
Here are some of the examples of the data from this array. Things in the
order of 30 kb, we reliably pick them up as does the Affymetrix 6.0 array. But we can reliably find things as small as 1 kb without a problem. You see on the bottom right, that there are lots of probes detecting this 1.3 kb variant. And there’s not a single probe on the Affy 6 array actually representing this variant.
So, we’re hoping with the results of this study, we’ll have a high
resolution map of all the common variants in the human genome. Slide 42 There are some challenges when taking this data out to disease studies.
As I mentioned, the data is a little bit noisy, so when you compare patient and control groups, it’s important to keep in mind that there may be plate effects and batch effects. What I mean by that is if you have all of your controls on one plate and all your patients on another plate, you may get batch effects or plate effects that give false differences between these groups. So, I would recommend you mix your controls and cases on the same plates to avoid those types of effects.
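The recommendation to mix cases and controls on the same plates can be sketched as a simple randomized layout. Everything in the example is an assumption for illustration: the 96‐well plate size, the sample names, and the function name are hypothetical, and a real study design would usually also balance other factors such as sex or collection site.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[str, str]  # (sample_id, group)


def assign_plates(cases: List[str], controls: List[str],
                  plate_size: int = 96, seed: int = 0) -> List[List[Sample]]:
    """Spread cases and controls evenly over plates, in random order."""
    rng = random.Random(seed)
    case_samples = [(s, "case") for s in cases]
    control_samples = [(s, "control") for s in controls]
    rng.shuffle(case_samples)
    rng.shuffle(control_samples)

    pooled = case_samples + control_samples
    n_plates = max(1, math.ceil(len(pooled) / plate_size))
    plates: List[List[Sample]] = [[] for _ in range(n_plates)]

    # Dealing round-robin gives every plate a near-equal share of each group.
    for i, sample in enumerate(pooled):
        plates[i % n_plates].append(sample)

    # Randomize well order so groups are not clustered within a plate.
    for plate in plates:
        rng.shuffle(plate)
    return plates


cases = [f"case{i}" for i in range(96)]
controls = [f"control{i}" for i in range(96)]
for plate in assign_plates(cases, controls):
    n_cases = sum(1 for _, group in plate if group == "case")
    print(len(plate), n_cases, len(plate) - n_cases)
```

With 96 cases and 96 controls, each of the two plates ends up with 48 of each group, so a plate‐specific artifact affects both groups equally instead of looking like a case versus control difference.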
It’s difficult, as I mentioned, to get accurate genotype data for CNVs. This
is needed for some of the population genetics and association type analysis that you want to do.
And to identify true de novo calls is also difficult. So, currently, if you look
at trios with the data that’s out there, you get a de novo rate (that is, the rate of variants that are found in the offspring but not in the parents) of maybe around 5% to 15%, depending on which platform you use. But, of course, not all these are true de novo events. So, you really have to follow up with an independent validation method and test it both in the parents and the offspring. Because often, these seemingly de novo regions are due to false‐negatives in the parents. So, they’re there in the parent, but they’re not called, and it looks like a de novo event.
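A minimal sketch of how candidate de novo calls might be pulled out of trio data is shown below. It uses a deliberately simple rule that is an assumption, not the method of any specific study: a call in the child counts as a candidate if it overlaps no call in either parent. The coordinates and names are made up.

```python
from typing import List, Tuple

Cnv = Tuple[str, int, int]  # (chromosome, start, end), end exclusive


def overlaps(a: Cnv, b: Cnv) -> bool:
    """True if two calls share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def candidate_de_novo(child: List[Cnv], mother: List[Cnv], father: List[Cnv]) -> List[Cnv]:
    """Child calls with no overlapping call in either parent.

    These are only candidates: a missed call (false negative) in a parent
    makes an inherited variant look de novo, so each one needs independent
    validation in the child and in both parents.
    """
    parental = mother + father
    return [c for c in child if not any(overlaps(c, p) for p in parental)]


# Hypothetical trio calls.
child = [("chr3", 5_000_000, 5_600_000), ("chr1", 1_000_000, 1_050_000), ("chr5", 200_000, 260_000)]
mother = [("chr1", 990_000, 1_060_000)]
father: List[Cnv] = []

print(candidate_de_novo(child, mother, father))
```

Here the chr1 call is explained by the mother, while the chr3 and chr5 calls come out as candidates. Any overlap at all with a parental call is treated as inherited, which is a lenient choice; a stricter overlap rule would flag more candidates.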
At the bottom here, I’m just showing what you would expect if you type a new set of controls compared to existing HapMap data. And, of course, the same would be true if you typed a new set of patients compared to any control data set: a large fraction of the variants called in a new data set are novel. Of course, this does not mean that they will be associated with the disease you’re studying at present. So, it’s important then to go on and prioritize which ones are the ones to follow up on.
Slide 43 Finally, I just want to end with talking about CNVs in a clinical setting. So,
many of the problems that I discussed are not as relevant in the clinical setting because you often set a threshold of looking at things only larger than, for example 300 kb or 500 kb. And most analysis tools and most approaches will very reliably pick those things up. But that doesn’t mean that interpretation is simple.
So, I have some examples here of data produced in our autism spectrum
disorder studies that we’re doing in our lab and I have some pedigrees. To the top left, I show an example where there are multiple de novo events in one of the affected offspring and no de novo events in the other affected offspring. Then the question must be, are any of these de novo events actually causative for the disorder? And the answer is, we simply don’t know.
Another example is the SHANK3 region in chromosome 22 where we find
the offspring having two de novo events, one of them being SHANK3, which is known to cause autism. But as you see, the SHANK3 deletion is much smaller than the other de novo event. And people tend to think that if things are bigger, they’re more likely to cause disease. So, again, it’s important to interpret this in the right way.
Finally, at the bottom, are examples of a region at 16p11 that was
recently found by multiple different groups to be associated with autism. Here again, we show that it’s de novo. But in one family, it’s de novo in one offspring that’s affected in the proband, but not in its affected sibling. So again, it’s open to interpretation. But I think here, the strength is in the numbers. So if people will report their data, other groups will find similar things and then using combined results, we can actually say what’s implicated in disease and not.
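One common way to flag de novo candidates in pedigrees like these is to look for CNV calls in the child that have no matching call in either parent. The sketch below assumes a 50% reciprocal-overlap rule for "matching"; that threshold, the tuple layout, and the function names are illustrative assumptions, not the method used in the studies described.

```python
def reciprocal_overlap(a, b):
    """Return the reciprocal overlap of two calls given as (chrom, start, end).

    This is the overlap length divided by the longer call's length,
    computed as the smaller of the two per-call overlap fractions.
    """
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def de_novo_candidates(child, mother, father, threshold=0.5):
    """Child calls with no parental call at or above the overlap threshold."""
    parental = list(mother) + list(father)
    return [
        c for c in child
        if all(reciprocal_overlap(c, p) < threshold for p in parental)
    ]


# Made-up coordinates, for illustration only.
child = [("chr22", 100, 200), ("chr16", 1000, 2000)]
mother = [("chr16", 1100, 2100)]
father = []

print(de_novo_candidates(child, mother, father))
```

A call that survives this filter is only a candidate: a missed call in a parent produces the same result as a true de novo event, so independent validation would still be needed.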
Slide 44 Finally, tying back to my initial slides, I think looking ahead, of course, the
next generation sequencing technologies are giving us much higher resolution and faster and cheaper sequencing. And I think in the future, it will be possible to describe all levels of variation of the entire spectrum from one base pair and up using the next generation sequencing. So,
we’re not going to talk about CNVs vs. SNPs anymore, we’re just going to talk about variation in the human genome as a whole. And there’s enormous potential going forward with these technologies.
Slide 45 Finally, I’d just like to acknowledge the people involved in producing the
data I have presented: at The Centre for Applied Genomics at the Hospital for Sick Children in Toronto, especially the Director, Steve Scherer. And our collaborators at the Sanger Institute and in Boston for the high resolution NimbleGen arrays: Matthew Hurles, Nigel Carter, Charles Lee, Don Conrad, Richard Redon, Dalila Pinto, and Chris Tyler‐Smith.
Thank you very much.
[0:30:26] Sean Sanders: Great. Thank you very much, Dr. Feuk.
Slide 46 Before we get to our final speaker, just a quick reminder that you can use the download slides button to download a PDF of all the slides today. And you can click on the enlarge slides button to get a larger view of each of the individual slides.
So, on to our final speaker today, we have Dr. Alex Blakemore. Dr.
Blakemore obtained her Ph.D. in the Department of Molecular Biology and Biotechnology at Sheffield University. She joined Imperial College London in 2001, and is currently Senior Lecturer in Human Molecular Genetics and Chair of the Metabolic and Endocrine Technology Network at Imperial College London and is a member of the Council of the British Society for Human Genetics. Dr. Blakemore is making breakthrough discoveries in associating specific CNVs with common diseases including type II diabetes.
Dr. Blakemore?
Dr. Alex Blakemore: Hello. Thank you.
Slide 47 So, my talk is going to concentrate a little bit on looking at some of the problems around trying to assign phenotypes to CNVs that we see. So, it’ll follow along quite well, I hope, from what Lars has just been saying, and picking up some of the issues that have been mentioned before.
Slide 48 So, as was discussed at the beginning of this webinar, CNVs are much
more common than we had first appreciated and they involve many genes. And for the genes that are involved, about 26% of those genes include coding sequences in the CNV. In fact, some workers have
estimated that about 18% of the variance in gene expression that we see between individuals is due to copy number variation.
Slide 49 So, I just want to show you some data that we have in normal individuals,
and just talk about the interesting kinds of genes that we see and show you. So, this is something in the taste receptor gene. And what you see here is CGH data from 50 normal, healthy adult males ‐‐ French males as it happens ‐‐ for a taste receptor gene. And each person is represented by a different colored line. You can see here that there are three copy number states at this position. We don’t know actually, what effect this might have. Maybe it’s why French people like certain foods, maybe it’s why they like snails, or why other people don’t like garlic. Who knows?
Slide 50 A similar thing we see in the olfactory receptor genes where ‐‐ this is one
person. We see some small variants there, which are common, just like the one I just showed you. But this is one person who has a much larger variant. It’s over a megabase long and it involves about 24 genes. So, we see not only small common things, but rarer things are larger and involve many genes in normal individuals. And we have no idea whether they have a phenotypic effect. Some of them we suspect might.
Slide 51 This is probably five copy number variant levels of a gene that might be
involved in differences in hair structure and texture between different individuals.
Slide 52 This is in the immunoglobulin E gene. So, we don’t know whether it might have any effect in the propensity to allergy.
Slide 53 Here, this is a gene that’s involved in psychiatric disorders so Tourette syndrome or obsessive‐compulsive disorder and some forms of epilepsy. It might be interesting for some of you to look at this in personality differences in normal individuals. You can see it’s a common variation.
Slide 54 And we very commonly see differences between people in their
oncogenes and a lot of cancer related genes come up. And you have to speculate what are the implications of these for susceptibility to cancer and what’s the relationship between the inherited CNVs that we bring down from our parents and any subsequent copy number changes that arise during cancer. We know there are a lot of chromosomal changes that arise during cancer and we don’t know how they relate to the CNVs we inherit. So, we have a lot of questions just right there. If you’ve seen your favorite gene, you know, write to me and we can talk about it.
Sean Sanders: [Laughs]
Slide 55 Dr. Alex Blakemore: We have a big problem in general in assigning phenotypes to copy
number variants and let’s talk about that. So, people who have ‐‐ if you were, for example, looking at a child with learning disability and a lot of dysmorphic features and you think maybe it’s a new microdeletion syndrome, you might do an array CGH experiment and then you might get 600 copy number variations in that child. And you’d have to ask yourself, well how do I prioritize these? And some people will say, “Well, look at de novo changes.” But you’ve heard the problems there. We don’t know quite the rate of de novo changes. We don’t know quite how to measure those.
[0:35:02] Some people will say look at big changes that involve multiple genes.
Well, I’ve shown you a big change now involving 24 genes in a perfectly normal individual. So just because something is big, we can’t assume it’s important.
And changes in genes relevant to phenotype. Well, that’s nice, but
there’s a long way from showing a change in a gene that you like and the mechanism by which it might or might not cause disease. It’s no easier to do that with a copy number variant than it has been to do that with a SNP so…
Slide 56 And this is another example of a very large copy number variant in one of
our normal males. It’s over a megabase long. It involves four perfectly good metabolic genes. And it was not only carried by this male, but inherited by his two, also apparently, normal daughters. So just because something is big and contains genes doesn’t mean that you should write to Stockholm and claim your Nobel Prize just yet.
Slide 57 So, we don’t know really what is normal. And where we do know, we only
know it in a handful of population groups. So, if you’re looking at an Inuit for example or a Maori, all bets are off, we have no idea what is normal in those populations.
And we do know that you can’t just rely on something being big because
we know that a single base change can be deleterious for a phenotype. And we know that big changes might be benign.
We don’t know how often CNVs arise in meiosis, but another emerging
question is how frequently they arise during mitosis. So, we know, for example, that there are copy number variants between monozygotic twins and we’d always assumed that identical twins were genetically
identical. That turns out not to be true. And the other question in our minds is, if it is true that there is mitotic change in copy number variants, then are there differences between different tissues and how does that change during life? Do copy number variants accumulate during aging? These are all questions we don’t really know the answer to very well yet.
Slide 58 The other thing is that some of you might work on monogenic disease. So,
turning from our putative microdeletion syndrome to a simpler situation of monogenic disease, we still have many questions. So, how many cases of monogenic disease where we can’t find mutations are actually due to a copy number variation?
Slide 59 Looking at our 50 guys again, here’s one of them. And you see that one of
these persons, represented by the green line here, has a deletion involving deletion of the whole copy of this gene that’s associated with autosomal recessive deafness.
Now, most sequencing strategies to analyze that gene would just look at
the coding sequence and they would amplify up whatever was there. They would amplify up the good copy and they’d say, “Okay, this guy is normal.” They would not find that mutation by standard methods. So, if you work on monogenic disease, you need to bear that in mind.
Slide 60 Turning to complex disease, and we’ve already had some discussion of
that from Charles, there have been a handful of especially candidate gene approaches that have been successful here. And this one that I’m just showing you is in the Fc gamma receptor. It’s work done by Tim Aitman at Imperial College where I’m from. And this particular polymorphism is associated with systemic autoimmunity.
Slide 61 But we have a real problem when we’re taking that out further and trying
to look at the implications for copy number variation in other complex diseases. And in particular, using the data that we’ve already accumulated from genome‐wide SNP association studies, to try and harvest again from that data another set of information about not just what SNPs are there, but what copy number variants are there. ‘Cause we have a problem. We’ve done all this very expensive work and we still have a good deal of what we call “missing heritability”. That is, we know that there is a genetic effect here and the markers that we’re finding from our SNPs don’t account for it all.
Slide 62 So, here are the potential problems for that. The first one is I’ve just told
you that monozygotic twins are not in fact identical so maybe our heritability estimates need adjusting. Although, it may be that the
differences we see would mean that things are more heritable than we currently think nonetheless, but we don’t know.
There are very many technical problems in predicting copy number
variants from our SNP data. The first is that we have different subclasses of copy number variants. Some of them are old and stable and therefore in strong linkage disequilibrium with surrounding SNPs, and not too difficult. Then we have other regions of the genome where new copy number variants frequently arise, and they do not show strong LD with surrounding SNPs. So, they’re not well tied in our data.
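How well a CNV is "tied" to a nearby SNP can be expressed as the squared correlation (r²) between per-person copy number and per-person SNP allele count. The sketch below is a simplified illustration with made-up genotypes; real analyses usually work with phased haplotypes, and the function name `r_squared` is an assumption for this example.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation in one marker: nothing to correlate
    return (cov * cov) / (var_x * var_y)


# Made-up data for eight people.
copy_number = [2, 2, 1, 1, 0, 2, 1, 2]   # copies of the CNV region
tagging_snp = [0, 0, 1, 1, 2, 0, 1, 0]   # minor-allele count, tracks the deletion
unlinked_snp = [1, 0, 1, 0, 1, 0, 1, 0]  # minor-allele count, mostly unrelated

print(r_squared(copy_number, tagging_snp))   # old, stable CNV: high r^2
print(r_squared(copy_number, unlinked_snp))  # recurrent CNV: low r^2
```

With a high r², the SNP genotype acts as a usable proxy for the CNV; with a low r², the CNV is effectively invisible to a SNP-only association scan.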
Another problem is that in early SNP arrays and in later ones too, there
are gaps in the genome in difficult regions, that is, very repetitive, troublesome regions, which are the very ones where some of our finest CNVs arise. So, we’ve got holes in the data here.
[0:40:07] Even where we have SNPs and even where there’s strong LD, we still
have a problem predicting where the copy number variants are. And you can tell this is a problem by a cursory glance at the literature and you’ll see that there are several algorithms out there for doing this. And where you see that you know that the technology is still maturing and that we haven’t yet settled on what’s the best approach. And that’s the case where people are still defining different algorithms and arguing about what’s the best approach.
Even if you choose an algorithm and it gives you an answer that you like, you
then have to validate and replicate your results; and that’s an additional problem. Some CNVs because they have local sequence variations around them are very difficult to PCR up or to analyze. And we also need to replicate, to do studies in say obesity and diabetes, as we do, we might want to do 20,000 in our replication cohort. And finding a technology to do that conveniently and quickly is very difficult for copy number variants.
Slide 63 But I’ll show you where we are in our data so far. It’s very much a work in
progress so I’m not saying we have found the answer to this. I’ll tell you what we’re doing and you can ‐‐ we’ll wait and see whether it’s true or not.
So, I’ll start off with type II diabetes where we’re using some genome‐wide Illumina SNP data from a French study of type II diabetes published by Sladek et al., in 2007. And for that, we’re using our own CNV algorithm developed by Lachlan Coin at Imperial College London. And we developed that ourselves using not only the CGH data from the 50 normal guys I told
you, but matched Illumina 1 million SNP chip data so we could check the predictions that we make. And we could estimate the false‐positives and false‐negatives where the probes happen to coincide. So, our algorithm also assigns a haplotype so that gives us increased power.
Slide 64 To cut a long story short, to our somewhat pleasure, we did find some of
the genes we expected. So, we found some of the MODY genes. They are single gene forms of type II diabetes like the classic MODY gene, glucokinase in cell. They weren’t at very significant p values, but they were there on our list, which was comforting.
We also found 23 new putative type II diabetes‐associated CNVs, some
copy number losses, some copy number gains, and 13 where we had both losses and gains. And they’re quite heartening in a way because in 10 of those, we see the losses only in diabetes and the gains only in the controls, which is heartening. In 9 of those 13 loci, they are subtelomeric. And, I think, Charles said earlier, there is a tendency to increased number of CNVs at telomeres. We’re still not entirely sure how much of that is truth and how much of it is artifact. So, we rather hope that this data is true.
Slide 65 We have increased confidence in our data where we see the same CNVs
coming up as associated with different related data sets or coming up as predicted by different algorithms. And I’m just showing here our CNVs in cohorts of adult obesity and adult controls and child obesity with child controls. So, they’re separate, but related cohorts. And I’m showing you here our top hits of CNVs that come up in both cohorts.
I want you to focus on the asterisks there. We have a black asterisk. It means that the CNV was present and where the asterisk is red, it was counted as significant to a p value of 10⁻⁶ or below in that cohort. So, you see the top one was picked out only by our own cnvHap algorithm in the children, but by all three in the adults. So, we see it both with different algorithms and the different cohorts so we have increased confidence in that one.
Some of these are known CNVs and some of them are as yet unknown.
Obviously, if it’s reported before then we’re really happy. But we have a problem analyzing this data in truth because the boundary information given by the different algorithms differs. So, it’s very difficult to say quite how big these are and sometimes there’s some difference in who they’re predicted in and who they’re not predicted in according to the different algorithms used.
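The boundary disagreement between algorithms can be made concrete by comparing the region every algorithm agrees on with the outermost span any of them reports. The sketch below is an assumption-laden illustration: the algorithm labels, coordinates, and the `boundary_summary` helper are hypothetical, and it assumes all calls are for one locus on one chromosome.

```python
def boundary_summary(calls):
    """Summarize boundary agreement for one locus.

    calls maps an algorithm label to a (start, end) pair with end > start.
    Returns the shared core, the outer span, and core length / outer length.
    """
    starts = [start for start, _ in calls.values()]
    ends = [end for _, end in calls.values()]
    core_start, core_end = max(starts), min(ends)
    outer_start, outer_end = min(starts), max(ends)
    core_len = max(0, core_end - core_start)
    return {
        "core": (core_start, core_end) if core_len > 0 else None,
        "outer": (outer_start, outer_end),
        "agreement": core_len / (outer_end - outer_start),
    }


# Made-up calls for the same CNV from three hypothetical algorithms.
calls = {
    "algorithm_A": (1000, 5000),
    "algorithm_B": (1500, 5200),
    "algorithm_C": (900, 4800),
}

print(boundary_summary(calls))
```

An agreement value well below 1 means the reported size depends heavily on which algorithm is believed, which is why an independent, higher-resolution method would be needed to fix the true breakpoints.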
Slide 66
So, that’s where we are. And we are currently trying to go back to the gold standard now of using high resolution CGH in the loci of interest to pick up those and detect whether they’re really there. And for the larger scale replication studies, we’re developing Sequenom MALDI‐TOF based assays for doing our 20,000‐replication cohorts so…
[0:45:09] Slide 67 Slide 68 Just to say for the future, we need to expand our work on CNVs into
more ethnic groups. We need to think more carefully about how we map and report CNVs. We still have a lot of work to do on the mechanisms of their generation, their stability through life, and their functional consequences. That’s really still very much under researched. And we need more and better developed high throughput, very high throughput screening technologies to do the large numbers we really need for this so…
Sean Sanders: Great. Thank you very much Dr. Blakemore.
Slide 69 Thank you all for the excellent presentations. And we’re going to go on to our Q&A portion of the webinar.
My first question was actually going to be about those French males, but you took my fun there. ‘Cause I’m intrigued to know whether it’s specific to that French population, you know, or maybe to kids that don’t like Brussels sprouts.
[Laughter]
Dr. Alex Blakemore: Well, we don’t really know. I just happen to have access to French males. And a lot of what we see is mirrored in other populations particularly the common ones. But what’s interesting is when you look at the crossover between the different studies is that there are Venn diagrams, and you showed one, didn’t you? We never see 100% crossover even in the same population of results from different studies.
Sean Sanders: Okay. So, the first question that I have ‐‐ we’ve had a lot of questions come in
online, but this one was from somebody who emailed before the webinar, and it relates to this. And they were asking whether there are any CNVs that can be detected in tissues within the same individual. And maybe we’ll start with Dr. Lee to answer that.
Dr. Charles Lee: Sure. So, I think, as was mentioned by Dr. Blakemore earlier, there was a paper that came out March of this year in the “American Journal of Human Genetics”, showing that there appears to be copy number variants between monozygotic twins. And so, presumably, these copy number variants arose early during embryogenesis or a little bit later on so meaning that these occurred mitotically. And that would suggest that if this kind of phenomenon can occur when you look at monozygotic twins, then you could have tissue‐to‐tissue differences occurring as well.
In fact, my recollection is that in 2006 in the “American Journal of Human
Genetics”, Sharp et al. [0:47:26] [Phonetic] also made a comment that when they did their CNV studies, they also saw some preliminary data that demonstrated this. So, I think that there is growing evidence that there is some degree of CNV differences between tissues in a given individual.
Sean Sanders: Dr. Feuk, you have anything to add?
Dr. Lars Feuk: Just to follow‐up on that, I just want to say that the number of somatic CNVs is much, much smaller than what you would find in the genome as a whole. And again, coming back to these problems of interpreting noisy data, before you can actually say that something is somatic, you would have to validate it by an independent method. Because right now even if you run exactly the same DNA twice, you will find some differences based on the array results. So, it’s important to point out that you need to validate it in some way.
Sean Sanders: Okay. And Dr. Blakemore a question for you that was asked about whether SNPs and/or CNVs can explain the different manifestations of the same disease in two individuals.
Dr. Alex Blakemore: Right. Well, there is some data emerging on that too.
Sean Sanders: Uh‐hum.
Dr. Alex Blakemore: For a while, we’ve looked at different lengths of mutation in people who have known genomic disorders. It’s been known quite well in Williams syndrome for instance. And we can map the occurrence of particular sub‐phenotypes according to the length of mutation. So, we know that can be done and I happen to know this, there’s going to be some data published showing in a particular microdeletion syndrome that the sub‐phenotype
of obesity can be mapped to a nearby gene that is included in some people’s mutation and not in others. So, I would say the answer is yes.
But on top of that, perhaps we can also look at something we think is
quite simple, so trisomy 21. So, we think of all the phenotypic things that we see in trisomy 21 as arising from there being three copies of chromosome 21. But, of course, chromosome 21 like the other chromosomes has regions of copy number variants. And so, in that case, where there’s a deletion, for example, then that person may have only two copies, or one, or no copies of that region or, conversely, in some areas, they might have four. And that might explain why for example 40% of children with Down’s syndrome have cardiac abnormalities and 60% don’t.
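The copy-number arithmetic in the trisomy 21 example is simply a sum over the three copies of the chromosome. The following is a minimal worked sketch; the function name `total_copies` and the per-chromosome counts are illustrative assumptions.

```python
def total_copies(per_chromosome_copies):
    """Total copies of a region, given the copy count carried by each chromosome."""
    return sum(per_chromosome_copies)


# Three copies of chromosome 21; each entry is the number of copies of one
# CNV region carried on that chromosome (1 = reference, 0 = deleted, 2 = duplicated).
scenarios = {
    "no CNV": [1, 1, 1],
    "deletion on one chromosome": [1, 1, 0],
    "deletion on two chromosomes": [1, 0, 0],
    "deletion on all three": [0, 0, 0],
    "duplication on one chromosome": [2, 1, 1],
}

for label, copies in scenarios.items():
    print(label, total_copies(copies))
```

So two people with the same trisomy can differ in dosage for individual regions, ranging here from zero to four copies.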
Sean Sanders: So, here’s a question I’m going to throw out to the group that came in
online. What about CNVs between two and a hundred base pairs, can we see them? Do we have the resolution yet in the technology? Let’s start with Dr. Feuk.
[0:49:58] Dr. Lars Feuk: I think for that really sequencing is going to be the answer. I mean
sequencing is very, very good at picking up these smaller variants. It’s actually maybe more difficult to pick up the big variants with sequencing. You have to use some different strategies to find those. But for the small variants that are less than a hundred base pairs, I think, sequencing is the way to go to really create that map of variation in the human genome.
Sean Sanders: Okay. Any other comments?
Dr. Charles Lee: You know, Sean, I would agree with that. I think and it’s comforting to see that whole genome sequencing is becoming more widely available just precisely to identify these small imbalances. Yeah.
Sean Sanders: Uh‐hum. So, you think as the next generation sequencing technology advances, it’s going to get easier to see these?
Dr. Charles Lee: Yes.
Sean Sanders: Okay.
Dr. Alex Blakemore: You can pick up small things by CGH. We accidentally picked up something that turned out to be only 15 bases long by sequencing. But you have to just happen to have put your probe there. And it has to do with the density of probes that you can use.
Sean Sanders: Okay. Question about inversions, are they less common or just more difficult to detect? Dr. Lee, you want to start us off?
Dr. Charles Lee: So, I think, inversions and other balanced rearrangements are part of structural variation that really hasn’t been explored up to now. I’m thinking of a paper in 2005 in “Nature Genetics” by Tüzün et al. where they actually looked at structural genomic variation by sequence comparison methods in the size range of 8 to 40 kb. And they found approximately 300 copy number variants and about 50 odd inversions.
Sean Sanders: Uh‐hum.
Dr. Charles Lee: So, clearly, there’s more ‐‐ in that study, they showed that there are more genomic imbalances occurring in this whole spectrum of structural variation, but not an insignificant amount of balanced rearrangements such as inversions. But I think the problem is right now that the technology to do sort of a genome‐wide scan for these balanced rearrangements doesn’t exist to my knowledge. And so, this is an area of structural variation that we really haven’t fully explored.
Dr. Alex Blakemore: But you saw it in your fiber‐FISH, you saw it rather beautifully there.
Dr. Charles Lee: Right. But that was a locus‐specific method. But to screen for it in a genome‐wide manner, I think, is still not something that’s really possible. There are methods to look at balanced rearrangements, yes, in a specific locus.
Dr. Lars Feuk: I think another way to address the question is to actually compare the
chimpanzee genome to the human genome. ‘Cause there, we see what’s happened evolutionarily in the last few million years. And we do see a lot more copy number variation between those genomes than we do inversions.
Dr. Charles Lee: That’s true.
Dr. Lars Feuk: So, again, indicating that copy number variants are more common than inversions ‐‐
Dr. Charles Lee: Yup.
Dr. Lars Feuk: ‐‐ independent of technologies used to assess it.
Sean Sanders: Okay. And do we see CNVs in the chimp that we see in human?
Dr. Lars Feuk: Well, Charles has the answer for that ‐‐
Sean Sanders: [Laughs]
Dr. Lars Feuk: ‐‐ specifically of this so…
Dr. Charles Lee: So, I think, the work is preliminary. But some of the earlier studies from our group and others have shown that in fact when you look at CNVs in chimpanzees, in macaques, other primates, indeed there is a significant amount of overlap of CNVs that we find in these primates that are also in humans.
Sean Sanders: Uh‐hum.
Dr. Charles Lee: And in particular, those that are fairly large and associated with ancestral segmental duplications. So, presumably, these ancestral segmental duplications are mediating the formation of these CNVs independently in the different lineages.
Sean Sanders: Uh‐hum. And how does that play into human evolution? Is there a role for these duplications or these structural changes in human evolution?
Dr. Charles Lee: So, I really do believe that these segmental duplications in our genome are allowing our genomes to be much more plastic, which is a good thing. I think, we’re in a constantly changing environment and in order to continue to adapt to that environment, our genomes need to adapt as well. And these segmental duplications that are throughout our genome are making ways for our genomes to rearrange and to have more variation to allow us to adapt to that. So, I think, that is a good thing.
Sean Sanders: Okay. Dr. Blakemore, are you seeing this? Relating evolution to disease, you
know, can you see a link there? You know, for instance mutations that we know are advantageous in certain climates, say, that are disadvantageous in others.
Dr. Alex Blakemore: It’s a very early stage. I can’t really comment in detail, but I would be surprised if we don’t see those various associations. I mean, there is a paper by Voight a couple of years ago identifying regions of the genome that had been subject to recent selection. And some of those regions of
the genome definitely overlap with regions where we see a lot of CNVs. And the types of genes involved definitely overlap with the types of genes that we see CNVs in. But nobody’s put that data together really as far as I know. Do you know anything of that?
Dr. Lars Feuk: No. Just ‐‐ I mean, to reiterate what you said, that the types of genes that
we do see to be copy number variable and to be increased. Like if we look at GO terms for example, it is a lot of genes that have to do with our interaction with the environment.
[0:55:07] Sean Sanders: Uh‐hum.
Dr. Lars Feuk: So, that’s an indication that it is currently going on. The evolution is currently going on in the human genome and it is a mechanism for us to interact with our environment.
Sean Sanders: Uh‐hum. Okay. Well, I’m going to stay with you, Dr. Feuk, and a question came in about
the Toronto database. And they say, whether you’ll agree with this or not, that the database often gives a wrong estimate about the size of CNVs because of the BAC related data within it. Is this going to be corrected and will the database have ‐‐ so, what will we have, a database that is ‐‐ will we have a database that’s solely based on oligo data?
Dr. Lars Feuk: So, no. He or she is absolutely right that we are currently showing the old
BAC data, which we know is overestimating the sizes of the variant. So, we leave a lot of interpretation up to the user currently of the database. And if there are BAC data as well as oligonucleotide data, of course, you should look at the oligonucleotide data because it has higher resolution.
What we’re doing to sort of address this is in the next update of the
database, we’re going to separate the BAC data from the other types of data that are out there, which will give it much better ‐‐ you know, give the user a much better understanding of what the actual picture of variation at each site looks like.
Sean Sanders: Okay. Great. I’m going to jump into maybe a snake pit here. So, somebody asked a question about nomenclature. Does CNV imply a benign alteration and does CNA imply a disease associated change? Who wants to start? [Laughs]
Dr. Alex Blakemore: I think that’s one for Charles. [Laughter]
Dr. Charles Lee: So, yes. CNV nomenclature has been a very tricky issue. And this is something, which I’m sure not only our group, but others have debated. Classically for those doing clinical service in genetic diagnostics, we refer to variants as those that are benign. So, the whole idea of copy number variants generally implies to most clinical geneticists or clinical cytogeneticists that that imbalance is not contributing to the clinical phenotype.
Now, I think, one of the things that has been proposed is that we use identifiers before the CNV. So, is it a pathogenic CNV or a benign CNV? And I think that proposal came forward because of the fact that we’re discovering these CNVs at such a rate that we don’t understand the biological functions behind them.
Sean Sanders: Uh‐hum.
Dr. Charles Lee: And some of these CNVs that we think currently are benign, are not contributing to a clinical phenotype, may later be shown to actually indeed contribute to that phenotype. And that could be on its own, it could be in combination with various SNPs. So, I think, it’s more of a factor that we just ‐‐ our understanding in this area is still quite limited. And so, we’re using this kind of a terminology to be more all encompassing and to protect ourselves.
Sean Sanders: Okay. Any other comments? Stay away from that one? [Laughter] Okay. So, I’m going to jump into some technical questions now. We had a question come in by email asking what tools are available for SNP and CNV screening in non‐model species, in other words, species where we don’t have a reference genome. So, Dr. Feuk, you want to try that one?
Dr. Lars Feuk: Most of the strategies that we currently have for annotating variation are built on using these different array‐based technologies. So, of course, if you don’t have a genome, you cannot build an array because you don’t have any sequences to put on the array. So, it depends a little bit on what stage this model organism would be at. If you have for example a BAC library, you could actually make an array with those BACs. But if you have
nothing at all, it’s very difficult to start the annotation of that genome by looking for SNPs and CNVs.
Sean Sanders: Uh‐hum.
Dr. Lars Feuk: I would say you should start by sequencing a genome and trying to make at least a rough assembly on which you could then base your future studies.
Sean Sanders: Uh‐hum. Uh‐hum. Okay. Dr. Blakemore, you’re going to ‐‐ anything to add?
Dr. Alex Blakemore: That’s true for genome‐wide approaches. Of course, if you know something about your favorite gene, then perhaps you can do something locally in that region more quickly. But on the whole, I agree with Lars.
Sean Sanders: Uh‐hum. Okay. Well, we’re almost out of time so I’m going to ask one more question and I’ll go through all the panelists and see what your answer is to this. This question comes in online, what is the biggest challenge you feel in this field? So, maybe we’ll start with ‐‐ let’s start with Dr. Blakemore at the end.
Dr. Alex Blakemore: For me in my field, it’s having a technique that is fast enough and efficient enough to look at thousands of people. There are things that ‐‐ we’ve gotten very good at looking at thousands of CNVs in a few people and we need something to look at a few CNVs in thousands of people.
[1:00:04] Sean Sanders: Uh‐hum. Dr. Feuk?
Dr. Lars Feuk: To me, it’s really what I’ve been working on for the last few years. And it’s to create a really high resolution map of this type of variation. Because I think we need that map to be able to go on and do really well designed studies in disease and so on. So, that’s the biggest challenge to me, to make an accurate map where the start and end points are correct, the frequencies are correct in the populations, and, for all the different populations that are out there, we have data for those. I think that’s crucial to be able to take this to the disease studies.
Sean Sanders: Uh‐hum. Dr. Lee?
Dr. Charles Lee: So, Sean, the thought that I have is that it was the technology advances that became available, like array CGH, that allowed us to identify copy number variants and other structural variants. And, I think, there are areas where we need to continue to make technological advances.
Sean Sanders: Uh‐hum.
Dr. Charles Lee: Some things that come to mind for example is a lot of the things that we ‐‐ the information we get now is relative copy number. I’d like to see more absolute copy number data come out. We need to get allelic information, if there are four copies in a cell, is it two on one chromosome, two on the other, or one and three, or etc. ‐‐ I think that information.
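To make the allelic question concrete, here is a minimal Python sketch that lists the ways a total copy number could be split across the two homologous chromosomes. It assumes a diploid, autosomal locus and ignores which parent each chromosome came from; the function name is illustrative only.

```python
def allelic_configurations(total_copies):
    """List unordered (a, b) splits of total_copies across two homologous chromosomes."""
    # Only iterate up to half the total so (1, 3) and (3, 1) are not both listed.
    return [(a, total_copies - a) for a in range(total_copies // 2 + 1)]


# Four copies in a cell, as in the example from the discussion.
for config in allelic_configurations(4):
    print(config)
```

For four copies this lists the two‐and‐two and one‐and‐three arrangements mentioned above, plus the case where all four copies sit on a single chromosome. A relative copy number measurement alone cannot tell these apart.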
And as was mentioned earlier, I think we really need to get high definition of the architecture, the CNV’s breakpoint information, where are they in the genome. And that’s going to give us the most power for future association studies and other studies like in evolution, pharmacogenomics. And I’m really looking forward to seeing what breakthroughs are coming over in the next years, you know, on this.
Sean Sanders: Great. Well, I wish we had more time to continue this discussion. It’s very interesting, but unfortunately, we’ve reached the end of our hour. So, it just remains for me to thank our excellent speakers for being with us today and for the enlightening discussion they’ve provided; Dr. Charles Lee from Harvard Medical School, Dr. Lars Feuk from the Hospital for Sick Children, and Dr. Alex Blakemore from Imperial College London.
Thank you also to our online viewers for the wonderful questions. Sorry we didn’t manage to get to all of them.
Please go to the URL at the bottom of your slide viewer now if you would like to learn a little bit more about some products related to today’s discussion. And look out for more webinars from Science in the near future available at www.sciencemag.org/webinar.
We encourage you to share your thoughts about the webinar with us by sending an email to the address now up in your slide viewer, [email protected].
Again, thank you to all the participants and to Agilent Technologies for their kind sponsorship of this educational seminar. Goodbye.
Okay. Thank you very much. [1:02:45] End of Audio