WNGT 2020 Efficiency Shared Task · Recent hardware with 8-bit optimization: GPUNVidia T4...
WNGT 2020 Efficiency Shared Task
Kenneth Heafield,¹ Yusuke Oda, Graham Neubig
https://www.aclweb.org/anthology/2020.ngt-1.1
https://sites.google.com/view/wngt20/efficiency-task
¹ Corruptly, both organizer and participant.
Task Definition Efficiency Results Latency Non-autoregressive Recommendation 2
Goal: Efficient Machine Translation
Present task: inference → production
Future task: efficient training?
Data Condition
WMT 2019 English–German constrained news task.
State-of-the-art systems submit to the latest WMT.
⇒ There is no such thing as state-of-the-art on WMT14!
Also, recycle WMT 2019 systems as teachers.
Awkward timing with WMT
2020 training data not final at start, test set unavailable at end.
Root cause: WNGT at ACL, WMT at EMNLP.
Coordinate with WMT more?
Test Set
Last year: ≈1s to translate ⇒ too small. Banned a team for memorizing known test set.
Before deadline: 1 million sentences; ≤100 space-separated words/sentence; unspecified test set hidden in input.
After deadline: WMT plus filler (EMEA, Tatoeba, German Federal); shuffled; also score parallel filler data.
http://data.statmt.org/heafield/wngt20/test/
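The idea of hiding the test set inside shuffled filler can be sketched as below. This is only an illustration, not the organizers' script: the function names, the fixed seed, and the position bookkeeping are assumptions. Only the 100-word limit and the mix-then-shuffle idea come from the slide, and the test sentences are assumed to satisfy the limit already.

```python
import random

MAX_WORDS = 100  # limit from the slide: at most 100 space-separated words


def within_limit(sentence, max_words=MAX_WORDS):
    """True if the sentence has at most max_words space-separated words."""
    return len(sentence.split()) <= max_words


def hide_test_set(test_sentences, filler_sentences, seed=0):
    """Mix test and filler sentences, shuffle, and record where the test set went.

    Returns (shuffled_input, test_positions), where test_positions[i] is the
    index in shuffled_input of test_sentences[i] (hypothetical bookkeeping).
    """
    tagged = [(s, i) for i, s in enumerate(test_sentences)]
    # Filler that breaks the length limit is dropped rather than truncated.
    tagged += [(s, None) for s in filler_sentences if within_limit(s)]
    random.Random(seed).shuffle(tagged)
    shuffled_input = [s for s, _ in tagged]
    test_positions = [None] * len(test_sentences)
    for position, (_, original_index) in enumerate(tagged):
        if original_index is not None:
            test_positions[original_index] = position
    return shuffled_input, test_positions


def extract_test_output(translations, test_positions):
    """Pick out the translations of the hidden test sentences, in original order."""
    return [translations[position] for position in test_positions]
```

Participants only ever see `shuffled_input`; scoring then runs on `extract_test_output(...)`, so a system cannot tell which lines count.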
Approximately Measuring Quality
Need a surprise evaluation set. WMT20 not ready yet.
→ Uh, average old WMT test sets?
→ WMT12 has sentences longer than 100 words.
→ WMT1*: average sacrebleu of WMT11, WMT13–19
See paper supplement for individual WMT scores.
Problem: participants likely tuned on WMT sets.
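As a small sketch of the WMT1* score, assuming it is the unweighted mean of the per-test-set sacrebleu scores (the slide says "average" without further detail). The dictionary layout and the lowercase key names are assumptions made for illustration.

```python
# Test sets averaged for WMT1*, per the slide: WMT11 and WMT13-19 (WMT12 is skipped).
WMT1_STAR_SETS = ["wmt11"] + [f"wmt{year}" for year in range(13, 20)]


def wmt1_star(bleu_by_set):
    """Unweighted mean of per-test-set BLEU scores over WMT1_STAR_SETS.

    bleu_by_set maps a test-set name to its sacrebleu score (assumed layout).
    """
    missing = [name for name in WMT1_STAR_SETS if name not in bleu_by_set]
    if missing:
        raise KeyError(f"missing scores for: {', '.join(missing)}")
    total = sum(bleu_by_set[name] for name in WMT1_STAR_SETS)
    return total / len(WMT1_STAR_SETS)
```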
BLEU?
“use human evaluation to verify claims in experiments that use metrics such as BLEU” –Reviewer of my EU project
“BLEU has been surpassed by various other metrics” –Mathur et al, ACL 2020
→ Submitted fast Czech systems to WMT20 with Charles University.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825303.
Hardware
Recent hardware with 8-bit optimization:
GPU: NVidia T4
g4dn.xlarge on Amazon Web Services, $0.526/hr
CPU: Intel Xeon Platinum 8275CL (Cascade Lake), dual socket
c5.metal on Amazon Web Services, $4.08/hr
Single-core and all-core tracks (48 physical cores)
Provided credits for participants to develop with.
Amazon, Intel, and NVidia have contributed to my research.
Three teams
Multiple submissions encouraged!
Team      GPU  CPU 1 core  CPU all core
NiuTrans  4    0           1
OpenNMT   4    4           4
UEdin     4    2           5
UEdin’s CPU submissions had a memory leak → shown with/without fix.
Pareto Comparison
Submissions have varying quality and efficiency.
Unclear how much quality loss to tolerate.
Pareto comparison: quality ≥ baseline and efficiency ≥ baseline.
More efficient with same quality... or better quality with same efficiency.
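The Pareto comparison can be sketched as follows. Each submission is reduced to a (quality, efficiency) pair where higher is better on both axes; the function names and the example numbers are made up for illustration and are not task results.

```python
def dominates(a, b):
    """True if a is at least as good as b on both axes and strictly better on one.

    a and b are (quality, efficiency) pairs; higher is better for both.
    """
    at_least_as_good = a[0] >= b[0] and a[1] >= b[1]
    strictly_better = a[0] > b[0] or a[1] > b[1]
    return at_least_as_good and strictly_better


def pareto_optimal(submissions):
    """Return the names of submissions that no other submission dominates."""
    return [
        name
        for name, point in submissions.items()
        if not any(
            dominates(other, point)
            for other_name, other in submissions.items()
            if other_name != name
        )
    ]


# Hypothetical (BLEU, thousand words per second) pairs.
example = {"A": (35.0, 5.0), "B": (33.0, 20.0), "C": (32.0, 10.0), "D": (35.0, 4.0)}
print(pareto_optimal(example))  # ['A', 'B']
```

For metrics where lower is better, such as RAM or model size, negate the value before comparing.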
Speed
Primary: wall clock time.
Words per second based on 15,048,961 untokenized words.
Supplementary data: CPU time.
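The speed metric is a single division. A minimal sketch, where the function name is an assumption and the word count is the one given on the slide:

```python
TOTAL_WORDS = 15_048_961  # untokenized words in the input, as stated on the slide


def words_per_second(wall_clock_seconds, total_words=TOTAL_WORDS):
    """Words translated per second of wall-clock (real) time."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return total_words / wall_clock_seconds
```

Because the denominator is wall-clock time, a system that takes 1000 real seconds scores about 15,049 words per second regardless of how many cores it kept busy.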
GPU speed
[Figure: WMT1* BLEU (y-axis, 28–36) against thousand words per real second (x-axis, 0–25). Series: NiuTrans, OpenNMT, UEdin.]
CPU single core speed
[Figure: WMT1* BLEU (y-axis, 28–36) against thousand words per real second (x-axis, 0–4.5). Series: OpenNMT, UEdin, UEdin Fix.]
CPU all core speed
[Figure: WMT1* BLEU (y-axis, 28–36) against thousand words per real second (x-axis, 0–120). Series: NiuTrans, OpenNMT, UEdin.]
Cost
[Figure: WMT1* BLEU (y-axis, 28–36) against million words per USD (x-axis, 0–160). Series: GPU and CPU submissions for NiuTrans, OpenNMT, and UEdin.]
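A sketch of how a words-per-USD figure can be derived from speed, assuming cost is wall-clock time multiplied by the hourly price of the whole instance. The slides do not spell out the formula, so treat this as one plausible reading; the prices are the ones quoted on the Hardware slide and the function name is hypothetical.

```python
# Hourly prices in USD from the Hardware slide.
PRICE_PER_HOUR_USD = {"g4dn.xlarge": 0.526, "c5.metal": 4.08}


def million_words_per_usd(words_per_second, instance):
    """Millions of words translated per US dollar of instance time."""
    words_per_hour = words_per_second * 3600
    return words_per_hour / PRICE_PER_HOUR_USD[instance] / 1e6
```

Under this reading, a GPU system at 10,000 words per second costs out at roughly 68 million words per USD, while a c5.metal system needs about 7.8 times the throughput to match it, since the instance is that much more expensive per hour.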
Disk
Model size: parameters, BPE, shortlists, etc.
Total Docker size: model, part of Ubuntu, code
OpenNMT won Docker with 122–308 MB; others 432–933 MB.
Model size, all platforms
[Figure: WMT1* BLEU (y-axis, 28–36) against model size in MB (x-axis, 0–450). Series: NiuTrans, OpenNMT, UEdin.]
Peak RAM usage
GPU: polling nvidia-smi
CPU: memory.max_usage_in_bytes
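Both measurements can be sketched in a few lines. This is an illustration under assumptions, not the organizers' harness: the cgroup v1 path below is the usual location of `memory.max_usage_in_bytes` but may differ per container setup, and the polling interval and the `is_running` callback are hypothetical.

```python
import subprocess
import time

# Usual cgroup v1 location; assumed here, adjust for the container in use.
CGROUP_V1_PEAK = "/sys/fs/cgroup/memory/memory.max_usage_in_bytes"


def cpu_peak_bytes(path=CGROUP_V1_PEAK):
    """Read the peak memory usage recorded by the cgroup memory controller."""
    with open(path) as f:
        return int(f.read().strip())


def parse_memory_used_mib(nvidia_smi_output):
    """Parse one integer (MiB) per GPU from csv,noheader,nounits output."""
    return [int(line) for line in nvidia_smi_output.splitlines() if line.strip()]


def poll_gpu_peak_mib(is_running, interval_seconds=0.1):
    """Poll nvidia-smi while is_running() is true; return the largest reading."""
    peak = 0
    while is_running():
        output = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        peak = max(peak, max(parse_memory_used_mib(output), default=0))
        time.sleep(interval_seconds)
    return peak
```

Polling can miss short spikes between samples, so the GPU figure is a lower bound on the true peak, whereas the cgroup counter is maintained by the kernel.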
GPU RAM
[Figure: WMT1* BLEU (y-axis, 28–36) against GPU RAM in GB (x-axis, log scale, 1–4). Series: NiuTrans, OpenNMT, UEdin.]
CPU single core RAM
[Figure: WMT1* BLEU (y-axis, 28–36) against CPU RAM in GB (x-axis, log scale, 0.25–128). Series: OpenNMT, UEdin, UEdin Fix.]
CPU all core RAM
[Figure: WMT1* BLEU (y-axis, 28–36) against CPU RAM in GB (x-axis, log scale, 1–64). Series: NiuTrans, OpenNMT, UEdin, UEdin Fix.]
Efficiency Task
All participants had something Pareto optimal.
System descriptions: https://sites.google.com/view/wngt20/programme
I am opening the task for rolling submission.
What’s missing
Allowed batching in all conditions
→ What about latency?
Where are the non-autoregressive people?
→ Non-autoregressive: a case study in poor evaluation.
Latency
Latency is average time to translate one sentence.
Experiments with Edinburgh’s systems; sorry I asked too late.
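Measured this way, latency means feeding sentences one at a time and averaging the per-sentence wall-clock time. A minimal sketch: `translate` is a hypothetical callable standing in for a real system, and the injectable `clock` parameter exists only to make the function testable.

```python
import time


def average_latency_ms(translate, sentences, clock=time.perf_counter):
    """Average wall-clock time, in milliseconds, to translate one sentence.

    Sentences are translated one at a time with no batching, so each call
    starts only after the previous one has finished.
    """
    if not sentences:
        raise ValueError("need at least one sentence")
    total_seconds = 0.0
    for sentence in sentences:
        start = clock()
        translate(sentence)
        total_seconds += clock() - start
    return 1000.0 * total_seconds / len(sentences)
```

This is the opposite regime from the throughput numbers, where batching is allowed and only the total wall-clock time counts.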
Batching is Important for Speed
[Figure: WMT1* BLEU (y-axis, 28–36) against thousand words per second (x-axis, 0–10). Series: Batch on GPU, Batch on CPU core, Single on GPU, Single on CPU core.]
Latency: 10.3–71.7 ms!
[Figure: WMT1* BLEU (y-axis, 28–36) against latency in ms (x-axis, 0–80). Series: GPU, CPU core.]
Autoregressive MT latency is 10.3–71.7 ms, often <30 ms.
So what’s up with this table from Jiatao Gu et al (2018)?
Replicating Gu et al (2018)’s setup
Do not try this at home or work.
WMT14: State-of-the-art is latest WMT. Just don’t claim state-of-the-art like Wang et al (2018) did.
Tokenized BLEU: Tokenization differences ⇒ BLEU scores are not comparable. But many non-autoregressive papers compare anyway. Use sacrebleu instead.
P100, latency on IWSLT 2016 en-de dev.
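To see why tokenization differences make BLEU scores incomparable, here is a toy sketch using clipped unigram precision, the 1-gram ingredient of BLEU. Both tokenizers and the example sentences are invented for illustration, and the hypothesis is assumed to be non-empty; this is not sacrebleu's implementation.

```python
import re
from collections import Counter


def whitespace_tokens(text):
    """Tokenize on whitespace only, leaving punctuation attached to words."""
    return text.split()


def punctuation_split_tokens(text):
    """Split punctuation from words; one of many possible tokenizations."""
    return re.findall(r"\w+|[^\w\s]", text)


def unigram_precision(hypothesis, reference, tokenize):
    """Clipped unigram precision of the hypothesis against one reference."""
    hyp = Counter(tokenize(hypothesis))
    ref = Counter(tokenize(reference))
    matched = sum(min(count, ref[token]) for token, count in hyp.items())
    return matched / sum(hyp.values())


hypothesis = "It's a test."
reference = "It's the test."
print(unigram_precision(hypothesis, reference, whitespace_tokens))
print(unigram_precision(hypothesis, reference, punctuation_split_tokens))
```

The same hypothesis and reference score 2/3 under whitespace tokenization and 5/6 once punctuation is split off, purely because the token inventory changed. sacrebleu avoids this by applying its own fixed tokenization to untokenized text.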
Real baselines for Gu et al (2018)
[Figure: Weirdly Tokenized WMT14 BLEU (y-axis, 16–30) against latency in ms on P100, IWSLT 2016 dev (x-axis, 0–700). Series: Gu et al (2018) autoregressive, Gu et al (2018) non-autoregressive, Marian 2018, Marian 2019.]
Research doesn’t have to be state-of-the-art.
Just mention stronger baselines, not 60x weaker straws.
Arguably this is what the shared task explores.
Here are some easy things you could have done:
1. Model distillation for autoregressive, since it’s used for non-autoregressive
2. Use 1–2 decoder layers in autoregressive models
3. Averaged attention network
Recommendations
Use sacrebleu.
You can’t compare against a paper that didn’t.
Don’t have to be state-of-the-art.
Just cite it or put it in your table.
Strawman baselines are misleading.
Lots of baselines
1. Fewer parameters or layers
2. Quantize
3. Prune
4. Beam size
5. Shortlisting
6. Simplify architecture
7. Model distillation
8. Early exit
9. Non-autoregressive
Show your method is a better trade-off via Pareto optimality.
Don’t trust papers that get X speedup for “small” Y BLEU loss!
Conclusion
Currently no evidence that non-autoregressive is competitive.
We’re implementing it in Marian.
WNGT 2020 efficiency task is rolling, send me dockers!