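1. Four points raised yesterday

  • Keep lowering λ so that m stays within a normal range

Ran several sets of experiments. When λ is too small, the learning rate has to be made very large; otherwise the gradient stays at 0 and m is never trained. But when the learning rate is too large and λ is small, m easily explodes, as in the figure below: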


λ=1e-14, lr=1e4
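  • Keep increasing the learning rate until convergence (oscillation)

  • Forward test: compare the accuracy gap between judging by ear and the network

  • Instead of adding out-of-range values back, normalize by multiplying by a coefficient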

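For example, with an original value of 9 and m=3, the spread should be 1 2 3 2 1. Because the leading 1 2 fall out of bounds, only 3 2 1 remain, so the coefficient is original_ratio_to_nowsum = 9/(1+2+3) = 1.5. Multiplying Z_add by this coefficient gives 4.5 3 1.5.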
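The worked example can be reproduced with a short sketch. It assumes a triangular window with weights m - |k|, which is inferred from the single 1 2 3 2 1 example and may not match the real kernel; the function name and the `center`/`length` arguments are hypothetical.

```python
def spread_with_renorm(value, m, center, length):
    """Spread `value` over a triangular window around `center`, drop the
    out-of-range taps, and rescale the rest so they still sum to `value`."""
    offsets = range(-(m - 1), m)                    # m=3 -> -2..2
    weights = [m - abs(k) for k in offsets]         # m=3 -> 1 2 3 2 1 (assumed shape)
    total = sum(weights)
    z_add = [value * w / total for w in weights]    # unclipped contributions

    # Keep only taps whose target index lies inside [0, length).
    kept = [(center + k, z) for k, z in zip(offsets, z_add)
            if 0 <= center + k < length]
    now_sum = sum(z for _, z in kept)
    original_ratio_to_nowsum = value / now_sum      # 9 / 6 = 1.5 in the example

    out = [0.0] * length
    for idx, z in kept:
        out[idx] += z * original_ratio_to_nowsum
    return out, original_ratio_to_nowsum


out, ratio = spread_with_renorm(value=9.0, m=3, center=0, length=5)
print(ratio)  # 1.5
print(out)    # [4.5, 3.0, 1.5, 0.0, 0.0]
```

The rescaled taps still sum to the original value of 9, which is the point of the normalization.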

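This step is fairly simple. The next step, differentiating Z_add with respect to m, cannot simply multiply by this coefficient, because the coefficient itself depends on m and on the coordinate and therefore has its own derivative with respect to m. The derivative is consequently more involved.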
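As a hedged illustration of why the coefficient cannot be pulled through the derivative, here is the generic product rule; it is not the derivation worked out in these notes. The symbols are introduced only for this sketch: $Z_k(m)$ is the unnormalized contribution at in-range position $k$, $v$ is the original value, $S(m)=\sum_{j\,\in\,\text{in range}} Z_j(m)$, and $c(m)=v/S(m)$ is the coefficient. The set of in-range positions is assumed not to change under a small change in $m$.

```latex
\frac{\partial}{\partial m}\bigl[c(m)\,Z_k(m)\bigr]
  = c(m)\,\frac{\partial Z_k}{\partial m}
  + Z_k(m)\,\frac{\partial c}{\partial m},
\qquad
\frac{\partial c}{\partial m}
  = -\frac{v}{S(m)^{2}}\sum_{j\,\in\,\text{in range}}\frac{\partial Z_j}{\partial m}.
```

Multiplying the plain derivative by $c(m)$ keeps only the first term and drops the second.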

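The algorithm derived so far is as follows; it is not yet confirmed:

The earlier method added all out-of-range values back onto the center point, and the initial assumption was that the derivative could simply be multiplied by the coefficient. Both are also tested here.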
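For comparison, the earlier add-back method can be sketched on the same numbers (value 9, m=3, the leading 1 2 out of bounds). As before, the triangular weights m - |k| and the function name are assumptions made for illustration, and `center` is assumed to lie inside the valid range.

```python
def spread_with_add_back(value, m, center, length):
    """Spread `value` over a triangular window around `center`; anything that
    lands outside [0, length) is added back onto the center point."""
    offsets = range(-(m - 1), m)
    weights = [m - abs(k) for k in offsets]   # assumed kernel shape: 1 2 3 2 1 for m=3
    total = sum(weights)

    out = [0.0] * length
    overflow = 0.0
    for k, w in zip(offsets, weights):
        z = value * w / total
        idx = center + k
        if 0 <= idx < length:
            out[idx] += z
        else:
            overflow += z                     # out-of-range mass
    out[center] += overflow                   # returned to the center point
    return out


added_back = spread_with_add_back(9.0, 3, center=0, length=5)
print(added_back)  # [6.0, 2.0, 1.0, 0.0, 0.0]
```

Both variants preserve the total of 9; they differ in where the out-of-range mass ends up (all on the center here, spread proportionally under normalization).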

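2. Controlled-variable experiments: results for several runs

No.  lr   λ      Normalized                                        m_mean  m_var  Result
0    1e4  1e-10  yes (coefficient applied directly to derivative)  4e2     1e6    loss rises / m too large / reversed
1    5e4  1e-10  yes (coefficient applied directly to derivative)  1e3     7e6    loss rises / m too large / reversed
2    1e4  1e-10  no (add-back)                                     50      3e5    loss falls and converges / reversed
3    1e4  1e-10  yes (derivative uses all-ones)                    4e2     8e5    loss rises / m too large / reversed
4    1e4  1e-11  no (add-back)                                     50      1e4    loss falls and converges / reversed
5    1e4  1e-11  yes (derivative uses all-K)                       8e2     6e6    loss stays high / m too large / reversed
6    1e4  1e-11  yes (derivative uses all-ones)                    4e2     8e5    loss rises / m too large / reversed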

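  • 0. lr=1e4, λ=1e-10, normalized, coefficient applied directly to the derivative

The loss is split into two parts: blue - λ × orange

The gradient with respect to m is split into two parts: blue - λ × orange. The mean and the variance of each gradient are shown separately.

Panels: gradient of MSE, gradient of H(m)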
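A minimal sketch of how such per-term statistics could be computed, assuming the loss has the form MSE - λ × H(m) (inferred from the panel titles, not stated explicitly in these notes) and that the two gradients with respect to m are available as flat lists. `grad_summary` and the sample numbers are hypothetical and not taken from the experiments.

```python
from statistics import fmean, pvariance


def grad_summary(grad_mse, grad_h, lam):
    """Mean and population variance of each gradient term and of their
    combination grad_mse - lam * grad_h (assumed loss: MSE - lam * H(m))."""
    combined = [g - lam * h for g, h in zip(grad_mse, grad_h)]
    return {
        "mse": (fmean(grad_mse), pvariance(grad_mse)),
        "h": (fmean(grad_h), pvariance(grad_h)),
        "combined": (fmean(combined), pvariance(combined)),
    }


# Made-up values purely for illustration.
stats = grad_summary([0.5, -0.5, 1.0, -1.0], [2.0, 4.0, 6.0, 8.0], lam=1e-10)
for name, (mean, var) in stats.items():
    print(f"{name}: mean={mean:g}, var={var:g}")
```

Tracking the two terms separately shows at a glance whether the λ-scaled term is large enough to influence m at all.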


Evolution of m and the final m, in detail

Panels: epoch 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000


Gradient of the last layer and histogram of the m distribution

Distribution of the final output across dimensions:

Forward results:

Remove50high:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: if|hanget|soi|when|he|bected|e|woath|fe|holpe|gethrickeles|it|was|roget|i|to|tbe|to|to|to|food|um|the|would|bii|is|rer|tlie|f|hi|o|s|for|odoes|anes|with|tack|touprwiertse|l|on|a|o|cin|dad|batk|ane|the|lamed|toe|weaced|bid|s|holan|if|ns|a|mop|n|not|id|wet|estrane
[sample: 1, WER: 92.4528%, LER: 61.1307%, total WER: 92.4528%, total LER: 61.1307%, progress: 100%]

Remove50low:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: at|hatted|sourc|a|away|back|of|the|woonds|if|the|ld|cuthortable|it|was|er|r|fu|d|to|be|n|in|ruuke|dt|hedrop|rolk|his|arey|coursed|o|there|his|woos|with|darke|segrence|of|chuul|and|keste|scame|n|that|by|the|timent|reach|lens|hollow|u|t|was|acuiret|whlk|conductid|lith|l|stream
[sample: 1, WER: 73.5849%, LER: 36.0424%, total WER: 73.5849%, total LER: 36.0424%, progress: 100%]

Remove80high:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: edione|sre|ar|ul|other|thoenei|g|mlany|o|n|the|ning|lovin|of|hou|of|ho|i|of|this|it|litel|oo|o|on|you|mouve|ungling|livien|im|work|o|erany|n|wom|an|moliuaely|i|and|of|on|hem|reng|moiring|the|the|little|fiver|whol|lin|who|willle|linglot|o|ho|aitein|lin|srund|on|nmlics|if|waneaing|engeng|hund|mlovme|on|oivvles
[sample: 1, WER: 94.3396%, LER: 81.9788%, total WER: 94.3396%, total LER: 81.9788%, progress: 100%]

Remove80low:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: at|hatie|source|sa|wayi|back|an|e|wouids|if|the|l|hath|ritaibles|it|was|er|revu|d|to|be|an|nerookite|hendioal|rook|his|earty|coursed|hr|thanses|woos|woi|h|darke|sengrns|of|clolear|kust|game|that|by|the|time|at|reached|lingns|holl|oug|t|was|aquiret|wilk|conducted|fital|srem
[sample: 1, WER: 73.5849%, LER: 36.0424%, total WER: 73.5849%, total LER: 36.0424%, progress: 100%]
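For reference, word error rate (WER) and letter error rate (LER) are conventionally the edit distance between reference and prediction divided by the reference length. The sketch below uses that textbook definition on the tail of the Remove80low pair above; the tool that produced these logs may tokenize or count differently, so the numbers are not expected to match the log lines exactly.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (substitutions, insertions and deletions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance as a percentage of the reference length."""
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Last eight reference words and the matching tail of the prediction.
ref = "it|was|a|quiet|well|conducted|little|stream".split("|")
hyp = "t|was|aquiret|wilk|conducted|fital|srem".split("|")

wer = error_rate(ref, hyp)                                    # word level
ler = error_rate(list(" ".join(ref)), list(" ".join(hyp)))    # character level
print(f"WER={wer:.4f}%  LER={ler:.4f}%")
```

On this excerpt only "was" and "conducted" survive intact, so the word-level rate is 6 edits over 8 reference words.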

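  • 1. lr=5e4, λ=1e-10, normalized, coefficient applied directly to the derivative

The loss is split into two parts: blue - λ × orange

The gradient with respect to m is split into two parts: blue - λ × orange. The mean and the variance of each gradient are shown separately.

Panels: gradient of MSE, gradient of H(m)


Evolution of m and the final m, in detail

Panels: epoch 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000


Gradient of the last layer and histogram of the m distribution

Distribution of the final output across dimensions:

Forward results: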

Remove50high:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: ffentets|song|e|he|ick|wooks|is|you|d|t|fif|te|wrickedes|it|was|urgut|d|to|beae|o|tou|you|italen|ove|look|ou|cius|of|the|us|cunhes|goods|d|darkts|e|coufscoit|f|an|the|ase|dad|it|gent|the|e|elived|each|thate|hat|o|o|t|it|ias|o|o|had|wott|the|nuph|t|littls|scapf
[sample: 1, WER: 92.4528%, LER: 62.5442%, total WER: 92.4528%, total LER: 62.5442%, progress: 100%]

Remove50low:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: at|hatted|sorce|of|way|back|in|the|bones|if|the|old|cuthbrtable|it|was|er|r|vu|d|to|be|a|nriiket|hedron|rouu|an|his|eravey|courstder|the|his|woods|with|darke|secgrence|of|pool|and|cast|gape|bat|by|the|timenat|reach|lind's|colow|oun|t|was|a|cuiet|will|conducted|hith|a|streak
[sample: 1, WER: 71.6981%, LER: 30.742%, total WER: 71.6981%, total LER: 30.742%, progress: 100%]

Remove80high:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: hit|ent|milles|mea|anin|oal|ou|eaimie|e|liikeu|tho|li|hanions|sinse|s|isis|t|onnen|musting|h|you|helingtly|ifey|moming|it|croute|s|fom|of|of|loveln|en|seliane|lomy|he|moms|o|i|iung|hisncs|has|u|old|intg|saing|monning|and|oin|linkan|o|mhe|te|nlenin
[sample: 1, WER: 96.2264%, LER: 74.2049%, total WER: 96.2264%, total LER: 74.2049%, progress: 100%]

Remove80low:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: at|hates|houre|same|way|back|ne|omords|in|the|ole|ath|rigkle|if|was|r|ul|d|to|be|an|n|rooket|to|hadn|yrong|orkh|e|ermy|a|courst|hfor|thesei|woaces|with|darpes|erence|o|el|n|caunt|gae|that|by|the|ima|and|reached|lings's|coll|rough|was|quiet|wholm|a|onductin|ort|streme
[sample: 1, WER: 79.2453%, LER: 44.8763%, total WER: 79.2453%, total LER: 44.8763%, progress: 100%]

Difference between output and ground truth

Panels: 50high_output, 50low_output, 80high_output, 80low_output

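  • 2. lr=1e4, λ=1e-10, direct subtraction

The loss is split into two parts: blue - λ × orange

The gradient with respect to m is split into two parts: blue - λ × orange. The mean and the variance of each gradient are shown separately.

Panels: gradient of MSE, gradient of H(m)


Evolution of m and the final m, in detail

Panels: epoch 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000


Gradient of the last layer and histogram of the m distribution

Distribution of the final output across dimensions:

Forward results: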

Remove50high:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: that|hide|follycs|reavd|yoere|onll|woods|it|thei|threuthles|winn|u|e|was|or|you|tinn|to|be|gn|iueael|eau|congy|he|erlly|oreaeriel|eorster|tos|woots|af|you|yhou|withle|dorics|swordts|of|ro|gol|n|bcaust|tate|of|if|hood|goee|an|hevinglin|ve|lingies|if|lo|e|o|was|gloine|we|liiing|ilfint|ieas|re
[sample: 1, WER: 88.6792%, LER: 63.9576%, total WER: 88.6792%, total LER: 63.9576%, progress: 100%]

Remove50low:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: and|hact|sourc|of|way|back|the|woods|if|the|yold|hethrurtable|it|was|er|youl|d|to|be|ntrook|et|hedwont|wolk|an|his|ereheye|coursed|to|thense|laads|with|darks|secrts|of|croom|e|an|casin|cate|mat|by|the|time|a|reachd|linds|hollow|u|it|was|a|quiet|will|h|conducted|lithlel|streime
[sample: 1, WER: 71.6981%, LER: 33.2155%, total WER: 71.6981%, total LER: 33.2155%, progress: 100%]

Remove80high:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: ly|he|aties|shooc|om|en|e|ke|tlerlyn|o|homn|if|yeuthues|was|o|thre|flus|chorlinly|woll|e|his|yeue|in|t|beenin|till|you|fiute|gone|h|rugin|one|irlr|wert|ho|iastle|fo|lou|who|with|uelriengs|can|n|worns|little|as|is|teave|onnln|thou|lang|yo|o|i|we|hil|ue|lens|soind|loo|eat|wais|i|w|ain|the|liiere|ilviei|r|lidse|of|reag
[sample: 1, WER: 98.1132%, LER: 77.3852%, total WER: 98.1132%, total LER: 77.3852%, progress: 100%]

Remove80low:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: and|hartiens|sourd|way|back|the|woods|o|deyeld|o|eforite|le|c|it|was|you|yu|d|to|be|n|n|arulk|t|to|had|n|wook|of|his|ae|he|a|coursed|to|tdon'|s|loons|wuth|n|darks|rns|of|pora|in|the|d|mack|by|the|time|reached|wins|hellow|it|was|cuiret|willom|c|condupt|o|huth|le|streate
[sample: 1, WER: 84.9057%, LER: 44.8763%, total WER: 84.9057%, total LER: 44.8763%, progress: 100%]

Difference between output and ground truth

Panels: 50high_output, 50low_output, 80high_output, 80low_output

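  • 3. lr=1e4, λ=1e-10, normalized, derivative uses all-ones

The loss is split into two parts: blue - λ × orange

The gradient with respect to m is split into two parts: blue - λ × orange. The mean and the variance of each gradient are shown separately.

Panels: gradient of MSE, gradient of H(m)


Evolution of m and the final m, in detail

Panels: epoch 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000


Gradient of the last layer and histogram of the m distribution

Distribution of the final output across dimensions:

Forward results: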

Remove50high:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: in|fa|fict|fom|he|beckin|ood|s|feak|ifteoruentntoes|it|was|recute|to|eaven|the|too|at|koote|oun|gook|c|e|his|note|yo|d|instent|olv|his|dock|rith|o|cat|e|en|er|the|cast|day|lit|an|e|treiaot|wea|to|tlens|have|hove|itnss|lite|lotpean|nook|it|o|stan
[sample: 1, WER: 92.4528%, LER: 64.311%, total WER: 92.4528%, total LER: 64.311%, progress: 100%]

Remove50low:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: at|hatted|sourc|som|auwaty|back|i|the|bonds|it|the|ld|hathertble|it|was|erefeuld|it|to|be|intuuk|tet|hederan|welk|an|s|e|ereyr|acoursed|ther|wis|oods|with|te|dark|secgrence|of|cloul|and|cast|cape|that|by|the|timent|reach|ling's|holow|it|was|aquiet|wilk|condected|t|hith|le|shret
[sample: 1, WER: 75.4717%, LER: 37.8092%, total WER: 75.4717%, total LER: 37.8092%, progress: 100%]

Remove80high:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: muase|nanies|aveils|elleining|in|o|han|meye|of|tly|ou|little|as|reins|littl|had|vinge|o|ere|he|little|mever|then|e|ernlialye|es|winm|i|lints|sulin|u|king|on|on|hes|e|s|anly|r|willingnan|heesein|the|lit|f|hou|loulsins|line|hugu|onh|loms|littlul|eung|on|in|then|lenl|se|lottle|of|e
[sample: 1, WER: 94.3396%, LER: 74.5583%, total WER: 94.3396%, total LER: 74.5583%, progress: 100%]

Remove80low:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: at|hatied|sore|som|a|wrat|back|in|the|mords|it|the|ol|hat|ritiabline|the|it|was|er|e|feu|d|to|be|ni|ruguitet|o|hedya|welt|in|e|erbly|coursed|hror|therswis|woods|and|with|darke|songeraice|of|coeang|cast|came|thant|by|the|timea|each|linges|hollowow|it|was|acquiet|wilm|n|conducted|lettle|screak
[sample: 1, WER: 75.4717%, LER: 43.8163%, total WER: 75.4717%, total LER: 43.8163%, progress: 100%]

Difference between output and ground truth

Panels: 50high_output, 50low_output, 80high_output, 80low_output

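  • 4. lr=1e4, λ=1e-11, direct subtraction

The loss is split into two parts: blue - λ × orange

The gradient with respect to m is split into two parts: blue - λ × orange. The mean and the variance of each gradient are shown separately.

Panels: gradient of MSE, gradient of H(m)


Evolution of m and the final m, in detail

Panels: epoch 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000


Gradient of the last layer and histogram of the m distribution

Distribution of the final output across dimensions:

Forward results: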

Remove50high:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: that|higetd|t|corce|o|aiev|e|ruly|lwlii|wod|s|on|thelng|hof|o|rou|elynvl|ce|wiie|ho|in|was|ar|leaeden|t|tof|ouning|a|ol|you|long|rel|i|wase|nerelyl|orse|for|to|his|loos|foe|o|with|i|torkes|sulincs|hif|cool|and|cares|scalo|twen|le|aned|o|you|n|leas|leates|linitlot|le|was|yeurded|willing|nlockedn|les|foe|you
[sample: 1, WER: 92.4528%, LER: 66.7845%, total WER: 92.4528%, total LER: 66.7845%, progress: 100%]

Remove50low:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: and|hatetd|sorcse|of|way|back|i|the|woods|if|the|ld|cothertables|it|was|yere|cul|d|to|be|n|enenru|it|had|rlot|bolk|on|his|earehey|acoursed|too|then'e|ouds|with|darks|secretes|of|cluuae|castyd|dade|that|by|the|time|t|reached|lins|toll|w|out|it|was|a|cliet|will|the|ducted|whath|o|streike
[sample: 1, WER: 73.5849%, LER: 37.4558%, total WER: 73.5849%, total LER: 37.4558%, progress: 100%]

Remove80high:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: ly|hevers|was|wore|leny|lutyal|golrns|an|the|e|who|hat|reurauliney|slooly|in|he|molelnenehns|eultinilnf|a|menaelen|he|lealintye|long|gonl|lora|liete|worse|on|nhuselt|leate|a|reele|o|jeonies|we|ittlr|the|woll|if|e|uet|gave|l|ge|hen|o|neo|that|o|you|wer|le|leases|o|e|the|liwelrye|the|wis|weare|o|wille|hen|o|oliese|freve
[sample: 1, WER: 98.1132%, LER: 81.6254%, total WER: 98.1132%, total LER: 81.6254%, progress: 100%]

Remove80low:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: and|hactedn|sor|away|eck|the|woods|of|yeld|oer|hethertable|it|was|your|yeul|d|i|to|be|a|nerur|gat|hi|hadno|g|wlooke|of|is|atehe|acorsed|thou|thonse|loods|with|darps|hiigeranse|of|por|in|ciusted|ean|nuat|by|the|mime|i|eached|lengs|helow|it|was|cuie|h|wild|onducting|little|streak
[sample: 1, WER: 75.4717%, LER: 40.9894%, total WER: 75.4717%, total LER: 40.9894%, progress: 100%]

Difference between output and ground truth

Panels: 50high_output, 50low_output, 80high_output, 80low_output

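  • 5. lr=1e4, λ=1e-11, normalized, derivative uses all-K

The loss is split into two parts: blue - λ × orange

The gradient with respect to m is split into two parts: blue - λ × orange. The mean and the variance of each gradient are shown separately.

Panels: gradient of MSE, gradient of H(m)


Evolution of m and the final m, in detail

Panels: epoch 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000


Gradient of the last layer and histogram of the m distribution

Distribution of the final output across dimensions:

Forward results: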

Remove50high:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: the|frish|rursotly|hemme|him|o|e|ou|a|cuifet|ot|oforcuteyou|to|be|a|can|tootitit|stoup|yo|o|hom|cince|ce|strte|u|his|tat|hus|sich|to|if|the|tuches|fii|to|he|kilse|chane|sudh|goutdat|enver|toc|pit|sit|o|it|his|ift|i|d|gook|onockting|oisk|exctras
[sample: 1, WER: 92.4528%, LER: 68.9046%, total WER: 92.4528%, total LER: 68.9046%, progress: 100%]

Remove50low:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: atd|haddet|sor|away|back|in|the|wotds|if|the|ld|o|otheurtkple|it|was|er|vuted|to|be|n|neanruuk|tedt|hedru|breek|in|his|ave|corse|hrou|htheis|gods|with|the|dark|secrets|of|pullle|and|cast|es|cate|that|by|the|time|t|reach|lends|hllow|o|it|was|a|quiret|will|hav|ducted|a|lith|lat|streme
[sample: 1, WER: 66.0377%, LER: 34.2756%, total WER: 66.0377%, total LER: 34.2756%, progress: 100%]

Remove80high:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: of|h|ntnh|ensonsisteirc|itt|to|pleinice|of|crteit|f|the|c|kut|er|the|o|hif|to|cank|cn|ofer|in|did|i|undeh|t|an|gites|ctinie|the|cin|from|h|f|you|thaovet|with|fookind|com|to|pleaken|ic|thiak|e|e|theunhotl|fiftcl|fuxkens|the|ittillly|o|the|lugicae|frotouted|itt|you|it|bleeklicly|his|eren|lesed
[sample: 1, WER: 90.566%, LER: 78.4452%, total WER: 90.566%, total LER: 78.4452%, progress: 100%]

Remove80low:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: and|haccets|sorcs|way|back|hnlnt|woods|if|the|ol|of|t|rikae|it|was|r|of|yout|to|be|an|natuuk|tet|hednoa|rork|o|is|anmey|a|course|tror|those|aads|ate|he|with|dark|s|secerance|of|flool|n|cast|esgate|mat|by|the|tine|reache|lings|nhollow|h|it|was|a|uiet|willk|ho|ductoin|lit|lel|streme
[sample: 1, WER: 69.8113%, LER: 40.9894%, total WER: 69.8113%, total LER: 40.9894%, progress: 100%]

Difference between output and ground truth

Panels: 50high_output, 50low_output, 80high_output, 80low_output

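  • 6. lr=1e4, λ=1e-11, normalized, derivative uses the all-ones matrix

The loss is split into two parts: blue - λ × orange

The gradient with respect to m is split into two parts: blue - λ × orange. The mean and the variance of each gradient are shown separately.

Panels: gradient of MSE, gradient of H(m)


Evolution of m and the final m, in detail

Panels: epoch 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000


Gradient of the last layer and histogram of the m distribution

Distribution of the final output across dimensions:

Forward results: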

Remove50high:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: an|henkted|soon|that|he|achond|it|s|the|helpe|fivvhoengringdies|hen|it|was|regut|into|the|into|geut|yo|adio|hom|n|is|rof|hi|costant|o|t|looese|with|tarnks|see|rpoath|of|cuuro|om|the|caste|de|et|ber|the|ramed|to|weahed|thins|have|of|in|is|lay|it|mombane|doot|strape
[sample: 1, WER: 90.566%, LER: 61.8375%, total WER: 90.566%, total LER: 61.8375%, progress: 100%]

Remove50low:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: at|hatted|sourc|so|away|back|if|the|bods|in|the|ld|o|cuthortable|it|was|oereculd|to|be|inruuketd|hedeha|hele|his|ertey|coursed|tro|there|his|woos|with|darke|segrence|of|cruul|an|cast|came|that|by|the|timeant|each|lin's|holow|out|t|was|acuiet|will|conducted|at|lichlu|stream
[sample: 1, WER: 71.6981%, LER: 37.1025%, total WER: 71.6981%, total LER: 37.1025%, progress: 100%]

Remove80high:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: him|muute|somelin|wolt|n|ol|ou|an|you|ouye|l|of|ho|mes|linnyn|hone|hene|ei|int|slls|an|on|er|ineng|lone|liner|fnont|i|hem|er|eovernly|i|on|eglnkan|of|ruiles|soifeher|iuto|ore|of|coln|an|hid|ies|e|likeale|oither|when|link|ig|hen|o|of|heaedes|en|woee|nr|on|teu|le|then|of|nfosingy|at|yor|move|on|you|liat|ynu
[sample: 1, WER: 98.1132%, LER: 80.5654%, total WER: 98.1132%, total LER: 80.5654%, progress: 100%]

Remove80low:
|T|: that|had|its|source|away|back|in|the|woods|of|the|old|cuthbert|place|it|was|reputed|to|be|an|intricate|headlong|brook|in|its|earlier|course|through|those|woods|with|dark|secrets|of|pool|and|cascade|but|by|the|time|it|reached|lynde's|hollow|it|was|a|quiet|well|conducted|little|stream
|P|: at|hattin|sourc|a|way|back|in|the|words|in|the|ole|to|hathortably|it|was|ir|e|fou|d|to|be|n|trooketd|o|hadnoa|wock|an|army|coursed|throug|therewis|woods|with|darpes|singrens|of|groe|and|cast|game|that|by|the|timat|reach|ling's|holl|woug|t|was|acuiret|wilk|conducted|loth|l|srenm
[sample: 1, WER: 75.4717%, LER: 37.4558%, total WER: 75.4717%, total LER: 37.4558%, progress: 100%]

Difference between output and ground truth

Panels: 50high_output, 50low_output, 80high_output, 80low_output
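To compare runs without reading each score line by hand, the `[sample: ...]` lines can be parsed automatically. This is a minimal sketch that assumes the line format shown in the logs above stays fixed; `parse_scores` is a hypothetical helper, and the two sample lines are copied from run 0 (Remove50high and Remove50low).

```python
import re

# Matches the start of a score line and captures sample index, WER and LER.
SCORE_LINE = re.compile(r"\[sample: (\d+), WER: ([\d.]+)%, LER: ([\d.]+)%")


def parse_scores(log_text):
    """Return a list of (sample, WER, LER) tuples found in the log text."""
    return [(int(s), float(w), float(l))
            for s, w, l in SCORE_LINE.findall(log_text)]


log = """
[sample: 1, WER: 92.4528%, LER: 61.1307%, total WER: 92.4528%, total LER: 61.1307%, progress: 100%]
[sample: 1, WER: 73.5849%, LER: 36.0424%, total WER: 73.5849%, total LER: 36.0424%, progress: 100%]
"""
scores = parse_scores(log)
for sample, wer, ler in scores:
    print(sample, wer, ler)
```

Collecting the tuples per run and per condition (Remove50high, Remove50low, Remove80high, Remove80low) would give a compact table to set beside the m_mean and m_var columns.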