

ECE 498 assignment help: Python design and programming

Date: 2024-11-15  Source: Hefei Net (hfw.cc)  Author: hfw.cc



ECE 498/598 Fall 2024, Homeworks 3 and 4
Remarks:
1. HW3&4: You can reduce the context length to ** if you are having trouble with the
training time.
2. HW3&4: During test evaluation, note that positional encodings for unseen/long
context are not trained. You are supposed to evaluate it as is. It is OK if it doesn’t
work well.
3. HW3&4: Comments are an important component of the HW grade. You are expected
to explain the experimental findings. If you don’t provide technically meaningful
comments, you might receive a lower score even if your code and experiments are
accurate.
4. The deadline for HW3 is November 11th at 11:59 PM, and the deadline for HW4 is
November 18th at 11:59 PM. For each assignment, please submit both your code and a
PDF report that includes your results (figures) for each question. You can generate the
PDF report from a Jupyter Notebook (.ipynb file) by adding comments in markdown
cells.
The objective of this assignment is to compare the transformer architecture with SSM-type
architectures (specifically Mamba [1]) on the associative recall problem. We provide
example code, recall.ipynb, which implements a 2-layer transformer. You will adapt this
code to incorporate different positional encodings, use Mamba layers, or modify dataset
generation.
Background: As you recall from class, associative recall (AR) assesses two abilities
of the model: the ability to locate relevant information and the ability to retrieve the
context around that information. The AR task can be understood via the following example:
given the input prompt X = [a 1 b 2 c 3 b], we wish the model to locate where the last
token b occurs earlier and output the associated value Y = 2. This is crucial for
memory-related tasks or bigram retrieval (e.g. ‘Baggins’ should follow ‘Bilbo’).
To proceed, let us formally define the associative recall task we will study in the HW.
Definition 1 (Associative Recall Problem) Let Q be the set of target queries with cardinality |Q| = k. Consider a discrete input sequence X of the form X = [. . . q v . . . q] where the
query q appears exactly twice in the sequence and the value v follows the first appearance
of q. We say the model f solves AR(k) if f(X) = v for all sequences X with q ∈ Q.
The induction head problem is the special case of the definition above where the query q
is fixed (i.e. Q is a singleton). It is visualized in Figure 1. At the other extreme, we can
ask the model to solve AR for all queries in the vocabulary.
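To make the definition concrete, here is a minimal reference lookup in Python. It is only a sketch of the input-output map that a model solving AR must realize, not a model; the function name `associative_recall` is ours and is not part of the provided recall.ipynb. It assumes the value immediately follows the first occurrence of the query.

```python
def associative_recall(tokens):
    """Return the token that follows the first occurrence of the query,
    where the query is the last token of the sequence."""
    query = tokens[-1]
    # The query must occur once in the context and once at the end.
    first = tokens.index(query)
    if first == len(tokens) - 1:
        raise ValueError("query does not occur earlier in the sequence")
    return tokens[first + 1]


# The prompt from the background paragraph: X = [a 1 b 2 c 3 b].
X = ["a", 1, "b", 2, "c", 3, "b"]
Y = associative_recall(X)
```

For this prompt the lookup returns the value 2 associated with b, matching the example above.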
Problem Setting
Vocabulary: Let [K] = {1, . . . , K} be the token vocabulary. Obtain the embedding of
the vocabulary by randomly generating a K × d matrix V with IID N(0, 1) entries, then
normalize its rows to unit length. Here d is the embedding dimension. The embedding of
the i-th token is V[i]. Use numpy.random.seed(0) to ensure reproducibility.
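The embedding construction can be sketched in a few lines of NumPy. The helper name `make_embeddings` is hypothetical, and the sketch assumes 0-based array indexing, so token i of [K] = {1, . . . , K} maps to row i − 1 of the array.

```python
import numpy as np


def make_embeddings(K, d, seed=0):
    """Return a K x d matrix with IID N(0, 1) entries and unit-norm rows."""
    np.random.seed(seed)  # fixed seed for reproducibility, as the handout asks
    V = np.random.randn(K, d)
    # Divide each row by its Euclidean norm so every embedding has length 1.
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    return V


V = make_embeddings(K=16, d=8)
# Assumption: token i (1-based) is stored in row i - 1.
embedding_of_token_3 = V[3 - 1]
```

Calling `make_embeddings` twice with the same seed returns the same matrix, which is what makes runs comparable across models.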
Experimental variables: For the AR task, Q will simply be the first M elements
of the vocabulary. During experiments, K, d, M are under our control. Besides these, we will
also vary two other variables:
• Context length: We will train these models up to context length L. However, we
will evaluate with up to 3L. This is to test the generalization of the model to unseen
lengths.
• Delay: In the basic AR problem, the value v immediately follows q. Instead, we will
introduce a delay variable where v will appear τ tokens after q. τ = 1 is the standard.
Figure 1: We will work on the associative recall (AR) problem. The AR problem requires the
model to retrieve the value associated with any query, whereas the induction head problem
requires the same for one specific query. Thus, the latter is an easier problem. The figure
is taken directly from the Mamba paper [1]. The yellow-shaded regions highlight the focus of
this homework.
Models: The motivation behind this HW is to reproduce the results in the Mamba paper.
However, we will also go beyond their evaluations and identify weaknesses of both the
transformer and Mamba architectures. Specifically, we will consider the following models in
our evaluations:
• Transformer: We will use the transformer architecture with 2 attention layers (no
MLP). We will try the following positional encodings: (i) learned PE (provided code),
(ii) Rotary PE (RoPE), (iii) NoPE (no positional encoding)
• Mamba: We will use the Mamba architecture with 2 layers.
• Hybrid Model: We will use an initial Mamba layer followed by an attention layer.
No positional encoding is used.
Hybrid architectures are inspired by the Mamba paper as well as [2] which observes the
benefit of starting the model with a Mamba layer. You should use public GitHub repos to
find implementations (e.g. RoPE encoding or Mamba layer). As a suggestion, you can use
this GitHub Repo for the Mamba model.
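For orientation before picking a RoPE implementation, here is a minimal NumPy sketch of the rotary idea. It assumes the interleaved-pair convention and a frequency base of 10000, and the function name `rope` is ours. Public implementations typically operate on PyTorch tensors, apply the rotation to queries and keys inside each attention layer, and may pair dimensions differently, so treat this as an illustration rather than the code to submit.

```python
import numpy as np


def rope(x, base=10000.0):
    """Rotate each pair (x[:, 2j], x[:, 2j+1]) of the row at position p
    by the angle p * base**(-2j / d). x has shape (seq_len, d), d even."""
    x = np.asarray(x, dtype=float)
    seq_len, d = x.shape
    assert d % 2 == 0, "embedding dimension must be even"
    inv_freq = base ** (-np.arange(0, d, 2) / d)           # shape (d/2,)
    angles = np.arange(seq_len)[:, None] * inv_freq[None]  # shape (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


# The same query and key vectors placed at every position of a length-10 context.
rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)
scores = rope(np.tile(q, (10, 1))) @ rope(np.tile(k, (10, 1))).T
```

Because each pair is rotated by an angle proportional to its position, `scores[m, n]` depends only on the offset n − m. This is the property that distinguishes RoPE from the learned absolute encodings in the provided code.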
Generating training dataset: During training, you train with minibatch SGD (e.g. with
batch size 64) until satisfactory convergence. You can generate the training sequences for
AR as follows given (K, d, M, L, τ):
1. Training sequence length is equal to L.
2. Sample a query q ∈ Q and a value v ∈ [K] uniformly at random, independently. Recall
that the size of Q is |Q| = M.
3. Place q at the end of the sequence and place another q at an index i chosen uniformly
at random from 1 to L − τ.
4. Place value token at the index i + τ.
5. Sample the other tokens IID from [K] \ {q}, i.e. the other tokens are drawn uniformly
at random but are not equal to q.
6. Set label token Y = v.
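The steps above can be sketched as a NumPy batch sampler. The function name `sample_ar_batch` and the use of `np.random.default_rng` are our choices, not part of recall.ipynb. One assumption is labeled in the code: the first query index i is drawn from 1 to L − τ − 1, so that the value at index i + τ never overwrites the query at the final position.

```python
import numpy as np


def sample_ar_batch(batch_size, K, M, L, tau, rng):
    """Sample AR sequences X of shape (batch_size, L) and labels Y.
    Tokens are integers 1..K and the query set Q is {1, ..., M}."""
    assert K >= 2 and L >= tau + 2
    X = np.empty((batch_size, L), dtype=np.int64)
    Y = np.empty(batch_size, dtype=np.int64)
    for b in range(batch_size):
        q = rng.integers(1, M + 1)  # query, uniform over Q
        v = rng.integers(1, K + 1)  # value, uniform over [K]
        # Filler tokens uniform over [K] without q: draw from 1..K-1,
        # then shift every draw >= q up by one.
        seq = rng.integers(1, K, size=L)
        seq[seq >= q] += 1
        # Assumption: i is in 1..L-tau-1 (1-based) so the value stays
        # strictly before the final query.
        i = rng.integers(1, L - tau)
        seq[i - 1] = q        # first occurrence of the query
        seq[i + tau - 1] = v  # value, tau tokens after the query
        seq[-1] = q           # the query again at the end
        X[b], Y[b] = seq, v
    return X, Y


# Induction head setting (M = 1, tau = 1) with batch size 64 and L = 64.
X, Y = sample_ar_batch(64, K=16, M=1, L=64, tau=1, rng=np.random.default_rng(0))
```

Since v is sampled independently of q, the value can coincide with the query. In that case q appears three times in the sequence, and the label is still the token τ positions after its first occurrence.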
Test evaluation: The test dataset is generated in the same way as above. However, we will
evaluate on all sequence lengths from τ + 2 to 3L. Note that τ + 2 is the shortest possible
sequence.
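The evaluation protocol can be sketched as a loop over lengths that records one accuracy per length. The names `accuracy_by_length`, `predict`, and `sample_batch` are hypothetical. In practice `predict` would wrap a trained model and `sample_batch` would be your test-set generator; the toy stand-ins below exist only so the sketch runs on its own.

```python
def accuracy_by_length(predict, sample_batch, lengths, n_eval=256):
    """Return {length: accuracy}, evaluating the model as is on each length."""
    results = {}
    for length in lengths:
        X, Y = sample_batch(n_eval, length)
        Y_hat = predict(X)
        correct = sum(int(p == y) for p, y in zip(Y_hat, Y))
        results[length] = correct / n_eval
    return results


# Toy stand-ins (delay tau = 1): query token 1, filler token 2, values 3..6.
def toy_sample_batch(n, length):
    X, Y = [], []
    for j in range(n):
        v = 3 + (j % 4)
        X.append([1, v] + [2] * (length - 3) + [1])
        Y.append(v)
    return X, Y


def toy_predict(X):
    # Exact lookup: the token right after the first occurrence of the last token.
    return [x[x.index(x[-1]) + 1] for x in X]


L, tau = 64, 1
acc = accuracy_by_length(toy_predict, toy_sample_batch, range(tau + 2, 3 * L + 1))
```

Plotting `acc` against length for each trained model gives the length-generalization curves; lengths above L are the ones the model never saw during training.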
Empirical Evidence from Mamba Paper: Table 2 of [1] demonstrates that Mamba can do
a good job on the induction head problem, i.e. AR with a single query. Additionally, Mamba
is the only model that exhibits length generalization: even if you train it up to context
length L, it can still solve AR for context lengths beyond L. On the other hand, since Mamba
is inherently a recurrent model, it may not solve the AR problem in its full generality. This
motivates the question: What are the tradeoffs between Mamba and transformer, and can
hybrid models help improve performance over both?
Your assignments are as follows. For each problem, make sure to submit the associated
code. The code can be in separate cells (clearly commented) of a single Jupyter/Python file.
Grading structure:
• Problem 1 will count as your HW3 grade. This only involves Induction Head
experiments (i.e. M = 1).
• Problems 2 and 3 will count as your HW4 grade.
• You will make a single submission.
Problem 1 (50=25+15+10pts). Set K = 16, d = 8, L = ** or L = 64.
• Train all models on the induction heads problem (M = 1, τ = 1). After training,
evaluate the test performance and plot the accuracy of all models as a function of
the context length (similar to Table 2 of [1]). In total, you will be plotting 5 curves
(3 Transformers, 1 Mamba, 1 Hybrid). Comment on the findings and compare the
performance of the models including length generalization ability.
• Repeat the experiment above with delay τ = 5. Comment on the impact of delay.
• Which models converge faster during training? Provide a plot of the convergence rate
where the x-axis is the number of iterations and the y-axis is the AR accuracy over a
test batch. Make sure to specify the batch size you are using (ideally use ** or 64).
Problem 2 (30pts). Set K = 16, d = 8, L = ** or L = 64. We will train Mamba, Transformer
with RoPE, and Hybrid. Set τ = 1 (standard AR).
• Train Mamba models for M = 4, 8, 16. Note that M = 16 is the full AR (retrieve any
query). Comment on the results.
• Train Transformer models for M = 4, 8, 16. Comment on the results and compare
them against Mamba’s behavior.
• Train the Hybrid model for M = 4, 8, 16. Comment and compare.
Problem 3 (20=15+5pts). Set K = 16, d = 64, L = ** or L = 64. We will only train
Mamba models.
• Set τ = 1 (standard AR). Train Mamba models for M = 4, 8, 16. Compare against the
corresponding results of Problem 2. How does the embedding dimension d impact the results?
• Train a Mamba model with M = 16 and τ = 10. Comment on any differences.



