加勒比久久综合,国产精品伦一区二区,66精品视频在线观看,一区二区电影

合肥生活安徽新聞合肥交通合肥房產生活服務合肥教育合肥招聘合肥旅游文化藝術合肥美食合肥地圖合肥社保合肥醫院企業服務合肥法律

代做CS 7642 Reinforcement Learning and Decision

時間:2024-04-21  來源:合肥網hfw.cc  作者:hfw.cc 我要糾錯



   CS 7642: Reinforcement Learning and Decision Making Project #3 Overcooked
 1 Problem 1.1 Description
For the final project of this course, you have to bring together everything you have learned thus far and solve the multi-agent Overcooked environment (modeled after the popular video game). In this environment, you have control over 2 chefs in a restaurant kitchen who have to collaborate to cook onion soups. To cook a soup, the agents need to put 3 onions into a cooking pot, initiate cooking, wait for the soup to cook, put the soup into a dish, and serve the dish at a serving area. This project serves as a capstone to the course and as such we expect much of the project to be open-ended and self-directed. Your primary goal is to maximize the number of soups delivered within an episode on a variety of layouts ranging from fairly easy to extremely difficult. In your quest to solve these layouts you may discover auxiliary goals or metrics that are worth analyzing.
Our expectation is that you have learned what is significant to include in this type of report from the previous projects and the material we have covered so far. It is thus up to you to define:
• The direction of your project including which aspect(s) you aim to focus upon.
• How you specify and measure such aspects.
• How to train your agents.
• How to structure your report and what graphs to include (in addition to the mandatory graphs discussed later).
Your focus should be on demonstrating your understanding of the algorithm(s)/solution(s), clarifying the ratio- nale behind your experiments, and analyzing their results. Your main goal is to develop an algorithm to solve the environment but you can also use everything else studied in the course such as reward and policy shaping. The environment provides a reward shaping data structure that you are free to use. You may also design your own reward shaping in place of, or in addition to, this default setup. However, all algorithms and solutions used to solve the environment should be your own. We encourage you to start off this project with your Project 2 solution and see how far that model takes you. This will provide context for why multi-agent methods may be necessary for this environment. It will also help to ease your transition into this environment by utilizing an algorithm you’ve already gotten to work.
Figure 1: Visualization of the Overcooked environment. Carroll et al. 2019
1.2 Environment and Task
In this project, you will be training a team of 2 agents to cook onion soups in a kitchen. The objective is always to deliver as many soups as possible within a 400-timestep episode. Each soup takes 20 timesteps to cook and
 1

– Overcooked 2
delivering a soup successfully yields a +20 reward. Cooking a soup with less than 3 onions, dropping a soup on the ground, or serving the soup on the counter (instead of the designated serving area) yields no reward but hinders progress as agents lose precious time (and starve customers). Episodes are truncated to a 400 step horizon with no termination conditions. You are not permitted to increase the 400 step horizon. You are provided with 5 layouts of varying difficulty - [cramped room, asymmetric advantages, coordination ring, forced coordination, counter circuit 0 1order] as shown in Figure 2 1. Your task is to achieve a mean soup delivery count of ≥ 7 per episode across all layouts using a single approach. This means a single algorithm and a single reward-shaping function (if you utilize reward shaping). This also means a single set of hyperparameters, The idea is to build an agent that can solve any layout that is thrown at it, and not just these 5. Having a constant set of parameters also makes reproducibility much easier (something we gained an appreciation for in Project 1). Note that some layouts can be solved by a single agent algorithm and don’t require any collaboration. Other layouts benefit significantly from collaboration and some may only be solvable via collaboration. This means that a successful approach to solving all 5 layouts will likely require an explicit multi-agent approach. We also expect you to develop your approaches and analyze the results by explicitly looking at multi-agent metrics (see Section 1.8).
Figure 2: The 5 layouts you are tasked to solve. From left to right they are named [cramped room, asymmetric advantages, coordination ring, forced coordination, counter circuit 0 1order]. Car- roll et al. 2019
1.3 State Space
This is a fully-observable MDP and both agents have access to the full observation. Therefore, the state and observation spaces are equivalent. By default, the observations are provided as a 96-element vector, customized for each agent. The encoding for player i ∈ {0, 1} contains a player-centric featurized view for the ith player, and is as follows:
[player i features, other player features, player i dist to other player, player i position]
The first component, player i features has length 46 and is detailed below. Note that if you add all the feature lengths in the specification below, you will get 36 instead of the expected 46. This is because the five features related to the pot (having a combined length of 10) occur twice, once for each pot, and are concatenated together. Note also that none of our layouts contain tomatoes, so the features corresponding to tomatoes will always be 0. Finally, layouts containing only one cooking pot will have the second pot’s features zeroed out as well.
• p i orientation: one-hot-encoding of direction currently facing (length 4)
• p i obj: one-hot-encoding of object currently being held ([onion, soup, dish, tomato]) (all 0s if no object
held) (length 4)
• p i closest onion|tomato|dish|soup: (dx, dy) where dx = x dist to item, dy = y dist to item. (0, 0) if item is currently held (length 8)
• p i closest soup n onions|tomatoes: int value for number of this ingredient in closest soup (length 2)
• p i closest serving area|empty counter: (dx, dy) where dx = x dist to item, dy = y dist to item. (length
4)
1The overcooked environment has dozens of layouts but for this project we will only be focusing on these 5.
  
– Overcooked 3
• p i closest pot j exists: {0, 1} depending on whether jth closest pot is found. If 0, then all other pot features are 0. Note: can be 0 even if there are more than j pots on layout, if the pot is not reachable by player i (length 1)
• p i closest pot j is empty|is full|is cooking|is ready: {0, 1} depending on boolean value for jth closest pot (length 4)
• p i closest pot j num onions|num tomatoes: int value for number of this ingredient in jth closest pot (length 2)
• p i closest pot j cook time: int value for seconds remaining on soup. 0 if no soup is cooking (length 1)
• p i closest pot j: (dx, dy) to jth closest pot from player i location (length 2)
• p i wall j: {0, 1} boolean value of whether player i has a wall immediately in direction j (length 4)
The remaining components of the observation vector are as follows:
other player features (length 46): ordered concatenation of player j features for j ̸= i player i dist to other player (length 2): [player j.pos - player i.pos for j ̸= i]
player i position (length 2)
1.4 Action Space
The action space is discrete with six possible actions: up, down, left, right, stay, and ”interact,” which is a contextual action determined by the tile the player is facing (e.g. placing an onion when facing a counter). Each layout has one or more onion dispensers and dish dispensers, which provide an unlimited supply of onions and dishes respectively.
1.5 Installation Notes
The environment is officially supported on Python 3.7 and is installed via pip install overcooked-ai. We recommend you run in Anaconda. We require the use of PyTorch if using deep learning methods. You absolutely do not need a GPU to solve any of the layouts in less than 10 hours (in fact, GPUs typically slow RL algorithms down). To help you with getting started, we are providing you with a Jupyter notebook. You may create a copy of this notebook in order to run the starting code. This notebook demonstrates installing, building, interacting with, and visualizing the environment. You are not required to use this notebook in your project, but we encourage you to use it as a companion to this document to better understand the environment.
1.6 IMPORTANT: Reward Shaping Addendum
If you plan on using reward shaping, take a look at how the default shaped rewards are swapped by the agent index in the provided notebook. Upon episode reset, agents are assigned randomly to one of the 2 starting positions. This assignment is only reflected in the official observation that is returned to you by the environment’s step method. Any state variable you obtain from the Overcooked environment that is not in this observation variable (including anything in the info dictionary or the base environment) needs to be similarly swapped. Failure to do this means you will be assigning credit to the wrong agent roughly half the time, crippling your algorithm.
For more details on installation and operation, refer to the GitHub repository - https://github.com/ HumanCompatibleAI/overcooked_ai
1.7 Strategy Recommendations
You are free to pursue any multi-agent RL strategies in your soup-cooking quest. For instance, you may pursue a novel reward-shaping technique, however, make sure that the method chosen is relevant to multi-agent RL problems. We strongly recommend that you (1) start with your Project 2 solution adapted to this problem and (2) start with the cramped room and asymmetric advantages layouts. Below are further examples of strategies worth pursuing:

– Overcooked 4
• using reward shaping techniques for improving multi-agent considerations such as collaboration and credit
assignment;
• asynchronous methods Mnih et al. 2016;
• centralizing training and decentralizing execution (Lowe et al. 2017; J. N. Foerster et al. 2017);
• value factorisation Rashid, Samvelyan, De Witt, et al. 2020;
• employing curriculum learning (some single-agent ideas in this dissertation may be interesting and easy to extend to the multi-agent case e.g., Narvekar 2017).
• adding communication protocols (J. Foerster et al. 2016);
• improving multi-agent credit assignment (J. N. Foerster et al. 2017; Zhou et al. 2020);
• improving multi-agent exploration (Iqbal and Sha 2019; Wang et al. 2019)
• finding better inductive biases (i.e., choosing the function space for policy/value function approximation) to handle the exponential complexity of multi-agent learning, e.g., graph neural networks (Battaglia et al. 2018; Naderializadeh et al. 2020).
1.8 Procedure
This problem is more sophisticated than anything you have seen so far in this course. Make sure you reserve enough time to consider what an appropriate approach might involve and, of course, enough time to build and train it.
• Clearly define the direction of your project and which aspect(s) you aim to improve upon over your Project 2 baseline, assuming that that baseline was unable to solve all of the layouts. For example, do you want to improve collaboration among your agents?
– This includes why you think your algorithm/procedure will accomplish this and whether or not your results demonstrate success.
• Implement a solution that produces such improvements.
– Use any algorithms/strategy as inspiration for your solution.
– The focus of this project is to try new algorithms/solutions, rather than to simply im- prove hyper-parameters of the algorithms already implemented. Further, avoid search- ing for random seeds that happen to work the best as this is inconsequential analysis. Remember that the algorithm/reward-shaping/hyperparameters must be fixed across all 5 layouts.
– Justify the choice of that solution and explain why you expect it to produce these improvements.
– Even if your solution does not solve all of the layouts, you still have the ability to write
a solid paper.
– Upload/maintain your code in your private repo at https://github.gatech.edu/gt-omscs-rldm.
• Describe your experiments and create graphs that demonstrate the success/failure of your solution.
– You must provide one graph demonstrating the number of soups made across all five layouts during training. You can combine all five layouts’ plots onto one graph if you wish. Displaying a simple moving average for each layout’s training run is suggested to help with clarity.
– You must provide one graph demonstrating performance of your trained agent on each layout over at least 100 consecutive episodes. Again, you can combine all five layouts’ plots into one graph. If all five of these graphs are flat lines (a possible consequence of using a deterministic algorithm on a deterministic environment), then a bar graph is ok.
– Additionally, you must provide at least two graphs using metrics you decided on that are significant for your hypothesis/goal.
– Analyze your results and explain the reasons for the success/failure of your solution.

– Overcooked 5
– Since graphs are largely decided by you, they should have clear axis, labels, and captions. You will
lose points for graphs that do not have any description or label of the information being displayed.
– Example metrics you might consider are number of dish pickups, dropped dishes, incorrect deliveries, or picked up onions. These example metrics and more are built-in to the environment and are accessible via the info variable at the end of an episode. In your report you should clearly motivate why you are interested in a particular metric. See the provided notebook.
• We’ve created a private Georgia Tech GitHub repository for your code. Push your code to the personal repository found here: https://github.gatech.edu/gt-omscs-rldm.
• The quality of the code is not graded. You do not have to spend countless hours adding comments, etc. However, the TAs will examine code during grading.
• Make sure to include a README.md file for your repository that we can use to run your code.
– Include thorough and detailed instructions on how to run your source code in the README.md.
– If you work in a notebook, like Jupyter, include an export of your code in a .py file along with your notebook.
– The README.md file should be placed in the project 3 folder in your repository.
• You will be penalized by 25 points if you:
– Do not have any code or do not submit your full code to the GitHub repository; or – Do not include the git hash for your last commit in your paper.
• Write a paper describing your agents and the experiments you ran.
– Include the hash for your last commit to the GitHub repository in the header on the first page of
your paper.
– Make sure your graphs are legible and you cite sources properly. While it is not required, we recommend you use a conference paper format. For example: https://www.ieee.org/conferences/ publishing/templates.html.
– 5 pages maximum—really, you will lose points for longer papers.
– Explain your algorithm(s).
– Explain your training implementation and experiments.
– An ablation study would be a interesting way to find out the different components of the algorithm that contribute to your metric. (See J. N. Foerster et al. 2017.)
– Graphs highlighting your implementations successes and/or failures.
– Explanation of algorithms used: what worked best? what didn’t work? what could have worked
better?
– Justify your choices.
∗ Unlike Project 1, there are multiple ways of solving this problem and you have a lot of discretion over the general approach you take as well as experimental design decisions. Explain to the reader why, from amongst the multiple alternatives, you chose the ones you did.
∗ Your focus should be on justifying the algorithm/techniques you implemented.
– Explanation of pitfalls and problems you encountered.
– What would you try if you had more time?
– Save this paper in PDF format.
– Submit to Canvas!
1.9 Resources
1.9.1 Lectures
• Lesson 11A: Game Theory
• Lesson 11B: Game Theory Reloaded
• Lesson 11C: Game Theory Revolutions

– Overcooked 6
1.9.2 Readings
• J. N. Foerster et al. 2017
• Lowe et al. 2017
• Rashid, Samvelyan, Witt, et al. 2018
1.9.3 Talks
• Factored Value Functions for Cooperative Multi-Agent Reinforcement Learning • Counterfactual Multi-Agent Policy Gradients
• Learning to Communicate with Deep Multi-Agent Reinforcement Learning
• Automatic Curricula in Deep Multi-Agent Reinforcement Learning
1.10 Submission Details
The due date is indicated on the Canvas page for this assignment. Make sure you have set your timezone in Canvas to ensure the deadline is accurate.
Due Date: Indicated as “Due” on Canvas
Late Due Date [20 point penalty per day]: Indicated as “Until” on Canvas
The submission consists of:
• Your written report in PDF format (Make sure to include the git hash of your last commit.) • Your source code
To complete the assignment, submit your written report to Project 3 under your Assignments on Canvas (https://gatech.instructure.com) and submit your source code to your personal reposi- tory on Georgia Tech’s private GitHub
You may submit the assignment as many times as you wish up to the due date, but, we will only consider your last submission for grading purposes. Late submissions will receive a cumulative 20 point penalty per day. That is, any projects submitted after midnight AOE on the due date will receive a 20 point penalty. Any projects submitted after midnight AOE the following day will receive another 20 point penalty (a 40 point penalty in total) and so on. No project will receive a score less than a zero no matter what the penalty. Any projects more than 4 days late and any missing submissions will receive a 0.
Please be aware, if Canvas marks your assignment as late, you will be penalized. This means one second late is treated the same as three hours late, and will receive the same penalty as described in the breakdown above. Additionally, if you resubmit your project and your last submission is late, you will incur the penalty corresponding to the time of your last submission. Submit early and often.
Finally, if you have received an exception from the Dean of Students for a personal or medical emergency we will consider accepting your project up to 7 days after the initial due date with no penalty. Students requiring more time should consider taking an incomplete for this semester as we will not be able to grade their project.
1.11 Grading and Regrading
When your assignments, projects, and exams are graded, you will receive feedback explaining your successes and errors in some level of detail. This feedback is for your benefit, both on this assignment and for future assignments. It is considered a part of your learning goals to internalize this feedback. This is one of many learning goals for this course, such as: understanding game theory, random variables, and noise.
If you are convinced that your grade is in error in light of the feedback, you may request a regrade within a week of the grade and feedback being returned to you. A regrade request is only valid if it includes an explanation of where the grader made an error. Create a private Ed Discussion post titled “[Request] Regrade Project 3”. In the Details add sufficient explanation as to why you think the grader made a mistake. Be concrete and specific. We will not consider requests that do not follow these directions.

– Overcooked 7 1.12 Words of Encouragement
We understand this is a daunting project with many possible design directions to consider. As Graduate Students in Computer Science, projects that allow you to challenge and expand your skills in a practical and low-stakes manner are crucial. These projects are ideal for testing the knowledge you have garnered throughout the course and applying yourself to a difficult problem commonly faced when applying reinforcement learning in industry. After completing the course, a project like this can be valuable to highlight during interviews, to demonstrated your newfound knowledge to current employers, or to add a (new) section on your resume. Historically, many students have reported back the positive interactions encountered when discussing their projects, sometimes leading to job offers or promotions. However, please remember not to publicly post your report or code. The project is a good talking point and you would be within the bounds of the GT Honor Code if you were to share it privately with a potential employer (if you so desire), however making any part of this project publicly available would be a violation of the GT Honor Code.
We encourage you to start early and dive head-first into the project to try as many options as possible. We strongly believe the more successes and failures you experience, the greater your growth and learning will be.
The teaching staff is dedicated to helping as much as possible. We are excited to see how you will approach the problem and have many resources available to help. Over the next several Office Hours, we will be discussing various approaches in detail, as well as dive deeper into approaches on Ed Discussions. We are here to help you and want to see you succeed! With all that said:
Good luck and happy coding!

請加QQ:99515681  郵箱:99515681@qq.com   WX:codinghelp

掃一掃在手機打開當前頁
  • 上一篇:代寫EMATM0050 DSMP MSc in Data Science
  • 下一篇:COMP284 代做、Java 語言編程代寫
  • 無相關信息
    合肥生活資訊

    合肥圖文信息
    2025年10月份更新拼多多改銷助手小象助手多多出評軟件
    2025年10月份更新拼多多改銷助手小象助手多
    有限元分析 CAE仿真分析服務-企業/產品研發/客戶要求/設計優化
    有限元分析 CAE仿真分析服務-企業/產品研發
    急尋熱仿真分析?代做熱仿真服務+熱設計優化
    急尋熱仿真分析?代做熱仿真服務+熱設計優化
    出評 開團工具
    出評 開團工具
    挖掘機濾芯提升發動機性能
    挖掘機濾芯提升發動機性能
    海信羅馬假日洗衣機亮相AWE  復古美學與現代科技完美結合
    海信羅馬假日洗衣機亮相AWE 復古美學與現代
    合肥機場巴士4號線
    合肥機場巴士4號線
    合肥機場巴士3號線
    合肥機場巴士3號線
  • 短信驗證碼 目錄網 排行網

    關于我們 | 打賞支持 | 廣告服務 | 聯系我們 | 網站地圖 | 免責聲明 | 幫助中心 | 友情鏈接 |

    Copyright © 2025 hfw.cc Inc. All Rights Reserved. 合肥網 版權所有
    ICP備06013414號-3 公安備 42010502001045

    欧美私人啪啪vps| 三级在线观看视频| 西野翔中文久久精品字幕| 黄色在线网站噜噜噜| 你懂的在线观看一区二区| 综合视频在线| 日韩av首页| 葵司免费一区二区三区四区五区| 北条麻妃一区二区三区在线| 在线看片日韩| 精品日本视频| 色呦哟—国产精品| 黄色亚洲在线| 天堂资源在线亚洲| 国产suv精品一区| 少妇精品久久久一区二区三区 | 91精品一区二区三区综合在线爱 | 综合国产视频| 麻豆国产欧美一区二区三区| 日韩在线观看| h片在线观看视频免费| 在线亚洲一区| 羞羞色午夜精品一区二区三区| 九九99久久精品在免费线bt| 白嫩亚洲一区二区三区| 久久精品国产精品亚洲红杏 | 影音先锋在线一区| 免费一级欧美在线观看视频| 综合久久2023| 国产高潮在线| 狼人综合视频| zzzwww在线看片免费| 免费成人av在线播放| 乱码第一页成人| 亚洲在线一区| 久久成人亚洲| 久久一本综合频道| 在线视频精品| 久久不射2019中文字幕| 亚洲免费中文| 亚洲综合二区| 蜜桃久久久久久久| 免费不卡在线观看| 日韩不卡一区| 另类专区亚洲| 日本精品不卡| 国产成人久久精品麻豆二区| 性欧美超级视频| 成人国产精品| 亚欧成人精品| 国内自拍一区| 国产影视精品一区二区三区| 国产探花一区二区| 日韩有吗在线观看| 给我免费播放日韩视频| 精品精品国产三级a∨在线| 麻豆视频一区| 免费观看久久av| 国产视频亚洲| 色999日韩| 日韩av免费| 久久一区国产| 精品一区二区三区在线观看视频| 欧美日韩午夜电影网| 日韩精品1区2区3区| 欧美色综合网| 亚洲国产影院| 天堂av在线一区| 日韩欧美网址| 麻豆精品精品国产自在97香蕉| 亚洲色图网站| 日韩三级av高清片| 私拍精品福利视频在线一区| 免费观看不卡av| 欧美bbbbb| 神马久久资源| 综合久久精品| 国产suv精品一区二区四区视频| 性欧美xxxx免费岛国不卡电影| 国产亚洲欧洲| 日韩极品一区| 日本免费新一区视频| 久久久久毛片免费观看| 国产99精品一区| 人人狠狠综合久久亚洲| 天天综合网站| 欧美久久亚洲| 久久久蜜桃一区二区人| 美女视频一区免费观看| 国产精品久久亚洲不卡| 成人国产精品久久| 久久久成人网| 成人精品视频| 国产一区二区三区的电影 | 精品少妇av| 亚洲一区成人| 国产福利亚洲| 日本在线视频一区二区三区| 欧美不卡视频| av免费在线一区| 国产免费久久| 欧美69视频| 日韩精选视频| 日韩中出av| 亚洲免费观看| 日本久久一区| 中文字幕一区日韩精品| 亚洲在线观看| 久久精品三级| 精品亚洲免a| 91视频久久| 国产精品一站二站| 亚洲午夜一区| 色综合天天色| 美女精品久久| 蜜臀精品久久久久久蜜臀| 久久精品女人| 精品国产网站 | 黄色在线免费观看网站| 综合激情婷婷| 欧美阿v一级看视频| 久久不卡日韩美女| 成人综合专区| 日本三级一区| 日本在线视频一区二区三区| 亚洲中午字幕| 97久久中文字幕| 欧美福利视频| 一区二区三区福利| 久久久亚洲人| 四虎国产精品免费久久| 国产精品网在线观看| 欧美国产一级| 婷婷视频一区二区三区| 蜜桃av一区二区| 偷拍亚洲精品| 欧美h版在线| 精品视频在线观看免费观看 | 岛国精品在线| 精品人人人人| 成人综合网站| 久久成人综合| 久久精品人人做人人爽电影蜜月| 亚洲va在线| 欧美在线黄色| 婷婷综合网站| 亚洲最大黄网| 日韩在线卡一卡二| 日韩av不卡在线观看| 国产精品久久久久无码av| 亚洲区小说区图片区qvod按摩| 蜜臀av性久久久久蜜臀aⅴ| 日韩二区三区在线观看| 高清不卡亚洲| 久久激情婷婷| 亚洲国产三级| 久久亚洲一区| 91欧美日韩在线| 久久三级毛片| 99香蕉国产精品偷在线观看 | 丝袜美腿成人在线| 日韩一区二区三区色| 国产一区二区三区朝在线观看| 久久精选视频| 伊人久久精品| 国产免费拔擦拔擦8x在线播放| 黑人久久a级毛片免费观看| 黄页免费欧美| 中文欧美日韩| 国产色99精品9i| 美腿丝袜亚洲一区| 在线亚洲精品| 国产精品三p一区二区| 六月丁香婷婷色狠狠久久| 亚洲美女一区| 一区二区三区自拍视频| 欧美在线高清| 日韩精品一二三| 成人羞羞视频播放网站| 久久久久97| 成人日韩精品| 一本色道久久综合一区| avtt综合网| 国内精品亚洲| 日韩伦理福利| 国产午夜精品一区二区三区欧美 | www.久久爱.com| 天堂av在线| av成人毛片| 成人综合专区| 国产不卡一二三区| 国产激情欧美| 高潮在线视频| 亚洲精品国产偷自在线观看| 日韩中文字幕在线一区| 亚洲美女91| 日本一区二区三区视频在线| 国产亚洲精品久久久久婷婷瑜伽| 成人精品动漫一区二区三区| 久久悠悠精品综合网|