a bit opaque if you're not super familiar with the space i suppose

ELI5: There's a lot of work being done on getting LLMs to take actions on websites. This open source repo provides static versions of these websites, along with some evaluation criteria, so you can measure the performance of your LLM "agents". It's quite a pain to test these agents reliably otherwise. (An agent being some system of code that takes a goal like "travel to xyz on this page" and uses an LLM to translate that into actual actions.)
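
If it helps, here's a toy sketch of that loop in Python. To be clear, none of this comes from the repo: `StaticPage`, `fake_llm`, `run_agent` and `evaluate` are all made-up names, and `fake_llm` is just a keyword matcher standing in for a real LLM call. It only shows the shape of the idea: goal in, actions out, then a check against a known-good end state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class StaticPage:
    """Hypothetical stand-in for a static snapshot of a website."""
    links: Dict[str, str]                      # link text -> destination URL
    url: str = "/home"
    history: List[str] = field(default_factory=list)

    def click(self, link_text: str) -> None:
        # Following a link just updates the current URL and records it.
        self.url = self.links[link_text]
        self.history.append(self.url)


def fake_llm(goal: str, page: StaticPage) -> Optional[str]:
    """Placeholder for a real LLM call (an assumption, not a real API).

    Returns the text of a link mentioned in the goal that we are not
    already on, or None when there is nothing left to do.
    """
    for text, dest in page.links.items():
        if text.lower() in goal.lower() and page.url != dest:
            return text
    return None


def run_agent(goal: str, page: StaticPage,
              llm: Callable[[str, StaticPage], Optional[str]],
              max_steps: int = 5) -> StaticPage:
    """Ask the model for the next action until it says it is done."""
    for _ in range(max_steps):
        action = llm(goal, page)
        if action is None:
            break
        page.click(action)
    return page


def evaluate(page: StaticPage, expected_url: str) -> bool:
    """Toy evaluation criterion: did the agent end up on the right page?"""
    return page.url == expected_url


page = StaticPage(links={"Pricing": "/pricing", "Docs": "/docs"})
run_agent("travel to pricing on this page", page, fake_llm)
print(evaluate(page, "/pricing"))
```

Because the page is static, the same goal always has the same correct end state, which is what makes the check repeatable.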