Get Started
Welcome to the auto-evaluator! This is an app for evaluating the performance of question-answering LLM chains. This demo comes pre-loaded with two things: (1) a document (the Lex Fridman podcast with Andrej Karpathy) and (2) a "test set" of question-answer pairs for this episode. The aim is to evaluate how different question-answering LLM chain configurations perform against this test set: you can build any QA chain from the available components and score its performance.
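To make "scoring a chain against the test set" concrete, here is a minimal sketch of the idea, assuming a hypothetical `qa_chain(question) -> answer` callable and a simple `grade` helper. These names and the placeholder pairs are illustrative only, not the app's actual code or data.

```python
# Illustrative sketch of evaluating a QA chain against a test set of
# question-answer pairs. All names here are hypothetical stand-ins.

test_set = [
    {"question": "...", "answer": "..."},  # pre-loaded ground-truth pairs
    {"question": "...", "answer": "..."},
]

def grade(predicted: str, expected: str) -> bool:
    # Placeholder grader: a stand-in for whatever grading method is configured.
    return expected.lower() in predicted.lower()

def evaluate(qa_chain, test_set) -> float:
    """Return the fraction of test questions the chain answers correctly."""
    correct = 0
    for pair in test_set:
        predicted = qa_chain(pair["question"])
        if grade(predicted, pair["answer"]):
            correct += 1
    return correct / len(test_set)
```

Each chain configuration you build is run through the same loop, so their scores can be compared directly on the shared test set.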
Choose the question-answering chain configuration (left) and launch an experiment using the button below. For more detail on each setting, see the full documentation here.