Infinite problems, infinite apps.

Shovon Hasan, Renaise Kim

Mythra (mith-ruh), noun: the latent space of available internet resources that LLMs can request by turning natural language queries into HTTP request strings.

Abstract

This page proposes a new capability for large language models (LLMs) to embed live websites into chats by instructing the model to generate HTTP requests as text. We demonstrate that ChatGPT can, without fine-tuning, be instructed to output HTTP requests to websites relevant to a user's query and respond normally otherwise. We believe this generalizes to other language models.

The client browser will use the output HTTP request text to perform a real HTTP request and render the returned HTML, CSS, and JavaScript as a functional website seamlessly within the chat window. We call this process "summoning".
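As a rough sketch of the rendering step, a client could wrap the returned HTML in a sandboxed iframe so the summoned website runs inside the chat window without access to the surrounding page. The function names here are illustrative, not part of any existing client.

```typescript
// Hypothetical client-side helper: embed HTML returned by a summoned
// website in a sandboxed iframe inside the chat window.
// Escaping & and " lets arbitrary HTML live inside the srcdoc attribute.
function escapeAttribute(html: string): string {
  return html.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

function summonIframe(html: string): string {
  // sandbox="allow-scripts" lets the embedded site run its JavaScript
  // without granting it access to the chat page itself.
  return `<iframe sandbox="allow-scripts" srcdoc="${escapeAttribute(html)}"></iframe>`;
}
```

The `sandbox` attribute is what keeps "summoned" third-party code isolated from the chat application.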

We also propose a protocol for website authors to describe their API in a way that helps language models learn how to use the website via HTTP. By convention, website authors host a /mythra.txt file at the website root containing a description that adheres to the protocol.
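Since the file always lives at the website root, locating it from any page URL only requires the origin. A minimal sketch:

```typescript
// Resolve the conventional /mythra.txt location from any page URL on a site.
// Only the origin matters: the file is always served at the website root.
function mythraUrl(siteUrl: string): string {
  const { origin } = new URL(siteUrl);
  return `${origin}/mythra.txt`;
}
```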

We predict that language model developers will curate a selection of high quality website API descriptions for trusted websites and that in the future most interactions humans have with the internet will be mediated through chats with language models. The language model interface will become both the browser and a search engine through the latent space of available HTTP API definitions that the language model was trained on. We call this latent space Mythra.

Problem

Websites are great specialized tools for particular problems and LLMs are great general tools for natural language conversations. But websites require learning a specialized interface for every new website you visit and every new problem you have, and language models only return text, which is very limiting as a problem solving tool.

A few attempts have been made to bridge this gap, but they have been lackluster. ChatGPT Plugins, such as the one for Wolfram Alpha, require developers to write a new application designed specifically for OpenAI. The application then has to be approved by OpenAI, the user has to install the plugin, and then enable it for their chat session. The barriers to entry are so high that plugins are used by less than 1% of ChatGPT users. Even with plugins installed, the best ChatGPT can do is generate an image or text as the plugin's output, which doesn't create desired side effects like opening a bank account, booking a flight, or filing taxes.

A second iteration was Browsing for ChatGPT, but it has the same problem of returning only a static image or text rather than an interface you can continue to interact with using natural language. It also doesn't store preferences or sessions the way websites do with cookies.

Background

Almost all modern internet communication is done through HTTP requests. HTTP requests are simply strings of text (really 1s and 0s) sent over the wire, propagating through a global network of interconnected computers until they arrive at the computer meant to handle that particular request. Upon receiving the 1s and 0s, the computer at the other end decodes the binary as a string and parses it into its constituent parts: the HTTP verb, the requested path, the headers, the form body, and so on. It knows how to do this because the string is formatted in accordance with the HTTP protocol. It then executes some code to send some bit of data back to the sender: HTML text, JSON text, a JPG image, etc. It may also initiate other side effects as a result of receiving the request, such as updating a database.
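The decoding step described above can be illustrated with a simplified parser that splits a raw HTTP/1.1 request string into the parts named in the paragraph. This is a sketch, not a full HTTP parser (a real server also handles continuation lines, chunked bodies, and so on).

```typescript
interface ParsedRequest {
  method: string;
  path: string;
  headers: Record<string, string>;
  body: string;
}

// Simplified illustration: split a raw HTTP/1.1 request string into its
// verb, path, headers, and body. The head and body are separated by a
// blank line; each header line is "Name: value".
function parseHttpRequest(raw: string): ParsedRequest {
  const [head, body = ""] = raw.split("\r\n\r\n");
  const [requestLine, ...headerLines] = head.split("\r\n");
  const [method, path] = requestLine.split(" ");
  const headers: Record<string, string> = {};
  for (const line of headerLines) {
    const i = line.indexOf(": ");
    if (i > -1) headers[line.slice(0, i)] = line.slice(i + 2);
  }
  return { method, path, headers, body };
}
```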

HTTP is therefore a way of using text to command computers across the world.

HTTP is just text.

LLMs generate text.

What if LLMs could command computers across the world by generating HTTP text?

Proposal

We propose a solution to the aforementioned problems and a new way to leverage the internet: we call it "summoning" a website into the user's chat. In this proposal we show that, with minutes of work, a website can be modified to incrementally adopt a simple protocol that allows ChatGPT to understand how to use the website, translating a user's natural language into HTTP requests sent to the website and presenting the website to the user in its entirety.

This example application uses a non-finetuned GPT-4 via the OpenAI API to demonstrate that LLMs are good candidates to turn natural language requests into HTTP requests to fetch websites and display the website contents in the user's browser in-line with their conversation with the AI assistant.

A sample weather application describes its endpoints and supported HTTP methods and query params in the following format:

protocol.d.ts
interface Protocol {
  description: string,
  endpoints: {
    [key: string]: {
      description: string,
      action: 'GET' | 'POST' | 'PUT' | 'DELETE',
      params: {
        [key: string]: {
          description: string,
          type: 'string' | 'number' | 'boolean'
        }
      },
      request: {
        headers: string[],
        description: string,
        body: {
          [key: string]: string
        }
      },
      response: {
        headers: {
          [key: string]: string
        },
        description: string,
        body: {
          [key: string]: string
        }
      }
    }
  },
  baseUrl: string
}

The provided weather website describes its endpoints as follows:

weather.ts
const description = {
  description: 'A weather website',
  endpoints: {
    '/weather': {
      description: 'Get the weather for a given city',
      action: 'GET',
      params: {
        city: {
          description: 'The city to get the weather for',
          type: 'string'
        }
      },
      request: {
        headers: ['Token', 'unit'],
        description: 'An authentication token to attach to the request',
        body: {}
      },
      response: {
        headers: {},
        description: 'The weather for the given location',
        body: {
          temperature: 'number',
          unit: 'F or C',
          rainPercentage: 'number',
          windSpeed: 'number',
          location: 'string'
        }
      }
    },
    '/forecast': {
      description: 'Get the 5-day forecast for a given location',
      action: 'GET',
      params: {
        location: {
          description: 'The location to get the 5-day forecast for',
          type: 'string'
        },
        startDate: {
          description: 'The start date of the forecast',
          type: 'string'
        }
      },
      request: {
        headers: ['Token'],
        description: 'An authentication token to attach to the request',
        body: {}
      },
      response: {
        headers: {},
        description: 'The forecast for the given location',
        body: {
          temperature: 'number',
          weather: 'string',
          location: 'string'
        }
      }
    }
  },
  baseUrl: 'https://www.example-weather-app.com'
}
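Once a model has chosen an endpoint and parameter values from a description like the one above, the client still has to assemble the literal HTTP request text to send. The following sketch shows that final step for a GET endpoint; the function and its signature are illustrative assumptions, not part of the protocol.

```typescript
// Hypothetical assembly step: turn an endpoint choice plus parameter and
// header values (as a model might emit them) into raw HTTP request text.
function buildRequest(
  baseUrl: string,
  endpoint: string,
  params: Record<string, string>,
  headers: Record<string, string>
): string {
  const { host } = new URL(baseUrl);
  const query = new URLSearchParams(params).toString();
  const headerLines = Object.entries(headers)
    .map(([name, value]) => `${name}: ${value}`)
    .join("\r\n");
  return `GET ${endpoint}?${query} HTTP/1.1\r\nHost: ${host}\r\n${headerLines}\r\n\r\n`;
}
```

For the /weather endpoint above, a query like "what's the weather in London?" would map to `buildRequest(description.baseUrl, '/weather', { city: 'London' }, { Token: '...' })`.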

We believe the key to accurate request generation (low cross-entropy loss) is for developers to provide as thorough and accurate a description of each endpoint and its parameters as possible. By ingesting these descriptions along with the HTTP interface, an LLM can learn to communicate with a website purely through HTTP.