Over the last couple of projects, whether AI-heavy or more traditional engineering work, LLMs have been creeping into our workflow. Specifically, we have been experimenting with using LLMs to generate tests, namely tests for FastAPI applications.
Most of our Python backends rely in one way or another on FastAPI. Tests are a crucial part of every app: they ensure correctness, prevent regressions and enable refactoring. Especially in larger projects, tests keep collaboration between developers running smoothly. But writing tests is rarely something a client gets excited about; it is hard to visualize and often falls by the wayside in a development cycle.
LLMs have shown remarkable capabilities in generating and understanding code. So we figured we could use them to sketch out tests that we could then use in our applications. This project aims to facilitate that process, and here we highlight how the fastapi_llm_test_generator works.
The package is meant to be installed inside your project's virtual environment.
pip install git+https://github.com/bakkenbaeck/fastapi_llm_test_generator
Although you could use it as a regular package, it is best used from the command line.
python -m fastapi_llm_test_generator generate <source_app_directory>
As a starting point, the program scans the given source directory to locate a FastAPI application. Once identified, it extracts all defined FastAPI endpoints. For each endpoint, it traverses the call tree, analyzing the sequence of function calls and dependencies invoked while handling a request. This traversal enables deeper inspection and understanding of the application's internal logic. It then collects all function calls of that endpoint, excluding builtins and site-packages, so that only user-written code is gathered. Additionally, the program gathers all Pydantic models used within the application. In a last step it searches for any text that resembles SQL and collects the tables used, which are in turn used to query table definitions. Since LLMs rely heavily on context, we compile all relevant information into a single prompt: the specific endpoint, its associated function calls, the Pydantic models, and the database definitions.
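To make that concrete, here is a small, hypothetical endpoint of the kind the generator works on (not taken from the package itself): it would pick up the route, the Item model, the call to fetch_item and the items table referenced in the SQL.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Item(BaseModel):
    id: int
    name: str

async def fetch_item(db, item_id: int) -> Item:
    # User-written code: the generator follows this call from the endpoint.
    row = await db.fetchrow("SELECT id, name FROM items WHERE id = $1", item_id)
    return Item(**row)

@app.get("/items/{item_id}", response_model=Item)
async def read_item(item_id: int):
    # The endpoint itself; its call tree, the Item model and the
    # "items" table definition all end up in the prompt.
    return await fetch_item(app.state.db, item_id)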
In addition, the prompt can take extra information such as mocks, fixtures or other textual context. Since this can be a lot of information to pass on the command line, the app also reads a config file in which this information is provided.
{
"db_plugin": "psycopg2",
"db_url": "postgresql://user:password@localhost:5432/database",
"api_key": "some-key",
"additional_prompt_pre": null,
"additional_prompt_info": "Assume fixture 'database' using postgres and asyncpg (using $ syntax) for that. \n Use \"client\": httpx.AsyncClient \n* @pytest.mark.asyncio",
"mock_prompt": null,
"fixtures_prompt": "```python\n@pytest.fixture\nasync def login(database, app, client):\n res = await client.post(\n \"/api/v2/login\", data=dict(username=\"admin\", password=\"1234\", client_id=\"123\", grant_type=\"password\")\n )\n assert res.status_code == 200\n\n name, val = res.headers[\"Set-Cookie\"].split(\"=\", maxsplit=1)\n client.cookies[name] = val\n```",
"additional_prompt_after": null,
"prompt_type": "pytest"
}
In this case the app uses psycopg2 to query the database at db_url for table definitions. The api_key is your LLM API key. additional_prompt_pre is text prepended to the actual prompt, while additional_prompt_info comes right after the prompt's introduction; here we tell the LLM to use Postgres, asyncpg and httpx as a client. mock_prompt describes possible mocks, and fixtures_prompt describes a login fixture needed for each test. additional_prompt_after can be anything appended to the end of the prompt. Finally, prompt_type indicates that the current prompt is tuned towards pytest; in the future we would also like to offer other test frameworks. With the config in place, we point the generator at our app and pick an LLM provider:
python -m fastapi_llm_test_generator generate . anthropic --config-file config.json
Finally, the chosen LLM is called to generate the tests.
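For the hypothetical endpoint sketched above, the output might look roughly like this; it is illustrative only, since the actual tests depend on the model, the prompt and the fixtures you provide (here the database, client and login fixtures from the config).

import pytest

@pytest.mark.asyncio
async def test_read_item_returns_item(database, client, login):
    # Happy path: insert a row, then expect it back as an Item.
    await database.execute("INSERT INTO items (id, name) VALUES ($1, $2)", 1, "widget")
    res = await client.get("/items/1")
    assert res.status_code == 200
    assert res.json() == {"id": 1, "name": "widget"}

@pytest.mark.asyncio
async def test_read_item_not_found(database, client, login):
    # Unknown id: the expected status depends on the app's error handling.
    res = await client.get("/items/999")
    assert res.status_code == 404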
This works well enough to speed up the development of tests. It iterates over the possible outcomes of an endpoint and produces tests for different input/output combinations. As a disclaimer, I want to note that this won't solve all cases, especially if the prompt becomes very large or a lot of logic is involved. Tests are there to test your application, so make sure to read the generated tests and think about whether they make sense. We treat them as a starting point.
In the future, we aim to integrate multiple test executions that automatically feed any errors encountered during a test run back into the AI client. This feedback loop will allow the AI to refine and improve the test cases with each iteration, enhancing accuracy and robustness over time.
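A rough sketch of how such a loop could work is shown below; the generate_tests callable is a placeholder for the package's LLM call and is not part of the current API.

import subprocess

def refine_tests(generate_tests, max_iterations: int = 3) -> str:
    # generate_tests(feedback) is assumed to call the LLM, write a test file
    # and return its path, optionally using the previous pytest output as context.
    feedback = None
    test_file = generate_tests(feedback)
    for _ in range(max_iterations):
        result = subprocess.run(["pytest", test_file], capture_output=True, text=True)
        if result.returncode == 0:
            break  # all tests pass, stop iterating
        feedback = result.stdout + result.stderr
        test_file = generate_tests(feedback)  # regenerate with the failures as feedback
    return test_file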
This project was born out of necessity, and we hope it contributes some ideas to future projects. We believe it is an effective example of how large language models (LLMs) can enhance and streamline workflows.