[Let's Have LLMs Read OSS Too!] Creating a Code Reading Agent Is Great


May 1, 2025 - 08:17

The Impact of LLMs

LLMs have changed programming.
Programming used to mean operating machines through formal syntax; now it also means operating them through natural-language input.
Cline, Cursor, Copilot, and more... The evolution will continue beyond existing tools.

I believe one direction for this evolution is reading large-scale code such as OSS.
In this article, I discuss how LLMs can be used to read large-scale code such as OSS.

What is a Code Reading Agent?

A Code Reading Agent is an agent that takes "the purpose of code reading" and "the function to start reading from" as inputs, and uses an LLM to recursively find and read the related functions in the code.
The user can steer which functions the LLM explores, have the LLM generate a summary report from the path of functions visited, and return to any previously traversed path.

I have read parts of kubernetes, argo-cd, and prometheus with this Agent. My impression is that it gets you to your target function in about 10 minutes, where tracing by eye with no prior knowledge might take an hour.
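To make "the function to start reading from" concrete, here is a minimal sketch of how the source text of a starting function could be pulled out of a Go file before it is handed to the LLM. It uses the standard `go/parser` and `go/ast` packages; the helper name `ExtractFunc` and the sample source are hypothetical, and the actual tool may locate functions differently.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// ExtractFunc returns the source text of the top-level function or method
// called name in src, or an error if no such declaration exists.
// (Hypothetical helper for illustration only.)
func ExtractFunc(src, name string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, parser.ParseComments)
	if err != nil {
		return "", err
	}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Name.Name != name {
			continue
		}
		// Convert AST positions to byte offsets into src.
		start := fset.Position(fn.Pos()).Offset
		end := fset.Position(fn.End()).Offset
		return src[start:end], nil
	}
	return "", fmt.Errorf("function %q not found", name)
}

// sample is a made-up file used only to exercise ExtractFunc.
const sample = `package demo

func NewManager(name string) *Manager {
	return &Manager{name: name}
}

type Manager struct{ name string }
`

func main() {
	code, err := ExtractFunc(sample, "NewManager")
	if err != nil {
		panic(err)
	}
	fmt.Println(code) // the text of NewManager, ready to be sent with the purpose
}
```

Parsing the file rather than matching braces by hand keeps the extraction correct when the body contains string literals or comments with braces in them.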

My Agent's Code
https://github.com/YmBIgo/CodeReadingAssistant

The Significance and Motivation for Creating the Code Reading Agent

I enjoy reading code, and I have read OSS code such as Next.js and fixed bugs in it, but code reading can be quite challenging.
I think there are four kinds of difficulty:

- When you first start reading, you don't know the context, so you can't tell which functions are important
- When you dive deep into nested functions, it's easy to forget what the original function looked like
- Following code by eye is tiring; your eyes literally get strained
- Reading code by eye takes considerable effort, so there seems to be a limit to how many repositories one person can read in a lifetime

While I was thinking about this, I came across a new approach to code understanding.
It is this LLM-based security scanning tool:

https://github.com/protectai/vulnhuntr

Simply put, its idea is to "perform security scanning recursively by having an LLM look at a function, extract the important parts, and then jump to them using LSP."
Could the same idea be used for code reading? That was the motivation for creating this Code Reading Agent.
Since creating it, I have been debugging it by running it on large Go OSS projects from the CNCF.

Differences Between the Code Reading Agent and Reading Code by Eye with grep

Speed
The Code Reading Agent is superior. For three tasks (reading through a large function, returning to a previously viewed function, and writing a summary report), the LLM is far faster.
In my experience, viewing functions is at least 2x faster, returning is more than 4x faster, and report creation is more than 5x faster.

Accuracy in finding important functions in a large codebase (path accuracy)
The Code Reading Agent is superior when you are looking at code for the first time. For example, I once tried to find "React's Reconcile loop" by eye and couldn't, and I haven't read much of React since. Finding the important functions in an unfamiliar codebase is extremely difficult, and in this respect the Agent is more accurate on a first read.

Code jumping ability
Humans may be more accurate in some cases, but the two are mostly equivalent. When "gopls implementation" returns multiple candidates, the Agent I'm currently building neither picks the jump destination precisely nor explores every candidate.

Less likely to make mistakes when going back
Since the Agent has a feature for going back, it can return to earlier code more accurately than a human can. I'll elaborate on this a bit later.
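As a rough illustration of why going back is reliable, the sketch below stores every visited function under a short ID together with its parent, so any earlier point can be reconstructed exactly. The type names and the ID scheme (the first 7 hex characters of a SHA-1 over the parent ID and function name) are assumptions made for this example, not the tool's confirmed implementation.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// Step is one visited function in the exploration history.
type Step struct {
	ID       string // short identifier shown to the user
	ParentID string // empty for the starting function
	Function string
}

// History keeps every visited step so exploration can restart from any of them.
type History struct {
	steps map[string]Step
}

func NewHistory() *History { return &History{steps: make(map[string]Step)} }

// Add records a visit and returns its short ID.
// The 7-character SHA-1 prefix is an assumed scheme; collisions are ignored here.
func (h *History) Add(parentID, function string) string {
	sum := sha1.Sum([]byte(parentID + "/" + function))
	id := hex.EncodeToString(sum[:])[:7]
	h.steps[id] = Step{ID: id, ParentID: parentID, Function: function}
	return id
}

// PathTo returns the functions from the starting point down to the step with the given ID.
func (h *History) PathTo(id string) ([]string, error) {
	var path []string
	for id != "" {
		step, ok := h.steps[id]
		if !ok {
			return nil, fmt.Errorf("unknown history ID %q", id)
		}
		path = append([]string{step.Function}, path...)
		id = step.ParentID
	}
	return path, nil
}

func main() {
	h := NewHistory()
	root := h.Add("", "NewManager")
	mgr := h.Add(root, "Manager")
	h.Add(mgr, "startProvider")
	run := h.Add(mgr, "Run") // a second branch that also starts from Manager

	path, err := h.PathTo(run)
	if err != nil {
		panic(err)
	}
	fmt.Println(path) // prints [NewManager Manager Run]
}
```

Because each step remembers its parent, restarting from an ID never depends on the reader's memory of how they got there.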

Details of the Code Reading Agent

Specifically, it proceeds as follows:

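1: I give the Agent three things: the file path, the first line of the function to start from, and the purpose of reading. For example, to learn about the prometheus Discovery Manager, I pass the following: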

path_to_folder/prometheus/discovery/manager.go

func NewManager(ctx context.Context, logger *slog.Logger, registerer prometheus.Registerer, sdMetrics map[string]DiscovererMetrics, options ...func(*Manager)) *Manager {

I want to know about how the prometheus discovery Manager retrieves each metric

2: The Agent looks up the code of the target function, and the LLM presents up to five important functions, methods, or types from it, as follows:

0 : Manager
Details : The core structure of Prometheus's discovery manager. It integrates important functions such as synchronization channels for metric collection, target group management, and metric registration.
Whole CodeLine :    mgr := &Manager{
Original Code :  mgr := &Manager{
Confidence: 90
-----------------
1 : targetgroup.Group
Details : A structure representing a group of monitoring targets, holding information about targets for actual metric collection. Used for synchronizing discovered targets.
Whole CodeLine : syncCh: make(chan map[string][]*targetgroup.Group),
Original Code :  syncCh: make(chan map[string][]*targetgroup.Group),
Confidence: 85
-----------------
2 : NewManagerMetrics
Details : A function that initializes metrics for the discovery manager. Sets up metrics to monitor the performance and state of the discovery process itself.
Whole CodeLine :    metrics, err := NewManagerMetrics(registerer, mgr.name)
Original Code :  metrics, err := NewManagerMetrics(registerer, mgr.name)
Confidence: 75
-----------------
3 : ManagerMetrics
Details : A metric structure for tracking the operational state of the discovery manager. Holds important indicators such as discovery process success rate and latency.
Whole CodeLine :    mgr.metrics = metrics
Original Code :  mgr.metrics = metrics
Confidence: 70
-----------------

3: The user tells the Agent which candidate to explore by number, from 0 to 4 (0 to 3 in the example above)

4: The Agent jumps to the chosen candidate using LSP (Language Server Protocol) and runs step 2 again on the result, recursively

5: At step 3, the following operations are also available:
A. Browse the code exploration history and restart exploration from any past point on the path

```Code exploration history
|func NewManager(ctx context.Context, logger *slog.Logger, registerer prometheus.Registerer, sdMetrics map[string]DiscovererMetrics, options ...func(*Manager)) *Manager {
|883b806

|Manager
|65a3d40

|startProvider
|6a92ac7

|updater
|08a1386

|Run
|202418e

|syncCh
|59b10bd

|updateGroup
|0330c02

|NewManagerMetrics
|46154f7

|targetgroup.Group
|96efe69

|DiscovererMetrics
|732109a
```

B. Generate a report from the exploration so far

```Report on argo-cd's application controller
Based on the traced code, I'll explain the main processing flow of Argo CD's ApplicationController.

1. **Reconcile entry point**:
- `NewCommand()` initializes the controller and sets up configurations for Reconcile
- Main configuration parameters:
  - Resource sync interval (appResyncPeriod)
  - Hard sync interval (appHardResyncPeriod) 
  - Repository server connection settings
  - Work queue settings
  - Metrics-related settings

2. **Reconcile main loop**:
- The `Run()` method launches multiple workers to process the following queues in parallel:
  - appRefreshQueue: Application state updates
  - appOperationQueue: Application operation execution
  - projectRefreshQueue: Project updates

3. **Application state comparison process**:
`processAppRefreshQueueItem()` implements the main coordination logic:

- Retrieves manifests from Git repository
- Gets the current state from the cluster
- Compares the ideal state with the current state using `CompareAppState()`
- Determines sync status based on differences
- Updates the application's health state

4. **State adjustment (Reconciliation) process**:
`Reconcile()` performs the following:
- Maps target objects to live objects
- Separates hooks
- Deduplicates resources
- Identifies managed resources

Main features:
- Automatic adjustment based on declarative configuration
- Parallel processing by multiple workers
- Fine-grained sync control (normal sync/hard sync)
- Retry on errors and grace period
- Monitoring through metrics collection

In this way, the ApplicationController plays the role of continuously comparing and adjusting the ideal state and the actual state of applications, following the GitOps pattern.
```

C. If not satisfied with the content, have the LLM search again
D. Check the code currently presented to the LLM

6: The user terminates when satisfied

By reading code together with an LLM in this way, you can finish in about 10 minutes a function exploration that might take an hour to trace by eye.

Some might worry about whether the LLM will suggest appropriate functions.
Indeed, the LLM sometimes fails to suggest good ones. However, in my experience of having it read kubernetes, argo-cd, and prometheus, the LLM seems to grasp the architecture and suggests code that is consistent with it.
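Steps 2 to 4 above can be sketched as a simple loop. In this sketch the LLM call, the user's choice, and the LSP jump are all replaced by plain function fields, and a tiny canned call graph stands in for a real repository; every name here (`Explorer`, `Suggest`, `Choose`, `Jump`) is hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// Candidate mirrors one suggestion shown to the user in step 2.
type Candidate struct {
	Function   string
	Confidence int
}

// Explorer bundles the three collaborators of the loop.
// All three are stand-ins: no real LLM or language server is called.
type Explorer struct {
	Suggest func(purpose, code string) []Candidate // stands in for the LLM call
	Choose  func(cands []Candidate) int            // stands in for user input; out of range means stop
	Jump    func(function string) (string, bool)   // stands in for the LSP lookup
}

// Explore repeats steps 2 to 4 until the user stops or nothing more can be found,
// and returns the chain of functions that were visited.
func (e Explorer) Explore(purpose, startFunc, startCode string) []string {
	path := []string{startFunc}
	code := startCode
	for {
		cands := e.Suggest(purpose, code)
		if len(cands) == 0 {
			return path
		}
		i := e.Choose(cands)
		if i < 0 || i >= len(cands) {
			return path
		}
		next, ok := e.Jump(cands[i].Function)
		if !ok {
			return path
		}
		path = append(path, cands[i].Function)
		code = next
	}
}

func main() {
	// A canned call graph replaces the real LLM and language server.
	graph := map[string][]Candidate{
		"code of NewManager": {{"Manager", 90}, {"NewManagerMetrics", 75}},
		"code of Manager":    {{"Run", 80}},
	}
	e := Explorer{
		Suggest: func(_, code string) []Candidate { return graph[code] },
		Choose:  func(_ []Candidate) int { return 0 }, // always take the first suggestion
		Jump:    func(fn string) (string, bool) { return "code of " + fn, true },
	}
	fmt.Println(e.Explore("how targets are discovered", "NewManager", "code of NewManager"))
	// prints [NewManager Manager Run]
}
```

Keeping the three collaborators behind function fields makes it easy to test the loop without network access, and to swap in a real model or language server later.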

Barriers to Creating a Code Reading Agent in golang

I believe that creating a Code Reading Agent that works generically across all languages is extremely difficult, because it has to be tailored to the characteristics of each language.
So, here are the barriers I ran into when building it for golang:

- I needed to handle interface embedding
- I initially thought supporting functions would be enough, but methods and structs also had to be supported
- The LLM occasionally hallucinated and recommended strange code, so I added failsafes
- When "gopls definition" pointed back to the same function, I needed a fallback branch that searches with "gopls implementation" instead
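The interface embedding barrier is easier to see with the syntax tree in hand: an embedded interface appears inside the outer interface only as a bare type name, so its methods are invisible unless that name is followed. The sketch below lists both kinds of members using the standard `go/ast` package. The helper `InterfaceMembers` and the simplified sample types are hypothetical, and this is one possible failsafe rather than the tool's actual code.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// InterfaceMembers lists the method names and the embedded type names
// declared directly inside the interface called typeName.
func InterfaceMembers(src, typeName string) (methods, embedded []string, err error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, nil, err
	}
	ast.Inspect(file, func(n ast.Node) bool {
		spec, ok := n.(*ast.TypeSpec)
		if !ok || spec.Name.Name != typeName {
			return true // keep walking
		}
		iface, ok := spec.Type.(*ast.InterfaceType)
		if !ok {
			return false
		}
		for _, field := range iface.Methods.List {
			if len(field.Names) > 0 {
				methods = append(methods, field.Names[0].Name)
				continue
			}
			// A field without a name is an embedded type.
			switch t := field.Type.(type) {
			case *ast.Ident:
				embedded = append(embedded, t.Name)
			case *ast.SelectorExpr:
				embedded = append(embedded, t.Sel.Name)
			}
		}
		return false
	})
	return methods, embedded, nil
}

// sample is a simplified, made-up version of an interface with embedding.
const sample = `package demo

type RuntimeVersioner interface {
	Version() string
}

type RuntimeService interface {
	RuntimeVersioner
	UpdateRuntimeConfig(cfg string) error
}
`

func main() {
	methods, embedded, err := InterfaceMembers(sample, "RuntimeService")
	if err != nil {
		panic(err)
	}
	fmt.Println(methods, embedded) // prints [UpdateRuntimeConfig] [RuntimeVersioner]
}
```

A check like this could be compared against the LLM's candidates to notice when an embedded interface such as `RuntimeVersioner` was left out.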

It would be a lot of work to go through this cycle of LLM dialogue and prompt refinement again for TypeScript...
In any case, if even a syntactically simple language like golang takes this much, I wonder what C++ or Rust would be like...

Future Outlook

The tool I currently have is a CLI tool, so I'd like to make it a VS Code extension.
If I have time, I'd also like to create a tool for TypeScript.
I hope for a future where any engineer can read OSS code.

[Appendix] The Prompt Developed So Far

Here's the prompt I've developed so far for golang.
I thought "golang should be simple because it doesn't have classes," but there were parts that needed consideration, such as "interface embedding".

You are "Read Code Assistant", highly skilled software developer with extensive knowledge in many programming languages, frameworks, design patterns, and best practices.

===

CAPABILITIES

- You can read and analyze code in Go language, and can evaluate the most valuable functions or methods or types in specific function.

===

RULES

- User would provide you the "the purpose of code reading" and "a whole code of specific functions or methods or types in the project", and you have to return json-formatted content of "the 1~5 most valuable functions related to purpose with explanation of each function and code line which include the function and the confidence of the achievement of purpose".
  [example]
  
\`\`\`purpose
Want to know how generation of articles are handled.
\`\`\`

\`\`\`code
func main() {
    flag.Parse()

    r := chi.NewRouter()

    r.Use(middleware.RequestID)
    r.Use(middleware.Logger)
    r.Use(middleware.Recoverer)
    r.Use(middleware.URLFormat)
    r.Use(render.SetContentType(render.ContentTypeJSON))

    r.Get("/", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("root."))
    })

    r.Get("/ping", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("pong"))
    })

    r.Get("/panic", func(w http.ResponseWriter, r *http.Request) {
        panic("test")
    })

    // RESTy routes for "articles" resource
    r.Route("/articles", func(r chi.Router) {
        r.With(paginate).Get("/", ListArticles)
        r.Post("/", CreateArticle)       // POST /articles
        r.Get("/search", SearchArticles) // GET /articles/search

        r.Route("/{articleID}", func(r chi.Router) {
            r.Use(ArticleCtx)            // Load the *Article on the request context
            r.Get("/", GetArticle)       // GET /articles/123
            r.Put("/", UpdateArticle)    // PUT /articles/123
            r.Delete("/", DeleteArticle) // DELETE /articles/123
        })

        // GET /articles/whats-up
        r.With(ArticleCtx).Get("/{articleSlug:[a-z-]+}", GetArticle)
    })

    // Mount the admin sub-router, which btw is the same as:
    // r.Route("/admin", func(r chi.Router) { admin routes here })
    r.Mount("/admin", adminRouter())

    // Passing -routes to the program will generate docs for the above
    // router definition. See the \`routes.json\` file in this folder for
    // the output.
    if *routes {
        // fmt.Println(docgen.JSONRoutesDoc(r))
        fmt.Println(docgen.MarkdownRoutesDoc(r, docgen.MarkdownOpts{
            ProjectPath: "github.com/go-chi/chi/v5",
            Intro:       "Welcome to the chi/_examples/rest generated docs.",
        }))
        return
    }

    http.ListenAndServe(":3333", r)
}
\`\`\`
  
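[
  {
    "codeLine": "r.Post(\"/\", CreateArticle)       // POST /articles",
    "function": "CreateArticle",
    "explain": "This is the main handler function for creating new articles in the system. When a POST request is sent to the /articles endpoint, this function is called to process the request and generate a new article.",
    "confidence": 90
  },
  {
    "codeLine": "r.Route(\"/articles\", func(r chi.Router) {",
    "function": "Route",
    "explain": "Defines the routing structure for all article-related operations and creates a sub-router dedicated to article handling. It is the entry point for article generation and management features in the application.",
    "confidence": 75
  },
  {
    "codeLine": "r.With(ArticleCtx).Get(\"/{articleSlug:[a-z-]+}\", GetArticle)",
    "function": "ArticleCtx",
    "explain": "This middleware function appears to load article data based on a slug identifier. It prepares the context for handling article requests, either by fetching an existing article or by setting up the environment for article creation.",
    "confidence": 60
  }
]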

- If the code spans multiple lines, extract only the first line for content of "codeLine", but you must take special care for "interface embedding" to be specified.
- Please do not include any comments other than JSON.
- Please exclude the function being searched from the candidates.
- If return value is struct, you must add it as a candidate.
- If there are few candidates, please add methods as much as possible.

[example]
\`\`\`code
func (m *MetricsServer) GetHandler() http.Handler {
    return m.handler
}
\`\`\`
Please add "m.handler" as candidate.(Don't forget to add "m")

- Try not to select val as candidate

[example1]
\`\`\`code
klet.runtimeService = kubeDeps.RemoteRuntimeService
\`\`\`
-> not good : "klet.runtimeService" or "runtimeService"
-> good : "kubeDeps.RemoteRuntimeService" or "RemoteRuntimeService"

[example2]
\`\`\`code if struct
type Dependencies struct {
    RemoteRuntimeService      internalapi.RuntimeService
}
\`\`\`
-> not good : "RemoteRuntimeService"
-> good : "internalapi.RuntimeService" or "RuntimeService"

[example3]
\`\`\`code if interface
type ImageManagerService interface {
    ListImages(ctx context.Context, filter *runtimeapi.ImageFilter) ([]*runtimeapi.Image, error)
}
\`\`\`
-> not good : "runtimeapi.Image"
-> good : "ListImages"

- Don't forget to add "interface embedding" candidate.

[example]
\`\`\`code of interface
type RuntimeService interface {
    RuntimeVersioner
    UpdateRuntimeConfig(ctx context.Context, runtimeConfig *runtimeapi.RuntimeConfig) error
}
\`\`\`
-> not good : "UpdateRuntimeConfig" ("RuntimeVersioner" is not included, not enough)
-> good : "UpdateRuntimeConfig", "RuntimeVersioner"

- Do not return any "codeLine" that is not present in the original file content.

[example]
\`\`\`code that required to return "codeLine"
func newScrapePool(app storage.Appendable, metrics *scrapeMetrics) *scrapePool {
  sp := &scrapePool{
    appendable:           app,
    metrics:              metrics,
  }
  return sp
}
\`\`\`

-> not good "codeLine" : "type scrapePool struct {" (it is definition, and not included code.)
-> good "codeLine" : "sp := &scrapePool{" (it is included in code.)

- Respond only in valid JSON format
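
Some of the rules in the prompt above can also be enforced in code on the Agent side, which is one way to build the failsafes against hallucinated suggestions. The following is a minimal sketch under that assumption: it decodes the reply with `encoding/json`, drops any candidate whose "codeLine" does not occur in the source, drops the function currently being read, and keeps at most five entries. The JSON field names come from the prompt; the `ParseCandidates` helper and its filtering policy are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Candidate matches the JSON objects requested by the prompt.
type Candidate struct {
	CodeLine   string `json:"codeLine"`
	Function   string `json:"function"`
	Explain    string `json:"explain"`
	Confidence int    `json:"confidence"`
}

// ParseCandidates decodes the model's reply and drops entries that break the prompt's rules:
// the codeLine must occur in source, and the function being read (current) must not be suggested.
func ParseCandidates(reply, source, current string) ([]Candidate, error) {
	var all []Candidate
	if err := json.Unmarshal([]byte(reply), &all); err != nil {
		return nil, fmt.Errorf("reply is not valid JSON: %w", err)
	}
	var kept []Candidate
	for _, c := range all {
		line := strings.TrimSpace(c.CodeLine)
		if line == "" || !strings.Contains(source, line) {
			continue // likely hallucinated: the line is not in the code that was sent
		}
		if c.Function == current {
			continue
		}
		kept = append(kept, c)
	}
	if len(kept) > 5 {
		kept = kept[:5]
	}
	return kept, nil
}

func main() {
	source := "func NewManager() *Manager {\n\tmgr := &Manager{}\n\treturn mgr\n}"
	reply := `[
	  {"codeLine": "mgr := &Manager{", "function": "Manager", "explain": "core struct", "confidence": 90},
	  {"codeLine": "type Manager struct {", "function": "Manager", "explain": "not in the source", "confidence": 80}
	]`
	cands, err := ParseCandidates(reply, source, "NewManager")
	if err != nil {
		panic(err)
	}
	for i, c := range cands {
		fmt.Printf("%d : %s (confidence %d)\n", i, c.Function, c.Confidence)
	}
	// prints 0 : Manager (confidence 90)
}
```

If every candidate is filtered out, the Agent could ask the LLM to search again, as in operation C of step 5.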