AI-Driven Development: My Database Experiment

There's plenty of debate about AI's role in coding. As an engineer, I've been working with Large Language Models and GenAI since their early days. I decided to skip the debate and put AI to a practical test: build a DynamoDB-like database service from scratch with AI assistance. Because who doesn't love a good database challenge?

The idea was to build a service with the following characteristics:

  1. DynamoDB-like API: Implementing the essentials from PutItem to TransactWriteItems.
  2. Robust Storage Layer: A RocksDB implementation with full transaction support.
  3. Modern Communication: High-performance gRPC service, Protocol Buffer serialization, and TLS 1.3 (including mutual authentication); a server setup sketch follows this list.
  4. Containerization: Multi-arch Docker support and Kubernetes deployment manifests.
  5. Agile Development Simulation: Instead of providing a complete design upfront, start small and iteratively add (generate) new features. (This was partly a necessity as I worked on it primarily on weekends).
  6. AI-Driven Development: Let AI drive the development process as much as possible.
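
To make the communication layer concrete, here is a minimal sketch of what a gRPC server with TLS 1.3 and mutual authentication can look like using grpc-java with Netty. The certificate file names and the RoxDbServiceImpl class are illustrative assumptions, not the project's actual code.

    import io.grpc.Server;
    import io.grpc.netty.GrpcSslContexts;
    import io.grpc.netty.NettyServerBuilder;
    import io.netty.handler.ssl.ClientAuth;
    import io.netty.handler.ssl.SslContext;

    import java.io.File;
    import java.io.IOException;

    public class SecureServerSketch {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Server certificate and key, plus the CA used to verify client certificates.
            SslContext sslContext = GrpcSslContexts
                .forServer(new File("server.crt"), new File("server.key"))
                .trustManager(new File("ca.crt"))   // trust anchor for client certificates
                .clientAuth(ClientAuth.REQUIRE)     // require a client certificate (mutual TLS)
                .protocols("TLSv1.3")               // restrict the handshake to TLS 1.3
                .build();

            Server server = NettyServerBuilder.forPort(50051)
                .sslContext(sslContext)
                .addService(new RoxDbServiceImpl()) // hypothetical service implementation
                .build()
                .start();

            server.awaitTermination();
        }
    }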

The Results

The development approach was pragmatic and interactive. The workflow was as follows:

  1. Begin with interactive AI sessions to outline implementation strategies using Amazon Q chat.
  2. Review generated code and iterate until it met my standards.
  3. Use the generated code as a foundation.
  4. Leverage Amazon Q inline code generation for refinements, extensions, and new features.
  5. Review selected concepts using Google Gemini chat.

The numbers surprised even me: approximately 80% of the application code was AI-generated, with test coverage hitting an impressive 90% (if not more).

Amazon Q demonstrated impressive knowledge across the chosen technology stack: build tools and testing frameworks (Gradle, JUnit5, Mockito), communication protocols including transport security (gRPC, Protocol Buffers, TLS 1.3), RocksDB database, and deployment technologies (Docker, Kubernetes).

I want to call out three key things from this experiment:

  1. Development Workflow. To state the obvious: the first generated code isn't always the best. By providing feedback and requesting improvements, the quality of the generated code improved significantly.
    For example, the default JUnit tests for the gRPC service weren't using mocks but instead exercised the underlying implementation. When I asked for mocks, I got JUnit tests with interface mocks. Once I had tests with mocks, the inline code generation followed the existing code and kept generating code with mocks (see the first sketch after this list).

  2. Highlight. The TransactWriteItems implementation particularly showcased AI's capabilities, handling everything from the Protocol Buffer definitions and gRPC implementation to the RocksDB implementation and the @FunctionalInterface Java lambda for submitting RocksDB transactions.
    Initially, the inline code generation had trouble generating the correct Java builder chains for this complex operation (TransactWriteItems is a composition of a list of PutItem, UpdateItem, and DeleteItem operations; see the second sketch after this list). To get the correct builder chains, I used the Amazon Q chat interface and provided the proto definition as the input. Once I had working code, I completed the remaining implementation using the inline code generation. The unit test for TransactWriteItems was almost entirely implemented by inline code generation, except for mocking the Java lambda expression for submitting the RocksDB transaction, which seemed to be a challenge for the inline tool. Again, I solved that by using the chat feature, where I explained the problem and pasted the @FunctionalInterface definition for additional context. The rest of the unit test was completed by the inline code generation.
    As a result, most (about 95%) of the TransactWriteItems feature (ProtoBuf definition, gRPC service implementation, RocksDB implementation, and all the unit tests) was AI-generated. Quite impressive, I have to say.

  3. Lowlight. Not everything was perfect, though. There's room for improvement in handling cross-cutting concerns, like input validation.
    I think that if I had provided a complete design upfront, the generated validation code would have been elegantly built-in rather than bolted-on.
    The input validation code is where I spent most of my time and where most of my manual implementation went (see the third sketch after this list).
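
To illustrate point 1, here is a minimal sketch of a mock-based gRPC service test with JUnit 5 and Mockito. ItemStore, RoxDbServiceImpl, and the request/response types are hypothetical stand-ins for the project's interfaces and generated Protocol Buffer classes.

    import static org.mockito.ArgumentMatchers.any;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.verify;

    import io.grpc.stub.StreamObserver;
    import org.junit.jupiter.api.Test;

    class RoxDbServiceTest {

        // The storage interface behind the gRPC service is mocked instead of
        // hitting the real RocksDB implementation.
        private final ItemStore store = mock(ItemStore.class);
        private final RoxDbServiceImpl service = new RoxDbServiceImpl(store);

        @Test
        void putItemDelegatesToStoreAndCompletes() {
            // Arrange
            PutItemRequest request = PutItemRequest.newBuilder()
                .setTableName("users")
                .build();
            @SuppressWarnings("unchecked")
            StreamObserver<PutItemResponse> observer = mock(StreamObserver.class);

            // Act
            service.putItem(request, observer);

            // Assert
            verify(store).putItem(any());
            verify(observer).onNext(any(PutItemResponse.class));
            verify(observer).onCompleted();
        }
    }

The Arrange-Act-Assert comments mirror the structure the inline generator reliably reproduced once one such test existed in the codebase.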
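For point 2, here is a rough sketch of the kind of nested builder chain and @FunctionalInterface involved. All message and type names are assumptions standing in for the project's generated Protocol Buffer classes and RocksDB layer; the real definitions may differ.

    import org.rocksdb.RocksDBException;
    import org.rocksdb.Transaction;

    class TransactWriteItemsSketch {

        // Hypothetical functional interface for submitting a unit of work inside
        // a single RocksDB transaction; mocking this lambda type was the tricky part.
        @FunctionalInterface
        interface TransactionWork {
            void execute(Transaction txn) throws RocksDBException;
        }

        // A TransactWriteItems request composed of Put and Delete sub-operations,
        // expressed as nested protobuf builder chains (hypothetical message names).
        TransactWriteItemsRequest buildRequest() {
            return TransactWriteItemsRequest.newBuilder()
                .addTransactItems(TransactWriteItem.newBuilder()
                    .setPut(Put.newBuilder()
                        .setTableName("users")
                        .putItem("id", AttributeValue.newBuilder().setS("user-1").build())))
                .addTransactItems(TransactWriteItem.newBuilder()
                    .setDelete(Delete.newBuilder()
                        .setTableName("users")
                        .putKey("id", AttributeValue.newBuilder().setS("user-2").build())))
                .build();
        }
    }

A Mockito mock of something like TransactionWork is what the inline tool struggled to produce until the interface definition was pasted into the chat.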
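And for point 3, a sketch of the kind of hand-written validation guard the handlers ended up needing. The request/response types are again hypothetical; only the io.grpc Status API is used as-is.

    import io.grpc.Status;
    import io.grpc.stub.StreamObserver;

    class PutItemValidationSketch {

        // Reject invalid input before touching the storage layer; this
        // cross-cutting code is where most of the manual work went.
        void putItem(PutItemRequest request, StreamObserver<PutItemResponse> responseObserver) {
            if (request.getTableName().isEmpty()) {
                responseObserver.onError(Status.INVALID_ARGUMENT
                    .withDescription("table name must not be empty")
                    .asRuntimeException());
                return;
            }
            if (request.getItemMap().isEmpty()) {
                responseObserver.onError(Status.INVALID_ARGUMENT
                    .withDescription("item must contain at least one attribute")
                    .asRuntimeException());
                return;
            }
            // ... delegate to the storage layer once the input is known to be valid
        }
    }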

Key takeaways

Here are my key takeaways:

  1. AI is great at implementing well-defined patterns. The more context, the better and more accurate the generated code.
  2. AI excels at generating boilerplate code.
  3. AI is extremely useful for writing unit tests. The inline code generator strictly follows the Arrange-Act-Assert pattern and accurately predicts what you want to test and how.
  4. Interactive AI sessions are great for starting from scratch or in situations where there's not (yet) enough context in the existing code.
  5. Interactive AI sessions produce better results than quick inline generations.
  6. Complex architectural decisions benefit from human oversight and experience.
  7. Some aspects (cross-cutting ones) still require significant human input.

The experiment demonstrated that AI-assisted development has matured to the point where it can significantly accelerate software development.
While it's not a complete replacement for human expertise, and there are some areas of the project I'm not happy with (which I left as generated so other engineers can review and assess them), it can handle much of the heavy lifting.

I think it's a great tool for teams looking to create minimum viable products to validate ideas quickly, as well as teams looking to build more complex production-grade systems.

Want to join the experiment?

This project is open source and welcomes contributions - with one condition: pull requests should primarily consist of AI-generated code :)

For more, see: https://github.com/lukaszbudnik/roxdb.