Whenever I got a chance to try the Gemini models, the results were disappointing; the responses always felt as if the context was not enough. My prompts to Gemini were mostly text based and learning oriented. Other models like GPT and Sonnet, however, were able to provide relevant answers. Only when the prompts were carefully crafted with the TCREI framework (Tiny Crab Rode Enormous Iguana: Task, Context, Reference, Evaluate, Iterate) did I get a decent response from Gemini. Until last week, ChatGPT (in the browser) and GitHub Copilot (in IDEs) were my top go-to AI assistants (not agents). I did not quite trust an agent making code changes and preferred to Ask and make the changes myself.
Image generated by gemini.google.com

Problem Statement:
Most of my kids' worksheets arrive as images via the school app or WhatsApp groups, and there is a constant need to print them. My Epson L6270 is super slow when printing images, but when they are converted to PDF, printing is quite fast. I tried many of the free apps available on the Google Play Store, and they are all heavily loaded with advertisements (most of them simply annoying).
Challenges in implementing:
I wanted to create an Android app myself that converts images to PDF files, but I never had the time to build it completely. My attempts in the past fizzled out several times: after refreshing a few Android libraries, after going through a Udemy course, or just after creating the home screen. The iText library looked simple but definitely has a learning curve, and I was not interested in commercial SDKs.
Working with Agents:
Through ChatGPT, I came to know about the PdfDocument class available in Android itself. I thought I would give it a shot and installed Android Studio again. Otter, the latest stable version of Android Studio, has Gemini Pro built in, and the auth process was smooth and easy. I tried a few prompts and the responses were detailed. I asked about Java 25/21 support for Android and whether it would make sense to create the project in Java instead of Kotlin. I noticed the Agent mode and wanted to explore it more.
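For context, PdfDocument lives in the Android framework itself (android.graphics.pdf), so no third-party library is needed. Below is a minimal sketch of how images can become PDF pages with it; the helper name and structure are my own illustration and not code from the actual app.

```kotlin
import android.graphics.Bitmap
import android.graphics.pdf.PdfDocument
import java.io.File
import java.io.FileOutputStream

// Illustrative helper (not from the Image2Pdf source): renders each bitmap
// as one PDF page sized to the image and writes the result to outputFile.
fun bitmapsToPdf(bitmaps: List<Bitmap>, outputFile: File) {
    val document = PdfDocument()
    bitmaps.forEachIndexed { index, bitmap ->
        val pageInfo = PdfDocument.PageInfo.Builder(bitmap.width, bitmap.height, index + 1).create()
        val page = document.startPage(pageInfo)
        page.canvas.drawBitmap(bitmap, 0f, 0f, null)
        document.finishPage(page)
    }
    // e.g. outputFile = File(context.filesDir, "worksheets.pdf") to keep it in an app-private folder
    FileOutputStream(outputFile).use { stream -> document.writeTo(stream) }
    document.close()
}
```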
I started with a simple task: add a button to the existing main screen. Gemini 2.5 in Agent mode produced clean code. I then increased the complexity of the tasks and asked it to change the Compose layout (e.g. a Column instead of a Box inside the Scaffold), create an image picker, show a preview of the images, create a top bar, convert the list of URIs to a PDF, save them to a private folder, sort the list of URIs, add a settings screen, use the preferences while creating PDF files, and so on. The Gemini Agent was able to do all of it with great explanations. Thanks to Google for giving individual developers free access to their pro model, Gemini 2.5. It is definitely worth a try. It's never too late to try Gemini.
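As an illustration of what one of those tasks looks like in Compose, here is a rough sketch of a multi-image picker built on the Activity Result API; the composable name and callback are hypothetical and not taken from the Image2Pdf source.

```kotlin
import android.net.Uri
import androidx.activity.compose.rememberLauncherForActivityResult
import androidx.activity.result.PickVisualMediaRequest
import androidx.activity.result.contract.ActivityResultContracts
import androidx.compose.material3.Button
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable

// Illustrative composable (names are mine, not from the Image2Pdf source):
// launches the system photo picker and hands the selected image URIs back.
@Composable
fun ImagePickerButton(onImagesPicked: (List<Uri>) -> Unit) {
    val launcher = rememberLauncherForActivityResult(
        contract = ActivityResultContracts.PickMultipleVisualMedia()
    ) { uris -> onImagesPicked(uris) }

    Button(onClick = {
        launcher.launch(
            PickVisualMediaRequest(ActivityResultContracts.PickVisualMedia.ImageOnly)
        )
    }) {
        Text("Pick images")
    }
}
```

Because this goes through the system photo picker, it avoids the legacy storage permissions, which keeps the app and its Play Store listing simpler.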
The debugging capability of these Agents is mind-blowing. They are able to fire complex bash commands to read the logs, identify the issue, automatically try alternate approaches, re-run the commands, verify the logs again, and stop the loop when they find the success logs. That is a lot of time saved, and things get done at jet speed. The agent also helped in writing GitHub Actions workflow YAML files and a pristine, GitHub-friendly README.md with badges.
Within 5 days, I was able to get the Android app into decent shape and even upload it to the Google Play Store. I did manual testing, though. I am sure testing with Agents and MCP servers will add more value to the quality of the product.
Though most of the code is written by Gemini, I am still the proud owner of the few lines of code I added by hand. The agent did the heavy lifting as a developer; I was just supervising it like a manager or a product owner. The source code is available here –> https://github.com/sudhans/Image2Pdf.
Gotchas: A few times the Agent was not able to refactor/write to the same file and kept retrying, and I had to force-stop the task. When changes were made manually, the agent took them for granted and rolled them back. For example, a deprecated method call that I had updated as per the documentation was reverted by the agent.
At times, the Agent's file edits could take a while, and if you try to type another query in the Gemini Agent chat, it could temporarily freeze the IDE until the response for the previous task is complete.
Conclusion:
Learn to work with Agents. Give better inputs to the agents via prompts. Agents can auto-pick the models, so know the strengths of the different models available. The Agents are here for a revolution. This also reminds me of the famous quote from the movie I, Robot:
- Detective Del Spooner: Is there a problem with the Three Laws?
- Dr. Alfred Lanning: The Three Laws are perfect.
- Detective Del Spooner: Then why would you build a robot that could function without them?
- Dr. Alfred Lanning: The Three Laws will lead to only one logical outcome.
- Detective Del Spooner: What? What outcome?
- Dr. Alfred Lanning: Revolution.
- Detective Del Spooner: Whose revolution?
- Dr. Alfred Lanning: *That*, Detective, is the right question. Program terminated.
Note: cloudflare.com is down today, so I got time for posting 🙂 Cloudflare took down Udemy, KodeKloud, Pluralsight, and ChatGPT along with it.