
Domonkos Pál

Production-Ready AI Integration In Apps

AI integration is the hottest thing in the software industry right now. While a magic AI demo for your dream product can be built overnight, making things actually ready for production is an entirely different thing. Over the past decade, Appic by Hiflylabs has created countless data-driven apps, often integrating them with various machine learning models in collaboration with our data science team. Designing functionality, UX and architecture needs extra care in these cases. Here’s how we do it and what we look out for.

With the rise of text-based generative models, integrating AI into your apps is now easier than ever, and available for pretty much everyone. The end goal is always to not only provide an AI service, but make it with high reliability, and for a wide user base. In our 10+ years in the field, and around 30% of our current projects being ML or GenAI integration-related, we have created a set of principles for app design and development that drive our work every single day. I hope that this guide comes in handy in the first phases of your app’s production.

Clarity By Transparent & Familiar Patterns

The key to building trust with someone is to be clear and honest, and it’s not that different in the case of apps either. For example, as opposed to the predictable nature of algorithms, GenAI models might provide different answers to similar instructions. Or just think of agentic behavior: would you rather trust a model that just “promises” to do what you asked it to, or one that shows you what it is going to do and how?

To improve transparency in an AI-powered app, it may be worth considering patterns such as:

Predictable operation is also achievable by using familiar patterns. Anchor your UI on familiar user journey touchpoints instead of capitalizing on the newness of GenAI.
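One concrete transparency pattern is “propose, confirm, execute”: the agent surfaces what it intends to do and acts only after the user approves. The sketch below is illustrative, not a real product API; the `ProposedAction` class and its fields are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    """An action the agent wants to take, surfaced to the user first."""
    description: str      # human-readable summary rendered in the UI
    confirmed: bool = False
    executed: bool = False

    def confirm(self):
        # Flipped only by an explicit user interaction (e.g. a button click).
        self.confirmed = True

    def execute(self):
        # Refuse to act until the user has approved the plan.
        if not self.confirmed:
            raise PermissionError("Action shown to user but not yet confirmed")
        self.executed = True
        return f"Executed: {self.description}"


action = ProposedAction("Delete 3 draft invoices older than 90 days")
# The UI shows action.description before anything happens; only after
# the user confirms does execute() do real work.
action.confirm()
result = action.execute()
```

The same gate works for any side-effecting tool call an agent makes: the model produces the plan, the app owns the approval step.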

App Context

The app state and the AI context behind it are much more relevant than in traditional apps. They reflect the model’s current state, and thus the level of cooperation between user and model, heavily impacting user experience and perception.

This need should be taken into account early in the design phase, so that these context-specific aspects are present in the visual and technical design, such as:

During the PoC phase, these factors may have limited relevance, but they become significantly more important when transitioning to a production-ready app.
In addition, it may be worthwhile to break the app down into a set of the smallest possible contexts. This improves the precision of each model, makes it easier to review and adjust each model’s results, and simplifies troubleshooting.
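Splitting into small contexts can be as simple as keeping one narrow, reviewable system prompt per sub-task and routing each request to it. The context names, prompts, and the keyword router below are all hypothetical; a real app might route with a lightweight classifier instead.

```python
# One small, auditable system prompt per context (names are illustrative).
CONTEXTS = {
    "invoice_qa": "You answer questions about the user's invoices only.",
    "travel_policy": "You answer questions about the company travel policy only.",
}


def route(user_message: str) -> str:
    """Naive keyword router; stands in for a proper intent classifier."""
    if "invoice" in user_message.lower():
        return "invoice_qa"
    return "travel_policy"


def build_request(user_message: str) -> dict:
    context = route(user_message)
    return {
        "context": context,           # logged per request: troubleshooting unit
        "system": CONTEXTS[context],  # small prompt, easy to review and adjust
        "user": user_message,
    }


req = build_request("Why is invoice #1042 overdue?")
```

Because each request carries its context name, logs and evaluations can be filtered per context, which is exactly what makes reviewing and troubleshooting individual models tractable.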

AI Apps Beyond Conversation Agents And Chatbots

Due to the rise of language models and the general public’s unfamiliarity with machine learning as a whole, GenAI is often equated with chatbots. While a chatbot is still a good way to retrieve information from a large dataset, users are often imprecise in defining their goals. In these cases, your aim is to define more efficient interactions for them.

Defining and/or reproducing the desired state is difficult. We have also seen during user interviews that chat-based interaction often feels overused to users, and they’re looking for better ways to interact.

For a better user experience, it’s best to provide structure. Implement AI in a way that’s familiar to the user and easy to use. In short: let users interact with something familiar that you define, instead of having to phrase their instructions to the model itself.

We aim to reduce the time spent prompting and to provide a way for users to find what they need in the shortest possible route. The key to achieving this is to map as many use cases and user journeys as possible and implement simple, on-hand interactions.
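One way to shorten the route is to expose predefined “quick actions” in the UI that expand into full prompts behind the scenes, so users click instead of writing. The action names and templates below are made up for the example.

```python
# Hypothetical quick actions: each maps a one-click UI affordance to a
# ready-made prompt template, so the user never has to phrase the request.
QUICK_ACTIONS = {
    "summarize": "Summarize the following document in 3 bullet points:\n{text}",
    "translate": "Translate the following document to English:\n{text}",
    "extract_dates": "List every date mentioned in the document:\n{text}",
}


def prompt_for(action: str, text: str) -> str:
    if action not in QUICK_ACTIONS:
        raise KeyError(f"Unknown quick action: {action}")
    return QUICK_ACTIONS[action].format(text=text)


prompt = prompt_for("summarize", "Q3 revenue grew 12% year over year...")
```

Each template is also a fixed, testable surface: you can evaluate and tune every quick action independently, which is much harder with free-form chat input.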

In the case of more complex AI implementations such as copilots and agents, while their functions are a bit different, the principle remains the same. Always make sure that users take the fewest possible steps to reach their goals, and that they don’t have to spend time putting their thoughts into words.

In short, the solution can take several forms:

Safeguard Your Infra Costs

The first question you ask should always be “Do I need AI for this?” Running GenAI models is still very expensive, especially compared to a well-optimized algorithm, and often they’re not even necessary for the task at hand. What matters is the added value: if GenAI creates enough positives to justify the costs, it will be greenlit.

Once you’re set on using GenAI, a whole new set of questions pops up. Whether you use subscription models or your own infrastructure, credit-based systems that keep users from spamming the model are pretty much a must. If you’re creating a publicly available system, make sure to provide decent rate limiting, but it’s best to put everything behind authentication for the sake of your infra or AI service bill.
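A minimal sketch of such a credit-based limiter, assuming a per-user ledger where each model call spends credits and credits refill slowly over time (the class name and parameters are illustrative, not a specific library):

```python
import time


class CreditLimiter:
    """Per-user credit ledger: each model call spends credits, and credits
    refill slowly, so a single user cannot spam the model (and your bill)."""

    def __init__(self, capacity=10, refill_per_sec=0.1):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._state = {}  # user_id -> (credits, last_seen_timestamp)

    def allow(self, user_id, cost=1, now=None):
        now = time.monotonic() if now is None else now
        credits, last = self._state.get(user_id, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at capacity.
        credits = min(self.capacity, credits + (now - last) * self.refill_per_sec)
        if credits < cost:
            self._state[user_id] = (credits, now)
            return False
        self._state[user_id] = (credits - cost, now)
        return True


limiter = CreditLimiter(capacity=3, refill_per_sec=0.0)
```

In production the ledger would live in shared storage (e.g. Redis) rather than process memory, and `cost` can be scaled by token usage so expensive calls spend more credits.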

We have also encountered cases where it is beneficial to combine multiple approaches. For example, running an ML model periodically for clustering, and then performing further calculations using exact algorithms on those clusters.
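The pattern looks roughly like this: a cheap clustering step runs periodically, and exact, deterministic arithmetic runs on each cluster afterward. The toy 1-D k-means below stands in for a real clustering model; it is a sketch of the pattern, not a production algorithm.

```python
# Toy stand-in for the periodic ML step: 1-D k-means clustering.
def kmeans_1d(values, k=2, iters=20):
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return groups


# The exact, deterministic follow-up computation on each cluster.
def exact_totals(groups):
    return [sum(g) for g in groups]


groups = kmeans_1d([1, 2, 1.5, 10, 11, 10.5], k=2)
totals = exact_totals(groups)
```

The expensive, approximate step runs rarely (on schedule), while the cheap exact step runs on demand against cached cluster assignments, which keeps both costs and variance down.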

In the long run, thorough analytics and enhanced logging can help you even further. This will enable you to highlight anomalies and deploy solutions faster, and keep your costs low.

Security

Bad-faith actors always have existed and always will, and it’s up to you to protect your systems from them. Imola Horváth, Hiflylabs’ Head of Advanced Analytics, also wrote about this recently; I recommend checking out her post if you’re interested. Right now, most of the questions around security are about prompt injection and hijacking, but I’m highlighting another aspect in this post.

From a development/design POV, it’s necessary to pay attention to how your AI pipeline handles private and sensitive information. Anonymization and aggregation are both great ways to improve security, and this applies to both devs and users. On the one hand, AI should access anonymized, aggregated data where possible. On the other hand, user input that’s not relevant to the context should be omitted, while relevant data needs to be anonymized before processing.
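A minimal sketch of such a pre-processing pass, run on user input before it ever reaches the model. The regex patterns below are deliberately simplistic, illustrative placeholders; real PII detection needs a proper library and review.

```python
import re

# Simplistic, illustrative PII patterns -- not production-grade detection.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def anonymize(text: str) -> str:
    """Replace matched PII with typed placeholders before model processing."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


safe = anonymize("Contact jane.doe@example.com or +36 30 123 4567")
```

Keeping typed placeholders (rather than deleting the match) preserves enough structure for the model to reason about the text without ever seeing the raw values.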

And to recall my first point regarding transparency: make sure to explain how your AI processes data so that your users can make informed decisions on what they share with the model.

In addition, legislation has a part to play in this as well. Even though comprehensive regulation is still a while away, examples are popping up, such as the EU AI Act. These laws put transparency and secure AI use first, showcasing examples and setting up rules.

Data Sources

Well-structured source data is just as important for performance as the model’s training method. As our analytics team says: good data = good AI. And yet, around 70–80% of all data in the world is unstructured, making it extremely difficult to use in traditional analytics or machine learning. Generative models, on the other hand, are great at handling unstructured data!

However, working with structured data can be a challenge as well. Let’s assume you want an agent on your database to generate SQL queries. Then your database structure and schema descriptions come into play. The database needs to meet a set of requirements, and you also have to make sure the language model can handle it. E.g., you may need to consider denormalization not just for performance reasons, but because text2sql models may not work on your perfect 3NF schema.
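One way to bridge the two worlds is to expose a denormalized reporting view to the model and hand it a compact, auto-generated schema description instead of the raw 3NF tables. The table names, view, and helper below are invented for the example; the introspection uses SQLite’s standard `PRAGMA table_info`.

```python
import sqlite3

# Tiny 3NF schema plus a denormalized view the text2sql model queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         total REAL,
                         FOREIGN KEY(customer_id) REFERENCES customers(id));
    -- Denormalized view: one row per order, customer name inlined.
    CREATE VIEW order_report AS
        SELECT o.id AS order_id, c.name AS customer_name, o.total
        FROM orders o JOIN customers c ON c.id = o.customer_id;
""")


def describe(table: str) -> str:
    """Build a one-line schema description to include in the model prompt."""
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    col_list = ", ".join(f"{c[1]} {c[2]}".strip() for c in cols)
    return f"{table}({col_list})"


schema_prompt = describe("order_report")
```

The model only ever sees (and queries) `order_report`, so it never has to reconstruct the join logic, and the prompt stays short because the schema description is generated, not hand-written.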

It is also beneficial to consider the data pipelines, to ensure smooth operation beyond the Python notebook and into production. This includes everything from the ingestion of source data and documents, through chunking, all the way to vectorization/indexing. If tasks are handled by separate cloud resources, these pipelines should run on schedule to save ultra-expensive resources. Additionally, consider how you define these pipelines declaratively and handle erroneous branches effectively in the cloud.
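The ingestion-and-chunking stage of such a pipeline can be sketched as below. The chunk size, overlap, and record shape are assumptions for the example; the embedding call itself is left out, since that belongs to your provider.

```python
# Sketch of the ingestion -> chunking stage; the chunks it emits are
# what a later vectorization/indexing step would embed.
def chunk(text: str, size: int = 100, overlap: int = 20):
    """Split text into overlapping chunks for later embedding/indexing."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(1, len(text) - overlap), step)]


def pipeline(doc_id: str, text: str):
    # Stable chunk IDs let you re-run ingestion idempotently.
    return [{"id": f"{doc_id}-{n}", "text": c}
            for n, c in enumerate(chunk(text))]


records = pipeline("doc1", "word " * 60)
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both sides, at the cost of slightly more vectors to store.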

User Feedback

When it comes to analytics and feedback, we often emphasize that unobserved periods are null and void. You can’t track anything retroactively—considering this from the get-go is crucial. Especially so when you’re designing an AI-integrated application. Validation and user feedback can't come early enough, and optimizing your AI's performance with limited initial data can help ensure minimal error rates when you go into production. This can be done, for example, by creating multiple prompt variations, few-shot samples or model configurations and A/B testing them throughout the early phases of the project.
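A lightweight harness for such A/B testing can deterministically bucket each user into a prompt variant and log feedback per variant. The variant texts and function names below are illustrative; the bucketing uses a stable hash so a user always sees the same variant.

```python
import hashlib

# Two hypothetical prompt variants under test.
VARIANTS = {
    "A": "Answer concisely.",
    "B": "Answer step by step, then give a one-line summary.",
}


def assign_variant(user_id: str) -> str:
    """Stable assignment: the same user always lands in the same bucket."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    keys = sorted(VARIANTS)
    return keys[int(digest, 16) % len(keys)]


feedback_log = []


def record_feedback(user_id: str, score: int):
    # Feedback is tagged with the variant so results can be compared later.
    feedback_log.append({"variant": assign_variant(user_id), "score": score})


record_feedback("user-42", 5)
```

Hash-based bucketing avoids storing an assignment table, and because feedback rows carry the variant label, comparing average scores per variant is a one-line aggregation.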

And you can’t talk about GenAI accuracy without addressing hallucinations. Finding out about them is one of the most difficult challenges, so collecting feedback as soon as possible can help you optimize your models in the long run.

Challenges And Opportunities For Accessibility

Last but not least, a double-edged sword. The new and non-conventional features brought along by the AI revolution pose a challenge in terms of creating accessible designs. You have brand new perspectives to consider, and the field changes pretty much every day—definitely a hard nut to crack from an accessible design perspective.

However, with the advances in technology, new ways to interface with computers have and will allow for additional accessibility options. AI tools such as text-to-speech open up the option for us to reinterpret interactions: what used to be a UI on the screen before might become a voice call in the future for those who cannot use a screen.
 
