This summer, I joined Fiducia AI, a startup reinventing global fan engagement through AI-powered, mobile-first experiences. From day one, I’ve been hands-on with real-world LLM infrastructure, exploring the platform’s capabilities and seeing how AI is reshaping the way we build and scale software.
In this post, I’ll walk through what I’ve learned, how the platform works under the hood, and what it’s like contributing to a fast-paced, high-impact engineering environment.
Note: This post reflects my personal learning as an intern. It does not represent internal company details or official statements from Fiducia AI.
In traditional software engineering internships, you're often dropped into a well-defined project or legacy codebase. Not here.
At Fiducia AI, interns aren’t just helping; we’re building. We’ve been challenged to propose, develop, and ship a new application on top of the company’s AI infrastructure, which is designed to scale to billions of mobile users, regardless of OS or region.
The broader technical challenge:
How do we build LLM-based applications that are fast, contextual, and mobile-native, without needing users to install anything?
From a business perspective, this problem is urgent. Sports is a billion-dollar industry, and fans expect personalized, real-time digital experiences. Fiducia’s platform is built to meet that demand without friction.
High-Level Solution
Before we code anything, we’re expected to master the architecture behind the scenes. This week, I focused on understanding Fiducia’s real-time AI pipeline and experimenting with its platform.
The core system revolves around Retrieval-Augmented Generation (RAG) and embeddings, allowing developers to inject contextual knowledge into LLMs on demand. Rather than relying solely on model weights, Fiducia’s platform dynamically retrieves relevant context from files uploaded by developers, enabling more accurate and trustworthy responses.
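To make that concrete, here’s a minimal sketch of the RAG pattern as I understand it. The names (`answer_with_rag`, `search_chunks`, `call_llm`) are my own stand-ins for illustration, not Fiducia’s actual API.

```python
# A minimal sketch of the RAG pattern: retrieve relevant chunks first,
# then inject them into the prompt so the model answers from that context.
# `search_chunks` and `call_llm` are stand-ins for whatever retrieval and
# model-serving layers the platform actually uses.

def answer_with_rag(question: str, search_chunks, call_llm, top_k: int = 3) -> str:
    # 1. Retrieve the most relevant uploaded content for this question.
    chunks = search_chunks(question, top_k=top_k)

    # 2. Assemble a prompt that grounds the model in the retrieved context.
    context = "\n\n".join(chunks)
    prompt = (
        "Answer using only the context below. If the answer isn't there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generate the answer from the context-augmented prompt.
    return call_llm(prompt)
```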
Our applications will build on this by layering problem-specific logic and UX atop the existing AI backend. But first, I needed to break down the platform step-by-step.
Deep Dive: Key Technical Highlights
RAG in Action
I ran several experiments using Fiducia’s file ingestion system, uploading plain text files with deliberately conflicting information and testing how the model responded. The results confirmed the system’s RAG behavior: the model pulled answers from my uploaded context, not from its pre-trained knowledge, which is exactly what we want for enterprise-grade accuracy.
This showed me how RAG can effectively override hallucinations in an LLM, provided the retrieval system is fast and well-structured.
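For anyone curious, my experiment roughly followed this shape. The `upload_context` and `ask` helpers below are hypothetical placeholders, not the platform’s real interface; the idea is simply to plant a deliberately wrong “fact” and see which source the answer comes from.

```python
# Roughly how I structured the conflicting-information experiment.
# The uploaded text deliberately contradicts common knowledge, so whichever
# answer comes back reveals where the model got its information.
# `upload_context` and `ask` are placeholders for the platform's real workflow.

CONTRADICTORY_FACT = "The 2022 World Cup final was played in Reykjavik."

def test_rag_overrides_pretraining(upload_context, ask) -> bool:
    upload_context(CONTRADICTORY_FACT)
    answer = ask("Where was the 2022 World Cup final played?")
    # If retrieval is working, the answer echoes the uploaded (wrong) fact
    # rather than the model's pre-trained knowledge (Lusail, Qatar).
    return "reykjavik" in answer.lower()
```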
Platform Flow Overview:
- Attach a file
- Translate to plain text (handled server-side)
- Compress (a value between 0 and 1) to optimize performance
- Sync to trigger an API call that embeds content into the vector DB
This clarified how context gets pre-processed before inference and highlighted the importance of compression settings, especially when dealing with large documents.
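Here’s how I picture those steps fitting together in code. Everything below is a placeholder sketch of my own (the real translation, compression, and embedding happen server-side), but it captures the order of operations.

```python
# A sketch of the ingestion flow described above, as I currently understand it.
# All of these functions are placeholders of mine, not the platform's real API.

def extract_plain_text(raw_bytes: bytes) -> str:
    # Placeholder: the platform converts attached files to plain text server-side.
    return raw_bytes.decode("utf-8", errors="ignore")

def compress_text(text: str, ratio: float) -> str:
    # Placeholder: keep roughly `ratio` of the document. The real compression
    # setting is a 0-1 value; how the platform actually trims content is opaque to me.
    keep = max(1, int(len(text) * ratio))
    return text[:keep]

def ingest(path: str, compression: float = 0.5) -> str:
    assert 0.0 <= compression <= 1.0, "compression must be between 0 and 1"
    with open(path, "rb") as f:
        raw = f.read()                                 # 1. Attach the file
    text = extract_plain_text(raw)                     # 2. Translate to plain text
    condensed = compress_text(text, compression)       # 3. Compress
    # 4. Sync: on the real platform this triggers an API call that embeds
    #    the content into the vector database.
    return condensed
```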
Embeddings + Vector Search
I started exploring what embeddings really are: numerical representations of text that allow for semantic similarity comparison. When a user enters a prompt, it’s embedded, and the platform uses a vector database to retrieve relevant chunks from the uploaded content, within a ~500ms window. That performance target blew my mind. Behind the scenes, the system is doing some serious optimization work.
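To build intuition, I put together a toy version of the retrieval step. The hashing-based `toy_embed` function is purely illustrative; real systems use learned embedding models and an approximate-nearest-neighbour index, but the rank-by-cosine-similarity idea is the same.

```python
import numpy as np

# Toy illustration of retrieval: embed text as vectors, then rank stored
# chunks by cosine similarity to the query. The "embedding" here is a crude
# word-hashing trick that exists only so the example runs on its own.

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

chunks = [
    "The home team won the final 2-1 after extra time.",
    "Ticket prices for next season go on sale in July.",
    "The stadium roof closes automatically when it rains.",
]
chunk_vecs = np.stack([toy_embed(c) for c in chunks])

query = "who won the final"
scores = chunk_vecs @ toy_embed(query)   # cosine similarity (vectors are unit length)
best = chunks[int(np.argmax(scores))]
print(best)  # the most semantically similar chunk gets retrieved as context
```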
Challenges & Lessons Learned
- Mental Shift: This isn’t coursework, and there’s no step-by-step manual. I had to switch from “follow directions” mode to proactive exploration mode. It was a wake-up call, and an exciting one.
- Understanding the Stack: I initially underestimated how complex “RAG” actually is. It touches file processing, vector math, storage efficiency, and LLM prompt engineering.
- Platform Behavior Surprises: Some features didn’t behave exactly as expected on the first try (especially around compression), which pushed me to test methodically and think critically.
Takeaways
This week helped me realize that modern software engineering is less about syntax and more about systems thinking. Whether you’re tuning compression rates or engineering contextual prompts, you're not just writing code, you're architecting an experience.
More than that, I’m starting to see how technical skill, business knowledge, and product storytelling all intersect. At a startup like Fiducia AI, you don’t just build; selling and shipping fast matter just as much as building.
Next week, I’ll finalize a product idea and begin prototyping. My focus: a mobile-first tool powered by RAG that enhances the sports fan experience in real time. I can’t wait!
If you’ve worked with RAG systems, vector search, or real-time AI pipelines, I’d love to hear how you’ve approached it. And if you’re exploring how to build mobile-first apps at scale, reach out. Let’s connect.
Until next time!