Case Study · Android · Voice AI · Custom IME

Voxly AI Keyboard

A fully custom Android keyboard that replaces the system IME: speak naturally and get clean text typed straight into any app, in whichever of 19 languages you choose. Production-ready, with Google Sign-In, a credit system metered server-side, in-app purchases verified against the Play Developer API, and a self-hosted backend.

Google Play · Closed Testing

The problem

Typing is the slowest part of mobile messaging — especially across languages. Voice assistants exist, but none of them live where messaging actually happens: inside the keyboard. Voxly was built to close that gap — a keyboard where speech is the primary input, in any app, in any conversation.

How it works

Voice-to-text pipeline — audio is recorded on-device and sent to a cloud speech model, and clean text is typed straight into the focused field. Output reads like a real person typed it, not like a transcript.
Your language, not just English — users pick a native language once, and the keyboard writes in it whatever language they speak. Speak Malayalam, send Danish. Languages written in a non-Latin script get a second mode that spells them in Latin letters, the way people actually text (Manglish, Hinglish, Tanglish).
A check before you send — when the output is a language the user cannot read, the model also returns a short English gloss, shown above the keyboard. Same request, same cost, so verification is free.
Custom keyboard, built from scratch — a canvas-rendered QWERTY (no deprecated framework widgets) with word prediction and autocorrect over a 40,000-word dictionary, undo-on-backspace, long-press accents, an emoji panel, and full TalkBack accessibility.
Style toggle — faithful transcription or casual Gen Z texting style, switchable mid-conversation; each mode runs a separately tuned prompt.
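
As a rough illustration of how the options above could combine into a single model request, here is a minimal Python sketch. It is not Voxly's actual code: `OutputOptions`, `build_instructions`, and the prompt strings are hypothetical stand-ins, and the real, separately tuned prompts are not shown in this case study.

```python
from dataclasses import dataclass

# Placeholder prompt text (assumption): one entry per style mode.
STYLE_PROMPTS = {
    "faithful": "Transcribe what was said, cleaned up but unchanged in meaning.",
    "casual": "Rewrite what was said as a casual text message.",
}


@dataclass(frozen=True)
class OutputOptions:
    language: str        # language the text should be written in, e.g. "Danish"
    latin_script: bool   # True for Manglish/Hinglish-style Latin spelling
    style: str           # key into STYLE_PROMPTS
    user_can_read: bool  # False when the user cannot read the output language


def build_instructions(opts: OutputOptions) -> str:
    """Assemble one instruction string for a single model request."""
    if opts.style not in STYLE_PROMPTS:
        raise ValueError(f"unknown style: {opts.style}")
    parts = [STYLE_PROMPTS[opts.style], f"Write the result in {opts.language}."]
    if opts.latin_script:
        parts.append("Spell it in Latin letters.")
    if not opts.user_can_read:
        # The gloss rides along in the same request, so no second call is needed.
        parts.append("Also return a short English gloss.")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_instructions(OutputOptions("Danish", False, "faithful", False)))
```

Keeping the gloss as an extra instruction in the same request, rather than a second call, is what would make the verification step cost nothing extra.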

Architecture

The Android client (Kotlin, Jetpack Compose, custom InputMethodService) talks to a FastAPI backend running in Docker on a self-managed VPS behind Nginx with Let's Encrypt TLS. Authentication is Google Sign-In via Firebase Auth, with user state and credit balances in Firestore. A credit system meters usage: one transcription costs one credit, and new accounts receive free credits on sign-up. Each deduction is an atomic Firestore transaction that runs only after a successful transcription, so silence and failures cost nothing. Limits are enforced server-side, with proper HTTP status codes for exhausted credits and expired sessions. Credit packs are sold through Google Play Billing. The language catalogue is served by the backend and cached on device, so adding a language is a server deploy rather than an app release, and existing installs pick it up. The app also ships with Google Play in-app updates, so users get new versions without leaving the keyboard.
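
The charge-only-on-success rule can be sketched in a few lines of Python. This is a simplified model, not the production backend: an in-memory `CreditLedger` with a lock stands in for the Firestore transaction, `handle_transcription` is a hypothetical handler, and the status codes (401 for an expired session, 402 for exhausted credits, 502 for an upstream failure) are assumptions, since the case study does not name the codes it uses.

```python
from dataclasses import dataclass, field
from threading import Lock

# Assumed status codes; the real API may use different ones.
HTTP_OK, HTTP_UNAUTHORIZED, HTTP_PAYMENT_REQUIRED, HTTP_BAD_GATEWAY = 200, 401, 402, 502


@dataclass
class CreditLedger:
    """In-memory stand-in for per-user balance documents."""
    balances: dict = field(default_factory=dict)
    _lock: Lock = field(default_factory=Lock, repr=False)

    def deduct_one(self, user_id: str) -> bool:
        # The lock plays the role of the atomic transaction:
        # read, check, and write happen as one step.
        with self._lock:
            balance = self.balances.get(user_id, 0)
            if balance < 1:
                return False
            self.balances[user_id] = balance - 1
            return True


def handle_transcription(ledger, user_id, session_valid, transcribe):
    """Return (status, text). `transcribe` is any callable returning a string."""
    if not session_valid:
        return HTTP_UNAUTHORIZED, ""
    if ledger.balances.get(user_id, 0) < 1:
        return HTTP_PAYMENT_REQUIRED, ""
    try:
        text = transcribe()
    except Exception:
        return HTTP_BAD_GATEWAY, ""       # failure: nothing charged
    if not text.strip():
        return HTTP_OK, ""                # silence: nothing charged
    if not ledger.deduct_one(user_id):
        return HTTP_PAYMENT_REQUIRED, ""  # balance ran out in the meantime
    return HTTP_OK, text


if __name__ == "__main__":
    ledger = CreditLedger({"user-1": 1})
    print(handle_transcription(ledger, "user-1", True, lambda: "hello there"))
    print(handle_transcription(ledger, "user-1", True, lambda: "hello again"))
```

Deducting after the transcription returns, rather than before, means there is never a charge to refund when the model fails or returns nothing.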

Engineering challenges

Replacing the system IME — a keyboard can never crash, block the UI thread, or lose the input connection; every network call and audio operation is fully asynchronous.
Rejecting silence before it costs money — four detection layers (clip duration, file size, peak amplitude polling, and a model-level guard) stop empty audio on-device before any network call, and the backend never charges for an empty result.
Ditching the deprecated keyboard framework — Android's KeyboardView is deprecated, so the entire keyboard (rendering, touch pipeline, key previews, long-press popups, accessibility) was rebuilt as a custom canvas view.
Purchases that cannot be forged — the app never grants its own credits. Every purchase token is verified server-side against Google's Play Developer API before anything is credited, and crediting is idempotent: a record keyed on the hash of the token means a replayed or retried purchase adds nothing twice. Firestore rules deny all client writes, so credits can only ever come from the backend.
Shipping a language without shipping an app — the language list lives on the server, is cached on device, and falls back to a bundled copy when offline. New clients send a language code and an output mode; older builds that predate the feature keep working through a compatibility path in the same endpoint.
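
Two of the challenges above, the silence gate and idempotent purchase crediting, lend themselves to a short sketch. The Python below is illustrative only. The thresholds, `should_upload`, and `PurchaseLedger` are hypothetical names and values. The on-device checks are written in Python for consistency even though the real client is Kotlin. The `verify` callable stands in for the server-side Play Developer API check. The model-level guard is not modelled here.

```python
import hashlib
from typing import Callable, Iterable

# Assumed thresholds; the real values are not given in this case study.
MIN_DURATION_S = 0.4
MIN_BYTES = 2_000
MIN_PEAK = 500  # amplitude reading on a 0..32767 scale


def should_upload(duration_s: float, size_bytes: int,
                  peak_readings: Iterable[int]) -> bool:
    """Cheap local checks that run before any network call."""
    if duration_s < MIN_DURATION_S:
        return False  # clip too short to contain speech
    if size_bytes < MIN_BYTES:
        return False  # file too small to contain speech
    # Readings polled while recording; an empty list counts as silence.
    return max(peak_readings, default=0) >= MIN_PEAK


class PurchaseLedger:
    """In-memory stand-in for backend-only credit records."""

    def __init__(self) -> None:
        self.balances: dict = {}
        self.processed: dict = {}  # token hash -> credits granted

    def credit_purchase(self, user_id: str, purchase_token: str, credits: int,
                        verify: Callable[[str], bool]) -> bool:
        key = hashlib.sha256(purchase_token.encode("utf-8")).hexdigest()
        if key in self.processed:
            return False  # replayed or retried purchase: already credited
        if not verify(purchase_token):
            return False  # token did not pass server-side verification
        # In a real store, these two writes would belong in one transaction.
        self.processed[key] = credits
        self.balances[user_id] = self.balances.get(user_id, 0) + credits
        return True


if __name__ == "__main__":
    print(should_upload(2.0, 50_000, [10, 4000]))
    ledger = PurchaseLedger()
    print(ledger.credit_purchase("user-1", "token-abc", 100, lambda t: True))
    print(ledger.credit_purchase("user-1", "token-abc", 100, lambda t: True))
    print(ledger.balances)
```

Keying the record on a hash of the token, rather than on the user or the time of purchase, is what makes a retry safe: the same token always maps to the same record, so a second attempt finds it and adds nothing.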

Stack

Kotlin · Jetpack Compose · Custom IME · FastAPI · Speech Recognition AI · Play Billing · Firebase · Firestore · Docker · Nginx
