Sound & Interaction
DSP · Frontend · Gesture HCI

PostTalk

Audio effects controller driven by hand gestures via MediaPipe — replacing the flat interface with embodied physical interaction for live music performance.

MediaPipe · C++ · JUCE · Svelte · Webview
Work in progress — End of Year Show, Media Art & Technology · June 2026

First Iteration

Early prototype of PostTalk. The system is functional at the DSP level — the reverb engine runs in C++/JUCE with all 31 parameters exposed — and gesture recognition via MediaPipe is integrated in the Webview layer. This iteration tests the core pipeline: hand landmarks captured in real time, normalized and passed across the JS–JUCE bridge, driving effect parameters live. The focus now is on calibrating the gesture-to-parameter mapping for a live performance context and refining the interaction model ahead of the June 2026 presentation.
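The last step of that pipeline, turning a normalized bridge value into a real parameter value, can be sketched as follows. This is a minimal illustration, not the project's code: `ParamRange`, `toPhysical`, `toNormalized` and the example ranges are assumed names and values. The skew exponent is modeled on the convention used by JUCE's `NormalisableRange`, where values below 1 give more resolution near the minimum.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical description of one effect parameter (illustrative only).
struct ParamRange {
    float min;   // physical minimum, e.g. seconds or Hz
    float max;   // physical maximum
    float skew;  // 1.0 = linear; < 1.0 gives more resolution near min
};

// Keep bridge values inside the agreed 0..1 contract.
inline float clamp01(float x) { return std::max(0.0f, std::min(1.0f, x)); }

// Map a normalized 0..1 value from the bridge to a physical value.
inline float toPhysical(const ParamRange& r, float normalized) {
    const float shaped = std::pow(clamp01(normalized), 1.0f / r.skew);
    return r.min + (r.max - r.min) * shaped;
}

// Inverse mapping, e.g. for reporting the current state back to the UI.
inline float toNormalized(const ParamRange& r, float physical) {
    const float proportion = clamp01((physical - r.min) / (r.max - r.min));
    return std::pow(proportion, r.skew);
}
```

For example, a hypothetical decay-time range of 0.1 to 10 s with a skew of 1.0 maps a bridge value of 0.5 to 5.05 s. Out-of-range input is clamped, so a noisy gesture value cannot push a parameter past its limits.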

Narrative & Inspiration

LLMs changed human-computer interaction permanently. Natural language as an interface is a genuine leap. But it also narrows something. We have reduced the richness of human expression to text and voice, as if language were the only channel. Communication is posture, movement, proximity, sound, gesture, context. We have bodies, and our bodies carry meaning that language cannot fully encode.

PostTalk asks: what if a musician could shape sound with their hands — not by pressing buttons or turning knobs, but through gesture? The performer's hand postures, detected in real time via MediaPipe, control audio effect parameters. The interface disappears. The musician's physical presence becomes the control surface.

Technical Detail

  • DSP layer built in C++ with JUCE. The audio engine implements a full reverb with Early Reflections, a Diffusion Network (chorus, crossover filter, delay lines, feedback matrix, LFO), and a Feedback Delay Network (FDN) with freeze capability — 31 parameters in total.
  • UI layer is a Svelte frontend delivered via Webview, running on a separate thread from the DSP engine. All values crossing the JS–JUCE bridge are normalized 0–1. JUCE maps them to physical ranges internally.
  • Hand gesture detection via MediaPipe HandLandmarker, running inside the Webview. Each frame's landmark geometry is classified into a gesture, and a sliding-window majority-vote smoother (4-frame window) turns those per-frame results into stable gesture states before any parameter change is sent.
  • Architecture insight: separating the UI thread from the DSP thread via message passing — not shared state — keeps audio processing deterministic while the interface stays responsive.
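The majority-vote smoothing described above can be sketched like this. In PostTalk the smoother runs in the Webview's JavaScript; it is written here in C++ only to stay in the project's DSP language. The `Gesture` labels, the `MajoritySmoother` class and the rule "switch only on a strict majority, otherwise hold the previous state" are assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical gesture labels; the real set is not given in the text.
enum class Gesture { None, Open, Fist, Pinch };

// Sliding-window majority vote over the last N per-frame classifications.
template <std::size_t N>
class MajoritySmoother {
public:
    // Feed one raw per-frame gesture; returns the current stable gesture.
    Gesture push(Gesture raw) {
        window_[next_] = raw;
        next_ = (next_ + 1) % N;
        if (filled_ < N) ++filled_;

        // Count votes over the frames seen so far.
        std::array<std::size_t, kNumGestures> votes{};
        for (std::size_t i = 0; i < filled_; ++i)
            ++votes[static_cast<std::size_t>(window_[i])];

        // Switch only when one gesture holds more than half the window;
        // on a tie or during warm-up, keep the previous stable gesture.
        for (std::size_t g = 0; g < kNumGestures; ++g)
            if (votes[g] * 2 > N) stable_ = static_cast<Gesture>(g);
        return stable_;
    }

private:
    static constexpr std::size_t kNumGestures = 4;
    std::array<Gesture, N> window_{};
    std::size_t next_ = 0;
    std::size_t filled_ = 0;
    Gesture stable_ = Gesture::None;
};
```

With a 4-frame window, a gesture has to appear in at least 3 of the last 4 frames before it takes over, so a single misclassified frame never changes the output. The cost is a few frames of delay on every real gesture change, which is part of the calibration trade-off.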

Learnings

  • Real-time computer vision in a live performance context has strict latency requirements. MediaPipe running in the Webview is fast enough, but the mapping between gesture and effect needs careful calibration — too sensitive and it is unplayable, too coarse and it loses expressiveness.
  • Thread separation is not just an architectural choice — it is a musical one. Audio dropouts break a performance. Keeping the DSP thread isolated from UI events was the most important reliability decision in the project.
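One common way to pass parameter changes to an audio thread without shared mutable state is a single-producer, single-consumer ring buffer. The text does not say which mechanism PostTalk uses, so the sketch below is an assumption: `ParamMessage` and `SpscQueue` are hypothetical names, and JUCE's own `AbstractFifo` could fill a similar role.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical message: a parameter index plus a normalized 0..1 value.
struct ParamMessage {
    int paramIndex;
    float normalized;
};

// Single-producer / single-consumer ring buffer.
// Exactly one thread may call push(), and one other thread may call pop().
// It stores at most Capacity - 1 messages.
template <std::size_t Capacity>
class SpscQueue {
public:
    // Called from the UI/bridge thread. Returns false if the queue is full.
    bool push(const ParamMessage& m) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire)) return false;
        slots_[head] = m;
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Called from the audio thread. Never locks, blocks, or allocates.
    bool pop(ParamMessage& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;
        out = slots_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    std::array<ParamMessage, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // written only by the producer
    std::atomic<std::size_t> tail_{0};  // written only by the consumer
};
```

In a setup like this, the audio callback would drain the queue at the start of each block and apply the values before processing. If the UI floods the queue, `push` fails and a message is dropped, but the audio thread is never made to wait.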


Italo Rojas 2026