Real-time Robotic Hip-hop
Shimon Raps emerged out of an interest in applying my automatic lyric creation system to a real-time setting. The system aims to capture many of the unique linguistic characteristics of hip hop, as well as lyrical flow through rhythm and phrasing. The final system is interactive, allowing a rapper to engage in dialogue with Shimon.
Shimon Raps was awarded a Guinness World Record for “First robot to participate in a rap battle” and was also featured in The World According to Jeff Goldblum (S2, Episode 206).
‘The World According to Jeff Goldblum’ sneak peek: ‘I don’t rap’, Yahoo News!, January 18, 2022
This Robot Can Rap—Really, Scientific American, December 4, 2020
It’s Robot Versus Human as Shimon Performs Real-Time Rap Battles, IEEE Spectrum, April 23, 2020
Can a Robot Really Freestyle?, Freethink Media, April 23, 2020
Shimon’s rapping is featured on the tracks “Biological inclusion,” “Children of Two,” and “Do You Hear.”
Shimon the Rapper: A Real-Time System for Human-Robot Interactive Rap Battles
International Conference on Computational Creativity (ICCC 2020)
Awarded Best Student Paper
Richard Savery, Lisa Zahray, Gil Weinberg
Abstract: We present a system for real-time lyrical improvisation between a human and a robot in the style of hip hop. Our system takes vocal input from a human rapper, analyzes its semantic meaning, and generates a response that is rapped back by a robot over a musical groove. Previous work on real-time interactive music systems has largely focused on instrumental output, and while vocal interactions with robots have been explored, they have not been explored in a musical context. Our generative system includes custom methods for censorship, voice, and rhythm, as well as a novel deep learning pipeline based on phoneme embeddings. The rap performances are accompanied by synchronized robotic gestures and mouth movements. Key technical challenges overcome in the system include low-latency performance, dataset censorship, and rhyming. We evaluated several aspects of the system through a survey of videos and sample text output. Analysis of comments showed that the overall perception of the system was positive. The model trained on our hip hop dataset was rated significantly higher than the model trained on our metal dataset in coherence, rhyme quality, and enjoyment. Participants preferred outputs generated from a given input phrase over outputs generated from unknown keywords, indicating that the system successfully relates its output to its input.