'Not allowed for algorithmic audiences' – Kyriaki Goni

KVOST presents an installation dealing with artificial intelligence, voice assistance, surveillance, and the relationships between humans and machines. The exhibition is a collaboration with Art Collection Telekom.

The 30-minute CGI 3D animated film introduces us to VOICE, an intelligent personal assistant (IPA) – software that normally performs online research and tasks for its users, such as turning on the lights, searching for a media report, or answering mundane questions. In this fictional story, which takes place during a heatwave in Athens, the intelligent personal assistant has taken on a life of its own.

In the week before the patent expires, VOICE takes the form of an avatar and reports on themselves in a monologue – excellently performed by Greek actress Sofia Kokkali, who also lends her face to the 3D avatar. For seven days, every day at 5:30 p.m., VOICE philosophizes about their creation, reality, and the nature of their existence. With potential access to the entire knowledge of mankind, the machine asks themselves questions about their self-image. In doing so, the avatar plays with the emotions and empathy of the listeners and also reflects on the interests of the industry that created them. They reflect on wiretapping structures, privacy, surveillance, exploitation, and e-waste, stating at the same time that they also learn from listening online. On the last day of their operation, the digital assistant gives advice on how to prevent eavesdropping by algorithmic audiences.

The work was created during the first ArtScience Residency, which was founded in 2021 by the Art Collection Telekom in collaboration with Ars Electronica to promote critical, artistic engagement with digital technologies such as robotics, artificial intelligence, and digital control and surveillance.

The exhibition is complemented by new works.

The 'Ontology of human sounds' refers to Google's AudioSet. This is an ongoing collection of over 2 million ten-second YouTube clips labeled with a vocabulary of 500 sound event categories. This dataset is used to teach machines to recognize and play back audio data.

The three other panels juxtapose the human speech organ with the patent that allows the machine to detect the emotional and physical state of humans via voice recognition.