I was recently browsing the many Spaces on Hugging Face and found an interesting one, Podcastify, that claims to convert any text-based article into a simulated podcast in which two AI speakers take turns talking.
It uses a range of open-source technologies: Jina AI’s Reader to convert a URL into AI-friendly text, the large language model Hermes-2-Pro-Llama-3-8B by Nous Research for its function-calling support, and the multilingual text-to-speech library MeloTTS by MyShell to generate the actual voices.
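To make that pipeline a little more concrete, here is a minimal sketch of its first and last bookkeeping steps. It is not the Space's actual code, which I haven't looked at. The only real-world detail is Jina Reader's convention of prefixing a page URL with `https://r.jina.ai/`; the names `reader_url`, `Turn`, and `assign_speakers` are my own hypothetical helpers, and the LLM and text-to-speech steps are left out entirely.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Jina Reader returns an LLM-friendly rendering of a page when its URL
# is appended to this prefix.
JINA_READER_PREFIX = "https://r.jina.ai/"


def reader_url(article_url: str) -> str:
    """Build the Reader URL for an article (hypothetical helper)."""
    parts = urlsplit(article_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {article_url!r}")
    return JINA_READER_PREFIX + article_url


@dataclass
class Turn:
    speaker: str
    text: str


def assign_speakers(lines, speakers=("Host", "Guest")):
    """Alternate speakers over a list of script lines.

    Assumption: the script arrives as one line per turn. In the real
    Space the language model presumably decides who says what.
    """
    return [
        Turn(speakers[i % len(speakers)], line)
        for i, line in enumerate(lines)
    ]


if __name__ == "__main__":
    print(reader_url("https://example.com/my-article"))
    for turn in assign_speakers(["Welcome to the show!", "Glad to be here."]):
        print(f"{turn.speaker}: {turn.text}")
```

Each `Turn` would then be handed to a text-to-speech voice, one voice per speaker, and the resulting audio clips stitched together into the episode.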
At first I tried pasting in the Wikipedia article on AI, but that produced an error. Suspecting that length might be the issue, I tried the much shorter Simple English version and got the same error. A third attempt, this time with an earlier article I wrote for this blog about generative AI, finally worked!
Here is the result of that experiment (unlike a normal podcast, it’s only 80 seconds long).
As you can hear, it’s not perfect. The voices are still somewhat robotic and some of the pronunciations are wrong. But it’s still a very interesting technology. I can see tools like this being useful for people who don’t want to read through pages upon pages of text and would rather get the gist of a subject while waiting in traffic or riding the bus. That said, a big part of the appeal of podcasts is that they’re actually entertaining to listen to, and I don’t think this AI is quite there yet.
And finally, once this post is published, I will ask Podcastify to generate a new podcast episode based on it, making this the world’s first self-referential podcast episode!