This post was transcribed from imitone’s Kickstarter page.
Lately I’ve been thinking about the story of the tortoise and the hare.
I’ve been working on imitone for more than six years, and there is still a lot left to do. Since 2014, I have seen many things like imitone race right past us, into app stores and news stories. I remember feeling nervous a few years ago, when an imitone backer told me about another voice-to-MIDI plugin premiering at NAMM. I went to see it, but by the time I got there, it had been pulled from the show floor, and from the internet, too. It hasn’t come back. Why?
When people get excited about my work on imitone, they always talk about the potential. I’ve shown it to beginners and musical veterans alike, always raising eyebrows with the technology. Most comparisons have concluded that imitone responds to voice better and faster than anything else available. But when someone sits down to actually make something with it, the magic starts to fade. The tech is working beyond all expectations, but somehow it isn’t… working. Not without a lot of skill and patience. Not “like magic”. Why?
I think there’s one answer for both questions, but it’s subtle. Psychologists have a term, “gulf of execution”, for when a tool doesn’t do what we expect. imitone works well when our expectations are loose — say, when improvising. But when we expect something specific — like a song we’ve had in mind — the results are confusing. We get notes that are faithful to the pitch, but not the music.
After years of work, imitone is not yet the answer to the first question we asked with our Kickstarter campaign: “Have you ever woken up with a melody in your head?”
The Gulf
Sometimes, the best tool we can imagine isn’t good enough to do what we need. What if I need to hit a nail, and I’ve never seen a hammer? I could close my eyes and picture a tool for the job, but it probably won’t be as good as the hammer at the hardware store. To me, “The Gulf” has started to mean the space between that first, blind imagination and the real tool — which took a lot of work by a lot of people to get to where it is today.
imitone’s job seems simple: Play the notes we sing, as we sing them. A simple job makes it easy to imagine a simple tool… But I haven’t ever seen a tool that does this job well, and I have spent a very long time trying to imagine and build one. The Gulf was much wider than I thought…
The longer I spend in this place, the more I learn about its history… It’s like I’ve passed by the camps (or the graves) of many other explorers, going back decades. I’ve met people who left their footprints here before I was born. Almost everyone has had to stop somewhere — to wrap up their journey in this realm of invention. But there are a few others who are still searching, after many years…
Maybe it’s silly, this idea of an imaginary desert testing my endurance. Maybe I’m making excuses for spending too long, being a perfectionist… but I can’t imagine doing it any other way. I’ve chosen to gamble on a faint sense that the breakthrough we need is out here. Right now, I’m moving towards it in the best way I know. That means taking my time, following my compass and refusing to call imitone finished just yet. I have the feeling that very soon, I will find some green and undiscovered place.
Know that I do all of this in the name of making silly noises at computers. 🙂
The Research
So, about the work.
imitone’s fourth phase of research and development began in October 2019. Originally planned for mid-2017, this dive into imitone’s tech is some of the most important work I’ll do in the project — and some of the most exciting. It has me revisiting all the theories I’ve developed, all the discoveries I’ve made and a long list of loose ends and wild ideas. I’m rewriting imitone’s blueprints and building the tools I’ll use to finish it.
Crucially, I have learned that even with the best possible sense of pitch, imitone can’t always be sure about what it’s hearing. In the split-second when a sound begins, the tone might be weak, or there could be breath on the mic, or an echo in the room. Each of these will blur imitone’s picture of your voice. On top of that, your pitch could be moving around, or it might be halfway between two possible notes.
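To make "halfway between two possible notes" concrete, here is a minimal sketch of the usual pitch arithmetic. It assumes twelve-tone equal temperament and an A4 reference of 440 Hz; the function names are hypothetical and this is not imitone's actual code.

```python
import math
from typing import Tuple

A4_HZ = 440.0   # assumed reference tuning
A4_MIDI = 69    # MIDI note number of A4

def hz_to_midi(freq_hz: float) -> float:
    """Convert a frequency in Hz to a fractional MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)

def nearest_note(freq_hz: float) -> Tuple[int, float]:
    """Return the nearest MIDI note and the deviation from it in cents."""
    midi = hz_to_midi(freq_hz)
    note = math.floor(midi + 0.5)
    return note, (midi - note) * 100.0

# A pitch sung exactly between A4 (69) and A#4 (70).
halfway_hz = A4_HZ * 2 ** (0.5 / 12)
note, cents = nearest_note(halfway_hz)
print(note, round(cents, 1))
```

At about 50 cents from both neighbours, the smallest rounding error or wobble in the voice flips the answer, which is why pitch alone cannot settle cases like this.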
If we make our “best guess” based on a blurry picture of your voice, or an out-of-tune sound, that guess could be wrong. In these cases, imitone should use a sense of the music to choose which note to play. To have that sense, it should know as much as it can about the song — and the way you sing it. The more it knows, the better it can understand your music when it doesn’t have a clear picture.
Right now, imitone only knows the basic key and scale you choose for it — and it doesn’t use that knowledge very wisely. In the future, imitone will listen and learn about key, scale, rhythm and musical style as you sing. It will also gather clues by looking through MIDI 2.0 connections into the projects and instruments on the other side.
A truly educated guess should consider not just the evidence, but the strength of each piece of evidence. This means imitone should think in terms of probability to pick the right notes. My 2016 research left me believing that this was the key to crossing The Gulf. Probability led me to statistics, which led me to “machine learning”.
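As an illustration of weighing evidence by its strength, the sketch below combines a pitch observation with a musical prior using Bayes' rule. Everything here is an assumption made for the example: the Gaussian error model, the `pitch_sigma` parameter, and the prior weights for in-scale and out-of-scale notes. It is not a description of imitone's design.

```python
import math

def note_posterior(observed_pitch, pitch_sigma, prior):
    """Posterior probability of each candidate note.

    observed_pitch: fractional MIDI pitch from the tracker.
    pitch_sigma:    uncertainty in semitones (larger means a blurrier picture).
    prior:          dict mapping MIDI note -> prior weight from musical context.
    """
    weights = {}
    for note, prior_weight in prior.items():
        z = (observed_pitch - note) / pitch_sigma
        likelihood = math.exp(-0.5 * z * z)   # assumed Gaussian pitch error
        weights[note] = likelihood * prior_weight
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("no candidate note is plausible")
    return {note: w / total for note, w in weights.items()}

# Hypothetical prior for C major: C (60) and D (62) are in the scale, C# (61) is not.
prior = {60: 0.9, 61: 0.1, 62: 0.9}

clear = note_posterior(60.6, pitch_sigma=0.1, prior=prior)
blurry = note_posterior(60.6, pitch_sigma=0.5, prior=prior)
print(max(clear, key=clear.get), max(blurry, key=blurry.get))
```

With a sharp observation the pitch evidence dominates and the out-of-scale C# is chosen; with a blurry one the prior tips the decision toward C. The musical sense only matters when the decision is tough.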
Importantly, while imitone needs to listen, learn and play like a musician, it isn’t one — you are! Any high-tech “music sense” we build only needs to kick in when there’s a tough decision to make. This will happen less often for experienced singers. For beginners, imitone can take a more active role, helping to play notes that fit into the song.
Pitch Tracking & Back-Tracking
The first step in our new work is to do something over again: Learn how to instantly recognize pitch. imitone is already one of the best tools for that! But it does something a bit foolish that keeps us from moving forward.
imitone’s pitch-tracker listens to your voice and makes a “best guess” about the pitch (or non-pitch). We send this guess to the transcriber — the part that picks notes to play. But we don’t say whether it’s a good guess. It might be pretty bad if we only have a blurry picture of the sound! This means that imitone can’t consider the strength of the pitch evidence when choosing a note.
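One way to picture the missing information is as a pitch estimate that carries its own confidence. The sketch below is hypothetical: the `PitchEstimate` type, the 0-to-1 confidence scale, and the threshold are assumptions for illustration, not imitone's interfaces.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PitchEstimate:
    pitch: Optional[float]  # fractional MIDI pitch, or None for "no pitch"
    confidence: float       # 0.0 (pure guess) to 1.0 (certain); assumed scale

def choose_note(estimate: PitchEstimate,
                current_note: Optional[int],
                min_confidence: float = 0.6) -> Optional[int]:
    """Hypothetical transcriber step: only change notes on strong evidence."""
    if estimate.confidence < min_confidence:
        return current_note              # weak evidence: keep what is playing
    if estimate.pitch is None:
        return None                      # confident silence: stop the note
    return round(estimate.pitch)         # confident pitch: play nearest note

print(choose_note(PitchEstimate(64.2, 0.9), current_note=60))
print(choose_note(PitchEstimate(64.2, 0.3), current_note=60))
```

A fixed threshold is the crudest possible use of confidence, but it shows why the transcriber needs more than a bare guess.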
For example, you can sing a trill (a high, rolling “rrrr” sound) and imitone will imitate it with hundreds of little notes. The sounds making up the trill are very short and your tongue is a bit noisy, so imitone doesn’t get much evidence about each one. A tiny sound makes for a blurry picture. That turns into a rough guess, which becomes a messy note. imitone doesn’t consider that literally hundreds of notes with the same pitch have happened in the last second. Nice one, imitone.
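The trill shows why many weak pieces of evidence can add up to strong evidence. A minimal sketch, assuming each short sound gives an independent, Gaussian-noisy reading of the same pitch (an assumption for illustration), pools the readings with inverse-variance weighting:

```python
def pool_estimates(estimates):
    """Combine (pitch, sigma) readings of the same pitch into one estimate.

    Each reading is weighted by 1 / sigma^2, so blurrier readings count less.
    Returns the pooled pitch and its (smaller) pooled sigma.
    """
    weights = [1.0 / (sigma * sigma) for _, sigma in estimates]
    total = sum(weights)
    pitch = sum(w * p for w, (p, _) in zip(weights, estimates)) / total
    return pitch, (1.0 / total) ** 0.5

# Four rough readings from a trill, each uncertain by a full semitone.
readings = [(59.4, 1.0), (60.6, 1.0), (60.3, 1.0), (59.7, 1.0)]
pitch, sigma = pool_estimates(readings)
print(round(pitch, 2), sigma)
```

Four equally rough readings halve the uncertainty, and a hundred would cut it tenfold, provided the readings really do share one pitch.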
It turns out that we even use “best guesses” like this inside pitch recognition, creating even more problems! To make imitone better, we need to use probability all the way down to the sound wave and all the way up to the musical score. That involves some serious work on the math — but it also means there is enough room to make a big improvement, by fixing all of these smaller problems. If imitone could do a really good job of recognizing pitch before, it should be able to do an amazing job with probability!
Good news: I’ve worked out the theory, most of this redesign is done, and I am beginning to “train” imitone’s new pitch tracker. This works a bit like the holodeck in Star Trek — I make a training program, and put the tracker in it. It listens to hundreds of thousands of simulated sounds, and learns to understand them through practice. Eventually, it comes out fine-tuned and ready to do its job in the real world. I expect this machine-learning approach to take imitone much further than the hand-adjustments I used before.
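The post does not describe the training program itself, so the following is only a generic sketch of the idea of simulated sounds: each example is a synthetic tone with a known pitch, plus random noise, so the correct answer is always available for practice. The signal model, sample rate, and ranges are all assumptions.

```python
import math
import random

def simulate_example(rng, sample_rate=16000, n_samples=512):
    """Return (samples, midi_pitch): a noisy synthetic tone and its true pitch."""
    midi = rng.uniform(40.0, 80.0)                  # ground-truth label
    freq = 440.0 * 2 ** ((midi - 69.0) / 12.0)      # equal-tempered frequency
    noise = rng.uniform(0.0, 0.5)                   # breath or room noise level
    phase = rng.uniform(0.0, 2.0 * math.pi)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        tone = math.sin(2.0 * math.pi * freq * t + phase)
        overtone = 0.5 * math.sin(4.0 * math.pi * freq * t + 2.0 * phase)
        samples.append(tone + overtone + rng.gauss(0.0, noise))
    return samples, midi

rng = random.Random(0)                              # seeded for repeatability
dataset = [simulate_example(rng) for _ in range(100)]
print(len(dataset), len(dataset[0][0]))
```

Because the simulator knows the true pitch of every sound, a tracker's guesses can be scored automatically across as many examples as needed.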
Going further, it’s possible that imitone could learn the sound of your voice (or instrument) over time, forming a special “matrix” that represents all its experience with that sound. imitone would learn to respond better and faster to familiar sounds. While a feature like that is science-fiction for now, it’s a way we could advance imitone even more in the future.
Scaling Up
Once I get imitone’s new pitch tracker ready, I will begin work on its new, probability-based sense of music. This will tie into long-planned features like automatic scale detection, and also some unexpected features, like rhythm. I have a lot to figure out about how to use this knowledge for split-second decisions, and how imitone’s “listening and learning” will work. I expect these projects to take time.
As part of this work, I have decided that every version of imitone should include a selection of musical scales and tunings from around the world — representing not just Western scales like major, minor, jazz and blues, but also Arabic, Chinese, Indian and Indonesian traditions (and more). Your voice is not limited to the notes that can be played on a piano, and imitone shouldn’t be either!
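Scales that fall between the piano keys are commonly written in cents, where 100 cents is one equal-tempered semitone and 1200 cents is an octave. The sketch below assumes that representation. The second scale is a common 24-tone textbook approximation of maqam Rast, used here only as an example and not as a list of what imitone will ship.

```python
# Scale degrees in cents above the tonic.
MAJOR_12TET = [0, 200, 400, 500, 700, 900, 1100]
RAST_24TET = [0, 200, 350, 500, 700, 900, 1050]   # approximate, quarter-tone steps

def scale_frequencies(tonic_hz, cents_above_tonic):
    """Convert cents offsets into frequencies above a given tonic."""
    return [tonic_hz * 2 ** (c / 1200) for c in cents_above_tonic]

tonic = 261.63  # roughly middle C, assumed for the example
major = scale_frequencies(tonic, MAJOR_12TET)
rast = scale_frequencies(tonic, RAST_24TET)
print(round(major[2], 2), round(rast[2], 2))
```

The third degree of the second scale sits between the minor and major thirds of a piano, so a transcriber limited to twelve notes per octave would have to push it one way or the other.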
I am looking for input on musical traditions to include with imitone!
Our Work Continues
Research work has slowed down temporarily as I prepare for The NAMM Show next week, where imitone will be part of the MIDI Manufacturers Association booth. My work there has been a big investment in the future of digital music-making, and I’m investing now more than ever: The Association is running a special event to test the first batch of MIDI 2.0 prototypes, and imitone will be one of them.
I’m also working on an app for Android, which has become a much better platform for audio software since 2014. Thanks to the efforts of Phil Burk and the Android Audio team, recent versions (Nougat, Oreo and later) support low-latency sound, which will help imitone to feel like a pocket-sized musical instrument. Unlike our secret MIDI 2.0 prototype, I’ll be able to show this early Android app to the public at NAMM. A mobile beta is on the horizon…
Step by step, I’m getting closer to the tool we need. imitone has been growing, and soon it will flower, becoming something we haven’t seen before. Exciting things are coming.
— Evan