Regarding the Frequent Question of Latency
This post was written about imitone 0.7.0, and its details may become inaccurate as the software improves.
I'm often asked what imitone's latency is, with the asker expecting a number in milliseconds. Unfortunately, answering with a simple number would be dishonest. I'm an engineer first, and I don't like spreading misinformation.
The short answer:
imitone responds much faster to high notes than low notes, and somewhat faster to clear tones than deep, gritty ones.
Ignoring the input latency that affects all software, it is essentially instant when whistling or singing in the soprano range, and can become significant in the tenor and bass range. The latency will be much more noticeable on struck or plucked instruments like pianos and guitars.
For the time being, a good "cheat" is to sing in a higher register (falsetto!) and transpose imitone back down to the desired pitch for your instrument. For deep bass notes I also find that an "oh" or "boh" sound tracks a little better due to its clearer overtone structure.
The long answer: (for the brave and nerdy!)
imitone 0.7.0's latency is the I/O latency noted in the CPU meter, plus 3-7 periods of the tone. I will call that number the clarity factor as it is mainly affected by how clear the tone is. (Loud background noise can also increase it, and I expect to lower it generally in future updates.)
Some example values: for an A4 (alto range) note at 440 Hz, imitone responds in (1000/440) * clarity factor milliseconds (that is, 7-16 ms), plus about 6 ms of I/O latency on a Mac. This gives a total around 13-22 ms -- typically closer to 13, as higher notes usually have a good clarity factor. The response time doubles to 14-32 ms for the tenor note A3, and halves to 3.5-8 ms for a soprano A5.
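The arithmetic above can be condensed into a small helper. This is just a back-of-the-envelope model of the numbers in this post, not imitone's actual code; the 6 ms default is the Mac I/O figure used in the example.

```python
def latency_ms(freq_hz, clarity_factor, io_latency_ms=6.0):
    """Estimated total response time in milliseconds:
    'clarity factor' periods of the tone, plus audio I/O latency."""
    period_ms = 1000.0 / freq_hz
    return period_ms * clarity_factor + io_latency_ms

# Reproduce the post's examples (clarity factor ranges from 3 to 7):
for name, freq in [("A3 (tenor)", 220.0), ("A4 (alto)", 440.0), ("A5 (soprano)", 880.0)]:
    print(f"{name}: {latency_ms(freq, 3):.1f}-{latency_ms(freq, 7):.1f} ms total")
```

Note how each octave down doubles the tone-period part of the latency, while the I/O part stays fixed.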
This may seem concerning, but it's a basic theoretical limit. A tone is a repeating wave, and we don't know it's a tone until we can tell it's repeating. (The theoretical minimum for the clarity factor is 2.) This limitation affects the human ear -- a tone doesn't start to sound like a tone to us until about 5-6 periods in, and because our brains compensate for this we perceive less latency for low tones than for high tones. For this reason gentle, continuous instruments like bowed strings won't seem less responsive in the lower ranges.
However, we're very good at noticing the latency of transients -- sudden loud noises like a piano key strike, a guitar string pluck or a drum hit will "short circuit" our normal sound perception and get to our brains faster. Generally when musicians talk about millisecond latency, this is what they have in mind -- the time delay between their finger hitting a key and the mallet slamming noisily into the string.
Getting around the limitation:
Transients are easy to recognize. They often accompany musical notes, but do not actually have pitch themselves. Using the planned percussion mode, it would be possible for imitone to recognize them quickly, like the human brain does, but because of the way music controllers work, it is not possible to transmit a note until its tone is known -- which is always some number of periods later. I can know a piano key has been pressed in the first few milliseconds -- just not which key! By the same token I can hear the "t" in "tah" before the "ah".
If imitone could transmit the transient first and the pitch a tiny fraction of a second later, it would be possible to control deep guitar or piano notes with almost no perceived latency -- but as far as I know there are no instruments that support this. This is a limitation of modern MIDI, and I will attempt to spread awareness of it in the music manufacturing industry in the hopes of solving this problem for imitone (as well as other audio-MIDI systems like V-guitars) in the future.
Hopefully this answers more questions than it raises, haha!
Excellent explanation. I like it like that. Thank you.
Regarding the limitation of MIDI:
Do you know Florian Bomers?
He and a few other guys/companies are already working on the "HD protocol" specifications of MIDI. The upcoming BomeBox is "HD protocol" ready.
And what about OSC?
imitone 0.7.0's latency is the I/O latency noted in the CPU meter, plus 3-7 periods of the tone [..] This limitation affects the human ear -- a tone doesn't start to sound like a tone to us until about 5-6 periods in [..] Transients are easy to recognize. They often accompany musical notes, but do not actually have pitch themselves.
You might be interested in the work of András Szalay. He worked on the Axon and Fishman MIDI pickups, which are capable of detecting notes much faster (< 1 period) by training a neural net to listen to the transients.
High level explanation from Axon. The developer of Axon now works at Fishman (the TriplePlay wireless MIDI pickup).
No clue if this could be applied to the human voice.
If imitone could transmit the transient first and the pitch a tiny fraction of a second later, it would be possible to control deep guitar or piano notes with almost no perceived latency
How would that work? I assume this would require the instrument to separate the percussive transient from the note body itself, so you trigger the transient first then follow up with the note? But the transient on A0 sounds a lot different than the transient on C8, right?
Sorry for the super slow reply -- I've been travelling...
Holo -- I'm sort of a music tech industry outsider, and I don't know a lot of people there. OSC is somewhere on the map, but I'll need to give some thought to how to make it work. It's a very open-ended protocol...
Eric -- Lately I'm getting much closer to 2 periods under controlled conditions. I've talked quite a lot about the implications of going under that -- in short, you can't make an authoritative determination of pitch without knowing the exact timbre to expect, and even when you do the results are likely to be unstable. I've actually been in touch with the MIDI manufacturers' association about ways of making these early guesses and subsequently adjusting them to accurate values -- it's a tricky protocol design problem and one that will require cooperation with synth makers.
The bass tech is neat -- approximating pitch based on transient is something I'd given thought to -- but naturally it's constrained to processing sounds that it knows about already. So it's not very useful for voices without a lot of user-specific training or a huge database. I've got some ideas of my own, but I'm nowhere near testing them.
Have you looked at the Reaktor Ensemble "The Mouth" by Tim Exile? It sends MIDI out too (to use it like imitone), but also has a built-in pitch/formant shifting algo and some sort of simple synth engine.
It has basically 2 modes:
"Pitch Mode" (slower, comparable to imitone)
"Transient Mode" (nearly latency-free, but not able to detect more complex melodies or sluggish intonation)
Hey, Holo --
The Mouth's MIDI output is a very secondary feature, as far as I can tell. (I had researched it before and wasn't even aware it had the capability -- they certainly don't market it.) It's mainly a voice-driven synthesizer -- which has been the fate of many attempts to do what imitone does over the years. When the synth and the analysis are closely coupled, you can get away with a lot more -- in fact, the LPC coding that telephones use has worked on the same principle for decades. The key benefit of that coupling is that the pitch estimate (if any) can be a fraction of the actual sound's pitch, and still capture all the overtones of the sound in a way that allows them to be reconstructed by the synthesizer.
MIDI data is harder, because you have to fully resolve those ambiguities... That's an ongoing effort with imitone. I'll have to try The Mouth and see how well it works when used as a live converter.
I have not tried the software yet, but I thought I might suggest another way to address the latency question. Could you include a mode with consistent latency for recording scenarios? I.e., detect the start of the note so its relative place in time is accurate, use the audio in the buffer to calculate the note, then wait out the remaining delay before sending the MIDI data. This would give consistent latency no matter what note is being detected, and would preserve/translate the performance more accurately.
That way the delay might be acceptable for monitoring, and a consistent offset can be set to compensate in the music software on the recorded track. In Reaper I think this can be done with some limitations on the track, or by using the ReaInsert without any limitations.
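The scheme being suggested could be sketched roughly like this (hypothetical pseudocode, not anything imitone implements): stamp each note at its detected onset, then always emit it a fixed delay after that onset, so the offset is constant and can be compensated in the DAW. The 40 ms budget below is an arbitrary placeholder.

```python
FIXED_DELAY_S = 0.040  # arbitrary budget; must exceed the worst-case detection time

def schedule_note(onset_time_s, detection_time_s, midi_note):
    """Return (send_time_s, midi_note) so that every note is emitted at
    onset + FIXED_DELAY_S, regardless of how long pitch detection took."""
    send_time_s = onset_time_s + FIXED_DELAY_S
    if detection_time_s > send_time_s:
        # Detection blew the budget (e.g. a very low note); fall back to
        # sending immediately, sacrificing the constant offset for this note.
        send_time_s = detection_time_s
    return send_time_s, midi_note

# A low note whose pitch took 30 ms to resolve still goes out at onset + 40 ms:
send_at, note = schedule_note(onset_time_s=1.000, detection_time_s=1.030, midi_note=45)
```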
Let me know what you think.
Last edited by Boroni (2016-01-17 10:12:32)
That's an astute suggestion -- albeit potentially very difficult to implement. In one of my future updates I plan on enhancing imitone's timing accuracy, and I think I will investigate what would be necessary to implement such a delay scheme at that time.
Thanks for your input.
Thank you for the swift reply. I am going to get hold of a key today, and I am going to experiment with 0.8.0. My plan is to retain the audio of the vocals, so I can reprocess the tracks with later versions of imitone and compare the results. I have a half-decent setup with a Scarlett 18i20 and a bog-standard SM58 I can use for vocals (which looks like the mic you are using in the demonstration video), so I am hoping I will not come across some of the glitches that others are describing (I suspect that is mostly down to mains hum and other electrical interference making it into the recorded signal).
I was going to go for the basic license first. Is there likely to be an upgrade path when the prime version is complete?
It's probably worth noting that latency improved quite a lot with version 0.8.0 generally -- I could stand to re-write the first post. Tracking did get somewhat more jittery for certain sounds, which I have improved upon in 0.8.1.
The SM58 is a really excellent mic to use with imitone, by the way -- I have used those in super noisy environments and with careful adjustment of the volume gate they work like a charm. It's very unlikely that "hum" interference will give you any kind of trouble, though substantial background noise might give you a little jitter.
There will be an upgrade path in the future, but the price of the advanced edition may be higher by that time. It's discounted to $60 because currently it only adds a single feature beyond the standard edition -- specifically, processing multiple microphones simultaneously.
Perhaps CopperLan (http://www.copperlan.org/index.php/copp … variations) could help ?
While MIDI handles around 1,000 messages/s, CopperLan apparently handles more than 20,000, with message sizes up to 128 bits (16 by default): https://www.kvraudio.com/forum/viewtopi … 3&start=15
Musical possibilities :
http://www.copperlan.org/index.php/copp … musicality
Last edited by Handmusician (2017-03-08 13:10:27)
Alternative protocols like that one are interesting, but they're only valuable where widely supported by receiving software. I've joined the MIDI Manufacturers' Association now, where major vendors are working together to plan out some big upgrades.
Also worth noting that while MIDI over DIN has a bandwidth limit and some potential latency concerns, those don't apply over USB or between applications on the same computer.
Do you think we can expect an upgrade to cent-resolution notes for more precision and microtonal purposes (meaning the 127 semitone notes would become 12,700 cent notes)?
Hey, HM —
Your question is a bit off-topic as it doesn't have to do with latency... But yes, I'll be supporting microtonal scales in imitone in the future. Most of the code for that is already written; I just don't have a visual interface for it yet.
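For what it's worth, cent-level pitch is already expressible in today's MIDI by pairing each note with a pitch-bend value. The sketch below assumes the common (but not universal) +/- 2 semitone bend range; it is a generic illustration of that convention, not an imitone feature.

```python
BEND_RANGE_CENTS = 200.0  # assumes the synth's bend range is set to +/- 2 semitones
BEND_CENTER = 8192        # midpoint of MIDI's 14-bit pitch bend range (0..16383)

def cents_to_midi(total_cents):
    """Split a pitch given in cents above MIDI note 0 into
    (nearest semitone note, 14-bit pitch bend value)."""
    note = int(round(total_cents / 100.0))
    residual_cents = total_cents - note * 100.0   # roughly -50..+50 cents
    bend = BEND_CENTER + int(round(residual_cents / BEND_RANGE_CENTS * 8192))
    return note, bend

# A4 (MIDI note 69) raised by 25 cents:
note, bend = cents_to_midi(69 * 100 + 25)
```

The catch is that pitch bend is per-channel, so polyphonic microtonality needs one channel per note or a genuinely higher-resolution protocol -- which is where the HD protocol discussion comes in.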
Sorry for being off-topic! I did not express myself well: in your previous answer you spoke about the MMA, so I wondered whether something is planned for microtonal purposes in the future MIDI HD protocol.
Excellent, good news for imitone and the future of music! Just hope this update won't take many years to be released.
Things might appear a bit slow right now because I'm working on various ports for imitone and SoundSelf. I have various things that need to be compatible with game consoles, VR headsets, DAW plugins, mobile phones and tablets... Lots of work indeed!