Hi,
I have a wave recording of a guitar playing single notes which I'd like
to convert to MIDI format. I'm using a C++ program to process this
audio.
My first question is, how do I detect the attack points accurately? At
the moment, I take an FFT of short length (256 for 44.1kHz audio), and
I add the values in all the bins and divide this value by the largest
value in a single bin. This lets me know how noisy or chromatic a
signal is at a point in time. A guitar attack is noisy so this
information helps. But is there a better way? How about other
instruments playing single notes such as a flute? In this case, I could
use the FFT to establish when a note is on or off by summing the
values in an FFT window and comparing this value with some threshold.
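As a rough sketch of the measure described above (sum of all bin magnitudes divided by the largest bin), something like the following would do. The function names are made up for illustration, and a naive DFT stands in for a real FFT library only so that the example is self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitude spectrum of one frame via a naive O(N^2) DFT (bins 0..N/2).
// A real program would call an FFT library; this is only for illustration.
std::vector<double> magnitudeSpectrum(const std::vector<double>& frame) {
    const std::size_t n = frame.size();
    const double pi = std::acos(-1.0);
    std::vector<double> mags(n / 2 + 1, 0.0);
    for (std::size_t k = 0; k < mags.size(); ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double phase =
                2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
            re += frame[t] * std::cos(phase);
            im -= frame[t] * std::sin(phase);
        }
        mags[k] = std::sqrt(re * re + im * im);
    }
    return mags;
}

// Hypothetical "noisiness" measure: sum of bin magnitudes / largest bin.
// Near 1 for a single pure tone; larger when energy is spread over many bins.
double noisiness(const std::vector<double>& frame) {
    assert(!frame.empty());
    const std::vector<double> mags = magnitudeSpectrum(frame);
    double sum = 0.0, peak = 0.0;
    for (double m : mags) {
        sum += m;
        if (m > peak) peak = m;
    }
    return peak > 0.0 ? sum / peak : 0.0;  // 0 for a silent frame
}
```

A tone that falls exactly on one bin gives a value close to 1, while a single-sample click spreads evenly over all bins and gives the bin count. Tones that fall between bins leak into their neighbours, so a window function would probably be needed in practice.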
My second question is how might I calculate the loudness of this
attack, in sones perhaps (although since I am working with a wave file
we're talking about relative loudness)?
Thirdly, what would be an intelligent way of mapping the loudness of a
note to its MIDI velocity?
Thanks very much for your help,
Barry.
Ethan Winer - 30 Apr 2007 16:26 GMT
Barry,
> I have a wave recording of a guitar playing single notes which I'd like to
> convert to MIDI format. I'm using a C++ program to process this audio.
I applaud your ambition because what you are attempting is VERY difficult if
not impossible. There are programs that attempt to transcribe audio into
MIDI, but the ones I've seen are pretty lame.
> My first question is, how do I detect the attack points accurately?
Some DAW programs like Sonar and ACID look for note attacks to identify
tempo and where the beats fall by watching for a sudden rise in level. This
is probably more direct than using an FFT to watch for "impure" frequencies
(pick noise, flute chiffs, etc). But the guitar player better be mighty
clean, because even a hint of a second note will throw a wrench into the
works!
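To make the "sudden rise in level" idea concrete, here is a minimal sketch. It is not how Sonar or ACID actually do it (I don't know their internals); the function name, the frame size, and the 9 dB / -50 dB thresholds are all assumptions picked for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the starting sample index of every frame whose RMS level rises by
// more than riseDb over the previous frame and is above floorDb.
// Samples are assumed to be scaled to the range -1..1.
std::vector<std::size_t> detectOnsets(const std::vector<double>& samples,
                                      std::size_t frameSize = 256,
                                      double riseDb = 9.0,
                                      double floorDb = -50.0) {
    assert(frameSize > 0);
    std::vector<std::size_t> onsets;
    double prevDb = -120.0;  // treat the start of the file as silence
    for (std::size_t start = 0; start + frameSize <= samples.size();
         start += frameSize) {
        double sumSq = 0.0;
        for (std::size_t i = 0; i < frameSize; ++i)
            sumSq += samples[start + i] * samples[start + i];
        const double rms = std::sqrt(sumSq / static_cast<double>(frameSize));
        const double db = 20.0 * std::log10(rms + 1e-6);  // 1e-6 avoids log(0)
        if (db > floorDb && db - prevDb > riseDb) onsets.push_back(start);
        prevDb = db;
    }
    return onsets;
}
```

The onset times are only as fine as the frame size, and a legato note that does not rise in level will be missed entirely, which is the limitation discussed further down.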
> How about other instruments playing single notes such as a flute?
Now you're getting to the "impossible" part. If the flute player plays with
a silence between each note you can look for the silence. But real music is
not played that way. Real players play staccato AND legato. So now you have
to watch for changes in pitch only.
> My second question is how might I calculate the loudness of this attack,
> in sones perhaps (although since I am working with a wave file we're
> talking about relative loudness)? Thirdly, what would be an intelligent
> way of mapping the loudness of a note to its MIDI velocity?
There's no way to map signal level in a Wave file to the original volume in
the room as SPL. You'll need to decide what dB levels in the file relate to
various note-on velocities.
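One possible mapping, purely as an illustration: treat a chosen range of dB levels as linear in velocity. The -40 dB and 0 dB end points and the function name are assumptions, not a standard. The result stays in 1..127 because a note-on with velocity 0 is conventionally treated as a note-off.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Map a level in dB (relative to full scale) to a MIDI note-on velocity.
// quietDb maps to velocity 1, loudDb to 127, linear in dB in between.
// Both end points are arbitrary choices to be tuned by ear.
int levelToVelocity(double levelDb, double quietDb = -40.0,
                    double loudDb = 0.0) {
    assert(loudDb > quietDb);
    double t = (levelDb - quietDb) / (loudDb - quietDb);
    t = std::max(0.0, std::min(1.0, t));  // clamp levels outside the range
    return 1 + static_cast<int>(std::lround(t * 126.0));
}
```

With these defaults a note at -20 dB lands at velocity 64, and anything at or below -40 dB is pinned to 1.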
--Ethan
Chris Whealy - 30 Apr 2007 17:08 GMT
> Hi,
>
[quoted text clipped - 19 lines]
> note against its MIDI velocity.
>
Sorry to rain on your parade here, but what you're attempting to do is
pretty much impossible - at least with the technology available on this
planet...
Let me explain why starting from a Fourier transform of the music will
lead you down a dead end street. Let's say you have a page of text, and
you analyse it by counting how many times each character occurs. So
you'll end up with 138 A's, 18 B's, 47 C's etc. Now you attempt to use the
frequency count of each letter to reconstruct the original text on the
page. This is really what you're trying to do with Audio -> MIDI
conversion.
Chris W

The voice of ignorance speaks loud and long,
But the words of the wise are quiet and few.
---
robert bristow-johnson - 30 Apr 2007 17:37 GMT
> b...@yahoo.com wrote:
> > Hi,
[quoted text clipped - 31 lines]
> page. This is really what you're trying to do with Audio -> MIDI
> conversion.
sounds like a good question for comp.dsp or for the music-dsp mailing
list.
if it's single notes, there is a possibility, but using the Fourier
Transform to do it is likely not as helpful.
try looking up "Pitch Detection Algorithms" in Google. also "Average
Magnitude Difference Function" (AMDF) and "autocorrelation".
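a minimal sketch of the AMDF idea for a single monophonic frame follows. the function names, the 70 Hz lower limit and the 0.3 "dip" threshold are assumptions for illustration, not part of any standard definition. the note-number formula (69 for A at 440 Hz, 12 per octave) is the usual MIDI convention.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Estimate the fundamental frequency (Hz) of one frame with the AMDF.
// Returns 0.0 if the frame is too short, silent, or shows no clear period.
double estimatePitchAmdf(const std::vector<double>& frame, double sampleRate,
                         double minHz = 70.0, double dipRatio = 0.3) {
    assert(sampleRate > 0.0 && minHz > 0.0);
    const std::size_t maxLag = static_cast<std::size_t>(sampleRate / minHz);
    if (maxLag < 2 || frame.size() < 2 * maxLag) return 0.0;
    const std::size_t window = frame.size() - maxLag;

    // d[lag] = sum over the window of |x[i] - x[i + lag]|.
    std::vector<double> d(maxLag + 1, 0.0);
    double mean = 0.0;
    for (std::size_t lag = 1; lag <= maxLag; ++lag) {
        for (std::size_t i = 0; i < window; ++i)
            d[lag] += std::fabs(frame[i] - frame[i + lag]);
        mean += d[lag];
    }
    mean /= static_cast<double>(maxLag);
    const double threshold = dipRatio * mean;
    if (threshold <= 0.0) return 0.0;  // silent or constant frame

    std::size_t lag = 1;
    while (lag <= maxLag && d[lag] < threshold) ++lag;   // leave the trough at lag 0
    while (lag <= maxLag && d[lag] >= threshold) ++lag;  // first deep dip after it
    if (lag > maxLag) return 0.0;
    while (lag < maxLag && d[lag + 1] < d[lag]) ++lag;   // slide to the dip's bottom
    return sampleRate / static_cast<double>(lag);
}

// Nearest MIDI note number for a frequency in Hz (A4 = 440 Hz = note 69).
int frequencyToMidiNote(double hz) {
    assert(hz > 0.0);
    return static_cast<int>(std::lround(69.0 + 12.0 * std::log2(hz / 440.0)));
}
```

taking the first deep dip rather than the global minimum is one simple guard against octave errors, since the AMDF also dips at every multiple of the period. the estimate is quantized to whole-sample lags, so interpolation around the minimum would be needed for finer resolution.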
r b-j
Angelo Campanella - 01 May 2007 05:11 GMT
> My first question is, how do I detect the attack points accurately? At
> the moment, I take an FFT of short length (256 for 44.1kHz audio), and
> I add the values in all the bins and divide this value by the largest
> value in a single bin. This lets me know how noisy or chromatic a
> signal is at a point in time.
I would think that a good discriminator can be the dwell of a constant
frequency, ergo a "note". Such a dwell, or window of selectable width of
from 100 to 1000 milliseconds, should make a proper "note" sensor. The
FFT would be examined for an outstanding frequency subsequent to the
"noise" of an attack, whereupon said attack and dwelled tone are
declared to be the new subject note up for analysis.
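The dwell test can be sketched as follows, assuming some earlier stage has already produced one pitch estimate per analysis frame (as a MIDI note number, with -1 for frames that have no clear pitch). The struct and function names are hypothetical. The minimum dwell is given in frames, so a 100 millisecond dwell has to be converted using whatever frame rate is in use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One detected note: its pitch and the run of frames it occupies.
struct NoteSegment {
    int midiNote;
    std::size_t firstFrame;
    std::size_t frameCount;
};

// Group per-frame pitch estimates into notes. A run of identical pitches
// counts as a note only if it dwells for at least minDwellFrames frames.
// Frames marked -1 (no clear pitch, e.g. attack noise) never form a note.
std::vector<NoteSegment> segmentByDwell(const std::vector<int>& framePitches,
                                        std::size_t minDwellFrames) {
    assert(minDwellFrames > 0);
    std::vector<NoteSegment> notes;
    std::size_t i = 0;
    while (i < framePitches.size()) {
        std::size_t j = i;
        while (j < framePitches.size() && framePitches[j] == framePitches[i])
            ++j;  // j ends one past the run of equal pitches
        if (framePitches[i] >= 0 && j - i >= minDwellFrames)
            notes.push_back(NoteSegment{framePitches[i], i, j - i});
        i = j;
    }
    return notes;
}
```

For example, with a minimum dwell of 3 frames the sequence -1, 60, 60, 60, 61, 62, 62, 62, 62, -1 yields two notes (60 and 62), and the single frame of 61 is discarded as a glitch. This also catches legato note changes where the level never drops.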
> A guitar attack is noisy so this
> information helps. But is there a better way? How about other
> instruments playing single notes such as a flute?
Add a filter acting as exponential time weighting to identify the note.
One isolates and identifies the attack as being broadband, while the
sustained note sound following has much energy, most in its fundamental
and a fair amount less in a few harmonics. The note emerges as
identifiable as soon as a pure tone can be identified in the FFT. The
broadband noise occurs (attack), then in a few milliseconds the pure
tone is the obvious survivor within the sound emitted.
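Exponential time weighting is just a one-pole smoothing filter. As a minimal sketch, here it is applied to the squared signal to get a running level. The same recursion could be applied to each FFT bin magnitude from frame to frame. The function name and the choice of smoothing the squared samples are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Exponentially time-weighted mean-square level of a signal.
// timeConstantSec is the time the output takes to cover about 63%
// (1 - 1/e) of a step change in input level.
std::vector<double> exponentialLevel(const std::vector<double>& samples,
                                     double sampleRate,
                                     double timeConstantSec) {
    assert(sampleRate > 0.0 && timeConstantSec > 0.0);
    const double alpha =
        1.0 - std::exp(-1.0 / (sampleRate * timeConstantSec));
    std::vector<double> level;
    level.reserve(samples.size());
    double state = 0.0;
    for (double x : samples) {
        state += alpha * (x * x - state);  // move a fraction alpha toward x^2
        level.push_back(state);
    }
    return level;
}
```

A short time constant follows the attack closely, while a longer one rides through it and settles on the sustained tone.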
> In this case, I could
> use the FFT to establish when a note is on or off by summing the
> values in an FFT window and comparing this value with some threshold.
You would need a very fast Fourier transform.
I hear that "wavelets" may be better suited, but that notion needs
thorough investigation.
> My second question is how might I calculate the loudness of this
> attack, in sones perhaps (although since I am working with a wave file
> we're talking about relative loudness)?
That's easy: Integrate all energy in an overall time interval that
includes sustained note and the attack. It will take something of a long
buffer to capture the entire attack and note information for post-note
analysis. The end of that interval is marked by the next attack, not to
be counted for the previous note.
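A minimal sketch of that integration, assuming the attack positions (as sample indices, in ascending order) have already been found by some other means. The function name is made up, and "energy" here is simply the sum of squared sample values, which is a relative figure only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of squared samples for each note. Note k runs from onsets[k] up to
// (but not including) onsets[k + 1]; the last note runs to the end of the
// buffer. onsets must be ascending sample indices within the buffer.
std::vector<double> noteEnergies(const std::vector<double>& samples,
                                 const std::vector<std::size_t>& onsets) {
    std::vector<double> energies;
    energies.reserve(onsets.size());
    for (std::size_t k = 0; k < onsets.size(); ++k) {
        const std::size_t begin = onsets[k];
        const std::size_t end =
            (k + 1 < onsets.size()) ? onsets[k + 1] : samples.size();
        assert(begin <= end && end <= samples.size());
        double energy = 0.0;
        for (std::size_t i = begin; i < end; ++i)
            energy += samples[i] * samples[i];
        energies.push_back(energy);
    }
    return energies;
}
```

Because long notes accumulate more energy than short ones, dividing by the note length (or looking only at a fixed interval after the attack) may be a fairer basis for comparing loudness between notes.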
One can then compute an attack-to-note ratio as the broadband energy
divided by the total energy. Another common ratio is the attack to note
ratio. Your choice.
Angelo Campanella
Greg Locock - 01 May 2007 13:17 GMT
>> My first question is, how do I detect the attack points accurately?
>> At the moment, I take a fft of short length (256 for 44.1kHz audio),
[quoted text clipped - 49 lines]
>
> Angelo Campanella
This approach will absorb a great deal of your time and teach you a lot.
It will not, by itself, lead to a mechanical score-writing program. I
have my doubts that such a thing is buildable, although in theory you
could merely match any arbitrary selection of MIDI notes to a given
sound signal, and eventually you might extract a MIDI score.
FWIW a similar approach was used in modal analysis... and largely
dropped.
However, I do believe that a skilled composer with a full knowledge of
the trends of his time could identify a melody, and parts, by listening
to a multipart piece of music.
Cheers
Greg Locock