Autotune and extreme compression

Studio and home recording topics

Moderator: Shoshanah Marohn

Jeremy Threlfall
Posts: 1380
Joined: 3 Aug 2006 12:01 am
Location: now in Western Australia

Post by Jeremy Threlfall »

A theory has come to me in the night ….

note I am not a qualified audio engineer or audio anything really, but this notion has some intuitive appeal to me

Came to me after reading a book on the development of MP2/3/4 compressed files

and how that extreme form of compression works by deleting the notes (or sounds) that the brain would infer for itself. For example, for a song in a major key, if your brain hears the root and the 3rd, it might not need to hear the 5th to know it's there

(I don't really know what they take out in that compression process, but that's what I took away from my reading)

So if your brain hears a root note and a slightly off key 3rd, then it might not be able to infer the deleted info, and the third would have to be pitch corrected for the compression scheme to work

is this a deluded notion? Is this why even good singers use auto-tune these days?
b0b
Posts: 29108
Joined: 4 Aug 1998 11:00 pm
Location: Cloverdale, CA, USA

Post by b0b »

I think you misunderstood. What they're removing isn't pitches, it's dynamic range. A compressor doesn't know anything about musical intervals or pitch. It's working with volume levels, that's all.
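For anyone who wants to see what "working with volume levels" means, here is a minimal sketch of a studio-style (dynamic range) compressor in Python. It is only an illustration: the function names, the 0.5 threshold and the 4:1 ratio are made up for this example, and a real compressor follows a smoothed level with attack and release times rather than acting on each sample by itself.

```python
def compress_sample(x, threshold=0.5, ratio=4.0):
    """Turn down the part of a sample's level that exceeds the threshold."""
    level = abs(x)
    if level <= threshold:
        return x  # quiet material passes through unchanged
    # Only the overshoot above the threshold is divided by the ratio.
    squeezed = threshold + (level - threshold) / ratio
    return squeezed if x >= 0 else -squeezed


def compress(samples, threshold=0.5, ratio=4.0):
    """Apply the same gain rule to every sample (values assumed in -1.0..1.0)."""
    return [compress_sample(s, threshold, ratio) for s in samples]


quiet_and_loud = [0.1, -0.3, 0.9, -1.0]
squashed = compress(quiet_and_loud)
print(squashed)
# Same number of samples in and out: the loud peaks are lower,
# but nothing about pitch is examined and the data is no smaller.
```

Note that the output has exactly as many samples as the input, so this kind of compression does nothing for file size.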
-𝕓𝕆𝕓- (admin) - Robert P. Lee - Recordings - Breathe - D6th - Video
Jeremy Threlfall
Posts: 1380
Joined: 3 Aug 2006 12:01 am
Location: now in Western Australia

Post by Jeremy Threlfall »

I have most likely misunderstood

Is the so-called 'compression' that they use to create mp3 and mp4 files the same as the compression we use in studios?

it was explained to me as diminishing file size by removing 'unnecessary' information, or information we can 'do without'
Ian Rae
Posts: 5826
Joined: 10 Oct 2013 11:49 am
Location: Redditch, England

Post by Ian Rae »

It's digital compression. A computer decides what ones and zeros it can leave out without degrading the information below a set level. It doesn't know whether it's an audio recording or your holiday snaps.
Make sleeping dogs tell the truth!
Homebuilt keyless U12 7x5, Excel keyless U12 8x8, Williams keyless U12 7x8, Telonics rack and 15" cabs
Jeremy Threlfall
Posts: 1380
Joined: 3 Aug 2006 12:01 am
Location: now in Western Australia

Post by Jeremy Threlfall »

Excellent - so, I’m sort of right. Thanks George!
b0b
Posts: 29108
Joined: 4 Aug 1998 11:00 pm
Location: Cloverdale, CA, USA

Post by b0b »

Jeremy Threlfall wrote:I have most likely misunderstood

Is the so-called 'compression' that they use to create mp3 and mp4 files the same as the compression we use in studios?

it was explained to me as diminishing file size by removing 'unnecessary' information, or information we can 'do without'

The compression of MP3 files is not audio compression, it's file size compression. There are several algorithms involved to determine what is 'unnecessary information', but I doubt that any of them have to do with musical pitch. Most of it is tiny variations between the volumes of adjacent samples (very small slices of time). Once those little details are removed, standard data compression algorithms can reduce the file size considerably.
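The last point, that ordinary data compression does better once small details are thrown away, can be tried with a toy experiment in Python. This is a sketch under loud assumptions: the test tone, the fake "detail" and the choice of dropping the low 6 bits are all invented here, and zlib is only a stand-in for "standard data compression". It is not how an MP3 encoder actually decides what to discard.

```python
import math
import struct
import zlib


def pack16(samples):
    """Pack integers as 16-bit little-endian signed samples."""
    return struct.pack("<%dh" % len(samples), *samples)


RATE = 8000  # samples per second, chosen arbitrarily for the demo

# One second of a 440 Hz tone plus a small deterministic wiggle
# standing in for low-level detail.
fine = [
    int(20000 * math.sin(2 * math.pi * 440 * n / RATE)) + (n * 7919) % 31 - 15
    for n in range(RATE)
]

# "Remove the little details": zero the low 6 bits of every sample.
coarse = [(s >> 6) << 6 for s in fine]

fine_size = len(zlib.compress(pack16(fine), 9))
coarse_size = len(zlib.compress(pack16(coarse), 9))

print("raw bytes:           ", len(pack16(fine)))
print("zlib, full detail:   ", fine_size)
print("zlib, detail removed:", coarse_size)
```

Both versions have the same number of samples; the second one simply has less to describe, so the general-purpose compressor has an easier job.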
-𝕓𝕆𝕓- (admin) - Robert P. Lee - Recordings - Breathe - D6th - Video
Ian Rae
Posts: 5826
Joined: 10 Oct 2013 11:49 am
Location: Redditch, England

Post by Ian Rae »

Take the example of a video of someone talking to camera in front of a stationary background. The only thing that changes from frame to frame is the speaker's face, so if instead of repeatedly recording or sending the complete frame you just code the differences, there is much less information.

Although it's less easy to visualise for audio, it's on the same principle.
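The talking-head example can be sketched in a few lines of Python. Everything here is hypothetical (a "frame" is just a list of labels, and `diff_frame` / `apply_diff` are names made up for the illustration); real video codecs are far more elaborate, but the idea of sending only what changed is the same.

```python
def diff_frame(previous, current):
    """Return (position, new_value) pairs for the pixels that changed."""
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(previous, current))
        if old != new
    ]


def apply_diff(previous, changes):
    """Rebuild the current frame from the previous one plus the changes."""
    frame = list(previous)
    for i, new in changes:
        frame[i] = new
    return frame


# 16 "pixels": a static wall with one pixel of moving face.
frame1 = ["wall"] * 8 + ["face_a"] + ["wall"] * 7
frame2 = ["wall"] * 8 + ["face_b"] + ["wall"] * 7

changes = diff_frame(frame1, frame2)
print(changes)  # [(8, 'face_b')]
assert apply_diff(frame1, changes) == frame2
```

One changed pixel is sent instead of sixteen, and the receiver can still rebuild the second frame exactly.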
Make sleeping dogs tell the truth!
Homebuilt keyless U12 7x5, Excel keyless U12 8x8, Williams keyless U12 7x8, Telonics rack and 15" cabs
Jeremy Threlfall
Posts: 1380
Joined: 3 Aug 2006 12:01 am
Location: now in Western Australia

Post by Jeremy Threlfall »

OK - I’ll discard my whim (thanks for the updates)

So, there’s no excuse for auto-tune, at least coming from the engineer/producer

Good, I didn’t miss anything
Dom Franco
Posts: 1985
Joined: 16 Oct 1998 12:01 am
Location: Beaverton, OR, 97007

Post by Dom Franco »

To clarify a bit... Analog audio recording (Tape, direct to disc etc.) captures everything and plays it back with an infinite amount of variability. (However, also including tape hiss, record groove noise clicks and pops etc.)

Digital recording converts the analog signal to "1's" and "0's" (binary computer numbers) and, depending upon the quality of the A to D converters and the resolution (number of binary bits allowed), will make a very good copy of the analog without the noise added upon playback.

HOWEVER: at some point digital information will be limited to the number of bits that can be stored for each note.

So in effect all Digital Audio suffers from some compression, however the highest sample rates are nearly perfect. When a smaller bit rate "sample size" is used there will be more degradation of the audio play back.

Often this digital compression is undetectable to the human ears, but lesser quality recordings can sometimes sound "sterile" or "harsh" compared to analog.
werner althaus
Posts: 133
Joined: 27 Aug 1998 12:01 am
Location: lincoln, NE

Post by werner althaus »

Ian Rae wrote:Take the example of a video of someone talking to camera in front of a stationary background. The only thing that changes from frame to frame is the speaker's face, so if instead of repeatedly recording or sending the complete frame you just code the differences, there is much less information.

Although it's less easy to visualise for audio, it's on the same principle.

I love this example, especially when you compare it to the same person talking in front of a tree on a windy day, but I think what you're describing is lossless data compression. It's considered lossless despite reducing file size because instead of encoding a number of identical pixels one by one it'll just say "X times pixel red", therefore it is reversible.
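The "X times pixel red" idea is usually called run-length encoding. A minimal Python sketch, with made-up function names and a toy row of pixels, shows why it counts as lossless: decoding gives back exactly what went in.

```python
from itertools import groupby


def rle_encode(pixels):
    """Collapse runs of identical values into (count, value) pairs."""
    return [(len(list(run)), value) for value, run in groupby(pixels)]


def rle_decode(runs):
    """Expand (count, value) pairs back into the original sequence."""
    return [value for count, value in runs for _ in range(count)]


row = ["red"] * 5 + ["blue"] * 2 + ["red"] * 3
runs = rle_encode(row)
print(runs)  # [(5, 'red'), (2, 'blue'), (3, 'red')]

# Reversible: nothing was thrown away.
assert rle_decode(runs) == row
```

Ten pixels become three pairs, and the round trip is exact.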

On the other hand, lossy compression reduces data by also eliminating non-redundant data points that are deemed unnecessary, most of them based on psychoacoustic phenomena such as masked sounds, reflections/reverbs/delays that fall within the Haas effect and therefore do not contribute to spatial localization, etc., but also things like upper frequency response, resolution of the mid signal vs side signal, etc. This is irreversible.
werner althaus
Posts: 133
Joined: 27 Aug 1998 12:01 am
Location: lincoln, NE

Post by werner althaus »

Dom Franco wrote:To clarify a bit... Analog audio recording (Tape, direct to disc etc.) captures everything and plays it back with an infinite amount of variability. (However, also including tape hiss, record groove noise clicks and pops etc.)

Digital recording converts the analog signal to "1's" and "0's" (binary computer numbers) and, depending upon the quality of the A to D converters and the resolution (number of binary bits allowed), will make a very good copy of the analog without the noise added upon playback.

HOWEVER: at some point digital information will be limited to the number of bits that can be stored for each note.

So in effect all Digital Audio suffers from some compression, however the highest sample rates are nearly perfect. When a smaller bit rate "sample size" is used there will be more degradation of the audio play back.

Often this digital compression is undetectable to the human ears, but lesser quality recordings can sometimes sound "sterile" or "harsh" compared to analog.

I'm not sure if this is clarifying anything. For starters, analog recording doesn't capture "everything", neither in the time domain nor in terms of amplitude. It's just not how microphones work, it's not how amplifiers work, it's not how tape or direct-to-disk works. Each of these components in a recording chain has limitations regarding frequency response, distortion, signal-to-noise ratio etc. That's why honest specs are published in a way that shows values within a given and hopefully agreeable operational range only, like +/- 3 dB 20 Hz to 20 kHz, or S/N = 90 dB re +4 dBu, 22 kHz BW, unity gain. Those are the ranges within which the gear comes close to capturing faithfully.
If you are referring to analog audio as capturing "everything" vs digital audio missing some information due to stair steps then you've fallen victim to the digital myths floating around on the internet.
Digital, like high-quality, honestly spec'd analog, captures faithfully across a predetermined operational range, with sample rate defining the frequency range and bit depth defining dynamic range (in theory).
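To put numbers on "sample rate defines the frequency range and bit depth defines dynamic range", here is a small Python sketch of the textbook figures. The function names are made up for the example, and these are theoretical limits only (half the sample rate, and roughly 6 dB per bit), not measurements of any real converter.

```python
import math


def max_frequency_hz(sample_rate_hz):
    """Highest frequency representable in theory: half the sample rate."""
    return sample_rate_hz / 2


def dynamic_range_db(bit_depth):
    """Ratio of the largest to the smallest step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bit_depth)


for rate in (44100, 48000, 96000):
    print(f"{rate} Hz sampling -> up to {max_frequency_hz(rate):.0f} Hz")

for bits in (16, 24):
    print(f"{bits}-bit -> about {dynamic_range_db(bits):.1f} dB")
```

So 44.1 kHz already covers the nominal 20 Hz to 20 kHz range, and going from 16 to 24 bits widens the theoretical dynamic range from roughly 96 dB to roughly 144 dB.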
Why am I pointing this out? Because, contrary to what you say, all digital does NOT suffer from some compression, and higher sample rates are no more perfect than lower ones. They only expand the frequency range of audio to be captured, and they only do so in theory; in reality all but the best ones introduce more problems due to decreased clock accuracy, added intermodulation distortion and a host of other problems.
And as far as what level of lossy data compression is detectable, IMO that hasn't been determined yet because not all audio encodes the same. You can actually prep your audio to be more resilient to even very lossy data compression. Some mixes sound great at 128 kbps while others sound lossy, swirly, dull and whatnot even at 320.
Godfrey Arthur
Posts: 2997
Joined: 12 Dec 2012 5:46 pm
Location: 3rd Rock

Post by Godfrey Arthur »

Jeremy Threlfall wrote:OK - I’ll discard my whim (thanks for the updates)

So, there’s no excuse for auto-tune, at least coming from the engineer/producer

Good, I didn’t miss anything

Pro recording studio schools have a prerequisite course in Autotune. Each engineer-student hopeful spending thousands in school fees is expected to master Autotune.

There is a use for the software. You can dial in trace amounts and choose the notes it works over. It doesn't all have to sound like T-Pain.
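As a rough idea of what "trace amounts" and "choosing the notes" could mean, here is a hedged Python sketch. It is not how Autotune or Melodyne are implemented; `snap_pitch`, its `amount` and `allowed_notes` parameters and the A = 440 Hz reference are all assumptions made up for this illustration, and it works on a single frequency value rather than on real audio.

```python
import math

A4_HZ = 440.0  # reference pitch, assumed for this sketch


def snap_pitch(freq_hz, amount=1.0, allowed_notes=None):
    """Move freq_hz toward the nearest equal-tempered semitone.

    amount: 0.0 leaves the pitch alone, 1.0 corrects it fully.
    allowed_notes: optional set of pitch classes (0 = C ... 11 = B);
        if the nearest note is not in the set, the pitch is left untouched.
    """
    semitones = 69 + 12 * math.log2(freq_hz / A4_HZ)  # fractional MIDI note
    nearest = round(semitones)
    if allowed_notes is not None and nearest % 12 not in allowed_notes:
        return freq_hz
    corrected = semitones + amount * (nearest - semitones)
    return A4_HZ * 2 ** ((corrected - 69) / 12)


sharp_a = 450.0  # an A sung noticeably sharp
print(snap_pitch(sharp_a))                     # full correction, lands on A
print(snap_pitch(sharp_a, amount=0.25))        # a trace amount
print(snap_pitch(sharp_a, allowed_notes={0}))  # only C is corrected, so unchanged
```

With a small `amount` the note is nudged rather than locked, which is the subtle use as opposed to the obvious robotic effect.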

Here as #1 of the 25 Best Rock albums for Billboard's 2018 list is The 1975 with its delve into pitch modulation.

Serious Autotune happening here.

TOOTIMETOOTIMETOOTIME

https://www.youtube.com/watch?time_cont ... fxPQUKfim4

As has been pointed out, converting a file to MP3 is a compression of the file size, not a compression of the audio.

What you posture compression has to do with Autotune is not clear. All Autotune does is pitch correct the same way Melodyne does.

If we listen to songs made decades before Autotune, even the best singers of the day have songs on vinyl that have out of tune notes. And today we can pick those out of tune notes out.

Maybe singers like Karen Carpenter were really spot on in the note pitch department.
ShoBud The Pro 1
YES it's my REAL NAME!
Ezekiel 33:7
werner althaus
Posts: 133
Joined: 27 Aug 1998 12:01 am
Location: lincoln, NE

Post by werner althaus »

Godfrey Arthur wrote: ...What you posture compression has to do with Autotune is not clear...

The way I understand the OP is that he is wondering whether a pitch-corrected harmony part consisting of a root, a third and a fifth would data-compress more easily and better (by removing the fifth) than an imperfect harmony part because, as he stipulates (incorrectly IMO), the brain can then "infer" the missing fifth if perfect intervals between the root, the third and the fifth are present. The assumption is that the codec algorithm can safely remove the fifth for psychoacoustic reasons. I don't believe it works that way. For starters, autotune will pitch correct to tempered tuning, which is different from the overtones or combination tones that we hear, which vary a few cents up or down from tempered tuning. I don't believe that the psychoacoustic toolkit used is removing any fundamentals or first harmonics of any given sound at all unless they are masked by other sounds or fall within the Haas effect's spatial effect due to delay. We should be able to try this by encoding a single note with its series of overtones vs all those overtones played as separate notes. Would the algorithm remove the played overtones since those frequencies are already present in the single note? I don't believe it would, unless maybe they share the exact envelope (attack, decay, etc).
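The "few cents up or down" can be worked out directly. This Python sketch compares equal-tempered intervals with the pure ratios found in the overtone series, taking 5:4 for the major third and 3:2 for the fifth (the usual just-intonation ratios, picked here for the illustration; the `cents` helper is a made-up name).

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


intervals = {
    # name: (pure ratio from the overtone series, equal-tempered semitones)
    "major third": (5 / 4, 4),
    "perfect fifth": (3 / 2, 7),
}

for name, (pure_ratio, semitones) in intervals.items():
    pure = cents(pure_ratio)
    tempered = 100 * semitones
    print(f"{name}: pure {pure:.2f} cents, tempered {tempered} cents, "
          f"difference {tempered - pure:+.2f}")
```

The tempered major third comes out about 14 cents wider than the pure one and the tempered fifth about 2 cents narrower, so a part corrected to tempered tuning does not line up exactly with the overtones of its root.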