Still, I want to think through these comparisons of sampling rates. I myself cannot hear frequencies above 16.3 kHz. The structure of our ear makes it simply physically impossible to perceive frequencies above a certain limit (and even if some perception occurs, it is unlikely to reach our consciousness as hearing). It is interesting that neurons transmit impulses at a rate of 500 to at most 1200 times per second; this is an electrochemical process. So in the perception of high frequencies the decisive role is played by timing. And timing is what the spatial perception of sound is built on.
The loudspeaker designer Thomas Andrews said in an interview that our hearing resolves the spatial position of a sound source to about 1 degree. That corresponds to a change in the interaural delay of only 13-18 microseconds (!!!). A sampling rate of 44100 Hz gives a sample period of 22.7 microseconds. Bit depth, of course, plays a separate role in all this.
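As a rough check of these numbers, here is a sketch using the classic Woodworth spherical-head approximation for interaural time difference; the head radius and speed of sound are my assumed values, not figures from the interview:

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius, ~8.75 cm
SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 C

def itd_seconds(azimuth_deg: float) -> float:
    """Interaural time difference for a source at the given azimuth,
    using the Woodworth spherical-head approximation."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

print(f"ITD for 1 degree: {itd_seconds(1.0) * 1e6:.1f} us")   # ~8.9 us
print(f"Sample period at 44.1 kHz: {1e6 / 44100:.1f} us")     # ~22.7 us
```

With these assumptions the model gives roughly 9 us per degree, the same order of magnitude as the 13-18 us quoted above; either way it is well below the 22.7 us sample period.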
So the point is not the presence of ultrasonic frequencies in high-resolution files, but the correct rendering in time of the sounds that are within the audible range. This concerns stereo recording in particular. Perhaps this (greater spaciousness) can be heard in the noisy attack of sounds. And it matters only when listening through headphones, because the crosstalk that arises when playing through even arbitrarily good speakers kills these fine details.
There have been so many tests claiming that high resolution does not matter. But it can only matter if the recording was made correctly, with a properly placed stereo microphone pair, has no editing, is reproduced through high-quality hardware, and is heard through good headphones (or through speakers with crosstalk cancellation, but that technology is not yet mature). In that case our consciousness experiences more presence and perceives sounds in a more relaxed way.
It turns out that to transmit the timing differences between the ears correctly, we need a sampling rate with some reserve. 13-18 / 2 gives 6.5-9 microseconds; add a bit more margin and we are at about 5 microseconds, which corresponds to a sampling rate of 200000 Hz. The nearest standard rate is 192000 Hz. I think that is the sensible minimum; more is needed only to be able to edit the sound without significant quality loss.
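Spelling out that back-of-the-envelope arithmetic (all the numbers are the author's estimates from the preceding paragraphs):

```python
# Halve the 13-18 us ITD range, trim to a round 5 us margin,
# then invert the period to get the required sampling rate.
itd_low_us, itd_high_us = 13.0, 18.0
half_range_us = (itd_low_us / 2, itd_high_us / 2)  # (6.5, 9.0) us
target_period_us = 5.0                             # chosen extra margin
required_rate_hz = 1e6 / target_period_us
print(half_range_us, required_rate_hz)  # (6.5, 9.0) 200000.0
```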
Note that this is only necessary for stereo recording; a mono recording is not demanding in this respect. This, I think, is one of the reasons vinyl records sound so euphonious to our ears: there may be frequency distortion and noise .., but the timing is rendered deliciously well.
One more thing ..
I have tried to draw parallels between visual and auditory perception, and between music and photography. For clarity.
For vision, one can plot a spatial-frequency sensitivity graph: the strength of perception as a function of the angular size of the object.
In photography, one characterization of resolution is the MTF graph.
I would draw these parallels between auditory and visual perception: pitch = angular size of the object, loudness = brightness (film photography = vinyl sound).
Film photography has given way to digital in resolution, dynamic range, sensitivity, convenience and speed, but film still surpasses digital in the quality of its tonal transitions. The same goes for vinyl .. These media give convex, spatially voluminous results that make us look closely and listen closely .. they are in themselves something mysterious.
Yes, a digital photo is sharp, bright, faithful (incredibly accurate in that respect). But it is worth taking a closer look at the small details that fade into the shadows, and there we find horror and pain. Plasticine. A fine hair becomes a metal wire, the texture of stone becomes plastic. Film in those places remains mysterious.
Translated into sound, these are the quiet sounds and the high sounds (still within the thresholds of perception).
This is the fault of the frequency filters. That is why in digital sound we get plastic and plasticine in the quiet, high-frequency material.
In a photo the dark tones get fewer bits, and in sound the quieter passages get fewer bits. In exactly those places where the eye and the ear are still sensitive, digital skimps and introduces obvious distortion.
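The point about quiet sounds getting fewer bits can be made concrete. A minimal sketch, just a peak-level-to-step-count conversion (nothing implementation-specific is assumed):

```python
def levels_available(peak_dbfs: float, bits: int = 16) -> int:
    """Number of quantization steps spanned (peak to peak) by a signal
    whose peak sits at the given dBFS, for the given bit depth."""
    full_scale = 2 ** (bits - 1) - 1        # 32767 for 16-bit
    amplitude = 10 ** (peak_dbfs / 20)      # linear amplitude, 1.0 = full scale
    return int(2 * amplitude * full_scale)

print(levels_available(0))    # 65534: a full-scale signal spans almost every step
print(levels_available(-60))  # 65: a quiet passage gets only dozens of steps
```

A passage 60 dB below full scale, still clearly audible, is described by only a few dozen levels out of the 65536 nominally available.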
Even the most modern camera, the Nikon D850, has this sin (though at least it has no low-pass filter in front of the sensor that would reduce sharpness).
On this subject, here is a video:
(I apologize in advance: the picture stutters terribly, my computer cannot cope with such a load.) The graphs display the sound that comes out of the sound card's headphone jack and is fed back into its input, i.e. the synthesized sound passes through a digital-to-analog and then an analog-to-digital conversion.
https://youtu.be/RdiYSXcCqm0
First, the oscilloscope shows what distortions occur above 5000-8000 Hz at a sampling rate of 44100 Hz (still well within the audible range). You can see how the wave is carved up by the sampling grid. Perhaps this should be smoothed out (and it is smoothed out), but then something is lost (it is like applying noise reduction to a photo: the fine details are removed and it becomes flat and lifeless). At 192000 Hz, only small distortions appear at the boundary of the audible range.
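What the scope shows can be reproduced numerically. The sketch below looks at the raw samples of an 8 kHz sine, before any reconstruction filtering (which is effectively what an unsmoothed oscilloscope trace of the dots shows), and measures how much the per-cycle peak "breathes" as the sampling grid slides across the waveform. The frequency and the two rates are just the ones mentioned above:

```python
import math

def cycle_peak_range(freq_hz: float, rate_hz: float, n_cycles: int = 400):
    """For each cycle of a unit sine, take the largest raw-sample magnitude.
    Returns (min, max) over cycles: how much the peak of the dotted
    sample trace varies as the sampling grid drifts across the wave."""
    spc = rate_hz / freq_hz  # samples per cycle
    peaks = []
    for j in range(n_cycles):
        ks = range(math.ceil(j * spc), math.ceil((j + 1) * spc))
        peaks.append(max(abs(math.sin(2 * math.pi * k / spc)) for k in ks))
    return min(peaks), max(peaks)

for rate in (44100, 192000):
    lo, hi = cycle_peak_range(8000, rate)
    print(f"{rate} Hz: cycle peaks range {lo:.3f} .. {hi:.3f}")
```

With these numbers the per-cycle peak at 44.1 kHz wanders by roughly 10 percent (the breathing visible on the scope), while at 192 kHz it stays put. A proper DAC reconstruction filter restores the full sine in either case; this only models the raw sample dots.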
Second, notice what happens when I shift the pitch by tenths of a cent. You can clearly see the volume of the signal changing. Maybe it is a feature of how the sinusoid is computed, but apparently it sags at some frequencies and resonates at others.
When (in the video) I use a "square" sound with many overtones, then with small pitch shifts up and down (by cents and fractions of a cent) you can clearly see the volume of the overtones changing in waves. Does Pianoteq have problems with this? Have they been solved? After all, a piano sound is constantly moving in pitch together with all its overtones.
Third, noise is used to hide the artifacts of digitization. But what noise it is! Its nature is unnatural; it makes your ears bleed (just like simulated grain in digital photos .. horror). And the quieter the sound, the closer it sits to this noise, and the more distortion the useful signal receives; it becomes disgusting. Yet vinyl noise and film grain have a certain nobility and natural softness (even though in their day these were fought fiercely).
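For what this added noise actually does technically: a sketch of TPDF (triangular) dither, the standard technique; the 8-bit depth and the tiny amplitude here are purely illustrative:

```python
import random

def quantize(x: float, bits: int = 16) -> float:
    """Plain rounding to a signed grid of the given bit depth."""
    q = 2 ** (bits - 1) - 1
    return round(x * q) / q

def quantize_tpdf(x: float, bits: int = 16) -> float:
    """Quantize after adding triangular (TPDF) dither of +/- 1 LSB.
    The error becomes signal-independent noise instead of distortion
    correlated with the signal."""
    q = 2 ** (bits - 1) - 1
    dither = random.random() - random.random()  # triangular PDF on (-1, 1)
    return round(x * q + dither) / q

# A signal well below 1 LSB at 8 bits: plain rounding gates it to silence,
# while the dithered version preserves it on average (as audible noise).
random.seed(0)
tiny = 0.4 / (2 ** 7 - 1)  # amplitude of 0.4 LSB
plain = quantize(tiny, 8)
dithered_mean = sum(quantize_tpdf(tiny, 8) for _ in range(20000)) / 20000
print(plain, round(dithered_mean * (2 ** 7 - 1), 2))
```

So the trade is exactly the one complained about here: the sub-LSB signal survives, but only embedded in a flat, synthetic-sounding noise floor.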
Fourth, the sound around us is never static; it constantly shifts in pitch, sometimes by the very smallest degrees, and that, probably, is exactly where most of the distortion settles during digitization ..
The 44100 Hz / 16-bit format was introduced when people were trying to record high-quality digital audio onto the highest-capacity medium of the time, the video cassette. The quality was whatever could technically be recorded and read back from that medium (it had to be compatible with the PAL and SECAM formats). Since then it has been considered sufficient. But now we have incomparably faster computers and are much freer in terms of storage; our Internet channels are incredibly fast. So why deprive our subtle instrument, hearing, of the possibility of a spatially adequate, undistorted presence?
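The 44100 figure itself falls out of the video-cassette arithmetic. As far as I know, early PCM adaptors stored audio samples on the usable lines of a video signal, three samples per line; treat the line and field counts below as my assumption, not something stated in the post:

```python
# The rate had to fit both TV field structures: 60 fields/s (NTSC)
# and 50 fields/s (PAL/SECAM), with 3 samples per usable video line.
ntsc_rate = 60 * 245 * 3  # fields/s * usable lines/field * samples/line
pal_rate = 50 * 294 * 3
print(ntsc_rate, pal_rate)  # 44100 44100
```

Both standards land on exactly 44100 samples per second, which is how a video-tape convenience became the audio standard.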
Judging by the technical side of the DSD format, it seems to me the most adequate format in terms of transmitting all the sound criteria critical to our hearing.
And even if all this is fantasy, esoterica .. not scientific .. this is how I feel ..
Last edited by scherbakov.al (07-11-2017 01:43)