Topic: Will the next Pianoteq version be able to use ALL the power of multi-core processors?

With the current version of Pianoteq and an older-generation Core i7 processor, I quickly run into limitations when the internal sample rate reaches, and especially exceeds, 48 kHz at maximum polyphony. Yet these high sample rates can significantly improve the accuracy of the rendering with a good audio chain downstream: not because of the maximum reproducible frequency, which is inaudible to the human ear, but through a more global impression of a subjectively more transparent sound, giving an impression of less distortion. (Switching several times between 48 and 96 kHz internal sample rates, with a downstream audiophile DAC set to 96 kHz, a dedicated headphone amplifier and high-end headphones, I found a noticeable difference in transparency that better highlights Pianoteq's modelling effort.)

Summarily observing the behaviour of Pianoteq 6.4.1 Pro launched in standalone mode (under Windows 10 x64 version 1809) and the distribution of its processing across the processor (4 cores and 2 logical threads per core on my somewhat old Intel Core i7-3630QM), I found that only 2 of the 8 logical threads are heavily used, and that Pianoteq "saturates" at 100% of resources according to its own indicators, while less than 40% of the CPU is actually used (average usage over a few seconds), as reported by the Windows Resource Monitor, or more precisely by the Sysinternals tool Process Explorer when analysing the properties of the Pianoteq6.exe threads (including the "ideal processor" value of each Pianoteq thread).
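For reference, here is a minimal Python sketch of the kind of per-logical-CPU measurement involved (assuming the psutil package is installed; this only approximates what Resource Monitor and Process Explorer report, it is not how those tools work internally):

    import psutil

    # Sample the utilization of every logical CPU once per second for
    # 10 seconds while Pianoteq is playing, then print the averages.
    samples = [psutil.cpu_percent(interval=1.0, percpu=True) for _ in range(10)]
    averages = [sum(cpu) / len(samples) for cpu in zip(*samples)]
    for index, avg in enumerate(averages):
        print(f"logical CPU {index}: {avg:5.1f}% average load")

A distribution like the one described above, with two logical CPUs far busier than the rest, is consistent with an engine whose hot path runs on only a couple of threads.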
To get an idea of what a slightly more powerful machine could do, I ran the Pianoteq Standard trial on a laptop belonging to one of my children, equipped with a Core i7-8750H (6 cores, 12 threads), with the internal sample rate at 48 kHz, a MIDI file accelerated x10, the sustain pedal stuck ON and, naturally, maximum polyphony. Pianoteq does not seem to saturate (65% of resources at most according to its own indicators), but only 20% of the processors' power is used, distributed mainly over 2 of the 12 available threads:
See this (corrected) link:
https://photos.google.com/share/AF1QipP...EtZlA5UlJ3

Following this observation (subject to my interpreting it correctly?), I wonder whether Pianoteq will need to evolve in future versions to be able to exploit the full power of new-generation processors. At the moment Pianoteq copes well because the clock frequency of new processors has increased significantly, so even on one or two cores it can find the necessary power, provided those cores have a sufficiently high turbo frequency (e.g. the Intel Core i7-8750H above). However, processors are evolving more towards a multiplication of cores than towards higher maximum frequencies (the latter makes the processor heat up and the fans emit a noise too unpleasant for comfortable use of Pianoteq). Examples of growing core counts at Intel, already available today: the Core i9-9900K with 8 cores and 16 threads and, in towers, the Core i9-9980XE with 18 cores and 36 threads, or AMD's Threadripper 2990WX with 32 cores and 64 threads...
I also note that most powerful laptops have graphics cards with an even higher number of cores, which is interesting for parallel computation (applicable to polyphony?), especially Nvidia with CUDA on the Pascal and later architectures: e.g. the RTX 2080 with 2944 CUDA cores at 1.7 GHz, or even the modest MX150 with 384 CUDA cores at 1.5 GHz (my old GTX 660M already had 384 cores at 0.8 GHz).

In any case, as I am about to replace the laptop I dedicate to Pianoteq Pro, I am wondering which processor to choose in order to use the 96 kHz internal sample rate correctly at the current maximum polyphony (256), with a view to the versions of Pianoteq released over the next 3 years, which I do not doubt will be able to use more power to further improve the implementation of its model (if possible without imposing fans that are too noisy).
Perhaps the Pianoteq team could shed some light on their recommendations and advice in this perspective?

Bruno

Last edited by bm (04-05-2019 09:21)

Re: Will the next Pianoteq version be able to use ALL the power of multi-core processors?

GPU processing ain't gonna happen because of added latency (caused by data shuffling between CPU and GPU). Lower latency is what's needed, not more of it. Also note that GPU cores are not capable of calculating a whole voice of polyphony, especially because in the piano, voices are not completely separated, they interact - which makes it not very parallelizable...

Last edited by EvilDragon (04-05-2019 13:08)
Hard work and guts!

Re: Will the next Pianoteq version be able to use ALL the power of multi-core processors?

EvilDragon wrote:

GPU processing ain't gonna happen because of added latency (caused by data shuffling between CPU and GPU). Lower latency is what's needed, not more of it. Also note that GPU cores are not capable of calculating a whole voice of polyphony, especially because in the piano, voices are not completely separated, they interact - which makes it not very parallelizable...

I had not really thought about the latency of CPU-GPU communications in a CUDA context, nor about the other GPU inter-process issues.
I hope that for CPUs, optimization of the inter-process and shared-memory mechanisms will allow future versions of Pianoteq to distribute better, across the threads of the CPU cores, the part of the processing that could perhaps be parallelized further (despite the interactions the model probably induces between the different voices); see the toy sketch below.
Naturally, I have no competence for a more in-depth technical opinion on this point.
However, it would be interesting if the Pianoteq team could tell us, in order to benefit from the maximum performance of the next versions of Pianoteq Pro (especially above a 48 kHz internal sample rate; for me, 96 kHz and maximum polyphony), how many cores (or CPU threads) and what base and/or turbo frequencies they would recommend for a new laptop (or tower), on both Windows x64 and Linux, for which the prerequisites are perhaps a little different?
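To make the parallelization difficulty concrete, here is a deliberately naive Python sketch; the coupling scheme and all numbers are invented for illustration and are certainly not Pianoteq's actual model:

    # Toy model: voices coupled through a shared soundboard state.
    # Every sample of every voice depends on the soundboard value that
    # ALL voices produced at the previous sample, so the time loop is
    # inherently serial and cannot be split one-voice-per-core.
    NUM_SAMPLES = 8
    COUPLING = 0.1
    DECAY = 0.99

    voices = [0.5, -0.3, 0.2, -0.1]   # made-up initial voice states
    soundboard = 0.0

    for n in range(NUM_SAMPLES):
        # Each voice decays on its own AND receives the shared soundboard.
        voices = [DECAY * v + COUPLING * soundboard for v in voices]
        # The next sample's soundboard sums the current voices.
        soundboard = sum(voices)
        print(f"sample {n}: output = {soundboard:+.4f}")

With such per-sample feedback, parallelizing across voices would require synchronizing all threads at every sample, which usually costs more than it saves.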

Bruno

Last edited by bm (04-05-2019 17:25)

Re: Will the next Pianoteq version be able to use ALL the power of multi-core processors?

For virtual instruments, CPU frequency is ALWAYS going to be the #1 metric. The faster the cores, the better.

Hard work and guts!

Re: Will the next Pianoteq version be able to use ALL the power of multi-core processors?

When I bought my 2013 Mac Pro I asked Julien whether a 6-core unit would run standalone Pianoteq better, and he said no, so I bought a 4-core, 8-thread model. Here is a Pianoteq stress test, shown in the following images at 48 kHz and 96 kHz sample rates:

https://photos.app.goo.gl/K5KCT9Ur7ipAPySJ9
https://photos.app.goo.gl/pgZW6b9qkyf3Azuz7

In these two examples I simply pressed the sustain pedal with the mouse and ran it up and down the virtual keyboard so as to saturate the audio output. I get overload at 96 kHz but none at 48 kHz with the setup shown. 96 kHz is quite demanding, but only the real-time performance suffers; the total CPU demand is roughly the same as at 48 kHz. In fact, in these two examples we can see that the 4 main cores are used at less than half capacity, with the other 4 threads unused. I guess a recommendation for 96 kHz is: get at least 4 cores with the fastest clock speed possible.
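A back-of-the-envelope calculation shows why 96 kHz strains the real-time path even when total CPU demand barely changes: the time budget per audio buffer halves. A minimal Python sketch (the buffer sizes are just common examples, not the settings in the screenshots):

    # Real-time deadline per audio buffer = buffer_size / sample_rate.
    # At a fixed buffer size, doubling the sample rate halves the time
    # the engine has to finish each buffer, even if the total work done
    # per second of audio stays roughly the same.
    for sample_rate in (48_000, 96_000):
        for buffer_size in (64, 128, 256):
            budget_ms = 1000.0 * buffer_size / sample_rate
            print(f"{sample_rate} Hz, {buffer_size} samples: "
                  f"{budget_ms:.2f} ms per buffer")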

I also asked about GPU use for Pianoteq, since the Mac Pro has two GPUs, one dedicated to computation (and presently idle... alas), and I was told that for the latency reasons ED mentioned, and also due to the high variability, GPU use was not envisioned.

I also remember a Google Chrome version, released shortly after that Mac Pro became available, which tried (I think) to use the compute GPU for scrolling; I got quite bizarre output where portions of a text scrolled at different speeds, not very pretty! The next version corrected that. It was experimental, I suppose, but quite revealing. GPU compute is not useful for anything real-time, in my opinion.

As for noise, a laptop is sure to be noisy under intensive use, of course, but the Mac Pro remains totally silent, whatever I throw at it...

Last edited by Gilles (05-05-2019 02:55)

Re: Will the next Pianoteq version be able to use ALL the power of multi-core processors?

EvilDragon wrote:

The faster the cores, the better.

The trend is going in the opposite direction (away from 3.7 GHz, 130 W hot rods, for example).

Is there any parameter other than CPU frequency that reflects fitness for Pianoteq better or more correctly? FLOPS, floating-point operations per second, for example?

Last edited by groovy (06-05-2019 11:40)

Re: Will the next Pianoteq version be able to use ALL the power of multi-core processors?

Well, 95 W gives you a 5 GHz beast these days, so it's clearly an improvement.

3.7 GHz at 130 W sounds like an old and tired AMD CPU

Last edited by EvilDragon (06-05-2019 11:47)
Hard work and guts!

Re: Will the next Pianoteq version be able to use ALL the power of multi-core processors?

EvilDragon wrote:

3.7 GHz at 130 W sounds like an old and tired AMD CPU

No, it was a reference to Gilles' Apple Mac Pro in his previous post:

https://ark.intel.com/content/www/de/de...0-ghz.html

The Intel Core i9 you presented has a lower base frequency of 3.6 GHz, so it follows the trend.

But that is uninteresting and wasn't the point. Did you read my question?

Last edited by groovy (06-05-2019 12:50)

Re: Will the next Pianoteq version be able to use ALL the power of multi-core processors?

FLOPS are always important. The more, the better.


groovy wrote:

The Intel Core i9 you presented has a lower base frequency of 3.6 GHz, so it follows the trend.

But you can easily make it constantly run at 5 GHz across all cores.

Last edited by EvilDragon (06-05-2019 13:11)
Hard work and guts!

Re: Will the next Pianoteq version be able to use ALL the power of multi-core processors?

I know my 4-core, 8-thread 2013 Mac Pro is getting old (like everything else...), but I can assure you I get flawless, silent and low-latency results from anything Pianoteq can do at a 48 kHz sample rate and 256-voice polyphony. 96 kHz is taxing when pushed to the limit, but not really useful, in my humble opinion. The fact that the 4 cores are at most half used at both sample rates means there is some bottleneck other than CPU speed for intensive real-time audio at 96 kHz and beyond, so a 3.7 GHz clock speed is surely enough for current standalone use. Of course, improvements in the model always eat up a few more cycles, but not drastically. For multi-instrument DAW work, more cores could be useful, I suppose.

Re: Will the next Pianoteq version be able to use ALL the power of multi-core processors?

@EvilDragon
We know this.

Let's make it clearer with a very artificial example (the following numbers are purely symbolic):

PC A has a CPU with 2 GHz, 4 cores and a performance of 10 GigaFLOPS.
PC B has a CPU with 1 GHz, 4 cores and a performance of 10 GigaFLOPS.

Can we expect, as a rule of thumb, that both CPUs overload at the same Pianoteq settings? (settings like 48 kHz, 64 samples, 256-voice polyphony)
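For what it's worth, here is a minimal Python sketch of how theoretical peak FLOPS relate cores, clock and per-cycle throughput (the per-cycle figures are invented so the two machines match the symbolic numbers above; real chips vary):

    # Theoretical peak FLOPS = cores * clock * FLOPs per core per cycle.
    # Two CPUs can share a peak rating at different clock speeds if the
    # slower-clocked one does more floating-point work per cycle.
    def peak_gflops(cores, ghz, flops_per_cycle):
        return cores * ghz * flops_per_cycle

    print("PC A:", peak_gflops(cores=4, ghz=2.0, flops_per_cycle=1.25), "GFLOPS")
    print("PC B:", peak_gflops(cores=4, ghz=1.0, flops_per_cycle=2.5), "GFLOPS")

Whether the two machines hit their Pianoteq overload point together is a separate question, since a latency-bound engine also depends on memory, caches and scheduling, not just peak arithmetic throughput.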

Re: Will the next Pianoteq version be able to use ALL the power of multi-core processors?

It's hard to say without actually doing such a test, but I guess it would be about the same. On the other hand, depending on OS CPU scheduling and whatnot, it might not be the same at all...

Hard work and guts!

Re: Will the next Pianoteq version be able to use ALL the power of multi-core processors?

EvilDragon wrote:

GPU processing ain't gonna happen because of added latency (caused by data shuffling between CPU and GPU). Lower latency is what's needed, not more of it. Also note that GPU cores are not capable of calculating a whole voice of polyphony, especially because in the piano, voices are not completely separated, they interact - which makes it not very parallelizable...

I haven't been involved in hardware development for a very long time now, so I am WAY out of date.
At the time, graphics processors were basically passed a very small amount of metadata, e.g. a pointer and a few descriptor and command bytes.
The graphics processor did its thing and signaled back - little to no "data shuffling".

Unless there is some paranoia about shared memory, I doubt there is a need to shuffle anything between them.

Re: Will the next Pianoteq version be able to use ALL the power of multi-core processors?

Well, there is. That's what causes the latency!

Hard work and guts!

Re: Will the next Pianoteq version be able to use ALL the power of multi-core processors?

You need to share and synchronize memory between the GPU and the sound card too, but that is not the major problem.
The major problem is that the mathematics underlying Pianoteq does not parallelize well and requires good single-core performance.

A single GPU core is much slower than a single CPU core for this purpose.

Re: Will the next Pianoteq version be able to use ALL the power of multi-core processors?

aandrmusic wrote:

I haven't been involved in hardware development for a very long time now, so I am WAY out of date.
At the time, graphics processors were basically passed a very small amount of metadata, e.g. a pointer and a few descriptor and command bytes.
The graphics processor did its thing and signaled back - little to no "data shuffling".

That's not how GPUs work, although you may be confusing the API software interface calls with the underlying activity.

You can pass an address of the data that the GPU needs (and that data can be substantial). The API will take an address on the CPU side, but it must then transfer that data to the GPU's internal memory. For a dedicated GPU card, that means the data is transferred over the PCIe bus. To get results, you have to read them back from the GPU's memory into main system memory.

In the case of some CPUs with integrated GPU units (sometimes called IGPs; AMD also used the term APU), the CPU's internal GPU uses main system memory (and does not necessarily have internal memory of its own). In that case it may not physically copy data, although even then it can require that memory be copied to a specific reserved portion of main system memory used by the internal GPU. However, these units have a downside in that main system memory is not designed or structured for the kind of memory-access patterns that GPU kernels typically require to be most effective.

There are a lot of variations in the details, but note that the complex memory structure involved (there are different types of memory even within a GPU, with different performance characteristics for each) makes writing kernels (the software GPUs run) a lot harder than it may seem. Unoptimized kernel code that does not properly respect the memory-management schemes can actually be slower than a CPU. GPU "compilers" give no optimization help, unlike CPU-side compilers, which do a lot of optimization behind the scenes.

This is further complicated by the fact that CPUs now have many more internal implicit and explicit parallelization operations (you may see the term "vectorize", for example). Using these directly can be faster than using a GPU, depending on the specific task. This is another reason why software like Pianoteq may actually be a bad fit for GPUs: the CPU-only side can have more than enough processing capability to make the many complications of GPU programming redundant.

aandrmusic wrote:

Unless there is some paranoia about shared memory, I doubt there is a need to shuffle anything between them.

There is a great deal of paranoia about shared (and non-shared) memory. It causes some sleepless nights for people involved at that level in writing secure systems. In modern systems you cannot, for example, simply free a block of memory you no longer need. Either explicitly or behind the scenes, the memory has to be cleaned of any existing data to prevent it from being read by another, possibly malicious, process when that memory is later allocated to it. Many of these details are handled behind the scenes, and many software developers are largely unaware of them and give them no thought, but memory management remains a very non-trivial aspect of modern systems and security.

But note that dedicated GPUs do not use shared memory (from main system memory), so you cannot in general simply page memory in and out or use anything as simple as memory mapping.  The APIs hide the details but they are still there under the hood.  There are real overheads involved.
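As a rough illustration of those overheads, here is a minimal Python sketch using the CuPy library (assuming an Nvidia GPU with CuPy installed; the buffer size is arbitrary) that times a host-to-device copy, a trivial operation and the copy back, the round trip a GPU-based engine would pay on every audio buffer:

    import time
    import numpy as np
    import cupy as cp

    # One audio buffer's worth of samples (size chosen arbitrarily).
    host_buffer = np.random.rand(256).astype(np.float32)

    start = time.perf_counter()
    for _ in range(1000):
        device_buffer = cp.asarray(host_buffer)   # host -> GPU copy
        device_buffer *= 0.5                      # trivial GPU "work"
        result = cp.asnumpy(device_buffer)        # GPU -> host copy
    cp.cuda.Device().synchronize()
    elapsed = time.perf_counter() - start

    print(f"average round trip: {1e6 * elapsed / 1000:.1f} microseconds")

Even a few tens of microseconds per buffer is a significant fraction of the roughly 1.3 ms budget of a 64-sample buffer at 48 kHz.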

StephenG