87 changes: 46 additions & 41 deletions src/server.cpp
@@ -982,7 +982,7 @@ void CServer::MixEncodeTransmitData ( const int iChanCnt, const int iNumClients
// stereo: apply stereo-to-mono attenuation
for ( i = 0, k = 0; i < iServerFrameSizeSamples; i++, k += 2 )
{
vecfIntermProcBuf[i] += ( static_cast<float> ( vecsData[k] ) + vecsData[k + 1] ) / 2;
vecfIntermProcBuf[i] += ( static_cast<float> ( vecsData[k] ) + vecsData[k + 1] ) * 0.5f;
Member

Couldn't we just right shift the short to halve it?

Member

Not with a float. Doing so with an int before conversion would reduce the resolution by 1 bit.

Doing * 0.5f or / 2.0f with a float preserves the full resolution in the mantissa and just changes the exponent. I wouldn't be surprised if the compiler produces exactly the same code in both cases.

Collaborator

/ 2.0f feels more intuitive, somehow. If there's no execution speed difference, I'd rather keep it.

Member Author

It's unlikely to be different; however, this may depend on the compiler. I'd argue that * 0.5f is easier for the compiler to optimise than / 2.

Member

> Not with a float. Doing so with an int before conversion would reduce the resolution by 1 bit.
>
> Doing * 0.5f or / 2.0f with a float preserves the full resolution in the mantissa and just changes the exponent. I wouldn't be surprised if the compiler produces exactly the same code in both cases.

But since we hard clip back to short, the value will be the same in the end. If we did (short + 1) >> 1, we'd get the same rounded value back as if we did Float2Short( static_cast<float>(short) / 2.f ), or am I wrong here?

}
}
}
@@ -1001,7 +1001,7 @@ void CServer::MixEncodeTransmitData ( const int iChanCnt, const int iNumClients
// stereo: apply stereo-to-mono attenuation
for ( i = 0, k = 0; i < iServerFrameSizeSamples; i++, k += 2 )
{
vecfIntermProcBuf[i] += fGain * ( static_cast<float> ( vecsData[k] ) + vecsData[k + 1] ) / 2;
vecfIntermProcBuf[i] += fGain * ( static_cast<float> ( vecsData[k] ) + vecsData[k + 1] ) * 0.5f;
Member

@dingodoppelt dingodoppelt May 25, 2026

I don't know if we do this right. Shouldn't we normalize the floats to -1 .. 1 to get the most out of the float precision? And even if we did, we wouldn't mitigate clipping, because adding many channels might still clip, and we neither scale back nor check whether clipping occurred. When converting back to short we don't care about the actual range; we just hard clip everything back into the short range. That doesn't seem right at all to me.

Collaborator

I thought the signal summation worked progressively:

  • one client at 1.0f -> 1.0f / 1.0f -> 1.0f
  • two clients at 1.0f -> ( 1.0f + 1.0f ) / 2.0f -> 1.0f

etc. Clipping should never happen. However, that ignores panning, of course. If you multiply L 1.0f by 1.1f and divide R 1.0f by 1.1f, you will get clipping. I guess that, since panning is on the client GUI, as is level, this is seen as something the client user has under their control: drop the channel level to allow for panning.

As with all the audio code, I could easily be wrong. I can't follow it.

Member

> I don't know if we do this right. Shouldn't we normalize the floats to -1 .. 1 to get the most out of the float precision?

Floats are internally normalised anyway, having the max available precision in the mantissa, and then scaling with the exponent. This all happens internally within the floating-point engine.

Member Author

> As with all the audio code, I could easily be wrong. I can't follow it.

Which proves that it must be refactored...

Member

> > I don't know if we do this right. Shouldn't we normalize the floats to -1 .. 1 to get the most out of the float precision?
>
> Floats are internally normalised anyway, having the max available precision in the mantissa, and then scaling with the exponent. This all happens internally within the floating-point engine.

I think normalization in the float representation only means having a single non-zero digit before the point in the mantissa and adjusting the exponent accordingly. I think we still miss out on precision because we convert big numbers to float, while float has the most precision for values up to 2, so normalizing in an audio DSP sense means scaling to -1 .. 1.

> I thought the signal summation worked progressively:
>
> • one client at 1.0f -> 1.0f / 1.0f -> 1.0f
> • two clients at 1.0f -> ( 1.0f + 1.0f ) / 2.0f -> 1.0f
>
> etc. Clipping should never happen.

I couldn't find anything which takes the number of clients into account, i.e. there is no scaling happening. We just keep adding and adding...
I think we actually clip all the time. I added a qDebug() to the Float2Short function and, oh boy, is this thing clipping with only two clients (one playback, one drums).

Member

@dingodoppelt dingodoppelt May 26, 2026

Which may explain why people complain about quality. But why don't we hear clipping directly?

Maybe it is not "enough" to be audible. Boost your input gain in the settings and you'll hear some definite clipping artefacts.
What some people reported as degraded sound may just be artificial overtones introduced by the square wave that is being created in the Float2Short function, i.e. distortion.

Edit: We hear some very unpleasant distortion because we hard clip instead of saturating/soft-clipping. For the time being, a mitigation could be to apply tanh() instead of Float2Short, but I suppose that would increase the load on the server immensely.

Member

I did some tests with a few clients sending the same sine wave. At some point you can clearly hear the sine wave turn into a square wave, and my qDebug() output from the Float2Short function fires continuously.

Member

@softins softins May 26, 2026

> I couldn't find anything which takes the number of clients into account, i.e. there is no scaling happening. We just keep adding and adding...

I think that's correct. The output is the sum of the inputs, not their average, and statistically, channels will hit waveform peaks at different times (unless you are sending in synchronised sine waves!).

Consider, say, four channels, one of which has sound, and three of which are currently silent. Ignoring any faders, when mixing, the output should be a 100% copy of the input channel that has sound, not just 25% of it. Otherwise adding more and more channels would make each one quieter and quieter.

Member

I had the client clipping under normal (musical) circumstances with only two clients.
Something similar to the jitter buffer logic, which measures errors over time and adjusts accordingly, could be applied to a sliding RMS that tracks clipping situations and, above a certain threshold, scales the input samples back.
Users could mitigate the problem by moving all faders down, or by having a dedicated fader controlling the global gain alongside an indicator for clipping in the mix.

> Consider, say, four channels, one of which has sound, and three of which are currently silent. Ignoring any faders, when mixing, the output should be a 100% copy of the input channel that has sound, not just 25% of it. Otherwise adding more and more channels would make each one quieter and quieter.

An automatic gain correction could look like this: only if too much clipping is measured within a defined time period do we scale every client input by a set amount before mixing. This would result in audible (smoothed) shifts in volume, so these shouldn't happen too frequently.
To start with, this fServerGain could be a fixed value, or be set dynamically via RPC. In that case, an external tool could perform the automated gain correction, or an admin could set it if necessary.

}
}
}
@@ -1020,7 +1020,7 @@ void CServer::MixEncodeTransmitData ( const int iChanCnt, const int iNumClients
const int maxPanDelay = MAX_DELAY_PANNING_SAMPLES;

int iPanDelL = 0, iPanDelR = 0, iPanDel;
int iLpan, iRpan, iPan;
int iLpan, iRpan;

for ( j = 0; j < iNumClients; j++ )
{
@@ -1036,21 +1036,20 @@ void CServer::MixEncodeTransmitData ( const int iChanCnt, const int iNumClients
const float fGainL = MathUtils::GetLeftPan ( fPan, false ) * fGain;
const float fGainR = MathUtils::GetRightPan ( fPan, false ) * fGain;

const bool isMono = vecNumAudioChannels[j] == 1;

if ( bDelayPan )
{
iPanDel = lround ( (float) ( 2 * maxPanDelay - 2 ) * ( vecvecfPannings[iChanCnt][j] - 0.5f ) );
iPanDelL = ( iPanDel > 0 ) ? iPanDel : 0;
iPanDelR = ( iPanDel < 0 ) ? -iPanDel : 0;
}

if ( vecNumAudioChannels[j] == 1 )
{
// mono: copy same mono data in both out stereo audio channels
for ( i = 0, k = 0; i < iServerFrameSizeSamples; i++, k += 2 )
if ( isMono )
{
// left/right channel
if ( bDelayPan )
// mono: copy same mono data in both out stereo audio channels
for ( i = 0, k = 0; i < iServerFrameSizeSamples; i++, k += 2 )
{
// left/right channel
// pan address shift

// left channel
@@ -1079,54 +1078,60 @@ void CServer::MixEncodeTransmitData ( const int iChanCnt, const int iNumClients
vecfIntermProcBuf[k + 1] += vecsData[iRpan] * fGainR;
}
}
else
{
vecfIntermProcBuf[k] += vecsData[i] * fGainL;
vecfIntermProcBuf[k + 1] += vecsData[i] * fGainR;
}
}
}
else
{
// stereo
for ( i = 0; i < ( 2 * iServerFrameSizeSamples ); i++ )
else
{
// left/right channel
if ( bDelayPan )
// stereo
for ( i = 0; i < ( 2 * iServerFrameSizeSamples ); i += 2 )
{
// pan address shift
if ( ( i & 1 ) == 0 )

iLpan = i - 2 * iPanDelL; // left channel
iRpan = ( i + 1 ) - 2 * iPanDelR; // right channel

// interleaved channels
if ( iLpan < 0 )
{
iPan = i - 2 * iPanDelL; // if even : left channel
// get from second
iLpan = iLpan + 2 * iServerFrameSizeSamples;
vecfIntermProcBuf[i] += vecsData2[iLpan] * fGain;
}
else
{
iPan = i - 2 * iPanDelR; // if odd : right channel
vecfIntermProcBuf[i] += vecsData[iLpan] * fGain;
}
// interleaved channels
if ( iPan < 0 )

if ( iRpan < 0 )
{
// get from second
iPan = iPan + 2 * iServerFrameSizeSamples;
vecfIntermProcBuf[i] += vecsData2[iPan] * fGain;
iRpan = iRpan + 2 * iServerFrameSizeSamples;
vecfIntermProcBuf[i + 1] += vecsData2[iRpan] * fGain;
}
else
{
vecfIntermProcBuf[i] += vecsData[iPan] * fGain;
vecfIntermProcBuf[i + 1] += vecsData[iRpan] * fGain;
}
}
else
}
}
else
{
if ( isMono )
{
// mono: copy same mono data in both out stereo audio channels
for ( i = 0, k = 0; i < iServerFrameSizeSamples; i++, k += 2 )
{
if ( ( i & 1 ) == 0 )
{
// if even : left channel
vecfIntermProcBuf[i] += vecsData[i] * fGainL;
}
else
{
// if odd : right channel
vecfIntermProcBuf[i] += vecsData[i] * fGainR;
}
Comment on lines -1120 to -1129
Member Author

Got rid of modulo computation

vecfIntermProcBuf[k] += vecsData[i] * fGainL;
vecfIntermProcBuf[k + 1] += vecsData[i] * fGainR;
}
}
else
{
for ( i = 0; i < ( 2 * iServerFrameSizeSamples ); i += 2 )
{
// left/right channel
vecfIntermProcBuf[i] += vecsData[i] * fGainL;
vecfIntermProcBuf[i + 1] += vecsData[i + 1] * fGainR;
}
}
}