This setting defaults to 70% and determines how closely the output matches the original voice. Higher similarity values may also reproduce artifacts and background noise from the source, particularly when the source audio is of poor quality.
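As a rough illustration of how such a setting might be represented in code, the sketch below models it as a fraction between 0.0 and 1.0 with the 70% default. The names `VoiceSettings`, `similarity`, and `settings_for_source`, and the lower value of 0.50 for noisy sources, are assumptions made for this example and do not come from any particular product's API.

```python
from dataclasses import dataclass

# The 70% default described above, expressed as a fraction.
DEFAULT_SIMILARITY = 0.70


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical settings container, not a real library's API."""

    similarity: float = DEFAULT_SIMILARITY

    def __post_init__(self) -> None:
        # Reject values outside the 0%..100% range.
        if not 0.0 <= self.similarity <= 1.0:
            raise ValueError("similarity must be between 0.0 and 1.0")


def settings_for_source(noisy_source: bool) -> VoiceSettings:
    """Use a lower similarity when the source audio is noisy.

    The 0.50 value is an illustrative assumption, not a recommendation
    taken from the documentation.
    """
    return VoiceSettings(similarity=0.50 if noisy_source else DEFAULT_SIMILARITY)
```

Lowering the value for a noisy recording trades some closeness to the original voice for a smaller chance of carrying its artifacts into the output.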