TY - GEN
T1 - SpeechQoE
T2 - 20th ACM Conference on Embedded Networked Sensor Systems, SenSys 2022
AU - Wang, Chaowei
AU - Zhu, Huadi
AU - Li, Ming
N1 - Publisher Copyright:
© 2022 Owner/Author.
PY - 2022/11/6
Y1 - 2022/11/6
AB - Quality of Experience (QoE) assessment is a long-standing but still unresolved task. Existing approaches, especially for conversational voice services, are restricted to network-centric parameters. Their performance is hardly satisfactory because they fail to consider comprehensive QoE-related factors. Moreover, they build a one-for-all model that is uniform across individuals and thus cannot handle user diversity in QoE perception. This paper proposes a personalized QoE assessment model, SpeechQoE, which exploits a speaker's speech signals to infer the individual's perceived quality of voice services. SpeechQoE fundamentally addresses the drawbacks of conventional models: instead of enumerating and incorporating unlimited QoE-related factors, it takes as input speech signals that inherently carry the rich information needed to assess the speaker's QoE. SpeechQoE employs an efficient few-shot learning framework to quickly adapt the model to a new user. We additionally design a lightweight data synthesis scheme to minimize the data-collection overhead needed for model adaptation. A modular integration with a conventional parametric model is further implemented to avoid issues caused by a clean-slate data-driven approach. Our experiments show that SpeechQoE achieves an accuracy of 91.4% in QoE assessment, outperforming state-of-the-art solutions by a clear margin. As another contribution of this work, we build a dataset that, to our knowledge, is the first source of annotated audio tracks for QoE assessment of conversational calls.
UR - http://www.scopus.com/inward/record.url?scp=85147551435&partnerID=8YFLogxK
U2 - 10.1145/3560905.3568502
DO - 10.1145/3560905.3568502
M3 - Conference contribution
AN - SCOPUS:85147551435
T3 - SenSys 2022 - Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems
SP - 305
EP - 319
BT - SenSys 2022 - Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems
Y2 - 6 November 2022 through 9 November 2022
ER -