Efficient Blending of Large Language Models
Date
2025-06
Authors
Publisher
Indian Statistical Institute, Kolkata
Abstract
Due to the limited capabilities of single Large Language Models (LLMs), multiple LLMs can be employed in tandem for better reliability of answers. Blending refers to combining the strengths of various LLMs to make use of their complementary capabilities for generating high-quality responses. It is a non-trivial problem, and the task becomes even more difficult when aiming for minimal latency and supervising the blending components. The standard framework, LLM-Blender, approaches this in three stages: response generation, candidate selection via ranking, and response fusion through summarization. However, this pipeline faces two critical limitations: high latency due to repeated ranking steps, and heavy reliance on external, supervised components, including a learned encoder for ranking and a separate sequence-to-sequence summarizer for fusion.
In this thesis, we propose novel, efficient alternatives to overcome these challenges. The thesis comprises two works. First, we show that reducing the frequency of ranking within multi-turn conversations significantly improves latency with minimal degradation in output quality. Second, we introduce a peer-review-based response fusion mechanism, where LLMs collectively evaluate and revise each other's responses, removing the need for any externally trained rankers or summarizers. This collaborative method enables fully self-contained LLM blending without additional training or supervision.
We assess our proposed methods on the task of Conversational Question Answering across five multi-turn conversational benchmarks (ConvQuestions, Atlas-Converse, CoQA, QuAC, and DoQA), using ten diverse, publicly available open-weight LLMs. Experimental results demonstrate that our peer-review-driven framework with reduced ranking achieves quality on par with existing approaches while being substantially more efficient. Our work presents a step toward scalable, modular LLM ensembling for real-world open-domain dialogue systems.
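The peer-review fusion mechanism summarized above could be sketched roughly as follows. This is an illustrative outline only, not the thesis's actual implementation: the callables `query_model`, `review_model`, and `revise_model` are hypothetical stand-ins for real LLM calls, and the scoring/aggregation scheme is an assumption made for the sketch.

```python
def peer_review_blend(question, models, query_model, review_model, revise_model):
    """Blend LLM responses via mutual peer review, with no external ranker
    or summarizer. All three model-calling callables are hypothetical
    stand-ins for actual LLM invocations."""
    # Stage 1: each model drafts a candidate response to the question.
    candidates = {m: query_model(m, question) for m in models}

    # Stage 2: every model scores every peer's candidate (self-reviews
    # are skipped); scores are accumulated per candidate author.
    scores = {m: 0.0 for m in models}
    for reviewer in models:
        for author, response in candidates.items():
            if reviewer != author:
                scores[author] += review_model(reviewer, question, response)

    # Stage 3: the best-reviewed author revises its own draft using the
    # peers' candidates as feedback, producing the final blended answer.
    best = max(scores, key=scores.get)
    peer_feedback = [candidates[m] for m in models if m != best]
    return revise_model(best, question, candidates[best], peer_feedback)
```

In a real system the review scores would come from prompting each LLM to grade a peer's answer, and the revision step from prompting the winning LLM with the peers' drafts; the sketch only shows the control flow that replaces the learned ranker and sequence-to-sequence fuser.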
Description
Dissertation under the supervision of Dr. Debapriyo Majumdar and Dr. Amit Chintamani Awekar
Keywords
Large Language Models
Citation
52p.
