Efficient Blending of Large Language Models

dc.contributor.author: Chatterjee, Sandeep
dc.date.accessioned: 2025-07-22T09:31:42Z
dc.date.available: 2025-07-22T09:31:42Z
dc.date.issued: 2025-06
dc.description: Dissertation under the supervision of Dr. Debapriyo Majumdar and Dr. Amit Chintamani Awekar
dc.description.abstract: Due to the limited capabilities of single Large Language Models (LLMs), multiple LLMs can be employed in tandem for better reliability of answers. Blending refers to combining the strengths of various LLMs to make use of their complementary capabilities for generating high-quality responses. It is a non-trivial problem, and the task becomes even more difficult when aiming for minimal latency and supervising the blending components. The standard framework, LLM-Blender, approaches this in three stages: response generation, candidate selection via ranking, and response fusion through summarization. However, this pipeline faces two critical limitations: high latency due to repeated ranking steps, and heavy reliance on external, supervised components, including a learned encoder for ranking and a separate sequence-to-sequence summarizer for fusion.

In this thesis, we propose novel, efficient alternatives to overcome these challenges. This thesis comprises two works. First, we show that reducing the frequency of ranking within multi-turn conversations significantly improves latency with minimal degradation in output quality. Second, we introduce a peer-review-based response fusion mechanism, where LLMs collectively evaluate and revise each other's responses, removing the need for any externally trained rankers or summarizers. This collaborative method enables fully self-contained LLM blending without additional training or supervision.

We assess our proposed methods on the task of Conversational Question Answering across five multi-turn conversational benchmarks (ConvQuestions, Atlas-Converse, CoQA, QuAC, and DoQA) using ten diverse, publicly available open-weight LLMs. Experimental results demonstrate that our peer-review-driven framework with reduced ranking achieves quality on par with existing approaches while being substantially more efficient. Our work presents a step toward scalable, modular LLM ensembling for real-world open-domain dialogue systems.
dc.identifier.citation: 52 p.
dc.identifier.uri: http://hdl.handle.net/10263/7592
dc.language.iso: en
dc.publisher: Indian Statistical Institute, Kolkata
dc.relation.ispartofseries: MTech(CS) Dissertation;23-18
dc.subject: Large Language Models
dc.title: Efficient Blending of Large Language Models
dc.type: Other

Files

Original bundle

Name: Sandeep_Chatterjee_MTech_Thesis_ISI.pdf
Size: 2.99 MB
Format: Adobe Portable Document Format
Description: Dissertations - M Tech (CS)
Name: Sandeep_Chatterjee_MTech_Thesis-Plagiarism.pdf
Size: 1.01 MB
Format: Adobe Portable Document Format
Description: Plagiarism_report

License bundle

Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon to submission