Efficient Blending of Large Language Models

dc.contributor.author: Chatterjee, Sandeep
dc.date.accessioned: 2025-07-22T09:31:42Z
dc.date.available: 2025-07-22T09:31:42Z
dc.date.issued: 2025-06
dc.description: Dissertation under the supervision of Dr. Debapriyo Majumdar and Dr. Amit Chintamani Awekar
dc.description.abstract: Due to the limited capabilities of single Large Language Models (LLMs), multiple LLMs can be employed in tandem for better reliability of answers. Blending refers to combining the strengths of various LLMs to make use of their complementary capabilities for generating high-quality responses. It is a non-trivial problem, and the task becomes even more difficult when aiming for minimal latency and supervising the blending components. The standard framework, LLM-Blender, approaches this in three stages: response generation, candidate selection via ranking, and response fusion through summarization. However, this pipeline faces two critical limitations: high latency due to repeated ranking steps, and heavy reliance on external, supervised components, including a learned encoder for ranking and a separate sequence-to-sequence summarizer for fusion.

In this thesis, we propose novel, efficient alternatives to overcome these challenges. This thesis comprises two works. First, we show that reducing the frequency of ranking within multi-turn conversations significantly improves latency with minimal degradation in output quality. Second, we introduce a peer-review-based response fusion mechanism, where LLMs collectively evaluate and revise each other's responses, removing the need for any externally trained rankers or summarizers. This collaborative method enables fully self-contained LLM blending without additional training or supervision.

We assess our proposed methods on the task of Conversational Question Answering across five multi-turn conversational benchmarks (ConvQuestions, Atlas-Converse, CoQA, QuAC, and DoQA) using ten diverse, publicly available open-weight LLMs. Experimental results demonstrate that our peer-review-driven framework with reduced ranking achieves quality on par with existing approaches while being substantially more efficient. Our work presents a step toward scalable, modular LLM ensembling for real-world open-domain dialogue systems.
dc.identifier.citation: 52 p.
dc.identifier.uri: http://hdl.handle.net/10263/7592
dc.language.iso: en
dc.publisher: Indian Statistical Institute, Kolkata
dc.relation.ispartofseries: MTech(CS) Dissertation;23-18
dc.subject: Large Language Models
dc.title: Efficient Blending of Large Language Models
dc.type: Other

Files

Original bundle

Name: Sandeep_Chatterjee_MTech_Thesis_ISI.pdf
Size: 2.99 MB
Format: Adobe Portable Document Format
Description: Dissertations - M Tech (CS)
Name: Sandeep_Chatterjee_MTech_Thesis-Plagiarism.pdf
Size: 1.01 MB
Format: Adobe Portable Document Format
Description: Plagiarism_report

License bundle

Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon to submission