Efficient Blending of Large Language Models
Date
2025-06
Authors
Publisher
Indian Statistical Institute, Kolkata
Abstract
Due to the limited capabilities of single Large Language Models (LLMs), multiple LLMs can be employed in tandem for better reliability of answers. Blending refers to combining the strengths of various LLMs to make use of their complementary capabilities for generating high-quality responses. It is a non-trivial problem, and the task becomes even more difficult when aiming for minimal latency and supervising the blending components. The standard framework, LLM-Blender, approaches this in three stages: response generation, candidate selection via ranking, and response fusion through summarization. However, this pipeline faces two critical limitations: high latency due to repeated ranking steps, and heavy reliance on external, supervised components, including a learned encoder for ranking and a separate sequence-to-sequence summarizer for fusion.
In this thesis, we propose novel, efficient alternatives to overcome these challenges. The thesis comprises two works. First, we show that reducing the frequency of ranking within multi-turn conversations significantly improves latency with minimal degradation in output quality. Second, we introduce a peer-review-based response fusion mechanism, where LLMs collectively evaluate and revise each other's responses, removing the need for any externally trained rankers or summarizers. This collaborative method enables fully self-contained LLM blending without additional training or supervision.
We assess our proposed methods on the task of Conversational Question Answering across five multi-turn conversational benchmarks (ConvQuestions, Atlas-Converse, CoQA, QuAC, and DoQA), using ten diverse, publicly available open-weight LLMs. Experimental results demonstrate that our peer-review-driven framework with reduced ranking achieves quality on par with existing approaches while being substantially more efficient. Our work presents a step toward scalable, modular LLM ensembling for real-world open-domain dialogue systems.
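The peer-review fusion mechanism summarized above could be sketched roughly as follows. This is an illustrative outline only, not the thesis's actual implementation: the callables `query_model`, `review_model`, and `revise_model` are hypothetical stand-ins for real LLM calls, and the scoring/aggregation scheme is an assumption made for the sketch.

```python
def peer_review_blend(question, models, query_model, review_model, revise_model):
    """Blend LLM responses via mutual peer review, with no external ranker
    or summarizer. All three model-calling callables are hypothetical
    stand-ins for actual LLM invocations."""
    # Stage 1: each model drafts a candidate response to the question.
    candidates = {m: query_model(m, question) for m in models}

    # Stage 2: every model scores every peer's candidate (self-reviews
    # are skipped); scores are accumulated per candidate author.
    scores = {m: 0.0 for m in models}
    for reviewer in models:
        for author, response in candidates.items():
            if reviewer != author:
                scores[author] += review_model(reviewer, question, response)

    # Stage 3: the best-reviewed author revises its own draft using the
    # peers' candidates as feedback, producing the final blended answer.
    best = max(scores, key=scores.get)
    peer_feedback = [candidates[m] for m in models if m != best]
    return revise_model(best, question, candidates[best], peer_feedback)
```

In a real system the review scores would come from prompting each LLM to grade a peer's answer, and the revision step from prompting the winning LLM with the peers' drafts; the sketch only shows the control flow that replaces the learned ranker and sequence-to-sequence fuser.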
Description
Dissertation under the supervision of Dr. Debapriyo Majumdar and Dr. Amit Chintamani Awekar
Keywords
Large Language Models
Citation
52p.
