您好,欢迎来到划驼旅游。
搜索
您的当前位置:首页W06-3815

W06-3815

来源:划驼旅游
ContextComparisonasaMinimumCostFlowProblem

VivianTsangandSuzanneStevensonDepartmentofComputerScience

UniversityofToronto

Canada

vyctsang,suzanne@cs.utoronto.ca

Abstract

Comparingwordcontextsisakeycompo-nentofmanyNLPtasks,butrarelyisitusedinconjunctionwithadditionalonto-logicalknowledge.Oneproblemisthattheamountofoverheadrequiredcanbehigh.Inthispaper,weprovideagraphi-calmethodwhicheasilycombinesanon-tologywithcontextualinformation.Wetakeadvantageoftheintrinsicgraphicalstructureofanontologyforrepresentingacontext.Inaddition,weturntheon-tologyintoametricspace,suchthatsub-graphswithinit,whichrepresentcontexts,canbecompared.Wedeveloptwovari-antsofourgraphicalmethodforcompar-ingcontexts.Ouranalysisindicatesthatourmethodperformsthecomparisoneffi-cientlyandoffersacompetitivealternativetonon-graphicalmethods.

1Introduction

Manynaturallanguageproblemscanbecastasaproblemofcomparing“contexts”(unitsoftext).Forexample,thelocalcontextofawordcanbeusedtoresolveitsambiguity(e.g.,Sch¨utze,1998),assum-ingthatwordsusedinsimilarcontextsarecloselyrelatedsemantically(MillerandCharles,1991).Ex-tendingthemeaningofcontext,thecontentofadocumentmayrevealwhichdocumentclass(es)itbelongsto(e.g.,Xuetal.,2003).Inanyappli-cation,onceasensibleviewofcontextisformu-lated,thenextstepistochoosearepresentationthatmakescomparisonspossible.Forexample,inword

97

sensedisambiguation,acontextofanambiguousinstancecanberepresentedasavectorofthefre-quenciesofwordssurroundingit.Untilrecently,thedominantapproachhasbeenanon-graphicalone—contextcomparisonisreducedtoataskofmeasuringdistributionaldistancebetweencontextvectors.Thedifferenceinthefrequencycharacteristicsofcon-textsisusedasanindicatorofthesemanticdistancebetweenthem.

Wepresentagraphicalalternativethatcombinesbothdistributionalandontologicalknowledge.Webeginwiththeuseofadifferentcontextrepresen-tationthatallowseasyincorporationofontologicalinformation.Treatinganontologyasanetwork,wecanrepresentacontextasasetofnodesinthenet-work(i.e.,conceptsintheontology),eachwithaweight(i.e.,frequency).TocontrastourworkwiththatofNavigliandVelardi(2005)andMihalcea(2006),thegoalisnotmerelytoprovideagraph-icalrepresentationforacontextinwhichtherele-vantconceptsareconnected.Rather,contextsaretreatedasweightedsubgraphswithinalargergraphinwhichtheyareconnectedviaasetofpaths.Byin-corporatingthesemanticdistancebetweenindivid-ualconcepts,thegraph(representingtheontology)becomesametricspaceinwhichwecanmeasurethedistancebetweensubgraphs(representingthecon-textstobecompared).

Morespecifically,measuringthedistancebe-tweentwocontextscanbeviewedassolvingamin-imumcostflow(MCF)problembycalculatingtheamountof“effort”requiredfortransportingtheflowfromonecontexttotheother.Ourmethodhastheadvantageofincludingsemanticinformation(bymakinguseofthegraphicalstructureofanontol-ogy)withoutlosingdistributionalinformation(by

WorkshoponTextGraphs,atHLT-NAACL2006,pages97–104,

c2006AssociationforComputationalLinguisticsNewYorkCity,June2006.󰀁

usingtheconceptfrequenciesderivedfromcorpusdata).

Thisnetworkflowformulation,thoughsupport-ingtheinclusionofanontologyincontextcompari-son,isnotflexibleenough.Theproblemisrootedinthechoiceofconcept-to-conceptdistance(i.e.,thedistancebetweentwoconcepts,tocontrastitfromtheoverallsemanticdistancebetweentwocontexts).Certainconcept-to-conceptdistancesmayresultinadifficult-to-processnetworkwhichseverelycompro-misesefficiency.Toremedythis,weproposeanovelnetworktransformationmethodforconstructingapared-downnetworkwhichmimicsthestructureofthemoreprecisenetwork,butwithouttheexpensiveprocessingoranysignificantinformationlossasaresultofthetransformation.

Intheremainderofthispaper,wefirstpresenttheunderlyingnetworkflowframework,anddevelopamoreefficientvariantofit.Wethenevaluatetherobustnessofourmethodsonacontextcomparisontask.Finally,weconcludewithananalysisandsomefuturedirections.

2TheNetworkFlowMethod

2.1

MinimumCostFlow

AsastandardexampleofanMCFproblem,considerthegraphicalrepresentationofaroutemapfordeliv-eringfreshproducefromgrocers(supplynodes)tohomes(demandnodes).Theremainingnodes(e.g.,intersections,gasstations)haveneitherasupplynorademand.Assumingtherearesufficientsupplies,theoptimalsolutionistofindthecheapestsetofroutesfromgrocerstohomessuchthatalldemandsaresatisfied.

Mathematically,letbeaconnectednetwork,whereisthesetofnodes,andisthe

1setofedges.Eachedgehasacost,

whichisthedistanceoftheedge.Eachnodeisassociatedwithavaluesuchthatindicatesitsavailablesupply(),itsdemand(),orneither().Thegoalistofindasolutionforeachnodesuchthatalltheflowpassingthroughsatisfiesitssupplyordemandrequirement().Theflowpassingthroughnodeiscapturedbysuchthatwecanobservethecom-98

demand.Thecostoftheroutesbetweennodesisdeterminedbyasemanticdistancemeasuredefinedoveranytwonodesintheontology.Now,asinthegrocerydeliverydomain,thegoalistofindtheMCFfromsupplytodemand.

Wecantreatanyontologyasthetransportnet-work.Arelation(suchashyponymy)betweentwoconceptsonandeachisedgerepresentedcanbedefinedbyanedgeastheseman-,andthecostticdistancebetweenthetwoconcepts.Thisseman-ticdistancecanbeassimpleasthenumberofedgesseparatingtheconcepts,ormoresophisticated,suchasLin’s(1998)information-theoreticmeasure.(SeeBudanitskyandHirst(2006)forasurveyofsuchmeasures).

Numerousmethodsarepossibleforconvertingthewordfrequencyvectorofacontexttoaconceptfrequencyvector(i.e.,acontextprofile).Onesimplemethodistotransfereachelementinthewordvector(i.e.,thefrequencyofeachword)tothecorrespond-ingconceptsintheontology,resultinginavectorofconceptfrequencies.Inthispaper,wehavecho-senauniformdistributionofwordfrequencycountsamongconcepts,insteadofaweighteddistributiontowardstherelevantconceptsforaparticulartext.SincewewishtoevaluatethestrengthofourmethodalonewithoutanyadditionalNLPeffort,webypasstheissueofapproximatingthetruedistributionoftheconceptsviawordsensedisambiguationorclass-basedapproximationmethods,suchasthosebyLiandAbe(1998)andClarkandWeir(2002).

Tocalculatethedistancebetweentwoprofiles,weneedtocastoneprofileasthesupply(thatour)distanceandtheotherasthedemand().Noteissymmetric,sothechoiceofthesupplyandthedemandisarbitrary.Next,wemustdeterminethevalueofateachconceptnode;thisisjustthedifferencebetweenthe(normalized)supplyfre-quencyanddemandfrequency:

(4)

Thisformulayieldsthenetsupply/demand,,atnode.Recallthatourgoalistotransportallthesup-plytomeetthedemand—thefinalstepistodeter-minethecheapestroutesbetweenandsuchthattheconstraintsin(2)and(3)aresatisfied.Thetotal

distanceoftheroutes,ortheMCF,

ineqn.(1),isthedistancebetweenthetwocontextprofiles.

99

Finally,itisimportanttonotethattheMCFfor-mulationdoesnotsimplyfindtheshortestpaths

fromtheconceptnodesinthesupplytothoseinthedemand.Becauseaprofileisafrequency-weightedconceptvector,someconceptnodesareweightedmoreheavilythanothers,andtheroutesbetweensuchnodesacrossthetwoprofilesarealsoweightedmoreheavily.Indeed,ineqn.(1),thecostofeachroute,,isweightedby(howmuchsup-ply,orfrequencyweight,istransportedbetweennodesand).

3GraphicalIssues

Asalludedtointheintroduction,certainconcept-to-conceptdistancesposeaproblemtosolvingtheMCFproblemeasily.Thedetailsaredescribednext.3.1

Additivity

Intheory,ourmethodhastheflexibilitytoincorpo-ratedifferentconcept-to-conceptdistances.Theis-sueliesinthealgorithmsforsolvingMCFproblems.Existingalgorithmsaregreedy—theytakeastep-wise“localist”approachonthesetofedgesconnect-ingthesupplyandthedemand;i.e.,ateachnode,thecheapestoutgoingedgeisselected.Theassump-tionisthattheconcept-to-conceptdistancefunctionisadditive.Mathematically,foranypathfromnodetonode,,whereand,thedistancebetweennodesandisthesumofthedistanceoftheedgesalongthepath:

(5)

Theadditivityofaconcept-to-conceptdistanceen-tailsthatselectingthecheapestedgeateachstep

(i.e.,locally)yieldstheoverallcheapestsetofroutes(i.e.,globally).Notethatsomeofthemostsuccess-fulconcept-to-conceptdistancesproposedintheCLliteraturearenon-additive(e.g.,Lin,1998;Resnik,1995).Thisposesaprobleminsolvingournetworkflowproblem—theglobaldistancebetweenanycon-cepts,and,cannotbecorrectlydeterminedbythegreedymethod.

3.2

ConstructinganEquivalentBipartiteNetwork

Theissueofnon-additivedistancescanbeaddressedinthefollowingway.Wemaptherelevantportion

Figure2:Anillustrationofthetransformations(lefttoright)fromtheoriginalnetwork(a)tothebipartitenetwork(b),andfinally,tothenetworkproducedbyourtransformation(c),giventwoprofilesSandD.Nodeslabelledwitheither“S”or“D”belongtothecorrespondingprofile.Nodeslabelledwith“”or“”arejunctionnodes(seesection4.2).

ofthenetworkintoanewnetworksuchthattheconcept-to-conceptdistanceispreserved,butwith-outtheproblemintroducedbynon-additivity.Onepossiblesolutionistoconstructacompletebipar-titegraphbetweenthesupplynodesandthedemandnodes(thenodesinthetwocontextprofiles).Weset

inthebipartitegraphtothecostofeachedge

betheconcept-to-conceptdistancebetweenandintheoriginalnetwork.Sincethereisexactlyoneedgebetweenanypairofnodes,thenon-additivityisremovedentirely.(SeeFigures2(a)and2(b).)Now,wecanapplyanetworkflowsolveronthenewgraph.

However,oneproblemarisesfromperformingtheabovemapping—thereisaprocessingbottleneckasaresultofthequadraticincreaseinthenumberofedgesinthenewnetwork.Unfortunately,thoughtractable,polynomialcomplexityisnotalwaysprac-tical.Forexample,withanaverageof900nodesperprofile,making120profilecomparisonsinaddi-tiontonetworkre-structuringcantakeaslongas10days.2Ifwechoosetouseanon-additivedistance,themethoddescribedabovedoesnotscaleupwellforalargenumberofcomparisons.Next,wepresentamethodtoalleviatethecomplexityissue.

4NetworkTransformation

Onemethodofalleviatingthebottleneckistoreducetheprocessingloadfromgeneratingalargenumber

100

non-additivityissue,bygeneratinganedgewiththeexactconcept-to-conceptdistanceforeachpotentialnodecomparison,but,asnotedabove,istooinef-ficient.Oursolutionhereistoconstructanetworkthatusestheideaofapared-downA-shapedpathtomostlyavoidnon-additivity,butwithouttheineffi-ciencyofthecompletebipartitegraph.Thus,asex-plainedinmoredetailinthefollowingsubsections,wetradeofftheexactnessofthedistancecalculationagainsttheefficiencyofthenetworkconstruction.4.2

NetworkConstruction

onthetransformednetwork.Observethateachedge

,withcost,inthecompletebipartitenetwork,where,,isnowinsteadrepre-sentedbythreeedges:,,and,whereand.Thus,thetransformed

,becomes:distancebetweenand,

(6)

Inournetworkconstruction,weexploitthegeneral

notionofanA-shapedpathbetweenanytwonodes,butreplacethe“tip”oftheAwithtwonodes.Thenforeachnodeandinprofilesand,wegen-erateanedgefromstoanancestorof(theleft“branch”oftheA),anedgefromdtoanan-of(theright“branch”oftheA),andancestor

edgebetweenand(thetwonodesformingthe“elongatedtip”oftheA).Eachedgehastheexactconcept-to-conceptdistancefromtheoriginalnet-work,sothatthedistancebetweenanytwonodesandisthesumofthreeexactdistances.

Thesetofancestornodes,and,comprisethe“junction”pointsatwhichthesupplyfromcanbetransportedacrosstothenodesintosatisfytheirdemand.Thesetofjunctionnodes,,forapro-file,mustbeselectedsuchthatforeachnodein,containsatleastoneancestorof.(Seesection4.4fordetailsonthejunctionselectionpro-cess.)Theresultingnetworkisconstructedbydi-rectlyconnectingeachprofiletoitscorrespondingjunction,thenconnectingthetwojunctionsinthemiddle(Figure2(c)).

Thedifferencebetweenthecompletebipartitenetworkandthetransformednetworkhereisthat,insteadofconnectingeachnodeintoeverynodein,weconnecteachnodeintoeverynodein.ComparethetransformednetworkinFig-ure2(c)withthecompletebipartitenetworkinFig-ure2(b).Thecompletebipartitecomponentinthetransformednetwork(themiddleportionbetweenthejunctionnodeslabelledand)isconsid-erablysmallerinsize.Thus,thenumberofedgesinthetransformednetworkissignificantlyfeweraswell.

Next,wecanproceedtodefinethecostfunction

whereisthepreciseconcept-to-conceptdistancebetweenandintheoriginalnetwork.Oncewehavesetupthetransformednetwork,wecansolvetheMCFinthisnetwork,yieldingthedis-tancebetweenthetwo(supplyanddemand)profiles.

4.3DistanceDistortion

Becausethedistancebetweennodesandisnowcalculatedasthesumofthreedistances(eqn.(6)),somedistortionmayresultfornon-additiveconcept-to-conceptdistances.Toillustratethedistortionef-fect,considerJiangandConrath’s(1997)distance:

(7)

whereistheinformationcontentofanode,andisthelowestcommonsubsumerofnodesand.Thisdistancemeasuresthedif-ferenceininformationcontentbetweentheconceptsandtheirlowestcommonsubsumers.

Afterthetransformation,thedistanceisdistortedinthefollowingway.Ifandhavenocommonjunctionancestor,thenbecomes:

(8)

whereandarethejunctionancestorsofand,respectively.Otherwise,ifandshareacommonancestoratthejunction,then

becomes,

wherethetermineqn.(8)isre-placedby.Ineithercase,thetransformationreplacesthelowestcommonsubsumer

ineqn.(7)withsomeothercommonsubsumer

(or,mentionedabove).Un-less,thedistanceisdistorted

byusingalessprecisequantity,.

Notethattheinformationcontentofaconceptisgivenbyitsmaximumlikelihoodestimatebasedon

101

itsfrequencyinalargecorpus.Anincrementinthefrequencyofaconceptleadstoanincrementinthefrequencyofallitsancestors.Duetothefrequencypercolation,conceptswithasmalldepthtendtoac-cumulatehighercountsthanthosedeeperinthehi-erarchy(notethedifferenceindepth:

).Thus,weexpecttheinforma-tioncontentofaconcepttobehigherthanitsan-cestors,i.e.,aconceptismoresemanticallyspecificthanitsancestors,whichiscapturedbytheuseofthenegativefunctioninthedefinitionofIC.Thetransformeddistanceisdistortedaccordingly().

rithmsisquadratic(theformer)orcubic(thelatter)

inthenumberofnodesinanetwork,whichisunac-ceptablyexpensiveforourtransformationmethod.Notethattoensureeveryprofilenodehasanances-tornodeinthejunction,theselectionprocesshasalinearlowerbound.Tokeepthecostlow,itisbesttokeepalinearcomplexityforthejunctionselectionprocess.However,ifthisisnotpossible,itshouldbesignificantlylessexpensivethanaquadraticcom-plexity.Wewillempiricallyexploretheprocessfur-therinsection5.3.

5ContextComparison

Asalludedtoearlier,ournetworkflowmethodpro-videsanalternativetoapurelydistributionalandnon-graphicalapproachtocontextcomparison.Inthispaper,wewilltestbothvariantsofourmethod(withorwithoutthetransformationinsection4)inanamedisambiguationtaskinwhichthecontextwordswithinasmallwindowsurroundingtheam-biguouswordsarecompared.Ourpreliminaryanal-ysisshowsthatourgeneralnetworkflowframeworkisrobustandefficient.5.1

NameDisambiguation

4.4JunctionSelection

Selectionofjunctionnodesisakeycomponentofthenetworktransformation.Trivially,ajunctionconsistingofprofilenodesyieldsanetworkequiva-lenttothecompletebipartitenetwork.Thekeyistoselectajunctionthatisconsiderablysmallerinsizethanitscorrespondingprofile,hence,cuttingdownthenumberofedgesgenerated,whichresultsinsig-nificantsavingsincomplexity.

Notethatthereisatradeoffbetweentheover-allcomputationalefficiencyandthesimilaritybe-tweenthetransformednetworkandthecompletebi-partitenetwork.Thecloserthejunctionsaretothecorrespondingprofiles,thecloserthetransformednetworkresemblesthecompletebipartitenetwork.Thoughthedistancecalculationismoreaccurate,suchanetworkisalsomoreexpensivetoprocess.Ontheotherhand,therearefewernodesinajunc-tionasitapproachestherootlevel,butthereismoredistortioninthetransformedconcept-to-conceptdis-tance.Clearly,itisimportanttobalancethetwofac-tors.

Selectingjunctionnodesinvolvesfindingasmallersetofancestornodesrepresentingthepro-filenodesinahierarchy.Inotherwords,thejunc-tioncanbeviewedasanalternativerepresentationwhichisageneralizationoftheprofilenodes.Inadditiontotheprofilenodes,thejunctionnodesarealsoincludedinthetransformednetwork.Theymayprovideextrainformationaboutthecorrespondingcontext.

FindingageneralizationofaprofileisexploredintheworksofClarkandWeir(2002)andLiandAbe(1998).Unfortunately,thecomplexityofthesealgo-102

Thegoalfornamedisambiguationistoclassifyeachambiguousinstanceonthebasisofitssurroundingcontext.Oneapproachistouseanunsupervisedmethodsuchasclustering.Thisinvolvesmakingalargenumberofpairwisecomparisonsbetweenin-dividualcontexts.Giventhatthereisanoverheadtoincorporatingontologicalinformation,ournet-workflowmethoddoesnotcomputedistancesasef-ficientlyascalculatingapurelyarithmeticdistancesuchascosineorEuclideandistance.Ouralterna-tiveapproachistouseminimaltrainingdata.Us-ingahandfulofcontexts,wecanbuilda“goldstan-dard”profileforeachsenseofanambiguousnamebyusingthecontextwordsofasmallnumberofinstances.Wethencomparethecontextprofileofeachinstancetothegoldstandards.Eachinstanceisgiventhelabelofthegoldstandardprofiletowhichitscontextprofileistheclosest.5.2

ExperimentalSetup

Inournamedisambiguationexperiment,weusethedatacollectedbyPedersenetal.(2005)fortheirnamediscriminationtask.Thisdataistakenfrom

NamePairs

Ronaldo/DavidBeckham

0.74

Microsoft/IBM

0.56

Jordan/Egyptian

0.510.53

200(Full)0.800.730.77

200(Trans)0.88

0.98

0.75

0.97

0.76

0.750.76

0.830.820.990.99

Table1:Namedisambiguationresults(accuracy/F-measure)ataglance.Thebaselineistherelativefrequencyofthemajorityname.“200”and“100”givetheaveragedresults(overfivedifferentruns)using200and100randomlyselectedtraininginstancesperambiguousname.Theweightedaverageiscalculatedbasedonthenumberoftestinstancespertask.“Full”and“Trans”refertotheresultsusingthefullnetwork(pre-transformation)orthepared-downnetwork(withtransformation),respectively.

theAgenceFrancePressEnglishServiceportionoftheGigaWordEnglishcorpusdistributedbytheLin-guisticDataConsortium.Itconsistsofthecontextsofsixpairsofnames,including:thenamesoftwosoccerplayers(RonaldoandDavidBeckham);anethnicgroupandadiplomat(TajikandRolfEkeus);twocompanies(MicrosoftandIBM);twopoliticians(ShimonPeresandSlobodanMilosevic);anationandanationality(JordanandEgyptian);andtwocountries(FranceandJapan).ThesenamepairsareselectedbyPedersenetal.(2005)toreflectarangeofconfusabilitybetweennames.

Eachpairofnamesservesasoneofsixnamedisambiguationtasks.Eachnameinstancecon-sistsofacontextwindowof50words(25wordstotheleftandtotherightofthetargetname),withthetargetnameobfuscated.Forexample,forthetaskofdistinguishing“DavidBeckham”and“Ronaldo”,thetargetnameineachinstancebe-comes“David

Notethatthecomplexityofthisselectionprocessislinear,sinceallprofilenodesmustbeexaminedtoensuretheyhaveanancestorinthejunction;anyprofilenodeofwhichnojunctionnodeisanancestorisaddedtothejunction.Thisprocesscanonlybeavoidedbyusingjunctionnodesofzerodepthexclu-sively.

3

103

,whosedepthissmall.Junctionnodeswithasmalldepthdistortthedistancemorethanthosewithalargerdepth.Surprisingly,ourexperimentindicatesthatusingsuchnodesproducesequallygoodorbet-terperformance.Thissuggeststhatselectingajunc-tionwithalargerdepth,atleastforthedatainthistask,isnotnecessary.

SpeedImprovementIncomparisontoourre-portedrunningtimeonthepre-transformationnet-work(120comparisonsrunningfor10days),onthesamemachine,making12,000comparisonscannowbeaccomplishedwithintwohours.Intermsofcomplexity,ifwehaveprofilenodesandjunc-tionnodes,thenumberofedgestobeprocessedis

.Giventhatourjunctionshavesignif-icantlyfewernodesthantheoriginalprofiles,therunningtimeissignificantlylessthanquadraticinthenumberofprofilenodes.

tivetoapurelydistributionalandnon-graphicalap-proach.

Inouron-goingwork,wearefurtherexploringhowthechoiceofjunctioninfluencestheperfor-manceofdifferenttypesofconcept-to-conceptse-manticdistances.Forexample,wouldabottom-upjunctionselectionapproach(fromtheprofilenodesinsteadoffromtherootlevel)resultinbetterper-formance?Inaddition,weintendtoexaminethegraphicalpropertiesoftheindividualprofilesaswellastheroutesbetweentheconceptsacrossprofilesselectedbyournetworkflowmethods.Suchanaly-seswillhelpusgaininsightintothestrengths(andweaknesses)oftakingadvantageofagraphicalrep-resentationofcontextsaswellastreatinganontol-ogyasametricspaceforcontextcomparisons.

References

Budanitsky,A.andHirst,G.(2006).EvaluatingWordNet-basedmeasuresofsemanticdistance.ComputationalLinguistics.Toappear.

Clark,S.andWeir,D.(2002).Class-basedprobabilityestima-tionusingasemantichierarchy.ComputationalLinguistics,28(2):187–206.

Jiang,J.andConrath,D.(1997).Semanticsimilaritybasedoncorpusstatisticsandlexicaltaxonomy.InProceedingsontheInternationalConferenceonResearchinComputationalLinguistics,pages19–33.

Li,H.andAbe,N.(1998).Wordclusteringanddisambiguationbasedonco-occurrencedata.InProceedingsofCOLING-ACL1998,pages749–755.

Lin,D.(1998).Aninformation-theoreticdefinitionofsimilar-ity.InProceedingsofInternationalConferenceonMachineLearning.

Mihalcea,R.(2006).Randomwalksontextstructures.InPro-ceedingsofCICLing2006,pages249–262.

Miller,G.A.andCharles,W.G.(1991).Contextualcorrelatesofsemanticsimilarity.LanguageandCognitiveProcesses,6(1):1–28.

Navigli,R.andVelardi,P.(2005).Structuralsemanticinter-connections:Aknowledge-basedapproachtowordsensedisambiguation.IEEETransactionsonPatternAnalysisandMachineIntelligence,27(7).

Pedersen,T.,Purandare,A.,andKulkarni,A.(2005).Namediscriminationbyclusteringsimilarcontext.InProceedingsoftheSixthInternationalConferenceonIntelligentTextPro-cessingandComputationalLinguistics.

Resnik,P.(1995).Usinginformationcontenttoevaluatese-manticsimilarityinataxonomy.InProceedingsofthe14thInternationalJointConferenceonArtificialIntelligence.Sch¨utze,H.(1998).Automaticwordsensediscrimination.ComputationalLinguistics,24(1):97–123.

Xu,W.,Liu,X.,andGong,Y.(2003).Documentclusteringbasedonnon-negativematrixfactorization.InProceedingsofthe26thACMSIGIRConference.

7Conclusions

Wehavegivenanoverviewofournetworkflowfor-malismwhichseamlesslycombinesdistributionalandontologicalinformation.Givenasuitableon-tology,acontextvectorofwordfrequenciescanbetransformedintoacontextprofile—afrequencydistributionovertheconceptsintheontology.Incontrasttotraditionalnon-graphicalapproachestomeasuringonlythedistributionaldistancebetweencontextvectors,weprovideagraphicalformalismwhichincorporatesboththesemanticdistanceofthecomponentnodesaswellasthedistributionaldiffer-encesbetweenthecontextprofiles.Bytakingadvan-tageofthegraphicalstructureofanontology,ourmethodallowsasystematicandmeaningfulwayofabstractingoverwordsinacontext,andbyexten-sion,ameaningfulwayofcomparingcontexts.

Oneconcernwithourmethodinitspre-transformationformisitsinabilitytoincorporatesophisticatedconcept-to-conceptsemanticdistancesefficiently.Toremedythis,weproposeanoveltech-niquethatmimicsthestructureofthemorecompu-tationallyintensivenetwork.Ourpreliminaryeval-uationshowsthatthetransformationdoesnotham-perthemethod’sabilitytomakefine-grainedseman-ticdistinctions,andthecomputationalcomplexityisdrasticallyreducedaswell.Generally,ournetworkflowmethodpresentsahighlycompetitivealterna-104

因篇幅问题不能全部显示,请点此查看更多更全内容

Copyright © 2019- huatuo6.com 版权所有 湘ICP备2023023988号-11

违法及侵权请联系:TEL:199 1889 7713 E-MAIL:2724546146@qq.com

本站由北京市万商天勤律师事务所王兴未律师提供法律服务