Cross docking for libraries with a depot


University of Groningen


van der Heide, G.; Roodbergen, K.J.; van Foreest, N.D.

Published in:

European Journal of Operational Research

DOI:

10.1016/j.ejor.2020.08.034


Document Version

Publisher's PDF, also known as Version of record

Publication date:

2021


Citation for published version (APA):

van der Heide, G., Roodbergen, K. J., & van Foreest, N. D. (2021). Cross docking for libraries with a depot.

European Journal of Operational Research, 290(2), 749-765. https://doi.org/10.1016/j.ejor.2020.08.034


European Journal of Operational Research, journal homepage: www.elsevier.com/locate/ejor

Innovative Applications of O.R.

Cross docking for libraries with a depot

G. Van der Heide ∗, K. J. Roodbergen, N. D. Van Foreest

University of Groningen, P.O. Box 800, Groningen 9700 AV, the Netherlands

Article info

Article history: Received 8 March 2019; Accepted 18 August 2020; Available online 25 August 2020

Keywords: Inventory control; Dynamic programming; Heuristics; Libraries; Rentals

Abstract

Library organizations in the Netherlands show an increasing interest to employ depots for low-cost storage and demand fulfillment of item requests. Typically, all libraries in an organization have a shared catalog, and, on local unavailability, requests can be shipped from elsewhere in the organization. The depot can be used to consolidate shipment requests by making tours along all libraries, delivering requested items, but also picking up items that have to be stored at the depot, or that have to be shipped from one library to another. Cross docking and delayed shipments are two preferred methods for fulfilling requests that cannot be directly met using on-hand stock at the depot. In this paper, we compare these two methods from an inventory control perspective. We model the library system as a Markov decision process. For one- and two-location systems, we derive analytical results for the average-cost optimal policy, showing that the decision to store items from the location at the depot satisfies a threshold structure depending on the number of rented items. For larger instances, an effective heuristic is proposed exploiting this threshold structure. In numerical experiments, important managerial insights are obtained by comparing cross docking and delayed shipments in different situations. Cross docking is shown to add most value in systems with low total stock, however, delayed shipments may achieve similar costs as cross docking when stock is high or when tours frequently visit all locations. Furthermore, effective decisions can be based on simple model formulations with memoryless rental time distributions.

1. Introduction

People read fewer books and, as a consequence, public library organizations face a declining membership and need to cut budgets due to lower revenues (Lammers, 2020). Libraries are simultaneously seeking new ways to contribute to society, for example, on social inclusion, equal access to information, and freedom of expression (Audunson et al., 2019). For this purpose, new activities are initiated and space in the library is reserved, for example, to host meetings. These trends induce a reduction in the number of books that are on display in libraries. In this societal context, we study the following setting. We consider a network of libraries cooperating in a system of interlibrary loans, a service where a customer of one library can borrow books from another library. Such a system can serve to widen the variety of books that each library can make available to its customers. Furthermore, the libraries engage in jointly managing their collections. An extensive description of the essential underlying assumptions of our setting is given in Appendix A, where we also provide a number of examples from practice.

∗ Corresponding author.

E-mail addresses: g.van.der.heide@rug.nl (G. Van der Heide), k.j.roodbergen@rug.nl (K.J. Roodbergen), n.d.van.foreest@rug.nl (N.D. Van Foreest).

There is one depot, i.e., a central storage facility that supports the network of libraries. If a customer is looking for a particular book, there are several options. First, the customer may retrieve the book locally, i.e., from the shelves of the library where the customer is a member. Second, if there is no stock locally, the customer may request the book from the network. If the book is available at the depot, it can be transshipped from the depot to the local library. Finally, the book can be transshipped from another library to the local library. In all cases, the book is picked up and returned by the customer at the local library. This configuration is common in the Netherlands. It is also seen elsewhere, for example, in San Francisco where a central library supports 26 branch libraries located in residential neighborhoods (Apte & Mason, 2006), and in New York where a central processing facility exists that connects 150 involved libraries (Quandt, 2017).

In library networks with a depot, transportation usually takes place several times a week. The three types of transportation in the system are shipments (transport from the depot to meet requests at libraries), take-backs (transport from the libraries to store items at the depot), and lateral transshipments (transport between libraries to meet requests). Take-backs occur either with the intention of longer-term storage of books, or to pre-position books for upcoming demand in the network. Every transportation leg either originates at or is destined to the depot; books are not directly moved from one library to another library. Hence a lateral transshipment requires two transportation legs, first from the originating library to the depot, and second from the depot to the destination library. We distinguish two ways of handling lateral transshipments at the depot.

1. Cross docking. Two tours visit all libraries on the same day. All items that need to be laterally transshipped are exchanged at the depot in between the tours.

2. Delayed shipments. One tour visits all libraries. All items that need to be laterally transshipped are stored at the depot for one period and shipped at the next transport opportunity.

Numerical results and heuristics for delayed shipments can be found in Van der Heide, Roodbergen, and Van Foreest (2017). We introduce cross docking in this paper. We consider this system from an inventory perspective and optimize the operational decisions for shipments, cross docking, and take-backs. Our contributions are as follows. We derive the structure of the average cost optimal policy for a one-location problem and, under a mild restriction, for a two-location problem. The optimal policy for take-backs can be characterized by a series of thresholds on the on-hand inventories that depend on the number of loaned items at each location. Based on this threshold structure and other numerical insights, we develop a heuristic for a general number of locations. In numerical experiments, the heuristic is shown to be on average within 1% from the optimal costs.

Various managerial insights are provided by comparing situations with cross docking and delayed shipments. We find that, if sufficient stock is available in the system and the depot is frequently resupplied, both systems are equally effective. However, if stock in the system is quite low, then the cross-docking option is superior, which can be important considering the decreasing stock at libraries. The insights from this paper can assist library professionals to determine the best option for their situation. Other managerial insights concern the added value of the storage possibility at the depot, and the required information about the duration of borrowing for supporting effective decision making.

Depots are used similarly in other sectors. Construction companies, for example, share their expensive specialized equipment between their construction sites. Local hardware stores allow next-day delivery of rarely used tools from a nearby depot, while keeping frequently used tools in stock. An important factor that distinguishes libraries from other rental systems, is the higher willingness of customers to wait. In rental systems for products such as cars, clothing, and jewelry, customers may switch to a competitor on product unavailability, resulting in lost sales. Another difference is possible downtime for maintenance and cleaning of returned products. As also several similarities exist, and analytical results and insights in this paper may serve as inspiration for other rental systems, we deploy the common term “to rent” rather than “to borrow” or “to loan” in the remainder of this paper.

The outline of the article is as follows. Relevant literature is reviewed in Section 2. Then, in Section 3, we present the main assumptions and formulate the model as a Markov decision process. Analytical results for the optimal policy for the single-location and two-location problem are derived in Sections 4 and 5. In Section 6, a heuristic for the general problem is developed. Several experiments are carried out in Section 7, in order to investigate the performance of the heuristic and to gain the discussed managerial insights. Finally, Section 8 provides conclusions and directions for further research.

2. Literature

Since we consider a stochastic multilocation rental problem with a depot, we first review recent literature on stochastic rental models. Afterward, we discuss closely related stochastic inventory control models.

Stochastic rental models can be grouped into single and multilocation models, with or without a depot. Single location rental models involve no depot and deal with particular issues such as setting the total rental stock for a finite time horizon (Pasternack & Drezner, 1999; Slaugh, Biller, & Tayur, 2016), allocating demand to different customer classes (Jain, Moinzadeh, & Dumrongsiri, 2015), and analyzing usage of newly introduced products under a heterogeneous customer base (Bassamboo, Kumar, & Randhawa, 2009).

Multilocation rental models without a depot can be divided into two main streams. The first stream concerns optimizing rental fleets. For a finite horizon, Baron, Hajizadeh, and Milner (2011) optimize the allocation of rental products to several rental locations under various demand and return patterns. Transferring products between locations during the horizon is not considered. Models optimizing rental stock under a long-run average cost criterion are typically based on queueing theory. Given a policy for vehicle repositioning, Papier and Thonemann (2008) and George and Xia (2011) determine the long-run optimal fleet size. Long-run cost expressions are based on the availability of rental stock in loss models (Papier & Thonemann, 2008) and closed queueing networks (George & Xia, 2011).

The second stream concerns repositioning operations for rental stock. Such problems are commonly modeled as a Markov decision process (MDP) (Puterman, 2009). For a two location vehicle rental system, Li and Tao (2010) determine optimal fleet-sizes as well as the optimal policy for repositioning vehicles at the end of every period. Brinkmann, Ulmer, and Mattfeld (2019) and Legros (2019) apply MDPs for dynamic repositioning in bike-sharing systems. For a library setting, Van der Heide and Roodbergen (2013) optimize the trade-off between costs for lateral transshipment (in response to demand) and costs for repositioning (in anticipation of demand), without considering an option for low-cost storage.

Despite its practical relevance, only a limited number of authors have considered the use of a depot in multilocation rental models. Van der Heide, Van Foreest, and Roodbergen (2018) apply a queueing approach for the tactical problem of optimizing the inventory levels at the depot and each rental location under various types of backordering. However, it is not possible to dynamically reposition inventory based on the system state. Most related to our work is Van der Heide et al. (2017), who study a library system making use of delayed shipments. The authors numerically investigate optimal decisions for shipments and take-backs by solving MDPs. Numerical examples show that the optimal take-back policy has a state-dependent threshold structure, however, no analytical proofs are provided for the optimal policy structure. In this paper, we prove the policy structure in important special cases with one and two locations. In addition, we consider cross docking in our model and compare it to delayed shipments to generate new managerial insights.

Now we discuss related models in other application areas. In spare parts, a closely related concept is a quick-response warehouse that carries out shipments of spare parts to locations in response to stock-outs (Axsäter, Howard, & Marklund, 2013). Howard, Marklund, Tan, and Reijnen (2015) consider the use of pipeline information and optimize threshold policies for shipments. Demand is backordered if an order at a local warehouse will be delivered before a threshold time, otherwise it is met with a shipment from the quick-response warehouse. While we consider a less complex shipment policy, i.e., meet all demand, our rental system has several complicating factors not addressed in the spare parts system. Namely, there are stochastic and state-dependent returns, it is possible to cross dock, and the analytically convenient property that inventory positions at each location are constant does not hold due to dynamic take-back decisions.

Another related problem is the repositioning of empty trucks in a hub-and-spoke system. Du and Hall (1997) determine effective heuristic threshold policies for sending empty trucks from the hub to the spokes and vice versa. Song and Carter (2008) provide the optimal control policy for a system with two spokes. By decomposing the system into several systems with a single spoke and a hub, they derive a heuristic policy that works well. We apply a similar approach in our heuristic by using structural results from the one location problem. In these papers, trucks are transferred between hubs and spokes, which in our rental terminology implies that products rented from a location are by definition returned to the depot and vice versa. The dynamics of such a hub-and-spoke system differ from the typical rental system where products are returned to the location they were originally rented from and thus require different policies.

In essence, the rental model can be seen as a lateral transshipment model with a specific transshipment structure; see Paterson, Kiesmüller, Teunter, and Glazebrook (2011) for an extensive review. The combination of lateral transshipment and returns is rarely considered. Ching, Yuen, and Loh (2003) and Tai and Ching (2014) consider lateral transshipment in combination with an exogenous return process, however, this does not appropriately capture the endogeneity of the return process in rental systems. Without returns, Wee and Dada (2005) show a threshold policy for a system with lateral transshipment between one location and one depot. We derive similar threshold policies, however, in our case the policy is state-dependent.

Inventory-routing problems also study the trade-off between transportation and inventory costs in systems with multiple stock points (Coelho, Cordeau, & Laporte, 2014). In some cases, lateral transshipment is also possible (Coelho, Cordeau, & Laporte, 2012). These problems are typically solved using a rolling-horizon approach, solving deterministic problems every period by substituting in forecasts of the stochastic demand. Recent literature on routing deals with dynamic dispatching of demanded products from a depot to delivery points, usually under capacity restrictions of vehicles or delivery points (Rivera & Mes, 2017; Ulmer & Streng, 2019; Van Heeswijk, Mes, & Schutten, 2017). Though not considered here, dynamic dispatching can be an interesting method for fulfilling online demand in rental systems with a depot.

3. Model

Underlying our model are nine assumptions, which are all practically relevant. For brevity, we simply list these assumptions here, while an extended explanation and justification is presented in Appendix A. We assume the following for the setting in which libraries operate:

• Libraries work together and share an inventory.

• Libraries ship substantial amounts of books to other libraries every day.

• Libraries face significant inventory holding costs.

• There exists a (central) depot to store books.

• The depot must be used if a book is shipped from one library to another.

• Costs for shipping books increase linearly with the number of books shipped.

• Return times are more or less independent of rental times.

• Customers are willing to wait an infinite amount of time for a book.

• No preemptive supply of books is permitted, even if a library runs empty.

Fig. 1. A multilocation rental system with a depot and n rental locations.

In the remainder of this section, we start by describing the problem setting and by motivating the most relevant modeling choices. Afterward, we provide a mathematical formulation in terms of a Markov decision process by describing the state, transitions, actions, and costs.

3.1. Model description

We consider a rental system with $n$ rental locations and a depot, depicted in Fig. 1, where periodically transport takes place between the depot (index 0) and the rental locations. We are interested in the policy minimizing the long-run average cost of the system, hence we consider a periodic review model with an infinite horizon. We restrict the analysis to a single product type of which a finite number of items are available. We can repeat the same analysis for other product types.

Every period, customers demand and return items at the rental locations. Demand at each location follows a distribution that is discrete, finite, nonnegative, stationary, and state-independent. Demand distributions may differ between locations. Customers either demand items online or by visiting the rental location in person. Regardless of its source, demand is met immediately if an item is on hand (either picked up by a customer or kept aside for an online request). If no items are on hand, customers can request a delivery of the item at the next delivery moment, provided the request is placed before the order deadline. The time between the order deadline and the next delivery moment is used to carry out any necessary transport. We consider a subscription-based rental system, so customers do not switch to competitors on a stock-out and are aware that not every request can be delivered immediately. Customers receive a delivery notification when their requested items are bound to arrive and are assumed to pick up these soon after the delivery moment.

Rented items return at the same location where they have been demanded and can be rented to another customer in the same period. Customers rent items for a stochastic number of periods, motivated by public library transaction data for the Groningen province in the Netherlands. Due to different visiting frequencies of customers and the possibility to request deadline extensions, there is a considerable variation in rental times; in fact, we found that only 25% of all transactions are returned in the last week of the three-week deadline. The rental time distribution is equal for each location since rental time mostly depends on product rather than location. For ease of presenting the model and for analytical tractability, we assume the rental time distribution is memoryless, i.e., geometrically distributed. This means we only track the total number of rented items at each location, rather than the exact rental time of each individual rented item. In an experiment in Section 7.7, we show that this choice leads to reasonable decisions even if the true rental time distribution is not memoryless.

In order to maximize service and minimize waiting time for customers, we assume all delivery requests are met whenever possible. The delivery requests are handled as follows. First, any returns at a location are allocated directly to delivery requests at that same location (no transport takes place). Second, any remaining delivery requests are met by a shipment from the depot, using available stock at the depot. Third, if the depot has insufficient stock, then the request is met with on-hand stock from another location. Besides meeting requests, it is possible to carry out take-backs to resupply the depot for future shipment requests and to deal with excess inventories at the rental locations. As indicated in Section 1, all transportation goes through the depot and cross docking is used to deal with lateral transshipment requests. We exclude the possibility to ship more items to a location than requested, because under our assumptions it is better to store items at the depot and ship them when they are needed.

Fig. 2. Example of a state during the transition and action phase.

The costs of the rental system are modeled as follows. Rental locations in our model are uncapacitated, so we model the (lack of) capacity in reality through a holding cost, which varies between locations and is higher for locations where capacity is tighter. The depot has the lowest holding cost per unit of inventory, since it has the largest storage space of all locations and it does not have to keep its assortment on display. Customer dissatisfaction from waiting is modeled by a backorder cost, incurred each period a requested item is not delivered. Because many different product types are transported together, typically every location has to be visited every period. Therefore, fixed costs for driving to a location and delivering crates with items cannot be avoided. The variable costs are due to handling, because every item needs to be picked manually and scanned on pick-up and delivery. Therefore, from the perspective of a single product type, it is reasonable to assume the transportation cost is linear. In addition, there is a cross-docking handling cost, measuring the extra cost for exchanging items at the depot in between tours compared to a regular shipment. Usually, this exchange involves manual searching from crates, which is more difficult than systematically picking items from the shelves at the depot, hence the cross-docking handling cost is nonnegative. All cost parameters are linear, and for our long-run average cost analysis they are additionally assumed to be finite and stationary.

3.2. State variable

In period $t$, the state of the system is given by the tuple $S_t = (x_{0t}, x_t, y_t)$. Here, $x_{0t}$ represents the stock level at the depot. The vectors $x_t = (x_{1t}, \ldots, x_{nt})$ and $y_t = (y_{1t}, \ldots, y_{nt})$ represent the stock levels $x_{it}$ and number of rented items $y_{it}$ at location $i$, $i = 1, \ldots, n$. If $x_{it} < 0$, then location $i$ has unmet requests. Define $(x)^+ = \max\{x, 0\}$ and $(x)^- = \max\{-x, 0\}$ as the positive and negative part function, taken element-wise for vectors. The total number of items in the rental system is denoted $K$, hence, for any state $S_t$, it must hold that $x_{0t} + \sum_{i=1}^{n} (x_{it})^+ + \sum_{i=1}^{n} y_{it} = K$. Note that due to this restriction one dimension can be dropped from the state variable, but we will not do that here to keep the presentation as simple as possible.
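The conservation restriction on $K$ can be illustrated with a minimal Python sketch. The function names and the numerical state are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def total_items(x0, x, y):
    """Total number of items K implied by a state (x0, x, y).

    Only on-hand stock counts, so the positive part of each x_i is used;
    negative x_i are unmet requests, not physical items.
    """
    return x0 + sum(max(xi, 0) for xi in x) + sum(y)


def depot_stock_from_total(K, x, y):
    """Recover the depot stock x0 from K, x and y (the 'dropped' dimension)."""
    return K - sum(max(xi, 0) for xi in x) - sum(y)


# Hypothetical three-location state: depot holds 1 item, location 2 has 2 unmet requests.
K = total_items(1, [1, -2, 1], [1, 2, 0])
```

In this sketch `depot_stock_from_total(K, [1, -2, 1], [1, 2, 0])` gives back the depot stock, which is the sense in which one state dimension is redundant.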

Each period consists of a transition phase followed by an action phase. In order to distinguish between these two phases, we indicate the state variable after the action phase by a prime, i.e., the state after actions in period $t$ is given by $S'_t = (x'_{0t}, x'_t, y'_t)$.

3.3. Transition phase

In the transition phase, each location faces stochastic demands and returns by customers. The demands at each location during period $t$ are denoted $D_t = (D_{1t}, \ldots, D_{nt})$. Analogously, the returns during period $t$ are denoted $R_t = (R_{1t}, \ldots, R_{nt})$. The success probability of the geometric rental time distribution is denoted $p$, which implies that $R_{it}$ is Binomial($y_{it}$, $p$) distributed if there are $y_{it}$ rented items at location $i$.
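Written out, the binomial return distribution takes the following form. This is a standard identity, not an additional model assumption; the symbol $r$ for a realized number of returns is our notation, and the form presumes (as the binomial distribution implies) that rented items return independently of one another.

```latex
\[
  P(R_{it} = r) = \binom{y_{it}}{r}\, p^{r} (1-p)^{\,y_{it}-r},
  \quad r = 0, 1, \ldots, y_{it},
  \qquad
  E[R_{it}] = p\, y_{it}.
\]
```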

The left block in Fig. 2 shows an example of the state before and after the transition phase. Rental locations and the depot are indicated by triangles. Stock levels are depicted as white squares and rented items as black squares. Similarly, demands are indicated as white circles and returns as black circles. For example, location 1 has 0 on-hand and 2 rented items before the transition phase. Because 1 item is demanded and 2 are returned during the transition phase, location 1 ends up with 1 on-hand and 1 rented item after the transition phase. In location 2, demand exceeds the available stock, hence the stock level after the transition phase becomes negative. The depot faces no demand and therefore its stock level does not change.

Expressed in mathematics, given a post-action state $S'_{t-1}$, the state variable $S_t$ after the transition phase evolves according to

$$x_{0t} = x'_{0,t-1}, \tag{1}$$

$$x_t = x'_{t-1} + R_t - D_t, \tag{2}$$

$$y_t = y'_{t-1} + (x'_{t-1})^+ - (x_t)^+. \tag{3}$$

Eqs. (1) and (2) are trivial. To derive (3), note that on-hand items at location $i$ increase by $(x_{it})^+ - (x'_{i,t-1})^+$ during the transition phase, and consequentially, rented items must decrease by the same number.
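As an illustration, Eqs. (1)-(3) can be sketched in Python as a single function. The function name and argument layout are our own; the values for location 1 follow the Fig. 2 description in the text, while the values for location 2 and the depot are assumed for the sake of the example.

```python
def transition(x0_post, x_post, y_post, demands, returns):
    """Apply Eqs. (1)-(3) to a post-action state of the previous period."""
    pos = lambda v: max(v, 0)
    x0 = x0_post                                                     # Eq. (1)
    x = [xi + r - d for xi, r, d in zip(x_post, returns, demands)]   # Eq. (2)
    # Eq. (3): rented items fall by the increase in on-hand stock.
    y = [yi + pos(xp) - pos(xn) for yi, xp, xn in zip(y_post, x_post, x)]
    return x0, x, y


# Location 1: 0 on hand, 2 rented, demand 1, returns 2 (as in the text).
# Location 2: assumed 1 on hand, 1 rented, demand 3, no returns.
state = transition(1, [0, 1], [2, 1], demands=[1, 3], returns=[2, 0])
```

Location 1 ends with 1 on-hand and 1 rented item, matching the text, and the total number of items in the system is the same before and after the transition.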

3.4. Action phase

In the action phase, decisions are made to carry out transportation actions. The action vector $a_t = (a_{1t}, \ldots, a_{nt})$ specifies the number of items taken back from each location to the depot. If $a_{it}$ is negative, this number is shipped from the depot to location $i$. If we let $a_t = (a_t)^+ - (a_t)^-$ then $(a_t)^+$ and $(a_t)^-$ can be interpreted as the number of items taken back and shipped, respectively. We have the following constraints on the action:

$$(a_t)^+ \le (x_t)^+, \tag{4}$$

$$(a_t)^- \le (x_t)^-, \tag{5}$$

$$\sum_{i=1}^{n} (a_{it})^- \le x_{0t} + \sum_{i=1}^{n} (a_{it})^+, \tag{6}$$

$$\sum_{i=1}^{n} (a_{it})^- = \min\left\{ \sum_{i=1}^{n} (x_{it})^-,\; x_{0t} + \sum_{i=1}^{n} (x_{it})^+ \right\}. \tag{7}$$

Constraint (4) ensures not taking back more from any location than is on hand, while constraint (5) prevents shipping more to a location than there are backorders. Constraint (6) prevents shipping more from the depot than is available after the take-back actions. Finally, constraint (7) ensures that all requests receive a shipment, unless the total on-hand inventory in the system is too small.
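A minimal Python sketch of a feasibility check for constraints (4)-(7) is given below. The function name is hypothetical, and the example state (depot stock 1, stock levels 1, -2, 1) is an assumption chosen to be consistent with the action discussed for Fig. 2.

```python
def is_feasible(x0, x, a):
    """Check constraints (4)-(7) for action a in state (x0, x)."""
    pos = lambda v: max(v, 0)
    neg = lambda v: max(-v, 0)
    takebacks_ok = all(pos(ai) <= pos(xi) for ai, xi in zip(a, x))   # (4)
    shipments_ok = all(neg(ai) <= neg(xi) for ai, xi in zip(a, x))   # (5)
    shipped = sum(neg(ai) for ai in a)
    taken_back = sum(pos(ai) for ai in a)
    depot_ok = shipped <= x0 + taken_back                            # (6)
    requests = sum(neg(xi) for xi in x)
    on_hand = x0 + sum(pos(xi) for xi in x)
    all_met = shipped == min(requests, on_hand)                      # (7)
    return takebacks_ok and shipments_ok and depot_ok and all_met
```

With `x0 = 1` and `x = [1, -2, 1]`, the action `[1, -2, 1]` passes all four checks, `[0, -2, 0]` fails (6) because the depot alone cannot cover two shipments, and `[0, -1, 0]` fails (7) because a request is left unmet although stock is available.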

The right block in Fig. 2 shows an example for the action phase. First the action is shown, with the arrows indicating how many items are transported over an edge and in which direction. To the right, the resulting state after completing the action phase is shown. Here, the chosen action is $a_t = (1, -2, 1)$. Location 2 has a stock level of $-2$, so one item is shipped from the depot and another item is cross docked from another location with on-hand stock. In addition, one item is taken back from the locations to resupply the depot. All requests at location 2 are now met, hence the rented items there increase by 2. The stock at the depot remains as is, because 1 item was shipped, but 1 was also taken back from another location.

If $S_t$ is the state after transitions, the state after actions is given by

$$x'_{0t} = x_{0t} + \sum_{i=1}^{n} a_{it}, \tag{8}$$

$$x'_t = x_t - a_t, \tag{9}$$

$$y'_t = y_t + (a_t)^-. \tag{10}$$

In (8), inventory at the depot follows from the net difference between shipments and take-backs. In (9), the inventory position at the locations changes by the amounts transferred to or from that location. In (10), all shipped items are allocated to outstanding requests, so that the rented items increase by $(a_t)^-$. Any remaining unmet demand is backordered. After the action phase is completed, costs are incurred and a new period starts with a transition phase.
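Eqs. (8)-(10) can be sketched in Python as follows. The function name is ours, and the numbers of rented items at locations 2 and 3 are assumed, since the text only describes part of the Fig. 2 state.

```python
def apply_action(x0, x, y, a):
    """Apply Eqs. (8)-(10): a_i > 0 is a take-back, a_i < 0 a shipment."""
    neg = lambda v: max(-v, 0)
    x0_post = x0 + sum(a)                               # Eq. (8)
    x_post = [xi - ai for xi, ai in zip(x, a)]          # Eq. (9)
    y_post = [yi + neg(ai) for yi, ai in zip(y, a)]     # Eq. (10)
    return x0_post, x_post, y_post


# Action (1, -2, 1) from the Fig. 2 discussion, on an assumed state.
post_state = apply_action(1, [1, -2, 1], [1, 0, 1], [1, -2, 1])
```

In this example the depot stock stays at 1, all locations end with zero stock, and the rented items at location 2 increase by 2, in line with the description of the action phase.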

3.5. Costs and objective

The following notation is used for the cost parameters. For each unit of on-hand stock at the end of the period, the holding cost is $h_0$ at the depot and $h_i$ at location $i$, with $h_0 < h_i$ for $i = 1, \ldots, n$. The backorder cost at location $i$ is $b_i > 0$ per backordered unit at the end of the period. The cost per unit shipped to and taken back from location $i$ is $c_i > 0$. For cross docking, there is an additional handling cost $d \ge 0$ per item cross docked. The total cost for cross docking from location $i$ to $j$ is thus $c_i + c_j + d$.

The costs in period $t$ for state $S_t$ and actions $a_t$ are given by

$$C(S_t, a_t) = h_0 x'_{0t} + d \left( \sum_{i=1}^{n} (a_{it})^- - x_{0t} \right)^+ + \sum_{i=1}^{n} \left[ h_i (x'_{it})^+ + b_i (x'_{it})^- + c_i \left( (a_{it})^+ + (a_{it})^- \right) \right]. \tag{11}$$

Respectively, this gives the holding cost at the depot, the handling costs for cross docking, and the sum of the holding, backorder, and shipment costs of the locations. For example, the cost for the action in Fig. 2 is $h_0 + d + c_1 + 2c_2 + c_3$.
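The one-period cost can be sketched in Python as below. This assumes that holding and backorder costs are charged on end-of-period (post-action) stock, as the parameter definitions state, and that the cross-docked quantity is the number of shipped items in excess of the depot stock before the action. The function name and all cost values are illustrative assumptions.

```python
def period_cost(x0, x, a, h0, h, b, c, d):
    """One-period cost for state (x0, x) after transitions and action a."""
    pos = lambda v: max(v, 0)
    neg = lambda v: max(-v, 0)
    x0_post = x0 + sum(a)                        # depot stock after the action
    x_post = [xi - ai for xi, ai in zip(x, a)]   # location stock after the action
    shipped = sum(neg(ai) for ai in a)
    cost = h0 * x0_post + d * pos(shipped - x0)  # depot holding + cross-docking handling
    for hi, bi, ci, xi, ai in zip(h, b, c, x_post, a):
        cost += hi * pos(xi) + bi * neg(xi) + ci * (pos(ai) + neg(ai))
    return cost


# Assumed Fig. 2 state with hypothetical cost parameters.
cost = period_cost(1, [1, -2, 1], [1, -2, 1],
                   h0=1, h=[2, 2, 2], b=[5, 5, 5], c=[3, 4, 6], d=7)
```

With these hypothetical parameters the result equals $h_0 + d + c_1 + 2c_2 + c_3 = 1 + 7 + 3 + 8 + 6 = 25$, which agrees with the cost expression quoted for the Fig. 2 action.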

A stationary policy $\pi$ specifies for each state $S$ a corresponding action $a$. The goal of this paper is to determine a policy $\pi$ that minimizes the average cost

$$\lim_{t \to \infty} \frac{1}{t} E_\pi \left[ \sum_{s=1}^{t} C(S_s, a_s) \right].$$

Although the cost structure is linear, the problem features various interesting trade-offs. The choice of actions depends, among others, on the different cost parameters, the demand rates, and the current state of the system. An action can impact the system state several periods into the future, since if an item is shipped to a certain location now, that location will have more inventory when the item returns. The optimal policy takes into account this trade-off between the direct and future impact of actions.

4. Single-location problem

In this section we analyze the average-cost optimal policy for the single-location problem (SLP). With a single location, cross docking is not possible and shipments from the depot are carried out as soon as the location has a stock-out. What remains to be optimized are the take-back actions, i.e., transporting items from the location to the depot. We prove that the optimal take-back action satisfies a threshold structure: it is optimal to take back all on-hand items above a threshold, and otherwise do nothing. This threshold is state-dependent: it decreases in the number of rented items.
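The shape of such a state-dependent threshold policy can be sketched in Python. The threshold values below are purely hypothetical and are not computed from the model; the sketch only shows how a non-increasing threshold in the number of rented items $y$ maps a state to a take-back quantity.

```python
def take_back_quantity(x, y, thresholds):
    """Items to take back with x on hand and y rented, given thresholds[y]."""
    return max(x - thresholds[y], 0)


# Hypothetical thresholds indexed by the number of rented items y = 0, ..., 4.
thresholds = [3, 2, 2, 1, 0]
# The threshold should not increase in y.
is_non_increasing = all(hi >= lo for hi, lo in zip(thresholds, thresholds[1:]))

quantity = take_back_quantity(4, 1, thresholds)
```

Here 4 on-hand items with 1 rented item lead to a take-back of 2 items, while any on-hand stock at or below the threshold leads to no take-back.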

The intuition behind the decreasing threshold follows by considering the last item at the location. Suppose we label one on-hand item at the location as the last item, to be rented to customers only if all other items at the location are rented. We can decide to take back the item to the depot and ship it back when it is requested, saving holding costs every period the item is at the depot. The holding cost savings exceed the transportation costs only if it takes long enough for the last item to be requested. All else being equal, the last item will be requested later if the number of on-hand items at the location increases. This implies that there must be some threshold inventory level above which we want to take back all items. All else being equal, the last item will also be requested later if the number of rented items increases, because returning rented items can be used to fulfill demand. Therefore, the threshold decreases in the number of rented items.

In the remainder of this section, we formally prove the threshold structure. We first explain the idea for the proof. After providing the proof, we derive analytical expressions for some threshold values, and we provide a fast iterative procedure to obtain all other threshold values.

Remark 1. Our analytical results from Sections 4 and 5 extend to various other settings with linear costs. The backordering assumption plays no role in the analysis, hence the same analysis applies to settings where demand is lost if not met at the first delivery moment. Furthermore, the two-location analysis extends to any setting where lateral transshipments are carried out when the depot is out of stock, by replacing the cross-docking handling cost by an appropriate lateral transshipment cost.

Remark 2. While the system stock level $K$ is an important parameter in our problem, our analytical results are valid for any $K$. Therefore, we do not study $K$ explicitly until the numerical experiments in Section 7.

Fig. 3. Optimal take-back policy for the last item and its implications.

4.1. Idea for the proof

Our proof is based on studying the last item at the location. We formulate a finite MDP for the last item, and use its properties to prove the threshold structure. Fig. 3 illustrates the optimal take-back actions for some states, where 1 indicates that a take-back is optimal and 0 that it is not. The middle diagonal represents the state space of the last item. Every period, the state moves along the diagonal: up if returns exceed demand, and down if demand exceeds returns. The last item is requested when the on-hand inventory reaches 0. The first step of the proof is proving that the optimal take-back action decreases along the diagonal, i.e., first ones, then zeros. The next step is showing how optimal actions on a diagonal imply the optimal actions on adjacent diagonals, indicated by arrows in Fig. 3. We prove that if a take-back is optimal (not optimal) in a state, then it is also optimal (not optimal) in a state with one more (fewer) on-hand or rented item. By induction, it then follows that the threshold is non-increasing in the number of rented items.

4.2. Finite MDP for the last item

We now formulate a finite MDP to optimize the take-back actions of the last item until it is requested at the location. It is important to note that irrespective of the actions taken, the last item will be rented to a customer in the exact same period (either from on-hand stock, or by a shipment from the depot if it was taken back at some point). Our actions only impact the state of the system until the request occurs, therefore, it suffices to consider only this time frame. Without loss of generality, we subtract $h_0$ from each holding cost parameter, as this part of the holding cost cannot be influenced by our actions.

Dropping time and location indices for the remainder of this section,supposethatthelocationcurrentlyhas x >0on-handand

y renteditems.Welabelthelastitematthelocationas m = x +y . Item m isrentedassoon as y =m .Therefore,the statespace for item m consistsof states y =0,...,m, with y = m the absorbing state.

Consider any state $y$, $y < m$. The action $\alpha_y^m \in \{0, 1\}$ indicates whether or not item $m$ is taken back in state $y$. If we choose $\alpha_y^m = 1$, we pay the take-back cost $2c$. After the take-back, we can immediately enter the absorbing state because an item at the depot incurs no further costs. Otherwise, if $\alpha_y^m = 0$, we pay $h - h_0$ and make a transition to a new state. It follows that the costs for action $\alpha_y^m$ are given by

\[ C(y, \alpha_y^m) = 2c\,\alpha_y^m + (h - h_0)(1 - \alpha_y^m). \tag{12} \]

The transition probabilities are as follows. If $\alpha_y^m = 1$, we move to the absorbing state $m$, hence

\[ P_{ym}(\alpha_y^m = 1) = 1. \]

If $\alpha_y^m = 0$, we move to state $z$ with probability

\[ P_{yz}(\alpha_y^m = 0) = P(\min\{y + D - R, m\} = z) = \sum_{d=0}^{\infty} \sum_{r=0}^{y} \mathbb{1}\{\min\{y + d - r, m\} = z\}\, P(D = d)\, P(R(y) = r), \tag{13} \]

where $P(D = d)$ is the probability mass of the demand distribution and $P(R(y) = r)$ the probability mass of the Binomial($y$, $p$) return distribution.

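The value function satisfies the optimality equation

\[ V^m(y) = \begin{cases} \min_{\alpha_y^m} \Big\{ C(y, \alpha_y^m) + \sum_{z=0}^{m} P_{yz}(\alpha_y^m)\, V^m(z) \Big\} & \text{if } y < m, \\ 0 & \text{if } y = m. \end{cases} \]

This optimality equation has a solution because the expected time until absorption is finite.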
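As an illustration of Eqs. (12) and (13) and the optimality equation, the following minimal Python sketch solves the last-item MDP by value iteration. It is not the authors' implementation. It assumes Poisson demand with rate `lam` (as used later in Section 7.1); the function names, the stopping tolerance, and the tie-breaking rule (take back when both actions cost the same) are illustrative assumptions.

```python
import math


def poisson_pmf(lam, d):
    """P(D = d) for Poisson demand with rate lam (assumed demand model)."""
    return math.exp(-lam) * lam ** d / math.factorial(d)


def binom_pmf(n, p, r):
    """P(R(n) = r) for Binomial(n, p) returns."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


def transition_row(y, m, lam, p):
    """Row P_yz(alpha = 0) of Eq. (13): the next state is min{y + D - R, m}."""
    row = [0.0] * (m + 1)
    for r in range(y + 1):
        pr = binom_pmf(y, p, r)
        below = 0.0
        for d in range(m - (y - r)):  # demands that keep the state below m
            pd = poisson_pmf(lam, d)
            row[y - r + d] += pr * pd
            below += pd
        row[m] += pr * (1.0 - below)  # all larger demands are absorbed in m
    return row


def wait_value(row, V, h, h0):
    """Cost of not taking back: pay h - h0, then continue from the next state."""
    return (h - h0) + sum(pz * vz for pz, vz in zip(row, V))


def solve_last_item_mdp(m, lam, p, h, h0, c, tol=1e-10, max_iter=100_000):
    """Value iteration for V^m(y); returns the values and the 0/1 actions."""
    rows = [transition_row(y, m, lam, p) for y in range(m)]
    V = [0.0] * (m + 1)  # V[m] = 0 in the absorbing state
    for _ in range(max_iter):
        new_V = [min(2 * c, wait_value(rows[y], V, h, h0)) for y in range(m)]
        new_V.append(0.0)
        diff = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if diff < tol:
            break
    # Assumed tie-breaking: take back (action 1) when both options cost the same.
    actions = [1 if 2 * c <= wait_value(rows[y], V, h, h0) else 0
               for y in range(m)]
    return V, actions
```

For example, `solve_last_item_mdp(5, 0.2, 0.3, 1.0, 0.5, 2.0)` returns the values $V^5(0), \ldots, V^5(5)$ and the corresponding take-back actions for one hypothetical parameter setting.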

4.3. Optimal policy structure

We are now ready to provide structural results. Lemma 1 shows that the MDP from Section 4.2 has a monotone optimal policy. Furthermore, relations between decisions for MDPs with different values of $m$ are proven. The proofs of all lemmas and propositions can be found in the appendix.

Lemma 1. Monotonicity properties of optimal take-back decisions for the SLP.

(i) For fixed $m$, the optimal action $\alpha_y^m$ is monotone decreasing in $y$.

(ii) If $\alpha_y^m = 1$ for some $y$ and $m$, then $\alpha_{y+k}^{m+k} = 1$ for all $k = 0, \ldots, K - m$. If $\alpha_y^m = 0$ for some $y$ and $m$, then $\alpha_{y-k}^{m-k} = 0$ for $k = 0, \ldots, y$.

(iii) If $\alpha_y^m = 1$ for some $y$ and $m$, then $\alpha_y^k = 1$ for all $k \ge m$. If $\alpha_y^m = 0$ for some $y$ and $m$, then $\alpha_y^k = 0$ for all $0 \le k \le m$.

The interpretation of (i) is that it is relatively better to take back the last item if there are fewer rented items (or, equivalently, if the on-hand stock is higher). This also implies that, if many items are rented, it is sometimes better to postpone a take-back until more items return. For (ii), note that the on-hand stock $x = (m + k) - (y + k)$ is constant in $k$. Hence, with the same on-hand stock, it is better to carry out a take-back if there are more rented items. Similarly, (iii) shows that with the same number of rented items, it is better to carry out a take-back if there are more on-hand items.

The monotonicity properties in Lemma 1 immediately imply a threshold structure. We find that the threshold is decreasing in the number of rented items, with steps of at most 1.

Proposition 1. The optimal take-back policy for the SLP has the following structure.

(i) The optimal take-back policy for the SLP is a threshold policy. There exists a threshold $x^*(y)$ that leads to take-back actions

\[ a = \begin{cases} x - x^*(y) & \text{if } x > x^*(y), \\ 0 & \text{if } x \le x^*(y). \end{cases} \tag{14} \]

(ii) The threshold $x^*(y)$ is decreasing in $y$, with steps of at most one item, i.e.,

\[ 0 \le x^*(y) - x^*(y + 1) \le 1. \]

4.4. Obtaining the threshold policy

In order to obtain the threshold policy, we analytically derive some of the threshold values and we explain an iterative procedure to obtain all remaining values. For the analytical expressions, we first need to characterize the time $\tau(x, y)$ until the last item is requested when starting with $x$ on-hand and $y$ rented items. We obtain

\[ \tau(x, y) = \min\Big\{ t : \sum_{s=1}^{t} D_s \ge x + \sum_{s=1}^{t} R_s(y_{s-1}) \Big\}, \tag{15} \]

i.e., $\tau(x, y)$ is the first period in which the total demand equals (or exceeds) the initial inventory plus total returns. In (15), the returns $R_s(y_{s-1})$ depend on the rented items in the preceding period, with $y_0 \equiv y$. The expectation of $\tau(x, y)$ can be obtained using standard methods for finite Markov chains (Kemeny & Snell, 1976).

We can now state the threshold values for which we have analytical expressions.

Proposition 2. Threshold values for the SLP.

(i) $x^*(0) = \min\{x \ge 0 : (h - h_0)\, E[\tau(x + 1, 0)] > 2c\}$,

(ii) $x^*(y) = 0$ for $y \ge \bar{y}$, with $\bar{y} = \min\big\{ y : P(D - R(y) > 0) \le \frac{h - h_0}{2c} \big\}$, where $P(D - R(y) > 0)$ is the probability of positive net demand conditional on having $y$ rented items.

The threshold at $y = 0$ is defined by a simple comparison between transportation costs and the expected holding costs until the location runs out of stock. Furthermore, if $y$ increases, the threshold becomes zero at some point because the number of returning items $R(y)$ in the next period is almost always sufficient to cover demand. Note from (ii) that when transportation is very cheap, i.e., $2c < h - h_0$, then it is optimal to always store all items at the depot.
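To make Proposition 2(ii) concrete, the sketch below computes $\bar{y}$ numerically. It assumes Poisson demand with rate `lam` and Binomial($y$, $p$) returns; the function names and the search cap `y_max` are hypothetical and only serve the illustration.

```python
import math


def poisson_pmf(lam, d):
    """P(D = d) for Poisson demand with rate lam (assumed demand model)."""
    return math.exp(-lam) * lam ** d / math.factorial(d)


def binom_pmf(n, p, r):
    """P(R(n) = r) for Binomial(n, p) returns."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


def prob_positive_net_demand(y, lam, p):
    """P(D - R(y) > 0) = sum over r of P(R(y) = r) * P(D > r)."""
    total = 0.0
    demand_cdf = 0.0
    for r in range(y + 1):
        demand_cdf += poisson_pmf(lam, r)  # now equals P(D <= r)
        total += binom_pmf(y, p, r) * (1.0 - demand_cdf)
    return total


def y_bar(lam, p, h, h0, c, y_max=10_000):
    """Smallest y with P(D - R(y) > 0) <= (h - h0) / (2c); None if not found."""
    ratio = (h - h0) / (2 * c)
    for y in range(y_max + 1):
        if prob_positive_net_demand(y, lam, p) <= ratio:
            return y
    return None
```

With very cheap transportation ($2c < h - h_0$) the ratio exceeds 1, so the search stops at $\bar{y} = 0$, in line with the remark that all items are then stored at the depot.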

For intermediate values of $y$, we have no analytical expressions because we need to take into account that it is sometimes optimal to wait for rented items to return before carrying out a take-back. However, we can obtain the optimal threshold values using an iterative procedure. The procedure is based on finite Markov chains and therefore has significantly shorter computation times than solving a Markov decision process.

The idea for the procedure is as follows. Suppose $x^*(y)$ is known for some $y$ and we want to determine the threshold for $y + 1$. Since the threshold decreases by steps of at most 1 (Proposition 1(ii)), the threshold is either $x^*(y)$ or $x^*(y) - 1$. We study the absorption time of a finite Markov chain to determine the correct threshold. The finite Markov chain has states $z = 0, \ldots, m$, with $m = x^*(y) + 1 + y$. In all states $z \le y$ a take-back of item $m$ is necessary (the on-hand inventory exceeds $x^*(y)$). Therefore, all states $z \le y$ are absorbing with value $V^m(z) = 2c$ and state $m$ is absorbing with value $V^m(m) = 0$. Starting from transient state $y + 1$, we calculate the cost $V^m(y + 1)$ until absorption, incurring a holding cost $h - h_0$ each period the chain is not absorbed. If $V^m(y + 1) > 2c$, then it is cheaper to carry out a take-back in state $y + 1$ than to wait until absorption, so we set $x^*(y + 1) = x^*(y) - 1$. Otherwise, we set $x^*(y + 1) = x^*(y)$. The complete threshold can be obtained by starting at $x^*(0)$ and iteratively applying this procedure until $x^*(y) = 0$.
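The iterative procedure can be sketched in Python as follows. This is a hedged illustration, not the authors' code: it assumes Poisson demand, evaluates the absorbing Markov chains by simple fixed-point iteration instead of a direct linear solve, and uses hypothetical function names and search caps (`x_max`, `y_max`).

```python
import math


def poisson_pmf(lam, d):
    return math.exp(-lam) * lam ** d / math.factorial(d)


def binom_pmf(n, p, r):
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


def transition_row(y, m, lam, p):
    """Transition probabilities from state y < m without take-back, Eq. (13)."""
    row = [0.0] * (m + 1)
    for r in range(y + 1):
        pr = binom_pmf(y, p, r)
        below = 0.0
        for d in range(m - (y - r)):
            pd = poisson_pmf(lam, d)
            row[y - r + d] += pr * pd
            below += pd
        row[m] += pr * (1.0 - below)
    return row


def cost_until_absorption(start, m, absorbed, lam, p, cost_per_period,
                          tol=1e-12, max_iter=1_000_000):
    """Expected cost from `start`; `absorbed` maps absorbing states (including
    state m) to their terminal values. Solved by fixed-point iteration."""
    V = [absorbed.get(z, 0.0) for z in range(m + 1)]
    transient = [z for z in range(m + 1) if z not in absorbed]
    rows = {z: transition_row(z, m, lam, p) for z in transient}
    for _ in range(max_iter):
        diff = 0.0
        for z in transient:
            new = cost_per_period + sum(pz * vz for pz, vz in zip(rows[z], V))
            diff = max(diff, abs(new - V[z]))
            V[z] = new
        if diff < tol:
            break
    return V[start]


def threshold_at_zero(lam, p, h, h0, c, x_max=50):
    """Proposition 2(i): smallest x with (h - h0) E[tau(x + 1, 0)] > 2c."""
    for x in range(x_max + 1):
        m = x + 1
        expected_tau = cost_until_absorption(0, m, {m: 0.0}, lam, p, 1.0)
        if (h - h0) * expected_tau > 2 * c:
            return x
    raise ValueError("no threshold found up to x_max")


def single_location_thresholds(lam, p, h, h0, c, y_max=200):
    """Returns [x*(0), x*(1), ...], stopping once the threshold reaches 0."""
    thresholds = [threshold_at_zero(lam, p, h, h0, c)]
    y = 0
    while thresholds[-1] > 0 and y < y_max:
        x_star = thresholds[-1]
        m = x_star + 1 + y
        absorbed = {z: 2 * c for z in range(y + 1)}  # take-back is necessary
        absorbed[m] = 0.0                            # item m is requested
        v = cost_until_absorption(y + 1, m, absorbed, lam, p, h - h0)
        thresholds.append(x_star - 1 if v > 2 * c else x_star)
        y += 1
    return thresholds
```

For small instances a direct linear solve of the absorbing chain would be the more standard choice; fixed-point iteration is used here only to keep the sketch free of external dependencies.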

Fig. 4. Scenarios and costs for the m th item in the two-location problem.

5. Two-location problem

We now analyze the average-cost optimal policy for a two-location rental system with a depot. The main difference with the single-location problem is cross docking: local items are cross-docked when the depot has insufficient stock to meet all requests at the other location. Under a mild restriction, we prove that the optimal take-back action in the two-location problem follows a state-dependent threshold policy. The threshold at a location is now two-dimensional, depending on the number of rented items at both locations. The threshold decreases in the number of rented items at the location itself; however, it increases in the number of rented items at the other location. The latter is the case because the probability of having to cross dock in the next period increases when the other location has more rented items. By carrying out a take-back, we can avoid possible cross-docking handling costs.

5.1. Approach

We apply the same approach as in Section 4, studying the costs of the last on-hand item at a location. Without loss of generality, we study costs of the last item of location 1, denoted $m = x_1 + y_1$; by symmetry, the same argument can be repeated for location 2. We derive optimal take-back actions for item $m$ by considering the expected costs of all possible scenarios for the item until it is requested at either of the two locations.

Fig. 4 can be used to calculate costs for different scenarios for item $m$. Let $\tau_1$ and $\tau_2$ be the period in which item $m$ is requested at location 1 and 2, respectively. Every period before a request occurs, we can decide to either take back item $m$ or leave it at location 1. For example, when item $m$ is kept at location 1, we pay $h_1$ each period, and depending on whether $\tau_1$ or $\tau_2$ occurs first, we pay 0 or $c_1 + c_2 + d$. When item $m$ is taken back to the depot (node 0), we pay $c_1$ once, $h_0$ each period, and $c_1$ or $c_2$ when $\tau_1$ or $\tau_2$ occurs.

For the single-location problem, we used the analytically convenient property that take-back actions only impact the state of the system until item $m$ is requested. In the two-location problem, there exists one event where this property does not necessarily hold. This event occurs when $\tau_1 = \tau_2$, i.e., item $m$ is requested at both locations at the same time. If the item has not been taken back to the depot, it will be rented with certainty at location 1. However, if it has been taken back, we may decide to ship it to location 2 instead of location 1. In order to make an optimal decision, we would have to know the expected difference in future costs of the item ending up at location 1 and 2. We avoid this by imposing the restriction that we always ship to location 1 if $\tau_1 = \tau_2$. We believe that this restriction does not lead to a significantly different optimal policy for the following reasons. The event $\tau_1 = \tau_2$ is most likely when item $m$ is in high demand at both locations. In this case the choice for a shipment location is typically not required under an optimal policy: since the demand is expected to occur soon, it seems suboptimal to incur an extra cost for a take-back and shipment. The choice seems more relevant when item $m$ is in low demand at one or both locations; however, then the event $\tau_1 = \tau_2$ is not likely.

Remark 3. It is challenging to extend the last item approach to systems with more than two locations. With two locations, the only possible candidate for cross docking is the other location; however, with multiple locations, there can be multiple candidates. It is impossible to select the correct candidate by considering only the last item of location 1.

5.2. A ﬁnite MDP for two locations

As before, we model this as a finite Markov decision process, which can now be absorbed in multiple states. Since location 2 uses stock from itself and the depot before demanding item $m$, we can set $x_0 = 0$ and $x_2 = K - m - y_2$ without loss of generality. Since $x_1 = m - y_1$ and $x_2 = K - m - y_2$, we can represent the state of this MDP by $S = (y_1, y_2)$. The MDP is absorbed when $y_1 = m$ (corresponding to time $\tau_1$ or a take-back) or when $y_2 = K - m + 1$ (corresponding to $\tau_2$).

The binary take-back decision for this problem is denoted by $\alpha_{y_1,y_2}^m$. The two-dimensional transition probabilities can be computed analogously to the SLP. The costs for the two-location problem are slightly different due to the cross-docking action. The transportation cost $c_1 + c_2$ when item $m$ is demanded at location 2 is unavoidable, so we only incur extra costs for a take-back if $\tau_1 \le \tau_2$, hence

\[ C(y_1, y_2, \alpha_{y_1,y_2}^m) = 2c_1 P(\tau_1 \le \tau_2)\, \alpha_{y_1,y_2}^m + (h_1 - h_0)(1 - \alpha_{y_1,y_2}^m), \quad \text{for } y_1 < m,\ y_2 < K - m + 1. \tag{16} \]

The probability $P(\tau_1 \le \tau_2)$ can be computed by solving the corresponding finite Markov chain with starting state $(y_1, y_2)$ where we set $\alpha_{y_1,y_2}^m = 0$ in all states. Finally, if we are absorbed in $y_2 = K - m + 1$ before carrying out a take-back, we pay the additional cross-docking cost $d$:

\[ V^m(y_1, K - m + 1) = d \quad \text{for } y_1 < m. \]

5.3. Threshold policy

Lemma 2 shows monotonicity of the optimal decisions in $y_1$ and $y_2$. Take-back becomes less interesting as $y_1$ increases and more interesting as $y_2$ increases. The relation with $y_1$ has the same interpretation as in the SLP. Take-back becomes more interesting as $y_2$ increases because handling costs can be avoided when item $m$ is demanded soon at location 2.

Lemma 2. Monotonicity in the finite MDP for two locations.

(i) For fixed $y_2$, $\alpha_{y_1,y_2}^m$ is monotonic non-increasing in $y_1$.

(ii) For fixed $y_1$, $\alpha_{y_1,y_2}^m$ is monotonic non-decreasing in $y_2$.

The monotonicity can be exploited in a similar way as for the SLP to show a threshold structure. We obtain the following.

Proposition 3. Threshold policy of the two-location problem.

(i) The optimal take-back action for location 1 in the two-location problem is a threshold policy. There exists a threshold $x_1^*(y_1, y_2)$ that leads to take-back actions

\[ a_1 = \begin{cases} x_1 - x_1^*(y_1, y_2) & \text{if } x_1 > x_1^*(y_1, y_2), \\ 0 & \text{if } x_1 \le x_1^*(y_1, y_2). \end{cases} \tag{17} \]

(ii) For fixed $y_2$, $x_1^*(y_1, y_2)$ decreases in $y_1$ and for fixed $y_1$, $x_1^*(y_1, y_2)$ increases in $y_2$.

(iii) For fixed $y_1$, $x_1^*(y_1, y_2) \le x_1^*(y_1)$ for all $y_2$.

Hence, we now have a threshold policy that depends on both $y_1$ and $y_2$. Interestingly, in the two-location setting we always take back at least as much as in the single-location setting. The reason for this is the extra cross-docking cost that we could prevent by taking back in time, especially when the on-hand stock at location 2 is low. The holding cost trade-off from the single-location setting thus also applies to the two-location setting, with cross-docking costs as an additional incentive for take-backs.

For the special case $m = 1$ and $K = 1$, there is a simple expression for the take-back decision when $x_1 = 1$.

Lemma 3. For $m = 1$ and $K = 1$ in the two-location problem, a take-back of item $m$ is optimal if

\[ P(D_1 > 0) \le \frac{h_1 - h_0 + d}{2c_1 + d}. \tag{18} \]

Hence, we carry out a take-back only if the demand rate at location 1 is small enough, and the higher $d$ becomes, the more likely we are to take back.
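Condition (18) is easy to evaluate numerically. The sketch below assumes Poisson demand at location 1 with rate `lam1`, so that $P(D_1 > 0) = 1 - e^{-\lambda_1}$; the function name and argument names are hypothetical.

```python
import math


def take_back_single_item(lam1, h1, h0, c1, d):
    """Check condition (18) for m = K = 1, assuming Poisson(lam1) demand."""
    prob_demand = 1.0 - math.exp(-lam1)  # P(D_1 > 0)
    return prob_demand <= (h1 - h0 + d) / (2 * c1 + d)
```

Raising `d` while keeping the other arguments fixed moves the right-hand side towards 1 (as long as $h_1 - h_0 < 2c_1$), which reflects the statement that a higher handling cost makes a take-back more likely.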

5.4. Constructing the take-back policy

Similar to Section 4.4, we now state a procedure for obtaining the state-dependent take-back policy. The idea is to repeatedly compare the holding cost during the expected time until demand with possible transportation costs and cross-docking costs, taking into account that it may be optimal to postpone a take-back until we reach another state.

Let $A$ be the set of states in which it is optimal to carry out a take-back and $\tau_A$ be the first moment we enter a state in set $A$. Holding costs are incurred until the moment of absorption, $\min(\tau_1, \tau_2, \tau_A)$. Cross-docking handling costs will only have to be paid if the first demand for item $m$ happens at location 2 before the postponed take-back is carried out, i.e., if $\tau_1 > \tau_2$ and $\tau_2 \le \tau_A$. The extra transportation cost $2c_1$ for carrying out a take-back now can only be prevented if the demand at location 1 occurs before the postponed take-back, i.e., if $\tau_1 \le \tau_2$ and $\tau_1 \le \tau_A$. Therefore, we base the decision on the condition

\[ (h_1 - h_0)\, E[\min(\tau_1, \tau_2, \tau_A)] + d\, P(\tau_1 > \tau_2,\ \tau_2 \le \tau_A) \ge 2c_1 P(\tau_1 \le \tau_2,\ \tau_1 \le \tau_A). \tag{19} \]

In words, we take back item $m$ if the expected holding costs and cross-docking handling cost of not taking back right now exceed the expected take-back costs.

We can use Eq. (19) to iteratively generate optimal take-back actions by running diagonally over all possible states. We start with $A = \emptyset$ and the most extreme state $y_1 = 0$, $y_2 = K - m$. If the condition holds, add this state to $A$. Now iteratively check the condition with the updated $A$, keeping $y_1$ constant and decreasing $y_2$ by 1. If the condition holds, we add the state to $A$; if it does not, we stop checking states $(y_1, z)$ with $z \le y_2$. Then we increase $y_1$ by 1, set $y_2 = K - m$ and repeat the above procedure. We continue this until for some $y_1$ and $y_2 = K - m$ the condition does not hold, or until $y_1 = m$. Repeating the same procedure for all possible $m$ yields the complete take-back policy.

6. Heuristic

For a general number of locations, it is challenging to obtain analytical results and to solve the MDP to optimality. Therefore, we propose a heuristic in order to take effective decisions in reasonable time. Any heuristic for our problem must include the following elements:

• A rule to select receiving locations in case of shipments and cross docking.

• A rule to select sending locations in case of cross docking.

• A rule to determine how much stock to take back from each location to the depot.

Van der Heide et al. (2017) propose various effective rules for shipments and take-backs, which we adapt here to tackle the situation with cross docking. The rules for selecting locations in Van der Heide et al. (2017) are based on an extensive numerical study of the optimal solution of the MDP in small instances. For the rules for take-backs, there is now a solid theoretical basis, drawing from the analytical results in Sections 4 and 5.

The heuristic consists of three phases. The first phase is the shipments/cross-docking phase, which is concerned with dealing with all unmet requests in the system. The second phase is the threshold take-back phase, which deals with taking back all on-hand items above the single location thresholds. The third phase is the preventive take-back phase, and is concerned with resupplying the depot in order to prevent future cross-docking handling costs. In what follows, rules used in these respective phases are described in more detail. Pseudo-code for the heuristic is shown in Algorithm 1.

Algorithm 1 Heuristic.

    $Z = \min\{ \sum_{i=1}^{n} x_{it}^{-},\ x_{0t} + \sum_{i=1}^{n} x_{it}^{+} \}$    (Shipments/cross-docking)
    for $z = 1, \ldots, Z$ do
        if $z \le x_0$ then
            $i^* = 0$
        else
            $i^* = \arg\min_{i \in \{j : x_j > 0\}} c_i - h_i + b_i q_i$
        $j^* = \arg\max_{i \in \{j : x_j < 0\}} b_i - c_i + b_i g_i(y_i)$
        $x_{i^*} := x_{i^*} - 1$
        $x_{j^*} := x_{j^*} + 1$, $y_{j^*} := y_{j^*} + 1$
    for $i = 1, \ldots, n$ do    (Threshold take-backs)
        $z = (x_i - x_i^*(y_i))^{+}$
        $x_i := x_i - z$
        $x_0 := x_0 + z$
    $i^* = \arg\min_{i \in \{j : x_j > 0\}} (2c_i - h_i + h_0)\, q_i$    (Preventive take-backs)
    while $\sum_{i=1}^{n} x_i^{+} > 0$ and $(2c_{i^*} - h_{i^*} + h_0)\, q_{i^*} \le (d + h_{i^*} - h_0)(1 - q_{i^*})\, s_{i^*}$ do
        $x_{i^*} := x_{i^*} - 1$
        $x_0 := x_0 + 1$
        $i^* = \arg\min_{i \in \{j : x_j > 0\}} (2c_i - h_i + h_0)\, q_i$

In the shipments/cross-docking phase, the total number of items to be sent is known, and denoted by $Z$. We repeatedly select sending and receiving locations and update their stock levels until all $Z$ items are sent. As long as the depot has stock, the depot is the sending location. Otherwise, cross docking is necessary, and the sending location is a location with stock that minimizes $c_i - h_i + b_i q_i$, where $q_i = P(D_i - R_i(y_i) > x_i - 1)$ is the stock-out probability at location $i$ after removing one item. This minimizes a combination of the direct costs in the current period and the expected backorder costs in the next period due to having lower inventory. The receiving location is a location with backorders that maximizes $b_i - c_i + b_i g_i(y_i)$, where

\[ g_i(y_i) = \begin{cases} \dfrac{1}{P(R(y_i) > 0)} & \text{if } y_i > 0, \\ M & \text{if } y_i = 0, \end{cases} \]

is the expected number of periods until the first return when $y_i$ items are rented, with $M$ some large number. This maximizes a combination of directly prevented backorder costs and future backorder costs when the location needs to wait for an item to return. Under this rule, items are typically sent to locations with low $y_i$, which should be prioritized because they are least likely to have returning items in the next period that can meet the backorder.
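The two selection rules can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: demand is taken to be Poisson, each location is represented by a hypothetical dictionary with keys `x`, `y`, `c`, `h`, `b`, and the value of `M` is an arbitrary large constant.

```python
import math


def poisson_pmf(lam, d):
    return math.exp(-lam) * lam ** d / math.factorial(d)


def binom_pmf(n, p, r):
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


def stockout_prob_after_removal(x, y, lam, p):
    """q_i = P(D - R(y) > x - 1), assuming Poisson(lam) demand."""
    total = 0.0
    for r in range(y + 1):
        # P(D >= x + r) = 1 - P(D <= x + r - 1)
        demand_cdf = sum(poisson_pmf(lam, d) for d in range(x + r))
        total += binom_pmf(y, p, r) * (1.0 - demand_cdf)
    return total


def expected_periods_until_return(y, p, M=1e6):
    """g_i(y_i): 1 / P(R(y) > 0) if y > 0, else the large constant M."""
    if y == 0:
        return M
    return 1.0 / (1.0 - (1.0 - p) ** y)


def pick_sender(locations, lam, p):
    """Index of the stocked location minimizing c_i - h_i + b_i q_i."""
    candidates = [i for i, loc in enumerate(locations) if loc["x"] > 0]
    return min(candidates, key=lambda i: locations[i]["c"] - locations[i]["h"]
               + locations[i]["b"] * stockout_prob_after_removal(
                   locations[i]["x"], locations[i]["y"], lam, p))


def pick_receiver(locations, p):
    """Index of the backordered location maximizing b_i - c_i + b_i g_i(y_i)."""
    candidates = [i for i, loc in enumerate(locations) if loc["x"] < 0]
    return max(candidates, key=lambda i: locations[i]["b"] - locations[i]["c"]
               + locations[i]["b"] * expected_periods_until_return(
                   locations[i]["y"], p))
```

With identical cost parameters, the sender rule favors the location with the most spare stock and the receiver rule favors the backordered location with the fewest rented items, which matches the behavior described above.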

In the threshold take-backs phase, all items exceeding the single location thresholds $x_i^*(y_i)$ are taken back from the locations to the depot. This choice is motivated by the result in Proposition 3(iii), showing that the two-location threshold is at least as low as the single-location threshold. Since the cost trade-offs yielding this result also exist in the general problem, we choose to apply this same property to the general problem. It is important to note that this step approximates the optimal solution when $K$ is large. When $K$ is large, the depot almost always has sufficient on-hand stock for shipments, hence each location can carry out its optimal single location policy without being affected by what happens at other locations.

As we observed from the thresholds in the two-location problem in Section 5, in the preventive take-backs phase the depot is resupplied for future shipments so that lower cross-docking handling costs are incurred. Clearly, this is necessary only if there is a high enough probability of having to cross dock in the next period. We model this by a regret-based rule, comparing the expected regret of a take-back with the expected regret of no take-back. After carrying out a take-back, with probability $q_i$ the item is demanded again at location $i$ in the next period at extra cost $2c_i - h_i + h_0$, so the expected regret of a take-back is $(2c_i - h_i + h_0)\, q_i$. Let the stock-out probability of the depot, excluding demand at location $i$, be given by

\[ s_i = P\Big( \sum_{j \ne i} \big(D_j - R_j(y_j) - x_j\big)^{+} > x_0 \Big). \tag{20} \]

When no take-back is carried out, with probability $(1 - q_i)\, s_i$ the item needs to be cross docked to another location at extra cost $d + h_i - h_0$, so then the expected regret is $(d + h_i - h_0)(1 - q_i)\, s_i$. In line with the condition in Algorithm 1, we carry out preventive take-backs as long as the regret of a take-back does not exceed that of no take-back.
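The regret comparison used in the while-condition of Algorithm 1 can be written as a small Python function. This is a sketch; the function and argument names are hypothetical, and `q` and `s` are assumed to be computed elsewhere.

```python
def preventive_take_back(q, s, c, h, h0, d):
    """Regret rule for one candidate location.

    q: stock-out probability q_i at the location after removing one item
    s: depot stock-out probability s_i from Eq. (20)
    Returns True if the expected regret of a take-back does not exceed
    the expected regret of leaving the item at the location.
    """
    regret_take_back = (2 * c - h + h0) * q
    regret_no_take_back = (d + h - h0) * (1 - q) * s
    return regret_take_back <= regret_no_take_back
```

If the depot is certain to have enough stock (`s = 0`) and `q > 0` with $2c > h - h_0$, the function returns `False`, so no preventive take-back is made.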

Remark 4. The probability in Eq. (20) is a convolution of $2(n - 1)$ random variables, so we may need to approximate it for large $n$. In case $D_i$ is Poisson for $i = 1, \ldots, n$, we can use the approximation by Van der Heide et al. (2017). They approximate the distribution of $(D_i - R_i(y_i) - x_i)^{+}$ by a Poisson$\big(-\log(P(D_i - R_i(y_i) \le x_i))\big)$ distribution, so that $\sum_{j \ne i} (D_j - R_j(y_j) - x_j)^{+}$ is also Poisson distributed.

7. Numerical experiments

In this section, numerical experiments are carried out with several aims. First, we test the quality of the heuristic by comparing it to the optimal solution. Second, we compare cross docking and delayed shipments, to find out which type of transportation to use in which circumstances. Third, we determine the value of storage at the depot and we evaluate the benefits of using complete rental return time distribution information over using aggregated information only. Finally, we want to see whether there is an inventory pooling effect in instances with high demand rates and a high number of locations. Before carrying out the experiments, we discuss the set of instances used in most experiments and the numerical implementation of the MDP.

7.1. Instances

We create a set of 50 instances with different parameter configurations that we use in most of our experiments. The included parameters and their values are shown in Table 1. The parameter values are inspired by public libraries in the Netherlands, and some are based directly on library transaction data from the Groningen province. We use an orthogonal design, which allows for testing a wide range of parameter values with a limited number of experiments (Taguchi, 1986). Specifically, we use the L50 array (see, e.g., NIST, 2017), which has 1 two-level factor and up to 11 five-level factors. We used the first 7 factors of the L50 array, in the same order as in Table 1. We repeat all 50 instances for different numbers of locations, and for a fair comparison between instances with a different $n$, all rental locations in an instance have identical cost parameters and demand/return distributions.

Table 1
Possible parameter values in the instances.

Parameter                     Symbol   Level 1   Level 2   Level 3   Level 4   Level 5
System stock level            K        high      low
Visit frequency               f        1         2         3         4         5
Depot holding cost            h0       0.4       0.5       0.6       0.7       0.8
Backorder cost                b        8         11        14        17        20
Shipment cost                 c        2         4         6         8         10
Cross-docking handling cost   d        0         1         2         3         4
Demand rate                   λ        0.05      0.1       0.15      0.2       0.25

The two-level factor is the system stock level $K$. Although we have been unable to formally establish convexity, we observe numerically that the optimal average cost is convex in $K$. Therefore, we determine a value for $K$ by calculating the optimal average cost for $K = 1, 2, \ldots$ until it no longer decreases, which we label as 'high $K$' since it turns out to be rather high. Note that this choice for $K$ does not involve purchasing costs. In practice, fewer items may be purchased due to budget cuts, so we check the impact of having 'low $K$' by setting

\[ K = 1.5 \sum_{i=1}^{n} \frac{\lambda_i}{p}, \tag{21} \]

rounded to the nearest positive integer.
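Eq. (21) can be evaluated with a one-line helper. The sketch below is illustrative; the function name is hypothetical, and it relies on Python's built-in `round`, which rounds exact ties to the nearest even integer and may therefore differ from the rounding convention used in the paper in tie cases.

```python
def low_system_stock(demand_rates, p):
    """'Low K' from Eq. (21): 1.5 * sum(lambda_i) / p, at least 1.

    demand_rates: list with the demand rate lambda_i of every location
    p: return probability as used in Eq. (21)
    """
    return max(1, round(1.5 * sum(demand_rates) / p))
```

For instance, four locations with $\lambda_i = 0.25$ and $p = 0.3$ give $1.5 \cdot 1 / 0.3 = 5$ items.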

Without loss of generality, the weekly holding cost is $h_i = 1$ for $i = 1, \ldots, n$ in all experiments; all other cost parameters can be scaled accordingly. We do vary the holding cost at the depot and the other cost parameters. As customers arrive randomly over the week, we assume the weekly demand follows a Poisson distribution with rate $\lambda$. The values are based on our data set, where weekly demand rates for almost all items are in the range [0, 0.25]. Furthermore, the weekly return probability in the same data set is $p = 0.3$.

The visit frequency $f$ measures the number of times per week transportation takes place between the depot and the locations. Where appropriate, we rescale the other parameters to match their correct weekly rates, e.g., the holding cost per period should be $h_i / f$ if the visit frequency is $f$. In order to have a weekly return probability of $p$, the return probability per period for a given $f$ is $1 - (1 - p)^{1/f}$.
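The two rescaling rules mentioned above can be sketched as follows; the function name is hypothetical, and only the holding cost and the return probability are rescaled because those are the two conversions stated explicitly in the text.

```python
def per_period_parameters(f, h_weekly, p_weekly):
    """Convert weekly values to per-period values for visit frequency f."""
    h_period = h_weekly / f
    # Chosen so that f independent periods give the weekly return probability.
    p_period = 1.0 - (1.0 - p_weekly) ** (1.0 / f)
    return h_period, p_period
```

As a check on the formula, the probability that an item has not returned after $f$ periods is $(1 - p_{\text{period}})^f = 1 - p$, which recovers the weekly return probability.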

7.2. Implementation details

The MDP and the heuristic have been implemented in Python. All experiments are run on a computer with a Core i7-4770 CPU (3.4 gigahertz) and 16 gigabytes memory. In order to have finite demand, we cut off the Poisson demand distributions at their 99.99% quantile. Moreover, in order to have a finite state space, we introduce a maximum number of backorders $B = 2$ at each location after demands/returns, with lost demands penalized at a cost of $2b$. The number of possible states in the MDP with $n$ locations, $K$ items, and $B$ maximum backorders per location is

\[ |S| = \sum_{i=0}^{n} \binom{2n + K - i}{K} \binom{n}{i} B^{i}. \]

Table 2
Summary statistics for the percentage optimality gap of the heuristic.

n   Average   Min.   1st quartile   Median   3rd quartile   Max.
2   0.40      0.00   0.00           0.00     0.04           6.25
3   0.59      0.00   0.03           0.15     0.50           4.52
4   0.63      0.00   0.05           0.14     0.74           4.29

Using value iteration, instances with $n = 4$, $K = 8$, and $B = 2$ (185,526 states) are solvable within 2 minutes. Instances with $n = 5$, $K = 9$, and $B = 2$ (2,930,642 states) take several days, which is why we limit the instance size for the MDP to at most 4 locations.
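The state-count formula of Section 7.2 can be checked against the two instance sizes quoted above with a few lines of Python. The function name is hypothetical; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb


def num_states(n, K, B):
    """Number of MDP states for n locations, K items, and at most B
    backorders per location, following the formula for |S|."""
    return sum(comb(2 * n + K - i, K) * comb(n, i) * B ** i
               for i in range(n + 1))
```

Evaluating `num_states(4, 8, 2)` and `num_states(5, 9, 2)` reproduces the state counts reported in the text, which illustrates how quickly the state space grows with one additional location.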

7.3. Performance of the heuristic

In this first experiment, we evaluate the performance of the heuristic. For each value of $n$, we run the 50 instances (150 in total), comparing the average cost of the heuristic with the average cost of the optimal policy from the MDP. Summary statistics for the optimality gap of the heuristic are shown in Table 2.

The average optimality gaps are similar for each value of $n$ and are well within 1%. In 126 out of 150 instances, the optimality gap is within 1%, and in the remaining instances, the gap is at most 6.25%. The heuristic has the largest optimality gaps in instances with a combination of high $c$, high $\lambda$, and high $K$, because it can slightly overestimate the take-back amounts to the depot.

7.4. Cross docking vs. delayed shipments

We now want to obtain managerial insights into the different transportation methods for dealing with lateral transshipment requests. To that end, we compare the difference in costs between situations with cross docking (as presented in this paper) and delayed shipments (Van der Heide et al., 2017). Figs. 5 and 6 show the resulting average costs for both policies for high and low values of the total system stock. Separate graphs are shown for each parameter, and the point at each parameter value gives the average for all configurations with that same value.

From Fig. 5, we observe that the difference between cross docking and delayed shipments is small when $K$ is high. The system stock level is high enough for the depot to have sufficient stock to meet all shipment requests. This implies that cross-docking actions and delayed shipments are not used much, and therefore, both policies have practically the same cost in most cases. The same reasoning also explains why the average cost is almost constant in the graphs for $b$, $d$, and $f$. The average costs do increase strongly in $n$, $c$, and $\lambda$, for obvious reasons. Interestingly, delayed shipments sometimes have lower costs than cross docking due to the delay of one period. Items returning during the delay can be used to meet backorders, avoiding the need to ship and saving shipment costs in the process. This is most relevant when $b$ is very low or when $f$ is large, because then the backorder costs incurred during the delay are limited. Hence, costs can sometimes be saved by not cross docking, which we further investigate in Section 7.5. Overall, delayed shipments seem preferred to cross docking in cases with high stock, because resulting costs are similar without the need to visit rental locations a second time.

Fig. 5. Cost of delayed shipments and cross docking when K is high.

Fig. 6. Cost of delayed shipments and cross docking when K is low.

For our practical case, we see that library organizations may purchase fewer books due to budget cuts. The clear gap between cross docking and delayed shipments for low $K$ in Fig. 6 indicates that cross docking can be quite important in such situations. The average gap between the two is 9.56%. With low system stock, cross docking and delayed shipments are more often necessary and the average costs are much higher than for high $K$. It is no longer possible to avoid situations with backorders, as can be seen from the increase of the average cost in $b$. Part of the gap is caused by the direct backorder cost incurred for delayed shipments. Besides that, delayed shipment essentially prolongs the rental time by one period, resulting in higher utilization of the stock in the system and extra backorders. Delayed shipments save cross-docking handling costs at the expense of backorder costs, therefore the gap decreases in $b$ and increases in $d$. The visit frequency also has significant impact. If the visit frequency is once per week, the average gap is 20.51%. However, if the visit frequency is higher, the gap between cross docking and delayed shipments becomes smaller