- Queues with inspection cost: To see or not to see?
Abstract: Consider an M/M/1-type queue where joining attains a known reward, but a known waiting cost is paid per time unit spent queueing. In the 1960s, Naor showed that any arrival optimally joins the queue if its length is less than a known threshold. Yet acquiring knowledge of the queue length often brings an additional cost, e.g., website loading time or a data roaming charge. Therefore, our model presents any arrival with three options: join blindly, balk blindly, or pay a known inspection cost to make the optimal joining decision by comparing the queue length to Naor’s threshold. In a recent paper, Hassin and Roet-Green prove that a unique Nash equilibrium always exists and classify regions where the equilibrium probabilities are nonzero. We complement these findings with new closed-form expressions for the equilibrium probabilities in the majority of cases. Further, Hassin and Roet-Green show that minimising inspection cost maximises social welfare. Envisaging a queue operator choosing where to invest, we compare the effects of lowering inspection cost and increasing the queue-joining reward on social welfare. We prove that the former dominates and that the latter can even have a detrimental effect on social welfare. PubDate: 2025-04-01
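As a minimal, hedged sketch of the threshold the abstract refers to (not the paper's equilibrium computation), the following snippet computes the classical Naor threshold and the expected net reward from joining at a given queue length; the names R (reward), C (waiting cost per unit time), mu (service rate) and the example values are illustrative assumptions.

```python
# Minimal sketch (not the paper's equilibrium computation): the classical
# Naor threshold and the payoff from joining, under assumed illustrative
# parameters R (reward), C (waiting cost per unit time), mu (service rate).
import math

def naor_threshold(R, C, mu):
    """Largest n such that joining with fewer than n customers present is
    individually rational in an M/M/1 queue."""
    return math.floor(R * mu / C)

def expected_net_reward_join(n_in_system, R, C, mu):
    """Net reward from joining when n_in_system customers are already present
    (FIFO M/M/1): expected sojourn time is (n_in_system + 1) / mu."""
    return R - C * (n_in_system + 1) / mu

# Example: R = 10, C = 2, mu = 1  ->  threshold floor(10 * 1 / 2) = 5,
# so an inspecting arrival joins iff it sees fewer than 5 customers.
print(naor_threshold(10, 2, 1))               # 5
print(expected_net_reward_join(3, 10, 2, 1))  # 10 - 2 * 4 = 2.0
```

An inspecting arrival then joins exactly when the observed number in the system is below this threshold; the equilibrium probabilities of joining blindly, balking blindly and inspecting are what the paper derives in closed form.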
- Fluid limit for a multi-server, multiclass random order of service queue with reneging and tracking of residual patience times
Abstract: In this paper, we consider a multi-server, multiclass queue with reneging operating under the random order of service discipline. Interarrival times, service times, and patience times are assumed to be generally distributed. Under mild conditions, we establish a fluid limit theorem for a measure-valued process that keeps track of the remaining patience time for each job in the queue, when the number of servers and classes is held fixed. We prove uniqueness for fluid model solutions in all but one case. We characterize the unique invariant state for the fluid model and prove that fluid model solutions converge to the invariant state as time goes to infinity, uniformly for suitable initial conditions. PubDate: 2025-03-23
- Sample-path moderate deviation principle for GI/GI/1+GI queues in the nearly critically loaded regime
Abstract: This paper establishes sample-path moderate deviation principles (MDP) for GI/GI/1+GI queues in the nearly critically loaded regime (or near-heavy-traffic regime). The processes of interest, including the queue-length process and the offered waiting time process, are scaled appropriately, with the space scaled in between the order of the time scaling and its square root. The rate functions in the sample-path MDPs for these processes can be explicitly expressed. We employ the methods of exponential tightness, exponential equivalence and the contraction principle from large deviation theory and apply them to these MDP-scaled processes. PubDate: 2025-03-13
- Overlap times in the G/G/1 queue via Laplace transforms
Abstract: In this paper, we analyze the steady-state maximum overlap time distribution in the G/G/1 queue. Our methodology exploits Laplace–Stieltjes transforms with a novel decomposition of the maximum overlap time. Explicit expressions are provided for the special cases of the M/G/1 and G/M/1 queues. We also study the steady-state distribution of the minimum overlap time of a customer with its two adjacent customers. We show a novel relationship between the minimum overlap time, the maximum overlap time, and the steady-state waiting time. PubDate: 2025-02-22
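The transform analysis in the paper is not reproduced here, but a quick Monte Carlo sketch of adjacent-customer overlap times may help fix ideas. The snippet below assumes a FIFO M/M/1 queue with illustrative rates and uses the Lindley recursion for waiting times; the overlap of a customer with its successor is the time both are in the system simultaneously.

```python
# Hedged illustration only: the paper works with Laplace-Stieltjes transforms;
# this sketch just estimates adjacent-customer overlap times in a FIFO M/M/1
# queue by simulation, using the Lindley recursion for waiting times.
import random

def simulate_adjacent_overlaps(lam=0.8, mu=1.0, n=200_000, seed=1):
    rng = random.Random(seed)
    t_arrival, wait = 0.0, 0.0
    arrivals, departures = [], []
    for _ in range(n):
        arrivals.append(t_arrival)
        service = rng.expovariate(mu)
        departures.append(t_arrival + wait + service)
        inter = rng.expovariate(lam)
        # Lindley recursion: W_{k+1} = max(0, W_k + S_k - A_{k+1})
        wait = max(0.0, wait + service - inter)
        t_arrival += inter
    # Under FIFO, customer k overlaps with customer k+1 for
    # max(0, D_k - A_{k+1}) time units.
    overlaps = [max(0.0, departures[k] - arrivals[k + 1]) for k in range(n - 1)]
    return sum(overlaps) / len(overlaps)

print(simulate_adjacent_overlaps())  # sample mean of the adjacent overlap time
```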
- Analysis of a threshold-based priority queue
Abstract: This paper considers a discrete-time single-server queueing system, with two classes of customers, named class 1 and class 2. We propose and analyze a novel threshold-based priority scheduling scheme that works as follows. Whenever the number of class-1 customers in the system exceeds a given threshold $$m \ge 0$$, the server of the system gives priority to class-1 customers; otherwise, it gives priority to class-2 customers. Consequently, for $$m=0$$, the system is equivalent to a classical priority queue with absolute priority for class-1 customers, whereby the (mean) delay of class-1 customers is lowered as much as possible at the expense of longer (mean) delays for class-2 customers. On the other hand, for $$m\rightarrow \infty $$, the system is equivalent to a priority queue with absolute priority for class-2 customers, with the opposite effect on the class-specific (mean) delays. By choosing $$0< m < \infty $$, we aim at a more gradual delay differentiation between the two customer classes of the system. The queueing analysis of the model turns out to be quite challenging. We first establish a kernel-type functional equation for the steady-state joint probability generating function $$U(z_1,z_2)$$ of the numbers of customers in the two queues, from which $$U(z_1,z_2)$$ can be solved in terms of a finite number of unknown boundary functions. Next, we develop a method to determine these boundary functions in principle and discuss the main practical obstacles in deriving explicit results from this. We show that the difficulty of a full analysis depends heavily on the value of m and the precise form of the arrival process. For the special case $$m=1$$, we derive explicit formulas for $$U(z_1,z_2)$$. We also develop a mean-value analysis technique, applicable for any m, to compute closed-form expressions for the class-specific mean customer delays. Abundant numerical results demonstrate the impact of the threshold m and the traffic mix (proportion of class-1 and class-2 traffic in the arrival process) on the delay-differentiating capabilities of the proposed scheduling discipline. PubDate: 2025-02-22
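The scheduling rule itself is simple to state in code. The sketch below is illustrative only and does not reproduce the paper's generating-function analysis; it selects the next customer to serve given the two queues and the threshold m.

```python
# Sketch of the threshold-based selection rule described in the abstract
# (illustrative; the paper's probability-generating-function analysis is not
# reproduced): if more than m class-1 customers are present, class 1 has
# priority, otherwise class 2 has priority; if the preferred queue is empty,
# serve the other one.
from collections import deque

def select_next(queue1: deque, queue2: deque, m: int):
    """Return the next customer to serve, or None if both queues are empty."""
    if len(queue1) > m:
        preferred, other = queue1, queue2
    else:
        preferred, other = queue2, queue1
    if preferred:
        return preferred.popleft()
    if other:
        return other.popleft()
    return None

# m = 0 gives absolute priority to class 1; a very large m effectively gives
# absolute priority to class 2, matching the limiting cases described above.
```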
- Extreme values for the waiting time in large fork-join queues
Abstract: We prove that the scaled maximum steady-state waiting time and the scaled maximum steady-state queue length among N GI/GI/1-queues in the N-server fork-join queue converge to a normally distributed random variable as $$N\rightarrow \infty $$. The maximum steady-state waiting time in this queueing system scales around $$\frac{1}{\gamma }\log N$$, where $$\gamma $$ is determined by the cumulant generating function $$\Lambda $$ of the service-time distribution and solves the Cramér–Lundberg equation with stochastic service times and deterministic interarrival times. This value $$\frac{1}{\gamma }\log N$$ is reached at a certain hitting time. The number of arrivals until that hitting time satisfies a central limit theorem, with standard deviation $$\frac{\sigma _A}{\sqrt{\Lambda '(\gamma )\gamma }}$$. By using the distributional form of Little’s law, we can extend this result to the maximum queue length. Finally, we extend these results to a fork-join queue with different classes of servers. PubDate: 2025-02-22
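As a hedged numerical illustration of the decay rate $$\gamma $$ appearing above, the snippet below solves a Cramér–Lundberg equation in the form $$\Lambda (\gamma ) = \gamma /\lambda $$ (service-time cumulant generating function on the left, deterministic interarrival times of length $$1/\lambda $$ on the right), assuming exponential service times purely for concreteness; the paper's exact assumptions may differ.

```python
# Hedged numerical sketch: solve the Cramer-Lundberg equation
# Lambda(gamma) = gamma / lam for its positive root, where Lambda is the
# service-time cumulant generating function and the interarrival times are
# deterministic with mean 1/lam. Exponential(mu) service times are assumed
# purely for illustration; the paper's setup may differ.
import math

def cgf_exp(t, mu):
    """log E[exp(t*S)] for S ~ Exp(mu); finite for t < mu."""
    return math.log(mu / (mu - t))

def solve_gamma(lam, mu):
    """Bisection for the positive root of Lambda(gamma) - gamma/lam on (0, mu)."""
    f = lambda g: cgf_exp(g, mu) - g / lam
    lo, hi = 1e-9, mu - 1e-9      # f(lo) < 0 for a stable queue, f(hi) > 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

print(solve_gamma(lam=0.8, mu=1.0))  # decay rate gamma in the (1/gamma) log N scaling
```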
- Call center data analysis and model validation
Abstract: We analyze call center data on factors such as agent heterogeneity, customer patience and agent breaks. Based on this, we construct different simulation models and compare their performance with the actual realized performance. We classify them according to the extent to which they accurately approximate the service level and average waiting times. We also obtain a theoretical understanding of how to distinguish model error from other aspects such as random noise. We conclude that explicitly modeling breaks and agent heterogeneity is crucial for obtaining a precise model. PubDate: 2025-02-09
- Optimal admission control in queues with multiple customer classes and abandonments
Abstract: We consider a Markovian, finite capacity queueing system with multiple customer classes and multiple servers where customers waiting in queue may get impatient and leave without being served. There is a cost associated with abandonments and a holding cost associated with customers in the system. Admitted customers pay a class-dependent reward at the time of arrival. Under these assumptions, our objective is to characterize the optimal admission control policy that maximizes the long-run average reward. We formulate the problem as a Markov decision process problem and prove that the optimal policy is a DST (Double Set of Thresholds) policy where there is a pair of thresholds for each class, such that customers of that class are admitted only if the total number of customers in the system is between the two thresholds. We also identify sufficient conditions under which the optimal policy reduces to an SST (Single Set of Thresholds) policy where each customer class is admitted only if the total number of customers in the system is less than a certain threshold. After investigating how the optimal long-run average reward changes with respect to system parameters, we compare the performance of the optimal SST policy to the optimal policy. We conclude with a study of various extensions of our original model. PubDate: 2025-01-31
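The DST structure described in the abstract translates directly into a simple admission check. The thresholds below are hypothetical placeholders, since computing the optimal values requires solving the underlying Markov decision process, and the half-open convention for "between the two thresholds" is an assumption.

```python
# The DST (Double Set of Thresholds) structure described in the abstract,
# written as a simple admission check. Threshold values are hypothetical;
# the optimal ones come from solving the Markov decision process.

def admit(customer_class, total_in_system, lower, upper):
    """Admit a customer of the given class iff the total number in the
    system lies between that class's two thresholds (half-open convention)."""
    return lower[customer_class] <= total_in_system < upper[customer_class]

lower = {1: 0, 2: 0, 3: 2}       # hypothetical per-class lower thresholds
upper = {1: 10, 2: 6, 3: 8}      # hypothetical per-class upper thresholds
print(admit(2, 5, lower, upper))  # True: 0 <= 5 < 6
```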
- Proof of the conjecture on Markovian queues with Poisson control
Abstract: This study considers Markovian queues with different service speeds, where the server speed can only be changed at control instants assumed to follow a Poisson process. Núñez-Queija et al. (Queueing Syst 100:233–235, 2022, Indag Math 34:990–1013, 2023) formulated a conjecture on the asymptotics of the stationary distribution for the scaled process of queue length and server speed as the control rate approaches 0. We completely resolve this conjecture by rigorously analyzing an intuitive explanation of the conjectured result. Furthermore, we extend this result to a renewal control model. PubDate: 2025-01-04
- Asymptotic variability analysis for tandem queues
Abstract: We study a multi-stage tandem queueing system and develop the law of the iterated logarithm (LIL) for performance measures including the queue length, workload, busy time, idle time and departure processes. These LIL results can help quantify the level of stochastic variability of these performance functions. Using a strong approximation method, which transforms the renewal-process-based performance functions to their continuous Brownian motion approximations, we establish all the LIL limits and express them as simple functions of model parameters (e.g., means and variances) of the interarrival and service times. Our LIL results reveal clear-cut insights into how the stochastic variability received from upstream stages can be propagated to the downstream echelons in the tandem queue model; we show that stages that are underloaded, overloaded and critically loaded play distinct roles. An underloaded stage simply transfers all received upstream variability to the downstream stages; its own service-time variability makes no impact on any succeeding echelons. An overloaded stage overrides the variability received from upstream stages; it resets the propagation process by feeding its successive stages with its own service-time variability alone. A critically loaded stage inherits the variability received from upstream stages, which it modifies using its own service-time variability. PubDate: 2024-12-19
- Second-order bounds for the M/M/s queue with random arrival rate
Abstract: Consider an M/M/s queue with the additional feature that the arrival rate is a random variable of which only the mean, variance, and range are known. Using semi-infinite linear programming and duality theory for moment problems, we establish tight bounds for the expected waiting time in this setting. These bounds correspond to an arrival rate that takes only two values. The proofs crucially depend on the fact that the expected waiting time, as a function of the arrival rate, has a convex derivative. We apply the novel tight bounds to a rational queueing model, where arriving individuals decide to join or balk based on expected utility and only have partial knowledge about the market size. PubDate: 2024-12-14
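As a hedged companion sketch, the snippet below evaluates the mean waiting time of an M/M/s queue with the standard Erlang-C formula and averages it over a simple symmetric two-point arrival-rate distribution matching a given mean and variance. The paper's tight bounds are attained by (generally different) two-point distributions obtained from the moment problem; this only illustrates the evaluation step, and all parameter values are illustrative.

```python
# Hedged sketch: mean queueing delay in an M/M/s queue (standard Erlang-C
# formula), averaged over a simple symmetric two-point arrival-rate law with
# a given mean and variance. This is not the paper's extremal distribution.
import math

def erlang_c_wait(lam, mu, s):
    """Mean waiting time in queue for a stable M/M/s queue."""
    a = lam / mu                        # offered load
    rho = a / s
    if rho >= 1:
        return math.inf
    summation = sum(a**k / math.factorial(k) for k in range(s))
    top = a**s / (math.factorial(s) * (1 - rho))
    p_wait = top / (summation + top)    # Erlang-C probability of delay
    return p_wait / (s * mu - lam)

def mean_wait_two_point(mean_lam, var_lam, mu, s):
    d = math.sqrt(var_lam)
    lam_lo, lam_hi = mean_lam - d, mean_lam + d   # each with probability 1/2
    return 0.5 * (erlang_c_wait(lam_lo, mu, s) + erlang_c_wait(lam_hi, mu, s))

print(mean_wait_two_point(mean_lam=4.0, var_lam=0.25, mu=1.0, s=6))
```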
- Obliquely reflecting Brownian motion in nonpolyhedral, piecewise smooth cones, with an example of application to diffusion approximation of bandwidth-sharing queues
Abstract: This work gives sufficient conditions for uniqueness in distribution of semimartingale, obliquely reflecting Brownian motion in a nonpolyhedral, piecewise $$\mathcal {C}^2$$ cone, with radially constant, Lipschitz continuous direction of reflection on each face. The conditions are shown to be verified by the conjectured diffusion approximation to the workload in a class of bandwidth-sharing networks, thus ensuring that the conjectured limit is uniquely characterized. This is an essential step in proving the diffusion approximation. This uniqueness result is made possible by replacing the Krein–Rutman theorem used by Kwon and Williams (1993) in a smooth cone with the recent reverse ergodic theorem for inhomogeneous, killed Markov chains of Costantini and Kurtz (Ann Appl Probab, 2024. https://doi.org/10.1214/23-AAP2047; Stoch Process Appl 170:104295, 2024. https://doi.org/10.1016/j.spa.2024.104295). PubDate: 2024-12-04
- Heavy traffic scaling limits for shortest remaining processing time queues with light tailed processing time distributions
Abstract: We prove a heavy traffic scaling limit for a shortest remaining processing time queue. We are interested in the case where the processing time distribution has a tail that decays rapidly, i.e., has light tails. In particular, we revisit the work in Puha (2015), which shows that the diffusion scaled queue length process multiplied by a processing time distribution dependent factor that tends to infinity converges to a nontrivial reflecting Brownian motion, under the condition that this distribution dependent factor is slowly varying and obeys a certain rate of convergence condition. Here, we prove that the rate of convergence condition is not needed and the result holds more generally. We further show convergence of a sequence of nonstandardly scaled measure valued state descriptors to a point mass at one such that the total mass fluctuates randomly in accordance with the diffusion limit for the workload process. This is a sharp concentration result which shows that, under this nonstandard scaling, there are a very small number of tasks in the system and the remaining work for each such task is large and of the same order of magnitude as that of other tasks. This is due to the prioritization of the task with the least remaining work, and is in contrast to the case of heavy tailed processing times studied in Banerjee, Budhiraja, and Puha (2022). There it is shown that, while there is some concentration, the remaining times of the very small number of tasks in the system spread out over the nonnegative real line according to a random profile under this nonstandard scaling. Thus, this work completes the description of the two fundamentally different behaviors of SRPT by characterizing it in the case of light tailed processing time distributions. PubDate: 2024-12-04
- An exact analysis and comparison of manual picker routing heuristics
Abstract: This paper presents exact derivations of the first two moments of the total order picking time in a warehouse for three routing heuristics, under the assumption of random storage. The analysis is done for general order size distributions and provides exact closed-form expressions in terms of the probability generating function of the order size distribution. We also indicate how the methods and insights in this paper extend to different storage policies and multi-block warehouses. The exact results derived in this paper are used to investigate the effects of routing heuristics, order size distributions and layouts on warehouse efficiency. As an illustration, we model the warehouse as a queueing system. By using approximations of the average order-lead time in terms of the first two moments of the order picking time, we are able to find optimal warehouse layouts and batch pick sizes. PubDate: 2024-10-04
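One standard way to turn the first two moments of the picking time into an average order-lead time, sketched below, is the M/G/1 Pollaczek–Khintchine mean waiting time; the paper's actual lead-time approximation may differ, and the parameter values are illustrative.

```python
# Hedged sketch: one standard use of the first two moments of the order
# picking (service) time is the M/G/1 Pollaczek-Khinchine mean waiting time.
# The paper's lead-time approximation may differ; parameters are illustrative.

def mg1_mean_lead_time(lam, mean_pick, second_moment_pick):
    """Mean order lead time (waiting + picking) in an M/G/1 queue."""
    rho = lam * mean_pick
    assert rho < 1, "picker must not be overloaded"
    mean_wait = lam * second_moment_pick / (2 * (1 - rho))
    return mean_wait + mean_pick

# Example: orders arrive at rate 0.1 per minute, picking takes 6 minutes on
# average with E[S^2] = 50 min^2 (standard deviation sqrt(50 - 36) minutes).
print(mg1_mean_lead_time(lam=0.1, mean_pick=6.0, second_moment_pick=50.0))
```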
- Applying queueing theory to evaluate wait-time-savings of triage algorithms
Abstract: In the past decade, artificial intelligence (AI) algorithms have made promising impacts in many areas of healthcare. One application is AI-enabled prioritization software known as computer-aided triage and notification (CADt). This type of software as a medical device is intended to prioritize reviews of radiological images with time-sensitive findings, thus shortening the waiting time for patients with these findings. While many CADt devices have been deployed into clinical workflows and have been shown to improve patient treatment and clinical outcomes, quantitative methods to evaluate the wait-time-savings from their deployment are not yet available. In this paper, we apply queueing theory methods to evaluate the wait-time-savings of a CADt by calculating the average waiting time per patient image without and with a CADt device being deployed. We study two workflow models with one or multiple radiologists (servers) for a range of AI diagnostic performances, radiologist reading rates, and patient image (customer) arrival rates. To evaluate the time-saving performance of a CADt, we use the difference in the mean waiting time between the diseased patient images in the with-CADt scenario and that in the without-CADt scenario as our performance metric. As part of this effort, we have developed, and also share, a software tool to simulate the radiology workflow around medical image interpretation, to verify theoretical results, and to provide confidence intervals for the performance metric we defined. We show quantitatively that a CADt triage device is more effective in a busy, short-staffed reading setting, which is consistent with our clinical intuition and simulation results. Although this work is motivated by the need for evaluating CADt devices, the evaluation methodology presented in this paper can be applied to assess the time-saving performance of other types of algorithms that prioritize a subset of customers based on binary outputs. PubDate: 2024-09-21
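The kind of with-CADt versus without-CADt comparison described above can be illustrated with classical nonpreemptive priority formulas. The sketch below assumes a single radiologist, exponential reading times, and a CADt that flags images with sensitivity Se and false-positive rate FPR; it is a simplified stand-in for the paper's workflow models and shared simulation tool, not a reproduction of them.

```python
# Hedged illustration (single radiologist, exponential reading times,
# classical nonpreemptive priority formulas): mean wait of diseased images
# without a CADt (FIFO) versus with a CADt that flags images as priority
# with sensitivity Se and false-positive rate FPR. A simplified stand-in
# for the workflow models in the paper, not their exact model.

def mean_waits(lam, mu, prevalence, Se, FPR):
    rho = lam / mu
    assert rho < 1, "reading queue must be stable"
    w_fifo = rho / (mu - lam)          # M/M/1 FIFO mean wait in queue

    # With CADt: flagged images form the high-priority class.
    p_flag = prevalence * Se + (1 - prevalence) * FPR
    rho1 = rho * p_flag
    w0 = rho / mu                      # mean residual work seen on arrival
    w_hi = w0 / (1 - rho1)             # nonpreemptive priority (Cobham)
    w_lo = w0 / ((1 - rho1) * (1 - rho))

    # A diseased image is flagged with probability Se.
    w_diseased_cadt = Se * w_hi + (1 - Se) * w_lo
    return w_fifo, w_diseased_cadt

no_cadt, with_cadt = mean_waits(lam=0.85, mu=1.0, prevalence=0.1, Se=0.9, FPR=0.1)
print(no_cadt, with_cadt, no_cadt - with_cadt)  # wait-time saving for diseased images
```

With these illustrative numbers the saving for diseased images grows as the load approaches 1, consistent with the abstract's observation that a CADt is most effective in a busy reading setting.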
- Managing flexibility: optimal sizing and scheduling of flexible servers
Abstract: When managing resource flexibility in service systems, the conventional wisdom is that server flexibility is beneficial due to the resource pooling effect. However, in practice, flexibility often incurs some additional costs, which makes how much flexibility is optimal and how to best utilize the flexible resource a nontrivial question. In this paper, we study the joint staffing and scheduling problem in service systems with two classes of customers and dedicated and flexible servers. The specific feature we consider is that the flexible server may incur a higher staffing cost or a loss of efficiency. We study how flexibility is best utilized in two scenarios: one with deterministic arrival rates and the other with random/uncertain arrival rates. We utilize a heavy-traffic asymptotic framework to develop structural insights. Our results show that the size of the flexible server pool is of a smaller order than the size of the dedicated server pools, and the flexible servers are mostly used to hedge against system stochasticity or demand uncertainty, depending on which source of randomness dominates. The proposed staffing and scheduling policies are easy to implement and achieve near-optimal performance. PubDate: 2024-09-21
- Core allocation to minimize total flow time in a multicore system in the presence of a processing time constraint
Abstract: Data centers are a vital and fundamental infrastructure component of the cloud. The requirement to execute a large number of demanding jobs places a premium on processing capacity. Parallelizing jobs to run on multiple cores reduces execution time. However, there is a decreasing marginal benefit to using more cores, with the speedup function quantifying the achievable gains. A critical performance metric is flow time. Previous results in the literature derived closed-form expressions for the optimal allocation of cores to minimize total flow time for a power-law speedup function if all jobs are present at time 0. However, this work did not place a constraint on the makespan. For many diverse applications, fast response times are essential, and latency targets are specified to avoid adverse impacts on user experience. This paper is the first to determine the optimal core allocations for a multicore system to minimize total flow time in the presence of a completion deadline (where all jobs have the same deadline). The allocation problem is formulated as a nonlinear optimization program that is solved using the Lagrange multiplier technique. Closed-form expressions are derived for the optimal core allocations, total flow time, and makespan, which can be fitted to a specified deadline by adjusting the value of a single Lagrange multiplier. Compared to the unconstrained problem, the shortest-job-first property for optimal allocation is maintained; however, a number of other properties require revision, and some are retained only in a modified form (such as the scale-free and size-dependence properties). It is found that with a completion deadline the optimal solution may contain groups of simultaneous completions. In general, all possible patterns of single and group completion need to be considered, producing an exponential search space. However, the paper determines analytically that the optimal completion pattern consists of a sequence of single completions followed by a single group of simultaneous completions at the end, which reduces the search space dimension to being linear. The paper validates the Lagrange multiplier approach by verifying constraint qualifications. PubDate: 2024-08-26
- Asymptotics for the Green’s functions of a transient reflected Brownian motion in a wedge
Abstract: We consider a transient Brownian motion reflected obliquely in a two-dimensional wedge. A precise asymptotic expansion of the Green’s functions is found in all directions. To this end, we first determine a kernel functional equation connecting the Laplace transforms of the Green’s functions. We then extend the Laplace transforms analytically and study their singularities. We obtain the asymptotics by applying the saddle point method to the inverse Laplace transform on the Riemann surface generated by the kernel. PubDate: 2024-08-13
- Editorial introduction: second part of the special issue on product forms, stochastic matching, and redundancy
PubDate: 2024-08-09
- Queue-length-aware dispatching in large-scale heterogeneous systems
Abstract: One dominant approach for reducing response times in large-scale systems is Join-the-Shortest-Queue-d: whenever a job arrives, the dispatcher queries d servers at random and then assigns the job to the queried server with the shortest queue. While $$\textsf{JSQ}$$-d is known to perform quite well in systems where all servers run at the same speed, this is not the case in systems that exhibit heterogeneity with respect to server speeds. Unfortunately, there is no straightforward way to extend $$\textsf{JSQ}$$-d (or other so-called power-of-d policies) to heterogeneous systems. Should a job be assigned to the queried server with the shortest queue even if much faster servers were among those queried? Should a job be assigned to the queried server where it is expected to complete the soonest even if there is an idle, albeit slower, server available among those queried? And for that matter, should we query faster servers more often than their slower counterparts? Recent work has introduced a framework for developing strong dispatching policies by pairing suitably chosen querying and assignment rules. Within this framework, prior work has focused on finding strong-performing dispatching policies that use only the idle/busy statuses of the queried servers, rather than detailed queue length information. In this paper, we overcome the challenge of evaluating the performance of general scalable dispatching policies that make use of both server speed and detailed queue length information, and of finding strong-performing policies of this kind, through a combination of mean field analysis and a sequence of modified optimization problems. We find that well-designed length-aware dispatching policies can significantly outperform their idleness-based counterparts in large-scale heterogeneous systems. While the best policies of this kind are often complicated to describe, we find that in the vast majority of cases the relatively simple Shortest Expected Wait policy performs nearly as well, without the need for an especially sophisticated and time-consuming optimization procedure. PubDate: 2024-08-03
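As a hedged sketch of the assignment step only (the querying distribution and the paper's mean field optimization are not reproduced), the snippet below implements one natural reading of a Shortest Expected Wait rule for heterogeneous servers: among the d queried servers, send the job to the one whose expected wait, taken here as the number of jobs already present divided by the server's speed, is smallest; the paper's exact definition may differ.

```python
# Hedged sketch of the assignment step of a Shortest Expected Wait rule for
# heterogeneous servers: among the d queried servers, pick the one whose
# expected wait (taken here as jobs already present divided by the server's
# speed, assuming roughly exponential services) is smallest. The querying
# rule and the paper's optimization framework are not reproduced.
import random

def shortest_expected_wait(queried):
    """queried: list of (server_id, speed, jobs_present); return chosen id."""
    return min(queried, key=lambda s: s[2] / s[1])[0]

def dispatch(servers, d, rng):
    """servers: dict server_id -> (speed, jobs_present); query d at random."""
    ids = rng.sample(list(servers), d)
    return shortest_expected_wait([(i, *servers[i]) for i in ids])

servers = {0: (2.0, 3), 1: (1.0, 1), 2: (0.5, 0), 3: (2.0, 5)}
print(dispatch(servers, d=2, rng=random.Random(7)))
```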