Chip attach is the bottleneck operation in semiconductor assembly. Chip attach scheduling is in nature unrelated parallel machine scheduling considering practical issues, for example, machine-job qualification, sequence-dependant setup times, initial machine status, and engineering time. The major scheduling objective is to minimize the total weighted unsatisfied Target Production Volume in the schedule horizon. To apply -learning algorithm, the scheduling problem is converted into reinforcement learning problem by constructing elaborate system state representation, actions, and reward function. We select five heuristics as actions and prove the equivalence of reward function and the scheduling objective function. We also conduct experiments with industrial datasets to compare the -learning algorithm, five action heuristics, and Largest Weight First (LWF) heuristics used in industry. Experiment results show that -learning is remarkably superior to the six heuristics. Compared with LWF, -learning reduces three performance measures, objective function value, unsatisfied Target Production Volume index, and unsatisfied job type index, by considerable amounts of 80.92%, 52.20%, and 31.81%, respectively. 1. Introduction Semiconductor manufacturing consists of four basic steps: wafer fabrication, wafer sort, assembly, and test. Assembly and test are back-end steps. Semiconductor assembly contains many operations, such as reflow, wafer mount, saw, chip attach, deflux, EPOXY, cure, and PEVI. IS factory is a site for back-end semiconductor manufacturing where chip attach is the bottleneck operation in the assembly line. In terms of Theory of Constraints (TOC), the capacity of a shop floor depends on the capacity of the bottleneck, and a bottleneck operation gives a tremendous impact upon the performance of the whole shop floor. Consequently, scheduling of chip attach station has a significant effect on the performance of the assembly line. Chip attach is performed in a station which consists of ten parallel machines; thus, chip attach scheduling in nature is some form of unrelated parallel machine scheduling under certain realistic restrictions. Research on unrelated parallel machine scheduling focuses on two sorts of criteria: completion time or flow time related criteria and due date related criteria. Weng et al. [1] proposed a heuristic algorithm called “Algorithm 9” to minimize the total weighted completion time with setup consideration. Algorithm 9 was demonstrated to be superior to six heuristic algorithms. Gairing et al. [2] presented an effective
M. X. Weng, J. Lu, and H. Ren, “Unrelated parallel machine scheduling with setup consideration and a total weighted completion time objective,” International Journal of Production Economics, vol. 70, no. 3, pp. 215–226, 2001.
M. Gairing, B. Monien, and A. Woclaw, “A faster combinatorial approximation algorithm for scheduling unrelated parallel machines,” in Automata, Languages and Programming, vol. 3580 of Lecture Notes in Computer Science, pp. 828–839, 2005.
G. Mosheiov and J. B. Sidney, “Scheduling with general job-dependent learning curves,” European Journal of Operational Research, vol. 147, no. 3, pp. 665–670, 2003.
L. Yu, H. M. Shih, M. Pfund, W. M. Carlyle, and J. W. Fowler, “Scheduling of unrelated parallel machines: an application to PWB manufacturing,” IIE Transactions, vol. 34, no. 11, pp. 921–931, 2002.
K. R. Baker and J. W. M. Bertrand, “A dynamic priority rule for scheduling against due-dates,” Journal of Operations Management, vol. 3, no. 1, pp. 37–42, 1982.
J. J. Kanet and X. Li, “A weighted modified due date rule for sequencing to minimize weighted tardiness,” Journal of Scheduling, vol. 7, no. 4, pp. 261–276, 2004.
R. V. Rachamadugu and T. E. Morton, “Myopic heuristics for the single machine weighted tardiness problem,” Working Paper 28-81-82, Graduate School of Industrial Administration, Garnegie-Mellon University, 1981.
A. Volgenant and E. Teerhuis, “Improved heuristics for the n-job single-machine weighted tardiness problem,” Computers and Operations Research, vol. 26, no. 1, pp. 35–44, 1999.
A. P. J. Vepsalainen and T. E. Morton, “Priority rules for job shops with weighted tardiness costs,” Management Science, vol. 33, no. 8, pp. 1035–1047, 1987.
R. S. Russell, E. M. Dar-El, and B. W. Taylor, “A comparative analysis of the COVERT job sequencing rule using various shop performance measures,” International Journal of Production Research, vol. 25, no. 10, pp. 1523–1540, 1987.
J. Bank and F. Werner, “Heuristic algorithms for unrelated parallel machine scheduling with a common due date, release dates, and linear earliness and tardiness penalties,” Mathematical and Computer Modelling, vol. 33, no. 4-5, pp. 363–383, 2001.
C. F. Liaw, Y. K. Lin, C. Y. Cheng, and M. Chen, “Scheduling unrelated parallel machines to minimize total weighted tardiness,” Computers and Operations Research, vol. 30, no. 12, pp. 1777–1789, 2003.
D. W. Kim, D. G. Na, and F. F. Chen, “Unrelated parallel machine scheduling with setup times and a total weighted tardiness objective,” Robotics and Computer-Integrated Manufacturing, vol. 19, no. 1-2, pp. 173–181, 2003.
T. Jaakkola, M. I. Jordan, and S. P. Singh, “On the convergence of stochastic iterative dynamic programming algorithms,” Neural Computation, vol. 6, pp. 1185–1201, 1994.
S. Riedmiller and M. Riedmiller, “A neural reinforcement learning approach to learn local dispatching policies in production scheduling,” in Proceedings of the 16th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 1999.
M. E. Aydin and E. ？ztemel, “Dynamic job-shop scheduling using reinforcement learning agents,” Robotics and Autonomous Systems, vol. 33, no. 2, pp. 169–178, 2000.
J. Hong and V. V. Prabhu, “Distributed reinforcement learning control for batch sequencing and sizing in just-in-time manufacturing systems,” Applied Intelligence, vol. 20, no. 1, pp. 71–87, 2004.
Y. C. Wang and J. M. Usher, “Application of reinforcement learning for agent-based production scheduling,” Engineering Applications of Artificial Intelligence, vol. 18, no. 1, pp. 73–82, 2005.
B. C. Csáji, L. Monostori, and B. Kádár, “Reinforcement learning in a distributed market-based production control system,” Advanced Engineering Informatics, vol. 20, no. 3, pp. 279–288, 2006.
S. S. Singh, V. B. Tadi？, and A. Doucet, “A policy gradient method for semi-Markov decision processes with application to call admission control,” European Journal of Operational Research, vol. 178, no. 3, pp. 808–818, 2007.
M. Kaya and R. Alhajj, “A novel approach to multiagent reinforcement learning: utilizing OLAP mining in the learning process,” IEEE Transactions on Systems, Man and Cybernetics Part C, vol. 35, no. 4, pp. 582–590, 2005.
C. D. Paternina-Arboleda and T. K. Das, “A multi-agent reinforcement learning approach to obtaining dynamic control policies for stochastic lot scheduling problem,” Simulation Modelling Practice and Theory, vol. 13, no. 5, pp. 389–406, 2005.
C. E. Mariano-Romero, V. H. Alcocer-Yamanaka, and E. F. Morales, “Multi-objective optimization of water-using systems,” European Journal of Operational Research, vol. 181, no. 3, pp. 1691–1707, 2007.
K. Iwamura, N. Mayumi, Y. Tanimizu, and N. Sugimura, “A study on real-time scheduling for holonic manufacturing systems—determination of utility values based on multi-agent reinforcement learning,” in Proceedings of the 4th International Conference on Industrial Applications of Holonic and Multi-Agent Systems, pp. 135–144, Linz, Austria, 2009.