Chip attach is the bottleneck operation in semiconductor assembly. Chip attach scheduling is in nature unrelated parallel machine scheduling considering practical issues, for example, machine-job qualification, sequence-dependant setup times, initial machine status, and engineering time. The major scheduling objective is to minimize the total weighted unsatisfied Target Production Volume in the schedule horizon. To apply -learning algorithm, the scheduling problem is converted into reinforcement learning problem by constructing elaborate system state representation, actions, and reward function. We select five heuristics as actions and prove the equivalence of reward function and the scheduling objective function. We also conduct experiments with industrial datasets to compare the -learning algorithm, five action heuristics, and Largest Weight First (LWF) heuristics used in industry. Experiment results show that -learning is remarkably superior to the six heuristics. Compared with LWF, -learning reduces three performance measures, objective function value, unsatisfied Target Production Volume index, and unsatisfied job type index, by considerable amounts of 80.92%, 52.20%, and 31.81%, respectively. 1. Introduction Semiconductor manufacturing consists of four basic steps: wafer fabrication, wafer sort, assembly, and test. Assembly and test are back-end steps. Semiconductor assembly contains many operations, such as reflow, wafer mount, saw, chip attach, deflux, EPOXY, cure, and PEVI. IS factory is a site for back-end semiconductor manufacturing where chip attach is the bottleneck operation in the assembly line. In terms of Theory of Constraints (TOC), the capacity of a shop floor depends on the capacity of the bottleneck, and a bottleneck operation gives a tremendous impact upon the performance of the whole shop floor. Consequently, scheduling of chip attach station has a significant effect on the performance of the assembly line. Chip attach is performed in a station which consists of ten parallel machines; thus, chip attach scheduling in nature is some form of unrelated parallel machine scheduling under certain realistic restrictions. Research on unrelated parallel machine scheduling focuses on two sorts of criteria: completion time or flow time related criteria and due date related criteria. Weng et al.  proposed a heuristic algorithm called “Algorithm 9” to minimize the total weighted completion time with setup consideration. Algorithm 9 was demonstrated to be superior to six heuristic algorithms. Gairing et al.  presented an effective
M. X. Weng, J. Lu, and H. Ren, “Unrelated parallel machine scheduling with setup consideration and a total weighted completion time objective,” International Journal of Production Economics, vol. 70, no. 3, pp. 215–226, 2001.
M. Gairing, B. Monien, and A. Woclaw, “A faster combinatorial approximation algorithm for scheduling unrelated parallel machines,” in Automata, Languages and Programming, vol. 3580 of Lecture Notes in Computer Science, pp. 828–839, 2005.
R. V. Rachamadugu and T. E. Morton, “Myopic heuristics for the single machine weighted tardiness problem,” Working Paper 28-81-82, Graduate School of Industrial Administration, Garnegie-Mellon University, 1981.
R. S. Russell, E. M. Dar-El, and B. W. Taylor, “A comparative analysis of the COVERT job sequencing rule using various shop performance measures,” International Journal of Production Research, vol. 25, no. 10, pp. 1523–1540, 1987.
J. Bank and F. Werner, “Heuristic algorithms for unrelated parallel machine scheduling with a common due date, release dates, and linear earliness and tardiness penalties,” Mathematical and Computer Modelling, vol. 33, no. 4-5, pp. 363–383, 2001.
D. W. Kim, D. G. Na, and F. F. Chen, “Unrelated parallel machine scheduling with setup times and a total weighted tardiness objective,” Robotics and Computer-Integrated Manufacturing, vol. 19, no. 1-2, pp. 173–181, 2003.
S. Riedmiller and M. Riedmiller, “A neural reinforcement learning approach to learn local dispatching policies in production scheduling,” in Proceedings of the 16th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 1999.
S. S. Singh, V. B. Tadi？, and A. Doucet, “A policy gradient method for semi-Markov decision processes with application to call admission control,” European Journal of Operational Research, vol. 178, no. 3, pp. 808–818, 2007.
M. Kaya and R. Alhajj, “A novel approach to multiagent reinforcement learning: utilizing OLAP mining in the learning process,” IEEE Transactions on Systems, Man and Cybernetics Part C, vol. 35, no. 4, pp. 582–590, 2005.
C. D. Paternina-Arboleda and T. K. Das, “A multi-agent reinforcement learning approach to obtaining dynamic control policies for stochastic lot scheduling problem,” Simulation Modelling Practice and Theory, vol. 13, no. 5, pp. 389–406, 2005.
K. Iwamura, N. Mayumi, Y. Tanimizu, and N. Sugimura, “A study on real-time scheduling for holonic manufacturing systems—determination of utility values based on multi-agent reinforcement learning,” in Proceedings of the 4th International Conference on Industrial Applications of Holonic and Multi-Agent Systems, pp. 135–144, Linz, Austria, 2009.