This paper presents the design of an intelligent hierarchical control strategy for a flexible assembly system. At the highest level of the hierarchy, the system controller learns a dynamic production control policy. An artificial intelligence technique, average-reward reinforcement learning (RL), is used to train the supervisor (controller) on what products to produce and when to produce them. RL is a simulation-based stochastic approximation technique that can find near-optimal policies for Markov decision processes (MDPs). Here, an extension of RL to semi-Markov decision processes (SMDPs), a more general class of decision problems, is used; in particular, this paper focuses on SMDPs under the average-reward criterion. The lower level of the hierarchy consists of local agents (e.g., robots) that decide how to execute the chosen production policy, basing their decisions on self-performance evaluation functions. The learning phase is carried out offline by simulation, and the resulting value estimates are then transferred to the system supervisor (controller), which makes decisions online based on them.
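To make the technique concrete, the following is a minimal sketch of average-reward RL for an SMDP in the SMART style (a tabular Q-learning variant in which the reward of each transition is penalized by the running average-reward-rate estimate times the sojourn time). The toy environment, the exploration schedule, and all parameter values are illustrative assumptions, not the paper's actual assembly-system model.

```python
import random

def average_reward_rl(n_states, n_actions, step, iters=20000, alpha=0.1, seed=0):
    """SMART-style average-reward Q-learning for an SMDP (illustrative sketch).

    `step(s, a, rng)` simulates one transition and returns
    (reward, sojourn_time, next_state).  Returns the Q-table and the
    estimated average reward rate rho (reward per unit time).
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    total_r, total_t, rho = 0.0, 0.0, 0.0
    s = 0
    for k in range(iters):
        # Decaying epsilon-greedy exploration (an assumed schedule).
        eps = max(0.05, 1.0 - k / iters)
        if rng.random() < eps:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda x: Q[s][x])
        r, tau, s_next = step(s, a, rng)
        # SMART update: the reward is offset by rho * sojourn_time, so Q
        # measures advantage relative to the average reward rate.
        target = r - rho * tau + max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        # Update the rate estimate only on greedy (non-exploratory) choices.
        if a == max(range(n_actions), key=lambda x: Q[s][x]):
            total_r += r
            total_t += tau
            rho = total_r / total_t
        s = s_next
    return Q, rho

def toy_step(s, a, rng):
    """Hypothetical one-state SMDP: action 0 yields reward 1 over 1 time
    unit (rate 1.0); action 1 yields reward 5 over 10 time units (rate 0.5)."""
    if a == 0:
        return 1.0, 1.0, 0
    return 5.0, 10.0, 0

Q, rho = average_reward_rl(n_states=1, n_actions=2, step=toy_step)
```

Because the criterion is reward per unit time rather than total reward, the learner should come to prefer action 0 (rate 1.0) over the larger but slower reward of action 1 (rate 0.5), which is exactly the distinction that motivates the SMDP extension over a plain MDP.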