Suyash Sambhare

Mainframe Concepts

Terminology

Legacy S/360™ systems had a single processor, which was also known as the central processing unit (CPU). The terms system, processor, and CPU were used interchangeably. However, these terms became confusing when systems became available with more than one processor.

Processor and CPU can refer either to the complete system box or to one of the processors (CPUs) within the system box. Although the meaning is usually clear from context, even mainframe professionals must often clarify which sense of processor or CPU they intend. The term central processor complex (CPC) refers to the physical collection of hardware that includes main storage, one or more central processors, timers, and channels. Some system programmers use the term central electronic complex (CEC) to refer to the mainframe "box," but the preferred term is CPC.

All the S/390 or z/Architecture processors within a CPC start out as processing units (PUs). After delivery of the CPC, the PUs are characterized as central processors (CPs), Integrated Facilities for Linux (IFLs), or Integrated Coupling Facilities (ICFs) for Parallel Sysplex configurations.

Mainframe professionals typically use system to indicate the hardware box, a complete hardware environment, or an operating environment, depending on the context. They typically use processor to mean a single processor (CP) within the CPC.

Design Changes

The central processor complex contains the processors, memory, control circuits, and interfaces for channels. A channel provides an independent data and control path between I/O devices and memory. Legacy systems had up to 16 channels; today's largest mainframes can have more than 1000.

Channels connect to control units. A control unit contains the logic to work with a particular type of I/O device. A control unit for a printer has very different internal circuitry and logic from a control unit for a tape drive. Some control units can have multiple channel connections, providing multiple paths to the control unit and its devices.

Control units connect to devices, such as disk drives, tape drives, and communication interfaces. The division of circuitry and logic between a control unit and its devices is not precisely defined, but it is usually more economical to place most of the circuitry in the control unit.

The earliest channels were parallel channels, also known as bus and tag channels after the two heavy copper cables they used. A parallel channel can connect to a maximum of eight control units. Most control units can connect to multiple devices; the maximum depends on the particular control unit, but 16 is typical.

Each channel, control unit, and device has an address, expressed as a hexadecimal number:

  • The first digit is the channel number
  • The second digit is the control unit number
  • The last digit is the device number
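
To make the digit layout concrete, here is a minimal Python sketch that splits a classic three-digit address into its parts; the function name and the returned keys are illustrative only, not any real system interface:

    # Minimal sketch: split a classic three-digit hexadecimal address
    # into its channel, control unit, and device digits.
    def decode_address(addr: str) -> dict:
        if len(addr) != 3 or any(d not in "0123456789abcdefABCDEF" for d in addr):
            raise ValueError("classic addresses use exactly three hex digits")
        channel, control_unit, device = (int(d, 16) for d in addr)
        return {"channel": channel, "control_unit": control_unit, "device": device}

    print(decode_address("190"))  # {'channel': 1, 'control_unit': 9, 'device': 0}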

Mainframe designs are more complex:

  • Parallel channels are not available on the newest mainframes and are slowly being displaced on older systems.
  • Parallel channels have been replaced with ESCON (Enterprise Systems CONnection) and FICON (FIber CONnection) channels. These channels are optical fibers and connect to only one control unit or, more likely, to a director (switch).
  • Current mainframes have more than 16 channels and use two hexadecimal digits as the channel portion of an address.
  • Channels are generally known as CHPIDs (channel path identifiers) or PCHIDs (physical channel identifiers) on later systems, although the term channel is also correct. The channels are all integrated into the main processor box.

The device address seen by software is more correctly known as a device number, although the term address is still widely used. The device number is only indirectly related to the control unit and device addresses.

  • Mainframe hardware: I/O connectivity: A single System z9 mainframe can have up to 1024 individual channels for input and output (I/O) connectivity.
  • System control and partitioning: Each System z model has several elements that constitute the hardware control system for the mainframe.
  • Logical partitions (LPARs): Logical partitions (LPARs) are, in practice, equivalent to separate mainframes.
  • Consolidation of mainframes: There are fewer mainframes in use today than there were 15 or 20 years ago. The reduced number is due to consolidation.

I/O connectivity

A single System z9 mainframe can have up to 1024 individual channels for input and output (I/O) connectivity. This capacity is one factor that contributes to the mainframe's legendary scalability.

Partitions create separate logical machines in the central processor complex (CPC). ESCON and FICON channels are logically similar to parallel channels, but they use fiber connections and operate much faster. A modern system might have 100-200 channels, or channel path identifiers (CHPIDs). Key concepts here include the following:

  • ESCON and FICON channels connect to only one device or one port on a switch.
  • Most modern mainframes use switches between the channels and the control units. The switches may be connected to several systems, sharing the control units and some or all of their I/O devices across all the systems.
  • CHPID addresses are two hexadecimal digits.
  • Multiple partitions can sometimes share CHPIDs. Whether this is possible depends on the nature of the control units used through the CHPIDs. In general, CHPIDs used for disks can be shared.
  • An I/O subsystem layer exists between the operating systems in partitions and the CHPIDs.

An ESCON director or FICON switch is a sophisticated device that can sustain high data rates through many connections. A large director might have 200 connections, and all of these can be passing data at the same time. The director or switch must keep track of which CHPID initiated which I/O operation so that data and status information are returned to the right place. Multiple I/O requests, from multiple CHPIDs attached to multiple partitions on multiple systems, can be in progress through a single control unit.
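
As an illustration of that bookkeeping, here is a toy Python model (the class and method names are invented, not real director firmware) in which each in-flight operation is tagged with its originating CHPID so that status can be routed back:

    import itertools

    class Director:
        """Toy model of a director's in-flight I/O bookkeeping."""
        def __init__(self):
            self._tags = itertools.count(1)
            self._in_flight = {}                 # tag -> originating CHPID

        def start_io(self, chpid: int) -> int:
            tag = next(self._tags)
            self._in_flight[tag] = chpid         # remember who started it
            return tag

        def complete_io(self, tag: int, status: str):
            chpid = self._in_flight.pop(tag)     # route status back here
            return chpid, status

    switch = Director()
    tag = switch.start_io(chpid=0x21)
    print(switch.complete_io(tag, "channel end, device end"))  # (33, '...')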

The I/O control layer uses a control file known as an IOCDS (I/O configuration data set) that translates physical I/O addresses into the device numbers used by the operating system software to access devices. The IOCDS is loaded into the hardware system area (HSA) at power-on and can be modified dynamically. A device number looks like an address on an early S/360™ machine except that it can contain three or four hexadecimal digits.

Many users still refer to these as "addresses" although the device numbers are arbitrary numbers between x'0000' and x'FFFF'. Today's mainframes have two layers of I/O address translations between the real I/O elements and the operating system software. The second layer was added to make migration to newer systems easier.
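
The following hypothetical sketch shows the kind of lookup this enables; the table entries and field names are invented for illustration:

    # Hypothetical IOCDS-style table: an arbitrary device number maps to
    # a physical path (CHPID, control unit address, unit address).
    iocds = {
        0x0A80: {"chpid": 0x21, "control_unit": 0x0C, "unit_address": 0x00},
        0x0A81: {"chpid": 0x21, "control_unit": 0x0C, "unit_address": 0x01},
    }

    def resolve(device_number: int) -> dict:
        if not 0x0000 <= device_number <= 0xFFFF:
            raise ValueError("device numbers range from x'0000' to x'FFFF'")
        return iocds[device_number]              # software never sees the path

    print(resolve(0x0A80))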

Modern control units, especially for disks, often have multiple channel (or switch) connections and multiple connections to their devices. They can handle multiple data transfers at the same time on multiple channels. Each device has a unit control block (UCB) in each z/OS image.

Logical partitions

Logical partitions (LPARs) are, in practice, equivalent to separate mainframes.

Each LPAR runs its own operating system. This can be any mainframe operating system; there is no need to run z/OS in each LPAR. The installation planners may elect to share I/O devices across several LPARs, but this is a local decision.

The system administrator can assign one or more system processors for the exclusive use of an LPAR. Alternatively, the administrator can allow all processors to be used on some or all LPARs. In that case, the system control functions provide a dispatcher to share the processors among the selected LPARs. The administrator can specify a maximum number of concurrent processors executing in each LPAR, and can also provide weightings for the different LPARs; for example, specifying that LPAR1 should receive twice as much processor time as LPAR2.
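
A toy sketch of how such weights could translate into shares (proportional semantics assumed for illustration, not the real dispatcher's algorithm):

    # Each LPAR's share of CP time is its weight divided by the sum of
    # all active weights; LPAR1's weight is twice LPAR2's, so it gets
    # twice the processor time.
    weights = {"LPAR1": 200, "LPAR2": 100, "LPAR3": 100}

    def shares(weights: dict) -> dict:
        total = sum(weights.values())
        return {lpar: w / total for lpar, w in weights.items()}

    print(shares(weights))  # {'LPAR1': 0.5, 'LPAR2': 0.25, 'LPAR3': 0.25}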

The operating system in each LPAR is IPLed separately, has its own copy of the operating system, has its own operator console, and so forth. If the system in one LPAR crashes, there is no effect on the other LPARs.

In a mainframe system with three LPARs, you might have a production z/OS in LPAR1, a test version of z/OS in LPAR2, and Linux for S/390 in LPAR3. If this total system has 8 GB of memory, you might assign 4 GB to LPAR1, 1 GB to LPAR2, 1 GB to LPAR3, and keep 2 GB in reserve. The operating system consoles for the two z/OS LPARs might be in completely different locations.
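
Written out as a quick arithmetic check of that example:

    # The 8 GB example above: three assignments plus the reserve.
    total_gb = 8
    assigned = {"LPAR1": 4, "LPAR2": 1, "LPAR3": 1}
    reserve = total_gb - sum(assigned.values())
    print(reserve)  # 2 GB kept in reserve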

For most practical purposes, there is no difference between three separate mainframes running z/OS and three LPARs on the same mainframe doing the same thing. With minor exceptions, z/OS, the operators, and the applications cannot detect the difference.

The minor differences include the ability of z/OS to obtain performance and utilization information across the complete mainframe system and to dynamically shift resources among LPARs to improve performance.

Consolidation of mainframes

There are fewer mainframes in use today than there were 15 or 20 years ago. In some cases, all the applications were moved to other types of systems; however, in most cases, the reduced number is due to consolidation. That is, several smaller mainframes have been replaced with a smaller number of larger systems.

There is a compelling reason for consolidation. Mainframe software can be expensive and typically costs more than the mainframe hardware. It is usually less expensive to replace multiple software licenses with one or two licenses. Software license costs are often linked to the power of the system, but the pricing curves favor a small number of large machines.

Software license costs for mainframes have become a dominant factor in the growth and direction of the mainframe industry. Several nonlinear factors make software pricing very difficult. Remember that mainframe software is not a mass-market product like PC software. The growth of mainframe processing power in recent years has also been nonlinear.

The relative power needed to run a traditional mainframe application does not have a linear relation to the power needed for a new application. The consolidation effect has produced very powerful mainframes. These mainframes might need 1% of their power to run an application, but the application vendor often sets a price based on the total power of the machine.

This pricing policy results in the odd situation where customers want the latest mainframe technology, but they also want the slowest machine that will run their applications.

Processing units

Early mainframes had a single processor, which was known as the central processing unit (CPU). Today's mainframes have a central processor complex (CPC), which may contain several different types of z/Architecture processors that can be used for slightly different purposes.

Several of these purposes are related to software cost control, while others are more fundamental. All of the processors in the CPC begin as equivalent processing units (PUs), or engines, that have not been characterized for use. Each processor is characterized by IBM during installation or at a later time. The potential characterizations are:

  • Central Processor (CP): This processor type is available for normal operating systems and application software.
  • System Assistance Processor (SAP): The SAPs execute internal code to provide the I/O subsystem. An SAP translates the device numbers used by software into the corresponding channel path identifiers (CHPIDs), control unit addresses, and device numbers. It manages multiple paths to control units and performs error recovery for temporary errors. Operating systems and applications cannot detect SAPs, and SAPs do not use any "normal" memory.
  • Integrated Facility for Linux (IFL): This is a normal processor with one or two instructions disabled that are used only by z/OS. Linux does not use these instructions and can be executed by an IFL. Linux can be executed by a CP as well. The difference is that an IFL is not counted when specifying the model number of the system. This can make a substantial difference in software costs.
  • System z Application Assist Processor (zAAP): This is a processor with several functions disabled such that no full operating system can be executed on it. However, z/OS can detect the presence of zAAP processors and will use them to execute Java™ code. The same Java code can be executed on a standard CP. Again, zAAP engines are not counted when specifying the model number of the system. Like IFLs, they exist only to control software costs.
  • System z9 Integrated Information Processor (zIIP): The zIIP is a specialized engine for processing eligible database workloads. It is designed to help lower software costs for select workloads on the mainframe, such as business intelligence (BI), enterprise resource planning (ERP), and customer relationship management (CRM). The zIIP reinforces the mainframe's role as the data hub of the enterprise by helping to make direct access to DB2 more cost-effective and reducing the need for multiple copies of the data.
  • Integrated Coupling Facility (ICF): These processors run only Licensed Internal Code. They are not visible to normal operating systems or applications. A coupling facility is, in effect, a large memory scratch pad used by multiple systems to coordinate work. ICFs must be assigned to LPARs that then become coupling facilities.
  • Spare: An uncharacterized PU functions as a "spare." If the system controllers detect a failing CP or SAP, it can be replaced with a spare PU. In most cases, this can be done without any system interruption, even for the application running on the failing processor.
  • In addition to these characterizations, some mainframes have models or versions that are configured to run slower than the potential speed of their CPs. This is widely known as kneecapping, although IBM prefers the term capacity setting. It is done by using microcode to insert null cycles into the processor instruction stream. The purpose, again, is to control software costs by having the minimum mainframe model or version that meets the application requirements. IFLs, SAPs, zAAPs, zIIPs, and ICFs always function at the full speed of the processor because these processors "do not count" in software pricing calculations; a small sketch of this logic follows the list.
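
A hedged sketch of that pricing logic, with invented names and numbers rather than IBM's actual model:

    FULL_SPEED = 1.0

    def effective_speed(pu_type: str, capacity_setting: float = 1.0) -> float:
        """CPs may be 'kneecapped'; specialty engines always run full speed."""
        if pu_type == "CP":
            return FULL_SPEED * capacity_setting
        return FULL_SPEED

    def counted_for_pricing(pu_type: str) -> bool:
        """Only CPs count toward the model number used in pricing."""
        return pu_type == "CP"

    print(effective_speed("CP", 0.6))   # 0.6 -- capacity-set CP
    print(effective_speed("IFL"))       # 1.0 -- full speed, and...
    print(counted_for_pricing("IFL"))   # False -- ...not counted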

Multiprocessors

The term multiprocessor means several processors (CP processors) and implies that several processors are used by a copy of z/OS.

All operating systems today, from PCs to mainframes, can work in a multiprocessor environment. However, the degree of integration of the multiple processors varies considerably. Pending interrupts in a system can be accepted by any processor in the system. Any processor can initiate and manage I/O operations to any channel or device available to the system or LPAR. Channels, I/O devices, interrupts, and memory are owned by the system and not by any specific processor.

This multiprocessor integration appears simple on the surface, but its implementation is complex. It is also important for maximum performance; the ability of any processor to accept any interrupt sent to the system is especially important.

Each processor in a system has a small private area of memory that is unique to that processor. This area is the Prefix Storage Area (PSA) and is used for interrupt handling and error handling. A processor can access another processor's PSA through special programming, although this is normally done only for error recovery purposes. A processor can interrupt other processors by using a special instruction (SIGP, for Signal Processor). This is typically used only for error recovery.
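
An illustrative toy model of these two ideas (not actual z/Architecture semantics): each processor owns a small private area, and one processor can post a signal order to another, loosely in the spirit of SIGP:

    class Processor:
        """Toy processor with a private per-processor area (like a PSA)."""
        def __init__(self, cpu_id: int):
            self.cpu_id = cpu_id
            self.psa = {"pending_signal": None}   # private to this processor

        def signal(self, target: "Processor", order: str) -> None:
            # Post a signal order into the target's private area.
            target.psa["pending_signal"] = (self.cpu_id, order)

    cp0, cp1 = Processor(0), Processor(1)
    cp0.signal(cp1, "restart")                    # e.g., during error recovery
    print(cp1.psa)  # {'pending_signal': (0, 'restart')}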

Ref: https://www.ibm.com/docs/en/zos-basic-skills
