How to access iOS Kernel

Dr. Kiran Arora
Published Date: 27-10-2017
Chapter 2: Mac OS X and iOS

Mac OS X is a modern Unix-based operating system developed by Apple Inc. for its Macintosh computer series. OS X is the tenth incarnation of Mac OS, and it features a graphical user interface known for its ease of use and visual appeal. Apple has gained a cult-like following for its products, and any new feature added to either OS X or iOS receives widespread attention. In addition to the regular edition of OS X, Apple also provided a server edition called Mac OS X Server. The server version was later merged with the regular version in Mac OS X 10.7 (Lion).

OS X was the successor to Mac OS 9 and represented a radical departure from earlier versions: unlike its predecessors, OS X was based on the NeXTSTEP operating system. At the time of writing, there have been eight releases of Mac OS X, the latest being Mac OS X 10.7, codenamed Lion. The releases to date are shown in Table 2-1.

Table 2-1. Mac OS X Releases to Date

Version   Name           Released
10.0      Cheetah        March 2001
10.1      Puma           September 2001
10.2      Jaguar         August 2002
10.3      Panther        October 2003
10.4      Tiger          April 2005
10.5      Leopard        October 2007
10.6      Snow Leopard   August 2009
10.7      Lion           July 2011

Mac OS X comes with a range of tools for developers, including Xcode, which allow the development of a wide range of applications, including the major topic of this book: kernel extensions. For the end user, OS X usually comes bundled with the iLife suite, which contains software for photo, audio, and video editing, as well as software for authoring web pages.

NEXTSTEP

OS X and iOS are based on the NeXTSTEP OS developed by NeXT Computer Inc., which was founded by Steve Jobs after he left Apple in 1985. The company was initially funded by Jobs himself, but later gained significant outside investment. NeXT was later acquired by Apple, and NeXTSTEP technology made its way into OS X. The aim of NeXT was to build a computer for academia and business.
Despite limited commercial success relative to the competition, the NeXT computers (most notably the NeXTcube) had a highly innovative operating system, NeXTSTEP, which was in many ways ahead of its time. Like current versions of OS X, NeXTSTEP had both a graphical user interface and a command line interface (iOS does not provide a user-accessible command line interface). Many core technologies introduced by NeXTSTEP are still found in its successors, such as application bundles and Interface Builder. Interface Builder is now part of the Xcode development environment and is widely used for both OS X and iOS Cocoa applications. NeXTSTEP also provided Driver Kit, an object-oriented framework for driver development, which later evolved into I/O Kit, one of the major topics of this book.

iOS was later derived from OS X, and it is Apple’s OS for mobile devices. It was launched with the release of the first iPhone in 2007, when it was called iPhone OS; it was later renamed iOS to better reflect the fact that it runs on other mobile devices, such as the iPod Touch, the iPad, and more recently the Apple TV. iOS was built specifically for mobile devices with touch interfaces. Unlike their biggest competitor, Windows, neither OS X nor iOS is licensed for use by third parties; both can officially only be used on Apple’s hardware products.

A high-level view of the Mac OS X architecture is shown in Figure 2-1.

Figure 2-1. Mac OS X architecture

The core of Mac OS X and iOS is POSIX compliant and, since Mac OS X 10.5 (Leopard), has complied with the Unix 03 certification. This core, which includes the kernel and the Unix base of the OS, is known as Darwin, an open source operating system published by Apple.
Darwin, unlike Mac OS X, does not include the characteristic user interface; it is a bare-bones system that provides only the kernel and the user space base of tools and services typical of Unix systems. At its release, the only supported architecture was PowerPC, but 32- and 64-bit Intel support was subsequently added as part of Apple’s shift to the Intel architecture. Apple has thus far not released the ARM version of Darwin that iOS is based on. Darwin is currently available for download in source form only and has to be compiled.

The Darwin distribution includes the source code for the XNU kernel. The kernel sources are a particularly useful resource for people wanting to know more about the inner workings of the OS, and for developing kernel extensions: the source code headers, or the code itself, often contain more detailed explanations than are documented on Apple’s developer website. The Darwin OS (and therefore OS X and iOS) runs the XNU kernel, which is based on code from the Mach kernel as well as parts of the FreeBSD operating system. Figure 2-2 shows the Mac OS X desktop.

Figure 2-2. The Mac OS X desktop

Programming APIs

As you can see from Figure 2-1, OS X has a layered architecture. Between the Darwin core and the user application there is a rich set of programming APIs. The most significant of these is Cocoa, which is the preferred framework for GUI-based applications. The iOS equivalent is Cocoa Touch, which is principally the same but offers GUI elements specialized for touch-based user interaction. Both Cocoa and Cocoa Touch are written in the Objective-C language, a superset of C with support for Smalltalk-style messages.

OBJECTIVE-C

Objective-C was the language of choice for application development under Mac OS X and iOS, as well as their predecessor, NeXTSTEP.
Objective-C is a superset of the C language and provides support for object-oriented programming, but it lacks many of the advanced capabilities provided by languages like C++, such as multiple inheritance, templates, and operator overloading. Objective-C uses Smalltalk-style messaging and dynamic binding (which in many ways removes the need for multiple inheritance). The language was invented in the early 1980s by Brad Cox and Tom Love. Objective-C is still the de facto standard language for application development on both OS X and iOS, although driver or system-level programming is typically done in C or C++. Many core frameworks still use the NS (for NeXTSTEP) prefix in their class names, such as NSString and NSArray.

Other programming APIs include the BSD API, which provides applications with low-level file and device access, as well as the POSIX threading API (pthreads). Unlike Cocoa, the BSD layer does not provide facilities for programming applications with a graphical user interface.

Mac OS X has another major API, called Carbon. Carbon is a C-based API that overlaps with Cocoa in terms of functionality; it originally provided some backward compatibility with earlier versions of Mac OS. The Carbon API is now deprecated in favor of Cocoa for GUI applications, but remains in OS X to support legacy applications, such as Apple’s Final Cut Pro 7. The publicly available version of Carbon remains 32-bit only, so Cocoa is needed for 64-bit compatibility. The fourth major API is Java, which has now also been deprecated. Java was removed from the default installation in Mac OS X 10.7, although it is still provided as an optional install.

Graphics and multimedia are key differentiators that OS X and iOS offer over other operating systems, and both provide a rich set of APIs for working with them. The core of the graphics system is Quartz.
Quartz encompasses the windowing system (Quartz Compositor) as well as the API known as Quartz 2D. Quartz is based on the PDF (Portable Document Format) model. It offers resolution-independent user interfaces, as well as anti-aliased rendering of text and graphics. The Quartz Extreme interface offers hardware-assisted OpenGL rendering of windows, where supported by the graphics hardware.

Here’s a short overview of some important graphics and multimedia frameworks:

• Quartz: Consists of the Quartz 2D API and the Quartz Compositor, which provides the graphical window server. Cocoa Drawing offers an object-oriented interface on top of Quartz for use in Cocoa applications.
• OpenGL: The industry standard API for developing 3D applications. iOS supports OpenGL ES, a subset of OpenGL designed for embedded devices.
• Core Animation: A layer-based API integrated with Cocoa that makes it easy to create animated content and apply transformations.
• Core Image: Provides support for working with images, including adding effects, cropping, and color correction.
• Core Audio: Offers support for audio playback, recording, mixing, and processing.
• QuickTime: An advanced library for working with multimedia. It allows playback and recording of audio and video, including professional formats.
• Core Text: A C-based API for text rendering and layout. The Cocoa Text API is based on Core Text.

Supported Platforms

At its release, OS X was only supported on the PowerPC platform. In January 2006, Apple released version 10.4.4, which brought Mac OS X to the Intel x86 platform, as announced at WWDC 2005. According to Apple, the reason for transitioning away from PowerPC was its disappointment in IBM’s ability to deliver a competitive microprocessor, especially low-power processors intended for laptops. The transition to Intel was smooth for Apple, and it is one of the few examples of a successful platform shift within the industry.
Apple provided an elegant solution called Rosetta, a dynamic translator that allowed existing PowerPC applications to run on x86-based Macs (naturally with some performance penalty). Apple also provided developers with Universal Binaries (also referred to as fat binaries), which allow native code for more than one architecture to exist within a single executable. Although PowerPC support was discontinued as of Mac OS X 10.6 (Snow Leopard), Universal Binaries are still used to combine 32-bit x86 and 64-bit x86_64 executables.

64-bit Operating System

Mac OS X 10.5 (Leopard) allowed, for the first time, GUI applications to be 64-bit native. This was accomplished through a new 64-bit version of Cocoa, which let developers tap the additional benefits of the 64-bit CPUs found in the current generation of Macs. Applications based on the Carbon API are still 32-bit only.

The subsequent release, Mac OS X 10.6 (Snow Leopard), took things one step further by allowing the kernel to run in 64-bit mode. While most applications and APIs were already 64-bit in Leopard, the kernel itself still ran in 32-bit mode. Although Snow Leopard made a 64-bit kernel possible, only some models defaulted to it, while other models required it to be enabled manually. Snow Leopard was the first release that did not support PowerPC computers, although PowerPC applications could still be run with Rosetta. Support for Rosetta was removed in Lion, along with support for the 32-bit kernel. While user space can run 64-bit and 32-bit applications side by side, a kernel running in 64-bit mode is incompatible with 32-bit drivers and extensions. A 64-bit kernel provides many advantages; in particular, its larger address space means large amounts of memory can be supported.

iOS

iOS, or iPhone OS 1.0 as it was initially called, was released in June 2007 (see Table 2-2 for iOS releases).
It was based on Mac OS X and shared most of its fundamental architecture with its older sibling. It featured, however, a new and innovative user interface provided by the Cocoa Touch API (sharing many traits and parts with the original Cocoa), which was specifically designed for the iPhone’s capacitive touch screen. In addition to Cocoa Touch, iOS had a number of other programming APIs, such as the Accelerate framework, which provides math and related functions optimized for the iOS hardware. The External Accessory framework allows iOS devices to communicate with third-party hardware devices via Bluetooth or the built-in 30-pin connector.

Table 2-2. iOS Releases

Version          Device                      Released
iPhone OS 1.0    iPhone, iPod Touch (1.1)    June 2007
iPhone OS 2.0    iPhone 3G                   July 2008
iPhone OS 3.0    iPhone 3GS, iPad (3.2)      June 2009
iOS 4.0          iPhone 4                    June 2010
iOS 5.0          iPhone 4S                   October 2011

At its launch, iPhone OS was not able to run native third-party applications, but it could run web applications tailored to the iPhone, which could be added to the iPhone’s home screen. An SDK for the iPhone was announced at the beginning of 2008, allowing development of third-party applications. Unlike most computer platforms, however, Apple requires all iPhone applications to be submitted and pre-approved, and thus digitally signed, before a customer can install them through the App Store. While many criticized the approach (and still do), it allowed Apple to weed out poorly written, slow, and malicious software, thereby improving the overall user experience and, ultimately, the popularity of the platform. Unofficially, it has been possible to “jailbreak” iOS and gain access to the underlying Unix and kernel environment, but this voids the warranty. Due to concerns about battery life, the iPhone was not able to properly multitask third-party applications until the release of iOS 4.0.
iOS now supports the iPhone, iPod Touch, and iPad, and also runs on the latest generation of Apple TVs, which were previously based on OS X running on Intel x86 CPUs. Apple does not support third-party applications on the Apple TV at this time.

The XNU Kernel

The XNU kernel is large and complex, and a full architectural description is beyond the scope of this book (other books fill this need). In the following sections, however, we outline some of the major components that make up XNU and briefly describe their responsibilities and mode of operation. In most cases when programming for the kernel, you will be writing extensions rather than modifying the core kernel itself (unless you happen to be an Apple engineer or a contributor to Darwin), but a basic understanding of the kernel as a whole gives a better picture of how a kernel extension fits within it. Subsequent chapters will focus on some of the more important programming frameworks that the kernel provides, such as I/O Kit.

The XNU kernel is the core of Mac OS X and iOS. XNU has a layered architecture consisting of three major components. The inner ring of the kernel is referred to as the Mach layer, derived from the Mach 3.0 kernel developed at Carnegie Mellon University. References to Mach throughout the book refer to Mach as implemented in OS X and iOS, not the original project. Mach was developed as a microkernel, a thin layer providing only fundamental services, such as processor management and scheduling, as well as IPC (inter-process communication), which is a core concept of the Mach kernel. Because of the layered architecture, there are minimal differences between the iOS and Mac OS X versions of XNU. While the Mach layer in XNU has the same responsibilities as in the original project, other operating system services, such as file systems and networking, run in the same memory space as Mach.
Apple cites performance as the key reason for this design, as switching between address spaces (context switching) is an expensive operation. Because the Mach layer is still, to some degree, an isolated component, many refer to XNU as a hybrid kernel, as opposed to a microkernel or a monolithic kernel, in which all OS services run in the same context. Figure 2-3 shows a simplified view of XNU’s architecture.

Figure 2-3. The XNU kernel architecture

The second major component of XNU is the BSD layer, which can be thought of as an outer ring around the Mach layer. The BSD layer in turn provides a programming interface to end-user applications. Its responsibilities include process management, file systems, and networking.

The last major component is the I/O Kit, which provides an object-oriented framework for device drivers.

While it would be nice if each layer had clear responsibilities, reality is somewhat more complicated: the lines between the layers are blurred, as many OS services and tasks span the borders of multiple components.

■ Tip You can download the full source code for XNU at Apple’s open source website:

Kernel Extensions (KEXTs)

The XNU kernel, like most if not all modern operating systems, supports dynamically loading code into the kernel’s address space at runtime. This allows extra functionality, such as drivers, to be loaded and unloaded while the kernel is running. A main focus of this book is the development of such kernel extensions, with a particular focus on drivers, as these are the most common reason to implement a kernel extension.

There are two principal classes of kernel extensions. The first class is I/O Kit-based kernel extensions, which are used for hardware drivers and are written in C++. The second class is generic kernel extensions, which are typically written in C (though C++ is possible here, too). These extensions can implement anything from new network protocols to file systems.
Generic kernel extensions usually interface with the BSD or Mach layers.

Mach

The Mach layer can be seen as the core of the kernel, a provider of lower-level services to higher-level components like the BSD layer and I/O Kit. It is responsible for hardware abstraction, hiding the differences between the PowerPC architecture and the Intel x86 and x86-64 architectures. This includes the details of handling traps and interrupts, as well as managing memory, including virtual memory and paging. This design allows the kernel to be adapted easily to new hardware architectures, as proven by Apple’s move to Intel x86, and later to ARM for iOS.

In addition to hardware abstraction, Mach is responsible for the scheduling of threads. It supports symmetric multiprocessing (SMP), the ability to schedule processes across multiple CPUs or CPU cores. In fact, the difficulty of implementing proper SMP support in the existing BSD Unix kernel was instrumental in the development of Mach.

Interprocess communication (IPC) is the core tenet of Mach’s design. IPC in Mach is implemented as a client/server system: a task (the client) is able to request services from another task (the server). The endpoints in this system are known as ports. A port has associated rights, which determine whether a client has access to a particular service. This IPC mechanism is used internally throughout the XNU kernel. The following sections outline the key abstractions and services provided by the Mach layer.

■ Tip Mach API documentation can be found in the osfmk/man directory of the XNU source package.

Tasks and Threads

A task is a group consisting of zero or more executable threads that share resources and a memory address space. A task needs at least one thread in order to execute. A Mach task maps one to one to a Unix (BSD layer) process. The XNU kernel is also a task (known as the kernel_task) consisting of multiple threads.
Task resources are private and cannot normally be accessed by the threads of another task. Unlike a task, a thread is an executable entity that can be scheduled and run by the CPU. A thread shares resources, such as open files or network sockets, with other threads in the same task, and threads of the same task can execute on different CPUs concurrently. A thread has its own state, which includes a copy of the processor state (registers and instruction counter) and its own stack. The state of a thread is restored when it is scheduled to run on a CPU.

Mach supports preemptive multitasking, which means that a thread’s execution can be interrupted before its allocated time slice (10 ms in XNU) is up. Preemption happens under a variety of circumstances, such as when a high-priority OS event occurs, when a higher priority thread needs to run, or when waiting for long I/O operations to complete. A thread can also voluntarily preempt itself by going to sleep. A Mach thread is scheduled independently of other threads, regardless of the task to which it belongs. The scheduler is also unaware of the parent-child process relationships traditional in Unix systems (the BSD layer, however, is aware of them).

Scheduling

The scheduler is responsible for coordinating the access of threads to the CPU. Most modern kernels, including XNU, use a timesharing scheduler, where each thread is allocated a finite time quantum (10 ms in XNU, as we’ve seen) in which it is allowed to execute. When the thread’s quantum expires, it is preempted so that other threads can run. While it may seem reasonable and fair for each thread to run for an equal amount of time, this is impractical, as some threads have a greater need for low latency, for example to perform audio and video playback. The XNU scheduler therefore employs a priority-based algorithm to schedule threads. Table 2-3 shows the priority levels used by the scheduler.

Table 2-3. Scheduler Priority Levels

Priority Level   Range    Description
Normal           0–51     Normal applications. The default priority for a regular application thread is 31. Zero is the idle priority.
High Priority    52–79    High priority threads.
Kernel Mode      80–95    Reserved for high priority kernel threads, for example those used by a device driver.
Real-time        96–127   Real-time threads (user space threads can run in real time).

The kernel organizes threads in doubly-linked lists. This collection of lists is known as the run queue, with one list per priority level (currently 0–127). Each processor (core) in the system maintains its own run queue structure (osfmk/kern/sched.h):

    struct run_queue {
        int           highq;           /* highest runnable queue */
        int           bitmap[NRQBM];   /* run queue bitmap array */
        int           count;           /* # of threads total */
        int           urgency;         /* level of preemption urgency */
        queue_head_t  queues[NRQS];    /* one for each priority */
    };

A regular application thread starts with a priority of 31. Its priority may decrease over time as a side effect of the scheduling algorithm, for example if the thread is highly compute intensive. Lowering the priority of such threads improves the scheduling latency of I/O-bound threads, which spend most of their time sleeping between I/O requests. Because an I/O-bound thread usually goes back to sleep before its quantum expires, the compute-intensive threads quickly regain access to the CPU. The end result is improved system responsiveness. To avoid a situation where a thread’s priority becomes too low for it to run, the Mach scheduler decays a thread’s processor usage accounting over time, eventually resetting it, so a thread’s priority fluctuates over time. The Mach scheduler also supports real-time threads; it does not guarantee latency, but every effort is made to ensure that such a thread runs for the required number of clock cycles.
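To make the bitmap idea concrete, here is a small user-space sketch of a per-processor run queue: one FIFO list per priority level, a bitmap with one bit per level, and a cached highq so that picking the next thread does not require scanning all 128 lists. It is a hypothetical model, not XNU code: the toy_* types and rq_* helpers are invented for illustration, the lists are singly linked rather than doubly linked, and only the 128 levels and the roles of highq, bitmap, and count follow the structure above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NRQS  128                 /* priority levels 0..127, as in Table 2-3 */
#define NRQBM (NRQS / 32)         /* number of 32-bit bitmap words */

struct toy_thread {
    int pri;                      /* scheduling priority, 0..127 */
    struct toy_thread *next;      /* link within one priority list */
};

struct toy_run_queue {
    int highq;                    /* highest non-empty priority, -1 if empty */
    uint32_t bitmap[NRQBM];       /* bit p set => list p is non-empty */
    int count;                    /* total number of queued threads */
    struct toy_thread *head[NRQS];
    struct toy_thread *tail[NRQS];
};

static void rq_init(struct toy_run_queue *rq)
{
    rq->highq = -1;
    rq->count = 0;
    for (int i = 0; i < NRQBM; i++)
        rq->bitmap[i] = 0;
    for (int p = 0; p < NRQS; p++)
        rq->head[p] = rq->tail[p] = NULL;
}

/* Scan the bitmap from the top word down and return the highest set bit. */
static int rq_find_highest(const struct toy_run_queue *rq)
{
    for (int w = NRQBM - 1; w >= 0; w--) {
        uint32_t word = rq->bitmap[w];
        if (word != 0) {
            int bit = 31;
            while (!(word & (UINT32_C(1) << bit)))
                bit--;
            return w * 32 + bit;
        }
    }
    return -1;
}

/* Append a thread to the FIFO list for its priority and mark that level busy. */
static void rq_enqueue(struct toy_run_queue *rq, struct toy_thread *t)
{
    assert(t->pri >= 0 && t->pri < NRQS);
    t->next = NULL;
    if (rq->tail[t->pri] != NULL)
        rq->tail[t->pri]->next = t;
    else
        rq->head[t->pri] = t;
    rq->tail[t->pri] = t;
    rq->bitmap[t->pri / 32] |= UINT32_C(1) << (t->pri % 32);
    rq->count++;
    if (t->pri > rq->highq)
        rq->highq = t->pri;
}

/* Remove and return the first thread at the highest priority, or NULL if empty. */
static struct toy_thread *rq_dequeue(struct toy_run_queue *rq)
{
    if (rq->highq < 0)
        return NULL;
    int p = rq->highq;
    struct toy_thread *t = rq->head[p];
    rq->head[p] = t->next;
    if (rq->head[p] == NULL) {
        /* Level p is now empty: clear its bit and recompute highq. */
        rq->tail[p] = NULL;
        rq->bitmap[p / 32] &= ~(UINT32_C(1) << (p % 32));
        rq->highq = rq_find_highest(rq);
    }
    rq->count--;
    t->next = NULL;
    return t;
}
```

Updating highq directly on enqueue and rescanning the bitmap only when a level empties keeps the common path cheap; in this model the rescan touches at most four bitmap words.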
A real-time thread may be downgraded to normal priority if it does not block or sleep frequently enough, for example if it is highly compute bound.

Mach IPC: Ports and Messages

A port is a unidirectional communications endpoint, which represents a resource referred to as an object. If you are familiar with TCP/IP networking, many parallels can be drawn between Mach’s IPC and the UDP protocol, though unlike UDP, Mach IPC is used for more than just data transfer: it can also provide synchronization or send notifications between tasks. An IPC client can send messages to a port, and the owner of the port receives them. For bidirectional communication, two ports are needed.

A port is implemented as a message queue (though other mechanisms exist). Messages for the port are queued until a thread is available to service them. A port can receive messages from multiple senders, but there can be only one receiver per port.

Ports have protection mechanisms known as port rights. A task must have the proper rights in order to interact with a port. Port rights are associated with a task; therefore, all threads in a task share the same privileges to a port. Examples of port rights are send, send once, and receive. Rights can be copied or moved between tasks. Unlike Unix permissions, port rights are not inherited from parent to child processes (Mach tasks do not have this concept). Table 2-4 shows the available port right types.

Table 2-4. Port Right Types (from mach/port.h)

Port Right Type              Description
MACH_PORT_RIGHT_SEND         The holder of the right has permission to send messages to a port.
MACH_PORT_RIGHT_RECEIVE      The holder has the right to receive messages from a port. Receive rights provide automatic send rights.
MACH_PORT_RIGHT_SEND_ONCE    Same as send rights, but only valid for one message.
MACH_PORT_RIGHT_PORT_SET     Receive (and send) rights to a group of ports.
MACH_PORT_RIGHT_DEAD_NAME    Denotes rights that have become invalid or been destroyed, such as after messaging a port with send-once rights.

A group of ports is collectively known as a port set, and the message queue is shared between all ports in the set. Ports are addressed by 32-bit integer numbers; there is no global registry or namespace for ports.

The Mach IPC system is also available to user space programs and can be used to pass messages between tasks or from a task to the kernel. It offers an alternative to system calls, though the mechanism uses system calls under the hood.

Mach Exceptions

Exceptions are interrupts sent by a CPU when certain (exceptional) events or conditions occur during the execution of a thread. An exception interrupts the thread’s execution while the OS (Mach) processes the exception; the task may resume afterwards, depending on the type of exception that occurred. Common causes of exceptions include access to invalid or non-existent memory, execution of an invalid processor instruction, passing invalid arguments, or division by zero. These exceptions usually result in the termination of the offending task, but there are also a number of non-erroneous exceptions.

A system call is one such exception. A user space application may issue a system call exception when it needs to perform a low-level operation involving the kernel, such as reading from a file or receiving data on a network socket. When the OS handles the system call, it inspects a register for the system call number, which it then uses to look up the handler for that call, for example read() or recv().

A task may also generate an exception when it attempts to access paged-out memory. In this case, a page fault exception is generated, which is handled either by retrieving the missing page from the backing store or by treating the access as an invalid memory access.
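The system call lookup described above, where a call number taken from a register selects a handler, amounts to indexing a table of function pointers. The sketch below is a hypothetical user-space model: toy_regs stands in for the saved register state, and the table, its call numbers, its handlers, and the -1 error value are invented for illustration. It does not reflect XNU’s actual system call tables or calling conventions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical saved user register state: one register carries the call number. */
struct toy_regs {
    long callno;   /* system call number chosen by the user program */
    long arg0;     /* first argument */
    long arg1;     /* second argument */
};

typedef long (*toy_handler)(long a0, long a1);

static long toy_add(long a0, long a1) { return a0 + a1; }
static long toy_getval(long a0, long a1) { (void)a0; (void)a1; return 42; }

/* Handler table indexed by call number; slot 0 is deliberately unused. */
static const toy_handler toy_sysent[] = { NULL, toy_add, toy_getval };
#define TOY_NSYSENT (sizeof toy_sysent / sizeof toy_sysent[0])

/* Validate the call number, look up its handler, and run it; -1 means "no such call". */
static long toy_syscall(const struct toy_regs *regs)
{
    assert(regs != NULL);
    if (regs->callno < 0 || (size_t)regs->callno >= TOY_NSYSENT ||
        toy_sysent[regs->callno] == NULL)
        return -1;
    return toy_sysent[regs->callno](regs->arg0, regs->arg1);
}
```

The bounds and NULL checks matter because the call number comes from an untrusted user program; a kernel must never index its table with an unchecked value.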
A task may also issue deliberate exceptions with the EXC_BREAKPOINT exception, which are typically used by debugging or tracing applications, such as Xcode, to temporarily halt the execution of a thread.

It is possible, of course, for the kernel itself to misbehave and cause exceptions. In this case, the OS is halted and the grey screen of death is shown (unless the kernel debugger is activated), informing the user to reboot the computer. Table 2-5 shows a subset of the defined Mach exceptions.

Table 2-5. Common Mach Exception Types

Exception Type                    Description
EXC_BAD_ACCESS                    Invalid memory access.
EXC_BAD_INSTRUCTION               The thread attempted to execute an illegal/invalid instruction or gave an invalid parameter (operand) to the instruction.
EXC_ARITHMETIC                    Issued on division by zero or integer overflow/underflow.
EXC_SYSCALL, EXC_MACH_SYSCALL     Issued by an application to access kernel services such as file I/O or network access.
…

Other Mach exceptions are defined in mach/exception_types.h. Processor-dependent exceptions are defined in mach/(i386, ppc, …)/exception.h.

When an exception occurs, the kernel suspends the thread that caused it and sends an IPC message to the thread’s exception port. If the thread does not handle the exception, it is forwarded to the containing task’s exception port, and finally to the system’s (host) exception port. The following structure encapsulates a thread’s, task’s, or processor’s (host) exception ports:

    struct exception_action {
        struct ipc_port        *port;       /* exception port */
        thread_state_flavor_t  flavor;      /* state flavor to send */
        exception_behavior_t   behavior;    /* exception type to raise */
        boolean_t              privileged;  /* survives ipc_task_reset */
    };

Each thread, task, and host has an array of exception_action structures that specifies exception behavior, with one structure defined for each exception type (as defined in Table 2-5).
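The delivery order described above, thread first, then task, then host, can be modelled as a lookup through three arrays of exception actions, each indexed by exception type. This is a hypothetical sketch: the toy_* names, the integer stand-ins for ports, and the exception indices are invented, and the model only checks whether a port is registered at each level. Real delivery sends a Mach IPC message and also moves on when a handler does not handle the exception, which this sketch leaves out.

```c
#include <assert.h>

/* Hypothetical exception indices (the real constants live in mach/exception_types.h). */
enum toy_exc { TOY_EXC_BAD_ACCESS, TOY_EXC_BAD_INSTRUCTION, TOY_EXC_ARITHMETIC, TOY_EXC_MAX };

#define TOY_PORT_NULL 0       /* stands in for "no port registered" */

struct toy_exception_action {
    int port;                 /* toy port identifier, or TOY_PORT_NULL */
};

/* One array of actions, indexed by exception type, per thread, task, and host. */
struct toy_exc_ports {
    struct toy_exception_action actions[TOY_EXC_MAX];
};

enum toy_level { TOY_LEVEL_THREAD, TOY_LEVEL_TASK, TOY_LEVEL_HOST, TOY_LEVEL_NONE };

/* Return the first level (thread, then task, then host) with a port for exc. */
static enum toy_level toy_route_exception(const struct toy_exc_ports *thread,
                                          const struct toy_exc_ports *task,
                                          const struct toy_exc_ports *host,
                                          enum toy_exc exc, int *port_out)
{
    const struct toy_exc_ports *chain[] = { thread, task, host };
    assert(exc < TOY_EXC_MAX);
    for (int level = TOY_LEVEL_THREAD; level <= TOY_LEVEL_HOST; level++) {
        int port = chain[level]->actions[exc].port;
        if (port != TOY_PORT_NULL) {
            *port_out = port;
            return (enum toy_level)level;
        }
    }
    *port_out = TOY_PORT_NULL;
    return TOY_LEVEL_NONE;
}
```

Because each level holds a separate entry per exception type, a debugger could, for example, register a thread-level port for one exception type while all other types still fall through to the task or host.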
The flavor and behavior fields specify the type of information that should be sent with the exception message, such as the state of the general purpose or other specialized CPU registers, and the handler that should be executed. The handler will be catch_mach_exception_raise(), catch_mach_exception_raise_state(), or catch_mach_exception_raise_state_identity(). When an exception has been dispatched, the kernel waits for a reply in order to determine the course of action. A return of KERN_SUCCESS means the exception was handled, and the thread will be allowed to resume.

A thread’s exception port defaults to PORT_NULL; unless a port is explicitly allocated, exceptions are handled by the task’s exception port instead. When a process issues the fork() system call to spawn a child process, the child inherits exception ports from the parent task. The Unix signaling mechanism is implemented on top of Mach’s exception system.

Time Management

Proper timekeeping is a vital responsibility of any OS, not only to serve user applications, but also to serve other important kernel functions such as scheduling processes. In Mach, the abstraction for time management is known as a clock. A clock object in Mach represents time in nanoseconds as a monotonically increasing value. There are three main clocks defined: the real-time clock, the calendar clock, and the high-resolution clock. The real-time clock keeps the time since the last boot, while the calendar clock is typically battery backed, so its value persists across system reboots and through periods when the computer is powered off. It has a resolution of seconds and, as the name implies, it is used to keep track of the current time.
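The difference between a clock that counts from an arbitrary starting point, such as boot, and a calendar clock has a close user-space analogue in POSIX. The sketch below assumes a POSIX system and uses clock_gettime() with CLOCK_MONOTONIC and CLOCK_REALTIME; these are not the Mach kernel interfaces, and the helpers (split_ns, read_clock_ns, uptime_like_ns, calendar_like_ns) are hypothetical. split_ns shows how a single nanosecond count breaks down into the seconds/nanoseconds pair that Mach’s nanotime functions report.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NS_PER_SEC UINT64_C(1000000000)

/* Split a nanosecond count into whole seconds and leftover nanoseconds. */
static void split_ns(uint64_t ns, uint64_t *secs, uint32_t *nsecs)
{
    *secs = ns / NS_PER_SEC;
    *nsecs = (uint32_t)(ns % NS_PER_SEC);
}

/* Read a POSIX clock and return it as a single nanosecond count. */
static uint64_t read_clock_ns(clockid_t id)
{
    struct timespec ts;
    int rc = clock_gettime(id, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * NS_PER_SEC + (uint64_t)ts.tv_nsec;
}

/* Monotonic clock: never goes backwards; suited to measuring intervals,
   like the real-time clock described above, not to telling the date. */
static uint64_t uptime_like_ns(void) { return read_clock_ns(CLOCK_MONOTONIC); }

/* Wall-clock time: tracks the current date and can be adjusted,
   like the calendar clock described above. */
static uint64_t calendar_like_ns(void) { return read_clock_ns(CLOCK_REALTIME); }
```

Interval measurements should use the monotonic reading, since the calendar time can jump when the clock is adjusted.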
The Mach time KPI consists of three functions:

    void clock_get_uptime(uint64_t *result);
    void clock_get_system_nanotime(uint32_t *secs, uint32_t *nanosecs);
    void clock_get_calendar_nanotime(uint32_t *secs, uint32_t *nanosecs);

The calendar clock is typically only used by applications, as the kernel itself rarely needs to concern itself with the current time or date; doing so is, in fact, considered poor design. The kernel uses the relative time provided by the real-time clock. The time from the real-time clock typically comes from a circuit on the computer’s motherboard that contains an oscillating crystal. The real-time clock circuit (RTC) is programmable and wired to the interrupt pins of the CPUs (every CPU/core). XNU programs the RTC with a 100 Hz deadline, using clock_set_timer_deadline().

Memory Management

The Mach layer is responsible for coordinating the use of physical memory in a machine-independent manner, providing a consistent interface to higher-level components. The virtual memory subsystem of Mach, the Mach VM, provides protected memory, as well as facilities that let applications and the kernel itself allocate, share, and map memory. A solid understanding of memory management is essential to a successful kernel programmer.

Task Address Space

Each Mach task has its own virtual memory (VM) address space. For a 32-bit task, the address space is 4 GB, while for a 64-bit task it is substantially larger, with 51 bits (approximately 2 petabytes) of usable address space. Specialized applications, such as video editing or effects software, often exceed the 32-bit address space. Support for a 64-bit virtual address space became available in OS X 10.4.

■ Note While 32-bit applications are limited to a 4 GB address space, this does not correlate with the amount of physical memory that can be used in a system.
Technologies such as Physical Address Extension (PAE) are supported by OS X. They allow 32-bit x86 processors (or 64-bit processors running in 32-bit mode) to address up to 36 bits (64 GB) of physical memory; however, a task's address space remains limited to 4 GB.

A task's address space is fundamental to the concept of protected memory. A task is not allowed to access the address space of another task, and thus the underlying physical memory containing that task's data. The exception is when this is explicitly allowed through shared memory or other mechanisms.

KERNEL ADDRESS SPACE MANAGEMENT

The kernel itself has its own task, the kernel_task, which has its own separate address space. Let's assume a 32-bit OS such as iOS. Some Unix-based operating systems, including Linux, use a design in which the kernel's address space is mapped into each task's address space. The kernel then has 1 GB of address space available, while a task has 3 GB. When a task switches into kernel mode, the MMU (memory management unit) can avoid reconfiguring the translation lookaside buffer (TLB) for a new address space, because the kernel is already at a known location. This speeds up the otherwise expensive context switch. The drawback, of course, is the limited amount of address space available for the kernel, and that only 3 GB is available for the task. In XNU, the kernel runs in its own virtual address space, which is not shared with user tasks. This leaves 4 GB for the kernel and 4 GB for the user task.

VM Maps and Entries

The virtual memory (VM) map is the actual representation of a task's address space. Each task has its own VM map, represented by the structure vm_map. Threads have no map of their own; they share the VM map of the task that owns them. A VM map is a doubly-linked list of the memory regions mapped into the process address space.
Each region is a virtually contiguous range of memory addresses, not necessarily backed by contiguous physical memory. A region is described by a start and end address, plus other metadata such as protection flags, which can be any combination of read, write, and execute. The regions are represented by the vm_map_entry structure. A VM map entry may be merged with an adjacent entry when more memory is allocated immediately before or after it. An entry may also be split into smaller regions. Splitting occurs when the protection flags are modified for only part of the address range described by an entry, because protection flags can only be set on whole VM map entries. Figure 2-4 shows a VM map with two VM map entries.

Figure 2-4. Relationship between VM subsystem structures

■ Tip The relevant structures pertaining to task address spaces are defined in mach/vm_map.h and mach/vm_region.h in the XNU source package.

The Physical Map

Each VM map has an associated physical map, or pmap structure. This structure holds information on the virtual-to-physical memory mappings used by the task. The portion of the Mach VM that deals with physical mappings is machine dependent. It interacts with the memory management unit (MMU), a specialized hardware component of the system that takes care of address translation.

VM Objects

A VM map entry can point to either a VM object or a VM submap. A submap is a container for other VM map mappings and is used to share memory between address spaces. The VM object represents the location of the described memory, or rather how that memory is accessed. Memory pages underlying the object may not be present in physical memory. They could instead be located on an external backing store (a hard drive on OS X). In this case, the VM object will have information on how to page in the external pages. Transfer to or from a backing store is handled by the pager, discussed later in this chapter. A VM object describes memory in units of pages.
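The rule that protection can only be set on whole VM map entries can be modelled in a few lines of portable C. The sketch below is a deliberately simplified, hypothetical model. The names toy_entry, toy_split(), and toy_protect() are not XNU names, and real entries also reference VM objects, offsets, and inheritance settings. In the model, entries form an address-ordered doubly-linked list. A protection change is first widened to whole pages (4 KB pages are assumed). Any entry that the range only partly covers is then split, so the new protection always lands on whole entries.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 4 KB pages, the XNU page size quoted in this chapter. */
#define TOY_PAGE_SIZE 4096u
#define TOY_TRUNC_PAGE(a) ((a) & ~(uint64_t)(TOY_PAGE_SIZE - 1))
#define TOY_ROUND_PAGE(a) TOY_TRUNC_PAGE((a) + TOY_PAGE_SIZE - 1)

/* Protection bits: any combination of read, write, and execute. */
enum { TOY_PROT_R = 1, TOY_PROT_W = 2, TOY_PROT_X = 4 };

/* Hypothetical stand-in for vm_map_entry: one virtually contiguous
   region [start, end) with a single protection value. */
struct toy_entry {
    uint64_t start, end;
    int prot;
    struct toy_entry *prev, *next;
};

/* Split e at addr: e keeps [start, addr) and a new entry takes
   [addr, end) with the same protection. Returns the new entry, or
   NULL if addr is not strictly inside e or allocation fails. */
static struct toy_entry *toy_split(struct toy_entry *e, uint64_t addr)
{
    if (addr <= e->start || addr >= e->end)
        return NULL;
    struct toy_entry *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->start = addr;
    n->end = e->end;
    n->prot = e->prot;
    n->prev = e;
    n->next = e->next;
    if (e->next != NULL)
        e->next->prev = n;
    e->next = n;
    e->end = addr;
    return n;
}

/* Apply prot to [start, end), widened to whole pages. Entries only
   partly inside the range are split first, so protection is always
   set on whole entries. Returns 0, or -1 if a split fails. */
int toy_protect(struct toy_entry *head, uint64_t start, uint64_t end, int prot)
{
    start = TOY_TRUNC_PAGE(start);
    end = TOY_ROUND_PAGE(end);
    for (struct toy_entry *e = head; e != NULL; e = e->next) {
        if (e->end <= start || e->start >= end)
            continue;                  /* entirely outside the range */
        if (e->start < start) {
            /* Straddles the lower bound: split, then the loop visits
               the new upper entry on its next iteration. */
            if (toy_split(e, start) == NULL)
                return -1;
            continue;
        }
        if (e->end > end && toy_split(e, end) == NULL)
            return -1;                 /* straddles the upper bound */
        e->prot = prot;
    }
    return 0;
}

/* Number of entries in the list starting at head. */
size_t toy_count(const struct toy_entry *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

Take a single read/write entry spanning 0x1000 to 0x5000, and make the range 0x2010 to 0x2fff read-only. The range is widened to 0x2000 to 0x3000, leaving three entries: [0x1000, 0x2000) read/write, [0x2000, 0x3000) read-only, and [0x3000, 0x5000) read/write. The model leaves out the reverse operation of merging adjacent entries with identical attributes.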
A page in XNU is currently 4096 bytes. A virtual page is described by the vm_page structure. A VM object may contain many pages, but a page is only ever associated with one VM object.

PAGES

A page is the smallest unit of the virtual memory system. On Mac OS X and iOS, as on many other operating systems, the size of a page is 4096 bytes (4 KB). The page size is determined by the processor, or rather its memory management unit (MMU). The MMU is responsible for virtual-to-physical mappings and manages the VM page table cache, also called a TLB. On many architectures the page size can be set by the operating system. On x86, for example, pages can be as large as 4 MB, and more than one page size can even be used at once. The operating system maintains a data structure called the page table, which contains one struct vm_page for each page-sized block of physical memory. The structure contains metadata, such as whether the page is in use.

When memory needs to be shared between tasks, a VM map entry points into the foreign address space via a submap rather than a VM object. This commonly happens when a shared library is used: the shared library gets mapped into the task's address space.

Let's consider another example. When a Unix process issues the fork() system call to create a child process, the new process is created as a copy of the parent. To avoid copying the parent's memory to the child, an optimization known as copy-on-write (COW) is employed. Reads of the child's memory simply reference the same pages as the parent. If the child process modifies its memory, the affected page is copied, and a shadow VM object is created. On the next read of that memory region, the kernel checks whether the shadow object has a copy of the page; if not, the original shared page is referenced.
The previously described behavior applies only when the inheritance property of the parent's original VM map entry is set to copy. Another possible value is shared, in which case the child continues to read and write the original memory location directly. If the setting is none, the memory pages referenced by the map entry are not mapped into the child's address space. The fourth possible value is copy and delete, where the memory is copied to the child and deleted from the parent.

■ Note Copy-on-write is also used by Mach IPC to optimize the transfer of data between tasks.

Examining a Task's Address Space

The vmmap command-line utility lets you inspect a process's virtual memory map and its VM map entries. It clearly illustrates how memory regions are mapped into a task's VM address space. The vmmap command takes a process identifier (PID) as an argument. The following shows the output of vmmap executed with the PID of a simple Hello World C application (a.out), which prints a message and then goes to sleep:

    ==== Non-writable regions for process 46874
    __PAGEZERO         00000000-00001000     4K ---/--- SM=NUL  /Users/ole/a.out
    __TEXT             00001000-00002000     4K r-x/rwx SM=COW  /Users/ole/a.out
    __LINKEDIT         00003000-00004000     4K r--/rwx SM=COW  /Users/ole/a.out
    MALLOC guard page  00004000-00005000     4K ---/rwx SM=NUL
    MALLOC metadata    00021000-00022000     4K r--/rwx SM=PRV
    __TEXT             8fe00000-8fe42000   264K r-x/rwx SM=COW  /usr/lib/dyld
    __LINKEDIT         8fe70000-8fe84000    80K r--/rwx SM=COW  /usr/lib/dyld
    __TEXT             9703b000-971e3000  1696K r-x/r-x SM=COW  /usr/lib/libSystem.B.dylib
    STACK GUARD        bc000000-bf800000  56.0M ---/rwx SM=NUL  stack guard for thread 0

    ==== Writable regions for process 46874
    __DATA             00002000-00003000     4K rw-/rwx SM=PRV  /Users/ole/a.out
    MALLOC metadata    00015000-00020000    44K rw-/rwx SM=PRV
    MALLOC_TINY        00100000-00200000  1024K rw-/rwx SM=PRV  DefaultMallocZone_0x5000
    MALLOC_SMALL       00800000-01000000  8192K rw-/rwx SM=PRV  DefaultMallocZone_0x5000
    __DATA             8fe42000-8fe6f000   180K rw-/rwx SM=PRV  /usr/lib/dyld
    __IMPORT           8fe6f000-8fe70000     4K rwx/rwx SM=COW  /usr/lib/dyld
    shared pmap        a0800000-a093a000  1256K rw-/rwx SM=COW
    __DATA             a093a000-a0952000    96K rw-/rwx SM=COW  /usr/lib/libSystem.B.dylib
    shared pmap        a0952000-a0a00000   696K rw-/rwx SM=COW
    Stack              bf800000-bffff000  8188K rw-/rwx SM=ZER  thread 0
    Stack              bffff000-c0000000     4K rw-/rwx SM=COW  thread 0

The result has been trimmed for readability. The output is divided into non-writable regions and writable regions. The former includes the page zero mapping (__PAGEZERO), which has no access permissions at all. Reading or writing memory addresses 0 through 4095 therefore generates an exception (the mapping ends at 4096 decimal = 0x1000 hex). This is why your application crashes if you try to dereference a null pointer.

The next map entry is the text segment of the application, which contains its executable code. The text segment has a share mode (SM) of COW. This means that if this process spawns a child, the child will inherit this mapping from the parent, and no copy is made until pages in that segment are modified. In addition to the text segment of the a.out program itself, you will also see a mapping for libSystem.B.dylib. On Mac OS X and iOS, libSystem implements the standard C library and the POSIX thread API, as well as other system APIs. The a.out process inherited the mapping for libSystem from its parent process /sbin/launchd, the parent of all user space processes. This ensures the library is only loaded once, which saves memory and improves application launch speed, because fetching a library from secondary storage, such as a hard drive, is usually slow. In the writable regions you can see the data segments of a.out and libSystem. These segments contain variables defined by the program or library.
Obviously, these can be modified, so each process needs its own copy of a shared library's data segment. Because the mapping is COW, however, no copy is made until a process actually modifies it.

■ Tip If you want to inspect the virtual memory map of a system process, such as launchd, you need to run vmmap with sudo. By default, your user can only inspect its own processes.

Pagers

Virtual memory allows a process to have a virtual address space larger than the available physical memory. It also allows the tasks running on the system to consume, in combination, more memory than is physically available. The mechanism that makes this possible is known as a pager. The pager controls the transfer of memory pages between system memory (RAM) and a secondary backing store, usually a hard drive. When a task with high memory requirements needs to run, the pager can temporarily transfer (page out) memory pages belonging to inactive tasks to the backing store. This frees up enough memory for the demanding task to execute. Similarly, if a process is found to be largely idle, the system can page out the task's memory to free memory for current or future tasks.

When an application tries to access memory that has been paged out, an exception known as a page fault occurs. This is also the exception that occurs when a task tries to access an invalid memory address. When the page fault occurs, the kernel attempts to transfer back (page in) the page corresponding to the memory address. If the page cannot be transferred back, the access is treated as invalid, and the task is aborted. The XNU kernel supports three different pagers:

• Default Pager: Performs traditional paging and transfers between the main memory and a swap file on the system hard drive (/var/vm/swapfile).
• Vnode Pager: Ties in with the Unified Buffer Cache (UBC) used by file systems and is used to cache files in memory.
• Device Pager: Used for managing memory mappings of hardware devices, such as PCI devices that map registers into memory. Mapped memory is commonly used by I/O Kit drivers, and I/O Kit provides abstractions for working with such memory.

Which pager is in use is more or less transparent to higher-level components, such as the VM object. Each VM object has an associated memory object, which provides (via ports) an interface to the current pager.

Memory Allocation in Mach

Some fundamental routines for memory allocation in Mach are:

    kern_return_t kmem_alloc(vm_map_t map, vm_offset_t *addrp, vm_size_t size);
    kern_return_t kmem_alloc_contig(vm_map_t map, vm_offset_t *addrp, vm_size_t size, vm_offset_t mask, int flags);
    void kmem_free(vm_map_t map, vm_offset_t addr, vm_size_t size);

kmem_alloc() provides the main interface for obtaining memory in Mach. To allocate memory, you must provide a VM map. For most work within the kernel, kernel_map is defined and points to the VM map of kernel_task. The second variant, kmem_alloc_contig(), attempts to allocate physically contiguous memory; kmem_alloc() allocates memory that is only virtually contiguous. Apple recommends against making physically contiguous allocations, because searching for free contiguous blocks incurs a significant penalty. Mach also provides a kmem_alloc_aligned() function, which allocates memory aligned to a power of two, as well as a few other, less commonly used variants. The kmem_free() function frees allocated memory. You must take care to pass the same VM map you used for the allocation, as well as the size of the original allocation.

The BSD Layer

Mach provides only a few fundamental services. The BSD layer sits between Mach and user applications and implements many core OS functions, building on the services provided by Mach.
In OS X and iOS, the BSD layer runs with the processor in privileged mode, not as a user task as the Mach project originally intended. The layer therefore has no memory protection and runs in the same address space as Mach and I/O Kit. The BSD layer is a portion of the kernel derived from the FreeBSD 5 operating system. It is not a complete system in itself, but rather a portion of code originating from FreeBSD.

The BSD layer provides services such as process management, system calls, file systems, and networking. Table 2-6 shows a brief overview of the services provided by the BSD layer.

Table 2-6. BSD Layer Services Overview

Process and User Management: Support for user (uid), group (gid), and process (pid) IDs, as well as process creation (fork) and the Unix security model. POSIX threads and synchronization. Shared library support, signal handling.
File Management: Files, pipes, sockets, and POSIX IPC. The VFS, as well as the HFS, HFS+, ISO, and NFS file systems. Asynchronous I/O.
Security: Security auditing and cryptographic algorithms, such as AES, Blowfish, DES, MD5, and SHA-1.
Memory Management: The vnode file-based pager. Facilities for memory allocation. Unified Buffer Cache (UBC).
Drivers: Various drivers, including the console and other character device drivers such as /dev/null, /dev/zero, /dev/random, and a RAM disk driver (/dev/md).
Networking: TCP/IP over IPv4 and IPv6, DHCP, ICMP, ARP, Ethernet, routing and firewall, packet filters (BPF), and BSD sockets. Low-level network drivers are found in I/O Kit.
System Calls: An API that gives user space applications access to basic, low-level kernel services such as file and process management.

The BSD layer provides abstractions on top of the services provided by Mach. For example, its process management and memory management are implemented on top of Mach services.
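Table 2-6 lists POSIX threads and synchronization among the BSD layer's services. The user-space sketch below exercises them: several threads increment a shared counter, and a mutex prevents increments from being lost. It uses only standard pthread calls. run_workers(), worker(), and the constants are illustrative names, not part of any Apple API. This fits the layering described above, because in XNU each POSIX thread is ultimately backed by a Mach thread.

```c
#include <pthread.h>

#define WORKER_ITERATIONS 100000L  /* increments performed per thread */
#define MAX_WORKERS 16

/* A counter shared by several threads and protected by a mutex. */
struct shared_counter {
    pthread_mutex_t lock;
    long value;
};

/* Thread body: increment the shared counter under the mutex. */
static void *worker(void *arg)
{
    struct shared_counter *c = arg;
    for (long i = 0; i < WORKER_ITERATIONS; i++) {
        pthread_mutex_lock(&c->lock);
        c->value++;
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Start nthreads workers (1..MAX_WORKERS), wait for all of them, and
   return the final count. Returns -1 on bad input, or if a mutex or
   thread could not be created. */
long run_workers(int nthreads)
{
    pthread_t tids[MAX_WORKERS];
    struct shared_counter c;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return -1;
    c.value = 0;
    if (pthread_mutex_init(&c.lock, NULL) != 0)
        return -1;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, worker, &c) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);   /* never leave threads running */
    pthread_mutex_destroy(&c.lock);
    return started == nthreads ? c.value : -1;
}
```

Without the mutex, the final count would usually be lower than nthreads × WORKER_ITERATIONS. Concurrent read-modify-write increments can overwrite one another.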
System Calls

When an application needs services from the file system, or wishes to access the network, it needs to issue a system call to the kernel. The BSD layer implements all system calls. When a system call is made, the processor switches from user mode to kernel mode so that the kernel can service the application's request, such as reading a file. This API is referred to as the syscall API, and it is the traditional Unix API for calling functions in the kernel from user space. There are hundreds of system calls available. They range from process control calls, such as fork() and execve(), to file management calls, such as open(), close(), read(), and write(). The BSD layer also provides the ioctl() function (itself a system call). Its name is short for I/O control, and it is typically used to send commands to device drivers. The sysctl() function is provided to set or get a variety of kernel parameters, including, but not limited to, those of the scheduler, memory, and networking subsystems.

■ Tip Available system calls are defined in /usr/include/sys/syscall.h.

Mach traps are mechanisms similar to system calls, used for crossing the kernel/user space boundary. System calls provide direct services to an application, whereas Mach traps are used to carry IPC messages from a user space client to a kernel server.

Networking

Networking is a major subsystem of the BSD portion of XNU. BSD handles most aspects of networking, such as the details of socket communication and the implementation of protocols like TCP/IP. The exception is low-level communication with actual hardware devices, which is typically handled by an I/O Kit driver. The I/O Kit network driver interfaces with the network stack. The network stack handles buffers received from the networking device, inspects them, and ensures they make their way up to their destination, for example your web browser.
Similarly, the BSD networking stack accepts outgoing data from an application, formats the data into packets, and then routes or dispatches them to the appropriate network interface. BSD also implements the IPFW firewall, which filters packets to and from the computer according to the policy set by the system administrator. The BSD networking layer supports a wide range of network and transport layer protocols, including IPv4 and IPv6, TCP, and UDP. At the higher level, it supports BOOTP, DHCP, and ICMP, among others. Other networking-related functions include routing, bridging, and Network Address Translation (NAT), as well as device-level packet filtering with the Berkeley Packet Filter (BPF).

NETWORK KERNEL EXTENSIONS (NKE)

The Network Kernel Extensions KPI (kernel programming interface) is a mechanism that allows various parts of the networking stack to be extended. NKEs allow new protocols to be defined, and hooks or filters to be inserted at various levels in the networking stack. For example, it would be possible to create a filter that intercepts TCP connections made to a certain address by a certain application or user. It is also possible to temporarily block network packets, or to modify them before they are passed to a higher or lower level. NKEs originate from Apple and are not part of the traditional BSD networking stack. Due to their nature, however, they are now intimately tied to it. NKEs are discussed in Chapter 13.

File Systems

The kernel has built-in support for a range of different file systems, as shown in Table 2-7. The primary file system used by Mac OS X and iOS is HFS+. It was developed as a replacement for the Mac OS file system HFS.