API changes in the 2.6 kernel series [LWN.net]
The 2.6 kernel development series differs from its predecessors in that much larger and potentially destabilizing changes are being incorporated into each release. Among these changes are modifications to the internal programming interfaces for the kernel, with the result that kernel developers must work harder to stay on top of a continually-shifting API. There has never been a guarantee of internal API stability within the kernel - even in a stable development series - but the rate of change is higher now.
This article will be updated to keep track of the internal changes for each 2.6 kernel release. Its permanent location is:
If you are looking for changes prior to 2.6.26, you'll find them on the older version of this page.
Last update: September 9, 2009.
2.6.31 (September 9, 2009)
- There is a new workqueue function:
int __cancel_delayed_work(struct delayed_work *work);
Unlike cancel_delayed_work(), it will not wait to ensure that the work function is not actually running.
- There is a new atomic function:
int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
This function will decrement cnt and, if cnt reaches zero, it will acquire the given lock.
- A number of block layer request queue API changes have been merged; all drivers must now dequeue requests before executing them. Beyond that, the merging of the storage topology patches (in preparation for 4K-sector disks) means that block drivers must now distinguish between the physical block size on the disk and the logical block size used by the kernel.
- The 32-bit x86 architecture now supports the atomic64_t type.
- The kernel memory leak detector has been merged at last. The kmemcheck memory checker, which detects the use of uninitialized memory, has also been merged.
- The fsnotify backend has been merged. This code provides a new, common implementation for dnotify and inotify; it also will serve as the base for the "fanotify" code (formerly TALPA), which has not been merged as of this writing.
- Tree read-copy update (RCU) is now the default, though Classic RCU is still available.
- Changes to the include/asm-generic header files were merged. These changes are meant to serve as a model for or be used directly by new architectures rather than copying from an existing architecture. The S+core (score) architecture depends on these changes and the MicroBlaze architecture will be using them to clean up its ABI.
- All network drivers have converted to the new net_device_ops API and the old API available with COMPAT_NET_DEV_OPS has been removed.
- The rfkill core has been rewritten for devices that implement a way to stop all radio transmission from the device (in response to a laptop key for turning off wireless, for example). Various drivers have also been updated to use the new rfkill API.
- Debugfs has had all of its references throughout the tree turned into /sys/kernel/debug/ in both documentation and code. In addition, LWN's updated guide to debugfs was added to the Documentation directory.
- Unicode handling in the kernel has been updated, with functions like utf8_mbstowcs() being renamed to utf8s_to_utf16s() for better readability.
- The TTM GPU memory manager (covered a bit over a year ago) has been merged.
- Quite a bit of Big Kernel Lock (BKL) removal code has been merged in the fs/ tree. Now, all of the super_operations and address_space_operations are called without holding the BKL.
- IRQF_SAMPLE_RANDOM, which governs whether a device's interrupts are used as an entropy source, has been added to the feature-removal-schedule.
- The memory debugging infrastructure for DRM has been removed. "It hasn't been used in ages, and having the user tell you how much memory is being freed at free time is a recipe for disaster even if it was ever used."
- David Miller is now the IDE subsystem maintainer, taking over from Bartlomiej Zolnierkiewicz, in a friendly handoff. Miller plans to put IDE into maintenance-only mode.
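The behavior of atomic_dec_and_mutex_lock(), listed above for 2.6.31, may be easier to see in a small userspace model. The sketch below is not kernel code: it uses C11 atomics and a pthread mutex as stand-ins for atomic_t and struct mutex, and the names model_dec_and_mutex_lock() and dec_unless_one() are hypothetical. It also assumes the convention that the function returns nonzero only when the counter reached zero, in which case the lock is held on return.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decrement *cnt unless its current value is 1.
 * Returns true if the decrement happened. */
static bool dec_unless_one(atomic_int *cnt)
{
    int old = atomic_load(cnt);

    while (old != 1) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return true;
    }
    return false;
}

/* Userspace model (an assumption, not the kernel implementation):
 * decrement the counter; if it reached zero, return 1 with the lock held,
 * otherwise return 0 with the lock not held. */
static int model_dec_and_mutex_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
    assert(cnt != NULL && lock != NULL);

    /* Fast path: the result cannot be zero, so no lock is needed. */
    if (dec_unless_one(cnt))
        return 0;

    /* Slow path: the counter might hit zero, so take the lock first. */
    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(cnt, 1) != 1) {
        /* Somebody raised the count in the meantime; it did not hit zero. */
        pthread_mutex_unlock(lock);
        return 0;
    }
    return 1; /* counter is zero and the lock is held */
}
```

A caller dropping a reference would typically tear the object down and then release the lock only when the function returns nonzero.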
2.6.30 (June 9, 2009)
- The threaded interrupt handlers patch has been merged, making it possible for drivers to set up an interrupt handler which runs in its own thread. Over the long term, it is hoped that drivers will move in this direction, eventually making it possible to remove facilities like tasklets.
- The adaptive spinning mutex patch has been merged. This change will cause mutexes to behave more like spinlocks in the contended case. If (and only if) the lock is held by code running on a different CPU, the mutex code will spin on the assumption that the lock will be released soon. This behavior results in significant performance improvements. Btrfs, which had its own spinning mutex implementation, has been converted to the new mutexes.
- There is a new set of functions added to the crypto API which allow for piecewise compression and decompression of data.
- The bus_id member of struct device is gone; code needing that information should use the dev_name() macro instead.
- There is a new timer function:
int mod_timer_pending(struct timer_list *timer, unsigned long expires);
It is like mod_timer() with the exception that it will not reactivate an already-expired timer.
- There have been some changes around the fasync() function in struct file_operations. This function is now responsible for maintaining the FASYNC bit in struct file; it is also now called without the big kernel lock held. Finally, a positive return value from fasync() is mapped to zero, meaning that the return value from fasync_helper() can be returned directly by fasync().
- The SCSI layer has a new support library for object storage device support; see Documentation/scsi/osd.txt for details.
- The x86 "subarchitecture" mechanism has been removed, now that no architectures actually use it. The Voyager architecture has been removed as a result of these changes.
- x86 is also the first architecture to use a new per-CPU memory allocator merged for 2.6.30. This allocator changes little at the API level, but it will provide for more efficient and flexible per-CPU variable management.
- Support for compressing the kernel with the bzip2 or lzma algorithms has been added. Support for the old zImage format has been removed.
- The asynchronous function call infrastructure is now enabled by default.
- The DMA operations debugging facility has been merged.
- The owner field of struct proc_dir_entry has been removed, causing lots of changes throughout the tree.
- There is a new memory debug tool controlled by the PAGE_POISONING configuration variable. Turning this feature on causes a pattern to be written to all freed pages and checked at allocation time. The result is "a large slowdown," but also the potential to catch a number of use-after-free errors.
- The new function:
int pci_enable_msi_block(struct pci_dev *dev, int count);
allows a driver to enable a block of MSI interrupts.
- As part of the FS-Cache work, the "slow work" thread pool mechanism has been merged. Some have expressed the hope that it would become the One True Kernel Thread Pool, but there seems to be little progress in that direction. See this article and Documentation/slow-work.txt for more information.
- There is a pair of new printing functions:
int vbin_printf(u32 *bin_buf, size_t size, const char *fmt, va_list args);
int bstr_printf(char *buf, size_t size, const char *fmt,
const u32 *bin_buf);
The difference here is that vbin_printf() places the binary value of its arguments into bin_buf. The process can be reversed with bstr_printf(), which formats a string from the given binary buffer. The main use for these functions would appear to be with Ftrace; they allow the encoding of values to be deferred until a given trace string is read by user space.
- Also added is printk_once(), which only prints its message the first time it is executed.
- The "kmemtrace" tracing facility has been merged. Kmemtrace provides data on how the core slab allocations function. See Documentation/vm/kmemtrace.txt for details.
- A number of ftrace changes have been merged. There is a workqueue tracer which tracks the operations of workqueue threads. The blktrace block subsystem tracer can now be used via ftrace. The new "event" tracer allows a user to turn on specific tracepoints within the kernel; tracepoints have been added for various scheduler and interrupt events. "Raw" events (with binary-formatted data) are available now. The new "syscall" tracer is for tracing system calls.
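The page poisoning idea described above for PAGE_POISONING (write a pattern to freed pages, check it at allocation time) can be illustrated with a toy userspace model. Everything here is an assumption made for illustration: the model_ names, the 64-byte "page", and the 0xaa pattern are not taken from the article, and the real feature works on the kernel's page allocator rather than on a plain structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 64    /* toy page size, assumed for the example */
#define MODEL_POISON    0xaa  /* illustrative poison byte */

struct model_page {
    unsigned char data[MODEL_PAGE_SIZE];
};

/* "Free" a page: fill it with the poison pattern. */
static void model_free_page(struct model_page *page)
{
    assert(page != NULL);
    memset(page->data, MODEL_POISON, sizeof(page->data));
}

/* Check a page at "allocation" time: any byte that no longer holds the
 * pattern means something wrote to the page after it was freed. */
static bool model_page_is_intact(const struct model_page *page)
{
    assert(page != NULL);
    for (size_t i = 0; i < sizeof(page->data); i++) {
        if (page->data[i] != MODEL_POISON)
            return false;
    }
    return true;
}
```

The cost is visible even in the model: every free writes the whole page and every allocation reads it back, which is consistent with the "large slowdown" mentioned above.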
2.6.29 (March 23, 2009)
- The massive task credentials patch set has been merged. This code reorganizes the handling of process credentials (user ID, capabilities, etc.). One of the immediate implications of this change is that direct references to credential-oriented fields in the task structure need to be changed; for example, current->user->uid becomes current_uid(). See Documentation/credentials.txt for a description of the new API.
- The ftrace code has seen a lot of internal changes. The function tracing feature has seen a number of improvements, and the developers have added mechanisms to profile the behavior of if statements, provide function call graphs, obtain user-space stack traces, and follow CPU power-state transitions.
- Most of the callback functions/methods associated with the net_device structure have been moved out of that structure and into the new struct net_device_ops. In-tree drivers have been converted to the new API.
- The priv field has been removed from struct net_device; drivers should use netdev_priv() instead.
- The generic PHY layer now has power management support. To that end, two new methods - suspend() and resume() - have been added to struct phy_driver.
- The networking layer now supports large receive offload (or "generic receive offload") operation.
- The NAPI API has been cleaned up somewhat; in particular, functions like netif_rx_schedule(), netif_rx_schedule_prep(), and netif_rx_complete() have lost the unneeded struct net_device parameter.
- The poll() file operation is now allowed to sleep; see this article for more information on this change.
- The CPU mask mechanism, used to represent sets of processors in the system, is in the middle of being massively reworked. The problem is that CPU masks were often put on the stack, but, as the number of processors grows, the stack lacks room for the mask. The new API is designed to get these masks off the stack, and to guard against anybody ever trying to put one back. See this posting by Rusty Russell for details on this work.
- An infrastructure for asynchronous function calls has been merged. This code is still a work in progress, though, and, for 2.6.29, it will not be activated in the absence of the fastboot command-line parameter.
- The exclusive I/O memory allocation functions have been merged.
- There is a new synchronous hash interface called "shash." It simplifies the use of synchronous hash operations while allowing the same tfm to be used simultaneously in different threads. All in-tree users have been switched to the new API.
- The hrtimer code has been simplified with the removal of variable modes for callback functions. All processing is now done in hardirq context.
- A new set of LSM hooks has been added; these support pathname-based security operations. With the merging of these hooks, one major obstacle to the inclusion of security modules like AppArmor and TOMOYO has been removed.
- The kernel will now refuse to build with GCC 4.1.0 or 4.1.1; those versions have unfortunate bugs which prevent the building of a working kernel. Versions 3.0 and 3.1 have also been deemed to be too old and will not be supported in 2.6.29.
- Video4Linux drivers now use a separate v4l2_file_operations structure to hold their VFS-like callbacks. The prototypes of a number of these functions have been changed to remove the inode argument.
- Video4Linux2 has also acquired a new "subdevice" concept, meant to reflect the fact that video "devices" tend to be, in reality, a set of cooperating devices. See the new document for a description of how this mechanism works.
- Two new functions - stop_machine_create() and stop_machine_destroy() - allow the independent creation of the threads used by stop_machine(). That, in turn, lets those threads be created before trying to actually stop the machine, making that operation more resistant to failure.
- The exports for a number of SUNRPC functions have been changed to GPL-only.
- The internal MTD (memory technology device) API has seen significant changes aimed at supporting larger devices (those requiring 64-bit sizes).
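For the removal of the priv field from struct net_device noted above, the following userspace sketch shows one way an accessor like netdev_priv() can work without a stored pointer: the driver's private area is allocated together with the device structure and found by offset. This is a model under stated assumptions, not the kernel's code; struct model_netdev, MODEL_ALIGN, and the model_ functions are all hypothetical names, and the 32-byte alignment is chosen only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_ALIGN 32  /* assumed alignment of the private area */
#define MODEL_ALIGN_UP(x) \
    (((size_t)(x) + MODEL_ALIGN - 1) & ~((size_t)MODEL_ALIGN - 1))

/* Toy stand-in for struct net_device; note that it has no priv pointer. */
struct model_netdev {
    char name[16];
    unsigned int mtu;
};

/* Allocate the device and its private area as one zeroed block. */
static struct model_netdev *model_alloc_netdev(size_t sizeof_priv)
{
    return calloc(1, MODEL_ALIGN_UP(sizeof(struct model_netdev)) + sizeof_priv);
}

/* The private area starts at a fixed, aligned offset past the structure,
 * so it can be computed instead of being stored in the structure. */
static void *model_netdev_priv(struct model_netdev *dev)
{
    assert(dev != NULL);
    return (char *)dev + MODEL_ALIGN_UP(sizeof(struct model_netdev));
}
```

A driver written against such an accessor never touches the layout directly, which is what allows the field to disappear from the structure.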
2.6.28 (December 24, 2008)
- Discard request and request timeout handling have been added to the block layer; a number of other internal API changes have been made as well. See this article for details.
- Video4Linux2 drivers no longer have their open() function called with the big kernel lock held. The lock_kernel() calls have been pushed down into individual drivers within the mainline tree; external drivers will need to be fixed.
- A number of tracing-related patches have been merged. These include the tracepoints mechanism, some instrumentation in the core scheduler code, improvements to the ftrace function tracing feature, a new ftrace-based stack tracer, a new ftrace-based boot (initcall) tracer, and the low-level trace buffer code.
- The sysctl strategy() function prototype has changed: the unused name and nlen parameters have been removed.
- Asynchronous I/O support can now be configured out of the kernel, saving about 7KB of space on systems where AIO is not needed.
- As planned, device_create_drvdata() has been renamed to device_create(), with the same parameters.
- There is now a mechanism to enable and disable output from pr_debug() and dev_dbg() calls on a per-module basis. Control is through a virtual file in debugfs. There is no documentation file associated with this change; instructions on how to use this feature can be found in the patch changelog.
- The new dev_WARN() function:
dev_WARN(struct device *dev, char *format, ...);
will output the formatted warning, along with a full stack trace. This will allow the warnings to be collected at kerneloops.org and incorporated into the reports there.
- The new %pR formatting directive allows printk() and friends to output the contents of resource structures.
- There is a new function intended to make life easier for PCI driver writers:
static inline void *pci_ioremap_bar(struct pci_dev *pdev, int bar);
This function will remap the entire PCI I/O memory region, as selected by the bar argument.
- There is a new core_param() macro:
core_param(name, var, type, perm);
Its purpose is to define "core" parameters and let them be represented in /sys/module/kernel/parameters.
- It is now possible to create a workqueue running at realtime priority with:
struct workqueue_struct *create_rt_workqueue(const char *name);
- The block driver API has changed considerably, with the inode and file parameters being removed from most block device operations. The new API looks like this:
struct block_device_operations {
int (*open) (struct block_device *bdev, fmode_t mode);
int (*release) (struct gendisk *gd, fmode_t mode);
int (*locked_ioctl) (struct block_device *bdev, fmode_t mode,
unsigned cmd, unsigned long arg);
int (*ioctl) (struct block_device *bdev, fmode_t mode,
unsigned cmd, unsigned long arg);
int (*compat_ioctl) (struct block_device *bdev, fmode_t mode,
unsigned cmd, unsigned long arg);
int (*direct_access) (struct block_device *bdev, sector_t sector,
void **kaddr, unsigned long *pfn);
int (*media_changed) (struct gendisk *gd);
int (*revalidate_disk) (struct gendisk *gd);
int (*getgeo)(struct block_device *bdev, struct hd_geometry *geo);
struct module *owner;
};
The new prototypes do away with the file and inode structure pointers which were passed in previous kernels. Note that the ioctl() method is now called without the big kernel lock; code needing BKL protection must explicitly define a locked_ioctl() function instead.
- The range timer API has been merged; callers can now specify a time period in which they would like the timeout to be delivered. The kernel can then take advantage of the range to coalesce wakeups and keep the processor idle for longer periods.
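To see why the range timers mentioned above can reduce wakeups, consider a toy model in which each timer may fire anywhere in a window between a "soft" and a "hard" expiry. The sketch below counts how many wakeups are needed when timers with overlapping windows are served together. It is a simple greedy illustration, not the kernel's algorithm, and struct model_range_timer and model_count_wakeups() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A timer that may be delivered at any time in [soft, hard]. */
struct model_range_timer {
    long soft;
    long hard;
};

/* Order timers by their hard (latest acceptable) expiry. */
static int cmp_by_hard(const void *a, const void *b)
{
    const struct model_range_timer *x = a;
    const struct model_range_timer *y = b;

    return (x->hard > y->hard) - (x->hard < y->hard);
}

/* Count the wakeups needed if each wakeup happens at the earliest pending
 * hard expiry and serves every timer whose window is already open.
 * Note: sorts the caller's array in place. */
static int model_count_wakeups(struct model_range_timer *timers, size_t n)
{
    int wakeups = 0;
    long last_wake = 0;

    qsort(timers, n, sizeof(*timers), cmp_by_hard);
    for (size_t i = 0; i < n; i++) {
        assert(timers[i].soft <= timers[i].hard);
        /* A new wakeup is needed only if this window opens after the
         * most recent wakeup time. */
        if (wakeups == 0 || timers[i].soft > last_wake) {
            last_wake = timers[i].hard;
            wakeups++;
        }
    }
    return wakeups;
}
```

With windows of (100,150), (120,160), (140,200), and (300,310), the model needs two wakeups; shrinking every window to a single instant brings it back to four, one per timer.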
2.6.27 (October 9, 2008)
- The register_security() function has been removed. Security modules which wish to implement stacking must now do so explicitly.
- The request_queue_t type is gone at last; block drivers should use struct request_queue instead.
- Quite a bit of big kernel lock removal work has been merged. For char devices, the open() method from struct file_operations is no longer protected by the BKL. Calls to fasync() have also lost BKL protection.
- Many drivers have been converted to use the firmware loader, making it possible to strip the firmware from the kernel for those who are inclined to do so. See this article for more information on the firmware work.
- The API work in the i2c layer continues; there is now an autodetection capability which allows new-style drivers to detect devices on their buses automatically.
- The SCSI layer has gained new support for "device handlers," which are mostly concerned with multipath management. Some of this code has been moved over from the device mapper.
- The new suspend and hibernate infrastructure has been merged, providing a wider set of callbacks for power management events. The PCI and platform bus interfaces have been enhanced with support for this new infrastructure.
- The TTY layer continues to evolve; significant changes include the introduction of a new tty_port structure meant to hold information common to all TTY ports and a rework of the line discipline code.
- The mac80211 code has a new module which can simulate any number of IEEE 802.11 radios; it is suitable for testing mac80211 functionality and associated user-space tools.
- There is a new "rfkill" mechanism for unified handling of "radio off" switches on wireless devices.
- A number of Video4Linux2 format-related callbacks have been renamed to make them match the names used with the associated buffer types. In addition, the vidioc_enum_fmt_vbi_cap() callback has been deprecated and marked for removal in 2.6.28.
- The videobuf layer now has support for controllers which cannot do scatter/gather I/O.
- The USB "gadget" framework has been massively reworked to provide better support for composite devices.
- The prototype for device_create() has changed:
struct device *device_create(struct class *class,
struct device *parent,
dev_t devt,
void *drvdata,
const char *fmt, ...);
Those who see a resemblance to device_create_drvdata() are right; all in-tree users were converted over to that interface, the old device_create() was removed, and device_create_drvdata() was renamed. For now, a macro makes calls to device_create_drvdata() do the right thing, but that macro will probably go away before the 2.6.27 final release.
- User-space UIO drivers can now write a signed value to the /dev/uioX device to enable and disable interrupts.
- Debugfs (finally) has a function for removing an entire directory tree:
void debugfs_remove_recursive(struct dentry *dentry);
As a result, code creating hierarchies in debugfs no longer needs to remember the dentry of every file it creates.
- The tracehook mechanism for defining static trace points (described in this article) has been merged, along with a number of trace points in the core kernel.
- A new, lockless form of get_user_pages() has been added:
int get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages);
Details of this interface can be found in this article, with the one note that early versions were called fast_gup() instead. (See also the related lockless page cache work, which was also merged).
- The long-debated mmu-notifiers patch has been merged. The notifiers allow external memory management units (as may be seen in some graphics cards or in virtualized guests) to be told about decisions made by the core memory management code.
- There is a new framework for debugging boot-time memory initialization; there's also "a few basic defensive measures" intended to prevent difficult-to-debug boot problems.
- The new function:
int object_is_on_stack(void *obj);
returns a true value if the pointed-to object is on the current kernel stack.
- There is a new macro for issuing warnings:
WARN(condition, format, ...);
It's much like WARN_ON() in that it will produce a full oops listing; the difference is the added printk()-style format string and arguments.
- A new helper function:
int flush_work(struct work_struct *work);
waits for the specific workqueue job work to finish executing.
- dma_mapping_error() and pci_dma_mapping_error() have new prototypes:
int dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
int pci_dma_mapping_error(struct pci_dev *hwdev, dma_addr_t dma_addr);
In each case, they have gained a new argument specifying which device the mapping is being done for.
- There are a couple of new radix tree functions:
unsigned int radix_tree_gang_lookup_slot(struct radix_tree_root *root,
void ***results,
unsigned long first_index,
unsigned int max_items);
unsigned int radix_tree_gang_lookup_tag_slot(struct radix_tree_root *root,
void ***results,
unsigned long first_index,
unsigned int max_items,
unsigned int tag);
They are useful for looking up multiple items in a single call.
- Slab cache constructors no longer have a pointer to the cache itself as an argument; they now take a single void * pointer to the object itself.
- The long list of Video4Linux2 ioctl() callbacks has been moved into its own structure (struct v4l2_ioctl_ops) which is pointed to by the ioctl_ops member of struct video_device.
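The WARN() macro listed above for 2.6.27 pairs a condition with a printk()-style message. A rough userspace model of that calling pattern is sketched below. It is only an approximation: model_warn() and warnings_issued are hypothetical names, the stack dump that the kernel produces is omitted, and the sketch assumes that, as with WARN_ON(), the result is the truth value of the condition so that it can be used inside an if statement.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings actually printed by the model. */
static int warnings_issued;

/* Userspace model: print the formatted message only when the condition
 * holds, and hand the condition back to the caller. */
static bool model_warn(bool condition, const char *fmt, ...)
{
    assert(fmt != NULL);
    if (condition) {
        va_list ap;

        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
        warnings_issued++;
    }
    return condition;
}
```

A caller could then write something like if (model_warn(len > limit, "length %d too large\n", len)) return -1; to report and reject a bad value in a single statement.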
2.6.26 (July 13, 2008)
- At long last, support for the KGDB interactive debugger has been added to the x86 architecture. There is a DocBook document in the Documentation directory which provides an overview on how to use this new facility.
- Page attribute table (PAT) support is also (again, at long last) available for the x86 architecture. PATs allow for fine-grained control of memory caching behavior with more flexibility than the older MTRR feature. See Documentation/x86/pat.txt for more information.
- ioremap() on the x86 architecture will now always return an uncached mapping. Previously, it had taken a more relaxed approach, leaving the caching as the BIOS had set it up. The practical result was to almost always create uncached mappings, but with occasional exceptions. Drivers which depend on a cached mapping will now break; they will need to use ioremap_cache() instead.
- The nopage() virtual memory area operation has been removed; all in-tree code is now using fault() instead.
- Two new functions (inode_getsecid() and ipc_getsecid()), added to support security modules and the audit code, provide general access to security IDs associated with inodes and IPC objects. A number of superblock-related LSM callbacks now take a struct path pointer instead of struct nameidata. There is also a new set of hooks providing generic audit support in the security module framework.
- The now-unused ieee80211 software MAC layer has been removed; all of the drivers which needed it have been converted to mac80211. Also removed are the sk98lin network driver (in favor of skge) and bcm43xx (replaced by b43 and b43legacy).
- The generic semaphores patch has been merged. The semaphore code also has new down_killable() and down_timeout() functions.
- The ata_port_operations structure used by libata drivers now supports a simple sort of operation inheritance, making it easier to write drivers which are "almost like" existing code, but with small differences.
- A new function (ns_to_ktime()) converts a time value in nanoseconds to ktime_t.
- The final users of struct class_device have been converted to use struct device instead. The class_device type has been removed.
- The seq_file code now accepts a return value of SEQ_SKIP from the show() callback; that value causes any accumulated output from that call to be discarded.
- The Video4Linux2 API now defines a set of controls for camera devices; they allow user space to work with parameters like exposure type, tilt and pan, focus, and more.
- On the x86 architecture, there is a new configuration parameter which allows gcc to make its own decisions about the inlining of functions, even when functions are declared inline. In some cases, this option can reduce the size of the kernel's text segment by over 2%.
- The legacy IDE layer has gone through a lot of internal changes which will break any remaining IDE drivers.
- The SLUB allocator supports a new sysfs file (/sys/kernel/slab/name/order) which allows system administrators to change the size of page allocations used by the named slab.
- A condition which triggers a warning from WARN_ON will now also taint the kernel.
- The get_info() interface for /proc files has been removed. There is also a new function for creating /proc files:
struct proc_dir_entry *proc_create_data(const char *name, mode_t mode,
struct proc_dir_entry *parent,
const struct file_operations *proc_fops,
void *data);
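The "operation inheritance" described above for ata_port_operations in 2.6.26 can be pictured as an operations table that names a parent table and fills its own unset methods from it. The sketch below models that idea in plain C. The structure, its inherits field, the method names, and model_finalize_ops() are all hypothetical and chosen for illustration; the article does not describe how libata implements the feature.

```c
#include <assert.h>
#include <stddef.h>

/* Toy operations table with a pointer to the table it inherits from. */
struct model_port_ops {
    const struct model_port_ops *inherits;  /* parent table, or NULL */
    int (*reset)(void);
    int (*read_status)(void);
};

static int base_reset(void)       { return 1; }
static int base_read_status(void) { return 2; }
static int quirky_reset(void)     { return 10; }

/* The "existing" driver's operations. */
static const struct model_port_ops base_ops = {
    .reset       = base_reset,
    .read_status = base_read_status,
};

/* An "almost like" driver: overrides reset, inherits everything else. */
static struct model_port_ops quirky_ops = {
    .inherits = &base_ops,
    .reset    = quirky_reset,
};

/* Walk the inheritance chain and copy in any method still unset. */
static void model_finalize_ops(struct model_port_ops *ops)
{
    assert(ops != NULL);
    for (const struct model_port_ops *cur = ops->inherits; cur; cur = cur->inherits) {
        if (!ops->reset)
            ops->reset = cur->reset;
        if (!ops->read_status)
            ops->read_status = cur->read_status;
    }
    ops->inherits = NULL;  /* resolved; nothing left to look up */
}
```

After finalization, callers can invoke every method directly without checking the parent, which keeps the per-call cost the same as with a fully populated table.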