IOCTL-XFS-EXCHANGE-RANGE
Section: System Calls (2)
Updated: 202-0-10
NAME
ioctl_xfs_exchange_range - exchange the contents of parts of two files
SYNOPSIS
#include <sys/ioctl.h>
#include <xfs/xfs_fs.h>
int ioctl(int file2_fd, XFS_IOC_EXCHANGE_RANGE, struct xfs_exchange_range *arg);
DESCRIPTION
Given a range of bytes in a first file
file1_fd
and a second range of bytes in a second file
file2_fd,
this
ioctl(2)
exchanges the contents of the two ranges.
Exchanges are atomic with regard to concurrent file operations.
Implementations must guarantee that readers see either the old contents or the
new contents in their entirety, even if the system fails.
The system call parameters are conveyed in structures of the following form:
struct xfs_exchange_range {
    __s32 file1_fd;
    __u32 pad;
    __u64 file1_offset;
    __u64 file2_offset;
    __u64 length;
    __u64 flags;
};
The field
pad
must be zero.
The fields
file1_fd, file1_offset, and length
define the first range of bytes to be exchanged.
The fields
file2_fd, file2_offset, and length
define the second range of bytes to be exchanged.
Both files must be from the same filesystem mount.
If the two file descriptors represent the same file, the byte ranges must not
overlap.
Most disk-based filesystems require that the starts of both ranges be
aligned to the file block size.
If this is the case, the ends of the ranges must also be so aligned unless the
XFS_EXCHANGE_RANGE_TO_EOF
flag is set.
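The overlap and alignment rules above can be checked in user space before issuing the call. The following is a minimal sketch, not part of the XFS API: the helper names ranges_overlap and range_is_aligned are hypothetical, and the block size is assumed to be supplied by the caller (for example from fstat(2)). The kernel remains the authority on whether a request is valid.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if [off1, off1 + len) and [off2, off2 + len)
 * share at least one byte. Only meaningful when both descriptors refer to
 * the same file.
 */
bool ranges_overlap(uint64_t off1, uint64_t off2, uint64_t len)
{
    uint64_t lo = off1 < off2 ? off1 : off2;
    uint64_t hi = off1 < off2 ? off2 : off1;

    /* The ranges overlap when the lower one extends past the start of the higher one. */
    return len != 0 && hi - lo < len;
}

/*
 * Hypothetical helper: true if both start offsets are aligned to blksz and,
 * unless the caller intends to exchange to EOF, the length is aligned too.
 * With aligned starts, an aligned length implies aligned ends.
 */
bool range_is_aligned(uint64_t off1, uint64_t off2, uint64_t len,
                      uint64_t blksz, bool to_eof)
{
    if (blksz == 0)
        return false;
    if (off1 % blksz != 0 || off2 % blksz != 0)
        return false;
    return to_eof || len % blksz == 0;
}
```

A caller might run both checks and skip the ioctl entirely when either fails, rather than waiting for EINVAL.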
The field
flags
controls the behavior of the exchange operation.
-
- XFS_EXCHANGE_RANGE_TO_EOF
-
Ignore the
length
parameter.
All bytes in
file1_fd
from
file1_offset
to EOF are moved to
file2_fd,
and file2's size is set to
(file2_offset+(file1_length-file1_offset)).
Meanwhile, all bytes in file2 from
file2_offset
to EOF are moved to file1 and file1's size is set to
(file1_offset+(file2_length-file2_offset)).
- XFS_EXCHANGE_RANGE_DSYNC
-
Ensure that all modified in-core data in both file ranges and all metadata
updates pertaining to the exchange operation are flushed to persistent storage
before the call returns.
Opening either file descriptor with
O_SYNC or O_DSYNC
will have the same effect.
- XFS_EXCHANGE_RANGE_FILE1_WRITTEN
-
Only exchange sub-ranges of
file1_fd
that are known to contain data written by application software.
Each sub-range may be expanded (both upwards and downwards) to align with the
file allocation unit.
For files on the data device, this is one filesystem block.
For files on the realtime device, this is the realtime extent size.
This facility can be used to implement fast atomic scatter-gather writes of any
complexity for software-defined storage targets if all writes are aligned to
the file allocation unit.
- XFS_EXCHANGE_RANGE_DRY_RUN
-
Check the parameters and the feasibility of the operation, but do not change
anything.
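The size arithmetic described for XFS_EXCHANGE_RANGE_TO_EOF can be worked through with a small helper. This is an illustrative sketch only: the struct eof_exchange_sizes and the function sizes_after_to_eof are hypothetical names, and rejecting an offset beyond EOF is an assumption made here to avoid unsigned underflow, not documented kernel behavior.

```c
#include <stdint.h>

/* Hypothetical container for the file sizes expected after the exchange. */
struct eof_exchange_sizes {
    uint64_t file1_size;
    uint64_t file2_size;
};

/*
 * Apply the two formulas from the flag description:
 *   file2 size = file2_offset + (file1_length - file1_offset)
 *   file1 size = file1_offset + (file2_length - file2_offset)
 * Returns 0 on success, or -1 if either offset lies beyond its file's EOF
 * (an assumption of this sketch).
 */
int sizes_after_to_eof(uint64_t file1_length, uint64_t file1_offset,
                       uint64_t file2_length, uint64_t file2_offset,
                       struct eof_exchange_sizes *out)
{
    if (file1_offset > file1_length || file2_offset > file2_length)
        return -1;

    out->file2_size = file2_offset + (file1_length - file1_offset);
    out->file1_size = file1_offset + (file2_length - file2_offset);
    return 0;
}
```

For example, with file1 at 10000 bytes exchanged from offset 4096 and file2 at 20000 bytes exchanged from offset 8192, the formulas give a new file2 size of 14096 bytes and a new file1 size of 15904 bytes.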
RETURN VALUE
On error, -1 is returned, and
errno
is set to indicate the error.
ERRORS
Error codes can be one of, but are not limited to, the following:
- EBADF
-
file1_fd
is not open for reading and writing or is open for append-only writes; or
file2_fd
is not open for reading and writing or is open for append-only writes.
- EINVAL
-
The parameters are not correct for these files.
This error can also appear if either file descriptor represents
a device, FIFO, or socket.
Disk filesystems generally require the offset and length arguments
to be aligned to the fundamental block sizes of both files.
- EIO
-
An I/O error occurred.
- EISDIR
-
One of the files is a directory.
- ENOMEM
-
The kernel was unable to allocate sufficient memory to perform the
operation.
- ENOSPC
-
There is not enough free space in the filesystem to exchange the contents safely.
- EOPNOTSUPP
-
The filesystem does not support exchanging bytes between the two
files.
- EPERM
-
file1_fd or file2_fd
is immutable.
- ETXTBSY
-
One of the files is a swap file.
- EUCLEAN
-
The filesystem is corrupt.
- EXDEV
-
file1_fd and file2_fd
are not on the same mounted filesystem.
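Callers often want to decide what to do next based on errno. The sketch below groups the error codes listed above into broad categories. The enum, the function name classify_exchange_errno, and the grouping itself are assumptions made for illustration; they are not defined by the kernel or by this manual page, and other codes may be returned.

```c
#include <errno.h>

/* Hypothetical categories for the documented error codes. */
enum exchange_failure {
    EXCHANGE_FAIL_CALLER,      /* arguments or descriptors need fixing */
    EXCHANGE_FAIL_UNSUPPORTED, /* filesystem cannot exchange these files */
    EXCHANGE_FAIL_RESOURCE,    /* out of memory or free space */
    EXCHANGE_FAIL_STORAGE,     /* I/O error or filesystem corruption */
    EXCHANGE_FAIL_UNKNOWN      /* anything not listed in ERRORS */
};

/* Map an errno value from a failed exchange to one of the categories above. */
enum exchange_failure classify_exchange_errno(int err)
{
    switch (err) {
    case EBADF:
    case EINVAL:
    case EISDIR:
    case EPERM:
    case ETXTBSY:
    case EXDEV:
        return EXCHANGE_FAIL_CALLER;
    case EOPNOTSUPP:
        return EXCHANGE_FAIL_UNSUPPORTED;
    case ENOMEM:
    case ENOSPC:
        return EXCHANGE_FAIL_RESOURCE;
    case EIO:
    case EUCLEAN:
        return EXCHANGE_FAIL_STORAGE;
    default:
        return EXCHANGE_FAIL_UNKNOWN;
    }
}
```

An application might, for instance, fall back to a copy-based update when the result is EXCHANGE_FAIL_UNSUPPORTED and report the remaining categories to the user.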
CONFORMING TO
This API is XFS-specific.
USE CASES
Several use cases are imagined for this system call.
In all cases, application software must coordinate updates to the file
because the exchange is performed unconditionally.
The first is a data storage program that wants to commit non-contiguous updates
to a file atomically and coordinates write access to that file.
This can be done by creating a temporary file, calling
FICLONE(2)
to share the contents, and staging the updates into the temporary file.
The
XFS_EXCHANGE_RANGE_TO_EOF
flag is recommended for this purpose.
The temporary file can be deleted or punched out afterwards.
An example program might look like this:
int fd = open("/some/file", O_RDWR);
/* O_TMPFILE requires a mode argument */
int temp_fd = open("/some", O_TMPFILE | O_RDWR, 0600);
ioctl(temp_fd, FICLONE, fd);
/* append 1MB of records */
lseek(temp_fd, 0, SEEK_END);
write(temp_fd, data1, 1000000);
/* update record index */
pwrite(temp_fd, data1, 600, 98765);
pwrite(temp_fd, data2, 320, 54321);
pwrite(temp_fd, data2, 15, 0);
/* commit the entire update */
struct xfs_exchange_range args = {
    .file1_fd = temp_fd,
    .flags = XFS_EXCHANGE_RANGE_TO_EOF,
};
ioctl(fd, XFS_IOC_EXCHANGE_RANGE, &args);
The second is a software-defined storage host (e.g. a disk jukebox) which
implements an atomic scatter-gather write command.
Provided the exported disk's logical block size matches the file's allocation
unit size, this can be done by creating a temporary file and writing the data
at the appropriate offsets.
It is recommended that the temporary file be truncated to the size of the
regular file before any writes are staged to the temporary file to avoid issues
with zeroing during EOF extension.
Use this call with the
FILE1_WRITTEN
flag to exchange only the file allocation units involved in the emulated
device's write command.
The temporary file should be truncated or punched out completely before being
reused to stage another write.
An example program might look like this:
int fd = open("/some/file", O_RDWR);
/* O_TMPFILE requires a mode argument */
int temp_fd = open("/some", O_TMPFILE | O_RDWR, 0600);
struct stat sb;
int blksz;
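fstat(fd, &sb);
blksz = sb.st_blksize;
/* size the temporary file to match the regular file before staging writes */
ftruncate(temp_fd, sb.st_size);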
/* land scatter gather writes between 100fsb and 500fsb */
pwrite(temp_fd, data1, blksz * 2, blksz * 100);
pwrite(temp_fd, data2, blksz * 20, blksz * 480);
pwrite(temp_fd, data3, blksz * 7, blksz * 257);
/* commit the entire update */
struct xfs_exchange_range args = {
    .file1_fd = temp_fd,
    .file1_offset = blksz * 100,
    .file2_offset = blksz * 100,
    .length = blksz * 400,
    .flags = XFS_EXCHANGE_RANGE_FILE1_WRITTEN |
             XFS_EXCHANGE_RANGE_DSYNC,
};
ioctl(fd, XFS_IOC_EXCHANGE_RANGE, &args);
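The example above hard-codes the exchanged range as blocks 100 through 500. A program that stages an arbitrary set of writes could instead derive the offset and length to pass to the call. The following is a rough sketch under stated assumptions: struct staged_write and covering_range are hypothetical names, the allocation unit is supplied by the caller, and arithmetic overflow near the top of the 64-bit range is not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one write staged into the temporary file. */
struct staged_write {
    uint64_t offset;
    uint64_t length;
};

/*
 * Compute the smallest range, aligned to the allocation unit, that covers
 * every staged write. The start is rounded down and the end is rounded up.
 * Returns 0 on success, or -1 if there are no writes or the unit is zero.
 */
int covering_range(const struct staged_write *w, size_t n, uint64_t unit,
                   uint64_t *start, uint64_t *length)
{
    uint64_t lo = UINT64_MAX;
    uint64_t hi = 0;

    if (n == 0 || unit == 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        uint64_t end = w[i].offset + w[i].length;

        if (w[i].offset < lo)
            lo = w[i].offset;
        if (end > hi)
            hi = end;
    }

    lo -= lo % unit;              /* round the start down */
    if (hi % unit != 0)
        hi += unit - hi % unit;   /* round the end up */

    *start = lo;
    *length = hi - lo;
    return 0;
}
```

With a 4096-byte unit and the three writes from the example, this yields a start of 100 blocks and a length of 400 blocks, matching the values used for file1_offset, file2_offset, and length.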
NOTES
Some filesystems may limit the amount of data or the number of extents that can
be exchanged in a single call.
SEE ALSO
ioctl(2)