Unix I/O¶
Unix I/O Overview¶
- A file is a sequence of bytes: $B_0 , B_1 , .... , B_k , .... , B_{m-1}$
- Cool fact: All I/O devices are represented as files:
/dev/sda2(disk partition)/dev/tty2(terminal)/dev/null(discard all writes / read empty file)
- Cool fact: Kernel data structures are exposed as files
cat /proc/$$/statusls -l /proc/$$/fd/ls –RC /sys/devices | less
- Kernel offers a set of basic operations for all files
- Opening and closing files
open()andclose()
- Reading and writing a file
read()andwrite()
- Look up information about a file (size, type, last modification time, …)
stat(), lstat(), fstat()
- Changing the current file position (seek)
- indicates next offset into file to read or write
lseek()![[Screen Shot 2024-07-19 at 15.08.40.png]]
- Opening and closing files
File Types¶
- Each file has a type indicating its role in the system
- Regular file: Stores arbitrary data
- Directory: Index for a related group of files
- Socket: For communicating with a process on another machine
- Other file types beyond our scope
- Named pipes (FIFOs)
- Symbolic links
- Character and block devices
Regular Files¶
- A regular file contains arbitrary data
- Applications often distinguish between text and binary files
- Text files contain human-readable text
- Binary files are everything else (object files, JPEG images, …)
- Kernel doesn’t care! It’s all just bytes!
- Text file is sequence of text lines
- Text line is sequence of characters terminated (not separated!) by end of line indicator
- Characters are defined by a text encoding (ASCII, UTF-8, EUC-JP, …)
- End of line (EOL) indicators:
- All “Unix”: Single byte
0x0A- line feed (LF)
- DOS, Windows: Two bytes
0x0D 0x0A- Carriage return (CR) followed by line feed (LF)
- Also used by many Internet protocols
- C library translates to '
\n'
- All “Unix”: Single byte
Directories¶
- Directory consists of an array of entries (also called links)
- Each entry maps a filename to a file
- Each directory contains at least two entries
- . (dot) maps to the directory itself
- .. (dot dot) maps to the parent directory in the directory hierarchy (next slide)
- Commands for manipulating directories
mkdir: create empty directoryls: view directory contentsrmdir: delete empty directory
Directory Hierarchy¶
- All files are organized as a hierarchy anchored by root directory named / (slash) ![[Screen Shot 2024-07-19 at 15.16.52.png]]
- Kernel maintains current working directory (cwd) for each process
- Modified using the cd command
Pathnames¶
- Locations of files in the hierarchy denoted by pathnames
- Absolute pathname starts with ‘/’ and denotes path from root
/home/droh/hello.c
- Relative pathname denotes path from current working directory
../droh/hello.c![[Screen Shot 2024-07-19 at 15.21.04.png]]
- Absolute pathname starts with ‘/’ and denotes path from root
Opening Files¶
- Opening a file informs the kernel that you are getting ready to access that file![[Screen Shot 2024-07-19 at 15.21.23.png]]
- Returns a small identifying integer file descriptor
- fd == -1 indicates that an error occurred
- Each process begins life with three open files
- 0: standard input (stdin)
- 1: standard output (stdout)
- 2: standard error (stderr)
- These could be files, pipes, your terminal, or even a network connection!
Lots of ways to call open: ![[Screen Shot 2024-07-19 at 15.22.42.png]]
Closing Files¶
-
Closing a file informs the kernel that you are finished accessing that file![[Screen Shot 2024-07-19 at 15.23.18.png]]
-
Take care not to close any file more than once
- Same as not calling free() twice on the same pointer
- Closing a file can fail!
- Well, not exactly fail—the file is still closed
- The OS is taking this opportunity to report a delayed error from a previous write operation
- You might silently lose data if you don’t check!
Reading Files¶
- Reading a file copies bytes from the current file position to memory, and then updates file position![[Screen Shot 2024-07-19 at 15.35.00.png]]
- Returns number of bytes read from file
fdintobuf- Return type
ssize_tis signed integer nbytes < 0indicates that an error occurred- Short counts (
nbytes < sizeof(buf)) are possible and are not errors!
- Return type
Writing Files¶
- Writing a file copies bytes from memory to the current file position, and then updates current file position ![[Screen Shot 2024-07-19 at 19.12.30.png]]
- Returns number of bytes written from
bufto filefdnbytes < 0indicates that an error occurred- As with reads, short counts are possible and are not errors!
Simple Unix I/O example¶
Copying stdin to stdout, one byte at a time![[Screen Shot 2024-07-19 at 19.34.14.png]] ![[Screen Shot 2024-07-19 at 19.35.18.png]]
On Short Counts¶
- Def: Short counts occur when fewer bytes are read or written than requested. This can happen for various reasons in different situations.
- Short counts can occur in these situations:
- Encountering (end-of-file) EOF on reads
- Reading text lines from a terminal
- Reading and writing network sockets, pipes, etc.
- Short counts never occur in these situations:
- Reading from disk files (except for EOF)
- Writing to disk files
- Best practice is to always allow for short counts.
Standard I/O¶
Standard I/O Functions¶
- The C standard library (libc.so) contains a collection of higher-level standard I/O functions
- Documented in Appendix B of K&R
- Examples of standard I/O functions:
- Opening and closing files (
fopenandfclose) - Reading and writing bytes (
freadandfwrite) - Reading and writing text lines (
fgetsandfputs) - Formatted reading and writing (
fscanfandfprintf)
- Opening and closing files (
Standard I/O Streams¶
- Standard I/O models open files as streams
- In C programming, when you work with files or input/output operations, they are treated as streams.
- A stream is an abstraction that includes a file descriptor (a unique identifier for an open file) and a buffer in memory (a temporary storage area for data).
- C programs begin life with three open streams (defined in stdio.h)
stdin(standard input): Used for reading input (usually from the keyboard).stdout(standard output): Used for writing output (usually to the screen).stderr(standard error): Used for writing error messages (also usually to the screen). ![[Screen Shot 2024-07-23 at 16.00.49.png]]
Buffered I/O: Motivation¶
- Applications often read/write one character at a time
getc, putc, ungetcgets, fgets- Read line of text one character at a time, stopping at newline
- Implementing as Unix I/O calls expensive
readandwriterequire Unix kernel calls- 10,000 clock cycles
- Solution: Buffered read
- Use Unix read to grab block of bytes
- User input functions take one byte at a time from buffer
- Refill buffer when empty ![[Screen Shot 2024-07-23 at 16.01.07.png]]
Buffering in Standard I/O¶
- Standard I/O functions use buffered I/O ![[Screen Shot 2024-07-23 at 16.01.25.png]]
- The buffer is flushed to the output file descriptor (fd) when a newline character
\nis encountered. - It can also be flushed by calling
fflush, exiting the program, or returning from themainfunction. - The
write(1, buf, 6);line shows how the buffer contents are written to the output. Here,1is the file descriptor for standard output (stdout),bufis the buffer, and6is the number of bytes to write.
Standard I/O Buffering in Action¶
- You can see this buffering in action for yourself, using the always fascinating Linux
straceprogram: ![[Screen Shot 2024-07-23 at 16.01.50.png]]
Which I/O when¶
Pros and Cons of Unix I/O¶
- Pros
- Unix I/O is the most general form of I/O
- All other I/O packages are implemented using Unix I/O functions
- Unix I/O provides functions for accessing file metadata
- Unix I/O functions are async-signal-safe and can be used safely in signal handlers
- Unix I/O is the most general form of I/O
- Cons
- Dealing with short counts is tricky and error prone
- Efficient reading of text lines requires some form of buffering, also tricky and error prone
Pros and Cons of Standard I/O¶
- Pros:
- Buffering increases efficiency by decreasing the number of read and write system calls
- Short counts are handled automatically
- Cons:
- Provides no function for accessing file metadata
- Standard I/O functions are not async-signal-safe, and not appropriate for signal handlers
- Standard I/O is not appropriate for input and output on network sockets
- There are poorly documented restrictions on streams that interact badly with restrictions on sockets (CS:APP3e, Sec 10.11)
Choosing I/O Functions¶
- General rule: use the highest-level I/O functions you can
- Many C programmers are able to do all of their work using the standard I/O functions
- But, be sure to understand the functions you use!
- When to use standard I/O
- When working with “ordinary” files
- When to use raw Unix I/O
- Inside signal handlers, because Unix I/O is async-signal-safe
- When you are reading and writing network sockets
- Libraries dedicated to buffered network I/O make this easier
- CS:APP
rio_*functions; libevent, libuv, …
- In rare cases when you need absolute highest performance
Aside: Working with Binary Files¶
- Functions you should never use on binary files
- Text-oriented I/O: such as
fgets, scanf, rio_readlineb- Interpret EOL characters.
- Use functions like
rio_readnorrio_readnbinstead
- String functions
strlen, strcpy, strcat- Interprets byte value 0 (end of string) as special
- Text-oriented I/O: such as
Metadata, sharing, and redirection¶
File Metadata¶
- Metadata is data about data, in this case file data
- Per-file metadata maintained by kernel
- accessed by users with the
statandfstatfunctions ![[Screen Shot 2024-07-23 at 18.29.30.png]]
- accessed by users with the
How the Unix Kernel Represents Open Files¶
- Two descriptors referencing two distinct open files. Descriptor 1 (stdout) points to terminal, and descriptor 4 points to open disk file ![[Screen Shot 2024-07-23 at 18.29.41.png]]
File Sharing¶
- Two distinct descriptors sharing the same disk file through two distinct open file table entries
- E.g., Calling
opentwice with the samefilenameargument ![[Screen Shot 2024-07-23 at 18.30.16.png]]
- E.g., Calling
How Processes Share Files: fork¶
- A child process inherits its parent’s open files
- Note: situation unchanged by
execfunctions (usefcntlto change)
- Note: situation unchanged by
-
Before
forkcall: ![[Screen Shot 2024-07-23 at 18.30.33.png]] -
A child process inherits its parent’s open files
- After fork:
- Child’s table same as parent’s, and +1 to each
refcnt![[Screen Shot 2024-07-23 at 18.31.12.png]]
- Child’s table same as parent’s, and +1 to each
I/O Redirection¶
- Question: How does a shell implement I/O redirection?
linux> ls > foo.txt - Answer: By calling the
dup2(oldfd, newfd)function- Copies (per-process) descriptor table entry
oldfdto entrynewfd![[Screen Shot 2024-07-23 at 19.17.11.png]]
- Copies (per-process) descriptor table entry
I/O Redirection Example¶
- Step #1: open file to which stdout should be redirected
- Happens in child executing shell code, before
exec![[Screen Shot 2024-07-23 at 19.17.57.png]]
- Happens in child executing shell code, before
- Step #2: call
dup2(4,1)- cause fd=1 (stdout) to refer to disk file pointed at by fd=4![[Screen Shot 2024-07-23 at 19.22.43.png]]
Warm-Up: I/O and Redirection Example¶
![[Screen Shot 2024-07-23 at 19.22.57.png]]
Master Class: Process Control and I/O¶
![[Screen Shot 2024-07-23 at 19.23.20.png]]