Limiting the Number of Characters That printf Can Print in C

A printf format string is a control parameter used by a class of functions in the input/output libraries of C and many other programming languages. The string is written in a simple template language: characters are usually copied literally into the function's output, but format specifiers, which start with a % character, indicate the location and method used to convert a piece of data (such as a number) to characters.

"printf" is the name of one of the main C output functions, and stands for "print formatted". printf format strings are complementary to scanf format strings, which provide formatted input (parsing). In both cases these provide simple functionality and a fixed format compared to more sophisticated and flexible template engines or parsers, but are sufficient for many purposes.

Many languages other than C copy the printf format string syntax closely or exactly in their own I/O functions.

Mismatches between the format specifiers and the type of the data can cause crashes and other vulnerabilities. The format string itself is very often a string literal, which allows static analysis of the function call. However, it can also be the value of a variable, which allows for dynamic formatting but also a security vulnerability known as an uncontrolled format string exploit.

History

Early programming languages such as Fortran used special statements with completely different syntax from other calculations to build formatting descriptions. In this example, the format is specified on line 601, and the WRITE command refers to it by line number:

          WRITE OUTPUT TAPE 6, 601, IA, IB, IC, AREA
     601  FORMAT (4H A= ,I5,5H  B= ,I5,5H  C= ,I5,
         &        8H  AREA= ,F10.2, 13H SQUARE UNITS)

ALGOL 68 had a more function-like API, but still used special syntax (the $ delimiters surround special formatting syntax):

    printf(($"Color "g", number1 "6d,", number2 "4zd,", hex "16r2d,", float "-d.2d,", unsigned value"-3d"."l$,
             "red", 123456, 89, BIN 255, 3.14, 250));

But using the normal function calls and data types simplifies the language and compiler, and allows the implementation of the input/output to be written in the same language. These advantages outweigh the disadvantages (such as a complete lack of type safety in many instances), and in most newer languages I/O is not part of the syntax.

C's printf has its origins in BCPL's writef function (1966). In comparison to C and printf, *N is a BCPL language escape sequence representing a newline character (for which C uses the escape sequence \n), and the order of the format specification's field width and type is reversed in writef:^[1]

    WRITEF("%I2-QUEENS PROBLEM HAS %I5 SOLUTIONS*N", NUMQUEENS, COUNT)

Probably the first copying of the syntax outside the C language was the Unix printf shell command, which first appeared in Version 4, as part of the port to C.^[2]

Format placeholder specification

Formatting takes place via placeholders within the format string. For example, suppose a program wants to print a person's age, prefixed with "Your age is ". It can use the signed decimal specifier character d to show the integer for the age immediately after that message, with the format string:

    printf("Your age is %d", age);

Syntax

The syntax for a format placeholder is

%[parameter][flags][width][.precision][length]type

Parameter field

This is a POSIX extension and not in C99. The Parameter field can be omitted or can be:

Character	Description
n$	n is the number of the parameter to display using this format specifier. This allows the provided parameters to be output multiple times, with varying format specifiers, or in different orders. If any single placeholder specifies a parameter, all the rest of the placeholders MUST also specify a parameter. For example, `printf("%2$d %2$#x; %1$d %1$#x",16,17)` produces `17 0x11; 16 0x10`.

This feature is mainly used in localization, where the order in which parameters occur varies with language-dependent conventions.

On non-POSIX Microsoft Windows, support for this feature is placed in a separate _printf_p function.

Flags field

The Flags field can be zero or more (in any order) of:

Character	Description
- (minus)	Left-align the output of this placeholder. (The default is to right-align the output.)
+ (plus)	Prepends a plus for positive signed-numeric types. positive = +, negative = -. (The default doesn't prepend anything in front of positive numbers.)
(space)	Prepends a space for positive signed-numeric types. positive = (space), negative = -. This flag is ignored if the + flag exists. (The default doesn't prepend anything in front of positive numbers.)
0 (zero)	When the 'width' option is specified, prepends zeros for numeric types. (The default prepends spaces.) For example, `printf("%4X",3)` produces `   3`, while `printf("%04X",3)` produces `0003`.
' (apostrophe)	The integer or exponent of a decimal has the thousands grouping separator applied.
# (hash)	Alternate form: For g and G types, trailing zeros are not removed. For f, F, e, E, g, G types, the output always contains a decimal point. For o, x, X types, the text 0, 0x, 0X, respectively, is prepended to non-zero numbers.

Width field

The Width field specifies a minimum number of characters to output. It is typically used to pad fixed-width fields in tabulated output where the fields would otherwise be smaller, although it does not cause truncation of oversized fields.

The width field may be omitted, or a numeric integer value, or a dynamic value when passed as another argument when indicated by an asterisk *.^[3] For example, printf("%*d", 5, 10) will result in 10 being printed, with a total width of 5 characters.

Though not part of the width field, a leading zero is interpreted as the zero-padding flag mentioned above, and a negative value is treated as the positive value in conjunction with the left-alignment - flag also mentioned above.

Precision field

The Precision field usually specifies a maximum limit on the output, depending on the particular formatting type. For floating-point numeric types, it specifies the number of digits to the right of the decimal point to which the output should be rounded. For the string type, it limits the number of characters that should be output, after which the string is truncated.

The precision field may be omitted, or a numeric integer value, or a dynamic value when passed as another argument when indicated by an asterisk *. For example, printf("%.*s", 3, "abcdef") will result in abc being printed.

Length field

The Length field can be omitted or be any of:

Character	Description
hh	For integer types, causes printf to expect an int-sized integer argument which was promoted from a char.
h	For integer types, causes printf to expect an int-sized integer argument which was promoted from a short.
l	For integer types, causes printf to expect a long-sized integer argument. For floating point types, this is ignored. float arguments are always promoted to double when used in a varargs call.^[4]
ll	For integer types, causes printf to expect a long long-sized integer argument.
L	For floating point types, causes printf to expect a long double argument.
z	For integer types, causes printf to expect a size_t-sized integer argument.
j	For integer types, causes printf to expect an intmax_t-sized integer argument.
t	For integer types, causes printf to expect a ptrdiff_t-sized integer argument.

Additionally, several platform-specific length options came to exist prior to widespread use of the ISO C99 extensions:

Characters	Description
I	For signed integer types, causes printf to expect a ptrdiff_t-sized integer argument; for unsigned integer types, causes printf to expect a size_t-sized integer argument. Commonly found in Win32/Win64 platforms.
I32	For integer types, causes printf to expect a 32-bit (double word) integer argument. Commonly found in Win32/Win64 platforms.
I64	For integer types, causes printf to expect a 64-bit (quad word) integer argument. Commonly found in Win32/Win64 platforms.
q	For integer types, causes printf to expect a 64-bit (quad word) integer argument. Commonly found in BSD platforms.

ISO C99 includes the inttypes.h header file that includes a number of macros for use in platform-independent printf coding. These must be outside double-quotes, e.g. printf("%" PRId64 "\n", t);

Example macros include:

Macro	Description
PRId32	Typically equivalent to I32d (Win32/Win64) or d
PRId64	Typically equivalent to I64d (Win32/Win64), lld (32-bit platforms) or ld (64-bit platforms)
PRIi32	Typically equivalent to I32i (Win32/Win64) or i
PRIi64	Typically equivalent to I64i (Win32/Win64), lli (32-bit platforms) or li (64-bit platforms)
PRIu32	Typically equivalent to I32u (Win32/Win64) or u
PRIu64	Typically equivalent to I64u (Win32/Win64), llu (32-bit platforms) or lu (64-bit platforms)
PRIx32	Typically equivalent to I32x (Win32/Win64) or x
PRIx64	Typically equivalent to I64x (Win32/Win64), llx (32-bit platforms) or lx (64-bit platforms)

Type field

The Type field can be any of:

Character	Description
%	Prints a literal % character (this type doesn't accept any flags, width, precision, length fields).
d, i	int as a signed integer. %d and %i are synonymous for output, but are different when used with `scanf` for input (where using %i will interpret a number as hexadecimal if it's preceded by 0x, and octal if it's preceded by 0.)
u	Print decimal unsigned int.
f, F	double in normal (fixed-point) notation. f and F differ only in how the strings for an infinite number or NaN are printed (inf, infinity and nan for f; INF, INFINITY and NAN for F).
e, E	double value in standard form ( d.ddde±dd ). An E conversion uses the letter E (rather than e) to introduce the exponent. The exponent always contains at least two digits; if the value is zero, the exponent is 00. In Windows, the exponent contains three digits by default, e.g. 1.5e002, but this can be altered by the Microsoft-specific `_set_output_format` function.
g, G	double in either normal or exponential notation, whichever is more appropriate for its magnitude. g uses lower-case letters, G uses upper-case letters. This type differs slightly from fixed-point notation in that insignificant zeroes to the right of the decimal point are not included. Also, the decimal point is not included on whole numbers.
x, X	unsigned int as a hexadecimal number. x uses lower-case letters and X uses upper-case.
o	unsigned int in octal.
s	null-terminated string.
c	char (character).
p	void* (pointer to void) in an implementation-defined format.
a, A	double in hexadecimal notation, starting with 0x or 0X. a uses lower-case letters, A uses upper-case letters.^[5] ^[6] (C++11 iostreams have a hexfloat that works the same.)
n	Print nothing, but writes the number of characters written so far into an integer pointer parameter. In Java this prints a newline.^[7]

Custom format placeholders

There are a few implementations of printf-like functions that allow extensions to the escape-character-based mini-language, letting the programmer supply a specific formatting function for non-builtin types. One of the most well-known is glibc's (now deprecated) register_printf_function(). However, it is rarely used because it conflicts with static format string checking. Another is Vstr custom formatters, which allow adding multi-character format names.

Some applications (like the Apache HTTP Server) include their own printf-like function and embed extensions into it. However, these all tend to have the same problems that register_printf_function() has.

The Linux kernel printk function supports a number of ways to display kernel structures using the generic %p specification, by appending additional format characters.^[8] For example, %pI4 prints an IPv4 address in dotted-decimal form. This allows static format string checking (of the %p portion) at the expense of full compatibility with normal printf.

Most languages that have a printf-like function work around the lack of this feature by just using the %s format and converting the object to a string representation.

Vulnerabilities

Invalid conversion specifications

If too few function arguments are provided to supply values for all the conversion specifications in the template string, or if the arguments are not of the correct types, the results are undefined and the program may crash. Implementations are inconsistent about whether syntax errors in the string consume an argument and what type of argument they consume. Excess arguments are ignored. In a number of cases, the undefined behavior has led to "Format string attack" security vulnerabilities. In most C or C++ calling conventions arguments may be passed on the stack. With too few arguments, printf will then read past the end of the current stack frame, allowing an attacker to read the stack.

Some compilers, like the GNU Compiler Collection, will statically check the format strings of printf-like functions and warn about problems (when using the flags -Wall or -Wformat). GCC will also warn about user-defined printf-style functions if the non-standard "format" __attribute__ is applied to the function.

Field width versus explicit delimiters in tabular output

Using only field widths to provide for tabulation, as with a format like %8d%8d%8d for three integers in three 8-character columns, will not guarantee that field separation is retained if large numbers occur in the data. Loss of field separation can easily lead to corrupt output. Some systems encourage the use of programs as building blocks in scripts. There, such corrupt data can often be forwarded into and corrupt further processing, whether or not the original programmer expected the output to be read only by human eyes. Such problems can be eliminated by including explicit delimiters, even spaces, in all tabular output formats. Simply changing the dangerous example from before to %7d %7d %7d addresses this. It keeps the same 8-character column spacing until numbers get larger, and then the explicitly included spaces prevent the numbers from being merged on output. Similar strategies apply to string data.

Memory write

Although on the surface it is an output function, printf allows writing to a memory location specified by an argument via %n. This functionality is occasionally used as a part of more elaborate format string attacks.^[9]

The %n functionality also makes printf accidentally Turing complete even with a well-formed set of arguments. A game of tic-tac-toe written in the format string is a winner of the 27th IOCCC.^[10]

Programming languages with printf

This list excludes three groups of languages. First, those whose format strings deviate from the style in this article (such as AMPL and Elixir). Second, those that inherit their implementation from the JVM or another environment (such as Clojure and Scala). Third, those that lack a standard native printf implementation but have external libraries that emulate printf behavior (such as JavaScript).

awk (via sprintf)
C
- C++ (also provides overloaded shift operators and manipulators as an alternative for formatted output – see iostream and iomanip)
- Objective-C
D
F#
G (LabVIEW)
GNU MathProg
GNU Octave
Go
Haskell
J
Java (since version 1.5) and JVM languages
Julia (via its Printf standard library;^[11] Formatting.jl library adds Python-style general formatting and "c-style part of this package aims to get around the limitation that @sprintf has to have a literal string argument.")
Lua (string.format)
Maple
MATLAB
Max (via the sprintf object)
Mythryl
PARI/GP
Perl
PHP
Python (via % operator)^[12]
R
Raku (via printf, sprintf, and fmt)
Red/System
Ruby
Tcl (via format command)
Transact-SQL (via xp_sprintf)
Vala (via print() and FileStream.printf())
The printf utility command, sometimes built in to the shell, such as with some implementations of the KornShell (ksh), Bourne again shell (bash), or Z shell (zsh). These commands usually interpret C escapes in the format string.

See also

Format (Common Lisp)
C standard library
Format string attack
iostream
ML (programming language)
printf debugging
printf (Unix)
printk (print kernel messages)
scanf
string interpolation

References

^ "BCPL". cl.cam.ac.uk. Retrieved 19 March 2018.
^ McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (PDF) (Technical report). CSTR. Bell Labs. 139.
^ "printf - C++ Reference". cplusplus.com. Retrieved 10 June 2020.
^ ISO/IEC (1999). ISO/IEC 9899:1999(E): Programming Languages - C §7.19.6.1 para 7
^ "The GNU C Library Reference Manual", "12.12.3 Table of Output Conversions". Gnu.org. Retrieved 17 March 2014.
^ "printf" (%a added in C99)
^ "Formatting Numeric Print Output". The Java Tutorials. Oracle Inc. Retrieved 19 March 2018.
^ "Linux kernel Documentation/printk-formats.txt". Git.kernel.org. Retrieved 17 March 2014.
^ https://www.exploit-db.com/docs/english/28476-linux-format-string-exploitation.pdf
^ https://www.ioccc.org/2020/carlini/index.html
^ "Printf Standard Library". The Julia Language Manual. Retrieved 22 February 2021.
^ "Built-in Types: printf-style String Formatting". The Python Standard Library. Python Software Foundation. Retrieved 24 February 2021.

External links

C++ reference for std::fprintf
gcc printf format specifications quick reference
printf: print formatted output – System Interfaces Reference, The Single UNIX Specification, Issue 7 from The Open Group
The Formatter specification in Java 1.5
GNU Bash printf(1) builtin

Source: https://en.wikipedia.org/wiki/Printf_format_string