String

Source: Real Python
String = object that contains a sequence of character data.
Python provides a rich set of operators, functions, and methods for working with strings.
String Operators
The + Operator
The + operator concatenates strings. It returns a string consisting of the operands joined together.
Example
>>> str_1 = "Hello"
>>> str_2 = "World"
>>> str_1 + str_2
'HelloWorld'
The * Operator
The * operator creates multiple copies of a string. If s is a string and n is an integer, either of the following expressions returns a string consisting of n concatenated copies of s:
s * n
n * s
If n <= 0, then the result is an empty string.
Example
>>> s = 'foo.'
>>> s * 4 # n > 0
'foo.foo.foo.foo.'
>>> 4 * s
'foo.foo.foo.foo.'
n <= 0:
>>> 'foo' * -8 # n <= 0
''
The in Operator
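Because a string is a sequence of characters, the membership operator in also works with strings. It returns True if the first operand occurs as a substring of the second, and False otherwise. The not in operator does the opposite.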
Example
>>> s = 'foo'
>>> s in 'That\'s food for thought.'
True
>>> s in 'That\'s good for now.'
False
>>> 'z' not in 'abc'
True
>>> 'z' not in 'xyz'
False
Built-in String Functions
| Function | Description |
|---|---|
| chr() | Converts an integer to a character |
| ord() | Converts a character to an integer |
| len() | Returns the length of a string |
| str() | Returns a string representation of an object |
ord(c)
Returns the integer Unicode code point of the given character. For ASCII characters, this is the same as the ASCII value.
Example
>>> ord('a')
97
>>> ord('#')
35
>>> ord('€')
8364
>>> ord('∑')
8721
chr(n)
chr() does the reverse of ord(). Given a numeric value n, chr(n) returns a string representing the character that corresponds to n.
Example
>>> chr(97)
'a'
>>> chr(35)
'#'
>>> chr(8364)
'€'
>>> chr(8721)
'∑'
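Because chr() and ord() are inverses, they can be combined to do simple arithmetic on characters. The following is a small sketch; the variable names are illustrative only.

```python
# Round trip: character -> code point -> character
code_point = ord('€')             # 8364
char = chr(code_point)            # '€'

# Shift a letter by one code point
next_letter = chr(ord('a') + 1)   # 'b'
```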
len(s)
Returns the length (number of characters) of a string.
Example
>>> s = 'I am a string.'
>>> len(s)
14
str(obj)
Returns a string representation of an object.
Example
>>> str(49.2)
'49.2'
>>> str(3+4j)
'(3+4j)'
>>> str(3 + 29)
'32'
>>> str('foo')
'foo'
List Operations for String
In Python, strings are ordered sequences of character data. Therefore, the sequence operations familiar from lists (indexing, slicing, negative indexing) also work with strings.
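As a brief illustration of these operations on a string (the variable names are chosen for this sketch only):

```python
s = 'foobar'

first = s[0]           # 'f'      indexing starts at 0
last = s[-1]           # 'r'      negative indices count from the end
middle = s[2:5]        # 'oba'    from index 2 up to, but not including, 5
every_other = s[::2]   # 'foa'    every second character
reversed_s = s[::-1]   # 'raboof' a step of -1 walks the string backwards
```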
Interpolating Variables into a String
Python 3.6 introduced a powerful new string formatting mechanism.
This feature is formally named the formatted string literal, but is more usually referred to by its nickname, f-string.
For more about f-strings, see: f-string
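A minimal sketch of what an f-string looks like, assuming Python 3.6 or later. The variable names and values are made up for illustration.

```python
name = 'World'
n = 3
price = 49.2

greeting = f'Hello, {name}!'      # 'Hello, World!'
product = f'{n} * 4 = {n * 4}'    # '3 * 4 = 12' (any expression may appear in braces)
formatted = f'{price:.2f}'        # '49.20'      (a format spec follows the colon)
```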
Modify Strings
In a nutshell, you can’t. Strings are one of the data types Python considers immutable, meaning not able to be changed.
You can usually easily accomplish what you want by generating a copy of the original string that has the desired change in place. Two possibilities:
Assign a new string to the variable
>>> s = 'foobar'
>>> s = s[:3] + 'x' + s[4:]
>>> s
'fooxar'
Use built-in string method
>>> s = 'foobar'
>>> s = s.replace('b', 'x')
>>> s
'fooxar'
Built-in String Methods
Python provides a lot of useful built-in methods for string objects.
- Function: a callable procedure that you can invoke to perform specific tasks.
- Method: a specialized type of callable procedure that is tightly associated with an object. Like a function, a method is called to perform a distinct task, but it is invoked on a specific object and has knowledge of its target object during execution.
In the method signatures below, arguments shown in square brackets ([]) are optional.
Case Conversion
Methods in this group perform case conversion on the target string.
s.capitalize()
Returns a copy of s with the first character converted to uppercase and all other characters converted to lowercase. Non-alphabetic characters are unchanged.
Example
>>> s = 'foo123#BAR#.'
>>> s.capitalize()
'Foo123#bar#.'
s.lower()
Returns a copy of s with all alphabetic characters converted to lowercase.
s.swapcase()
Returns a copy of s with uppercase alphabetic characters converted to lowercase and vice versa.
Example
>>> 'FOO Bar 123 baz qUX'.swapcase()
'foo bAR 123 BAZ Qux'
s.title()
Returns a copy of s in which the first letter of each word is converted to uppercase and remaining letters are lowercase.
Example
>>> "what's happened to ted's IBM stock?".title()
"What'S Happened To Ted'S Ibm Stock?"
s.upper()
Returns a copy of s with all alphabetic characters converted to uppercase.
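The entries for .lower() and .upper() have no example, so here is a short sketch. The case-insensitive comparison at the end is one common use and is an illustration, not part of the original notes.

```python
s = 'FOO Bar 123 baz qUX'

lowered = s.lower()   # 'foo bar 123 baz qux'
uppered = s.upper()   # 'FOO BAR 123 BAZ QUX'

# Normalizing case on both sides gives a simple case-insensitive comparison
same = 'Python'.lower() == 'PYTHON'.lower()   # True
```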
Find and Replace
- These methods provide various means of searching the target string for a specified substring.
- Each method in this group supports optional <start> and <end> arguments.
  - The action of the method is restricted to the portion of the target string starting at character position <start> and proceeding up to but NOT including character position <end>.
  - If <start> is specified but <end> is not, the method applies to the portion of the target string from <start> through the end of the string.
s.count(<sub>[, <start>[, <end>]])
- Counts occurrences of a substring in the target string.
- Returns the number of non-overlapping occurrences of substring <sub> in s.
Example
>>> 'foo goo moo'.count('oo')
3
>>> 'foo goo moo'.count('oo', 0, 8)
2
s.endswith(<suffix>[, <start>[, <end>]])
- Determines whether the target string ends with a given substring.
- Returns True if s ends with the specified <suffix> and False otherwise.
Example
>>> 'foobar'.endswith('bar')
True
>>> 'foobar'.endswith('baz')
False
>>> 'foobar'.endswith('oob', 0, 4)
True
>>> 'foobar'.endswith('oob', 2, 4)
False
s.find(<sub>[, <start>[, <end>]])
- Searches the target string for a given substring. You can use .find() to see if a Python string contains a particular substring.
- Returns the lowest index in s where substring <sub> is found.
- Returns -1 if the specified substring is not found.
Example
>>> 'foo bar foo baz foo qux'.find('foo')
0
>>> 'foo bar foo baz foo qux'.find('grault')
-1
>>> 'foo bar foo baz foo qux'.find('foo', 4)
8
>>> 'foo bar foo baz foo qux'.find('foo', 4, 7)
-1
s.index(<sub>[, <start>[, <end>]])
Identical to .find(), except that it raises a ValueError exception if <sub> is not found rather than returning -1.
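The difference from .find() only shows up when the substring is missing. A minimal sketch, reusing the example string from .find(); the variable names are illustrative.

```python
s = 'foo bar foo baz foo qux'

# When the substring exists, .index() agrees with .find()
position = s.index('foo', 4)      # 8

# When it does not, .index() raises ValueError instead of returning -1
try:
    s.index('grault')
except ValueError as exc:
    message = str(exc)            # the interpreter's error text
else:
    message = None
```

Use .find() when a missing substring is an expected case, and .index() when it should be treated as an error.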
s.rfind(<sub>[, <start>[, <end>]])
Searches the target string for a given substring starting at the end.
Returns the highest index in s where substring <sub> is found.
Returns -1 if the substring is not found.
Example
>>> 'foo bar foo baz foo qux'.rfind('foo')
16
>>> 'foo bar foo baz foo qux'.rfind('grault')
-1
>>> 'foo bar foo baz foo qux'.rfind('foo', 0, 14)
8
>>> 'foo bar foo baz foo qux'.rfind('foo', 10, 14)
-1
s.rindex(<sub>[, <start>[, <end>]])
Identical to .rfind(), except that it raises a ValueError exception if <sub> is not found rather than returning -1.
s.startswith(<prefix>[, <start>[, <end>]])
- Determines whether the target string starts with a given substring.
- Returns True if s starts with the specified <prefix> and False otherwise.
Example
>>> 'foobar'.startswith('foo')
True
>>> 'foobar'.startswith('bar')
False
>>> 'foobar'.startswith('bar', 3)
True
>>> 'foobar'.startswith('bar', 3, 2)
False
Character Classification
Classify a string based on the characters it contains.
s.isalnum()
- Determines whether the target string consists of alphanumeric characters.
- Returns True if s is nonempty and all its characters are alphanumeric (either a letter or a number), and False otherwise.
Example
>>> 'abc123'.isalnum()
True
>>> 'abc$123'.isalnum()
False
>>> ''.isalnum()
False
s.isalpha()
- Determines whether the target string consists of alphabetic characters.
- Returns True if s is nonempty and all its characters are alphabetic, and False otherwise.
Example
>>> 'ABCabc'.isalpha()
True
>>> 'abc123'.isalpha()
False
s.isdigit()
Determines whether the target string consists of digit characters. You can use the .isdigit() Python method to check if your string is made of only digits.
Returns True if s is nonempty and all its characters are numeric digits, and False otherwise.
Example
>>> '123'.isdigit()
True
>>> '123abc'.isdigit()
False
s.isidentifier()
- Determines whether the target string is a valid Python identifier.
- Returns True if s is a valid Python identifier according to the language definition, and False otherwise.
Example
>>> 'foo32'.isidentifier()
True
>>> '32foo'.isidentifier()
False
>>> 'foo$32'.isidentifier()
False
.isidentifier() will return True for a string that matches a Python keyword even though that would not actually be a valid identifier, e.g.,
>>> 'and'.isidentifier() # and is a keyword in python
True
To test whether a string matches a Python keyword, use keyword.iskeyword():
>>> from keyword import iskeyword
>>> iskeyword('and')
True
If you really want to ensure that a string would serve as a valid Python identifier, you should check that .isidentifier() is True and that iskeyword() is False.
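The combined check described above can be sketched as a small helper. The function name is_valid_identifier is hypothetical and not part of the standard library.

```python
from keyword import iskeyword


def is_valid_identifier(name):
    """Return True if name is a syntactically valid, non-keyword identifier."""
    # .isidentifier() accepts keywords such as 'and', so rule those out too.
    return name.isidentifier() and not iskeyword(name)


ok = is_valid_identifier('foo32')         # True
keyword_name = is_valid_identifier('and') # False: 'and' is a keyword
bad_start = is_valid_identifier('32foo')  # False: starts with a digit
```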
s.islower()
- Determines whether the target string’s alphabetic characters are lowercase.
- Returns True if s is nonempty and all the alphabetic characters it contains are lowercase, and False otherwise.
- Non-alphabetic characters are ignored.
Example
>>> 'abc'.islower()
True
>>> 'abc1$d'.islower()
True
>>> 'Abc1$D'.islower()
False
s.isprintable()
- Determines whether the target string consists entirely of printable characters.
- Returns True if s is empty or all the characters it contains are printable.
- Returns False if s contains at least one non-printable character, such as a tab ('\t') or a newline ('\n').
Example
>>> 'a\tb'.isprintable()
False
>>> 'a b'.isprintable()
True
>>> ''.isprintable()
True
>>> 'a\nb'.isprintable()
False
Note: .isprintable() is the only .isxxxx() method covered here that returns True if s is an empty string. All the others return False for an empty string.
s.isspace()
- Determines whether the target string consists of whitespace characters.
- Returns True if s is nonempty and all characters are whitespace characters, and False otherwise.
- The most commonly encountered whitespace characters are:
  - space ' '
  - tab '\t'
  - newline '\n'
Example
>>> ' \t \n '.isspace()
True
>>> ' a '.isspace()
False
s.istitle()
- Determines whether the target string is title cased.
- Returns True if s is nonempty, the first alphabetic character of each word is uppercase, and all other alphabetic characters in each word are lowercase, and False otherwise. (More intuitive: “Uppercase characters may only follow uncased characters and lowercase characters only cased ones.”)
Example
>>> 'This Is A Title'.istitle()
True
>>> 'This is a title'.istitle()
False
>>> 'Give Me The #$#@ Ball!'.istitle()
True
s.isupper()
- Determines whether the target string’s alphabetic characters are uppercase.
- Returns True if s is nonempty and all the alphabetic characters it contains are uppercase, and False otherwise.
- Non-alphabetic characters are ignored.
Example
>>> 'ABC'.isupper()
True
>>> 'ABC1$D'.isupper()
True
>>> 'Abc1$D'.isupper()
False
String Formatting
Modify or enhance the format of a string
s.center(<width>[, <fill>])
Centers a string in a field.
Returns a string consisting of s centered in a field of width <width>. By default, padding consists of the ASCII space character:
>>> 'foo'.center(10)
'   foo    '
If the optional <fill> argument is specified, it is used as the padding character:
>>> 'bar'.center(10, '-')
'---bar----'
If s is already at least as long as <width>, it is returned unchanged:
>>> 'foo'.center(2)
'foo'
s.expandtabs(tabsize=8)
- Replaces each tab character ('\t') with spaces.
- tabsize is an optional keyword parameter specifying alternate tab stop columns. By default, spaces are filled in assuming a tab stop at every eighth column.
Example
>>> 'a\tb\tc'.expandtabs()
'a       b       c'
>>> 'aaa\tbbb\tc'.expandtabs()
'aaa     bbb     c'
>>> 'a\tb\tc'.expandtabs(4)
'a   b   c'
>>> 'aaa\tbbb\tc'.expandtabs(tabsize=4)
'aaa bbb c'
s.ljust(<width>[, <fill>])
Left-justifies a string in a field.
Returns a string consisting of s left-justified in a field of width <width>. By default, padding consists of the ASCII space character:
>>> 'foo'.ljust(10)
'foo       '
If the optional <fill> argument is specified, it is used as the padding character:
>>> 'foo'.ljust(10, '-')
'foo-------'
If s is already at least as long as <width>, it is returned unchanged:
>>> 'foo'.ljust(2)
'foo'
s.lstrip([<chars>])
Trims leading characters from a string.
Returns a copy of s with any whitespace characters removed from the left end:
>>> ' foo bar baz '.lstrip()
'foo bar baz '
>>> '\t\nfoo\t\nbar\t\nbaz'.lstrip()
'foo\t\nbar\t\nbaz'
If the optional <chars> argument is specified, it is a string that specifies the set of characters to be removed:
>>> 'http://www.realpython.com'.lstrip('/:pth')
'www.realpython.com'
s.replace(<old>, <new>[, <count>])
Replaces occurrences of a substring within a string.
Returns a copy of s with all occurrences of substring <old> replaced by <new>:
>>> 'foo bar foo baz foo qux'.replace('foo', 'grault')
'grault bar grault baz grault qux'
If the optional <count> argument is specified, a maximum of <count> replacements are performed, starting at the left end of s:
>>> 'foo bar foo baz foo qux'.replace('foo', 'grault', 2)
'grault bar grault baz foo qux'
s.rjust(<width>[, <fill>])
- Right-justifies a string in a field.
- Works similarly to s.ljust(), with the padding added on the left.
s.rstrip([<chars>])
- Trims trailing characters from a string.
- Works similarly to s.lstrip(), but removes characters from the right end.
s.strip([<chars>])
Strips characters from the left and right ends of a string.
Without arguments, equivalent to s.lstrip().rstrip().
As with .lstrip() and .rstrip(), the optional <chars> argument specifies the set of characters to be removed.
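A small sketch comparing the three stripping methods on one string. The sample values are made up for illustration.

```python
padded = '   foo bar baz\t\t\t'

left_only = padded.lstrip()     # 'foo bar baz\t\t\t'
right_only = padded.rstrip()    # '   foo bar baz'
both = padded.strip()           # 'foo bar baz'

# With <chars>, any run of the listed characters is removed from each end.
# The argument is a set of characters, not a prefix or suffix.
host = 'www.realpython.com'.strip('w.moc')   # 'realpython'
```

Because <chars> is treated as a set, this is not a reliable way to remove a fixed prefix or suffix such as 'www.' or '.com'.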
s.zfill(<width>)
Pads a string on the left with zeros.
Returns a copy of s left-padded with '0' characters to the specified <width>:
>>> '42'.zfill(5)
'00042'
If s contains a leading sign, it remains at the left edge of the result string after zeros are inserted:
>>> '+42'.zfill(8)
'+0000042'
>>> '-42'.zfill(8)
'-0000042'
If s is already at least as long as <width>, it is returned unchanged.
Converting Between Strings and Lists
Convert between a string and some composite data type by either pasting objects together to make a string, or by breaking a string up into pieces.
s.join(<iterable>)
Concatenates strings from an iterable. Returns the string that results from concatenating the items in <iterable>, separated by s.
Example
>>> ', '.join(['foo', 'bar', 'baz', 'qux'])
'foo, bar, baz, qux'
In the following example, <iterable> is specified as a single string value. When a string value is used as an iterable, it is interpreted as a list of the string’s individual characters.
>>> list('corge')
['c', 'o', 'r', 'g', 'e']
>>> ':'.join('corge')
'c:o:r:g:e'
s.partition(<sep>)
- Splits s at the first occurrence of string <sep>.
- The return value is a three-part tuple consisting of:
  - The portion of s preceding <sep>
  - <sep> itself
  - The portion of s following <sep>
Example
>>> 'foo.bar'.partition('.')
('foo', '.', 'bar')
>>> 'foo@@bar@@baz'.partition('@@')
('foo', '@@', 'bar@@baz')
If <sep> is not found in s, the returned tuple contains s followed by two empty strings:
>>> 'foo.bar'.partition('@@')
('foo.bar', '', '')
s.rpartition(<sep>)
Works exactly like s.partition(<sep>), except that s is split at the last occurrence of <sep> instead of the first occurrence.
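A short side-by-side sketch of the two methods on the same input (variable names are illustrative). Note that when <sep> is missing, .rpartition() places the whole string in the last slot of the tuple rather than the first.

```python
first_split = 'foo@@bar@@baz'.partition('@@')    # ('foo', '@@', 'bar@@baz')
last_split = 'foo@@bar@@baz'.rpartition('@@')    # ('foo@@bar', '@@', 'baz')

# Separator not found: the original string ends up last
missing = 'foo.bar'.rpartition('@@')             # ('', '', 'foo.bar')
```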
s.rsplit(sep=None, maxsplit=-1)
Splits a string into a list, starting from the right.
Without arguments, s.rsplit() splits s into substrings delimited by any sequence of whitespace and returns the substrings as a list:
>>> 'foo bar baz qux'.rsplit()
['foo', 'bar', 'baz', 'qux']
>>> 'foo\n\tbar baz\r\fqux'.rsplit()
['foo', 'bar', 'baz', 'qux']
If <sep> is specified, it is used as the delimiter for splitting:
>>> 'foo.bar.baz.qux'.rsplit(sep='.')
['foo', 'bar', 'baz', 'qux']
When <sep> is explicitly given as a delimiter, consecutive delimiters in s are assumed to delimit empty strings, which will be returned:
>>> 'foo...bar'.rsplit(sep='.')
['foo', '', '', 'bar']
If the optional keyword parameter <maxsplit> is specified, a maximum of that many splits are performed, starting from the right end of s:
>>> 'www.realpython.com'.rsplit(sep='.', maxsplit=1)
['www.realpython', 'com']
s.split(sep=None, maxsplit=-1)
Behaves exactly like s.rsplit(), except that if <maxsplit> is specified, splits are counted from the left end of s rather than the right end
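The difference only appears when maxsplit limits the number of splits. A minimal sketch using the same example string as s.rsplit() (variable names are illustrative):

```python
s = 'www.realpython.com'

from_left = s.split('.', maxsplit=1)     # ['www', 'realpython.com']
from_right = s.rsplit('.', maxsplit=1)   # ['www.realpython', 'com']

# Without maxsplit, both methods return the same list
all_parts = s.split('.')                 # ['www', 'realpython', 'com']
same = s.split('.') == s.rsplit('.')     # True
```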
s.splitlines([<keepends>])
Splits s at line boundaries into lines and returns them in a list.
Any of the following characters or character sequences is considered to constitute a line boundary:
| Escape Sequence | Character |
|---|---|
| \n | Newline |
| \r | Carriage Return |
| \r\n | Carriage Return + Line Feed |
| \v or \x0b | Line Tabulation |
| \f or \x0c | Form Feed |
| \x1c | File Separator |
| \x1d | Group Separator |
| \x1e | Record Separator |
| \x85 | Next Line (C1 Control Code) |
| \u2028 | Unicode Line Separator |
| \u2029 | Unicode Paragraph Separator |
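A brief sketch of .splitlines() with several of these boundaries mixed together. The sample text is made up; the keepends behavior shown (line boundaries are retained when the argument is truthy) follows the standard library documentation.

```python
text = 'foo\nbar\r\nbaz\fqux\u2028quux'

lines = text.splitlines()                         # ['foo', 'bar', 'baz', 'qux', 'quux']

# A truthy keepends keeps the line boundary at the end of each line
kept = 'foo\nbar\nbaz'.splitlines(keepends=True)  # ['foo\n', 'bar\n', 'baz']

# Unlike split('\n'), a trailing newline does not produce an empty last item
with_splitlines = 'foo\n'.splitlines()            # ['foo']
with_split = 'foo\n'.split('\n')                  # ['foo', '']
```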
Advanced
Substring count with overlapping occurrences
Let’s say we want to count the occurrence of substring 11 in the string 1011101111.
Note that the count() method returns the number of non-overlapping occurrences of a substring, so it undercounts when occurrences overlap: "1011101111".count("11") returns 3, although the substring occurs 5 times when overlaps are included. There are several ways to count overlapping occurrences.
string = "1011101111"
sub_string = "11"
Use built-in re module
import re
Use [re.findall()](https://docs.python.org/3/library/re.html#re.findall):
>>> len(re.findall(f"(?={sub_string})", string))
5
Use re.subn():
>>> re.subn(f"(?={sub_string})", "", string)[1]
5
Use built-in string methods
Use startswith():
def count_substring(string, sub_string):
    count = 0
    for pos in range(len(string)):
        if string[pos:].startswith(sub_string):
            count += 1
    return count
>>> count_substring(string, sub_string)
5
Or, in a more Pythonic way, sum over a list comprehension (each True counts as 1):
>>> sum([string.startswith(sub_string, i) for i in range(len(string))])
5
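Another option, not in the original notes, is to loop with str.find() and resume the search one character after each match. This sketch avoids testing every position and avoids slicing. The function name count_overlapping is hypothetical.

```python
def count_overlapping(string, sub_string):
    """Count occurrences of sub_string in string, including overlapping ones."""
    if not sub_string:
        raise ValueError('sub_string must not be empty')
    count = 0
    pos = string.find(sub_string)
    while pos != -1:
        count += 1
        # Resume one character after the start of the last match,
        # so that overlapping occurrences are still found.
        pos = string.find(sub_string, pos + 1)
    return count


result = count_overlapping('1011101111', '11')   # 5
```

If the substring may contain regular expression metacharacters, the re-based approaches above would need re.escape(sub_string); this version has no such concern.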