String
A string is an object that contains a sequence of character data.
Python provides a rich set of operators, functions, and methods for working with strings.
String Operators
The + Operator
The + operator concatenates strings. It returns a string consisting of the operands joined together.
Example
>>> str_1 = "Hello"
>>> str_2 = "World"
>>> str_1 + str_2
'HelloWorld'
The * Operator
The * operator creates multiple copies of a string. If s is a string and n is an integer, either of the following expressions returns a string consisting of n concatenated copies of s:
s * n
n * s
If n <= 0, then the result is an empty string.
Example
>>> s = 'foo.'
>>> s * 4 # n > 0
'foo.foo.foo.foo.'
>>> 4 * s
'foo.foo.foo.foo.'
n <= 0:
>>> 'foo' * -8 # n <= 0
''
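The in Operator
Because a string is a sequence of characters, the membership operator in can also be used with strings. It returns True if the first operand is contained in the second, and False otherwise. The not in operator does the opposite.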
Example
>>> s = 'foo'
>>> s in 'That\'s food for thought.'
True
>>> s in 'That\'s good for now.'
False
>>> 'z' not in 'abc'
True
>>> 'z' not in 'xyz'
False
Built-in String Functions
Function | Description |
---|---|
chr() | Converts an integer to a character |
ord() | Converts a character to an integer |
len() | Returns the length of a string |
str() | Returns a string representation of an object |
ord(c)
Returns the integer Unicode code point of the given character. For ASCII characters, this is the same as the ASCII value.
Example
>>> ord('a')
97
>>> ord('#')
35
>>> ord('€')
8364
>>> ord('∑')
8721
chr(n)
chr() does the reverse of ord(). Given a numeric value n, chr(n) returns a string representing the character that corresponds to n.
Example
>>> chr(97)
'a'
>>> chr(35)
'#'
>>> chr(8364)
'€'
>>> chr(8721)
'∑'
len(s)
Returns the length (number of characters) of a string.
Example
>>> s = 'I am a string.'
>>> len(s)
14
str(obj)
Returns a string representation of an object.
Example
>>> str(49.2)
'49.2'
>>> str(3+4j)
'(3+4j)'
>>> str(3 + 29)
'32'
>>> str('foo')
'foo'
List Operations for String
In Python, strings are ordered sequences of character data. Therefore, the sequence operations familiar from lists (indexing, slicing, negative indexing) also work with strings.
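As a brief illustration of the sequence operations mentioned above, the sketch below indexes and slices a sample string. The variable names and the sample value are chosen only for this example.

```python
# Sample string used only for illustration
s = 'foobar'

first = s[0]          # indexing: first character
last = s[-1]          # negative indexing: last character
middle = s[1:4]       # slicing: characters at positions 1, 2 and 3
reversed_s = s[::-1]  # slicing with a step of -1 reverses the string

print(first, last, middle, reversed_s)  # f r oob raboof
```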
Interpolating Variables into a String
Python 3.6 introduced a powerful new formatting mechanism.
This feature is formally named the Formatted String Literal, but it is more commonly referred to by its nickname, f-string.
For more about f-strings, see: f-string
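A minimal sketch of an f-string, assuming Python 3.6 or later. The variable names and values are hypothetical and serve only to show how expressions inside curly braces are evaluated and formatted.

```python
# Hypothetical values used only for illustration
name = 'World'
count = 3
price = 49.2

# Expressions inside {} are evaluated and interpolated into the string
greeting = f'Hello, {name}!'

# A format specifier after the colon controls the presentation (two decimals here)
summary = f'{count} items cost {count * price:.2f}'

print(greeting)  # Hello, World!
print(summary)   # 3 items cost 147.60
```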
Modify Strings
In a nutshell, you can’t. Strings are one of the data types Python considers immutable, meaning not able to be changed.
You can usually easily accomplish what you want by generating a copy of the original string that has the desired change in place. Two possibilities:
Assign a new string to the variable
>>> s = 'foobar'
>>> s = s[:3] + 'x' + s[4:]
>>> s
'fooxar'
Use a built-in string method
>>> s = 'foobar'
>>> s = s.replace('b', 'x')
>>> s
'fooxar'
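To make the immutability concrete, the sketch below attempts an in-place item assignment, which raises a TypeError, and then builds a modified copy instead. The variable names are chosen only for this example.

```python
s = 'foobar'

try:
    s[3] = 'x'  # item assignment is not supported by str
except TypeError as exc:
    error_message = str(exc)

# Building a new string leaves the original object untouched
t = s[:3] + 'x' + s[4:]

print(error_message)
print(s, t)  # foobar fooxar
```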
Built-in String Methods
Python provides a lot of useful built-in methods for string objects.
- Function: a callable procedure that you can invoke to perform specific tasks.
- Method: a specialized type of callable procedure that is tightly associated with an object. Like a function, a method is called to perform a distinct task, but it is invoked on a specific object and has knowledge of its target object during execution.
In the method definitions below, arguments specified in square brackets ([]) are optional.
Case Conversion
Methods in this group perform case conversion on the target string.
s.capitalize()
Returns a copy of s with the first character converted to uppercase and all other characters converted to lowercase. Non-alphabetic characters are unchanged.
Example
>>> s = 'foo123#BAR#.'
>>> s.capitalize()
'Foo123#bar#.'
s.lower()
Returns a copy of s with all alphabetic characters converted to lowercase.
s.swapcase()
Returns a copy of s with uppercase alphabetic characters converted to lowercase and vice versa.
Example
>>> 'FOO Bar 123 baz qUX'.swapcase()
'foo bAR 123 BAZ Qux'
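s.title()
Returns a copy of s in which the first letter of each word is converted to uppercase and the remaining letters are lowercase. Words are delimited by non-letter characters, so the letter after an apostrophe is capitalized and acronyms are lowercased.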
Example
>>> "what's happened to ted's IBM stock?".title()
"What'S Happened To Ted'S Ibm Stock?"
s.upper()
Returns a copy of s with all alphabetic characters converted to uppercase.
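A short sketch of .lower() and .upper(), reusing the sample string from the .swapcase() example. Note that both methods return new strings and leave digits and spaces untouched.

```python
s = 'FOO Bar 123 baz qUX'  # sample string for illustration

lowered = s.lower()
uppered = s.upper()

print(lowered)  # foo bar 123 baz qux
print(uppered)  # FOO BAR 123 BAZ QUX
```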
Find and Replace
- These methods provide various means of searching the target string for a specified substring.
- Each method in this group supports optional <start> and <end> arguments.
  - The action of the method is restricted to the portion of the target string starting at character position <start> and proceeding up to but NOT including character position <end>.
  - If <start> is specified but <end> is not, the method applies to the portion of the target string from <start> through the end of the string.
s.count(<sub>[, <start>[, <end>]])
- Counts occurrences of a substring in the target string.
- Returns the number of non-overlapping occurrences of substring <sub> in s.
Example
>>> 'foo goo moo'.count('oo')
3
>>> 'foo goo moo'.count('oo', 0, 8)
2
s.endswith(<suffix>[, <start>[, <end>]])
- Determines whether the target string ends with a given substring.
- Returns True if s ends with the specified <suffix> and False otherwise.
Example
>>> 'foobar'.endswith('bar')
True
>>> 'foobar'.endswith('baz')
False
>>> 'foobar'.endswith('oob', 0, 4)
True
>>> 'foobar'.endswith('oob', 2, 4)
False
s.find(<sub>[, <start>[, <end>]])
- Searches the target string for a given substring. You can use .find() to see if a Python string contains a particular substring.
- Returns the lowest index in s where substring <sub> is found.
- Returns -1 if the specified substring is not found.
Example
>>> 'foo bar foo baz foo qux'.find('foo')
0
>>> 'foo bar foo baz foo qux'.find('grault')
-1
>>> 'foo bar foo baz foo qux'.find('foo', 4)
8
>>> 'foo bar foo baz foo qux'.find('foo', 4, 7)
-1
s.index(<sub>[, <start>[, <end>]])
Identical to .find(), except that it raises a ValueError exception if <sub> is not found rather than returning -1.
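The difference between .find() and .index() can be sketched as follows. The sample string is the one used in the .find() examples; the variable names are chosen only for this illustration.

```python
s = 'foo bar foo baz foo qux'

# When the substring exists, .index() returns the same position as .find()
found_at = s.index('foo', 4)

# When it does not, .index() raises ValueError instead of returning -1
try:
    s.index('grault')
except ValueError:
    missing = True
else:
    missing = False

print(found_at, missing)  # 8 True
```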
s.rfind(<sub>[, <start>[, <end>]])
- Searches the target string for a given substring starting at the end.
- Returns the highest index in s where substring <sub> is found.
- Returns -1 if the substring is not found.
Example
>>> 'foo bar foo baz foo qux'.rfind('foo')
16
>>> 'foo bar foo baz foo qux'.rfind('grault')
-1
>>> 'foo bar foo baz foo qux'.rfind('foo', 0, 14)
8
>>> 'foo bar foo baz foo qux'.rfind('foo', 10, 14)
-1
s.rindex(<sub>[, <start>[, <end>]])
Identical to .rfind(), except that it raises a ValueError exception if <sub> is not found rather than returning -1.
s.startswith(<prefix>[, <start>[, <end>]])
- Determines whether the target string starts with a given substring.
- Returns True if s starts with the specified <prefix> and False otherwise.
Example
>>> 'foobar'.startswith('foo')
True
>>> 'foobar'.startswith('bar')
False
>>> 'foobar'.startswith('bar', 3)
True
>>> 'foobar'.startswith('bar', 3, 2)
False
Character Classification
Classify a string based on the characters it contains.
s.isalnum()
- Determines whether the target string consists of alphanumeric characters.
- Returns True if s is nonempty and all its characters are alphanumeric (either a letter or a number), and False otherwise.
Example
>>> 'abc123'.isalnum()
True
>>> 'abc$123'.isalnum()
False
>>> ''.isalnum()
False
s.isalpha()
- Determines whether the target string consists of alphabetic characters.
- Returns True if s is nonempty and all its characters are alphabetic, and False otherwise.
Example
>>> 'ABCabc'.isalpha()
True
>>> 'abc123'.isalpha()
False
s.isdigit()
- Determines whether the target string consists of digit characters. You can use the .isdigit() method to check whether a string is made up of only digits.
- Returns True if s is nonempty and all its characters are numeric digits, and False otherwise.
Example
>>> '123'.isdigit()
True
>>> '123abc'.isdigit()
False
s.isidentifier()
- Determines whether the target string is a valid Python identifier.
- Returns True if s is a valid Python identifier according to the language definition, and False otherwise.
Example
>>> 'foo32'.isidentifier()
True
>>> '32foo'.isidentifier()
False
>>> 'foo$32'.isidentifier()
False
Note that .isidentifier() will return True for a string that matches a Python keyword, even though that would not actually be a valid identifier, e.g.,
>>> 'and'.isidentifier() # and is a keyword in python
True
To test whether a string matches a Python keyword, use keyword.iskeyword():
>>> from keyword import iskeyword
>>> iskeyword('and')
True
If you really want to ensure that a string would serve as a valid Python identifier, you should check that .isidentifier() is True and that iskeyword() is False.
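The combined check described above can be sketched as a small helper. The function name is_usable_identifier is hypothetical and introduced only for this illustration.

```python
from keyword import iskeyword


def is_usable_identifier(name):  # hypothetical helper name
    """Return True if name is a syntactically valid identifier and not a keyword."""
    return name.isidentifier() and not iskeyword(name)


print(is_usable_identifier('foo32'))  # True
print(is_usable_identifier('and'))    # False: 'and' is a keyword
print(is_usable_identifier('32foo'))  # False: starts with a digit
```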
s.islower()
- Determines whether the target string’s alphabetic characters are lowercase.
- Returns True if s is nonempty and all the alphabetic characters it contains are lowercase, and False otherwise.
- Non-alphabetic characters are ignored.
Example
>>> 'abc'.islower()
True
>>> 'abc1$d'.islower()
True
>>> 'Abc1$D'.islower()
False
s.isprintable()
- Returns True if s is empty or all the characters it contains are printable.
- Returns False if s contains at least one non-printable character, such as a tab ('\t') or a newline ('\n').
Example
>>> 'a\tb'.isprintable()
False
>>> 'a b'.isprintable()
True
>>> ''.isprintable()
True
>>> 'a\nb'.isprintable()
False
Note: .isprintable() is the only .isxxxx() method covered here that returns True if s is an empty string. All the others return False for an empty string.
s.isspace()
- Determines whether the target string consists of whitespace characters.
- Returns True if s is nonempty and all characters are whitespace characters, and False otherwise.
- The most commonly encountered whitespace characters are:
  - space ' '
  - tab '\t'
  - newline '\n'
Example
>>> ' \t \n '.isspace()
True
>>> ' a '.isspace()
False
s.istitle()
- Determines whether the target string is title cased.
- Returns True if s is nonempty, the first alphabetic character of each word is uppercase, and all other alphabetic characters in each word are lowercase, and False otherwise. (More intuitively: “Uppercase characters may only follow uncased characters and lowercase characters only cased ones.”)
Example
>>> 'This Is A Title'.istitle()
True
>>> 'This is a title'.istitle()
False
>>> 'Give Me The #$#@ Ball!'.istitle()
True
s.isupper()
- Determines whether the target string’s alphabetic characters are uppercase.
- Returns True if s is nonempty and all the alphabetic characters it contains are uppercase, and False otherwise.
- Non-alphabetic characters are ignored.
Example
>>> 'ABC'.isupper()
True
>>> 'ABC1$D'.isupper()
True
>>> 'Abc1$D'.isupper()
False
String Formatting
Modify or enhance the format of a string
s.center(<width>[, <fill>])
Centers a string in a field.
Returns a string consisting of s centered in a field of width <width>. By default, padding consists of the ASCII space character.
>>> 'foo'.center(10)
'   foo    '
If the optional <fill> argument is specified, it is used as the padding character.
>>> 'bar'.center(10, '-')
'---bar----'
If s is already at least as long as <width>, it is returned unchanged.
>>> 'foo'.center(2)
'foo'
s.expandtabs(tabsize=8)
- Replaces each tab character ('\t') with spaces.
- tabsize is an optional keyword parameter specifying alternate tab stop columns. By default, spaces are filled in assuming a tab stop at every eighth column.
Example
>>> 'a\tb\tc'.expandtabs()
'a       b       c'
>>> 'aaa\tbbb\tc'.expandtabs()
'aaa     bbb     c'
>>> 'a\tb\tc'.expandtabs(4)
'a   b   c'
>>> 'aaa\tbbb\tc'.expandtabs(tabsize=4)
'aaa bbb c'
s.ljust(<width>[, <fill>])
Left-justifies a string in a field.
Returns a string consisting of s left-justified in a field of width <width>. By default, padding consists of the ASCII space character.
>>> 'foo'.ljust(10)
'foo       '
If the optional <fill> argument is specified, it is used as the padding character.
>>> 'foo'.ljust(10, '-')
'foo-------'
If s is already at least as long as <width>, it is returned unchanged.
>>> 'foo'.ljust(2)
'foo'
s.lstrip([<chars>])
Trims leading characters from a string.
Returns a copy of s with any whitespace characters removed from the left end.
>>> ' foo bar baz '.lstrip()
'foo bar baz '
>>> '\t\nfoo\t\nbar\t\nbaz'.lstrip()
'foo\t\nbar\t\nbaz'
If the optional <chars> argument is specified, it is a string that specifies the set of characters to be removed.
>>> 'http://www.realpython.com'.lstrip('/:pth')
'www.realpython.com'
s.replace(<old>, <new>[, <count>])
Replaces occurrences of a substring within a string.
Returns a copy of s with all occurrences of substring <old> replaced by <new>.
>>> 'foo bar foo baz foo qux'.replace('foo', 'grault')
'grault bar grault baz grault qux'
If the optional <count> argument is specified, a maximum of <count> replacements are performed, starting at the left end of s.
>>> 'foo bar foo baz foo qux'.replace('foo', 'grault', 2)
'grault bar grault baz foo qux'
s.rjust(<width>[, <fill>])
- Right-justifies a string in a field.
- Works like s.ljust(), except that the padding is added on the left.
s.rstrip([<chars>])
- Trims trailing characters from a string.
- Works like s.lstrip(), except that characters are removed from the right end.
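Since .rjust() and .rstrip() are described only by analogy, here is a minimal sketch of both. The sample strings and variable names are chosen only for this illustration.

```python
# .rjust() pads on the left so the text ends at the right edge of the field
right = 'foo'.rjust(10)
dashes = 'foo'.rjust(10, '-')
unchanged = 'foo'.rjust(2)  # already wider than the field: returned as is

# .rstrip() removes whitespace (or the given characters) from the right end
trimmed = '   foo bar baz   '.rstrip()
trimmed_chars = 'foo.$$$;'.rstrip(';$.')

print(repr(right))          # '       foo'
print(repr(dashes))         # '-------foo'
print(repr(unchanged))      # 'foo'
print(repr(trimmed))        # '   foo bar baz'
print(repr(trimmed_chars))  # 'foo'
```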
s.strip([<chars>])
Strips characters from the left and right ends of a string.
Equivalent to s.lstrip().rstrip().
As with .lstrip() and .rstrip(), the optional <chars> argument specifies the set of characters to be removed.
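A brief sketch of .strip() with and without the <chars> argument. The sample strings are chosen only for this illustration. Keep in mind that <chars> is treated as a set of individual characters, not as a prefix or suffix.

```python
padded = '   foo bar baz\t\t\t'

# Without arguments, whitespace is removed from both ends
both = padded.strip()

# The same result is obtained by chaining .lstrip() and .rstrip()
chained = padded.lstrip().rstrip()

# With <chars>, any of the listed characters is removed from both ends
site = 'www.realpython.com'.strip('w.moc')

print(repr(both))  # 'foo bar baz'
print(repr(site))  # 'realpython'
```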
s.zfill(<width>)
Pads a string on the left with zeros.
Returns a copy of s left-padded with '0' characters to the specified <width>.
>>> '42'.zfill(5)
'00042'
If s contains a leading sign, it remains at the left edge of the result string after zeros are inserted.
>>> '+42'.zfill(8)
'+0000042'
>>> '-42'.zfill(8)
'-0000042'
If s is already at least as long as <width>, it is returned unchanged.
Converting Between Strings and Lists
Convert between a string and some composite data type by either pasting objects together to make a string, or by breaking a string up into pieces.
s.join(<iterable>)
Concatenates strings from an iterable. Returns the string that results from concatenating the strings in <iterable>, separated by s.
Example
>>> ', '.join(['foo', 'bar', 'baz', 'qux'])
'foo, bar, baz, qux'
In the following example, <iterable> is specified as a single string value. When a string value is used as an iterable, it is interpreted as a list of the string’s individual characters.
>>> list('corge')
['c', 'o', 'r', 'g', 'e']
>>> ':'.join('corge')
'c:o:r:g:e'
s.partition(<sep>)
- Splits s at the first occurrence of string <sep>.
- The return value is a three-part tuple consisting of:
  - The portion of s preceding <sep>
  - <sep> itself
  - The portion of s following <sep>
Example
>>> 'foo.bar'.partition('.')
('foo', '.', 'bar')
>>> 'foo@@bar@@baz'.partition('@@')
('foo', '@@', 'bar@@baz')
If <sep> is not found in s, the returned tuple contains s followed by two empty strings:
>>> 'foo.bar'.partition('@@')
('foo.bar', '', '')
s.rpartition(<sep>)
Works like s.partition(<sep>), except that s is split at the last occurrence of <sep> instead of the first occurrence.
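The contrast between .partition() and .rpartition() can be sketched as follows, reusing the sample strings from the .partition() examples. One detail worth noting: when the separator is missing, .rpartition() returns two empty strings followed by s, the mirror image of .partition().

```python
s = 'foo@@bar@@baz'

first_split = s.partition('@@')   # splits at the first '@@'
last_split = s.rpartition('@@')   # splits at the last '@@'

# Separator not found: the original string ends up in the last slot
not_found = 'foo.bar'.rpartition('@@')

print(first_split)  # ('foo', '@@', 'bar@@baz')
print(last_split)   # ('foo@@bar', '@@', 'baz')
print(not_found)    # ('', '', 'foo.bar')
```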
s.rsplit(sep=None, maxsplit=-1)
Splits a string into a list, starting from the right.
Without arguments, s.rsplit() splits s into substrings delimited by any sequence of whitespace and returns the substrings as a list.
>>> 'foo bar baz qux'.rsplit()
['foo', 'bar', 'baz', 'qux']
>>> 'foo\n\tbar baz\r\fqux'.rsplit()
['foo', 'bar', 'baz', 'qux']
If <sep> is specified, it is used as the delimiter for splitting.
>>> 'foo.bar.baz.qux'.rsplit(sep='.')
['foo', 'bar', 'baz', 'qux']
When <sep> is explicitly given as a delimiter, consecutive delimiters in s are assumed to delimit empty strings, which will be returned.
>>> 'foo...bar'.rsplit(sep='.')
['foo', '', '', 'bar']
If the optional keyword parameter <maxsplit> is specified, a maximum of that many splits are performed, starting from the right end of s.
>>> 'www.realpython.com'.rsplit(sep='.', maxsplit=1)
['www.realpython', 'com']
s.split(sep=None, maxsplit=-1)
Behaves exactly like s.rsplit(), except that if <maxsplit> is specified, splits are counted from the left end of s rather than the right end.
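A minimal sketch of how <maxsplit> distinguishes .split() from .rsplit(), using the sample string from the .rsplit() example. Without <maxsplit>, the two methods return the same list.

```python
s = 'www.realpython.com'

left = s.split('.', maxsplit=1)    # one split, counted from the left
right = s.rsplit('.', maxsplit=1)  # one split, counted from the right
all_parts = s.split('.')           # no limit: identical to s.rsplit('.')

print(left)       # ['www', 'realpython.com']
print(right)      # ['www.realpython', 'com']
print(all_parts)  # ['www', 'realpython', 'com']
```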
s.splitlines([<keepends>])
Splits s into lines at line boundaries and returns them in a list. Any of the following characters or character sequences is considered to constitute a line boundary:
Escape Sequence | Character |
---|---|
\n | Newline |
\r | Carriage Return |
\r\n | Carriage Return + Line Feed |
\v or \x0b | Line Tabulation |
\f or \x0c | Form Feed |
\x1c | File Separator |
\x1d | Group Separator |
\x1e | Record Separator |
\x85 | Next Line (C1 Control Code) |
\u2028 | Unicode Line Separator |
\u2029 | Unicode Paragraph Separator |
If the optional <keepends> argument is specified and is truthy, the line boundaries are retained in the result strings.
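The following sketch shows .splitlines() on a string that mixes several of the line boundaries listed above, first with the default behavior and then with <keepends> set to True so the boundaries are kept. The sample text is made up for this illustration.

```python
# Sample text mixing \n, \r\n, form feed and the Unicode line separator
text = 'foo\nbar\r\nbaz\fqux\u2028quux'

lines = text.splitlines()          # boundaries are dropped
with_ends = text.splitlines(True)  # boundaries are kept on each line

print(lines)      # ['foo', 'bar', 'baz', 'qux', 'quux']
print(with_ends)  # ['foo\n', 'bar\r\n', 'baz\x0c', 'qux\u2028', 'quux']
```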
Advanced
Substring count with overlapping occurrences
Let’s say we want to count the occurrences of the substring 11 in the string 1011101111.
Note that in Python, the count() method returns the number of non-overlapping occurrences of a substring, so it undercounts when occurrences overlap (here it returns 3 rather than 5). There are several ways to count overlapping occurrences instead.
string = "1011101111"
sub_string = "11"
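Use the built-in re module
import re
Use [re.findall()](https://docs.python.org/3/library/re.html#re.findall) with a lookahead assertion, which matches at every position where the substring starts without consuming any characters. If sub_string may contain regex metacharacters, pass it through re.escape() first: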
>>> len(re.findall(f"(?={sub_string})", string))
5
Use re.subn(), which returns a tuple of the new string and the number of substitutions made:
>>> re.subn(f"(?={sub_string})", "", string)[1]
5
Use built-in string methods
Use startswith():
def count_substring(string, sub_string):
    count = 0
    # Check every starting position, so overlapping matches are counted
    for pos in range(len(string)):
        if string[pos:].startswith(sub_string):
            count += 1
    return count
>>> count_substring(string, sub_string)
5
Or in a more pythonic way, use list comprehension:
sum([string.startswith(sub_string, i) for i in range(len(string))])
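Another possible approach, sketched here as an alternative rather than as the recommended method, is to call .find() repeatedly and resume the search one character after each match. The function name count_overlapping is hypothetical and introduced only for this illustration.

```python
def count_overlapping(string, sub_string):  # hypothetical helper name
    """Count occurrences of sub_string in string, allowing overlaps."""
    count = 0
    pos = string.find(sub_string)
    while pos != -1:
        count += 1
        # Resume one character past the start of the previous match,
        # so that overlapping occurrences are not skipped
        pos = string.find(sub_string, pos + 1)
    return count


print(count_overlapping("1011101111", "11"))  # 5
```

Unlike the startswith() loop, this version jumps directly from match to match instead of testing every position.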