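Python String Manipulation Cheat Sheet

A task-oriented summary of Python string manipulation techniques

1. Creating and concatenating strings
2. Splitting and replacing strings
3. Searching and testing strings
4. Formatting and converting strings
5. Encoding and decoding
- 5.1. str.encode(encoding='utf-8', errors='strict')
- 5.2. bytes.decode(encoding='utf-8', errors='strict')
6. Regular expressions (re module)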

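1. Creating and concatenating strings

How to write string literals and how to join several strings together.

1.1. Basic creation

Enclose the text in single quotes ' or double quotes ".

s1 = 'This is a string.'
s2 = "This is also a string."
print(s1)
print(s2)

1.2. Multi-line strings

Enclose the text in triple quotes ''' or """ to create a string that contains line breaks.

multi_line_str = """A string can span
multiple lines
like this."""
print(multi_line_str)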

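1.3. Concatenation with the `+` operator

Join strings with +.

str1 = "Hello"
str2 = "world"
result = str1 + ", " + str2 + "!"  # concatenate variables and literals
print(result)  # Output: Hello, world!

Note: when concatenating a large number of strings, the + operator can be inefficient because each step may create an intermediate string. Consider str.join() in that case.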
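The note above can be illustrated with a minimal sketch. The helper names `build_with_plus` and `build_with_join` are hypothetical and exist only for this comparison. Both return the same result, but the second makes a single `str.join()` call instead of growing the string step by step.

```python
# Hypothetical helpers for comparison; only str.join() itself comes from the text.
def build_with_plus(parts):
    """Concatenate with += (each step may create an intermediate string)."""
    result = ""
    for part in parts:
        result += part
    return result


def build_with_join(parts):
    """Concatenate all parts with a single str.join() call."""
    return "".join(parts)


parts = [str(n) for n in range(5)]
print(build_with_plus(parts))  # 01234
print(build_with_join(parts))  # 01234
```

For a handful of strings the difference is negligible; the join form mainly pays off inside loops over many items.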

1.4. Repetition with the `*` operator

Multiply a string by an integer to repeat it.

separator = "-" * 10  # repeat "-" 10 times
print(separator)  # Output: ----------
laugh = "ha" * 3
print(laugh)  # Output: hahaha

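1.5. Type conversion with `str()`

Convert other data types (such as numbers) to strings.

age = 30
message = "I am " + str(age) + " years old."  # convert the int to str before concatenating
print(message)  # Output: I am 30 years old.
pi = 3.14159
pi_str = "Pi is about " + str(pi) + "."
print(pi_str)  # Output: Pi is about 3.14159.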

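1.6. f-strings (formatted string literals)

Available since Python 3.6, f-strings are the recommended formatting method and usually the most concise and readable one. Prefix the string literal with f or F and write variables or expressions inside {}.

name = "Taro"
age = 25
height = 175.5
# Basic usage
info = f"Name: {name}, age: {age}"
print(info)  # Output: Name: Taro, age: 25
# Expressions can be embedded too
calculation = f"In 10 years the age will be {age + 10}."
print(calculation)  # Output: In 10 years the age will be 35.
# Format specifiers are supported (e.g. two decimal places)
height_info = f"Height: {height:.2f} cm"
print(height_info)  # Output: Height: 175.50 cm
# To output literal curly braces, double them
braces_info = f"Literal braces: {{}} , variable: {name}"
print(braces_info)  # Output: Literal braces: {} , variable: Taro

For details on format specifiers, see the Format Specification Mini-Language section of the official Python documentation.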
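As a rough illustration of the format specification mini-language, the sketch below collects a few commonly used specifiers. The variable names are made up for this example; the specifiers themselves are standard.

```python
value = 1234567.891
ratio = 0.256
label = "id"

formatted = [
    f"{value:,.2f}",   # thousands separator, two decimals -> 1,234,567.89
    f"{ratio:.1%}",    # percentage with one decimal       -> 25.6%
    f"[{label:<6}]",   # left-aligned in width 6           -> [id    ]
    f"[{label:>6}]",   # right-aligned in width 6          -> [    id]
    f"[{label:^6}]",   # centered in width 6               -> [  id  ]
    f"{42:05d}",       # zero-padded to 5 digits           -> 00042
    f"{255:#x}",       # hexadecimal with 0x prefix        -> 0xff
]
for line in formatted:
    print(line)
```

The same specifiers work after the colon in `str.format()` placeholders and in the built-in `format()` function.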

1.7. The `str.format()` method

The formatting method that was used before f-strings were introduced. Place {} placeholders in the string and pass the values as arguments to format().

item = "apple"
price = 150
quantity = 3
# Positional arguments
order1 = "Item: {}, price: {} yen, quantity: {}".format(item, price, quantity)
print(order1)  # Output: Item: apple, price: 150 yen, quantity: 3
# Explicit indexes
order2 = "Item: {0}, quantity: {2}, price: {1} yen".format(item, price, quantity)
print(order2)  # Output: Item: apple, quantity: 3, price: 150 yen
# Keyword arguments
order3 = "Item: {product}, price: {cost} yen".format(product=item, cost=price)
print(order3)  # Output: Item: apple, price: 150 yen
# Format specifier (e.g. zero-padded to 5 digits)
order_id = 123
formatted_id = "Order ID: {:05d}".format(order_id)
print(formatted_id)  # Output: Order ID: 00123

1.8. The `%` operator (old style)

An older style that resembles C's printf. f-strings and str.format() are now recommended, but this style still appears in existing code and in logging calls.

name = "Hanako"
score = 85.5
# String (%s), integer (%d), floating-point number (%f)
result = "Name: %s, score: %.1f points" % (name, score)  # %.1f means one decimal place
print(result)  # Output: Name: Hanako, score: 85.5 points
# Mapping with a dictionary
data = {"person": "Jiro", "age": 28}
info = "Name: %(person)s, age: %(age)d" % data
print(info)  # Output: Name: Jiro, age: 28

1.9. The `str.join()` method

Joins a list of strings (or any other iterable of strings) with the given separator. It is often more efficient than the + operator.

words = ["Python", "is", "fun"]
sentence = " ".join(words)  # join with spaces
print(sentence)  # Output: Python is fun
chars = ["a", "b", "c"]
combined = "".join(chars)  # join without a separator
print(combined)  # Output: abc
data = ["2025", "04", "01"]
date_str = "-".join(data)  # join with hyphens
print(date_str)  # Output: 2025-04-01
# Non-string items such as numbers must be converted to strings first
items = ["apple", 100, "orange", 200]
# Bad: print(",".join(items)) raises TypeError
# Good:
items_str = [str(item) for item in items]
print(", ".join(items_str))  # Output: apple, 100, orange, 200

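2. Splitting and replacing strings

Split a string according to a rule, or replace part of it with another string.

2.1. `str.split(sep=None, maxsplit=-1)`

Splits the string at the separator sep and returns a list.

If sep is omitted or None, the string is split on whitespace (spaces, tabs, newlines, and so on). Runs of consecutive whitespace count as a single separator, and leading and trailing whitespace is ignored.
maxsplit sets the maximum number of splits. Once that many splits have been made, the rest of the string becomes the last element. -1 means no limit (the default).

text1 = "apple,orange,banana"
fruits = text1.split(',')  # split on commas
print(fruits)  # Output: ['apple', 'orange', 'banana']
text2 = "one two three\nfour"
words = text2.split()  # split on whitespace
print(words)  # Output: ['one', 'two', 'three', 'four']
text3 = "a-b-c-d-e"
parts1 = text3.split('-', maxsplit=2)  # split at most twice
print(parts1)  # Output: ['a', 'b', 'c-d-e']
# If the separator is not found, the result is a list containing the original string
text4 = "hello"
result = text4.split(',')
print(result)  # Output: ['hello']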

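2.2. `str.rsplit(sep=None, maxsplit=-1)`

Same as split(), but splitting starts from the right (the end) of the string. The difference only shows when maxsplit is given: the splits are counted from the right, so the unsplit remainder ends up in the first element.

text = "a-b-c-d-e"
parts_r = text.rsplit('-', maxsplit=2)  # split at most twice, starting from the right
print(parts_r)  # Output: ['a-b-c', 'd', 'e']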
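A typical use of the right-to-left behavior is separating the last dotted component of a name. The following is a small sketch with a made-up file name; for real paths, `os.path.splitext()` or `pathlib` is usually the better tool.

```python
filename = "archive.2025.tar.gz"  # hypothetical example value

# rsplit keeps everything before the LAST dot together
stem, extension = filename.rsplit(".", maxsplit=1)
print(stem, extension)  # archive.2025.tar gz

# split keeps everything after the FIRST dot together
first, remainder = filename.split(".", maxsplit=1)
print(first, remainder)  # archive 2025.tar.gz
```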

2.3. `str.splitlines(keepends=False)`

Splits the string at line boundaries (\n, \r, \r\n, and so on) and returns a list of the lines.

With keepends=True, each element of the returned list keeps its trailing line break.

text_lines = "First line\nSecond line\r\nThird line"
lines = text_lines.splitlines()
print(lines)  # Output: ['First line', 'Second line', 'Third line']
lines_with_ends = text_lines.splitlines(keepends=True)
print(lines_with_ends)  # Output: ['First line\n', 'Second line\r\n', 'Third line']

2.4. `str.partition(sep)`

Splits the string at the first occurrence of the separator sep into three parts (the part before sep, sep itself, the part after sep) and returns them as a tuple.

If sep is not found, it returns the tuple (original string, '', '').

text = "user@example.com"
parts = text.partition('@')
print(parts)  # Output: ('user', '@', 'example.com')
text_no_sep = "filename.txt"
parts_no_sep = text_no_sep.partition(':')
print(parts_no_sep)  # Output: ('filename.txt', '', '')

2.5. `str.rpartition(sep)`

Same as partition(), but splits at the last occurrence of the separator sep. If sep is not found, it returns ('', '', original string).

path = "/usr/local/bin/python"
dir_sep_file = path.rpartition('/')
print(dir_sep_file)  # Output: ('/usr/local/bin', '/', 'python')
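Because partition() always returns three elements, it is convenient for parsing "key=value" style lines without index checks. The sketch below assumes a hypothetical `parse_setting` helper and a simple line format; it is not tied to any particular configuration syntax.

```python
def parse_setting(line):
    """Split 'key = value' at the first '='; return None if there is no '='.

    Hypothetical helper for illustration only.
    """
    key, sep, value = line.partition("=")
    if not sep:  # sep is '' when the separator was not found
        return None
    return key.strip(), value.strip()


print(parse_setting("url = http://example.com/?a=1"))  # ('url', 'http://example.com/?a=1')
print(parse_setting("no separator here"))              # None
```

Only the first "=" is treated as the separator, so values that themselves contain "=" stay intact.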

2.6. `str.replace(old, new, count=-1)`

Replaces occurrences of the substring old with new.

count sets the maximum number of replacements. -1 replaces all occurrences (the default).
The original string is not modified; a new string is returned.

message = "Hello world, hello Python!"
new_message1 = message.replace("hello", "Hi")  # case-sensitive
print(new_message1)  # Output: Hello world, Hi Python!
new_message2 = message.replace(" ", "_")  # replace spaces with underscores
print(new_message2)  # Output: Hello_world,_hello_Python!
new_message3 = message.replace("l", "L", 3)  # replace the first three 'l' with 'L'
print(new_message3)  # Output: HeLLo worLd, hello Python!

2.7. `str.translate(table)` / `str.maketrans(x, y, z)`

Replaces several characters with their counterparts in one pass, or deletes specific characters. The usual approach is to build a translation table with maketrans() and pass it to translate().

maketrans(x, y): creates a table that maps each character of string x to the character at the same position in string y (x and y must have the same length).
maketrans(x, y, z): as above, and additionally deletes every character contained in string z (maps it to None).
maketrans(dict): creates a table from a dictionary of the form {code point (int) or single character (str): replacement (str/int/None)}.

# Replace specific characters
text1 = "abcde"
trans_table1 = str.maketrans("ae", "AE")  # 'a' -> 'A', 'e' -> 'E'
translated1 = text1.translate(trans_table1)
print(translated1)  # Output: AbcdE
# Delete specific characters
text2 = "Hello, world!"
trans_table2 = str.maketrans("", "", ",!")  # delete commas and exclamation marks
translated2 = text2.translate(trans_table2)
print(translated2)  # Output: Hello world
# Replace and delete at the same time
text3 = "Remove Vowels Example"
trans_table3 = str.maketrans("aeiou", "AEIOU", " ")  # uppercase the lowercase vowels and delete spaces
translated3 = text3.translate(trans_table3)
print(translated3)  # Output: REmOvEVOwElsExAmplE
# Build a table from a dictionary (example: full-width alphanumerics to half-width)
zen = "ＡＢＣ１２３"
han = "ABC123"
# Full-width -> half-width translation table (partial)
# ord() returns the code point of a character
trans_table_dict = str.maketrans({ord(z): h for z, h in zip(zen, han)})
translated_dict = zen.translate(trans_table_dict)
print(translated_dict)  # Output: ABC123

3. Searching and testing strings

Check whether a string contains a particular pattern, or test the properties of a string (digits only, letters only, and so on).

3.1. The `in` operator

Tests whether a substring is contained in a string and returns True / False.

text = "Python programming is fun."
print("Python" in text)  # Output: True
print("Java" in text)  # Output: False
print("gram" in text)  # Output: True
print("Fun" in text)  # Output: False (case-sensitive)

3.2. Search methods: `find`, `rfind`, `index`, `rindex`

Return the index (position) at which a substring first or last occurs.

Method	Description	Result when not found	Example
`find(sub[, start[, end]])`	Searches for the substring `sub` from the left and returns the index of the first occurrence.	`-1`	`"abcabc".find("bc")` → `1`
`rfind(sub[, start[, end]])`	Searches for the substring `sub` from the right and returns the index at which the last occurrence starts.	`-1`	`"abcabc".rfind("bc")` → `4`
`index(sub[, start[, end]])`	Same as `find()`, but raises `ValueError` when the substring is not found.	`ValueError`	`"abcabc".index("bc")` → `1`
`rindex(sub[, start[, end]])`	Same as `rfind()`, but raises `ValueError` when the substring is not found.	`ValueError`	`"abcabc".rindex("bc")` → `4`

text = "This is a test string. This is fun."
print(f"find('is'): {text.find('is')}")  # Output: find('is'): 2
print(f"rfind('is'): {text.rfind('is')}")  # Output: rfind('is'): 28
print(f"find('is', 5): {text.find('is', 5)}")  # search from index 5 onward -> Output: find('is', 5): 5
print(f"find('xyz'): {text.find('xyz')}")  # Output: find('xyz'): -1
try:
    print(f"index('is'): {text.index('is')}")  # Output: index('is'): 2
    print(f"index('xyz'): {text.index('xyz')}")  # ValueError is raised here
except ValueError as e:
    print(f"index('xyz') failed: {e}")  # Output: index('xyz') failed: substring not found

The start and end arguments restrict the search range (they are interpreted as in slice notation).
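The start argument also makes it possible to walk through every occurrence of a substring. The following is a minimal sketch; `find_all` is a hypothetical helper name, not a built-in method.

```python
def find_all(text, sub):
    """Return the start index of every occurrence of sub (overlaps included).

    Hypothetical helper built on str.find() and its start argument.
    """
    positions = []
    start = 0
    while True:
        index = text.find(sub, start)
        if index == -1:  # no further occurrence
            break
        positions.append(index)
        start = index + 1  # continue searching just after this match's start
    return positions


print(find_all("abcabcabc", "bc"))  # [1, 4, 7]
print(find_all("abcabcabc", "xy"))  # []
```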

3.3. `str.count(sub[, start[, end]])`

Returns the number of non-overlapping occurrences of the substring sub in the string.

text = "banana bandana"
print(f"count('an'): {text.count('an')}")  # Output: count('an'): 4
print(f"count('ana'): {text.count('ana')}")  # Output: count('ana'): 2 (overlapping occurrences are not counted)
print(f"count('a'): {text.count('a')}")  # Output: count('a'): 6
print(f"count('an', 0, 6): {text.count('an', 0, 6)}")  # count 'an' within the first 6 characters "banana" -> Output: count('an', 0, 6): 2
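str.count() only counts non-overlapping occurrences. If overlapping matches need to be counted, one possible approach (a sketch, with the hypothetical helper name `count_overlapping`) is a zero-width lookahead from the re module.

```python
import re


def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing overlaps (hypothetical helper)."""
    # A zero-width lookahead matches at every position where sub starts,
    # without consuming characters, so overlapping occurrences are all found.
    return len(re.findall(f"(?={re.escape(sub)})", text))


print("aaaa".count("aa"))               # 2 (non-overlapping)
print(count_overlapping("aaaa", "aa"))  # 3 (positions 0, 1 and 2)
```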

3.4. `str.startswith(prefix[, start[, end]])`

Tests whether the string starts with the given prefix. prefix may also be a tuple of strings, in which case the result is True if any one of them matches.

filename = "document.txt"
print(filename.startswith("doc"))  # Output: True
print(filename.startswith("Doc"))  # Output: False (case-sensitive)
print(filename.startswith(("image", "doc")))  # starts with "image" or "doc"? -> Output: True
url = "https://example.com"
print(url.startswith("http"))  # Output: True
# Test from an offset (does the text from index 8 onward start with "example"?)
print(url.startswith("example", 8))  # Output: True

3.5. `str.endswith(suffix[, start[, end]])`

Tests whether the string ends with the given suffix. suffix may also be a tuple.

filename = "report.pdf"
print(filename.endswith(".pdf"))  # Output: True
print(filename.endswith(".txt"))  # Output: False
print(filename.endswith((".pdf", ".docx")))  # ends with ".pdf" or ".docx"? -> Output: True
text = "Hello World"
# Test within a range (do the first 5 characters "Hello" end with "lo"?)
print(text.endswith("lo", 0, 5))  # Output: True

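3.6. Character classification methods

A group of methods that test whether a string consists only of a particular kind of character (alphanumeric, digits, whitespace, and so on). All of them return True / False. For an empty string they return False, except isascii() and isprintable(), which return True.

Method	Description	Examples (True)	Examples (False)
`isalnum()`	All characters are alphanumeric (letters or digits) and there is at least one character.	`"abc123"`, `"Python3"`	`"abc 123"`, `"Python-3"`, `""`
`isalpha()`	All characters are letters and there is at least one character.	`"abc"`, `"Python"`, `"あいう"`	`"abc1"`, `"Python 3"`, `""`
`isascii()`	All characters are ASCII (U+0000-U+007F), or the string is empty.	`"Hello"`, `"123!@#"`, `""`	`"こんにちは"`, `"你好"`
`isdigit()`	All characters are digits and there is at least one character. Besides 0-9, this includes Unicode digit characters such as superscripts and circled digits.	`"12345"`, `"¹²³"`, `"①②③"`	`"123.45"`, `"-123"`, `"123a"`, `"½"`, `""`
`isdecimal()`	All characters are decimal characters and there is at least one character. (Unicode general category Nd)	`"12345"`, `"٠١٢٣"` (Arabic-Indic digits)	`"123.45"`, `"①②③"` (digits, but not decimal), `"¹²³"` (superscripts), `""`
`isnumeric()`	All characters are numeric characters (including digits, fractions, Roman numerals, kanji numerals, and so on) and there is at least one character. (Unicode numeric property)	`"123"`, `"¹²³"`, `"½"`, `"一二三"`, `"①②③"`	`"123.45"`, `"-123"`, `"abc"`, `""`
`islower()`	All cased characters are lowercase and there is at least one cased character.	`"hello world"`, `"python 3"`	`"Hello world"`, `"PYTHON"`, `"123"`, `""`
`isupper()`	All cased characters are uppercase and there is at least one cased character.	`"HELLO WORLD"`, `"PYTHON 3"`	`"Hello world"`, `"python"`, `"123"`, `""`
`isspace()`	All characters are whitespace (space, tab `\t`, newline `\n`, carriage return `\r`, and so on) and there is at least one character.	`" "`, `"\t\n "`	`" a "`, `"Hello"`, `""`
`istitle()`	The string is title-cased (each word starts with an uppercase letter followed by lowercase letters) and contains at least one cased character. A letter that directly follows a digit or symbol must be uppercase.	`"Title Case String"`, `"Python Is Fun"`, `"1St Word"`	`"Title case string"`, `"python is fun"`, `"TITLE"`, `"1st Word"`, `""`
`isidentifier()`	The string is a valid Python identifier (variable name, function name, and so on). Reserved keywords such as `class` also return `True`; use `keyword.iskeyword()` to exclude them.	`"variable_name"`, `"myFunc"`, `"_private"`, `"変数1"`, `"class"`	`"1variable"`, `"my-func"`, `""`
`isprintable()`	All characters are printable (that is, not control characters such as newline `\n` or tab `\t`), or the string is empty. The space character counts as printable.	`"Hello World 123!?"`, `" "`, `""`	`"Hello\nWorld"`, `"abc\tdef"`

print(f"'abc123'.isalnum(): {'abc123'.isalnum()}")  # True
print(f"'123'.isdigit(): {'123'.isdigit()}")  # True
print(f"'¹²³'.isnumeric(): {'¹²³'.isnumeric()}")  # True (isdigit() is also True, isdecimal() is False)
print(f"' '.isspace(): {' '.isspace()}")  # True
print(f"'Hello World'.istitle(): {'Hello World'.istitle()}")  # True
print(f"'my_var'.isidentifier(): {'my_var'.isidentifier()}")  # True
print(f"'class'.isidentifier(): {'class'.isidentifier()}")  # True (keywords are not excluded)
print(f"'你好'.isprintable(): {'你好'.isprintable()}")  # True
with_newline = "Hello\n"  # a backslash inside an f-string expression is a syntax error before Python 3.12
print(f"'Hello\\n'.isprintable(): {with_newline.isprintable()}")  # False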
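The three numeric tests are easy to confuse, so the sketch below prints them side by side for a few sample strings, and shows one way to reject reserved keywords when validating names. The sample list and the `is_usable_name` helper are made up for this example; `keyword.iskeyword()` is part of the standard library.

```python
import keyword

samples = ["123", "１２３", "¹²³", "①②③", "½", "一二三"]
for s in samples:
    # Columns: isdecimal(), isdigit(), isnumeric()
    print(repr(s), s.isdecimal(), s.isdigit(), s.isnumeric())
# '123'    True  True  True
# '１２３' True  True  True   (full-width digits)
# '¹²³'    False True  True
# '①②③'  False True  True
# '½'      False False True
# '一二三' False False True


def is_usable_name(name):
    """True if name is a valid identifier and not a reserved keyword (hypothetical helper)."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_usable_name("my_var"))  # True
print(is_usable_name("class"))   # False
```

When the goal is to check whether `int()` will accept a string, isdecimal() is the closest of the three, although signs and surrounding whitespace still need separate handling.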

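4. Formatting and converting strings

Tidy up the appearance of a string, convert between uppercase and lowercase, or remove specific characters.

4.1. Stripping methods: `strip`, `lstrip`, `rstrip`

Remove the given characters (whitespace by default) from both ends, the left end, or the right end of a string. The chars argument is treated as a set of characters, not as a prefix or suffix.

Method	Description	Example	Output
`strip([chars])`	Removes characters contained in `chars` from both ends. Whitespace if `chars` is omitted.	`"   spacious   ".strip()` `".,xyABCxy.,".strip('.,xy')`	`"spacious"` `"ABC"`
`lstrip([chars])`	Removes characters contained in `chars` from the left end (the start). Whitespace if `chars` is omitted.	`"   spacious   ".lstrip()` `".,xyABCxy.,".lstrip('.,xy')`	`"spacious   "` `"ABCxy.,"`
`rstrip([chars])`	Removes characters contained in `chars` from the right end (the end). Whitespace if `chars` is omitted.	`"   spacious   ".rstrip()` `".,xyABCxy.,".rstrip('.,xy')`	`"   spacious"` `".,xyABC"`

text = "\t \n Hello World \n\r "
print(f"Original: '{text}'")
print(f"strip(): '{text.strip()}'")  # remove whitespace from both ends
print(f"lstrip(): '{text.lstrip()}'")  # remove whitespace from the left end
print(f"rstrip(): '{text.rstrip()}'")  # remove whitespace from the right end
csv_data = "***,,value1,,***"
# Remove '*' and ',' from both ends
print(f"strip('*,'): '{csv_data.strip('*,')}'")  # -> value1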

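4.2. Case conversion methods

Convert the letters in a string between uppercase and lowercase.

Method	Description	Example	Output
`lower()`	Converts all letters to lowercase.	`"PyThoN".lower()`	`"python"`
`upper()`	Converts all letters to uppercase.	`"PyThoN".upper()`	`"PYTHON"`
`capitalize()`	Converts the first character to uppercase and the rest to lowercase.	`"pyThoN is FUN".capitalize()`	`"Python is fun"`
`title()`	Converts the first letter of each word to uppercase and the rest to lowercase (title case). Words are delimited by whitespace, symbols, and other non-letters.	`"pyThoN is FUN".title()` `"they're".title()`	`"Python Is Fun"` `"They'Re"` (note: the letter after the apostrophe is capitalized too)
`swapcase()`	Swaps uppercase and lowercase.	`"PyThoN is Fun".swapcase()`	`"pYtHOn IS fUN"`

text = "lOwEr, UPPER, Title, cAPITALIZE"
print(f"lower(): {text.lower()}")  # lower, upper, title, capitalize
print(f"upper(): {text.upper()}")  # LOWER, UPPER, TITLE, CAPITALIZE
print(f"capitalize(): {text.capitalize()}")  # Lower, upper, title, capitalize
print(f"title(): {text.title()}")  # Lower, Upper, Title, Capitalize
print(f"swapcase(): {text.swapcase()}")  # LoWeR, upper, tITLE, Capitalize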

4.3. Alignment and padding methods: `center`, `ljust`, `rjust`, `zfill`

Position a string within a given width, or pad it with a given character.

Method	Description	Example	Output
`center(width[, fillchar])`	Centers the string within `width`. The margins are filled with `fillchar` (a space by default).	`"abc".center(10)` `"abc".center(10, '-')`	`"   abc    "` `"---abc----"`
`ljust(width[, fillchar])`	Left-aligns the string within `width`. The margin on the right is filled with `fillchar`.	`"abc".ljust(10)` `"abc".ljust(10, '*')`	`"abc       "` `"abc*******"`
`rjust(width[, fillchar])`	Right-aligns the string within `width`. The margin on the left is filled with `fillchar`.	`"abc".rjust(10)` `"abc".rjust(10, '0')`	`"       abc"` `"0000000abc"`
`zfill(width)`	Pads the string on the left with zeros '0' until it is `width` characters long. A leading sign (`+`, `-`) stays at the front.	`"42".zfill(5)` `"-42".zfill(5)`	`"00042"` `"-0042"`

text = "Data"
width = 12
print(f"center: '{text.center(width, '=')}'")  # '====Data===='
print(f"ljust: '{text.ljust(width)}'")  # 'Data        '
print(f"rjust: '{text.rjust(width, '_')}'")  # '________Data'
print(f"zfill: '{'123'.zfill(width)}'")  # '000000000123'
print(f"zfill: '{'-123'.zfill(width)}'")  # '-00000000123'
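One common use of these methods is lining up columns in plain-text output. The sketch below uses made-up sample rows and computes each column width from the data.

```python
# Hypothetical sample data: (name, price) pairs
rows = [("apple", 150), ("kiwi", 80), ("watermelon", 1200)]

name_width = max(len(name) for name, _ in rows)
price_width = max(len(str(price)) for _, price in rows)

lines = [
    name.ljust(name_width) + " | " + str(price).rjust(price_width)
    for name, price in rows
]
print("\n".join(lines))
# apple      |  150
# kiwi       |   80
# watermelon | 1200
```

Widths are counted in characters, so text containing full-width (East Asian) characters may not line up visually in a terminal.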

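4.4. `str.expandtabs(tabsize=8)`

Replaces each tab character \t with spaces up to the next tab stop. Tab stops occur every tabsize columns (8 by default), and the column position is tracked cumulatively, so the number of spaces inserted depends on where the tab appears.

text = "col1\tcol2\tcol3"
expanded_default = text.expandtabs()
expanded_custom = text.expandtabs(tabsize=4)
print(f"Original: '{text}'")
print(f"Expanded (8): '{expanded_default}'")  # 4 spaces after col1, 4 spaces after col2
print(f"Expanded (4): '{expanded_custom}'")  # also 4 spaces after col1 and col2 (each already ends on a tab stop, so the tab advances a full stop)
text2 = "a\tbc\tdef\tghij"
print(f"'{text2.expandtabs(4)}'")  # 3 spaces after a, 2 after bc, 1 after def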

4.5. `str.removeprefix(prefix)` (Python 3.9+)

If the string starts with the given prefix, returns a new string with that prefix removed. Otherwise returns the original string unchanged.

url = "https://example.com"
print(url.removeprefix("https://"))  # Output: example.com
print(url.removeprefix("http://"))  # Output: https://example.com (unchanged)
filename = "test_data.csv"
print(filename.removeprefix("test_"))  # Output: data.csv

4.6. `str.removesuffix(suffix)` (Python 3.9+)

If the string ends with the given suffix, returns a new string with that suffix removed. Otherwise returns the original string unchanged.

filename = "image.jpg"
print(filename.removesuffix(".jpg"))  # Output: image
print(filename.removesuffix(".png"))  # Output: image.jpg (unchanged)
code = "function();"
print(code.removesuffix("();"))  # Output: function
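A frequent mistake is to use strip-style methods for this purpose. The sketch below (with made-up sample strings, and requiring Python 3.9 or later) contrasts the two: lstrip() and rstrip() treat their argument as a set of characters, whereas removeprefix() and removesuffix() remove one exact substring.

```python
# rstrip removes any trailing run of the characters '.', 't', 'x'
print("report.txt".rstrip(".txt"))        # repor
# removesuffix removes the exact suffix once
print("report.txt".removesuffix(".txt"))  # report

# lstrip removes any leading run of the characters 'w', '.'
print("www.wikipedia.org".lstrip("www."))        # ikipedia.org
# removeprefix removes the exact prefix once
print("www.wikipedia.org".removeprefix("www."))  # wikipedia.org
```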

5. Encoding and decoding

Convert between the byte sequences (bytes) that computers handle internally and the strings (str) that people read. This matters for file I/O and network communication.

5.1. `str.encode(encoding='utf-8', errors='strict')`

Converts (encodes) a string (str) into a byte sequence (bytes) using the given encoding.

Common encodings: 'utf-8' (recommended), 'shift_jis' (Japanese on Windows), 'euc-jp' (Japanese on Unix), 'cp932' (Microsoft's variant of Shift_JIS)
errors: specifies what to do when a character cannot be encoded.
- 'strict': raise UnicodeEncodeError (the default).
- 'ignore': skip characters that cannot be encoded.
- 'replace': replace characters that cannot be encoded with ?.
- 'xmlcharrefreplace': replace with an XML character reference (e.g. &#1234;).
- 'backslashreplace': replace with a Python backslash escape sequence (e.g. \u1234).

text_jp = "こんにちは世界"  # "Hello, world" in Japanese
text_en = "Hello World"
# UTF-8 (default)
bytes_utf8 = text_jp.encode()  # 'utf-8' is used when encoding is omitted
print(f"UTF-8: {bytes_utf8}")
# Shift_JIS
try:
    bytes_sjis = text_jp.encode('shift_jis')
    print(f"Shift_JIS: {bytes_sjis}")
except UnicodeEncodeError as e:
    print(f"Shift_JIS encode error: {e}")  # raised if the text contains a character that Shift_JIS lacks
# Error handling
text_mixed = "① Python"  # ① is not available in Shift_JIS
bytes_sjis_ignore = text_mixed.encode('shift_jis', errors='ignore')
print(f"Shift_JIS (ignore): {bytes_sjis_ignore}")  # -> b' Python'
bytes_sjis_replace = text_mixed.encode('shift_jis', errors='replace')
print(f"Shift_JIS (replace): {bytes_sjis_replace}")  # -> b'? Python'

5.2. `bytes.decode(encoding='utf-8', errors='strict')`

Converts (decodes) a byte sequence (bytes) into a string (str) using the given encoding.

encoding and errors work as in encode(), except that they apply to byte sequences that cannot be decoded.
With errors='strict', an invalid byte sequence raises UnicodeDecodeError.

bytes_utf8 = b'\xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81\xa1\xe3\x81\xaf\xe4\xb8\x96\xe7\x95\x8c'  # こんにちは世界 (UTF-8)
bytes_sjis = b'\x82\xb1\x82\xf1\x82\xc9\x82\xbf\x82\xcd\x90\xa2\x8a\x45'  # こんにちは世界 (Shift_JIS)
# UTF-8 (default)
str_from_utf8 = bytes_utf8.decode()  # 'utf-8' is used when encoding is omitted
print(f"Decoded from UTF-8: {str_from_utf8}")
# Shift_JIS
str_from_sjis = bytes_sjis.decode('shift_jis')
print(f"Decoded from Shift_JIS: {str_from_sjis}")
# Decoding with the wrong encoding either raises an error or produces garbled text
try:
    # Decode the Shift_JIS bytes as UTF-8
    str_error = bytes_sjis.decode('utf-8', errors='strict')
    print(str_error)
except UnicodeDecodeError as e:
    print(f"Decode error: {e}")
# errors='replace' substitutes U+FFFD for the bytes that cannot be decoded
str_replace = bytes_sjis.decode('utf-8', errors='replace')
print(f"Decoded with replace: {str_replace}")  # -> garbled text, mostly U+FFFD replacement characters

Important: as a rule, the encoding used for decoding must match the encoding that was used for encoding. Using a different encoding causes garbled text or errors.
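When the encoding of incoming bytes is not known for certain, one pragmatic approach is to try a short list of candidates in order. The sketch below is an assumption-laden illustration: the `decode_with_fallback` helper and its candidate list are hypothetical, and trying encodings in sequence can still pick a wrong one that happens to decode without error.

```python
def decode_with_fallback(data, encodings=("utf-8", "cp932")):
    """Try each candidate encoding in order (hypothetical helper).

    Returns (text, encoding_used). If every candidate fails, decodes with the
    first encoding using errors='replace' and returns None as the encoding.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    return data.decode(encodings[0], errors="replace"), None


utf8_bytes = "こんにちは".encode("utf-8")
sjis_bytes = "こんにちは".encode("cp932")

print(decode_with_fallback(utf8_bytes))  # ('こんにちは', 'utf-8')
print(decode_with_fallback(sjis_bytes))  # ('こんにちは', 'cp932')
```

Putting UTF-8 first is deliberate in this sketch: it is strict enough that bytes in other encodings usually fail to decode, so the fallback is reached.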

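6. Regular expressions (`re` module)

A powerful tool for complex pattern matching and string manipulation. Use it after import re.

This section introduces the basic functions. The syntax of regular expression patterns is a broad topic of its own and needs to be studied separately. (References: the official Python re documentation, Regular Expression HOWTO)

6.1. Search: `re.search(pattern, string, flags=0)`

Scans the whole string and returns a match object for the first location where the regular expression pattern matches. Returns None if there is no match.

import re
text = "Email: user1@example.com, user2@sample.net"
pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'  # a simple email address pattern
match = re.search(pattern, text)
if match:
    print(f"First email address matched: {match.group(0)}")  # the whole matched string -> user1@example.com
    print(f"Match span: {match.span()}")  # -> (7, 24)
else:
    print("No match.")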

6.2. Match at the start: `re.match(pattern, string, flags=0)`

Tries to match the regular expression pattern at the beginning of string. Returns None if the beginning does not match.

import re
text1 = "Python is fun"
text2 = "Not Python"
pattern = r"Python"
match1 = re.match(pattern, text1)  # matches because the string starts with "Python"
print(f"text1 match: {match1.group(0) if match1 else None}")  # -> Python
match2 = re.match(pattern, text2)  # no match because the string starts with "Not"
print(f"text2 match: {match2.group(0) if match2 else None}")  # -> None

6.3. Full match: `re.fullmatch(pattern, string, flags=0)`

Tries to match the regular expression pattern against the entire string. Returns None unless the whole string matches.

import re
text1 = "123-4567"
text2 = "Tel: 123-4567"
pattern = r"\d{3}-\d{4}"  # a format such as a Japanese postal code or a short phone number
match1 = re.fullmatch(pattern, text1)  # the whole string matches the pattern
print(f"text1 fullmatch: {match1.group(0) if match1 else None}")  # -> 123-4567
match2 = re.fullmatch(pattern, text2)  # no full match because of the extra "Tel: "
print(f"text2 fullmatch: {match2.group(0) if match2 else None}")  # -> None

6.4. Split: `re.split(pattern, string, maxsplit=0, flags=0)`

Splits string at every location where the regular expression pattern matches and returns a list. This is the regular expression counterpart of str.split().

import re
text = "apple, orange; banana\tgrape"
# Split on commas, semicolons, and whitespace
pattern = r"[,;\s]\s*"  # one separator character followed by any amount of whitespace
parts = re.split(pattern, text)
print(parts)  # Output: ['apple', 'orange', 'banana', 'grape']

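6.5. Find all: `re.findall(pattern, string, flags=0)`

Returns every part of string that matches the regular expression pattern as a list of strings. If the pattern contains capturing groups, the list holds the group contents instead of the whole matches. Returns an empty list [] if nothing matches.

import re
text = "Numbers: 123, 45, 6789"
pattern = r"\d+"  # one or more digits
numbers = re.findall(pattern, text)
print(numbers)  # Output: ['123', '45', '6789']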
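The return value of findall() changes shape when the pattern contains capturing groups. The sketch below uses a made-up input string to show the three cases: no group, one group, and several groups.

```python
import re

text = "x=10, y=25, z=7"  # hypothetical sample input

# No capturing group: each element is the whole match
whole = re.findall(r"\w+=\d+", text)
print(whole)   # ['x=10', 'y=25', 'z=7']

# One capturing group: each element is the content of that group
values = re.findall(r"\w+=(\d+)", text)
print(values)  # ['10', '25', '7']

# Several capturing groups: each element is a tuple of the groups
pairs = re.findall(r"(\w+)=(\d+)", text)
print(pairs)   # [('x', '10'), ('y', '25'), ('z', '7')]
```

A non-capturing group `(?:...)` can be used when grouping is needed but the whole match should still be returned.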

6.6. Find all (iterator): `re.finditer(pattern, string, flags=0)`

Similar to findall(), but returns an iterator of match objects instead of a list. Use it when memory efficiency matters or when you need match details such as positions.

import re
text = "Contact us at info@example.com or support@example.org"
pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
matches = re.finditer(pattern, text)
for match in matches:
    print(f"Found: {match.group(0)} at {match.span()}")
# Output:
# Found: info@example.com at (14, 30)
# Found: support@example.org at (34, 53)

6.7. Substitute: `re.sub(pattern, repl, string, count=0, flags=0)`

Replaces the parts of string that match the regular expression pattern with repl (a replacement string or a function). This is the regular expression counterpart of str.replace().

count: the maximum number of replacements. 0 replaces all matches (the default).
repl may contain special sequences that refer to the matched groups (e.g. \1, \g<name>).
If repl is a function, it is called with each match object as its argument, and its return value is used as the replacement string.

import re
text = "  Remove   extra   spaces.  "
pattern = r"\s+"  # one or more consecutive whitespace characters
# Replace each run of whitespace with a single space
cleaned_text = re.sub(pattern, " ", text.strip())  # strip() first to remove whitespace at both ends
print(f"'{cleaned_text}'")  # Output: 'Remove extra spaces.'
# Substitution that reuses the matched parts (changing a date format)
date_text = "Date: 2025-04-01"
date_pattern = r"(\d{4})-(\d{2})-(\d{2})"  # year (group 1), month (group 2), day (group 3)
# Refer to the groups to change the format (\2/\3/\1)
formatted_date = re.sub(date_pattern, r"\2/\3/\1", date_text)
print(formatted_date)  # Output: Date: 04/01/2025
# Substitution with a function (double each number)
num_text = "Values: 10, 25, 100"
num_pattern = r"\d+"
def double_value(match):
    value = int(match.group(0))
    return str(value * 2)
doubled_text = re.sub(num_pattern, double_value, num_text)
print(doubled_text)  # Output: Values: 20, 50, 200
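The \g<name> form mentioned above refers to named groups, which are written as (?P<name>...) in the pattern. The sketch below reworks a date substitution with named groups; the group names and the sample sentence are chosen for this example only.

```python
import re

# Named groups make the replacement template self-describing
date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

text = "Released 2025-04-01 and 2025-12-24"  # hypothetical sample input
result = date_pattern.sub(r"\g<day>.\g<month>.\g<year>", text)
print(result)  # Released 01.04.2025 and 24.12.2025

# Inside a replacement function, the same groups are available by name
def to_slashes(match):
    return f"{match['month']}/{match['day']}/{match['year']}"

print(date_pattern.sub(to_slashes, text))  # Released 04/01/2025 and 12/24/2025
```

\g<1> is also accepted for numbered groups and avoids ambiguity when a digit follows the reference.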

6.8. Substitute with count: `re.subn(pattern, repl, string, count=0, flags=0)`

Works like re.sub(), but returns a tuple (new string, number of replacements).

import re
text = "apple apple orange apple"
pattern = r"apple"
result_tuple = re.subn(pattern, "banana", text)
print(result_tuple)  # Output: ('banana banana orange banana', 3)
print(f"New string: {result_tuple[0]}")
print(f"Number of replacements: {result_tuple[1]}")

6.9. Compile: `re.compile(pattern, flags=0)`

Compiling a regular expression pattern in advance can improve efficiency when the same pattern is used repeatedly. The compiled pattern object provides the functions introduced above (search, match, and so on) as methods.

import re
# Compile the pattern
email_pattern = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
text1 = "My email is test@example.com"
text2 = "Another one: user.name+tag@sub.domain.co.uk"
match1 = email_pattern.search(text1)
match2 = email_pattern.search(text2)
if match1:
    print(f"Email 1: {match1.group(0)}")  # -> test@example.com
if match2:
    print(f"Email 2: {match2.group(0)}")  # -> user.name+tag@sub.domain.co.uk
# findall is available as a method too
emails = email_pattern.findall(text1 + ", " + text2)
print(f"All emails: {emails}")  # -> ['test@example.com', 'user.name+tag@sub.domain.co.uk']

6.10. Flags (`flags`)

Options that adjust how a regular expression behaves. Combine several flags with |.

re.IGNORECASE / re.I: match case-insensitively.
re.MULTILINE / re.M: ^ also matches at the start of each line and $ also matches at the end of each line.
re.DOTALL / re.S: . also matches the newline character \n.
re.VERBOSE / re.X: whitespace and comments (from # to the end of the line) in the pattern are ignored, so the pattern can be laid out readably.
re.ASCII / re.A: \w, \W, \b, \B, \d, \D, \s, \S match ASCII characters only.

import re
text = "Python\nPYTHON\npython"
# Case-insensitive
pattern_i = r"python"
matches_i = re.findall(pattern_i, text, flags=re.IGNORECASE)
print(f"IGNORECASE: {matches_i}")  # -> ['Python', 'PYTHON', 'python']
# Match at the start of each line
pattern_m = r"^python"
matches_m = re.findall(pattern_m, text, flags=re.MULTILINE | re.IGNORECASE)
print(f"MULTILINE | IGNORECASE: {matches_m}")  # -> ['Python', 'PYTHON', 'python']
# A readable pattern with the VERBOSE flag (each comment must end at a line break)
pattern_v = r"""
    \b              # word boundary
    [A-Z0-9._%+-]+  # user name part
    @               # at sign
    [A-Z0-9.-]+     # domain name part
    \.              # dot
    [A-Z]{2,}       # TLD (two or more letters)
    \b              # word boundary
"""
email_text = "test@EXAMPLE.com"
match_v = re.search(pattern_v, email_text, flags=re.VERBOSE | re.IGNORECASE)
if match_v:
    print(f"VERBOSE | IGNORECASE match: {match_v.group(0)}")  # -> test@EXAMPLE.com
