Introduction: What Is Annoy, and Why Does It Matter? 🤔
倧éã®ããŒã¿ã®äžããã䌌ãŠãããã®ããçŽ æ©ãèŠã€ãåºãããïŒãããªèŠæ±ã¯ãã¬ã³ã¡ã³ããŒã·ã§ã³ã·ã¹ãã ãç»åæ€çŽ¢ãèªç¶èšèªåŠçãªã©ãçŸä»£ã®å€ãã®ã¢ããªã±ãŒã·ã§ã³ã§äžå¯æ¬ ãšãªã£ãŠããŸããç¹ã«ãããŒã¿ãé«æ¬¡å ïŒããããã®ç¹åŸŽéãæã€ïŒã®å Žåãå®å šãªäžèŽãæ¢ãã®ã¯éåžžã«æéãããããçŸå®çã§ã¯ãããŸããã
ããã§ç»å Žããã®ããè¿äŒŒæè¿åæ¢çŽ¢ïŒApproximate Nearest Neighbor Search, ANNSïŒããšããæè¡ã§ããããã¯ãå®å šã«äžèŽãããã®ã§ãªããŠãããã ããã䌌ãŠããããã®ãé«éã«èŠã€ãåºãææ³ã§ãã
Annoy (Approximate Nearest Neighbors Oh Yeah) is a powerful library for ANNS, with a C++ core and Python bindings. It was originally written by Erik Bernhardsson at Spotify and open-sourced around 2013. Its defining strength is that it is optimized for memory efficiency and search speed on large datasets.
This post takes a detailed look at what makes Annoy attractive, how it works, how to use it, and what to watch out for. Let's explore Annoy! ✨
Key Features of Annoy
- Fast approximate nearest neighbor search: Annoy is built to return points that are close enough rather than the exact nearest neighbors. This cuts search time dramatically, so it suits applications that need real-time responses.
- Memory efficiency: Annoy treats its index file (the data structure that speeds up search) as a memory-mapped file. Only the parts that are actually accessed are read into memory, so even an index larger than physical RAM can be searched efficiently. The index files are also designed to be relatively compact.
- Sharing and persistence: A built index can be saved to a file. The file is read-only and can be shared across multiple processes. Build the index once, then distribute it and load it quickly wherever it is needed.
- Multiple distance metrics: The following metrics are available for measuring how "close" two items are:
  - Euclidean distance (`euclidean`)
  - Manhattan distance (`manhattan`)
  - Angular distance based on cosine similarity (`angular`; internally this is the Euclidean distance between the normalized vectors, `sqrt(2(1 - cos(u, v)))`)
  - Hamming distance (`hamming`; for binary vectors)
  - Dot product (`dot`)
- Simple API: Annoy is very easy to use from Python. Building an index and running searches takes only a few lines of code.
- Language bindings: The core is implemented in C++, and besides Python it can be used from other languages such as Java (though the official Python bindings are by far the most widely used).
- Building on disk: Calling the `on_disk_build(fn)` method before adding any items makes Annoy build the index directly in the given file instead of in RAM. This lets you build indexes for datasets too large to fit in memory.
Thanks to these features, Annoy has been used in systems such as Spotify's music recommendations. Once users and items are represented as high-dimensional vectors (for example through matrix factorization), Annoy finds similar users and similar items.
How Annoy Works: Random Projection Trees 🌳
The secret behind Annoy's fast searches is its internal data structure: random projection trees.
- 空éã®åå²: Annoyã¯ãããŒã¿ãååšããé«æ¬¡å 空éããã©ã³ãã ã«éžã°ããè¶ å¹³é¢ïŒ2次å ãªãçŽç·ã3次å ãªãå¹³é¢ïŒã䜿ã£ãŠååž°çã«2ã€ã«åå²ããŠãããŸãã
- ããªãŒæ§é ã®æ§ç¯: ãã®åå²ããã»ã¹ãç¹°ãè¿ãããšã§ãããŒã¿ç¹ãããªãŒæ§é ã®èïŒãªãŒãïŒããŒãã«æ ŒçŽãããŸããéåžžã1ã€ã®èã«ã¯ããçšåºŠã®æ°ã®ããŒã¿ç¹ïŒäŸãã°100åçšåºŠïŒãå«ãŸããããã«åå²ãããŸãã
- è€æ°ã®ããªãŒïŒãã©ã¬ã¹ãïŒ: Annoyã¯ããã®ãããªããªãŒãè€æ°ïŒ
n_trees
åïŒæ§ç¯ããŸããããããã®ããªãŒã¯ç°ãªãã©ã³ãã ãªè¶ å¹³é¢ã§åå²ããããããããŒã¿ã®ç°ãªãåŽé¢ãæããå€æ§ãªããªãŒçŸ€ïŒãã©ã¬ã¹ãïŒã圢æãããŸãã - æ¢çŽ¢: ã¯ãšãªç¹ïŒæ¢ããã察象ã®ãã¯ãã«ïŒãäžãããããšãAnnoyã¯ãã®è€æ°ã®ããªãŒã䞊è¡ããŠæ¢çŽ¢ããŸããåããªãŒã§ã¯ãšãªç¹ãå«ãŸããé åã蟿ããåè£ãšãªãç¹ãæ¢ããŸãã
- åè£ã®çµ±åãšçµã蟌ã¿: åããªãŒããåŸãããåè£ç¹ãéããåªå
床ä»ããã¥ãŒãªã©ã䜿ã£ãŠãå®éã«ã¯ãšãªç¹ã«è¿ãç¹ãéžã³åºããŸããæ¢çŽ¢æã«èª¿ã¹ãããŒãæ°ïŒ
search_k
ïŒã調æŽããããšã§ãæ¢çŽ¢ã®ç²ŸåºŠãšé床ã®ãã©ã³ã¹ãåããŸãã
Thanks to this tree-based approach, Annoy finds approximate nearest neighbors far faster than a linear scan that compares the query against every data point (brute force).
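The steps above can be sketched in plain NumPy. This is a simplified illustration, not Annoy's implementation. It assumes Euclidean distance and a hypothetical `LEAF_SIZE` cap, and it descends to only one leaf per tree instead of using a priority queue bounded by `search_k`. The hyperplane construction (equidistant from two sampled points) follows the README description.

```python
import numpy as np

LEAF_SIZE = 10  # hypothetical leaf capacity; Annoy chooses its own limit internally


def build_tree(data, ids, rng):
    """Recursively split `ids` with random hyperplanes and return a nested dict."""
    if len(ids) <= LEAF_SIZE:
        return {"leaf": ids}
    # Sample two distinct points; the splitting hyperplane is equidistant from them
    a, b = rng.choice(ids, size=2, replace=False)
    normal = data[a] - data[b]
    offset = np.dot(normal, (data[a] + data[b]) / 2.0)
    side = data[ids] @ normal - offset > 0
    left, right = ids[~side], ids[side]
    if len(left) == 0 or len(right) == 0:
        # Degenerate split (e.g. duplicate points): stop and keep a larger leaf
        return {"leaf": ids}
    return {
        "normal": normal,
        "offset": offset,
        "left": build_tree(data, left, rng),
        "right": build_tree(data, right, rng),
    }


def descend(tree, q):
    """Follow the query down one tree to the leaf whose region contains it."""
    while "leaf" not in tree:
        go_right = np.dot(q, tree["normal"]) - tree["offset"] > 0
        tree = tree["right"] if go_right else tree["left"]
    return tree["leaf"]


def query(forest, data, q, n):
    """Union the query's leaf from every tree, then rank candidates by exact distance."""
    candidates = np.unique(np.concatenate([descend(t, q) for t in forest]))
    dists = np.linalg.norm(data[candidates] - q, axis=1)
    order = np.argsort(dists)[:n]
    return candidates[order], dists[order]


rng = np.random.default_rng(0)
data = rng.random((2000, 8))
all_ids = np.arange(len(data))
forest = [build_tree(data, all_ids, rng) for _ in range(10)]  # a forest of 10 trees

# May return fewer than n results if the query's leaves happen to be tiny
ids, dists = query(forest, data, data[0], n=5)
print(ids, dists)
```

Because the query here is point 0 itself, it follows the same path it took during building, so it appears first with distance 0. Adding more trees enlarges the candidate pool, which is the same accuracy/cost lever that `n_trees` controls in Annoy.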
Installation 💻
Installing Annoy is straightforward with pip:
```shell
pip install annoy
```
That's all you need to start using Annoy from Python. (On Windows, installation has in the past required the Visual C++ Build Tools, though this may no longer be the case.)
If you use Conda, Annoy is also available from the conda-forge channel:
```shell
conda install -c conda-forge python-annoy
```
Basic Usage: Building an Index and Searching 🛠️
Let's walk through the basics. The example below builds an index from simple sample data, then searches for the vectors closest to a given item or vector.
```python
import random

import numpy as np
from annoy import AnnoyIndex

# --- Prepare the data ---
f = 3             # vector dimension
num_items = 1000  # number of data points
# Generate random vectors as example data
all_vectors = np.random.rand(num_items, f).astype('float32')

# --- Build the Annoy index ---
# 1. Create an AnnoyIndex with the dimension (f) and the distance metric
#    metric can be 'angular', 'euclidean', 'manhattan', 'hamming', or 'dot'
t = AnnoyIndex(f, 'angular')

# 2. Add items (vectors)
#    add_item(i, vector) stores `vector` under the integer item ID `i`
for i in range(num_items):
    t.add_item(i, all_vectors[i])

# 3. Build the index
#    build(n_trees) builds a forest of n_trees trees. More trees give better
#    accuracy at the cost of a longer build time and a larger index.
#    build(n_trees, n_jobs=-1) builds with all CPU cores (n_jobs=-1 is the default)
num_trees = 10
t.build(num_trees)

# 4. (Optional) Save the index to a file
index_file = 'my_index.ann'
t.save(index_file)
print(f"Saved the index to {index_file}.")

# --- Load the index and search ---
# 1. Load the index (e.g. in another process, or later on)
#    The dimension and metric must match the ones used at build time
u = AnnoyIndex(f, 'angular')
#    load(fn, prefault=True) reads the whole file into memory up front,
#    which can speed up searches in some cases
u.load(index_file)
print(f"Loaded the index from {index_file}.")

# 2. Find the items closest to a given item
#    get_nns_by_item(item_id, n, search_k=-1, include_distances=False)
#    item_id: ID of the item to search from
#    n: number of neighbors to return
#    search_k: number of nodes to inspect; -1 (the default) means n_trees * n.
#              Larger values are more accurate but slower.
#    include_distances: if True, the distances are returned as well
query_item_id = 0
num_neighbors = 5
nearest_neighbors_ids = u.get_nns_by_item(query_item_id, num_neighbors)
print(f"The {num_neighbors} items closest to item {query_item_id}: {nearest_neighbors_ids}")

# 3. Find the items closest to a given vector
#    get_nns_by_vector(vector, n, search_k=-1, include_distances=False)
#    vector: the query vector
query_vector = all_vectors[random.randint(0, num_items - 1)]
nearest_neighbors_ids_vec, nearest_neighbors_distances = u.get_nns_by_vector(
    query_vector, num_neighbors, include_distances=True
)
print(f"The {num_neighbors} items closest to the query vector: {nearest_neighbors_ids_vec}")
print(f"Distances to the query vector: {nearest_neighbors_distances}")

# 4. Get an item's vector
#    get_item_vector(item_id)
item_vector = u.get_item_vector(nearest_neighbors_ids_vec[0])
# print(f"Vector of item {nearest_neighbors_ids_vec[0]}: {item_vector}")  # output omitted

# 5. Get the number of items in the index
#    get_n_items()
print(f"Number of items in the index: {u.get_n_items()}")

# 6. (Optional) Release resources
#    unbuild() removes the trees so that more items can be added before calling
#    build() again; it only works on an in-memory index, not a loaded one.
#    unload() unmaps the index file from memory.
# u.unload()
```
As you can see, approximate nearest neighbor search takes only a few simple steps.
Parameter Tuning: The Accuracy/Speed Trade-off
To get the best performance out of Annoy, you need to understand a few key parameters and set them appropriately. The two main ones are:
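| Parameter | Description | Effect |
|---|---|---|
| `n_trees` (build time) | Number of random projection trees to build. | Larger values give more accurate results but increase build time and index size. |
| `search_k` (query time) | Maximum number of nodes (candidates) inspected during a search. Note that it does not control how many trees are searched. Defaults to `n_trees * n`, where `n` is the number of neighbors requested; setting it explicitly is recommended. | Larger values give more accurate results but make each query slower. |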
Tune these parameters to your application's requirements: the accuracy you need, the query latency you can tolerate, the memory available, and so on. In general, accuracy improves when you increase `n_trees` and raise `search_k` along with it (or by more). The best values are found experimentally on your actual dataset and use case. Benchmark suites such as ann-benchmarks are also useful for comparing Annoy with other libraries.
Use Cases for Annoy
Thanks to these characteristics, Annoy is used in a wide range of fields:
- Recommendation systems: Music recommendation at Spotify is the canonical example. Users and items (tracks, artists, playlists, and so on) are turned into vectors, and items with similar vectors are recommended. The same idea applies to similar-product recommendations on e-commerce sites.
- Similar image search: Image features are extracted as vectors so that similar images can be found quickly, for example in product image search for online stores or lookups in image databases.
- Natural language processing: With word or document embeddings (Word2Vec, Doc2Vec, etc.), Annoy can find semantically similar words or documents. (Libraries such as Gensim can also use Annoy as an indexer.)
- Preprocessing for clustering: Finding neighbors quickly in large datasets can make clustering algorithms more efficient.
- Anomaly detection: Annoy can support approaches that use neighbor search to efficiently detect points lying far from the distribution of normal data.
In short, Annoy's speed and memory efficiency make it a strong candidate whenever you need similarity search in a high-dimensional vector space.
Pros and Cons of Annoy
Pros
- Fast search: Approximate search makes queries fast even on large datasets.
- Memory efficiency: Memory-mapped files keep memory usage low, and the index files are relatively small.
- Sharing and persistence: A built index can be saved to a file, distributed, and shared.
- Simple API: Easy to use from Python.
- Multiple distance metrics: You can pick the metric that fits your use case.
- Building on disk: Data that does not fit in memory can still be indexed.
泚æç¹ (Cons)
- â ïž è¿äŒŒæ¢çŽ¢: å³å¯ãªæè¿åãåŸãããä¿èšŒã¯ãªãã粟床ãšé床ã¯ãã¬ãŒããªãã
- â ïž éçãªã€ã³ããã¯ã¹: äžåºŠã€ã³ããã¯ã¹ãæ§ç¯ãããšãåŸããã¢ã€ãã ãè¿œå ã»åé€ããããšã¯ã§ããªãïŒã€ã³ããã¯ã¹å šäœãåæ§ç¯ããå¿ èŠãããïŒãé »ç¹ãªæŽæ°ãå¿ èŠãªåçããŒã¿ã»ããã«ã¯äžåãã
- â ïž ãã«ãæé: é«ã粟床ãæ±ããŠ
n_trees
ãå¢ãããšãã€ã³ããã¯ã¹ã®ãã«ãæéãé·ããªãå¯èœæ§ãããã - â ïž äœæ¬¡å ããŒã¿: éåžžã«äœæ¬¡å ïŒäŸ: 20次å 以äžïŒã®ããŒã¿ã§ã¯ãä»ã®ã¢ã«ãŽãªãºã ïŒäŸ: HNSWlibãªã©ã®ã°ã©ãããŒã¹ã®ææ³ïŒã®æ¹ãå¹ççãªå ŽåããããAnnoyã¯ã©ã³ãã å°åœ±ãå¹æçãªé«æ¬¡å 空éã«æé©åãããŠããã
- â ïž GPUé察å¿: CPUããŒã¹ã®èšç®ã®ã¿ã§ãGPUã¢ã¯ã»ã©ã¬ãŒã·ã§ã³ã¯ãµããŒããããŠããªããè¶ å€§èŠæš¡ããŒã¿ã§ã®ãããªãé«éåã«ã¯FaissãScaNNãªã©ã®éžæè¢ãããã
Annoy is a very powerful library, but it is important to understand its characteristics and judge whether it fits your use case. The static nature of the index in particular can be a major constraint.
In October 2023, Spotify announced Voyager, a new approximate nearest neighbor search library positioned as a successor to Annoy. Voyager is based on hnswlib and is reported to be more than 10 times faster than Annoy at the same accuracy, or more accurate at the same speed. It is also well worth a look.
Summary
Annoy is a fast, memory-efficient library (a C++ core with Python bindings) for approximate nearest neighbor search in high-dimensional vector spaces. Developed at Spotify, it is a strong choice for similarity search over large, static datasets.
- Random projection trees and memory-mapped files deliver both speed and memory efficiency.
- Indexes can be saved to files and shared.
- Tuning `n_trees` and `search_k` controls the accuracy/speed trade-off.
- It applies to a wide range of tasks, including recommendations, image search, and NLP.
- However, the index is static, so it is ill-suited to frequently changing data, and there is no GPU support.
Understanding how Annoy works and how to use it may help you solve similarity search problems in your own projects. Give it a try and experience its speed and ease of use for yourself!
Official repository: https://github.com/spotify/annoy