django-pgroonga installation error: "UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 112: character maps to <undefined>"

I need to use Japanese characters with vector searches in Django / Postgres.

I am trying to install django-pgroonga, and I keep getting the same encoding error from cp1252.py:

PS C:\JGRAM\JLPT> pip install django-pgroonga
Collecting django-pgroonga
  Using cached django-pgroonga-0.0.1.tar.gz (3.7 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [10 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\61458\AppData\Local\Temp\pip-install-21w4o7u8\django-pgroonga_87013717bf0e4bcca83db91a993082b4\setup.py", line 17, in <module>
          long_description=read('README.rst'),
        File "C:\Users\61458\AppData\Local\Temp\pip-install-21w4o7u8\django-pgroonga_87013717bf0e4bcca83db91a993082b4\setup.py", line 6, in read     
          return open(os.path.join(os.path.dirname(__file__), fname)).read()
        File "C:\Users\61458\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 23, in decode
          return codecs.charmap_decode(input,self.errors,decoding_table)[0]
      UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 112: character maps to <undefined>
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
PS C:\JGRAM\JLPT> 

Can you help? I can't find anything online that explains how to resolve this error when it occurs during installation. I have tried updating the cp1252.py file, pasting in newer versions of it, etc., but nothing works. I've also tried downloading pgroonga and unzipping it into the Python site-packages folder, with no luck. (Every other package I have installed with pip so far has installed successfully.)

Is the problem pgroonga?

If so, is there another module or tool that can handle vector search over Japanese text?
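The traceback points at the cause. The package's setup.py reads README.rst with `open(...)` and no `encoding=` argument, so Python on Windows decodes the file with the locale code page (cp1252 here). Byte 0x81 has no mapping in cp1252, which is also why editing cp1252.py can't help: the codec is behaving correctly on bytes that aren't cp1252. The sketch below reproduces this with a throwaway file and shows the usual fix, passing `encoding="utf-8"` explicitly. It assumes the README is UTF-8 encoded (likely, but the source doesn't show it), and `read()` here is a standalone illustration modeled on the helper in the traceback, not the package's actual code.

```python
import os
import tempfile


def read(path, encoding="utf-8"):
    """Read a text file with an explicit encoding.

    Without encoding=, open() uses the locale's preferred encoding, which is
    cp1252 on many Windows systems; UTF-8 bytes such as 0x81 then fail.
    """
    with open(path, encoding=encoding) as fh:
        return fh.read()


# Demo data: "の" encodes to UTF-8 as E3 81 AE, which contains byte 0x81.
sample = "日本語のテキスト\n"

with tempfile.TemporaryDirectory() as tmp:
    readme = os.path.join(tmp, "README.rst")
    with open(readme, "w", encoding="utf-8") as fh:
        fh.write(sample)

    # Explicit UTF-8 decoding works regardless of the Windows code page.
    print(read(readme) == sample)

    # Decoding the same bytes as cp1252 mimics the failing setup.py.
    try:
        read(readme, encoding="cp1252")
    except UnicodeDecodeError as exc:
        print("cp1252 fails:", exc.reason)
```

Because pip builds the package in a temporary folder, patching its setup.py isn't practical. A workaround that should get past this particular error is to enable Python's UTF-8 mode (Python 3.7+) for the pip run, e.g. `$env:PYTHONUTF8 = "1"` in PowerShell before `pip install django-pgroonga`. That only fixes the metadata step; it says nothing about whether the package itself is still usable.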

Comments:
2023-01-19 00:55:04
Answering my own question after a lot more research, to save others time before they take the Django-Postgres-PGroonga detour: 1) Django support for PGroonga (the django-pgroonga package) doesn't seem to have been updated since 2016; 2) the pip installation of django_elasticsearch_dsl worked first time, and Python has a suite of Elasticsearch and Kuromoji/Sudachi tools that cover Japanese full-text search (FTS), tokenization, text analysis, etc. There are also plenty of current examples on GitHub and elsewhere that focus specifically on 日本語 (github.com/topics/kuromoji).
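On the Elasticsearch side, Japanese tokenization with Kuromoji is configured in the index settings. As a hedged sketch, the dict below defines a custom analyzer from the Kuromoji tokenizer and token filters (the same components the Elasticsearch documentation lists for its built-in `kuromoji` analyzer) and maps one text field to it. It assumes the analysis-kuromoji plugin is installed on the node (`bin/elasticsearch-plugin install analysis-kuromoji`); the analyzer name `ja_text` and field name `sentence` are hypothetical.

```python
import json

# Index settings for Japanese full-text search with Kuromoji.
# Assumption: the analysis-kuromoji plugin is installed on the Elasticsearch
# node; without it, "kuromoji_tokenizer" and the kuromoji_* filters are
# unknown and index creation is rejected.
# "ja_text" and "sentence" are hypothetical names chosen for this sketch.
JAPANESE_INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                "ja_text": {
                    "type": "custom",
                    # Morphological tokenizer: splits unsegmented Japanese into words.
                    "tokenizer": "kuromoji_tokenizer",
                    "filter": [
                        "kuromoji_baseform",        # inflected verbs/adjectives -> dictionary form
                        "kuromoji_part_of_speech",  # drop particles etc. (default stop tags)
                        "cjk_width",                # fold full-width/half-width variants
                        "ja_stop",                  # remove common Japanese stop words
                        "kuromoji_stemmer",         # trim trailing long-vowel mark from longer katakana words
                        "lowercase",                # normalize any Latin letters
                    ],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            # Text field analyzed with the Japanese analyzer defined above.
            "sentence": {"type": "text", "analyzer": "ja_text"},
        }
    },
}

# JSON body for an index-creation request (PUT /<index-name>).
print(json.dumps(JAPANESE_INDEX_BODY, indent=2))
```

With elasticsearch-dsl or django-elasticsearch-dsl you would normally express the same analyzer through those libraries' own helpers rather than a raw dict; the raw form just makes the moving parts visible.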
2023-01-19 00:55:04
It's now all installed and working in Japanese. You'll need to download Java SE/JDK and Elasticsearch and get them running (set the SSL options in elasticsearch.yml to false), then pip install django-elasticsearch-dsl (which pulls in elasticsearch-dsl) and add it to your Django settings. There's an older YouTube guide at youtube.com/watch?v=cXYVE28igkE (thanks to Samuli Natri, aka Sean Connery), but you'll need to update its code, e.g. DocType becomes Document and Meta becomes Django. More recent and more complete examples are at github.com/elastic/elasticsearch-dsl-py.
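For the "add to settings" step, here is a minimal sketch of the relevant Django settings.py fragment. It assumes a local Elasticsearch node on the default port 9200 with SSL turned off as described above (hence plain http); the apps listed besides `django_elasticsearch_dsl` are just Django's default set, standing in for your project's own apps.

```python
# settings.py (fragment)
# Assumption: Elasticsearch runs locally on port 9200 with SSL disabled.

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    # Provides the search_index management command,
    # e.g. `python manage.py search_index --rebuild`.
    "django_elasticsearch_dsl",
]

# Connection settings read by django-elasticsearch-dsl; "default" is the
# connection alias it uses unless configured otherwise.
ELASTICSEARCH_DSL = {
    "default": {"hosts": "http://localhost:9200"},
}
```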