Monday, September 9, 2024

Azure Translator API connection from Python

import requests
import uuid  # only needed if the X-ClientTraceId header below is enabled
# Add your key and endpoint
myKey = "XXXXXXXXX"
api_url = "https://XXXXXXXXXX/translate"
api_url1 = "https://XXXXXXXX/detect"
myParams = {
    'api-version': '3.0',
    'scope': 'translation',  # documented for the /languages endpoint, not for /translate
    # 'from': 'en',
    'to': ['fr'],
    # 'includeAlignment': 'true'  # a query parameter, not a header
}
myHeaders = {
    'Ocp-Apim-Subscription-Key': myKey,
    'Content-Type': 'application/json; charset=UTF-8',
    # 'X-ClientTraceId': str(uuid.uuid4())
}

ztext = 'pH value (10% dry substance  DIN 19268                                                                   '
ztext_copy = ztext
words = ztext.split("  ")
translated = []
cnt = 0
word_len = 0
endIndex, startIndex = 0, 0
ztext1 = ""
extra = ""

fromText = [{'Text': ztext}]
APIRequest = requests.post(api_url1, params=myParams, headers=myHeaders, json=fromText)
APIResponse = APIRequest.json()
language = APIResponse[0]["language"]
print("Lang:",language)

if language == "de":
    print("ok")
local_var = ""
leng = 0
for i,word in enumerate(words):
    if word:
        startIndex += (word_len + cnt)
        # print("start index:",startIndex)
        sp = startIndex - endIndex
        # lc = local_var[word_len:sp]
        if sp>0:
            ztext1 +=extra[:sp-1]
            leng = len(extra[:sp-1])
        word_len = len(word)
        endIndex = startIndex + word_len
        # print(word_len)
        cnt =2
        # print("end index: ", endIndex)
        fromText = [{'Text': word}]
        APIRequest = requests.post(api_url, params=myParams, headers=myHeaders, json=fromText)
        APIResponse = APIRequest.json()
        #Getting only language text
        translation = APIResponse[0]["translations"][0]["text"]
        translation = translation.replace("&lt;","<").replace("&gt;",">")
        # translation = translation.replace("!", " ")
        # Return the translation
        translated.append(translation)
        # print(translation)
        # t = ztext[startIndex:endIndex].replace(" ","*")
        # print("index printinting: ",t)
        trsLen = len(translation)
        # print("trslen:",word_len, trsLen)
        # ex_len = 0
        # if trsLen > word_len:
        #     ex_len = trsLen - word_len
        if trsLen < word_len:
            addSpace = word_len - trsLen
            translation = translation + " " * addSpace
        print("Start and end:",startIndex, endIndex)
        # print("____start val:",ztext[:startIndex])
        print(translation)
        # print("end val___:",ztext[(endIndex+ex_len):])
        # ztext =ztext[:startIndex]+ translation + ztext[(endIndex+ex_len):]
        dif = endIndex - startIndex
        ztext1 +=(sp - leng)*" "
        ztext1 +=translation[:word_len]
        extra = translation[word_len:]
        # print(ztext)
        # print(ztext_copy)
        cnt =2
    else:
        cnt +=2
   
print(ztext)
print(ztext1)
print(ztext_copy)
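
The column bookkeeping in the script above can be hard to follow. As a rough sketch (not the original code), the same idea can be written as a helper that accepts any translation function: each run of text separated by two or more spaces is translated on its own and written back at its original start column when there is room. The names translate_keep_columns and fake_translate are hypothetical; in practice the callable would wrap the requests.post call to the /translate endpoint.

```python
import re

# A "segment" is a run of words separated by single spaces;
# two or more spaces are treated as column padding.
SEGMENT = re.compile(r"\S+(?: \S+)*")


def translate_keep_columns(line, translate):
    """Translate each segment of a fixed-width line, keeping start columns when possible.

    `translate` is any callable str -> str (hypothetical here; it could wrap
    the Translator request shown above).
    """
    out = ""
    for match in SEGMENT.finditer(line):
        has_room = not out or len(out) + 2 <= match.start()
        if has_room:
            # Pad with spaces up to the segment's original start column.
            out = out.ljust(match.start())
        else:
            # The previous translation overflowed its column: keep a two-space gap.
            out += "  "
        out += translate(match.group())
    # Preserve the original line width when the result is shorter.
    return out.ljust(len(line))


def fake_translate(text):
    # Stand-in for the real API call so the sketch runs offline.
    return text.upper()


if __name__ == "__main__":
    line = "pH value  DIN 19268     "
    print(repr(translate_keep_columns(line, fake_translate)))
```

Passing the translation step in as a function keeps the layout logic testable without a subscription key.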

Friday, September 2, 2022

How to select the top-N rows per group with SQL in Oracle Database

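with rws as (
  select o.*,
         row_number () over (
           partition by t_gruppe
           -- ordering by the partition column itself makes the pick arbitrary;
           -- order by another column for a deterministic top 3
           order by t_gruppe
         ) rn
  from   qztuser o
)
select q_user, t_gruppe, lzudat
from   rws
where  lzudat is null and rn <= 3
order  by t_gruppe;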
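
To try the pattern without an Oracle instance, here is a small sketch that runs the same row_number() query against an in-memory SQLite database from Python (SQLite 3.25 or newer is assumed for window-function support). The rows are made-up sample data, and the window is ordered by q_user so that the three rows kept per group are deterministic; which column to order by depends on what "top" means for your data.

```python
import sqlite3

# Made-up sample rows: (q_user, t_gruppe, lzudat)
ROWS = [
    ("anna", "A", None), ("bert", "A", None), ("cleo", "A", None),
    ("dora", "A", None), ("emil", "B", None), ("fred", "B", "2022-01-01"),
]

QUERY = """
with rws as (
  select o.*,
         row_number() over (partition by t_gruppe order by q_user) rn
  from   qztuser o
)
select q_user, t_gruppe, lzudat
from   rws
where  lzudat is null and rn <= 3
order  by t_gruppe, q_user
"""


def top_rows_per_group():
    """Load the sample rows into SQLite and return the top 3 rows per group."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table qztuser (q_user text, t_gruppe text, lzudat text)")
        conn.executemany("insert into qztuser values (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_rows_per_group():
        print(row)
```

Note that rn is assigned before the lzudat is null filter runs, so a group can return fewer than three rows; move the filter inside the with clause if you want the top three among the null rows only.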


REF: https://blogs.oracle.com/sql/post/how-to-select-the-top-n-rows-per-group-with-sql-in-oracle-database

Friday, August 26, 2022

Installing Apache24 in Eclipse (not Apache Tomcat)

Open the Run menu and select External Tools > External Tools Configurations.

Then click Program, add a new program configuration, and set its location to the Apache server.



Monday, July 18, 2022

Delete/Eradicating those nasty .pyc files

I recently acquired a new development laptop and moved a number of local Git repositories from my old machine to my new machine. In doing so I also changed the folder structure, and when trying to run some code I was presented with this Python error:


import file mismatch:
imported module 'tests.desktop.consumer_pages.test_details_page' has this __file__ attribute:
/Users/bsilverberg/gitRepos/marketplace-tests/tests/desktop/consumer_pages/test_details_page.py
which is not the same as the test file we want to collect:
/Users/bsilverberg/Documents/gitRepos/marketplace-tests/tests/desktop/consumer_pages/test_details_page.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules

This was a symptom of the fact that Python creates .pyc files on my machine when it compiles code. This can result in other nastiness too, as well as cluttering up your machine, so I wanted to both delete all of these files and also prevent Python from doing it in the future. This post contains info on how to do both.

Deleting all .pyc files from a folder

You can use the find command (on OS X and Linux) to locate all of the .pyc files, and then use its delete option to delete them.

The command to find all .pyc files in all folders, starting with the current one is:
find . -name '*.pyc'

If you want to delete all the files found, just add the -delete option:
find . -name '*.pyc' -delete

Obviously, this can be used for any file type that you wish to eradicate, not just .pyc files.
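
If find is not available (on Windows, for example), roughly the same cleanup can be sketched in Python with pathlib. The function name remove_bytecode is just an illustration; it also removes __pycache__ folders, which the error hint above mentions and which is where Python 3 keeps its compiled files.

```python
import shutil
from pathlib import Path


def remove_bytecode(root="."):
    """Delete *.pyc files and __pycache__ folders under root; return how many items were removed."""
    root = Path(root)
    removed = 0
    # Remove whole __pycache__ folders first (Python 3 stores .pyc files there).
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed += 1
    # Then remove any stray .pyc files left next to the sources.
    for pyc in list(root.rglob("*.pyc")):
        pyc.unlink()
        removed += 1
    return removed


if __name__ == "__main__":
    print("removed", remove_bytecode("."), "items")
```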

Preventing Python from writing .pyc files

I don’t like having all of those extra files cluttering my machine, and, in addition to the error I mentioned above, I have from time to time seen other errors related to out-of-date .pyc files.

Another issue that .pyc files can cause is that they can be orphaned, for example if you remove a .py file from your project but the .pyc file remains (which can happen, as one often adds *.pyc to .gitignore). Python can then still pick up the module from the .pyc file via an import, which can lead to difficult-to-diagnose bugs.

For these reasons I want to prevent Python from ever writing those files again. To do this all you have to do is set the environment variable PYTHONDONTWRITEBYTECODE to 1. You can ensure that that variable is set for any bash session that you start by adding the following to your .bash_profile or .bashrc:
export PYTHONDONTWRITEBYTECODE=1
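
To confirm the setting is picked up, one option is to look at sys.dont_write_bytecode, which Python sets to True when the variable is present (or when it is started with -B). A small sketch; the helper name child_skips_bytecode is made up:

```python
import os
import subprocess
import sys


def child_skips_bytecode(value):
    """Start a fresh interpreter with PYTHONDONTWRITEBYTECODE set to `value`
    (or unset if value is None) and report its sys.dont_write_bytecode flag."""
    env = dict(os.environ)
    env.pop("PYTHONDONTWRITEBYTECODE", None)
    if value is not None:
        env["PYTHONDONTWRITEBYTECODE"] = value
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.dont_write_bytecode)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    print("current process:", sys.dont_write_bytecode)
    print("child with the variable set:", child_skips_bytecode("1"))
```

Inside a single script, setting sys.dont_write_bytecode = True before importing other modules has a similar effect for that process only.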

Thursday, July 14, 2022

Read Text from Image with One Line of Python Code

Dealing with images is not a trivial task. To you, as a human, it’s easy to look at something and immediately know what it is you’re looking at. But computers don’t work that way.


Tasks that are too hard for you, like complex arithmetic and math in general, are something a computer chews through without breaking a sweat. But here the exact opposite applies: tasks that are trivial to you, like recognizing whether an image shows a cat or a dog, are really hard for a computer. In a way, we are a perfect match. For now at least.

While image classification and tasks that involve some level of computer vision might require a good bit of code and a solid understanding, reading text from a somewhat well-formatted image turns out to be a one-liner in Python, and it can be applied to so many real-life problems.

And in today’s post, I want to prove that claim. There will be some installation to go through, but it shouldn’t take much time. These are the libraries you’ll need:

  • OpenCV
  • PyTesseract

I don’t want to prolong this intro any further, so why don’t we jump into the good stuff now.

OpenCV

Now, this library will only be used to load the image(s), so you don’t actually need a solid understanding of it beforehand (although it might be helpful, you’ll see why).

According to the official documentation:

OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in the commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to utilize and modify the code.[1]

In a nutshell, you can use OpenCV to do any kind of image transformation; it’s a fairly straightforward library.

If you don’t already have it installed, it’ll be just a single line in terminal:

pip install opencv-python

And that’s pretty much it. It was easy up until this point, but that’s about to change.

PyTesseract

What the heck is this library? Well, according to Wikipedia:

Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License, Version 2.0, and development has been sponsored by Google since 2006.[2]

I’m sure there are more sophisticated libraries available now, but I’ve found that this one works pretty well. Based on my own experience, this library should be able to read text from any image, provided that the font isn’t some bulls*** that even you aren’t able to read.

If it can’t read from your image, spend more time playing around with OpenCV, applying various filters to make the text stand out.

Now the installation is a bit of a pain in the bottom. If you are on Linux, it all boils down to a couple of sudo apt-get commands:

sudo apt-get update
sudo apt-get install tesseract-ocr
sudo apt-get install libtesseract-dev

I’m on Windows, so the process is a bit more tedious.

First, download the 32-bit or 64-bit Tesseract installer for Windows.

The installation itself is straightforward and boils down to clicking Next a couple of times. And yeah, you also need to do a pip installation:

pip install pytesseract

Is that all? Well, no. You still need to tell Python where Tesseract is installed. On Linux machines, I didn’t have to do so, but it’s required on Windows. By default, it’s installed in Program Files.

If you did everything correctly, executing this cell should not yield any error:
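
A minimal sketch of such a cell, assuming the default Windows install folder C:\Program Files\Tesseract-OCR (adjust the path if you picked another one). The helper names are hypothetical; pytesseract.pytesseract.tesseract_cmd is the setting that tells PyTesseract where the executable lives.

```python
import shutil
from pathlib import Path

# Assumed default install location on Windows; change it if you installed elsewhere.
DEFAULT_WINDOWS_PATH = Path(r"C:\Program Files\Tesseract-OCR\tesseract.exe")


def locate_tesseract(default=DEFAULT_WINDOWS_PATH):
    """Return the path to the tesseract executable, or None if it cannot be found."""
    on_path = shutil.which("tesseract")  # typical on Linux after apt-get install
    if on_path:
        return on_path
    return str(default) if Path(default).is_file() else None


def configure_pytesseract():
    """Point PyTesseract at the executable and return the Tesseract version it reports."""
    import pytesseract  # imported here so locate_tesseract works without it

    executable = locate_tesseract()
    if executable is None:
        raise FileNotFoundError("tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = executable
    return pytesseract.get_tesseract_version()
```

Calling configure_pytesseract() once at the top of a notebook should either return a version number or fail with a clear error.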

Is everything good? You may proceed.

Reading the Text

Let’s start with a simple one. I’ve found a couple of royalty-free images that contain some sort of text, and the first one is this:

https://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Wikinews_Breaking_News.png/800px-Wikinews_Breaking_News.png

It should be the easy one, although there is a possibility that Tesseract will read those blue ‘objects’ as brackets. Let’s see what happens:

My claim was true. It’s not a problem though, you could easily address those with some Python magic.
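
As a sketch of what this step can look like (a reconstruction, not the author’s original cell): the read itself is the one-liner pytesseract.image_to_string(cv2.imread(path)), and strip_stray_brackets is one hypothetical way to drop bracket characters like the ones mentioned above. Both function names are assumptions.

```python
import re


def read_text(image_path):
    """Read text from an image file: load it with OpenCV, hand it to Tesseract."""
    # Imported here so the clean-up helper below works without OpenCV installed.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising on a bad path
        raise FileNotFoundError(image_path)
    return pytesseract.image_to_string(image)


def strip_stray_brackets(text):
    """Remove bracket-like characters and tidy the leftover whitespace."""
    without_brackets = re.sub(r"[\[\](){}<>|]", " ", text)
    lines = (" ".join(line.split()) for line in without_brackets.splitlines())
    return "\n".join(line for line in lines if line)
```

With a local copy of the image saved under a name of your choice, the two would be combined as strip_stray_brackets(read_text("your_image.png")). Keep in mind that this clean-up also removes brackets that genuinely belong to the text.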

The next one could be more tricky:

https://live.staticflickr.com/7892/46879778504_3b11c328b0_b.jpg

I hope it won’t detect that ‘B’ on the coin:

Looks like it works perfectly.

Now it’s up to you to apply this to your own problem. OpenCV skills could be of vital importance here if the text blends with the background.

Before you leave

Reading text from an image is a pretty difficult task for a computer to perform. Think about it: the computer doesn’t know what a letter is, it only works with numbers. What happens under the hood might seem like a black box at first, but I encourage you to investigate further if this is your area of interest.

I’m not saying that PyTesseract will work perfectly every time, but I’ve found it good enough even on some trickier images. But not straight out of the box. Some image manipulation is required to make the text stand out.

It’s a complex topic, I know. Take it one day at a time. One day it will be second nature to you.

Image Processing: How to read an image from a string in Python?

Reading an image from a base64 string


🚩 In this tutorial we are going to investigate together how to read an image from a base64 string in Python.

To process images in Python, some modules are needed. We are going to use PIL to display the image here.

Let's code it!

To get an image as a string, we first need to convert it to base64 format. Python's base64 module is used for this.

Let's start by importing the required modules.

👉

   import base64
   import io
   from PIL import Image

That's it!
So, first we are going to convert the image to base64 using Python. Let's do it!

👉

   def read_string():
      with open("tux.jpg", "rb") as image:
          image_string = base64.b64encode(image.read())

      return image_string

I have used a local image, "tux.jpg", but you can use any file in a valid image format. Let me explain the code above.

Step 1: I have defined the function read_string() and opened the image in rb (read binary) mode. The variable image_string inside the function holds the base64 string.

👉

   if __name__ == "__main__":
       base64_string = read_string()
       print(base64_string)

When I call the function from the main block, we get output like this:

   b'/9j/4AAQSkZJRgABAQAAAQABAAD/7QBsUGhvdG9zaG9wIDMuMAA4QklNBAQAAAAAAE8cAVoAAxslRxwCAAACAAAcAnQAO8KpIFNvZmlhWW91c2hpIC0gaHR0cDovL3d3dy5yZWRidWJibGUuY29tL3Blb3BsZS9zb2ZpYXlvdXNoAP/bAEMAAwICAwICAwMDAwQDAwQFCAUFBAQFCgcHBggMCgwMCwoLCw0OEhANDhEOCwsQ

....

/5W09f+V+gfFfqHxX6h8V+ofFfqHxX6h8V+ofFfvHxX7x8V+8fFfvHxU6Y9FblW4A0P8Aj2//2Q=='

It's long, as you can see, but that is not a problem. It's not too much data for a program to process!
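
The b'...' wrapper in the output shows that b64encode returns bytes, not str. If the string has to go into JSON or a text file, it can be decoded to ASCII text first, and going back is symmetrical. A small sketch with hypothetical helper names:

```python
import base64


def to_base64_text(data: bytes) -> str:
    """Encode raw bytes as a base64 str (no b'' wrapper)."""
    return base64.b64encode(data).decode("ascii")


def from_base64_text(text: str) -> bytes:
    """Decode a base64 str back into the original bytes."""
    return base64.b64decode(text)


if __name__ == "__main__":
    jpeg_magic = b"\xff\xd8\xff"  # the first three bytes of a JPEG file
    encoded = to_base64_text(jpeg_magic)
    print(encoded)  # /9j/
    assert from_base64_text(encoded) == jpeg_magic
```

This is also why the output above starts with /9j/: those characters are the base64 form of the JPEG file signature.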

Now let's go the other way and decode the base64 string back into an image.

👉

   def decode_base64():
       base64_string = read_string()
       decoded_string = io.BytesIO(base64.b64decode(base64_string))
       img = Image.open(decoded_string)
       return img.show()

I call the other function inside this one to get the image string. The base64_string variable holds the image in base64 format; it is then decoded and wrapped in a BytesIO object. Finally, I use Image.open() from PIL to load the image and show() to display it.

👉

   if __name__ == "__main__":
       decode_base64()

When I call the function decode_base64(), the image opens.


The entire code is shared below:
👉

import base64
import io
from PIL import Image


def read_string():
    with open("tux.jpg", "rb") as image:
        image_string = base64.b64encode(image.read())

    return image_string

def decode_base64():

    base64_string = read_string()
    decoded_string = io.BytesIO(base64.b64decode(base64_string))
    img = Image.open(decoded_string)
    return img.show()



if __name__ == "__main__":
    base64_string = read_string()
    print(base64_string)

    decode_base64()

REFERENCE: https://dev.to/bl4ckst0n3/image-processing-how-to-read-image-from-string-in-python-pf8