NodeJS: Indicate data writing is finished (stdin) - python

I'm trying to signal from my main Node.js process that I've finished writing to a child Python process's stdin, but nothing works:
pythonProcess.stdin.write(base64Image); // What I'm writing
// None of these worked
pythonProcess.stdin.write("\x04");
pythonProcess.stdin.write("\n");
pythonProcess.stdin.write("\r\n");
pythonProcess.stdin.write(os.EOL);
Only calling pythonProcess.stdin.end() works (I also tried cork() and uncork(), but they didn't help either). Since this is a process I'll be querying often, I want to avoid closing and restarting it every time. What am I doing wrong?
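For what it's worth, writing "\x04" down a pipe only sends a literal byte; the EOT character is turned into end-of-file by a terminal, not by a pipe, so the Python side never sees EOF until stdin is actually closed. A common workaround is to frame each message yourself, e.g. with a trailing newline, and have the long-lived Python process read one message per line. A minimal sketch of the Python side, assuming a newline-delimited protocol and a hypothetical handle_image() function (base64 data contains no newlines, so the framing is safe):

import sys

# The Node side would send one request per line:
# pythonProcess.stdin.write(base64Image + "\n");
for line in sys.stdin:
    message = line.rstrip("\n")
    if not message:
        continue
    result = handle_image(message)  # hypothetical processing function
    # Reply with one line per request and flush so Node's 'data' callback
    # fires immediately.
    sys.stdout.write(result + "\n")
    sys.stdout.flush()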

Why does os.remove() raise a PermissionError?

On a Windows 7 platform I'm using Python 3.6 to start worker processes (written in C).
The processes are started with subprocess.Popen. The following shows the relevant code (one thread per process to be started):
redirstream = open(redirfilename, "w")
proc = subprocess.Popen(batchargs, shell=False, stdout=redirstream)
outs, errs = proc.communicate(timeout=60)
# wait for job to be finished
ret = proc.wait()
...
if ret == 0:  # changed !!
    redirstream.flush()
    redirstream.close()
    os.remove(redirfilename)
communicate is only used so that the executable can be terminated after 60 seconds in case it hangs. redirstream is used to write the output from the executable (written in C) to a file, for general debugging purposes (not related to this issue). Of course, every process is passed a redirfile with a different filename.
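For reference, the usual pattern for enforcing such a timeout is to catch subprocess.TimeoutExpired, kill the process, and call communicate() again to reap it; a minimal sketch, reusing batchargs and redirstream from the snippet above:

import subprocess

proc = subprocess.Popen(batchargs, shell=False, stdout=redirstream)
try:
    outs, errs = proc.communicate(timeout=60)
except subprocess.TimeoutExpired:
    # The executable hung: kill it, then collect whatever is left.
    proc.kill()
    outs, errs = proc.communicate()
ret = proc.returncode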
Up to ten such subprocesses are started this way from independent Python threads.
Although it works, I made a mysterious observation:
When an executable has finished without errors, I want to delete redirfilename, because it is not needed anymore.
Now let's say I have started processes A, B and C.
Processes A and B have finished and returned 0.
Process C, however, intentionally gets no data (just for testing, a serial connection has been disconnected) and waits for input from a named pipe (created from Python) using the Windows ReadFile function:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365467(v=vs.85).aspx
In that case, while C is still waiting for ReadFile to finish, os.remove(redirfilename) for A and B sometimes throws a PermissionError, saying that the file is still in use by another process. But in Task Manager I can see that processes A and B no longer exist (as expected).
I tried to catch the PermissionError and repeat the delete after some delay. Only after C has terminated (timeout after 60 seconds) can the redirfile for A or B be deleted.
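For illustration, the retry I described looks roughly like this (a sketch; the delay and retry count are arbitrary):

import os
import time

# Retry the delete with a short delay between attempts; in my case it
# still only succeeded once process C had terminated.
for attempt in range(10):
    try:
        os.remove(redirfilename)
        break
    except PermissionError:
        time.sleep(1)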
Why is the redirstream still blocked and somehow in use although the process behind it is no longer alive, and why is it blocked by ReadFile() in a completely unrelated process that never touches that particular file? Is this an issue in Python or in my implementation?
Any hints are highly appreciated...

Python Multi Process has missing process and completes a join when it shouldn't have

I am facing a pretty odd issue. I have multiprocess Python code that processes some data in parallel. I split the data into 8 parts and work on each split individually using a Process class, then I join() each Process.
I just noticed that when I process a large amount of data, one of the processes... disappears. It doesn't error out or raise an exception; it just goes missing. What is even more interesting is that the join() on that process seems to complete successfully, even though I know for a fact it did not finish.
tn1_processes = []
for i in range(8):
    tn1_processes.append(
        MyCustomProcess(logger=self.logger, i=i,
                        shared_queue=shared_queue))
    tn1_processes[-1].start()
for tn1_processor in tn1_processes:
    tn1_processor.join()
print('Done')
What do I know for sure:
All processes start, process data, and get about halfway through; I know this because I have logs that show all of them doing their work.
Then process 1 disappears from the logs towards the end of its job, while all the others keep working and complete fine. My code then moves on past the joins, thinking all the processes are complete (I demonstrate this with a print), even though I know for a fact that one of them did not complete; it did not error out, and for some strange reason it passed the join().
The only thing I can think of is that the process runs out of memory, but I would expect it to error out or throw an exception if that happened. It has actually happened to me before with the same code; I saw the exception in my logs and the code was able to handle it and see that the process failed. But this time there is no error or anything, which is strange.
Can anyone shed some light?
Using Python3.4
If I remember correctly, when a process terminates abruptly it doesn't raise an error in the parent; you need another queue for storing thrown exceptions and handling them elsewhere.
When a process ends, however, an exit code is available: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process.exitcode
A rudimentary check would be making sure all of them exited safely (0 as exit code; a negative value indicates a termination signal, and None means still running).
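Something like this, for instance (a rough sketch, reusing the tn1_processes list from the question):

for p in tn1_processes:
    p.join()
    if p.exitcode == 0:
        continue  # exited cleanly
    elif p.exitcode is not None and p.exitcode < 0:
        # Killed by a signal; -9 (SIGKILL) is what you typically see
        # when the Linux OOM killer steps in.
        print('{} was killed by signal {}'.format(p.name, -p.exitcode))
    else:
        print('{} exited with code {}'.format(p.name, p.exitcode))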
The issue was that Python was running out of memory. The only way I knew this was that I monitored the machine's memory usage while the code was running; it needed more space than was available, so one of the processes was simply killed with no errors or exceptions. j4hangir's answer on how to avoid this is good: I need to check the exit code. I haven't tested this yet, but I will and then update.

Express closes the request when spawned Python script sleeps

Original problem
I am creating an API using Express that queries a SQLite DB and outputs the result as a PDF using the html-pdf module.
The problem is that certain queries might take a long time to process, so I would like to decouple the actual query call from the Node server where Express is running; otherwise the API might slow down when several clients are running heavy queries.
My idea was to move the execution of the SQLite query into a Python script. This script can then be called from the API, thus avoiding using Node to query the DB.
Current problem
After quickly creating a Python script that runs a SQLite query and calling it from my API using child_process.spawn(), I found that Express seems to receive an exit code as soon as the Python script starts executing the query.
To confirm this, I created a simple Python script that just sleeps between printing two messages, and the problem was isolated.
To reproduce this behavior you can create a Python script like this:
print("test 1")
sleep(1)
print("test 2)
Then call it from Express like this:
router.get('/async', function(req, res, next) {
  var python = child_process.spawn(
    'python3',
    ['test.py']  // placeholder path for the script above
  );
  var output = "";
  python.stdout.on('data', function(data){
    output += data;
    console.log(output);
  });
  python.on('close', function(code){
    if (code !== 0) {
      return res.status(200).send(code);
    }
    return res.status(200).send(output);
  });
});
If you then run the Express server and do a GET /async, you will get a "1" as the exit code.
However, if you comment out the sleep(1) line, the server successfully returns
test 1
test 2
as the response.
You can even trigger this using sleep(0).
I have tried flushing stdout before the sleep, piping the result instead of using .on('close'), and using the -u option when calling python (to get unbuffered streams).
None of this has worked, so I'm guessing there's some mechanism baked into Express that closes the request as soon as the spawned process sleeps OR finishes (instead of only when it finishes).
I also found this answer related to using child_process.fork(), but I'm not sure whether that would behave differently, and this other question is very similar to my issue but has no answer.
Main question
So my question is: why does the Python script exit when it reaches sleep() (or, in the case of my query script, when running cursor.execute(query))?
If my supposition is correct that Express closes the request when a spawned process sleeps, is this avoidable?
One potential solution I found suggested using ZeroRPC, but I don't see how that would make Express keep the connection open.
The only other option I can think of is using something like Kue, so that my Express API only needs to respond with some sort of job ID; Kue would then actually spawn the Python script and wait for its response, and I could query the result via some other API endpoint.
Is there something I'm missing?
Edit:
AllTheTime's comment is correct regarding the sleep issue. After I added from time import sleep it worked. However, my SQLite script is not working yet.
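For reference, the working version of the test script is simply the snippet above with the missing import added:

from time import sleep

print("test 1")
sleep(1)
print("test 2")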
As it turns out AllTheTime was indeed correct.
The problem was that in my Python script I was loading a config.json file, which loaded correctly when the script was called from the console because the path was relative to the script's directory.
However, when calling it from Node, the relative path was no longer correct.
After fixing the path, it worked exactly as expected.
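A common way to make that robust regardless of which working directory Node spawns the script from is to build the path from __file__ instead of relying on the current directory; a small sketch, assuming config.json sits next to the script:

import json
import os

# Resolve config.json relative to this script's location, not the
# process's current working directory.
CONFIG_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "config.json")

with open(CONFIG_PATH) as f:
    config = json.load(f)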

Named pipe is not flushing in Python

I have a named pipe created via os.mkfifo(). Two different Python processes access this named pipe: process A is reading and process B is writing. Process A uses the select function to determine when data is available in the FIFO/pipe. Despite the fact that process B flushes after each write call, process A's select does not always return (it keeps blocking as if there is no new data). After looking into this issue extensively, I finally just programmed process B to add 5KB of garbage writes before and after my real call, and likewise programmed process A to ignore those 5KB. Now everything works fine, and select always returns appropriately. I came to this hack-ish solution after noticing that process A's select would return if process B was killed (after it had written and flushed, it would sleep on a read pipe). Is there a problem with flush in Python for named pipes?
What APIs are you using? os.read() and os.write() don't buffer anything.
To find out if Python's internal buffering is causing your problems, run your scripts with "python -u" instead of "python". This forces Python into unbuffered mode, which causes all output to be written out immediately.
The flush operation is irrelevant for named pipes; the data for named pipes is held strictly in memory, and won't be released until it is read or the FIFO is closed.
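To illustrate the point about unbuffered APIs above: a minimal sketch of a writer that bypasses Python's userspace buffering entirely by using os.open()/os.write() on the FIFO (the path /tmp/myfifo is just an example; create it once with os.mkfifo()):

import os

FIFO_PATH = "/tmp/myfifo"  # example path

# os.open()/os.write() go straight to the kernel, so no userspace buffer
# can hold data back from the reader's select(). Note that opening for
# write blocks until some reader has the FIFO open.
fd = os.open(FIFO_PATH, os.O_WRONLY)
os.write(fd, b"hello from process B\n")
os.close(fd)

On the reading side, select.select([rfd], [], []) on a descriptor opened with os.open(FIFO_PATH, os.O_RDONLY) should then wake up as soon as those bytes arrive.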

Further question on segmenting AJAX responses

This question is related to one I asked previously, see here.
As a way to implement segmented AJAX responses, I created code which does this:
The client first calls the script which initializes the process. On the server side, the startScript.cgi code starts generating data, and as it does this it groups the responses into chunks and writes them into individual files indexed sequentially (chunk1.txt, chunk2.txt, etc.). Immediately after startScript.cgi starts this process, the client begins a second AJAX request, sent to gatherOutput.cgi with the parameter ?index=0.
gatherOutput.cgi sees the request, looks in 'chunk'.$index.'.txt' and returns the data. The client writes this to the HTML, then begins another AJAX request to gatherOutput.cgi with ?index=1, and so on. This continues until all of the data from startScript.cgi has been reported.
If gatherOutput.cgi cannot locate "chunk$index.txt", it goes into this loop:
# Busy-wait until the chunk file appears
until (-e "$directory/chunk$index.txt")
{
    # nothing
}
open $fh, "<$directory/chunk$index.txt" or warn "File not found. blah blah";
# Read file and print, etc...
Note: startScript.cgi runs code which may take a long time to complete, so the point is to send out older output from startScript.cgi while it is still generating new output.
The problem is that performance suffers, and output takes a while to come out despite having been created long before. I'm assuming this is because hard-drive access is very slow compared to the CPU operations in startScript.cgi, so gatherOutput.cgi is frequently waiting for the next chunk to be written, or the client is frequently waiting for gatherOutput.cgi to read the files, etc. Though there could be other issues.
Does anyone have ideas or suggestions for fixing this problem? A different approach to the problem would be great as well.
By the way, startScript.cgi may only be called once; it starts a large system task (via system escapes such as exec, system, or backticks) that keeps running and cannot reasonably be segmented.
Your gatherOutput.cgi shouldn't drop into a loop when the file doesn't exist. Instead, return a status to your AJAX request saying that the file doesn't exist yet, and have the client wait (using setInterval or setTimeout) and try again after so many seconds.
That will be MUCH easier on your server. For the user you can still show a loading graphic or something else that lets them know the process is still happening in the background.
