Introduction to Robotic Process Automation
We all know that a robot can take the place of a human, clicking through an application UI and entering data. But how does it work? Let’s take a deep dive into the Win32 API.
What is a window in the Windows operating system? Technically speaking, it’s a set of parameters describing its look (dimensions, style, and state such as maximized or minimized), plus a message queue and an event loop working on this queue (strictly speaking, the queue belongs to the thread that created the window). Interaction with windows is based on messages instead of function calls. Why? One argument that may convince you is the openness of the API: you can add a new message type without adding a new named function. Another one: it is easier to append a new message to the end of a list than to make a function call, which involves handling the process stack and context switching. In the 90s hardware was not so fast, so UI design had to be careful with resource utilization.
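
To make the message idea concrete, here is a minimal sketch in portable C. It is not Win32 code, and every name in it (`Message`, `MessageQueue`, `post_message`, `handle_message`, `dispatch_all`, the `MSG_*` constants) is hypothetical. It only illustrates the shape of the design: the sender appends to the end of a queue, and the receiver has a single handler that switches on the message ID, so a new message type is a new constant and a new `case`, not a new named function.

```c
#include <stddef.h>

/* Hypothetical message IDs; adding a message type is just adding a constant. */
enum { MSG_MOVE = 1, MSG_KEY_DOWN = 2, MSG_KEY_UP = 3 };

typedef struct {
    unsigned int id;   /* what happened */
    long param;        /* message-specific payload */
} Message;

#define QUEUE_CAPACITY 16

typedef struct {
    Message items[QUEUE_CAPACITY];
    size_t head;    /* index of the oldest queued message */
    size_t count;   /* number of queued messages */
} MessageQueue;

/* Appends a message at the end of the queue. Returns 0 if the queue is full. */
int post_message(MessageQueue *q, unsigned int id, long param)
{
    if (q->count == QUEUE_CAPACITY)
        return 0;
    size_t tail = (q->head + q->count) % QUEUE_CAPACITY;
    q->items[tail].id = id;
    q->items[tail].param = param;
    q->count++;
    return 1;
}

/* Single entry point of the receiver. Returns 1 if the message was handled,
   0 if it was ignored because this receiver does not know it. */
int handle_message(const Message *m, long *key_presses)
{
    switch (m->id) {
    case MSG_KEY_DOWN:
        (*key_presses)++;
        return 1;
    case MSG_KEY_UP:
        return 1;
    default:
        return 0;
    }
}

/* The event loop: drains the queue in FIFO order.
   Returns the number of messages that were handled. */
size_t dispatch_all(MessageQueue *q, long *key_presses)
{
    size_t handled = 0;
    while (q->count > 0) {
        Message m = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAPACITY;
        q->count--;
        handled += (size_t)handle_message(&m, key_presses);
    }
    return handled;
}
```

Posting a key down, a key up, a move and an unknown ID, then calling `dispatch_all`, handles the two key messages and silently drops the other two.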

A GUI exe file is loaded from disk into memory and its WinMain function is executed. It registers a unique window class name and creates a window identified by a handle (the HWND type is basically a void pointer, see: https://docs.microsoft.com/en-us/windows/win32/winprog/windows-data-types). The window must be made visible, and then we can enter the loop that receives and dispatches messages.


If we add a single line of code to the window procedure that prints the message type with its parameters, we will see a lot of things happening in the console. Every mouse move or keyboard action generates a message (pressing a key actually generates 2 events: key down and key up). Moving or minimizing a window also generates an event, seen as a UINT message with WPARAM and LPARAM. What is more, a low battery state on your laptop, logging out, or shutting down generates a system message as well (for example WM_POWERBROADCAST for power events, or WM_QUERYENDSESSION when the session is about to end). Isn’t it cool? You don’t need to register anywhere or poll for state information: such system events are delivered directly to your window procedure. If you don’t handle a message, you pass it on to DefWindowProc, which applies the default handling, and for most messages that means it is simply ignored. Now think about it: every user action triggers a de facto storm of events. Handling each of them with a synchronous function call would kill a graphical operating system.

Now, how does this connect with RPA? An RPA implementation on Windows uses the Win32 API and messages. The robot generates messages about mouse clicks, pressed keys, typed text, etc. In the picture you can see a simple Win32 C/C++ application that calls CreateProcess with notepad.exe as one of its arguments, finds the editor window of this process, and then sets the text inside Notepad using SendMessage with WM_SETTEXT. The Win32 API is quite powerful, and using it we can emulate a human end user.
You can learn more about windows in Windows from the Microsoft documentation:
- https://docs.microsoft.com/en-us/windows/win32/winmsg/about-messages-and-message-queues
- https://docs.microsoft.com/en-us/windows/win32/learnwin32/window-messages
The event loop is a common pattern in graphical operating systems (see: https://en.wikipedia.org/wiki/Event_loop). If you would like to learn about low-level GUI programming on Linux, you can do it here: https://en.wikibooks.org/wiki/X_Window_Programming/Xlib. Historical Apple documentation is here: https://developer.apple.com/library/archive/documentation/General/Devpedia-CocoaApp-MOSX/MainEventLoop.html.