Create a Remote Desktop Viewer using C# and WCF

I have wanted to create some code that utilizes a Windows Communication Foundation (WCF) service for quite some time. This post introduces a “remote desktop” service. The purpose is to create something that may have a use later and at the same time keep it simple enough to be a learning platform. In no way do I envision this being a viable competitor to Microsoft Remote Desktop, VNC, or the many other mature technologies. The goal of the project is to be able to use a client (WinForm in this case) to remotely view (and maybe later control) the desktop of a computer hosting the WCF service.

Capturing Desktop Activity

Capturing the activity on the desktop is fairly straightforward. However, if you want to optimize the performance then you must use some tricks. A key performance metric for remote desktop applications is the remote screen refresh rate. Two areas that I will concentrate on to increase refresh rates are:

  • The algorithm used to generate the screen capture must be fast.
  • The amount of data shuttled across the network must be minimized.

The following class diagram shows my screen capture class:

image

This class exposes two public methods: Cursor and Screen. I want to be able to capture both the desktop surface and the mouse as it moves over the desktop. I have separated the methods to capture this data. It seems reasonable that I would be able to get a better user experience by refreshing the cursor faster and the screen a bit slower. So this class allows me to run different refresh rates for those two.

Here is the code for capturing the cursor and the screen:

public Bitmap Screen(ref Rectangle bounds)
{
    // Capture a new screenshot.
    //
    _newBitmap = CaptureScreen.CaptureDesktop();

    // If we have a previous screenshot, only send back
    //    a subset that is the minimum rectangular area
    //    that encompasses all the changed pixels.
    //
    if (_prevBitmap != null)
    {
        // Get the bounding box.
        //
        bounds = GetBoundingBoxForChanges();
        if (bounds == Rectangle.Empty)
        {
            // Nothing has changed.
            //
            return null;
        }

        // Get the minimum rectangular area
        //
        Bitmap diff = new Bitmap(bounds.Width, bounds.Height);
        using (Graphics g = Graphics.FromImage(diff))
        {
            g.DrawImage(_newBitmap, 0, 0, bounds, GraphicsUnit.Pixel);
        }

        // Set the current bitmap as the previous to prepare
        //    for the next screen capture.
        //
        _prevBitmap = _newBitmap;

        return diff;
    }
    // We don't have a previous screen capture. Therefore
    //    we need to send back the whole screen this time.
    //
    else
    {
        // Set the previous bitmap to the current to prepare
        //    for the next screen capture.
        //
        _prevBitmap = _newBitmap;

        // Create a bounding rectangle.
        //
        bounds = new Rectangle(0, 0, _newBitmap.Width, _newBitmap.Height);

        return _newBitmap;
    }
}

public Bitmap Cursor(ref int cursorX, ref int cursorY)
{
    if (_newBitmap == null)
    {
        return null;
    }
    else
    {
        Bitmap img = CaptureScreen.CaptureCursor(ref cursorX, ref cursorY);
        if (img!=null && cursorX < _newBitmap.Width && cursorY < _newBitmap.Height)
        {
            return img;
        }
        else
        {
            return null;
        }
    }
}

First I must tell you that I am using Rashid Mahmood’s code to capture the actual bitmaps of the screen and cursor. Rashid’s code can be found here. He provides wrappers around a number of Win32 APIs. For performance, I want to stay as close to the operating system as possible on these calls and minimize the number of wrappers. All the static methods in the CaptureScreen class are provided by Rashid. Please check out the link for more information.

To minimize the amount of network traffic, I only want to return data when pixels have changed. To accomplish this, the Screen method keeps the previously captured screenshot for comparison. It then uses a “GetBoundingBoxForChanges” method to determine the minimal rectangle that encompasses all the changed pixels. This rectangle is then used to generate a smaller bitmap, a portion of the full screen, for transfer over the wire. I searched for a Win32 API that returns this bounding box, but did not find one.

The “GetBoundingBoxForChanges” routine is shown below. It is fairly well commented. The algorithm uses a few tricks to increase performance.

private Rectangle GetBoundingBoxForChanges()
{
    // The search algorithm starts by looking
    //    for the top and left bounds. The search
    //    starts in the upper-left corner and scans
    //    left to right and then top to bottom. It uses
    //    an adaptive approach on the pixels it
    //    searches. Another pass looks for the
    //    lower and right bounds. The search starts
    //    in the lower-right corner and scans right
    //    to left and then bottom to top. Again, an
    //    adaptive approach on the search area is used.
    //

    // Note: The GetPixel member of the Bitmap class
    //    is too slow for this purpose. This is a good
    //    case of using unsafe code to access pointers
    //    to increase the speed.
    //

    // Validate the images are the same shape and type.
    //
    if (_prevBitmap.Width != _newBitmap.Width ||
        _prevBitmap.Height != _newBitmap.Height ||
        _prevBitmap.PixelFormat != _newBitmap.PixelFormat)
    {
        // Not the same shape...can't do the search.
        //
        return Rectangle.Empty;
    }

    // Init the search parameters.
    //
    int width = _newBitmap.Width;
    int height = _newBitmap.Height;
    int left = width;
    int right = 0;
    int top = height;
    int bottom = 0;

    BitmapData bmNewData = null;
    BitmapData bmPrevData = null;
    try
    {
        // Lock the bits into memory.
        //
        bmNewData = _newBitmap.LockBits(
            new Rectangle(0, 0, _newBitmap.Width, _newBitmap.Height),
            ImageLockMode.ReadOnly, _newBitmap.PixelFormat);
        bmPrevData = _prevBitmap.LockBits(
            new Rectangle(0, 0, _prevBitmap.Width, _prevBitmap.Height),
            ImageLockMode.ReadOnly, _prevBitmap.PixelFormat);

        // The images are ARGB (4 bytes)
        //
        int numBytesPerPixel = 4;

        // Get the number of integers (4 bytes) in each row
        //    of the image.
        //
        int strideNew = bmNewData.Stride / numBytesPerPixel;
        int stridePrev = bmPrevData.Stride / numBytesPerPixel;

        // Get a pointer to the first pixel.
        //
        // Note: Another speed up implemented is that I don't
        //    need the ARGB elements. I am only trying to detect
        //    change. So this algorithm reads the 4 bytes as an
        //    integer and compares the two numbers.
        //
        System.IntPtr scanNew0 = bmNewData.Scan0;
        System.IntPtr scanPrev0 = bmPrevData.Scan0;

        // Enter the unsafe code.
        //
        unsafe
        {
            // Cast the safe pointers into unsafe pointers.
            //
            int* pNew = (int*)(void*)scanNew0;
            int* pPrev = (int*)(void*)scanPrev0;

            // First Pass - Find the left and top bounds
            //    of the minimum bounding rectangle. Adapt the
            //    number of pixels scanned from left to right so
            //    we only scan up to the current bound. We also
            //    initialize the bottom & right. This helps optimize
            //    the second pass.
            //
            // For all rows of pixels (top to bottom)
            //
            for (int y = 0; y < _newBitmap.Height; ++y)
            {
                // For pixels up to the current bound (left to right)
                //
                for (int x = 0; x < left; ++x)
                {
                    // Use pointer arithmetic to index the
                    //    next pixel in this row.
                    //
                    if ((pNew + x)[0] != (pPrev + x)[0])
                    {
                        // Found a change.
                        //
                        if (x < left)
                        {
                            left = x;
                        }
                        if (x > right)
                        {
                            right = x;
                        }
                        if (y < top)
                        {
                            top = y;
                        }
                        if (y > bottom)
                        {
                            bottom = y;
                        }
                    }
                }

                // Move the pointers to the next row.
                //
                pNew += strideNew;
                pPrev += stridePrev;
            }

            // If we did not find any changed pixels
            //    then no need to do a second pass.
            //
            if (left != width)
            {
                // Second Pass - The first pass found at
                //    least one different pixel and has set
                //    the left & top bounds. In addition, the
                //    right & bottom bounds have been initialized.
                //    Adapt the number of pixels scanned from right
                //    to left so we only scan up to the current bound.
                //    In addition, there is no need to scan past
                //    the top bound.
                //

                // Set the pointers to the first element of the
                //    bottom row.
                //
                pNew = (int*)(void*)scanNew0;
                pPrev = (int*)(void*)scanPrev0;
                pNew += (_newBitmap.Height - 1) * strideNew;
                pPrev += (_prevBitmap.Height - 1) * stridePrev;

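                // For each row (bottom to top). The top row is
                //    included because the first pass stops at the
                //    first changed pixel it finds in a row, so
                //    changes further right on that row are still
                //    unseen.
                //
                for (int y = _newBitmap.Height - 1; y >= top; y--)
                {
                    // Rows below the current bottom bound have only
                    //    been checked to the left of 'left', so scan
                    //    them all the way back to 'left'. Other rows
                    //    only need the columns beyond 'right'.
                    //
                    int minX = (y > bottom) ? left : right + 1;

                    // For each column (right to left)
                    //
                    for (int x = _newBitmap.Width - 1; x >= minX; x--)
                    {
                        // Use pointer arithmetic to index the
                        //    next pixel in this row.
                        //
                        if ((pNew + x)[0] != (pPrev + x)[0])
                        {
                            // Found the right-most change in this row.
                            //
                            if (x > right)
                            {
                                right = x;
                            }
                            if (y > bottom)
                            {
                                bottom = y;
                            }
                            break;
                        }
                    }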

                    // Move up one row.
                    //
                    pNew -= strideNew;
                    pPrev -= stridePrev;
                }
            }
        }
    }
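    catch (Exception ex)
    {
        // Don't swallow the failure silently. Log it and
        //    report "no change" for this frame.
        //
        Console.WriteLine("GetBoundingBoxForChanges failed: " + ex.Message);
        return Rectangle.Empty;
    }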
    finally
    {
        // Unlock the bits of the image.
        //
        if (bmNewData != null)
        {
            _newBitmap.UnlockBits(bmNewData);
        }
        if (bmPrevData != null)
        {
            _prevBitmap.UnlockBits(bmPrevData);
        }
    }

    // Validate we found a bounding box. If not
    //    return an empty rectangle.
    //
    int diffImgWidth = right - left + 1;
    int diffImgHeight = bottom - top + 1;
    if (diffImgHeight <= 0 || diffImgWidth <= 0)
    {
        // Nothing changed
        return Rectangle.Empty;
    }

    // Return the bounding box.
    //
    return new Rectangle(left, top, diffImgWidth, diffImgHeight);
}

The first trick is to avoid the “GetPixel” method on the Bitmap class, which is too slow for our purposes. Instead we access the bitmap data using pointers, which requires a bit of “unsafe” code. Each pixel occupies 4 bytes that hold the alpha, red, green, and blue channel values (ARGB), so the code assumes a 32-bit pixel format. The algorithm reads all 4 bytes at once by using int pointers.

The algorithm uses a two pass approach for determining the bounding box. The first pass searches from the top to the bottom while scanning from left to right for changed pixels. The number of pixels scanned is adapted as we go to optimize the search time. The result of the first pass is that we now know the top & left boundaries and have initialized the bottom and right boundaries. The second pass searches from bottom to the top while scanning from right to left for changed pixels. Again we adapt the scan width as we go. After this pass we know the bottom and right boundaries.
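
Because the adaptive scan deliberately skips pixels, it is worth checking its output against a brute-force scan that looks at every pixel. The following is a minimal sketch of such a reference check, assuming the pixels of both screenshots have already been copied into int arrays (one int per pixel, row by row). The names BoundingBoxReference and Find are hypothetical and not part of the project.

```csharp
using System;

public static class BoundingBoxReference
{
    // Hypothetical helper, not part of the original project.
    // 'prev' and 'next' are row-major buffers with one int per pixel
    // and 'width' ints per row. Returns { left, top, width, height },
    // or null when no pixel differs.
    public static int[] Find(int[] prev, int[] next, int width, int height)
    {
        if (prev.Length != width * height || next.Length != width * height)
        {
            throw new ArgumentException("Buffers must hold width * height pixels.");
        }

        int left = width, top = height, right = -1, bottom = -1;

        // Visit every pixel; slow, but easy to reason about.
        for (int y = 0; y < height; ++y)
        {
            for (int x = 0; x < width; ++x)
            {
                if (prev[y * width + x] != next[y * width + x])
                {
                    if (x < left) left = x;
                    if (x > right) right = x;
                    if (y < top) top = y;
                    if (y > bottom) bottom = y;
                }
            }
        }

        if (right < 0)
        {
            return null; // nothing changed
        }
        return new[] { left, top, right - left + 1, bottom - top + 1 };
    }
}
```

Useful test cases are ones the adaptive scan could get wrong, for example two changed pixels in the same column on different rows, or several changed pixels on a single row.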

I am sure there are better ways to achieve a faster algorithm and minimize the amount of data. If you have suggestions, please leave a comment.

WCF Server & Host

To create the WCF server, I first created the service contract. The following is a simple contract that provides updates of desktop activity. Again, I separated the screen and cursor updates with the intent of having different refresh rates.

[ServiceContract(SessionMode=SessionMode.Required)]
public interface IRemoteDesktop
{
    [OperationContract]
    byte[] UpdateScreenImage();

    [OperationContract]
    byte[] UpdateCursorImage();
}

The implementation of this service is shown by the following code:

public class RemoteDesktopService : IRemoteDesktop
{
    // An instance of the screen capture class.
    //
    private ScreenCapture capture = new ScreenCapture();

    /// <summary>
    /// Capture the screen image and return bytes.
    /// </summary>
    /// <returns>4 ints [top,bot,left,right] (16 bytes) + image data bytes</returns>
    public byte[] UpdateScreenImage()
    {
        // Capture minimally sized image that encompasses
        //    all the changed pixels.
        //
        Rectangle bounds = new Rectangle();
        Bitmap img = capture.Screen(ref bounds);
        if (img != null)
        {
            // Something changed.
            //
            byte[] result = Utils.PackScreenCaptureData(img, bounds);

            // Log to the console.
            //
            Console.WriteLine("{0} Screen Capture - {1} bytes", DateTime.Now, result.Length);
            return result;
        }
        else
        {
            // Nothing changed.
            //

            // Log to the console.
            Console.WriteLine("{0} Screen Capture - {1} bytes", DateTime.Now, 0);
            return null;
        }
    }

    /// <summary>
    /// Capture the cursor data.
    /// </summary>
    /// <returns>2 ints [x,y] (8 bytes) + image bytes</returns>
    public byte[] UpdateCursorImage()
    {
        // Get the cursor bitmap.
        //
        int cursorX = 0;
        int cursorY = 0;
        Bitmap img = capture.Cursor(ref cursorX, ref cursorY);
        if (img != null)
        {
            // Something changed.
            //
            byte[] result = Utils.PackCursorCaptureData(img, cursorX, cursorY);

            // Log to the console.
            //
            Console.WriteLine("{0} Cursor Capture - {1} bytes", DateTime.Now, result.Length);
            return result;
        }
        else
        {
            // Nothing changed.
            //

            // Log to the console.
            //
            Console.WriteLine("{0} Cursor Capture - {1} bytes", DateTime.Now, 0);
            return null;
        }
    }
}

Notice that the service methods return a byte array. I selected this format because I wanted to pack data into a structure before sending and then unpack it on the other side. The packing is done by the static Utils class methods. This class needs to be cleaned up a bit. For now it serves as a holder for methods that I know will be re-used. The following code shows one of the packing methods.

public static byte[] PackScreenCaptureData(Image image, Rectangle bounds)
{
    // Pack the image data into a byte stream to
    //    be transferred over the wire.
    //

    // Get the bytes of the image data.
    //    Note: We are using JPEG compression.
    //
    byte[] imgData;
    using (MemoryStream ms = new MemoryStream())
    {
        image.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg);
        imgData = ms.ToArray();
    }

    // Get the bytes that describe the bounding
    //    rectangle.
    //
    byte[] topData = BitConverter.GetBytes(bounds.Top);
    byte[] botData = BitConverter.GetBytes(bounds.Bottom);
    byte[] leftData = BitConverter.GetBytes(bounds.Left);
    byte[] rightData = BitConverter.GetBytes(bounds.Right);

    // Create the final byte stream.
    // Note: We are streaming back both the bounding
    //    rectangle and the image data.
    //
    int sizeOfInt = topData.Length;
    byte[] result = new byte[imgData.Length + 4 * sizeOfInt];
    Array.Copy(topData, 0, result, 0, topData.Length);
    Array.Copy(botData, 0, result, sizeOfInt, botData.Length);
    Array.Copy(leftData, 0, result, 2 * sizeOfInt, leftData.Length);
    Array.Copy(rightData, 0, result, 3 * sizeOfInt, rightData.Length);
    Array.Copy(imgData, 0, result, 4 * sizeOfInt, imgData.Length);

    return result;
}

Pretty straightforward. Notice that I am using JPEG compression on what is left of the image before packing it into the byte array. There are similar routines that unpack the data on the other side. There may be a better way to do this marshalling. If you have ideas, please let me know.
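
To illustrate the other direction, here is a minimal sketch of how the header could be unpacked on the client. It assumes the layout produced by PackScreenCaptureData above (top, bottom, left, and right as four 4-byte ints, followed by the JPEG bytes). ScreenPacket and Unpack are hypothetical names; the project's actual unpacking routines are not shown in this post and may differ.

```csharp
using System;

public static class ScreenPacket
{
    // Hypothetical counterpart to PackScreenCaptureData.
    // Assumed layout: top, bottom, left, right (4 bytes each),
    // then the JPEG-compressed image bytes.
    public static void Unpack(byte[] data, out int top, out int bottom,
                              out int left, out int right, out byte[] jpeg)
    {
        const int SizeOfInt = sizeof(int);
        const int HeaderSize = 4 * SizeOfInt;

        if (data == null || data.Length < HeaderSize)
        {
            throw new ArgumentException("Packet is shorter than the 16-byte header.");
        }

        // Read the bounds in the same order they were written.
        top = BitConverter.ToInt32(data, 0);
        bottom = BitConverter.ToInt32(data, SizeOfInt);
        left = BitConverter.ToInt32(data, 2 * SizeOfInt);
        right = BitConverter.ToInt32(data, 3 * SizeOfInt);

        // Everything after the header is the image payload.
        jpeg = new byte[data.Length - HeaderSize];
        Array.Copy(data, HeaderSize, jpeg, 0, jpeg.Length);
    }
}
```

Since Rectangle.Right is X + Width and Rectangle.Bottom is Y + Height, the client can recover the size of the changed area as right - left by bottom - top. Both sides use BitConverter, so they must agree on byte order, which holds when client and server run on the same kind of machine.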

For now I am hosting the WCF service in a console application. The code for the host is simple enough:

[STAThread()]
static void Main(string[] args)
{
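    string myHost = System.Net.Dns.GetHostName();

    // Pick the first IPv4 address instead of a fixed index into
    //    the address list, whose order and length vary by machine.
    //
    string myIp = Array.Find(
        System.Net.Dns.GetHostEntry(myHost).AddressList,
        a => a.AddressFamily == System.Net.Sockets.AddressFamily.InterNetwork).ToString();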

    Uri baseAddress = new Uri("http://" + myIp + ":8080/Rlc/RemoteDesktop");

    Console.WriteLine("WCF Remote Desktop Server");
    Console.WriteLine("=========================");
    Console.WriteLine();
    Console.WriteLine("Initializing server endpoint...");
    Console.WriteLine("Listening on: " + baseAddress.ToString());
    Console.WriteLine();
    ServiceHost myServiceHost = new ServiceHost(typeof(RemoteDesktopService), baseAddress);
    myServiceHost.Open();

    Console.ReadLine();

    if (myServiceHost.State != CommunicationState.Closed)
    {
        myServiceHost.Close();
    }
}

When you start the host you see the following:

image

WinForm Client

For the client, I created a simple WinForms application. In the Visual Studio designer it looks like the following:

image

The dashed rectangle is a picture box. The two text boxes at the bottom are the refresh rate in milliseconds for the screen and cursor. The apply button allows the refresh rates to be updated after the application has been started. The two elements that you cannot see are two timers that are used to trigger the updates for the screen and the cursor.
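
The apply button has to turn the text box contents into timer intervals, and the WinForms Timer rejects an Interval below 1. The following is a minimal sketch of that conversion, not the project's actual handler. RefreshRate, ParseInterval, and the 10 ms floor are assumptions made for illustration.

```csharp
using System;

public static class RefreshRate
{
    // Hypothetical helper: converts text box input into a timer
    // interval in milliseconds. Invalid input keeps the current
    // interval; values below 'minMs' are raised to 'minMs'.
    public static int ParseInterval(string text, int currentMs, int minMs = 10)
    {
        int value;
        if (!int.TryParse(text, out value))
        {
            return currentMs;
        }
        return Math.Max(value, minMs);
    }
}
```

In the apply button handler this could be used as timer1.Interval = RefreshRate.ParseInterval(screenRateTextBox.Text, timer1.Interval), where screenRateTextBox is a placeholder name for the screen refresh text box.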

The first step in connecting to the WCF service is to create a service reference. You just need to fire up the service and let the WSDL do the work. You will then have a proxy for accessing the service methods we defined earlier.

The following code is the implementation for the triggers for the two timers:

private void timer1_Tick(object sender, EventArgs e)
{
    byte[] data = svc.UpdateScreenImage();
    if (data != null)
    {
        // Update the current screen.
        //
        Utils.UpdateScreen(ref _screen, data);

        // Update the UI.
        //
        ShowImage();
    }
    else
    {
        // screen has not changed
    }
}

private void cursorTimer_Tick(object sender, EventArgs e)
{
    byte[] data = svc.UpdateCursorImage();
    if (data != null)
    {
        // Unpack the data.
        //
        Utils.UnpackCursorCaptureData(data, out _cursor, out _cursorX, out _cursorY);
    }
    else
    {
        _cursor = null;
    }

    // Update the UI.
    //
    ShowImage();
}

Notice each method uses the WCF service proxy (svc) to remotely call the service method. Then the returned data is unpacked using another method in the “Utils” class. Finally the UI is updated to show the new data. During this update, the screen image and the cursor image are merged together.

The following shows a screenshot of the client running. Notice that it is capturing itself over and over.

image

Results

The results so far are decent. When there is no activity on the remote machine, the server does not transfer any data to the client. This is shown below in the server console window.

image

During activity the number of bytes transferred adapts to the number of pixels changed.

image

As can be seen in the screens above, the refresh rates range from 2-10 frames per second depending on the number of pixels changed. Take these numbers with some skepticism, since these tests were run on my internal network (100MB/s).

I have some more road to travel with this concept before I post the project. I will also try to post a video of the desktop viewer at some point. If you have any suggestions on improvements, please leave a comment.
