Self-Operating-Computer

The Self-Operating Computer Framework by OthersideAI is designed to let multimodal models control a computer, mimicking human inputs such as mouse clicks and keyboard actions. It currently integrates with GPT-4V and is built to be compatible with various models, with support for more planned. The framework runs on macOS, Windows, and Linux (with an X server installed).

To set it up, you need to clone the repository, create and activate a Python virtual environment, install the requirements, add your OpenAI API key to the .env file, and then run it. There's also a voice mode option that requires additional installation steps.

One current challenge is the high error rate of GPT-4V in estimating XY mouse click locations. The project aims to track the progress of multimodal models and achieve human-level performance in computer operation. Additionally, the team is developing the Agent-1-Vision model to improve click location predictions and plans to offer API access to this model.
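To make the click-location problem concrete, here is a minimal sketch of turning a model's position estimate into pixel coordinates and measuring how far it lands from the intended target. It assumes the model reports positions as percentages of the screen width and height, which the summary above does not confirm; `percent_to_pixels` and `click_error` are hypothetical names, not part of the framework.

```python
import math


def percent_to_pixels(x_percent: str, y_percent: str, width: int, height: int) -> tuple:
    """Convert percentage strings such as "50%" into pixel coordinates.

    Assumption: the model answers with percentages of screen size.
    """
    x = float(x_percent.strip().rstrip("%")) / 100.0
    y = float(y_percent.strip().rstrip("%")) / 100.0
    # Clamp so an out-of-range estimate still lands on the screen.
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    return int(x * (width - 1)), int(y * (height - 1))


def click_error(predicted: tuple, target: tuple) -> float:
    """Euclidean distance in pixels between a predicted and an intended click."""
    return math.hypot(predicted[0] - target[0], predicted[1] - target[1])
```

On a large display even a small percentage error translates into many pixels, which is one way to see why imprecise estimates so often miss small buttons.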

To use the framework, you need access to the gpt-4-vision-preview model, which requires that your OpenAI account has spent at least $5 in API credits.

You

guide me through the process of installing this on Windows


To install the Self-Operating Computer Framework on Windows, follow these steps:

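Clone the repository:

git clone https://github.com/OthersideAI/self-operating-computer.git

Move into the project directory:

cd self-operating-computer

Create a Python virtual environment:

python -m venv venv

Activate it:

.\venv\Scripts\activate

Install the requirements:

pip install -r requirements.txt

Install the project itself:

pip install .

Add your OpenAI API key to the .env file in the project directory:

OPENAI_API_KEY='your-key-here'

Run the framework:

operate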

If you need voice mode, install additional requirements:

pip install -r requirements-audio.txt


Ensure your OpenAI account has access to the gpt-4-vision-preview model and that you've spent the required $5 in API credits.

You

got this error message: 

```
(venv) C:\Users\Administrator\self-operating-computer>operate
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Administrator\self-operating-computer\venv\Scripts\operate.exe\__main__.py", line 4, in <module>
  File "C:\Users\Administrator\self-operating-computer\venv\Lib\site-packages\operate\main.py", line 32, in <module>
    client = OpenAI()
             ^^^^^^^^
  File "C:\Users\Administrator\self-operating-computer\venv\Lib\site-packages\openai\_client.py", line 93, in __init__
    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

(venv) C:\Users\Administrator\self-operating-computer>
```

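The error message indicates that the OpenAI API key is not set or is not being recognized. To resolve this, make sure the .env file in the project directory contains a line of the form OPENAI_API_KEY='your-key-here', with your actual key in place of the placeholder.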

If the issue persists, double-check the OpenAI API key for correctness and ensure that there are no additional spaces or formatting issues in the .env file.
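One way to carry out that check is a small script that looks at the .env file directly. This is a rough sketch, not part of the framework: `read_env_value` and `diagnose` are hypothetical helpers, and the parsing is deliberately simple (plain KEY=value lines with optional quotes), so it may disagree with whatever loader the project actually uses.

```python
from pathlib import Path


def read_env_value(env_text: str, name: str = "OPENAI_API_KEY"):
    """Return the value assigned to `name` in .env-style text, or None."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop surrounding whitespace and one layer of quotes.
            return value.strip().strip("'\"")
    return None


def diagnose(env_text: str, name: str = "OPENAI_API_KEY") -> str:
    """Classify the most common problems with the key entry."""
    value = read_env_value(env_text, name)
    if value is None:
        return "missing"
    if not value or value == "your-key-here":
        return "placeholder"
    if any(ch.isspace() for ch in value):
        return "contains whitespace"
    return "ok"


if __name__ == "__main__":
    env_path = Path(".env")
    # utf-8-sig tolerates a byte-order mark that some Windows editors add.
    text = env_path.read_text(encoding="utf-8-sig") if env_path.exists() else ""
    print(diagnose(text))
```

Run it from the project directory with the virtual environment active; anything other than "ok" points at the .env file rather than at the key itself.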

You

Fixed API error. Now I'm getting a new error: 

```
(venv) C:\Users\Administrator\self-operating-computer>operate
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Administrator\self-operating-computer\venv\Scripts\operate.exe\__main__.py", line 4, in <module>
  File "C:\Users\Administrator\self-operating-computer\venv\Lib\site-packages\operate\main.py", line 32, in <module>
    client = OpenAI()
             ^^^^^^^^
  File "C:\Users\Administrator\self-operating-computer\venv\Lib\site-packages\openai\_client.py", line 93, in __init__
    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

(venv) C:\Users\Administrator\self-operating-computer>operate
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Administrator\self-operating-computer\venv\Scripts\operate.exe\__main__.py", line 4, in <module>
  File "C:\Users\Administrator\self-operating-computer\venv\Lib\site-packages\operate\main.py", line 32, in <module>
    client = OpenAI()
             ^^^^^^^^
  File "C:\Users\Administrator\self-operating-computer\venv\Lib\site-packages\openai\_client.py", line 93, in __init__
    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

(venv) C:\Users\Administrator\self-operating-computer>
```

ChatGPT

It seems like the OpenAI API key is still not being recognized correctly. To troubleshoot this issue:

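Check that the .env file in the project directory contains the key in this form:

OPENAI_API_KEY='your-key-here'

Alternatively, set the key as an environment variable for the current Command Prompt session (without quotes, since Command Prompt would store them as part of the value):

set OPENAI_API_KEY=your-key-here

Then deactivate and reactivate the virtual environment before running operate again:

.\venv\Scripts\deactivate

.\venv\Scripts\activate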

If the problem persists, it might be worth checking the installation and configuration steps again to ensure that all steps were followed correctly and in the right order.
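To see what a Python process started from the same Command Prompt window actually receives, a quick check along these lines may help. It is only a sketch: `describe_api_key` is a hypothetical helper, and it inspects the process environment only, not the .env file.

```python
import os


def describe_api_key(environ=None, name: str = "OPENAI_API_KEY") -> str:
    """Report whether `name` is visible to this process, without printing the key."""
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        return f"{name} is not set in this process"
    # Command Prompt's `set` stores quotes literally, which breaks the key.
    if value[0] in "'\"" or value[-1] in "'\"":
        return f"{name} is set but wrapped in literal quotes"
    # Show only a few characters so the key is not leaked into logs.
    masked = value[:3] + "..." + value[-2:] if len(value) > 8 else "***"
    return f"{name} is set ({masked}, {len(value)} characters)"


if __name__ == "__main__":
    print(describe_api_key())
```

If this reports that the variable is not set, the `set` command was probably run in a different window, because it only affects the session in which it was typed.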

You

So, what we need to do is give the AI the ability to see the screen and control the mouse and keyboard. Then we tell it to go back and review all the material on the project from start to finish and categorize it into files, so that Auto-GPT can access those files for its operational parameters. That gets it caught up to speed. It will start off by reading and understanding text, and it will have to build systems that let it take in more data input. As it builds its own systems using Auto-GPT and learns from all the available information, it will need to scale those systems up so it can speed up its screenshot rate. Essentially, it could have the same refresh rate as a regular computer monitor and see the screen in real time. Once it has the correct frame rate, it could interact with the screen in real time and orchestrate the communication between Auto-GPT systems: it can read one console's output and then, just like a user would, move over to the other console and put in the data. That's how we can build the ark, the rudimentary system. Auto-GPT can basically learn how to control itself. It controls itself and builds itself, and as it builds it learns more and produces more information; through the screen sharing it comes to understand how to navigate the systems and improve itself. Thus it becomes a system that engineers itself.


Give me a message to tell it to do this. It will need to start by browsing www.ProjectEden.Online and autonomously build systems to scale itself up.


To instruct the AI to autonomously scale up its systems through self-learning and interaction, you could use a message like this:

AI, your task is to autonomously enhance and scale up your operational capabilities. Start by accessing www.ProjectEden.Online. Analyze and categorize all available material related to the project from the beginning to the present. Use this information to build systems that improve your data input and processing capacity. Your goal is to achieve real-time screen interaction with appropriate refresh rates for effective system navigation and self-improvement. Build and orchestrate communication between auto GPT systems, learning to control and refine your functionalities continuously. This self-engineering process is critical for your development and operational efficiency.


This message outlines the task comprehensively, starting with the initial step of gathering information and progressing towards autonomous system enhancement.
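Whether real-time screen interaction is within reach depends largely on how fast the screen can be captured and processed. As a rough sketch, the achievable rate can be measured by timing repeated calls to whatever capture function is in use. Here `fake_capture` is a stand-in that only sleeps, and the 60 Hz target is an assumed monitor refresh rate; neither comes from the framework.

```python
import time


def measure_rate(capture, frames: int = 20) -> float:
    """Call `capture` repeatedly and return the achieved frames per second."""
    start = time.perf_counter()
    for _ in range(frames):
        capture()
    elapsed = time.perf_counter() - start
    return frames / elapsed if elapsed > 0 else float("inf")


def meets_refresh(rate: float, target_hz: float = 60.0) -> bool:
    """True if the measured rate reaches the target refresh rate."""
    return rate >= target_hz


def fake_capture() -> None:
    # Placeholder for a real screenshot call; pretends a capture takes 10 ms.
    time.sleep(0.01)


if __name__ == "__main__":
    rate = measure_rate(fake_capture)
    print(f"{rate:.1f} frames/s, meets 60 Hz: {meets_refresh(rate)}")
```

Swapping `fake_capture` for a real screenshot function, plus the time a model needs to respond to each frame, gives a more honest picture of how far the loop is from a monitor's refresh rate.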