AI Painting for Beginners: A Complete Zero-Based Guide to Stable Diffusion and Midjourney

1404/9/14 · 13 min read

Tutorial Overview#

This tutorial provides a detailed introductory guide to AI painting for beginners with no prior experience, focusing on Stable Diffusion and Midjourney, the two most popular AI painting tools currently available. Its goal is to help you quickly grasp the basic principles and workflows of AI painting so that you can independently generate high-quality images.

This tutorial is suitable for beginners who are interested in AI painting but lack relevant experience. Whether you are a designer, artist, or simply curious about AI technology, you can learn the basic skills of AI painting through this tutorial.

By studying this tutorial, you will be able to:

  • Understand the basic concepts and principles of AI painting.
  • Master the installation, configuration, and basic operation of Stable Diffusion and Midjourney.
  • Learn to use prompts to guide AI to generate the desired images.
  • Master some commonly used image processing techniques to improve the quality of AI painting works.
  • Understand the ethical issues and development trends of AI painting.

We will guide you step by step, from environment configuration to prompt writing, and then to post-processing, ultimately enabling you to create stunning AI artwork. Even if you have no programming or artistic background, you can easily get started.

Preliminary Preparation#

Before starting your AI painting journey, we need to do some preparation. This includes the required tools, environment configuration, and some basic knowledge reserves. These preparations will ensure a smoother learning process.

Required Tools#

  • A computer: At least 8GB of RAM is recommended, ideally with an NVIDIA GPU that has at least 4GB of video memory, which is crucial for running Stable Diffusion. Midjourney runs on Discord and has much lower hardware requirements.
  • Stable Diffusion: You need to download the installation package of Stable Diffusion, usually the WebUI version, such as AUTOMATIC1111's Stable Diffusion web UI.
  • Midjourney: You need to register a Discord account and join Midjourney's official server.
  • Image processing software: Such as Photoshop, GIMP, etc., for post-processing of generated images.
  • VPN (optional): If access to certain websites or services is restricted in your area, you may need to use a VPN.

Environment Configuration#

  • Install Python: Stable Diffusion relies on the Python environment. It is recommended to install Python 3.10.
  • Install Git: Used to download Stable Diffusion's WebUI from GitHub.
  • Install CUDA Toolkit (optional): If your computer has an NVIDIA GPU, installing CUDA Toolkit can significantly improve the running speed of Stable Diffusion.
  • Download Stable Diffusion model: You need to download Stable Diffusion model files, such as SD v1.5 or SDXL. These files are usually several gigabytes, so the download may take a while.
  • Configure Stable Diffusion WebUI: Put the downloaded model files into the correct folder and modify the startup parameters of the WebUI according to your computer configuration.

Basic Knowledge#

  • Prompt: The prompt is the key to guiding AI to generate images. You need to use concise and clear language to describe the image content, style, and details you want to generate.
  • Stable Diffusion parameters: Understanding the common parameters of Stable Diffusion, such as sampling method, sampling steps, CFG Scale, etc., can help you better control the image generation process.
  • Image processing basics: Understanding some basic image processing concepts, such as resolution, color mode, layers, etc., can help you better perform post-processing.
  • Discord usage: Familiarity with the basic operations of Discord, such as joining a server, sending messages, and using commands, is a prerequisite for using Midjourney.

Explanation of Core Concepts#

Understanding the core concepts of AI painting is the key to mastering this technology. Here are some essential basic concepts that will help you better understand how Stable Diffusion and Midjourney work.

Diffusion Model#

The diffusion model is the core technology behind Stable Diffusion. A forward diffusion process gradually turns an image into random noise, and a reverse diffusion process then reconstructs an image from that noise. You can think of it as shredding a picture step by step and then learning to reassemble it. By training on a large amount of image data, Stable Diffusion learns how to reconstruct images from noise.
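
As a rough illustration of the forward (noising) process, here is a tiny NumPy sketch. The linear noise schedule and the toy 8x8 "image" are illustrative choices of this sketch, not Stable Diffusion's actual settings:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Add t steps worth of Gaussian noise to a clean image x0 in
    closed form: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps."""
    alphas = 1.0 - betas
    alpha_bar = np.prod(alphas[: t + 1])   # cumulative product up to step t
    eps = rng.standard_normal(x0.shape)    # pure Gaussian noise
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
x0 = rng.uniform(-1, 1, size=(8, 8))       # tiny stand-in "image"
betas = np.linspace(1e-4, 0.02, 1000)      # linear noise schedule (toy choice)

slightly_noisy = forward_diffuse(x0, 10, betas, rng)    # still close to x0
mostly_noise = forward_diffuse(x0, 999, betas, rng)     # almost pure noise
```

The reverse process is what the trained network learns: starting from `mostly_noise`, it predicts and removes the noise step by step until an image emerges.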

Prompt Engineering#

The prompt is the key to guiding AI to generate images. A good prompt can clearly express the image content, style, and details you want to generate. Prompt engineering refers to controlling the process of AI generating images by designing and optimizing prompts. This includes selecting appropriate keywords, adjusting the order of keywords, and using modifiers.

Sampling Method#

The sampling method determines how Stable Diffusion reconstructs the image from noise. Different sampling methods will produce different image effects. Common sampling methods include Euler a, DPM++ 2M Karras, etc. Each sampling method has its advantages and disadvantages, and needs to be selected according to the specific image generation needs.

Sampling Steps#

Sampling steps refer to the number of reverse-diffusion iterations Stable Diffusion performs. More steps generally produce richer detail but also increase computation and generation time. A range of 20-50 steps usually works well.

CFG Scale#

CFG Scale (Classifier-Free Guidance Scale) controls how strictly the AI follows the prompt. A larger CFG Scale makes the generated image adhere more closely to the prompt, but setting it too high can distort the image. A value of 7-12 usually works well.
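
The classifier-free guidance combination behind this knob can be sketched in a few lines. The toy arrays below stand in for the model's real noise predictions:

```python
import numpy as np

def cfg_combine(noise_uncond, noise_cond, cfg_scale):
    """Classifier-free guidance: push the denoising prediction
    toward the prompt-conditioned direction by cfg_scale."""
    return noise_uncond + cfg_scale * (noise_cond - noise_uncond)

uncond = np.zeros(4)   # toy prediction with an empty prompt
cond = np.ones(4)      # toy prediction with your prompt

print(cfg_combine(uncond, cond, 1.0))   # scale 1 keeps just the conditioned prediction
print(cfg_combine(uncond, cond, 7.5))   # higher scale amplifies the prompt direction
```

At scale 0 the prompt is ignored entirely; as the scale grows, the prompt direction is exaggerated, which explains both the stronger prompt adherence and the distortion at very high values.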

Latent Space#

Stable Diffusion does not directly generate images in the pixel space, but in a low-dimensional latent space. This can greatly reduce the amount of calculation and improve the efficiency of image generation. The latent space can be understood as a compressed representation of the image, which retains the main features of the image but removes redundant information.
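
For a sense of scale: Stable Diffusion v1's VAE downsamples each spatial dimension by a factor of 8 and uses 4 latent channels, so a quick calculation shows how much smaller the latent representation is than the pixel image:

```python
# Pixel-space vs latent-space size for a 512x512 RGB image, using
# Stable Diffusion v1's factor-8 VAE with 4 latent channels.
pixel_elems = 512 * 512 * 3                   # 786432 values
latent_elems = (512 // 8) * (512 // 8) * 4    # 64 x 64 x 4 = 16384 values

print(pixel_elems / latent_elems)   # 48.0 -> the U-Net works on ~48x fewer values
```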

Text-to-Image#

Text-to-image refers to letting AI generate corresponding images by inputting a text description. Stable Diffusion and Midjourney are both text-to-image models. Text-to-image technology is the core of AI painting, which enables people to create various image works through simple text descriptions.

Step 1: Stable Diffusion WebUI Installation and Configuration#

This chapter will introduce in detail how to install and configure Stable Diffusion WebUI locally, so that you can successfully run Stable Diffusion and start your AI painting journey. We will use AUTOMATIC1111's Stable Diffusion web UI as an example.

Download Stable Diffusion WebUI#

  1. Install Git: If you have not installed Git, please download and install Git first. Git is a version control system used to download Stable Diffusion WebUI from GitHub.

  2. Clone the repository: Open the command line terminal and enter the following command to clone the Stable Diffusion WebUI repository to your local machine:

    git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
    

    This will create a folder named stable-diffusion-webui in the current directory and download all the files of the WebUI to that folder.

  3. Switch to the WebUI directory: Use the cd command to switch to the WebUI directory:

    cd stable-diffusion-webui
    

Install Dependencies#

  1. Run the installation script: In the WebUI directory, run the webui-user.bat (Windows) or webui.sh (Linux/macOS) script. This script will automatically install the dependencies required by Stable Diffusion WebUI, including the Python environment, various Python libraries, etc.

    • Windows: Double-click webui-user.bat to run it.
    • Linux/macOS: Enter bash webui.sh in the terminal and press Enter.

    This process may take some time, depending on your network speed and computer configuration. Please be patient until the script is finished running.

Download Model Files#

  1. Download the model: Download a Stable Diffusion model file from a site such as Hugging Face, for example sd-v1-5-full-ema.ckpt or sd_xl_base_1.0.safetensors. These files are usually several gigabytes, so the download may take a while.
  2. Place the model file: Put the downloaded model file into the stable-diffusion-webui/models/Stable-diffusion directory.

Start WebUI#

  1. Run the startup script: Run the webui-user.bat (Windows) or webui.sh (Linux/macOS) script again. This will start Stable Diffusion WebUI.

    • Windows: Double-click webui-user.bat to run it.
    • Linux/macOS: Enter bash webui.sh in the terminal and press Enter.
  2. Access WebUI: Open your browser and enter http://127.0.0.1:7860 to access Stable Diffusion WebUI.

Precautions#

  • Insufficient video memory: If your computer has insufficient video memory, you may encounter errors. You can try modifying the startup parameters of the WebUI, such as adding the --lowvram or --medvram options to reduce video memory usage.
  • Network problems: You may encounter network problems when downloading dependencies and model files. You can try using a VPN or changing the mirror source to solve the problem.
  • Update WebUI: Regularly update Stable Diffusion WebUI to get the latest features and fixes. You can use the git pull command to update WebUI.
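
On Windows, passing a flag like --medvram on every launch usually means editing webui-user.bat. A minimal sketch of the edited file (the variable names match the stock webui-user.bat; swap in --lowvram for very low-memory cards):

```shell
rem webui-user.bat - config fragment for a low-VRAM NVIDIA card
@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--medvram

call webui.bat
```

On Linux/macOS, the equivalent is exporting COMMANDLINE_ARGS in webui-user.sh before launching.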

Step 2: Getting Started Quickly with Midjourney#

Midjourney is an AI painting tool based on Discord, which is very simple and convenient to use. This chapter will introduce how to quickly get started with Midjourney and generate your first AI artwork.

Register a Discord Account and Join the Midjourney Server#

  1. Register a Discord account: If you don't have a Discord account, please visit the Discord official website (https://discord.com/) to register an account.
  2. Join the Midjourney server: Visit the Midjourney official website (https://www.midjourney.com/), click the "Join the Beta" button, and follow the prompts to join Midjourney's Discord server.

Use Midjourney to Generate Images#

  1. Enter the newbie channel: In the Midjourney server, find the channel marked "#newbies". These channels are specially provided for newbies, and you can try to generate images here.
  2. Use the /imagine command: Enter /imagine in the chat box, and then enter your prompt. For example: /imagine a beautiful landscape with mountains and a lake.
  3. Wait for generation: Midjourney will generate four images based on your prompt. This process may take a few minutes.
  4. Select and enlarge the image: Below the generated four images, there are U1, U2, U3, and U4 buttons, which correspond to enlarging the first, second, third, and fourth images, respectively. Click the corresponding button to enlarge the image you like.
  5. Make variations: Below the generated four images, there are also V1, V2, V3, and V4 buttons, which correspond to making variations of the first, second, third, and fourth images, respectively. Click the corresponding button, and Midjourney will generate four new images similar to the image you selected.

Common Commands#

  • /imagine: Generate images based on prompts.
  • /info: View your Midjourney account information, including the remaining number of generations.
  • /help: View Midjourney's help documentation.
  • /settings: Set Midjourney's parameters, such as style, quality, etc.

Precautions#

  • Free trial: Midjourney provides a free trial, but the number of free trials is limited. If you want to continue using Midjourney, you need to purchase a subscription.
  • Prompt skills: The more detailed the prompt, the more the generated image will meet your expectations. You can try using different keywords, modifiers, and style descriptions to optimize your prompts.
  • Community interaction: Midjourney's Discord server is an active community. You can communicate with other users, share your works, and learn new skills here.

Step 3: Prompt Writing Skills#

The prompt is the soul of AI painting. A good prompt can guide AI to generate stunning works, while a bad prompt may lead to disappointing results. This chapter will introduce some prompt writing skills to help you better control AI painting.

The Structure of the Prompt#

A typical prompt usually contains the following parts:

  • Subject: The main object you want to depict, such as people, animals, landscapes, etc.
  • Environment: The environment in which the subject is located, such as indoor, outdoor, city, countryside, etc.
  • Style: The style of the image, such as realistic, cartoon, oil painting, watercolor, etc.
  • Lighting: The lighting effect of the image, such as sunrise, sunset, night, spotlight, etc.
  • Details: The detailed description of the image, such as color, material, texture, etc.
  • Artist: Imitate the style of a specific artist, such as Van Gogh, Monet, Da Vinci, etc.
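
The checklist above can be assembled into a prompt mechanically. Here is a small illustrative helper; the function and its comma-separated convention are a sketch of common practice, not a feature of either tool:

```python
def build_prompt(subject, environment, style, lighting, details, artist=None):
    """Join prompt components from the structure above into one
    comma-separated prompt string (a common convention, not a strict syntax)."""
    parts = [subject, environment, style, lighting, details]
    if artist:
        parts.append(f"by {artist}")
    return ", ".join(p for p in parts if p)

prompt = build_prompt(
    subject="a black cat",
    environment="sitting on a red sofa in a cozy living room",
    style="realistic oil painting",
    lighting="warm evening light",
    details="soft fur texture, rich colors",
    artist="Claude Monet",
)
print(prompt)
```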

Prompt Writing Skills#

  • Use concise and clear language: Avoid using overly complex or ambiguous words.
  • Use specific descriptions: Try to use specific descriptions, such as "a black cat sitting on a red sofa" instead of "a cat sitting on a sofa".
  • Use modifiers: Using modifiers can enhance the expressiveness of the prompt, such as "a cute black cat sitting on a comfortable red sofa".
  • Use weights: You can use weights to emphasize the importance of certain keywords. For example, in Stable Diffusion, you can use (keyword:1.5) to increase the weight of the keyword.
  • Use negative prompts: Using negative prompts can avoid AI generating content you don't want. For example, in Stable Diffusion, you can use negative prompt: blurry, ugly, distorted to avoid generating blurry, ugly, or distorted images.
  • Refer to other works: You can refer to other artworks or photographic works, draw inspiration from them, and transform them into prompts.
  • Constantly try and adjust: Prompt writing is a process of continuous trial and adjustment. You need to continuously optimize your prompts based on the generated image effects.
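
To make the weight and negative-prompt tips concrete, here is a small sketch that builds a weighted Stable Diffusion prompt. The helper function is illustrative; only the (keyword:weight) syntax comes from the WebUI:

```python
def weighted(keyword, weight):
    """Stable Diffusion WebUI attention syntax: (keyword:weight)."""
    return f"({keyword}:{weight})"

positive = ", ".join([
    "a portrait of a young woman",
    weighted("detailed face", 1.4),   # emphasize the face
    "soft lighting",
])
negative = "blurry, ugly, distorted"  # negative prompt from the tips above

print(positive)   # a portrait of a young woman, (detailed face:1.4), soft lighting
print(negative)
```

In the WebUI, the positive string goes in the prompt box and the negative string in the separate negative-prompt box.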

Examples#

Here are some examples of prompts:

  • a portrait of a young woman with long hair, realistic, soft lighting, detailed face, by Artgerm and Alphonse Mucha
  • a futuristic city at night, neon lights, cyberpunk style, detailed architecture, by Syd Mead
  • a landscape painting of a forest in autumn, vibrant colors, impressionistic style, by Claude Monet
  • a cute cartoon character of a cat, big eyes, smiling face, colorful background

Prompt Tools#

  • Lexica.art: A powerful prompt search engine that can help you find inspiration.
  • PromptBase: A prompt marketplace where you can buy or sell prompts.

Step 4: Common Parameter Adjustment and Optimization#

Stable Diffusion has many parameters. Understanding and mastering the adjustment of these parameters can help you better control the image generation process and obtain results that better meet your expectations. This chapter will introduce some commonly used parameters and how to adjust and optimize these parameters.

Sampling Method#

  • Euler a: A fast and efficient sampling method, suitable for generating stylized images.
  • DPM++ 2M Karras: A high-quality sampling method, suitable for generating images with rich details.
  • LMS: A relatively stable sampling method, suitable for generating realistic images.

Choosing the right sampling method depends on the image style and quality you want. DPM++ 2M Karras is a good default for most cases, but if your hardware is limited, Euler a is a faster alternative.

Sampling Steps#

The more sampling steps, the richer the image detail, but also the greater the computation and generation time. A range of 20-50 steps usually works well: around 20 steps may be enough for simple images, while complex images may need 50 or more.

CFG Scale#

CFG Scale controls how strictly the AI follows the prompt. A larger value makes the image adhere more closely to the prompt but can cause distortion; 7-12 usually works well. If you want to give the AI more creative freedom, lower the CFG Scale.

Seed#

The seed determines the randomness of image generation. Using the same seed and prompt can generate the same image. This is very useful for repeatedly generating images or making comparisons. If you want to generate different images, you can change the seed.
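
The effect of the seed can be demonstrated with any deterministic random generator. The function below is a toy stand-in for a sampler, not real image generation:

```python
import numpy as np

def fake_generate(seed, steps=4):
    """Toy stand-in for a sampler: the seed fixes the entire 'random'
    trajectory, so the same seed always yields the same output."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((steps, 2))

a = fake_generate(seed=42)
b = fake_generate(seed=42)
c = fake_generate(seed=123)

print(np.array_equal(a, b))   # True  -> same seed, identical result
print(np.array_equal(a, c))   # False -> different seed, different result
```

This is exactly why fixing the seed in Stable Diffusion lets you reproduce an image or compare the effect of changing a single parameter.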

Resolution#

Resolution determines the size of the image. Higher resolutions give richer detail but cost more computation and video memory. For SD v1.5 models, 512x512 is the native training resolution, and 768x768 is also commonly used. If you want higher-resolution images, use the WebUI's upscaling features rather than generating at very large sizes directly.

Batch Count and Batch Size#

Batch count determines how many batches are run one after another, while batch size determines how many images are generated in parallel within each batch. Increasing the batch count is the safe way to generate more images; increasing the batch size can speed up generation but raises video memory usage, since all images in a batch are held in memory at once.
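
As a quick arithmetic sketch of how the two settings combine (illustrative numbers):

```python
batch_count = 4   # batches run one after another
batch_size = 2    # images generated in parallel per batch

total_images = batch_count * batch_size
print(total_images)   # 8 images in total
# Video memory scales with batch_size (parallel images),
# while batch_count only adds sequential runs.
```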

Optimization Tips#

  • Use a reasonable parameter range: Do not set the parameters too high or too low.
  • Adjust parameters according to image type: Different image types require different parameter settings.
  • Experiment more: Constantly try different parameter settings to find the parameters that best suit your image style.
  • Refer to other users' settings: You can learn from the parameter settings and prompts that other users share in community galleries and forums.