KazaKhan

The primary objective is to explore the evolution of AI-generated images by combining the capabilities of the Llava vision model and Stable Diffusion for automatic image generation. This involves an iterative process where images generated by Stable Diffusion are subsequently described using Llava, with the resulting descriptions then used as prompts for generating new images.

A basic prompt is used to generate initial images. Following each generation cycle, the summarize_image function is used to create descriptions of the generated image. This description is then fed back into the generate_image function as a prompt, driving the next iteration of image generation.

This has produced some interesting results.

The summarize_image and generate_image functions are at the bottom of the results.

Python
prompt = """Circular design, cute colorful unicorn, rainbow, vector"""
for i in range(0, 5):
    print(prompt)
    image_path = generate_image(prompt)
    print(image_path)
    img = Image(image_path)
    display(img)
    prompt = summarize_image(image_path)
Circular design, cute colorful unicorn, rainbow, vector
tmp/0405241924.png

0405241924

The image is a digital illustration featuring a sticker with a stylized, cartoon-like depiction of a unicorn. The unicorn appears to be smiling and has a simplistic, colorful design with a rainbow mane and tail, which are commonly associated with the mythical creature. The sticker is set against a blue background that resembles a clear sky. Above the unicorn, there's a circular rainbow pattern, reminiscent of the one often found in depictions of a rainbow after rain. Below the unicorn, there are two clouds with a white cloud-like design and a slight shadow effect, creating an impression of fluffy clouds. The clouds seem to be floating gently near the unicorn's feet. The sticker is framed by a solid blue border that matches the background color. On the top right corner of the sticker, there's a small icon with what appears to be a pair of glasses and a heart, suggesting a brand or logo associated with the image. The overall style of the illustration is cute and whimsical, suitable for decoration or as part of a themed collection. The colors used are bright and vibrant, contributing to the cheerful atmosphere of the sticker.
tmp/0405241926.png

0405241828

The image displays a colorful sticker with an illustration of a cartoon unicorn. The unicorn has a stylized design, featuring a single horn on its head and a mane of different colors. It stands on all fours against the backdrop of a rainbow with stars scattered across it. Below the unicorn, there is text that reads "SHOVE" in bold, capital letters. The sticker appears to be designed for the purpose of shoving or pushing someone, as suggested by the wording and the symbolism commonly associated with shovels. The overall style of the image is playful and whimsical, utilizing vibrant colors and a cute cartoon character to create a visually engaging graphic.
tmp/0405241928.png

0405241928

The image displays a square icon, which appears to be a digital sticker or emoji. It features a stylized graphic with a globe at the center and two concentric circles surrounding it. The outer circle is white with a blue border, while the inner circle is light gray, providing a contrasting backdrop to the globe. Within the outer circle, there are two text elements. The upper text reads "SCOVE SOVE," which translates from French as "WALK ON WATER." This phrase has been adopted by the company Scove and is often used in their branding. The lower text reads "SOVE SOL," which translates to "WALK WITH THE SUN." At the top of the icon, there is a blue semicircle with three white clouds, suggesting an upward or hopeful motion. Additionally, below this semicircle, there's an illustration of a boat with a compass rose on its stern, implying travel or navigation themes. The overall style of the image is clean and minimalistic, employing simple shapes and colors to convey the brand message. The use of the globe, boat, and clouds within a circular frame suggests themes of exploration, adventure, or global connection.
tmp/0405241930.png

0405241930

The image displays a circular compass rose design, which is predominantly black and white. At the center of the circle, there is a compass needle with the word "NORTH" inscribed above it, pointing towards the top of the image. Surrounding the compass needle are multiple concentric circles that represent the compass cardinal directions: North, East, South, West. Each direction is marked by a line and is accompanied by symbols representing the sunrise/sunset for each cardinal direction. The background of the design features a star-like pattern radiating outward from the center, adding to the nautical theme. The outermost circle is teal, creating a contrasting color scheme with the black compass needle and the white inner circles.
tmp/0405241932.png

0405241932

Keeping part of the prompt persistent across iterations produced better results.

persistent_prompt = """line drawing, vector, pop art, rainbox, circular design. """
prompt = persistent_prompt + """Cute colorful mermaid """
print(prompt)
for i in range(0, 5):
    image_path = generate_image(prompt)
    print(image_path)
    img = Image(image_path)
    display(img)
    prompt = summarize_image(image_path)
    prompt = persistent_prompt + prompt
    print(prompt)

0505241004

Cute colorful mermaid line drawing, vector, pop art, rainbox, circular design. The image features a vibrant illustration of a mermaid. She has long, flowing hair and is adorned with a tail that resembles a fish scale pattern in shades of blue, green, and purple. Her skin has a gradient effect, transitioning from yellow at the top to green at the bottom. She's wearing a green bikini bottom and a pair of round, dark sunglasses that are positioned on her head. The mermaid is standing within a circular frame that also contains elements such as bubbles, rocks, and clouds, suggesting an underwater setting. In her hand, she holds a starfish. Her expression is serene, and she is looking directly at the viewer.
tmp/0505241007.png

0505241007

Cute colorful mermaid line drawing, vector, pop art, rainbox, circular design. The image is an illustration featuring a woman with blonde hair and blue eyes. She has large, expressive eyes and is wearing a purple swimsuit with a bikini top. Her swimsuit appears to have fish scales or sequins on it. The woman has a long, flowing hairstyle that seems to be wavy. She is standing in an underwater environment, with fish swimming around her. There are also fish swimming nearby and one directly above her head. In the background, there are plants with purple flowers, contributing to the underwater ambiance of the scene. The color palette of the image includes shades of blue, green, and purple, creating a vibrant and lively atmosphere. There is no text present in the image. The style of the illustration is reminiscent of comic book art, characterized by bold lines and vivid colors that are typical of graphic novels or illustrations. The artwork has a cartoon-like quality to it, with exaggerated features and a dynamic composition.
tmp/0505241009.png

0505241009

Cute colorful mermaid line drawing, vector, pop art, rainbox, circular design. The image is an illustration of a fantasy scene. In the foreground, there's a female figure with mermaid-like features, such as long flowing hair and fish scales on her skin, wearing a blue bikini top. She has red hair and is standing in front of a vibrant coral reef teeming with marine life including tropical fish. The woman also wears a green mermaid tail that extends from beneath the water's surface. Above and around her are various elements of an underwater environment, like bubbles and sea creatures like dolphins. The art style is cartoonish with a focus on bright colors to evoke a sense of fantasy and adventure.
tmp/0505241012.png

0505241012

Cute colorful mermaid line drawing, vector, pop art, rainbox, circular design. The image features a cartoon-style depiction of a mermaid standing in front of an underwater scene. The mermaid is wearing a bikini top, has red hair, and is waving with one hand while the other is on her hip. She appears to be smiling as she looks out towards the viewer. The background consists of various marine life forms including fish and coral, creating an immersive aquatic environment. The ocean floor can be seen in the foreground with colorful fish swimming around. To the left, there are vibrant orange corals, while to the right, there is a blue coral structure. The overall atmosphere of the image is cheerful and fantastical, evoking a sense of adventure and exploration.
tmp/0505241014.png

0505241014

Cute colorful mermaid line drawing, vector, pop art, rainbox, circular design. The image is a vibrant and colorful representation of an underwater scene. It features a mermaid with long flowing hair and a fishtail, swimming in the center of the image among a rich variety of marine life. She appears to be looking up towards the surface. The water around her is clear and blue, with small fish and a few larger ones swimming nearby. The ocean floor is covered in coral and colorful anemones, providing a vibrant underwater garden. Above her, the sun shines brightly through a window of water, illuminating the scene with its warm light. The overall atmosphere of the image is one of tranquility and wonder at the beauty of the ocean's depths.
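
The descriptions above run to a full paragraph even though summarize_image asks for a single sentence by default. If shorter prompts are wanted, one option is to trim each description before feeding it back into generate_image. A minimal sketch, assuming a hypothetical helper named first_sentences and a naive regex-based sentence split (neither is part of the original code):

```python
import re


def first_sentences(text: str, count: int = 2) -> str:
    """Hypothetical helper: keep only the first `count` sentences of `text`."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    # Abbreviations such as "e.g." would be split too; this is only a sketch.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:count])
```

In the loops above it could be applied as prompt = first_sentences(summarize_image(image_path)). Whether shorter prompts change how quickly the images drift has not been tested here.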

"""
Generate an image from a text prompt
Args:
    prompt (str): A prompt detailing the request
    tiling (bool): Passed through as the "tiling" option of the request
Returns:
    The local path of the saved image representing the prompt
"""
import requests
import base64
from IPython.display import Image
import datetime

def generate_image(prompt: str = "Cute cat", tiling: bool = False):
    # Define the URL and the payload to send.
    url = "http://127.0.0.1:7777"
    payload = {
        "prompt": prompt,
        "steps": 28,
        "width": 400,
        "height": 400,
        "tiling": tiling,
        "override_settings": {
            "sd_model_checkpoint": "Deliberate_v5",
            "CLIP_stop_at_last_layers": 2,
        }
    }
    # Send said payload to said URL through the API.
    response = requests.post(url=f'{url}/sdapi/v1/txt2img', json=payload)
    r = response.json()
    # Get current date and time
    current_datetime = datetime.datetime.now()
    # Format current date and time to DDMMYYHHMM
    formatted_datetime = current_datetime.strftime("%d%m%y%H%M")
    # Decode and save the image.
    with open(f'tmp/{formatted_datetime}.png', 'wb') as f:
        f.write(base64.b64decode(r['images'][0]))
    return f'tmp/{formatted_datetime}.png'
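
One thing to watch in generate_image: the filename is built from a timestamp with minute resolution, so two images generated within the same minute would be written to the same path. A minimal sketch of one way around that, assuming a hypothetical helper named unique_image_path (not part of the original code) that adds seconds to the timestamp and falls back to a counter suffix:

```python
import datetime
import os
from typing import Optional


def unique_image_path(directory: str = "tmp",
                      now: Optional[datetime.datetime] = None) -> str:
    """Hypothetical helper: return a PNG path in `directory` that does not exist yet."""
    now = now or datetime.datetime.now()
    # Same DDMMYYHHMM stem as generate_image, extended with seconds.
    stem = now.strftime("%d%m%y%H%M%S")
    os.makedirs(directory, exist_ok=True)
    candidate = os.path.join(directory, f"{stem}.png")
    counter = 1
    # Append _1, _2, ... if a file with this stem is already on disk.
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem}_{counter}.png")
        counter += 1
    return candidate
```

The returned path could replace the two f'tmp/{formatted_datetime}.png' expressions in generate_image. The filenames shown in the results would then be two digits longer.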
"""
Converts an image to base64 and creates a text summary of what is in the image
Args:
image_path (str): The local path or URL of the image.
prompt (str): A prompt detailing the request
Returns:
A text summary of the image contents
"""
import requests
import json
import base64
from urllib.parse import urlparse
import os

def summarize_image(image_path: str, prompt: str = "Describe this image in a single sentence"):
    def download_image(image_url: str) -> str:
        """
        Downloads an image from a URL and returns the local path to the downloaded image.
        """
        try:
            response = requests.get(image_url)
            if response.status_code == 200:
                # Parse the URL to extract the filename
                parsed_url = urlparse(image_url)
                filename = os.path.basename(parsed_url.path)
                # Save the image locally
                local_path = f".tmp/{filename}"
                with open(local_path, "wb") as file:
                    file.write(response.content)
                return local_path
            else:
                raise Exception("Failed to download image from URL.")
        except Exception as e:
            raise e

    try:
        # Check if the image_path is a URL
        if urlparse(image_path).scheme:
            # If it's a URL, download the image
            image_path = download_image(image_path)
        # Read the image and encode it as base64 for the request body
        with open(image_path, "rb") as image_file:
            image_base64 = base64.b64encode(image_file.read()).decode('utf-8')
        url = "http://127.0.0.1:11434/api/generate"
        #url = "http://127.0.0.1:4000/completions"
        payload = {
            "model": "llava",
            "prompt": prompt,
            "stream": False,
            "images": [image_base64]
        }
        response = requests.post(url, data=json.dumps(payload))
        # Parse JSON response
        json_response = json.loads(response.text)
        # Access the value of the "response" key
        response_value = json_response["response"]
        return response_value
    except Exception as e:
        # Failures are reported as a string rather than raised
        return f"Error: {str(e)}"
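
Both loops in the results follow the same pattern, so they can be folded into one function. Note also that summarize_image returns a string starting with "Error:" when something fails, and the loops as written would pass that string on as the next prompt. A minimal sketch, assuming a hypothetical helper named run_feedback_loop (not part of the original code) that takes the two functions as arguments, records each prompt with its image path, and stops on an error string:

```python
from typing import Callable, List, Tuple


def run_feedback_loop(generate: Callable[[str], str],
                      summarize: Callable[[str], str],
                      seed_prompt: str,
                      persistent_prompt: str = "",
                      iterations: int = 5) -> List[Tuple[str, str]]:
    """Hypothetical helper: alternate image generation and description.

    Returns a list of (prompt, image_path) pairs, one per generated image.
    """
    history: List[Tuple[str, str]] = []
    prompt = persistent_prompt + seed_prompt
    for _ in range(iterations):
        image_path = generate(prompt)
        history.append((prompt, image_path))
        description = summarize(image_path)
        # summarize_image reports failures as a string starting with "Error:",
        # so stop here rather than use an error message as the next prompt.
        if description.startswith("Error:"):
            break
        # Keep the persistent part in front of every new description.
        prompt = persistent_prompt + description
    return history
```

With the functions above, the mermaid run would look like run_feedback_loop(generate_image, summarize_image, "Cute colorful mermaid ", persistent_prompt). Leaving persistent_prompt empty gives the behaviour of the first loop. Displaying each image is left to the caller, which can iterate over the returned history.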