pokeclaw-android-ai-agent

PokeClaw (PocketClaw) — on-device Android AI phone agent using Gemma 4 via LiteRT-LM with tool calling, accessibility automation, and optional cloud models.


# PokeClaw Android AI Agent

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

PokeClaw is an open-source Android app that runs Gemma 4 entirely on-device via [LiteRT-LM](https://ai.google.dev/edge/litert/llm/overview) with native tool calling. The LLM reads the screen as a UI tree, selects tools (tap, swipe, type, open app, send message, etc.), executes them through Android Accessibility Services, observes the result, and loops until the task is complete — no cloud, no API key required for local mode.
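
The loop depends on turning the screen's UI tree into text the model can read. The sketch below shows one way to do that under stated assumptions: `UiNode` and `describeScreen` are hypothetical names, not PokeClaw's classes, and the tree is a plain data class so the example runs off-device. On a phone, the same walk would be done over `AccessibilityNodeInfo` objects.

```kotlin
// Hypothetical, Android-free stand-in for an accessibility node.
data class UiNode(
    val text: String? = null,
    val contentDesc: String? = null,
    val resourceId: String? = null,
    val clickable: Boolean = false,
    val editable: Boolean = false,
    val children: List<UiNode> = emptyList()
)

// Walk the tree depth-first and emit one numbered line per interactive node.
// Nodes that are neither clickable nor editable are skipped to keep the prompt short.
fun describeScreen(root: UiNode): String {
    val lines = mutableListOf<String>()
    fun visit(node: UiNode) {
        if (node.clickable || node.editable) {
            val kind = if (node.editable) "input" else "button"
            val label = node.text ?: node.contentDesc ?: ""
            val id = node.resourceId?.let { " id=$it" } ?: ""
            lines += "[${lines.size}] $kind \"$label\"$id"
        }
        node.children.forEach { visit(it) }
    }
    visit(root)
    return lines.joinToString("\n")
}
```

For a screen containing an editable search field, a button identified only by its content description, and a plain text label, this yields `[0] input "Search" id=com.example:id/search` and `[1] button "Send"`; the label is dropped because the model cannot act on it.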

---

## Architecture Overview

```
User prompt
    │
    ▼
TaskOrchestrator          ← manages task lifecycle & session history
    │
    ▼
LLMEngine (LiteRT-LM)     ← Gemma 4 on-device, tool-call aware
    │  tool_calls[]
    ▼
ToolDispatcher            ← routes to concrete tool implementations
    │
    ├── AccessibilityTool  ← tap / swipe / long_press / input_text
    ├── AppLaunchTool      ← open_app
    ├── ScreenReaderTool   ← get_screen_info / take_screenshot
    ├── MessagingTool      ← send_message / auto_reply
    └── FinishTool         ← finish (signals task done)
         │
         ▼
    Android Accessibility Service / UI Automator
```
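
The control flow in the diagram can be sketched as a bounded loop. Everything here is hypothetical: `ToolCall`, `Llm`, `Tool`, `ToolDispatcher`, and `runTask` are illustrative stand-ins rather than PokeClaw's actual `TaskOrchestrator` or dispatcher, and the `summary` argument on `finish` is an assumption.

```kotlin
// Hypothetical tool call produced by the model: a tool name plus string arguments.
data class ToolCall(val name: String, val args: Map<String, String>)

fun interface Llm {
    // Given the conversation so far, return the next tool call.
    fun nextToolCall(history: List<String>): ToolCall
}

fun interface Tool {
    // Execute the action and return an observation string for the model.
    fun execute(args: Map<String, String>): String
}

// Routes a call to its implementation; unknown names become an error observation.
class ToolDispatcher(private val tools: Map<String, Tool>) {
    fun dispatch(call: ToolCall): String =
        tools[call.name]?.execute(call.args) ?: "error: unknown tool '${call.name}'"
}

// Observe -> decide -> act loop, capped at maxSteps.
// Returns the finish summary, or null if the model never calls finish.
fun runTask(prompt: String, llm: Llm, dispatcher: ToolDispatcher, maxSteps: Int = 20): String? {
    val history = mutableListOf("user: $prompt")
    repeat(maxSteps) {
        val call = llm.nextToolCall(history)
        if (call.name == "finish") return call.args["summary"] ?: ""
        val observation = dispatcher.dispatch(call)
        history += "tool ${call.name}${call.args} -> $observation"
    }
    return null
}
```

The step cap is worth keeping in any real orchestrator: an on-device model that misreads the screen can otherwise repeat the same action indefinitely. Returning unknown tool names to the model as an observation, instead of throwing, gives it a chance to correct itself on the next step.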

---

## Installation / Setup

### 1. Clone the repo

```bash
git clone https://github.com/agents-io/PokeClaw.git
cd PokeClaw
```

### 2. Open in Android Studio

- Android Studio Hedgehog or newer recommended
- SDK: Android 9+ (API 28), target API 34+
- Kotlin 1.9+

### 3. Add LiteRT-LM dependency

In `app/build.gradle.kts`:

```kotlin
dependencies {
    // LiteRT-LM for on-device LLM inference with tool calling
    // (verify the group/artifact ID and version against the current LiteRT-LM release)
    implementation("com.google.ai.edge.litert:litert-lm:1.0.0")

    // Coroutines for async inference
    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-android:1.7.3")

    // No extra JSON dependency: org.json ships with the Android framework,
    // and adding org.json:json triggers Lint's DuplicatePlatformClasses error.
}
```

### 4. AndroidManifest.xml permissions

```xml
<!-- Optional: foreground service for background tasks.
     On Android 14 (API 34)+, the foreground <service> must also declare
     android:foregroundServiceType="dataSync" to match this permission. -->
<uses-permission android:name="android.permission.FOREGROUND_SERVICE" />
<uses-permission android:name="android.permission.FOREGROUND_SERVICE_DATA_SYNC" />

<!-- BIND_ACCESSIBILITY_SERVICE and BIND_NOTIFICATION_LISTENER_SERVICE are not
     requested with <uses-permission>: each is set as android:permission on its
     <service> so that only the system can bind to it. The notification listener
     required for auto-reply is declared the same way, with the intent action
     android.service.notification.NotificationListenerService. -->

<!-- Required: accessibility service declaration (inside <application>) -->
<service
    android:name=".accessibility.PokeAccessibilityService"
    android:permission="android.permission.BIND_ACCESSIBILITY_SERVICE"
    android:exported="false">
    <intent-filter>
        <action android:name="android.accessibilityservice.AccessibilityService" />
    </intent-filter>
    <meta-data
        android:name="android.accessibilityservice"
        android:resource="@xml/accessibility_service_config" />
</service>
```

`res/xml/accessibility_service_config.xml`:

```xml
<!-- flagReportViewIds exposes view resource IDs, which the tap tool matches on.
     flagRequestEnhancedWebAccessibility is omitted: it is deprecated since API 26. -->
<accessibility-service xmlns:android="http://schemas.android.com/apk/res/android"
    android:accessibilityEventTypes="typeAllMask"
    android:accessibilityFeedbackType="feedbackGeneric"
    android:accessibilityFlags="flagDefault|flagRetrieveInteractiveWindows|flagReportViewIds"
    android:canRetrieveWindowContent="true"
    android:canPerformGestures="true"
    android:notificationTimeout="100"
    android:description="@string/accessibility_service_description" />
```

### 5. Build & install APK

```bash
./gradlew assembleDebug
adb install app/build/outputs/apk/debug/app-debug.apk
```

Or download the [latest release APK](https://github.com/agents-io/PokeClaw/releases/latest).

---

## Core Concepts

### Tool Definition

Each tool is declared as a JSON schema (name, description, parameters) that LiteRT-LM uses to structure the model's tool calls. In Kotlin, that schema can be modeled as:

```kotlin
// domain/tools/ToolDefinition.kt
data class ToolDefinition(
    val name: String,
    val description: String,
    val parameters: ToolParameters
)

data class ToolParameters(
    val type: String = "object",
    val properties: Map<String, ToolProperty>,
    val required: List<String>
)

data class ToolProperty(
    val type: String,
    val description: String,
    val enum: List<String>? = null
)
```
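
Since LiteRT-LM consumes the schema as JSON, these data classes can be serialized instead of hand-writing each string. This is a minimal sketch, assuming the same JSON layout as the tool strings in `LLMEngine`; `toJson` and `q` are hypothetical helpers, and the data classes are repeated so the snippet compiles on its own. On Android, `org.json.JSONObject` (bundled with the platform) would be the more usual choice; the hand-rolled version keeps the example runnable on a plain JVM.

```kotlin
// Same shape as the data classes above, repeated so this snippet compiles alone.
data class ToolDefinition(val name: String, val description: String, val parameters: ToolParameters)
data class ToolParameters(
    val type: String = "object",
    val properties: Map<String, ToolProperty>,
    val required: List<String>
)
data class ToolProperty(val type: String, val description: String, val enum: List<String>? = null)

// Hypothetical helper: quote and escape a string as a JSON string literal.
fun q(s: String): String = buildString {
    append('"')
    for (c in s) when {
        c == '"' -> append("\\\"")
        c == '\\' -> append("\\\\")
        c == '\n' -> append("\\n")
        c < ' ' -> append("\\u%04x".format(c.code))
        else -> append(c)
    }
    append('"')
}

// Hypothetical helper: emit {"name", "description", "parameters": {...}},
// including "enum" only for properties that define one.
fun ToolDefinition.toJson(): String {
    val props = parameters.properties.entries.joinToString(",") { (key, p) ->
        val enumPart = p.enum?.joinToString(",", prefix = ",\"enum\":[", postfix = "]") { q(it) } ?: ""
        "${q(key)}:{\"type\":${q(p.type)},\"description\":${q(p.description)}$enumPart}"
    }
    val required = parameters.required.joinToString(",", prefix = "[", postfix = "]") { q(it) }
    return "{\"name\":${q(name)},\"description\":${q(description)}," +
        "\"parameters\":{\"type\":${q(parameters.type)},\"properties\":{$props},\"required\":$required}}"
}
```

Generating the JSON from one typed definition keeps the tool name, its parameters, and the `required` list from drifting apart as tools are added.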

### Registering Tools with LiteRT-LM

```kotlin
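// llm/LLMEngine.kt
import android.content.Context
// LiteRT-LM class names as used in this skill; confirm them against the LiteRT-LM Kotlin API docs.
import com.google.ai.edge.litert.lm.LiteRtLm
import com.google.ai.edge.litert.lm.InferenceOptions
import com.google.ai.edge.litert.lm.ToolConfig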

class LLMEngine(private val context: Context) {

    private lateinit var lm: LiteRtLm

    suspend fun initialize(modelPath: String) {
        lm = LiteRtLm.create(
            context = context,
            modelPath = modelPath,
            inferenceOptions = InferenceOptions.builder()
                .setMaxTokens(2048)
                .setTemperature(0.1f)   // low temp for reliable tool calls
                .setTopK(40)
                .build()
        )
    }

    fun buildToolConfigs(): List<ToolConfig> {
        return listOf(
            ToolConfig.fromJson(tapToolJson()),
            ToolConfig.fromJson(inputTextToolJson()),
            ToolConfig.fromJson(openAppToolJson()),
            ToolConfig.fromJson(getScreenInfoToolJson()),
            ToolConfig.fromJson(sendMessageToolJson()),
            ToolConfig.fromJson(finishToolJson())
        )
    }

    private fun tapToolJson() = """
    {
      "name": "tap",
      "description": "Tap a UI element by its resource ID, content description, or screen coordinates.",
      "parameters": {
        "type": "object",
        "properties": {
          "target": {
            "type": "string",
            "description": "Resource ID, content-desc, or visible text of the element to tap."
          },
          "x": { "type": "number", "description": "Screen X coordinate (optional)." },
          "y": { "type": "number", "description": "Screen Y coordinate (optional)." }
        },
        "required": ["target"]
      }
    }
    """.trimIndent()

    private fun inputTextToolJson() = """
    {
      "name": "input_text",
      "description": "Type text into the currently focused or specified text field.",
      "parameters": {
        "type": "object",
        "properties": {
          "text": { "type": "string", "description": "Text to type." },
          "target": { "type": "string", "description": "Optional: resource ID of the target field." }
        },
        "required": ["text"]
      }
    }
    """.trimIndent()

    private fun openAppToolJson() = """
    {
      "name": "open_app",
      "description": "Launch an installed app by its name or package name.",
      "parameters": {
        "type": "object",
        "properties": {
          "app_name": { "type": "string", "description": "Human-readable app name, e.g. 'WhatsApp'." },
          "package_name": { "type": "string", "description": "Optional explicit package, e.g. 'com.whatsapp'." }
        },
        "required": ["app_name"]
      }
    }
    """.trimIndent()

    private fun getScreenInfoToolJson() = """
    {
      "name": "get_screen_info",
      "description": "Return a text representation of all interactive UI elements on the current screen.",
      "parameters": {
        "type": "object",
        "properties": {},
        "required": []
      }
    }
    """.trimIndent()

    private fun sendMessageToolJson() = """
    {
      "name": "send_message",
      "description": "Send a chat/SMS message to a contact. Handles: open app, find contact, type, send.",
      "parameters": {
        "type": "object",
        "properties": {
          "app": { "type": "string", "description": "Messaging app name, e.g. 'WhatsApp'." },
          "contact": { "type": "string", "description": "Contact name or phone number." },
          "message": { "type": "string", "description": "Message body to send." }
        },
        "required": ["app", "contact", "message"]
      }
    }
    """.trimIndent()

    private fun finishToolJson() = """
    {
      "name": "finish",
      "description": "Sig
