.. include:: /index.rst
   :start-after: start_hello_message
   :end-before: end_hello_message

.. _mp_hand_count_auto_tts:

13. Touchless Auto TTS — Hands-Free Voice Broadcast
======================================================

-----------------------------------------------------------------
1. Overview
-----------------------------------------------------------------

In :ref:`mp_hand_count_tts` (Section 12), we built a hand gesture
counting program where the user presses the ``t`` key to trigger
a TTS voice broadcast.

In this section, we take the next step: **remove the keyboard entirely.**
The system now *automatically* detects when you hold a hand gesture
steady and speaks the finger count — no keys, no buttons,
completely touchless.

.. image:: img/mp_hand_count.png
   :align: center

This lesson introduces a **state-machine pattern** for touchless
interaction — a technique you can apply to accessibility projects,
hands-free installations, and any scenario where keyboard input
is not practical.

By the end of this lesson, you will know how to:

- Design a state machine for hand-presence tracking
- Detect gesture *stability* over multiple frames
- Use a hold-duration gate to avoid false triggers
- Auto-detect when a hand enters or leaves the frame
- Provide multi-stage visual feedback (idle → detected → stable → speaking)
- Display a progress bar for hold-duration countdown


-----------------------------------------------------------------
2. How It Works
-----------------------------------------------------------------

The program replaces the keyboard trigger with an **automatic
stability-based trigger**. Here is the pipeline:

1. Initialize **MediaPipe Hands** for real-time hand detection.
2. Initialize the **Fusion HAT+ TTS engine** (Espeak).
3. Capture video frames and detect fingers (same as before).
4. Feed the finger count into a **stability detector** — a sliding
   window that checks whether the count has remained the same
   across multiple consecutive frames.
5. Once the count is confirmed stable, start a **hold-duration timer**.
6. If the user holds the same gesture for 2.5 seconds, TTS fires
   automatically.
7. If the hand leaves the frame, the system speaks "hand left the frame",
   provided at least 4 seconds have passed since the last announcement.
8. A **progress bar** and **multi-color border** show the current
   state at a glance.

The key design idea is:

    *The user's steady hand replaces the keyboard —*
    the system watches for *intent* (holding still) rather than
    reacting to every fleeting gesture.

This makes the project fully hands-free and accessible — ideal for
assistive technology, interactive exhibits, or situations where
the user cannot reach a keyboard.


-----------------------------------------------------------------
3. Key Design Concepts
-----------------------------------------------------------------

Adding auto-triggered TTS requires more sophisticated state
management than the key-press version. Let's walk through each
new concept.

--------------------------------------------------
3.1 State Machine for Hand Tracking
--------------------------------------------------

The program tracks hand presence as a **state**, not just a
per-frame value. A ``HandTrackingState`` class encapsulates
all the state variables:

.. code-block:: python

    class HandTrackingState:
        def __init__(self):
            self.finger_history = deque(maxlen=FRAME_HISTORY_SIZE)
            self.current_fingers = 0
            self.stable_fingers = -1
            self.stable_start_time = 0
            self.is_stable = False
            self.hand_present = False
            self.hand_absent_start_time = 0
            self.last_tts_time = 0
            self.last_tts_message = ""
            self.last_no_hand_tts_time = 0

    state = HandTrackingState()

By grouping all tracking variables into one object, the code
stays organized even as the logic grows more complex.

The state machine transitions through these phases:

- **No hand**: gray border, idle status
- **Hand detected, not yet stable**: cyan border, "keep hand still" prompt
- **Stable, holding**: border shifts toward green while the progress bar fills
- **Speaking**: green border plus a "SPEAKING..." label for 0.8 seconds
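
The four phases can be derived from the flags stored in
``HandTrackingState``. The following is a minimal sketch rather than
part of the shipped script: the ``Phase`` enum and the
``current_phase`` helper are hypothetical names introduced here for
illustration, and the check order mirrors ``get_border_color()`` in
the full listing.

.. code-block:: python

    from enum import Enum

    class Phase(Enum):
        IDLE = "no hand"
        DETECTED = "hand detected, not yet stable"
        HOLDING = "stable, holding"
        SPEAKING = "speaking"

    def current_phase(hand_present, is_stable, now, speaking_until=0.0):
        """Map the tracking flags to one display phase (hypothetical helper)."""
        if now < speaking_until:      # the speaking flash wins over everything else
            return Phase.SPEAKING
        if not hand_present:
            return Phase.IDLE
        if is_stable:
            return Phase.HOLDING
        return Phase.DETECTED

Keeping the phase decision in one function means the border, the
status text, and the progress bar can all read the same value.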

--------------------------------------------------
3.2 Stability Detection
--------------------------------------------------

A single-frame finger count is unreliable — the number can
flicker due to camera noise or slight hand movement. To avoid
false triggers, we use a **sliding window** of recent counts:

.. code-block:: python

    import time
    from collections import deque

    FRAME_HISTORY_SIZE = 10
    STABLE_FRAMES_REQUIRED = 5

    def update_stability(new_count):
        state.finger_history.append(new_count)

        if len(state.finger_history) >= STABLE_FRAMES_REQUIRED:
            recent_counts = list(state.finger_history)[-STABLE_FRAMES_REQUIRED:]
            if all(c == new_count for c in recent_counts):
                # Start the hold timer only when stability is first
                # confirmed. Resetting it on every stable frame would
                # keep the hold time near zero, so TTS would never fire.
                if not state.is_stable or state.current_fingers != new_count:
                    state.is_stable = True
                    state.stable_start_time = time.time()
                    state.current_fingers = new_count
                    return True
        else:
            # Not enough frames collected yet to judge stability
            state.is_stable = False

        state.current_fingers = new_count
        return False

The gesture is considered **stable** only when the last 5 frames
all report the same finger count. This filters out momentary
flickers and ensures the system only speaks when the user is
intentionally holding a gesture.
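
To see the window filter out a flicker, the check can be isolated
into a small pure function and fed a short sequence of counts. This
is an illustrative sketch; ``window_is_stable`` is a hypothetical
helper, not a function from the script.

.. code-block:: python

    from collections import deque

    def window_is_stable(history, required):
        """Return True when the newest `required` entries are all equal."""
        if len(history) < required:
            return False
        recent = list(history)[-required:]
        return all(c == recent[-1] for c in recent)

    history = deque(maxlen=10)
    results = []
    # Eight frames showing three fingers, with one flicker to 2 on frame 3
    for count in [3, 3, 2, 3, 3, 3, 3, 3]:
        history.append(count)
        results.append(window_is_stable(history, required=5))

    print(results)

Only the last frame reports ``True``: the flicker on frame 3 has to
slide out of the 5-frame window before the gesture counts as stable.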

--------------------------------------------------
3.3 Auto-Trigger with Hold Duration
--------------------------------------------------

Stability alone is not enough — the user must *hold* the gesture
long enough to demonstrate intent:

.. code-block:: python

    HOLD_DURATION_REQUIRED = 2.5    # seconds
    MIN_TTS_INTERVAL = 4.0          # seconds between auto triggers

    def should_trigger_tts():
        now = time.time()

        # Minimum interval between TTS triggers
        if now - state.last_tts_time < MIN_TTS_INTERVAL:
            return False

        # Hand must be present and stable
        if not state.hand_present or not state.is_stable:
            return False

        # Must have been stable for the required hold duration
        hold_time = now - state.stable_start_time
        if hold_time < HOLD_DURATION_REQUIRED:
            return False

        # Don't repeat the same count too quickly
        if state.stable_fingers == state.current_fingers:
            if now - state.last_tts_time < MIN_TTS_INTERVAL * 2:
                return False

        return True

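Four gates protect against false triggers:

1. **Minimum interval**: at least 4 seconds between any two TTS events.
2. **Presence and stability**: a hand must be in frame and the gesture
   must be confirmed stable.
3. **Hold duration**: the gesture must be held steady for 2.5 seconds.
4. **Repeat guard**: the count that was spoken last is not repeated until
   8 seconds (2 × ``MIN_TTS_INTERVAL``) after the last TTS event.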
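
The timing is easier to check by hand when the gate is written as a
pure function of timestamps. The sketch below restates the same
checks with every input passed explicitly; ``gate_allows_tts`` and
its parameter names are assumptions made for this example and do not
appear in the script.

.. code-block:: python

    def gate_allows_tts(now, last_tts_time, hand_present, is_stable,
                        stable_start_time, stable_fingers, current_fingers,
                        hold_required=2.5, min_interval=4.0):
        """Same checks as should_trigger_tts(), with explicit inputs."""
        if now - last_tts_time < min_interval:
            return False                      # too soon after the last speech
        if not hand_present or not is_stable:
            return False                      # nothing steady to announce
        if now - stable_start_time < hold_required:
            return False                      # not held long enough yet
        if (stable_fingers == current_fingers
                and now - last_tts_time < min_interval * 2):
            return False                      # same count, still inside repeat guard
        return True

    # Gesture "3" becomes stable at t=100.0 and nothing has been spoken yet.
    first = gate_allows_tts(102.5, 0.0, True, True, 100.0, -1, 3)
    # "3" was spoken at t=102.5; the user keeps holding the same gesture.
    repeat_early = gate_allows_tts(107.0, 102.5, True, True, 100.0, 3, 3)
    repeat_late = gate_allows_tts(110.5, 102.5, True, True, 100.0, 3, 3)

With these inputs the first call passes after exactly 2.5 seconds of
holding, the same count is refused 4.5 seconds after it was spoken,
and it is allowed again once 8 seconds have elapsed.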

--------------------------------------------------
3.4 Hand Exit Detection
--------------------------------------------------

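When the user removes their hand from the camera, the system
notices the transition and speaks a notification:

.. code-block:: python

    # Inside the main loop; `now = time.time()` and `has_hand` were
    # computed earlier in the same iteration.
    if not has_hand and state.hand_present:
        # Hand just left: reset the tracking state
        state.hand_present = False
        state.is_stable = False
        state.stable_fingers = -1
        state.finger_history.clear()

        if now - state.last_tts_time >= MIN_TTS_INTERVAL:
            tts.say("hand left the frame")
            state.last_tts_time = now

The exit message fires only if at least ``MIN_TTS_INTERVAL`` seconds
have passed since the last TTS event, so it never follows directly
on a finger-count announcement. The constant ``HAND_EXIT_DELAY`` is
declared in the full listing, but this version does not read it: the
message is spoken on the first frame without a hand.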
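
If a grace period is wanted, so that one dropped detection does not
count as an exit, the announcement can wait until the hand has been
absent for ``HAND_EXIT_DELAY`` seconds. The class below is one
possible sketch of that idea. ``ExitDebouncer`` is a hypothetical
name, and this is not how the shipped script behaves.

.. code-block:: python

    class ExitDebouncer:
        """Report a hand exit only after `delay` seconds without a hand."""

        def __init__(self, delay=4.0):
            self.delay = delay
            self.last_seen = None    # time of the most recent frame with a hand
            self.announced = True    # nothing to announce until a hand appears

        def update(self, has_hand, now):
            """Return True once per exit, `delay` seconds after the last sighting."""
            if has_hand:
                self.last_seen = now
                self.announced = False
                return False
            if not self.announced and now - self.last_seen >= self.delay:
                self.announced = True
                return True
            return False

    debouncer = ExitDebouncer(delay=4.0)
    events = [debouncer.update(seen, t) for seen, t in
              [(True, 1.0), (False, 2.0), (True, 2.5), (False, 6.0), (False, 6.5)]]

In the main loop, ``debouncer.update(has_hand, now)`` would be called
once per frame and the exit message spoken when it returns ``True``.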

--------------------------------------------------
3.5 Building the Message
--------------------------------------------------

Message construction is identical to the key-press version:

.. code-block:: python

    if count == 0:
        message = "no fingers detected"
    elif count == 1:
        message = "one finger detected"
    else:
        message = f"{count} fingers detected"

.. note::

   Unlike the key-press version which sums fingers across both hands,
   this version uses ``max(total_fingers, finger_count)`` to pick
   the hand with the most visible fingers. This produces more
   reliable results when both hands are in frame.
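
The two steps, picking one count from several hands and turning it
into a sentence, can be sketched as small functions. The names
``fingers_to_report`` and ``build_message`` are illustrative
assumptions; the script performs both steps inline.

.. code-block:: python

    def fingers_to_report(per_hand_counts):
        """Pick the largest per-hand count, or 0 when no hand is visible."""
        return max(per_hand_counts, default=0)

    def build_message(count):
        """Build the sentence that is passed to the TTS engine."""
        if count == 0:
            return "no fingers detected"
        if count == 1:
            return "one finger detected"
        return f"{count} fingers detected"

    # Two hands in frame: one shows 2 fingers, the other shows 5
    message = build_message(fingers_to_report([2, 5]))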

--------------------------------------------------
3.6 Multi-Stage Visual Feedback
--------------------------------------------------

Instead of a single green flash, this version provides a
**continuous color-coded border** that reflects the current state:

.. code-block:: python

    # Colors are in OpenCV's BGR channel order
    COLOR_IDLE     = (128, 128, 128)   # gray:  no hand
    COLOR_DETECTED = (255, 255, 0)     # cyan:  hand seen, not yet stable
    COLOR_STABLE   = (0, 255, 0)       # green: gesture stable, holding
    COLOR_SPEAKING = (0, 255, 0)       # green: TTS in progress (same value as COLOR_STABLE)

The border color transitions smoothly from cyan to green as the
hold duration progresses, giving the user real-time feedback on
how close they are to triggering TTS.

**Progress bar**: A bar in the top-right corner, 40% of the frame
width, fills from left to right as the hold time accumulates. When
it reaches 100%, TTS fires, provided the interval and repeat gates
from Section 3.3 also pass.

**Status text**: A status line below the finger count shows the
current phase:

- ``"Status: No hand detected"``
- ``"Status: Detecting... keep hand still"``
- ``"Status: Hold gesture (1.3s to speak)"``
- ``"Status: Ready to speak!"``
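
The cyan-to-green transition is a per-channel linear interpolation
between two BGR tuples. A minimal sketch, assuming a hypothetical
helper named ``blend_bgr`` that is not part of the script:

.. code-block:: python

    def blend_bgr(start, end, t):
        """Linearly blend two BGR colors; t=0 gives `start`, t=1 gives `end`."""
        t = max(0.0, min(1.0, t))   # clamp progress to the 0..1 range
        return tuple(int(s * (1 - t) + e * t) for s, e in zip(start, end))

    CYAN = (255, 255, 0)    # BGR
    GREEN = (0, 255, 0)     # BGR

    halfway = blend_bgr(CYAN, GREEN, 0.5)

Only the blue channel changes between the two colors, so the border
keeps full green intensity while the blue component fades out.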


-----------------------------------------------------------------
4. Run the Code
-----------------------------------------------------------------

.. important::

   Before you start, make sure:

   * The Fusion HAT+ is assembled and the speaker is connected
   * You can access the Raspberry Pi desktop
   * The code package is installed
   * MediaPipe and OpenCV are installed

   For detailed instructions, see :ref:`mediapipe_install` and :ref:`opencv_install`.

#. Open the terminal and enter the following command:

   .. code-block:: bash

      sudo python3 ~/ai-lab-kit/mediapipe/mp_hand_count_tts_without_tap.py

#. After running the program:

   - A window titled "MediaPipe Hand Detection + AUTO TTS (Touchless Mode)" opens,
     showing the live camera feed.
   - Hold your hand up to the camera — the finger count appears
     in the top-left corner.
   - *Keep your hand still* — watch the border change from gray
     to cyan to green, and the progress bar fill up.
   - After 2.5 seconds of holding the same gesture, the system
     automatically speaks the finger count.
   - Remove your hand from the camera. The system says "hand left
     the frame" if at least 4 seconds have passed since the last
     announcement.

   .. hint::

      Try showing different numbers of fingers and holding each
      one steady for a few seconds. You should hear each count
      spoken automatically. Notice how the border color and
      progress bar guide you through the process.

   Press ``q`` to exit the program.


--------------------------------------------------
5. Complete Code
--------------------------------------------------

.. code-block:: python

   """
   MediaPipe Hand Detection + Auto TTS (Touchless Mode)
   ====================================================
   Detects fingers via webcam in real time. Automatically speaks the finger count
   when a stable hand gesture is maintained for a certain duration.

   No keyboard input required for triggering TTS.

   Usage:
       sudo python3 mp_hand_count_tts_without_tap.py

   Controls:
       'q'  - quit
   """

   from picamera2 import Picamera2
   import cv2
   import mediapipe.python.solutions.hands as mp_hands
   import mediapipe.python.solutions.drawing_utils as drawing
   import mediapipe.python.solutions.drawing_styles as drawing_styles
   from fusion_hat.tts import Espeak
   import time
   from collections import deque


   # ======================== Init TTS ========================
   tts = Espeak()
   tts.set_amp(200)       # volume 0-200, default 100
   tts.set_speed(150)     # speed 80-260, default 150
   tts.set_pitch(80)      # pitch 0-99, default 80

   # ======================== Init MediaPipe Hands ========================
   hands = mp_hands.Hands(
       static_image_mode=False,
       max_num_hands=2,
       min_detection_confidence=0.5,
       min_tracking_confidence=0.5
   )

   # ======================== Init Camera ========================
   picam2 = Picamera2()
   config = picam2.create_preview_configuration(
       main={"size": (640, 480), "format": "XRGB8888"},
   )
   picam2.configure(config)
   picam2.start()

   # ======================== Constants ========================
   # Fingertip and reference-joint landmark indices (MediaPipe Hands numbering)
   FINGER_TIPS = [4, 8, 12, 16, 20]   # thumb, index, middle, ring, pinky tips
   FINGER_DIPS = [2, 6, 10, 14, 18]   # thumb MCP, then the PIP (middle) joint of each finger

   # Auto TTS parameters
   STABLE_FRAMES_REQUIRED = 5      # frames needed to confirm stability
   HOLD_DURATION_REQUIRED = 2.5    # seconds hand must stay stable before speaking
   MIN_TTS_INTERVAL = 4.0          # seconds between auto TTS triggers
   HAND_EXIT_DELAY = 4.0           # intended delay before saying "hand left" (not read in this version)
   NO_HAND_COOLDOWN = 5.0          # intended cooldown for "no hand" repeats (not read in this version)

   # Frame processing
   FRAME_HISTORY_SIZE = 10         # for stability detection

   # Border colors (BGR)
   COLOR_IDLE = (128, 128, 128)    # gray
   COLOR_DETECTED = (255, 255, 0)  # cyan
   COLOR_STABLE = (0, 255, 0)      # green
   COLOR_SPEAKING = (0, 255, 0)    # green (same value as COLOR_STABLE)

   print("=" * 60)
   print("  MediaPipe Hand Detection + AUTO TTS (Touchless Mode)")
   print("  No keyboard needed - just show a stable hand gesture")
   print("  Press 'q' to quit")
   print("=" * 60)

   # ======================== State Management ========================
   class HandTrackingState:
       def __init__(self):
           self.finger_history = deque(maxlen=FRAME_HISTORY_SIZE)
           self.current_fingers = 0
           self.stable_fingers = -1
           self.stable_start_time = 0
           self.is_stable = False
           self.hand_present = False
           self.hand_absent_start_time = 0
           self.last_tts_time = 0
           self.last_tts_message = ""
           self.last_no_hand_tts_time = 0

   state = HandTrackingState()

   def get_finger_count(hand_landmarks):
       """Count fingers for a single hand (right hand logic)"""
       landmarks = hand_landmarks.landmark
       finger_count = 0

       # Thumb: extended when x_tip > x_dip (right hand)
       if landmarks[FINGER_TIPS[0]].x > landmarks[FINGER_DIPS[0]].x:
           finger_count += 1

       # Other four fingers: tip is above dip when extended (smaller y)
       for i in range(1, 5):
           if landmarks[FINGER_TIPS[i]].y < landmarks[FINGER_DIPS[i]].y:
               finger_count += 1

       return finger_count

   def update_stability(new_count):
       """Update stability state based on finger count history"""
       state.finger_history.append(new_count)

       if len(state.finger_history) >= STABLE_FRAMES_REQUIRED:
           recent_counts = list(state.finger_history)[-STABLE_FRAMES_REQUIRED:]
           if all(c == new_count for c in recent_counts):
               if not state.is_stable or state.current_fingers != new_count:
                   state.is_stable = True
                   state.stable_start_time = time.time()
                   state.current_fingers = new_count
                   return True
       else:
           state.is_stable = False

       state.current_fingers = new_count
       return False

   def should_trigger_tts():
       """Check if conditions are met for auto TTS"""
       now = time.time()

       if now - state.last_tts_time < MIN_TTS_INTERVAL:
           return False

       if not state.hand_present or not state.is_stable:
           return False

       hold_time = now - state.stable_start_time
       if hold_time < HOLD_DURATION_REQUIRED:
           return False

       if state.stable_fingers == state.current_fingers:
           if now - state.last_tts_time < MIN_TTS_INTERVAL * 2:
               return False

       return True

   def trigger_tts():
       """Execute TTS for current finger count"""
       now = time.time()
       count = state.current_fingers

       if count == 0:
           message = "no fingers detected"
       elif count == 1:
           message = "one finger detected"
       else:
           message = f"{count} fingers detected"

       if message == state.last_tts_message and now - state.last_tts_time < 3.0:
           return False

       print(f"[TTS] {message} (held for {HOLD_DURATION_REQUIRED}s)")
       tts.say(message)

       state.last_tts_time = now
       state.last_tts_message = message
       state.stable_fingers = count

       return True

   def trigger_hand_exit_tts():
       """Say hand has left the frame"""
       now = time.time()
       if now - state.last_tts_time >= MIN_TTS_INTERVAL:
           print("[TTS] hand left the frame")
           tts.say("hand left the frame")
           state.last_tts_time = now
           state.last_tts_message = "hand left"

   def get_border_color():
       """Determine border color based on current state"""
       now = time.time()

       if hasattr(state, 'speaking_until') and now < state.speaking_until:
           return COLOR_SPEAKING

       if not state.hand_present:
           return COLOR_IDLE

       if state.is_stable:
           hold_progress = min(1.0, (now - state.stable_start_time) / HOLD_DURATION_REQUIRED)
           if hold_progress < 1.0:
               # Blend channel by channel; the tuples are in BGR order
               blue = int(COLOR_DETECTED[0] * (1 - hold_progress) + COLOR_STABLE[0] * hold_progress)
               green = int(COLOR_DETECTED[1] * (1 - hold_progress) + COLOR_STABLE[1] * hold_progress)
               red = int(COLOR_DETECTED[2] * (1 - hold_progress) + COLOR_STABLE[2] * hold_progress)
               return (blue, green, red)
           else:
               return COLOR_STABLE

       return COLOR_DETECTED

   # ======================== Main Loop ========================
   frame_count = 0
   speaking_flash_until = 0

   while True:
       # ---- 1. Capture frame ----
       frame_bgra = picam2.capture_array()
       frame_bgr = cv2.cvtColor(frame_bgra, cv2.COLOR_BGRA2BGR)

       # ---- 2. Convert to RGB for MediaPipe ----
       frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
       hands_detected = hands.process(frame_rgb)

       # ---- 3. Convert back to BGR for OpenCV display ----
       frame = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2BGR)

       # ---- 4. Detect hands and count fingers ----
       total_fingers = 0
       has_hand = False

       if hands_detected.multi_hand_landmarks:
           has_hand = True
           for hand_landmarks in hands_detected.multi_hand_landmarks:
               drawing.draw_landmarks(
                   frame,
                   hand_landmarks,
                   mp_hands.HAND_CONNECTIONS,
                   drawing_styles.get_default_hand_landmarks_style(),
                   drawing_styles.get_default_hand_connections_style(),
               )

               finger_count = get_finger_count(hand_landmarks)
               total_fingers = max(total_fingers, finger_count)

       # ---- 5. Update state machine ----
       now = time.time()

       if has_hand:
           if not state.hand_present:
               state.hand_present = True
               state.is_stable = False
               state.finger_history.clear()
               print("[INFO] Hand detected")
           state.hand_absent_start_time = now
       else:
           if state.hand_present:
               state.hand_present = False
               state.is_stable = False
               state.stable_fingers = -1
               state.finger_history.clear()
               if now - state.last_tts_time >= MIN_TTS_INTERVAL:
                   trigger_hand_exit_tts()

       if has_hand:
           update_stability(total_fingers)

           if should_trigger_tts():
               if trigger_tts():
                   speaking_flash_until = now + 0.8
                   state.speaking_until = speaking_flash_until

       # ---- 6. Display information on screen ----
       display_text = f"Fingers: {total_fingers}"
       cv2.putText(frame, display_text, (10, 40),
                   cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)

       if not has_hand:
           status_text = "Status: No hand detected"
           status_color = (128, 128, 128)
       elif state.is_stable:
           hold_progress = min(1.0, (now - state.stable_start_time) / HOLD_DURATION_REQUIRED)
           if hold_progress < 1.0:
               remaining = HOLD_DURATION_REQUIRED - (now - state.stable_start_time)
               status_text = f"Status: Hold gesture ({remaining:.1f}s to speak)"
               status_color = (255, 255, 0)
           else:
               status_text = "Status: Ready to speak!"
               status_color = (0, 255, 0)
       else:
           status_text = "Status: Detecting... keep hand still"
           status_color = (0, 200, 200)

       cv2.putText(frame, status_text, (10, 80),
                   cv2.FONT_HERSHEY_SIMPLEX, 0.6, status_color, 2)

       cv2.putText(frame, "Keep gesture still to auto-speak | 'q' to quit",
                   (10, frame.shape[0] - 15),
                   cv2.FONT_HERSHEY_SIMPLEX, 0.5, (180, 180, 180), 1)

       # ---- 7. Visual border feedback ----
       h, w = frame.shape[:2]
       thickness = 6

       if now < speaking_flash_until:
           border_color = (0, 255, 0)
           cv2.rectangle(frame, (0, 0), (w - 1, h - 1), border_color, thickness)
           cv2.putText(frame, "SPEAKING...", (w - 180, 40),
                       cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
       else:
           border_color = get_border_color()
           cv2.rectangle(frame, (0, 0), (w - 1, h - 1), border_color, thickness)

       # ---- 8. Progress bar for hold duration ----
       if has_hand and state.is_stable:
           hold_progress = min(1.0, (now - state.stable_start_time) / HOLD_DURATION_REQUIRED)
           bar_width = int(w * 0.4)
           bar_height = 8
           bar_x = w - bar_width - 10
           bar_y = 10
           filled_width = int(bar_width * hold_progress)

           cv2.rectangle(frame, (bar_x, bar_y), (bar_x + bar_width, bar_y + bar_height),
                        (60, 60, 60), -1)
           cv2.rectangle(frame, (bar_x, bar_y), (bar_x + filled_width, bar_y + bar_height),
                        (0, 255, 0), -1)

       # ---- 9. Key handling ----
       key = cv2.waitKey(1) & 0xff

       if key == ord('q'):
           break

       # ---- 10. Show frame ----
       cv2.imshow("MediaPipe Hand Detection + AUTO TTS (Touchless Mode)", frame)

   # ======================== Cleanup ========================
   picam2.stop_preview()
   picam2.stop()
   cv2.destroyAllWindows()
   print("Exited.")


--------------------------------------------------
6. Code Explanation
--------------------------------------------------

Let's walk through the code section by section, focusing on
what's new compared to the key-press version from
:ref:`mp_hand_count_tts`.

--------------------------------------------------
6.1 Imports and New Dependencies
--------------------------------------------------

.. code-block:: python

   from collections import deque
   import time

The key addition is ``deque`` — a double-ended queue from
Python's ``collections`` module. It provides a fixed-size
sliding window for stability detection: when you ``append``
to a ``deque(maxlen=N)``, old items are automatically
dropped, keeping only the most recent N values.

This is perfect for tracking the last 5–10 finger counts
without manual list management.
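
The automatic eviction is easy to confirm in isolation. This short
example uses only the standard library:

.. code-block:: python

   from collections import deque

   window = deque(maxlen=3)
   for value in [1, 2, 3, 4, 5]:
       window.append(value)    # once full, the oldest item is dropped

   print(list(window))         # the three most recent values: [3, 4, 5]

The same behavior keeps ``finger_history`` at no more than
``FRAME_HISTORY_SIZE`` entries, however long the program runs.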

--------------------------------------------------
6.2 Constants and Configuration
--------------------------------------------------

.. code-block:: python

   STABLE_FRAMES_REQUIRED = 5      # frames needed to confirm stability
   HOLD_DURATION_REQUIRED = 2.5    # seconds hand must stay stable
   MIN_TTS_INTERVAL = 4.0          # seconds between auto TTS triggers
   HAND_EXIT_DELAY = 4.0           # intended exit delay (not read in this version)
   NO_HAND_COOLDOWN = 5.0          # intended "no hand" cooldown (not read in this version)
   FRAME_HISTORY_SIZE = 10         # for stability detection

   COLOR_IDLE     = (128, 128, 128)   # gray
   COLOR_DETECTED = (255, 255, 0)     # cyan
   COLOR_STABLE   = (0, 255, 0)       # green
   COLOR_SPEAKING = (0, 255, 0)       # green (same value as COLOR_STABLE)

All timing and behavior parameters are declared as named constants
at the top of the file. This makes the program easy to tune —
want a longer hold time? Change ``HOLD_DURATION_REQUIRED``.
Want less frequent announcements? Increase ``MIN_TTS_INTERVAL``.

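The four color constants define a visual language:

- **Gray**: idle, no hand in frame
- **Cyan**: hand detected, but not yet stable
- **Green**: gesture is stable and holding
- **Green (speaking)**: ``COLOR_SPEAKING`` has the same value as
  ``COLOR_STABLE``, so the "SPEAKING..." label is what marks this state

All tuples are in OpenCV's BGR order, which is why ``(255, 255, 0)``
renders as cyan rather than yellow.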

--------------------------------------------------
6.3 HandTrackingState Class
--------------------------------------------------

.. code-block:: python

   class HandTrackingState:
       def __init__(self):
           self.finger_history = deque(maxlen=FRAME_HISTORY_SIZE)
           self.current_fingers = 0
           self.stable_fingers = -1
           self.stable_start_time = 0
           self.is_stable = False
           self.hand_present = False
           self.hand_absent_start_time = 0
           self.last_tts_time = 0
           self.last_tts_message = ""
           self.last_no_hand_tts_time = 0

   state = HandTrackingState()

This class bundles all tracking variables into a single object.
Each variable serves a specific role:

- ``finger_history``: sliding window of recent finger counts
  (used by the stability detector)
- ``current_fingers``: the finger count for the current frame
- ``stable_fingers``: the count that was last spoken; reset to ``-1``
  when the hand leaves the frame
- ``stable_start_time``: when the current stable period began
- ``is_stable``: whether the gesture is currently confirmed stable
- ``hand_present``: whether a hand is currently in frame
- ``hand_absent_start_time``: timestamp of the most recent frame that
  contained a hand (refreshed every frame while a hand is visible);
  no function in this version reads it
- ``last_tts_time``: timestamp of the last TTS event
- ``last_tts_message``: the last spoken message (to avoid repeats)
- ``last_no_hand_tts_time``: reserved for a "no hand" announcement;
  not used in this version

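A single ``state`` instance is created globally, so all helper
functions can read and modify it without passing parameters.
One further attribute, ``speaking_until``, is not declared in
``__init__``: the main loop attaches it the first time TTS fires,
which is why ``get_border_color()`` guards the lookup with ``hasattr``.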

--------------------------------------------------
6.4 Stability Detection Function
--------------------------------------------------

.. code-block:: python

   def update_stability(new_count):
       state.finger_history.append(new_count)

       if len(state.finger_history) >= STABLE_FRAMES_REQUIRED:
           recent_counts = list(state.finger_history)[-STABLE_FRAMES_REQUIRED:]
           if all(c == new_count for c in recent_counts):
               if not state.is_stable or state.current_fingers != new_count:
                   state.is_stable = True
                   state.stable_start_time = time.time()
                   state.current_fingers = new_count
                   return True
       else:
           state.is_stable = False

       state.current_fingers = new_count
       return False

This function is the heart of the touchless system. Here's how it works:

1. **Append** the new finger count to the sliding window.
2. **Check** if we have enough frames (at least 5).
3. **Compare** the last 5 frames — if they all match the current
   count, the gesture is stable.
4. **Record** the time when stability began (``stable_start_time``)
   — this is used by the hold-duration timer.
5. **Return** ``True`` on the frame where stability is first
   confirmed, ``False`` otherwise.

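The ``all(c == new_count for c in recent_counts)`` expression checks
that *every* value in the window matches the current count; if even
one frame differs, the check fails for that frame. Note that the
``else`` branch belongs to the length check, so ``is_stable`` is
cleared only while the window holds fewer than 5 frames (right after
a hand enters). A flicker in the middle of a hold does not clear
``is_stable`` or restart the timer.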
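
If you want a single mismatched frame to clear ``is_stable`` and
restart the hold timer, one option is the stricter variant sketched
below. ``StrictStability`` is a hypothetical class written for this
explanation, and it takes the timestamp as an argument so the
behavior can be checked without a camera.

.. code-block:: python

   from collections import deque

   class StrictStability:
       """Stability tracker in which any mismatch breaks stability."""

       def __init__(self, required=5, history_size=10):
           self.required = required
           self.history = deque(maxlen=history_size)
           self.is_stable = False
           self.stable_start_time = 0.0

       def update(self, new_count, now):
           """Return True only on the frame where stability is (re)confirmed."""
           self.history.append(new_count)
           recent = list(self.history)[-self.required:]
           window_ok = (len(recent) == self.required
                        and all(c == new_count for c in recent))
           if not window_ok:
               self.is_stable = False          # any mismatch breaks stability
               return False
           if not self.is_stable:
               self.is_stable = True
               self.stable_start_time = now    # restart the hold timer
               return True
           return False

   tracker = StrictStability()
   # Five frames of "3", then a switch to "4" from t=6 onward
   first_run = [tracker.update(3, t) for t in range(5)]
   still_three = tracker.update(3, 5)
   second_run = [tracker.update(4, t) for t in range(6, 11)]

With this variant, switching from three fingers to four restarts the
hold timer at the moment the new count has filled the window.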

--------------------------------------------------
6.5 Auto TTS Trigger Logic
--------------------------------------------------

.. code-block:: python

   def should_trigger_tts():
       now = time.time()

       if now - state.last_tts_time < MIN_TTS_INTERVAL:
           return False
       if not state.hand_present or not state.is_stable:
           return False
       hold_time = now - state.stable_start_time
       if hold_time < HOLD_DURATION_REQUIRED:
           return False
       if state.stable_fingers == state.current_fingers:
           if now - state.last_tts_time < MIN_TTS_INTERVAL * 2:
               return False
       return True

This function acts as a **gate** — all conditions must be met
before TTS can fire:

1. **Minimum interval**: at least 4 seconds since the last TTS.
2. **Hand present and stable**: the gesture must be confirmed stable.
3. **Hold duration**: the user must have held the gesture for
   at least 2.5 seconds.
4. **Repeat guard**: the same finger count won't be spoken again
   for 8 seconds (2× the minimum interval).

.. tip::

   The hold duration creates a clear *intent signal* — momentary
   gestures are ignored, but a deliberate hold triggers speech.
   This is the key difference from the key-press approach: the
   user's *patience* replaces the button press.

--------------------------------------------------
6.6 Hand Exit Detection
--------------------------------------------------

.. code-block:: python

   # In the main loop:
   if has_hand:
       if not state.hand_present:
           # Hand just entered
           state.hand_present = True
           state.is_stable = False
           state.finger_history.clear()
           print("[INFO] Hand detected")
       state.hand_absent_start_time = now
   else:
       if state.hand_present:
           # Hand just left
           state.hand_present = False
           state.is_stable = False
           state.stable_fingers = -1
           state.finger_history.clear()
           if now - state.last_tts_time >= MIN_TTS_INTERVAL:
               trigger_hand_exit_tts()

When the hand enters or leaves the frame, the state is reset:

- Stability is cleared (``is_stable = False``)
- The finger history is wiped (``finger_history.clear()``)
- If the hand just left, and enough time has passed since the
  last TTS, the system says "hand left the frame"

Resetting stability on entry and exit prevents stale state
from carrying over between hand appearances.

--------------------------------------------------
6.7 Multi-Color Border and Progress Bar
--------------------------------------------------

.. code-block:: python

   def get_border_color():
       now = time.time()

       if hasattr(state, 'speaking_until') and now < state.speaking_until:
           return COLOR_SPEAKING

       if not state.hand_present:
           return COLOR_IDLE

       if state.is_stable:
           hold_progress = min(1.0, (now - state.stable_start_time) / HOLD_DURATION_REQUIRED)
           if hold_progress < 1.0:
               # Smooth blend from cyan to green, channel by channel (BGR order)
               blue = int(COLOR_DETECTED[0] * (1 - hold_progress) + COLOR_STABLE[0] * hold_progress)
               green = int(COLOR_DETECTED[1] * (1 - hold_progress) + COLOR_STABLE[1] * hold_progress)
               red = int(COLOR_DETECTED[2] * (1 - hold_progress) + COLOR_STABLE[2] * hold_progress)
               return (blue, green, red)
           else:
               return COLOR_STABLE

       return COLOR_DETECTED

The border color is not just decorative — it's a real-time
status indicator:

- **No hand** → gray border
- **Hand detected, not stable** → cyan border
- **Stable, still holding** → smooth gradient from cyan to green
  as the hold duration progresses
- **Hold complete / speaking** → green border (``COLOR_STABLE`` and
  ``COLOR_SPEAKING`` share one value)

The **progress bar** works alongside the border:

.. code-block:: python

   if has_hand and state.is_stable:
       hold_progress = min(1.0, (now - state.stable_start_time) / HOLD_DURATION_REQUIRED)
       bar_width = int(w * 0.4)
       bar_height = 8
       bar_x = w - bar_width - 10
       bar_y = 10
       filled_width = int(bar_width * hold_progress)

       cv2.rectangle(frame, (bar_x, bar_y), (bar_x + bar_width, bar_y + bar_height),
                    (60, 60, 60), -1)  # background
       cv2.rectangle(frame, (bar_x, bar_y), (bar_x + filled_width, bar_y + bar_height),
                    (0, 255, 0), -1)   # fill

A dark gray bar (40% of frame width) sits in the top-right corner.
A green fill sweeps across it as the hold time progresses.
When the bar is full, TTS fires.

Together, the border color and progress bar give the user
continuous feedback — they always know exactly how close they
are to triggering speech.
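
The blending and fill arithmetic can be pulled out into small pure
functions, which makes it easy to check without a camera attached.
The following is only a minimal sketch: ``blend_color``,
``bar_fill_width``, and the two colour values are hypothetical names
and values chosen for illustration (BGR order is assumed, as in
OpenCV), not the lesson's own constants.

.. code-block:: python

   def blend_color(start, end, progress):
       """Linearly interpolate between two colour tuples, channel by channel."""
       # Clamp so out-of-range progress cannot produce invalid channel values.
       t = max(0.0, min(1.0, progress))
       return tuple(int(s * (1.0 - t) + e * t) for s, e in zip(start, end))


   def bar_fill_width(bar_width, progress):
       """Number of filled pixels for a progress bar of the given width."""
       t = max(0.0, min(1.0, progress))
       return int(bar_width * t)


   # Assumed BGR values, for illustration only.
   CYAN = (255, 255, 0)
   GREEN = (0, 255, 0)

   for p in (0.0, 0.5, 1.0):
       print(p, blend_color(CYAN, GREEN, p), bar_fill_width(256, p))

Because the interpolation keeps each channel in its original position,
the result equals the start colour at 0% and the end colour at 100%,
whatever channel order the constants use.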


-----------------------------------------------------------------
7. Extension Ideas
-----------------------------------------------------------------

The touchless auto-TTS pattern opens up many possibilities:

- **Assistive communication** — Map specific gestures to
  predefined phrases. Hold up 1 finger for "yes", 2 for "no",
  3 for "help". The system speaks the phrase automatically.

- **Hands-free presentation control** — Hold a gesture to
  advance slides or trigger sound effects during a talk.

- **Interactive museum exhibit** — Visitors hold up fingers
  to hear facts about numbered exhibits. No touching required.

- **GPIO button integration** — Add a physical button via
  ``fusion_hat`` GPIO that enables/disables auto-TTS mode,
  giving the user manual control over when the system listens.

- **Multi-gesture vocabulary** — Extend the stability detector
  to recognize a sequence of gestures (e.g., 1 finger → 2 fingers
  → 3 fingers) as a "command code" that triggers different actions.

- **Combine with Face Detection** — Auto-announce when a face
  enters or leaves the frame: "Person detected" / "Person left."
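
Two of the ideas above, the phrase table and the gesture "command
code", can be prototyped without any camera or TTS hardware. The
sketch below is one possible approach under stated assumptions:
``PHRASES``, ``phrase_for``, and ``GestureCodeMatcher`` are
hypothetical names, and each value passed to ``push()`` is assumed to
be a finger count that has already passed the stability and
hold-duration checks.

.. code-block:: python

   from collections import deque

   # Hypothetical phrase table, using the example mapping from the list above.
   PHRASES = {1: "yes", 2: "no", 3: "help"}


   def phrase_for(count):
       """Return the phrase mapped to a finger count, or None if unmapped."""
       return PHRASES.get(count)


   class GestureCodeMatcher:
       """Match the most recent confirmed gestures against a command code."""

       def __init__(self, code):
           self.code = tuple(code)
           # Keep only as many gestures as the code is long.
           self.recent = deque(maxlen=len(self.code))

       def push(self, count):
           """Record one confirmed gesture; return True when the code matches."""
           self.recent.append(count)
           if tuple(self.recent) == self.code:
               self.recent.clear()  # start fresh after a successful match
               return True
           return False


   matcher = GestureCodeMatcher([1, 2, 3])
   for count in [2, 1, 2, 3]:
       if matcher.push(count):
           print("command code recognised")

In a real program the returned phrase would be handed to the TTS
engine, and ``push()`` would be called once per confirmed gesture
rather than once per frame.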


-----------------------------------------------------------------
8. Troubleshooting
-----------------------------------------------------------------

- **TTS fires too frequently or on unstable gestures**

  Increase ``STABLE_FRAMES_REQUIRED`` (e.g., from 5 to 8) to
  require more frames of consistency before confirming stability.

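  Increase ``HOLD_DURATION_REQUIRED`` (e.g., from 2.5 to 3.5
  seconds) to require a longer hold before speaking.

  If announcements are correct but simply come too often, try
  increasing ``MIN_TTS_INTERVAL`` to lengthen the minimum pause
  between them.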

- **TTS never fires, even when holding steady**

  Make sure your hand is well-lit and clearly visible to the
  camera. Check that ``min_detection_confidence`` is not set
  too high (0.5 is a good default).

  Verify that the status text on screen reaches "Ready to speak!".
  If it stays at "Detecting..." or the progress bar never fills,
  the stability detector is not confirming a steady count. Try
  holding your hand more still or lowering
  ``STABLE_FRAMES_REQUIRED``.

- **"Hand left the frame" spoken at wrong times**

  The exit message respects ``MIN_TTS_INTERVAL``: it won't
  fire if a finger-count announcement just happened. If you
  want it to always speak, remove the ``MIN_TTS_INTERVAL``
  check that guards the call to ``trigger_hand_exit_tts()``.

- **Progress bar not appearing**

  The progress bar only appears when ``has_hand`` is ``True``
  **and** ``state.is_stable`` is ``True``. If either condition
  is false, the bar is hidden. Check the status text to
  determine which condition is failing.

- **Border color doesn't change**

  Verify that ``get_border_color()`` is being called on every
  frame and that the ``state.hand_present`` and ``state.is_stable``
  flags are being updated correctly in the main loop.
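
When tuning ``STABLE_FRAMES_REQUIRED``, it can help to replay a list
of finger counts through a sliding window offline and see when
stability would first be confirmed. The sketch below is an
illustration only: ``frames_until_stable`` is a hypothetical helper,
the recorded counts are made up, and the lesson's own detector may
differ in detail.

.. code-block:: python

   from collections import deque


   def frames_until_stable(counts, frames_required):
       """Return the index of the first frame at which the last
       `frames_required` counts are all identical, or None if never."""
       window = deque(maxlen=frames_required)
       for index, count in enumerate(counts):
           window.append(count)
           if len(window) == frames_required and len(set(window)) == 1:
               return index
       return None


   # Made-up recording: the count flickers between 2 and 3, then settles on 3.
   recorded = [2, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3]

   for required in (3, 5, 8):
       print(required, frames_until_stable(recorded, required))

A larger window confirms later but tolerates less flicker, which is
the trade-off behind the first two troubleshooting entries above.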


-----------------------------------------------------------------
9. Summary
-----------------------------------------------------------------

- This lesson demonstrated how to **remove the keyboard trigger**
  and build a fully touchless auto-TTS system.
- The project uses a **state machine** (``HandTrackingState`` class)
  to track hand presence, gesture stability, and TTS timing.
- **Key design patterns** covered:

  - **Stability detection** — sliding window of finger counts
    to confirm the user is holding a gesture steady
  - **Hold-duration gate** — requiring 2.5 seconds of stability
    before triggering TTS, replacing the key press with *intent*
  - **Auto exit detection** — speaking "hand left the frame"
    when the hand disappears
  - **Multi-stage visual feedback** — color-coded border
    (gray → cyan → green) plus a progress bar for real-time
    status
  - **State reset on hand entry/exit** — clearing history and
    stability to prevent stale data from carrying over

- These patterns are **project-agnostic** — you can apply the
  state-machine + stability-detection approach to any computer
  vision project that needs touchless interaction.
- Combining auto-TTS with gesture recognition opens the door
  to assistive technology, hands-free control systems, and
  interactive installations.