Why Deploy DeepSeek R1 Locally?#
- Data Privacy and Security
- Sensitive Data Protection: When handling sensitive data such as medical, financial, or government information, local deployment ensures that data does not leave the internal network, avoiding the risk of leaks from cloud transmission or third-party storage.
- Compliance Requirements: Certain regulations (such as GDPR and HIPAA) mandate that data be stored locally or in specific regions; local deployment directly satisfies these requirements.
- Performance and Low Latency
- Real-time Requirements: Scenarios like quality inspection in manufacturing and real-time decision-making require millisecond-level responses, and local servers reduce network latency.
- High Bandwidth Data Processing: For high-frequency trading or video analysis, local deployment avoids bandwidth bottlenecks caused by uploading to the cloud.
- Customization and System Integration
- Deep Adaptation to Business: Model parameters, interfaces, or output formats can be adjusted to fit unique enterprise processes (e.g., integration with internal ERP, BI tools).
- Privatized Function Development: Supports the addition of industry-specific modules (like legal clause parsing, industrial fault diagnosis) while protecting intellectual property.
- Cost Control (Long-term)
- Economical for Large-scale Use: If there is a high volume of long-term calls, the investment in local hardware may be lower than the ongoing subscription fees for cloud services.
- Reuse Existing Infrastructure: When enterprises already have server/GPU resources, deployment costs are further reduced.
- Network and Stability
- Offline Environment Operation: In scenarios with unstable or no network, such as mines or ocean-going vessels, local deployment ensures service continuity.
- Avoid Cloud Service Interruption Risks: Does not depend on the availability of third-party cloud vendors (such as occasional AWS/Azure outages).
- Complete Autonomy and Control
- Self-managed Upgrades and Maintenance: Decide when to update model versions, avoiding business interruptions caused by forced upgrades in the cloud.
- Audit and Supervision: Full control over system logs and access records, facilitating internal audits or regulatory inspections.
What Configuration is Needed to Install DeepSeek R1?#
DeepSeek Model Windows Configuration Requirements:
Model Name | Parameter Count (Billion) | Model File Size | Memory Requirement (Runtime) | Minimum Windows Configuration Requirements |
---|---|---|---|---|
deepseek-r1:1.5b | 1.5 | 1.1 GB | 2~3 GB | CPU: 4 cores + Memory: 8GB + Disk: 3GB; supports pure CPU inference |
deepseek-r1:7b | 7 | 4.7 GB | 5~7 GB | CPU: 8 cores + Memory: 16GB + GPU: RTX 3070/4060 (8GB+ VRAM) |
deepseek-r1:8b | 8 | 4.9 GB | 6~8 GB | CPU: 8 cores + Memory: 16GB + GPU: RTX 3070/4060 (8GB+ VRAM) |
deepseek-r1:14b | 14 | 9 GB | 10~14 GB | CPU: 12 cores + Memory: 32GB + GPU: 16GB+ VRAM (e.g., RTX 4090) |
deepseek-r1:32b | 32 | 20 GB | 22~25 GB | CPU: i9/Ryzen 9 + Memory: 64GB + GPU: 24GB+ VRAM (e.g., A100) |
deepseek-r1:70b | 70 | 43 GB | >45 GB | Server-class configuration: 32-core CPU / 128GB memory / multiple GPUs in parallel (e.g., 4x RTX 4090) |
DeepSeek Model Mac Configuration Requirements:
Model Name | Parameter Count (Billion) | Model File Size | Unified Memory Requirement (Runtime) | Minimum Mac Configuration Requirements |
---|---|---|---|---|
deepseek-r1:1.5b | 1.5 | 1.1 GB | 2~3 GB | MacBook Air (M2/M3 chip, ≥8GB memory) |
deepseek-r1:7b | 7 | 4.7 GB | 5~7 GB | MacBook Air or Mac mini (M2/M3/M4 chip, ≥16GB memory) |
deepseek-r1:8b | 8 | 4.9 GB | 6~8 GB | MacBook Air or Mac mini (M2/M3/M4 chip, ≥16GB memory) |
deepseek-r1:14b | 14 | 9 GB | 10~14 GB | MacBook Pro (M2/M3/M4 Pro chip, ≥32GB memory) |
deepseek-r1:32b | 32 | 20 GB | 22~25 GB | Mac Studio (M2 Max/Ultra) or MacBook Pro (M2/M3/M4 Max, ≥48GB memory) |
deepseek-r1:70b | 70 | 43 GB | >45 GB | Mac Studio (M2 Max/Ultra) or MacBook Pro (M2/M3/M4 Max, ≥64GB memory) |
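Before picking a model, it helps to check your machine's total memory against the tables above. A minimal Python sketch (standard library only; the POSIX `sysconf` calls work on macOS and Linux, while Windows users can check Task Manager instead):

```python
import os

# Total physical memory in GB (POSIX; works on macOS and Linux).
page_size = os.sysconf("SC_PAGE_SIZE")   # bytes per memory page
num_pages = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
total_gb = page_size * num_pages / 1024**3

print(f"Total physical memory: {total_gb:.1f} GB")
# Rule of thumb from the tables: runtime memory is roughly the model
# file size plus a few GB of overhead for context and activations.
```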
How to Deploy DeepSeek R1 Locally?#
Note: I am deploying on a Mac (a Mac mini M4); deployment on Windows is quite similar.
- Two tools need to be downloaded:
- Ollama
- AnythingLLM
- Installation flowchart
1. Ollama#
- Mainly used to install and run various large models locally, including DeepSeek.
- Ollama is a free, open-source tool for conveniently deploying and running LLMs on a local machine, letting users load, run, and interact with various open-source LLMs without needing to understand the underlying technology (it also serves a local HTTP API; see the sketch after the feature list below).
- Features of Ollama:
- Local Deployment: Does not rely on cloud services, allowing users to run models on their own devices, protecting data privacy.
- Multi-Operating System Support: Easily installed and used on Mac, Linux, or Windows.
- Multi-Model Support: Ollama supports many popular LLMs, such as Llama and Falcon, including Meta's open-source Llama 3.1 405B, so users can choose a model that fits their needs and run it with a single command.
- Easy to Use: Provides an intuitive command-line interface, simple to operate and easy to get started.
- Extensibility: Supports custom configurations, allowing users to optimize based on their hardware environment and model requirements.
- Open Source: The code is completely open, allowing users to view, modify, and distribute freely (although not many people will modify it).
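Beyond the CLI, a running Ollama instance serves a local HTTP API (port 11434 by default), which is what clients such as AnythingLLM talk to. A minimal sketch, assuming Ollama is running and the `requests` package is installed, that lists the models installed locally:

```python
import requests

# Ollama's local HTTP API listens on port 11434 by default.
resp = requests.get("http://localhost:11434/api/tags", timeout=10)
resp.raise_for_status()

# Each entry includes the model's name and its size on disk.
for model in resp.json().get("models", []):
    print(model["name"], f'{model["size"] / 1024**3:.1f} GB')
```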
2. DeepSeek R1#
- Find deepseek-r1 on the Ollama website and install it from the Mac terminal.
- Installation
- Go back to the Ollama website, select Models, and choose deepseek-r1.
- The page defaults to the 7b parameter model, so we will go with the recommended 7b version.
https://ollama.com/library/deepseek-r1
- Open the Mac terminal and copy in this command:
ollama run deepseek-r1:7b
- If the download slows or stalls, press Control+C and re-run the command; the download resumes where it left off, and you will find the speed picks up again.
- If you see success at the bottom, the installation was successful.
- Now you can type any question you want to ask directly in this terminal window.
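The terminal is not the only way in: the same model can also be queried from code through Ollama's local API. A minimal non-streaming sketch, assuming the `requests` package is installed; the question text is just a placeholder:

```python
import requests

# Ask deepseek-r1:7b a question through Ollama's generate endpoint.
payload = {
    "model": "deepseek-r1:7b",
    "prompt": "In one sentence, what is a local LLM deployment?",  # placeholder question
    "stream": False,  # return the whole answer at once instead of streaming tokens
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["response"])
```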
3. Embedding Models#
- Explanation
- Embedding models are techniques that convert high-dimensional data such as text and images into low-dimensional vectors, focusing on capturing semantic information for easier machine learning processing.
- Embedding models are the "translators" of AI, converting complex data into vectors that machines can understand, driving applications for semantic understanding.
- Common Types and Characteristics
Type | Model | Features |
---|---|---|
Word Embedding | e.g., Word2Vec, GloVe | Maps words to vectors, capturing semantic relationships (e.g., "king - man + woman ≈ queen") |
Contextual Embedding | e.g., BERT, GPT | Generates dynamic vectors based on context (e.g., "apple" has different meanings in "eating an apple" and "Apple phone") |
Sentence/Document Embedding | e.g., Sentence-BERT | Represents entire sentences or paragraphs as vectors for similarity calculations, clustering, etc. |
Multimodal Embedding | e.g., CLIP | Jointly processes images and text/audio, supporting cross-modal retrieval (e.g., searching for images using text). |
- We will use BGE-M3 as our embedding model.
- Explanation of BGE-M3.
- Language Versatility
- Supports over 100 languages, so you can, for example, search English materials with a Chinese query or Spanish news with a Japanese query.
- Dual Search Modes
- Understanding Meaning: For example, searching for "pets" can also find content about "cats and dogs."
- Keyword Matching: For example, strictly searching for articles containing "AI" or "artificial intelligence" without missing results.
- Continuous Reading of Long Texts
- When reading long texts such as papers or contracts, it retains the overall context rather than forgetting earlier content the way ordinary tools do.
- Resource Efficiency
- Has a compact version (like "mini version") that can be used on phones or small websites without lag.
- Download bge-m3
- Open the Mac terminal and enter:
ollama pull bge-m3
- If you see success, the installation was successful.
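To see what bge-m3 actually produces, you can request embeddings through Ollama's local API and compare texts with cosine similarity, echoing the "pets" vs. "cats and dogs" example above. A minimal sketch, assuming Ollama is running and `requests` is installed; the sample strings are made up:

```python
import math
import requests

def embed(text: str) -> list[float]:
    # Ollama's embeddings endpoint returns one vector per input text.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "bge-m3", "prompt": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

v_pets = embed("pets")
v_cats = embed("cats and dogs")
v_tax = embed("quarterly tax filing deadlines")

# Semantically related texts should score noticeably higher.
print("pets vs. cats and dogs:", round(cosine(v_pets, v_cats), 3))
print("pets vs. tax deadlines:", round(cosine(v_pets, v_tax), 3))
```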
4. AnythingLLM#
- Explanation
- AnythingLLM replaces the terminal window with a simple graphical client.
- AnythingLLM helps us build a personal local knowledge base.
- AnythingLLM supports various input types such as text, images, and audio, and can segment and vectorize documents in formats like PDF, TXT, and DOCX, using RAG (Retrieval-Augmented Generation) to let the LLM reference document content in conversations (a simplified sketch of this flow follows the feature list below).
Main Features:
- Multi-user Management and Permission Control: Makes team collaboration easier, allowing everyone to use LLM securely.
- AI Agent Support: Built-in powerful AI Agent can perform complex tasks like web browsing and code execution, enhancing automation.
- Embeddable Chat Window: Easily integrates into your website or application, providing users with an AI-driven conversational experience.
- Wide File Format Support: Supports various document types like PDF, TXT, DOCX, meeting different scenario needs.
- Vector Database Management: Provides a simple interface to manage documents in the vector database, facilitating knowledge management.
- Flexible Conversation Modes: Supports both chat and query conversation modes to suit different use cases.
- Information Source Tracking: Provides referenced document content during chats, making it easier to trace information sources and enhancing result credibility.
- Multiple Deployment Options: Supports 100% cloud deployment as well as local deployment, meeting different user needs.
- Customizable LLM Models: Allows you to use your own LLM models, offering higher customization to meet personalized needs.
- Efficient Handling of Large Documents: Compared to other document chat solutions, AnythingLLM is more efficient and cost-effective in handling large documents, potentially saving up to 90% in costs.
- Developer Friendly: Provides a complete set of developer APIs for easy custom integration and stronger extensibility.
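The RAG flow mentioned above boils down to: embed the document chunks once, then at question time embed the question, retrieve the most similar chunk, and hand it to the LLM as context. The following is a simplified illustration of that idea, not AnythingLLM's actual implementation; it assumes Ollama is running with the models from the earlier steps, `requests` is installed, and the sample chunks are made up:

```python
import math
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> list[float]:
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "bge-m3", "prompt": text}, timeout=60)
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# 1. "Upload documents": split them into chunks and embed each chunk once.
chunks = [
    "Refunds are processed within 5 business days of receiving the returned item.",
    "Our support line is open Monday to Friday, 9am to 6pm.",
    "Premium members get free shipping on all orders.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. At question time: embed the question and retrieve the best-matching chunk.
question = "How long does a refund take?"
q_vec = embed(question)
best_chunk, _ = max(index, key=lambda item: cosine(q_vec, item[1]))

# 3. Query mode: answer using only the retrieved context.
prompt = (f"Answer using only this context:\n{best_chunk}\n\n"
          f"Question: {question}")
r = requests.post(f"{OLLAMA}/api/generate",
                  json={"model": "deepseek-r1:7b", "prompt": prompt, "stream": False},
                  timeout=300)
r.raise_for_status()
print(r.json()["response"])
```

A real implementation stores the vectors in a vector database and retrieves the top few chunks rather than just one, which is exactly what AnythingLLM manages for you.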
- Download, Install, Configure
- Download
- Go to the official website: https://anythingllm.com/
- Installation
- Click to start
- Select Ollama
- Click Next
- Skip the survey
- Enter any workspace name; let's call it Little Fishing Assistant for now
- When you see "Workspace created successfully," the installation is complete
- Configuration
- Click the 🔧 in the lower left corner, find Customization > Display Language, and select Chinese
- Select Embedder Preferences
- For Embedding Engine Provider, select Ollama
- For Ollama Embedding Model, select the recently downloaded bge-m3
- Save changes
- Workspace
- Purpose Explanation:
- Categorization
- Create different "rooms" for different tasks: for example, one room handles customer service Q&A, another analyzes contract documents, avoiding data mixing.
- Feeding Information to AI
- Upload documents, web pages, or notes to the workspace (like "preparing lessons" for AI), allowing it to learn your exclusive knowledge base.
- Trial and Error
- Ask questions directly in the workspace (for example, simulating customer inquiries), see the AI's responses in real time, and adjust its instructions as needed.
- Settings
- Click the ⚙️ in the workspace
- General Settings
- Here you can delete the workspace
- Chat Settings
- Set chat mode to Query (which will provide answers based only on the context of found documents)
- Chat Prompt
- Building a Personal Knowledge Base
- Click the ⏫ button on Little Fishing Assistant
- Upload your prepared documents to the knowledge base on the left, move them over to Little Fishing Assistant on the right, and click save.
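Once documents are in the workspace, the Developer API mentioned in the feature list can query the same knowledge base from code. The endpoint path, default port, and payload below are assumptions based on AnythingLLM's developer API; check the API documentation inside your own instance and generate an API key in the settings before relying on them:

```python
import requests

API_KEY = "YOUR-ANYTHINGLLM-API-KEY"         # generated in AnythingLLM's settings (assumption)
BASE = "http://localhost:3001/api/v1"        # default local API address (assumption)
WORKSPACE_SLUG = "little-fishing-assistant"  # slug derived from the workspace name (assumption)

# Ask the workspace a question in query mode, so the answer is grounded
# in the uploaded documents only.
resp = requests.post(
    f"{BASE}/workspace/{WORKSPACE_SLUG}/chat",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"message": "Summarize the uploaded documents.", "mode": "query"},
    timeout=300,
)
resp.raise_for_status()
print(resp.json().get("textResponse"))
```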