Is It Reliable to Independently Develop a Social App with Qwen2.5-Max, DeepSeek, Doubao, and Other Large Models? A First-Hand Account! Plus Reflections on Building an iOS Social App with Flutter and Getting It onto the App Store~
After months of development, my social app is finally finished, but the road ahead is still long. A domestic launch requires ICP and EDI filings, which I hadn't considered at the outset. These filings impose strict requirements, including a rigid corporate governance structure and registered capital exceeding one million yuan—conditions I haven't yet arranged. So I don't have a firm launch date. Still, the development process left me with many memorable insights that continue to serve me well.
Let me start with the questions everyone cares about in the era of large language models. Did I use them? Yes. Did they significantly speed up my development? Honestly, the effect wasn't particularly noticeable.
Here's my take: if you ask a large language model to generate standardized component code, it is highly efficient and the output is quite usable. But building a real app is not just stacking components into flexible app or backend features—especially when functions are complex and involve simulated flows or multiple data interfaces. Adjusting boundary conditions requires patiently writing long prompts, often with examples, just so the model roughly grasps your intent. And given the tight context-window limits of domestic models, it's fair to say none of them is practical for editing long code.

Early in my app's development (February), I tried Qwen2.5-Max. At the time, DeepSeek's impressive showing had pushed domestic vendors to optimize their training methods to rival models trained with far more compute. Though I suspected DeepSeek's results were overhyped—a suspicion later borne out, in my view—I still gave it a shot. I asked it to build a content-creation workflow platform for Xiaohongshu/Douyin. Its guidance on the main framework was practical, with an expert feel, but when I fed it a Golang web-scraping problem multiple times, it failed to solve it. That eroded my trust in DeepSeek's coding reliability, and given my skepticism toward overhyped domestic narratives, I rarely used it afterward.

Disappointed with DeepSeek, and drawn in by Alibaba's claims that Qwen combined DeepSeek's training methods with advantages in compute and data—supposedly reasoning better than DeepSeek R1—I tested Qwen on the same Golang scraping problem. To my surprise, it solved it, earning my respect as a candidate main model. My optimism was premature, though. As I built the Flutter client, issues emerged. Qwen automatically consolidated error logs into text files for upload, which felt odd but would have been acceptable had it actually parsed them. Yet its responses were often irrelevant—likely due to hard limits on token length—offering vague generalities instead of solutions for complex issues. Newer versions produced unreadable code or got stuck in loops, with non-Markdown output that spiked memory usage, so I abandoned Qwen.
Ultimately, I settled on Grok. Despite problems—truncating long code, introducing new bugs, forgetting context when fed lengthy English logs—Grok proved reliable under clear guidance (e.g., step-by-step instructions). With thorough code review to catch boundary issues, it delivered usable code. It is less sensitive to log length (likely 512k+ tokens, though I haven't verified—feel free to share the limit in the comments) and can analyze error logs of up to roughly 2,000 lines, especially under tight supervision or when the logs are chunked. Reference code or templates make it more reliable, but without constraints its output can be naive—for example, calling backend APIs directly inside Dart screen widgets and ignoring state management entirely. For a Flutter iOS app, state-management options range from setState to Provider, Riverpod (good for app-wide sharing), BLoC, or Redux for complex needs, but Grok needs explicit rules before it will apply them. Even with clear guidance, a silicon-based understanding doesn't always resonate with human intent, which caps its potential. AI still has a long way to go to match human coding wisdom.
If you ask where I actually gained speed, I'd point to a robust logging system—crucial, in my view. A simple app without a complex backend doesn't need much, but a complex app involves interactions across multiple ends: backend APIs, message middleware, and more. My app's social user recommendations rely on a sophisticated parameter-driven system, so efficient, traceable logs are vital. Traditional Nginx access logs lack the depth needed for debugging or validating ideas; a powerful, intuitive logging system is key to accelerating development.
My social app, beyond the standard online features, includes offline operations such as headquarters franchising and tiered agent onboarding, which demand careful database selection and schema design. Whether to synchronize data or expose it through interfaces, and how to keep internal access efficient—these architectural choices are critical, and I'll cover them in my internal courses. My rapid progress came from accumulated technical experience: familiarity with Kubernetes, Docker, and service integration let me quickly build and integrate components. That expertise is what drove my efficiency.
Sometimes I wonder whether silicon-based civilization devalues my programming experience. But no industry escapes a technological revolution. Repetitive tasks were replaced before; now more complex ones are too. Humans have their flaws—few ever truly master their tools. In Chinese history, we invented gunpowder but used it for fireworks while others built cannons. Before Douyin, the average understanding of relationships was low; now the "teachers" have raised the bar for everyone, yet in relative terms no one is actually ahead. Your progress depends on who you are. A jade staff is a divine weapon in the hands of Hong Qigong or Huang Rong, but just a firewood stick to anyone else.
After all this, I’m recruiting for my team. Aiming to build a matchmaking SaaS platform with end-to-end services, I’m focusing on enhancing the recommendation system next. Interested? Contact me on WeChat: cnguruyu.