News To Markdown: Convert News Articles to Markdown

Name: News To Markdown (Convert News Articles to Markdown)
Rating: 1

v3.1.7

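Takes an article URL and outputs clean Markdown body text. Dedicated optimization for 17 platforms (Toutiao, WeChat Official Accounts, Zhihu, 36kr, Huxiu, Wallstreetcn, The Paper, InfoQ, and others), with dual-engine extraction, image localization, and a three-layer fetch strategy. Often used together with the browser-web-search skill: search with bws to get a list of URLs, then call this skill on each article to read its body.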


by @sipingme (PING SI)·MIT-0

Tags: Documentation tools, Network tools, Browser automation, WeChat


License

MIT-0


Free to use, modify, and redistribute; no attribution required.


Runtime dependencies

No special dependencies

Install commands

Official: npx clawhub@latest install news-to-markdown-skill

Accelerated mirror (available): npx clawhub@latest install news-to-markdown-skill --registry https://cn.longxiaskill.com


Skill documentation

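news-to-markdown skill

Takes an article URL and outputs clean Markdown body text. Optimized for 17 platforms and built for AI agents.

Core features

- Dedicated adapters for 17 platforms: Toutiao, WeChat, Zhihu, 36kr, Huxiu, Wallstreetcn, The Paper, InfoQ, and others.
- Three-layer fetch strategy: curl → wget → Playwright with automatic fallback, so both static and dynamic pages are supported.
- Dual-engine extraction: runs Mozilla Readability and news-extractor-node and keeps the better result.
- Image localization: optionally downloads remote images to local disk so that expiring URLs do not break them.
- Automatic cover fallback: when no cover image is detected in the article, generates an abstract-pattern placeholder cover from the title and writes it to the cover: key in the frontmatter, so that downstream publishing pipelines (such as WeChat Official Accounts) do not fail for lack of a cover.
- Pinned version, in-process invocation: the scripts/run.js launcher resolves the installed, pinned npm package and runs it in-process. It does not fork a child process, does not call npx, and does not drift versions at runtime.

Quick start

Recommended: pinned version + launcher (no child process, no version drift)

    # One-time install (redo manually on each minor-version upgrade)
    npm install -g news-to-markdown@3.3.1

    # Then invoke through the launcher (args are checked against an allow-list;
    # URL and output arguments are passed through to the in-process CLI)
    node scripts/run.js convert --url "https://www.toutiao.com/article/123"
    node scripts/run.js convert --url "https://mp.weixin.qq.com/s/xxx" --output article.md
    node scripts/run.js convert --url "https://www.toutiao.com/article/123" --download-images --output-dir ./article
    node scripts/run.js convert --url "https://www.zhihu.com/p/xxx" --no-metadata

Alternative: npx --yes (one-off trial / temporary sandbox)

Supply-chain risk: on every invocation, npx --yes news-to-markdown@^3.3.1 resolves the latest version within the ^3.3.1 range from npm and executes it, with no lockfile and no integrity check. Use it only for one-off trials in an isolated environment (container, throwaway VM); for long-running use, take the pinned-version path above.

    npx --yes news-to-markdown@^3.3.1 --url "https://www.toutiao.com/article/123"

Using with browser-web-search

The most common AI agent orchestration pattern:

    browser-web-search → search, produces a list of URLs
    news-to-markdown → read the body, produces Markdown

    # Step 1: search with bws to get a list of article URLs
    bws site toutiao/search "ai agent" --count 3

    # Step 2: extract the body of each URL
    node scripts/run.js convert --url "https://www.toutiao.com/article/111"
    node scripts/run.js convert --url "https://www.toutiao.com/article/222"
    node scripts/run.js convert --url "https://www.toutiao.com/article/333"

Applicable search commands (browser-web-search 0.4.2, 30 platforms):

- Chinese article sites: toutiao/search, weixin/search, zhihu/search, 36kr/search, xiaohongshu/search, huxiu/search, wallstreetcn/search, thepaper/search, qqnews/search, netease/search, sina/search, juejin/search, csdn/search, cnblogs/search, infoq/search
- International article sites: verge/search, ars/search, engadget/search, hn/search (linked articles)

Note: for platforms such as github/search, reddit/search, x/search, and weibo/search, the bws results are already structured data, so there is no need to extract body text with this tool.

Supported platforms

Platforms with dedicated optimization (17)

| Platform | Domain | Dedicated handling |
|---|---|---|
| Toutiao (今日头条) | toutiao.com | Title normalization, data-src images, list repair |
| WeChat Official Accounts (微信公众号) | mp.weixin.qq.com | #js_content extraction, mobile UA fallback |
| Xiaohongshu (小红书) | xiaohongshu.com | .note-content extraction, lazy-loaded images |
| Zhihu (知乎) | zhihu.com | Real Chrome to get past the zse-ck anti-scraping check |
| 36kr | 36kr.com | Automatic switch to the mobile URL to get past anti-scraping |
| Huxiu (虎嗅) | huxiu.com | Body-area extraction, lazy-loaded image repair |
| Wallstreetcn (华尔街见闻) | wallstreetcn.com | Financial article body extraction, highlight tags removed |
| The Paper (澎湃新闻) | thepaper.cn | .news_txt body-area extraction |
| InfoQ | infoq.cn / infoq.com | Technical article body extraction |
| Bilibili articles (Bilibili 专栏) | bilibili.com | Article extraction for /read/ paths |
| Juejin (掘金) | juejin.cn | Code block and body extraction |
| CSDN | csdn.net | Ad sidebar removal, #content_views extraction |
| Cnblogs (博客园) | cnblogs.com | Technical blog body extraction |
| Jianshu (简书) | jianshu.com | Article body extraction |
| SegmentFault | segmentfault.com | Technical Q&A body extraction |
| OSChina (开源中国) | oschina.net | News body extraction |
| Woshipm (人人都是产品经理) | woshipm.com | Product article body extraction |

All other platforms (English-language sites such as The Verge, Ars Technica, and Engadget) go through the generic Readability algorithm.

Command parameters

    node scripts/run.js convert --url <url> [options]

| Parameter | Type | Description |
|---|---|---|
| --url | string | Article URL (required) |
| --output | string | Output file path (defaults to the terminal) |
| --download-images | flag | Download images to local disk (recommended) |
| --output-dir | string | Image output directory (used with --download-images) |
| --no-metadata | flag | Body only, without title, author, or time |
| --selector | string | Custom CSS selector for the content area |
| --noise | string | Selectors of elements to remove, comma-separated |
| --verbose | flag | Verbose logging (for debugging) |

Standard operating procedure (SOP)

Operation 1: basic article conversion

Scenario: the user provides an article URL and asks for it to be converted to Markdown.

    node scripts/run.js convert --url "https://www.toutiao.com/article/123" --output article.md

Operation 2: convert and download images (recommended)

Scenario: the images need to be kept, or the article will later be published to a WeChat Official Account.

    node scripts/run.js convert --url "https://www.toutiao.com/article/123" --download-images --output-dir ./article

Output: ./article/article.md + ./article/images/*.jpg

Image URLs on Toutiao, WeChat, and similar platforms carry signatures and expire after a few hours. Downloading the images locally avoids this problem.

Operation 3: extract the body only

Scenario: the user says "I only want the article content, not the title, author, or other details".

    node scripts/run.js convert --url "https://36kr.com/p/xxx" --no-metadata

Operation 4: batch-process multiple articles

Scenario: extract in bulk from bws search results.

    # Search first
    bws site zhihu/search "大模型" --count 5 --jq '[.items[].url]'

    # Then extract one by one (urls is assumed to be a bash array holding the URLs from the search above)
    mkdir -p articles
    for url in "${urls[@]}"; do
      node scripts/run.js convert --url "$url" --output "articles/$(echo "$url" | md5sum | cut -c1-8).md"
    done

Operation 5: custom extraction (noise removal / specified area)

Scenario: the extraction result is inaccurate, and the user asks to remove ads or to specify the body area.

    # Remove ads and comments
    node scripts/run.js convert --url "https://example.com/article" --noise ".ad,.sidebar,.comments"

    # Specify the content area
    node scripts/run.js convert --url "https://example.com/article" --selector "article.main-content"

Operation 6: debug a failing page

Scenario: fetching or extraction fails and needs investigation.

    node scripts/run.js convert --url "https://example.com/article" --verbose

Common causes and solutions:

| Problem | Cause | Solution |
|---|---|---|
| Fetch fails | Page needs JS rendering | Playwright is used automatically (must be installed beforehand) |
| Content is empty | Selector does not match | Use --selector to specify the content area |
| Images do not display | URL expired | Use --download-images |
| Blocked by anti-scraping | Headless browser detected | Zhihu has a built-in Chrome UA; handle other platforms case by case |

Optional: install the Playwright browser (dynamic page support):

    npx playwright install chromium

Example conversations

User: Convert this Huxiu article to Markdown, and save the images too.

    node scripts/run.js convert --url "https://www.huxiu.com/article/xxx.html" --download-images --output-dir ./huxiu-article

User: Search Toutiao for the 3 latest articles about AI Agent and get the body of each one.

    # Step 1: search
    bws site toutiao/search "AI Agent" --count 3

    # Step 2: extract one by one (run for each url)
    node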
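The curl → wget → Playwright fallback listed under the core features follows a general "try each fetcher in order" pattern. The sketch below only illustrates that pattern: `fetchWithFallback`, the `run` callbacks, the `minLength` check, and the stand-in fetchers are all hypothetical and are not taken from the package's source.

```javascript
// Illustrative fallback chain; names and the min-length check are assumptions,
// not the real internals of news-to-markdown.
async function fetchWithFallback(url, fetchers, minLength = 1) {
  const errors = [];
  for (const { name, run } of fetchers) {
    try {
      const html = await run(url);
      // Treat an empty or too-short body as a failure and move to the next layer.
      if (typeof html === "string" && html.length >= minLength) {
        return { html, via: name, errors };
      }
      errors.push(`${name}: empty or too-short response`);
    } catch (err) {
      errors.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`all fetchers failed for ${url}: ${errors.join("; ")}`);
}

// Stand-in fetchers for demonstration. Real ones would shell out to curl or
// wget, or drive a Playwright browser.
const demoFetchers = [
  { name: "curl", run: async () => { throw new Error("HTTP 403"); } },
  { name: "wget", run: async () => "" },
  { name: "playwright", run: async (url) => `<html><body>rendered ${url}</body></html>` },
];

fetchWithFallback("https://example.com/article", demoFetchers).then(({ via, errors }) => {
  // prints "fetched via playwright after 2 failed attempt(s)"
  console.log(`fetched via ${via} after ${errors.length} failed attempt(s)`);
});
```

Collecting the per-layer errors instead of discarding them is a design choice of this sketch; it makes a final failure easier to diagnose in the same way that `--verbose` output would.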
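The batch operation in the SOP (Operation 4) can also be driven from Node rather than a shell loop. This is a minimal sketch under stated assumptions: `outputPathFor`, `convertArgs`, `planBatch`, and `runBatch` are hypothetical helper names, and only the documented `convert`, `--url`, and `--output` arguments are used. The file names will not match those of the shell one-liner, because `echo` appends a newline before hashing.

```javascript
const crypto = require("node:crypto");
const path = require("node:path");
const { execFileSync } = require("node:child_process");

// Derive a short, stable file name from the URL (first 8 hex chars of its MD5).
function outputPathFor(url, outDir = "articles") {
  const digest = crypto.createHash("md5").update(url).digest("hex").slice(0, 8);
  return path.join(outDir, `${digest}.md`);
}

// Build the argument list for one launcher call, using only documented flags.
function convertArgs(url, outDir = "articles") {
  return ["scripts/run.js", "convert", "--url", url, "--output", outputPathFor(url, outDir)];
}

// Keep only http(s) URLs and drop duplicates before converting.
function planBatch(urls, outDir = "articles") {
  const seen = new Set();
  const plan = [];
  for (const raw of urls) {
    let parsed;
    try {
      parsed = new URL(raw);
    } catch {
      continue; // not a parseable URL
    }
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") continue;
    if (seen.has(parsed.href)) continue;
    seen.add(parsed.href);
    plan.push(convertArgs(parsed.href, outDir));
  }
  return plan;
}

// Run the launcher once per planned call. Not invoked here, because it needs
// scripts/run.js and the pinned package to be installed.
function runBatch(urls, outDir = "articles") {
  for (const args of planBatch(urls, outDir)) {
    execFileSync("node", args, { stdio: "inherit" });
  }
}
```

Passing the arguments as an array to `execFileSync` avoids shell interpolation of URLs that come from search results.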
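The quick-start section says that launcher arguments are checked against an allow-list. What such a check might look like is sketched below, built only from the flags in the command parameters table. The function name `validateArgs`, the error messages, and the http(s)-only rule for `--url` are assumptions for illustration, not the actual behavior of scripts/run.js.

```javascript
// Hypothetical allow-list check; the flag sets mirror the documented parameter table.
const VALUE_FLAGS = new Set(["--url", "--output", "--output-dir", "--selector", "--noise"]);
const BOOLEAN_FLAGS = new Set(["--download-images", "--no-metadata", "--verbose"]);

function validateArgs(argv) {
  const [command, ...rest] = argv;
  if (command !== "convert") throw new Error(`unknown command: ${command}`);

  const options = {};
  for (let i = 0; i < rest.length; i++) {
    const flag = rest[i];
    if (BOOLEAN_FLAGS.has(flag)) {
      options[flag] = true;
    } else if (VALUE_FLAGS.has(flag)) {
      const value = rest[i + 1];
      if (value === undefined || value.startsWith("--")) {
        throw new Error(`missing value for ${flag}`);
      }
      options[flag] = value;
      i++; // skip the consumed value
    } else {
      throw new Error(`flag not allowed: ${flag}`);
    }
  }

  if (!options["--url"]) throw new Error("--url is required");
  // Assumption: only http(s) article URLs are accepted.
  const { protocol } = new URL(options["--url"]);
  if (protocol !== "http:" && protocol !== "https:") {
    throw new Error(`unsupported URL scheme: ${protocol}`);
  }
  return options;
}

const parsed = validateArgs([
  "convert", "--url", "https://www.toutiao.com/article/123",
  "--download-images", "--output-dir", "./article",
]);
console.log(Object.keys(parsed).join(" "));
```

Rejecting unknown flags outright, rather than ignoring them, keeps anything outside the documented surface from reaching the in-process CLI.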
