You said:
gallery-dl, download instagram profile with whole metadata
Here’s how to use gallery-dl to download an entire Instagram profile, including as much metadata as Instagram provides (captions, timestamps, like and comment counts, etc.).
⚠️ Instagram requires authentication for full access. Without login cookies, metadata will be incomplete or downloads may fail.
```bash
pip install -U gallery-dl
```
Verify:
```bash
gallery-dl --version
```
1. Log in to Instagram in your browser.
2. Export your cookies to a file using a Chrome/Firefox extension such as “cookies.txt” or “Get cookies.txt”.
3. Save the export as:

```text
cookies.txt
```
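Alternatively, recent gallery-dl versions can read cookies straight from your browser profile with `--cookies-from-browser`, skipping the export step. A hedged sketch (the browser name is just an example):

```bash
# Read Instagram cookies directly from a local Firefox profile
gallery-dl --cookies-from-browser firefox https://www.instagram.com/USERNAME/
```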
```bash
gallery-dl \
  --cookies cookies.txt \
  https://www.instagram.com/USERNAME/
```
This downloads:
- Photos & videos
- Captions
- Post dates
- Like count
- Comment count
- Post IDs
Use a config file for maximum metadata.
```bash
mkdir -p ~/.config/gallery-dl
nano ~/.config/gallery-dl/config.json
```
```json
{
  "extractor": {
    "instagram": {
      "cookies": "cookies.txt",
      "videos": true,
      "include": "all",
      "sleep-request": 2,
      "sleep": 1,
      "directory": ["instagram", "{username}"],
      "filename": "{date:%Y-%m-%d}_{shortcode}_{typename}.{extension}",
      "postprocessors": [
        {
          "name": "metadata",
          "event": "post",
          "filename": "{filename}.json",
          "mode": "json"
        }
      ]
    }
  }
}
```
This creates a .json metadata file per post.
Each {filename}.json includes:
- shortcode
- caption
- timestamp
- like_count
- comment_count
- owner.username
- is_video
- video_duration
- location
- hashtags
- mentions
- Media URLs
- Carousel info (for multi-posts)
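As a quick sanity check on those fields, a hedged jq one-liner (field names as listed above; USERNAME and the output file summary.tsv are placeholders) can tabulate a few of them:

```bash
# Pull shortcode, like_count and caption from every sidecar JSON into a TSV
jq -r '[.shortcode, .like_count, .caption] | @tsv' instagram/USERNAME/*.json > summary.tsv
```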
Stories:

```bash
gallery-dl --cookies cookies.txt "https://www.instagram.com/stories/USERNAME/"
```

Highlights (via the include option on the profile URL):

```bash
gallery-dl --cookies cookies.txt -o include=highlights "https://www.instagram.com/USERNAME/"
```
Instagram is aggressive with scraping.
Recommended:
json"sleep-request": 2,
"sleep": 1
Optional:
```bash
--limit-rate 500K
```
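Putting it together, a full invocation might look like this (a sketch; `--sleep` and `--limit-rate` are the CLI counterparts of the throttling options above):

```bash
gallery-dl \
  --cookies cookies.txt \
  --sleep 1 \
  --limit-rate 500K \
  https://www.instagram.com/USERNAME/
```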
List the resulting metadata files:
```bash
ls instagram/USERNAME/*.json
```
Inspect:
```bash
jq . instagram/USERNAME/2024-01-01_ABC123_Image.json
```
❌ 403 / Login required
→ Cookies expired → re-export cookies
❌ Missing captions or likes
→ Not logged in OR using old gallery-dl
❌ Only thumbnails downloaded
→ Set "videos": true
Note that the metadata postprocessor only writes sidecar .json files; embedding metadata into the media files themselves is not something gallery-dl does natively and typically requires an external tool such as exiftool (for example driven by the exec postprocessor).
If you want:
Single JSON for whole profile
CSV export
Download private accounts
Reels-only
Stories with captions
Tell me 👍
This is normal on Debian / Ubuntu (Python 3.12+). Your system Python is PEP 668 “externally managed”, so pip install is blocked on purpose.
You have three correct ways to install gallery-dl.
👉 Use option A or B (recommended).
Option A: pipx. Installs gallery-dl in its own isolated environment and makes it available system-wide.
```bash
sudo apt install pipx python3-venv
pipx ensurepath
```
Log out & back in (or restart shell).
```bash
pipx install gallery-dl
```
```bash
gallery-dl --version
```
✔ No virtualenv management
✔ Safe
✔ Recommended by Debian
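Upgrading later is a single command:

```bash
pipx upgrade gallery-dl
```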
Option B: a virtual environment (venv). Best if you want full control or plugins.
```bash
sudo apt install python3-full python3-venv
python3 -m venv ~/venvs/gallery-dl
source ~/venvs/gallery-dl/bin/activate
pip install -U gallery-dl
gallery-dl --version
```
To exit later:
```bash
deactivate
```
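You can also run the venv's binary directly without activating it:

```bash
~/venvs/gallery-dl/bin/gallery-dl --version
```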
Option C: force pip system-wide (not recommended):

```bash
pip install --break-system-packages gallery-dl
```
⚠️ Can break your OS Python. Avoid.
config.json (important): regardless of install method, the config lives at

```text
~/.config/gallery-dl/config.json
```
Cookies path example:
json"cookies": "/home/YOURUSER/cookies.txt"
Both pipx and venv support:
Cookies
Full metadata
JSON export
Stories / reels / highlights
Recommended setup: pipx + config file + cookies.txt
If you want, tell me:
your distro (Ubuntu, Debian, Mint, etc.)
whether you want CSV, single JSON, or Reels-only
and I’ll tailor it exactly 👍
Below is the clean, complete way to download a Facebook album with all available metadata using gallery-dl.
⚠️ Facebook requires login cookies for albums, private content, and full metadata. Without cookies you’ll get little or nothing.
(From earlier, pipx is ideal)
```bash
gallery-dl --version
```
Log in to Facebook in your browser
Use a browser extension:
Get cookies.txt (Chrome / Firefox)
Export cookies as:
```text
cookies.txt
```
Place it somewhere safe, e.g.:
```text
~/.config/gallery-dl/cookies.txt
```
Gallery-dl supports all of these:
```text
https://www.facebook.com/media/set/?set=a.ALBUM_ID
https://www.facebook.com/USERNAME/albums/ALBUM_ID/
https://www.facebook.com/photo/?fbid=PHOTO_ID&set=a.ALBUM_ID
```
```bash
gallery-dl \
  --cookies ~/.config/gallery-dl/cookies.txt \
  https://www.facebook.com/USERNAME/albums/ALBUM_ID/
```
This already downloads:
- Full-resolution photos/videos
- Captions
- Upload timestamps
- Photo IDs
```bash
mkdir -p ~/.config/gallery-dl
nano ~/.config/gallery-dl/config.json
```
```json
{
  "extractor": {
    "facebook": {
      "cookies": "~/.config/gallery-dl/cookies.txt",
      "videos": true,
      "include": "all",
      "directory": ["facebook", "{author[name]}", "{album[name]|Album}"],
      "filename": "{date:%Y-%m-%d}_{id}.{extension}",
      "sleep-request": 2,
      "sleep": 1,
      "postprocessors": [
        {
          "name": "metadata",
          "event": "post",
          "filename": "{filename}.json",
          "mode": "json"
        }
      ]
    }
  }
}
```
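Before running anything, it's worth checking that the file is valid JSON:

```bash
# Prints the parsed config, or an error pointing at the broken line
python3 -m json.tool ~/.config/gallery-dl/config.json
```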
Each media item creates:
```text
photo.jpg
photo.json
```
The .json includes:
- id
- caption
- created_time
- album.name
- album.id
- author.name
- author.id
- like_count (when visible)
- comment_count
- is_video
- video_duration
- Original media URLs
⚠️ Facebook does not expose commenter names or full reactions reliably anymore.
Download only part of an album (useful for testing):

```bash
gallery-dl --range 1-5 \
  https://www.facebook.com/USERNAME/albums/ALBUM_ID/
```
Facebook is aggressive.
Recommended (already in config):
json"sleep-request": 2,
"sleep": 1
Optional CLI:
```bash
--limit-rate 500K
```
❌ Login required / empty album
→ Cookies expired → re-export
❌ Low resolution images
→ Must be logged in
→ Album privacy must allow you access
❌ Missing captions
→ Some albums are “media-only”
→ Facebook often strips old captions
I can show you how to:
Export single album JSON
Convert metadata to CSV
Download private group albums
Download tagged photos
Separate videos vs photos
Mirror entire Facebook pages
Just tell me what you want next 👍
Got it — here’s how to push gallery-dl (Facebook) as far as it can go in terms of postprocessors + metadata extraction.
I’ll be very clear about what is possible vs impossible, because Facebook hard-limits data.
Gallery-dl can only save metadata that Facebook actually exposes to logged-in users.
❌ Not possible (Facebook blocks this):
Full commenter lists
Reaction breakdowns (👍❤️😂 etc.)
User profile metadata of likers
EXIF from Facebook images (already stripped)
✅ Possible:
Full media metadata
Album + author info
Captions
Timestamps
IDs
Video properties
URLs
Group/page context
Structured JSON, TXT, CSV
Gallery-dl supports these postprocessors (relevant ones):
| Postprocessor | Purpose |
|---|---|
| metadata | Save JSON metadata |
| json | Alternate JSON writer |
| exec | Run custom scripts |
| mtime | Set file timestamp |
| zip | Archive albums |
| checksum | Hash verification |
| classify | Custom folder logic |
This config enables everything gallery-dl can extract.
```json
{
  "extractor": {
    "facebook": {
      "cookies": "~/.config/gallery-dl/cookies.txt",
      "include": "all",
      "videos": true,
      "directory": [
        "facebook",
        "{author[name]|unknown}",
        "{album[name]|no_album}"
      ],
      "filename": "{date:%Y-%m-%d_%H-%M-%S}_{id}.{extension}",
      "sleep-request": 2,
      "sleep": 1,
      "postprocessors": [
        {
          "name": "metadata",
          "event": "post",
          "filename": "{filename}.json",
          "mode": "json"
        },
        {
          "name": "mtime",
          "event": "file"
        },
        {
          "name": "checksum",
          "event": "file",
          "hash": "sha256"
        }
      ]
    }
  }
}
```
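To test this config before committing to a long download, a hedged dry run (the album URL is a placeholder) runs the extraction without saving any files:

```bash
# --simulate extracts data without downloading; --verbose shows what gallery-dl is doing
gallery-dl --simulate --verbose "https://www.facebook.com/USERNAME/albums/ALBUM_ID/"
```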
Inside each .json file:
json"id"
"shortcode"
"is_video"
"video_duration"
"dimensions"
"media_url"
json"created_time"
"date"
json"caption"
"alt_text"
json"album": {
"id"
"name"
}
json"author": {
"id"
"name"
"url"
}
json"like_count"
"comment_count"
json"group"
"page"
"privacy"
"source_url"
Gallery-dl is per-post by design, but you can merge later.
Merge all per-post JSON files into one array with jq:

```bash
jq -s '.' facebook/**/**/*.json > album_full.json
```
Convert the merged file to CSV:

```bash
jq -r '
  .[] |
  [
    .id,
    .created_time,
    .author.name,
    .album.name,
    .caption,
    .like_count,
    .comment_count,
    .media_url
  ] | @csv
' album_full.json > album.csv
```
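If you want a header row in that CSV, one option (same field order as the filter above; the column names are just labels) is:

```bash
{
  # Header line, then the same jq filter appended below it
  echo '"id","created_time","author","album","caption","like_count","comment_count","media_url"'
  jq -r '.[] | [.id, .created_time, .author.name, .album.name, .caption, .like_count, .comment_count, .media_url] | @csv' album_full.json
} > album.csv
```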
Example: auto-append metadata to a log file.
```json
{
  "name": "exec",
  "event": "post",
  "command": [
    "bash",
    "-c",
    "echo '{id},{created_time},{album[name]}' >> facebook_log.csv"
  ]
}
```
Or zip each finished album:

```json
{
  "name": "zip",
  "event": "after",
  "filename": "{album[name]}.zip"
}
```
```bash
gallery-dl -K https://www.facebook.com/USERNAME/albums/ALBUM_ID/
```
This prints all fields, including hidden ones.
❌ “More postprocessors = more Facebook data”
✔ No — postprocessors only store / transform what Facebook already exposes.
❌ “Gallery-dl is missing likes/comments”
✔ Facebook stopped exposing them consistently.
Private group albums
Whole Facebook page mirror
Timeline photos
Tagged photos
Custom Python metadata processor
Tell me what you want and your OS 👍
Below is the maximum-coverage, realistic way to mirror an entire Facebook page with gallery-dl, including all media + all metadata Facebook still exposes.
This is as close to a true mirror as Facebook allows in 2025.
Even with login cookies, no tool can extract:
❌ Full comments
❌ Reaction breakdowns
❌ Lists of likers
❌ Full timelines as structured posts
❌ Shared/private-only content you can’t see manually
✔ What is possible:
All photos & videos
Albums
Timeline media
Reels (if accessible)
Captions
Timestamps
IDs
Page info
Engagement counts (limited)
Per-media JSON metadata
✔ Logged-in Facebook account
✔ Cookies exported (cookies.txt)
✔ gallery-dl installed (pipx or venv)
Cookies path used below:
```text
~/.config/gallery-dl/cookies.txt
```
To fully mirror a page, you must hit multiple endpoints:
| Content | URL |
|---|---|
| Albums | /photos/albums/ |
| Timeline photos | /photos/ |
| Videos | /videos/ |
| Reels (if present) | /reels/ |
| Tagged photos | /photos_of/ |
Example page:
```text
https://www.facebook.com/PAGENAME
```
Create/edit:
```text
~/.config/gallery-dl/config.json
```
```json
{
  "extractor": {
    "facebook": {
      "cookies": "~/.config/gallery-dl/cookies.txt",
      "include": "all",
      "videos": true,
      "directory": [
        "facebook",
        "{author[name]|unknown}",
        "{category|page}"
      ],
      "filename": "{date:%Y-%m-%d_%H-%M-%S}_{id}.{extension}",
      "sleep-request": 2,
      "sleep": 1,
      "postprocessors": [
        {
          "name": "metadata",
          "event": "post",
          "filename": "{filename}.json",
          "mode": "json"
        },
        {
          "name": "mtime",
          "event": "file"
        },
        {
          "name": "checksum",
          "event": "file",
          "hash": "sha256"
        }
      ]
    }
  }
}
```
```bash
# Albums
gallery-dl https://www.facebook.com/PAGENAME/photos/albums/
# Timeline photos
gallery-dl https://www.facebook.com/PAGENAME/photos/
# Videos
gallery-dl https://www.facebook.com/PAGENAME/videos/
# Reels (if present)
gallery-dl https://www.facebook.com/PAGENAME/reels/
# Tagged photos
gallery-dl https://www.facebook.com/PAGENAME/photos_of/
```
📌 These commands do not re-download files that already exist on disk; for robust deduplication across runs, add --download-archive (see the loop sketch below and the resume note further down).
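A hedged convenience loop (PAGENAME and the archive filename are placeholders) that walks all of the endpoints above with one shared archive file, so re-runs only fetch new items:

```bash
PAGE="https://www.facebook.com/PAGENAME"
# One archive file shared by every section keeps repeat runs incremental
for section in photos/albums photos videos reels photos_of; do
  gallery-dl --download-archive fb-archive.txt "$PAGE/$section/"
done
```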
Resulting folder structure:

```text
facebook/
└── Page Name/
    └── page/
        ├── 2018-06-01_123456789.jpg
        ├── 2018-06-01_123456789.json
        ├── 2019-02-12_987654321.mp4
        ├── 2019-02-12_987654321.json
        └── ...
```
Each media item has:
File
Matching .json metadata
Inside .json:
- id
- created_time
- source_url
- media_url
- is_video
- video_duration
- dimensions
- caption
- alt_text
- author.id
- author.name
- author.url
- category
- like_count
- comment_count
- album.id
- album.name
- privacy
Merge all metadata into one JSON file:

```bash
jq -s '.' facebook/Page\ Name/page/*.json > page_full.json
```
Convert it to CSV:

```bash
jq -r '
  .[] |
  [
    .id,
    .created_time,
    .caption,
    .author.name,
    .like_count,
    .comment_count,
    .media_url
  ] | @csv
' page_full.json > page.csv
```
For pages with 10k+ posts:
✔ Use sleeps (already set)
✔ Expect hours/days
✔ Run per section (albums → photos → videos)
✔ Resume safely if interrupted
Resume automatically:
```bash
gallery-dl --download-archive archive.txt URL
```
To inspect every metadata field gallery-dl sees for a page section:

```bash
gallery-dl -K https://www.facebook.com/PAGENAME/photos/
```
✔ This mirrors all accessible page media
✔ Preserves timestamps, captions, structure
✔ Produces machine-readable metadata
✔ Facebook limits are unavoidable
If you want next:
Private page / group mirror
Incremental syncing
Wayback-style HTML viewer
Python script to rebuild timeline
Media dedup across pages
Tell me 👍
Here’s the difference between these two Facebook album/photo URL formats:
?set=pb.<number>

Example:

```text
https://www.facebook.com/media/set/?set=pb.61550654682264
```
pb. typically indicates a photo album set or gallery generated by Facebook’s photo viewer.
This prefix is often used in URLs for photo sets / albums when Facebook’s UI constructs links for albums or collections of photos.
It’s historically associated with older album formats where a. or pb. prefixes indicate album containers with numeric IDs that follow.
This is the format that gallery tools and scrapers expect for an album.
?set=ecnf.<number>

Example:

```text
https://www.facebook.com/media/set/?set=ecnf.61550654682264
```
ecnf. appears in shared/photo embed contexts or when Facebook is generating a collection based on an external context feed or sharing into different contexts.
These links are often seen when people share photos to timelines or through feeds where Facebook uses a more complex identifier (not always a static album).
The ecnf prefix doesn’t change the underlying media — it’s mostly part of Facebook’s internal URL/query parameter system to reference the same media set but under a different UI/encoding context (e.g., embed feed, external link, or shared page view).
This format still represents a set of photos but uses a different prefix due to how the link was generated.
| Feature | pb.<id> | ecnf.<id> |
|---|---|---|
| Type | Standard Facebook album set | Facebook context feed set |
| Where it comes from | Typical album/gallery link | Shared/shared-feed based link |
| Works with scrapers | Yes | Yes (it refers to the same underlying media) |
| Differences | UI/URL prefix | UI/URL prefix only |
Both URLs ultimately point to the same photoset/album, just encoded with different internal prefixes that Facebook uses depending on how the link is generated (e.g., direct album view vs embed/shared context).
Tools like gallery-dl treat these as album identifiers, and both should be usable:
bashgallery-dl "https://www.facebook.com/media/set/?set=pb.61550654682264"
gallery-dl "https://www.facebook.com/media/set/?set=ecnf.61550654682264"
Both will refer to the same album media under different URL encodings.
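For scripting, a minimal sketch that normalizes an ecnf. link to the pb. form before handing it to gallery-dl, assuming (as described above) both prefixes reference the same set:

```bash
# Hypothetical normalization: rewrite the ecnf. prefix to pb. via bash substitution
url="https://www.facebook.com/media/set/?set=ecnf.61550654682264"
gallery-dl --cookies ~/.config/gallery-dl/cookies.txt "${url/set=ecnf./set=pb.}"
```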
If you want to dig deeper into how Facebook constructs these link prefixes and what each corresponds to internally (for example for scripting or robust scraping), I can outline how Facebook’s media URL encoding works and how to normalize them for tools like gallery-dl.