Fetch Instagram profile photos without the API and without the __a=1 parameter

Until recently, scraping posts from an Instagram account was very easy: all you had to do was append __a=1 as a query string parameter to the profile URL and the JSON was ready to be read. That no longer works, but this little RegExp will save you time.

A few days ago, the Instagram team decided (probably in the wake of Cambridge Analytica?) to restrict its APIs, so the parameter is no longer available and the request now returns a 403 Forbidden error.

So, what to do now? Should I create an app on Instagram? Access tokens? OAuth?

Nothing like that. Instagram's JSON is still embedded in the public profile page of every user, so we just need to extract it with a simple RegExp.


Note: this works, of course, only for public profiles.

In the source of a profile page, we can find the window._sharedData variable, which holds a big object with a lot of links; those links point to the user's posts. The HTML looks like this:

<script type="text/javascript">window._sharedData = { ... };</script>

With a simple RegExp, we can extract its content and convert it into an object that our code can read.

Code

In this case, we'll use Axios as the HTTP library and Node.js (JavaScript) as the runtime, with async/await.

// Axios must be installed first (npm install axios)
const Axios = require('axios');

async function instagramPhotos() {
    const userInfoSource = await Axios.get('https://www.instagram.com/theraloss/');
}

Now that we have the source, we need to write the RegExp. Since window._sharedData is assigned an object literal that is also valid JSON, we can then read it with the JSON.parse function.

async function instagramPhotos() {
    const userInfoSource = await Axios.get('https://www.instagram.com/theraloss/');

    // userInfoSource.data contains the HTML from Axios
    const jsonObject = userInfoSource.data.match(/<script type="text\/javascript">window\._sharedData = (.*)<\/script>/)[1].slice(0, -1);
}

Here we do three things:

  1. We use /<script type="text\/javascript">window\._sharedData = (.*)<\/script>/ as the RegExp to capture the whole JSON;

  2. We take the second item ([1]) of the array returned by the JS .match() function, since the first one ([0]) is the whole match, which also contains our "delimiters" (the <script> tags);

  3. We delete the last character of the captured string because, in the page source, it ends with ; and that would make it invalid JSON.
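
The three steps above can be tried without any network request. The following is a minimal sketch that runs the same RegExp against a hand-written HTML string; the sampleHtml content and the variable names are assumptions made up for illustration, not real Instagram output.

```javascript
// Hypothetical, hand-written stand-in for the profile page source
const sampleHtml = '<html><body><script type="text/javascript">' +
    'window._sharedData = {"entry_data":{"ProfilePage":[]}};' +
    '</script></body></html>';

// Same RegExp used in the article
const sharedDataPattern = /<script type="text\/javascript">window\._sharedData = (.*)<\/script>/;
const matches = sampleHtml.match(sharedDataPattern);

// matches[0] is the whole match, <script> delimiters included
// matches[1] is only the captured group: the object plus the trailing ";"
const rawJson = matches[1].slice(0, -1); // drop the trailing ";"

const sharedData = JSON.parse(rawJson);
console.log(Object.keys(sharedData));
```

Note that .match() returns null when the pattern is not found (for example, if the page markup changes), so reading [1] directly would throw; wrapping the call in a try/catch or checking the result first avoids that.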

Now that we have our JSON, we can easily read it.

async function instagramPhotos() {
    const userInfoSource = await Axios.get('https://www.instagram.com/theraloss/');

    // userInfoSource.data contains the HTML from Axios
    const jsonObject = userInfoSource.data.match(/<script type="text\/javascript">window\._sharedData = (.*)<\/script>/)[1].slice(0, -1);

    return JSON.parse(jsonObject);
}

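That's it! If you store the parsed object in a variable called userInfo, you can access the most recent posts at the path userInfo.entry_data.ProfilePage[0].graphql.user.edge_owner_to_timeline_media.edges. If, for example, we want to take the 10 most recent posts, keep only the images (excluding the videos) and store their links in an array, we could write something like this. Note that the result may contain fewer than 10 links if some of those posts are videos.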

const Axios = require('axios');

async function instagramPhotos() {
    // It will contain our photos' links
    const res = [];

    try {
        const userInfoSource = await Axios.get('https://www.instagram.com/theraloss/');

        // userInfoSource.data contains the HTML from Axios
        const jsonObject = userInfoSource.data.match(/<script type="text\/javascript">window\._sharedData = (.*)<\/script>/)[1].slice(0, -1);

        const userInfo = JSON.parse(jsonObject);
        // Retrieve only the first 10 results (slice does not mutate the parsed object)
        const mediaArray = userInfo.entry_data.ProfilePage[0].graphql.user.edge_owner_to_timeline_media.edges.slice(0, 10);
        for (const media of mediaArray) {
            const node = media.node;

            // Skip anything that is explicitly not an image (e.g. videos)
            if (node.__typename && node.__typename !== 'GraphImage') {
                continue;
            }

            // Push the thumbnail src in the array
            res.push(node.thumbnail_src);
        }
    } catch (e) {
        console.error('Unable to retrieve photos. Reason: ' + e.toString());
    }

    return res;
}
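
The filtering logic can also be separated from the HTTP request, which makes it easy to check against sample data. Here is a minimal sketch under that assumption: extractImageThumbnails is a hypothetical helper name, and the sampleUserInfo object with its example.com URLs is invented test data that only mimics the path described above.

```javascript
// Hypothetical helper: takes the parsed window._sharedData object and
// returns the thumbnail links of the images among the first `limit` posts.
function extractImageThumbnails(userInfo, limit = 10) {
    const edges = userInfo.entry_data.ProfilePage[0].graphql.user.edge_owner_to_timeline_media.edges;

    return edges
        .slice(0, limit)                    // first `limit` posts, without mutating the input
        .map((edge) => edge.node)
        // keep nodes with no __typename, or those explicitly marked as images
        .filter((node) => !node.__typename || node.__typename === 'GraphImage')
        .map((node) => node.thumbnail_src);
}

// Invented sample data following the same structure
const sampleUserInfo = {
    entry_data: {
        ProfilePage: [{
            graphql: {
                user: {
                    edge_owner_to_timeline_media: {
                        edges: [
                            { node: { __typename: 'GraphImage', thumbnail_src: 'https://example.com/a.jpg' } },
                            { node: { __typename: 'GraphVideo', thumbnail_src: 'https://example.com/b.jpg' } },
                            { node: { thumbnail_src: 'https://example.com/c.jpg' } },
                        ],
                    },
                },
            },
        }],
    },
};

const thumbnails = extractImageThumbnails(sampleUserInfo);
console.log(thumbnails);
```

With a helper like this, instagramPhotos() would only need to fetch the page, parse the JSON and pass the result along, keeping the try/catch around the parts that can actually fail.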