Commit Graph

1491 Commits

Author SHA1 Message Date
AudricV ff8ed7247f
[YouTube] Switch to new consent cookie
Also move the documentation of the consent in its setter method in order to be
accessible publicly and improve it.
2023-12-08 21:46:46 +01:00
AudricV 2c941794c0
[YouTube] Add utcOffsetMinutes to all InnerTube payloads
This should make returned dates consistent between timezones and countries on
which the extractor is ran.

It was previously only set on YouTube Music search continuations.
2023-12-08 21:46:46 +01:00
AudricV d97c9e0db1
[YouTube] Improve payloads and URLs of InnerTube requests
For every InnerTube request:
- Always add a `request` object with the following properties:
  - "internalExperimentFlags" set to an empty array;
  - "useSsl" set to "true";
  - "lockedSafetyMode" set to "false".
- Use proper TODO comment to provide a way to enable restricted mode on every
request and add it on requests on which it wasn't present.

For YouTube Music:
- Remove alt query parameter, as it is not used anymore by the website;
- Add prettyPrint query parameter with false value on YouTube Music search
continuations.
2023-12-08 21:46:45 +01:00
AudricV 8a9ebcc373
[YouTube] Update InnerTube clients' version and devices' OS version and model 2023-12-08 21:46:45 +01:00
FineFindus 34b05a0dda
feat(youtube/comments): support creator replies 2023-10-09 16:33:43 +02:00
FineFindus c1784a4bdb
[YouTube] Add channel owner to comments 2023-10-09 16:33:43 +02:00
TobiGr f9846352ea Fix wrong `@Nullable` annotation 2023-10-09 16:02:57 +02:00
Tobi d6f5cba6e2
Merge pull request #1111 from FineFindus/feat/creator-reply
Add `hasCreatorReply()` to CommentsInfoItem
2023-10-09 12:45:56 +02:00
TobiGr d49f8411d7 [PeerTube] Implement CommentsInfoItemExtractor.hasCreatorReply() 2023-10-09 02:47:12 +02:00
AudricV c98695fcea
[SoundCloud] Fix extraction of non-JPG images
Default image qualities were removed in image URLs with the jpg extension,
causing the addition of the image suffix to full non-JPG images URLs and so to
invalid image URLs.

Only the image quality name with its leading "-" character and the "."
character after the name is now removed and replaced by a string format
replaced itself with the image quality name for each quality.

As the image suffixes do not contain the image extension, the name of image
qualities lists has been adapted with these changes and some related comments
have been also improved.
2023-10-01 20:33:25 +02:00
AudricV ac00459c1a
Change requirement of image extensions in ImageSuffix class' Javadoc to a possibility
Some services may provide different image formats using the same suffix,
without we know what format the service provide. Enforcing an image extension
could so lead to provide invalid image URLs, like for SoundCloud PNG images
currently.

With this documentation change, it is now clear that users of this class decide
of whether they want to include image extensions in the suffix. The previous
behavior described in the Javadoc was not enforced.
2023-09-30 21:11:09 +02:00
FineFindus dd7b2d9798
feat(youtube/comments): support creator replies 2023-09-25 10:40:45 +02:00
Youssif Shaaban Alsager 917554acc4
[YouTube] Add support for ultralow audio formats (#1063) 2023-09-24 19:04:34 +02:00
Christian fc67d49f59 Update copyright notices
Update copyright notices to comply to GPLv3 and change NewPipe to NewPipe Extractor on some notices that were not updated.
2023-09-22 19:10:15 -03:00
AudricV 714b141ecb
[YouTube] Catch any exception when extracting something from JavaScript's base player 2023-09-21 21:59:33 +02:00
AudricV 588c6a8422
[YouTube] Quote signature deobfuscation function name and add semicolon only where needed 2023-09-21 21:59:33 +02:00
AudricV a04bc320de
[YouTube] Convert signature timestamp to integer
The signature timestamp is used as a number by HTML5 clients, so it should be
used in the same way by the extractor too instead of being a string.

As the timestamp doesn't seem to exceed 5 digits, an integer is used to store
its value.
2023-09-21 21:59:32 +02:00
AudricV 7de3753a81
[YouTube] Refactor JavaScript player management API
This commit is introducing breaking changes.

For clients, everything is managed in a new class called
YoutubeJavaScriptPlayerManager:
- caching JavaScript base player code and its extracted code (functions and
variables);
- getting player signature timestamp;
- getting deobfuscated signatures of streaming URLs;
- getting streaming URLs with a throttling parameter deobfuscated, if
applicable.

The class delegates the extraction parts to external package-private classes:
- YoutubeJavaScriptExtractor, to extract and download YouTube's JavaScript base
player code: it always already present before and has been edited to mainly
remove the previous caching system and made it package-private;
- YoutubeSignatureUtils, for player signature timestamp and signature
deobfuscation function of streaming URLs, added in a recent commit;
- YoutubeThrottlingParameterUtils, which was originally
YoutubeThrottlingDecrypter, for throttling parameter of streaming URLs
deobfuscation function and checking whether this parameter is in a streaming
URL.

YoutubeJavaScriptPlayerManager caches and then runs the extracted code if it
has been executed successfully. The cache system of throttling parameters
deobfuscated values has been kept, its size can be get using the
getThrottlingParametersCacheSize method and can be cleared independently using
the clearThrottlingParametersCache method.

If an exception occurs during the extraction or the parsing of a function
property which is not related to JavaScript base player code fetching, it is
stored until caches are cleared, making subsequent failing extraction calls of
the requested function or property faster and consuming less resources, as the
result should be the same until the base player code changes.

All caches can be reset using the clearAllCaches method of
YoutubeJavaScriptPlayerManager.

Classes using JavaScript base player code and utilities directly (in the code
and its tests) have been also updated in this commit.
2023-09-21 21:59:32 +02:00
AudricV 6884d191cd
[YouTube] Add utility class around signatures and fix signature deobfuscation function extraction
The goal of this class is to decouple the extraction of signature timestamp and
signature deobfuscation function from YoutubeStreamExtractor.

The extraction of the signature deobfuscation function has been also adapted to
support the latest YouTube player versions.

This new class, YoutubeSignatureUtils, doens't store anything temporary such as
a copy of the player code, which has to be passed where required. It is not
public, as it will be used by a JavaScript player manager class in the future,
in order to handle in a better way fetching, caching and resetting cache of the
player code.
2023-09-21 21:59:26 +02:00
TobiGr 17790328cd Improve doc 2023-09-18 16:44:51 +02:00
AudricV e16d521b7b
[MediaCCC] Apply changes in Extractors
Also remove usage of the conference logo as the banner of a conference, as it
is a logo and not a banner.
2023-08-12 22:56:30 +02:00
AudricV 306068a63b
[MediaCCC] Apply changes in InfoItemExtractors 2023-08-12 22:56:30 +02:00
AudricV 2f40861428
[MediaCCC] Add utility methods to get image lists from conference logos and streams
These three new methods, added in MediaCCCParsingHelper,
getImageListFromImageUrl(String), getThumbnailsFromStreamItem(JsonObject) and
getThumbnailsFromLiveStreamItem(JsonObject) (the last two are based on a common
method, getThumbnailsFromObject(JsonObject, String, String)), return an empty
list if the case no image URL could be extracted.

Images returned have their height and width unknown and a resolution level
depending on the image key of the JSON API response.
2023-08-12 22:56:30 +02:00
AudricV 71cda03c4c
[Bandcamp] Apply changes in Extractors 2023-08-12 22:56:29 +02:00
AudricV 7e01eaac33
[Bandcamp] Apply changes in InfoItemExtractors 2023-08-12 22:56:29 +02:00
AudricV 4b80d737a4
[Bandcamp] Add utility methods to get multiple images
Bandcamp images work with image IDs, which provide different resolutions.

Images on Bandcamp are not always squares, and some IDs respect aspect ratios
where some others not.

The extractor will only use the ones which preserve aspect ratio and will not
provide original images, for performance and size purposes.

Because of this aspect ratio preservation constraint, only one dimension will
be known at a time.

The image IDs with their respective dimension used are:

- 10: 1200w;
- 101: 90h;
- 170: 422h;
- 171: 646h;
- 20: 1024w;
- 200: 420h;
- 201: 280h;
- 202: 140h;
- 204: 360h;
- 205: 240h;
- 206: 180h;
- 207: 120h;
- 43: 100h;
- 44: 200h.

(Where w represents the width of the image and h the height of the image)

Note that these dimensions are theoretical because if the image size is less
than the dimensions of the image ID, it will be not upscaled but kept to its
original size.

All these resolutions are stored in a private static list of ThumbnailSuffixes
in BandcampExtractorHelper, in which the methods to get mutliple images have
been added:

- getImagesFromImageUrl(String): public method to get images from an image URL;
- getImagesFromImageId(long, boolean): public method to get images from an
  image ID;
- getImagesFromImageBaseUrl(String): private utility method to get images from
  the static list of ThumbnailSuffixes from a given image base URL, containing
  the path to the image, a "a" letter if it comes from an album, its ID and an
  underscore.

Some existing methods have been also edited:

- the documentation of getImageUrl(long, boolean) has been changed to reflect
  the Bandcamp images findings;
- getThumbnailUrlFromSearchResult has been renamed to
  getImagesFromSearchResult, and a documentation has been added to this method.

The method replaceHttpWithHttps of the Utils class has been also used in
BandcampExtractorHelper instead of doing manually what the method does.
2023-08-12 22:56:29 +02:00
AudricV 4e6fb368bc
[PeerTube] Apply changes in Extractors and remove usages of default avatar picture
The default avatar picture was used when no profile picture was found, but it
was removed and split in multiple images.

Thumbnails' size is not known, as this data is not provided by the API.
2023-08-12 22:56:29 +02:00
AudricV 0a6011a50e
[PeerTube] Apply changes in InfoItemExtractors
Also lower the visibility of attributes of channels and playlists InfoItems to
private.
2023-08-12 22:56:29 +02:00
AudricV 6f8331524b
[PeerTube] Add utility method to get thumbnails of playlists and videos
This method, getThumbnailsFromPlaylistOrVideoItem, has been added in
PeertubeParsingHelper and returns the two image variants for playlists and
videos.
2023-08-12 22:56:28 +02:00
AudricV 81c0d80a54
[PeerTube] Add utility methods to get avatars and banners of accounts and channels
Four new static methods have been added in PeertubeParsingHelper to do so:
- two public methods to get the corresponding image type:
  getAvatarsFromOwnerAccountOrVideoChannelObject(String, JsonObject) and
  getBannersFromAccountOrVideoChannelObject(String, JsonObject);
- two private methods as helper methods: getImagesFromAvatarsOrBanners(String,
  JsonObject, String, String) and getImagesFromAvatarOrBannerArray(String,
  JsonArray).
2023-08-12 22:56:28 +02:00
AudricV 31da5beb51
[SoundCloud] Apply changes in Extractors 2023-08-12 22:56:28 +02:00
AudricV a3a74cd566
[SoundCloud] Apply changes in InfoItemExtractors and return track user avatars as uploader avatars in SoundcloudStreamInfoItemExtractor 2023-08-12 22:56:28 +02:00
AudricV 7f818217d2
[SoundCloud] Add utility methods to get images from track JSON objects and image URLs
These new public and static methods, added in SoundcloudParsingHelper,
getAllImagesFromArtworkOrAvatarUrl(String) and
getAllImagesFromVisualUrl(String) (which call a common private method,
getAllImagesFromImageUrlReturned(String, List<ImageSuffix>, List<Image>)),
return an unmodifiable list of JPEG images containing almost every image
resolution provided by SoundCloud except the original size and the tiny
resolution (for artworks and avatars, as the image size is 20x20 for artworks
and 18x18 for avatars, so very close to or equal to the t20x20 resolution):

- for artworks and avatars:
  - mini: 16x16;
  - t20x20: 20x20;
  - small: 32x32;
  - badge: 47x47;
  - t50x50: 50x50;
  - t60x60: 60x60;
  - t67x67: 67x67;
  - large: 100x100;
  - t120x120: 120x120;
  - t200x200: 200x200;
  - t240x240: 240x240;
  - t250x250: 250x250;
  - t300x300: 300x300;
  - t500x500: 500x500.

- for visuals/user banners:
  - t1240x260: 1240x260;
  - t2480x520: 2480x520.

Duplicated code in two methods of SoundcloudParsingHelper
(getUsersFromApi(ChannelInfoItemsCollector, String) and
getStreamsFromApi(StreamInfoItemsCollector, String, boolean)) has been merged
into one common private method, getNextPageUrlFromResponseObject(JsonObject).
2023-08-12 22:56:28 +02:00
AudricV 266cd1f76b
[YouTube] Apply changes in YoutubeMusicSearchExtractor and split its InfoItemExtractors into separate classes
Splitting YoutubeMusicSearchExtractor's InfoItemExtractors into separate
classes (YoutubeMusicSongOrVideoInfoItemExtractor,
YoutubeMusicAlbumOrPlaylistInfoItemExtractor and
YoutubeMusicArtistInfoItemExtractor) allows to simplify
YoutubeMusicSearchExtractor,improves reading and applying changes to InfoItems
(no more losing at least quarter of a line due to indentations).

These InfoItems, in which the image changes have been applied, don't extend the
YouTube ones anymore, as most methods were overridden and the few ones that are
not don't apply in YouTube Music items responses, so it was useless to extend
them.

The code of YoutubeMusicSearchExtractor have been also improved a bit.
2023-08-12 22:56:27 +02:00
AudricV c1981ed54f
[YouTube] Apply changes in Extractors except YoutubeMusicSearchExtractor
Also improve a bit some code related to the changes.
2023-08-12 22:56:27 +02:00
AudricV 4cc99f9ce1
[YouTube] Apply changes in InfoItemExtractors except YouTube Music ones 2023-08-12 22:56:27 +02:00
AudricV adfad086ac
[YouTube] Add utility methods to get images from InfoItems and thumbnails arrays
Unmodifiable lists of Images are returned, parsed from a given YouTube
"thumbnails" JSON array.

These methods will be used in all YouTube extractors and InfoItems, as the
structures between content types (videos, channels, playlists, ...) are common.
2023-08-12 22:56:27 +02:00
AudricV d56b880cae
Replace avatar and thumbnail URLs attributes and methods to List<Image> in Infos 2023-08-12 22:56:26 +02:00
AudricV 9d8098576e
Replace avatar and thumbnail URLs attributes and methods to List<Image> in Extractors 2023-08-12 22:56:26 +02:00
AudricV 0f4a5a8184
Replace avatar and thumbnail URLs attributes and methods to List<Image> in InfoItemsCollectors 2023-08-12 22:56:26 +02:00
AudricV ca1d4a6fa4
Replace avatar and thumbnail URLs attributes and methods to List<Image> in InfoItemExtractors 2023-08-12 22:56:26 +02:00
AudricV 2f3ee8a3f2
Replace avatar and thumbnail URLs attributes and methods to List<Image> in InfoItems 2023-08-12 22:56:25 +02:00
AudricV 78ce65769f
Add an ImageSuffix class to the extractor
The goal of this utility class is to simply store suffixes which need to be
appended to image URLs, in order to get images at the suffix resolution.

This class contains four properties: the suffix (as a string), the height,
the width (as integers) and the estimated resolution level of the image
corresponding to the one represented by the suffix.
2023-08-12 22:56:25 +02:00
AudricV d85454186a
Add an Image class to the extractor
Objects of this serializable class contains four properties: a URL (as a
string), a width, a height (represented as integers) and an estimated
resolution level, which can be constructed from a given height.

Possible resolution levels are:
- UNKNOWN: for unknown heights or heights <= 0;
- LOW: for heights > 0 & < 175;
- MEDIUM: for heights >= 175 & < 720;
- HIGH: for heights >= 720.

Getters of these properties are available and the constructor needs these four
properties.
2023-08-12 22:56:25 +02:00
Stypox 7294675aea
Merge pull request #1093 from AudricV/yt_support-shorts-ui-playlists
[YouTube] Support Shorts UI in playlists
2023-08-12 11:11:36 +02:00
Stypox 44b664af15
[YouTube] Simplify Optional chains in channel 2023-08-12 11:02:51 +02:00
AudricV 1852031a0b
[YouTube] Support pageHeaderRenderer and interactiveTabbedHeaderRenderer channel headers
The addition of this support required to turn the isCarouselHeader boolean into
an enum containing all supported channel headers named HeaderType.

Also assert that the page has been fetched where needed to avoid
NullPointerExceptions when the channel page has been not fetched and remove the
getChannelHeaderJson method in YoutubeChannelExtractor, method for which its
code has been moved to its sole usage after the new headers support changes.
2023-08-08 19:12:27 +02:00
AudricV e6f371fb94
[YouTube] Support Shorts UI in playlists
Also remove an outdated A/B test comment.
2023-08-07 19:01:08 +02:00
Stypox 9d3761a371
[YouTube] Directly use playlist collector in channel tabs wrapper
Note that this introduces a "Raw use of parameterized class 'InfoItemsPage'" warning, but it can be ignored since the type missing would be <InfoItem>, and StreamInfoItem extends InfoItem
2023-08-06 21:13:25 +02:00
Stypox e34b4f1978
[YouTube] Avoid using Consumer 2023-08-06 13:02:31 +02:00