
  • LinuxJournalRipper

    LinuxJournalRipper

    TL;DR: These scripts are made for Mac or Linux. They download every old issue of Linux Journal as PDF. To download the old issues locally, just run a command like ./LJ-ripper-....sh.
    You can freely use my scripts under the BSD 2-Clause license.

    (c) eliotlencelot, 2019.

    What is LinuxJournalRipper

    It is a script, available for Mac and Linux, that downloads the first 131 issues of Linux Journal as PDF.

    Why?

    On August 7, 2019, Linux Journal shut its doors for good. The website is now mostly open to everyone: the paywall has been removed (as of 2019-08) for the HTML version of Linux Journal, but the PDF/ePub/Mobi versions are still paywalled.
    We, the people, fear that the website will also be shut down in the near future, since it represents a cost for a defunct company. Hence this emergency script, made to download the first 131 issues of Linux Journal as PDF!

    The generated PDFs are portable, lightweight and free of advertisements (unlike a scanned file from that era).

    What does this script do?

    This (emergency) script:

    1. goes to the Linux Journal main archive page https://secure2.linuxjournal.com/ljarchive/LJ/tocindex.html
    2. downloads the list of Linux Journal issues.
    3. goes to each issue's sub-site and downloads the list of articles in that issue.
    4. downloads each article.
    5. creates a single PDF per issue.
    6. puts everything in a “Linux Journal” folder, with logs in a “Logbooks” folder, and quits cleanly.
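
    The first three steps boil down to reading an index page and collecting the links it contains. The script itself relies on lynx and shell tools; the following is only a minimal Python sketch of the same idea, not the script's code. The sample HTML fragment and the per-issue link layout are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Archive index named in step 1.
ARCHIVE_INDEX = "https://secure2.linuxjournal.com/ljarchive/LJ/tocindex.html"


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical index fragment; the real page layout may differ.
sample = (
    '<a href="301/tocindex.html">Issue 301</a> '
    '<a href="300/tocindex.html">Issue 300</a>'
)
issue_urls = extract_links(sample, ARCHIVE_INDEX)
print(issue_urls)
```

    The same function could be applied a second time to each issue page to list its articles (step 3).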

    Which systems can run this script?

    This (emergency) script runs on:

    • Apple macOS: Intel Mac, see the Mac folder for more information.
    • FreeBSD: x86 or AMD64, see the Mac folder for more information.
    • GNU/Linux: x86, see the Linux 32 bits folder for more information; AMD64, see the Linux 64 bits folder for more information; other architectures, as long as the dependencies exist.
    • Microsoft Windows: should work with Cygwin or WSL, as long as the dependencies exist.

    How much time did it take?

    It took half a night on an old Intel Core 2 Duo processor with a gigabit connection.

    License of my script

    My scripts are BSD-2:

    My scripts use the BSD-2 “Simplified” license, see the LICENSE file.

    Dependency licenses:

    • GNU Core Utilities, lynx, WeasyPrint and, if needed, ImageMagick are all free and open-source tools. I did not modify their source code; I just call them. Their licenses are respectively GPLv3, GPLv2, BSD-3 “Revised” and Apache. See their websites for more information.
    • CPDF is open-source but not FLOSS: the code is accessible online and there are no fees for personal use, but there are fees for commercial use. The license for CPDF is available here. It also adds a watermark.

    It is possible to replace CPDF with ImageMagick, a project under the Apache license, but the quality will be lower. To do so, either:

    • replace the ../cpdf *.pdf -o "Linux Journal - $INDEX.pdf" command with convert *.pdf "Linux Journal - $INDEX.pdf" in the LJ-ripper-[sth].sh script for your platform,
    • or, on Linux, use the script from Linux others.

    How to dump every issue of Linux Journal

    Everything in the best possible quality, but with ads

    1. Download issues n°301 (2019-08) down to n°132 (2005-04) as official publication PDFs here: https://drive.google.com/open?id=1FuU1N7tGNb-gDfrs5In_sqyPCwZ6FE2p (2274 MB for n°301 to n°132 only, 3344 MB otherwise)
    2. Run the script to get n°131 (2005-03) down to n°1 (1994-04) as generated PDFs (287 MB).

    Everything in good quality with the script

    The other issues (>= 132) are available online in much better quality, in the grey part of the internet (see the previous subsection), so they have not disappeared for now. The script is still able to download them: you could modify it to download everything from 1994 up to August 2019. It will take a long time, but it is possible; just edit the source code.
    To do so, remove these lines of code:

    #Suppress numerous line until we have only from issue 1 to 131
    #We have PDF in better quality for issues >=132.
    tail -n +171 url_of_issues.txt > tmp.txt
    mv tmp.txt url_of_issues.txt
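
    To see why the offset is 171: tail -n +171 keeps everything from line 171 onward, so it drops the first 170 lines, and issues 301 down to 132 are exactly 301 - 132 + 1 = 170 issues. The short Python sketch below mimics that behavior. It assumes the list holds one entry per issue, newest first; the list contents are a hypothetical stand-in for url_of_issues.txt.

```python
def drop_first_lines(lines, first_kept):
    """Mimic `tail -n +N`: keep everything from line N (1-indexed) onward."""
    return lines[first_kept - 1:]


# Hypothetical stand-in for url_of_issues.txt: one entry per issue, newest first.
all_issues = [f"issue-{n}" for n in range(301, 0, -1)]

kept = drop_first_lines(all_issues, 171)
print(kept[0], kept[-1], len(kept))  # prints: issue-131 issue-1 131
```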

    Installation and Usage

    Main steps

    This script should run well on macOS, FreeBSD and GNU/Linux.

    1. Install the dependencies by copying and pasting the snippet for your system (see the Dependencies section) into a Terminal.
    2. Download the whole project, or just the folder for your OS.
    3. Run in a shell as usual: ./LJ-ripper-[sth].sh (replace [sth] with the right word).

    1) Dependencies

    TODO : Change imagemagick to the new cpdf

    Installing dependencies means ensuring you have everything needed for the program to run.

    macOS

    • Open the Terminal
    • Run this script in a Terminal

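    # Install Homebrew only if the brew command is missing.
    # The Ruby-based installer is deprecated; Homebrew now ships a Bash installer.
    if ! command -v brew >/dev/null 2>&1
    then
    	/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    fi
    brew update
    
    brew install coreutils
    brew install python3 cairo pango gdk-pixbuf libffi lynx
    pip3 install WeasyPrint
    
    # Bring packages that were already installed up to date.
    brew upgrade python3 cairo pango gdk-pixbuf libffi lynx
    
    echo "Everything should be installed."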

    GNU/Linux

    • Open a Terminal.
    • Run the corresponding script (below) in a Terminal.

    Debian based (Debian 9.0 Stretch or newer, Ubuntu 16.04 Xenial or newer):

    sudo apt-get update
    sudo apt-get install lynx
    sudo apt-get install imagemagick
    sudo apt-get install build-essential python3-dev python3-pip python3-setuptools python3-wheel python3-cffi libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info
    pip3 install WeasyPrint

    Fedora:

    sudo yum install redhat-rpm-config python-devel python-pip python-setuptools python-wheel python-cffi libffi-devel cairo pango gdk-pixbuf2 lynx
    sudo yum install ImageMagick
    pip3 install WeasyPrint

    ArchLinux:

    sudo pacman -S python-pip python-setuptools python-wheel cairo pango gdk-pixbuf2 libffi pkg-config lynx
    sudo pacman -S imagemagick
    pip3 install WeasyPrint

    Gentoo:

    emerge pip setuptools wheel cairo pango gdk-pixbuf cffi lynx imagemagick
    pip3 install WeasyPrint

    Windows 10

    It may be possible to run the Linux others script in the Windows Subsystem for Linux on Windows 10.
    You need Windows 10 build 16215 or later, on a 64-bit processor.

    Troubleshooting Windows Subsystem for Linux

    2) Download the project

    • You can git clone the repository onto your computer.
    • Or you can simply click the Clone or download button in the GitHub web interface to get a zip of the whole project, including the Mac OS X and Linux folders.

    3) Usage

    • Open a Terminal and run the script with something like: ./LJ-ripper-[sth].sh (replace [sth] with the right word).

    The script starts as soon as it is launched and never asks you anything. TODO: Change this behavior.
    Please run it in a directory with enough free space (1.5 GB, I think).
    Numerous temporary files will be written (> 2700 temporary files).

    Licence

    Copyright 2019 eliotlencelot

    Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

    1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

    Visit original content creator repository
    https://github.com/eliotlencelot/LinuxJournalRipper

  • Malware-Detector-Repeat

    Malware-Detector-Repeat

    Abstract

    Information security has become ubiquitous in this era. In this project we demonstrate a simple anti-malware prototype consisting of a system monitor that watches the system and warns the user of problems such as a fork bomb or bad memory behavior, then quarantines, kills, and removes the malware.

    System Components

    System Monitor

    The main component of the system. It presents the user with a summary of system metrics, then asks whether they want more information about:

    • CPU
    • RAM
    • Disk
    • Network
    • a fresh new summary

    or whether to run a scan or exit. While the program is running, a background thread notifies the user of any warnings or potential threats to the system and automatically runs a scan in these cases.
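
    The background notification described above can be pictured as a daemon thread that polls a check function and queues any warning for the main loop. This is a minimal sketch under assumptions, not the project's actual code: start_monitor, cpu_check, the 90% threshold and the canned readings are all hypothetical names and values.

```python
import queue
import threading


def start_monitor(check, warnings, stop_event, interval=1.0):
    """Poll `check` in a daemon thread and queue every warning it returns."""

    def loop():
        while not stop_event.is_set():
            message = check()
            if message is not None:
                warnings.put(message)
            # Sleep between polls, but wake up immediately on stop.
            stop_event.wait(interval)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


# Hypothetical check: canned CPU readings instead of real system metrics.
readings = iter([10, 95, 20])


def cpu_check():
    value = next(readings, None)
    if value is not None and value > 90:
        return f"CPU usage high: {value}%"
    return None


warnings = queue.Queue()
stop = threading.Event()
worker = start_monitor(cpu_check, warnings, stop, interval=0.01)

first_warning = warnings.get(timeout=2)  # blocks until the thread reports
stop.set()
worker.join(timeout=2)
print(first_warning)
```

    Using a queue keeps the menu loop and the monitor decoupled: the main thread decides when to display a warning or trigger a scan.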

    Processes Scanner

    It is a Python script that detects and kills fork bomb malware, which overloads the OS and makes it uncontrollable.
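
    One simple way to spot a fork bomb is to count how many running processes share the same name and flag any name above a threshold. The sketch below illustrates that heuristic only; it is an assumption, not necessarily what the project's scanner does. The function name, the threshold of 50 and the process snapshot are hypothetical. On a live system the snapshot could be built with psutil.process_iter.

```python
from collections import Counter


def find_fork_bomb_suspects(processes, max_instances=50):
    """Return process names that appear more than `max_instances` times.

    `processes` is an iterable of (pid, ppid, name) tuples.
    """
    counts = Counter(name for _pid, _ppid, name in processes)
    return sorted(name for name, n in counts.items() if n > max_instances)


# Hypothetical snapshot: a few normal processes plus 61 copies of "bomb".
snapshot = [(1, 0, "init"), (100, 1, "bash"), (200, 100, "bomb")]
snapshot += [(1000 + i, 200, "bomb") for i in range(60)]

suspects = find_fork_bomb_suspects(snapshot)
print(suspects)  # prints: ['bomb']
```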

    Memory Scanner

    Memory eater is a program that allocates and deallocates variable-sized chunks of heap memory, simulating memory-based or fileless malware. These types of malware exploit vulnerabilities in memory management to carry out malicious activities without relying heavily on files stored on disk. This scanner detects the offending program and then kills or stops its process.
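
    A process that repeatedly allocates and frees large chunks shows wide swings in its resident memory over time. The sketch below flags such processes from a history of memory readings. It is an illustrative assumption, not the project's detection logic: the function name, the 100 MB threshold and the sample history are hypothetical.

```python
MB = 1024 * 1024


def find_memory_suspects(samples, max_swing_bytes=100 * MB):
    """Return PIDs whose memory usage swings by more than the threshold.

    `samples` maps a PID to a list of memory readings (in bytes) over time.
    """
    suspects = []
    for pid, readings in samples.items():
        if len(readings) >= 2 and max(readings) - min(readings) > max_swing_bytes:
            suspects.append(pid)
    return sorted(suspects)


# Hypothetical history of readings per process.
history = {
    4242: [20 * MB, 180 * MB, 40 * MB, 300 * MB],  # erratic, like a memory eater
    1337: [50 * MB, 52 * MB, 51 * MB],             # steady process
}

memory_suspects = find_memory_suspects(history)
print(memory_suspects)  # prints: [4242]
```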

    Getting started

    VM 🖥️

    The development environment is an Ubuntu Linux VM and can be extended to other environments. The script is written in Python 3.

    • Download the files
    • Follow the installation steps:

    sudo apt-get update
    sudo apt-get install -y python3-pip
    pip3 install psutil

    We used the C programming language to build the malicious programs, and GCC to compile them:

    sudo apt-get install gcc
    gcc memEat.c -o memEat # compile the memory malware
    gcc bomb.c -o bomb # compile the fork bomb

    Run:

    python3 main.py # run the antivirus

    Then open another terminal and run the malware you want to experiment with.

    Docker 🐋

    docker build . -t malware-test # build the image
    docker run -it malware-test # run in interactive mode

    The system monitor will appear. Then, in another terminal:

    docker exec -it <container-name> bash

    From there you can run the malware, interact with the detector, and experiment.

    Sample Output – VM

    [Screenshots: launching the memory eater; launching the manager]

    Folder Structure

    Refer to the following tree for information about important directories and files in this repository.

    Malware-Detector-Repeat
    ├── screenshots         sample output
    ├── README.md           main documentation
    ├── SysMonitor.py       reads and shows system info
    ├── Scan.py             the scanner: scans the system for vulnerabilities
    └── main.py             driver code
    
    Visit original content creator repository https://github.com/KareimGazer/Malware-Detector-Repeat
  • transcription-service

    transcription-service

    The Guardian transcription service provides a simple user interface for Guardian staff members to upload audio/video
    files they need a transcript for. It then runs the transcription and notifies the user when it is complete. Transcripts
    can then be exported to Google Drive.

    Technically, the tool is a bunch of infrastructure and UI that wraps around whisperX
    and whisper.cpp. We’re very grateful to @ggerganov and @m-bain for their work
    on these projects, which provide the core functionality of the transcription service.

    For Guardian staff – the runbook is here.

    Get started

    We use localstack to run SQS locally rather than needing to create ‘dev’ queues in AWS. This is set up via Docker.

    1. Get Janus creds (for fetching creds from AWS Parameter Store)
    2. Use the scripts/setup.sh script to install dependencies, set up the nginx mapping and create a Docker-based SQS queue:

    nvm use
    scripts/setup.sh

    3. Run the express backend API:

    npm run api::start

    4. Run the Next.js frontend:

    npm run client::start

    If all goes well the frontend is available at https://transcribe.local.dev-gutools.co.uk and the backend is available at https://api.transcribe.local.dev-gutools.co.uk

    Running gpu worker (whisperX) locally

    Running the gpu/whisperx worker needs whisperx and associated dependencies to be available. If you have already run
    setup.sh then the environment should be set up, and you can run npm run gpu-worker::start to activate the python
    environment and run the worker. We use pipenv to manage the python environment.

    The same python environment can be used to test changes to the model download python script.

    NOTE: To get the API to actually send messages to the gpu queue, you’ll need to update the useWhisperX property in
    config.ts – either by hard coding it or modifying the value in parameter store.

    Emulating a production deployment

    Occasionally you will want to develop something which relies on the specific ways we deploy into production.

    In development we run two web servers: the client Next.js dev server, which has features like auto-reloading on changes, proxies to the API express server.

    In production we only run an express server which serves the client bundle whenever you hit a non-API endpoint. This is so that the clientside can handle routing for non-api endpoints.

    If you are writing something that depends specifically on interactions between the API server and the frontend, you may want to check that it works in production. First, update the config value of rootUrl to https://api.transcribe.local.dev-gutools.co.uk, then run npm run emulate-prod-locally. This will trigger a build and have your express web server serve the frontend bundle, rather than the Next.js server.

    Then you can test the app using https://api.transcribe.local.dev-gutools.co.uk

    Purging local queue

    If you change the structure of messages on the queue you’ll probably want to purge all local messages. There’s a script
    for that!

    ./scripts/purge-local-queue.sh
    

    Whisper engine

    This project currently makes use of both https://github.com/m-bain/whisperX and https://github.com/ggerganov/whisper.cpp.
    WhisperX needs to run on a GPU instance with NVIDIA CUDA drivers and a mountain of python dependencies installed. To improve
    transcript performance, these are baked into the AMI used for the transcription workers – see these prs for further details:

    Currently we are trialling whisperx, with the hope of improved performance and speaker diarization support. There is an
    intention, assuming whisperx has satisfactory performance, cost and transcript quality, to remove whisper.cpp,
    thereby significantly simplifying our current infrastructure and the worker app.

    Visit original content creator repository
    https://github.com/guardian/transcription-service